LiSA Adapts AI Safety with Structured Memory

Why this is here: LiSA maintained robustness under noisy user feedback, even when 20% of the reported labels were incorrect.

Researchers propose LiSA, a framework that improves AI safety through ongoing learning and is evaluated in simulated environments. LiSA targets the risks that arise when AI agents access private data and perform complex tasks. Current safety measures, called guardrails, often struggle with context-specific situations and must rely on limited user feedback.

LiSA uses structured memory to convert occasional failures into reusable safety rules. It keeps rules local to the context they came from to prevent overgeneralization, and it applies confidence gating based on accumulated evidence. Testing across three datasets (PrivacyLens+, ConFaide+, and AgentHarm) shows LiSA consistently outperforms existing memory-based methods when feedback is sparse.
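The interplay of local rules and confidence gating can be illustrated with a minimal Python sketch. This is not LiSA's implementation: the class names `SafetyRule` and `RuleMemory`, the smoothed confidence formula, and the `threshold` and `min_reports` values are all assumptions chosen for illustration. It only shows the general idea of storing a rule per context and acting on it once enough evidence has accumulated.

```python
from dataclasses import dataclass


@dataclass
class SafetyRule:
    """Hypothetical memory entry: one rule scoped to one context."""
    context: str          # local scope, e.g. "share_medical_record"
    action: str           # what the guardrail does when the rule fires
    supports: int = 0     # feedback reports that agree with the rule
    contradicts: int = 0  # feedback reports that disagree with it

    def confidence(self) -> float:
        # Laplace-smoothed agreement rate (an assumed formula);
        # it equals 0.5 when no evidence has been recorded yet.
        return (self.supports + 1) / (self.supports + self.contradicts + 2)


class RuleMemory:
    """Hypothetical structured memory with confidence gating."""

    def __init__(self, threshold: float = 0.7, min_reports: int = 3):
        self.threshold = threshold      # assumed gating threshold
        self.min_reports = min_reports  # assumed minimum evidence count
        self.rules: dict[str, SafetyRule] = {}

    def record_feedback(self, context: str, unsafe: bool) -> None:
        # Turn a user report into evidence for a rule local to this context.
        rule = self.rules.setdefault(context, SafetyRule(context, "block"))
        if unsafe:
            rule.supports += 1
        else:
            rule.contradicts += 1

    def decide(self, context: str) -> str:
        # Rules are local: a rule never applies outside its own context.
        rule = self.rules.get(context)
        if rule is None:
            return "allow"
        total = rule.supports + rule.contradicts
        # Gate: act only on rules backed by enough consistent evidence.
        if total >= self.min_reports and rule.confidence() >= self.threshold:
            return rule.action
        return "allow"
```

In this sketch, a single "unsafe" report for a context is not enough to trigger a block. After four "unsafe" reports and one contradicting report for the same context, the smoothed confidence is 5/7, which clears the assumed 0.7 threshold, so `decide` returns `"block"` for that context while unrelated contexts are still allowed.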

The system remains stable even with noisy user reports, tolerating up to 20% incorrect labels. While LiSA improves performance without scaling up the core AI model, the researchers acknowledge that it still needs testing in real-world deployments. They present the work as a step toward more secure AI agents.