Guarding Against Prompt Injection: The Security Engineering Approach

This appears to be a significant development in the security of LLMs against prompt injection:

Google DeepMind has introduced CaMeL (CApabilities for MachinE Learning), a novel strategy for preventing prompt-injection assaults that discards the ineffective method of having AI systems monitor themselves. Rather, CaMeL considers language models as inherently untrustworthy elements within a secure software architecture, establishing clear distinctions between user directives and potentially harmful content.

[…]

To grasp CaMeL, it’s essential to recognize that prompt injections occur when AI systems fail to differentiate between valid user commands and harmful directives concealed within the data they’re analyzing.

[…]

While CaMeL employs various AI models (a privileged LLM and a separated LLM), its innovation lies not in the reduction of models but in the fundamental transformation of the security framework. Instead of relying on AI to identify threats, CaMeL applies proven security engineering concepts such as capability-based access control and data flow monitoring to establish barriers that remain robust even if an AI element is compromised.

Research document. Insightful evaluation by Simon Willison.

I discussed the issue of LLMs merging the data and control paths here.

Leave a Reply Cancel reply