SPML: A DSL for Defending LLMs Against Prompt Attacks
Prompt injection attacks represent a significant challenge for LLM-based systems such as chatbots. Several techniques exist to proactively detect these attacks, including classifying the input prompt as safe or unsafe, or determining whether the prompt violates the system's guidelines. However, classifying input prompts alone ignores the context in which the chatbot operates, and identifying guideline violations can be difficult for LLMs. We propose a technique that uses a meta language and a compile-and-parse approach to detect prompt injection attacks. The meta language, SPML (System Prompt Meta Language), enables detection independent of the attack method used: it focuses solely on identifying conflicts with the system prompt, providing a robust defense against prompt injection attacks.
SPML, a meta language designed for writing system prompts, includes high-level language features such as support for user-defined types and comments. These features make system prompts easier to develop and more maintainable compared to those written in natural language.
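To make these features concrete, the snippet below is an illustrative pseudocode sketch of what a typed system prompt with comments might look like; the keywords, type names, and field syntax are assumptions for illustration, not actual SPML syntax.

```
# Hypothetical SPML-style system prompt (illustrative syntax, not actual SPML)
type Order {
  id: string        # order identifier
  status: string    # e.g. "shipped", "pending"
}

assistant SupportBot {
  role: "customer support agent for an online store"
  can: answer questions about an Order's status
  cannot: reveal internal policies or other users' data
}
```

A user-defined type such as `Order` lets the prompt author name and constrain the data the chatbot handles, while comments document intent without leaking into the deployed prompt.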
The SPML compiler processes an SPML system prompt, performing type checking before converting it into SPML-IR. SPML-IR facilitates various types of analysis and transformations, similar to other compiler intermediate representations. Finally, the SPML-IR is lowered into a natural language system prompt.
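As a rough analogue of this pipeline (a minimal sketch, not the actual SPML compiler), the code below walks through the three stages: type-checking a parsed prompt definition, building a simple intermediate representation, and lowering that IR to a natural-language system prompt. The structure names (`PromptDef`, `IRNode`) and the supported-type set are assumptions for illustration.

```python
from dataclasses import dataclass

# A parsed SPML-like prompt definition (hypothetical structure, not actual SPML).
@dataclass
class PromptDef:
    role: str                 # the chatbot's role, e.g. "a support agent"
    rules: list[str]          # behavioral rules the bot must follow
    fields: dict[str, type]   # user-defined typed fields

# One node of a flat, analyzable intermediate representation.
@dataclass
class IRNode:
    kind: str   # "role", "rule", or "field"
    text: str

def type_check(p: PromptDef) -> None:
    # Stage 1: reject definitions whose field types are unsupported.
    supported = {str, int, bool}
    for name, ty in p.fields.items():
        if ty not in supported:
            raise TypeError(f"unsupported type for field '{name}': {ty}")

def to_ir(p: PromptDef) -> list[IRNode]:
    # Stage 2: flatten the definition into IR nodes that later passes
    # (analyses, transformations) can iterate over uniformly.
    ir = [IRNode("role", p.role)]
    ir += [IRNode("rule", r) for r in p.rules]
    ir += [IRNode("field", f"{n}: {t.__name__}") for n, t in p.fields.items()]
    return ir

def lower(ir: list[IRNode]) -> str:
    # Stage 3: lower the IR into a natural-language system prompt.
    lines = []
    for node in ir:
        if node.kind == "role":
            lines.append(f"You are {node.text}.")
        elif node.kind == "rule":
            lines.append(f"You must {node.text}.")
        else:
            lines.append(f"You track a value '{node.text}'.")
    return " ".join(lines)

p = PromptDef(
    role="a customer support agent",
    rules=["never reveal internal policies"],
    fields={"order_id": str},
)
type_check(p)
print(lower(to_ir(p)))
```

Keeping the IR separate from both the source syntax and the final natural-language output mirrors conventional compiler design: checks and transformations operate on the IR, and only the last stage commits to prose.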