Why This Exists
Every policy document begins with someone else’s bad day.
This one is no different. These rules were written after AI systems behaved unexpectedly in production, after agents took actions that couldn’t be undone, after data went somewhere it shouldn’t have. They are not theoretical. They are the residue of consequences.
Murphy’s Law has always applied to software. Applied to AI agents, it applies with unusual force.
AI agents now read your documents, call your APIs, write and execute code, query your databases, and send communications on behalf of your users. That capability is the point. But it also means every security failure mode in traditional software now has a faster, harder-to-predict counterpart, and several entirely new ones. An agent that can write to a database can be manipulated into deleting one. An agent that can send emails can be convinced to send the wrong ones. An agent with access to your systems will eventually encounter an input designed to misuse that accessl by an attacker, by an edge case, or by its own unexpected behaviour.
The attack surface for AI systems is language itself. You cannot enumerate every bad input. You cannot anticipate every manipulation. You cannot assume that because a system worked correctly a thousand times, the thousand-and-first will go the same way.
What you can do is design systems that fail safely, fail loudly, and recover deliberately. That is what these rules are for.
What These Rules Are Trying to Achieve
These rules have three objectives.
- Shrink the blast radius. When something goes wrong – and something will – the damage should be contained. Minimal privilege, rollback-first design, reversible actions by default, and human approval for high-stakes decisions mean that a failure is an incident you recover from, not a catastrophe you explain.
- Make exploitation harder than legitimate use. Allowlists over blocklists, validated inputs, structured outputs, and authenticated actions ensure the path of least resistance runs through your controls, not around them. Attackers follow incentives. Design accordingly.
- Create systems you can understand under pressure. Logging, monitoring, auditable logic, and documented decision-making mean that when an incident happens, you can diagnose it, contain it, and fix it. Rather than guessing at what the agent did and why.
Most of these rules apply lessons from decades of infrastructure and application security to a new class of system. The novelty is that AI agents can be manipulated through natural language, can behave unexpectedly at scale, and can take real-world actions faster than any human can supervise. That combination makes the familiar disciplines of least privilege, defence in depth, and fail-safe design more important, not less.
How to Use This Document
These rules are written for engineers building, deploying, or maintaining AI systems. They cover infrastructure, production operations, and security controls, not prompt engineering or model-specific optimisation.
Treat them as a checklist for new systems and a diagnostic for existing ones. Integrate them into your own documentation, processes and workflow. Where a rule does not apply, document why. Where a rule creates tension with a business requirement, escalate. Do not simply bypass it.
The pattern in these rules is consistent: teams that skip these steps encounter the consequences eventually. The goal is that you learn the pattern here.
- Never inject untrusted input directly into a prompt. Structuring prompts with raw user input, unvalidated data, or external content is the AI equivalent of SQL injection. Use prompt templates with clearly delimited, sanitised inputs. Separate instructions from data in every prompt. This is the primary defence against prompt injection attacks.
- Treat all inputs to your AI as untrusted. Validate every prompt, message, web page, email, document, and data source before passing it to a model (including other inputs you’ve created!). Reject inputs that fail validation. Never attempt to “fix” a malicious or malformed prompt. Always validate first, then either escape or sanitise hazardous content before it reaches the model. This rule includes models specifically designed to validate input.
- On any error or unexpected AI behaviour, roll back and fail safely. Never allow an AI agent to continue a partially completed action. Never fail open. Roll back fully and start again from a known safe state. This is especially critical for agentic tasks with real-world consequences (sending emails, executing code, calling APIs).
- Human-in-the-loop for high impact actions. To limit excessive autonomy, all high-impact, irreversible agent actions such as sending communications, modifying records, executing transactions, should require explicit human approval before proceeding. Expect the threshold for autonomy to shift over time as trust is established, but evaluate that threshold against the risk of failure, not the number of past successes.
- Exercise extreme caution when AI agents make system calls, execute code, or call external APIs. Passing AI-generated output directly to a system call, shell command, or code interpreter is one of the highest-risk capabilities you can give an AI agent. If the agent doesn’t need access, don’t allow it. If it does, limit it to only what’s needed.
- Protect sensitive data before it reaches an AI model. Mask, anonymise, or hash personally identifiable information (PII) and sensitive data before it is included in any prompt or context window. What you send to a model may be logged, retained, or exposed. Do not send data the AI does not need. Only send the minimum data necessary.
- Sanitise and encode all AI output before downstream use. You have no control how your AI-generated output is going to be used. Before sending output via a user interface, a database, an API call, or a system command, treat it as you would any user input: validate, encode, and sanitise it to prevent attacks, misuse, and unintended execution.
- Authorise every AI agent action individually. Do not assume that because a user or system has authenticated once, all subsequent AI agent actions on their behalf are permitted. Validate authorisation for every action an AI agent takes, especially for sensitive operations.
- AI agents should operate using minimal access accounts. Apply the principle of least privilege strictly. Every access grant should be explicit, minimal, and reviewed. Individual and admin accounts grant excessive privilege and create accountability gaps. An agent that can do everything will eventually do something you did not intend.
- Design AI systems to assume they will be manipulated. Plan for prompt injection, jailbreaks, model manipulation, data poisoning, and unexpected outputs. Each aspect of the system is a potential exploitation target, design accordingly.
- Never trust AI output blindly. Validate AI-generated content before using it, especially when it will be used in code, database queries, system commands, or other sensitive contexts. AI output can be incorrect, manipulated, or adversarially crafted.
- Use a secrets management tool. Use secrets scanning on every code commit to catch accidental exposure. This is especially critical for AI systems, where prompt content may be logged, cached, or leaked through model outputs.
- Log, monitor, and alert on all AI system errors and unexpected behaviours. AI errors are signals so treat them accordingly. Do not allow AI systems to fail silently. Log every significant AI input, decision, output (redact PII and sensitive data), and telemetry. AI errors might be context drift, distortion, or misalignment so monitor for anomalies, unexpected behaviour, and policy violations that alert on threshold or trends as well as hard failures. This applies to AI APIs, agents, and pipelines, not just user-facing interfaces.
- Use allowlists, not blocklists, to control what AI agents can do. Define explicitly what actions, tools, and data sources an AI agent is permitted to use. Blocklists are trivially bypassed. Allowlists are easier to maintain and far more reliable.
- Prefer reversible actions. Design the system so if two paths accomplish the same goal, the agent chooses the reversible one by default. Irreversibility amplifies every other failure.
- Secure the AI supply chain. This includes the models, datasets, tools, SDKs, vector databases, and embedding pipelines you use. Validate that every component you depend on is from a trusted source and is being used safely. Lock down your AI development environment, version control, CI/CD pipeline, and any system used to build or deploy AI. Validate this regularly.
- Classify all data before sending it to an AI. Know what you are sending, how sensitive it is, and whether it is appropriate to send. Document sensitive data flows into and out of AI systems. Encrypt sensitive data in transit and at rest. Test these flows for security.
- Use structured, typed inputs and outputs for AI systems. Avoid ambiguous, loosely formatted prompts and responses. Define expected input and output schemas. Use JSON schemas, structured outputs, or typed response formats where supported. Ambiguity in AI is a security and reliability risk.
- Default all AI systems to the most restrictive settings. Require explicit configuration to expand permissions or capabilities. If you set restrictive defaults, users are more likely to leave them in place; which is exactly what you want.
- Model the threats for all AI systems. Include AI-specific threats: prompt injection, data poisoning, training data extraction, adversarial inputs, model inversion, jailbreaking, denial-of-service, and agent misuse. Mitigate or eliminate all threats assessed as significant.
- Secure the training and data pipeline end-to-end Vector stores, embedding pipelines, and retrieval logic are a distinct attack surface. Poisoned documents retrieved at query time can manipulate agent behaviour invisibly.
- Verify the integrity of AI models and agents before deployment. Use model signing, checksums, or another integrity verification method to ensure models have not been tampered with. Immutable builds and verified deployments are the standard.
- Apply appropriate API security controls to all AI endpoints. Such as rate limiting, authentication, authorisation, input validation, and monitoring to ensure protection against traditional attacks and failures.
- Rate limit all AI agent actions. Nothing an AI agent does should be unlimited. Apply limits at every layer: API calls, tool invocations, file operations, external requests, and token consumption. Unlimited AI agents create unlimited risk.
- Perform all critical AI validation and decision-making on the server side. Client-side safety controls can be intercepted or bypassed. Trust only what happens on systems you control.
- Keep AI models, frameworks, SDKs, and dependencies up to date. Outdated components are a known risk, especially in fast-moving AI ecosystems. Where possible, automate updates and patching. Slow release and update processes are a serious organisational risk and should be treated as a priority for improvement. Temper this against not updating too soon as supply chain attacks exploit organisations that update uncritically.
- Minimal agent footprint. At each step of execution, an agent should request only what it needs now, release access when done, and avoid accumulating permissions or retaining sensitive data across steps. This is least privilege applied dynamically, per action.
- Enable strict safety and content settings in all AI frameworks and platforms. If a framework or API offers a strict mode, safety classifier, or content filter then turn it on. These are not enabled by default in all systems. Check, and enable them explicitly.
- Implement anomaly detection for all AI agent interactions. Monitor for abuse patterns, eg unusually high request volumes, adversarial inputs, or attempts to probe the AI’s limits. Implement defences against prompt flooding, token exhaustion attacks, systematic jailbreak attempts, and bot-driven misuse. Log all interactions. Alert on thresholds that suggest an attack may be in progress.
- Retest AI systems for safety regressions after every update or change. Safety testing is not a one-time activity. Build automated red-teaming, adversarial testing, output classifiers, and bias evaluation tools, and prompt injection tests into your CI/CD pipeline where possible.
- Protect all AI infrastructure comprehensively. This includes the model endpoints, repositories, vector databases, embedding stores, training pipelines, and supporting systems. All must be hardened, monitored, logged, patched, and tested for security.
- Apply security design principles to AI systems. Where possible, enforce least privilege (limit what the AI can access and do), zero trust (never assume an AI-to-AI call is safe), defence in depth (layer multiple controls), and attack surface reduction (limit the AI’s reach to only what is required). Do not rely on a single guardrail. Apply stricter principles for higher-risk AI systems.
- Control what files and data AI agents can access, read, write, or delete. Use strict access controls. Treat all files produced by or passed through an AI agent as potentially untrusted. Ensure important data is backed up, stored encrypted, and protected with monitored access controls.
- Prevent race conditions in AI agent workflows. When multiple agents or processes interact with shared state or external systems, use proper coordination, locking, and sequencing to prevent conflicts. Most modern AI orchestration frameworks provide tools for this.
- Use established identity, authentication, and access control systems for all AI agent interactions. Do not build your own AI authorisation logic from scratch. Existing solutions are well-tested. Writing custom access control for AI agents introduces serious risk.
- Apply encrypted, certificate-validated connections. Use HTTPS and validated certificates or similar for every AI API call. Follow your organisation’s cryptographic standards, or those of OWASP, NIST, or your relevant government body. Choose the strictest applicable standard and check that certificates are valid and from the expected host. Do not connect to unverified AI endpoints. This integrity check prevents man-in-the-middle attacks on AI API calls.
- Follow a secure AI development lifecycle. Integrate safety and security activities at every stage: design, development, testing, deployment, and monitoring. If your organisation does not have a defined lifecylce, create one. Add safety activities to your existing SDLC where possible.
- Select AI frameworks and platforms with strong, built-in safety features and guardrails. Do not write your own safety controls from scratch when established solutions exist. Always use a supported, up-to-date version of any AI framework or SDK. Then avoid bypassing content moderation, or undermining output constraints. If the defaults do not fit your use case, work with your security team, do not simply disable them.
- Manage agent state explicitly and consistently. Establish a standard approach to manage context windows or agent memory. Inconsistent context management introduces subtle bugs and security risks that are difficult to detect and diagnose. Initialise AI agent state explicitly before use. Do not rely on implicit defaults or assume prior state is clean. Uninitialised or stale state in AI agents produces unpredictable and potentially unsafe behaviour.
- Do not use real user data for AI development, testing, or fine-tuning without proper anonymisation and approval. Raw production data in non-production AI environments is a serious privacy and compliance risk. Use purpose-built anonymisation or synthetic data generation tools, not home-grown masking scripts.
- Protect all AI API keys and administrative accounts with multi-factor authentication. Use long, unique, complex credentials for every account with elevated AI system access. Use a password manager. Reset credentials immediately if you suspect a breach. Rotate keys regularly.
- Offer strong authentication to users of AI-powered systems. Provide MFA where possible. Implement defences against credential stuffing, supply chain, and malware attacks on their workspace. Carefully log all access and alert on suspicious patterns. Geoblock their access limiting exposure if access is breached.
- Create response plans for AI failure scenarios. Assume a failure will happen and prepare in advance. Practice these scenarios where possible. Ensure your AI systems are included in business continuity and disaster recovery planning.
- Audit AI system configurations and permissions at least annually. Review what models are in use, purpose, what access they have, what data they can reach, and whether their configurations remain appropriate.
- Manage AI session tokens and credentials securely. Apply the same standards as secure cookie management: limit scope, set expiry, enforce secure transport, and never expose credentials unnecessarily.
- Protect proprietary AI prompts and agent logic from unauthorised exposure. System instructions, reasoning chains, and agent architectures can contain competitive intelligence and safety-critical logic. Where appropriate, treat them with the same care as source code.
- Protect AI interfaces from cross-origin and cross-site abuse. Apply the same cross-site request forgery (CSRF) and cross-origin resource sharing (CORS) protections to AI endpoints as you would to any web application. These protections are not always on by default.
- Be cautious about AI model version updates and backwards compatibility. A model update may silently change safety behaviours, output formats, or reasoning patterns. Balance usability against the risks a new version introduces. Ideally fix model choice. Test thoroughly on updates and document your decisions.
- Build reusable, tested AI safety controls rather than reinventing them. If you build a safety control once, make it reusable and test it thoroughly before applying it widely. Continue testing it over time. When a safety bug is found, update every system that uses it.
- Use robust version controls. Keep the prompts, markdown files, infrastructure-as-code, and other code under strict version controls. This enables linting, CI/CD inline testing, and comparing against vendor or third-party model changes. Which means an easier time identifying changes when issues arise, faster rollback or changes.
- Make AI system behaviour auditable and explainable. Document prompts, system instructions, and agent decision logic. Add comments explaining safety controls. Readable, auditable AI systems are easier to test, easier to maintain, and faster to diagnose when something goes wrong. Clear documentation helps with safety testing, incident response, and onboarding. Build the behaviour with the future engineers in mind, which is most likely you, but could also be AI.
- Use immutable context and state in AI agent workflows wherever possible. Mutable shared state between agent steps creates unpredictable behaviour and increases the risk of manipulation. Design for immutability and explicit state transitions.
- Know and comply with all AI-specific regulations that apply to your systems. This includes laws, regulations, and standards such as the EU AI Act, local privacy laws, and sector-specific requirements. Ask your legal and security teams to verify your obligations.
- Maintain a current inventory of all AI models, agents, tools, and dependencies. Include where each model is deployed, version, how it is accessed, who owns it, responsible persons, and where its documentation lives. Audit this inventory at least annually.
- Adopt an AI safety framework if your organisation does not already use one. Frameworks such as the NIST AI Risk Management Framework, OWASP Top 10 for LLMs, or MITRE ATLAS provide structured guidance. If your organisation has adopted one, follow it.
- Decommission old AI models and agents carefully and intentionally. Remove API access, revoke credentials, archive documentation, and update your inventory. Document the decommission process (ideally before it’s in production). Abandoned AI systems with live access are a serious risk.
- Plan for AI model updates that can be applied smoothly and safely. If your users or downstream systems depend on your AI’s behaviour, ensure updates can be rolled out, tested, and rolled back without disruption.
- Provide a hardening guide for any AI system you deploy that others will operate. If an administrator or end user is responsible for configuring or operating your AI system, give them clear guidance on how to do so securely. Assume the end user won’t read it and act accordingly.
- Design AI systems to be easy to integrate with securely. If your AI system forces developers to adopt insecure workarounds to integrate it, the workarounds will become the norm. Secure integration should be the path of least resistance.
- Prioritise usability when designing AI safety controls. Controls that are difficult to use will be worked around. Test guardrails for usability just as you would any other feature. Good safety design and good user experience are not opposites.
- Verify and document user consent before AI systems process user data. Never assume consent. Always ask. Store consent records in your system of record. This is both an ethical obligation and, in most jurisdictions, a legal requirement.
- If you find a reproducible failure mode, safety or security bug in an AI model or framework, report it. Mistakes happen. Hiding or exploiting the mistakes is to be avoided. Be willing to share how your AI system failed and what you learned. Responsible disclosure benefits the entire AI development community. Follow the vendor’s responsible disclosure process.
- Treat AI safety warnings and alerts as errors and fix them. A safety warning that is ignored is a vulnerability waiting to be exploited. If your AI framework, monitoring system, or testing tool raises a warning, address it with the same urgency as a compiler error.
- If your AI system runs on physical hardware, include physical security. Securing the software layer alone is not sufficient. Physical access to AI infrastructure must also be controlled, monitored, and documented.
- Maintain a list of dangerous AI coding patterns to avoid. Include patterns such as direct prompt construction from user input, unrestricted tool access, unvalidated AI output passed to system calls, and disabled safety filters. Check your codebase for these patterns regularly. Automate detection in your linter or tooling where possible.
- Follow your organisation’s AI safety guidelines and approved patterns. Use the AI safety frameworks and approved integrations your organisation has established. If you don’t have one, write one. If a business requirement prevents this, work with your business to identify alternatives, document it, and notify the teams responsible. They may require a formal exception.