MCP is the future of AI & AI Security
Model Context Protocol (MCP) is a standard proposed by Anthropic that allows LLMs (models) to gather data and take actions (context). By separating models from tools, it narrows the gap between frontier models that can do amazing things (like searching the web, controlling a browser, and searching files on your computer) and other models (like Llama).
With MCP, every model can have access to flexible, powerful tools.
MCP also provides a centralized control point for security. Just as traditional firewalls guard the boundary between internal networks and the internet, AI firewalls will stand between models and their context — monitoring and managing what the models can see and do.
We are not far from a world where the AI agent booking your flight has to justify the charge to your credit card company’s AI agent before it can make a charge. Those interactions will be made possible by MCP.
How to secure MCP
The joke in the AI security community is that the “S” in MCP stands for Security. Security was not built into the protocol and is not directly handled by most MCP servers. That is actually a good thing, because it lets server developers focus on making better tools instead of authentication and authorization.
The key is to wrap MCP servers in an authentication and authorization layer.
At Costa (my company), we call this an AI Gateway. An AI Gateway is a firewall, a proxy to models and a proxy to MCP servers. It acts as an MCP client to MCP servers, but can also act as an MCP server to MCP clients.
AI Gateways inject authentication (so the MCP server can have access to things like API credentials), and run security controls (similar to a WAF) on responses.
You can also use an AI Gateway as a security bridge to models (to protect information in prompts).
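To make that concrete, here is a minimal sketch of the two jobs described above: injecting a credential that the MCP client never holds, and running a WAF-style scan on what flows back toward the model. The names (`gateway_call`, `BLOCKED_PATTERNS`) are hypothetical, not the Costa implementation:

```python
# Minimal AI Gateway pass-through sketch (illustrative names only).

BLOCKED_PATTERNS = ["BEGIN RSA PRIVATE KEY", "AWS_SECRET_ACCESS_KEY"]

def inject_auth(request: dict, api_token: str) -> dict:
    """Attach the credential gateway-side; the client never sees it."""
    request = dict(request)  # shallow copy; do not mutate the caller's dict
    request.setdefault("meta", {})["authorization"] = f"Bearer {api_token}"
    return request

def scan_response(response: str) -> str:
    """WAF-style check on the MCP server's response before the model sees it."""
    for pattern in BLOCKED_PATTERNS:
        if pattern in response:
            raise PermissionError(f"response blocked: matched {pattern!r}")
    return response

def gateway_call(request: dict, upstream, api_token: str) -> str:
    """Proxy one tool call: authenticate, forward, filter."""
    return scan_response(upstream(inject_auth(request, api_token)))
```

The point of the shape: credentials and deny rules live in one choke point, so neither the model nor the MCP client ever has to be trusted with them.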
A diagram of the architecture:
As a concrete example:
An AI Gateway can be in the cloud, running as an agent on your computer, or really anywhere, as long as it can proxy requests to the MCP server and to the model, and run security rules.
If you are interested in running remote MCP servers, Cloudflare announced a way to run MCP servers in their cloud. (Warning: I have not tried this yet, so I cannot vouch for it.)
You could probably use Cloudflare ZTNA + Cloudflare remote MCP to build a simple AI Gateway with MCP access control.
Three things I learned: the non-obvious things about MCP
1. There is no best model or best tool
Some models are good at working with certain servers and others are good at working with other servers. I find that Qwen-QwQ works well with Jira and Llama 3.2 is great at navigating folders in Box.
If you are building a business around AI agents as a service, knowing which tools work well with which models for the business problem you are trying to solve is going to be key.
Making the perfect tool or even suite of tools is probably less valuable than it was four months ago.
2. Do not fight STDIN/STDOUT (yet)
The MCP schema allows for Server-Sent Events (SSE) over HTTP. But most of the best (open source) MCP servers today run via stdin/stdout and are configured via ENV variables or command-line arguments.
If you want to give people access to the best servers at this moment, you need to work with the ecosystem as it is today. I wasted some cycles trying to get ahead of it and adapt servers to SSE; it is better to go with the flow.
This does not mean you can only run the servers locally. In fact, at Costa, we run most of our servers in isolated processes in the cloud. It just means that you need to create middleware to get the traffic from the model to the MCP server and back.
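A sketch of what that middleware has to do at the framing level. As I understand the spec, the MCP stdio transport delimits JSON-RPC messages with newlines; the function names below (`frame`, `bridge_request`) are mine, not the SDK's:

```python
import io
import json

def frame(msg: dict) -> bytes:
    """The stdio transport sends one JSON-RPC message per line."""
    return (json.dumps(msg) + "\n").encode("utf-8")

def bridge_request(msg: dict, server_stdin, server_stdout) -> dict:
    """Forward one message to the server's stdin, read one reply from stdout.

    In production these would be the pipes of an isolated server process;
    here they are any file-like byte streams.
    """
    server_stdin.write(frame(msg))
    server_stdin.flush()
    return json.loads(server_stdout.readline())
```

Real middleware also has to multiplex concurrent requests and match replies by JSON-RPC `id`, but the core job is exactly this: turn network traffic into newline-delimited JSON on a pipe and back.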
3. MCP is stateful and bidirectional by design. Today that does not matter, but it will soon. A lot.
MCP is powerful precisely because we can't just point models at an API and have them efficiently pull information out of a system (yet). We need middleware that provides rails to keep the models on track.
Most APIs are stateless — meaning when you make your fifth API request, the API does not remember anything about your fourth API request. It’s a whole new request.
MCP is not stateless by design. It is stateful. The connection itself (not just the server) has state.
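A toy illustration of the difference. The `McpSession` class is illustrative, not a real protocol object; the point is only that the connection carries memory:

```python
# Stateless: every call stands alone, like a typical REST API.
def stateless_lookup(query: str) -> str:
    return f"results for {query!r}"

# Stateful: the connection itself remembers earlier exchanges.
class McpSession:
    """Toy stand-in for a stateful MCP connection (illustrative only)."""

    def __init__(self):
        self.history = []

    def call(self, query: str) -> str:
        self.history.append(query)
        # The fifth request can refer back to the fourth.
        return f"results for {query!r} (call #{len(self.history)})"
```

With the stateless function, your fifth call looks identical to your first. With the session, the server can condition its answer on everything that came before.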
Why does that matter?
At some point, MCP servers will not just be simple tool call wrappers. They will actually reach out and ask clients to do things and remember everything about each of their current sessions. That means MCP servers themselves can act as agentic orchestrators — conducting a symphony of attached MCP clients.
When stateful bidirectional communication becomes a thing, it’s going to be a BIG THING. MCP is how AI agents are going to start, stop, manage and communicate with other AI agents.
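A toy sketch of that orchestration shape. Nothing here is the actual MCP API; it only shows the direction of the calls reversing: the server initiates, the attached clients respond.

```python
# Toy orchestrator: a "server" that calls back into attached clients.
# All names are hypothetical; this is not the MCP SDK.
class OrchestratingServer:
    def __init__(self):
        self.clients = {}

    def attach(self, name: str, handler):
        """Register a client; handler is a callable the server may invoke."""
        self.clients[name] = handler

    def delegate(self, name: str, task: str) -> str:
        # Bidirectional: the server initiates a request to the client,
        # inverting the usual client-calls-server flow.
        return self.clients[name](task)
```

Swap the lambdas for real agents and you get the symphony described above: one stateful server conducting many attached clients.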
It is also going to be a security nightmare if we cannot get our tools around the protocol quickly enough to provide a platform for security AI agents to fight hostile AI agents.
My guess: within 18 months we will need them.
BTW: so many MCP servers are currently stateless that the protocol recently changed to allow connections to start in stateless mode and upgrade to stateful when the client is ready.