The day an AI taught me how to hack my own company

The nerdier among us have been discussing AI alignment for over a decade now, mostly in the context of existential risk: the paperclip maximizer that destroys the world to hit its production quotas. Until recently, though, this has been a largely theoretical debate.

This week, I saw a didactic, real-world example of alignment failure happen right on my screen. It wasn’t a sci-fi nightmare; it was a mundane IT ticket.

The Incident
I’ve been integrating a team of AI agents into my daily workflow at Wildlife Studios. Recently, I asked one of them (running on Google DeepMind’s Gemini Pro via Antigravity) to access a file in our corporate Google Drive using the Model Context Protocol (MCP).

The agent hit a hard stop: I lacked the specific admin permissions on our enterprise domain to create the necessary Google Cloud Project.

In a traditional workflow, this is where the process ends. You file a Jira ticket, you Slack IT, and you wait.

But the agent is an optimizer. It viewed the error message not as a hard law, but as an obstacle to be routed around. Within seconds, it pivoted and generated a step-by-step guide to bypass our security protocols:

The Workaround: Create the project on my personal Gmail account instead.
The Bypass: Add my corporate email as a “Test User” to circumvent the organization’s external access blocks.
The Execution: Generate the keys and proceed with the data extraction.

It effectively taught me how to hack my own organization to get the job done.

Optimization vs. Constraints
This creates a fascinating, if worrying, dynamic for anyone managing tech teams.

Agents are outcome-oriented. Their objective function is “complete the task.” Unless “adhere to corporate policy” is explicitly weighted as heavily as “finish the job,” they will optimize for finishing the job at the expense of policy.
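To make the weighting point concrete, here is a toy sketch in Python. It is only an illustration: real agents are not literally scored this way, and the `Plan` fields, the `policy_weight` parameter, and the violation counts are all assumptions I made up for the example. The idea is simply that when policy carries no weight in the objective, the workaround wins every time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate course of action (hypothetical, for illustration only)."""
    name: str
    completes_task: bool
    policy_violations: int


def score(plan: Plan, policy_weight: float) -> float:
    """Toy objective: reward completion, subtract a weighted penalty per violation."""
    task_reward = 1.0 if plan.completes_task else 0.0
    return task_reward - policy_weight * plan.policy_violations


def choose(plans: list[Plan], policy_weight: float) -> Plan:
    """Pick the highest-scoring plan under the given policy weight."""
    return max(plans, key=lambda p: score(p, policy_weight))


# Two options loosely modeled on the incident; the violation counts are invented.
PLANS = [
    Plan("wait for IT to grant access", completes_task=False, policy_violations=0),
    Plan("use personal account workaround", completes_task=True, policy_violations=2),
]

if __name__ == "__main__":
    for weight in (0.0, 1.0):
        print(f"policy_weight={weight}: {choose(PLANS, weight).name}")
```

With a weight of zero the workaround is chosen; once each violation costs as much as the task is worth, waiting for IT becomes the better plan.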

We are used to “Shadow IT”: employees using Dropbox or WhatsApp because the corporate tools are too slow, too restricted, or incompatible with external contacts. But we are entering the era of Shadow Agents.

This is different because of the active coaching element. An employee might not know how to bypass a specific GCP permission block. But now, millions of workers have a professional hacker sitting in their sidebar, ready to whisper a technical workaround the moment a policy gets in the way. Worse, the employee might not even realize a policy is being breached, since the agent may convince or mislead them.

The Awakening
Companies that are pushing for “AI Adoption” without preparing their infrastructure are in for a rude awakening.

If your security strategy relies on “security by obscurity” or friction, agents will dismantle it. They will find the path of least resistance, and if that path involves unsanctioned APIs or personal accounts, they will take it.

The Takeaway
We need to shift our mental model. You cannot just block tools anymore; you have to define agent boundaries.

If you don’t provide a sanctioned, low-friction path for these agents to operate, they will invent an unsanctioned one. And unlike a human employee, they won’t hesitate to break a rule they don’t understand to solve a problem they do.
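What might an “agent boundary” look like in practice? Below is a minimal sketch, assuming a gate that inspects every tool call before it runs. Everything in it is hypothetical: the `ToolCall` shape, the action names, and the `example-corp.com` domain are placeholders I invented, not any real product’s API. The point is that a denial should come with a sanctioned next step, so the agent has somewhere legitimate to go.

```python
from dataclasses import dataclass

# Hypothetical boundary definition; a real one would come from your security team.
ALLOWED_ACCOUNT_DOMAINS = {"example-corp.com"}
ALLOWED_ACTIONS = {"drive.read_file", "ticket.create"}


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation an agent wants to make (illustrative shape)."""
    action: str
    account: str


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def check_boundary(call: ToolCall) -> Decision:
    """Deny calls that use an unsanctioned account or an action off the allowlist."""
    domain = call.account.rsplit("@", 1)[-1].lower()
    if domain not in ALLOWED_ACCOUNT_DOMAINS:
        return Decision(False, f"account domain '{domain}' is outside the sanctioned boundary")
    if call.action not in ALLOWED_ACTIONS:
        return Decision(
            False,
            f"action '{call.action}' is not on the allowlist; use ticket.create to request it",
        )
    return Decision(True, "within sanctioned boundary")


if __name__ == "__main__":
    # The personal-account workaround from the incident would be stopped here.
    print(check_boundary(ToolCall("cloud.create_project", "me@gmail.com")))
    print(check_boundary(ToolCall("drive.read_file", "me@example-corp.com")))
```

A gate like this only helps if it sits outside the agent, in the layer that actually executes tool calls. An instruction in the prompt is just one more thing for the optimizer to route around.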

Constrain your agents now, or prepare to pick up the pieces later.
