What Vibecoding a Real SaaS Taught Me About AI


I spent 15 days building a production multi-tenant SaaS using AI. Sounds fast. It wasn’t. Every layer broke in unexpected ways: RLS was silently bypassed, OAuth tokens were encrypted twice, environment variables reverted without warning, and when I pushed to production, the app pointed at my dev database.

I ended up debugging the way you supervise a junior engineer—constantly verifying, correcting, and questioning assumptions. The real work wasn’t coding. It was catching everything AI couldn’t see.

This is not a rant; it’s what building a real SaaS with AI feels like today.

I hadn’t written production code in years. As a founder, most of my time goes into building my company and teams. So I gave myself a challenge: spend two weeks building a full multi-tenant SaaS module end-to-end using AI as my co-pilot.

Not a demo. Not a prototype. A real system with RLS, OAuth, background jobs, a full React UI, dev and prod environments, and an AI workflow.

To build it, I used Replit as the environment, pairing its AI coding agent with Architect for deeper reasoning. I also cross-checked output with ChatGPT and Perplexity. The specific tool didn’t matter; the same pattern emerged everywhere: AI writes code quickly, but it doesn’t preserve system-wide coherence across regenerations.

The goal was simple: get back into hands-on engineering and understand what actually works when you build with AI.

What I found surprised me. AI is genuinely fast at generating code. But the moment you touch production—real environments, real tenant isolation, real data—the gap between “the code runs” and “the system works” becomes impossible to ignore.

This is a technical postmortem of everything that broke, why it broke, and what actually fixed it.

The hard problems weren’t the obvious ones. They were the invisible ones.

AI accelerates you, but the engineering principles you learned years ago still decide whether your system lives or dies.

Every layer broke in different ways. Some were obvious. Most weren’t.

Row-level security is powerful. It’s also unforgiving.

What happened:

  • Admin credentials bypassed RLS entirely until I added FORCE. Without FORCE, RLS is just a suggestion.

  • Background jobs leaked tenant context because they don’t carry request headers.

  • The ORM dropped policies unless I declared them directly in the schema: Neon’s serverless connections reset between requests, so anything applied only at runtime didn’t persist.

  • Session context behaved differently depending on whether I was in a stateless or interactive environment.
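The FORCE fix from the first bullet, sketched as a migration fragment (the `accounts` table is a stand-in; note that FORCE subjects the table owner to RLS, while superusers still bypass it entirely):

```typescript
// Illustrative migration fragment stored as a SQL string.
const forceRls = `
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
-- ENABLE alone still exempts the table owner, which is often the role an
-- "admin" connection uses. FORCE closes that gap (superusers still bypass).
ALTER TABLE accounts FORCE ROW LEVEL SECURITY;
`;
```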

So I moved RLS enforcement into database-level security definer functions instead of keeping it in application logic. Then I added a migration script to every deploy to verify policy integrity across environments.
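The per-deploy integrity check can be sketched as a diff between the policies the spec expects and what the live database reports (in Postgres, `SELECT tablename, policyname FROM pg_policies`). The names below are illustrative:

```typescript
// A policy as tracked in the spec and as reported by pg_policies.
interface Policy {
  table: string;
  name: string;
}

// Policies the spec says must exist but the live database doesn't report.
function missingPolicies(expected: Policy[], live: Policy[]): Policy[] {
  const key = (p: Policy) => `${p.table}.${p.name}`;
  const liveSet = new Set(live.map(key));
  return expected.filter((p) => !liveSet.has(key(p)));
}
```

If `missingPolicies` returns anything, the deploy fails before traffic reaches the new build.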

Debugging RLS consumed a ton of time. Every time I fixed something, another path silently failed.

RLS ultimately worked, but the cost was steep.

Writing OAuth is easy. Hardening it for multi-tenant production is not.

Real failures:

  • Refresh tokens were single-use, but I didn’t notice until a user refreshed twice.

  • Providers returned expiry fields as strings in one place, integers in another → silent comparison failures.

  • When the AI regenerated helper functions, tokens were double-encrypted:

    • old code encrypted → new code encrypted again.

  • Every provider behaved slightly differently, breaking one-size-fits-all logic.

None of this produced meaningful errors.
Just opaque 401s. No trail. No warning.

User thinks they mistyped a password. The system knows nothing.
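Two of these failures have small, mechanical defenses. A hedged sketch, assuming expiry arrives either as epoch seconds or as a lifetime in seconds, and using an invented `enc.v1:` prefix to make encryption safe to repeat (neither assumption comes from any provider's spec):

```typescript
// Coerce a provider expiry field (string or number) into epoch milliseconds.
function normalizeExpiry(raw: string | number): number {
  const n = typeof raw === "string" ? Number(raw) : raw;
  if (!Number.isFinite(n)) throw new Error(`Bad expiry value: ${raw}`);
  // Assumption: large values are epoch seconds, small values are lifetimes.
  return n > 1_000_000_000 ? n * 1000 : Date.now() + n * 1000;
}

const PREFIX = "enc.v1:";

// Idempotent wrapper: regenerated helpers re-encrypted already-encrypted
// tokens, so a version prefix marks a token as done.
function encryptOnce(token: string, encrypt: (s: string) => string): string {
  return token.startsWith(PREFIX) ? token : PREFIX + encrypt(token);
}
```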

Requests carry tenant context.
Background jobs do not.

If you don’t set and clear it manually, jobs operate with no tenant context at all:

  • They read all data.

  • They write to the wrong tenant.

  • They corrupt data silently.

```typescript
try {
  // Attach the tenant before the job touches any data.
  await storage.setTenantContext(tenantId);
  await doWork();
} finally {
  // Always clear, even on failure, so the next job can't inherit the context.
  await storage.clearTenantContext();
}
```

That finally block prevented catastrophic corruption.
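The pattern generalizes into a wrapper so no job can forget the finally. `TenantStorage` mirrors the hypothetical storage interface above:

```typescript
interface TenantStorage {
  setTenantContext(tenantId: string): Promise<void>;
  clearTenantContext(): Promise<void>;
}

// Run a unit of work with tenant context guaranteed set before and
// cleared after, even when the work throws.
async function withTenant<T>(
  storage: TenantStorage,
  tenantId: string,
  work: () => Promise<T>,
): Promise<T> {
  await storage.setTenantContext(tenantId);
  try {
    return await work();
  } finally {
    await storage.clearTenantContext();
  }
}
```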

Hosted environments say they manage:

  • DATABASE_URL

  • PGUSER

  • PGPASSWORD

They do — until they don’t.

Real issues:

  • Values reverted silently.

  • Credentials pointed to admin roles.

  • Env vars desynced between preview and production.

When this happens:

  • RLS is bypassed.

  • Tenant isolation collapses.

  • Cross-tenant data leaks silently.

No AI agent warns you.
I added manual verification to my deploy checklist.
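Part of that checklist eventually became a startup guard. A minimal sketch, assuming you know the dev database host and can refuse to boot when production points at it (the env names and hosts are illustrative, not Replit APIs):

```typescript
// Refuse to start a production process that is wired to the dev database.
function assertProdDatabase(env: string, databaseUrl: string, devHost: string): void {
  if (env !== "production") return;
  const host = new URL(databaseUrl).hostname;
  if (host === devHost) {
    throw new Error(`Production is pointing at the dev database (${host}); refusing to start.`);
  }
}
```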

Failures I hit:

  • RLS policies vanish unless declared in the schema.

  • Roles differ across environments.

  • Migrations behave differently in dev vs preview vs prod.

  • Encryption logic changes subtly.

  • Schema drift accumulates invisibly.

  • Partial deployments create ghost bugs.

None of this is “a bug.”
It’s what happens when multi-tenant SaaS, multiple environments, and AI-generated code meet.

Each system does its job correctly.
Together, they create invisible drift.

This was the hard lesson.

Pushing to production does not sync anything:

  • Schema doesn’t copy

  • RLS rules don’t copy

  • Roles don’t copy

  • Environment variables don’t copy

  • Production DB starts empty

Preview ≠ Production.

What I had to manually do:

  • Finish 90% of testing in preview

  • Create a fresh production DB

  • Push a clean schema, not a diff

  • Recreate RLS, triggers, and roles

  • Seed minimal test data

  • Re-add env vars for DB

  • Re-verify OAuth, RLS, and isolation in prod
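The final re-verification step can be partially scripted. A sketch of an isolation check, assuming a hypothetical `runAs` helper that executes a query under a given tenant's RLS context (e.g. via a per-request session setting) and an illustrative `accounts` table:

```typescript
type Row = Record<string, unknown>;

// Hypothetical helper: run a query with the given tenant's context applied.
type RunAs = (tenantId: string, sql: string) => Promise<Row[]>;

// Under tenant A's context, a full-table scan must return no tenant-B rows.
async function assertNoCrossTenantReads(
  runAs: RunAs,
  tenantA: string,
  tenantB: string,
): Promise<void> {
  const rows = await runAs(tenantA, "SELECT tenant_id FROM accounts");
  const leaked = rows.filter((r) => r.tenant_id === tenantB);
  if (leaked.length > 0) {
    throw new Error(`RLS leak: ${tenantA} can read ${leaked.length} rows owned by ${tenantB}`);
  }
}
```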

And the disaster:
Because I created the production DB late:

  • Env vars never synced

  • Production app pointed to dev

  • Writes went to dev

  • Deleting “production” deleted both because they were wired together

This wasn’t a code bug — it was environment drift caused by assumptions I didn’t know I was making.

The only reason I recovered quickly:
I had a detailed schema and RLS document.

Documentation isn’t hygiene — it’s insurance.

AI rewrites:

  • logic

  • naming

  • migrations

  • patterns

The spec I wrote on day one was outdated by day two. If documentation doesn’t stay in sync, nothing aligns.
You end up maintaining a moving target.

AI tended to:

  • reassure instead of challenge

  • skip edge cases

  • contradict earlier decisions

  • rewrite its own patterns

  • almost never say “I don’t know”

This creates dangerous blind spots in infra, security, and data flows.

AI doesn’t push back. It acts confident even when wrong.

Fifteen days.
Multiple 12–14 hour sessions.

The failures weren’t in writing code.
They were in everything surrounding code:

  • migrations

  • tenant isolation

  • environment drift

  • silent failures

Production-grade systems don’t break loudly.
They break quietly.

And the error messages rarely help.

Code deployed fine.
Everything else did not.

Because env vars never synced:

  • Production wrote to dev

  • Databases were still coupled

  • Deleting “production” deleted both

No warning. No error. Just silent destruction.

Documentation saved the day.

The fear this created wasn’t fear of pushing code — it was fear of hidden coupling you can’t see until it’s too late.

The single most effective decision.

Backend-first → endless regeneration:

  • schemas changed

  • migrations shifted

  • API contracts drifted

Ask for one tiny fix → entire subsystem rewritten.

UI-first stabilized everything.

AI understands English and UI flows much better than backend architecture.

A React component is concrete. “A user should see this list” is concrete.

Once the UI existed:

  • API contracts were obvious

  • backend became mechanical

  • debugging became visual

  • regeneration became safer

If you start backend-first, you’ll spend half your time chasing drift.
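Concretely, “API contracts were obvious” meant the component’s props became the contract. An illustrative fragment (every name here is invented):

```typescript
// The row the UI needs comes first...
interface DealRowProps {
  id: string;
  name: string;
  stage: "open" | "won" | "lost";
}

// ...and the list endpoint's response type falls out of it mechanically.
interface ListDealsResponse {
  deals: DealRowProps[];
}

const sample: ListDealsResponse = {
  deals: [{ id: "d1", name: "Acme renewal", stage: "open" }],
};
```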

When I said “Fix the RLS policy,” the agent rewrote things blindly.

But when I asked:

  • “Why did RLS bypass here?”

  • “What assumption is wrong?”

  • “What changed between versions?”

  • “Does your output match the spec?”

The agent reasoned instead of generated.

What worked:

  • forcing it to reason

  • asking it to audit itself

  • making it check against the spec

  • keeping a small “reminder log” to anchor its memory

AI improves when you make it think. Not when you make it produce.

When prod wrote to dev and deleting “production” wiped both DBs, recovery was only possible because I kept a single living document with:

  • full schema

  • every RLS rule

  • all roles & permissions

  • migrations in order

  • key architectural decisions

I rebuilt the entire system from that file.

In my case, Replit’s environment variable panel made it easy to edit values, but nothing warned me that the production instance still pointed at the dev database. It was a reminder that even good tooling can’t replace explicit verification when AI-generated code spans multiple environments.

Documentation is not optional. It is your disaster recovery plan.

With AI constantly regenerating logic, documentation must be as active as testing.

My actual workflow:

  • Perplexity → contradiction + architecture

  • Replit coding agent → implementation

  • Replit Architect → deep reasoning

One model hallucinated.
Two contradicted each other.
Three revealed which contradictions mattered.

Still brittle — but far better than relying on one.

AI repeatedly broke derived state when regenerating backend code.

Triggers:

  • guaranteed consistency

  • survived regeneration

  • eliminated drift

Major stability win.
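What moving derived state into triggers looked like, sketched as a migration string (the `deals` and `deal_counts` tables are invented for illustration):

```typescript
// Because the trigger lives in the database, a regenerated backend
// can't drop the invariant by rewriting application code.
const dealCountTrigger = `
CREATE OR REPLACE FUNCTION bump_deal_count() RETURNS trigger AS $$
BEGIN
  UPDATE deal_counts SET total = total + 1 WHERE tenant_id = NEW.tenant_id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER deals_count_insert
AFTER INSERT ON deals
FOR EACH ROW EXECUTE FUNCTION bump_deal_count();
`;
```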

AI tried to unify logic that shouldn’t be unified.

  • Manual creation → simple defaults

  • CRM import → intelligent reconstruction

Separating paths made the system robust.

After silent migrations, vanished RLS, and environment drift, governance became non-negotiable:

  • architectural rules

  • intent-locking

  • environment discipline

  • guardrails

  • verification steps

AI accelerates. Governance stabilizes.

Current agents behave like autocomplete, not engineers.

They:

  • forget decisions

  • contradict themselves

  • break invariants

  • lack state

  • lack ownership

Future agents will need:

  • persistent memory

  • behavior modeling

  • constraints & incentives

  • rules & boundaries

The need becomes obvious the moment you ship real systems.

Every failure traced back to one root cause:

AI agents regenerate without preserving coherence.

They forget why decisions were made.
They don’t know what must never change.
Architecture collapses when code regenerates.

This isn’t a flaw in the tools. It’s a gap in how we’re using them.

Shipping a real system revealed gaps that never show up in demos.

The same questions kept surfacing:

Is this the right model?
Are we leaking IP?
Why does dev work but prod fail?
Why did the agent forget its own rule?
Is any of this creating value?

At some point the conclusion became unavoidable:

We’re missing an engineering layer.

A layer that:

  • keeps architecture coherent

  • stabilizes environments

  • reduces agent drift

  • prevents model misuse

  • protects IP

  • surfaces where value is created

This layer only becomes visible when you build a real system with AI — not a demo.

And it’s what I’ve started building.