AI-Assisted Development: How to Code Faster Without Losing Control


Lucas Rzepecki

The practical workflow I use with Claude Code, Codex, Cursor, Antigravity and other AI tools — from planning, through tests and code review, to shipping.

This is not a beginner’s introduction to AI, and it’s not another collection of “magic prompts” or tool hype. It’s a practical guide to working with AI as part of a real software development process: planning, implementation, debugging, testing, documentation, code review, and shipping. I’m not trying to argue that one model or one editor changes everything, because it doesn’t. What matters is the workflow around them. This article is about how to use currently available AI coding tools in a way that actually makes you faster without giving up control over quality, architecture, security, or the final result.

TL;DR: Do code review ;-)

Tools

Right now, AI coding tools fall into three main categories:

  • Command-line
    Terminal-based coding agents that read, edit, and run code directly on your machine. Some have IDE integrations that let you see file changes directly in the editor.
  • IDEs
    AI-native editors mostly based on VS Code. Deep codebase awareness, multi-file editing, sub-agents, skills, and support for multiple model families in one place.
  • Mobile Apps
    Apps that work remotely in the cloud on your code repo, which makes them genuinely useful for quick reviews, debugging, planning, or checking something while you’re away from your desk.

The important point here is simple: you don’t have to lock yourself into one tool. In practice, it’s better if you don’t.

I usually run a command-line tool (either claude or codex) inside the IDE terminal (Cursor or Antigravity), do the main development in the terminal, and ask the IDE agent for reviews.

Models for Coding

Rankings

Model rankings move fast enough that any fixed table will age badly. So instead of treating this section like a permanent leaderboard, I think it’s better to treat it like a way of reading the market. If you care about staying current, check a ranking site like Artificial Analysis roughly once a month:

https://artificialanalysis.ai/models/capabilities/coding

Not because the ranking is absolute truth. It isn’t. But it gives you a good high-level view of which models are currently strong at coding, which cheaper models are catching up, and where the overall market is moving.

At the premium end, the names worth watching are usually the same small group of frontier models: Claude, GPT, and Gemini. These are the models I use for: planning, tricky debugging, architecture changes, larger refactors, code review on important work, tasks where better judgment matters more than token cost.

The budget tier is where things get really interesting. This category changes constantly, and that is exactly why checking current rankings matters. A model that was “clearly second tier” a month ago may suddenly become the best cheap option. Budget models are perfect for: sub-agents, reading logs, documentation search, repetitive implementation work, bulk transformations, cheap second opinions. The main point is not to find the one best cheap model forever. The point is to keep an eye on which low-cost models are currently punching above their price.

Stop looking for one perfect model. Start thinking in terms of a model stack. Use a strong model for planning and hard problems. Use a cheaper, faster one for routine work, search, and analysis. MIX THEM. That’s where the real gains start.

Subscription vs API Pay-Per-Use

SUBSCRIPTION. I suggest mixing at least one top-tier CLI package (e.g. Claude Max, Gemini AI Ultra, ChatGPT Pro) with a basic IDE subscription that includes AI usage (e.g. Cursor, Windsurf).

Using Two Models for Cross-Validation

One of the best habits you can adopt is using at least two model families. For example: one model from Claude, one from ChatGPT or Gemini. Let one write code. Let the other review it. Then switch roles when it makes sense.

Why does this work so well? Because models have blind spots. And those blind spots are not identical. One model misses something obvious to another. One is better at structure, another at critique. One is too eager to refactor, another is better at spotting risk. That difference is useful.

Choosing the Right Model for the Task

A practical split looks like this:

  • Planning and hard tasks → big model, high effort
    Example: Opus 4.6 with high effort
  • Routine implementation → lighter model, medium effort
    Example: Sonnet 4.6 with medium effort

High effort usually gives better results, but it also fills context quicker, costs more tokens and takes more time. In practice, the quality boost is often worth it on important tasks. On routine work, it usually isn’t.

Understanding Context

If there’s one thing people underestimate when they start coding with AI, it’s context. The context window is the total amount of text a model can see at one time. That includes everything: your prompt, the files it reads, terminal output, logs, previous replies, generated code, tool output. All of that lives in the same window. When the window fills up, the model starts losing awareness of earlier parts of the session.

How Much Fits?

  • 100K tokens
    ~300 pages of text, or ~50–80 source files of moderate size
  • 1M tokens
    ~3,000 pages of text, or a medium-to-large codebase in one shot

That sounds huge. In practice, it disappears fast! A normal coding session can burn through context surprisingly quickly:

  • Reading 10 source files: ~15–30K tokens
  • Running a build and reading output: ~5–10K tokens
  • Browsing logs: ~10–50K tokens per dump
  • Each back-and-forth with the model: ~2–5K tokens
  • Reviewing generated code: ~5–15K tokens

You can easily burn 100K tokens in 15 minutes of active work without realizing it. As context fills up, quality drops. This is where the weird stuff starts: the model forgets an earlier requirement, it misreads your intent, it contradicts a previous decision, it reintroduces something you explicitly removed, it starts coding in a way that feels slightly “off”. That’s context degradation.
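If you want a rough feel for how much of the window a set of files will eat, a common rule of thumb is roughly four characters per token. A quick back-of-the-envelope check from the shell (the path and extension are placeholders):

  # Rough token estimate (~4 chars per token) for files you plan to feed the model
  find src -name '*.ts' -exec cat {} + | wc -c | awk '{printf "~%d tokens\n", $1/4}'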

Context Compaction

Most tools now do auto-compaction when the session gets too full. They summarize old conversation and free up space. That helps, but it isn’t free. A compacted summary is not the same thing as full original context. You can also do manual compaction yourself. For example, /compact in Claude Code. And honestly, this is one of the best habits you can build. If you’ve been working for a while and you’re about to start a big new task, compact first. Don’t go into a complex task with half your context budget wasted on old discussion fragments, build logs, and dead ends from two hours ago. Fresh context matters.

The Power of a Good Prompt

Garbage In → Garbage Out

AI is not telepathic. If you give it vague, incomplete instructions, you get vague, incomplete output. Sometimes dressed up confidently, but still incomplete. The real fix is not “prompt engineering” in the social-media sense. It’s project setup. Before you do serious AI-assisted work, create configuration files that explain your project and your rules.

Configuration Files for AI

Different tools read different files at the start of a session:

  • Claude Code reads CLAUDE.md
  • Codex CLI reads AGENTS.md
  • Cursor reads .cursorrules or AGENTS.md

The ecosystem is gradually converging on AGENTS.md as the shared standard. These files are effectively your contract with the model. This is one of those things that feels optional until you work without it. Then you realize how much chaos comes from the model simply not knowing your standards.

What to Include in the Configuration File

  • Coding standards
    Modularization rules, for example: no file should exceed 1,000 lines; if it does, suggest refactoring. This is my must-have rule in each project.
  • Documentation rules
    E.g.: documentation must be updated before every push to the repository. Not necessarily before every commit, but definitely before pushing.
  • Testing requirements
    Always create at least minimal unit tests, even for small changes. These tests are not only for you, they are also for the model. Tests let the model verify it didn’t accidentally break or delete existing behavior. Add integration and functional tests where the project supports them. Update tests whenever new functionality is added.
  • Git hooks
    Run tests before commit or before push, depending on the project. If hooks don’t exist yet, ask the model to create them (see the sketch after this list).
  • References
    Include links to other documentation in the repository so the model knows where to keep reading. That’s what good documentation often is for AI: not just explanation, but navigation.
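To make this concrete, here is a minimal sketch of such a file, plus a matching pre-push hook. Every rule, path, and command below is a placeholder to adapt to your project (npm test stands in for whatever your test command is):

  cat > AGENTS.md <<'EOF'
  # Rules for coding agents

  ## Coding standards
  - No file may exceed 1,000 lines; if one does, suggest a refactor.

  ## Documentation
  - Update the docs before every push to the repository.

  ## Testing
  - Create at least minimal unit tests for every change.
  - Run the full suite with: npm test

  ## References
  - Architecture notes: docs/ARCHITECTURE.md
  - Current tasks: TODO.md
  EOF

  cat > .git/hooks/pre-push <<'EOF'
  #!/bin/sh
  # Run the test suite before every push; abort the push if it fails
  npm test || { echo "Tests failed, push aborted"; exit 1; }
  EOF
  chmod +x .git/hooks/pre-push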

For more information on guiding coding agents, I recommend this website: https://AGENTS.md

Project File Structure

Structure matters more with AI than it does with humans. A human can wander through a messy repository and gradually figure things out. A model can too, but every extra step burns context and increases the chance of misunderstanding.

Small Projects — One File

  • README.md
    For a small project, you can often keep everything in one README.md file: project description, model rules, test instructions, current tasks. Simple is fine when the project is simple.

Large Projects — Multiple Files

For bigger projects, separate concerns:

  • AGENTS.md or CLAUDE.md
    Rules and instructions for the AI
  • README.md
    Project documentation, architecture, structure, integrations
  • TODO.md
    Current and upcoming tasks
  • CHRONICLE.md
    Experiment journal for R&D work

Chronicle ≠ Changelog

This distinction is worth making clearly.

  • Chronicle is a working journal:
    What you tried, what worked, what failed, experiment results, why certain decisions were made. Its job is to preserve context across sessions so the model doesn’t repeat failed approaches or revisit abandoned ideas.
  • Changelog is much lighter:
    Version numbers, summary of what shipped. In many projects, a separate changelog is unnecessary if your git history is already descriptive.

Use a Chronicle for:

  • R&D work
  • experiments
  • trial-and-error projects
  • systems where you expect a lot of dead ends

Skip it for:

  • standard app development
  • typical websites
  • straightforward product work

In those cases, strong commit messages are usually enough.
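When you do keep a chronicle, entries don’t need to be fancy. A dated note appended after each experiment is enough; the content here is purely hypothetical:

  cat >> CHRONICLE.md <<'EOF'

  ## 2026-03-02 - Rate limiting experiment
  - Tried: token bucket stored in KV. Too slow under load.
  - Decision: in-memory counters per instance. KV approach abandoned, do not revisit.
  EOF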

Starting a New Project with AI

Starting from zero with AI can be incredibly fast, but only if you do the first steps properly.

Plan First

  • Describe what you want to build. You can even ask the model to help you shape the description.
  • Use planning mode. For example, Shift + Tab in Claude Code. In this mode, the model does not write code. It focuses on concept and plan.
  • If you’re using Claude Code, there is also a very useful shortcut here: /init. It helps bootstrap a project session and generate the initial project context. It’s a great starting point when you’re setting up a brand-new repo. Tools differ, so I won’t claim the exact same command exists everywhere, but in Claude Code specifically it’s worth using.
  • Iterate on the plan. The model may ask clarifying questions. Refine until the plan feels right.
  • Save the plan to README.md. That becomes your starting documentation.
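In Claude Code, that opening sequence looks roughly like this (other tools expose the same ideas under different names):

  claude    # start a session in the project directory
  # Press Shift + Tab to enter planning mode: the model discusses and plans, but writes no code
  # For a brand-new repo, run /init to generate the initial project context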

Technology Choices

Be explicit. If you have preferences for frameworks, libraries, infrastructure, deployment model, external integrations — say them upfront. Otherwise the model will pick its own stack. Sometimes that works. Sometimes you end up with architecture you never wanted.

Specify Deployment Target

This one is easy to overlook and extremely important. Tell the model where the project will be deployed. Why? Because deployment constraints shape architecture. Examples:

  • Cloudflare Workers means serverless constraints and Worker-specific patterns
  • Docker on Kubernetes implies a completely different structure
  • Vercel brings edge-function assumptions and platform limitations

If the model knows the target upfront, it plans accordingly. If it doesn’t, you often end up redesigning things later.

Joining an Existing Project

Existing projects are a different game. You’re not inventing structure from scratch. You’re reconstructing it.

If There Is No README

Ask the model to perform a full code review or audit of the project and generate documentation from the codebase itself. That gives you a baseline understanding and gives the model one too.

If README Exists

Don’t assume it’s accurate. Ask the model to verify the README against the actual codebase.

Questions to Ask

Ask the model to fill the gaps:

  • Is anything outdated?
  • Are there missing sections?
  • Does the documented architecture match reality?
  • Are external integrations fully described?

I like to ask AI to: do a comprehensive codebase analysis and search for any discrepancies between code and documentation.

Key Elements to Document

At minimum, capture:

  • project structure
  • which folders and files contain what
  • external service integrations
  • environment setup
  • how to install, run, and test the project
  • references to other docs

This sounds basic, but it dramatically improves future AI sessions.
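A minimal skeleton to aim for might look like this; every name and section below is a placeholder, not a standard:

  cat > README.md <<'EOF'
  # Project name

  ## Structure
  - src/api/  - HTTP handlers
  - src/core/ - business logic

  ## Integrations
  - payments: Stripe; file storage: S3

  ## Setup
  cp .env.example .env   # then fill in the values
  npm install

  ## Run and test
  npm run dev
  npm test

  ## More docs
  See docs/ARCHITECTURE.md and TODO.md
  EOF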

The Implementation Cycle

This is where the day-to-day workflow and discipline really matters.

1. STARTING EACH SESSION WITH PLANNING

A good session usually starts like this:

  1. Ask the model to read the docs or read .md files
  2. Describe the task, or point it to a task in README.md or TODO.md
  3. Start in planning mode
  4. Iterate on the plan until you’re satisfied
  5. Before implementation, consider compacting the context

That last step is underrated. Planning consumes tokens too. If the planning conversation got long, clean the session before implementation so the coding phase starts with maximum fresh context.
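In practice the session opener is short. Something like this (the wording is mine, not a magic formula, and “task 3” is obviously a placeholder):

  Read README.md, AGENTS.md and TODO.md.
  Propose a plan for task 3 from TODO.md. Do not write any code yet.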

2. IMPLEMENTATION

Once implementation starts:

  • Generate the first version
  • Check whether it compiles and runs
  • Then iterate using short commands

This is one of the nicest parts of AI-assisted coding. After the model has the plan, you usually don’t need long prompts anymore. You can say: “fix this”, “add that”, “change the color”, “move this into a separate module”. And it understands, because the shared context already exists.

NEVER STOP AT THIS POINT.

3. TESTS

Ask the model to create or update tests: unit tests, integration tests, functional tests where appropriate. And here’s a very important rule:

Let the model run the tests itself. Don’t be a human clipboard.

Models often ask you to run terminal commands for them. Try not to become the middleman — tell the model to run the commands itself. Why this matters: if tests fail, the model sees the failure directly, it can inspect the output, it can fix the issue immediately. That removes the whole annoying loop where you copy terminal errors back into chat like a human clipboard. This alone saves a huge amount of time.
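Most tools let you pre-approve safe commands so the model can run tests without asking every time. In Claude Code that lives in the project settings file; a sketch, assuming npm test is your test command (double-check the current settings format for your version):

  cat > .claude/settings.json <<'EOF'
  {
    "permissions": {
      "allow": ["Bash(npm test)", "Bash(npm run test:*)"]
    }
  }
  EOF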

4. DOCUMENTATION

After the code works, update the docs. Not later. Not “when there’s time.” Right then. Because the model also needs the updated docs for the next session.

5. CODE REVIEW (!)

Code review happens after tests and docs. Because tests and documentation also need review.

Self-Review

Use a clear prompt such as: “Do a code review of uncommitted code”. The word uncommitted matters because it scopes the review cleanly.

Read CR Recommendations

Do not blindly say “fix everything”. That sounds efficient, but it’s risky. Reviews often include: good fixes, optional suggestions, stylistic changes you may not want, recommendations that could break unrelated behavior.

So instead, go through the review point by point:

  1. fix
  2. change only X, not Y
  3. skip
  4. fix

Do it even if there are 10–15 points. You don’t need to keep the whole narrative summary, but you should always read the recommendations, priorities, and issues. That keeps you in control of what changes.

Cross-Model Code Review (!)

Now run the same review in a second model on the same branch. Then choose one of two paths:

  • let the second model fix issues directly, then ask the first model to review those changes
  • or feed the findings back into the first model for implementation

And there is one step people skip too often:

  • after fixes are made, ask the reviewing model to verify the fixes

The cycle is: REVIEW → FIX → CHECK THE FIX (don’t skip the last part)

Code Review Matters

If I had to pick one habit as the most important in AI-assisted development, it would be code review. Models make mistakes. Constantly. Even very good ones. Cross-model review catches what a single model misses. It is one of the highest-leverage practices in the whole workflow.

Adapt It to Your Pipeline

Depending on your workflow, you might review: a feature branch, a pull request, a range of commits. The principle stays the same: ALWAYS REVIEW BEFORE MERGE.

6. COMMIT / MERGE / PR

Only after: tests pass, docs are updated, reviews are done.

Commit Early and Often

This gets even more important when AI is involved. After completing a task, commit immediately. Commit after every completed task. Even if something is not fully finished, commit work in progress when needed.

Why does it matter? There are several reasons: code review becomes easier from commit history, you can revert quickly if the model breaks something later, and you reduce the risk of losing work when context expires or a session goes sideways. And that last one is real. If you go through multiple tasks without committing, and then the model damages something or the session loses context, you can lose hours of progress.

Let AI Commit

Ask the AI to make the commit so it writes the message itself. It usually knows better than you what changed, because it was directly involved in the implementation. Good commit messages also reduce the need for a separate changelog.

Branching Strategy

Feature branches, trunk-based development, whatever works for you. That part is personal preference. The important thing is not the branching philosophy. It’s that you commit.

The Full Cycle

That loop works. Good AI-assisted development is about building a loop that stays reliable. And more importantly, it keeps working once the project grows.


Security

Environment Variables and Secrets

This part is simple and non-negotiable:

  • Store keys, URLs, and credentials locally in .env
  • Never push .env to the repository
  • Make sure .env is in .gitignore

For production: set variables through platform tooling. For example: wrangler secret put, Vercel dashboard, GCP console, and similar systems.

Also make sure your dev and prod environments have access to the same variable set. The model can help by: generating the list of required variables, setting up config, checking where variables are referenced.
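A minimal setup looks like this; the variable names are hypothetical, and wrangler secret put prompts you for the value interactively:

  # Keep real values local, commit only a template
  echo ".env" >> .gitignore
  cat > .env.example <<'EOF'
  DATABASE_URL=
  STRIPE_API_KEY=
  EOF

  # Production: set secrets through platform tooling instead
  wrangler secret put STRIPE_API_KEY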

Keep Secrets Out of Prompts

Do not paste API keys, tokens, or passwords into prompts. EVER. Use environment variables and .env files instead. And yes, also make sure .env is in .gitignore.

This matters even more with models that have internet access or broader tool integrations. Anything you put into a prompt may be logged, transmitted, or retained somewhere outside your control.

AI Permissions

Give them, but use judgment and don’t go too far in any direction. Don’t be afraid to give the model permissions to: run terminal commands, work on servers, interact with cloud services, access databases. That’s where AI-assisted development becomes genuinely powerful.

Just use common sense:

  • development environments are low risk (should be)
  • production systems need more caution (add restrictions)
  • production databases need much more caution (read only)

As always: make BACKUPS, know your blast radius, and think before you approve. Tip: do not give AI access to your backups ;-)

Debugging with AI

Debugging is one of the places where AI can save absurd amounts of time, but only if the model can see what’s actually happening.

Give the Model Access to Logs

If the model has access to CLI tools such as: wrangler, Vercel CLI, other platform tooling — it can run commands like wrangler tail and inspect live errors directly. That changes the whole debugging loop.

  • Instead of: you reproduce the bug, you copy the error, you paste it into chat, the model guesses,
  • you get: model runs the command, model sees the error, model fixes the cause. Much faster!
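With Cloudflare Workers, for example, that loop can hang off a single command the model runs itself:

  # Stream live logs from the deployed Worker and watch errors as they happen
  wrangler tail --format pretty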

What AI Can Test Well

  • API calls / CURL
    It can send requests and analyze the responses (see the example after this list)
  • Server logs and local logs
    It can read them directly and reason about them
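For the API side, a model with terminal access can drive curl end to end. The endpoint here is hypothetical, and jq is just optional pretty-printing:

  curl -sS -X POST https://api.example.com/v1/items \
    -H 'Content-Type: application/json' \
    -d '{"name": "test"}' | jq .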

What You Should Usually Test Yourself

  • Browser / UI behavior
    Models can interact with browsers e.g. through Puppeteer or built-in browser features in some tools. But in practice, visual testing is often faster and more reliable when you do it yourself. You know what you’re looking for. You’re faster at spotting: alignment issues, visual glitches, odd spacing, broken responsiveness, “this just feels wrong” problems. You can always paste console errors back to the model if needed.

Core Principles

  • The more the model can inspect and execute on its own, the faster debugging becomes.
  • Let it: run commands, read logs, execute tests. Don’t become the bottleneck.

Agentic Tools

This is where things get more advanced and much more interesting.

Sub-Agents

A sub-agent is a separate AI instance with its own context window. It handles a specific task independently and returns results to the main agent. That matters because it lets you separate work instead of stuffing everything into one giant context.

When to Use Sub-Agents? Good use cases:

  • parallel tasks touching separate files
  • large repository analysis
  • documentation generation for a big codebase
  • log analysis delegated to a cheap model

That last one is especially useful. You do not want your expensive flagship model spending valuable context on hundreds of lines of noisy logs if a cheaper model can summarize them first.

Sub-agents are now available in Claude Code, Cursor, Codex CLI, and Gemini CLI. At this point, all major tools support the concept.

CRITICAL UNDERSTANDING: A sub-agent is a new session. It knows nothing. That means the task you give it must be: clear, self-contained, specific. If every sub-agent has to re-read the whole repo and all docs, you’ll often spend more tokens than doing the work directly in the main session. Sub-agents are powerful, but only when the work is well scoped.

Don’t overdo it. Three or four sub-agents at a time is usually a healthy maximum. Beyond that, productivity tends to go down rather than up.

Do cost optimization. One useful trick is to encode this into your AGENTS.md, as sketched below. For example: use a lightweight model and a sub-agent for log analysis and documentation search. That way the main model automatically delegates cheap tasks to cheaper workers.
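Encoded as a rule, that might look like this (the wording is a placeholder):

  cat >> AGENTS.md <<'EOF'

  ## Delegation
  - For log analysis and documentation search, spawn a sub-agent on a lightweight model.
  - Give every sub-agent a self-contained task description; it starts with zero context.
  EOF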

Skills

Skills are reusable instruction packages, typically stored as SKILL.md files. They follow an open standard that already works across many platforms. Examples:

  • a custom skill for your app’s UI conventions
  • a cleanup skill like Simplify Code
  • skills installed with npx skills add

They’re basically a way to turn repeatable patterns into reusable behavior.
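A skill is essentially a folder with a SKILL.md in it. A minimal sketch in the Claude Code layout (check your tool’s docs for the exact location and frontmatter it expects):

  mkdir -p .claude/skills/simplify-code
  cat > .claude/skills/simplify-code/SKILL.md <<'EOF'
  ---
  name: simplify-code
  description: Clean up and simplify code after a feature is finished
  ---
  When invoked: remove dead code and needless indirection,
  keep behavior identical, and run the tests before and after.
  EOF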

MCP (Model Context Protocol)

A short mention, because it matters but doesn’t need a deep dive here. MCP is a standard for connecting AI models to external tools and data sources: Slack, GitHub, databases, internal systems. Very useful for advanced integrations. Not essential for everyday coding.

Long-Running Processes

Some tasks take time: builds, deployments, training runs, migrations. You don’t want to sit there manually polling status like it’s 2012.

Claude Code has a built-in /loop command, for example:

/loop 5m check if the deployment finished and tell me what happened

This creates a cron-style background check every five minutes for the current session. The model checks status, reads logs, and reports back when the work is done. Very useful.

The Claude loop tool is session-scoped: tasks disappear when you close the terminal and auto-expire after 3 days.

For anything persistent, scheduled, or production-grade, use real automation: desktop scheduled tasks, GitHub Actions, other CI/CD tooling.

Practical Rules and Tips

After Compaction or a New Session, Re-Explain

Remember that after auto-compaction, manual compaction, or a fresh session, the model has lost detailed context. It may retain a summary of the recent conversation, but do not assume it still “knows what you meant.”

Resume Sessions

If your tool supports session continuation, you can use it to continue where you stopped. For example, in Claude Code you can resume a previous session with --continue or shorthand like claude -c.

This is useful when you want to get back to an earlier thread of work without starting from zero.

Be Extra Cautious after Refactoring

Models sometimes delete code they think is unnecessary. Not maliciously. Just confidently. And the scary part is that it may be dozens or hundreds of lines. After every refactor: do a full second review, compare against the last commit with git diff (ask the model to run it), and check which lines were removed.
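Two commands cover most of that check:

  git diff --stat   # how big is the change, file by file?
  git diff HEAD     # exactly which lines were added or removed since the last commit?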

Don’t Waste Time on Typos

Typos, punctuation, capitalization, grammar, messy phrasing, none of that matters much. Models usually understand intent perfectly well. Don’t optimize for polished prompts.

Feed the Model Documentation

Coding models rarely search the web on their own in the middle of implementation. So if you’re integrating a new library, a third-party API, or a framework you haven’t used before, give the model the URL to the docs and tell it to read them before implementation. Markdown docs are especially good for this.

Use Screenshots

Modern coding models are multimodal. If a layout bug or visual issue is hard to describe, show it. Paste: screenshots, image links, design references.

This works especially well for: CSS/layout issues, responsive breakpoints, visual regressions, UI polish problems.

Review Changes

Even if you don’t inspect every line, do a quick visual sanity check: how many lines changed? does that feel proportional to the task? did something unrelated vanish? was too much rewritten? That quick scan catches a lot.

Let AI Handle Git Issues

Merge conflicts, push failures, rebases, weird git states: this is exactly the sort of annoying mechanical work AI is often very good at. Ask it to fix the problem.

Remote VPS Setup

One setup I find especially practical is running Claude Code or Codex CLI on a cheap VPS, for example from Hetzner or DigitalOcean. That gives you two big benefits:

  1. Isolation — Your AI agents work on a remote machine, not your personal laptop. That means less risk to your local environment.
  2. Continuity — You can access the same session from anywhere: your laptop, your office, home, on the road. Your agents and their context stay on the server.

That is both convenient and surprisingly comfortable once you get used to it.

Mobile Workflow — Prototype on the Go

Mobile is also becoming much more practical than people expect. With apps like Claude, especially now that code execution is available, you can start real work while you’re moving around: on a train, between meetings, during a walk, wherever you happen to be.

A simple workflow looks like this:

  • Open Claude mobile and describe what you want to build or change
  • The model writes code, creates files, runs tests, all remotely
  • Later, on your laptop, review the pull request properly and merge

That is a very real productivity boost for prototyping. You no longer need to wait until you’re at your desk to begin. By the time you sit down, the PR can already be waiting for review.

Type or Talk

At a computer, typing is still usually the best input method. On mobile, voice dictation is excellent for prompts and rough ideas. For longer dictation, dedicated transcription tools, like Wispr Flow, often work better than standard keyboard dictation.

Can’t Be Ignored — OpenClaw

[Image: it was supposed to be a meme-like image visualizing OpenClaw in a refrigerator 🤣 but it fits well too]

Multi-Agent Orchestration

OpenClaw, formerly Clawdbot or Moltbot, deserves a mention because it points toward where all of this is going: general-purpose autonomous multi-agent orchestration, accessed via messaging platforms (WhatsApp, Telegram, Discord), that can handle multiple types of tasks. And yes, that’s exciting. But it’s also where a lot of hype outruns reality.

Configuration is Hard

Getting OpenClaw to behave properly, in a way that actually helps instead of wasting millions of tokens, is significantly harder than using Claude Code or Codex CLI directly. The configuration surface is large.

And with systems like this, small configuration mistakes don’t fail gracefully. They turn into: loops, redundant work, agents stepping on each other, token burn with little value.

With one agent, you can often get away with slightly vague instructions because you’re supervising closely. With multiple autonomous agents, vague instructions become chaos.

You need to specify very clearly: what each agent should do, what it should not do, how handoffs work, when work should stop, what counts as success.

Where It Shines

OpenClaw is very good for: prototyping, research, exploring concepts, running parallel experiments. Basically, situations where you want to throw multiple agents at a space of possibilities and see what sticks.

It is also genuinely interesting for INFRASTRUCTURE MONITORING.

For example, you can connect agents to tools like wrangler (Cloudflare), vercel, gcloud, modal, etc. and let them monitor logs and systems on every agent heartbeat. A monitoring agent that spots an error, alerts you, and suggests a fix is a very real and useful pattern.

That can lead to: immediate hotfixes, or queued follow-up work in the regular pipeline. This is one of the places where multi-agent orchestration already feels strong.

It is much less compelling for: production applications, precise work in production environments, situations where predictability matters more than exploration. In those cases, the overhead of orchestrating multiple agents can outweigh the benefit.

Bottom Line

OpenClaw absolutely has interesting applications. It may also be better suited to “vibe coding” style workflows where speed, experimentation, and loose exploration matter more than strict control.

But after spending a few hundred million tokens working with it, I do not see it as an upgrade over a manually orchestrated workflow for serious production-grade application development.

The workflow where you direct the models yourself, deciding when to plan, code, review, switch models, validate fixes, and deploy, is still more reliable and more predictable.

Keep OpenClaw on your radar, because autonomous orchestration is clearly part of the future. Just don’t rush to replace a workflow that already works. Automatic orchestration is very likely the next stage of AI-assisted development. But as of today, OpenClaw does not really deliver that experience out of the box.

You still need to invest substantial effort into: configuration, specification, supervision. And quite often, that cancels out a big part of the productivity gain you were hoping automation would provide.

Keep Experimenting

This space changes constantly. Tools evolve. Models improve. Good workflows today may get replaced by better ones surprisingly fast. So the final advice is simple: keep experimenting, stay curious, and share what actually works.

This guide reflects practical experience as of March 2026.