# mi

agentic coding in 30 loc. a loop, two tools, and an llm.
## features
- works with any OpenAI-compatible API: OpenAI, ollama, lmstudio, litellm, vllm, local models
- `bash` tool gives full system access: git, curl, compilers, file I/O (`cat`, `sed -i`, heredocs); optional `timeout=<ms>` and `bg=truthy` for background tasks
- `skill` tool loads markdown playbooks from `skills/` and `~/.agents/skills/` (auto-advertised in system prompt)
- bundled skills: `plan`, `tasks`, `delegate`, `explore`, `refactor`, `review`, `verify`, `debug`, `tdd`, `new-skill`, `self`
- modular tools: add new tools by dropping `.mjs` files in `tools/` (auto-discovered at startup)
- self-extending: agent can write its own tools via the `self` skill
- recursive agents: tools can spawn sub-agents by calling `mi` as a child process
- automatic `AGENTS.md` ingestion from current directory for repo-specific context
- non-interactive mode with `-p 'prompt'` for scripting and CI
- stdin pipes: `echo "do this" | mi` or `cat file | mi`
- file context via `-f <file>` argument
- chat REPL with `/reset` command and error recovery
- streaming output (SSE): tokens appear as they arrive
- graceful `SIGINT` handling for bash child processes
## install
```sh
# run directly
npx @avcodes/mi

# or install globally
npm i -g @avcodes/mi
mi
```
## usage
```sh
# interactive repl (type /reset to clear history)
OPENAI_API_KEY=sk-... mi

# one-shot (run once, exit)
mi -p 'refactor auth.js to use bcrypt'

# load additional context from a file
mi -f error.log -p 'why is this crashing?'

# pipe stdin to the agent
echo "write a python script that prints hello world" | mi

# local models via any openai-compatible api
MODEL=qwen3.5:4b OPENAI_BASE_URL=http://localhost:33821 mi
```
## env
| var | default | what |
|---|---|---|
| `OPENAI_API_KEY` | (none) | api key |
| `OPENAI_BASE_URL` | `https://api.openai.com` | api base url (ollama, lmstudio, litellm, etc) |
| `MODEL` | `gpt-5.4` | model name |
| `SYSTEM_PROMPT` | built-in agent prompt | override the system prompt entirely |
## deep dive
an agentic harness is surprisingly simple. it's a loop that calls an llm, checks if it wants to use tools, executes them, feeds results back, and repeats. here's how each part works.
### tools
the agent needs to affect the outside world. tools are just functions that take structured args and return a string. each tool lives in `tools/<name>.mjs` and exports `name`, `description`, `parameters`, and `handler`:
```js
// tools/bash.mjs
export default {
  name: 'bash',
  description: '...',
  parameters: {...},
  handler: ({ command, timeout, bg }) => {
    // run shell command, return output
  },
};
```
the harness auto-discovers tools at startup by scanning `tools/*.mjs`. two tools ship by default:

- `bash` gives the agent access to the entire system: git, curl, compilers, package managers, and file I/O (via `cat`, `sed -n`, `sed -i`, heredocs; the system prompt teaches the patterns). optional `timeout=<ms>` kills the process after the given delay and resolves with `[timeout]`. optional `bg=truthy` runs the command detached and returns `pid:X log:/tmp/mi-*.log` immediately.
- `skill` gives the agent specialized workflows loaded on demand from markdown playbooks in bundled `skills/` or `~/.agents/skills/`.
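a skill is just a markdown file the agent reads when it decides it needs the workflow. the bundled playbooks aren't reproduced here, but a hypothetical `skills/review.md` (contents invented for illustration) might look like:

```markdown
# review

when asked to review code:
1. read the diff or the files in question before commenting
2. look for bugs, unhandled errors, and missing tests first; style nits last
3. report findings as a short list, most severe first
```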
every tool returns a string because that's what goes back into the conversation.
### tool definitions
the llm doesn't see your functions. it sees json schemas that describe what tools are available and what arguments they accept. each tool module exports these directly:
```js
// tools/bash.mjs
export default {
  name: 'bash',
  description: 'run bash cmd',
  parameters: {
    type: 'object',
    properties: { command: { type: 'string' } },
    required: ['command'],
  },
  handler: ...,
};
```
the harness builds the tools array from all discovered modules and sends it with every api call so the model knows what it can do.
### messages
the conversation is a flat array of message objects. each message has a role (system, user, assistant, or tool) and content. this array is the agent's entire memory:
```js
const hist = [{ role: 'system', content: SYSTEM }];

// user says something
hist.push({ role: 'user', content: 'fix the bug in server.js' });

// assistant replies (pushed inside the loop)
// tool results get pushed too (role: 'tool')
```
the system message sets the agent's personality and context (working directory, date). every user message, assistant response, and tool result gets appended. the model sees the full history on each call, which is how it maintains context across multiple tool uses.
### the api call
each iteration makes a single call to the chat completions endpoint. the model receives the full message history and the tool definitions, and we ask for an SSE stream so tokens arrive incrementally:
```js
const res = await fetch(`${base}/v1/chat/completions`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${key}`,
  },
  body: JSON.stringify({ model, messages: msgs, tools: defs, stream: true }),
});
// iterate res.body, parse `data: {...}` events, accumulate deltas into one message
```
the stream emits delta chunks: `delta.content` is partial text (write straight to stdout as it arrives), `delta.tool_calls[i]` are partial tool-call fragments (id/name first, then `arguments` in pieces; merge by index). once `[DONE]` arrives, the assembled message either has `content` (a text reply) or `tool_calls` (the model wants to use tools). this is the decision point that drives the whole loop.
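the accumulation step can be sketched as a pure function (the name `mergeDelta` is an assumption; the real harness may inline this):

```js
// fold one streamed `delta` into the message being assembled.
// text appends; tool-call fragments merge by their `index` field,
// since id/name arrive first and `arguments` trickles in afterwards.
export function mergeDelta(msg, delta) {
  if (delta.content) {
    msg.content = (msg.content ?? '') + delta.content;
    process.stdout.write(delta.content); // stream text as it arrives
  }
  for (const t of delta.tool_calls ?? []) {
    msg.tool_calls ??= [];
    const cur = (msg.tool_calls[t.index] ??= {
      id: '',
      type: 'function',
      function: { name: '', arguments: '' },
    });
    if (t.id) cur.id = t.id;
    if (t.function?.name) cur.function.name += t.function.name;
    if (t.function?.arguments) cur.function.arguments += t.function.arguments;
  }
  return msg;
}
```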
### the agentic loop
this is the core of the harness. it's a `while (true)` that keeps calling the llm until it responds with text instead of tool calls:
```js
async function run(msgs) {
  while (true) {
    const msg = await streamLLM(msgs); // stream tokens to stdout, return assembled message
    msgs.push(msg);                    // add assistant response to history
    if (!msg.tool_calls) return;       // no tools? we're done (text already streamed)
    // otherwise, execute tools and continue...
  }
}
```
the loop exits only when the model decides it has enough information to respond directly. the model might call tools once or twenty times; it drives its own execution. this is what makes it agentic: the llm decides when it's done, not the code. note that text content is written to stdout during the stream, so `run()` doesn't return it; the user already saw it.
### tool execution
when the model returns `tool_calls`, the harness executes each one and pushes the result back into the message history as a `tool` message:
```js
for (const t of msg.tool_calls) {
  const { name } = t.function;
  const args = JSON.parse(t.function.arguments);
  const result = String(await tools[name](args));
  msgs.push({ role: 'tool', tool_call_id: t.id, content: result });
}
```
each tool result is tagged with the `tool_call_id` so the model knows which call it corresponds to. after all tool results are pushed, the loop goes back to the top and calls the llm again, now with the tool outputs in context.
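the features list mentions error recovery. one way to get it (a sketch under assumptions, not the actual source; `execTool` is an invented name) is to turn tool failures into ordinary tool results, so the model sees the error and can adjust instead of the loop crashing:

```js
// hedged sketch: bad json from the model or a throwing handler becomes
// a string result fed back into the conversation like any other output.
export async function execTool(tools, call) {
  try {
    const args = JSON.parse(call.function.arguments || '{}');
    return String(await tools[call.function.name](args));
  } catch (e) {
    return `[error] ${e.message}`;
  }
}
```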
### the repl
the outer shell is a simple read-eval-print loop. it reads user input, pushes it as a user message, and calls `run()`, which streams the response to stdout itself:
```js
while (true) {
  const input = await ask('\n> ');
  if (input.trim()) {
    hist.push({ role: 'user', content: input });
    try {
      await run(hist);
    } catch (e) {
      console.error('✗ ' + e.message);
      hist.pop();
    }
  }
}
```
there's also a one-shot mode (`-p 'prompt'`) that skips the repl and exits after a single run. both modes use the same `run()` function. streaming works the same way; tokens just go to a piped stdout instead of a terminal. the agentic loop doesn't care where the prompt came from.
### putting it together
the full flow looks like this:
```
user prompt → [system, user] → llm → tool_calls? → execute tools → [tool results] → llm → ... → text response
```
more sophisticated agents add things like memory, retries, parallel tool calls, or multi-agent delegation, but the core is always: loop, call, check for tools, execute, repeat.
