GitHub - evotai/evot: A self-evolving AI coding agent for long-running, complex software engineering.

4 min read Original article ↗

Evot

An agent engine that completes complex, long-running work with minimal tokens and maximum quality.

Benchmark · Why · Install · Quickstart · Dev · Community

export-1776943794877.mp4

Benchmark

Same task, same eval environment, different models. evot completes the work with fewer tokens, less time, and lower cost — across both frontier and open-source models.

Task: Fix a real bug in serde_json (issue #979) — investigate root cause, apply fix, write regression test, verify all tests pass.

Model Metric evot claude-code Difference
Opus 4.6 Cost $2.24 $6.16 64% cheaper
Opus 4.6 Time 2m 56s 3m 51s 24% faster
Opus 4.6 Input tokens 574.8K 1.5M 62% fewer
DeepSeek V4 Pro Cost $0.02 $0.07 67% cheaper
DeepSeek V4 Pro Time 6m 10s 16m 34s 63% faster
DeepSeek V4 Pro Input tokens 42.9K 133.8K 68% fewer

All agents produce correct, passing code. The difference is in how they manage context.

Why is evot faster and cheaper?

Evot's goal: complete tasks fast and well, without wasting a single token. Every design decision serves this — give the LLM less context, but higher quality context.

Other agents accumulate everything and call the LLM to summarize when context overflows — extra tokens, extra latency. Evot uses zero LLM calls for context management:

  • Algorithmic compaction — a four-pass Rust pipeline (Reclaim → Shrink → Collapse → Evict) runs in microseconds between every turn. Images downgrade to path references; old turns collapse to one-line summaries.
  • Spill to disk — large tool results write to disk with a short preview. The model re-reads on demand instead of carrying megabytes in context.
  • Compaction markers — structured metadata (files modified, conclusions, environment state) survives compaction. Progress is never lost.

Fewer tokens, higher signal density. Fast, high-quality task completion — no token wasted.

Quantitative benchmarking against the best. Evot maintains a reproducible eval pipeline that runs the same real-world tasks against Claude Code and Codex (latest versions). Every engine change is validated against these baselines — token usage, cost, time, and task success rate must improve or hold. This ensures continuous improvement without regression.

📢 News

  • 2026-05-30 [Engine] Major refactor — four-pass compaction pipeline, pi-aligned tools with parallel execution, leaner core. Not backward-compatible; start a new session.
  • 2026-05-17 [REPL] /goal — autonomous objectives, e.g. /goal remove unwraps in Rust context compaction. (removed — the agent loop handles multi-step tasks natively)
  • 2026-05-11 [Skills] Built-in opencli — control the browser, use logged-in cookies, read Feishu/Lark messages, Twitter/X timelines, and more.
  • 2026-05-11 [Slim] Tool outputs now auto-compact, with token savings shown inline.
  • 2026-05-08 [REPL] /harden — stress-test plans and git changes before shipping. Inspired by @cjzafir.
  • 2026-05-02 [Skills] Builtin skill support — review ships built-in, no install needed.
  • 2026-04-28 [Image] Resize, preserve through compaction, persist to disk.
  • 2026-04-23 [Search] Full-text session search — /resume <query> to find any past conversation.
  • 2026-04-18 [REPL] /history + /goto — time-travel through conversation context.

Installation

One-liner (recommended)

curl -fsSL https://evot.ai/install | sh

From source

git clone https://github.com/evotai/evot.git
cd evot
make setup && make install
evot

Quickstart

1. Set your API key

Create ~/.evotai/evot.env:

# Anthropic (default)
EVOT_LLM_ANTHROPIC_API_KEY=sk-ant-...
EVOT_LLM_ANTHROPIC_BASE_URL=your-anthropic-base-url
EVOT_LLM_ANTHROPIC_MODEL=claude-opus-4-6
# Multiple models: EVOT_LLM_ANTHROPIC_MODEL=claude-sonnet-4-6,claude-opus-4-6

# Or OpenAI
# EVOT_LLM_OPENAI_API_KEY=sk-...
# EVOT_LLM_OPENAI_BASE_URL=your-openai-base-url/v1
# EVOT_LLM_OPENAI_MODEL=gpt-5.5

# Or DeepSeek (Anthropic-compatible)
# EVOT_LLM_DEEPSEEK_API_KEY=sk-...
# EVOT_LLM_DEEPSEEK_BASE_URL=https://api.deepseek.com/anthropic
# EVOT_LLM_DEEPSEEK_PROTOCOL=anthropic
# EVOT_LLM_DEEPSEEK_MODEL=deepseek-v4-pro

# Or Xiaomi MiMo-V2.5-Pro (Anthropic-compatible)
# EVOT_LLM_XIAOMI_API_KEY=tp-...
# EVOT_LLM_XIAOMI_BASE_URL=https://token-plan-cn.xiaomimimo.com/anthropic
# EVOT_LLM_XIAOMI_PROTOCOL=anthropic
# EVOT_LLM_XIAOMI_MODEL=mimo-v2.5-pro

Use --model provider:model for one-off overrides.

2. Run

evot                                          # interactive REPL
evot -p "summarize today's PRs"               # one-shot task
evot -p "review this" -f ./src/main.rs        # attach file context
evot -p "continue work" -c                   # continue latest session in cwd
evot -p "continue work" -r my-session         # resume or create session
CLI flags & options
Flag Description
-p, --prompt Run a single prompt and exit
-f, --file <path> Attach file/directory context
-c, --continue Continue the latest session in the current directory
-r, --resume <id> Resume or create a session
--model <model> Override the configured model
--verbose Enable info-level logging

Development

make setup        # install Rust toolchain, git hooks
make test         # all tests (engine + CLI)
make install      # compile standalone binary to ~/.evotai/bin/evot

Community

License

Apache-2.0