A single-tool Python-first agentic chat — Using Python as the universal interface. The agent writes and executes Python code to accomplish any task: creating files, running commands, making HTTP requests, debugging, testing, and everything in between. Built as a single-file Python program using vanilla Python and local LLM integration.
## Table of Contents
- Overview
- Key Features
- Architecture
- Installation
- Quick Start
- Usage
- Commands
- Memory System
- Global Memory
- Task Management
- Testing
- Development
- Contributing
- License
## Overview
g023's Agentic Chat is a Python-first agentic development tool that uses a single tool (python_exec) for all operations. Instead of having dozens of specialized tools, the agent writes and executes Python code to accomplish any task — from file I/O to shell commands to web scraping to complex computations.
Philosophy: Python is the universal tool. If you can do it in Python, the agent can do it. No specialized tools needed.
The system is designed as a single-file Python application (agentic_chat.py) that integrates with a local Ollama server for LLM capabilities. It features a single-tool architecture (python_exec), 6-level memory system, global cross-session memory, multi-level reasoning with backtracking, sub-agent system (Planner/Coder/Reviewer), reinforcement learning, and comprehensive user control modes.
## Key Features

### 🐍 Single-Tool Python Architecture (v5.0 — NEW)
- **One Tool**: `python_exec` — the agent writes and executes Python code for everything
- **Universal Interface**: File I/O, shell commands, HTTP requests, computation — all through Python
- **Auto-Imports**: Common modules (`os`, `sys`, `json`, `subprocess`, `shutil`, `re`, etc.) pre-loaded
- **Foolproof Prompting**: Comprehensive examples and patterns teach the agent how to use Python for any task
- **Smart Output**: Truncation for large outputs, clean error display with full tracebacks
- **120s Timeout**: Generous execution time for complex tasks (configurable)
- **No Confusion**: One tool means the agent never picks the wrong tool
### 🤖 Multi-Level Reasoning (v5.0 Enhanced)
- **Adaptive Depth**: Automatically selects reasoning complexity (Quick → Standard → Deep → Exhaustive)
- **Auto-Escalation**: Reasoning depth auto-escalates when working memory indicates complexity
- **Backtracking**: Agents can revise their reasoning if it goes off-track
- **Phase-Based Thinking**: Structured reasoning with query analysis, decomposition, evaluation, and synthesis
### 🧠 Agentic Memory System (6 Levels + Global + Self-Reflection)
- **6-Level Memory**: Summary, Conversation, Facts, Episodes, Working, and Identity
- **LLM Compaction**: Token-count-based compaction with LLM summarization of older entries
- **Global Memory**: Cross-session persistent memory for self-improvement over time
- **Memory Pruning**: Importance scoring, archiving, and configurable memory budgets
- **Persistent Sessions**: Memory survives across conversations and restarts
- **File-Backed Storage**: All memory stored as editable markdown files
- **Auto-Save**: Session state saved automatically every turn
- **Self-Reflection**: Post-turn evaluation feeds into reinforcement learning
### 🐍 Python-First Execution
- **Protected Workspace**: Sandboxed execution in session directories
- **Python Exec**: Agent writes Python code to create files, run commands, test code, make requests
- **Shell Access**: Via `subprocess.run()` — pip, git, curl, anything
- **File Operations**: Via Python's built-in `open()`, `os`, `shutil`, `pathlib`
- **HTTP Requests**: Via `urllib.request` or `requests` (if installed)
- **Robust Parsing**: Multi-format tool call parsing optimized for small models
- **User Approval**: Configurable approval modes (YOLO, Approve, Guided)
### 🎮 User Control & Dynamic Configuration
- **Interaction Modes**: YOLO (auto-approve), Approve (manual), Guided (step-by-step)
- **Dynamic Config**: Runtime-modifiable settings via the `/config` command
- **Command System**: 30+ slash commands for session management
- **Task Management**: Structured task plans, pause/resume, progress tracking
- **Benchmarking**: Benchmark prompts and system messages to improve performance
- **Beautiful UI**: ANSI-colored terminal output with progress indicators
### 🤖 Sub-Agent System
- **PlannerAgent**: Task decomposition and step planning specialist
- **CoderAgent**: Python-first code writing and testing specialist
- **ReviewerAgent**: Code quality verification and testing specialist
- **Agent Memory**: Each sub-agent has private memory for specialization
- **Auto-Dispatch**: Orchestrator selects the appropriate agent based on task classification
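Auto-dispatch can be pictured as a classifier that maps a task description to an agent name. The sketch below uses hypothetical keyword matching (`AGENT_KEYWORDS` and `dispatch` are illustrative names); the real Orchestrator classifies tasks with the LLM:

```python
# Hypothetical keyword-based dispatch; the real Orchestrator uses
# LLM classification, but the dispatch shape is similar.
AGENT_KEYWORDS = {
    "planner": ("plan", "steps", "break down", "organize"),
    "reviewer": ("review", "check", "verify", "test"),
}

def dispatch(task: str) -> str:
    """Pick a sub-agent name for a task; default to the coder."""
    lowered = task.lower()
    for agent, keywords in AGENT_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return agent
    return "coder"
```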
### 🧠 Reinforcement Learning
- **Outcome Tracking**: Scores strategy effectiveness per task type
- **EMA Smoothing**: Exponential Moving Average for stable learning
- **Strategy Selection**: Recommends the best approaches based on past success
- **Prompt Evolution**: Tracks and improves prompt performance
- **Tool Chain Learning**: Learns which tool sequences work best
- **Self-Improvement**: Agents adapt based on cumulative experience
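The EMA smoothing step is the standard exponential-moving-average update: each new outcome is blended into the running score with weight `alpha`. A minimal sketch (the function name and `alpha=0.3` are illustrative assumptions, not the tool's actual values):

```python
def ema_update(current: float, outcome: float, alpha: float = 0.3) -> float:
    """Exponential moving average: blend a new outcome into the score.
    Recent outcomes count more; old history decays geometrically."""
    return (1 - alpha) * current + alpha * outcome

# Track a strategy's effectiveness as outcomes arrive
# (1.0 = success, 0.0 = failure), starting from a neutral prior.
score = 0.5
for outcome in (1.0, 1.0, 0.0, 1.0):
    score = ema_update(score, outcome)
```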
## Architecture

### Core Components
```
agentic_chat.py (Main Application — v5.0)
├── Config Class           # Dynamic runtime configuration
├── UI Class               # Terminal output and formatting
├── LLM Interface          # Wrapper for _new_ollama.py
├── Tool System            # ToolResult, Tool, ToolRegistry (1 tool: python_exec)
├── ReinforcementTracker   # Self-improvement via RL
├── SubAgent System        # PlannerAgent, CoderAgent, ReviewerAgent
├── GlobalMemoryManager    # Cross-session persistent memory
├── MemoryManager          # 6-level memory with compaction & pruning
├── ReasoningEngine        # Multi-phase reasoning with auto-escalation
├── AgenticLoop            # Iterative Python-first execution loop
├── Orchestrator           # Classification, synthesis, state management, agent dispatch
├── AgenticChat            # Main interface and commands
└── Session Management     # Protected workspaces and state persistence
```
### Single-Tool Philosophy (v5.0)

Instead of 37+ specialized tools, v5.0 uses a single tool: `python_exec`. The agent writes Python code to accomplish any task:
| Task | How the Agent Does It |
|---|---|
| Create a file | open("file.py", "w").write(code) |
| Read a file | print(open("file.py").read()) |
| Run a script | subprocess.run([sys.executable, "file.py"], ...) |
| Shell commands | subprocess.run("pip install X", shell=True, ...) |
| Search files | os.walk() + string matching |
| HTTP requests | urllib.request.urlopen() |
| File operations | os.makedirs(), shutil.move(), os.remove() |
| Computation | Any Python code |
**Why single-tool?**
- No tool selection confusion for the LLM
- Python can do literally everything
- Simpler prompting = more reliable agent behavior
- Full power of Python standard library available
- Extensible via pip (agent can install packages itself)
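The single tool itself can be sketched in a few lines, assuming execution in a fresh subprocess with a timeout and output truncation. This is a minimal illustration, not the actual implementation (the real tool runs code in the session workspace with common modules pre-imported, a 120s default timeout, and a 15000-char output cap):

```python
import subprocess
import sys

MAX_OUTPUT = 2000  # illustrative cap; the real tool truncates at 15000 chars

def python_exec(code: str, timeout: int = 120) -> str:
    """Run a code string in a fresh interpreter; capture and truncate output."""
    try:
        r = subprocess.run([sys.executable, "-c", code],
                           capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"ERROR: execution exceeded {timeout}s"
    out = r.stdout + (("\nSTDERR:\n" + r.stderr) if r.stderr else "")
    if len(out) > MAX_OUTPUT:
        out = out[:MAX_OUTPUT] + "\n... [truncated]"
    return out
```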
### Memory Architecture (v5.0)

The system uses a hierarchical memory system stored as files in session directories, plus a global memory layer:
- **L0 Summary** (`L0_SUMMARY.md`): LLM-compressed historical context
- **L1 Conversation** (`CONVERSATION.md`): Rolling summary of recent exchanges
- **L2 Facts** (`MEMORY.md`): Key facts, preferences, and decisions
- **L3 Episodes** (`EPISODES.md`): Detailed per-turn records with metadata
- **L4 Working** (`SCRATCH.md`): Structured working notes (goal, steps, vars, errors)
- **L5 Identity** (`GOAL.md`, `CONTEXT.md`): Session goals and context
- **Agent Memory** (`.agent_<name>_memory.json`): Sub-agent private memory (NEW v4.0)
- **Global** (`agentic_memory/`): Cross-session insights, skills, benchmarks, preferences
### Session Structure

```
sessions/
└── YYYYMMDD_HHMMSS_XXXXXX/
    ├── L0_SUMMARY.md        # L0 compressed history
    ├── CONVERSATION.md      # L1 conversation memory
    ├── MEMORY.md            # L2 factual memory
    ├── EPISODES.md          # L3 episode log
    ├── SCRATCH.md           # L4 working memory (structured template)
    ├── GOAL.md              # L5a identity/goal
    ├── CONTEXT.md           # L5b session context
    ├── PLAN.md              # Structured task plan
    ├── PROGRESS.md          # Progress tracking
    ├── TASK_CONTEXT.md      # Pause/resume state
    ├── ARCHIVE.md           # Pruned memories
    └── session_state.json   # Serialized state

agentic_memory/              # Global memory (NEW v4.0)
├── INDEX.md                 # Master index
├── INSIGHTS.md              # Cross-session lessons
├── USER_PREFS.md            # User preferences
├── BENCHMARKS.md            # Prompt benchmarks
├── SKILLS.md                # Agent capabilities
├── ISSUES.md                # Known issues
├── STRENGTHS.md             # What works well
├── WEAKNESSES.md            # Areas for improvement
└── TOOLS_LOG.md             # Tool usage patterns
```
## Installation

### Prerequisites
- **Python 3.8+**: Vanilla Python (no external dependencies)
- **Ollama Server**: Running locally at `http://localhost:11434`
- **g023's Qwen3 1.77B Model**: Loaded in Ollama

Note: the Ollama server address can be set at the top of `_new_ollama.py`.

Note: the 1.77B Qwen3 model can be installed with:

```bash
ollama pull hf.co/g023/Qwen3-1.77B-g023-GGUF:Q8_0
```

### Setup Steps

1. Clone the repository:

   ```bash
   git clone https://github.com/g023/g023_agentic_chat.git
   cd g023_agentic_chat
   ```

2. Verify Ollama:

   ```bash
   # Ensure the Ollama server is running
   curl http://localhost:11434/api/tags
   ```
## Quick Start

1. Start the chat:

   ```bash
   python agentic_chat.py
   ```

2. Set a goal (optional):

   ```
   You > /goal Build a Python calculator application
   ```

3. Ask questions or give tasks:

   ```
   You > Create a simple calculator that can add and subtract numbers
   ```

4. Use commands:

   ```
   You > /status   # View current state
   You > /memory   # View factual memory
   You > /save     # Save session
   ```
## Usage

### Basic Interaction

The system responds to natural language queries and performs tasks by writing and executing Python code. It automatically classifies each query and selects an appropriate reasoning depth.

### Example Session
```
$ python agentic_chat.py

╔══════════════════════════════════════════════════════════════════════════╗
║                       G023's AGENTIC CHAT v5.0.0                         ║
║              Python-First * Reasoning * Memory * Control                 ║
╚══════════════════════════════════════════════════════════════════════════╝

Session: 20260320_143000_abc123
Workspace: /path/to/sessions/20260320_143000_abc123
Thinking: ADAPTIVE | Mode: approve
Type /help for commands

You > Create a fibonacci calculator that prints the first 20 numbers

┌─ Analyzing...
│ Type: code | Complexity: 2 | Tools: yes | Agent: coder
▶ Query Analysis
▶ Decomposition
▶ Critical Evaluation
▶ Synthesis
✓ Reasoning done: 4 phases, 0 backtracks

┌─ AGENTIC EXECUTION LOOP
│ 🤖 Loop: Iteration 1/20
🔧 Tool: python_exec
🐍 Python execution needs approval:
   Code: {'code': 'with open("fibonacci.py", "w") as f:\n    f.write(...'}
   Approve? [y/n/yolo/yolo N]: yolo
✓ YOLO mode: ALL code execution auto-approved!
✓ Result: Created fibonacci.py
🤖 Loop: Iteration 2/20
🔧 Tool: python_exec
✓ Result: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377,
          610, 987, 1597, 2584, 4181
✓ Task complete!

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
ANSWER

I created `fibonacci.py` with a function that generates Fibonacci
numbers and tested it. The first 20 Fibonacci numbers are:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610,
987, 1597, 2584, 4181.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

You >
```
### How the Agent Works

The agent uses Python for everything. When given a task, it will:
1. **Classify**: Determine task type and complexity
2. **Reason**: Multi-phase thinking (Quick → Standard → Deep → Exhaustive)
3. **Execute**: Write Python code via `python_exec` to accomplish the task
4. **Iterate**: If something fails, read the error and fix it
5. **Synthesize**: Summarize what was done
Example of what the agent does behind the scenes:
TOOL: python_exec
```python
# Agent creates a file
with open("fibonacci.py", "w") as f:
    f.write('''
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

if __name__ == "__main__":
    for num in fibonacci(20):
        print(num, end=", ")
    print()
''')
print("Created fibonacci.py")
```

The agent then tests the file:

TOOL: python_exec
```python
import subprocess, sys

r = subprocess.run([sys.executable, "fibonacci.py"],
                   capture_output=True, text=True)
print("Output:", r.stdout)
if r.stderr:
    print("Errors:", r.stderr)
```
## Commands
The system supports 30+ slash commands for session management:
### Session Management
- `/save` - Save current session state
- `/load <session_id>` - Load a previous session
- `/sessions` - List all saved sessions
- `/clear` - Clear conversation history
### Memory & Context (6 Levels)
- `/summary` - View L0 historical summary (compressed context)
- `/memory` - View factual memory (L2)
- `/conversation` - View conversation memory (L1)
- `/episodes` - View episode log (L3)
- `/search <query>` - Search across all memory levels
- `/goal [text]` - View/set current goal (L5a)
- `/context` - View session context (L5b)
- `/scratch [text]` - View/write working notes (L4)
- `/archive` - View archived (pruned) memories
### Task Management (v4.0)
- `/tasks` - View structured task plan with subtask statuses
- `/resume` - Resume a paused task from saved context
- `/plan` - View execution plan (raw)
- `/progress` - View progress tracker
### Global Memory (v4.0)
- `/global [category]` - View global cross-session memory
- `/prune` - Manually trigger memory pruning
- `/benchmark [prompt]` - Benchmark a prompt or view benchmark history
### Thinking & Control
- `/think <level>` - Set thinking depth (quick/standard/deep/exhaustive/adaptive)
- `/yolo [N]` - Auto-approve code execution (all or next N rounds)
- `/approve` - Require manual approval for code execution
- `/guided` - Step-by-step reasoning guidance
### Information
- `/status` - Full system status with memory levels
- `/config [key] [value]` - View or set runtime config values
- `/agents` - Show the sub-agent system status
- `/rl` - View reinforcement learning stats
- `/tools` - Show available tool (python_exec)
- `/files` - List workspace files
- `/session` - Current session info
- `/history` - Recent conversation history
- `/verbose` - Toggle LLM output verbosity
- `/help` - Show all commands
## Memory System (v4.0 — 6 Levels + Global + Agent Memory)
The enhanced memory system ensures agents maintain continuity across conversations and learn over time:
### Level 0: Historical Summary (v4.0)
- **File**: `L0_SUMMARY.md`
- **Purpose**: LLM-compressed historical context from older conversations
- **Behavior**: When conversation memory exceeds token threshold (~8000 tokens), older entries are summarized by the LLM and stored here. Always included in agent context.
### Level 1: Conversation Memory
- **File**: `CONVERSATION.md`
- **Purpose**: Rolling summary of recent exchanges
- **Behavior**: Auto-compacts after 20 entries, keeps last 15
- **Example**:
- [14:30:15] T1 User: What is Python? AI: Python is a programming language.
- [14:31:22] T2 User: Show me an example AI: print("Hello, World!")
### Level 2: Factual Memory
- **File**: `MEMORY.md`
- **Purpose**: Key facts, preferences, and decisions
- **Behavior**: LLM-extracted important information
- **Example**:
- [14:30:15] User prefers Python over JavaScript
- [14:31:22] User is building a calculator app
### Level 3: Episode Memory
- **File**: `EPISODES.md`
- **Purpose**: Detailed per-turn records with metadata
- **Behavior**: Comprehensive log for debugging and analysis
- **Example**:
  ```
  ## Turn 1 — 2026-03-20 14:30:15
  User: What is Python?
  Type: question | Complexity: 1 | Think: QUICK
  AI: Python is a programming language.
  ```
### Level 4: Working Memory (v4.0 — Structured)
- **File**: `SCRATCH.md`
- **Purpose**: Structured working notes with template
- **Behavior**: Auto-populated with goal, current step, next step, variables, and errors
- **Template**:
  ```
  # Working Memory

  ## Current Goal
  Build a calculator

  ## Current Step
  Implementing addition

  ## Next Step
  Add subtraction

  ## Variables
  (none)

  ## Errors
  (none)
  ```
### Level 5: Identity Memory
- **Files**: `GOAL.md`, `CONTEXT.md`
- **Purpose**: Session goals and overall context
- **Behavior**: Defines who the agent is and what it's doing
### Agent Memory (v4.0)
- **Files**: `.agent_planner_memory.json`, `.agent_coder_memory.json`, `.agent_reviewer_memory.json`
- **Purpose**: Private per-sub-agent memory for specialization and continuity
- **Behavior**: Each sub-agent persists its own learned details
### Supplemental Memory Files (v4.0)
- **`ARCHIVE.md`**: Pruned low-importance memories (preserved for reference)
- **`TASK_CONTEXT.md`**: Saved task state for pause/resume capability
- **`PLAN.md`**: Structured task decomposition with subtask statuses
### Memory Compaction (v4.0)
When conversation memory exceeds the token threshold (~8000 tokens), the system:
1. Uses the LLM to summarize older conversation entries
2. Stores the summary in `L0_SUMMARY.md`
3. Keeps only the most recent 15 entries in `CONVERSATION.md`
4. The summary is always included in context, preventing loss of important information
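The compaction steps above can be sketched in a few lines, assuming a rough 4-characters-per-token estimate and an injected `summarize` callable standing in for the LLM call (both are assumptions for illustration):

```python
def compact(entries, summarize, threshold=8000, keep=15):
    """If the rough token count exceeds the threshold, summarize the
    older entries and keep only the most recent `keep` in place.
    Returns (summary_or_None, remaining_entries)."""
    tokens = sum(len(e) for e in entries) // 4  # ~4 chars per token heuristic
    if tokens <= threshold or len(entries) <= keep:
        return None, entries
    older, recent = entries[:-keep], entries[-keep:]
    return summarize("\n".join(older)), recent
```

The summary returned here would be appended to `L0_SUMMARY.md` while `recent` replaces the contents of `CONVERSATION.md`.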
### Memory Pruning (v4.0)
The system assigns importance scores to facts using heuristics:
- Higher scores for entries containing project details, preferences, decisions
- Lower scores for trivial entries (greetings, simple acknowledgments)
- Low-importance entries are moved to `ARCHIVE.md`
- Use `/prune` to manually trigger pruning
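A hedged sketch of heuristic importance scoring along these lines; the keyword lists and weights below are invented for illustration and are not the tool's actual heuristics:

```python
# Illustrative keyword lists; the real heuristics are not specified here.
IMPORTANT = ("prefer", "decid", "project", "goal", "always", "never")
TRIVIAL = ("hello", "thanks", "okay")

def importance(entry: str) -> float:
    """Heuristic importance score in [0, 1]; entries scoring below the
    archive threshold (0.3 in the config sketch) move to ARCHIVE.md."""
    text = entry.lower()
    score = 0.5  # neutral prior
    score += 0.2 * sum(k in text for k in IMPORTANT)
    score -= 0.3 * sum(k in text for k in TRIVIAL)
    return max(0.0, min(1.0, score))
```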
## Global Memory (v4.0)
Cross-session persistent memory stored in `agentic_memory/`:
| File | Purpose |
|------|---------|
| `INDEX.md` | Master index of all categories |
| `INSIGHTS.md` | Lessons learned across sessions |
| `USER_PREFS.md` | User preferences and patterns |
| `BENCHMARKS.md` | Prompt performance benchmarks |
| `SKILLS.md` | Accumulated agent capabilities |
| `ISSUES.md` | Known issues and workarounds |
| `STRENGTHS.md` | What the agent does well |
| `WEAKNESSES.md` | Areas for improvement |
| `TOOLS_LOG.md` | Tool usage patterns |
Global memory enables the agent to **learn over time** — insights from one session inform future sessions.
## Task Management (v4.0)
### Structured Task Decomposition
When the agent identifies a complex task, it can decompose it into subtasks with statuses:
- **pending** — Not yet started
- **in_progress** — Currently being worked on
- **completed** — Done
- **blocked** — Waiting on something
### Pause/Resume
Tasks can be paused and resumed across sessions:
1. Agent saves task context (goal, current step, variables, errors) to `TASK_CONTEXT.md`
2. Use `/resume` to reload the task context into working memory
3. The agent picks up exactly where it left off
### Structured Working Memory
`SCRATCH.md` uses a template with:
- **Current Goal**: What the agent is trying to accomplish
- **Current Step**: What it's doing right now
- **Next Step**: What comes next
- **Variables**: Key values being tracked
- **Errors**: Any errors encountered
## Tool System (v5.0 — Single Tool)
### python_exec — The Universal Tool
The agent has one tool: `python_exec`. It executes arbitrary Python code in the session workspace with common modules pre-imported (`os`, `sys`, `json`, `subprocess`, `shutil`, `re`, `math`, `pathlib`, `datetime`, `collections`).
| Operation | Python Pattern |
|-----------|---------------|
| Create file | `open("file.py", "w").write(code)` |
| Read file | `print(open("file.py").read())` |
| List directory | `os.listdir(".")` |
| Run script | `subprocess.run([sys.executable, "file.py"], capture_output=True, text=True)` |
| Shell command | `subprocess.run("pip install requests", shell=True, capture_output=True, text=True)` |
| Search files | `os.walk()` + string matching |
| HTTP request | `urllib.request.urlopen(url)` |
| Move/rename | `shutil.move("old.py", "new.py")` |
| Delete file | `os.remove("file.py")` |
| Create dir | `os.makedirs("src/utils", exist_ok=True)` |
| Modify file | Read → replace → write back |
### Tool Call Format
The parser supports multiple formats for maximum compatibility with small models:

- **Primary**: a `TOOL: python_exec` line followed by a fenced `python` code block
- **No language tag**: the fence after `TOOL: python_exec` may omit the `python` tag
- **Case-insensitive**: `Tool: python_exec`, `tool: python_exec`, etc. all match
- **Fallback**: a bare fenced `python` block is treated as a tool call when the model forgets the `TOOL:` line, e.g.:

  ```python
  import os
  print(os.listdir("."))
  ```

The parser is case-insensitive and deduplicates repeated calls.
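A parser along these lines can be built with two regexes: one for the `TOOL:` format (optional language tag, case-insensitive) and one for bare fenced blocks as a fallback. This sketches the approach, not the actual parser:

```python
import re

FENCE = "`" * 3  # the literal triple-backtick, built this way for readability

# Primary: "TOOL: python_exec" (any case) followed by a fenced block,
# with or without a "python" language tag.
TOOL_RE = re.compile(
    r"TOOL:\s*python_exec\s*" + FENCE + r"(?:python)?\n(.*?)" + FENCE,
    re.IGNORECASE | re.DOTALL)
# Fallback: a bare fenced python block when the model forgets "TOOL:".
BARE_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)

def parse_tool_calls(reply):
    """Extract code blocks, deduplicating while preserving order."""
    codes = TOOL_RE.findall(reply) or BARE_RE.findall(reply)
    seen, out = set(), []
    for code in codes:
        code = code.strip()
        if code and code not in seen:
            seen.add(code)
            out.append(code)
    return out
```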
## Development
### Code Structure
The entire core application is contained in `agentic_chat.py`. Key design principles:
- **Single File**: All logic in one file for simplicity
- **Class-Based**: Clean separation of concerns
- **File-Backed**: Memory and state stored as editable files
- **Tool Pattern**: Extensible tool system
- **Session Isolation**: Each session in protected directory
- **Global Learning**: Cross-session memory for self-improvement
### Key Classes
- `AgenticChat`: Main interface and 30+ command dispatcher
- `Config`: Dynamic runtime configuration for agent-tunable settings
- `GlobalMemoryManager`: Cross-session persistent memory
- `MemoryManager`: 6-level memory with compaction, pruning, and task management
- `ReinforcementTracker`: EMA-based learning for prompts, strategies, and tool chains
- `SubAgent`: Base class for specialized Planner/Coder/Reviewer agents
- `ReasoningEngine`: Multi-phase reasoning with auto-escalation
- `AgenticLoop`: Iterative Python-first execution loop
- `Orchestrator`: Query classification, synthesis, and agent dispatch
- `Session`: Protected workspace management
- `ToolRegistry`: Single-tool registry (python_exec)
### Dependencies
- **External**: None (vanilla Python)
- **Local**: `_new_ollama.py` (LLM interface, not included)
- **System**: Ollama server with Qwen3 model
### Configuration
Key settings in `agentic_chat.py`:
```python
VERSION = "5.0.0"
MAX_ITERATIONS = 20 # Agent loop limit (runtime-configurable via Config)
MAX_TOOL_CALLS = 1255 # Tool call limit
PYTHON_EXEC_TIMEOUT = 120 # Python execution timeout (generous for complex tasks)
AUTO_SAVE_INTERVAL = 1 # Save every turn
MEMORY_BUDGET = 16000 # Max chars for assembled memory
COMPACTION_TOKEN_THRESHOLD = 8000 # Trigger LLM compaction
MEMORY_ARCHIVE_THRESHOLD = 0.3 # Archive entries below this importance
MAX_OUTPUT_CHARS = 15000           # Max output chars (truncated intelligently)
```

## Testing

Five test suites with 120 total tests:

```bash
# Unit tests (24 tests)
python TESTS/test_unit.py

# v3/v4 compatibility tests (33 tests)
python TESTS/test_v3_upgrade.py

# Integration tests (7 tests — requires Ollama)
python TESTS/test_integration.py

# Parser, prompts, classification, and workflow tests (23 tests)
python TESTS/test_fixes.py

# 100x upgrade tests (33 tests)
python TESTS/test_100x.py
```
## Extending the System

### The Agent Can Install Packages Itself

No need to add tools — the agent can `pip install` anything via `python_exec`:

```
You > Install requests and fetch https://httpbin.org/get
```

The agent runs `subprocess.run("pip install requests", shell=True, ...)` and then uses the installed package.
### Adding Commands

```python
# In AgenticChat.handle_command()
handlers["/mycommand"] = self._cmd_mycommand
```
### Adding Memory Levels

```python
# In MemoryManager
MEMORY_FILES["new_level"] = "NEW_LEVEL.md"
```
## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/my-feature`
3. Write tests for new functionality
4. Implement changes with a TDD approach
5. Run the tests:

   ```bash
   python TESTS/test_unit.py && python TESTS/test_v3_upgrade.py && \
   python TESTS/test_fixes.py && python TESTS/test_100x.py && \
   python TESTS/test_integration.py
   ```

6. Commit with clear messages: `git commit -m "Add my feature"`
7. Push and create a Pull Request
### Guidelines
- **TDD**: Write tests before code
- **Single File**: Keep all logic in `agentic_chat.py`
- **Documentation**: Update the README and knowledge base files
- **Memory**: Consider how new features affect the 6-level memory system and sub-agent memory
- **Security**: Validate all file operations against the workspace root
## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Credits

- **Author**: g023
- **Repository**: https://github.com/g023/g023_agentic_chat

Built with vanilla Python and local LLM integration. Inspired by agentic AI architectures.
