# g023's Agentic Chat


A single-tool Python-first agentic chat — Using Python as the universal interface. The agent writes and executes Python code to accomplish any task: creating files, running commands, making HTTP requests, debugging, testing, and everything in between. Built as a single-file Python program using vanilla Python and local LLM integration.

## Overview

g023's Agentic Chat is a Python-first agentic development tool that uses a single tool (python_exec) for all operations. Instead of having dozens of specialized tools, the agent writes and executes Python code to accomplish any task — from file I/O to shell commands to web scraping to complex computations.

Philosophy: Python is the universal tool. If you can do it in Python, the agent can do it. No specialized tools needed.

The system is designed as a single-file Python application (agentic_chat.py) that integrates with a local Ollama server for LLM capabilities. It features a single-tool architecture (python_exec), 6-level memory system, global cross-session memory, multi-level reasoning with backtracking, sub-agent system (Planner/Coder/Reviewer), reinforcement learning, and comprehensive user control modes.

## Key Features

### 🐍 Single-Tool Python Architecture (v5.0 — NEW)

  • One Tool: python_exec — the agent writes and executes Python code for everything
  • Universal Interface: File I/O, shell commands, HTTP requests, computation — all through Python
  • Auto-Imports: Common modules (os, sys, json, subprocess, shutil, re, etc.) pre-loaded
  • Foolproof Prompting: Comprehensive examples and patterns teach the agent how to use Python for any task
  • Smart Output: Truncation for large outputs, clean error display with full tracebacks
  • 120s Timeout: Generous execution time for complex tasks (configurable)
  • No Confusion: One tool means the agent never picks the wrong tool

### 🤖 Multi-Level Reasoning (v5.0 Enhanced)

  • Adaptive Depth: Automatically selects reasoning complexity (Quick → Standard → Deep → Exhaustive)
  • Auto-Escalation: Reasoning depth auto-escalates when working memory indicates complexity
  • Backtracking: Agents can revise their reasoning if it goes off-track
  • Phase-Based Thinking: Structured reasoning with query analysis, decomposition, evaluation, and synthesis
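
A minimal sketch of how adaptive depth selection with auto-escalation could work. The thresholds, names, and escalation signal here are assumptions for illustration, not the actual `agentic_chat.py` logic:

```python
# Illustrative depth selection: map complexity to a depth level, and
# escalate one level when working memory shows accumulated errors.
DEPTHS = ["quick", "standard", "deep", "exhaustive"]

def select_depth(complexity: int, working_memory_errors: int = 0) -> str:
    """Map a 1-5 complexity score to a reasoning depth, escalating one
    level when working memory indicates trouble (auto-escalation)."""
    index = min(max(complexity - 1, 0), len(DEPTHS) - 1)
    if working_memory_errors > 0:
        index = min(index + 1, len(DEPTHS) - 1)
    return DEPTHS[index]

print(select_depth(1))      # quick
print(select_depth(2, 3))   # deep: standard, escalated one level
```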

### 🧠 Agentic Memory System (6 Levels + Global + Self-Reflection)

  • 6-Level Memory: Summary, Conversation, Facts, Episodes, Working, and Identity
  • LLM Compaction: Token-count based compaction with LLM summarization of older entries
  • Global Memory: Cross-session persistent memory for self-improvement over time
  • Memory Pruning: Importance scoring, archiving, and configurable memory budgets
  • Persistent Sessions: Memory survives across conversations and restarts
  • File-Backed Storage: All memory stored as editable markdown files
  • Auto-Save: Session state saved automatically every turn
  • Self-Reflection: Post-turn evaluation feeds into reinforcement learning

### 🐍 Python-First Execution

  • Protected Workspace: Sandboxed execution in session directories
  • Python Exec: Agent writes Python code to create files, run commands, test code, make requests
  • Shell Access: Via subprocess.run() — pip, git, curl, anything
  • File Operations: Via Python's built-in open(), os, shutil, pathlib
  • HTTP Requests: Via urllib.request or requests (if installed)
  • Robust Parsing: Multi-format tool call parsing optimized for small models
  • User Approval: Configurable approval modes (YOLO, Approve, Guided)

### 🎮 User Control & Dynamic Configuration

  • Interaction Modes: YOLO (auto-approve), Approve (manual), Guided (step-by-step)
  • Dynamic Config: Runtime-modifiable settings via /config command
  • Command System: 30+ slash commands for session management
  • Task Management: Structured task plans, pause/resume, progress tracking
  • Benchmarking: Benchmark prompts and system messages to improve performance
  • Beautiful UI: ANSI-colored terminal output with progress indicators

### 🤖 Sub-Agent System

  • PlannerAgent: Task decomposition and step planning specialist
  • CoderAgent: Python-first code writing and testing specialist
  • ReviewerAgent: Code quality verification and testing specialist
  • Agent Memory: Each sub-agent has private memory for specialization
  • Auto-Dispatch: Orchestrator selects appropriate agent based on task classification

### 🧠 Reinforcement Learning

  • Outcome Tracking: Scores strategy effectiveness per task type
  • EMA Smoothing: Exponential Moving Average for stable learning
  • Strategy Selection: Recommends best approaches based on past success
  • Prompt Evolution: Tracks and improves prompt performance
  • Tool Chain Learning: Learns which tool sequences work best
  • Self-Improvement: Agents adapt based on cumulative experience
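
EMA smoothing in this context can be sketched in a few lines. The 0.3 smoothing factor and the function name are illustrative assumptions, not the project's actual code:

```python
# Minimal sketch of EMA-smoothed outcome tracking for a strategy score.
def ema_update(score: float, reward: float, alpha: float = 0.3) -> float:
    """Blend a new outcome reward (e.g. 1.0 = success, 0.0 = failure)
    into the running strategy score."""
    return alpha * reward + (1 - alpha) * score

score = 0.5                        # neutral prior for a new strategy
for reward in (1.0, 1.0, 0.0):     # two successes, then a failure
    score = ema_update(score, reward)
print(round(score, 2))             # one failure lowers, but does not erase, the score
```

The EMA keeps scores stable: a single bad outcome nudges the score down rather than resetting it, which is what makes strategy recommendations usable with noisy results.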


## Architecture

### Core Components

```
agentic_chat.py (Main Application — v5.0)
├── Config Class            # Dynamic runtime configuration
├── UI Class                # Terminal output and formatting
├── LLM Interface           # Wrapper for _new_ollama.py
├── Tool System             # ToolResult, Tool, ToolRegistry (1 tool: python_exec)
├── ReinforcementTracker    # Self-improvement via RL
├── SubAgent System         # PlannerAgent, CoderAgent, ReviewerAgent
├── GlobalMemoryManager     # Cross-session persistent memory
├── MemoryManager           # 6-level memory with compaction & pruning
├── ReasoningEngine         # Multi-phase reasoning with auto-escalation
├── AgenticLoop             # Iterative Python-first execution loop
├── Orchestrator            # Classification, synthesis, state management, agent dispatch
├── AgenticChat             # Main interface and commands
└── Session Management      # Protected workspaces and state persistence
```

### Single-Tool Philosophy (v5.0)

Instead of 37+ specialized tools, v5.0 uses a single tool: python_exec. The agent writes Python code to accomplish any task:

| Task | How the Agent Does It |
|------|-----------------------|
| Create a file | `open("file.py", "w").write(code)` |
| Read a file | `print(open("file.py").read())` |
| Run a script | `subprocess.run([sys.executable, "file.py"], ...)` |
| Shell commands | `subprocess.run("pip install X", shell=True, ...)` |
| Search files | `os.walk()` + string matching |
| HTTP requests | `urllib.request.urlopen()` |
| File operations | `os.makedirs()`, `shutil.move()`, `os.remove()` |
| Computation | Any Python code |

Why single-tool?

  • No tool selection confusion for the LLM
  • Python can do literally everything
  • Simpler prompting = more reliable agent behavior
  • Full power of Python standard library available
  • Extensible via pip (agent can install packages itself)
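
A hedged sketch of what a `python_exec`-style tool could look like: write the agent's code into the workspace, run it in a subprocess, capture stdout/stderr, and truncate long output. All names, the timeout, and the truncation limit below are assumptions, not the project's actual implementation:

```python
# Minimal python_exec-style tool: run agent code in a subprocess inside
# the workspace, capture output, and truncate anything oversized.
import os
import subprocess
import sys
import tempfile

def python_exec(code: str, workdir: str, timeout: int = 120,
                max_chars: int = 15000) -> str:
    path = os.path.join(workdir, "_agent_snippet.py")
    with open(path, "w") as f:
        f.write(code)
    r = subprocess.run([sys.executable, path], cwd=workdir,
                       capture_output=True, text=True, timeout=timeout)
    out = r.stdout + (("\n[stderr]\n" + r.stderr) if r.stderr else "")
    if len(out) > max_chars:
        out = out[:max_chars] + "\n...[truncated]"
    return out

with tempfile.TemporaryDirectory() as d:
    result = python_exec("print(sum(range(10)))", d)
print(result)   # prints 45
```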

### Memory Architecture (v5.0)

The system uses a hierarchical memory system stored as files in session directories, plus a global memory layer:

  • L0 Summary (L0_SUMMARY.md): LLM-compressed historical context
  • L1 Conversation (CONVERSATION.md): Rolling summary of recent exchanges
  • L2 Facts (MEMORY.md): Key facts, preferences, and decisions
  • L3 Episodes (EPISODES.md): Detailed per-turn records with metadata
  • L4 Working (SCRATCH.md): Structured working notes (goal, steps, vars, errors)
  • L5 Identity (GOAL.md, CONTEXT.md): Session goals and context
  • Agent Memory (.agent_<name>_memory.json): Sub-agent private memory (NEW v4.0)
  • Global (agentic_memory/): Cross-session insights, skills, benchmarks, preferences

### Session Structure

```
sessions/
└── YYYYMMDD_HHMMSS_XXXXXX/
    ├── L0_SUMMARY.md      # L0 compressed history
    ├── CONVERSATION.md    # L1 conversation memory
    ├── MEMORY.md          # L2 factual memory
    ├── EPISODES.md        # L3 episode log
    ├── SCRATCH.md         # L4 working memory (structured template)
    ├── GOAL.md            # L5a identity/goal
    ├── CONTEXT.md         # L5b session context
    ├── PLAN.md            # Structured task plan
    ├── PROGRESS.md        # Progress tracking
    ├── TASK_CONTEXT.md    # Pause/resume state
    ├── ARCHIVE.md         # Pruned memories
    └── session_state.json # Serialized state

agentic_memory/                # Global memory (NEW v4.0)
├── INDEX.md               # Master index
├── INSIGHTS.md            # Cross-session lessons
├── USER_PREFS.md          # User preferences
├── BENCHMARKS.md          # Prompt benchmarks
├── SKILLS.md              # Agent capabilities
├── ISSUES.md              # Known issues
├── STRENGTHS.md           # What works well
├── WEAKNESSES.md          # Areas for improvement
└── TOOLS_LOG.md           # Tool usage patterns
```
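
Generating the `YYYYMMDD_HHMMSS_XXXXXX` session IDs shown above could look like this; the six-character alphanumeric suffix is an assumption about the format, inferred from the example names:

```python
# Illustrative session-ID generator: timestamp plus a random suffix so
# concurrent sessions started in the same second stay distinct.
import datetime
import secrets
import string

def new_session_id() -> str:
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    suffix = "".join(secrets.choice(string.ascii_lowercase + string.digits)
                     for _ in range(6))
    return f"{stamp}_{suffix}"

sid = new_session_id()
print(sid)   # e.g. 20260320_143000_abc123
```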

## Installation

### Prerequisites

  • Python 3.8+: Vanilla Python (no external dependencies)
  • Ollama Server: Running locally at http://localhost:11434
  • g023's Qwen3 1.77B Model: Loaded in Ollama

Note: you can set the Ollama server address at the top of `_new_ollama.py`.

Note: you can install the 1.77B Qwen3 model with:

    ollama pull hf.co/g023/Qwen3-1.77B-g023-GGUF:Q8_0

### Setup Steps

  1. Clone the repository:

    git clone https://github.com/g023/g023_agentic_chat.git
    cd g023_agentic_chat
  2. Verify Ollama:

    # Ensure Ollama server is running
    curl http://localhost:11434/api/tags

## Quick Start

  1. Start the chat:

    python agentic_chat.py

  2. Set a goal (optional):

    You > /goal Build a Python calculator application
    
  3. Ask questions or give tasks:

    You > Create a simple calculator that can add and subtract numbers
    
  4. Use commands:

    You > /status    # View current state
    You > /memory    # View factual memory
    You > /save      # Save session
    

## Usage

### Basic Interaction

The system responds to natural language queries and performs tasks by writing and executing Python code. It automatically classifies each query and selects an appropriate reasoning depth.

### Example Session

```
$ python agentic_chat.py

╔══════════════════════════════════════════════════════════════════════════╗
║  G023's AGENTIC CHAT v5.0.0                                              ║
║  Python-First * Reasoning * Memory * Control                             ║
╚══════════════════════════════════════════════════════════════════════════╝

Session: 20260320_143000_abc123
Workspace: /path/to/sessions/20260320_143000_abc123
Thinking: ADAPTIVE | Mode: approve
Type /help for commands

You > Create a fibonacci calculator that prints the first 20 numbers

  ┌─ Analyzing...
  │ Type: code | Complexity: 2 | Tools: yes | Agent: coder

  ▶ Query Analysis
  ▶ Decomposition
  ▶ Critical Evaluation
  ▶ Synthesis

  ✓ Reasoning done: 4 phases, 0 backtracks

  ┌─ AGENTIC EXECUTION LOOP
  │ 🤖 Loop: Iteration 1/20

  🔧 Tool: python_exec

  🐍 Python execution needs approval:
     Code: {'code': 'with open("fibonacci.py", "w") as f:\n    f.write(...'}
  Approve? [y/n/yolo/yolo N]: yolo

  ✓ YOLO mode: ALL code execution auto-approved!
  ✓ Result: Created fibonacci.py

  🤖 Loop: Iteration 2/20
  🔧 Tool: python_exec
  ✓ Result: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377,
            610, 987, 1597, 2584, 4181

  ✓ Task complete!

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  ANSWER

  I created `fibonacci.py` with a function that generates Fibonacci
  numbers and tested it. The first 20 Fibonacci numbers are:
  0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610,
  987, 1597, 2584, 4181.
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

You >
```

### How the Agent Works

The agent uses Python for everything. When given a task:

  1. Classify: Determines task type and complexity
  2. Reason: Multi-phase thinking (Quick → Standard → Deep → Exhaustive)
  3. Execute: Writes Python code via python_exec to accomplish the task
  4. Iterate: If something fails, reads the error and fixes it
  5. Synthesize: Summarizes what was done
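
The classify/reason/execute/iterate cycle above can be sketched as a loop. In this sketch, `llm`, `extract_tool_code`, and `execute` are hypothetical stand-ins for the real components, not the project's actual interfaces:

```python
# Skeleton of an iterative Python-first execution loop: call the model,
# run any tool code it requests, feed the result back, stop on a plain answer.
def agentic_loop(task, llm, extract_tool_code, execute, max_iterations=20):
    history = [task]
    for _ in range(max_iterations):
        reply = llm("\n".join(history))
        code = extract_tool_code(reply)     # None when the model just answers
        if code is None:
            return reply                    # final answer: stop iterating
        history.append(f"TOOL RESULT:\n{execute(code)}")  # iterate on results
    return "Stopped: iteration limit reached."

# Toy run: a fake LLM requests one execution, then answers.
replies = iter(["TOOL: 2+2", "The answer is 4."])
answer = agentic_loop(
    "what is 2+2?",
    llm=lambda prompt: next(replies),
    extract_tool_code=lambda r: r[6:] if r.startswith("TOOL: ") else None,
    execute=lambda code: str(eval(code)),
)
print(answer)   # The answer is 4.
```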

Example of what the agent does behind the scenes:

Agent creates a file:

    TOOL: python_exec
    ```python
    with open("fibonacci.py", "w") as f:
        f.write('''
    def fibonacci(n):
        a, b = 0, 1
        for _ in range(n):
            yield a
            a, b = b, a + b

    if __name__ == "__main__":
        for num in fibonacci(20):
            print(num, end=", ")
        print()
    ''')
    print("Created fibonacci.py")
    ```

Agent then tests the file:

    TOOL: python_exec
    ```python
    import subprocess, sys
    r = subprocess.run([sys.executable, "fibonacci.py"], capture_output=True, text=True)
    print("Output:", r.stdout)
    if r.stderr: print("Errors:", r.stderr)
    ```

## Commands

The system supports 30+ slash commands for session management:

### Session Management
- `/save` - Save current session state
- `/load <session_id>` - Load a previous session
- `/sessions` - List all saved sessions
- `/clear` - Clear conversation history

### Memory & Context (6 Levels)
- `/summary` - View L0 historical summary (compressed context)
- `/memory` - View factual memory (L2)
- `/conversation` - View conversation memory (L1)
- `/episodes` - View episode log (L3)
- `/search <query>` - Search across all memory levels
- `/goal [text]` - View/set current goal (L5a)
- `/context` - View session context (L5b)
- `/scratch [text]` - View/write working notes (L4)
- `/archive` - View archived (pruned) memories

### Task Management (v4.0)
- `/tasks` - View structured task plan with subtask statuses
- `/resume` - Resume a paused task from saved context
- `/plan` - View execution plan (raw)
- `/progress` - View progress tracker

### Global Memory (v4.0)
- `/global [category]` - View global cross-session memory
- `/prune` - Manually trigger memory pruning
- `/benchmark [prompt]` - Benchmark a prompt or view benchmark history

### Thinking & Control
- `/think <level>` - Set thinking depth (quick/standard/deep/exhaustive/adaptive)
- `/yolo [N]` - Auto-approve code execution (all or next N rounds)
- `/approve` - Require manual approval for code execution
- `/guided` - Step-by-step reasoning guidance

### Information
- `/status` - Full system status with memory levels
- `/config [key] [value]` - View or set runtime config values
- `/agents` - Show the sub-agent system status
- `/rl` - View reinforcement learning stats
- `/tools` - Show available tool (python_exec)
- `/files` - List workspace files
- `/session` - Current session info
- `/history` - Recent conversation history
- `/verbose` - Toggle LLM output verbosity
- `/help` - Show all commands

## Memory System (v4.0 — 6 Levels + Global + Agent Memory)

The enhanced memory system ensures agents maintain continuity across conversations and learn over time:

### Level 0: Historical Summary (v4.0)
- **File**: `L0_SUMMARY.md`
- **Purpose**: LLM-compressed historical context from older conversations
- **Behavior**: When conversation memory exceeds token threshold (~8000 tokens), older entries are summarized by the LLM and stored here. Always included in agent context.

### Level 1: Conversation Memory
- **File**: `CONVERSATION.md`
- **Purpose**: Rolling summary of recent exchanges
- **Behavior**: Auto-compacts after 20 entries, keeps last 15
- **Example**:
  • [14:30:15] T1 User: What is Python? AI: Python is a programming language.
  • [14:31:22] T2 User: Show me an example AI: print("Hello, World!")

### Level 2: Factual Memory
- **File**: `MEMORY.md`
- **Purpose**: Key facts, preferences, and decisions
- **Behavior**: LLM-extracted important information
- **Example**:
  • [14:30:15] User prefers Python over JavaScript
  • [14:31:22] User is building a calculator app

### Level 3: Episode Memory
- **File**: `EPISODES.md`
- **Purpose**: Detailed per-turn records with metadata
- **Behavior**: Comprehensive log for debugging and analysis
- **Example**:

  **Turn 1 — 2026-03-20 14:30:15**
  - User: What is Python?
  - Type: question | Complexity: 1 | Think: QUICK
  - AI: Python is a programming language.


### Level 4: Working Memory (v4.0 — Structured)
- **File**: `SCRATCH.md`
- **Purpose**: Structured working notes with template
- **Behavior**: Auto-populated with goal, current step, next step, variables, and errors
- **Template**:

```markdown
# Working Memory

## Current Goal
Build a calculator

## Current Step
Implementing addition

## Next Step
Add subtraction

## Variables
(none)

## Errors
(none)
```


### Level 5: Identity Memory
- **Files**: `GOAL.md`, `CONTEXT.md`
- **Purpose**: Session goals and overall context
- **Behavior**: Defines who the agent is and what it's doing

### Agent Memory (v4.0)
- **Files**: `.agent_planner_memory.json`, `.agent_coder_memory.json`, `.agent_reviewer_memory.json`
- **Purpose**: Private per-sub-agent memory for specialization and continuity
- **Behavior**: Each sub-agent persists its own learned details

### Supplemental Memory Files (v4.0)
- **`ARCHIVE.md`**: Pruned low-importance memories (preserved for reference)
- **`TASK_CONTEXT.md`**: Saved task state for pause/resume capability
- **`PLAN.md`**: Structured task decomposition with subtask statuses

### Memory Compaction (v4.0)
When conversation memory exceeds the token threshold (~8000 tokens), the system:
1. Uses the LLM to summarize older conversation entries
2. Stores the summary in `L0_SUMMARY.md`
3. Keeps only the most recent 15 entries in `CONVERSATION.md`
4. The summary is always included in context, preventing loss of important information
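
The compaction trigger can be sketched as follows. The ~4 chars/token estimate and the `summarize()` stand-in (the real system calls the LLM) are illustrative assumptions:

```python
# Threshold-triggered compaction: when the log exceeds the token budget,
# summarize everything but the newest entries and keep the summary around.
def estimate_tokens(text: str) -> int:
    return len(text) // 4          # rough chars-per-token heuristic

def compact(entries, summarize, threshold=8000, keep=15):
    """If the log exceeds the token budget, summarize all but the newest
    `keep` entries and return (summary, recent_entries)."""
    if estimate_tokens("\n".join(entries)) <= threshold:
        return "", entries
    old, recent = entries[:-keep], entries[-keep:]
    return summarize("\n".join(old)), recent

entries = [f"turn {i}: " + "x" * 400 for i in range(100)]
summary, recent = compact(entries, lambda text: f"[summary of {len(text)} chars]")
print(len(recent))   # 15
```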

### Memory Pruning (v4.0)
The system assigns importance scores to facts using heuristics:
- Higher scores for entries containing project details, preferences, decisions
- Lower scores for trivial entries (greetings, simple acknowledgments)
- Low-importance entries are moved to `ARCHIVE.md`
- Use `/prune` to manually trigger pruning
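
A toy importance heuristic in the spirit described above; the keyword lists, weights, and the 0.3 threshold are illustrative assumptions, not the project's actual scoring:

```python
# Keyword-based importance scoring: boost entries mentioning preferences,
# decisions, or project details; penalize trivial small talk.
IMPORTANT = ("prefer", "decided", "project", "goal", "always", "never")
TRIVIAL = ("hello", "hi", "thanks", "ok", "bye")

def importance(entry: str) -> float:
    text = entry.lower()
    score = 0.5
    score += 0.2 * sum(kw in text for kw in IMPORTANT)
    score -= 0.2 * sum(kw in text for kw in TRIVIAL)
    return max(0.0, min(1.0, score))

def prune(entries, threshold=0.3):
    keep = [e for e in entries if importance(e) >= threshold]
    archive = [e for e in entries if importance(e) < threshold]
    return keep, archive

keep, archive = prune(["User prefers Python over JavaScript", "ok thanks bye"])
print(archive)   # ['ok thanks bye']
```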

## Global Memory (v4.0)

Cross-session persistent memory stored in `agentic_memory/`:

| File | Purpose |
|------|---------|
| `INDEX.md` | Master index of all categories |
| `INSIGHTS.md` | Lessons learned across sessions |
| `USER_PREFS.md` | User preferences and patterns |
| `BENCHMARKS.md` | Prompt performance benchmarks |
| `SKILLS.md` | Accumulated agent capabilities |
| `ISSUES.md` | Known issues and workarounds |
| `STRENGTHS.md` | What the agent does well |
| `WEAKNESSES.md` | Areas for improvement |
| `TOOLS_LOG.md` | Tool usage patterns |

Global memory enables the agent to **learn over time** — insights from one session inform future sessions.

## Task Management (v4.0)

### Structured Task Decomposition
When the agent identifies a complex task, it can decompose it into subtasks with statuses:
- **pending** — Not yet started
- **in_progress** — Currently being worked on
- **completed** — Done
- **blocked** — Waiting on something

### Pause/Resume
Tasks can be paused and resumed across sessions:
1. Agent saves task context (goal, current step, variables, errors) to `TASK_CONTEXT.md`
2. Use `/resume` to reload the task context into working memory
3. The agent picks up exactly where it left off

### Structured Working Memory
`SCRATCH.md` uses a template with:
- **Current Goal**: What the agent is trying to accomplish
- **Current Step**: What it's doing right now
- **Next Step**: What comes next
- **Variables**: Key values being tracked
- **Errors**: Any errors encountered

## Tool System (v5.0 — Single Tool)

### python_exec — The Universal Tool

The agent has one tool: `python_exec`. It executes arbitrary Python code in the session workspace with common modules pre-imported (`os`, `sys`, `json`, `subprocess`, `shutil`, `re`, `math`, `pathlib`, `datetime`, `collections`).

| Operation | Python Pattern |
|-----------|---------------|
| Create file | `open("file.py", "w").write(code)` |
| Read file | `print(open("file.py").read())` |
| List directory | `os.listdir(".")` |
| Run script | `subprocess.run([sys.executable, "file.py"], capture_output=True, text=True)` |
| Shell command | `subprocess.run("pip install requests", shell=True, capture_output=True, text=True)` |
| Search files | `os.walk()` + string matching |
| HTTP request | `urllib.request.urlopen(url)` |
| Move/rename | `shutil.move("old.py", "new.py")` |
| Delete file | `os.remove("file.py")` |
| Create dir | `os.makedirs("src/utils", exist_ok=True)` |
| Modify file | Read → replace → write back |
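
The "read → replace → write back" row from the table above, as a runnable example (using a throwaway temp directory):

```python
# Modify a file in place: read its contents, replace a value, write it back.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "config.py")
with open(path, "w") as f:
    f.write("DEBUG = True\nPORT = 8000\n")

text = open(path).read()                               # read
text = text.replace("DEBUG = True", "DEBUG = False")   # replace
open(path, "w").write(text)                            # write back

print(open(path).read())
```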

### Tool Call Format

The parser supports multiple formats for maximum compatibility with small models:

**Primary format**: `TOOL: python_exec` followed by a fenced code block:

    TOOL: python_exec
    ```python
    <code>
    ```

**Without a language tag**: also accepted:

    TOOL: python_exec
    ```
    <code>
    ```

**Case-insensitive**:

    Tool: python_exec
    ```python
    <code>
    ```

**Fallback**: bare python blocks (when the model forgets `TOOL:`):

    ```python
    import os
    print(os.listdir("."))
    ```

The parser is case-insensitive and supports deduplication.
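
A hedged sketch of a forgiving parser along these lines: tagged `TOOL: python_exec` blocks matched case-insensitively (with or without a language tag), bare python blocks as a fallback, and deduplication. The exact regexes are illustrative assumptions, not the project's implementation:

```python
# Multi-format tool-call parser: prefer tagged TOOL: python_exec blocks,
# fall back to bare fenced python blocks, and drop duplicate calls.
import re

def parse_tool_calls(reply: str) -> list:
    tagged = re.findall(
        r"tool:\s*python_exec\s*```(?:python)?\n(.*?)```",
        reply, re.IGNORECASE | re.DOTALL)
    calls = tagged or re.findall(r"```python\n(.*?)```", reply, re.DOTALL)
    seen, unique = set(), []
    for code in (c.strip() for c in calls):
        if code not in seen:               # dedupe repeated calls
            seen.add(code)
            unique.append(code)
    return unique

print(parse_tool_calls('Tool: python_exec\n```python\nprint("hi")\n```'))
```

Accepting several surface formats matters most with small models, which often drop the `TOOL:` tag or the language hint but still emit a usable code block.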

## Development

### Code Structure

The entire core application is contained in `agentic_chat.py`. Key design principles:

- **Single File**: All logic in one file for simplicity 
- **Class-Based**: Clean separation of concerns
- **File-Backed**: Memory and state stored as editable files
- **Tool Pattern**: Extensible tool system
- **Session Isolation**: Each session in protected directory
- **Global Learning**: Cross-session memory for self-improvement

### Key Classes

- `AgenticChat`: Main interface and 30+ command dispatcher
- `Config`: Dynamic runtime configuration for agent-tunable settings
- `GlobalMemoryManager`: Cross-session persistent memory
- `MemoryManager`: 6-level memory with compaction, pruning, and task management
- `ReinforcementTracker`: EMA-based learning for prompts, strategies, and tool chains
- `SubAgent`: Base class for specialized Planner/Coder/Reviewer agents
- `ReasoningEngine`: Multi-phase reasoning with auto-escalation
- `AgenticLoop`: Iterative Python-first execution loop
- `Orchestrator`: Query classification, synthesis, and agent dispatch
- `Session`: Protected workspace management
- `ToolRegistry`: Single-tool registry (python_exec)

### Dependencies

- **External**: None (vanilla Python)
- **Local**: `_new_ollama.py` (LLM interface, not included)
- **System**: Ollama server with Qwen3 model

### Configuration

Key settings in `agentic_chat.py`:

```python
VERSION = "5.0.0"
MAX_ITERATIONS = 20        # Agent loop limit (runtime-configurable via Config)
MAX_TOOL_CALLS = 1255      # Tool call limit
PYTHON_EXEC_TIMEOUT = 120  # Python execution timeout (generous for complex tasks)
AUTO_SAVE_INTERVAL = 1     # Save every turn
MEMORY_BUDGET = 16000      # Max chars for assembled memory
COMPACTION_TOKEN_THRESHOLD = 8000  # Trigger LLM compaction
MEMORY_ARCHIVE_THRESHOLD = 0.3     # Archive entries below this importance
MAX_OUTPUT_CHARS = 15000   # Max output chars (truncated intelligently)
```

## Testing

Five test suites with 120 total tests:

```shell
# Unit tests (24 tests)
python TESTS/test_unit.py

# v3/v4 compatibility tests (33 tests)
python TESTS/test_v3_upgrade.py

# Integration tests (7 tests — requires Ollama)
python TESTS/test_integration.py

# Parser, prompts, classification, and workflow tests (23 tests)
python TESTS/test_fixes.py

# 100x upgrade tests (33 tests)
python TESTS/test_100x.py
```

## Extending the System

### The agent can install packages itself

No need to add tools — the agent can pip install anything via python_exec:

You > Install requests and fetch https://httpbin.org/get

The agent runs `subprocess.run("pip install requests", shell=True, ...)` and then uses the installed package.

### Adding Commands

```python
# In AgenticChat.handle_command()
handlers["/mycommand"] = self._cmd_mycommand
```

### Adding Memory Levels

```python
# In MemoryManager
MEMORY_FILES["new_level"] = "NEW_LEVEL.md"
```

## Contributing

  1. Fork the repository
  2. Create a feature branch: `git checkout -b feature/my-feature`
  3. Write tests for new functionality
  4. Implement changes using a TDD approach
  5. Run the test suites: `python TESTS/test_unit.py && python TESTS/test_v3_upgrade.py && python TESTS/test_fixes.py && python TESTS/test_100x.py && python TESTS/test_integration.py`
  6. Commit with clear messages: `git commit -m "Add my feature"`
  7. Push and open a Pull Request

### Guidelines

  • TDD: Write tests before code
  • Single File: Keep all logic in agentic_chat.py
  • Documentation: Update README and knowledge base files
  • Memory: Consider how new features affect the 6-level memory system and sub-agent memory
  • Security: Validate all file operations against workspace root

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Credits

**Author**: g023
**Repository**: https://github.com/g023/g023_agentic_chat

Built with vanilla Python and local LLM integration. Inspired by agentic AI architectures.