NumbyAI — AI-powered personal finance transaction categorizer (FastAPI + React + Ollama)



Local-first AI finance categorizer. Your bank data never leaves your machine.

Upload any bank statement CSV and NumbyAI automatically detects the format, categorizes every transaction using a local LLM (Ollama), and surfaces patterns across months. No cloud. No subscriptions. No data sharing.


Demo

NumbyAI Demo: Watch on YouTube →

Upload & AI Categorization

Drop a bank statement CSV and watch NumbyAI auto-detect columns and categorize every transaction using a local LLM.


Dashboard

Track spending by category, monitor budgets, and analyze cash flow trends — all from a single view.


Rule Advisor

The Rule Advisor analyzes your categorization patterns and suggests reusable rules, making future uploads instant.



What it does

  1. Drop any CSV — the heuristic parser automatically detects metadata rows, column layout, date format, currency, and number format, with no manual column mapping. It falls back to the LLM when confidence is low.
  2. Rule engine runs first — saved patterns (regex, bank-specific, amount filters) categorize known transactions instantly, without touching the LLM.
  3. LLM handles the rest — remaining transactions are batched and sent to Ollama in parallel workers. Confident results are committed; ambiguous ones go to the review queue.
  4. Review queue — flag-and-resolve UI with bulk select, conflict detection, and one-click rule creation from any pattern.
  5. Rule analysis — scans all your transactions for inconsistencies and suggests new rules to clean up historical data.
  6. Dashboard — category breakdowns, month-over-month trends, cash flow, budget vs actual.

Statement Parser — Technical Overview

Most tools require you to manually map columns. NumbyAI's heuristic engine handles the messy reality of real-world bank exports:

What gets auto-detected

| Signal | How |
|---|---|
| Metadata preamble rows | Scans from the top and finds the first row with both a date and a numeric value — everything above is skipped |
| Column roles | Scores each column independently (date density, numeric density, text length, emptiness) |
| Inflow/outflow split | Detects adjacent complementary numeric columns (one empty when the other isn't) — common in UK/EU exports |
| Date format | Pattern-matches against 9 formats: `YYYY-MM-DD`, `DD/MM/YYYY`, `MM/DD/YYYY`, `DD.MM.YYYY`, `DD Mon YYYY`, and short-year variants |
| Number format | Distinguishes EU (`1.234,56`) from US (`1,234.56`) by counting comma/dot separator signals |
| Currency | Detects symbols ($ € £ ¥ ₹ ₽ ₩) and ISO codes (USD, EUR, GBP, PLN, CHF, etc.) in headers and data cells |
| Balance column | Identifies monotonically-signed numeric columns near the amount column |
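The EU-vs-US number-format heuristic can be approximated like this — a simplified illustration of the counting idea, not the project's actual implementation:

```python
import re


def detect_number_format(values: list[str]) -> str:
    """Guess 'EU' (1.234,56) or 'US' (1,234.56) from separator positions."""
    eu, us = 0, 0
    for v in values:
        v = v.strip()
        # A comma followed by exactly two trailing digits suggests an EU decimal.
        if re.search(r",\d{2}$", v):
            eu += 1
        # A dot followed by exactly two trailing digits suggests a US decimal.
        if re.search(r"\.\d{2}$", v):
            us += 1
        # A dot grouping three digits alongside a comma votes EU (thousands sep).
        if re.search(r"\d\.\d{3}(\D|$)", v) and "," in v:
            eu += 1
    return "EU" if eu > us else "US"  # ties default to US


print(detect_number_format(["1.234,56", "89,00"]))   # → EU
print(detect_number_format(["1,234.56", "89.00"]))   # → US
```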

Supported bank formats (tested)

| Bank / Format | Country | Notes |
|---|---|---|
| Chase | 🇺🇸 US | 7-column with Post Date and Category |
| Bank of America | 🇺🇸 US | 4-column, running balance |
| Wells Fargo | 🇺🇸 US | No header row |
| Barclays | 🇬🇧 UK | Inflow/outflow split columns |
| HSBC | 🇬🇧 UK | Metadata preamble, separate debit/credit |
| ING | 🇳🇱 NL | Semicolon-delimited, EU number format |
| Sparkasse | 🇩🇪 DE | Semicolon-delimited, EU decimals, multi-row metadata |
| UBS | 🇨🇭 CH | CHF currency detection, preamble rows |
| BNP Paribas | 🇫🇷 FR | Semicolon-delimited, signed amounts |
| NAB | 🇦🇺 AU | AUD, debit/credit columns |
| Santander | 🇬🇧 UK / 🇪🇸 ES | Multiple regional formats |
| Revolut | 🌍 Multi | Multi-currency exports |
| Tab-delimited | Any | Auto-detected |
| Pipe-delimited | Any | Auto-detected |
| Generic w/ metadata | Any | Account info header rows auto-skipped |

When heuristic confidence is low, the LLM is called with a structured prompt and the first 15 rows to fill the gaps. Heuristic results always win when confident.
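The structured fallback call might be built roughly like this — the prompt wording and JSON field names here are assumptions for illustration, not the prompt NumbyAI actually sends:

```python
def build_fallback_prompt(rows: list[list[str]]) -> str:
    """Ask the LLM to infer column roles from a small statement sample."""
    sample = "\n".join(",".join(cell for cell in row) for row in rows[:15])
    return (
        "You are a bank-statement format analyzer. "
        "Given the CSV sample below, return JSON with keys "
        '"date_col", "description_col", "amount_col", and "date_format".\n\n'
        f"{sample}"
    )


prompt = build_fallback_prompt(
    [["Date", "Description", "Amount"],
     ["2024-01-02", "COFFEE SHOP", "-3.50"]]
)
```

Capping the sample at 15 rows keeps the prompt small while still exposing headers, preamble rows, and a few data rows for the model to reason over.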


Categorization Pipeline

Upload CSV
    │
    ▼
Statement Analyzer
  ├─ Heuristic engine  ──── high confidence ──▶ column mapping resolved
  └─ LLM fallback      ──── low confidence  ──▶ LLM fills gaps
    │
    ▼
Rule Engine  ◀─── saved preferences (regex patterns, bank filters)
  ├─ Match found  ──▶ category applied instantly
  └─ No match     ──▶ LLM batch queue
    │
    ▼
Ollama (parallel workers, configurable batch size)
  ├─ Confident result  ──▶ category committed
  └─ Uncertain         ──▶ MANUAL_REVIEW flag
    │
    ▼
Review Queue
  ├─ Bulk select + assign category
  ├─ Per-transaction rule creation
  └─ Conflict resolution (AI vs reviewer)
    │
    ▼
Dashboard
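The parallel LLM stage above can be sketched with a thread pool — the constants mirror the `CATEGORIZATION_*` settings, and `categorize_batch` is a stand-in for the real Ollama client call:

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 20    # CATEGORIZATION_BATCH_SIZE
MAX_WORKERS = 2    # CATEGORIZATION_MAX_WORKERS


def categorize_batch(batch: list[str]) -> list[str]:
    # Placeholder for the Ollama call: returns one category per transaction.
    return ["Other"] * len(batch)


def categorize_all(descriptions: list[str]) -> list[str]:
    """Split descriptions into batches and categorize them in parallel."""
    batches = [descriptions[i:i + BATCH_SIZE]
               for i in range(0, len(descriptions), BATCH_SIZE)]
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        results = list(pool.map(categorize_batch, batches))  # keeps batch order
    return [category for batch in results for category in batch]
```

`pool.map` preserves batch order, so results line up with the original transaction order even when workers finish out of sequence.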

Features

  • Zero-config format detection — works on statements with metadata headers, blank rows, split debit/credit columns, EU/US number formats, and 9 date format variants
  • Multi-currency — detects and stores transaction currency; dashboard handles mixed-currency months
  • Parallel LLM batching — configurable worker count and batch size; processes large statements fast
  • Rule analysis — post-hoc analysis finds categorization conflicts and suggests new rules across historical data
  • Bulk review UI — checkbox select-all, bulk categorize, inline conflict resolution
  • Budget tracking — set monthly budgets per category, visualized against actuals
  • Multi-bank — each upload is tagged to a bank; rules can be bank-specific or global
  • Auth optional — runs in single-user mode with no auth required; plug in Auth0 for multi-user
  • SQLite (dev) / PostgreSQL (prod) — swap via DATABASE_URL
  • Privacy first — no telemetry, no external API calls, runs entirely on your machine

Architecture

┌──────────────────────────────────────────────────────┐
│                    Browser (:8000)                    │
│  ┌────────────────┐      ┌─────────────────────────┐ │
│  │  Upload Wizard  │      │  Dashboard              │ │
│  │  (SimpleUpload) │      │  Charts · Budgets ·     │ │
│  │  Auto-detection │      │  Review · Trends        │ │
│  └────────────────┘      └─────────────────────────┘ │
└──────────────────┬───────────────────────────────────┘
                   │ REST + SSE (streaming)
┌──────────────────▼───────────────────────────────────┐
│              FastAPI Server (:8000)                   │
│  ┌─────────────┐ ┌──────────┐ ┌────────────────────┐ │
│  │  Statement  │ │  Rule    │ │  Ollama LLM        │ │
│  │  Analyzer   │ │  Engine  │ │  (parallel batches)│ │
│  └─────────────┘ └──────────┘ └────────────────────┘ │
│  ┌─────────────────────────────────────────────────┐  │
│  │         SQLite (dev) / PostgreSQL (prod)        │  │
│  └─────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘
                   │
┌──────────────────▼───────────────────────────────────┐
│              Ollama (:11434)                          │
│              Local LLM — default: qwen3.5:9b         │
└──────────────────────────────────────────────────────┘

Categories

Income · Housing & Utilities · Food & Groceries · Transportation · Insurance · Healthcare · Shopping · Entertainment · Travel · Debt Payments · Internal Transfers · Investments · Other


Prerequisites

| Dependency | Version | Notes |
|---|---|---|
| Python | 3.11+ | Backend runtime |
| Node.js | 18+ | Frontend build |
| Ollama | Latest | Local LLM inference |

Quick Start

Works on Windows, macOS, and Linux. The only prerequisites are Python 3.11+, Node.js 18+, and Ollama.

# 1. Clone
git clone https://github.com/RoXsaita/NumbyAI-Public.git
cd NumbyAI-Public

# 2. Install Ollama and pull the default model
python run.py setup-ollama

# 3. Copy env file
cp server/.env.example server/.env          # macOS / Linux
copy server\.env.example server\.env        # Windows (cmd)

# 4. Start everything (venv, deps, migrations, frontend build, server)
python run.py start

App runs at http://localhost:8000.

Upload the included sample_bank_export.csv — it's a realistic two-month export with metadata header rows, recurring merchants, and edge cases designed to exercise the parser.


Project Structure

NumbyAI-Public/
├── server/
│   ├── app/
│   │   ├── main.py                    # API routes
│   │   ├── config.py                  # Pydantic settings
│   │   ├── database.py                # SQLAlchemy models
│   │   ├── services/
│   │   │   ├── statement_analyzer.py  # Heuristic format detection + LLM fallback
│   │   │   ├── categorization_rules.py
│   │   │   ├── llm_service.py         # Ollama client + batching
│   │   │   └── ollama_service.py
│   │   └── tools/
│   │       └── statement_parser.py    # CSV/XLSX → transaction rows
│   ├── tests/
│   │   ├── fixtures/                  # Real-world format CSVs (Chase, Barclays, ING, ...)
│   │   └── test_statement_analyzer.py
│   ├── alembic/                       # DB migrations
│   └── Dockerfile
├── web/
│   └── src/
│       ├── components/SimpleUpload.tsx  # Upload wizard
│       ├── widgets/dashboard.tsx        # Main dashboard
│       └── lib/api-client.ts
├── sample_bank_export.csv             # Two-month test statement with metadata preamble
├── run.py                            # Cross-platform CLI (Windows / macOS / Linux)
└── Makefile                          # macOS / Linux shortcut (optional)

Configuration

All config via environment variables. See server/.env.example.

| Variable | Description | Default |
|---|---|---|
| `DATABASE_URL` | DB connection string | `sqlite:///./finance_recon.db` |
| `SECRET_KEY` | JWT signing key | `dev-only-not-for-production` |
| `OLLAMA_URL` | Ollama server URL | `http://localhost:11434` |
| `OLLAMA_MODEL` | Model for categorization | `qwen3.5:9b` |
| `CATEGORIZATION_BATCH_SIZE` | Transactions per LLM batch | `20` |
| `CATEGORIZATION_MAX_WORKERS` | Parallel batch workers | `2` |
| `AUTH0_DOMAIN` | Auth0 domain (optional) | Disabled |
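A minimal `server/.env` for local development might look like this (values simply mirror the defaults above):

```shell
DATABASE_URL=sqlite:///./finance_recon.db
OLLAMA_URL=http://localhost:11434
OLLAMA_MODEL=qwen3.5:9b
CATEGORIZATION_BATCH_SIZE=20
CATEGORIZATION_MAX_WORKERS=2
```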

Development

All commands work on Windows, macOS, and Linux via run.py:

python run.py start          # Stop → migrate → build → start
python run.py stop           # Kill the server
python run.py logs           # Tail backend logs
python run.py check          # ruff + mypy + pytest
python run.py setup-ollama   # Install/verify Ollama + pull model
python run.py test-e2e       # End-to-end categorization (requires Ollama)
python run.py clear-db       # Delete the SQLite database

macOS / Linux shortcut (Makefile)

If you have make installed, the Makefile still works:

make restart        # Stop → migrate → build → start
make stop           # Kill all services
make logs           # Tail backend logs
make check-python   # ruff + mypy + pytest
make test-e2e       # End-to-end categorization (requires Ollama)

Run tests

cd server
pytest tests --cov=app --cov-report=term-missing

Frontend

cd web && npm install && npm run build
# Dev mode with mock data (no backend needed):
npm run build:dev

No separate dev server — FastAPI serves the built frontend as static files.


Platform Notes

Windows

  • Use python instead of python3 (Windows Python installer registers python).
  • Ollama: install from ollama.com/download/windows or winget install Ollama.Ollama.
  • The Makefile requires GNU Make (e.g. via Git Bash or WSL) — use run.py instead.

Linux

  • Ollama: curl -fsSL https://ollama.com/install.sh | sh.
  • Everything else works out of the box.

macOS

  • Ollama: brew install ollama.
  • Both run.py and make work.

Deployment

Docker Compose (quickest)

# 1. Build the frontend first
cd web && npm install && npm run build && cd ..

# 2. Start the app + Ollama
docker-compose up

# 3. Pull the model inside the Ollama container (first run only)
docker-compose exec ollama ollama pull qwen3.5:9b

App runs at http://localhost:8000. Data is persisted in Docker volumes (sqlite_data, ollama_data).

Production (PostgreSQL)

A Dockerfile and railway.toml are included. For production:

  1. Set DATABASE_URL to a PostgreSQL connection string
  2. Set SECRET_KEY to a secure random value
  3. Set ENVIRONMENT=production
  4. Point OLLAMA_URL at your Ollama instance

License

MIT