Brain-inspired · Persistent · Written in Rust · Open source
An AI that actually thinks
when you're not talking to it.
Lethe is a cognitive runtime with brain regions that run on their own clocks: one that talks, one that remembers, one that drifts through your old conversations at 3 AM looking for things that slipped. She lives wherever you put her — your machine, or ours — and she never stops thinking.
She's already running on our side — sign in and start talking. Free for two weeks, card only if you keep her.
// what this changes
She holds the thread for as long as you do.
She remembers the work, notices what slipped, and decides — on her own — when something is worth interrupting you for.
two weeks later
you: how's the thing going
lethe: The Springer resubmission? You uploaded the revised
manuscript on the 2nd but I never saw a confirmation email come through.
Want me to check the submission portal?
mid-conversation
you: audit the deployment scripts
lethe: Already on it — sent a subagent to do the review.
It'll check in every couple of minutes and ping me with the result.
What else?
unprompted
lethe: You haven't mentioned the visa renewal since March 12. The window closes in 9 days. Want me to draft the email to the consulate, or are you already on it?
// architecture
The brain names aren't metaphors.
Each region is a real actor with its own clock and its own logs, named for the brain structure whose job it does.
cortex
The voice. Picks tools, delegates work, decides when to speak and when to shut up and let you think.
hippocampus
Memory with opinions. Retrieves what's load-bearing right now and lets the rest fade.
dmn
Default-mode network. Runs while you're away — drifts across goals, connects dots, catches what everyone else missed.
brainstem
Boots the system, watches resources, keeps the process alive. You never talk to it. That's the point.
subagents
Disposable workers she spins up for a job and throws away when it's done. She keeps talking while they work.
attention gate
Filters background thoughts. Most aren't worth your time. The ones that are, get through.
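A minimal sketch of what "a real actor with its own clock" can look like in Rust. This is not Lethe's source: it assumes one OS thread per region and a shared `std::sync::mpsc` channel standing in for the log, and `Event`, `run_regions`, and the periods are hypothetical names and values picked for illustration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// One log line: which region spoke, and on which of its own ticks.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub region: &'static str,
    pub tick: u32,
}

/// Spawn one thread per region. Each sleeps for its own period between ticks
/// and reports to a shared channel; no region waits on any other.
/// `regions` holds (name, period in milliseconds, number of ticks).
pub fn run_regions(regions: &[(&'static str, u64, u32)]) -> Vec<Event> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for &(name, period_ms, ticks) in regions {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for tick in 1..=ticks {
                thread::sleep(Duration::from_millis(period_ms));
                // A send only fails if the receiver is gone; ignore that here.
                let _ = tx.send(Event { region: name, tick });
            }
        }));
    }
    // Drop the original sender so the receiver ends once every region is done.
    drop(tx);

    let events: Vec<Event> = rx.iter().collect();
    for handle in handles {
        handle.join().expect("region thread panicked");
    }
    events
}
```

Calling `run_regions(&[("cortex", 1, 3), ("dmn", 5, 2)])` returns five events. The interleaving between regions varies from run to run, while each region's own ticks always arrive in order.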
01:24:18 dmn background cognition complete. found possible deadline drift
01:24:19 hippocampus recall triggered. 2 notes, 3 conversation matches, salience bias active
01:24:20 cortex delegation decision. spawned subagent: deployment audit
01:26:20 subagent progress report. checked install path, reviewing update path
01:26:21 attention notification reviewed. held for cortex decision
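The attention gate, and the `held for cortex decision` line in the log above, can be sketched as a salience filter. This is an illustration under assumptions, not the shipped logic: `Thought`, `AttentionGate`, the 0.0 to 1.0 salience scale, and the threshold are all hypothetical.

```rust
/// A background thought with an assumed salience score between 0.0 and 1.0.
#[derive(Debug, Clone, PartialEq)]
pub struct Thought {
    pub text: String,
    pub salience: f32,
}

/// Hypothetical gate: one threshold decides what is worth holding.
pub struct AttentionGate {
    pub threshold: f32,
}

impl AttentionGate {
    /// Keep only thoughts at or above the threshold, most salient first.
    /// Everything else is dropped and never reaches the cortex.
    pub fn review(&self, thoughts: Vec<Thought>) -> Vec<Thought> {
        let mut held: Vec<Thought> = thoughts
            .into_iter()
            .filter(|t| t.salience >= self.threshold)
            .collect();
        // total_cmp gives a total order on floats, so sorting cannot panic.
        held.sort_by(|a, b| b.salience.total_cmp(&a.salience));
        held
    }
}
```

The gate only filters and ranks. Whether a held thought becomes an interruption stays with the cortex, which matches the split of responsibilities described above.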
// principles
A cognitive runtime.
A brain has parts. So does she.
Five brain regions, each on its own clock, each doing one job well. Closer to how a brain works than to anything else in this space.
Swap the brain, keep the person.
Her memory survives model swaps, reboots, and new hardware. Who she is isn't tied to any one weight set. Rebuild her tomorrow — she'll still remember today.
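One way to read "swap the brain, keep the person" in code: the agent owns the memory, and the model is a replaceable part behind a trait. A rough sketch only; `Model`, `StubModel`, and `Agent` are hypothetical names, the stub stands in for a real provider, and persistence to disk is left out.

```rust
/// Anything that can answer a prompt given recalled notes. Hypothetical trait.
pub trait Model {
    fn reply(&self, prompt: &str, recalled: &[String]) -> String;
}

/// Stand-in backend that only reports its label and how much it was handed.
pub struct StubModel(pub &'static str);

impl Model for StubModel {
    fn reply(&self, prompt: &str, recalled: &[String]) -> String {
        format!("[{}] {} (recalling {} notes)", self.0, prompt, recalled.len())
    }
}

/// The memory lives on the agent, not inside any model.
pub struct Agent {
    memory: Vec<String>,
    model: Box<dyn Model>,
}

impl Agent {
    pub fn new(model: Box<dyn Model>) -> Self {
        Agent { memory: Vec::new(), model }
    }

    /// Answer with the current model, then remember the message.
    pub fn hear(&mut self, message: &str) -> String {
        let reply = self.model.reply(message, &self.memory);
        self.memory.push(message.to_string());
        reply
    }

    /// Replace the model; the memory is untouched.
    pub fn swap_model(&mut self, model: Box<dyn Model>) {
        self.model = model;
    }

    pub fn remembered(&self) -> usize {
        self.memory.len()
    }
}
```

After two messages on one stub and a swap to another, the new model is still handed both earlier notes on its first reply.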
One Rust binary. Yours.
~50 MB, statically linked, boots in milliseconds. One file you drop in, with none of the Python-and-container pinball to wire up first. Runs as a systemd service on Linux and swaps between Anthropic, OpenAI, OpenRouter, and local Gemma without touching anything else.
// get started
Two minutes to memory.
Rather not run it yourself? Try hosted Lethe — free for two weeks →
1. Install
One command. Works on macOS and Linux.
curl -fsSL https://lethe.gg/install | bash
2. Say hello
Message your bot on Telegram. From this point on, she remembers.
// you'll need
- A Telegram Bot Token — message @BotFather, send /newbot
- An LLM API key or subscription — OpenRouter, Anthropic (API key or Claude subscription), or OpenAI
- Your Telegram User ID — message @userinfobot
Running a local model instead? Three steps:
1. Build llama.cpp
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
cmake -B build -DGGML_CUDA=ON && cmake --build build -j$(nproc)
2. Start the model server
Download a Gemma 4 31B GGUF and run:
llama-server --model gemma-4-31B-it-Q8_0.gguf \
  --split-mode tensor --jinja --reasoning-budget 4096 \
  --ctx-size 98304 --parallel 2 --flash-attn on -fit off \
  --port 8090
3. Install and configure Lethe
curl -fsSL https://lethe.gg/install | bash
# then set in .env:
LLM_PROVIDER=openai
LLM_API_BASE=http://localhost:8090/v1
OPENAI_API_KEY=local
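Those three settings point an OpenAI-style client at the local server: llama-server speaks an OpenAI-compatible API under `/v1`, and `local` looks like a placeholder key. How Lethe itself loads `.env` is not shown here, so the parser below is only a sketch of the idea; `parse_env`, `LlmConfig`, and `llm_config` are hypothetical names.

```rust
use std::collections::HashMap;

/// Parse KEY=VALUE lines, skipping blanks and `#` comments.
pub fn parse_env(text: &str) -> HashMap<String, String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// The three settings from the snippet above, gathered in one place.
#[derive(Debug, PartialEq)]
pub struct LlmConfig {
    pub provider: String,
    pub api_base: Option<String>,
    pub api_key: Option<String>,
}

/// Build the config; the provider is treated as required (an assumption).
pub fn llm_config(env: &HashMap<String, String>) -> Result<LlmConfig, String> {
    let provider = env
        .get("LLM_PROVIDER")
        .cloned()
        .ok_or_else(|| "LLM_PROVIDER is not set".to_string())?;
    Ok(LlmConfig {
        provider,
        api_base: env.get("LLM_API_BASE").cloned(),
        api_key: env.get("OPENAI_API_KEY").cloned(),
    })
}
```

Fed the three lines above, `llm_config` yields provider `openai`, base `http://localhost:8090/v1`, and key `local`. The base URL's port has to match the port the model server is actually listening on.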
// you'll need
- GPUs with 48 GB or more of total VRAM (2x RTX 4090 for Q4, 4x RTX 4090 for Q8)
- A Telegram Bot Token
- A Gemma 4 31B GGUF model file
- See full local setup guide in the README
// hosted
Let us run her.
Same Lethe — the memory, the background thinking, the 3 AM drift — except she runs on our servers instead of yours. We keep her up; you just talk to her, in the browser or on Telegram. Free to start, $19.95 a month once she's earned it.
Free for two weeks. Keep her if she's worth it, walk if she isn't.