GitHub - stfurkan/pi-llm

4 min read Original article ↗

Pi-LLM: Local AI Server on Raspberry Pi 4

Turn your Raspberry Pi 4 (4GB) into a secure local LLM server running PrismML Bonsai 1-bit models. Chat with it from any device on your network via the built-in web UI with a model selector.

Why Bonsai? Traditional models need 2-7GB of RAM. Bonsai uses true 1-bit quantization (trained from scratch, not post-training compression). Two models included: Bonsai 4B (0.57GB, quality) and Bonsai 1.7B (0.25GB, fast) — switch between them from the UI dropdown.

Architecture

[Any device on LAN] --HTTPS--> [Caddy :443] --> [llama-server :8000 (router mode)]
                                                       |
                                                  Built-in web UI
                                                  + model selector dropdown
                                                  + OpenAI-compatible API
                                                       |
                                              ┌────────┴────────┐
                                              │                  │
                                         Bonsai 4B          Bonsai 1.7B
                                         (0.57 GB)          (0.25 GB)
                                          quality              fast

Models are loaded on-demand. Only the active model uses RAM (LRU eviction when switching).

Hardware Requirements

Item Notes
Raspberry Pi 4 Model B (4GB) The brain
32GB+ microSD (A2 rated) or USB SSD SSD recommended for longevity
5V/3A USB-C power supply Official RPi PSU recommended
Heatsink + fan Essential — sustained AI inference generates heat
WiFi or Ethernet WiFi is fine for most use cases

Quick Start — Core LLM Server (30-45 minutes)

This gets you a working local LLM chat accessible from any device on your LAN.

# 1. Flash Raspberry Pi OS Lite (64-bit) onto your SD card
#    See: guides/01-hardware-prep.md

# 2. SSH into your Pi
ssh pi@pi-llm.local

# 3. Copy this project to the Pi
scp -r pi-llm/ pi@pi-llm.local:~/pi-llm/

# 4. Run setup scripts in order
cd ~/pi-llm
sudo bash scripts/01-os-setup.sh        # System hardening + performance tuning
# >>> REBOOT and reconnect <<<
sudo bash scripts/02-install-bonsai.sh   # Build PrismML llama.cpp + Bonsai models
sudo bash scripts/03-security-setup.sh   # HTTPS + firewall
sudo bash scripts/04-monitoring.sh       # Temperature monitoring

# 5. Open the chat UI from any device on your network
#    https://pi-llm.local
#    (Accept the self-signed certificate warning)
#    Use the model selector dropdown to switch between Bonsai 4B and 1.7B

That's the complete core setup. You have a private LLM, accessible over HTTPS on your LAN, with two model sizes to pick from. Stop here if you only want local chat.

Optional: Physical Hardware Integration

Beyond chat, you can connect physical hardware (LEDs, displays, servos) and let the LLM control them via native tool calling.

Optional Step 05: TM1637 4-Digit Display (15 minutes)

Wire up a cheap TM1637 display, and the LLM can update it on command. Say "show 1234 on the display" and the physical LEDs light up in real time.

sudo bash scripts/05-install-display.sh

See guides/07-optional-display.md for wiring and details. Removable at any time with --uninstall.

Guides

Guide Description
01-hardware-prep.md What to buy, how to flash the SD card
02-os-setup.md First boot, SSH keys, what the setup script does
03-llm-server.md Bonsai models, llama-server router mode
04-security.md HTTPS, firewall rules
05-troubleshooting.md Common issues and fixes
06-optional-static-ip.md Optional: configure a static IP
07-optional-display.md Optional: TM1637 display with LLM tool calling

RAM Budget (4GB)

Component RAM Usage
Raspberry Pi OS Lite ~300 MB
llama-server + active model (on-demand) ~0.3-0.6 GB
Caddy reverse proxy ~25 MB
Total (one model loaded) ~0.6-0.9 GB
Free RAM ~3.1-3.4 GB

Router mode loads models on-demand. Only the active model occupies RAM. Switching models unloads the previous one.

Expected Performance

  • Bonsai 4B: ~2 tokens/second
  • Bonsai 1.7B: ~4-8 tokens/second
  • Concurrent users: 1 recommended (limited by Pi 4 hardware)

Security

  • SSH key-only authentication (passwords disabled)
  • fail2ban (3 attempts -> 1 hour ban)
  • HTTPS via Caddy (self-signed TLS)
  • UFW firewall (SSH + HTTPS from LAN only, direct port 8000 blocked)
  • Automatic security updates (unattended-upgrades)

Disclaimer

This is an educational/hobbyist project. It is provided "as is" without warranty of any kind. The authors take no responsibility for any damage, data loss, security issues, or other problems arising from the use of this project. Use it at your own risk.

  • The self-signed TLS certificate is not suitable for production or public-facing deployments
  • LLM outputs may be inaccurate, biased, or inappropriate — do not rely on them for critical decisions
  • Security hardening is designed for a trusted home network, not hostile environments
  • PrismML Bonsai models are third-party and subject to their own licenses and limitations

Always review the scripts before running them with sudo on your system.

License

MIT