jolexxa/cow: 🐮 Cow is just a humble AI for your computer. 🥺


License: MIT

Holy cow! Now you can talk back to the cow!

Cow is just a humble AI for your computer. 🥺

cow_demo-converted.mp4

Cow allows you to interact with a local language model, free of charge, as much as you could possibly want — all from the comfort of your own home terminal.

Note

Cow supports 🍎 Apple Silicon and 🐧 Linux x64.

🤠 Wrangling

Binary Install

curl -fsSL https://raw.githubusercontent.com/jolexxa/cow/main/install.sh | bash

This downloads the latest release for your platform and installs it to ~/.local/bin/.

Tip

The first time you run Cow, it will download the required model files automatically from Hugging Face.

🧠 Cow Intelligence

Cow supports two inference backends:

  • llama.cpp via llama_cpp_dart — runs GGUF models on CPU or GPU. llama.cpp is cross-platform and works just about anywhere.
  • MLX via cow_mlx + mlx_dart — runs MLX-format models natively on Apple Silicon. MLX outperforms llama.cpp on Apple Silicon hardware.

A higher-level package called cow_brain wraps both backends behind a common InferenceRuntime interface and enables high-level agentic functionality (reasoning, tool use, context management).
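As a rough illustration of the abstraction, a common runtime interface might look like this minimal sketch. Only the name InferenceRuntime comes from the description above; the method shape and the EchoRuntime stand-in are invented here for illustration:

```dart
/// Hypothetical sketch of a backend-agnostic runtime interface.
/// Only the InferenceRuntime name comes from Cow's docs; the rest is invented.
abstract class InferenceRuntime {
  /// Streams decoded tokens for a prompt.
  Stream<String> generate(String prompt);
}

/// Toy stand-in backend that "decodes" by echoing the prompt's words,
/// standing in for a real llama.cpp or MLX backend.
class EchoRuntime implements InferenceRuntime {
  @override
  Stream<String> generate(String prompt) async* {
    for (final word in prompt.split(' ')) {
      yield word;
    }
  }
}

Future<void> main() async {
  // Callers depend only on the interface, not the backend.
  final InferenceRuntime runtime = EchoRuntime();
  final tokens = await runtime.generate('moo moo moo').toList();
  print(tokens.length); // 3
}
```

Because callers see only the interface, the agentic layer (reasoning, tool use, context management) can run unchanged on either backend.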

cow_brain can run multiple inference sequences concurrently. In the future, this will allow Cow to efficiently spawn and execute subagents that diverge from the same conversation history.

Both backends support batched decoding to speed up concurrent workloads:

| Platform | Backend | Batching | Speedup |
| --- | --- | --- | --- |
| macOS (Apple Silicon) | MLX | mlx-swift fork with batched decoding | 1.56x |
| Linux (AMD/NVIDIA GPU) | llama.cpp | Multi-sequence llama_batch | 1.48x |

These numbers were not-so-scientifically measured by the benchmark.dart stress test on an M1 Max MacBook Pro and a Linux desktop with an AMD 9070 GPU.

Note

llama.cpp batching does not currently benefit Apple Silicon's Metal backend due to known kernel limitations. In fact, it actually slows things down for long conversations. On macOS, MLX is strongly preferred.

On Apple Silicon, Cow uses MLX with Qwen3-8B (4-bit) for primary interactions and Qwen2.5-3B-Instruct (4-bit) for lightweight summarization. On Linux, Cow uses llama.cpp with the equivalent GGUF models.

Cow cannot support arbitrary models: most models require prompts to follow a specific chat template, usually provided as Jinja code.

You can change the context size in AppInfo. Be sure you have enough memory to support the context size you choose, or you might just put your computer out to pasture.
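As a back-of-the-envelope sanity check before raising the context size, you can estimate how much extra KV-cache memory a larger context costs. The layer/head/dimension numbers below are illustrative values for a Qwen3-8B-class model, not read from Cow's source:

```dart
/// Rough fp16 KV-cache size in bytes for a decoder-only transformer:
/// 2 tensors (K and V) per layer, each kvHeads * headDim values per token.
int kvCacheBytes({
  required int contextTokens,
  required int layers,
  required int kvHeads,
  required int headDim,
  int bytesPerValue = 2, // fp16
}) =>
    2 * layers * kvHeads * headDim * bytesPerValue * contextTokens;

void main() {
  // Illustrative Qwen3-8B-class shape: 36 layers, 8 KV heads, head dim 128.
  final bytes = kvCacheBytes(
    contextTokens: 8192,
    layers: 36,
    kvHeads: 8,
    headDim: 128,
  );
  print('${bytes / (1 << 30)} GiB'); // 8k context ≈ 1.125 GiB of KV cache
}
```

That cache comes on top of the model weights themselves, which is why doubling the context on a memory-constrained machine can hurt far more than it helps.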

📦 Packages

Cow is currently a monorepo. All packages live under packages/.

| Package | Description |
| --- | --- |
| cow | Main terminal application — orchestrates backends, UI, and model management |
| cow_brain | Agentic inference layer — reasoning, tool use, context management, and a common InferenceRuntime interface |
| cow_model_manager | Model installer — downloads and manages LLM model files |
| llama_cpp_dart | Dart FFI bindings for llama.cpp |
| cow_mlx | MLX Swift inference backend (macOS only) — built separately via Xcode |
| mlx_dart | Dart FFI bindings for cow_mlx |
| blocterm | Bridges bloc and nocterm for reactive terminal UIs |
| logic_blocks | Human-friendly hierarchical state machines for Dart |
| collections | Utility collection types used across packages |

💻 Terminal

For a beautiful terminal UI (TUI), Cow uses nocterm, which is itself still in active development. Cow introduces a package called blocterm so that bloc can be used much as it would be in a typical Flutter application.

Cow-related contributions to Nocterm:

🤝 Contributing

Development Setup

Prerequisites

  • Dart SDK (the easiest way is to use FVM to install Flutter, which includes Dart — without a version manager, you'll end up in a stampede)
  • Xcode (macOS only — required to build the MLX Swift library and compile Metal shaders)

1. Clone with submodules

Cow includes llama.cpp as a git submodule (used for FFI bindings). The --recursive flag pulls it in automatically.

git clone --recursive https://github.com/jolexxa/cow.git
cd cow

2. Install dependencies

Cow is a monorepo with multiple Dart packages under packages/. Fetch dependencies for every package with the helper script:

dart tool/pub_get.dart

3. Download llama.cpp native libraries

Downloads prebuilt llama.cpp binaries for your platform (macOS ARM64 or Linux x64) and places them in packages/llama_cpp_dart/assets/native/.

dart tool/download_llama_assets.dart

4. Build MLX (macOS only)

Builds the CowMLX Swift dynamic library. This requires full Xcode (not just the command-line tools) because MLX uses Metal shaders that SwiftPM alone can't compile.

dart tool/build_mlx.dart

5. Run

dart run packages/cow/bin/cow.dart

Developer Scripts

All scripts are in ./tool/ and most accept an optional package name (e.g., cow_brain, blocterm).

dart tool/pub_get.dart [pkg]      # dart pub get (one or all)
dart tool/test.dart [pkg]         # run Dart tests (one or all)
dart tool/analyze.dart [pkg]      # dart analyze --fatal-infos
dart tool/format.dart [pkg]       # dart format (add --check for CI mode)
dart tool/coverage.dart [pkg]     # tests + lcov coverage report
dart tool/codegen.dart [pkg]      # build_runner / ffigen code generation
dart tool/build_mlx.dart          # build CowMLX Swift library
dart tool/checks.dart             # full CI check (format → analyze → build → test → coverage)

Model Profiles

Cow treats "model profiles" as the wiring layer between raw inference output and the app's message/tool semantics. Each profile defines three pieces:

  • Prompt formatter — converts messages into a token sequence matching the model's chat template
  • Stream parser — converts raw streamed tokens into structured ModelOutput
  • Tool parser — extracts tool calls from model text output

Profiles live in:

To add a new local model, implement the formatter/parser/extractor as needed, register the profile, and add tests. Profiles are thin wiring by design — keep logic in the formatter/parser classes and keep the profile declarations mostly declarative.
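In spirit, a profile is just the three pieces above wired together. This hypothetical sketch shows the shape of that wiring; all names and the toy ChatML-style behavior are invented for illustration and differ from Cow's real classes:

```dart
// Hypothetical shapes for the three profile pieces described above.
// All names here are invented for illustration, not Cow's real API.
typedef PromptFormatter = String Function(List<String> messages);
typedef StreamParser = String Function(String rawToken);
typedef ToolParser = List<String> Function(String text);

class ModelProfile {
  const ModelProfile({
    required this.format,
    required this.parse,
    required this.extractTools,
  });

  final PromptFormatter format;
  final StreamParser parse;
  final ToolParser extractTools;
}

/// A toy ChatML-style profile: the formatter applies the chat template,
/// the parser passes tokens through, and the tool parser pulls out
/// <tool>...</tool> spans from the model's text.
final toyProfile = ModelProfile(
  format: (messages) => messages
      .map((m) => '<|im_start|>user\n$m<|im_end|>')
      .join('\n'),
  parse: (token) => token,
  extractTools: (text) => RegExp(r'<tool>(.*?)</tool>')
      .allMatches(text)
      .map((m) => m.group(1)!)
      .toList(),
);

void main() {
  print(toyProfile.format(['hello'])); // the ChatML-wrapped message
  print(toyProfile.extractTools('ok <tool>ls</tool>')); // [ls]
}
```

Keeping the profile itself this declarative means all the real logic lives in the formatter/parser classes, where it can be unit-tested in isolation.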

🎬 Credits

Cow is grateful to Alibaba Cloud for releasing the Qwen models under the permissive Apache 2.0 license. See the credits for the full license.

Cow itself is licensed under the permissive MIT license. Yee-haw!