GitHub - bakks/butterfish: A shell with AI superpowers

A shell with AI superpowers

What is this thing?

Butterfish is for people who work from the command line, it adds AI prompting to your shell (bash, zsh) with OpenAI. Think Github Copilot for shell.

Here's how it works: use your shell as normal, start a command with a capital letter to prompt the AI. The AI sees the shell history, so you can ask contextual questions like "Why did that command fail?".

This is a magical UX pattern -- you get high-context AI help exactly when you want it, NO COPY/PASTING.

What can you do with Butterfish Shell?

Once you run butterfish shell you can do the following things from the command line:

"Give me a command to do x"
"Why did that command fail?"
"!Run make in this directory, debug problems" (this acts as an agent)
"@Show the largest files here" (this attempts exactly one shell command)
Autocomplete shell commands (if the AI 'verbally' suggested a command it will appear)
"Give me a pasta recipe" (this is a ChatGPT interface so it's not just for shell stuff!)

Feedback and external contribution is very welcome! Butterfish is open source under the MIT license. We hope that you find it useful!

Prompt Transparency

Many AI-enabled products obscure the prompt (instructional text) sent to the AI model, Butterfish makes it transparent and configurable.

To see the raw AI requests / responses you can run Butterfish in verbose mode (butterfish shell -v) and watch the log file (/var/tmp/butterfish.log on MacOS). For more verbosity, use -vv.

To configure the prompts you can edit ~/.config/butterfish/prompts.yaml.

Installation & Authentication

Butterfish works on MacOS and Linux. You can install via Homebrew on MacOS:

brew install bakks/bakks/butterfish
butterfish shell
Is this thing working? # Type this literally into the CLI

You can also install with go install:

go install github.com/bakks/butterfish/cmd/butterfish@latest
$(go env GOPATH)/bin/butterfish shell
Is this thing working? # Type this literally into the CLI

The first invocation will prompt you to paste in an OpenAI API secret key. You can get an OpenAI key at https://platform.openai.com/account/api-keys.

The key will be written to ~/.config/butterfish/butterfish.env, which looks like:

It may also be useful to alias the butterfish command to something shorter. If you add the following line to your ~/.zshrc or ~/.bashrc file then you can run it with only bf.

Shell Mode

How does this work? Shell mode wraps your shell rather than replacing it.

You run butterfish shell and use your existing shell as normal, this is tested with zsh and bash
You start a command with a capital letter to prompt the LLM, e.g. "How do I do..."
You can autocomplete commands and prompt questions with Tab
Prompts and autocomplete use local context for answers, like ChatGPT

This pattern is shockingly effective because your shell history becomes the AI chat context. For example, if you cat a file to print it out then the AI will see it. If you tried a command that failed, the AI can see the command and the error.

Shell mode defaults to using gpt-5.5 with high reasoning effort and the Responses API priority service tier for fast mode. You can override the model with:

butterfish shell -m gpt-5.5

Shell Mode Command Reference

> butterfish shell --help
Usage: butterfish shell

Start the Butterfish shell wrapper. This wraps your existing shell, giving
you access to LLM prompting by starting your command with a capital letter.
LLM calls include prior shell context. This is great for keeping a chat-like
terminal open, sending written prompts, debugging commands, and iterating on
past actions.

Use:
  - Type a normal command, like 'ls -l' and press enter to execute it
  - Start a command with a capital letter to send it to GPT, like 'How do I
    recursively find local .py files?'
  - Autosuggest will print command completions, press tab to fill them in
  - GPT will be able to see your shell history, so you can ask contextual
    questions like 'why didnt my last command work?'
  - Start a command with ! to enter Agent Mode, in which GPT will act as an agent
    attempting to accomplish your goal by executing commands, for example '!Run
    make in this directory and debug any problems'.
  - Start a command with !! to enter Unsafe Agent Mode, in which GPT will execute
    commands without confirmation. USE WITH CAUTION.
  - Start a command with @ to enter Action Mode, in which GPT will attempt
    exactly one shell command for your request or decline if a single command
    is not a good fit.
  - Start a command with @@ to auto-execute the Action Mode command
    immediately. USE WITH CAUTION.

Here are special Butterfish commands:
  - Help : Give hints about usage.
  - Status : Show the current Butterfish configuration.
  - History : Print out the history that would be sent in a GPT prompt.

If you do not have OpenAI free credits then you will need a subscription and
you will need to pay for OpenAI API use. Autosuggest will probably be the most
expensive feature. You can reduce spend by disabling shell autosuggest (-A) or
increasing the autosuggest timeout (e.g. -t 2000).

Flags:
  -h, --help                       Show context-sensitive help.
  -v, --verbose                    Verbose mode, prints full LLM prompts
                                   (sometimes to log file). Use multiple times
                                   for more verbosity, e.g. -vv.
  -L, --log                        Write verbose content to a log file rather
                                   than stdout, usually /var/tmp/butterfish.log
  -V, --version                    Print version information and exit.
  -u, --base-url="https://api.openai.com/v1/responses"
                                   Base URL for OpenAI-compatible API.
                                   The default points directly at the Responses
                                   endpoint.
      --service-tier="priority"    Responses API service tier for model
                                   requests. Use priority for fast mode;
                                   set empty to omit.
  -z, --token-timeout=300000       Timeout before the first streaming event is
                                   received and between streaming events. In
                                   milliseconds.
  -l, --light-color                Light color mode, appropriate for a terminal
                                   with a white(ish) background

  -b, --bin=STRING                 Shell to use (e.g. /bin/zsh), defaults to
                                   $SHELL.
  -m, --model="gpt-5.5"            Model for when the user manually enters a
                                   prompt.
  -r, --reasoning-effort="high"
                                   Reasoning effort for shell prompting, Agent
                                   Mode, and Action Mode. Ignored for
                                   autosuggest and automatically disabled for
                                   models that don't support reasoning.
  -A, --autosuggest-disabled       Disable autosuggest.
  -a, --autosuggest-model="gpt-5.5"
                                   Model for autosuggest
  -t, --autosuggest-timeout=500    Delay after typing before autosuggest (lower
                                   values trigger more calls and are more
                                   expensive). In milliseconds.
  -T, --newline-autosuggest-timeout=3500
                                   Timeout for autosuggest on a fresh line, i.e.
                                   before a command has started. Negative values
                                   disable. In milliseconds.
  -p, --no-command-prompt          Don't change command prompt (shell PS1
                                   variable). If not set, an emoji will be added
                                   to the prompt as a reminder you're in Shell
                                   Mode.
  -P, --max-prompt-tokens=32768    Maximum number of tokens, we restrict
                                   calls to this size regardless of model
                                   capabilities.
  -H, --max-history-block-tokens=1024
                                   Maximum number of tokens of each block of
                                   history. For example, if a command has a very
                                   long output, it will be truncated to this
                                   length when sending the shell's history.
  -R, --max-response-tokens=0      Maximum number of output tokens in a shell
                                   response. The default 0 omits the API cap.

Agent Mode

If you're in Shell Mode you can start an agent to accomplish a goal by triggering Agent Mode. Start a command with !, as in !Fix that bug. Agent Mode will populate a command in your shell, which you can execute with Enter, or you can edit the command, or give feedback to the agent by doing a shell prompt (by starting a command with a capital letter). Agent Mode will exit if it decides the goal is met or impossible, or you can manually exit with Ctrl-C. For models that support it (for example GPT-5.1+), Agent Mode uses the Responses API shell tool; older models fall back to the structured command function.

You can trigger Unsafe Agent Mode by starting a command with !!, which will execute commands without confirmation, and is thus potentially dangerous.

Agent Mode Examples

How well does this work? Mileage will vary. Your success rate will be higher with simpler goals and more guidance about how to accomplish them.

The advantages of this feature are that the agent can see your shell history and so it has context of what you're doing manually and can take over. If a command fails the agent will tweak it and try again.

Some disadvantages are that the agent is biased towards specific versions of commands and may have to experiment to get it right, for example the flags for grep on MacOS are different than on most Linux implementations. The agent isn't very effective at manipulating large text files like code files, so you will want to be conscious of the context it needs to be successful.

Here are some goals that work well:

!Recursively list the golang files in this directory
!Find the hidden files in this directory and ask me if I want to delete them. This will generally print some things and then wait for user input (provided by prompting starting with a capital letter).
!Show me what process is using the most memory

Here are some goals that work sometimes:

!Run make in this dir, debug problems
!Install python dependencies for this project
!Create a list of the top 3 hacker news headlines, including a link. Use the pup command to parse them out of HTML

Action Mode

If you're in Shell Mode you can trigger Action Mode with @. Action Mode is a single-shot version of Agent Mode: Butterfish asks the model for exactly one shell command, stages that command in your shell, and exits after that command finishes. If a single command does not make sense for the request, the model can decline instead of forcing a bad command.

If you use @@, Butterfish will execute the generated command immediately instead of staging it for review first.

Some requests that fit Action Mode well:

@show the 10 largest files here
@find all markdown files modified today
@tail the latest app log

Local Models

Butterfish uses OpenAI models by default, but you can instead point it to any server with a OpenAI compatible API with the --base-url (-u) flag. For example:

butterfish prompt -u "http://localhost:5000/v1/responses" "Is this thing working?"

This enables using Butterfish with local or remote non-OpenAI models. Notes on this feature:

In practice using hosted models is much simpler than running your own, and Butterfish's prompts have been tuned for OpenAI models, so you will probably get the best results using the default OpenAI models.
Being OpenAI-API compatible in this case means implementing the Responses endpoint with streaming results. Butterfish will normalize a base URL ending in /responses down to /v1 and send requests to /v1/responses.
Butterfish will add your token to requests to the Responses endpoint, so be careful about accidentally leaking credentials if you don't trust the server.
Options for running a local model with a compatible interface include LM Studio and text-generation-webui.

CLI Examples

Shell Mode is the primary focus of Butterfish but it also includes more specific command line utilities for prompting and generating commands.

`prompt` - Straightforward LLM prompt

Examples:

butterfish prompt "Write me a poem about placeholder text"
echo "Explain unix pipes to me:" | butterfish prompt
cat go.mod | butterfish prompt "Explain what this go project file contains:"

> butterfish prompt --help
Usage: butterfish prompt [<prompt> ...]

Run an LLM prompt without wrapping, stream results back. This is a
straight-through call to the LLM from the command line with a given prompt.
This accepts piped input, if there is both piped input and a prompt then they
will be concatenated together (prompt first). It is recommended that you wrap
the prompt with quotes. The default GPT model is gpt-5.5.

Arguments:
  [<prompt> ...]    Prompt to use.

Flags:
  -h, --help                     Show context-sensitive help.
  -v, --verbose                  Verbose mode, prints full LLM prompts
                                 (sometimes to log file). Use multiple times for
                                 more verbosity, e.g. -vv.
  -V, --version                  Print version information and exit.

  -m, --model="gpt-5.5"            LLM to use for the prompt.
  -r, --reasoning-effort="high"
                                 Reasoning effort for the prompt request.
                                 Automatically disabled for models that don't
                                 support reasoning.
  -n, --num-tokens=1024          Maximum number of tokens to generate.

`gencmd` - Generate a shell command

Use the -f flag to execute sight unseen.

butterfish gencmd -f "Find all of the go files in the current directory, recursively"

> butterfish gencmd --help
Usage: butterfish gencmd <prompt> ...

Generate a shell command from a prompt, i.e. pass in what you want, a shell
command will be generated. Accepts piped input. You can use the -f command to
execute it sight-unseen.

Arguments:
  <prompt> ...    Prompt describing the desired shell command.

Flags:
  -h, --help       Show context-sensitive help.
  -v, --verbose    Verbose mode, prints full LLM prompts (sometimes to log
                   file). Use multiple times for more verbosity, e.g. -vv.
  -V, --version    Print version information and exit.

  -r, --reasoning-effort="high"
                   Reasoning effort for command generation. Automatically
                   disabled for models that don't support reasoning.
  -f, --force      Execute the command without prompting.

`exec` - Run a command and suggest a fix if it fails

Use -r to control the reasoning effort for the fix-suggestion request. It defaults to high.

butterfish exec 'find -nam foobar'

Commands

Here's the command help:

> butterfish --help
Usage: butterfish <command>

Do useful things with LLMs from the command line, with a bent towards software
engineering.

Butterfish is a command line tool for working with LLMs. It has two modes: CLI
command mode, used to prompt LLMs and generate commands, and Shell mode: Wraps
your local shell to provide easy prompting and autocomplete.

Butterfish stores an OpenAI auth token at ~/.config/butterfish/butterfish.env
and the prompt wrappers it uses at ~/.config/butterfish/prompts.yaml. Butterfish
logs to the system temp dir, usually to /var/tmp/butterfish.log.

To print the full prompts and responses from the OpenAI API, use the --verbose
flag. Support can be found at https://github.com/bakks/butterfish.

If you do not have OpenAI free credits then you will need a subscription and you
will need to pay for OpenAI API use. If you're using Shell Mode, autosuggest
will probably be the most expensive part. You can reduce spend by disabling
shell autosuggest (-A) or increasing the autosuggest timeout (e.g. -t 2000).
See "butterfish shell --help".

v0.1.12 darwin amd64 (commit 0c115fa) (built 2023-09-27T19:12:29Z) MIT License -
Copyright (c) 2023 Peter Bakkum

Flags:
  -h, --help       Show context-sensitive help.
  -v, --verbose    Verbose mode, prints full LLM prompts (sometimes to log
                   file). Use multiple times for more verbosity, e.g. -vv.
  -V, --version    Print version information and exit.

Commands:
  shell
    Start the Butterfish shell wrapper. This wraps your existing shell, giving
    you access to LLM prompting by starting your command with a capital letter.
    LLM calls include prior shell context. This is great for keeping a chat-like
    terminal open, sending written prompts, debugging commands, and iterating on
    past actions.

    Use:
      - Type a normal command, like 'ls -l' and press enter to execute it
      - Start a command with a capital letter to send it to GPT, like 'How do I
        recursively find local .py files?'
      - Autosuggest will print command completions, press tab to fill them in
      - GPT will be able to see your shell history, so you can ask contextual
        questions like 'why didnt my last command work?'
      - Start a command with ! to enter Agent Mode, in which GPT will act as
        an agent attempting to accomplish your goal by executing commands,
        for example '!Run make in this directory and debug any problems'.
      - Start a command with !! to enter Unsafe Agent Mode, in which GPT will
        execute commands without confirmation. USE WITH CAUTION.

    Here are special Butterfish commands:
      - Help : Give hints about usage.
      - Status : Show the current Butterfish configuration.
      - History : Print out the history that would be sent in a GPT prompt.

    If you do not have OpenAI free credits then you will need a subscription and
    you will need to pay for OpenAI API use. Autosuggest will probably be the
    most expensive feature. You can reduce spend by disabling shell autosuggest
    (-A) or increasing the autosuggest timeout (e.g. -t 2000).

  prompt [<prompt> ...]
    Run an LLM prompt without wrapping, stream results back. This is a
    straight-through call to the LLM from the command line with a given prompt.
    This accepts piped input, if there is both piped input and a prompt then
    they will be concatenated together (prompt first). It is recommended that
    you wrap the prompt with quotes. The default GPT model is gpt-5.5.

  gencmd <prompt> ...
    Generate a shell command from a prompt, i.e. pass in what you want, a shell
    command will be generated. Accepts piped input. You can use the -f command
    to execute it sight-unseen.

  exec [<command> ...]
    Execute a command and try to debug problems. The command can either passed
    in or in the command register (if you have run gencmd in Console Mode).

Run "butterfish <command> --help" for more information on a command.

Prompt Library

A goal of Butterfish is to make prompts transparent and easily editable. Butterfish will write a prompt library to ~/.config/butterfish/prompts.yaml and load this every time it runs. You can edit prompts in that file to tweak them. If you edit a prompt then set OkToReplace: false, which prevents overwriting.

> head -n 8 ~/.config/butterfish/prompts.yaml
- name: shell_system_message
  prompt: 'You are an assistant that helps the user with a Unix shell. Give advice
    about commands that can be run and examples but keep your answers succinct. Here
    is system info about the local machine: ''{sysinfo}'''
  oktoreplace: true
- name: shell_autocomplete_command
  prompt: |-
    You are a unix shell command autocompleter. I will give you the user's history, predict the full command they will type. You will find good suggestions in the user's history, suggest the full command.

If you want to see the exact communication between Butterfish and the OpenAI API then set the verbose flag (-v) when you run Butterfish, this will print the full prompt and response either to the terminal or to a log file.

Example

If you want to customize how Butterfish generates shell commands, open ~/.config/butterfish/prompts.yaml, find generate_command, and edit it. Once you edit, set oktoreplace: false to prevent overwriting.

Remember that if you run Butterfish in verbose mode (with -v), you will see the exact prompt when you run it.

Dev Setup

I've been developing Butterfish on an Intel Mac, but it should work fine on ARM Macs and probably work on Linux (untested). Here is how to get set up for development on MacOS:

brew install git go protobuf protoc-gen-go protoc-gen-go-grpc
git clone https://github.com/bakks/butterfish
cd butterfish
make
./bin/butterfish prompt "Is this thing working?"

Default tests skip the slower PTY E2E suite:

Run the PTY E2E suite explicitly when changing shell editing, wrapping, or pseudo-terminal behavior:

make test-pty-e2e
make test-full