OfflineAI: tiny JetBrains LLM Agent
OfflineAI is the small LLM coding agent that we use. It's optimized for privacy: the model runs entirely on your own hardware :)
It works quite well with Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf on an RTX 3090 or better.
To activate the AI agent, click anywhere in your source code and press
Ctrl+Alt+Shift+H (for Help!)
and the OfflineAI Chat window will open. Type a task and start it with Ctrl+Return, or cancel it with the "Reset Chat History" button in the top-right corner.
The agent never modifies existing files directly; instead, it opens a diff inside the IDE.
How to compile
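Assuming the standard Gradle IntelliJ Platform plugin setup (an inference from the output path below, since the build command isn't shown here), run:

```bash
# Builds the plugin distribution zip under build/distributions/
./gradlew buildPlugin
```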
This creates:

```
build/distributions/jetbrains-mini-agent-0.1.0.zip
```
Installation
Compile or download `jetbrains-mini-agent-0.1.0.zip` and install it via
Settings/Preferences > Plugins > ⚙️ > Install plugin from disk...
Usage
Download Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf
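The download source isn't specified; one plausible option is the unsloth GGUF repository on Hugging Face (the repo name here is an assumption, adjust if the quant is hosted elsewhere):

```bash
# Assumed Hugging Face repo; the target dir matches the docker mount below.
huggingface-cli download unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF \
  Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf \
  --local-dir /path/to/gguf/file/
```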
Run the LLM locally, for example with llama.cpp's CUDA server image:

```bash
docker run --gpus all -v /path/to/gguf/file/:/models -p 8081:8081 \
  ghcr.io/ggml-org/llama.cpp:server-cuda \
  -m /models/Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf \
  --port 8081 --host 0.0.0.0 --jinja -ngl 99 --threads -1 \
  --ctx-size 32768 --temp 0.7 --min-p 0.0 --top-p 0.80 --top-k 20 \
  --repeat-penalty 1.05 --verbose
```
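Before opening the IDE, you can check that the server is up; llama.cpp's server exposes a /health endpoint:

```bash
# Returns {"status":"ok"} once the model is loaded and ready.
curl http://localhost:8081/health
```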
And now the plugin will work 🎉
(Or you can patch line 21 of OfflineAiInvocationService.kt to use a different LLM.)
Just fork it!
I'm putting this out here in case it's useful, but please don't expect any support or feature improvements. If in doubt, just fork it and create your own customized variant.