# kllm: Kernel-level LLM inference via /dev/llm0
Ring 0 LLM inference! What could go wrong?

**THIS IS A JOKE - DO NOT ACTUALLY USE THIS.**
This repo ports a heavily stripped-down version of llm.c inference to the kernel. It currently barely works: expect lots of timeouts and bugs. Only single-core CPU inference without any acceleration (SSE, AVX, etc.) is supported. Even for the 124M GPT-2 model, expect to wait over a minute per token, during which the system will be unresponsive.
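For a sense of how a prompt gets into kernel space at all, here is a minimal sketch of a misc character device exposing `/dev/llm0` with a write handler. This is illustrative only, not the actual kllm source: `kllm_write`, `kllm_fops`, and the buffer size are assumptions.

```c
/* Illustrative sketch: a misc char device that appears as /dev/llm0.
 * All names here are hypothetical, not kllm's real code. */
#include <linux/module.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/uaccess.h>

static char prompt_buf[1024];

/* Copy the user's prompt into kernel memory; inference would run from here. */
static ssize_t kllm_write(struct file *file, const char __user *buf,
                          size_t count, loff_t *ppos)
{
	size_t n = min(count, sizeof(prompt_buf) - 1);

	if (copy_from_user(prompt_buf, buf, n))
		return -EFAULT;
	prompt_buf[n] = '\0';
	pr_info("kllm: got prompt: %s\n", prompt_buf);
	return count;
}

static const struct file_operations kllm_fops = {
	.owner = THIS_MODULE,
	.write = kllm_write,
};

static struct miscdevice kllm_dev = {
	.minor = MISC_DYNAMIC_MINOR,
	.name  = "llm0",          /* shows up as /dev/llm0 */
	.fops  = &kllm_fops,
};

static int __init kllm_init(void)
{
	return misc_register(&kllm_dev);
}

static void __exit kllm_exit(void)
{
	misc_deregister(&kllm_dev);
}

module_init(kllm_init);
module_exit(kllm_exit);
MODULE_LICENSE("GPL");
```

With a misc device, devtmpfs/udev creates the device node automatically, which would explain why the install steps below have no `mknod`.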
## Building & Installing
```sh
# Build the module. Make sure to have the right kernel headers installed.
make

# Download the GPT-2 checkpoints
./download_starter_pack.sh

# Copy the models to /lib/firmware
sudo cp gpt2_124M.bin /lib/firmware/kllm-gpt2.bin
sudo cp gpt2_tokenizer.bin /lib/firmware/kllm-tokenizer.bin

# Load the module
sudo insmod kllm.ko
```
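Copying the checkpoints into `/lib/firmware` suggests the module pulls them in through the kernel firmware loader. A hedged sketch of what that lookup could look like, assuming `request_firmware()` is used; only the `kllm-gpt2.bin` filename comes from the step above, and `kllm_load_model` is a made-up name.

```c
/* Sketch: loading the GPT-2 checkpoint via the kernel firmware loader.
 * Assumes kllm uses request_firmware(); only the filename is from the README. */
#include <linux/firmware.h>
#include <linux/device.h>
#include <linux/printk.h>

static int kllm_load_model(struct device *dev)
{
	const struct firmware *fw;
	int ret;

	/* Resolves to /lib/firmware/kllm-gpt2.bin */
	ret = request_firmware(&fw, "kllm-gpt2.bin", dev);
	if (ret)
		return ret;

	pr_info("kllm: loaded %zu bytes of model weights\n", fw->size);
	/* ... parse fw->data into the GPT-2 weight tensors ... */

	release_firmware(fw);
	return 0;
}
```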
## Usage
```sh
# Exclude the trailing newline, otherwise the LLM will generate a few extra
# newlines before writing actual text
echo -n "Hello, world!" > /dev/llm0
```

Responses will appear in `dmesg` for now. Reading them back from `/dev/llm0` should also work, but this is unstable.
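Since reading responses back from the device is flagged as unstable, here is one plausible shape for the `.read` handler, sketched with `simple_read_from_buffer()`. Everything here (`out_buf`, `out_len`, `kllm_read`) is a hypothetical illustration, not the repo's actual implementation.

```c
/* Sketch of a read handler for /dev/llm0; hypothetical, not kllm's code. */
#include <linux/fs.h>
#include <linux/types.h>

static char out_buf[4096];   /* generated text, filled by the inference loop */
static size_t out_len;       /* how much of out_buf is valid */

static ssize_t kllm_read(struct file *file, char __user *buf,
                         size_t count, loff_t *ppos)
{
	/* simple_read_from_buffer handles offset tracking and copy_to_user */
	return simple_read_from_buffer(buf, count, ppos, out_buf, out_len);
}
```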
