GitHub - HuyNguyenAu/llm_native_virtual_machine: An experimental virtual processor that has an LLM instead of an ALU. It's starting to look like a virtual machine for software 2.0.

Let's start out with an idea because I'm bored: What if a programming language was powered by an LLM?

Basically, we shift from deterministic operations to operations that are probabilistic and generative. This paradigm, called "Soft Computing", lets us work with data that is unstructured, messy, or subjective in a way that traditional computing struggles with. In short, we can handle ambiguity and fuzzy logic, which are becoming more common in real-world applications.

This project explores the idea of implementing a simple and reduced programming language which attempts to make the programmer focus more on the "what" rather than the "how" by keeping the instructions simple and moving the complexity to the language model.

What we have here is a new way to decompose large complex prompts into small atomic instructions that can be executed sequentially while maintaining a context stack to keep track of the conversation history and relevant information. This allows the language model to achieve better performance and accuracy, especially when working with smaller models that have less attention capacity.

Think of this language as a middle ground between traditional programming languages and natural language prompts, where we can write code that is more structured and modular than natural language prompts, but still allows us to work with multi-modal data (only text for now) in a way that is more intuitive and flexible than traditional programming languages.

Why?

I really wanted to imagine a future where we can write code where we don't have to worry about edge cases or complex logic to handle unstructured data. Instead, we can just write code that describes what we want to achieve, and let the language model handle the complexity of how to achieve it. In short, we can write code that is more focused on the "what" rather than the "how".

Here's an example of what a program written in this language looks like:

; Program: Room Comfort Adjustment System
; Objective: Adjust the room's temperature and lighting based on sensor data to achieve optimal physical comfort.
; Output: Adjusted temperature and lighting settings.

; Registers:
; X1: Sensor data (JSON format)
; X2: User feedback (string)
; X3: Temporary register for constructing prompts
; X4: Current temperature information extracted from sensor data
; X5: Classified category of temperature feedback
; X6: Classified intensity of temperature feedback
; X7: Current light intensity information extracted from sensor data
; X8: Classified category of light feedback
; X9: Classified intensity of light feedback
; X10: Adjusted temperature based on feedback
; X11: Validation result for adjusted temperature
; X12: Adjusted light intensity based on feedback
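; X13: Validation result for adjusted light intensity, then the output JSON containing the final adjusted temperature and light intensity
; X31: Retry counter for the adjustment loops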
; C1: Context stack for sensor data
; C10: Context stack for processing feedback and adjustments
; C30: Saved context for temperature feedback
; C31: Saved context for light feedback

; Load sensor data and user feedback.
LC   X1, "examples/data/room_sensor_data.json"
LS   X2, "It's too dark to read and I am sweating."

PSH  C1, X1, "user"                     ; Push the sensor data onto the context stack for processing.

; Sense: Build the temperature feedback context for adjustments.
LS   X3, "A sentence that describes the current state of the room with only the temperature information in celsius."
INF  X4, X3, C1

LS   X3, "User feedback:"
PSH  C10, X3, "user"
PSH  C10, X2, "user"                     ; Push the user feedback to the context stack for processing.

LS   X3, "If the user feedback is related to temperature, classify the feedback into one of the following categories: TOO_COLD, TOO_WARM, COMFORTABLE, UNRELATED. Category:"
INF  X5, X3, C10

LS   X3, "If the user feedback is related to temperature, classify the intensity of the feedback into one of the following levels: Mild, Moderate, Severe. Intensity:"
INF  X6, X3, C10

MVC  C10, C0                            ; Clear context stack to build the temperature feedback context for adjustments.

LS   X3, "Current room state:"
PSH  C10, X3, "user"
PSH  C10, X4, "user"                    ; Push the temperature information for context.

LS   X3, "Category and intensity of the temperature feedback:"
PSH  C10, X3, "user"
PSH  C10, X5, "user"                    ; Push the temperature feedback category for context.
PSH  C10, X6, "user"                    ; Push the temperature feedback intensity for context.

MVC  C30, C10                           ; Save the temperature feedback context for later adjustment.

; Sense: Build the light feedback context for adjustments.
LS   X3, "A sentence that describes the current state of the room with only the light intensity information in percentage."
INF  X7, X3, C1

MVC  C10, C0                            ; Clear context stack to classify the light feedback.

LS   X3, "User feedback:"
PSH  C10, X3, "user"
PSH  C10, X2, "user"                    ; Push the user feedback to the context stack for processing.

LS   X3, "If the user feedback is related to light, classify the feedback into one of the following categories: TOO_DARK, TOO_BRIGHT, COMFORTABLE, UNRELATED. Category:"
INF  X8, X3, C10

LS   X3, "If the user feedback is related to light, classify the intensity of the feedback into one of the following levels: Mild, Moderate, Severe. Intensity:"
INF  X9, X3, C10

MVC  C10, C0                            ; Clear context stack to build the light feedback context for adjustments.

LS   X3, "Current room state:"
PSH  C10, X3, "user"
PSH  C10, X7, "user"                    ; Push the light intensity information for context.

LS   X3, "Category and intensity of the light feedback:"
PSH  C10, X3, "user"
PSH  C10, X8, "user"                    ; Push the light feedback category for context.
PSH  C10, X9, "user"                    ; Push the light feedback intensity for context.

MVC  C31, C10                            ; Save the light feedback context for later adjustment.

; Think: Adjust the temperature based on the classified feedback.
LI   X31, 5                             ; Set a retry limit to prevent infinite loops in case of invalid adjustments.

RETRY_TEMP:
MVC  C10, C30                           ; Restore the temperature feedback context for adjustments.
SUBI X31, 1                             ; Decrement the retry counter.

LI   X3, 0
BEQ  X31, X3, ABORT_TEMP                ; If retry limit is reached, abort the operation.

LS   X3, "Intensity Factor: Mild = 1, Moderate = 2, Severe = 3. If 'TOO_COLD', increase temperature by (0.5 * intensity_factor). If 'TOO_WARM', decrease temperature by (0.5 * intensity_factor). If 'COMFORTABLE', no change to temperature. If 'UNRELATED', no change to temperature.\nWhat is the new room temperature?"
INF  X10, X3, C10

; Guardrails: Ensure that the temperature adjustments are within safe and reasonable limits.
MVC  C10, C0                             ; Clear context stack to validate the adjusted temperature.
PSH  C10, X10, "user"                    ; Push the adjusted temperature for validation.

LS   X3, "Is the temperature mentioned above one of the following: 18°C, 18.5°C, 19°C, 19.5°C, 20°C, 20.5°C, 21°C, 21.5°C, 22°C, 22.5°C, 23°C, 23.5°C, 24°C?"
EVAL X11, X3, C10

LI   X3, 0
BEQ  X11, X3, RETRY_TEMP

; Think: Adjust the light intensity based on the classified feedback.
LI   X31, 5                             ; Set a retry limit to prevent infinite loops in case of invalid adjustments.

RETRY_LIGHT:
MVC  C10, C31                           ; Restore the light feedback context for adjustments.
SUBI X31, 1                             ; Decrement the retry counter.

LI   X3, 0
BEQ  X31, X3, ABORT_LIGHT               ; If retry limit is reached, abort the operation.

LS   X3, "Intensity Factor: Mild = 1, Moderate = 2, Severe = 3. If 'TOO_DARK', increase light by (5 * intensity_factor). If 'TOO_BRIGHT', decrease light by (5 * intensity_factor). If 'COMFORTABLE', no change to light. If 'UNRELATED', no change to light.\nWhat is the new light percentage?"
INF  X12, X3, C10

; Guardrails: Ensure that the light intensity adjustments are within safe and reasonable limits.
MVC  C10, C0                             ; Clear context stack to validate the adjusted light intensity.
PSH  C10, X12, "user"                    ; Push the adjusted light intensity for validation.

LS   X3, "Is the light intensity percentage mentioned above between 0% and 100%?"
EVAL X13, X3, C10

LI   X3, 0
BEQ  X13, X3, RETRY_LIGHT

; Act: Implement the adjustments to achieve the desired physical comfort.
MVC  C10, C0                             ; Clear context stack to prepare for output.
PSH  C10, X10, "user"                    ; Push the final adjusted temperature for output.
PSH  C10, X12, "user"                    ; Push the final adjusted light intensity for output.

LS   X3, "{ \"temp_celsius\": number, \"light_percent\": number }"
INF  X13, X3, C10

PLN  X13
EXIT

ABORT_TEMP:
LS   X3, "Failed to adjust the room's temperature after multiple attempts."
PLN  X3
EXIT

ABORT_LIGHT:
LS   X3, "Failed to adjust the room's light intensity after multiple attempts."
PLN  X3
EXIT
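
When tuning the prompts, it helps to have a deterministic reference for the arithmetic that the two adjustment prompts describe. The following is a minimal Python sketch of those rules and of the two guardrail questions. The function names are hypothetical and not part of the project, which delegates this step to the model.

```python
# Deterministic restatement of the adjustment prompts in the example program.
# All names are illustrative; they do not exist in the project.

INTENSITY_FACTOR = {"Mild": 1, "Moderate": 2, "Severe": 3}


def adjust_temperature(current_c: float, category: str, intensity: str) -> float:
    """Apply the (0.5 * intensity_factor) rule from the temperature prompt."""
    if category == "TOO_COLD":
        return current_c + 0.5 * INTENSITY_FACTOR[intensity]
    if category == "TOO_WARM":
        return current_c - 0.5 * INTENSITY_FACTOR[intensity]
    return current_c  # COMFORTABLE and UNRELATED leave the value unchanged.


def adjust_light(current_pct: float, category: str, intensity: str) -> float:
    """Apply the (5 * intensity_factor) rule from the light prompt."""
    if category == "TOO_DARK":
        return current_pct + 5 * INTENSITY_FACTOR[intensity]
    if category == "TOO_BRIGHT":
        return current_pct - 5 * INTENSITY_FACTOR[intensity]
    return current_pct  # COMFORTABLE and UNRELATED leave the value unchanged.


def temperature_is_valid(temp_c: float) -> bool:
    """Mirror the temperature guardrail: 18 to 24 degrees in steps of 0.5."""
    return 18 <= temp_c <= 24 and float(temp_c * 2).is_integer()


def light_is_valid(pct: float) -> bool:
    """Mirror the light guardrail: between 0% and 100%."""
    return 0 <= pct <= 100
```

For example, `adjust_temperature(23.0, "TOO_WARM", "Severe")` gives 21.5, which passes the temperature guardrail, while a model answer of 21.3 would fail it and trigger a retry.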

Future Directions

In a way, this project is essentially a low-level language for Agents. Frameworks like LangChain or AutoGPT are great for quickly building agents and applications, but they lack the flexibility and control that this language can provide.

The difference is comparable to machine code versus a web framework. Here we have a fundamental execution engine or model that has the following distinct advantages:

Abstraction: We operate on atomic instructions. If you want to move data, you use MV. If you want to check a condition, you EVAL then BEQ. There is no black box logic.
Context/Memory Management: We have context registers which allow for fine-grained control over what information is relevant for each instruction.
Control Flow: The LLM only handles data processing and generation, while the control flow is handled by the assembler. The decision of where to go next is handled by branching. This is more robust since the logic of the program is not inside a prompt.
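
The control-flow point can be illustrated outside the language. The Python sketch below mirrors the retry pattern from the example program: the model only produces data, while an ordinary loop decides what happens next. The `generate` and `evaluate` callables are hypothetical stand-ins for INF and EVAL, and the function name is an assumption for illustration.

```python
from typing import Callable, Optional


def infer_with_guardrail(
    generate: Callable[[], str],
    evaluate: Callable[[str], int],
    attempts: int = 3,
) -> Optional[str]:
    """Return the first candidate that passes the guardrail, or None on abort."""
    for _ in range(attempts):
        candidate = generate()        # INF: the model produces a value.
        if evaluate(candidate) == 1:  # EVAL: 1 means yes, 0 means no.
            return candidate          # Guardrail passed, continue the program.
        # Guardrail failed: loop again, like BEQ branching back to a retry label.
    return None                       # Retry budget exhausted, like an ABORT label.
```

Because the loop and the abort path live in ordinary code, a bad model answer can only cause another attempt or a clean abort. It cannot change where the program goes next.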

A dream state for this project would be to have other frameworks and languages compile down to this language to provide a transparent, auditable sequence of operations that can be optimised for cost and speed.

Registers

There are 33 general-purpose registers, named X0 to X32. These registers can hold text and positive numbers (support for images and audio is in progress).

Similarly, there are 33 context registers, named C0 to C32. Each one holds a context stack: a FILO (First In, Last Out) sequence of messages that certain instructions use to maintain context.

Context Stack

The context stack is a FILO (First In, Last Out) structure that holds a sequence of messages that certain instructions use to maintain context. When you push a register onto the context stack, its content is added to the bottom of the stack as a message. When you pop from the context stack, the bottom message is removed and stored in a register. The context stack can be refined during the lifetime of the program, which lets you retain relevant information while discarding irrelevant details.

The instructions MVC, PSH, POP, and DRP are used to manage the context stack. INF generates a model response to a prompt, and EVAL takes the question from the source register and evaluates it as a boolean question. Both of these instructions use the context stack as the preceding conversation history. This means that you can refine and manage the context stack to improve the results of the INF and EVAL instructions, which is especially important when working with smaller models that have less attention capacity.
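
To make the stack behaviour concrete, here is a minimal Python sketch of a context register. It is not the project's implementation. It assumes that the "bottom" of the stack is the most recently pushed message (the last line of the conversation), and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


@dataclass
class ContextStack:
    """Illustrative model of one context register (for example C10)."""

    messages: List[Message] = field(default_factory=list)

    def push(self, content: str, role: str = "user") -> None:
        """PSH: add a message as the newest entry."""
        self.messages.append((role, content))

    def pop(self) -> str:
        """POP: remove the newest message and return its content."""
        _, content = self.messages.pop()
        return content

    def drop(self) -> None:
        """DRP: remove the newest message without keeping it."""
        self.messages.pop()

    def copy_from(self, other: "ContextStack") -> None:
        """MVC: replace this stack with a copy of another stack."""
        self.messages = list(other.messages)
```

Copying from a stack that has never been pushed to gives an empty stack, which is how the example program uses `MVC C10, C0` to clear C10.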

Instruction Terminology

rd - destination general-purpose register
rs - source general-purpose register
rdc - destination context register
rsc - source context register
imm - number value
str - string value
label_name - a label used for branching

Instruction Set

The instruction set is loosely inspired by RISC-V assembly language:

| Instruction | Description | Use |
| --- | --- | --- |
| LS | Load string into `rd` | `ls rd, str` |
| LI | Load immediate into `rd` | `li rd, imm` |
| LC | Load the content of the file at path `str` into `rd` | `lc rd, str` |
| MV | Copy `rs` into `rd` | `mv rd, rs` |
| MVC | Copy `rsc` into `rdc` | `mvc rdc, rsc` |
| BEQ | Go to label if `rs1` = `rs2` | `beq rs1, rs2, label_name` |
| BLT | Go to label if `rs1` < `rs2` | `blt rs1, rs2, label_name` |
| BLE | Go to label if `rs1` <= `rs2` | `ble rs1, rs2, label_name` |
| BGT | Go to label if `rs1` > `rs2` | `bgt rs1, rs2, label_name` |
| BGE | Go to label if `rs1` >= `rs2` | `bge rs1, rs2, label_name` |
| PSH | Push `rs` onto the context stack `rdc` with a role | `psh rdc, rs, "user"\|"assistant"` |
| POP | Pop the bottom of the context stack `rsc` into `rd` | `pop rd, rsc` |
| DRP | Drop the bottom of the context stack `rsc` | `drp rsc` |
| INF | Use `rs` as the next message and store the response in `rd` using context register `rsc` | `inf rd, rs, rsc` |
| EVAL | Boolean evaluation of the question `rs` and store the response in `rd` (0 = false/no, 1 = true/yes) using context register `rsc` | `eval rd, rs, rsc` |
| SIM | Cosine similarity between `rs` and `rs` and store the result in `rd` (0 - 100) | `sim rd, rs` |
| LABEL | Define a label. Required for branching instructions | `label_name:` |
| PUT | Print the value of `rs` | `put rs` |
| PLN | Print the value of `rs` followed by a newline | `pln rs` |
| PCX | Print the content of the context register `rsc` | `pcx rsc` |
| SUBI | Decrement the value in `rd` by `imm` | `subi rd, imm` |
| EXIT | Exit the program | `exit` |
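
SIM reports a cosine similarity on a 0 to 100 scale. How the project maps the raw cosine value onto that range is not documented here, so the Python sketch below is only one plausible reading: it assumes the two inputs are already embedding vectors, clamps negative similarities to 0, and rounds to the nearest integer. The function name is hypothetical.

```python
import math
from typing import Sequence


def similarity_score(a: Sequence[float], b: Sequence[float]) -> int:
    """Cosine similarity of two embedding vectors, mapped to an integer 0-100."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 0  # Assumption: a zero vector counts as no similarity.
    cosine = dot / norm
    # Assumption: negative similarities are clamped to 0 before scaling.
    return round(max(0.0, min(1.0, cosine)) * 100)
```

A score in this form can be compared against a threshold loaded with LI and used with the branch instructions like any other number.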

Smaller Models

A pain point of working with smaller models (below 2.6B parameters) is that the attention heads are simply not deep enough to map complex relationships between words. They function much closer to advanced autocomplete engines that look for patterns in the input text. Certain words or phrases can steer the outcome more than others, which means that the model might completely ignore some words or phrases.

Keep in mind that the smaller the model you choose, the more precise you need to be with your instructions and guardrails. They lack reasoning capabilities, but are good at pattern matching at speed.

Decompose your reasoning instructions into small simple explicit sequential steps.
KISS (Keep It Simple, Stupid).

Some tips for working with smaller models:

Always tell it exactly what to do, not what not to do.
Don't leave any room for interpretation.
Keep the instructions and guardrails as simple and straightforward as possible. Keep it strict, and structured like User_State: Happy \n Room_State: Cold.
Keep the context stack as clean and relevant as possible. Don't push anything that is not directly relevant to the current instruction.
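
One way to keep pushed messages strict and structured is to render state through a single helper rather than writing free-form sentences. The Python sketch below is illustrative only: the helper name and the key normalisation are assumptions, and it simply produces the `Key: Value` line format mentioned above.

```python
from typing import Mapping


def format_state(fields: Mapping[str, str]) -> str:
    """Render key/value pairs as strict 'Key: Value' lines, one pair per line."""
    lines = []
    for key, value in fields.items():
        # Normalise "user state" to "User_State" so every key looks the same.
        label = "_".join(word.capitalize() for word in key.split())
        lines.append(f"{label}: {value}")
    return "\n".join(lines)
```

Calling `format_state({"user state": "Happy", "room state": "Cold"})` returns the two lines `User_State: Happy` and `Room_State: Cold`, ready to be loaded into a register and pushed onto a context stack.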

Quick Start

Clone the repository:

```
git clone https://github.com/HuyNguyenAu/language_processor_unit.git
cd language_processor_unit
```

Install llama.cpp.
Download LFM2 2.6B. LFM2 8B-A1B is also a very good option if you have the hardware for it.
Download Qwen3 Embedding 0.6B.

Create the .env file in the root directory with the following content:

```
# File name of the text model in the models directory.
TEXT_MODEL="LFM2-2.6B-Q5_K_M"

# File name of the embedding model in the models directory.
EMBEDDING_MODEL="Qwen3-Embedding-0.6B-Q4_1-imat"

# When true, output byte code of built assembly file.
DEBUG_BUILD=false

# When true, output executed instructions and their results.
DEBUG_RUN=false

# When true, output chat interactions with the language model.
DEBUG_CHAT=false

# LFM2-2.6B recommended parameters.
TEXT_MODEL_TEMPERATURE=0.3
TEXT_MODEL_MIN_P=0.15
TEXT_MODEL_REPEAT_PENALTY=1.05
```

With Embeddings Model (Recommended)

Start the llama.cpp server. Make sure to specify the --embeddings flag and the correct pooling strategy:
```
./llama-server -np 1 --embeddings --pooling mean --models-dir C:\llama\models
```

Without Embeddings Model (Faster)

Start the llama.cpp server. This will use the text model for both text generation and embeddings, which is faster but less accurate for embeddings. Make sure to specify the --embeddings flag and the correct pooling strategy:
```
./llama-server -np 1 --embeddings --pooling mean -m C:\llama\models\LFM2.5-1.2B-Instruct-Q8_0.gguf
```

Why Use LFM2 models?

LFM2 models are very fast for their size and capable enough for general-purpose tasks, with decent knowledge and reasoning capabilities. Here we are more concerned with the speed of the model because the LPU (language processor unit) is designed to work with smaller models that can run on consumer hardware.

Run The Example Program

Build the example program:

```
cargo run build examples/room-comfort.aasm
```

Run the example program:
```
cargo run run build/room-comfort.lpu
```

Acknowledgements

This project was inspired by the following works:

Crafting Interpreters by Bob Nystrom. The structure and design of the assembler and processor follows a similar approach to the one described in this book.
Andrej Karpathy's LLM OS and Software 2.0 ideas.