This is another post on DeepClause, my current side project. If you want to know more about it, please take a look at my earlier post.
At its core, DeepClause is a neurosymbolic AI system that consists of two components: a domain-specific language built on top of Prolog (the DeepClause Meta Language, or “DML”) and a runtime engine that includes a meta-interpreter written in SWI-Prolog. I built it mostly to experiment with all the possible combinations of classical good old-fashioned AI (logic programming, constraint solving, …) and modern LLMs, hoping to address the shortcomings of both. It’s also been a lot of fun to build: everything runs on the WASM build of SWI-Prolog, which led me to experiment with lots of other fascinating things like coroutining in Prolog and Prolog/JavaScript interop.
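For readers unfamiliar with the term: a meta-interpreter is simply a Prolog program that executes Prolog goals itself, which makes it easy to intercept and extend execution. The classic textbook version (just the standard “vanilla” sketch, not DeepClause’s actual interpreter) looks like this:

```prolog
% Classic "vanilla" Prolog meta-interpreter: solve/1 executes a goal by
% decomposing conjunctions and resolving user-defined predicates against
% their clause bodies. A system like DeepClause would add hooks here,
% e.g. to dispatch LLM calls or yield control back to JavaScript.
solve(true) :- !.
solve((A, B)) :- !, solve(A), solve(B).
solve(Goal) :- predicate_property(Goal, built_in), !, call(Goal).
solve(Goal) :- clause(Goal, Body), solve(Body).
```

Because every goal passes through solve/1, this is the natural place to add behaviour like tool dispatch, logging, or suspension, without touching the object-level program at all.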
What’s great about DML is that it gives us a relatively concise way to encode agentic workflows or even implement fully fledged autonomous agents. For example, a basic search-and-extract workflow would look like this:
agent_main :-
    % Search the web
    tool(web_search("latest AI breakthroughs 2024"), Results),
    % Extract structured data using LLM
    extract_breakthroughs(Results, Breakthroughs),
    % Present findings
    format_report(Breakthroughs, Report),
    yield(Report).

% @-predicate: LLM-powered function
extract_breakthroughs(SearchResults, BreakthroughsList) :-
    @("From the SearchResults, extract a list of major AI breakthroughs.
Each item should include: technology name, organization, and key innovation.
Return as a Prolog list of structured terms.").

% Standard Prolog predicate
format_report([], "No breakthroughs found.").
format_report([H|T], Report) :-
    format_report(T, RestReport),
    format(string(Report), "• ~w\n~w", [H, RestReport]).

To build something more flashy (and unpredictable) we can use the basic DML primitives to create an actual agent, i.e. a loop around tool calls. The core of such an agent could, for example, look like this:
...
% Recursive step: Think, Act, Observe, and loop
react_loop(Task, History, Iter, Max) :-
    % 1. THINK: Generate the next thought and action based on the current state
    format_history_for_prompt(History, FormattedHistory),
    think(Task, FormattedHistory, Thought, Action),
    log(Thought),
    log("Action: {Action}"),
    % 2. ACT: Check if the action is to finish or to use a tool
    (   Action = finish(FinalAnswer) ->
        log("Task finished. Presenting final answer."),
        end_thinking,
        chat("What is the final answer? ...")
    ;   Action = tool_call(Tool),
        log("Executing ~w", [Tool]),
        execute_action(Action, Observation),
        log("Observation: ~w", [Observation]),
        % 3. Add the new step to history and continue the loop
        NewHistoryItem = step(Iter, Thought, Action, Observation),
        append(History, [NewHistoryItem], NewHistory),
        NextIter is Iter + 1,
        react_loop(Task, NewHistory, NextIter, Max)
    ).
...

So, let’s see how far we can take this and create our own coding agent implemented in DML (since that’s all the rage these days). And, because nobody wants to write code by hand anymore, let’s also use this as an experiment to see how well Opus 4.5 deals with a domain-specific language.
So, we first feed the DeepClause README and source code into Opus. Then, since I just happened to read the paper “Recursive Language Models” (the title sounds fancier than what the paper is actually about), we add it to the context as well, and finally ask Opus to create a coding agent following DML principles and the techniques presented in the aforementioned paper. And voilà: after a few minutes of “work”, we end up with about 500 lines of DML code.
After fixing a few small syntax errors (unsurprisingly, even Opus still has some issues with syntax it hasn’t seen before) and going back and forth a little to tweak our coding agent, we can finally take it for a test drive in the DeepClause CLI app:
Note: As SOTA models are quite expensive, the demo above uses Gemini 2.5 Flash as the underlying model. Also, the GNOME screen recorder seems to cause some weird artifacts at the end…
Anyway, the resulting code is quite beautiful and a testament to what LLMs are already capable of when used as coding agents. Instead of directly calling any tools, the core agent prompt loop generates a list of Prolog terms:
...
% LLM generates a clause (list of goals to execute)
llm_generate_clause(Goal, StateDesc, Clause) :-
    clear_memory,
    system("You are an RLM (Recursive Language Model) coding agent. You generate Prolog clauses to understand, plan, and modify code.
## WORKFLOW PHASES
**Phase 1: UNDERSTAND** - Explore the codebase before making changes
**Phase 2: PLAN** - Create a plan of what to change
**Phase 3: EXECUTE** - Apply the planned changes
Always follow this workflow: understand -> plan -> execute
## AVAILABLE GOALS
**Exploration Actions (UNDERSTAND phase):**
- list_files(Path) - List files in directory (e.g., list_files(\".\") or list_files(\"src\"))
- read_file(Path) - Read a file (e.g., read_file(\"README.md\"))
- search_pattern(Pattern) - Grep search (e.g., search_pattern(\"authenticate\"))
- analyze_structure - Overview of all code files
**Analysis Actions:**
- analyze_observations(IdList, Question) - Analyze past observations by ID (e.g., analyze_observations([1,3], \"How do these relate?\"))
- analyze_chunk(Content, Question) - Analyze inline content
...

These terms are then executed directly by a meta-interpreter-like structure in the DML code. Some terms map directly onto a tool call, while others trigger sub-LLM calls or update the internal state of the agent, which keeps track of all the observations, edits, and so on.
...
execute_rlm_action(analyze_structure, State, Obs) :-
    get_state(directory, State, Dir),
    format(string(Cmd), "find ~w -type f \\( -name '*.py' -o -name '*.js' -o -name '*.ts' -o -name '*.pl' -o -name '*.dml' \\) 2>/dev/null | grep -v node_modules | head -100", [Dir]),
    vm_run(Cmd, Result),
    (   Result \= "" ->
        format(string(Obs), "=== Codebase Structure ===\n~w", [Result])
    ;   Obs = "=== Codebase Structure ===\nNo code files found."
    ).

execute_rlm_action(analyze_chunk(Content, Question), _, Obs) :-
    analyze_chunk_llm(Content, Question, Analysis),
    format(string(Obs), "=== Analysis ===\nQ: ~w\n\n~w", [Question, Analysis]).

execute_rlm_action(analyze_observations(ObsIds, Question), State, Obs) :-
    get_state(observations, State, Observations),
    collect_obs_by_ids(Observations, ObsIds, Collected),
    (   Collected \= "" ->
        analyze_chunk_llm(Collected, Question, Analysis),
        format(string(Obs), "=== Analysis of Obs ~w ===\nQ: ~w\n\n~w", [ObsIds, Question, Analysis])
    ;   format(string(Obs), "=== Error ===\nNo observations found for IDs: ~w", [ObsIds])
    ).
...

So what do we learn from this? Well, at least a few things:
Opus 4.5 is really good…
My DML implementation (about 90% of the core Prolog code was still written by hand) seems to work even for more complicated use cases.
Working with Prolog terms to describe sequences of actions feels somehow right.
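As a tiny illustration of why it feels right (my own sketch, not DeepClause code): because actions are plain terms, an action sequence is just a list, and an interpreter for it is an ordinary fold that threads state through. The action vocabulary below is a toy stand-in for real tool calls:

```prolog
% Hypothetical sketch: a plan is a list of action terms, and running it
% is a recursion over that list, threading the state through each step.
run_plan([], State, State).
run_plan([Action|Rest], State0, State) :-
    apply_action(Action, State0, State1),
    run_plan(Rest, State1, State).

% Toy action vocabulary, standing in for tool calls.
apply_action(add(N), S0, S) :- S is S0 + N.
apply_action(mul(N), S0, S) :- S is S0 * N.
```

The same term structure the interpreter executes can also be pattern-matched, validated, or rewritten before execution, which is exactly what makes LLM-generated plans easy to inspect.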
Should we henceforth build coding agents with Prolog/DML? Probably not. However, I feel that there should be a good niche for something like DML, for instance:
Instead of using markdown files to describe a specification, we can write (or ask an LLM to write) an “executable spec” in the form of DML code.
Or use DML to both encode and verify some formal version of a specification before we give the entire thing to an LLM for implementation. I am not Martin Fowler, so I don’t really know what such a formal spec should look like, but it’s certainly an intriguing option.
In the same spirit, another interesting use case along the lines of this paper would be to build specialized DML agents for formal verification inside a coding agent such as Claude or Codex.
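To make the “executable spec” idea a bit more concrete, here is a rough sketch of what I have in mind, using the DML primitives shown earlier (the predicate names and the http_post tool below are hypothetical, not an existing spec format):

```prolog
% Hypothetical executable spec: each requirement is a DML predicate that
% can be run against the implementation instead of being read as prose.
spec_login_rejects_bad_password :-
    tool(http_post("/login", [user("alice"), password("wrong")]), Response),
    check_status(Response, 401).

% @-predicate: let the LLM judge fuzzy requirements that are hard to
% encode as plain assertions.
check_error_message_is_helpful(Response) :-
    @("Does the Response contain an error message that tells the user
what went wrong, without leaking whether the account exists?
Answer with true or false.").
```

Hard requirements stay as ordinary Prolog goals that either succeed or fail, while the fuzzy ones are delegated to an LLM call, so the whole spec remains runnable end to end.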
If you liked this article, please subscribe and stay tuned for more experiments with LLMs, Prolog and WASM.