llama : add gpt-oss by ggerganov · Pull Request #15091 · ggml-org/llama.cpp


@nachoal it appears there's a lot of new stuff here. At least to me -- I haven't used the OpenAI API with OpenAI's own models before, only local models.

Like, there are two kinds of system prompts: a "system message" and a "developer message". There are also two types of tools: "builtin_tools" (the python and browser tools), which are referenced in the system message, and function tools, which are described in the developer message. There's a special format for describing the function tools, but I'm guessing MCP would work too.
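For reference, declaring a function tool through the OpenAI-style client would look something like this (the get_weather schema is just an illustration, not something from this PR; the chat template is then expected to render it into the developer message):

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    },
}]

# passed along with the usual request (client setup shown further down)
response = client.chat.completions.create(model="gpt-oss", messages=messages, tools=tools)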

The function tools are called in a separate "commentary" channel, distinct from the normal reply content (and from the "reasoning_content"), per the harmony response format.

So different types of output appear in different places in the chat completion. For example, instead of parsing <think></think> tags directly in the response as with some other models, you would find the reasoning content (in Python) with:

reasoning_text = response.choices[0].message.reasoning_content

where response is a ChatCompletion object that came from the usual OpenAI API call:

response = client.chat.completions.create( ... )
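Putting those two pieces together, here's a minimal end-to-end sketch (assuming a llama-server instance on localhost:8080 exposing the usual OpenAI-compatible endpoint; the model name and API key are placeholders):

from openai import OpenAI

# llama-server speaks the OpenAI protocol; the key is ignored for local use
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

response = client.chat.completions.create(
    model="gpt-oss",  # placeholder for whatever model the server has loaded
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)

print(response.choices[0].message.reasoning_content)  # the model's reasoning
print(response.choices[0].message.content)            # the final reply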

It looks like right now, when an assistant tries to use a tool, llama.cpp by default leaves the content string empty, and it's actually the reasoning_content that contains the call, appended to the end of the reasoning text in the following format (this is just a reference MCP fetch tool call):

<|start|>assistant<|channel|>commentary to=fetch json<|message|>{\"url\":\"https://www.github.com\",\"max_length\":5000}

The expected format is supposed to come after the reasoning, like this:

<|start|>assistant<|channel|>commentary to=functions.get_weather <|constrain|>json<|message|>{"location":"San Francisco"}<|call|>

So the output looks very close, but not exactly right from what I'm seeing: it's missing the trailing <|call|> token (and, comparing the two, the <|constrain|> marker before json).

I'm sure that in the near future a tool call will be fully extracted by llama.cpp and put in response.choices[0].message.tool_calls (or response.choices[0].message.function_call, or wherever it's supposed to go), but as of right now it isn't recognizing the commentary channel at all.
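Once that lands, consuming it should presumably look like the standard OpenAI tool-calling flow, roughly:

import json

message = response.choices[0].message
if message.tool_calls:  # should be populated once the commentary channel is parsed
    for call in message.tool_calls:
        name = call.function.name
        args = json.loads(call.function.arguments)
        print(f"model wants to call {name} with {args}")  # dispatch to your tool here
else:
    print(message.content)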

The Harmony docs also describe "Output by the model which can either be a tool call or a message output." So apparently you can get a message OR a tool call, but not both, which is why the content is blank when the model tries to use a tool.

The hacky temporary workaround for this bug, to maintain compatibility with other models, would be to come up with a regex you could use to pull the tool name and JSON arguments out of the reasoning_content text and substitute the resulting JSON as the reply text.
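Something along these lines, going off the raw text quoted above (untested sketch; the exact token layout coming out of llama.cpp may differ):

import json
import re

# matches e.g. <|start|>assistant<|channel|>commentary to=fetch json<|message|>{...}
TOOL_CALL_RE = re.compile(
    r"<\|start\|>assistant<\|channel\|>commentary to=(?P<name>[\w.]+)\s*"
    r"(?:<\|constrain\|>)?json<\|message\|>(?P<args>\{.*)",
    re.DOTALL,
)

def extract_tool_call(reasoning_text):
    """Pull (tool_name, arguments) out of reasoning_content, or return None."""
    m = TOOL_CALL_RE.search(reasoning_text or "")
    if m is None:
        return None
    # raw_decode tolerates trailing tokens like <|call|> after the JSON object
    args, _ = json.JSONDecoder().raw_decode(m.group("args"))
    return m.group("name"), args

If it finds a call, you'd substitute that back in as the reply text and strip it out of the reasoning.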

There's a note in this PR that the tool template stuff is a WIP and tool use is still to come, so I guess it may make the most sense to just wait for this to get fixed unless you're really itching to get tools working.

Anyone who knows more, please correct me -- I'm just figuring this out myself!