# Ollama

llmock implements Ollama's native `/api/chat`, `/api/generate`, and
`/api/tags` endpoints with NDJSON streaming, matching Ollama's wire format
including its key differences from OpenAI.
## Endpoints

| Method | Path | Description |
|---|---|---|
| POST | `/api/chat` | Chat completions (multi-turn, tool calls) |
| POST | `/api/generate` | Single-prompt text generation (no tool calls) |
| GET | `/api/tags` | List available models (derived from fixtures) |
## Key Differences from OpenAI

- **Defaults to streaming.** Ollama treats `stream` as `true` when absent, the opposite of OpenAI. Set `"stream": false` explicitly for non-streaming responses.
- **NDJSON, not SSE.** Streaming uses newline-delimited JSON, not Server-Sent Events.
- **Tool call arguments are objects.** Unlike OpenAI, which sends stringified JSON, Ollama sends parsed objects in `arguments`.
- **No tool call IDs.** Ollama tool calls have no `id` field.
- **Duration metadata.** Responses include `done_reason`, `total_duration`, `eval_count`, etc. on the final chunk; llmock sends zeroed values.
## Quick Start

```ts
import { LLMock } from "@copilotkit/llmock";

const mock = new LLMock();
mock.onMessage("hello", { content: "Hi from Ollama!" });
await mock.start();

// Point the Ollama SDK at llmock
const res = await fetch(`${mock.url}/api/chat`, {
  method: "POST",
  body: JSON.stringify({
    model: "llama3",
    messages: [{ role: "user", content: "hello" }],
    stream: false,
  }),
});
```
## Streaming Response Format (NDJSON)

When `stream` is `true` (the default), the response body is newline-delimited
JSON: each line is a complete JSON object.

```jsonl
{"model":"llama3","message":{"role":"assistant","content":"Hi"},"done":false}
{"model":"llama3","message":{"role":"assistant","content":" there"},"done":false}
{"model":"llama3","message":{"role":"assistant","content":""},"done":true,"done_reason":"stop","total_duration":0,"load_duration":0,"prompt_eval_count":0,"prompt_eval_duration":0,"eval_count":0,"eval_duration":0}
```
## Non-Streaming Response

```json
{
  "model": "llama3",
  "message": { "role": "assistant", "content": "Hi there!" },
  "done": true,
  "done_reason": "stop",
  "total_duration": 0,
  "load_duration": 0,
  "prompt_eval_count": 0,
  "prompt_eval_duration": 0,
  "eval_count": 0,
  "eval_duration": 0
}
```
## Tool Calls

Ollama sends tool call `arguments` as a parsed object, not a JSON string.
llmock automatically converts fixture `arguments` strings into objects for the
Ollama wire format.

```json
{
  "fixtures": [
    {
      "match": { "userMessage": "weather" },
      "response": {
        "toolCalls": [
          { "name": "get_weather", "arguments": "{\"city\":\"NYC\"}" }
        ]
      }
    }
  ]
}
```
The Ollama streaming response wraps the tool calls in a single chunk:

```jsonl
{"model":"llama3","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"get_weather","arguments":{"city":"NYC"}}}]},"done":false}
{"model":"llama3","message":{"role":"assistant","content":""},"done":true,"done_reason":"stop","total_duration":0,"load_duration":0,"prompt_eval_count":0,"prompt_eval_duration":0,"eval_count":0,"eval_duration":0}
```
## `/api/generate` Endpoint

The `/api/generate` endpoint takes a `prompt` string instead of a
`messages` array. The prompt is internally converted to a single user message
for fixture matching. Only text responses are supported (no tool calls).

Request:

```json
{
  "model": "llama3",
  "prompt": "Tell me a joke",
  "stream": false
}
```

Response:

```json
{
  "model": "llama3",
  "created_at": "2025-01-01T00:00:00.000Z",
  "response": "Why did the chicken cross the road?",
  "done": true,
  "done_reason": "stop",
  "total_duration": 0,
  "load_duration": 0,
  "prompt_eval_count": 0,
  "prompt_eval_duration": 0,
  "eval_count": 0,
  "eval_duration": 0,
  "context": []
}
```
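The prompt-to-message conversion is straightforward; a sketch of the idea (hypothetical helper name, shapes reduced to the fields shown above):

```typescript
// A /api/generate request carries a single prompt string.
interface GenerateRequest {
  model: string;
  prompt: string;
  stream?: boolean;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical sketch: wrap the prompt as one user message so the same
// fixture matching used by /api/chat applies to /api/generate.
function promptToMessages(req: GenerateRequest): ChatMessage[] {
  return [{ role: "user", content: req.prompt }];
}
```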
## `/api/tags` Endpoint

`GET /api/tags` returns a list of available models, derived from the
`model` fields across all loaded fixtures. This lets Ollama clients discover
which models the mock server supports.
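The derivation amounts to de-duplicating fixture model names. A sketch under the assumption that the response follows Ollama's `{ "models": [{ "name": ... }] }` tags shape (richer per-model fields like size and digest omitted; `listModels` is a hypothetical name):

```typescript
// Hypothetical sketch: build the /api/tags payload from the model names
// found in loaded fixtures, de-duplicated in first-seen order.
function listModels(fixtureModels: string[]): { models: { name: string }[] } {
  const unique = [...new Set(fixtureModels)]; // Set preserves insertion order
  return { models: unique.map((name) => ({ name })) };
}
```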
## Request Translation

llmock internally translates Ollama requests to a unified
`ChatCompletionRequest` format for fixture matching. The
`ollamaToCompletionRequest()` function maps Ollama's
`options.temperature` to `temperature` and
`options.num_predict` to `max_tokens`, so the same fixtures work
across all providers.