# llmock Documentation
llmock is a deterministic mock LLM server for testing. It runs a real HTTP server that any process on the machine can reach, serving fixture-driven responses in the authentic SSE format for OpenAI, Anthropic Claude, and Google Gemini APIs.
## Quick Start
```sh
# npm
npm install @copilotkit/llmock

# pnpm
pnpm add @copilotkit/llmock
```
```ts
import { LLMock } from "@copilotkit/llmock";
import { describe, it, expect, beforeAll, afterAll } from "vitest";

let mock: LLMock;

beforeAll(async () => {
  mock = new LLMock();
  await mock.start();
});

afterAll(async () => {
  await mock.stop();
});

it("returns a text response", async () => {
  mock.on({ userMessage: "hello" }, { content: "Hi there!" });

  const res = await fetch(`${mock.url}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-4",
      messages: [{ role: "user", content: "hello" }],
      stream: false,
    }),
  });

  const body = await res.json();
  expect(body.choices[0].message.content).toBe("Hi there!");
});
```
```sh
# Start the server with fixture files
npx llmock --fixtures ./fixtures --port 5555

# Point your app at it
export OPENAI_BASE_URL=http://localhost:5555/v1
export OPENAI_API_KEY=mock-key
```
## Supported Endpoints
| Endpoint | Provider | Transport |
|---|---|---|
| `POST /v1/chat/completions` | OpenAI | HTTP SSE / JSON |
| `POST /v1/responses` | OpenAI | HTTP SSE |
| `WS /v1/responses` | OpenAI | WebSocket |
| `WS /v1/realtime` | OpenAI | WebSocket |
| `POST /v1/messages` | Anthropic | HTTP SSE / JSON |
| `POST /v1beta/models/:model:*` | Google Gemini | HTTP SSE / JSON |
| `WS /ws/google.ai.generativelanguage.*` | Google Gemini Live | WebSocket |
| `POST /v1/embeddings` | OpenAI | JSON |
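On the SSE endpoints, events arrive as `data: <json>` lines in each provider's real wire format; OpenAI-style streams end with a `data: [DONE]` sentinel. As a rough illustration of what a client consumes from the OpenAI-compatible endpoints, here is a small self-contained helper that reassembles the assistant text from a raw SSE body. The chunk shape is the standard OpenAI `chat.completion.chunk` format; the helper itself is not part of llmock.

```typescript
// Collect the assistant text from a raw OpenAI-style SSE body.
// Each event is a "data: <json>" line; the stream ends with "data: [DONE]".
function collectSseText(raw: string): string {
  let text = "";
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;       // skip blank lines and comments
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break;                  // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    text += chunk.choices?.[0]?.delta?.content ?? ""; // deltas carry partial text
  }
  return text;
}

// Example: two chunks followed by the [DONE] sentinel.
const body = [
  'data: {"choices":[{"delta":{"content":"Hi "}}]}',
  'data: {"choices":[{"delta":{"content":"there!"}}]}',
  "data: [DONE]",
  "",
].join("\n\n");

console.log(collectSseText(body)); // "Hi there!"
```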
## Feature Pages
- **Chat Completions**: streaming and non-streaming text + tool call responses via SSE.
- **Responses API** (OpenAI): HTTP SSE and WebSocket transports for the Responses API.
- **Claude Messages** (Anthropic): Anthropic-format SSE streaming with content blocks.
- **Gemini** (Google): GenerateContent and StreamGenerateContent endpoints.
- **Embeddings** (new): OpenAI-compatible `/v1/embeddings` endpoint with fixture or auto-generated vectors.
- **Structured Output** (new): JSON mode and `response_format` matching for structured responses.
- **Sequential Responses** (new): stateful fixtures that return different responses on each call.
- **Fixtures** (core): JSON fixture file format, matching rules, and validation.
- **Error Injection** (core): one-shot errors, stream truncation, and disconnect simulation.
- **WebSocket APIs** (core): Realtime, Responses, and Gemini Live over WebSocket.
- **Docker & Helm** (new): container image and Kubernetes Helm chart deployment.
- **Drift Detection** (CI): three-way conformance testing against real APIs.
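One detail worth keeping in mind about error injection: queued errors are one-shot, so an injected failure affects exactly one request, after which the server returns to normal fixture behavior. These queue semantics can be pictured with a minimal self-contained sketch (an illustration of the contract only, not llmock's actual implementation):

```typescript
interface QueuedError {
  status: number;
  body?: unknown;
}

// Minimal FIFO of one-shot errors, mirroring nextRequestError()'s contract:
// each queued error is consumed by exactly one subsequent request.
class ErrorQueue {
  private queue: QueuedError[] = [];

  push(status: number, body?: unknown): void {
    this.queue.push({ status, body });
  }

  // Returns the error for this request, or undefined for normal handling.
  take(): QueuedError | undefined {
    return this.queue.shift();
  }
}

const errors = new ErrorQueue();
errors.push(500, { error: "injected" });

console.log(errors.take()?.status); // 500: the first request sees the error
console.log(errors.take());        // undefined: the next request is handled normally
```

This one-shot behavior is what makes retry logic convenient to test: queue a 500, let the client retry, and the second attempt hits the regular fixture.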
## API Reference

### `LLMock` class
| Method | Description |
|---|---|
| `new LLMock(opts?)` | Create an instance. Options: `port`, `host`, `latency`, `chunkSize`, `logLevel`. |
| `start()` | Start the HTTP server. Returns the base URL. |
| `stop()` | Stop the server. |
| `on(match, response, opts?)` | Add a fixture with match criteria and a response. |
| `onMessage(pattern, response)` | Shorthand: match on `userMessage`. |
| `onToolCall(name, response)` | Shorthand: match on `toolName`. |
| `onEmbedding(pattern, response)` | Shorthand: match on `inputText` (embeddings). |
| `onJsonOutput(pattern, json)` | Shorthand: match on `userMessage` with `responseFormat` of `json_object`. |
| `onToolResult(id, response)` | Shorthand: match on `toolCallId`. |
| `nextRequestError(status, body?)` | Queue a one-shot error for the next request. |
| `addFixture(fixture)` | Add a raw `Fixture` object. |
| `loadFixtureFile(path)` | Load fixtures from a JSON file. |
| `loadFixtureDir(path)` | Load all fixture JSON files from a directory. |
| `reset()` | Clear all fixtures and journal entries. |
| `getRequests()` | Get all journal entries. |
| `getLastRequest()` | Get the most recent journal entry. |
| `url` / `port` | The server's base URL and port. |
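The `match` argument to `on()` pairs request properties (`userMessage`, `toolName`, and so on) with canned responses; the authoritative matching rules live on the Fixtures page. As a rough mental model only, string criteria can be thought of as exact comparisons and regular expressions as pattern tests against the user message. The helper below is a hypothetical sketch of that idea, not llmock's documented algorithm:

```typescript
type Matcher = string | RegExp;

// Hypothetical sketch of userMessage matching: exact comparison for strings,
// RegExp.test for patterns. See the Fixtures page for llmock's real rules.
function matchesUserMessage(criterion: Matcher, userMessage: string): boolean {
  return typeof criterion === "string"
    ? criterion === userMessage
    : criterion.test(userMessage);
}

console.log(matchesUserMessage("hello", "hello"));       // true: exact match
console.log(matchesUserMessage(/^hel/, "hello"));        // true: pattern match
console.log(matchesUserMessage("hello", "hello there")); // false: not an exact match
```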