# llmock Documentation
llmock is a deterministic mock LLM server for testing. It runs a real HTTP server that any process on the machine can reach, serving fixture-driven responses in the authentic SSE format for OpenAI, Anthropic Claude, and Google Gemini APIs.
## Quick Start
```sh
# npm
npm install @copilotkit/llmock

# pnpm
pnpm add @copilotkit/llmock
```
```ts
import { LLMock } from "@copilotkit/llmock";
import { describe, it, expect, beforeAll, afterAll } from "vitest";

let mock: LLMock;

beforeAll(async () => {
  mock = new LLMock();
  await mock.start();
});

afterAll(async () => {
  await mock.stop();
});

it("returns a text response", async () => {
  mock.on({ userMessage: "hello" }, { content: "Hi there!" });

  const res = await fetch(`${mock.url}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-4",
      messages: [{ role: "user", content: "hello" }],
      stream: false,
    }),
  });

  const body = await res.json();
  expect(body.choices[0].message.content).toBe("Hi there!");
});
```
```sh
# Start the server with fixture files
npx llmock --fixtures ./fixtures --port 5555

# Point your app at it
export OPENAI_BASE_URL=http://localhost:5555/v1
export OPENAI_API_KEY=mock-key
```
## Supported Endpoints
| Endpoint | Provider | Transport |
|---|---|---|
| `POST /v1/chat/completions` | OpenAI | HTTP SSE / JSON |
| `POST /v1/responses` | OpenAI | HTTP SSE |
| `WS /v1/responses` | OpenAI | WebSocket |
| `WS /v1/realtime` | OpenAI | WebSocket |
| `POST /v1/messages` | Anthropic | HTTP SSE / JSON |
| `POST /v1beta/models/:model:*` | Google Gemini | HTTP SSE / JSON |
| `WS /ws/google.ai.generativelanguage.*` | Google Gemini Live | WebSocket |
| `POST /v1/embeddings` | OpenAI | JSON |
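On the SSE endpoints, events arrive as `data: <json>` lines in each provider's real wire format; OpenAI-style streams end with a `data: [DONE]` sentinel. As a rough illustration of what a client consumes from the OpenAI-compatible endpoints, here is a small self-contained helper that reassembles the assistant text from a raw SSE body. The chunk shape is the standard OpenAI `chat.completion.chunk` format; the helper itself is not part of llmock.

```typescript
// Collect the assistant text from a raw OpenAI-style SSE body.
// Each event is a "data: <json>" line; the stream ends with "data: [DONE]".
function collectSseText(raw: string): string {
  let text = "";
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;       // skip blank lines and comments
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break;                  // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    text += chunk.choices?.[0]?.delta?.content ?? ""; // deltas carry partial text
  }
  return text;
}

// Example: two chunks followed by the [DONE] sentinel.
const body = [
  'data: {"choices":[{"delta":{"content":"Hi "}}]}',
  'data: {"choices":[{"delta":{"content":"there!"}}]}',
  "data: [DONE]",
  "",
].join("\n\n");

console.log(collectSseText(body)); // "Hi there!"
```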
## Feature Pages
- **Chat Completions**: streaming and non-streaming text + tool call responses via SSE.
- **Responses API** (OpenAI): HTTP SSE and WebSocket transports for the Responses API.
- **Claude Messages** (Anthropic): Anthropic-format SSE streaming with content blocks.
- **Gemini** (Google): GenerateContent and StreamGenerateContent endpoints.
- **Embeddings** (new): OpenAI-compatible `/v1/embeddings` endpoint with fixture or auto-generated vectors.
- **Structured Output** (new): JSON mode and `response_format` matching for structured responses.
- **Sequential Responses** (new): stateful fixtures that return different responses on each call.
- **Fixtures** (core): JSON fixture file format, matching rules, and validation.
- **Error Injection** (core): one-shot errors, stream truncation, and disconnect simulation.
- **WebSocket APIs** (core): Realtime, Responses, and Gemini Live over WebSocket.
- **Docker & Helm** (new): container image and Kubernetes Helm chart deployment.
- **Drift Detection** (CI): three-way conformance testing against real APIs.
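One detail worth keeping in mind about error injection: queued errors are one-shot, so an injected failure affects exactly one request, after which the server returns to normal fixture behavior. These queue semantics can be pictured with a minimal self-contained sketch (an illustration of the contract only, not llmock's actual implementation):

```typescript
interface QueuedError {
  status: number;
  body?: unknown;
}

// Minimal FIFO of one-shot errors, mirroring nextRequestError()'s contract:
// each queued error is consumed by exactly one subsequent request.
class ErrorQueue {
  private queue: QueuedError[] = [];

  push(status: number, body?: unknown): void {
    this.queue.push({ status, body });
  }

  // Returns the error for this request, or undefined for normal handling.
  take(): QueuedError | undefined {
    return this.queue.shift();
  }
}

const errors = new ErrorQueue();
errors.push(500, { error: "injected" });

console.log(errors.take()?.status); // 500: the first request sees the error
console.log(errors.take());        // undefined: the next request is handled normally
```

This one-shot behavior is what makes retry logic convenient to test: queue a 500, let the client retry, and the second attempt hits the regular fixture.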
## API Reference

### `LLMock` class
| Method | Description |
|---|---|
| `new LLMock(opts?)` | Create an instance. Options: `port`, `host`, `latency`, `chunkSize`, `logLevel`. |
| `start()` | Start the HTTP server. Returns the base URL. |
| `stop()` | Stop the server. |
| `on(match, response, opts?)` | Add a fixture with match criteria and a response. |
| `onMessage(pattern, response)` | Shorthand: match on `userMessage`. |
| `onToolCall(name, response)` | Shorthand: match on `toolName`. |
| `onEmbedding(pattern, response)` | Shorthand: match on `inputText` (embeddings). |
| `onJsonOutput(pattern, json)` | Shorthand: match on `userMessage` with `responseFormat` of `json_object`. |
| `onToolResult(id, response)` | Shorthand: match on `toolCallId`. |
| `nextRequestError(status, body?)` | Queue a one-shot error for the next request. |
| `addFixture(fixture)` | Add a raw `Fixture` object. |
| `loadFixtureFile(path)` | Load fixtures from a JSON file. |
| `loadFixtureDir(path)` | Load all fixture JSON files from a directory. |
| `reset()` | Clear all fixtures and journal entries. |
| `getRequests()` | Get all journal entries. |
| `getLastRequest()` | Get the most recent journal entry. |
| `url` / `port` | The server's base URL and port. |
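The `match` argument to `on()` pairs request properties (`userMessage`, `toolName`, and so on) with canned responses; the authoritative matching rules live on the Fixtures page. As a rough mental model only, string criteria can be thought of as exact comparisons and regular expressions as pattern tests against the user message. The helper below is a hypothetical sketch of that idea, not llmock's documented algorithm:

```typescript
type Matcher = string | RegExp;

// Hypothetical sketch of userMessage matching: exact comparison for strings,
// RegExp.test for patterns. See the Fixtures page for llmock's real rules.
function matchesUserMessage(criterion: Matcher, userMessage: string): boolean {
  return typeof criterion === "string"
    ? criterion === userMessage
    : criterion.test(userMessage);
}

console.log(matchesUserMessage("hello", "hello"));       // true: exact match
console.log(matchesUserMessage(/^hel/, "hello"));        // true: pattern match
console.log(matchesUserMessage("hello", "hello there")); // false: not an exact match
```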