Zero dependencies · Node.js builtins only

Deterministic mock LLM server for testing

Real HTTP server. Real SSE streams. Fixture-driven responses. A multi-provider mock for OpenAI, Claude, and Gemini that any process on the machine can reach.

$ npm install @copilotkit/llmock
fixtures/chat.json
{
  "fixtures": [
    {
      "match": {
        "userMessage": "capital of France"
      },
      "response": {
        "content": "The capital of France is Paris."
      }
    }
  ]
}

Everything you need to test AI integrations

Built for E2E test suites where multiple processes — your app, agent workers, framework runtimes — all need to hit the same mock endpoint.

Real HTTP Server

Runs on an actual port. Any process on the machine can reach it — Next.js, Mastra, LangGraph, Agno, anything that speaks HTTP.

📡

Authentic SSE Streams

OpenAI, Claude, and Gemini APIs — authentic SSE format for each provider. Streaming and non-streaming modes.

📁

JSON Fixture Files

Define responses as JSON — one file per feature. Load a directory, load a file, or register fixtures programmatically.

🔧

Tool Call Support

Return tool calls with structured arguments. Match on tool names, tool result IDs, or write custom predicates.

💥

Error Injection

Queue one-shot errors — 429 rate limits, 503 outages, whatever. Fires once, then auto-removes itself.

📋

Request Journal

Every request recorded. Inspect messages, verify tool calls, assert on conversation history. HTTP and programmatic access.
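The SSE wire format mimicked for OpenAI Chat Completions can be sketched as below. The framing (`data: <json>` lines terminated by `data: [DONE]`) follows OpenAI's public streaming format; the helper itself is illustrative, not llmock's API.

```typescript
// Sketch: frame assistant text as OpenAI-style Chat Completions SSE chunks.
// The chunk size and helper name are illustrative assumptions.
function toChatCompletionSSE(content: string, chunkSize = 8): string {
  const frames: string[] = [];
  for (let i = 0; i < content.length; i += chunkSize) {
    const delta = {
      choices: [{ index: 0, delta: { content: content.slice(i, i + chunkSize) } }],
    };
    frames.push(`data: ${JSON.stringify(delta)}\n\n`);
  }
  // The [DONE] sentinel tells OpenAI-compatible clients the stream is over.
  frames.push("data: [DONE]\n\n");
  return frames.join("");
}
```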

Fixture-driven. Zero boilerplate.

Simple text responses

Match on the last user message — substring or regex. The fixture fires when it matches, streaming SSE chunks just like the real API.

  • First-match-wins routing
  • Substring and RegExp matching
  • Configurable chunk size and latency
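The routing behavior described above can be sketched as a first-match scan over the fixture list. This is an illustration of the documented behavior, not llmock's actual internals:

```typescript
// Sketch of first-match-wins routing with substring and RegExp matching.
type TextFixture = {
  match: { userMessage: string | RegExp };
  response: { content: string };
};

function routeFixture(
  fixtures: TextFixture[],
  lastUserMessage: string,
): TextFixture | undefined {
  // Fixtures are checked in registration order; the first match wins.
  return fixtures.find(({ match }) =>
    typeof match.userMessage === "string"
      ? lastUserMessage.includes(match.userMessage) // substring
      : match.userMessage.test(lastUserMessage),    // RegExp
  );
}
```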
fixtures/chat.json json
{
  "fixtures": [
    {
      "match": { "userMessage": "stock price of AAPL" },
      "response": {
        "content": "The current stock price of Apple Inc. (AAPL) is $150.25."
      }
    },
    {
      "match": { "userMessage": "capital of France" },
      "response": {
        "content": "The capital of France is Paris."
      }
    }
  ]
}
fixtures/tools.json json
{
  "fixtures": [
    {
      "match": { "userMessage": "one step with eggs" },
      "response": {
        "toolCalls": [{
          "name": "generate_task_steps",
          "arguments": "{\"steps\":[{\"description\":\"Crack eggs\"},{\"description\":\"Preheat oven\"}]}"
        }]
      }
    },
    {
      "match": { "userMessage": "background color to blue" },
      "response": {
        "toolCalls": [{
          "name": "change_background",
          "arguments": "{\"background\":\"blue\"}"
        }]
      }
    }
  ]
}

Tool call responses

Return structured tool calls that agent frameworks execute directly. Used in production E2E tests for CopilotKit, Mastra, and LangGraph integrations.

  • Tool calls with JSON arguments
  • Match on tool name or tool result ID
  • Multi-tool-call responses
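On the consuming side, a test might unpack a mocked tool call like this. The `arguments` field is a JSON-encoded string, matching the fixture format shown above; the types here are illustrative:

```typescript
// Sketch: consuming a mocked tool call in a test.
const toolCall = {
  name: "generate_task_steps",
  arguments:
    '{"steps":[{"description":"Crack eggs"},{"description":"Preheat oven"}]}',
};

// Arguments arrive as a JSON string, so parse before asserting on them.
const args = JSON.parse(toolCall.arguments) as {
  steps: { description: string }[];
};
// The agent framework would now execute the tool with these arguments.
```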

Predicate-based routing

When substring matching isn't enough, use predicates. Inspect the full request — system prompt flags, message history, model name, anything.

  • Inspect system prompt state flags
  • Route supervisor agents by conversation state
  • Combine with substring matching (AND logic)
e2e/mock-setup.ts ts
// Supervisor sees the same user message every time,
// but system prompt contains state flags
mock.addFixture({
  match: {
    predicate: (req) => {
      const sys = req.messages
        .find(m => m.role === "system");
      return sys?.content
        ?.includes("Flights found: false") ?? false;
    }
  },
  response: {
    toolCalls: [{
      name: "supervisor_response",
      arguments: '{"next_agent":"flights_agent"}'
    }]
  }
});
e2e/global-setup.ts ts
import { LLMock } from "@copilotkit/llmock";

const mock = new LLMock({ port: 5555 });

// Load JSON fixture files
mock.loadFixtureDir("./fixtures/openai");

// Catch-all for tool results
mock.addFixture({
  match: {
    predicate: (req) =>
      req.messages.at(-1)?.role === "tool"
  },
  response: { content: "Done!" }
});

const url = await mock.start();

// Every process on the machine can reach this
process.env.OPENAI_BASE_URL = `${url}/v1`;
process.env.OPENAI_API_KEY = "mock-key";

E2E global setup

Start the mock server once in Playwright's global setup. All child processes — Next.js, agent workers, CopilotKit runtime — inherit OPENAI_BASE_URL and hit the same server.

  • One server, many processes
  • JSON fixtures loaded from disk
  • Programmatic catch-alls for tool results
  • Universal fallback prevents 404 crashes
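The setup above slots into Playwright via its `globalSetup` config option. A minimal sketch (file paths and the dev-server port are assumptions):

```typescript
// playwright.config.ts — point Playwright at the setup file that starts llmock.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  globalSetup: "./e2e/global-setup.ts",
  webServer: {
    command: "npm run dev",
    url: "http://localhost:3000",
    // Child processes inherit these, so the app talks to llmock, not OpenAI.
    env: {
      OPENAI_BASE_URL: "http://localhost:5555/v1",
      OPENAI_API_KEY: "mock-key",
    },
  },
});
```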

llmock vs MSW

MSW is great for in-process API mocking. llmock is for when multiple processes need to hit the same LLM endpoint.

// MSW: only intercepts in the process that calls server.listen()
// llmock: real server on a real port — any process can reach it

Playwright test runner
  └─ controls browser → Next.js app (separate process)
                          └─ OPENAI_BASE_URL → llmock :5555
                              ├─ Mastra agent workers
                              ├─ LangGraph workers
                              └─ CopilotKit runtime
Capability                 | llmock                     | MSW
---------------------------|----------------------------|---------------------------------------
Cross-process interception | Real server ✓              | In-process only
Chat Completions SSE       | Built-in ✓                 | Manual — build data/[DONE] yourself
Responses API SSE          | Built-in ✓                 | Manual — MSW sse() uses wrong format
Claude Messages API SSE    | Built-in ✓                 | Manual — build event/data SSE yourself
Gemini streaming           | Built-in ✓                 | Manual — build data SSE yourself
Multi-provider support     | OpenAI + Claude + Gemini ✓ | Provider-agnostic (manual)
Fixture files (JSON)       | Yes ✓                      | No — handlers are code-only
Request journal            | Yes ✓                      | No — track manually
Non-streaming responses    | Yes ✓                      | Yes ✓
Error injection (one-shot) | Yes ✓                      | Yes (server.use)
CLI server                 | Yes ✓                      | No
Dependencies               | Zero                       | ~300KB