Zero dependencies · Node.js builtins only

Deterministic mock LLM server for testing

Real HTTP server. Real SSE streams. Fixture-driven responses. A multi-provider mock for OpenAI, Claude, and Gemini that any process on the machine can reach.

$ npm install @copilotkit/llmock
fixtures/chat.json
{
  "fixtures": [
    {
      "match": {
        "userMessage": "capital of France"
      },
      "response": {
        "content": "The capital of France is Paris."
      }
    }
  ]
}

Everything you need to test AI integrations

Built for E2E test suites where multiple processes — your app, agent workers, framework runtimes — all need to hit the same mock endpoint.

Real HTTP Server

Runs on an actual port. Any process on the machine can reach it — Next.js, Mastra, LangGraph, Agno, anything that speaks HTTP.

📡

Authentic SSE Streams

OpenAI, Claude, and Gemini APIs — authentic SSE format for each provider. Streaming and non-streaming modes.

📁

JSON Fixture Files

Define responses as JSON — one file per feature. Load a directory, load a file, or register fixtures programmatically.

🔧

Tool Call Support

Return tool calls with structured arguments. Match on tool names, tool result IDs, or write custom predicates.

💥

Error Injection

Queue one-shot errors — 429 rate limits, 503 outages, whatever. Fires once, then auto-removes itself.

📋

Request Journal

Every request recorded. Inspect messages, verify tool calls, assert on conversation history. HTTP and programmatic access.
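The SSE wire format mimicked for OpenAI Chat Completions can be sketched as below. The framing (`data: <json>` lines terminated by `data: [DONE]`) follows OpenAI's public streaming format; the helper itself is illustrative, not llmock's API.

```typescript
// Sketch: frame assistant text as OpenAI-style Chat Completions SSE chunks.
// The chunk size and helper name are illustrative assumptions.
function toChatCompletionSSE(content: string, chunkSize = 8): string {
  const frames: string[] = [];
  for (let i = 0; i < content.length; i += chunkSize) {
    const delta = {
      choices: [{ index: 0, delta: { content: content.slice(i, i + chunkSize) } }],
    };
    frames.push(`data: ${JSON.stringify(delta)}\n\n`);
  }
  // The [DONE] sentinel tells OpenAI-compatible clients the stream is over.
  frames.push("data: [DONE]\n\n");
  return frames.join("");
}
```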

Fixture-driven. Zero boilerplate.

Simple text responses

Match on the last user message — substring or regex. The fixture fires when it matches, streaming SSE chunks just like the real API.

  • First-match-wins routing
  • Substring and RegExp matching
  • Configurable chunk size and latency
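The routing behavior described above can be sketched as a first-match scan over the fixture list. This is an illustration of the documented behavior, not llmock's actual internals:

```typescript
// Sketch of first-match-wins routing with substring and RegExp matching.
type TextFixture = {
  match: { userMessage: string | RegExp };
  response: { content: string };
};

function routeFixture(
  fixtures: TextFixture[],
  lastUserMessage: string,
): TextFixture | undefined {
  // Fixtures are checked in registration order; the first match wins.
  return fixtures.find(({ match }) =>
    typeof match.userMessage === "string"
      ? lastUserMessage.includes(match.userMessage) // substring
      : match.userMessage.test(lastUserMessage),    // RegExp
  );
}
```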
fixtures/chat.json json
{
  "fixtures": [
    {
      "match": { "userMessage": "stock price of AAPL" },
      "response": {
        "content": "The current stock price of Apple Inc. (AAPL) is $150.25."
      }
    },
    {
      "match": { "userMessage": "capital of France" },
      "response": {
        "content": "The capital of France is Paris."
      }
    }
  ]
}
fixtures/tools.json json
{
  "fixtures": [
    {
      "match": { "userMessage": "one step with eggs" },
      "response": {
        "toolCalls": [{
          "name": "generate_task_steps",
          "arguments": "{\"steps\":[{\"description\":\"Crack eggs\"},{\"description\":\"Preheat oven\"}]}"
        }]
      }
    },
    {
      "match": { "userMessage": "background color to blue" },
      "response": {
        "toolCalls": [{
          "name": "change_background",
          "arguments": "{\"background\":\"blue\"}"
        }]
      }
    }
  ]
}

Tool call responses

Return structured tool calls that agent frameworks execute directly. Used in production E2E tests for CopilotKit, Mastra, and LangGraph integrations.

  • Tool calls with JSON arguments
  • Match on tool name or tool result ID
  • Multi-tool-call responses
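On the consuming side, a test might unpack a mocked tool call like this. The `arguments` field is a JSON-encoded string, matching the fixture format shown above; the types here are illustrative:

```typescript
// Sketch: consuming a mocked tool call in a test.
const toolCall = {
  name: "generate_task_steps",
  arguments:
    '{"steps":[{"description":"Crack eggs"},{"description":"Preheat oven"}]}',
};

// Arguments arrive as a JSON string, so parse before asserting on them.
const args = JSON.parse(toolCall.arguments) as {
  steps: { description: string }[];
};
// The agent framework would now execute the tool with these arguments.
```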

Predicate-based routing

When substring matching isn't enough, use predicates. Inspect the full request — system prompt flags, message history, model name, anything.

  • Inspect system prompt state flags
  • Route supervisor agents by conversation state
  • Combine with substring matching (AND logic)
e2e/mock-setup.ts ts
// Supervisor sees the same user message every time,
// but system prompt contains state flags
mock.addFixture({
  match: {
    predicate: (req) => {
      const sys = req.messages
        .find(m => m.role === "system");
      return sys?.content
        ?.includes("Flights found: false") ?? false;
    }
  },
  response: {
    toolCalls: [{
      name: "supervisor_response",
      arguments: '{"next_agent":"flights_agent"}'
    }]
  }
});
e2e/global-setup.ts ts
import { LLMock } from "@copilotkit/llmock";

const mock = new LLMock({ port: 5555 });

// Load JSON fixture files
mock.loadFixtureDir("./fixtures/openai");

// Catch-all for tool results
mock.addFixture({
  match: {
    predicate: (req) =>
      req.messages.at(-1)?.role === "tool"
  },
  response: { content: "Done!" }
});

const url = await mock.start();

// Every process on the machine can reach this
process.env.OPENAI_BASE_URL = `${url}/v1`;
process.env.OPENAI_API_KEY = "mock-key";

E2E global setup

Start the mock server once in Playwright's global setup. All child processes — Next.js, agent workers, CopilotKit runtime — inherit OPENAI_BASE_URL and hit the same server.

  • One server, many processes
  • JSON fixtures loaded from disk
  • Programmatic catch-alls for tool results
  • Universal fallback prevents 404 crashes
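The setup above slots into Playwright via its `globalSetup` config option. A minimal sketch (file paths and the dev-server port are assumptions):

```typescript
// playwright.config.ts — point Playwright at the setup file that starts llmock.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  globalSetup: "./e2e/global-setup.ts",
  webServer: {
    command: "npm run dev",
    url: "http://localhost:3000",
    // Child processes inherit these, so the app talks to llmock, not OpenAI.
    env: {
      OPENAI_BASE_URL: "http://localhost:5555/v1",
      OPENAI_API_KEY: "mock-key",
    },
  },
});
```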

llmock vs MSW

MSW is great for in-process API mocking. llmock is for when multiple processes need to hit the same LLM endpoint.

// MSW: only intercepts in the process that calls server.listen()
// llmock: real server on a real port — any process can reach it

Playwright test runner
  └─ controls browser → Next.js app (separate process)
                          └─ OPENAI_BASE_URL → llmock :5555
                              ├─ Mastra agent workers
                              ├─ LangGraph workers
                              └─ CopilotKit runtime
Capability                 | llmock                     | MSW
---------------------------|----------------------------|---------------------------------------
Cross-process interception | Real server ✓              | In-process only
Chat Completions SSE       | Built-in ✓                 | Manual — build data/[DONE] yourself
Responses API SSE          | Built-in ✓                 | Manual — MSW sse() uses wrong format
Claude Messages API SSE    | Built-in ✓                 | Manual — build event/data SSE yourself
Gemini streaming           | Built-in ✓                 | Manual — build data SSE yourself
Multi-provider support     | OpenAI + Claude + Gemini ✓ | Provider-agnostic (manual)
Fixture files (JSON)       | Yes ✓                      | No — handlers are code-only
Request journal            | Yes ✓                      | No — track manually
Non-streaming responses    | Yes ✓                      | Yes ✓
Error injection (one-shot) | Yes ✓                      | Yes (server.use)
CLI server                 | Yes ✓                      | No
Dependencies               | Zero                       | ~300KB