Zero dependencies · Node.js builtins only

Deterministic mock LLM server for testing

Real HTTP server. Real SSE streams. WebSocket APIs. Fixture-driven responses. A multi-provider mock for OpenAI, Claude, and Gemini that any process on the machine can reach.

$ npm install @copilotkit/llmock
fixtures/chat.json
{
  "fixtures": [
    {
      "match": {
        "userMessage": "capital of France"
      },
      "response": {
        "content": "The capital of France is Paris."
      }
    }
  ]
}

Stop paying for flaky tests

Tests that hit real LLM APIs — OpenAI, Gemini, Anthropic — cost money, time out, and produce non-deterministic results. llmock replaces those calls with immediate, deterministic responses from a real HTTP server any process on the machine can reach.

Real HTTP Server

Runs on an actual port. Any process on the machine can reach it — Next.js, Mastra, LangGraph, Agno, anything that speaks HTTP.

📡

Authentic SSE Streams

OpenAI, Claude, and Gemini APIs — authentic SSE format for each provider. Streaming and non-streaming modes.

📁

JSON Fixture Files

Define responses as JSON — one file per feature. Load a directory, load a file, or register fixtures programmatically.

🔧

Tool Call Support

Return tool calls with structured arguments. Match on tool names, tool result IDs, or write custom predicates.

💥

Error Injection

Queue one-shot errors — 429 rate limits, 503 outages, whatever. Fires once, then auto-removes itself.

📋

Request Journal

Every request recorded. Inspect messages, verify tool calls, assert on conversation history. HTTP and programmatic access.

🔌

WebSocket APIs

OpenAI Responses, OpenAI Realtime, and Gemini Live over WebSocket. Same fixtures, real RFC 6455 framing, zero dependencies. Text + tool calls.

🎛️

Streaming Physics

Simulate realistic streaming timing with TTFT, TPS, and jitter. Test loading states and streaming UX under real-world conditions.
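The streaming-physics knobs reduce to a simple emission schedule: first chunk after TTFT, then one chunk every 1/TPS seconds, each nudged by jitter. A minimal sketch with illustrative parameter names (not necessarily llmock's actual option names):

```typescript
// Sketch: compute chunk emission times (ms) from streaming-physics knobs.
// ttftMs, tps, and jitterMs are illustrative names, not llmock's API.
function emissionSchedule(
  tokenCount: number,
  ttftMs: number,                     // time to first token
  tps: number,                        // tokens per second after the first
  jitterMs = 0,                       // max random +/- wobble per token
  rand: () => number = Math.random,
): number[] {
  const interval = 1000 / tps;
  const times: number[] = [];
  for (let i = 0; i < tokenCount; i++) {
    const base = ttftMs + i * interval;
    const wobble = jitterMs === 0 ? 0 : (rand() * 2 - 1) * jitterMs;
    times.push(Math.max(0, base + wobble));
  }
  return times;
}
```

With jitter at zero the schedule is fully deterministic, which is what makes loading-state tests repeatable.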

Fixture-driven. Zero boilerplate.

Simple text responses

Match on the last user message — substring or regex. The fixture fires when it matches, streaming SSE chunks just like the real API.

  • First-match-wins routing
  • Substring and RegExp matching
  • Configurable chunk size and latency
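The routing above is plain first-match-wins over the last user message. A minimal sketch of the idea (the fixture shape mirrors the JSON files in this section; the matching logic is illustrative, not llmock's actual source):

```typescript
// Sketch: first-match-wins fixture routing on the last user message.
type Fixture = {
  match: { userMessage: string | RegExp };
  response: { content?: string };
};

function route(fixtures: Fixture[], lastUserMessage: string): Fixture | undefined {
  return fixtures.find(({ match }) =>
    typeof match.userMessage === "string"
      ? lastUserMessage.includes(match.userMessage)   // substring match
      : match.userMessage.test(lastUserMessage),      // RegExp match
  );
}
```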
fixtures/chat.json
{
  "fixtures": [
    {
      "match": { "userMessage": "stock price of AAPL" },
      "response": {
        "content": "The current stock price of Apple Inc. (AAPL) is $150.25."
      }
    },
    {
      "match": { "userMessage": "capital of France" },
      "response": {
        "content": "The capital of France is Paris."
      }
    }
  ]
}
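When a fixture matches, its content goes out as SSE chunks. A rough sketch of OpenAI-style chunk framing (a simplified field subset; the real wire format carries more metadata such as `id` and `model`):

```typescript
// Sketch: wrap a fixture's content in OpenAI-style chat.completion.chunk
// SSE events, ending with the [DONE] sentinel. Simplified field subset.
function toSSE(content: string, chunkSize = 8): string {
  const events: string[] = [];
  for (let i = 0; i < content.length; i += chunkSize) {
    const chunk = {
      choices: [{ index: 0, delta: { content: content.slice(i, i + chunkSize) } }],
    };
    events.push(`data: ${JSON.stringify(chunk)}\n\n`);
  }
  events.push("data: [DONE]\n\n");
  return events.join("");
}
```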
fixtures/tools.json
{
  "fixtures": [
    {
      "match": { "userMessage": "one step with eggs" },
      "response": {
        "toolCalls": [{
          "name": "generate_task_steps",
          "arguments": "{\"steps\":[{\"description\":\"Crack eggs\"},{\"description\":\"Preheat oven\"}]}"
        }]
      }
    },
    {
      "match": { "userMessage": "background color to blue" },
      "response": {
        "toolCalls": [{
          "name": "change_background",
          "arguments": "{\"background\":\"blue\"}"
        }]
      }
    }
  ]
}

Tool call responses

Return structured tool calls that agent frameworks execute directly. Used in production E2E tests for CopilotKit, Mastra, and LangGraph integrations.

  • Tool calls with JSON arguments
  • Match on tool name or tool result ID
  • Multi-tool-call responses
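Note that `arguments` in these fixtures is a JSON *string*, matching the real OpenAI wire format, so consumers parse it before executing the tool. A minimal sketch of the consumer side:

```typescript
// Sketch: an agent framework parsing the stringified `arguments` field
// of a mocked tool call before dispatching to the actual tool.
type ToolCall = { name: string; arguments: string };

function parseToolCall(call: ToolCall): { name: string; args: unknown } {
  return { name: call.name, args: JSON.parse(call.arguments) };
}
```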

Predicate-based routing

When substring matching isn't enough, use predicates. Inspect the full request — system prompt flags, message history, model name, anything.

  • Inspect system prompt state flags
  • Route supervisor agents by conversation state
  • Combine with substring matching (AND logic)
e2e/mock-setup.ts
// Supervisor sees the same user message every time,
// but system prompt contains state flags
mock.addFixture({
  match: {
    predicate: (req) => {
      const sys = req.messages
        .find(m => m.role === "system");
      return sys?.content
        ?.includes("Flights found: false");
    }
  },
  response: {
    toolCalls: [{
      name: "supervisor_response",
      arguments: '{"next_agent":"flights_agent"}'
    }]
  }
});
e2e/global-setup.ts
import { LLMock } from "@copilotkit/llmock";

const mock = new LLMock({ port: 5555 });

// Load JSON fixture files
mock.loadFixtureDir("./fixtures/openai");

// Catch-all for tool results
mock.addFixture({
  match: {
    predicate: (req) =>
      req.messages.at(-1)?.role === "tool"
  },
  response: { content: "Done!" }
});

const url = await mock.start();

// Every process on the machine can reach this
process.env.OPENAI_BASE_URL = `${url}/v1`;
process.env.OPENAI_API_KEY = "mock-key";

E2E global setup

Start the mock server once in Playwright's global setup. All child processes — Next.js, agent workers, CopilotKit runtime — inherit OPENAI_BASE_URL and hit the same server.

  • One server, many processes
  • JSON fixtures loaded from disk
  • Programmatic catch-alls for tool results
  • Universal fallback prevents 404 crashes

WebSocket APIs

Same fixtures work over WebSocket transport. OpenAI Responses, OpenAI Realtime, and Gemini Live — RFC 6455 framing with zero dependencies.

  • OpenAI Responses API over WebSocket
  • OpenAI Realtime API — text + tool calls
  • Gemini Live BidiGenerateContent (unverified — no text-capable model exists yet)
  • No audio/video — text and tool call paths only
OpenAI Realtime over WebSocket
// Connect to ws://localhost:5555/v1/realtime

// → Configure session:
{ "type": "session.update",
  "session": { "modalities": ["text"] } }

// → Add user message:
{ "type": "conversation.item.create",
  "item": { "type": "message",
    "role": "user",
    "content": [{ "type": "input_text",
      "text": "Hello" }] } }

// → Request response:
{ "type": "response.create" }

// ← Server streams back:
// {"type":"response.created", ...}
// {"type":"response.text.delta","delta":"Hi"}
// {"type":"response.text.delta","delta":" there!"}
// {"type":"response.text.done", ...}
// {"type":"response.done", ...}
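For a feel of what "real RFC 6455 framing, zero dependencies" involves, here is a sketch of encoding a server-to-client text frame with Node.js builtins only (7-bit and 16-bit length forms; an illustration, not llmock's actual implementation):

```typescript
// Sketch: encode a server-to-client WebSocket text frame per RFC 6455.
// No masking (servers must not mask), FIN bit set, opcode 0x1 (text).
function encodeTextFrame(text: string): Buffer {
  const payload = Buffer.from(text, "utf8");
  let header: Buffer;
  if (payload.length < 126) {
    header = Buffer.from([0x81, payload.length]);  // FIN|text, 7-bit length
  } else if (payload.length < 65536) {
    header = Buffer.alloc(4);
    header[0] = 0x81;
    header[1] = 126;                               // 16-bit length marker
    header.writeUInt16BE(payload.length, 2);
  } else {
    throw new Error("64-bit payload lengths omitted from this sketch");
  }
  return Buffer.concat([header, payload]);
}
```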

How llmock compares

llmock is purpose-built for LLM API testing. Here's how it stacks up against general-purpose and LLM-specific mocking tools.

// MSW: only intercepts in the process that calls server.listen()
// llmock: real server on a real port — any process can reach it

Playwright test runner
  └─ controls browser

Next.js app (separate process)
  └─ OPENAI_BASE_URL → llmock :5555
                         ├─ Mastra agent workers
                         ├─ LangGraph workers
                         └─ CopilotKit runtime
| Capability | llmock | MSW | VidaiMock | mock-llm | piyook/llm-mock |
| --- | --- | --- | --- | --- | --- |
| Cross-process interception | Real server | In-process only | Yes | Yes (Docker) | Yes |
| Chat Completions SSE | Built-in | Manual | Yes | Yes | No |
| Responses API SSE | Built-in | Manual | No | No | No |
| Claude Messages API | Built-in | Manual | Yes | No | No |
| Gemini streaming | Built-in | Manual | No | No | No |
| WebSocket APIs | Built-in | No | No | No | No |
| Multi-provider support | OpenAI + Claude + Gemini + compatible | Manual | OpenAI + Claude + Gemini + Bedrock | OpenAI only | OpenAI only |
| Embeddings API | Built-in | No | Yes | No | Yes |
| Structured output / JSON mode | Built-in | Manual | No | No | No |
| Sequential / stateful responses | Built-in | Manual | No | No | No |
| Fixture files | JSON | Code-only | Python config | YAML config | JSON templates |
| Programmatic API (test helpers) | Yes (TypeScript/JS) | Yes (TypeScript/JS) | Yes (Python) | No | No |
| Request journal | Yes | Manual | No | No | No |
| Error injection (one-shot) | Yes | Yes | Partial | No | No |
| Docker image | Yes | No | No | Yes | No |
| Helm chart | Yes | No | No | No | No |
| Drift detection | Yes | No | No | No | No |
| Azure OpenAI | Yes | Manual | Yes | No | No |
| AWS Bedrock | Yes (non-streaming) | Manual | Yes | No | No |
| CLI server | Yes | No | No | Yes | Yes |
| GET /v1/models | Yes | No | No | Yes | No |
| Dependencies | Zero | ~300KB | Python + deps | Docker required | Minimal |

Verified against real APIs. Every day.

A mock that doesn't match reality is worse than no mock — your tests pass, but production breaks. llmock runs three-way drift detection that compares SDK types, real API responses, and mock output to catch shape mismatches before you do.

Three pairwise checks: SDK = Real? · SDK = Mock? · Real = Mock?

SDK Types

What TypeScript types say the shape should be

Real API

What OpenAI, Claude, Gemini actually return

llmock

What the mock produces for the same request

Mock doesn't match real

llmock needs updating — test fails immediately. The SDK comparison tells us why it drifted.

Provider changed, SDK is behind

Early warning — the real API has new fields that neither the SDK nor llmock know about yet.

All three agree

No drift — the mock matches reality and the SDK types are current.

$ pnpm test:drift
[critical] LLMOCK DRIFT — field in SDK + real API but missing from mock
Path:    choices[].message.refusal
SDK:     null    Real: null    Mock: <absent>
[critical] TYPE MISMATCH — real API and mock disagree on type
Path:    content[].input
SDK:     object    Real: object    Mock: string
[warning] PROVIDER ADDED FIELD — in real API but not in SDK or mock
Path:    choices[].message.annotations
SDK:     <absent>    Real: array    Mock: <absent>
2 critical (test fails) · 1 warning (logged) · detected before any user reported it
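At its core, drift detection is a field-level diff across the three shapes. A simplified sketch of the idea for flat objects (llmock's real checker presumably walks nested paths; this is only an illustration):

```typescript
// Sketch: flag fields present in both the SDK types and the real API
// response but absent from the mock (critical drift), and fields only
// the real API has (provider-added, warning). Flat objects for brevity.
type Report = { critical: string[]; warning: string[] };

function diffShapes(
  sdk: Record<string, unknown>,
  real: Record<string, unknown>,
  mock: Record<string, unknown>,
): Report {
  const report: Report = { critical: [], warning: [] };
  for (const key of Object.keys(real)) {
    if (key in sdk && !(key in mock)) {
      report.critical.push(key);        // mock drifted behind SDK + real API
    } else if (!(key in sdk) && !(key in mock)) {
      report.warning.push(key);         // provider added a field; SDK is behind
    }
  }
  return report;
}
```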

Claude Code Integration

llmock ships with a Claude Code skill that teaches your AI assistant how to write fixtures correctly — match fields, response types, agent loop patterns, gotchas, and debugging techniques.

🔌

Plugin Install

/plugin marketplace add CopilotKit/llmock
/plugin install llmock@copilotkit-tools

Skill appears as /llmock:write-fixtures

📂

Local Plugin

claude --plugin-dir ./node_modules/@copilotkit/llmock

Same result, no marketplace needed

📁

Add Directory

claude --add-dir ./node_modules/@copilotkit/llmock

Skill appears as /write-fixtures for the session

📋

Copy to Project

cp node_modules/@copilotkit/llmock/.claude/commands/write-fixtures.md .claude/commands/

Permanent /write-fixtures — commit to share with team

Real-World Usage

CopilotKit uses llmock throughout its test suite to verify AI agent behavior across multiple LLM providers without hitting real APIs. The tests cover streaming text, tool calls, and multi-turn conversations on both the v1 and v2 runtimes. See the test suite and fixture files for real-world examples.