Streaming Physics

Simulate realistic LLM streaming timing with configurable time-to-first-token (TTFT), tokens-per-second (TPS), and random jitter. This is useful for testing loading states, progress indicators, and streaming UX under realistic conditions.

StreamingProfile

The `streamingProfile` option can be set on any fixture to control the timing of streamed chunks.

Property	Type	Description
`ttft`	`number`	Time to first token in milliseconds. Delay before the first chunk is sent.
`tps`	`number`	Tokens per second. Each chunk after the first is delayed by `1000 / tps` ms.
`jitter`	`number`	Random variance factor (0–1). Each delay is multiplied by `1 + random(-1,1) * jitter`. Default 0 (no variance).
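
The delay rules in the table can be sketched as a small function. This illustrates the arithmetic described above and is not LLMock's internal implementation: `chunkDelays`, the injectable `rng` parameter, and the choice to treat a missing `ttft` or `tps` as zero delay are all assumptions made for the example. Applying jitter to the TTFT delay as well follows the table's wording ("each delay") and may differ from the real behavior.

```typescript
// Hypothetical helper for illustration; not part of LLMock's API.
interface StreamingProfile {
  ttft?: number;   // ms before the first chunk
  tps?: number;    // tokens (chunks) per second after the first
  jitter?: number; // variance factor in the range 0-1
}

// Returns the delay in ms that precedes each of `chunkCount` chunks.
// `rng` must return a value in [0, 1); it is injectable so tests can be deterministic.
function chunkDelays(
  profile: StreamingProfile,
  chunkCount: number,
  rng: () => number = Math.random,
): number[] {
  const jitter = profile.jitter ?? 0;
  const delays: number[] = [];
  for (let i = 0; i < chunkCount; i++) {
    // First chunk waits for ttft; every later chunk waits 1000 / tps.
    // Assumption: a missing field contributes no delay.
    const base =
      i === 0 ? (profile.ttft ?? 0) : profile.tps ? 1000 / profile.tps : 0;
    // Map rng() from [0, 1) to [-1, 1), then scale by the jitter factor.
    const factor = 1 + (rng() * 2 - 1) * jitter;
    delays.push(base * factor);
  }
  return delays;
}
```

With `{ ttft: 800, tps: 50 }` and no jitter, three chunks get delays of 800, 20, and 20 ms. With `jitter: 0.2`, each of those values is scaled by a factor between 0.8 and 1.2.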

Programmatic Usage

streaming-physics.test.ts ts

const mock = new LLMock();
await mock.start();

// Simulate GPT-4 streaming timing
mock.on(
  { userMessage: "hello" },
  { content: "Hello! How can I help you today?" },
  {
    streamingProfile: {
      ttft: 800,    // 800ms before first token
      tps: 50,      // 50 tokens/sec after that
      jitter: 0.2,  // +/-20% variance on each delay
    },
  },
);

JSON Fixture File

fixtures/slow-model.json json

{
  "fixtures": [
    {
      "match": { "userMessage": "think carefully" },
      "response": { "content": "Let me think about this..." },
      "streamingProfile": {
        "ttft": 2000,
        "tps": 30,
        "jitter": 0.1
      }
    }
  ]
}

Interaction with `latency`

When `streamingProfile` is set with at least one of `ttft` or `tps`, it takes priority over the `latency` field.
If `streamingProfile` is not set, the existing `latency` behavior applies (flat delay per chunk).
If `streamingProfile` is set but has neither `ttft` nor `tps`, timing falls back to `latency`.
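
The precedence rules above can be expressed as a single decision function. The sketch below is only a restatement of those three rules. `resolveTiming`, `FixtureTiming`, and `TimingMode` are hypothetical names, not LLMock exports, and defaulting a missing `latency` to 0 is an assumption.

```typescript
// Hypothetical types and helper for illustration; not part of LLMock's API.
interface StreamingProfile {
  ttft?: number;
  tps?: number;
  jitter?: number;
}

interface FixtureTiming {
  latency?: number;
  streamingProfile?: StreamingProfile;
}

type TimingMode =
  | { kind: "profile"; profile: StreamingProfile }
  | { kind: "latency"; delayMs: number };

function resolveTiming(opts: FixtureTiming): TimingMode {
  const profile = opts.streamingProfile;
  // A profile only takes over when it defines ttft or tps.
  if (profile && (profile.ttft !== undefined || profile.tps !== undefined)) {
    return { kind: "profile", profile };
  }
  // No profile, or a profile with only jitter: flat per-chunk latency.
  // Assumption: a missing latency means no delay.
  return { kind: "latency", delayMs: opts.latency ?? 0 };
}
```

Under this reading, a fixture with `latency: 50` and `streamingProfile: { jitter: 0.3 }` still uses the flat 50 ms delay, because the profile defines neither `ttft` nor `tps`.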

Realistic Profiles

Here are some example profiles that approximate real-world LLM behavior:

profiles.ts ts

// Fast model (GPT-4o-mini, Claude 3 Haiku)
{ ttft: 200, tps: 100, jitter: 0.15 }

// Standard model (GPT-4o, Claude 3.5 Sonnet)
{ ttft: 500, tps: 60, jitter: 0.2 }

// Reasoning model (o1, o3, Claude with extended thinking)
{ ttft: 5000, tps: 80, jitter: 0.1 }

// Slow/overloaded (rate-limited or cold start)
{ ttft: 3000, tps: 15, jitter: 0.4 }
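
To get a feel for what these profiles mean in a test, it can help to estimate how long a full response takes to stream. The sketch below uses the formula `ttft + (chunks - 1) * 1000 / tps` from the property table and ignores jitter, which is symmetric around zero. `estimateDurationMs` and the `profiles` object are hypothetical names for this example, and one token per chunk is assumed.

```typescript
// Hypothetical helper for illustration; not part of LLMock's API.
interface TimedProfile {
  ttft: number; // ms before the first chunk
  tps: number;  // chunks per second after the first
  jitter: number;
}

// Values copied from the example profiles above.
const profiles: Record<string, TimedProfile> = {
  fast: { ttft: 200, tps: 100, jitter: 0.15 },
  standard: { ttft: 500, tps: 60, jitter: 0.2 },
  reasoning: { ttft: 5000, tps: 80, jitter: 0.1 },
  slow: { ttft: 3000, tps: 15, jitter: 0.4 },
};

// Expected time in ms until the last chunk arrives, ignoring jitter.
// Assumption: each chunk carries one token.
function estimateDurationMs(profile: TimedProfile, chunkCount: number): number {
  if (chunkCount <= 0) return 0;
  return profile.ttft + (chunkCount - 1) * (1000 / profile.tps);
}

for (const [name, profile] of Object.entries(profiles)) {
  console.log(name, Math.round(estimateDurationMs(profile, 101)), "ms");
}
```

For a 101-chunk response this gives roughly 1.2 s for the fast profile and about 9.7 s for the slow one, which is a useful guide when choosing test timeouts.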

Streaming physics applies to all provider APIs: OpenAI Chat Completions, Responses API, Claude Messages, and Gemini. The same `streamingProfile` field works across all of them.