Cohere v2 Chat API

The POST /v2/chat endpoint implements the Cohere v2 Chat API with typed SSE streaming events and dual usage tracking (billed_units and tokens).

Endpoint

Method	Path	Description
POST	/v2/chat	Cohere v2 Chat (SSE streaming or JSON)

Key Features

Model field required. Unlike OpenAI, Cohere requires the model field — requests without it receive a 400 error.
Typed SSE events. Streaming uses event: + data: pairs with event types like message-start, content-delta, tool-call-start, etc.
Dual usage tracking. Responses include both billed_units (input_tokens, output_tokens, search_units, classifications) and tokens (input_tokens, output_tokens). llmock returns zero for every usage field.
Defaults to non-streaming. Set "stream": true explicitly to enable SSE streaming.

Quick Start

cohere-quick-start.ts ts

import { LLMock } from "@copilotkit/llmock";

const mock = new LLMock();
mock.onMessage("hello", { content: "Hi from Cohere!" });
await mock.start();

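// Call llmock's Cohere-compatible endpoint directly with fetch;
// a Cohere SDK client can be pointed at mock.url in the same way
const res = await fetch(`${mock.url}/v2/chat`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "command-r-plus", // required: omitting it returns a 400
    messages: [{ role: "user", content: "hello" }],
  }),
});

// Non-streaming by default, so the body is a single JSON object
const body = await res.json();
console.log(body.message.content[0].text);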

SSE Event Sequence (Text)

When "stream": true is set, llmock emits these typed events, in this order, for text responses:

message-start — message metadata (role, empty content/tool arrays)
content-start — content block type declaration
content-delta — text chunks
content-end
message-end — finish_reason (COMPLETE) and usage
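The sequence above can be consumed by concatenating the content-delta chunks and reading finish_reason from message-end. The following is a minimal sketch, not llmock's own client code: the TextStreamEvent type and collectText helper are hypothetical names, and only the payload fields that the helper reads are modeled (they follow the wire format shown later on this page).

```typescript
// Hypothetical event types: only the fields this sketch reads are modeled.
type TextStreamEvent =
  | { type: "message-start" }
  | { type: "content-start"; index: number }
  | { type: "content-delta"; index: number; delta: { message: { content: { text: string } } } }
  | { type: "content-end"; index: number }
  | { type: "message-end"; delta: { finish_reason: string } };

interface CollectedText {
  text: string;
  finishReason: string | null;
}

// Concatenate content-delta chunks and capture finish_reason from message-end.
function collectText(events: TextStreamEvent[]): CollectedText {
  let text = "";
  let finishReason: string | null = null;
  for (const ev of events) {
    if (ev.type === "content-delta") {
      text += ev.delta.message.content.text;
    } else if (ev.type === "message-end") {
      finishReason = ev.delta.finish_reason;
    }
  }
  return { text, finishReason };
}

// Sample events in the documented order.
const textEvents: TextStreamEvent[] = [
  { type: "message-start" },
  { type: "content-start", index: 0 },
  { type: "content-delta", index: 0, delta: { message: { content: { text: "Hi " } } } },
  { type: "content-delta", index: 0, delta: { message: { content: { text: "from Cohere!" } } } },
  { type: "content-end", index: 0 },
  { type: "message-end", delta: { finish_reason: "COMPLETE" } },
];

console.log(collectText(textEvents));
```

The helper ignores the start and end markers because, for a single text block, they carry no text of their own.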

SSE Event Sequence (Tool Calls)

For tool call responses, the event sequence is:

message-start
tool-plan-delta — tool planning text
tool-call-start — tool call ID, function name
tool-call-delta — chunked arguments JSON
tool-call-end
message-end — finish_reason (TOOL_CALL) and usage
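A client typically rebuilds each tool call by taking the ID and name from tool-call-start and appending the argument chunks from tool-call-delta. The sketch below assumes the payload shapes of Cohere's published v2 streaming format (tool_plan under delta.message, tool_calls.function.arguments for argument chunks); this page does not show those payloads, so treat the field paths, the ToolStreamEvent type, and the collectToolCalls helper as assumptions rather than llmock's confirmed output.

```typescript
// ASSUMPTION: field paths follow Cohere's v2 streaming format; not shown on this page.
type ToolStreamEvent =
  | { type: "message-start" }
  | { type: "tool-plan-delta"; delta: { message: { tool_plan: string } } }
  | {
      type: "tool-call-start";
      index: number;
      delta: { message: { tool_calls: { id: string; type: "function"; function: { name: string; arguments: string } } } };
    }
  | { type: "tool-call-delta"; index: number; delta: { message: { tool_calls: { function: { arguments: string } } } } }
  | { type: "tool-call-end"; index: number }
  | { type: "message-end"; delta: { finish_reason: string } };

interface AssembledToolCall {
  id: string;
  name: string;
  arguments: string; // JSON text, complete once tool-call-end has arrived
}

// Rebuild the plan text and each tool call from the event stream.
function collectToolCalls(events: ToolStreamEvent[]): { plan: string; calls: AssembledToolCall[] } {
  let plan = "";
  const byIndex: AssembledToolCall[] = [];
  for (const ev of events) {
    if (ev.type === "tool-plan-delta") {
      plan += ev.delta.message.tool_plan;
    } else if (ev.type === "tool-call-start") {
      const tc = ev.delta.message.tool_calls;
      byIndex[ev.index] = { id: tc.id, name: tc.function.name, arguments: tc.function.arguments };
    } else if (ev.type === "tool-call-delta") {
      const call = byIndex[ev.index];
      if (call) call.arguments += ev.delta.message.tool_calls.function.arguments;
    }
  }
  // Drop any holes left by non-contiguous indexes.
  return { plan, calls: byIndex.filter(Boolean) };
}

// Sample stream; the ID "call_1" is made up for illustration.
const toolEvents: ToolStreamEvent[] = [
  { type: "message-start" },
  { type: "tool-plan-delta", delta: { message: { tool_plan: "I will search." } } },
  {
    type: "tool-call-start",
    index: 0,
    delta: { message: { tool_calls: { id: "call_1", type: "function", function: { name: "web_search", arguments: "" } } } },
  },
  { type: "tool-call-delta", index: 0, delta: { message: { tool_calls: { function: { arguments: '{"query":' } } } } },
  { type: "tool-call-delta", index: 0, delta: { message: { tool_calls: { function: { arguments: '"latest news"}' } } } } },
  { type: "tool-call-end", index: 0 },
  { type: "message-end", delta: { finish_reason: "TOOL_CALL" } },
];

const assembled = collectToolCalls(toolEvents);
console.log(assembled.calls[0].name, JSON.parse(assembled.calls[0].arguments));
```

Arguments are kept as a string until the stream ends because an individual chunk is generally not valid JSON on its own.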

Non-Streaming Response

/v2/chat non-streaming response json

{
  "id": "msg_abc123",
  "finish_reason": "COMPLETE",
  "message": {
    "role": "assistant",
    "content": [{ "type": "text", "text": "Hi from Cohere!" }],
    "tool_calls": [],
    "tool_plan": "",
    "citations": []
  },
  "usage": {
    "billed_units": {
      "input_tokens": 0,
      "output_tokens": 0,
      "search_units": 0,
      "classifications": 0
    },
    "tokens": { "input_tokens": 0, "output_tokens": 0 }
  }
}
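
For typed access to this response in tests, the JSON above can be described with a pair of interfaces. This is a sketch derived only from the example shown here: the interface names and the responseText helper are hypothetical, and tool_calls and citations are left as unknown[] because their element shapes are not shown in this section.

```typescript
// Hypothetical types derived from the example response above.
interface CohereUsage {
  billed_units: {
    input_tokens: number;
    output_tokens: number;
    search_units: number;
    classifications: number;
  };
  tokens: { input_tokens: number; output_tokens: number };
}

interface CohereChatResponse {
  id: string;
  finish_reason: string;
  message: {
    role: "assistant";
    content: { type: string; text: string }[];
    tool_calls: unknown[]; // element shape not shown in this section
    tool_plan: string;
    citations: unknown[]; // element shape not shown in this section
  };
  usage: CohereUsage;
}

// Join the text of every "text" content part.
function responseText(res: CohereChatResponse): string {
  return res.message.content
    .filter((part) => part.type === "text")
    .map((part) => part.text)
    .join("");
}

const exampleResponse: CohereChatResponse = {
  id: "msg_abc123",
  finish_reason: "COMPLETE",
  message: {
    role: "assistant",
    content: [{ type: "text", text: "Hi from Cohere!" }],
    tool_calls: [],
    tool_plan: "",
    citations: [],
  },
  usage: {
    billed_units: { input_tokens: 0, output_tokens: 0, search_units: 0, classifications: 0 },
    tokens: { input_tokens: 0, output_tokens: 0 },
  },
};

console.log(responseText(exampleResponse));
```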

Fixture Examples

cohere-fixtures.json json

{
  "fixtures": [
    {
      "match": { "userMessage": "hello" },
      "response": { "content": "Hi from Cohere!" }
    },
    {
      "match": { "userMessage": "search" },
      "response": {
        "toolCalls": [
          {
            "name": "web_search",
            "arguments": "{\"query\":\"latest news\"}"
          }
        ]
      }
    }
  ]
}
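
To make the intent of the fixture file concrete, the sketch below shows one way such fixtures could be selected for a request. It assumes that userMessage is matched as a substring of the last user message and that the first matching fixture wins; this section does not state llmock's actual matching rules, and the Fixture type and findFixture helper are hypothetical.

```typescript
// Hypothetical fixture shape mirroring the JSON example above.
interface Fixture {
  match: { userMessage: string };
  response: {
    content?: string;
    toolCalls?: { name: string; arguments: string }[];
  };
}

interface SimpleMessage {
  role: string;
  content: string;
}

// ASSUMPTION: substring match on the last user message, first fixture wins.
function findFixture(fixtures: Fixture[], messages: SimpleMessage[]): Fixture | undefined {
  let lastUserText: string | undefined;
  for (const m of messages) {
    if (m.role === "user") lastUserText = m.content;
  }
  if (lastUserText === undefined) return undefined;
  for (const f of fixtures) {
    if (lastUserText.indexOf(f.match.userMessage) !== -1) return f;
  }
  return undefined;
}

const sampleFixtures: Fixture[] = [
  { match: { userMessage: "hello" }, response: { content: "Hi from Cohere!" } },
  {
    match: { userMessage: "search" },
    response: { toolCalls: [{ name: "web_search", arguments: '{"query":"latest news"}' }] },
  },
];

const hit = findFixture(sampleFixtures, [{ role: "user", content: "please search the web" }]);
console.log(hit ? hit.response : "no fixture matched");
```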

Streaming Event Wire Format

Each SSE event is an event: line naming the type, followed by a data: line carrying the JSON payload; events are separated by a blank line:

Cohere SSE wire format text

event: message-start
data: {"id":"msg_abc123","type":"message-start","delta":{"message":{"role":"assistant","content":[],"tool_plan":"","tool_calls":[],"citations":[]}}}

event: content-start
data: {"type":"content-start","index":0,"delta":{"message":{"content":{"type":"text"}}}}

event: content-delta
data: {"type":"content-delta","index":0,"delta":{"message":{"content":{"type":"text","text":"Hi "}}}}

event: content-delta
data: {"type":"content-delta","index":0,"delta":{"message":{"content":{"type":"text","text":"from Cohere!"}}}}

event: content-end
data: {"type":"content-end","index":0}

event: message-end
data: {"type":"message-end","delta":{"finish_reason":"COMPLETE","usage":{"billed_units":{"input_tokens":0,"output_tokens":0,"search_units":0,"classifications":0},"tokens":{"input_tokens":0,"output_tokens":0}}}}
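
The wire format above can be parsed by splitting the body on blank lines and reading the event: and data: fields of each block. The following is a minimal sketch that works on a complete response body; a real streaming client would also need to buffer partial chunks, which is omitted here. The parseSse function and SseEvent type are hypothetical names.

```typescript
interface SseEvent {
  event: string;
  data: unknown; // parsed JSON payload
}

// Parse a complete SSE body into typed event/data pairs.
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line.
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let event = "";
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      const m = /^(event|data):\s?(.*)$/.exec(line);
      if (!m) continue; // skip comments and unknown fields
      if (m[1] === "event") event = m[2];
      else dataLines.push(m[2]);
    }
    if (event !== "" && dataLines.length > 0) {
      events.push({ event, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return events;
}

// Two events copied from the wire format above.
const rawBody = [
  "event: content-delta",
  'data: {"type":"content-delta","index":0,"delta":{"message":{"content":{"type":"text","text":"Hi "}}}}',
  "",
  "event: content-end",
  'data: {"type":"content-end","index":0}',
  "",
  "",
].join("\n");

for (const ev of parseSse(rawBody)) {
  console.log(ev.event, ev.data);
}
```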

Request Translation

llmock internally translates Cohere requests to a unified ChatCompletionRequest format for fixture matching. The cohereToCompletionRequest() function maps Cohere message roles (including tool with tool_call_id) and tool definitions to the common format.
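
To illustrate the kind of mapping described above, here is a simplified sketch. It is not the actual cohereToCompletionRequest() implementation: the type names, the sketchTranslate function, and the shape of the unified request are assumptions made for illustration, and the sketch assumes Cohere content is either a plain string or an array of text parts.

```typescript
// Hypothetical, simplified types; the real llmock types may differ.
type CohereContent = string | { type: "text"; text: string }[];

interface ToolDefinition {
  type: "function";
  function: { name: string; description?: string; parameters?: Record<string, unknown> };
}

interface CohereMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: CohereContent;
  tool_call_id?: string; // present on tool-result messages
}

interface CohereRequest {
  model: string;
  messages: CohereMessage[];
  stream?: boolean;
  tools?: ToolDefinition[];
}

interface UnifiedMessage {
  role: string;
  content: string;
  tool_call_id?: string;
}

interface UnifiedRequest {
  model: string;
  stream: boolean;
  messages: UnifiedMessage[];
  tools?: ToolDefinition[];
}

// Collapse string-or-parts content into a single string.
function flattenContent(content: CohereContent): string {
  return typeof content === "string" ? content : content.map((p) => p.text).join("");
}

// Illustrative mapping only: roles pass through, tool_call_id is kept on tool messages.
function sketchTranslate(req: CohereRequest): UnifiedRequest {
  return {
    model: req.model,
    stream: req.stream ?? false, // non-streaming unless explicitly enabled
    messages: req.messages.map((m) => ({
      role: m.role,
      content: flattenContent(m.content),
      ...(m.tool_call_id ? { tool_call_id: m.tool_call_id } : {}),
    })),
    ...(req.tools ? { tools: req.tools } : {}),
  };
}

const unified = sketchTranslate({
  model: "command-r-plus",
  messages: [
    { role: "user", content: [{ type: "text", text: "search " }, { type: "text", text: "news" }] },
    { role: "tool", content: "3 results", tool_call_id: "call_1" },
  ],
});

console.log(JSON.stringify(unified, null, 2));
```

Normalizing every provider's request into one shape lets a single set of fixtures serve all endpoints, which is the purpose of the translation step described above.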