Cohere v2 Chat API
The POST /v2/chat endpoint implements the Cohere v2 Chat API with typed SSE
streaming events and dual usage tracking (billed_units and
tokens).
Endpoint
| Method | Path | Description |
|---|---|---|
| POST | /v2/chat | Cohere v2 Chat (SSE streaming or JSON) |
Key Features
- Model field required. Unlike OpenAI, Cohere requires the model field; requests without it receive a 400 error.
- Typed SSE events. Streaming uses event: + data: pairs with event types like message-start, content-delta, tool-call-start, etc.
- Dual usage tracking. Responses include both billed_units (input_tokens, output_tokens, search_units, classifications) and tokens (input_tokens, output_tokens). llmock returns zeroed values.
- Defaults to non-streaming. Set "stream": true explicitly to enable SSE streaming.
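As a rough illustration of the first and last points, the sketch below builds a request body that always carries a model and only sets "stream" when streaming is asked for. The CohereChatRequest type and the buildChatRequest helper are hypothetical names introduced here; they are not part of llmock or the Cohere SDK.

```typescript
// Hypothetical request shape covering only the fields this page mentions.
interface CohereChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface CohereChatRequest {
  model: string;
  messages: CohereChatMessage[];
  stream?: boolean;
}

// Hypothetical helper: refuses to build a request without a model and
// leaves "stream" unset unless streaming is explicitly requested.
function buildChatRequest(
  model: string,
  userText: string,
  stream = false,
): CohereChatRequest {
  if (model.trim() === "") {
    throw new Error("model is required by /v2/chat");
  }
  const request: CohereChatRequest = {
    model,
    messages: [{ role: "user", content: userText }],
  };
  if (stream) {
    request.stream = true;
  }
  return request;
}

// Non-streaming (JSON) request: "stream" is omitted.
const jsonRequest = buildChatRequest("command-r-plus", "hello");
// Streaming (SSE) request: "stream" is set explicitly.
const sseRequest = buildChatRequest("command-r-plus", "hello", true);
```

Checking the model on the client side simply surfaces the mistake before the request is sent, instead of waiting for the 400 response.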
Quick Start
cohere-quick-start.ts ts
import { LLMock } from "@copilotkit/llmock";

const mock = new LLMock();
mock.onMessage("hello", { content: "Hi from Cohere!" });
await mock.start();

// Send a Cohere v2 Chat request straight to llmock with fetch.
// A Cohere SDK client would instead be pointed at mock.url.
const res = await fetch(`${mock.url}/v2/chat`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "command-r-plus",
    messages: [{ role: "user", content: "hello" }],
  }),
});
SSE Event Sequence (Text)
When "stream": true is set, the endpoint emits these typed events for text responses, in this order:
- message-start: message metadata (role, empty content/tool arrays)
- content-start: content block type declaration
- content-delta: text chunks
- content-end
- message-end: finish_reason (COMPLETE) and usage
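A client consuming this sequence mostly needs to concatenate the content-delta chunks and read finish_reason from message-end. The following is a minimal sketch of that, assuming the events have already been parsed into objects; CohereStreamEvent and collectText are illustrative names, and the interface lists only the fields the sketch reads.

```typescript
// Illustrative event shape: only the fields this sketch reads.
interface CohereStreamEvent {
  type: string;
  index?: number;
  delta?: {
    finish_reason?: string;
    usage?: unknown;
    // message-start carries an empty array, content-* events carry an object.
    message?: { content?: { type?: string; text?: string } | unknown[] };
  };
}

interface TextResult {
  text: string;
  finishReason: string | null;
}

// Concatenate content-delta text and pick up finish_reason from message-end.
function collectText(events: CohereStreamEvent[]): TextResult {
  let text = "";
  let finishReason: string | null = null;
  for (const event of events) {
    if (event.type === "content-delta") {
      const content = event.delta?.message?.content;
      if (content && !Array.isArray(content)) {
        text += content.text ?? "";
      }
    } else if (event.type === "message-end") {
      finishReason = event.delta?.finish_reason ?? null;
    }
  }
  return { text, finishReason };
}

const textEvents: CohereStreamEvent[] = [
  { type: "message-start", delta: { message: { content: [] } } },
  { type: "content-start", index: 0, delta: { message: { content: { type: "text" } } } },
  { type: "content-delta", index: 0, delta: { message: { content: { type: "text", text: "Hi " } } } },
  { type: "content-delta", index: 0, delta: { message: { content: { type: "text", text: "from Cohere!" } } } },
  { type: "content-end", index: 0 },
  { type: "message-end", delta: { finish_reason: "COMPLETE" } },
];

const textResult = collectText(textEvents);
```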
SSE Event Sequence (Tool Calls)
For tool call responses, the event sequence is:
- message-start
- tool-plan-delta: tool planning text
- tool-call-start: tool call ID, function name
- tool-call-delta: chunked arguments JSON
- tool-call-end
- message-end: finish_reason (TOOL_CALL) and usage
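Because the arguments arrive in chunks, a client has to buffer them between tool-call-start and tool-call-end before parsing. The sketch below shows one way to do that. This page does not show the tool-call payloads, so the nesting used here (delta.message.tool_plan and delta.message.tool_calls.function.arguments) is an assumption based on Cohere's v2 streaming format and should be checked against real output; ToolStreamEvent and collectToolCalls are hypothetical names.

```typescript
// Assumed payload nesting; verify against actual streamed events.
interface ToolStreamEvent {
  type: string;
  index?: number;
  delta?: {
    finish_reason?: string;
    message?: {
      tool_plan?: string;
      tool_calls?: {
        id?: string;
        type?: string;
        function?: { name?: string; arguments?: string };
      };
    };
  };
}

interface AssembledToolCall {
  id: string;
  name: string;
  arguments: unknown; // parsed from the buffered JSON text
}

function collectToolCalls(
  events: ToolStreamEvent[],
): { plan: string; calls: AssembledToolCall[] } {
  let plan = "";
  const calls: AssembledToolCall[] = [];
  let current: { id: string; name: string; json: string } | null = null;

  for (const event of events) {
    const message = event.delta?.message;
    switch (event.type) {
      case "tool-plan-delta":
        plan += message?.tool_plan ?? "";
        break;
      case "tool-call-start":
        current = {
          id: message?.tool_calls?.id ?? "",
          name: message?.tool_calls?.function?.name ?? "",
          json: message?.tool_calls?.function?.arguments ?? "",
        };
        break;
      case "tool-call-delta":
        if (current) {
          current.json += message?.tool_calls?.function?.arguments ?? "";
        }
        break;
      case "tool-call-end":
        if (current) {
          // An empty buffer is treated as an empty argument object.
          calls.push({
            id: current.id,
            name: current.name,
            arguments: JSON.parse(current.json || "{}"),
          });
          current = null;
        }
        break;
    }
  }
  return { plan, calls };
}

const toolEvents: ToolStreamEvent[] = [
  { type: "message-start" },
  { type: "tool-plan-delta", delta: { message: { tool_plan: "I will search." } } },
  { type: "tool-call-start", index: 0, delta: { message: { tool_calls: { id: "call_1", type: "function", function: { name: "web_search", arguments: "" } } } } },
  { type: "tool-call-delta", index: 0, delta: { message: { tool_calls: { function: { arguments: '{"query":' } } } } },
  { type: "tool-call-delta", index: 0, delta: { message: { tool_calls: { function: { arguments: '"latest news"}' } } } } },
  { type: "tool-call-end", index: 0 },
  { type: "message-end", delta: { finish_reason: "TOOL_CALL" } },
];

const toolResult = collectToolCalls(toolEvents);
```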
Non-Streaming Response
/v2/chat non-streaming response json
{
  "id": "msg_abc123",
  "finish_reason": "COMPLETE",
  "message": {
    "role": "assistant",
    "content": [{ "type": "text", "text": "Hi from Cohere!" }],
    "tool_calls": [],
    "tool_plan": "",
    "citations": []
  },
  "usage": {
    "billed_units": {
      "input_tokens": 0,
      "output_tokens": 0,
      "search_units": 0,
      "classifications": 0
    },
    "tokens": { "input_tokens": 0, "output_tokens": 0 }
  }
}
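For test code that reads this response, it can help to give the payload a type. The interfaces below are a sketch transcribed from the JSON above rather than types exported by llmock or the Cohere SDK; responseText and totalTokens are hypothetical helpers.

```typescript
// Transcribed from the example response above; not an official type.
interface CohereUsage {
  billed_units: {
    input_tokens: number;
    output_tokens: number;
    search_units: number;
    classifications: number;
  };
  tokens: { input_tokens: number; output_tokens: number };
}

interface CohereChatResponse {
  id: string;
  finish_reason: string;
  message: {
    role: string;
    content: { type: string; text: string }[];
    tool_calls: unknown[];
    tool_plan: string;
    citations: unknown[];
  };
  usage: CohereUsage;
}

// Join the text of every "text" content block.
function responseText(response: CohereChatResponse): string {
  return response.message.content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("");
}

// Sum of the raw token counts (zero when llmock returns zeroed usage).
function totalTokens(usage: CohereUsage): number {
  return usage.tokens.input_tokens + usage.tokens.output_tokens;
}

const sampleResponse: CohereChatResponse = {
  id: "msg_abc123",
  finish_reason: "COMPLETE",
  message: {
    role: "assistant",
    content: [{ type: "text", text: "Hi from Cohere!" }],
    tool_calls: [],
    tool_plan: "",
    citations: [],
  },
  usage: {
    billed_units: { input_tokens: 0, output_tokens: 0, search_units: 0, classifications: 0 },
    tokens: { input_tokens: 0, output_tokens: 0 },
  },
};

const sampleText = responseText(sampleResponse);
const sampleTotal = totalTokens(sampleResponse.usage);
```

Since the usage values are zeroed, tests should assert on the shape of usage rather than on specific token counts.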
Fixture Examples
cohere-fixtures.json json
{
  "fixtures": [
    {
      "match": { "userMessage": "hello" },
      "response": { "content": "Hi from Cohere!" }
    },
    {
      "match": { "userMessage": "search" },
      "response": {
        "toolCalls": [
          {
            "name": "web_search",
            "arguments": "{\"query\":\"latest news\"}"
          }
        ]
      }
    }
  ]
}
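Note that arguments in the tool-call fixture is a JSON-encoded string, not a nested object, so a stray quote is easy to miss. As a hedged sketch, a small check like the one below could catch that before a fixture file is loaded. The Fixture types are inferred from the example above and cover only the fields it uses; checkFixtures is a hypothetical helper, not an llmock API.

```typescript
// Inferred from the example fixture file; only the fields shown there.
interface FixtureToolCall {
  name: string;
  arguments: string; // JSON-encoded string
}

interface Fixture {
  match: { userMessage: string };
  response: { content?: string; toolCalls?: FixtureToolCall[] };
}

interface FixtureFile {
  fixtures: Fixture[];
}

// Report every tool call whose arguments string is not valid JSON.
function checkFixtures(file: FixtureFile): string[] {
  const problems: string[] = [];
  file.fixtures.forEach((fixture, i) => {
    for (const call of fixture.response.toolCalls ?? []) {
      try {
        JSON.parse(call.arguments);
      } catch (e) {
        problems.push(`fixture ${i}: arguments for ${call.name} is not valid JSON`);
      }
    }
  });
  return problems;
}

const fixtureFile: FixtureFile = {
  fixtures: [
    { match: { userMessage: "hello" }, response: { content: "Hi from Cohere!" } },
    {
      match: { userMessage: "search" },
      response: {
        toolCalls: [{ name: "web_search", arguments: '{"query":"latest news"}' }],
      },
    },
  ],
};

const fixtureProblems = checkFixtures(fixtureFile);
```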
Streaming Event Wire Format
Each SSE event is a typed event: + data: pair:
Cohere SSE wire format text
event: message-start
data: {"id":"msg_abc123","type":"message-start","delta":{"message":{"role":"assistant","content":[],"tool_plan":"","tool_calls":[],"citations":[]}}}

event: content-start
data: {"type":"content-start","index":0,"delta":{"message":{"content":{"type":"text"}}}}

event: content-delta
data: {"type":"content-delta","index":0,"delta":{"message":{"content":{"type":"text","text":"Hi "}}}}

event: content-delta
data: {"type":"content-delta","index":0,"delta":{"message":{"content":{"type":"text","text":"from Cohere!"}}}}

event: content-end
data: {"type":"content-end","index":0}

event: message-end
data: {"type":"message-end","delta":{"finish_reason":"COMPLETE","usage":{"billed_units":{"input_tokens":0,"output_tokens":0,"search_units":0,"classifications":0},"tokens":{"input_tokens":0,"output_tokens":0}}}}
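To turn this wire format back into objects, a test can pair each event: line with the data: line that follows it. The parser below is a minimal sketch that assumes every data: line holds one complete JSON document, as in the listing above. It is not a full SSE parser (multi-line data fields, comments, and retry fields are ignored), and parseSse is a name chosen for this example.

```typescript
interface ParsedSseEvent {
  event: string;
  data: unknown;
}

// Pair each "event:" line with the next "data:" line and parse its JSON.
function parseSse(raw: string): ParsedSseEvent[] {
  const events: ParsedSseEvent[] = [];
  let eventName = "";
  for (const line of raw.split(/\r?\n/)) {
    if (line.startsWith("event:")) {
      eventName = line.slice("event:".length).trim();
    } else if (line.startsWith("data:")) {
      const payload = line.slice("data:".length).trim();
      events.push({ event: eventName, data: JSON.parse(payload) });
      eventName = "";
    }
    // Blank lines and anything else are skipped.
  }
  return events;
}

const wireSample = [
  "event: content-delta",
  'data: {"type":"content-delta","index":0,"delta":{"message":{"content":{"type":"text","text":"Hi "}}}}',
  "",
  "event: content-end",
  'data: {"type":"content-end","index":0}',
  "",
].join("\n");

const parsedEvents = parseSse(wireSample);
```

In the listing above the event: name and the type field inside the JSON payload carry the same value, so a consumer can dispatch on either one.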
Request Translation
llmock internally translates Cohere requests to a unified
ChatCompletionRequest format for fixture matching. The
cohereToCompletionRequest() function maps Cohere message roles (including
tool with tool_call_id) and tool definitions to the common
format.
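To make the idea concrete, here is a rough sketch of what such a translation could look like. It is not llmock's cohereToCompletionRequest() and does not reproduce its ChatCompletionRequest type. UnifiedRequest, sketchTranslate, and lastUserText are hypothetical names, and the Cohere-side shapes cover only roles, text content, tool_call_id, and function tool definitions.

```typescript
// Cohere-side shapes (subset, assumed for illustration).
type CohereContent = string | { type: string; text?: string }[];

interface CohereMessage {
  role: "system" | "user" | "assistant" | "tool";
  content?: CohereContent;
  tool_call_id?: string;
}

interface CohereTool {
  type: "function";
  function: { name: string; description?: string; parameters?: Record<string, unknown> };
}

interface CohereRequest {
  model: string;
  messages: CohereMessage[];
  tools?: CohereTool[];
  stream?: boolean;
}

// Hypothetical unified shape; llmock's real type may differ.
interface UnifiedMessage {
  role: string;
  content: string;
  tool_call_id?: string;
}

interface UnifiedRequest {
  model: string;
  messages: UnifiedMessage[];
  tools: CohereTool[];
  stream: boolean;
}

// Collapse string or block-array content into plain text.
function flattenContent(content: CohereContent | undefined): string {
  if (content === undefined) return "";
  if (typeof content === "string") return content;
  return content.map((block) => block.text ?? "").join("");
}

function sketchTranslate(request: CohereRequest): UnifiedRequest {
  return {
    model: request.model,
    messages: request.messages.map((message) => {
      const unified: UnifiedMessage = {
        role: message.role,
        content: flattenContent(message.content),
      };
      // Tool results keep the ID that links them to the originating call.
      if (message.role === "tool" && message.tool_call_id !== undefined) {
        unified.tool_call_id = message.tool_call_id;
      }
      return unified;
    }),
    tools: request.tools ?? [],
    stream: request.stream ?? false,
  };
}

// The text a userMessage fixture would plausibly be compared against.
function lastUserText(request: UnifiedRequest): string | undefined {
  for (let i = request.messages.length - 1; i >= 0; i--) {
    if (request.messages[i].role === "user") return request.messages[i].content;
  }
  return undefined;
}

const translated = sketchTranslate({
  model: "command-r-plus",
  messages: [
    { role: "user", content: [{ type: "text", text: "search" }] },
    { role: "tool", tool_call_id: "call_1", content: "3 results" },
  ],
});
```

Normalizing every provider's request into one shape means fixture matching only has to be written once, which is presumably why the translation step exists.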