Cohere v2 Chat API

The POST /v2/chat endpoint implements the Cohere v2 Chat API with typed SSE streaming events and dual usage tracking (billed_units and tokens).

Endpoint

Method  Path      Description
POST    /v2/chat  Cohere v2 Chat (SSE streaming or JSON)

Quick Start

cohere-quick-start.ts ts
import { LLMock } from "@copilotkit/llmock";

const mock = new LLMock();
mock.onMessage("hello", { content: "Hi from Cohere!" });
await mock.start();

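// Call the mock's /v2/chat endpoint directly with fetch.
// A Cohere SDK client whose base URL is set to mock.url should hit the same path.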
const res = await fetch(`${mock.url}/v2/chat`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "command-r-plus",
    messages: [{ role: "user", content: "hello" }],
  }),
});

SSE Event Sequence (Text)

When the request sets stream: true, llmock emits these typed Cohere events for text responses, in order:

  1. message-start — message metadata (role, empty content/tool arrays)
  2. content-start — content block type declaration
  3. content-delta — text chunks
  4. content-end
  5. message-end — finish_reason (COMPLETE) and usage

SSE Event Sequence (Tool Calls)

For tool call responses, the event sequence is:

  1. message-start
  2. tool-plan-delta — tool planning text
  3. tool-call-start — tool call ID, function name
  4. tool-call-delta — chunked arguments JSON
  5. tool-call-end
  6. message-end — finish_reason (TOOL_CALL) and usage
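
The sequence above can be folded back into complete tool calls on the client side. The following is a minimal sketch, not llmock's own code. It assumes each event's data payload nests the tool call under delta.message.tool_calls, as Cohere's public v2 streaming format does. The names StreamEvent, ToolCall, and assembleToolCalls are hypothetical.

```typescript
// Assumed payload shape: tool call fields live under delta.message.tool_calls.
interface StreamEvent {
  type: string;
  index?: number;
  delta?: any;
}

interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON text, accumulated chunk by chunk
}

// Fold tool-call-start / tool-call-delta events into complete tool calls.
function assembleToolCalls(events: StreamEvent[]): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const ev of events) {
    const idx = ev.index ?? 0;
    const tc = ev.delta?.message?.tool_calls;
    if (ev.type === "tool-call-start") {
      calls[idx] = {
        id: tc?.id ?? "",
        name: tc?.function?.name ?? "",
        arguments: tc?.function?.arguments ?? "",
      };
    } else if (ev.type === "tool-call-delta") {
      const call = calls[idx];
      if (call) call.arguments += tc?.function?.arguments ?? "";
    }
  }
  // Drop any holes left by indices that never started.
  return calls.filter(Boolean);
}

// Hand-written sample events following the documented order.
const toolEvents: StreamEvent[] = [
  { type: "message-start" },
  { type: "tool-plan-delta", delta: { message: { tool_plan: "I will search." } } },
  {
    type: "tool-call-start",
    index: 0,
    delta: { message: { tool_calls: { id: "call_1", type: "function", function: { name: "web_search", arguments: "" } } } },
  },
  { type: "tool-call-delta", index: 0, delta: { message: { tool_calls: { function: { arguments: '{"query":' } } } } },
  { type: "tool-call-delta", index: 0, delta: { message: { tool_calls: { function: { arguments: '"latest news"}' } } } } },
  { type: "tool-call-end", index: 0 },
  { type: "message-end", delta: { finish_reason: "TOOL_CALL" } },
];

const assembledCalls = assembleToolCalls(toolEvents);
console.log(assembledCalls);
```

The arguments string is only valid JSON once every tool-call-delta for that index has arrived, so parse it after tool-call-end rather than per chunk.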

Non-Streaming Response

/v2/chat non-streaming response json
{
  "id": "msg_abc123",
  "finish_reason": "COMPLETE",
  "message": {
    "role": "assistant",
    "content": [{ "type": "text", "text": "Hi from Cohere!" }],
    "tool_calls": [],
    "tool_plan": "",
    "citations": []
  },
  "usage": {
    "billed_units": {
      "input_tokens": 0,
      "output_tokens": 0,
      "search_units": 0,
      "classifications": 0
    },
    "tokens": { "input_tokens": 0, "output_tokens": 0 }
  }
}
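
For typed client code, the response above can be described with a small set of interfaces. This is a sketch derived only from the example payload. The names CohereChatResponse, CohereUsage, and responseText are hypothetical and are not exported by llmock.

```typescript
// Shapes inferred from the example response; not an official type definition.
interface CohereUsage {
  billed_units: {
    input_tokens: number;
    output_tokens: number;
    search_units: number;
    classifications: number;
  };
  tokens: { input_tokens: number; output_tokens: number };
}

interface CohereChatResponse {
  id: string;
  finish_reason: string;
  message: {
    role: string;
    content: { type: string; text: string }[];
    tool_calls: unknown[];
    tool_plan: string;
    citations: unknown[];
  };
  usage: CohereUsage;
}

// Join the text of all "text" content blocks.
function responseText(res: CohereChatResponse): string {
  return res.message.content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("");
}

const sampleResponse: CohereChatResponse = {
  id: "msg_abc123",
  finish_reason: "COMPLETE",
  message: {
    role: "assistant",
    content: [{ type: "text", text: "Hi from Cohere!" }],
    tool_calls: [],
    tool_plan: "",
    citations: [],
  },
  usage: {
    billed_units: { input_tokens: 0, output_tokens: 0, search_units: 0, classifications: 0 },
    tokens: { input_tokens: 0, output_tokens: 0 },
  },
};

console.log(responseText(sampleResponse));
```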

Fixture Examples

cohere-fixtures.json json
{
  "fixtures": [
    {
      "match": { "userMessage": "hello" },
      "response": { "content": "Hi from Cohere!" }
    },
    {
      "match": { "userMessage": "search" },
      "response": {
        "toolCalls": [
          {
            "name": "web_search",
            "arguments": "{\"query\":\"latest news\"}"
          }
        ]
      }
    }
  ]
}
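
Note that arguments in a tool call fixture is a JSON-encoded string, not a nested object. When fixtures are generated from code, JSON.stringify avoids hand-escaping the quotes. The sketch below assumes only the fixture shape shown above. The Fixture type and the toolCallFixture helper are hypothetical and not part of llmock.

```typescript
// Fixture shape inferred from the example file above.
interface ToolCallFixture {
  name: string;
  arguments: string; // JSON-encoded string
}

interface Fixture {
  match: { userMessage: string };
  response: { content: string } | { toolCalls: ToolCallFixture[] };
}

// Build a tool call fixture, encoding the arguments object as a JSON string.
function toolCallFixture(
  userMessage: string,
  name: string,
  args: Record<string, unknown>
): Fixture {
  return {
    match: { userMessage },
    response: { toolCalls: [{ name, arguments: JSON.stringify(args) }] },
  };
}

const fixtureFile: { fixtures: Fixture[] } = {
  fixtures: [
    { match: { userMessage: "hello" }, response: { content: "Hi from Cohere!" } },
    toolCallFixture("search", "web_search", { query: "latest news" }),
  ],
};

// Serialized form, suitable for writing to a file such as cohere-fixtures.json.
const fixtureJson = JSON.stringify(fixtureFile, null, 2);
console.log(fixtureJson);
```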

Streaming Event Wire Format

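Each SSE event is an event: line naming the event type, followed by a data: line carrying the JSON payload. A blank line separates consecutive events: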

Cohere SSE wire format text
event: message-start
data: {"id":"msg_abc123","type":"message-start","delta":{"message":{"role":"assistant","content":[],"tool_plan":"","tool_calls":[],"citations":[]}}}

event: content-start
data: {"type":"content-start","index":0,"delta":{"message":{"content":{"type":"text"}}}}

event: content-delta
data: {"type":"content-delta","index":0,"delta":{"message":{"content":{"type":"text","text":"Hi "}}}}

event: content-delta
data: {"type":"content-delta","index":0,"delta":{"message":{"content":{"type":"text","text":"from Cohere!"}}}}

event: content-end
data: {"type":"content-end","index":0}

event: message-end
data: {"type":"message-end","delta":{"finish_reason":"COMPLETE","usage":{"billed_units":{"input_tokens":0,"output_tokens":0,"search_units":0,"classifications":0},"tokens":{"input_tokens":0,"output_tokens":0}}}}
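
A client can decode this wire format with a few lines of code. The following is a minimal sketch that assumes the whole stream has already been read into one string. A real consumer would buffer chunks from the response body instead. The names SseEvent, parseSse, and collectText are hypothetical.

```typescript
// One decoded SSE event: the event name plus the parsed data payload.
interface SseEvent {
  event: string;
  data: Record<string, any>;
}

// Split raw SSE text into events; a blank line separates consecutive events.
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let event = "";
    let data = "";
    for (const line of block.split(/\r?\n/)) {
      const m = /^(event|data):\s?(.*)$/.exec(line);
      if (!m) continue;
      if (m[1] === "event") event = (m[2] ?? "").trim();
      else data += m[2] ?? "";
    }
    if (event && data) events.push({ event, data: JSON.parse(data) });
  }
  return events;
}

// Concatenate the text carried by content-delta events.
function collectText(events: SseEvent[]): string {
  return events
    .filter((e) => e.event === "content-delta")
    .map((e) => e.data.delta?.message?.content?.text ?? "")
    .join("");
}

// A shortened copy of the stream shown above.
const rawStream = [
  "event: content-delta",
  'data: {"type":"content-delta","index":0,"delta":{"message":{"content":{"type":"text","text":"Hi "}}}}',
  "",
  "event: content-delta",
  'data: {"type":"content-delta","index":0,"delta":{"message":{"content":{"type":"text","text":"from Cohere!"}}}}',
  "",
  "event: content-end",
  'data: {"type":"content-end","index":0}',
  "",
  "event: message-end",
  'data: {"type":"message-end","delta":{"finish_reason":"COMPLETE"}}',
].join("\n");

const parsedEvents = parseSse(rawStream);
const streamedText = collectText(parsedEvents);
const endEvent = parsedEvents.filter((e) => e.event === "message-end")[0];
console.log(streamedText, endEvent?.data.delta?.finish_reason);
```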

Request Translation

llmock internally translates Cohere requests to a unified ChatCompletionRequest format for fixture matching. The cohereToCompletionRequest() function maps Cohere message roles (including tool with tool_call_id) and tool definitions to the common format.
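
To make the idea concrete, here is a rough sketch of what such a translation could look like. It is not the actual cohereToCompletionRequest() implementation. The type names, the flattening of content blocks into a single string, and the pass-through of tools are all assumptions made for illustration.

```typescript
// Hypothetical request shapes; the real llmock types may differ.
interface CohereMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | { type: string; text?: string }[];
  tool_call_id?: string;
}

interface CohereRequest {
  model: string;
  messages: CohereMessage[];
  stream?: boolean;
  tools?: unknown[];
}

interface CompletionMessage {
  role: string;
  content: string;
  tool_call_id?: string;
}

interface CompletionRequest {
  model: string;
  messages: CompletionMessage[];
  stream: boolean;
  tools?: unknown[];
}

// Collapse Cohere content (a string or an array of blocks) into plain text.
function flattenContent(content: CohereMessage["content"]): string {
  if (typeof content === "string") return content;
  return content.map((block) => block.text ?? "").join("");
}

// Illustrative translation: copy roles, flatten content, keep tool_call_id on tool messages.
function sketchTranslate(req: CohereRequest): CompletionRequest {
  const out: CompletionRequest = {
    model: req.model,
    stream: req.stream ?? false,
    messages: req.messages.map((m) => {
      const msg: CompletionMessage = { role: m.role, content: flattenContent(m.content) };
      if (m.role === "tool" && m.tool_call_id) msg.tool_call_id = m.tool_call_id;
      return msg;
    }),
  };
  if (req.tools) out.tools = req.tools;
  return out;
}

const translated = sketchTranslate({
  model: "command-r-plus",
  messages: [
    { role: "user", content: "search" },
    { role: "tool", tool_call_id: "call_1", content: [{ type: "text", text: "result" }] },
  ],
});
console.log(translated);
```

Once requests are in a common shape like this, a single fixture matcher can serve every provider endpoint, which is the benefit the translation step is meant to provide.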