API reference

Chat completions

Generate a model response for a conversation. POST a list of messages and the model returns the next assistant message. This is the primary endpoint and follows the OpenAI chat-completions schema, including streaming, tool calls, and structured outputs.

Create a chat completion

POST https://api.merius.ai/v1/chat/completions

Request

curl https://api.merius.ai/v1/chat/completions \
  -H "Authorization: Bearer $MERIUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-30b-a3b",
    "messages": [
      {"role": "system", "content": "You are concise."},
      {"role": "user", "content": "Name three primary colors."}
    ],
    "temperature": 0.7
  }'
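
The same request can be built from Python using only the standard library. This is a minimal sketch, not an official client: it assumes the URL and headers from the curl example above, and `build_request` and `send` are hypothetical helper names chosen for illustration.

```python
import json
import urllib.request

# URL taken from the curl example above.
API_URL = "https://api.merius.ai/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for a chat completion."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body (network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Same body as the curl example.
payload = {
    "model": "qwen/qwen3-30b-a3b",
    "messages": [
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Name three primary colors."},
    ],
    "temperature": 0.7,
}
```

To make the call, pass your key to `build_request` (for example, the value of the `MERIUS_API_KEY` environment variable) and hand the result to `send`. Keeping construction separate from sending makes the request easy to inspect or unit-test without network access.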

Request parameters

The request body is JSON. model and messages are required; the rest are optional and share OpenAI’s semantics and defaults.

Parameter	Type	Description
`model` required	string	The model slug to call, e.g. qwen/qwen3-30b-a3b. See the Models page for the full list.
`messages` required	array	The conversation so far, as objects with a role (system, user, or assistant) and content.
`temperature`	number	Sampling temperature, 0–2. Higher is more random, lower is more focused. Default 1.
`top_p`	number	Nucleus sampling, 0–1. An alternative to temperature; adjust one, not both. Default 1.
`max_tokens`	integer	Upper bound on tokens generated in the completion. Defaults to the model’s remaining context.
`stream`	boolean	Stream the response as server-sent events. Default false. See Streaming.
`stop`	string | array	Up to four sequences where generation stops. The stop text is not included in the output.
`presence_penalty`	number	Between -2 and 2. Positive values push the model toward new topics. Default 0.
`frequency_penalty`	number	Between -2 and 2. Positive values discourage repeating the same tokens. Default 0.
`seed`	integer	Best-effort determinism: where supported, repeated requests with the same seed and parameters return a similar result, but identical output is not guaranteed.
`tools`	array	Function definitions the model may call. See Function calling.
`response_format`	object	Request JSON or a JSON schema for structured outputs. See Structured outputs.
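
The ranges in the table can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, assuming only the limits listed above. `validate_request` is a hypothetical helper, and this page does not describe how the server itself reports invalid parameters.

```python
def validate_request(body: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []

    # model and messages are the only required parameters.
    for name in ("model", "messages"):
        if name not in body:
            problems.append(f"missing required parameter: {name}")

    # Numeric ranges as listed in the parameter table.
    ranges = {
        "temperature": (0, 2),
        "top_p": (0, 1),
        "presence_penalty": (-2, 2),
        "frequency_penalty": (-2, 2),
    }
    for name, (low, high) in ranges.items():
        if name in body and not low <= body[name] <= high:
            problems.append(f"{name} must be between {low} and {high}")

    # stop accepts a single string or a list of up to four sequences.
    stop = body.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most four sequences")

    return problems
```

An empty list means the body satisfies the documented limits. It does not guarantee that a particular model accepts every parameter.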

The response object

A non-streaming request returns a single chat-completion object. The generated text is in choices[0].message.content; usage reports token counts for the call.

Response

{
  "id": "chatcmpl-9f3a…",
  "object": "chat.completion",
  "created": 1768000000,
  "model": "qwen/qwen3-30b-a3b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Red, blue, and yellow."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 18, "completion_tokens": 7, "total_tokens": 25}
}

Response fields

Field	Type	Description
`id`	string	Unique identifier for the completion.
`object`	string	chat.completion for a non-streaming response; chat.completion.chunk for each streamed chunk.
`created`	integer	Unix timestamp (seconds) of when the completion was created.
`model`	string	The model that produced the response.
`choices`	array	The generated choices. Each has an index, a message, and a finish_reason.
`choices[].finish_reason`	string	Why generation stopped for that choice: stop, length, tool_calls, or content_filter.
`usage`	object	Token counts: prompt_tokens, completion_tokens, and total_tokens.
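
As an illustration of reading these fields, the sketch below pulls the text, stop reason, and token count out of a decoded response. `summarize` is a hypothetical helper. Treating a `finish_reason` of `length` as "output was cut off by the token limit" is an assumption based on the OpenAI schema this endpoint follows.

```python
def summarize(completion: dict) -> dict:
    """Extract the commonly used fields from a chat-completion object."""
    choice = completion["choices"][0]
    return {
        # content may be absent or null, e.g. when the model calls a tool
        "text": choice["message"].get("content"),
        "finish_reason": choice["finish_reason"],
        # assumption: "length" means the token limit ended the output
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": completion["usage"]["total_tokens"],
    }


# The relevant parts of the sample response shown above.
sample = {
    "object": "chat.completion",
    "model": "qwen/qwen3-30b-a3b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Red, blue, and yellow."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 18, "completion_tokens": 7, "total_tokens": 25},
}

summary = summarize(sample)
print(summary["text"])
```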

When you set stream: true, the response is instead delivered as a series of chat.completion.chunk events, with each piece of text under choices[0].delta.content. See Streaming.
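
A rough sketch of reassembling streamed text from those chunks follows. It assumes the OpenAI-style wire format, where each event is a `data: <json>` line and the stream ends with a `data: [DONE]` sentinel. This page does not confirm either detail, so check the Streaming page before relying on them. `iter_deltas` and the sample lines are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from server-sent event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed OpenAI-style end-of-stream sentinel
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        # Text arrives under choices[0].delta.content; some chunks carry none.
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Hypothetical event lines, shaped like chat.completion.chunk objects.
sample_lines = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Red, "}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "blue"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_deltas(sample_lines)))
```

Because `iter_deltas` accepts any iterable of strings, it can be fed decoded lines from an HTTP response as they arrive, or a fixed list as in this sample.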