API reference
Chat completions
Generate a model response for a conversation. POST a list of messages and the model returns the next assistant message. This is the primary endpoint and follows the OpenAI chat-completions schema, including streaming, tool calls, and structured outputs.
Create a chat completion
POST https://api.merius.ai/v1/chat/completions
Request
```shell
curl https://api.merius.ai/v1/chat/completions \
  -H "Authorization: Bearer $MERIUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-30b-a3b",
    "messages": [
      {"role": "system", "content": "You are concise."},
      {"role": "user", "content": "Name three primary colors."}
    ],
    "temperature": 0.7
  }'
```
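The same request can be built from Python with only the standard library. This is a minimal sketch, not an official client: the helper name `build_chat_request` is hypothetical, the URL is the `/v1` path used in the curl example, and the request is constructed but not sent, so it runs without network access.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.merius.ai/v1/chat/completions"


def build_chat_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat-completions endpoint."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=data,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = {
    "model": "qwen/qwen3-30b-a3b",
    "messages": [
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Name three primary colors."},
    ],
    "temperature": 0.7,
}

# Falls back to a placeholder key so the sketch runs without credentials.
req = build_chat_request(os.environ.get("MERIUS_API_KEY", "sk-placeholder"), body)

# To actually send it (needs network access and a valid key):
# with urllib.request.urlopen(req) as resp:
#     completion = json.load(resp)
#     print(completion["choices"][0]["message"]["content"])
```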
Request parameters
The request body is JSON. Only model and messages are required; all other parameters are optional and follow OpenAI’s semantics and defaults.
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | The model slug to call, e.g. qwen/qwen3-30b-a3b. See the Models page for the full list. |
| messages (required) | array | The conversation so far, as objects with a role (system, user, or assistant) and content. |
| temperature | number | Sampling temperature, 0–2. Higher values are more random, lower values more focused. Default 1. |
| top_p | number | Nucleus sampling, 0–1. An alternative to temperature; adjust one, not both. Default 1. |
| max_tokens | integer | Upper bound on tokens generated in the completion. Defaults to the model’s remaining context. |
| stream | boolean | Stream the response as server-sent events. Default false. See Streaming. |
| stop | string or array | Up to four sequences at which generation stops. The stop text is not included in the output. |
| presence_penalty | number | Between -2 and 2. Positive values push the model toward new topics. Default 0. |
| frequency_penalty | number | Between -2 and 2. Positive values discourage repeating the same tokens. Default 0. |
| seed | integer | Best-effort determinism: where supported, repeating a request with the same seed and parameters should return the same result, but this is not guaranteed. |
| tools | array | Function definitions the model may call. See Function calling. |
| response_format | object | Request JSON or a JSON schema for structured outputs. See Structured outputs. |
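The ranges in the table can be checked client-side before a request is sent. The sketch below is illustrative only: `check_params` is a hypothetical helper, the ranges are transcribed from the table, and the server's own validation may be stricter or looser. The temperature/top_p rule is reported as advice, since the table recommends rather than requires it.

```python
from typing import Any

# Inclusive ranges transcribed from the parameter table.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def check_params(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    # model and messages are the only required fields.
    for key in ("model", "messages"):
        if key not in body:
            problems.append(f"missing required field: {key}")
    # Numeric parameters must fall inside their documented ranges.
    for key, (lo, hi) in RANGES.items():
        if key in body and not lo <= body[key] <= hi:
            problems.append(f"{key}={body[key]} outside [{lo}, {hi}]")
    # stop may be a single string or a list of up to four strings.
    stop = body.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most four sequences")
    # Advisory only: the docs suggest adjusting temperature or top_p, not both.
    if "temperature" in body and "top_p" in body:
        problems.append("advisory: temperature and top_p are both set; adjust one, not both")
    return problems


ok = {"model": "qwen/qwen3-30b-a3b", "messages": [{"role": "user", "content": "Hi"}]}
bad = {"messages": [], "temperature": 2.5, "stop": ["a", "b", "c", "d", "e"]}
print(check_params(ok))   # []
print(check_params(bad))
```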
The response object
A non-streaming request returns a single chat-completion object. The generated text is in choices[0].message.content; usage reports token counts for the call.
Response
```json
{
  "id": "chatcmpl-9f3a…",
  "object": "chat.completion",
  "created": 1768000000,
  "model": "qwen/qwen3-30b-a3b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Red, blue, and yellow."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 18, "completion_tokens": 7, "total_tokens": 25}
}
```
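Reading the reply out of a response is a matter of indexing into choices[0].message.content and usage. A minimal sketch using the sample response above; `extract_reply` is a hypothetical helper name, and it assumes a non-streaming response with at least one choice.

```python
import json

# The sample response shown above, verbatim.
raw = """
{
  "id": "chatcmpl-9f3a…",
  "object": "chat.completion",
  "created": 1768000000,
  "model": "qwen/qwen3-30b-a3b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Red, blue, and yellow."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 18, "completion_tokens": 7, "total_tokens": 25}
}
"""


def extract_reply(completion: dict) -> tuple[str, dict]:
    """Return the first choice's text and the usage block of a chat.completion."""
    text = completion["choices"][0]["message"]["content"]
    return text, completion["usage"]


text, usage = extract_reply(json.loads(raw))
print(text)                   # Red, blue, and yellow.
print(usage["total_tokens"])  # 25
```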
Response fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the completion. |
| object | string | chat.completion for a non-streaming response; each streamed event is a chat.completion.chunk. |
| created | integer | Unix timestamp (seconds) of when the completion was created. |
| model | string | The model that produced the response. |
| choices | array | The generated choices. Each has an index, a message, and a finish_reason. |
| finish_reason | string | Set on each choice. Why generation stopped: stop, length, tool_calls, or content_filter. |
| usage | object | Token counts for the call: prompt_tokens, completion_tokens, and total_tokens. |
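The field table maps naturally onto typed records. The sketch below is one way to model it with dataclasses, not an official SDK: the class names are hypothetical, content is assumed to be nullable (as it conventionally is when a choice ends in tool_calls), and truncated() relies on the OpenAI convention that finish_reason "length" means the token limit was reached.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Choice:
    index: int
    content: str | None  # assumed nullable, e.g. when the model returns tool calls
    finish_reason: str   # stop, length, tool_calls, or content_filter


@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


@dataclass
class ChatCompletion:
    id: str
    created: int  # Unix timestamp in seconds
    model: str
    choices: list[Choice]
    usage: Usage

    @classmethod
    def from_dict(cls, d: dict) -> ChatCompletion:
        # Streaming chunks have a different shape, so reject them here.
        if d.get("object") != "chat.completion":
            raise ValueError(f"expected chat.completion, got {d.get('object')!r}")
        choices = [
            Choice(c["index"], c["message"].get("content"), c["finish_reason"])
            for c in d["choices"]
        ]
        u = d["usage"]
        usage = Usage(u["prompt_tokens"], u["completion_tokens"], u["total_tokens"])
        return cls(d["id"], d["created"], d["model"], choices, usage)

    def truncated(self) -> bool:
        """True if any choice stopped with finish_reason == "length"."""
        return any(c.finish_reason == "length" for c in self.choices)
```

A caller can check truncated() and, for example, retry with a larger max_tokens.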
When you set stream: true, the response arrives instead as a series of chat.completion.chunk events, with incremental text under choices[0].delta.content. See Streaming.
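Reassembling a streamed reply means concatenating choices[0].delta.content across chunks. The sketch below assumes OpenAI-style server-sent-event framing: each event is a line of the form `data: {json}` and the stream ends with `data: [DONE]`. That framing is an assumption not confirmed on this page, `collect_stream` is a hypothetical helper, and the example chunks are illustrative, not captured output.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate choices[0].delta.content from SSE 'data:' lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            # The first chunk may carry only a role, so content can be absent.
            parts.append(choices[0].get("delta", {}).get("content") or "")
    return "".join(parts)


# Illustrative chunks in the assumed format.
example = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Red, "}}]}',
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "blue, and yellow."}}]}',
    "data: [DONE]",
]
print(collect_stream(example))  # Red, blue, and yellow.
```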