Chatgpt 4o Latest OpenAI 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	—

Open Chatgpt 4o Latest page →

Gpt 3.5 Turbo OpenAI 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	—

Open Gpt 3.5 Turbo page →

Gpt 4 Turbo OpenAI 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	—

Open Gpt 4 Turbo page →

Gpt 4 Turbo 2024-04-09 OpenAI 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	—

Open Gpt 4 Turbo 2024-04-09 page →

Gpt 4.1 OpenAI 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	—

Open Gpt 4.1 page →

Gpt 4.1 Mini OpenAI 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	—

Open Gpt 4.1 Mini page →

Gpt 4.1 Nano OpenAI 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	—

Open Gpt 4.1 Nano page →

GPT-4o OpenAI 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	—

Open GPT-4o page →

Gpt 4o 2024-11-20 OpenAI 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	—

Open Gpt 4o 2024-11-20 page →

GPT-4o mini OpenAI 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	—

Open GPT-4o mini page →

Gpt 5 OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (minimal \| low \| medium \| high)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5 page →

Gpt 5 Chat Latest OpenAI 1 param

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Open Gpt 5 Chat Latest page →

Gpt 5 Mini OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (minimal \| low \| medium \| high)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5 Mini page →

Gpt 5 Nano OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (minimal \| low \| medium \| high)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5 Nano page →

Gpt 5.1 OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| low \| medium \| high)	"none"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5.1 page →

Gpt 5.1 Codex Max OpenAI Subscription 3 params

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (minimal \| low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—
Reasoning summary `reasoning.summary`	enum (auto \| concise \| detailed \| none)	"auto"	Controls the level of reasoning summary returned with the response.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Verbosity `text.verbosity`	enum (low \| medium \| high)	"medium"	Controls how concise or detailed the model's final text response should be.	—

Open Gpt 5.1 Codex Max page →

Gpt 5.1 Codex OpenAI Subscription 3 params

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (minimal \| low \| medium \| high)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—
Reasoning summary `reasoning.summary`	enum (auto \| concise \| detailed \| none)	"auto"	Controls the level of reasoning summary returned with the response.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Verbosity `text.verbosity`	enum (low \| medium \| high)	"medium"	Controls how concise or detailed the model's final text response should be.	—

Open Gpt 5.1 Codex page →

Gpt 5.2 OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5.2 page →

Gpt 5.2 Codex OpenAI Subscription 3 params

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (minimal \| low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—
Reasoning summary `reasoning.summary`	enum (auto \| concise \| detailed \| none)	"auto"	Controls the level of reasoning summary returned with the response.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Verbosity `text.verbosity`	enum (low \| medium \| high)	"medium"	Controls how concise or detailed the model's final text response should be.	—

Open Gpt 5.2 Codex page →

Gpt 5.2 OpenAI Subscription 3 params

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (minimal \| low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—
Reasoning summary `reasoning.summary`	enum (auto \| concise \| detailed \| none)	"auto"	Controls the level of reasoning summary returned with the response.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Verbosity `text.verbosity`	enum (low \| medium \| high)	"medium"	Controls how concise or detailed the model's final text response should be.	—

Open Gpt 5.2 page →

Gpt 5.3 Codex OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5.3 Codex page →

Gpt 5.3 Codex Spark OpenAI Subscription 3 params

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (minimal \| low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—
Reasoning summary `reasoning.summary`	enum (auto \| concise \| detailed \| none)	"auto"	Controls the level of reasoning summary returned with the response.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Verbosity `text.verbosity`	enum (low \| medium \| high)	"medium"	Controls how concise or detailed the model's final text response should be.	—

Open Gpt 5.3 Codex Spark page →

Gpt 5.3 Codex OpenAI Subscription 3 params

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (minimal \| low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—
Reasoning summary `reasoning.summary`	enum (auto \| concise \| detailed \| none)	"auto"	Controls the level of reasoning summary returned with the response.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Verbosity `text.verbosity`	enum (low \| medium \| high)	"medium"	Controls how concise or detailed the model's final text response should be.	—

Open Gpt 5.3 Codex page →

Gpt 5.4 OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5.4 page →

Gpt 5.4 Mini OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5.4 Mini page →

Gpt 5.4 Mini OpenAI Subscription 3 params

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (minimal \| low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—
Reasoning summary `reasoning.summary`	enum (auto \| concise \| detailed \| none)	"auto"	Controls the level of reasoning summary returned with the response.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Verbosity `text.verbosity`	enum (low \| medium \| high)	"medium"	Controls how concise or detailed the model's final text response should be.	—

Open Gpt 5.4 Mini page →

Gpt 5.4 Nano OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5.4 Nano page →

Gpt 5.4 Pro OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5.4 Pro page →

Gpt 5.4 Pro OpenAI Subscription 3 params

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—
Reasoning summary `reasoning.summary`	enum (auto \| concise \| detailed \| none)	"auto"	Controls the level of reasoning summary returned with the response.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Verbosity `text.verbosity`	enum (low \| medium \| high)	"medium"	Controls how concise or detailed the model's final text response should be.	—

Open Gpt 5.4 Pro page →

Gpt 5.4 OpenAI Subscription 3 params

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (minimal \| low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—
Reasoning summary `reasoning.summary`	enum (auto \| concise \| detailed \| none)	"auto"	Controls the level of reasoning summary returned with the response.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Verbosity `text.verbosity`	enum (low \| medium \| high)	"medium"	Controls how concise or detailed the model's final text response should be.	—

Open Gpt 5.4 page →

Gpt 5.5 OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5.5 page →

Gpt 5.5 Pro OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5.5 Pro page →

Gpt 5.5 Pro OpenAI Subscription 3 params

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—
Reasoning summary `reasoning.summary`	enum (auto \| concise \| detailed \| none)	"auto"	Controls the level of reasoning summary returned with the response.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Verbosity `text.verbosity`	enum (low \| medium \| high)	"medium"	Controls how concise or detailed the model's final text response should be.	—

Open Gpt 5.5 Pro page →

Gpt 5.5 OpenAI Subscription 3 params

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (minimal \| low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—
Reasoning summary `reasoning.summary`	enum (auto \| concise \| detailed \| none)	"auto"	Controls the level of reasoning summary returned with the response.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Verbosity `text.verbosity`	enum (low \| medium \| high)	"medium"	Controls how concise or detailed the model's final text response should be.	—

Open Gpt 5.5 page →

Gpt 5.6 Luna OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| low \| medium \| high)	"none"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5.6 Luna page →

Gpt 5.6 Luna OpenAI Subscription 3 params

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (none \| low \| medium \| high \| xhigh \| max)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—
Reasoning summary `reasoning.summary`	enum (auto \| concise \| detailed)	"auto"	Controls the level of reasoning summary returned with the response.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Verbosity `text.verbosity`	enum (low \| medium \| high)	"medium"	Controls how concise or detailed the model's final text response should be.	—

Open Gpt 5.6 Luna page →

Gpt 5.6 Sol OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| low \| medium \| high)	"none"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5.6 Sol page →

Gpt 5.6 Sol OpenAI Subscription 3 params

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (none \| low \| medium \| high \| xhigh \| max)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—
Reasoning summary `reasoning.summary`	enum (auto \| concise \| detailed)	"auto"	Controls the level of reasoning summary returned with the response.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Verbosity `text.verbosity`	enum (low \| medium \| high)	"medium"	Controls how concise or detailed the model's final text response should be.	—

Open Gpt 5.6 Sol page →

Gpt 5.6 Terra OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| low \| medium \| high)	"none"	Controls how much reasoning the model should perform before producing an answer.	—

Open Gpt 5.6 Terra page →

Gpt 5.6 Terra OpenAI Subscription 3 params

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (none \| low \| medium \| high \| xhigh \| max)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—
Reasoning summary `reasoning.summary`	enum (auto \| concise \| detailed)	"auto"	Controls the level of reasoning summary returned with the response.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Verbosity `text.verbosity`	enum (low \| medium \| high)	"medium"	Controls how concise or detailed the model's final text response should be.	—

Open Gpt 5.6 Terra page →

Gpt Oss 120b OpenAI 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…131072)	—	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. OpenAI recommends sampling at 1.0.	—
Top P `top_p`	number (0…1)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. OpenAI recommends sampling at 1.0.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (low \| medium \| high)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Tools 1 param

Parameter	Type	Default	Description	Condition
Tool choice `tool_choice`	enum (auto \| none \| required)	—	Controls whether the model may call tools, must call one, or skips tool calls.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_schema)	"text"	Controls whether the model returns normal text or a schema-constrained JSON object.	—

Open Gpt Oss 120b page →

Gpt Oss 20b OpenAI 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…131072)	—	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. OpenAI recommends sampling at 1.0.	—
Top P `top_p`	number (0…1)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. OpenAI recommends sampling at 1.0.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (low \| medium \| high)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Tools 1 param

Parameter	Type	Default	Description	Condition
Tool choice `tool_choice`	enum (auto \| none \| required)	—	Controls whether the model may call tools, must call one, or skips tool calls.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_schema)	"text"	Controls whether the model returns normal text or a schema-constrained JSON object.	—

Open Gpt Oss 20b page →

Gpt Oss Safeguard 120b OpenAI 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…131072)	—	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. OpenAI recommends sampling at 1.0.	—
Top P `top_p`	number (0…1)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. OpenAI recommends sampling at 1.0.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (low \| medium \| high)	"medium"	Controls how much reasoning the model performs when interpreting the provided safety policy before returning a classification.	—

Open Gpt Oss Safeguard 120b page →

Gpt Oss Safeguard 20b OpenAI 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…131072)	—	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. OpenAI recommends sampling at 1.0.	—
Top P `top_p`	number (0…1)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. OpenAI recommends sampling at 1.0.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (low \| medium \| high)	"medium"	Controls how much reasoning the model performs when interpreting the provided safety policy before returning a classification.	—

Open Gpt Oss Safeguard 20b page →

o1 OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open o1 page →

o1-mini OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (minimal \| low \| medium \| high)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open o1-mini page →

O1 Preview OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (minimal \| low \| medium \| high)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open O1 Preview page →

o3 OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open o3 page →

o3-mini OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open o3-mini page →

O3 Pro OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open O3 Pro page →

o4-mini OpenAI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (16…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (low \| medium \| high \| xhigh)	"medium"	Controls how much reasoning the model should perform before producing an answer.	—

Open o4-mini page →

Claude 3.5 Haiku 20241022 Anthropic 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Open Claude 3.5 Haiku 20241022 page →

Claude 3.5 Haiku Latest Anthropic 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Open Claude 3.5 Haiku Latest page →

Claude 3.5 Sonnet 20241022 Anthropic 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Open Claude 3.5 Sonnet 20241022 page →

Claude 3.5 Sonnet Latest Anthropic 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Open Claude 3.5 Sonnet Latest page →

Claude 3.7 Sonnet 20250219 Anthropic 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"

Open Claude 3.7 Sonnet 20250219 page →

Claude 3.7 Sonnet Latest Anthropic 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"

Open Claude 3.7 Sonnet Latest page →

Claude 3 Opus 20240229 Anthropic 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Open Claude 3 Opus 20240229 page →

Claude 3 Opus Latest Anthropic 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Open Claude 3 Opus Latest page →

Claude Fable 5 Anthropic 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (adaptive)	—	Only adaptive thinking is supported; omit the parameter entirely to run without thinking (an explicit disabled value is rejected).	—
Thinking display `thinking.display`	enum (summarized \| omitted)	"omitted"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "adaptive"
Effort `output_config.effort`	enum (low \| medium \| high \| xhigh \| max)	"high"	Controls Anthropic response thoroughness and token spend.	—

Open Claude Fable 5 page →

Claude Fable 5 Anthropic Subscription 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (adaptive)	—	Only adaptive thinking is supported; omit the parameter entirely to run without thinking (an explicit disabled value is rejected).	—
Thinking display `thinking.display`	enum (summarized \| omitted)	"omitted"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "adaptive"
Effort `output_config.effort`	enum (low \| medium \| high \| xhigh \| max)	"high"	Controls Anthropic response thoroughness and token spend.	—

Open Claude Fable 5 page →

Claude Haiku 4 Anthropic 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"

Open Claude Haiku 4 page →

Claude Haiku 4.5 Anthropic 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"

Open Claude Haiku 4.5 page →

Claude Haiku 4.5 20251001 Anthropic 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type = "enabled" or top_p ≠ null
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type = "enabled" or temperature ≠ null
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type = "enabled"

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"

Open Claude Haiku 4.5 20251001 page →

Claude Haiku 4.5 20251001 Anthropic Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type = "enabled" or top_p ≠ null
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type = "enabled" or temperature ≠ null
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type = "enabled"

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"

Open Claude Haiku 4.5 20251001 page →

Claude Haiku 4.5 Anthropic Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"

Open Claude Haiku 4.5 page →

Claude Haiku 4 Anthropic Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"

Open Claude Haiku 4 page →

Claude Opus 4.1 20250805 Anthropic 7 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type = "enabled" or top_p ≠ null
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type = "enabled" or temperature ≠ null
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type = "enabled"

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"
Thinking display `thinking.display`	enum (summarized \| omitted)	"summarized"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "enabled"

Open Claude Opus 4.1 20250805 page →

Claude Opus 4.1 20250805 Anthropic Subscription 7 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type = "enabled" or top_p ≠ null
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type = "enabled" or temperature ≠ null
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type = "enabled"

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"
Thinking display `thinking.display`	enum (summarized \| omitted)	"summarized"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "enabled"

Open Claude Opus 4.1 20250805 page →

Claude Opus 4 20250514 Anthropic 7 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type = "enabled"
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type = "enabled"
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type = "enabled"

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"
Thinking display `thinking.display`	enum (summarized \| omitted)	"summarized"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "enabled"

Open Claude Opus 4 20250514 page →

Claude Opus 4 20250514 Anthropic Subscription 7 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type = "enabled"
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type = "enabled"
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type = "enabled"

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"
Thinking display `thinking.display`	enum (summarized \| omitted)	"summarized"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "enabled"

Open Claude Opus 4 20250514 page →

Claude Opus 4.5 20251101 Anthropic 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type = "enabled" or top_p ≠ null
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type = "enabled" or temperature ≠ null
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type = "enabled"

Reasoning 4 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"
Thinking display `thinking.display`	enum (summarized \| omitted)	"summarized"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "enabled"
Effort `output_config.effort`	enum (low \| medium \| high)	"high"	Controls Anthropic response thoroughness and token spend.	—

Open Claude Opus 4.5 20251101 page →

Claude Opus 4.5 20251101 Anthropic Subscription 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type = "enabled" or top_p ≠ null
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type = "enabled" or temperature ≠ null
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type = "enabled"

Reasoning 4 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"
Thinking display `thinking.display`	enum (summarized \| omitted)	"summarized"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "enabled"
Effort `output_config.effort`	enum (low \| medium \| high)	"high"	Controls Anthropic response thoroughness and token spend.	—

Open Claude Opus 4.5 20251101 page →

Claude Opus 4.6 Anthropic 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"enabled", "adaptive"} or top_p ≠ null
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"enabled", "adaptive"} or temperature ≠ null
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"enabled", "adaptive"}

Reasoning 4 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| adaptive \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"
Thinking display `thinking.display`	enum (summarized \| omitted)	"summarized"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type ∈ {"adaptive", "enabled"}
Effort `output_config.effort`	enum (low \| medium \| high \| max)	"high"	Controls Anthropic response thoroughness and token spend.	—

Open Claude Opus 4.6 page →

Claude Opus 4.6 Anthropic Subscription 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"enabled", "adaptive"} or top_p ≠ null
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"enabled", "adaptive"} or temperature ≠ null
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"enabled", "adaptive"}

Reasoning 4 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| adaptive \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"
Thinking display `thinking.display`	enum (summarized \| omitted)	"summarized"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type ∈ {"adaptive", "enabled"}
Effort `output_config.effort`	enum (low \| medium \| high \| max)	"high"	Controls Anthropic response thoroughness and token spend.	—

Open Claude Opus 4.6 page →

Claude Opus 4.7 Anthropic 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| adaptive)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Thinking display `thinking.display`	enum (summarized \| omitted)	"omitted"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "adaptive"
Effort `output_config.effort`	enum (low \| medium \| high \| xhigh \| max)	"high"	Controls Anthropic response thoroughness and token spend.	—

Open Claude Opus 4.7 page →

Claude Opus 4.7 Anthropic Subscription 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| adaptive)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Thinking display `thinking.display`	enum (summarized \| omitted)	"omitted"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "adaptive"
Effort `output_config.effort`	enum (low \| medium \| high \| xhigh \| max)	"high"	Controls Anthropic response thoroughness and token spend.	—

Open Claude Opus 4.7 page →

Claude Opus 4.8 Anthropic 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| adaptive)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Thinking display `thinking.display`	enum (summarized \| omitted)	"omitted"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "adaptive"
Effort `output_config.effort`	enum (low \| medium \| high \| xhigh \| max)	"high"	Controls Anthropic response thoroughness and token spend.	—

Open Claude Opus 4.8 page →

Claude Opus 4.8 Anthropic Subscription 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| adaptive)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Thinking display `thinking.display`	enum (summarized \| omitted)	"omitted"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "adaptive"
Effort `output_config.effort`	enum (low \| medium \| high \| xhigh \| max)	"high"	Controls Anthropic response thoroughness and token spend.	—

Open Claude Opus 4.8 page →

Claude Opus 4 Anthropic Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| adaptive \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"

Open Claude Opus 4 page →

Claude Sonnet 4 20250514 Anthropic 7 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type = "enabled"
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type = "enabled"
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type = "enabled"

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"
Thinking display `thinking.display`	enum (summarized \| omitted)	"summarized"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "enabled"

Open Claude Sonnet 4 20250514 page →

Claude Sonnet 4 20250514 Anthropic Subscription 7 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type = "enabled"
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type = "enabled"
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type = "enabled"

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"
Thinking display `thinking.display`	enum (summarized \| omitted)	"summarized"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "enabled"

Open Claude Sonnet 4 20250514 page →

Claude Sonnet 4.5 Anthropic 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| adaptive \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"

Open Claude Sonnet 4.5 page →

Claude Sonnet 4.5 20250929 Anthropic 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type = "enabled" or top_p ≠ null
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type = "enabled" or temperature ≠ null
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type = "enabled"

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"

Open Claude Sonnet 4.5 20250929 page →

Claude Sonnet 4.5 20250929 Anthropic Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type = "enabled" or top_p ≠ null
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type = "enabled" or temperature ≠ null
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type = "enabled"

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"

Open Claude Sonnet 4.5 20250929 page →

Claude Sonnet 4.5 Anthropic Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| adaptive \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"

Open Claude Sonnet 4.5 page →

Claude Sonnet 4.6 Anthropic 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"enabled", "adaptive"} or top_p ≠ null
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"enabled", "adaptive"} or temperature ≠ null
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"enabled", "adaptive"}

Reasoning 4 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| adaptive \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"
Thinking display `thinking.display`	enum (summarized \| omitted)	"summarized"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type ∈ {"adaptive", "enabled"}
Effort `output_config.effort`	enum (low \| medium \| high \| max)	"high"	Controls Anthropic response thoroughness and token spend.	—

Open Claude Sonnet 4.6 page →

Claude Sonnet 4.6 Anthropic Subscription 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"enabled", "adaptive"} or top_p ≠ null
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"enabled", "adaptive"} or temperature ≠ null
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"enabled", "adaptive"}

Reasoning 4 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| adaptive \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"
Thinking display `thinking.display`	enum (summarized \| omitted)	"summarized"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type ∈ {"adaptive", "enabled"}
Effort `output_config.effort`	enum (low \| medium \| high \| max)	"high"	Controls Anthropic response thoroughness and token spend.	—

Open Claude Sonnet 4.6 page →

Claude Sonnet 4 Anthropic Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type ∈ {"adaptive", "enabled"}
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type ∈ {"adaptive", "enabled"} or temperature ≠ 1
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type ∈ {"adaptive", "enabled"}

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| adaptive \| enabled)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Budget tokens `thinking.budget_tokens`	integer (1024…+∞)	4096	Maximum token budget Anthropic may use for extended thinking before producing the final answer.	Only when thinking.type = "enabled"

Open Claude Sonnet 4 page →

Claude Sonnet 5 Anthropic 7 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when thinking.type = "adaptive" or top_p ≠ null
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens whose cumulative probability reaches this value.	Not when thinking.type = "adaptive" or temperature ≠ null
Top K `top_k`	integer (0…+∞)	0	Limits token sampling to the top K most likely next tokens.	Not when thinking.type = "adaptive"

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| adaptive)	"disabled"	Controls the Anthropic thinking mode values supported by this model.	—
Thinking display `thinking.display`	enum (summarized \| omitted)	"summarized"	Controls whether Anthropic returns summarized or omitted thinking content.	Only when thinking.type = "adaptive"
Effort `output_config.effort`	enum (low \| medium \| high \| max)	"high"	Controls Anthropic response thoroughness and token spend.	—

Open Claude Sonnet 5 page →

Gemini 2.5 Flash Google 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking budget `generationConfig.thinkingConfig.thinkingBudget`	integer (-1…24576)	-1	Number of thinking tokens Gemini should use; 0 disables thinking and -1 uses dynamic thinking.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemini 2.5 Flash page →

Gemini 2.5 Flash Lite Google 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking budget `generationConfig.thinkingConfig.thinkingBudget`	integer	0	Number of thinking tokens Gemini should use; -1 uses dynamic thinking, 0 disables thinking, and fixed budgets start at 512 tokens.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemini 2.5 Flash Lite page →

Gemini 2.5 Flash Lite Google Subscription 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking budget `generationConfig.thinkingConfig.thinkingBudget`	integer	0	Number of thinking tokens Gemini should use; -1 uses dynamic thinking, 0 disables thinking, and fixed budgets start at 512 tokens.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemini 2.5 Flash Lite page →

Gemini 2.5 Flash Google Subscription 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking budget `generationConfig.thinkingConfig.thinkingBudget`	integer (-1…24576)	-1	Number of thinking tokens Gemini should use; 0 disables thinking and -1 uses dynamic thinking.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemini 2.5 Flash page →

Gemini 2.5 Pro Google 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking budget `generationConfig.thinkingConfig.thinkingBudget`	integer (128…32768)	—	Maximum number of thinking tokens Gemini should use before producing the final answer.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemini 2.5 Pro page →

Gemini 2.5 Pro Google Subscription 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking budget `generationConfig.thinkingConfig.thinkingBudget`	integer (128…32768)	—	Maximum number of thinking tokens Gemini should use before producing the final answer.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemini 2.5 Pro page →

Gemini 3 Flash Preview Google Subscription 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking level `generationConfig.thinkingConfig.thinkingLevel`	enum (minimal \| low \| medium \| high)	"high"	Controls Gemini 3 Flash reasoning effort.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemini 3 Flash Preview page →

Gemini 3.1 Flash Lite Google 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking level `generationConfig.thinkingConfig.thinkingLevel`	enum (minimal \| low \| medium \| high)	"minimal"	Controls Gemini 3.1 Flash-Lite reasoning effort.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemini 3.1 Flash Lite page →

Gemini 3.1 Flash Lite Preview Google Subscription 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking level `generationConfig.thinkingConfig.thinkingLevel`	enum (minimal \| low \| medium \| high)	"high"	Controls Gemini 3.1 Flash-Lite reasoning effort.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemini 3.1 Flash Lite Preview page →

Gemini 3.1 Flash Lite Google Subscription 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking level `generationConfig.thinkingConfig.thinkingLevel`	enum (minimal \| low \| medium \| high)	"minimal"	Controls Gemini 3.1 Flash-Lite reasoning effort.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemini 3.1 Flash Lite page →

Gemini 3.1 Pro Preview Google Subscription 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking level `generationConfig.thinkingConfig.thinkingLevel`	enum (low \| high)	"high"	Controls Gemini 3 Pro reasoning effort.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemini 3.1 Pro Preview page →

Gemini 3.5 Flash Google 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking level `generationConfig.thinkingConfig.thinkingLevel`	enum (minimal \| low \| medium \| high)	"medium"	Controls Gemini 3.5 Flash reasoning effort.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemini 3.5 Flash page →

Gemini Flash Latest Google 8 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking budget `generationConfig.thinkingConfig.thinkingBudget`	integer	0	Number of thinking tokens Gemini should use; -1 uses dynamic thinking, 0 disables thinking, and fixed budgets start at 512 tokens.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemini Flash Latest page →

Gemma 3 12b It Google 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…131072)	—	Maximum number of output tokens the model may generate; output shares the 128K context window with the input.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `top_k`	integer (0…+∞)	64	Limits generation to the selected number of highest-probability tokens.	—

Open Gemma 3 12b It page →

Gemma 3 1b It Google 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…32768)	—	Maximum number of output tokens the model may generate; output shares the 32K context window with the input.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `top_k`	integer (0…+∞)	64	Limits generation to the selected number of highest-probability tokens.	—

Open Gemma 3 1b It page →

Gemma 3 27b It Google 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…131072)	—	Maximum number of output tokens the model may generate; output shares the 128K context window with the input.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `top_k`	integer (0…+∞)	64	Limits generation to the selected number of highest-probability tokens.	—

Open Gemma 3 27b It page →

Gemma 3 4b It Google 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…131072)	—	Maximum number of output tokens the model may generate; output shares the 128K context window with the input.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `top_k`	integer (0…+∞)	64	Limits generation to the selected number of highest-probability tokens.	—

Open Gemma 3 4b It page →

Gemma 3n E2B It Google 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…32768)	—	Maximum number of output tokens the model may generate; output shares the 32K context window with the input.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `top_k`	integer (0…+∞)	64	Limits generation to the selected number of highest-probability tokens.	—

Open Gemma 3n E2B It page →

Gemma 3n E4B It Google 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…32768)	—	Maximum number of output tokens the model may generate; output shares the 32K context window with the input.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `top_k`	integer (0…+∞)	64	Limits generation to the selected number of highest-probability tokens.	—

Open Gemma 3n E4B It page →

Gemma 4 12B It Google 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Google's standardized Gemma 4 sampling uses 1.0.	—
Top P `top_p`	number (0…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `top_k`	integer (0…+∞)	64	Limits generation to the selected number of highest-probability tokens.	—

Open Gemma 4 12B It page →

Gemma 4 26b A4b It Google 9 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking budget `generationConfig.thinkingConfig.thinkingBudget`	integer	0	Number of thinking tokens Gemini should use; -1 uses dynamic thinking, 0 disables thinking, and fixed budgets start at 512 tokens.	—
Thinking level `generationConfig.thinkingConfig.thinkingLevel`	enum (minimal \| high)	—	Toggles Gemma 4 reasoning; high enables thinking and minimal disables it.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemma 4 26b A4b It page →

Gemma 4 31b It Google 9 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `generationConfig.maxOutputTokens`	integer (1…65536)	—	Maximum number of tokens to include in a response candidate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `generationConfig.temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `generationConfig.topP`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `generationConfig.topK`	integer (0…+∞)	64	Limits token sampling to the top K most likely next tokens.	—
Seed `generationConfig.seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 3 params

Parameter	Type	Default	Description	Condition
Thinking budget `generationConfig.thinkingConfig.thinkingBudget`	integer	0	Number of thinking tokens Gemini should use; -1 uses dynamic thinking, 0 disables thinking, and fixed budgets start at 512 tokens.	—
Thinking level `generationConfig.thinkingConfig.thinkingLevel`	enum (minimal \| high)	—	Toggles Gemma 4 reasoning; high enables thinking and minimal disables it.	—
Include thoughts `generationConfig.thinkingConfig.includeThoughts`	boolean	false	Controls whether Gemini returns available thought summaries in the response parts.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response MIME type `generationConfig.responseMimeType`	enum (text/plain \| application/json)	"text/plain"	MIME type for generated text candidates.	—

Open Gemma 4 31b It page →

Gemma 4 E2B It Google 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Google's standardized Gemma 4 sampling uses 1.0.	—
Top P `top_p`	number (0…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `top_k`	integer (0…+∞)	64	Limits generation to the selected number of highest-probability tokens.	—

Open Gemma 4 E2B It page →

Gemma 4 E4B It Google 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Google's standardized Gemma 4 sampling uses 1.0.	—
Top P `top_p`	number (0…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `top_k`	integer (0…+∞)	64	Limits generation to the selected number of highest-probability tokens.	—

Open Gemma 4 E4B It page →

GLM-4.5 Z.ai 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	0.6	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-4.5 page →

GLM-4.5-Air Z.ai 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	0.6	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-4.5-Air page →

GLM-4.5-Air Z.ai Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	0.6	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-4.5-Air page →

GLM-4.5-AirX Z.ai 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	0.6	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-4.5-AirX page →

GLM-4.5-Flash Z.ai 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	0.6	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-4.5-Flash page →

GLM-4.5 Z.ai Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	0.6	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-4.5 page →

GLM-4.5-X Z.ai 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	0.6	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-4.5-X page →

GLM-4.6 Z.ai 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-4.6 page →

GLM-4.6 Z.ai Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-4.6 page →

GLM-4.7 Z.ai 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-4.7 page →

GLM-4.7-Flash Z.ai 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-4.7-Flash page →

GLM-4.7-FlashX Z.ai 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-4.7-FlashX page →

GLM-4.7 Z.ai Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-4.7 page →

GLM-5 Z.ai 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-5 page →

GLM-5 Z.ai Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-5 page →

GLM-5-Turbo Z.ai 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-5-Turbo page →

GLM-5-Turbo Z.ai Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-5-Turbo page →

GLM-5.1 Z.ai 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-5.1 page →

GLM-5.1 Z.ai Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-5.1 page →

GLM-5.2 Z.ai 7 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…131072)	65536	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—
Reasoning effort `reasoning_effort`	enum (none \| minimal \| low \| medium \| high \| xhigh \| max)	"max"	Controls how much reasoning effort GLM-5.2 spends when thinking is enabled.	Only when thinking.type = "enabled"

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-5.2 page →

GLM-5.2 Z.ai Subscription 7 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…131072)	65536	Maximum number of tokens to generate in the response.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	Not when do_sample = false
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	Not when do_sample = false
Do sample `do_sample`	boolean	true	When false, the model uses greedy decoding and ignores temperature and top_p.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Toggles the model's extended reasoning before it produces the final answer.	—
Reasoning effort `reasoning_effort`	enum (none \| minimal \| low \| medium \| high \| xhigh \| max)	"max"	Controls how much reasoning effort GLM-5.2 spends when thinking is enabled.	Only when thinking.type = "enabled"

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open GLM-5.2 page →

MiniMax M2 MiniMax 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Split reasoning `reasoning_split`	boolean	false	Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.	—

Open MiniMax M2 page →

MiniMax M2 MiniMax Subscription 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Open MiniMax M2 page →

MiniMax M2.1 MiniMax 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Split reasoning `reasoning_split`	boolean	false	Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.	—

Open MiniMax M2.1 page →

MiniMax M2.1 Highspeed MiniMax 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Split reasoning `reasoning_split`	boolean	false	Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.	—

Open MiniMax M2.1 Highspeed page →

MiniMax M2.1 Highspeed MiniMax Subscription 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Open MiniMax M2.1 Highspeed page →

MiniMax M2.1 MiniMax Subscription 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Open MiniMax M2.1 page →

MiniMax M2.5 MiniMax 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Split reasoning `reasoning_split`	boolean	false	Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.	—

Open MiniMax M2.5 page →

MiniMax M2.5 Highspeed MiniMax 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Split reasoning `reasoning_split`	boolean	false	Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.	—

Open MiniMax M2.5 Highspeed page →

MiniMax M2.5 Highspeed MiniMax Subscription 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Open MiniMax M2.5 Highspeed page →

MiniMax M2.5 MiniMax Subscription 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Open MiniMax M2.5 page →

MiniMax M2.7 MiniMax 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Split reasoning `reasoning_split`	boolean	false	Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.	—

Open MiniMax M2.7 page →

MiniMax M2.7 Highspeed MiniMax 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Split reasoning `reasoning_split`	boolean	false	Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.	—

Open MiniMax M2.7 Highspeed page →

MiniMax M2.7 Highspeed MiniMax Subscription 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Open MiniMax M2.7 Highspeed page →

MiniMax M2.7 MiniMax Subscription 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Open MiniMax M2.7 page →

Minimax M3 MiniMax 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Split reasoning `reasoning_split`	boolean	false	Returns the model's reasoning in a separate reasoning_details field instead of inline with the response.	—

Open Minimax M3 page →

MiniMax M3 MiniMax Subscription 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the response.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0.01…1 step 0.01)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Values must be greater than 0 and at most 1.	—
Top P `top_p`	number (0.01…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Open MiniMax M3 page →

Gliner Pii Nvidia 4 params

Sampling 1 param

Parameter	Type	Default	Description	Condition
Threshold `threshold`	number (0…1)	0.5	Confidence threshold for entity detection. Lower values detect more entities but may include false positives.	—

Metadata 3 params

Parameter	Type	Default	Description	Condition
Chunk length `chunk_length`	integer (1…2048)	384	Context window size for processing. Longer texts are automatically split into chunks with overlap for complete coverage. Must be greater than overlap.	—
Overlap `overlap`	integer (0…512)	128	Token overlap between chunks to prevent entity clipping. Must be less than chunk_length.	—
Flat NER `flat_ner`	boolean	false	When true, prevents overlapping entity spans. When false, may return nested entities such as both a full name and its constituent first name.	—

Open Gliner Pii page →

Llama 3.1 Nemoguard 8b Topic Control Nvidia 6 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	1024	Maximum number of tokens to generate. Generation stops when this limit is reached.	—
Stop `stop`	string	—	A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2)	0.5	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.	—
Top P `top_p`	number (-∞…1)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.	—
Frequency penalty `frequency_penalty`	number (-2…2)	0	Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.	—
Presence penalty `presence_penalty`	number (-2…2)	0	Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.	—

Open Llama 3.1 Nemoguard 8b Topic Control page →

Llama 3.1 Nemotron Nano 8b V1 Nvidia 7 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…16384)	4096	Maximum number of tokens to generate. Generation stops when this limit is reached.	—
Stop `stop`	string	—	A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1)	0.6	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.	—
Top P `top_p`	number (-∞…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.	—
Frequency penalty `frequency_penalty`	number (-2…2)	0	Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.	—
Presence penalty `presence_penalty`	number (-2…2)	0	Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.	—
Seed `seed`	integer (0…18446744073709552000)	0	Best-effort deterministic sampling seed. Changing the seed produces a different response with similar characteristics. Fix the seed to reproduce results.	—

Open Llama 3.1 Nemotron Nano 8b V1 page →

Llama 3.1 Nemotron Safety Guard 8b V3 Nvidia 1 param

Sampling 1 param

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1)	0	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—

Open Llama 3.1 Nemotron Safety Guard 8b V3 page →

Llama 3.1 Nemotron Ultra 253b V1 Nvidia 7 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…16384)	4096	Maximum number of tokens to generate. Generation stops when this limit is reached.	—
Stop `stop`	string	—	A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1)	0.6	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.	—
Top P `top_p`	number (-∞…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.	—
Frequency penalty `frequency_penalty`	number (-2…2)	0	Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.	—
Presence penalty `presence_penalty`	number (-2…2)	0	Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.	—
Seed `seed`	integer (0…18446744073709552000)	0	Best-effort deterministic sampling seed. Changing the seed produces a different response with similar characteristics. Fix the seed to reproduce results.	—

Open Llama 3.1 Nemotron Ultra 253b V1 page →

Llama 3.3 Nemotron Super 49b V1 Nvidia 7 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…16384)	4096	Maximum number of tokens to generate. Generation stops when this limit is reached.	—
Stop `stop`	string	—	A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1)	0.6	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.	—
Top P `top_p`	number (-∞…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.	—
Frequency penalty `frequency_penalty`	number (-2…2)	0	Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.	—
Presence penalty `presence_penalty`	number (-2…2)	0	Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.	—
Seed `seed`	integer (0…18446744073709552000)	0	Best-effort deterministic sampling seed. Changing the seed produces a different response with similar characteristics. Fix the seed to reproduce results.	—

Open Llama 3.3 Nemotron Super 49b V1 page →

Llama 3.3 Nemotron Super 49b V1.5 Nvidia 7 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…65536)	65536	Maximum number of tokens to generate. Generation stops when this limit is reached.	—
Stop `stop`	string	—	A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1)	0.6	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.	—
Top P `top_p`	number (-∞…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.	—
Frequency penalty `frequency_penalty`	number (-2…2)	0	Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.	—
Presence penalty `presence_penalty`	number (-2…2)	0	Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.	—
Seed `seed`	integer (0…18446744073709552000)	0	Best-effort deterministic sampling seed. Changing the seed produces a different response with similar characteristics. Fix the seed to reproduce results.	—

Open Llama 3.3 Nemotron Super 49b V1.5 page →

Nemoguard Jailbreak Detect Nvidia 0 params

No parameters documented yet.

Open Nemoguard Jailbreak Detect page →

Nemotron 3 Nano 30b A3b Nvidia 5 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…32768)	16384	Maximum number of tokens to generate. Generation stops when this limit is reached.	—
Stop `stop`	string	—	A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (-∞…1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.	—
Top P `top_p`	number (-∞…1)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.	—
Seed `seed`	integer (0…18446744073709552000)	—	Best-effort deterministic sampling seed. Repeated requests with the same seed and parameters should return the same result.	—

Open Nemotron 3 Nano 30b A3b page →

Nemotron 3 Super 120b A12b Nvidia 7 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…32768)	16384	Maximum number of tokens to generate. Generation stops when this limit is reached.	—
Stop `stop`	string	—	A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (-∞…1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.	—
Top P `top_p`	number (-∞…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.	—
Seed `seed`	integer (0…18446744073709552000)	—	Best-effort deterministic sampling seed. Repeated requests with the same seed and parameters should return the same result.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| low \| high)	"high"	Controls the reasoning mode. 'none' disables reasoning tokens, 'low' enables low-effort reasoning, and 'high' enables full reasoning.	—
Reasoning budget `reasoning_budget`	integer (-1…32768)	16384	Maximum number of tokens the model may use for internal reasoning before being forced to end the reasoning trace. Use -1 to disable budget enforcement.	—

Open Nemotron 3 Super 120b A12b page →

Nemotron 3 Ultra 550b A55b Nvidia 7 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…32768)	16384	Maximum number of tokens to generate. Generation stops when this limit is reached.	—
Stop `stop`	string	—	A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (-∞…1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.	—
Top P `top_p`	number (-∞…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.	—
Seed `seed`	integer (0…18446744073709552000)	—	Best-effort deterministic sampling seed. Repeated requests with the same seed and parameters should return the same result.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| medium \| high)	"high"	Controls the reasoning mode. 'none' disables reasoning tokens, 'medium' enables efficient reasoning, and 'high' enables full reasoning.	—
Reasoning budget `reasoning_budget`	integer (-1…32768)	16384	Maximum number of tokens the model may use for internal reasoning before being forced to end the reasoning trace. Use -1 to disable budget enforcement.	—

Open Nemotron 3 Ultra 550b A55b page →

Nemotron 3 Ultra Nvidia Subscription 6 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…32768)	16384	Maximum number of tokens to generate. Generation stops when this limit is reached.	—
Stop `stop`	string	—	A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (-∞…1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.	—
Top P `top_p`	number (-∞…1)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| medium \| high)	"high"	Controls the reasoning mode. 'none' disables reasoning tokens, 'medium' enables efficient reasoning, and 'high' enables full reasoning.	—
Reasoning budget `reasoning_budget`	integer (-1…32768)	16384	Maximum number of tokens the model may use for internal reasoning before being forced to end the reasoning trace. Use -1 to disable budget enforcement.	—

Open Nemotron 3 Ultra page →

Nemotron Content Safety Reasoning 4b Nvidia 5 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…32768)	16384	Maximum number of tokens to generate. Generation stops when this limit is reached.	—
Stop `stop`	string	—	A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (-∞…1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.	—
Top P `top_p`	number (-∞…1)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.	—
Seed `seed`	integer (0…18446744073709552000)	—	Best-effort deterministic sampling seed. Repeated requests with the same seed and parameters should return the same result.	—

Open Nemotron Content Safety Reasoning 4b page →

Nemotron Mini 4b Instruct Nvidia 6 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…4096)	1024	Maximum number of tokens to generate. Generation stops when this limit is reached.	—
Stop `stop`	string	—	A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1)	0.2	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.	—
Top P `top_p`	number (-∞…1)	0.7	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.	—
Frequency penalty `frequency_penalty`	number (-2…2)	0	Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.	—
Presence penalty `presence_penalty`	number (-2…2)	0	Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.	—

Open Nemotron Mini 4b Instruct page →

Riva Translate 4b Instruct V1.1 Nvidia 6 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…4096)	512	Maximum number of tokens to generate. Generation stops when this limit is reached.	—
Stop `stop`	string	—	A string or list of strings where the API will stop generating further tokens. The returned text will not contain the stop sequence.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1)	0	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.	—
Top P `top_p`	number (-∞…1)	0.9	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.	—
Frequency penalty `frequency_penalty`	number (-2…2)	0	Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.	—
Presence penalty `presence_penalty`	number (-2…2)	0	Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.	—

Open Riva Translate 4b Instruct V1.1 page →

Usdcode Llama 3.1 70b Instruct Nvidia 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…2048)	1024	Maximum number of tokens to generate. Generation stops when this limit is reached.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1)	0.1	Controls randomness. Lower values make outputs more focused; higher values make them more varied. Not recommended to modify both temperature and top_p in the same call.	—
Top P `top_p`	number (-∞…1)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability. Not recommended to modify both temperature and top_p in the same call.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Expert type `expert_type`	enum (auto \| code \| knowledge \| helperfunction)	"auto"	The type of expert to use. 'knowledge' answers with USD knowledge, 'code' responds with vanilla OpenUSD code, 'helperfunction' uses high-level helper functions, and 'auto' lets the LLM determine which expert to use.	—

Open Usdcode Llama 3.1 70b Instruct page →

Codestral Latest Mistral 9 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—
Stop sequence `stop`	string	—	Stops generation when this string is detected.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1.5 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Random seed `random_seed`	integer (0…+∞)	—	Seed used for deterministic sampling when reproducible outputs are desired.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes repeated words or phrases to encourage a wider variety of generated content.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes words based on how often they already appear in the generated text.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON mode output.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safe prompt `safe_prompt`	boolean	false	Controls whether Mistral injects its safety prompt before the conversation.	—

Open Codestral Latest page →

Devstral 2512 Mistral 9 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—
Stop sequence `stop`	string	—	Stops generation when this string is detected.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1.5 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Random seed `random_seed`	integer (0…+∞)	—	Seed used for deterministic sampling when reproducible outputs are desired.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes repeated words or phrases to encourage a wider variety of generated content.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes words based on how often they already appear in the generated text.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON mode output.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safe prompt `safe_prompt`	boolean	false	Controls whether Mistral injects its safety prompt before the conversation.	—

Open Devstral 2512 page →

Devstral Latest Mistral 9 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—
Stop sequence `stop`	string	—	Stops generation when this string is detected.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1.5 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Random seed `random_seed`	integer (0…+∞)	—	Seed used for deterministic sampling when reproducible outputs are desired.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes repeated words or phrases to encourage a wider variety of generated content.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes words based on how often they already appear in the generated text.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON mode output.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safe prompt `safe_prompt`	boolean	false	Controls whether Mistral injects its safety prompt before the conversation.	—

Open Devstral Latest page →

Magistral Medium Latest Mistral 10 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—
Stop sequence `stop`	string	—	Stops generation when this string is detected.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1.5 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Random seed `random_seed`	integer (0…+∞)	—	Seed used for deterministic sampling when reproducible outputs are desired.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes repeated words or phrases to encourage a wider variety of generated content.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes words based on how often they already appear in the generated text.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Prompt mode `prompt_mode`	enum (reasoning)	—	Enables Mistral's reasoning system prompt; leave unset to disable the default reasoning behavior.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON mode output.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safe prompt `safe_prompt`	boolean	false	Controls whether Mistral injects its safety prompt before the conversation.	—

Open Magistral Medium Latest page →

Magistral Small Latest Mistral 10 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—
Stop sequence `stop`	string	—	Stops generation when this string is detected.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1.5 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Random seed `random_seed`	integer (0…+∞)	—	Seed used for deterministic sampling when reproducible outputs are desired.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes repeated words or phrases to encourage a wider variety of generated content.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes words based on how often they already appear in the generated text.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Prompt mode `prompt_mode`	enum (reasoning)	—	Enables Mistral's reasoning system prompt; leave unset to disable the default reasoning behavior.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON mode output.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safe prompt `safe_prompt`	boolean	false	Controls whether Mistral injects its safety prompt before the conversation.	—

Open Magistral Small Latest page →

Ministral 14b Latest Mistral 9 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—
Stop sequence `stop`	string	—	Stops generation when this string is detected.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1.5 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Random seed `random_seed`	integer (0…+∞)	—	Seed used for deterministic sampling when reproducible outputs are desired.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes repeated words or phrases to encourage a wider variety of generated content.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes words based on how often they already appear in the generated text.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON mode output.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safe prompt `safe_prompt`	boolean	false	Controls whether Mistral injects its safety prompt before the conversation.	—

Open Ministral 14b Latest page →

Ministral 3b Latest Mistral 9 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—
Stop sequence `stop`	string	—	Stops generation when this string is detected.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1.5 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Random seed `random_seed`	integer (0…+∞)	—	Seed used for deterministic sampling when reproducible outputs are desired.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes repeated words or phrases to encourage a wider variety of generated content.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes words based on how often they already appear in the generated text.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON mode output.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safe prompt `safe_prompt`	boolean	false	Controls whether Mistral injects its safety prompt before the conversation.	—

Open Ministral 3b Latest page →

Ministral 8b Latest Mistral 9 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—
Stop sequence `stop`	string	—	Stops generation when this string is detected.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1.5 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Random seed `random_seed`	integer (0…+∞)	—	Seed used for deterministic sampling when reproducible outputs are desired.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes repeated words or phrases to encourage a wider variety of generated content.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes words based on how often they already appear in the generated text.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON mode output.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safe prompt `safe_prompt`	boolean	false	Controls whether Mistral injects its safety prompt before the conversation.	—

Open Ministral 8b Latest page →

Mistral Large Latest Mistral 9 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—
Stop sequence `stop`	string	—	Stops generation when this string is detected.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1.5 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Random seed `random_seed`	integer (0…+∞)	—	Seed used for deterministic sampling when reproducible outputs are desired.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes repeated words or phrases to encourage a wider variety of generated content.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes words based on how often they already appear in the generated text.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON mode output.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safe prompt `safe_prompt`	boolean	false	Controls whether Mistral injects its safety prompt before the conversation.	—

Open Mistral Large Latest page →

Mistral Medium 3.5 Mistral 9 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—
Stop sequence `stop`	string	—	Stops generation when this string is detected.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1.5 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Random seed `random_seed`	integer (0…+∞)	—	Seed used for deterministic sampling when reproducible outputs are desired.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes repeated words or phrases to encourage a wider variety of generated content.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes words based on how often they already appear in the generated text.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON mode output.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safe prompt `safe_prompt`	boolean	false	Controls whether Mistral injects its safety prompt before the conversation.	—

Open Mistral Medium 3.5 page →

Mistral Medium Latest Mistral 9 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—
Stop sequence `stop`	string	—	Stops generation when this string is detected.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1.5 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Random seed `random_seed`	integer (0…+∞)	—	Seed used for deterministic sampling when reproducible outputs are desired.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes repeated words or phrases to encourage a wider variety of generated content.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes words based on how often they already appear in the generated text.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON mode output.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safe prompt `safe_prompt`	boolean	false	Controls whether Mistral injects its safety prompt before the conversation.	—

Open Mistral Medium Latest page →

Mistral Small Latest Mistral 9 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—
Stop sequence `stop`	string	—	Stops generation when this string is detected.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1.5 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Random seed `random_seed`	integer (0…+∞)	—	Seed used for deterministic sampling when reproducible outputs are desired.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes repeated words or phrases to encourage a wider variety of generated content.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes words based on how often they already appear in the generated text.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON mode output.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safe prompt `safe_prompt`	boolean	false	Controls whether Mistral injects its safety prompt before the conversation.	—

Open Mistral Small Latest page →

Open Mistral Nemo Mistral 9 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the completion.	—
Stop sequence `stop`	string	—	Stops generation when this string is detected.	—

Sampling 5 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1.5 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Random seed `random_seed`	integer (0…+∞)	—	Seed used for deterministic sampling when reproducible outputs are desired.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes repeated words or phrases to encourage a wider variety of generated content.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes words based on how often they already appear in the generated text.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON mode output.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safe prompt `safe_prompt`	boolean	false	Controls whether Mistral injects its safety prompt before the conversation.	—

Open Open Mistral Nemo page →

Qwen Flash Alibaba 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `extra_body.top_k`	integer (1…+∞)	20	Limits generation to the selected number of highest-probability tokens.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Enable thinking `extra_body.chat_template_kwargs.enable_thinking`	boolean	true	Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.	—

Open Qwen Flash page →

Qwen Plus Alibaba 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `extra_body.top_k`	integer (1…+∞)	20	Limits generation to the selected number of highest-probability tokens.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Enable thinking `extra_body.chat_template_kwargs.enable_thinking`	boolean	true	Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.	—

Open Qwen Plus page →

Qwen3 Coder Flash Alibaba 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `extra_body.top_k`	integer (1…+∞)	20	Limits generation to the selected number of highest-probability tokens.	—

Open Qwen3 Coder Flash page →

Qwen3 Coder Plus Alibaba 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `extra_body.top_k`	integer (1…+∞)	20	Limits generation to the selected number of highest-probability tokens.	—

Open Qwen3 Coder Plus page →

Qwen3 Max Alibaba 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `extra_body.top_k`	integer (1…+∞)	20	Limits generation to the selected number of highest-probability tokens.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Enable thinking `extra_body.chat_template_kwargs.enable_thinking`	boolean	false	Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.	—

Open Qwen3 Max page →

Qwen3.5 Alibaba 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `extra_body.top_k`	integer (1…+∞)	20	Limits generation to the selected number of highest-probability tokens.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Enable thinking `extra_body.chat_template_kwargs.enable_thinking`	boolean	true	Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.	—

Open Qwen3.5 page →

Qwen3.5 Flash Alibaba 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `extra_body.top_k`	integer (1…+∞)	20	Limits generation to the selected number of highest-probability tokens.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Enable thinking `extra_body.chat_template_kwargs.enable_thinking`	boolean	true	Controls Qwen3 thinking mode when using OpenAI-compatible clients that pass provider-specific extra body fields.	—

Open Qwen3.5 Flash page →

Qwen3.6 Flash Alibaba 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate, including both reasoning and the final answer.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `extra_body.top_k`	integer (0…+∞)	20	Limits generation to the selected number of highest-probability tokens. Values above 100 disable top-k sampling.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Enable thinking `extra_body.enable_thinking`	boolean	true	Toggles the model's hybrid thinking mode, sent as a provider-specific extra body field on OpenAI-compatible clients.	—
Thinking budget `extra_body.thinking_budget`	integer (1…+∞)	—	Maximum number of tokens the model may spend on reasoning before it starts the final answer; defaults to the model's maximum reasoning length.	Only when extra_body.enable_thinking = true

Open Qwen3.6 Flash page →

Qwen3.7 Max Alibaba 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate, including both reasoning and the final answer.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `extra_body.top_k`	integer (0…+∞)	20	Limits generation to the selected number of highest-probability tokens. Values above 100 disable top-k sampling.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Enable thinking `extra_body.enable_thinking`	boolean	true	Toggles the model's hybrid thinking mode, sent as a provider-specific extra body field on OpenAI-compatible clients.	—
Thinking budget `extra_body.thinking_budget`	integer (1…+∞)	—	Maximum number of tokens the model may spend on reasoning before it starts the final answer; defaults to the model's maximum reasoning length.	Only when extra_body.enable_thinking = true

Open Qwen3.7 Max page →

Qwen3.7 Plus Alibaba 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate, including both reasoning and the final answer.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `extra_body.top_k`	integer (0…+∞)	20	Limits generation to the selected number of highest-probability tokens. Values above 100 disable top-k sampling.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Enable thinking `extra_body.enable_thinking`	boolean	true	Toggles the model's hybrid thinking mode, sent as a provider-specific extra body field on OpenAI-compatible clients.	—
Thinking budget `extra_body.thinking_budget`	integer (1…+∞)	—	Maximum number of tokens the model may spend on reasoning before it starts the final answer; defaults to the model's maximum reasoning length.	Only when extra_body.enable_thinking = true

Open Qwen3.7 Plus page →

Qwq Plus Alibaba 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `extra_body.top_k`	integer (1…+∞)	20	Limits generation to the selected number of highest-probability tokens.	—

Open Qwq Plus page →

Kimi K2.5 Moonshot AI 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the chat completion.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	—	Controls whether Kimi reasons step by step before answering, or responds directly when set to disabled.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open Kimi K2.5 page →

Kimi K2.6 Moonshot AI 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the chat completion.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Controls whether Kimi reasons step by step before answering. Thinking is enabled by default; set disabled to respond directly.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open Kimi K2.6 page →

Kimi K2.6 Moonshot AI Subscription 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the chat completion.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Controls whether Kimi reasons step by step before answering. Thinking is enabled by default; set disabled to respond directly.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open Kimi K2.6 page →

Kimi K2.7 Code Moonshot AI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	32768	Maximum number of tokens to generate in the chat completion, including reasoning tokens.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object \| json_schema)	"text"	Forces the response into plain text, a JSON object, or JSON matching a provided schema.	—

Open Kimi K2.7 Code page →

Kimi K2.7 Code Highspeed Moonshot AI 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	32768	Maximum number of tokens to generate in the chat completion, including reasoning tokens.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object \| json_schema)	"text"	Forces the response into plain text, a JSON object, or JSON matching a provided schema.	—

Open Kimi K2.7 Code Highspeed page →

Kimi K2.7 Code Highspeed Moonshot AI Subscription 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the chat completion, covering both thinking and the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open Kimi K2.7 Code Highspeed page →

Kimi K2.7 Code Moonshot AI Subscription 2 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the chat completion, covering both thinking and the final answer.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open Kimi K2.7 Code page →

Kimi K3 Moonshot AI 3 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the chat completion.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	—	Controls whether Kimi reasons step by step before answering, or responds directly when set to disabled.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object \| json_schema)	"text"	Forces the response into plain text, a JSON object, or JSON matching a provided schema.	—

Open Kimi K3 page →

Moonshot v1 128K Moonshot AI 7 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the chat completion.	—
Number of completions `n`	integer (1…5)	1	How many chat completion choices to generate for the request.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	0.3	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes tokens that have already appeared, encouraging the model to talk about new topics.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes tokens by how often they have appeared, reducing verbatim repetition.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open Moonshot v1 128K page →

Moonshot v1 32K Moonshot AI 7 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the chat completion.	—
Number of completions `n`	integer (1…5)	1	How many chat completion choices to generate for the request.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	0.3	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes tokens that have already appeared, encouraging the model to talk about new topics.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes tokens by how often they have appeared, reducing verbatim repetition.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open Moonshot v1 32K page →

Moonshot v1 8K Moonshot AI 7 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate in the chat completion.	—
Number of completions `n`	integer (1…5)	1	How many chat completion choices to generate for the request.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…1 step 0.1)	0.3	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes tokens that have already appeared, encouraging the model to talk about new topics.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes tokens by how often they have appeared, reducing verbatim repetition.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open Moonshot v1 8K page →

Grok 4.20 0309 Non Reasoning xAI 6 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Upper bound for visible output tokens generated in the chat completion.	—
Stop sequence `stop`	string	—	Stops generation when this sequence is produced. xAI accepts up to four stop sequences.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Seed `seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object \| json_schema)	"text"	Controls whether the model returns text, JSON mode output, or structured JSON schema output.	—

Open Grok 4.20 0309 Non Reasoning page →

Grok 4.20 0309 Non Reasoning xAI Subscription 6 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Upper bound for visible output tokens generated in the chat completion.	—
Stop sequence `stop`	string	—	Stops generation when this sequence is produced. xAI accepts up to four stop sequences.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Seed `seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object \| json_schema)	"text"	Controls whether the model returns text, JSON mode output, or structured JSON schema output.	—

Open Grok 4.20 0309 Non Reasoning page →

Grok 4.20 0309 Reasoning xAI 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Upper bound for visible output tokens generated in the chat completion.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Seed `seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object \| json_schema)	"text"	Controls whether the model returns text, JSON mode output, or structured JSON schema output.	—

Open Grok 4.20 0309 Reasoning page →

Grok 4.20 0309 Reasoning xAI Subscription 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Upper bound for visible output tokens generated in the chat completion.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Seed `seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object \| json_schema)	"text"	Controls whether the model returns text, JSON mode output, or structured JSON schema output.	—

Open Grok 4.20 0309 Reasoning page →

Grok 4.20 Multi Agent 0309 xAI 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max output tokens `max_output_tokens`	integer (1…+∞)	—	Upper bound for output tokens generated in the Responses API response.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	0.7	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	0.95	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning.effort`	enum (low \| medium \| high \| xhigh)	—	Controls whether the Responses API request uses the 4-agent or 16-agent multi-agent setup.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Text format `text.format.type`	enum (text \| json_object \| json_schema)	"text"	Controls whether the Responses API returns free-form text, JSON mode output, or structured JSON schema output.	—

Open Grok 4.20 Multi Agent 0309 page →

Grok 4.3 xAI 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Upper bound for visible output tokens generated in the chat completion.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Seed `seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| low \| medium \| high)	"low"	Controls how much reasoning Grok performs before responding. Set to none for non-reasoning requests.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object \| json_schema)	"text"	Controls whether the model returns text, JSON mode output, or structured JSON schema output.	—

Open Grok 4.3 page →

Grok 4.3 xAI Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Upper bound for visible output tokens generated in the chat completion.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Seed `seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| low \| medium \| high)	"low"	Controls how much reasoning Grok performs before responding. Set to none for non-reasoning requests.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object \| json_schema)	"text"	Controls whether the model returns text, JSON mode output, or structured JSON schema output.	—

Open Grok 4.3 page →

Grok 4.5 xAI 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Upper bound for visible output tokens generated in the chat completion.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Seed `seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| low \| medium \| high)	"low"	Controls how much reasoning Grok performs before responding. Set to none for non-reasoning requests.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object \| json_schema)	"text"	Controls whether the model returns text, JSON mode output, or structured JSON schema output.	—

Open Grok 4.5 page →

Grok 4.5 xAI Subscription 6 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Upper bound for visible output tokens generated in the chat completion.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Seed `seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| low \| medium \| high)	"low"	Controls how much reasoning Grok performs before responding. Set to none for non-reasoning requests.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object \| json_schema)	"text"	Controls whether the model returns text, JSON mode output, or structured JSON schema output.	—

Open Grok 4.5 page →

Grok Build 0.1 xAI 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Upper bound for visible output tokens generated in the chat completion.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Seed `seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object \| json_schema)	"text"	Controls whether the model returns text, JSON mode output, or structured JSON schema output.	—

Open Grok Build 0.1 page →

Grok Build 0.1 xAI Subscription 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Upper bound for visible output tokens generated in the chat completion.	—

Sampling 3 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Seed `seed`	integer	—	Optional seed used for decoding when reproducible sampling is desired.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object \| json_schema)	"text"	Controls whether the model returns text, JSON mode output, or structured JSON schema output.	—

Open Grok Build 0.1 page →

Command A 03 2025 Cohere 12 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—
Stop sequences `stop_sequences`	string	—	Stops generation when one of these sequences is detected; up to five are allowed.	—

Sampling 6 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…+∞ step 0.1)	0.3	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `p`	number (0.01…0.99 step 0.01)	0.75	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `k`	integer (0…500)	0	Limits sampling to the K most likely tokens; 0 disables top-k sampling.	—
Frequency penalty `frequency_penalty`	number (0…1 step 0.1)	0	Penalizes tokens proportional to how often they have already appeared to reduce repetition.	—
Presence penalty `presence_penalty`	number (0…1 step 0.1)	0	Penalizes tokens that have already appeared to encourage a wider variety of content.	—
Seed `seed`	integer	—	Seed used for best-effort deterministic sampling when reproducible outputs are desired.	—

Tools 1 param

Parameter	Type	Default	Description	Condition
Tool choice `tool_choice`	enum (REQUIRED \| NONE)	—	Forces the model to either call a tool or skip tool calls for this request.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON object output.	—

Observability 1 param

Parameter	Type	Default	Description	Condition
Log probabilities `logprobs`	boolean	false	Controls whether the response includes log probabilities for the generated tokens.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safety mode `safety_mode`	enum (CONTEXTUAL \| STRICT)	"CONTEXTUAL"	Controls Cohere's built-in safety instructions applied to the generation.	—

Open Command A 03 2025 page →

Command A Plus 05 2026 Cohere 12 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—
Stop sequences `stop_sequences`	string	—	Stops generation when one of these sequences is detected; up to five are allowed.	—

Sampling 6 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…+∞ step 0.1)	0.3	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `p`	number (0.01…0.99 step 0.01)	0.75	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `k`	integer (0…500)	0	Limits sampling to the K most likely tokens; 0 disables top-k sampling.	—
Frequency penalty `frequency_penalty`	number (0…1 step 0.1)	0	Penalizes tokens proportional to how often they have already appeared to reduce repetition.	—
Presence penalty `presence_penalty`	number (0…1 step 0.1)	0	Penalizes tokens that have already appeared to encourage a wider variety of content.	—
Seed `seed`	integer	—	Seed used for best-effort deterministic sampling when reproducible outputs are desired.	—

Tools 1 param

Parameter	Type	Default	Description	Condition
Tool choice `tool_choice`	enum (REQUIRED \| NONE)	—	Forces the model to either call a tool or skip tool calls for this request.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON object output.	—

Observability 1 param

Parameter	Type	Default	Description	Condition
Log probabilities `logprobs`	boolean	false	Controls whether the response includes log probabilities for the generated tokens.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safety mode `safety_mode`	enum (CONTEXTUAL \| STRICT)	"CONTEXTUAL"	Controls Cohere's built-in safety instructions applied to the generation.	—

Open Command A Plus 05 2026 page →

Command A Reasoning 08 2025 Cohere 14 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—
Stop sequences `stop_sequences`	string	—	Stops generation when one of these sequences is detected; up to five are allowed.	—

Sampling 6 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…+∞ step 0.1)	0.3	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `p`	number (0.01…0.99 step 0.01)	0.75	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `k`	integer (0…500)	0	Limits sampling to the K most likely tokens; 0 disables top-k sampling.	—
Frequency penalty `frequency_penalty`	number (0…1 step 0.1)	0	Penalizes tokens proportional to how often they have already appeared to reduce repetition.	—
Presence penalty `presence_penalty`	number (0…1 step 0.1)	0	Penalizes tokens that have already appeared to encourage a wider variety of content.	—
Seed `seed`	integer	—	Seed used for best-effort deterministic sampling when reproducible outputs are desired.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"disabled"	Controls whether the model reasons step by step before producing its final answer.	—
Thinking token budget `thinking.token_budget`	integer (1…+∞)	—	Maximum number of tokens the model may spend on reasoning before answering.	Only when thinking.type = "enabled"

Tools 1 param

Parameter	Type	Default	Description	Condition
Tool choice `tool_choice`	enum (REQUIRED \| NONE)	—	Forces the model to either call a tool or skip tool calls for this request.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON object output.	—

Observability 1 param

Parameter	Type	Default	Description	Condition
Log probabilities `logprobs`	boolean	false	Controls whether the response includes log probabilities for the generated tokens.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safety mode `safety_mode`	enum (CONTEXTUAL \| STRICT)	"CONTEXTUAL"	Controls Cohere's built-in safety instructions applied to the generation.	—

Open Command A Reasoning 08 2025 page →

Command A Translate 08 2025 Cohere 12 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—
Stop sequences `stop_sequences`	string	—	Stops generation when one of these sequences is detected; up to five are allowed.	—

Sampling 6 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…+∞ step 0.1)	0.3	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `p`	number (0.01…0.99 step 0.01)	0.75	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `k`	integer (0…500)	0	Limits sampling to the K most likely tokens; 0 disables top-k sampling.	—
Frequency penalty `frequency_penalty`	number (0…1 step 0.1)	0	Penalizes tokens proportional to how often they have already appeared to reduce repetition.	—
Presence penalty `presence_penalty`	number (0…1 step 0.1)	0	Penalizes tokens that have already appeared to encourage a wider variety of content.	—
Seed `seed`	integer	—	Seed used for best-effort deterministic sampling when reproducible outputs are desired.	—

Tools 1 param

Parameter	Type	Default	Description	Condition
Tool choice `tool_choice`	enum (REQUIRED \| NONE)	—	Forces the model to either call a tool or skip tool calls for this request.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON object output.	—

Observability 1 param

Parameter	Type	Default	Description	Condition
Log probabilities `logprobs`	boolean	false	Controls whether the response includes log probabilities for the generated tokens.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safety mode `safety_mode`	enum (CONTEXTUAL \| STRICT)	"CONTEXTUAL"	Controls Cohere's built-in safety instructions applied to the generation.	—

Open Command A Translate 08 2025 page →

Command A Vision 07 2025 Cohere 12 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—
Stop sequences `stop_sequences`	string	—	Stops generation when one of these sequences is detected; up to five are allowed.	—

Sampling 6 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…+∞ step 0.1)	0.3	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `p`	number (0.01…0.99 step 0.01)	0.75	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `k`	integer (0…500)	0	Limits sampling to the K most likely tokens; 0 disables top-k sampling.	—
Frequency penalty `frequency_penalty`	number (0…1 step 0.1)	0	Penalizes tokens proportional to how often they have already appeared to reduce repetition.	—
Presence penalty `presence_penalty`	number (0…1 step 0.1)	0	Penalizes tokens that have already appeared to encourage a wider variety of content.	—
Seed `seed`	integer	—	Seed used for best-effort deterministic sampling when reproducible outputs are desired.	—

Tools 1 param

Parameter	Type	Default	Description	Condition
Tool choice `tool_choice`	enum (REQUIRED \| NONE)	—	Forces the model to either call a tool or skip tool calls for this request.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON object output.	—

Observability 1 param

Parameter	Type	Default	Description	Condition
Log probabilities `logprobs`	boolean	false	Controls whether the response includes log probabilities for the generated tokens.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safety mode `safety_mode`	enum (CONTEXTUAL \| STRICT)	"CONTEXTUAL"	Controls Cohere's built-in safety instructions applied to the generation.	—

Open Command A Vision 07 2025 page →

Command R 08 2024 Cohere 11 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—
Stop sequences `stop_sequences`	string	—	Stops generation when one of these sequences is detected; up to five are allowed.	—

Sampling 6 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…+∞ step 0.1)	0.3	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `p`	number (0.01…0.99 step 0.01)	0.75	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `k`	integer (0…500)	0	Limits sampling to the K most likely tokens; 0 disables top-k sampling.	—
Frequency penalty `frequency_penalty`	number (0…1 step 0.1)	0	Penalizes tokens proportional to how often they have already appeared to reduce repetition.	—
Presence penalty `presence_penalty`	number (0…1 step 0.1)	0	Penalizes tokens that have already appeared to encourage a wider variety of content.	—
Seed `seed`	integer	—	Seed used for best-effort deterministic sampling when reproducible outputs are desired.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON object output.	—

Observability 1 param

Parameter	Type	Default	Description	Condition
Log probabilities `logprobs`	boolean	false	Controls whether the response includes log probabilities for the generated tokens.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safety mode `safety_mode`	enum (CONTEXTUAL \| STRICT \| OFF)	"CONTEXTUAL"	Controls Cohere's built-in safety instructions applied to the generation.	—

Open Command R 08 2024 page →

Command R Plus 08 2024 Cohere 11 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—
Stop sequences `stop_sequences`	string	—	Stops generation when one of these sequences is detected; up to five are allowed.	—

Sampling 6 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…+∞ step 0.1)	0.3	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `p`	number (0.01…0.99 step 0.01)	0.75	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `k`	integer (0…500)	0	Limits sampling to the K most likely tokens; 0 disables top-k sampling.	—
Frequency penalty `frequency_penalty`	number (0…1 step 0.1)	0	Penalizes tokens proportional to how often they have already appeared to reduce repetition.	—
Presence penalty `presence_penalty`	number (0…1 step 0.1)	0	Penalizes tokens that have already appeared to encourage a wider variety of content.	—
Seed `seed`	integer	—	Seed used for best-effort deterministic sampling when reproducible outputs are desired.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON object output.	—

Observability 1 param

Parameter	Type	Default	Description	Condition
Log probabilities `logprobs`	boolean	false	Controls whether the response includes log probabilities for the generated tokens.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safety mode `safety_mode`	enum (CONTEXTUAL \| STRICT \| OFF)	"CONTEXTUAL"	Controls Cohere's built-in safety instructions applied to the generation.	—

Open Command R Plus 08 2024 page →

Command R7b 12 2024 Cohere 12 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—
Stop sequences `stop_sequences`	string	—	Stops generation when one of these sequences is detected; up to five are allowed.	—

Sampling 6 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…+∞ step 0.1)	0.3	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `p`	number (0.01…0.99 step 0.01)	0.75	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `k`	integer (0…500)	0	Limits sampling to the K most likely tokens; 0 disables top-k sampling.	—
Frequency penalty `frequency_penalty`	number (0…1 step 0.1)	0	Penalizes tokens proportional to how often they have already appeared to reduce repetition.	—
Presence penalty `presence_penalty`	number (0…1 step 0.1)	0	Penalizes tokens that have already appeared to encourage a wider variety of content.	—
Seed `seed`	integer	—	Seed used for best-effort deterministic sampling when reproducible outputs are desired.	—

Tools 1 param

Parameter	Type	Default	Description	Condition
Tool choice `tool_choice`	enum (REQUIRED \| NONE)	—	Forces the model to either call a tool or skip tool calls for this request.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Controls whether the model returns normal text or JSON object output.	—

Observability 1 param

Parameter	Type	Default	Description	Condition
Log probabilities `logprobs`	boolean	false	Controls whether the response includes log probabilities for the generated tokens.	—

Metadata 1 param

Parameter	Type	Default	Description	Condition
Safety mode `safety_mode`	enum (CONTEXTUAL \| STRICT)	"CONTEXTUAL"	Controls Cohere's built-in safety instructions applied to the generation.	—

Open Command R7b 12 2024 page →

Deepseek Chat DeepSeek 4 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.	Not when thinking.type = "enabled"
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.	Not when thinking.type = "enabled"

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (disabled \| enabled)	"disabled"	Controls whether DeepSeek uses thinking mode before producing the final answer.	—

Open Deepseek Chat page →

Deepseek Reasoner DeepSeek 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.	Not when thinking.type = "enabled"
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.	Not when thinking.type = "enabled"

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Controls whether DeepSeek uses thinking mode before producing the final answer.	—
Reasoning effort `reasoning_effort`	enum (high \| max)	"high"	Controls DeepSeek thinking effort when thinking mode is enabled.	Only when thinking.type = "enabled"

Open Deepseek Reasoner page →

Deepseek V4 Flash DeepSeek 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.	Not when thinking.type = "enabled"
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.	Not when thinking.type = "enabled"

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Controls whether DeepSeek uses thinking mode before producing the final answer.	—
Reasoning effort `reasoning_effort`	enum (high \| max)	"high"	Controls DeepSeek thinking effort when thinking mode is enabled.	Only when thinking.type = "enabled"

Open Deepseek V4 Flash page →

Deepseek V4 Pro DeepSeek 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	4096	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.	Not when thinking.type = "enabled"
Top P `top_p`	number (0…1 step 0.01)	1	Controls nucleus sampling. In DeepSeek thinking mode this parameter is accepted for compatibility but has no effect.	Not when thinking.type = "enabled"

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Controls whether DeepSeek uses thinking mode before producing the final answer.	—
Reasoning effort `reasoning_effort`	enum (high \| max)	"high"	Controls DeepSeek thinking effort when thinking mode is enabled.	Only when thinking.type = "enabled"

Open Deepseek V4 Pro page →

Llama 3.3 70B Instruct Meta 7 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `top_k`	integer	—	Limits generation to the selected number of highest-probability tokens.	—
Repetition penalty `repetition_penalty`	number	—	Penalizes tokens that have already appeared to reduce repetition in the output.	—

Tools 1 param

Parameter	Type	Default	Description	Condition
Tool choice `tool_choice`	enum (auto \| none \| required)	—	Controls whether the model may call tools, must call one, or skips tool calls.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_schema)	"text"	Controls whether the model returns normal text or a schema-constrained JSON object.	—

Open Llama 3.3 70B Instruct page →

Llama 3.3 8B Instruct Meta 7 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `top_k`	integer	—	Limits generation to the selected number of highest-probability tokens.	—
Repetition penalty `repetition_penalty`	number	—	Penalizes tokens that have already appeared to reduce repetition in the output.	—

Tools 1 param

Parameter	Type	Default	Description	Condition
Tool choice `tool_choice`	enum (auto \| none \| required)	—	Controls whether the model may call tools, must call one, or skips tool calls.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_schema)	"text"	Controls whether the model returns normal text or a schema-constrained JSON object.	—

Open Llama 3.3 8B Instruct page →

Llama 4 Maverick 17B 128E Instruct FP8 Meta 7 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `top_k`	integer	—	Limits generation to the selected number of highest-probability tokens.	—
Repetition penalty `repetition_penalty`	number	—	Penalizes tokens that have already appeared to reduce repetition in the output.	—

Tools 1 param

Parameter	Type	Default	Description	Condition
Tool choice `tool_choice`	enum (auto \| none \| required)	—	Controls whether the model may call tools, must call one, or skips tool calls.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_schema)	"text"	Controls whether the model returns normal text or a schema-constrained JSON object.	—

Open Llama 4 Maverick 17B 128E Instruct FP8 page →

Llama 4 Scout 17B 16E Instruct FP8 Meta 7 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—
Top K `top_k`	integer	—	Limits generation to the selected number of highest-probability tokens.	—
Repetition penalty `repetition_penalty`	number	—	Penalizes tokens that have already appeared to reduce repetition in the output.	—

Tools 1 param

Parameter	Type	Default	Description	Condition
Tool choice `tool_choice`	enum (auto \| none \| required)	—	Controls whether the model may call tools, must call one, or skips tool calls.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_schema)	"text"	Controls whether the model returns normal text or a schema-constrained JSON object.	—

Open Llama 4 Scout 17B 16E Instruct FP8 page →

Sonar Perplexity 12 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…128000)	—	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Metadata 9 params

Parameter	Type	Default	Description	Condition
Search mode `search_mode`	enum (web \| academic \| sec)	—	Selects the corpus the model searches when grounding its answer.	—
Search recency filter `search_recency_filter`	enum (hour \| day \| week \| month \| year)	—	Restricts web search results to a recent time window.	—
Search domain filter `search_domain_filter`	string	—	Limits search to, or excludes, specific domains.	—
Search after date `search_after_date_filter`	string	—	Restricts search results to content published after this date (MM/DD/YYYY).	—
Search before date `search_before_date_filter`	string	—	Restricts search results to content published before this date (MM/DD/YYYY).	—
Search context size `web_search_options.search_context_size`	enum (low \| medium \| high)	"low"	Controls how much web search context is retrieved before generating the answer.	—
Return images `return_images`	boolean	false	Controls whether the response may include related images from the search.	—
Return related questions `return_related_questions`	boolean	false	Controls whether the response includes suggested follow-up questions.	—
Disable search `disable_search`	boolean	false	Turns off web search so the model answers from its own knowledge only.	—

Open Sonar page →

Sonar Deep Research Perplexity 12 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…128000)	—	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (minimal \| low \| medium \| high)	—	Controls how much reasoning and searching the model performs before producing the report.	—

Metadata 8 params

Parameter	Type	Default	Description	Condition
Search mode `search_mode`	enum (web \| academic \| sec)	—	Selects the corpus the model searches when grounding its answer.	—
Search recency filter `search_recency_filter`	enum (hour \| day \| week \| month \| year)	—	Restricts web search results to a recent time window.	—
Search domain filter `search_domain_filter`	string	—	Limits search to, or excludes, specific domains.	—
Search after date `search_after_date_filter`	string	—	Restricts search results to content published after this date (MM/DD/YYYY).	—
Search before date `search_before_date_filter`	string	—	Restricts search results to content published before this date (MM/DD/YYYY).	—
Search context size `web_search_options.search_context_size`	enum (low \| medium \| high)	"low"	Controls how much web search context is retrieved before generating the answer.	—
Return images `return_images`	boolean	false	Controls whether the response may include related images from the search.	—
Return related questions `return_related_questions`	boolean	false	Controls whether the response includes suggested follow-up questions.	—

Open Sonar Deep Research page →

Sonar Pro Perplexity 12 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…128000)	—	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Metadata 9 params

Parameter	Type	Default	Description	Condition
Search mode `search_mode`	enum (web \| academic \| sec)	—	Selects the corpus the model searches when grounding its answer.	—
Search recency filter `search_recency_filter`	enum (hour \| day \| week \| month \| year)	—	Restricts web search results to a recent time window.	—
Search domain filter `search_domain_filter`	string	—	Limits search to, or excludes, specific domains.	—
Search after date `search_after_date_filter`	string	—	Restricts search results to content published after this date (MM/DD/YYYY).	—
Search before date `search_before_date_filter`	string	—	Restricts search results to content published before this date (MM/DD/YYYY).	—
Search context size `web_search_options.search_context_size`	enum (low \| medium \| high)	"low"	Controls how much web search context is retrieved before generating the answer.	—
Return images `return_images`	boolean	false	Controls whether the response may include related images from the search.	—
Return related questions `return_related_questions`	boolean	false	Controls whether the response includes suggested follow-up questions.	—
Disable search `disable_search`	boolean	false	Turns off web search so the model answers from its own knowledge only.	—

Open Sonar Pro page →

Sonar Reasoning Pro Perplexity 12 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…128000)	—	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	—	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1 step 0.01)	—	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Metadata 9 params

Parameter	Type	Default	Description	Condition
Search mode `search_mode`	enum (web \| academic \| sec)	—	Selects the corpus the model searches when grounding its answer.	—
Search recency filter `search_recency_filter`	enum (hour \| day \| week \| month \| year)	—	Restricts web search results to a recent time window.	—
Search domain filter `search_domain_filter`	string	—	Limits search to, or excludes, specific domains.	—
Search after date `search_after_date_filter`	string	—	Restricts search results to content published after this date (MM/DD/YYYY).	—
Search before date `search_before_date_filter`	string	—	Restricts search results to content published before this date (MM/DD/YYYY).	—
Search context size `web_search_options.search_context_size`	enum (low \| medium \| high)	"low"	Controls how much web search context is retrieved before generating the answer.	—
Return images `return_images`	boolean	false	Controls whether the response may include related images from the search.	—
Return related questions `return_related_questions`	boolean	false	Controls whether the response includes suggested follow-up questions.	—
Disable search `disable_search`	boolean	false	Turns off web search so the model answers from its own knowledge only.	—

Open Sonar Reasoning Pro page →

Mimo V2.5 Xiaomi 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate, covering both the thinking trace and the final answer.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values are more focused; higher values are more varied. Ignored while thinking is enabled, where it is forced to 1.0.	Not when thinking.type = "enabled"
Top P `top_p`	number (0…1 step 0.01)	0.95	Nucleus sampling cutoff. Ignored while thinking is enabled, where it is forced to 0.95.	Not when thinking.type = "enabled"

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Controls whether MiMo reasons step by step before answering. Enabled by default; set disabled to respond directly.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open Mimo V2.5 page →

Mimo V2.5 Pro Xiaomi 8 params

Length 2 params

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate, covering both the thinking trace and the final answer.	—
Stop sequences `stop`	string	—	Up to a few sequences where generation stops; the stop text is not included in the output.	—

Sampling 4 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values are more focused; higher values are more varied. Ignored while thinking is enabled, where it is forced to 1.0.	Not when thinking.type = "enabled"
Top P `top_p`	number (0…1 step 0.01)	0.95	Nucleus sampling cutoff. Ignored while thinking is enabled, where it is forced to 0.95.	Not when thinking.type = "enabled"
Presence penalty `presence_penalty`	number (-2…2 step 0.1)	0	Penalizes tokens that have already appeared, encouraging the model to introduce new topics.	—
Frequency penalty `frequency_penalty`	number (-2…2 step 0.1)	0	Penalizes tokens in proportion to how often they have appeared, reducing verbatim repetition.	—

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Controls whether MiMo reasons step by step before answering. Enabled by default; set disabled to respond directly.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open Mimo V2.5 Pro page →

Mimo V2.5 Xiaomi Subscription 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max completion tokens `max_completion_tokens`	integer (1…+∞)	—	Maximum number of tokens to generate, covering both the thinking trace and the final answer.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number (0…2 step 0.1)	1	Controls randomness. Lower values are more focused; higher values are more varied. Ignored while thinking is enabled, where it is forced to 1.0.	Not when thinking.type = "enabled"
Top P `top_p`	number (0…1 step 0.01)	0.95	Nucleus sampling cutoff. Ignored while thinking is enabled, where it is forced to 0.95.	Not when thinking.type = "enabled"

Reasoning 1 param

Parameter	Type	Default	Description	Condition
Thinking mode `thinking.type`	enum (enabled \| disabled)	"enabled"	Controls whether MiMo reasons step by step before answering. Enabled by default; set disabled to respond directly.	—

Output 1 param

Parameter	Type	Default	Description	Condition
Response format `response_format.type`	enum (text \| json_object)	"text"	Forces the response into plain text or a JSON object.	—

Open Mimo V2.5 page →

Inkling Thinking Machines 5 params

Length 1 param

Parameter	Type	Default	Description	Condition
Max tokens `max_tokens`	integer (1…+∞)	—	Maximum number of output tokens the model may generate.	—

Sampling 2 params

Parameter	Type	Default	Description	Condition
Temperature `temperature`	number	1	Controls randomness. Lower values make outputs more focused; higher values make them more varied.	—
Top P `top_p`	number (0…1)	1	Controls nucleus sampling by limiting generation to tokens within the selected cumulative probability.	—

Reasoning 2 params

Parameter	Type	Default	Description	Condition
Reasoning effort `reasoning_effort`	enum (none \| minimal \| low \| medium \| high \| xhigh)	"high"	Controls how much thinking Inkling performs before answering. Accepts a preset name or a number between 0.0 and 0.99; presets map to numeric effort levels (none=0.0, minimal=0.1, low=0.2, medium=0.7, high=0.9, xhigh=0.99).	—
Separate reasoning `separate_reasoning`	boolean	true	Returns the model's reasoning in a dedicated reasoning_content field instead of interleaving it with the final message content.	—

Open Inkling page →

Every parameter,
for every model.

Browse by parameter

How to use

Catalog API

Single model

JSON Schema

Logos

Contribute

Every parameter,for every model.

OpenAI

Anthropic

Google

Z.ai

MiniMax

Nvidia

Mistral

Alibaba

Moonshot AI

xAI

Cohere

DeepSeek

Meta

Perplexity

Xiaomi

Thinking Machines

Browse by parameter

Every parameter,
for every model.