Build multi-turn chat applications with streaming responses, function calling, system prompts, and persistent conversation management. MiniMax conversational AI runs on MiniMax-Text-01 for consistent, high-quality dialogue across every interaction.
MiniMax processes conversation history as a message array, enabling coherent multi-turn dialogue across hundreds of exchanges.
The chat completion API accepts a messages list where each entry carries a role and content. Roles: system (behavior instructions), user (end-user messages), assistant (model responses), and tool (function call results). MiniMax-Text-01 reads the full message array on every request. The model's 256K context window accommodates conversation histories spanning tens of thousands of words — roughly 40-50 pages of back-and-forth dialogue. For applications with very long user sessions, implement a sliding window that retains the most recent N turns plus a summary of earlier exchanges.
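The sliding-window approach above can be sketched as a small helper. This is a minimal sketch assuming the role/content message dicts described in this section; `window_messages` and its parameters are illustrative names, not part of the MiniMax SDK.

```python
def window_messages(messages, keep_turns=10, summary=None):
    """Keep the system prompt, an optional summary of older turns,
    and the most recent keep_turns non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    windowed = list(system)
    # Only inject the summary when older turns were actually dropped.
    if summary is not None and len(dialogue) > keep_turns:
        windowed.append({"role": "system",
                         "content": "Summary of earlier turns: " + summary})
    windowed.extend(dialogue[-keep_turns:])
    return windowed
```

Run this before each request once the history grows past your budget; the full history stays in your own store, and only the windowed view is sent.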
Response generation depends strictly on the messages you send. The model does not assume omitted context or infer past state from outside the message array. Every API call is stateless from the server's perspective; you own the conversation history. This design gives you full control over what the model sees, making it straightforward to edit, truncate, or filter messages before each request. Token consumption grows linearly with total message length, so monitor cumulative token counts in long-running sessions.
MiniMax conversational AI provides streaming token delivery, function calling with parallel tool invocation, system prompt configuration up to 32K characters, and client-managed conversation history. Build chatbots, voice assistants, support agents, and interactive product guides with a single API endpoint.
MiniMax streams tokens in real time via SSE, delivering sub-100ms time-to-first-token for responsive chat experiences.
Enable streaming by setting stream: true in your API request. The SSE endpoint pushes each generated token with a JSON payload containing the token text, cumulative log probability, an index counter, and a finish_reason field that signals generation completion. The Python SDK wraps this in an async iterator: `async for chunk in client.chat.stream(messages):` yields tokens as they arrive. Time to first token averages 85ms for prompts under 200 tokens on the US West endpoint, measured at the 50th percentile under typical concurrent load.
Streaming handles interruptions cleanly. If the user clicks a stop button or the client disconnects, the server terminates generation and charges only for tokens produced up to the cancellation point. Resume functionality is not built in — send a new request with the full message history including the partial assistant response to continue. For voice applications, pair streaming with a text-to-speech pipeline that converts tokens to audio chunks as they arrive, minimizing end-to-end latency.
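One way to implement the resume pattern described above is to rebuild the request with the partial output appended. Whether the model continues a trailing assistant message verbatim is an assumption here; adapt to your client's conventions.

```python
def build_resume_request(history, partial_text):
    """After a cancelled stream, resend the full history with the
    partial assistant output as the last assistant message."""
    return history + [{"role": "assistant", "content": partial_text}]

req = build_resume_request(
    [{"role": "user", "content": "Tell me a story"}],
    "Once upon a")
```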
MiniMax function calling lets the model decide when to invoke external APIs, query databases, or trigger actions during a conversation.
Define available functions in the tools array using JSON Schema for parameter descriptions. Each tool entry specifies a name, description, and an input_schema object. When the model determines a tool call is needed, it outputs a tool_calls block with the function name and a JSON arguments string. Your application executes the function and appends the result as a tool role message. The model then continues the conversation with the tool output in context.
Parallel tool invocation is supported: MiniMax can call up to 8 functions in a single generation step when the conversation requires multiple independent lookups — for example, checking weather in three cities or fetching customer records from separate databases. The tool_choice parameter provides routing control. Set it to auto for model-determined invocation, none to disable tool calls entirely, or a specific function name to force the model to call that tool on the next turn. Nesting works naturally: the model can issue a tool call, receive the result, and trigger another tool in the same conversation flow without special handling.
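The execute-and-append loop described above can be sketched as follows. The `tool_calls` field names (`name`, `arguments`) follow this section's description but the exact wire format is an assumption, and `get_weather` is a stub standing in for a real API.

```python
import json

def run_tool_calls(messages, tool_calls, registry):
    """Execute each requested function (the model may request several
    in one step) and append each result as a tool-role message."""
    for call in tool_calls:
        fn = registry[call["name"]]
        args = json.loads(call["arguments"])  # arguments arrive as a JSON string
        messages.append({"role": "tool",
                         "name": call["name"],
                         "content": json.dumps(fn(**args))})
    return messages

def get_weather(city):  # stub standing in for a real lookup
    return {"city": city, "temp_c": 21}

registry = {"get_weather": get_weather}
calls = [{"name": "get_weather", "arguments": '{"city": "Lisbon"}'},
         {"name": "get_weather", "arguments": '{"city": "Oslo"}'}]
msgs = run_tool_calls([], calls, registry)
```

After appending the tool messages, send the updated array back to the API; the model continues the conversation with the tool output in context.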
System prompts define the assistant's role, tone, and constraints, persisting throughout the conversation.
Place a message with role: system at the start of the messages array. The model weights these instructions across all subsequent turns. Effective system prompts include a role declaration ("You are a financial advisor assistant"), output formatting rules ("Respond in JSON with fields summary, risk_level, and next_steps"), prohibited behaviors ("Do not provide medical diagnoses"), and stylistic preferences ("Use a professional tone with short paragraphs. Avoid exclamation marks."). System prompts can run up to 32,000 characters. For complex applications, embed domain knowledge, company policies, or reference material directly into the system prompt rather than including it in every user message.
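Putting the pieces above together, an effective system prompt is just the first entry in the messages array. A minimal sketch, with a guard for the documented 32,000-character limit:

```python
SYSTEM_PROMPT = (
    "You are a financial advisor assistant. "
    "Respond in JSON with fields summary, risk_level, and next_steps. "
    "Do not provide medical diagnoses. "
    "Use a professional tone with short paragraphs. Avoid exclamation marks."
)
assert len(SYSTEM_PROMPT) <= 32_000  # documented system prompt limit

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Should I move my savings into bonds?"},
]
```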
System prompts remain active for all conversation turns until replaced. Send a new system message mid-conversation to update instructions — the model switches behavior from that point forward. This is useful for multi-stage workflows where the assistant should follow different rules during information gathering vs. action execution phases.
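The multi-stage workflow above amounts to appending a fresh system message when the phase changes. A hedged sketch; `switch_phase` and the phase wording are illustrative, not SDK features:

```python
def switch_phase(messages, new_rules):
    """Append a new system message mid-conversation; the model
    follows the updated instructions from this point forward."""
    messages.append({"role": "system", "content": new_rules})
    return messages

convo = [
    {"role": "system", "content": "Phase 1: gather the user's account details."},
    {"role": "user", "content": "My account ID is A-7."},
    {"role": "assistant", "content": "Thanks, I have your account ID."},
]
switch_phase(convo, "Phase 2: execute the requested action and confirm each step.")
```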
Complete reference of MiniMax conversational AI features with descriptions and API parameter names.
| Feature | Description | API Parameter |
|---|---|---|
| Multi-turn messages | Full conversation history passed as role/content array; model references all prior turns | messages |
| Streaming tokens | Real-time token delivery via SSE; async iterator in SDKs | stream |
| Function calling | Model invokes defined tools with JSON arguments; parallel up to 8 calls | tools, tool_choice |
| System prompts | Persistent behavior/rules in system role message; up to 32K characters | messages[role="system"] |
| Temperature control | Sampling temperature from 0.0 (deterministic) to 2.0 (high variance) | temperature |
| Max output tokens | Hard cap on generated tokens per response; up to 16,384 tokens | max_tokens |
| Stop sequences | Up to 4 strings that halt generation immediately when encountered | stop |
| Top-p sampling | Nucleus sampling threshold; 0.0 to 1.0 | top_p |
| Log probabilities | Return log probabilities for top N tokens per generation step | logprobs |
| JSON mode | Constrained decoding ensures valid JSON output | response_format |
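The parameters in the table combine into a single request body. The parameter names below come from the table; the `model` field and the exact payload envelope are assumptions about the endpoint, so treat this as a sketch rather than a verbatim request.

```python
payload = {
    "model": "MiniMax-Text-01",
    "messages": [
        {"role": "system", "content": "You are a concise support agent."},
        {"role": "user", "content": "Where is my order?"},
    ],
    "stream": True,       # SSE token streaming
    "temperature": 0.7,   # 0.0 deterministic .. 2.0 high variance
    "max_tokens": 1024,   # hard cap per response (up to 16,384)
    "stop": ["\nUser:"],  # up to 4 stop sequences
    "top_p": 0.9,         # nucleus sampling threshold
}
```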
"We switched our customer support chat from a rule-based system to MiniMax conversational AI and saw resolution rates climb from 62% to 84% within the first quarter. The function calling integration with our internal ticketing API was the deciding factor — the model dispatches tickets, checks order status, and issues refunds without any handoff to a human script."
— Benjamin L. Cross, Founder & CEO, Spectral Content, Miami
MiniMax maintains conversation state through a message array that you append to with each turn. The API accepts a messages parameter containing the full conversation history up to the context window limit. Each message includes a role (system, user, assistant, or tool) and content. MiniMax-Text-01 references the entire history when generating responses, enabling coherent dialogue across hundreds of turns. The 256K context window means conversations spanning tens of thousands of words stay on track without summarization.
MiniMax supports server-sent events (SSE) streaming for real-time token delivery. Set stream: true in the API request to receive tokens as they are generated. The streaming endpoint pushes each token with metadata including token index, cumulative log probability, and finish reason when generation completes. The Python and JavaScript SDKs include async iterator wrappers that yield tokens in a simple for-loop pattern. Streaming works with all model variants and supports function calling — the model streams text tokens until it hits a tool call trigger, then delivers the tool call as a complete JSON block.
MiniMax function calling uses a tools array in the API request where you define available functions with JSON Schema parameter descriptions. The model decides when to invoke a tool based on the conversation context and outputs a tool call with the function name and generated JSON arguments; your application executes the function and appends the result to the message history for the next model response. MiniMax supports parallel tool calls — up to 8 functions in a single generation step — and nested tool calls where one tool's output triggers another. The tool_choice parameter accepts auto, none, or a specific function name for deterministic routing.
MiniMax does not store conversation history server-side across sessions. You manage conversation context by including previous messages in the messages array with each API request. For persistent memory, implement a client-side message store that loads relevant history before each API call. The platform hub provides a conversation logging feature that records all API interactions for debugging and compliance auditing, but these logs are not automatically injected into future requests. Teams building long-term user relationships typically implement a retrieval pipeline that fetches past conversations from a vector store and injects them into the system prompt.
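Since the server keeps no cross-session state, a client-side store can be as simple as a JSON file that is reloaded before each call. A minimal sketch; `MessageStore` is an illustrative helper, not part of any MiniMax SDK:

```python
import json
import pathlib
import tempfile

class MessageStore:
    """Minimal client-side persistence: save the message array after
    each turn and reload it before the next API call."""
    def __init__(self, path):
        self.path = pathlib.Path(path)

    def load(self):
        return json.loads(self.path.read_text()) if self.path.exists() else []

    def save(self, messages):
        self.path.write_text(json.dumps(messages))

store = MessageStore(pathlib.Path(tempfile.gettempdir()) / "conv_demo.json")
store.save([{"role": "user", "content": "hi"},
            {"role": "assistant", "content": "hello"}])
reloaded = store.load()
```

For long-term memory across many sessions, swap the flat file for a database or the vector-store retrieval pipeline described above.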
System prompts in MiniMax set the assistant's personality, tone, constraints, and behavioral rules for the entire conversation. Include a system message as the first entry in the messages array with role: system. The model weights system instructions throughout the dialogue. Effective system prompts specify the assistant's role, define output format preferences, list prohibited topics, and establish response style guidelines. System prompts persist across all conversation turns until overridden by a new system message. The maximum system prompt length is 32,000 characters.