Integrate MiniMax AI models into your applications with a clean REST API built for developers who value speed, clarity, and predictability.
The MiniMax API is a RESTful interface that accepts JSON request bodies and returns JSON responses, with streaming support for real-time chat and video generation.
The MiniMax API operates over HTTPS on api.minimax.gr.com. Every request requires a valid Bearer token sent in the Authorization header. Response codes follow standard HTTP semantics — 200 for success, 401 for authentication failures, 429 when you hit rate limits, and 5xx for server-side issues. All endpoints support idempotency keys via the Idempotency-Key header, letting you safely retry requests without duplicating operations.
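Because every endpoint honors the Idempotency-Key header, a client can attach one stable key per logical operation and retry safely. A minimal sketch in Python; the helper name make_idempotent_headers is illustrative, not part of any MiniMax SDK:

```python
import uuid
from typing import Optional


def make_idempotent_headers(api_key: str, idempotency_key: Optional[str] = None) -> dict:
    """Build the common headers for a MiniMax API request.

    Reusing the same idempotency key across retries of one logical
    operation lets the server deduplicate the work.
    """
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }
```

Generate the key once per operation and pass the same value on every retry; generating a fresh key per attempt defeats the deduplication.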
The API is versioned through the URL path; the current version is /v1. Older versions receive a six-month deprecation window with clear migration guides. Breaking changes are announced through the changelog and the developer mailing list. Non-breaking additions — new model IDs, optional parameters, additional response fields — appear without version bumps.
All endpoints live under https://api.minimax.gr.com/v1. Chat, embeddings, video, and model management endpoints share a common auth model and error format. Use /v1/models to list available models and their capabilities before building your integration.
Every MiniMax API call authenticates with a Bearer token generated from your platform hub dashboard.
Create an API key in the platform hub under Settings > API Keys. Keys support scope restrictions — you can limit a key to read-only access, video-only endpoints, or specific models. Production deployments should use separate keys for development and production environments. Rotate keys on a regular schedule and monitor key usage in the dashboard's activity log.
The authorization header format is Authorization: Bearer mmx_live_a1b2c3d4e5f6g7h8i9j0. Keys prefixed with mmx_live_ are production keys. Test keys use the mmx_test_ prefix and route to a sandbox environment with no billing impact. Never commit API keys to version control — use environment variables or a secrets manager.
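A small guard that reads the key from the environment and distinguishes production from sandbox keys by prefix; both helper names are illustrative, not part of any SDK:

```python
import os


def load_api_key(env_var: str = "MINIMAX_API_KEY") -> str:
    """Read the key from the environment rather than source code."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    if not key.startswith(("mmx_live_", "mmx_test_")):
        raise ValueError("unrecognized MiniMax key prefix")
    return key


def is_production_key(key: str) -> bool:
    """True for billing-impacting mmx_live_ keys, False for sandbox keys."""
    return key.startswith("mmx_live_")
```

A check like this at startup catches a production key accidentally deployed to a staging environment before any billable call is made.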
MiniMax enforces tiered rate limits with transparent headers so your application can adapt without guesswork.
Rate limits apply per API key, not per IP address. Every response includes X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers. When you exceed your limit, the API returns HTTP 429 with a Retry-After header indicating the number of seconds to wait. Free tier accounts receive 60 requests per minute. Pay-as-you-go accounts get 600 RPM with higher limits on video endpoints. Enterprise customers receive dedicated capacity with burst allowances negotiated during onboarding.
Implement exponential backoff in your client: wait Retry-After seconds on the first 429, double the wait on subsequent failures, and add jitter to avoid thundering-herd retry patterns. The official MiniMax SDKs handle this automatically.
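The backoff schedule above can be sketched as a single function, assuming you start from the server-provided Retry-After value; the cap of 60 seconds is an illustrative choice, not a documented limit:

```python
import random


def backoff_delay(attempt: int, retry_after: float, cap: float = 60.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based) after a 429.

    Starts at the server-provided Retry-After, doubles on each
    subsequent failure, caps the growth, and adds up to 25% jitter
    so many clients do not retry in lockstep.
    """
    base = min(retry_after * (2 ** attempt), cap)
    return base + random.uniform(0, base * 0.25)
```

With retry_after=2, successive attempts wait roughly 2, 4, 8, 16 seconds plus jitter; if you use an official SDK, this logic is already built in.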
The MiniMax API groups endpoints into four categories — Chat, Embeddings, Video, and Models — each with its own request shape and response convention.
Chat: The chat completions endpoint (POST /v1/chat/completions) accepts a messages array with role/content pairs. It supports system prompts, multi-turn conversations, function calling, and streaming via SSE. Temperature, top_p, and max_tokens give you granular control over output.
Embeddings: Call POST /v1/embeddings with an array of input strings and a model ID. Returns fixed-dimension float vectors suitable for semantic search, clustering, and RAG pipelines. Batch up to 2,048 input strings per request.
Video: The video generation endpoint (POST /v1/video/generations) takes a text prompt, optional reference image, and output parameters including resolution and duration. Generation is asynchronous — you receive a task ID immediately, then poll GET /v1/video/generations/{task_id} for completion status.
Models: GET /v1/models returns available models with metadata including context window size, pricing per token, capabilities (chat, embeddings, vision), and deprecation status. GET /v1/models/{model_id} provides detailed information for a specific model.
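The asynchronous video flow can be sketched as a polling loop. Here fetch_status stands in for a GET to /v1/video/generations/{task_id}, and the terminal status names "succeeded" and "failed" are assumptions about the response shape; check them against a live response:

```python
import time
from typing import Callable


def poll_video_task(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep=time.sleep,
) -> dict:
    """Poll until the video generation task reaches a terminal state.

    fetch_status should perform GET /v1/video/generations/{task_id}
    and return the decoded JSON "data" object.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = fetch_status()
        if task.get("status") in ("succeeded", "failed"):
            return task
        sleep(interval)
    raise TimeoutError("video generation did not finish in time")
```

Injecting fetch_status and sleep keeps the loop testable without network access, and the interval should stay within the 300 RPM polling limit on the status endpoint.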
All MiniMax API responses follow a consistent JSON envelope — success payloads go in data, errors go in error, and metadata lives in a top-level meta object.
A successful response looks like: {"data": {...}, "meta": {"request_id": "req_abc123", "model": "minimax-chat-v2", "usage": {"prompt_tokens": 42, "completion_tokens": 18}}}. The request_id field is critical for support inquiries — include it when reporting unexpected behavior. Streaming responses use SSE frames, each containing a JSON chunk with a delta object that accumulates into the full response.
Error responses use the structure {"error": {"code": "invalid_request", "message": "The 'model' field is required", "details": {"field": "model"}}}. Common error codes include invalid_request, authentication_failed, rate_limit_exceeded, model_not_found, and server_error.
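A client can use the error code to decide whether a retry is worthwhile. A minimal sketch over the envelope documented above; which codes count as retryable is a judgment call, shown here as an assumption:

```python
from typing import Tuple

# Assumed retry policy: transient conditions only.
RETRYABLE = {"rate_limit_exceeded", "server_error"}


def classify_error(body: dict) -> Tuple[str, bool]:
    """Return (error code, whether the request is worth retrying).

    Expects the documented envelope:
    {"error": {"code": ..., "message": ..., "details": {...}}}.
    """
    err = body.get("error", {})
    code = err.get("code", "unknown")
    return code, code in RETRYABLE
```

Codes like invalid_request and model_not_found indicate a bug in the request itself, so retrying them only burns quota.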
The MiniMax SDKs wrap the REST API in idiomatic language interfaces — install with one command, make your first call in under ten lines of code.
Python: install with pip install minimax-sdk, then:

```python
from minimax import MiniMax

client = MiniMax(api_key="...")
client.chat.completions.create(
    model="minimax-chat-v2",
    messages=[{"role": "user", "content": "Hello"}],
)
```

JavaScript: install with npm install @minimax/sdk, then:

```javascript
import MiniMax from '@minimax/sdk';

const client = new MiniMax({ apiKey: '...' });
await client.chat.completions.create({
  model: 'minimax-chat-v2',
  messages: [{ role: 'user', content: 'Hello' }],
});
```

Go: install with go get github.com/minimax/minimax-go, then:

```go
client := minimax.NewClient("...")
resp, err := client.Chat.Completions(ctx, &minimax.ChatCompletionRequest{
    Model:    "minimax-chat-v2",
    Messages: []minimax.Message{{Role: "user", Content: "Hello"}},
})
```
Test the MiniMax API directly from your terminal with these curl snippets — replace the placeholder key and model ID with your credentials.
Chat completion:

```shell
curl https://api.minimax.gr.com/v1/chat/completions \
  -H "Authorization: Bearer $MINIMAX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"minimax-chat-v2","messages":[{"role":"user","content":"Explain REST APIs in one paragraph."}]}'
```

Generate embeddings:

```shell
curl https://api.minimax.gr.com/v1/embeddings \
  -H "Authorization: Bearer $MINIMAX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"minimax-embed-v1","input":["MiniMax provides powerful AI tools for developers."]}'
```

List models:

```shell
curl https://api.minimax.gr.com/v1/models -H "Authorization: Bearer $MINIMAX_API_KEY"
```

Start video generation:

```shell
curl https://api.minimax.gr.com/v1/video/generations \
  -H "Authorization: Bearer $MINIMAX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"A golden retriever running through a field of sunflowers at golden hour","duration":5}'
```
Robust MiniMax integrations handle errors at three levels — network timeouts, HTTP error codes, and application-level response validation.
Set reasonable timeouts: 30 seconds for chat completions, 120 seconds for video generation polling. Wrap API calls in retry logic that respects the Retry-After header. Log the request_id from every response so you can trace failures back through the MiniMax infrastructure. For production services, implement circuit breakers that pause requests when error rates exceed a threshold, giving the API time to recover.
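The circuit-breaker idea can be sketched in a few lines. The thresholds and the half-open probe behavior here are illustrative choices, not MiniMax recommendations:

```python
import time


class CircuitBreaker:
    """Pause outbound calls once consecutive failures cross a threshold.

    After `max_failures` consecutive failures the breaker opens for
    `cooldown` seconds, during which allow() returns False; after the
    cooldown one probe request is allowed through (half-open), and a
    recorded success closes the breaker again.
    """

    def __init__(self, max_failures: int = 5, cooldown: float = 30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            self.opened_at = None  # half-open: let one request probe
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

Call allow() before each request and record_success()/record_failure() afterward; the injected clock makes the behavior testable without real waiting.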
The table below lists every available MiniMax API endpoint with its HTTP method, description, and rate limit tier.
| Endpoint | Method | Description | Rate Limit |
|---|---|---|---|
| /v1/chat/completions | POST | Generate chat completions with streaming support | 600 RPM |
| /v1/embeddings | POST | Create vector embeddings for text inputs | 600 RPM |
| /v1/video/generations | POST | Submit video generation tasks | 120 RPM |
| /v1/video/generations/{id} | GET | Poll video generation task status | 300 RPM |
| /v1/models | GET | List all available models and their capabilities | 300 RPM |
| /v1/models/{id} | GET | Retrieve detailed metadata for a specific model | 300 RPM |
| /v1/files | POST | Upload files for fine-tuning or video reference | 60 RPM |
| /v1/files/{id} | GET | Retrieve file metadata and download URL | 300 RPM |
| /v1/fine-tunes | POST | Create a fine-tuning job on a base model | 30 RPM |
| /v1/fine-tunes/{id} | GET | Check fine-tuning job status and progress | 300 RPM |
"We migrated our entire inference pipeline to MiniMax in a single sprint. The API design is consistent across endpoints — same auth, same error format, same streaming contract. Our team didn't need a single support ticket during integration. The rate limit headers alone saved us from building custom throttling middleware."
— Rafael M. Costa, DevOps Lead, Horizon Cloud, Phoenix
MiniMax API authentication uses Bearer tokens generated from the platform hub. Create an API key under Settings, select the scopes your application needs, and include the key in the Authorization: Bearer YOUR_API_KEY header for every request. Keys prefixed with mmx_test_ route to a sandbox environment for development. Production keys use mmx_live_ and count against your billing. You can create up to 20 keys per account, monitor usage per key in the dashboard, and revoke compromised keys instantly. All authentication traffic travels over TLS 1.3.
Rate limits vary by endpoint and account tier. The free tier permits 60 requests per minute across all endpoints. Pay-as-you-go plans raise that to 600 RPM for chat and embeddings endpoints and 120 RPM for video generation submissions. Enterprise accounts can negotiate dedicated capacity with multi-thousand RPM limits. Every response includes rate limit headers: X-RateLimit-Limit (your ceiling), X-RateLimit-Remaining (calls left in this window), and X-RateLimit-Reset (when the window resets). Burst capacity is available on enterprise plans for traffic spikes.
The MiniMax API returns JSON for all standard responses. Successful responses wrap data inside a data key with metadata — including the model used, token counts, and a unique request ID — in a meta object. Error responses use an error key with code, message, and optional details fields. Streaming endpoints use server-sent events (SSE) with content type text/event-stream, emitting JSON chunks that accumulate into the final output. Timestamps follow ISO 8601 in UTC. File upload responses include signed URLs for retrieval.
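The SSE accumulation described above can be sketched as follows. The delta field name ("content" inside "delta") follows the envelope described here, and the "data: [DONE]" sentinel is a common SSE convention assumed for illustration; verify both against a live stream:

```python
import json


def accumulate_sse(frames) -> str:
    """Join the delta chunks from an SSE stream into the full text.

    `frames` yields raw SSE data lines such as
    'data: {"delta": {"content": "Hel"}}'; a 'data: [DONE]'
    sentinel ends the stream.
    """
    parts = []
    for line in frames:
        if not line.startswith("data: "):
            continue  # skip comments, blank keep-alive lines, etc.
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        parts.append(chunk.get("delta", {}).get("content", ""))
    return "".join(parts)
```

In practice the official SDKs expose streaming as an iterator of parsed chunks, so hand-rolled SSE parsing like this is only needed for raw HTTP clients.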
Official MiniMax SDKs are available for Python (3.8+), JavaScript and TypeScript (Node.js 18+), and Go (1.21+). Each SDK provides idiomatic client creation, automatic retry with exponential backoff, streaming support, and type-safe request builders. Python SDK users get async support via AsyncMiniMax. JavaScript users can choose between CommonJS and ESM imports. The Go SDK follows standard library conventions with context-aware cancellation. Community-maintained libraries exist for Ruby and Java, listed in the integrations section. All official SDKs are open source on GitHub with permissive licensing.
MiniMax provides an API playground inside the platform hub. Log in, navigate to the Developer section, and use the interactive request builder to test any endpoint with authentication pre-filled. You can tweak parameters, inspect responses, and copy the equivalent curl command. For terminal-based testing, copy the curl examples from the API reference, set your MINIMAX_API_KEY environment variable, and run commands directly. New accounts start with free credits — enough for thousands of test calls. The test environment uses the mmx_test_ key prefix and mirrors the production API without billing impact.