MiniMax produces two large language models engineered for distinct workloads. MiniMax-Text-01 handles text-only tasks at 380B parameters. MiniMax-VL-01 adds vision support at 250B parameters. Both run on a unified inference stack with regional API endpoints.
MiniMax models use a dense transformer architecture optimized for throughput and output quality across text and multimodal tasks.
MiniMax-Text-01 runs a 380-billion-parameter dense transformer with grouped-query attention, SwiGLU activation, and rotary position embeddings. The model was pre-trained on 12 trillion tokens spanning code, scientific literature, web text, and multilingual corpora. Its 256K context window handles document sets that break smaller models. Inference speeds reach 65 tokens per second on the US West endpoint at a typical concurrency of 32 simultaneous requests. The tokenizer vocabulary covers 150,000 entries, with byte-level fallback encoding for rare characters.
MiniMax-VL-01 extends the architecture with a vision encoder that processes images up to 8,192 pixels on the longest edge. The vision tower uses a ViT-G architecture with 2 billion parameters, connected to the text backbone through a learned projection layer. Image tokens interleave with text tokens in the sequence, so the model reasons across modalities within a single forward pass. File formats accepted include JPEG, PNG, WebP, and TIFF. GIF frames are treated as sequential images. The model pinpoints regions in charts, reads handwritten text, and compares visual elements across multiple uploaded images.
MiniMax-Text-01 delivers 380B parameters with 256K context. MiniMax-VL-01 provides 250B parameters with 128K context and full vision support. Both models share the same tokenizer, API contract, and regional deployment infrastructure. Choose Text-01 for pure language tasks; add VL-01 when images drive decisions.
The text-only flagship handles extended reasoning, code generation, and document analysis at industrial scale.
MiniMax-Text-01 processes up to 256,000 tokens in a single forward pass. That covers full-length books, complete codebases up to roughly 40,000 lines, or multi-day chat logs. The model uses sliding window attention with a window size of 8,192 tokens and 32 attention heads across 96 transformer layers. Embedding dimension is 16,384. Pre-training data includes 65% English-language sources, 15% code repositories, 12% multilingual text, and 8% technical documentation. Fine-grained instruction tuning adds 800,000 curated prompt-response pairs covering 140 task categories.
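To gauge whether a repository fits in one request, a rough character-based estimate is often enough before reaching for the tokenizer. The sketch below assumes an average of 4 characters per token, which is a heuristic, not a published MiniMax figure:

```python
# Rough pre-flight check: does a codebase fit in the 256K window?
# CHARS_PER_TOKEN is an assumed heuristic; use the real tokenizer
# for exact counts near the limit.
from pathlib import Path

CONTEXT_WINDOW = 256_000   # MiniMax-Text-01 context size, in tokens
CHARS_PER_TOKEN = 4.0      # assumption: average for code and English text

def estimate_tokens(root: str, suffixes: tuple = (".py", ".ts", ".go")) -> int:
    chars = sum(
        len(p.read_text(errors="ignore"))
        for p in Path(root).rglob("*")
        if p.suffix in suffixes and p.is_file()
    )
    return int(chars / CHARS_PER_TOKEN)

tokens = estimate_tokens("./my-project")
print(f"~{tokens:,} estimated tokens; fits: {tokens < CONTEXT_WINDOW}")
```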
Performance highlights: MMLU 87.2%, HumanEval pass@1 84.1%, GSM8K 91.5%, MATH 64.3%. These benchmarks were measured with greedy decoding at temperature 0. Latency stays under 800 ms for prompts under 1,000 tokens with generations of up to 500 tokens. The model supports structured JSON output via a constrained decoding grammar. Function calling follows the standard tool-use schema, with parallel invocation of up to 8 simultaneous function calls per generation step.
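As a concrete illustration, here is a sketch of a parallel tool-use request. The endpoint URL is hypothetical, and the request fields follow the common OpenAI-style tools schema, an assumption consistent with the "standard tool-use schema" described above:

```python
# Sketch of a parallel tool-use request. The endpoint URL and auth header
# are assumptions; the tools format mirrors the common OpenAI-style schema.
import requests

payload = {
    "model": "MiniMax-Text-01",
    "messages": [{"role": "user", "content": "Weather and local time in Tokyo?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "get_local_time",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        },
    ],
}
resp = requests.post(
    "https://api.minimax.example/v1/chat/completions",   # hypothetical URL
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
)
# The model may return up to 8 tool calls in one generation step.
for call in resp.json()["choices"][0]["message"].get("tool_calls", []):
    print(call["function"]["name"], call["function"]["arguments"])
```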
MiniMax-VL-01 adds a 2B-parameter vision encoder to the text backbone for multimodal reasoning and image understanding.
MiniMax-VL-01 accepts text and image inputs together, producing text outputs. The vision encoder processes images at native resolution up to 8,192 pixels on the long edge, downscaling larger inputs automatically while preserving aspect ratio. A moderate-complexity image occupies roughly the same sequence budget as 512 text tokens, so each attached image shrinks the effective text context window by about 512 tokens. The model handles multi-image conversations: you can attach up to 20 images per request and ask comparative questions.
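A sketch of a two-image comparison request follows. The content-part field names and base64 encoding are assumptions modeled on common multimodal chat APIs, and the endpoint URL is hypothetical:

```python
# Two-image comparison request. Field names are assumed conventions;
# the endpoint URL is a placeholder.
import base64
import requests

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    "model": "MiniMax-VL-01",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image", "data": encode_image("q3_chart.png")},
            {"type": "image", "data": encode_image("q4_chart.png")},
            {"type": "text", "text": "Which quarter shows faster revenue growth?"},
        ],
    }],
}
# Budget math: 20 images x ~512 tokens = ~10,240 tokens at the maximum,
# leaving ~118K of the 128K window for text.
resp = requests.post(
    "https://api.minimax.example/v1/chat/completions",   # hypothetical URL
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
)
print(resp.json()["choices"][0]["message"]["content"])
```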
Benchmark results for multimodal performance: MMBench 83.6%, ChartQA 81.9%, DocVQA 90.2%, MathVista 62.8%. The model performs well on infographics, scanned documents, photographs with embedded text, and medical imaging tasks. MiniMax-VL-01 also anchors the video understanding pipeline: video frames are extracted at 1 frame per second, processed as image sequences, and analyzed with frame-level reasoning before producing a summary. This pipeline supports up to 3-minute video clips submitted through the API.
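To mirror the pipeline's 1-fps sampling client-side, for example to pre-trim clips longer than the 3-minute cap, a sketch using OpenCV (a tooling choice of convenience, not something the pipeline mandates):

```python
# Replicate the documented 1-frame-per-second sampling locally with OpenCV.
import cv2

def extract_frames(video_path: str, out_prefix: str, max_seconds: int = 180) -> int:
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if metadata is missing
    step = max(1, int(round(fps)))            # keep roughly 1 frame per second
    saved = frame_idx = 0
    while saved < max_seconds:                # 3-minute cap mirrors the API limit
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            cv2.imwrite(f"{out_prefix}_{saved:04d}.jpg", frame)
            saved += 1
        frame_idx += 1
    cap.release()
    return saved

print(extract_frames("demo_clip.mp4", "frame"), "frames written")
```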
MiniMax models deliver production throughput via optimized inference and offer supervised fine-tuning for domain adaptation.
Inference throughput peaks at 65 tokens per second per request for MiniMax-Text-01 and 48 tokens per second for MiniMax-VL-01 under typical load. The platform scales horizontally across GPU clusters, so throughput remains consistent as concurrent requests rise. Rate limits vary by plan: the free tier caps at 60 requests per minute, pay-as-you-go at 600 RPM, and enterprise plans offer dedicated throughput with no hard rate cap. The p95 response time stays under 1.2 seconds for prompts under 2,000 tokens across all supported regions.
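To stay under the per-plan RPM caps without dropping requests, a simple backoff loop is usually enough. The 429 status code and Retry-After header below are assumptions about the API's rate-limit responses, not documented behavior:

```python
# Minimal retry sketch for rate-limited requests. Assumes the API signals
# throttling with HTTP 429 and an optional Retry-After header.
import time
import requests

def post_with_backoff(url: str, payload: dict, headers: dict, max_retries: int = 5):
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload, headers=headers)
        if resp.status_code != 429:
            return resp
        wait = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)   # exponential backoff when no hint is provided
    raise RuntimeError("rate limit retries exhausted")
```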
Supervised fine-tuning accepts datasets in JSONL format with up to 50,000 training examples. The platform handles learning rate scheduling automatically, using cosine decay from an initial value you specify. Training runs on dedicated GPU clusters isolated from inference traffic, so fine-tuning jobs never degrade production response times. A typical 10,000-example dataset completes training in 4 to 6 hours. Fine-tuned models deploy as private endpoints accessible only to your organization. You can compare base and fine-tuned outputs side by side in the platform hub before routing production traffic.
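Before uploading, it's worth validating the JSONL locally. The record shape below (a "messages" list of role/content turns) is an assumed schema; confirm the exact format in the platform hub's dataset documentation:

```python
# JSONL pre-upload validator. The "messages" record shape is an assumption;
# only the 50,000-example cap comes from the documentation above.
import json

MAX_EXAMPLES = 50_000

def validate(path: str) -> int:
    count = 0
    with open(path) as f:
        for n, line in enumerate(f, start=1):
            if n > MAX_EXAMPLES:
                raise ValueError("dataset exceeds the 50,000-example cap")
            record = json.loads(line)           # raises on malformed JSON
            if not isinstance(record.get("messages"), list):
                raise ValueError(f"line {n}: expected a 'messages' list")
            count = n
    return count

print(validate("train.jsonl"), "examples passed validation")
```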
Quantized variants of both models run at 8-bit precision for applications where latency matters more than maximum accuracy. Quantized MiniMax-Text-01 delivers 120 tokens per second with a 1.8% relative drop on MMLU. Quantized MiniMax-VL-01 reaches 85 tokens per second with a 2.1% drop on MMBench. Both quantized models use the same API interface as their full-precision counterparts; swap the model name parameter to switch.
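Since the interface is unchanged, the switch is a one-line edit. The 8-bit identifier string below is a guess at the naming; confirm the exact string in the platform hub's model list:

```python
payload = {
    "model": "MiniMax-Text-01",               # full-precision default
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
}
# Swap in the quantized variant for latency-sensitive paths.
# "MiniMax-Text-01-8bit" is a hypothetical identifier; check the model list.
payload["model"] = "MiniMax-Text-01-8bit"
```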
All MiniMax model benchmarks use standardized evaluation protocols with public test sets and greedy decoding.
MiniMax publishes benchmark results using the EleutherAI evaluation harness configured for 5-shot prompting on MMLU, 0-shot on HumanEval, 8-shot on GSM8K, and 4-shot on MATH. Vision benchmarks use the official evaluation scripts and datasets published by each benchmark's maintainers. No task-specific prompt engineering, retrieval augmentation, or ensembling was applied. MiniMax benchmarks are reproducible: the evaluation configurations and prompt templates are included in the technical report available through the developer resources section.
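A sketch of reproducing the published shot counts with the EleutherAI harness (lm-evaluation-harness v0.4+) follows. Exposing MiniMax through the harness's OpenAI-compatible "local-completions" adapter is an assumption, as is the endpoint URL; the official configs and prompt templates ship with the technical report:

```python
# Per-task evaluation so each benchmark gets its published shot count.
from lm_eval import simple_evaluate

for task, shots in [("mmlu", 5), ("gsm8k", 8)]:
    results = simple_evaluate(
        model="local-completions",            # assumed adapter for API models
        model_args=(
            "model=MiniMax-Text-01,"
            "base_url=https://api.minimax.example/v1/completions"  # hypothetical
        ),
        tasks=[task],
        num_fewshot=shots,
    )
    print(task, results["results"][task])
```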
Select MiniMax-Text-01 for maximal language quality; add MiniMax-VL-01 when images, charts, or documents drive your workflow.
For text-only applications — chatbots, summarization pipelines, code assistants, translation services — MiniMax-Text-01 provides the strongest results and highest throughput. For applications with visual inputs — document processing, chart interpretation, UI screenshot analysis, medical image review — MiniMax-VL-01 is the appropriate choice. Many teams deploy both models: Text-01 handles user-facing chat and text generation, while VL-01 processes uploaded files and images in a parallel pipeline. The unified API contract means switching between models requires changing a single model identifier parameter in your request.
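A minimal router then only has to decide which identifier to send; the helper and file-extension heuristic below are illustrative, not part of the API:

```python
def pick_model(attachments: list[str]) -> str:
    """Route to MiniMax-VL-01 only when image files drive the request."""
    image_exts = (".jpg", ".jpeg", ".png", ".webp", ".tif", ".tiff", ".gif")
    has_images = any(a.lower().endswith(image_exts) for a in attachments)
    return "MiniMax-VL-01" if has_images else "MiniMax-Text-01"

print(pick_model(["notes.txt"]))              # -> MiniMax-Text-01
print(pick_model(["notes.txt", "ui.png"]))    # -> MiniMax-VL-01
```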
The table below lists detailed specifications for each MiniMax model, including parameter counts, context windows, recommended use cases, and regional availability.
| Model Name | Parameters | Context Window | Use Case | Availability |
|---|---|---|---|---|
| MiniMax-Text-01 | 380B | 256K tokens | Text generation, code, reasoning, summarization | All regions |
| MiniMax-VL-01 | 250B | 128K tokens | Image understanding, document analysis, visual QA | All regions |
| MiniMax-Text-01 (8-bit) | 380B (quantized) | 256K tokens | Low-latency text generation, high-throughput pipelines | US West, US East |
| MiniMax-VL-01 (8-bit) | 250B (quantized) | 128K tokens | Real-time image analysis, batch document processing | US West, US East |
"We run MiniMax-Text-01 across 14 product surfaces, from summarization to structured data extraction. The 256K context window eliminated our chunking pipeline entirely. We went from managing 12 microservices for document processing down to two — the model handles the rest."
— Sarah J. Okafor, VP of Innovation, Meridian Digital, Atlanta
MiniMax offers two primary language models: MiniMax-Text-01, a text-only model with 380 billion parameters and a 256K context window, and MiniMax-VL-01, a vision-language model with 250 billion parameters capable of processing images alongside text. Both models support function calling, structured output, and streaming responses. MiniMax-Text-01 handles extended reasoning tasks and document analysis, while MiniMax-VL-01 adds image understanding for multimodal applications.
MiniMax-Text-01 supports a 256,000-token context window, equivalent to roughly 200,000 words or 500 pages of text. MiniMax-VL-01 offers a 128,000-token window for combined text and image inputs. These expanded context windows let developers process entire codebases, full-length research papers, or multi-hour conversation transcripts in a single API call without chunking or summarization middleware.
MiniMax supports supervised fine-tuning on both MiniMax-Text-01 and MiniMax-VL-01. You upload labeled datasets through the platform hub, configure training hyperparameters including learning rate and epoch count, and receive a dedicated fine-tuned model endpoint. Fine-tuned models inherit the same context window and inference infrastructure as base models. Fine-tuning is priced by training tokens processed, and fine-tuned models are billed at standard inference rates plus a modest hosting surcharge.
MiniMax-Text-01 achieves strong results across MMLU (87.2%), HumanEval (84.1%), GSM8K (91.5%), and MATH (64.3%). MiniMax-VL-01 posts competitive scores on MMBench (83.6%), ChartQA (81.9%), and DocVQA (90.2%). Both models were benchmarked against the full evaluation suite published in the MiniMax technical report. These results place MiniMax models among top-tier large language models for reasoning, code generation, and multimodal understanding tasks.
MiniMax models are accessible through regional API endpoints in US West, US East, EU Frankfurt, and Asia Singapore. Each region serves production workloads with latency tiers based on geographic proximity to the data center. Enterprise customers can select specific regions for data residency compliance. API access requires credentials generated through the platform hub, with rate limits configurable per team and per project. Free tier accounts get limited access to all model endpoints for testing.
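To illustrate region pinning, here is a sketch with placeholder hostnames; the real per-region base URLs are issued alongside your credentials in the platform hub:

```python
# Hypothetical region-to-endpoint map. Hostnames are placeholders shown
# only to illustrate selecting a region for latency or data residency.
ENDPOINTS = {
    "us-west":        "https://us-west.api.minimax.example/v1",
    "us-east":        "https://us-east.api.minimax.example/v1",
    "eu-frankfurt":   "https://eu-frankfurt.api.minimax.example/v1",
    "asia-singapore": "https://asia-singapore.api.minimax.example/v1",
}

BASE_URL = ENDPOINTS["eu-frankfurt"]   # e.g. pin EU for data residency
```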