
MiniMax Video Generation Tools

Generate short-form video from text prompts. MiniMax handles 720p and 1080p output at 24 or 30 frames per second with 12 built-in style presets, scene composition controls, and batch processing for high-volume production.

Text-to-Video Pipeline

MiniMax converts natural language prompts into video clips through a diffusion-based generation pipeline with configurable resolution, duration, frame rate, and style.

The video generation pipeline accepts a text prompt of up to 3,000 characters that describes scene content, camera movement, lighting, and visual style. MiniMax processes the prompt through a diffusion model trained on paired text-video data and produces frames sequentially at the requested resolution. Each frame then passes through a post-processing stage that applies color correction, temporal smoothing, and optional audio synchronization. Output videos range from 2 to 60 seconds in duration. Longer outputs are composited from multiple generation segments, with automatic scene transition detection and blending.

Generation time scales with resolution and duration. A 5-second 720p clip at 24fps typically completes in 45 to 90 seconds on the US West endpoint under normal load. A 10-second 1080p clip at 30fps typically takes 3 to 5 minutes. The asynchronous generation endpoint returns a job ID immediately, so your application can poll for completion while handling other tasks. Webhook notifications are available for completed, failed, and partially completed jobs.
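The polling flow described above can be sketched as follows. This is a minimal illustration, not the documented MiniMax client: the status strings, the `wait_for_job` name, and the `fetch_status` callable (which in a real integration would wrap the HTTP call to the job status endpoint) are all assumptions.

```python
import time
from typing import Callable, Dict

# Hypothetical terminal states, mirroring the webhook events named above.
TERMINAL_STATES = {"completed", "failed", "partially_completed"}


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], Dict[str, str]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll fetch_status(job_id) until the job reaches a terminal state.

    fetch_status is any callable returning a dict with a "status" key.
    Raises TimeoutError if the job is still running after timeout_s.
    """
    waited = 0.0
    while True:
        job = fetch_status(job_id)
        if job.get("status") in TERMINAL_STATES:
            return job
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} still running after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

Passing the status call and the sleep function in as parameters keeps the loop independent of any particular HTTP library. If webhooks are registered, this loop becomes a fallback instead of the primary completion signal.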

Video Pipeline Summary:

MiniMax video generation covers text-to-video at 720p and 1080p, 12 style presets, batch processing of up to 50 prompts, prompt-based scene editing, and frame-level control over camera motion and lighting. Output is delivered as MP4 or WebM with optional audio tracks.

Resolution and Format Options

Generate at 720p or 1080p in landscape, portrait, or square aspect ratios with MP4 or WebM output.

MiniMax supports two resolutions: 720p (1280x720) and 1080p (1920x1080); the pixel dimensions shown are for 16:9 landscape. Both resolutions are also available in 9:16 portrait and 1:1 square aspect ratios. The 1080p tier uses a higher-fidelity diffusion pass that takes approximately 40% longer per second of video. Frame rate options are 24fps for cinematic motion and 30fps for standard video playback. Output is encoded as H.264 in MP4 containers and VP9 in WebM containers. The maximum file size for a single clip is 500MB. For clips approaching this limit, MiniMax automatically segments the output into downloadable parts.
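The options above lend themselves to a client-side check before a request is sent. The sketch below is one way to encode them. The `VideoSettings` class and its field names are hypothetical and do not reflect the actual request schema; the allowed values are taken from the text.

```python
from dataclasses import dataclass
from typing import List

# Allowed values as listed in this section.
RESOLUTIONS = {"720p": (1280, 720), "1080p": (1920, 1080)}
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
FRAME_RATES = {24, 30}
CONTAINER_CODECS = {"mp4": "h264", "webm": "vp9"}
MIN_DURATION_S, MAX_DURATION_S = 2, 60


@dataclass(frozen=True)
class VideoSettings:
    """Hypothetical settings object; field names are illustrative only."""
    resolution: str = "720p"
    aspect_ratio: str = "16:9"
    fps: int = 24
    container: str = "mp4"
    duration_s: float = 5.0

    def problems(self) -> List[str]:
        """Return a list of human-readable problems (empty if valid)."""
        found = []
        if self.resolution not in RESOLUTIONS:
            found.append(f"unsupported resolution: {self.resolution}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            found.append(f"unsupported aspect ratio: {self.aspect_ratio}")
        if self.fps not in FRAME_RATES:
            found.append(f"unsupported frame rate: {self.fps}")
        if self.container not in CONTAINER_CODECS:
            found.append(f"unsupported container: {self.container}")
        if not MIN_DURATION_S <= self.duration_s <= MAX_DURATION_S:
            found.append(f"duration out of range: {self.duration_s}s")
        return found
```

Returning a list of problems, instead of raising on the first one, lets a caller report every invalid field in a single pass.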

Generated videos include optional audio tracks synthesized from text-to-speech or ambient sound generation. Audio is embedded as AAC at 44.1kHz. If your prompt describes specific sounds ("waves crashing," "keyboard typing"), the audio synthesis pipeline attempts to match those descriptions. Audio generation adds approximately 20% to total generation time. Disable audio for silent output or provide your own audio track through the editing endpoint.
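The timing figures quoted on this page can be combined into a rough estimate. The sketch below assumes that generation time scales linearly with duration and that the 1080p and audio overheads multiply. Both are simplifying assumptions, and frame rate is ignored. The function name is illustrative.

```python
# Figures from the text: a 5-second 720p clip takes 45 to 90 seconds,
# i.e. 9 to 18 seconds of generation per second of video.
BASE_LOW_S_PER_S = 45 / 5
BASE_HIGH_S_PER_S = 90 / 5
HD_1080P_FACTOR = 1.4  # "approximately 40% longer" per second of video
AUDIO_FACTOR = 1.2     # audio "adds approximately 20%" to total time


def estimate_generation_time(duration_s, resolution="720p", audio=False):
    """Return a (low, high) estimate in seconds under the stated assumptions."""
    if resolution not in ("720p", "1080p"):
        raise ValueError(f"unsupported resolution: {resolution}")
    factor = 1.0
    if resolution == "1080p":
        factor *= HD_1080P_FACTOR
    if audio:
        factor *= AUDIO_FACTOR
    return (duration_s * BASE_LOW_S_PER_S * factor,
            duration_s * BASE_HIGH_S_PER_S * factor)
```

For a 10-second 1080p clip without audio this model gives roughly 2 to 4 minutes, somewhat below the 3 to 5 minutes quoted for 30fps output, so frame rate likely contributes as well. Treat the result as a planning figure only.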

Style Presets and Custom Styling

Choose from 12 built-in style presets or blend multiple presets for unique visual identities across video projects.

MiniMax ships with 12 style presets: Cinematic, Anime, 3D Render, Watercolor, Pixel Art, Oil Painting, Flat Vector, Claymation, Pencil Sketch, Photorealistic, Neon Cyberpunk, and Vintage Film. Each preset configures the diffusion model's guidance scale, color temperature, motion interpolation method, and noise scheduling. The presets work across all resolutions and durations without additional tuning. A mix parameter (0.0 to 1.0) blends two presets: a value of 0.3 yields 30% of preset A and 70% of preset B. Custom style transfer accepts a reference image, from which the pipeline extracts a CLIP-based style embedding to use as visual guidance.
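The mix parameter's arithmetic can be made explicit with a small helper. This is a sketch of the weighting described above, with the mix value read as the share of preset A. The function name and return shape are assumptions, not part of the MiniMax API.

```python
PRESETS = [
    "Cinematic", "Anime", "3D Render", "Watercolor", "Pixel Art",
    "Oil Painting", "Flat Vector", "Claymation", "Pencil Sketch",
    "Photorealistic", "Neon Cyberpunk", "Vintage Film",
]


def blend_weights(preset_a: str, preset_b: str, mix: float) -> dict:
    """Map each preset to its share of the blend.

    mix is the share of preset_a; preset_b receives the remainder.
    """
    for name in (preset_a, preset_b):
        if name not in PRESETS:
            raise ValueError(f"unknown preset: {name}")
    if preset_a == preset_b:
        raise ValueError("blend requires two different presets")
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be between 0.0 and 1.0")
    return {preset_a: mix, preset_b: 1.0 - mix}
```

Calling `blend_weights("Cinematic", "Anime", 0.3)` weights Cinematic at 0.3 and Anime at the remaining 0.7, matching the 30%/70% split in the text.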

Prompt-based style control works alongside presets. Describe "hand-drawn animation with visible pencil lines" in the prompt even when the Anime preset is selected, and the model weights the prompt's stylistic instructions against the preset's default parameters. The prompt engineering guide on the developer resources page includes examples of prompt-preset combinations for common visual styles.

Prompt Engineering and Batch Production

Write structured prompts for consistent video output and run batch jobs of up to 50 clips with asynchronous processing.

MiniMax video prompts follow a structured format for best results. Include a subject (what appears in frame), attributes (color, size, texture, material), action (what is happening or moving), environment (background, setting, time of day), camera (angle, movement, shot type), lighting (direction, intensity, color temperature), and mood (atmosphere, emotional tone). Separate these elements with commas. The platform evaluates prompt quality automatically and returns suggestions if key elements are missing. A prompt quality score from 0 to 100 appears in every API response; scores above 70 typically produce reliable output.
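A small helper can assemble the comma-separated structure described above and flag missing elements before submission. This is a local sketch only: it does not reproduce the platform's prompt quality score, and the `build_prompt` name and keyword arguments are illustrative.

```python
from typing import List, Tuple

# Element order follows the structure listed above.
ELEMENT_ORDER = [
    "subject", "attributes", "action", "environment",
    "camera", "lighting", "mood",
]
MAX_PROMPT_CHARS = 3000


def build_prompt(**elements: str) -> Tuple[str, List[str]]:
    """Join the supplied elements with commas and list the missing ones."""
    unknown = set(elements) - set(ELEMENT_ORDER)
    if unknown:
        raise ValueError(f"unknown prompt elements: {sorted(unknown)}")
    parts = []
    missing = []
    for name in ELEMENT_ORDER:
        value = elements.get(name, "").strip()
        if value:
            parts.append(value)
        else:
            missing.append(name)
    prompt = ", ".join(parts)
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    return prompt, missing
```

For example, `build_prompt(subject="a red fox", action="running through snow", camera="low tracking shot")` returns the joined prompt along with the four element names that were left out.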

Negative prompts use a minus prefix to exclude elements, for example "-blurry, -distorted faces, -watermarks, -text overlay". Negative prompts are capped at 500 characters. The model treats negative prompt terms as strong avoidance signals during the denoising steps. Use negative prompts sparingly: too many exclusions can degrade output quality by over-constraining the generation path.
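The minus-prefix format and the 500-character cap can be enforced with a short helper. The function name is illustrative. Only the prefix convention and the cap come from the text above.

```python
from typing import Iterable

MAX_NEGATIVE_CHARS = 500


def build_negative_prompt(terms: Iterable[str]) -> str:
    """Format terms as a comma-separated, minus-prefixed negative prompt."""
    cleaned = []
    for term in terms:
        # Normalize whitespace and any minus sign the caller already added.
        term = term.strip().lstrip("-").strip()
        if term:
            cleaned.append(f"-{term}")
    negative = ", ".join(cleaned)
    if len(negative) > MAX_NEGATIVE_CHARS:
        raise ValueError(
            f"negative prompt exceeds {MAX_NEGATIVE_CHARS} characters"
        )
    return negative
```

Raising on overflow, instead of silently truncating, avoids sending a term that was cut mid-word.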

Batch Generation Workflow

Submit up to 50 prompts in a single batch request with shared resolution and style settings for efficient high-volume production.

The batch endpoint accepts a prompts array of up to 50 entries, each with its own text description and optional style overrides. All clips in a batch share the same resolution, duration, frame rate, and format settings, though individual clips can specify custom style mix ratios. Batch jobs process asynchronously: the API returns a batch_id immediately. Poll the batch status endpoint or register a webhook URL for completion notification. Batch processing distributes clips across available GPU capacity, so 50 five-second 720p clips often complete in 15 to 30 minutes, compared with the roughly 38 to 75 minutes that serial processing would require at 45 to 90 seconds per clip.
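The batch constraints above (at most 50 prompts, shared settings, optional per-clip style mix) can be sketched as a payload builder. All field names in the returned dictionary are hypothetical, since the actual request schema is not shown here. Only the limits come from the text.

```python
from typing import Dict, List, Optional

MAX_BATCH_SIZE = 50


def chunk_prompts(prompts: List[str], size: int = MAX_BATCH_SIZE) -> List[List[str]]:
    """Split a long prompt list into batches of at most `size` entries."""
    return [prompts[i:i + size] for i in range(0, len(prompts), size)]


def build_batch_request(
    prompts: List[str],
    resolution: str = "720p",
    duration_s: int = 5,
    fps: int = 24,
    output_format: str = "mp4",
    style_mix: Optional[Dict[int, float]] = None,
) -> dict:
    """Build one batch payload; style_mix maps prompt index -> mix ratio."""
    if not prompts:
        raise ValueError("a batch needs at least one prompt")
    if len(prompts) > MAX_BATCH_SIZE:
        raise ValueError(f"a batch accepts at most {MAX_BATCH_SIZE} prompts")
    style_mix = style_mix or {}
    entries = []
    for index, text in enumerate(prompts):
        entry = {"text": text}
        if index in style_mix:
            entry["style_mix"] = style_mix[index]  # per-clip override
        entries.append(entry)
    return {
        # Settings shared by every clip in the batch.
        "resolution": resolution,
        "duration_s": duration_s,
        "fps": fps,
        "format": output_format,
        "prompts": entries,
    }
```

A list of 120 prompts would be split by `chunk_prompts` into batches of 50, 50, and 20, each submitted as its own request.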

Batch pricing is discounted 20% relative to individual generation requests at the same settings. The batch dashboard in the platform hub displays job progress with per-clip thumbnails, estimated completion time, and configurable notification preferences. On the free plan, completed batch outputs remain available on MiniMax servers for 14 days. Paid plans include permanent storage and batch download as a ZIP archive. Organize batches into projects and folders for multi-campaign video production workflows.

Video Specification Reference

Complete reference of MiniMax video generation parameters, available options, and technical limits.

Parameter              | Options                                    | Limits
-----------------------|--------------------------------------------|----------------------------------------
Resolution             | 720p (1280x720), 1080p (1920x1080)         | 16:9, 9:16, 1:1 aspect ratios
Duration               | 2 to 60 seconds per clip                   | Longer clips composited from segments
Frame rate             | 24fps, 30fps                               | FPS applies to entire clip
Output formats         | MP4 (H.264), WebM (VP9)                    | Max 500MB per clip
Style presets          | 12 built-in presets                        | Blend 2 presets with mix ratio
Prompt length          | Up to 3,000 characters                     | Negative prompts capped at 500 chars
Batch size             | Up to 50 prompts per request               | Shared resolution/duration per batch
Audio generation       | Text-to-speech, ambient sound, silent      | AAC 44.1kHz, adds ~20% generation time
Storage                | 14 days (free), permanent (paid)           | Batch ZIP downloads on paid plans
Custom style reference | Upload reference image for style transfer  | JPEG, PNG up to 10MB

