How does MiniMax text-to-video generation work?

MiniMax text-to-video generation accepts a text prompt describing the desired video content, scene composition, visual style, motion, and duration. The system processes the prompt through a diffusion-based video generation pipeline that produces video frames at the requested resolution and frame rate. You can specify camera movements (pan, zoom, dolly), lighting conditions, and color palettes in natural language. The pipeline generates 24 or 30 frames per second, depending on the configured frame rate, with output durations from 2 to 60 seconds per clip. Longer videos are composited from multiple generation passes with scene transition smoothing.
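As a quick illustration of the numbers above, the frame count of a clip is its duration multiplied by its frame rate. The sketch below is only arithmetic; the function name frame_count is hypothetical and is not part of any MiniMax SDK.

```python
def frame_count(duration_s: float, fps: int) -> int:
    """Number of frames in a clip of the given length and frame rate."""
    # Limits taken from the text: 24 or 30 fps, 2 to 60 seconds per clip.
    if fps not in (24, 30):
        raise ValueError("fps must be 24 or 30")
    if not 2 <= duration_s <= 60:
        raise ValueError("duration must be between 2 and 60 seconds")
    return round(duration_s * fps)


print(frame_count(5, 24))   # 120
print(frame_count(60, 30))  # 1800
```

A 60-second clip at 30fps is the largest single-clip case, at 1,800 frames.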

What resolutions and formats does MiniMax video support?

MiniMax generates video at 720p (1280x720) and 1080p (1920x1080) resolutions in landscape, portrait, and square aspect ratios. Output formats include MP4 with H.264 encoding and WebM with VP9 encoding. Frame rates are configurable at 24fps or 30fps. 1080p generation requires approximately 40% more processing time per second of video than 720p. When audio generation is enabled, output includes a synchronized audio track at a 44.1kHz sample rate. Maximum file size per clip is 500MB. Generated videos are hosted on MiniMax servers for 14 days by default; permanent storage is available on paid plans.

What style presets are available for video generation?

MiniMax provides 12 built-in style presets: Cinematic, Anime, 3D Render, Watercolor, Pixel Art, Oil Painting, Flat Vector, Claymation, Pencil Sketch, Photorealistic, Neon Cyberpunk, and Vintage Film. Each preset adjusts the diffusion model's guidance parameters, color distribution, and motion characteristics. You can blend two presets with a mix ratio parameter (0.0 to 1.0) for hybrid styles. Custom style transfer is available by uploading a reference image that the generation pipeline uses for visual guidance. Style presets are compatible with all resolution and duration options.

How do I write effective prompts for MiniMax video generation?

Effective MiniMax video prompts include: a clear subject with descriptive attributes, the environment or background setting, specific actions or motion descriptions, lighting and camera direction, and the desired mood or atmosphere. Example structure: 'A [subject] with [attributes] performing [action] in [environment], [lighting], [camera movement], [mood].' Prompts are limited to 3,000 characters. The platform hub includes a prompt engineering guide with 40 worked examples across different use cases. Negative prompts can exclude unwanted elements by prefixing terms with a minus sign. The generation endpoint returns a list of suggested prompt improvements when the input is ambiguous.

Can MiniMax generate videos in batch?

Yes, MiniMax supports batch video generation through a dedicated batch endpoint that accepts up to 50 prompts in a single request. Each prompt in the batch is processed independently, and all clips share the batch's resolution and duration settings, with the batch style applied unless a prompt overrides it. Batch jobs run asynchronously: you receive a job ID immediately and can either poll for completion or configure a webhook URL for notification. Batch processing benefits from parallelization across GPU clusters: 50 short clips at 720p typically complete within 15 to 30 minutes. Batch pricing is discounted 20% relative to individual generation requests. The batch dashboard in the platform hub shows progress, estimated completion time, and per-clip download links.

MiniMax Video Generation Tools

Text-to-Video Pipeline

MiniMax converts natural language prompts into video clips through a diffusion-based generation pipeline with configurable resolution, duration, frame rate, and style.

The video generation pipeline accepts a text prompt of up to 3,000 characters describing scene content, camera movement, lighting, and visual style. MiniMax processes the prompt through a diffusion model trained on paired text-video data, producing frames sequentially at the requested resolution. Each frame passes through a post-processing stage that applies color correction, temporal smoothing, and optional audio synchronization. Output videos range from 2 to 60 seconds in duration. Longer outputs composite multiple generation segments with automatic scene transition detection and blending.

Generation time scales with resolution and duration. A 5-second 720p clip at 24fps typically completes in 45 to 90 seconds on the US West endpoint under normal load. A 10-second 1080p clip at 30fps requires 3 to 5 minutes. The asynchronous generation endpoint returns a job ID immediately so your application can poll for completion while handling other tasks. Webhook notifications are available for completed, failed, and partially completed jobs.
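The polling pattern described above can be sketched as follows. This is a minimal sketch, not the MiniMax client: the status strings, the wait_for_job helper, and the fetch_status callable are all assumptions made for illustration. The caller supplies fetch_status, which would wrap whatever status request the real API exposes.

```python
import time
from typing import Callable

# Hypothetical status names modelled on the job outcomes listed in the text.
TERMINAL_STATES = {"completed", "failed", "partially_completed"}


def wait_for_job(job_id: str,
                 fetch_status: Callable[[str], str],
                 interval_s: float = 5.0,
                 timeout_s: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job reaches a terminal state or the timeout elapses."""
    waited = 0.0
    while True:
        status = fetch_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Usage with a stand-in status source instead of a real HTTP call.
fake_states = iter(["pending", "running", "completed"])
result = wait_for_job("job-123", lambda _job_id: next(fake_states),
                      sleep=lambda _seconds: None)
print(result)  # completed
```

Injecting fetch_status and sleep keeps the loop testable without network access. Given the generation times quoted above, a polling interval of several seconds is likely sufficient.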

Video Pipeline Summary:

MiniMax video generation covers text-to-video at 720p/1080p, 12 style presets, batch processing of up to 50 prompts, prompt-based scene editing, and frame-level control over camera motion and lighting. Output is delivered as MP4 or WebM with optional audio tracks.

Resolution and Format Options

Generate at 720p or 1080p in landscape, portrait, or square aspect ratios with MP4 or WebM output.

MiniMax supports two resolutions: 720p (1280x720) and 1080p (1920x1080). Both resolutions are available in 16:9 landscape, 9:16 portrait, and 1:1 square aspect ratios. The 1080p tier uses a higher-fidelity diffusion pass with approximately 40% longer generation time per second of video. Frame rate options are 24fps for cinematic motion and 30fps for standard video playback. Output encoding uses H.264 in MP4 containers and VP9 in WebM containers. Maximum output file size per single clip is 500MB. For clips approaching this limit, MiniMax automatically segments the output into downloadable parts.

Generated videos include optional audio tracks synthesized from text-to-speech or ambient sound generation. Audio is embedded as AAC at 44.1kHz. If your prompt describes specific sounds ("waves crashing," "keyboard typing"), the audio synthesis pipeline attempts to match those descriptions. Audio generation adds approximately 20% to total generation time. Disable audio for silent output or provide your own audio track through the editing endpoint.
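To show how the two overheads above might combine, the sketch below computes a relative time factor against a silent 720p baseline. Whether the 40% (1080p) and 20% (audio) overheads actually compound multiplicatively is an assumption; the source only gives each figure separately. The function name is hypothetical.

```python
def relative_generation_time(resolution: str, audio: bool) -> float:
    """Time factor relative to a silent 720p clip of the same length."""
    if resolution == "720p":
        factor = 1.0
    elif resolution == "1080p":
        factor = 1.4  # approximately 40% longer per second of video
    else:
        raise ValueError("resolution must be '720p' or '1080p'")
    if audio:
        factor *= 1.2  # audio adds approximately 20% (assumed to compound)
    return factor


print(f"{relative_generation_time('1080p', audio=True):.2f}")  # 1.68
```

Under that assumption, a 1080p clip with audio takes roughly 1.7 times as long as the same clip at 720p without audio.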

Style Presets and Custom Styling

Choose from 12 built-in style presets or blend two presets for unique visual identities across video projects.

MiniMax ships with 12 style presets: Cinematic, Anime, 3D Render, Watercolor, Pixel Art, Oil Painting, Flat Vector, Claymation, Pencil Sketch, Photorealistic, Neon Cyberpunk, and Vintage Film. Each preset configures the diffusion model's guidance scale, color temperature, motion interpolation method, and noise scheduling. The presets work across all resolutions and durations without additional tuning. A mix parameter (0.0 to 1.0) blends two presets: 0.3 mixes 30% of preset A with 70% of preset B. Custom style transfer accepts a reference image that the pipeline uses as visual guidance through a CLIP-based style embedding extraction step.
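The mix ratio can be read as a pair of weights. The sketch below shows that arithmetic and applies it to a single numeric parameter by linear interpolation. How MiniMax actually combines preset parameters internally is not stated, so the interpolation, the function names, and the example guidance-scale values are all illustrative assumptions.

```python
from typing import Tuple


def blend_weights(mix: float) -> Tuple[float, float]:
    """Return (weight_a, weight_b) for a mix ratio between 0.0 and 1.0."""
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be between 0.0 and 1.0")
    return mix, 1.0 - mix


def blend_value(value_a: float, value_b: float, mix: float) -> float:
    """Linearly interpolate one numeric parameter between two presets."""
    weight_a, weight_b = blend_weights(mix)
    return weight_a * value_a + weight_b * value_b


# Hypothetical guidance scales: preset A = 8.0, preset B = 6.0.
print(f"{blend_value(8.0, 6.0, 0.3):.2f}")  # 6.60
```

With a mix of 0.3, the result sits closer to preset B, matching the 30%/70% split described above.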

Prompt-based style control works alongside presets. Describe "hand-drawn animation with visible pencil lines" in the prompt even when the Anime preset is selected, and the model weights the prompt's stylistic instructions against the preset's default parameters. The prompt engineering guide on the developer resources page includes examples of prompt-preset combinations for common visual styles.

Prompt Engineering and Batch Production

Write structured prompts for consistent video output and run batch jobs of up to 50 clips with asynchronous processing.

MiniMax video prompts follow a structured format for best results. Include a subject (what appears in frame), attributes (color, size, texture, material), action (what is happening or moving), environment (background, setting, time of day), camera (angle, movement, shot type), lighting (direction, intensity, color temperature), and mood (atmosphere, emotional tone). Separate these elements with commas. The platform evaluates prompt quality automatically and returns suggestions if key elements are missing. A prompt quality score from 0 to 100 appears in every API response; scores above 70 typically produce reliable output.
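One way to keep prompts consistent is to assemble them from named fields in the order listed above. The sketch below is a local helper only: the ScenePrompt class and its method names are hypothetical, and its missing_elements check is a simple stand-in, not the platform's prompt quality score.

```python
from dataclasses import dataclass, fields
from typing import List

MAX_PROMPT_CHARS = 3000  # prompt limit stated in the text


@dataclass
class ScenePrompt:
    subject: str
    attributes: str = ""
    action: str = ""
    environment: str = ""
    camera: str = ""
    lighting: str = ""
    mood: str = ""

    def missing_elements(self) -> List[str]:
        """Names of the structured elements that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Join the non-empty elements with commas, in field order."""
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        text = ", ".join(p for p in parts if p)
        if len(text) > MAX_PROMPT_CHARS:
            raise ValueError(f"prompt is {len(text)} characters; limit is {MAX_PROMPT_CHARS}")
        return text


prompt = ScenePrompt(
    subject="A red fox",
    attributes="thick winter coat",
    action="trotting across a frozen lake",
    environment="snowy forest at dawn",
    camera="slow dolly-in",
    lighting="soft golden backlight",
    mood="calm",
)
print(prompt.render())
```

Checking missing_elements() before submission gives a rough local signal of which elements a prompt lacks.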

Negative prompts use a minus prefix to exclude elements: "-blurry, -distorted faces, -watermarks, -text overlay." Negative prompts are capped at 500 characters. The model treats negative prompt terms as strong avoidance signals during the denoising steps. Use negative prompts sparingly, because too many exclusions can degrade output quality by over-constraining the generation path.
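The minus-prefix convention and the 500-character cap can be enforced before a request is sent. The helper below is a hypothetical local utility, not a MiniMax function; it assumes terms are joined with a comma and a space, as in the example above.

```python
from typing import Iterable

MAX_NEGATIVE_CHARS = 500  # cap stated in the text


def build_negative_prompt(terms: Iterable[str]) -> str:
    """Prefix each term with a minus sign and join the terms with commas."""
    # Strip whitespace and any minus sign the caller already added.
    cleaned = [t.strip().lstrip("-").strip() for t in terms]
    text = ", ".join(f"-{t}" for t in cleaned if t)
    if len(text) > MAX_NEGATIVE_CHARS:
        raise ValueError(f"negative prompt is {len(text)} characters; limit is {MAX_NEGATIVE_CHARS}")
    return text


print(build_negative_prompt(["blurry", "-watermarks", " text overlay "]))
# -blurry, -watermarks, -text overlay
```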

Batch Generation Workflow

Submit up to 50 prompts in a single batch request with shared resolution, duration, and format settings for efficient high-volume production.

The batch endpoint accepts a prompts array of up to 50 entries, each with its own text description and optional style overrides. All clips in a batch share the same resolution, duration, frame rate, and format settings, though individual clips can specify custom style mix ratios. Batch jobs process asynchronously: the API returns a batch_id immediately. Poll the batch status endpoint or register a webhook URL for completion notification. Batch processing distributes clips across available GPU capacity, so 50 short 720p clips often complete in 15 to 30 minutes rather than the 75+ minutes serial processing would require.
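A batch request body along these lines could be assembled and checked locally before submission. Every field name below (prompts, text, style, resolution, duration_s, fps, format, webhook_url) is an assumption for illustration; consult the MiniMax API reference for the actual schema.

```python
import json
from typing import Any, Dict, List, Optional

MAX_BATCH_PROMPTS = 50  # batch limit stated in the text


def build_batch_request(prompts: List[Dict[str, Any]],
                        resolution: str,
                        duration_s: int,
                        fps: int,
                        output_format: str,
                        webhook_url: Optional[str] = None) -> Dict[str, Any]:
    """Assemble a hypothetical batch payload with shared clip settings."""
    if not 1 <= len(prompts) <= MAX_BATCH_PROMPTS:
        raise ValueError(f"a batch needs 1 to {MAX_BATCH_PROMPTS} prompts, got {len(prompts)}")
    for index, entry in enumerate(prompts):
        if not entry.get("text"):
            raise ValueError(f"prompt {index} has no text")
    payload: Dict[str, Any] = {
        "prompts": prompts,        # each entry: text plus optional style override
        "resolution": resolution,  # shared by every clip in the batch
        "duration_s": duration_s,
        "fps": fps,
        "format": output_format,
    }
    if webhook_url is not None:
        payload["webhook_url"] = webhook_url
    return payload


request_body = build_batch_request(
    prompts=[
        {"text": "A lighthouse at dusk, waves crashing, slow pan"},
        {"text": "A city street in rain", "style": {"preset": "Cinematic"}},
    ],
    resolution="720p", duration_s=5, fps=24, output_format="mp4",
)
print(json.dumps(request_body, indent=2))
```

Validating the prompt count client-side avoids a round trip for a request the endpoint would reject.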

Batch pricing is discounted 20% relative to individual generation requests at the same settings. The platform hub batch dashboard displays job progress with per-clip thumbnails, estimated completion time, and configurable notification preferences. Completed batch outputs are available for 14 days on MiniMax servers. Paid plans include permanent storage and batch download as a ZIP archive. Organize batches into projects and folders for multi-campaign video production workflows.
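The 20% batch discount works out as follows. The per-clip price used here is a placeholder, since the source does not state actual prices, and the batch_cost function is hypothetical.

```python
from decimal import Decimal

BATCH_DISCOUNT = Decimal("0.20")  # 20% discount stated in the text


def batch_cost(unit_price: Decimal, clips: int) -> Decimal:
    """Total cost of a batch, given the individual per-clip price."""
    if clips < 1:
        raise ValueError("clips must be at least 1")
    total = unit_price * clips * (1 - BATCH_DISCOUNT)
    return total.quantize(Decimal("0.01"))


# Placeholder price of 1.00 per clip, not a real MiniMax rate.
print(batch_cost(Decimal("1.00"), 50))  # 40.00
```

At any unit price, a full 50-clip batch costs the same as 40 individually generated clips.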

Video Specification Reference

Complete reference of MiniMax video generation parameters, available options, and technical limits.

Parameter	Options	Limits and notes
Resolution	720p (1280x720), 1080p (1920x1080)	16:9, 9:16, 1:1 aspect ratios
Duration	2 to 60 seconds per clip	Longer clips composited from segments
Frame rate	24fps, 30fps	FPS applies to entire clip
Output formats	MP4 (H.264), WebM (VP9)	Max 500MB per clip
Style presets	12 built-in presets	Blend 2 presets with mix ratio
Prompt length	Up to 3,000 characters	Negative prompts capped at 500 chars
Batch size	Up to 50 prompts per request	Shared resolution/duration per batch
Audio generation	Text-to-speech, ambient sound, silent	AAC 44.1kHz, adds ~20% generation time
Storage	14 days (free), permanent (paid)	Batch ZIP downloads on paid plans
Custom style reference	Upload reference image for style transfer	JPEG, PNG up to 10MB
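The limits in the table can be checked client-side before a request is sent. The sketch below encodes the documented values; the setting keys (resolution, aspect_ratio, fps, format, duration_s, prompt, negative_prompt) are hypothetical names chosen for the example, not the API's field names.

```python
from typing import Any, Dict, List

# Allowed values copied from the specification table.
ALLOWED = {
    "resolution": {"720p", "1080p"},
    "aspect_ratio": {"16:9", "9:16", "1:1"},
    "fps": {24, 30},
    "format": {"mp4", "webm"},
}


def validate_settings(settings: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the settings pass."""
    problems = []
    for key, allowed in ALLOWED.items():
        value = settings.get(key)
        if value not in allowed:
            problems.append(f"{key}: {value!r} is not one of {sorted(allowed, key=str)}")
    duration = settings.get("duration_s")
    if not isinstance(duration, (int, float)) or not 2 <= duration <= 60:
        problems.append("duration_s: must be between 2 and 60 seconds")
    if len(settings.get("prompt", "")) > 3000:
        problems.append("prompt: longer than 3,000 characters")
    if len(settings.get("negative_prompt", "")) > 500:
        problems.append("negative_prompt: longer than 500 characters")
    return problems


good = {"resolution": "1080p", "aspect_ratio": "9:16", "fps": 30,
        "format": "webm", "duration_s": 10, "prompt": "A quiet harbor at night"}
print(validate_settings(good))  # []
```

Returning every problem at once, instead of raising on the first, lets a caller report all invalid settings in one pass.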
