MiniMax: One Platform, Five Modalities

MiniMax is a Shanghai-based AI company that has quietly built one of the more interesting multi-modal stacks in the space. Where most AI platforms specialize in one area, MiniMax covers text, speech, video, image, and music generation under a single API and product umbrella. The pitch is genuine integration rather than a loosely bolted-together collection of third-party models. Everything runs on MiniMax's own foundation models.

The company was founded in early 2022 and has since shipped a suite of products: the Hailuo AI video generator, MiniMax Audio for speech synthesis, a general-purpose LLM platform, a music generator, and an Agent product. All of these are accessible through its open developer platform at platform.minimax.io.

What It Actually Does

The core of the developer offering is API access to MiniMax's model families. On the text side, M2.5 is an open-weights model under MIT license that can autonomously plan and execute tasks without constant human guidance.

On SWE-Bench Verified, a software-engineering benchmark, M2.5 scores 80.2%. The architecture is a mixture-of-experts (MoE) model with 230B total parameters, of which only 10B are active for any given token, which keeps inference fast and costs low.
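
To make the MoE numbers concrete, here is a minimal back-of-the-envelope sketch using the two parameter counts quoted above. The "2 FLOPs per active parameter per token" figure is a common rule of thumb for forward-pass compute and is an assumption here, not something MiniMax publishes.

```python
# Figures quoted in the article for M2.5, in billions of parameters.
TOTAL_PARAMS_B = 230
ACTIVE_PARAMS_B = 10

# Share of the weights that take part in a single token's forward pass.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Assumption (rule of thumb, not a MiniMax figure): forward-pass compute
# per token is roughly 2 FLOPs per active parameter.
FLOPS_PER_PARAM = 2
moe_flops_per_token = FLOPS_PER_PARAM * ACTIVE_PARAMS_B * 1e9
dense_flops_per_token = FLOPS_PER_PARAM * TOTAL_PARAMS_B * 1e9

# How much more compute a dense model of the same total size would need.
compute_ratio = dense_flops_per_token / moe_flops_per_token

print(f"Active share of parameters: {active_fraction:.1%}")
print(f"Dense model of equal size would need ~{compute_ratio:.0f}x the compute per token")
```

Under these assumptions, roughly 4% of the weights do the work for each token. The saving is in compute, not memory: a self-hosted deployment would still need to hold all 230B parameters.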

For video, MiniMax operates under the Hailuo AI brand. Hailuo Minimax 2 stands out for its smooth motion, quick rendering, and affordability, and its output shows strong motion coherence. The latest iteration, Hailuo 2.3, supports 1080p output. That said, the model struggles with prompts that hinge on fine detail, such as fine-grained physics or abstract looping animations. Expect the overall feel of a scene to come through, but don't expect pixel-perfect prompt adherence on complex setups.

Who It Is For

MiniMax targets developers building production applications, not just people experimenting with a consumer UI. The open platform supports programmatic access to all modalities. The main benefit is a smoother path from prototype to production: the APIs, quota controls, and scalable pricing are built for real throughput.

Multimodality also reduces tool sprawl: if you need both video generation and narration, keeping them under one platform simplifies integration and operations.

It is also a strong fit for teams building agentic workflows. M2.5 was explicitly designed for agent use cases, with strong tool-use and search performance, and the company claims to run its internal operations on its own Agent product.

Pricing Overview

Pricing is genuinely competitive, especially on the text side.

Model / Modality	Cost
M2.5 Standard (input)	$0.15 per 1M tokens
M2.5 Standard (output)	$1.20 per 1M tokens
M2.5 Lightning (100 TPS)	~$1 per continuous hour
Coding Plan Starter	$10/month
Coding Plan Pro	$20/month
Coding Plan Max	$50/month
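
The Lightning row is quoted per hour rather than per token, so a quick conversion helps when comparing it with the Standard rows. The sketch below assumes the model generates at a steady 100 tokens per second for the full hour and attributes the whole dollar to generated tokens; both are simplifying assumptions, not MiniMax's billing rules.

```python
# Figures from the pricing table above.
TOKENS_PER_SECOND = 100      # Lightning throughput (TPS)
COST_PER_HOUR_USD = 1.00     # "~$1 per continuous hour"

SECONDS_PER_HOUR = 3600

# Tokens generated in one hour of continuous output (assumed steady rate).
tokens_per_hour = TOKENS_PER_SECOND * SECONDS_PER_HOUR

# Implied price per one million generated tokens, if the hourly cost
# is attributed entirely to output (an assumption).
implied_cost_per_million = COST_PER_HOUR_USD / tokens_per_hour * 1_000_000

print(f"Tokens per hour: {tokens_per_hour:,}")
print(f"Implied cost per 1M generated tokens: ${implied_cost_per_million:.2f}")
```

Under these assumptions the hourly figure works out to about $2.78 per million generated tokens.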

For comparison, Claude Opus 4 is quoted at $5 per million input tokens and $25 per million output tokens, against M2.5's $0.15 and $1.20.

MiniMax claims that M2.5's output cost is one-tenth to one-twentieth that of Opus, Gemini 3 Pro, and GPT-5. Those are significant numbers if the model holds up in your specific workload, and for coding tasks it mostly does.
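
To see what those ratios mean for a bill, here is a minimal cost sketch using only the per-token prices quoted above. The workload size (50M input tokens and 10M output tokens per month) is a made-up example, and the Opus prices are the ones cited in this article rather than values checked against a current price list.

```python
# Per-million-token prices in USD as quoted in this article: (input, output).
PRICES = {
    "M2.5 Standard": (0.15, 1.20),
    "Claude Opus 4 (as quoted)": (5.00, 25.00),
}


def workload_cost(input_tokens: int, output_tokens: int,
                  price_in: float, price_out: float) -> float:
    """Return the USD cost of a workload given per-1M-token prices."""
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out


# Hypothetical monthly workload, chosen purely for illustration.
INPUT_TOKENS = 50_000_000
OUTPUT_TOKENS = 10_000_000

costs = {
    name: workload_cost(INPUT_TOKENS, OUTPUT_TOKENS, p_in, p_out)
    for name, (p_in, p_out) in PRICES.items()
}

m25_in, m25_out = PRICES["M2.5 Standard"]
opus_in, opus_out = PRICES["Claude Opus 4 (as quoted)"]
input_ratio = opus_in / m25_in      # how many times cheaper M2.5 input is
output_ratio = opus_out / m25_out   # how many times cheaper M2.5 output is

for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f}")
print(f"Input price ratio: {input_ratio:.1f}x, output price ratio: {output_ratio:.1f}x")
```

With these numbers the example workload comes to $19.50 on M2.5 versus $500.00 at the quoted Opus prices, and the output price ratio of about 21x lines up with the "one-twentieth" end of MiniMax's claim. Swap in your own token counts before drawing conclusions, since the blended saving depends on your input-to-output mix.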

Speech and video are priced separately on a usage basis, with the Hailuo 2.3 Fast variant offering a roughly 50% cost reduction for batch video creation compared to the standard model.

Strengths and Limitations

The biggest strength is the price-to-performance ratio on M2.5, particularly for coding and agent use cases. M2.5 scores 80.2% on SWE-Bench Verified with only 10B active parameters out of a 230B MoE architecture, at roughly 1/20th the cost of Claude Opus with comparable coding performance. The open-weights MIT license is also a real differentiator: you can self-host if you need to keep data on-premises.

On the video side, motion quality and physics coherence are genuine strengths. Hailuo's handling of physics and continuity makes scenes feel more grounded and coherent than those from many competitors.

The limitations are real, though. The video model may struggle with or refuse prompts involving the faces of real celebrities or public figures because of ethical guardrails. Video generation is slow and not real-time: each task can take a couple of minutes or more, especially at high quality. The platform is also less mature than OpenAI's or Anthropic's in terms of tooling, observability, and ecosystem integrations.
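
Because a video task can take minutes, client code typically has to submit a job and then poll for the result instead of blocking on a single request. The sketch below shows one generic way to do that with a timeout and a growing delay between checks. It is not MiniMax's API: `fetch_status`, the `"state"` key, and the `"success"` / `"failed"` values are hypothetical placeholders you would replace with whatever the platform's documentation specifies.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[], dict],
    timeout_s: float = 600.0,
    initial_delay_s: float = 5.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a long-running task until it finishes, fails, or times out.

    fetch_status is a hypothetical callable that returns a dict with a
    "state" key; the state names used here are assumptions.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        state = status.get("state")
        if state == "success":
            return status
        if state == "failed":
            raise RuntimeError(f"task failed: {status}")
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"task not finished after {timeout_s} seconds")
        # Never sleep past the deadline; back off gradually between polls.
        sleep(min(delay, remaining))
        delay = min(delay * 1.5, max_delay_s)
```

In a real integration, `fetch_status` would wrap the HTTP call that queries the task, and the default delays would be tuned to the generation times you actually observe. Passing `sleep` as a parameter keeps the helper easy to test without real waiting.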

If you are already paying frontier prices for coding agents or multi-modal pipelines, MiniMax is worth a serious benchmarking session against your actual workloads. The pricing alone makes it hard to ignore.
