MiniMax Revenue Doubles as M3 Promises 15x Long-Context Speed Boost
MiniMax's annualized revenue more than doubled in two months, and the company has passed 1M enterprise users. It also previewed M3's sparse attention architecture, claiming 15.6x faster decoding at 1M-token contexts.

MiniMax's annualized revenue more than doubled over the past two months, Bloomberg reported today, citing an interview with co-founder and President Yun Yeyi. The company has crossed one million enterprise service users — up fivefold in six months — and is preparing to roll out its next flagship model, M3, to developers and enterprise clients.
Revenue and Adoption
The growth is driven by the M2.7 model released in mid-March 2026, which Yun said pushed annual recurring revenue past MiniMax's own internal projections. Enterprise users numbered around 200,000 six months ago. They are now at one million. The user base is concentrated in enterprise AI services — companies integrating MiniMax APIs into production workflows — not casual consumer usage.
MiniMax's current developer model is M2.5, an open-weights, MIT-licensed mixture-of-experts architecture with 229.9 billion total parameters and 9.8 billion active per token. It scores 80.2% on SWE-Bench Verified, which is competitive with frontier models on coding tasks, at $0.15 per million input tokens. That price is roughly one-twentieth that of Claude Opus-class models, and low cost has been the company's primary competitive angle in the developer market.
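To make the pricing comparison concrete, here is a minimal sketch of the arithmetic in Python. The $0.15 rate and the one-twentieth ratio come from the figures above; the monthly token volume is a made-up workload, and the comparison price is only what that ratio implies, not a published rate.

```python
# Illustrative arithmetic only. The workload size is hypothetical.
MINIMAX_INPUT_USD_PER_MILLION = 0.15  # input price quoted in the article
PRICE_RATIO = 20                      # "roughly one-twentieth", per the article


def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of processing `tokens` input tokens at a per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


# Price implied by the article's ratio for the comparison model.
# This is derived from the ratio, not taken from any vendor price list.
implied_comparison_price = MINIMAX_INPUT_USD_PER_MILLION * PRICE_RATIO

monthly_tokens = 2_000_000_000  # hypothetical: 2B input tokens per month
minimax_cost = input_cost_usd(monthly_tokens, MINIMAX_INPUT_USD_PER_MILLION)
comparison_cost = input_cost_usd(monthly_tokens, implied_comparison_price)

print(f"MiniMax input cost:    ${minimax_cost:,.2f}")
print(f"Implied comparison:    ${comparison_cost:,.2f}")
```

Output tokens are priced separately and are not covered here, so this bounds only the input side of a bill.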
M3: Sparse Attention and 15.6x Faster Decoding
Alongside the revenue figures, MiniMax published a detailed technical report on the M2 model series and, in doing so, previewed the architectural shift coming in M3. The key claim is a custom sub-quadratic sparse attention mechanism that yields 9.7x faster prefill and 15.6x faster decoding than M2 at one-million-token contexts.
The significance for developers: ultra-long-context inference is currently expensive enough that many teams avoid it for production agent workloads, relying on chunking workarounds instead. If M3's sparse attention speedups hold at production scale, they would change the cost model for large codebase indexing, multi-session agent runs, and RAG pipelines that today require splitting documents.
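For readers unfamiliar with the chunking workaround mentioned above, a minimal sketch in Python follows. It splits a token sequence into overlapping windows so that each piece fits a smaller context. The function name, window size, and overlap are illustrative assumptions, and whitespace splitting stands in for a real tokenizer.

```python
def chunk_tokens(tokens: list[str], window: int, overlap: int) -> list[list[str]]:
    """Split a token sequence into overlapping windows of at most `window` tokens."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


# Hypothetical input: whitespace "tokens" stand in for real model tokens.
document = "a b c d e f g h i j".split()
chunks = chunk_tokens(document, window=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap exists so that content straddling a boundary appears whole in at least one chunk. Each chunk is still processed in isolation, which is the limitation a cheaper long-context model would remove.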
MiniMax explicitly skipped sparse attention for M2 because it was not confident the technique was production-ready. Its candid reversal now, backed by a published technical report rather than marketing claims, adds credibility. The company has demonstrated this pattern before: M2.5 shipped with benchmark results first and the open weights shortly after.
M3 is targeted for H2 2026, with no confirmed release date. No accuracy benchmarks have been published alongside the speedup numbers yet: the 15.6x figure is a decode speed claim, not a quality metric.
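Because the prefill and decode figures differ, the end-to-end gain for a given request depends on how its time splits between the two phases. The Python sketch below works through that arithmetic. The baseline timings are hypothetical, and it assumes each claimed factor applies uniformly to its phase at one-million-token context, which the preview does not guarantee.

```python
PREFILL_SPEEDUP = 9.7   # claimed factor vs. M2 at 1M-token context
DECODE_SPEEDUP = 15.6   # claimed factor vs. M2 at 1M-token context


def end_to_end_speedup(prefill_s: float, decode_s: float) -> float:
    """Overall speedup when each phase is accelerated by its own factor."""
    baseline = prefill_s + decode_s
    accelerated = prefill_s / PREFILL_SPEEDUP + decode_s / DECODE_SPEEDUP
    return baseline / accelerated


# Hypothetical baseline timings in seconds, not measurements.
prefill_heavy = end_to_end_speedup(prefill_s=90.0, decode_s=10.0)
decode_heavy = end_to_end_speedup(prefill_s=10.0, decode_s=90.0)

print(f"prefill-heavy request: {prefill_heavy:.1f}x overall")
print(f"decode-heavy request:  {decode_heavy:.1f}x overall")
```

The overall figure always lands between the two phase factors. A request that mostly reads a long prompt sits nearer 9.7x, while one that mostly generates output sits nearer 15.6x.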
What to Watch
For developers evaluating whether to build long-context pipelines on MiniMax's stack, M3 is the signal to wait on before committing. The M2.7 model is the current best option for cost-sensitive coding and agent tasks — it improves on M2.5's reasoning and tool-use performance while maintaining the MIT license and open-weights availability.
The revenue doubling and enterprise user growth confirm MiniMax has moved beyond early-adopter traction. It is now a production vendor for a meaningful slice of the enterprise AI market, which matters for long-term API reliability. The M3 preview, combined with today's Bloomberg revenue figures, reads as a coordinated signal that MiniMax is positioning for a significant push in H2 2026.





