Voice tool · Paid · Active · Editorial score 6.8/10 (Useful)

Best plan: $5-$999/mo subscriptions / $60-$100 per 1M chars PAYG

Watch out: Model names and billing surfaces changed: Speech 2.8 is latest, Speech 2.6/Speech-02 remain supported, and subscription, token-plan, and pay-as-you-go routes can expose different limits

Editorial · no paid placements

The call

MiniMax Speech is the budget ElevenLabs alternative for multilingual TTS and voice cloning. Pick Speech 2.8 Turbo or HD when hosted API economics, voice slots, RPM, and multilingual output matter, especially for apps, IVR, dubbing, agents, or bulk narration. Choose ElevenLabs when the highest creator polish, marketplace breadth, and integrations matter more.

  • Buy if: you run cost-sensitive production TTS workloads
  • Pick: $5-$999/mo subscriptions / $60-$100 per 1M chars PAYG
  • Skip if: you need the highest quality ceiling for audiobooks or luxury production

Evidence rail

Why this recommendation is trusted

Source: Registered source
Freshness: Current
Confidence: High confidence
Verified: Review
Volatility: Volatile

High-volatility evidence needs frequent review.

Editorial score

Unweighted average of 4 axes · confidence high

  • Utility 8/10

    How much real work it can do for a competent operator, end to end.

  • Value 9/10

    What you get for the dollar relative to the closest alternative.

  • Moat 4/10

    How hard it would be for a competitor to replicate the underlying advantage.

  • Longevity 6/10

    How likely the product is to still be best-in-class 24 months out.
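
The headline 6.8/10 can be reproduced from the four axis scores. A minimal sketch, assuming the unweighted mean is rounded half-up to one decimal (the rounding rule is an assumption, not something the rubric states here):

```python
from decimal import Decimal, ROUND_HALF_UP

# Axis scores as listed in the editorial score section.
AXES = {"Utility": 8, "Value": 9, "Moat": 4, "Longevity": 6}


def editorial_score(axes: dict[str, int]) -> Decimal:
    """Unweighted mean of the axis scores, rounded half-up to one decimal."""
    mean = Decimal(sum(axes.values())) / Decimal(len(axes))
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


print(editorial_score(AXES))  # 6.8
```

The raw mean is 6.75, so the published figure depends on rounding up at the tie.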

Key facts

  1. Best for: Multilingual TTS, long-form speech generation, streaming, and voice cloning API with Speech 2.8 HD/Turbo as the current model family, 300+ system voices, custom cloned voices, and multiple pricing modes.
    Confidence high · Drifts · 2026-06-12 · MiniMax T2A docs
  2. Pricing anchor: Audio Subscription starts at $5/month for 100,000 credits; pay-as-you-go lists T2A Turbo at $60/M characters and T2A HD at $100/M characters.
    Confidence high · Volatile · 2026-06-12 · MiniMax Speech pricing docs
  3. Watch out for: Model names and billing surfaces changed: Speech 2.8 is latest, Speech 2.6/Speech-02 remain supported, and subscription, token-plan, and pay-as-you-go routes can expose different limits.
    Confidence high · Volatile · 2026-06-12 · MiniMax T2A docs

MiniMax Speech is the text-to-speech and voice-cloning product line from MiniMax, the Shanghai AI lab. The current API docs put speech-2.8-hd and speech-2.8-turbo at the front of the model list, while Speech 2.6 and Speech-02 remain supported as legacy routes for compatibility.

The current docs list 300+ system voices plus custom cloned voices, streaming output, MP3, WAV, FLAC, and PCM audio depending on the endpoint, synchronous requests up to 10,000 characters, and async long-form generation up to 1 million characters per task.
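
Those two limits decide how a job gets submitted. A minimal sketch of client-side routing, assuming the limits are counted in Python string characters; `choose_route` and `split_for_sync` are hypothetical helper names, not part of any MiniMax SDK:

```python
import re

SYNC_LIMIT = 10_000      # characters per synchronous request, per the docs summary above
ASYNC_LIMIT = 1_000_000  # characters per async long-form task, per the docs summary above


def choose_route(text: str) -> str:
    """Pick a submission route based on text length alone."""
    n = len(text)
    if n <= SYNC_LIMIT:
        return "sync"
    if n <= ASYNC_LIMIT:
        return "async"
    return "split"  # too long for one async task; must be divided first


def split_for_sync(text: str, limit: int = SYNC_LIMIT) -> list[str]:
    """Greedily pack sentences into chunks of at most `limit` characters."""
    chunks: list[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # Hard-split any single sentence that exceeds the limit on its own.
        pieces = [sentence[i:i + limit] for i in range(0, len(sentence), limit)] or [""]
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= limit:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries keeps prosody intact at chunk edges; whether the service counts characters the same way as `len()` should be checked against the T2A docs before relying on the exact boundary.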

System Verdict

Pick MiniMax Speech if the brief is multilingual TTS at production volume where API economics drive the budget. As of June 12, 2026 the pay-as-you-go page lists T2A Turbo at $60 per million characters and T2A HD at $100 per million characters, while Audio Subscription plans start at $5/month for 100,000 credits and scale to $999/month for 20,000,000 credits.

Skip it for peak-quality audiobook and luxury production work. ElevenLabs still holds the quality ceiling, the larger curated voice marketplace, and the deeper third-party integration stack. Cartesia leads on low-latency streaming.

The naming drift matters. Current docs list Speech 2.8 as latest, while Speech 2.6 and Speech-02 remain visible in API references, pay-as-you-go pricing, token plans, and third-party mirrors. Integration requires checking the exact endpoint and plan, not just the model family name.
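
One way to guard against that drift is to pin exact model IDs in code rather than passing a family name around. A minimal sketch using only the IDs listed on this page; `classify_model` is a hypothetical helper, and the HD/Turbo tier is inferred from the ID suffix:

```python
# Model IDs as listed on this page (checked 2026-06-12); update when the docs change.
CURRENT_MODELS = {"speech-2.8-hd", "speech-2.8-turbo"}
OLDER_MODELS = {
    "speech-2.6-hd", "speech-2.6-turbo",
    "speech-02-hd", "speech-02-turbo",
    "speech-01-hd", "speech-01-turbo",
}


def classify_model(model_id: str) -> tuple[str, str]:
    """Return (generation, tier) for a known model ID, or raise ValueError."""
    if model_id in CURRENT_MODELS:
        generation = "current"
    elif model_id in OLDER_MODELS:
        generation = "older"
    else:
        raise ValueError(f"unknown or ambiguous model ID: {model_id!r}")
    tier = "hd" if model_id.endswith("-hd") else "turbo"
    return generation, tier


print(classify_model("speech-2.8-turbo"))
```

A bare family name such as "speech-2.8" fails fast here instead of reaching the API with an ambiguous value.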

Key Facts

Vendor: MiniMax (Shanghai, HKEX-listed)
Current API models: speech-2.8-hd · speech-2.8-turbo
Supported older speech models: speech-2.6-hd · speech-2.6-turbo · speech-02-hd · speech-02-turbo · speech-01-hd · speech-01-turbo
Pay-as-you-go T2A price: Turbo $60/M characters · HD $100/M characters
Audio Subscription entry: Starter $5/mo · 100,000 credits/mo
System voices: 300+ plus custom cloned voices
Voice cloning: Rapid cloning from uploaded mono/stereo reference audio; clone is temporary unless used in T2A within 168 hours
Long-form async: Up to 1 million characters per async task
Streaming: Supported through HTTP/WebSocket T2A endpoints
Output formats: MP3, WAV, FLAC, PCM depending on endpoint and streaming mode
Official MCP: Python and JavaScript MCP server implementations with voice cloning support

What it actually is

A hosted TTS API with synchronous T2A, WebSocket T2A, async long-form T2A, voice cloning, voice design, and voice management. Turbo is the cost/speed lane. HD is the fidelity lane. Speech 2.8 is the latest named model family in the current API docs.

Voice cloning now matters as a workflow and governance question, not just a feature bullet. The current API intro says rapid clones are temporary unless used in speech synthesis within 168 hours, and the fee is charged the first time the cloned voice is used in T2A synthesis.
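
The 168-hour window is easy to track in a pipeline. A minimal sketch, assuming the clock starts when the clone is created (the page does not say exactly when the window begins); `clone_deadline` and `needs_first_use` are hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone

CLONE_TTL = timedelta(hours=168)  # seven days, per the current API intro


def clone_deadline(created_at: datetime) -> datetime:
    """Latest moment an unused rapid clone can first be used in T2A."""
    return created_at + CLONE_TTL


def needs_first_use(created_at: datetime, now: datetime,
                    warn_before: timedelta = timedelta(hours=24)) -> bool:
    """True once an unused clone is within `warn_before` of its deadline."""
    return now >= clone_deadline(created_at) - warn_before


created = datetime(2026, 6, 12, 9, 0, tzinfo=timezone.utc)
print(clone_deadline(created).isoformat())
```

Because the first T2A use is also the billing event, a reminder like this doubles as an approval checkpoint: someone decides whether the clone is worth keeping before the fee lands.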

Speed, pitch, volume, bitrate, sample rate, language boost, subtitle output, voice effects, and streaming settings are exposed through the API. Sync endpoints handle up to 10,000 characters per request; async long-form generation handles up to 1 million characters.

When to pick MiniMax Speech

  • Scaling multilingual IVR, chatbots, or conversational AI. Turbo at $60 per million characters supports high-volume voice agents economically when the team can integrate directly.
  • Multilingual content pipelines. One vendor for 40 languages avoids per-market vendor sprawl.
  • Voice cloning from reference clips. The current voice-cloning endpoint can rapidly reproduce a target timbre from uploaded mono or stereo audio.
  • Cost-sensitive prototyping. Subscription, token-plan, and pay-as-you-go routes let teams choose predictable monthly credits or usage billing.
  • Agent/MCP voice workflows. MiniMax provides official MCP server implementations for Python and JavaScript with speech/voice-cloning support.

When to pick something else

  • Peak-quality audiobook and luxury narration: ElevenLabs. MiniMax may be cheaper, but ElevenLabs still has the creator polish, marketplace, and workflow maturity advantage.
  • Curated community voice library: ElevenLabs and Cartesia have thousands of community-contributed voices. MiniMax’s 300+ is a narrower catalog.
  • Lowest-latency streaming for voice agents: Cartesia is tuned for this. MiniMax streams well, but Cartesia leads.
  • Offline or self-hosted requirement: Kokoro at Apache 2.0 runs locally. MiniMax Speech is hosted only.
  • Western vendor compliance posture: ElevenLabs, Cartesia, or Azure Speech. MiniMax is a China-based vendor.

Pricing

| Model / Plan | Price | Notes |
|---|---|---|
| Pay-as-you-go T2A Turbo | $60/M characters | Applies to speech-2.8-turbo, speech-2.6-turbo, and speech-02-turbo |
| Pay-as-you-go T2A HD | $100/M characters | Applies to speech-2.8-hd, speech-2.6-hd, and speech-02-hd |
| Rapid voice cloning | $1.50 per voice | Fee is charged on first T2A use of the cloned voice, not preview |
| Voice design | $3 per voice | Prompt-generated voice design |
| Starter sub | $5/mo | 100,000 credits |
| Standard sub | $30/mo | 300,000 credits |
| Pro sub | $99/mo | 1,100,000 credits |
| Scale sub | $249/mo | 3,300,000 credits |
| Business sub | $999/mo | 20,000,000 credits |

Prices verified 2026-06-12 via the MiniMax Audio Subscription docs and MiniMax pay-as-you-go pricing. Do not mix up Audio Subscription, Token Plan, and pay-as-you-go: they are different purchase routes with different limits.
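
The listed figures are enough for quick budgeting. A minimal sketch using only the prices above; it deliberately does not convert credits to characters, because this page does not state that mapping. The function names are illustrative:

```python
# Pay-as-you-go list prices in USD per 1M characters (verified 2026-06-12 on this page).
PAYG_USD_PER_M_CHARS = {"turbo": 60.0, "hd": 100.0}

# Audio Subscription plans: (USD per month, credits per month).
SUBSCRIPTIONS = {
    "Starter": (5, 100_000),
    "Standard": (30, 300_000),
    "Pro": (99, 1_100_000),
    "Scale": (249, 3_300_000),
    "Business": (999, 20_000_000),
}


def payg_cost(chars: int, tier: str) -> float:
    """Pay-as-you-go cost in USD for `chars` characters on the given tier."""
    return chars * PAYG_USD_PER_M_CHARS[tier] / 1_000_000


def usd_per_million_credits(plan: str) -> float:
    """Effective subscription price per 1M credits."""
    price, credits = SUBSCRIPTIONS[plan]
    return price * 1_000_000 / credits


print(f"2.5M chars on Turbo: ${payg_cost(2_500_000, 'turbo'):.2f}")
for plan in SUBSCRIPTIONS:
    print(f"{plan}: ${usd_per_million_credits(plan):.2f} per 1M credits")
```

On these figures the per-credit price does not fall steadily with plan size: Starter works out to $50 per million credits and Standard to $100. Check how credits are consumed per character on each model before comparing a subscription against pay-as-you-go.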

Against the alternatives

| | MiniMax Speech 2.8 HD | ElevenLabs v3 | Cartesia Sonic | Kokoro |
|---|---|---|---|---|
| List usage price | $100/M chars for HD; $60/M chars for Turbo | Higher, plan/credit dependent | Usage-based | Free (self-host) |
| Languages | 40 | 32+ | 15+ | 9 |
| Voice cloning | 3-10s zero-shot | Best-in-class | Yes | No |
| Cross-lingual cloning | Yes | Yes | Limited | N/A |
| Real-time streaming | Yes | Yes | Strongest | No |
| Quality ceiling | High | Highest | High | Mid (narration-grade) |
| Voice library breadth | 300+ | 3,000+ | Large | 26 (v1.0) |
| Best viewed as | Cheapest hosted multilingual | Premium hosted | Streaming specialist | Offline-first |

Failure modes

  • Quality ceiling and workflow maturity below ElevenLabs on critical creator work. MiniMax is strong on API economics, but ElevenLabs remains the safer default for polished creator narration, voice marketplace breadth, and non-developer production workflows.
  • Voice library is narrower. 300+ voices against ElevenLabs’ thousands. Specific demographic or style gaps can force workarounds.
  • Voice-clone lifecycle can surprise teams. Rapid clones are temporary unless used in T2A within 168 hours, and fees are charged when the clone is first synthesized through T2A.
  • Ecosystem is thinner. Fewer SDKs, integrations, and community tutorials compared to ElevenLabs or Cartesia as of June 12, 2026.
  • Peak-load latency spikes. Some reviews note occasional processing delays under heavy load. Base latency is competitive.
  • China-based vendor. Enterprise compliance teams with US or EU data-residency requirements should confirm a private deployment option with MiniMax or choose a Western vendor.
  • Model naming and plan surfaces are easy to confuse. Speech 2.8, Speech 2.6, and Speech-02 appear across different docs; Audio Subscription, Token Plan, and pay-as-you-go are separate purchase routes.
  • Accent drift on non-native cloned voices. Cloning an English speaker into Mandarin output preserves timbre but can drift on native accent nuances.

Methodology

This page was rechecked by the aipedia.wiki editorial workflow on June 12, 2026 against the MiniMax T2A API overview, MiniMax T2A HTTP docs, MiniMax Voice Cloning docs, MiniMax Audio Subscription pricing, MiniMax pay-as-you-go pricing, and MiniMax’s March 2026 financial-results release. Scoring follows the four-dimension rubric at /about/scoring/ (unweighted average of Utility, Value, Moat, and Longevity).

FAQ

How does MiniMax Speech pricing compare to ElevenLabs?
MiniMax is positioned as the cheaper developer API lane. As of June 12, 2026, MiniMax pay-as-you-go lists T2A Turbo at $60 per million characters and HD at $100 per million characters, plus monthly Audio Subscription plans from $5 to $999. ElevenLabs retains a broader voice library, richer integrations, and a higher quality ceiling for premium production.

What is the difference between Speech 2.8 HD and Speech 2.8 Turbo?
HD is the fidelity lane for voiceovers, audiobook-style narration, and polished output. Turbo is the speed/value lane for live apps, chatbots, gaming, IVR, and high-volume generation.

Does MiniMax Speech have a free tier?
MiniMax has several purchase paths rather than one simple free tier. Token Plan pages include Speech 2.8 daily character allowances on Plus/Max plans, while Audio Subscription and pay-as-you-go are separate. Check the exact purchase route before assuming credits carry across products.

What languages does MiniMax Speech cover?
The current API docs expose language boost options across Chinese, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, and auto-detection.

Can I clone a voice across languages?
Yes, but treat it as a consent-sensitive production feature. The current voice-cloning API can rapidly reproduce a target timbre from uploaded reference audio, but clones are temporary unless used in T2A within 168 hours and should only be used with proper rights and consent.

Cite this page For journalists, researchers, and bloggers
According to aipedia.wiki Editorial at aipedia.wiki (https://aipedia.wiki/tools/minimax-speech/)
aipedia.wiki Editorial. (2026). MiniMax Speech: Editorial Review. aipedia.wiki. Retrieved June 22, 2026, from https://aipedia.wiki/tools/minimax-speech/
aipedia.wiki Editorial. "MiniMax Speech: Editorial Review." aipedia.wiki, 2026, https://aipedia.wiki/tools/minimax-speech/. Accessed June 22, 2026.
aipedia.wiki Editorial. 2026. "MiniMax Speech: Editorial Review." aipedia.wiki. https://aipedia.wiki/tools/minimax-speech/.
@misc{minimax-speech-editorial-review-2026, author = {{aipedia.wiki Editorial}}, title = {MiniMax Speech: Editorial Review}, year = {2026}, publisher = {aipedia.wiki}, url = {https://aipedia.wiki/tools/minimax-speech/}, note = {Accessed: 2026-06-22} }