Tool Voice freemium active 8-8.9
8/10 Strong
Active

Free open weights (Apache 2.0 / Realtime) / API from $0.001 per minute


Editorial · no paid placements

The call

Voxtral is Mistral AI's open-weight speech understanding family (STT, not TTS). Voxtral Mini Transcribe V2 handles batch transcription with speaker identification, custom vocabulary, and word-level timestamps across 13 languages; Voxtral Realtime ships under Apache 2.0 with sub-200ms latency for live voice agents and edge deployment. API pricing starts around $0.001/min, well under half the price of the Whisper API and ElevenLabs Scribe. Skip it for TTS, voice cloning, or narration.

  • Buy if: Developers running transcription at scale and on a budget
  • Pick: Free open weights (Apache 2.0 / Realtime) / API from $0.001 per minute
  • Skip if: Text-to-speech, narration, dubbing, or voice cloning workloads

Editorial score

Unweighted average of 4 axes · confidence high

  • Utility 8/10

    How much real work it can do for a competent operator, end to end.

  • Value 10/10

    What you get for the dollar relative to the closest alternative.

  • Moat 6/10

    How hard it would be for a competitor to replicate the underlying advantage.

  • Longevity 8/10

    How likely the product is to still be best-in-class 24 months out.
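The headline score follows directly from the four axis scores listed above. A minimal sketch of the arithmetic, assuming the published rule (unweighted average of the four axes); the function and variable names are illustrative and not part of the aipedia.wiki pipeline:

```python
# Axis scores as published in the Editorial score list.
AXIS_SCORES = {"Utility": 8, "Value": 10, "Moat": 6, "Longevity": 8}


def editorial_score(scores: dict[str, int]) -> float:
    """Unweighted average: every axis counts equally."""
    return sum(scores.values()) / len(scores)


print(editorial_score(AXIS_SCORES))  # 8.0
```

This matches the 8/10 shown at the top of the page.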

Key facts

  1. Best For: Teams running transcription, voice-agent, or audio-understanding pipelines at scale that need cheap per-minute STT, edge deployment via Apache 2.0 weights, or native semantic understanding alongside raw transcripts. Not a TTS tool.
    Confidence: high · Volatile · Verified 2026-05-13 · Source: Mistral audio docs
  2. Pricing Anchor: Voxtral API pricing starts at about $0.001 per minute via La Plateforme, which Mistral positions as less than half the price of Whisper and ElevenLabs Scribe. Confirm current rates on the Mistral pricing page before production rollout.
    Confidence: medium · Volatile · Verified 2026-05-13 · Source: Mistral model overview
  3. Watch Out For: Voxtral is a speech-to-text and audio-understanding family, not a text-to-speech engine. Do not pick it for voice generation, voice cloning, narration, or dubbing workflows. Pair it with a dedicated TTS provider (ElevenLabs, Fish Audio, Cartesia) if you need voice output.
    Confidence: high · Volatile · Verified 2026-05-13 · Source: Mistral Voxtral launch
  4. API Available: Yes. Mistral exposes hosted speech-to-text APIs through La Plateforme covering both batch transcription (Voxtral Mini Transcribe V2) and live transcription (Voxtral Realtime). The full Realtime weights also ship under Apache 2.0 for self-hosting.
    Confidence: high · Volatile · Verified 2026-05-13 · Source: Mistral audio docs
  5. Model Family: Voxtral is Mistral's open-weight speech understanding (STT) family. As of May 2026 the API exposes Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for sub-200ms live transcription. Voxtral Realtime is published under Apache 2.0 and is deployable on edge devices.
    Confidence: high · Volatile · Verified 2026-05-13 · Source: Mistral Voxtral launch

Voxtral is Mistral AI’s open-weight speech understanding family. It launched July 15, 2025 with two open-weight models (Voxtral 24B for scale and Voxtral Mini 3B for edge), positioned as frontier STT plus native semantic understanding rather than raw transcription alone.

As of May 2026, production API pricing starts around $0.001 per minute on La Plateforme. Voxtral is a speech-to-text family; it does not generate speech.

System Verdict

Pick Voxtral if you need cheap, fast multilingual STT and Mistral-stack consolidation. Voxtral Realtime delivers sub-200ms latency for live voice agents and is one of the few production-grade STT systems shipped under Apache 2.0. Voxtral Mini Transcribe V2 handles up to 3-hour recordings with speaker identification, custom vocabulary, and word-level timestamps across 13 languages.

Skip it if you need text-to-speech, voice cloning, or narration. Voxtral is an understanding and transcription family. For voice output pick ElevenLabs, Fish Audio, or Cartesia. Also skip it if your language coverage falls outside the documented set or you want a polished consumer creator UI.

Who pays which tier: Free tier on La Plateforme for testing. API at about $0.001 per minute for developers at scale. Voxtral Realtime weights ship under Apache 2.0 for production self-hosting and edge deployment. Enterprise self-hosting for the Mini Transcribe stack runs through Mistral commercial agreements.

Key Facts

Family: Voxtral (Mistral AI, launched July 15, 2025)
Capabilities: Speech-to-text and speech understanding (no TTS)
Production API models (May 2026): Voxtral Mini Transcribe V2 (batch) and Voxtral Realtime (live)
Open-weight launch models: Voxtral 24B and Voxtral Mini 3B
License: Apache 2.0 on Voxtral Realtime; commercial-friendly
Languages: 13 with automatic language detection (English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian, plus additional V2 languages)
Latency: Sub-200ms on Voxtral Realtime
Batch capacity: Up to 3-hour recordings on Mini Transcribe V2
Context window: 32k tokens; up to 30 minutes transcription / 40 minutes understanding per call
Features: Speaker identification, custom vocabulary, word-level timestamps, semantic understanding
Roadmap: Speaker segmentation, emotion detection, non-speech audio recognition (per Mistral docs)
API pricing: From about $0.001 per minute on La Plateforme; Mistral positions it as less than half the price of Whisper and ElevenLabs Scribe

Every data point above was verified against vendor sources on 2026-05-13. See Sources.

Recent changes

  • 2026-05-13: Mistral docs now list Voxtral Mini Transcribe V2 (batch, up to 3-hour recordings, speaker ID, custom vocabulary, word-level timestamps, 13 languages) and Voxtral Realtime (sub-200ms latency, Apache 2.0 weights, edge-deployable) as the production audio surface.
  • 2026-05-13: Voxtral Realtime confirmed Apache 2.0, which removes the prior CC BY-NC commercial-use concern for the live transcription path.
  • 2026-05-13: Original July 15, 2025 launch corrected on this page; earlier copy describing a “March 2026 TTS launch beating ElevenLabs Flash v2.5” was inaccurate. Voxtral has always been STT-first, not TTS.

What it actually is

A Mistral-native speech understanding family that ships through the same La Plateforme stack as Mistral’s text models. One account, one API key, one invoice for text plus audio.

Voxtral handles both raw transcription and semantic understanding of the underlying audio. Voxtral Mini Transcribe V2 is the batch path, with up to 3-hour recordings, speaker identification, custom vocabulary biasing, and word-level timestamps. Voxtral Realtime is the live path, with sub-200ms latency and an Apache 2.0 license that lets teams self-host the model on edge devices.

The differentiator is the pricing-plus-license combination. At roughly $0.001 per minute the API is materially cheaper than Whisper API and ElevenLabs Scribe at the same workload, and the Apache 2.0 weights on Voxtral Realtime remove most of the licensing friction that previously sent commercial self-hosters elsewhere.

When to pick Voxtral

  • You already use Mistral for text generation. One vendor, one billing line, one SDK covers text plus speech-to-text.
  • Cost at scale drives unit economics. ~$0.001 per minute undercuts most commercial transcription alternatives.
  • You need a real-time STT path for voice agents. Sub-200ms latency on Voxtral Realtime is competitive with the tightest speech-understanding APIs on the market.
  • Edge deployment matters. Voxtral Realtime weights ship under Apache 2.0 and can run on edge hardware.
  • Audio understanding alongside transcripts. Voxtral pairs raw transcripts with native semantic understanding, which simplifies downstream RAG and agent workflows.

When to pick something else

  • Text-to-speech, narration, or voice cloning: Voxtral does not generate speech. Pick ElevenLabs, Fish Audio, or Cartesia.
  • Top-tier transcription accuracy for clean enterprise audio: Deepgram Nova-3 and AssemblyAI Universal-2 still post strong benchmark numbers on US-English broadcast and call-center audio.
  • Polished consumer creator UI: Whisper-based tools like Otter and Descript are built around end-user workflows.
  • Languages outside the documented set: Whisper and Deepgram cover wider language menus on the commercial API.
  • Enterprise voice cloning with compliance: Resemble AI bundles watermarking, deepfake detection, and on-premise deployment for voice generation, which Voxtral does not do.

Pricing

| Access | Cost | Notes |
| --- | --- | --- |
| Open weights (Voxtral Realtime) | Free under Apache 2.0 | Edge-deployable; commercial self-hosting permitted |
| Open weights (Voxtral 24B / Mini 3B, July 2025 launch models) | Free download | Hugging Face; check current license terms before commercial use |
| La Plateforme free tier | Free | Testing and evaluation |
| Mistral API (transcription / understanding) | From about $0.001/min | Mistral positions it as less than half the price of Whisper and ElevenLabs Scribe |

Prices verified 2026-05-13 via Mistral Voxtral announcement and Mistral audio docs. Whisper API list price ($0.006/min) and ElevenLabs Scribe pricing cited for comparison.

Against the alternatives

Voxtral is an STT family, so the natural peers are speech-to-text APIs, not TTS engines.

| | Voxtral (Realtime + Mini Transcribe V2) | OpenAI Whisper API | Deepgram Nova-3 | AssemblyAI Universal-2 |
| --- | --- | --- | --- | --- |
| API price per minute | From ~$0.001 | $0.006 | $0.0043 streaming | $0.0085 streaming |
| Real-time latency | Sub-200ms (Realtime) | Batch-only API | Sub-300ms streaming | Sub-300ms streaming |
| Languages | 13 with auto-detect | 50+ | 36 | 12 |
| Open weights for self-hosting | Yes, Apache 2.0 on Realtime | No | No | No |
| Semantic understanding alongside transcripts | Native | No | No | Add-on LeMUR |
| Speaker ID, custom vocab, word timing | Yes (Mini Transcribe V2) | Word-level timing only | Yes | Yes |
| Best viewed as | Mistral-stack STT plus understanding | General-purpose batch transcription | Streaming-first STT | Workflow-first STT |
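
To see what the per-minute gap means at volume, the list prices in the comparison table above can be multiplied out for a fixed workload. This is a rough sketch only: the 10,000 hours per month figure is a hypothetical workload, the helper name is illustrative, and real invoices depend on tiers, minimums, and current vendor pricing.

```python
# Per-minute list prices (USD) copied from the comparison table.
PRICE_PER_MIN = {
    "Voxtral": 0.001,
    "OpenAI Whisper API": 0.006,
    "Deepgram Nova-3 (streaming)": 0.0043,
    "AssemblyAI Universal-2 (streaming)": 0.0085,
}


def monthly_cost(hours_per_month: float, price_per_min: float) -> float:
    """Audio hours x 60 minutes x per-minute price, rounded to cents."""
    return round(hours_per_month * 60 * price_per_min, 2)


HOURS = 10_000  # hypothetical monthly workload, not a figure from the review
for name, price in PRICE_PER_MIN.items():
    print(f"{name}: ${monthly_cost(HOURS, price):,.2f}")
```

At these list prices the same 600,000 minutes cost about $600 on Voxtral against $3,600 on the Whisper API, which is where the Value score comes from.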

Failure modes

  • No text-to-speech. Voxtral does not synthesize voices. Workflows that need narration, dubbing, or voice cloning must pair Voxtral with a TTS engine.
  • Language coverage trails Whisper. Mistral docs cite 13 languages with auto-detect on Voxtral Mini Transcribe V2. Whisper covers 50+. Languages outside the documented set should be tested before commitment.
  • Real-time depends on Voxtral Realtime. Voxtral Mini Transcribe V2 is batch-oriented and not suitable for live voice agents.
  • Per-call duration caps. The 32k-token context window allows up to 30 minutes of transcription or 40 minutes of understanding per call, while Mini Transcribe V2 batch jobs accept recordings up to 3 hours. Longer audio needs chunking.
  • Younger production lineup than peers. Voxtral Mini Transcribe V2 and Voxtral Realtime are the second-generation lineup after the July 2025 open-weight launch. Community tooling around Whisper and Deepgram is still broader.
  • Vendor-published benchmarks. Mistral positions Voxtral as less than half the price of Whisper and Scribe and ahead on aggregate quality in its own evaluations. Independent blind tests should be checked before standardizing on it for accuracy-sensitive domains (medical, legal, finance).
  • No native consumer UI. API-only product. Creators wanting a polished transcription studio should consider Otter, Descript, or third-party UIs built on Voxtral.
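
The chunking mentioned under per-call duration caps can be planned before any audio is uploaded. A minimal sketch, assuming the 30-minute transcription cap cited above; the function name and the 5-second overlap (added so words at a boundary are not cut in half) are illustrative choices, not Mistral recommendations:

```python
def plan_chunks(
    total_s: float,
    max_chunk_s: float = 30 * 60,  # 30-minute transcription cap from the review
    overlap_s: float = 5.0,        # hypothetical overlap between chunks
) -> list[tuple[float, float]]:
    """Return (start, end) windows in seconds that cover the whole recording."""
    if total_s <= 0:
        return []
    if not 0 <= overlap_s < max_chunk_s:
        raise ValueError("overlap must be non-negative and shorter than a chunk")
    chunks = []
    start = 0.0
    while True:
        end = min(start + max_chunk_s, total_s)
        chunks.append((start, end))
        if end >= total_s:
            return chunks
        # Step back by the overlap so the next window re-reads the boundary.
        start = end - overlap_s


# A 2-hour recording needs five windows under a 30-minute cap.
for start, end in plan_chunks(2 * 60 * 60):
    print(f"{start:.0f}s -> {end:.0f}s")
```

Each window can then be cut with any audio tool and sent as its own call; the overlapping seconds have to be de-duplicated when the transcripts are stitched back together.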

Methodology

This page was produced by the aipedia.wiki editorial pipeline, an automated system that ingests vendor documentation, verifies pricing and model details against primary sources, and generates the editorial analysis you are reading. No individual human wrote this review. Scoring follows the four-dimension rubric at /about/scoring/ (Utility, Value, Moat, Longevity; unweighted average). Last verified 2026-05-13 against the Mistral Voxtral announcement, Mistral audio docs, and Mistral model overview.

FAQ

Is Voxtral free? Partially. Voxtral Realtime weights are free under Apache 2.0 and can be self-hosted commercially. The La Plateforme free tier covers testing. Production API workloads pay from about $0.001 per minute.

Does Voxtral do text-to-speech? No. Voxtral is a speech-to-text and audio-understanding family. For TTS, pair it with ElevenLabs, Fish Audio, or Cartesia.

Can I self-host Voxtral commercially? Yes for Voxtral Realtime, which ships under Apache 2.0 and is designed for edge deployment. Check current license terms on the July 2025 launch models (Voxtral 24B and Mini 3B) on Hugging Face before commercial use.

What languages does Voxtral support? Voxtral Mini Transcribe V2 lists 13 languages with automatic language detection, including English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian.

How fast is Voxtral Realtime? Sub-200ms latency, fast enough for live voice agents.

How does Voxtral compare to Whisper API? Mistral positions Voxtral at less than half the price of Whisper API at the same workload, with native semantic understanding alongside transcripts. Whisper still wins on raw language breadth.

Sources


Cite this page For journalists, researchers, and bloggers
According to aipedia.wiki Editorial at aipedia.wiki (https://aipedia.wiki/tools/voxtral/)
aipedia.wiki Editorial. (2026). Voxtral — Editorial Review. aipedia.wiki. Retrieved May 29, 2026, from https://aipedia.wiki/tools/voxtral/
aipedia.wiki Editorial. "Voxtral — Editorial Review." aipedia.wiki, 2026, https://aipedia.wiki/tools/voxtral/. Accessed May 29, 2026.
aipedia.wiki Editorial. 2026. "Voxtral — Editorial Review." aipedia.wiki. https://aipedia.wiki/tools/voxtral/.
@misc{voxtral-editorial-review-2026, author = {{aipedia.wiki Editorial}}, title = {Voxtral — Editorial Review}, year = {2026}, publisher = {aipedia.wiki}, url = {https://aipedia.wiki/tools/voxtral/}, note = {Accessed: 2026-05-29} }
Spotted an error or want to share your experience with Voxtral?

Every tool page is re-verified on a recurring cycle, and corrections land faster when readers flag them directly. If you spot a stale fact, a missing capability, or have used Voxtral and want to share what worked or didn't, the editorial desk reviews every message sent through this form.

Email editorial@aipedia.wiki