xAI officially opened the Grok Text-to-Speech API to developers on March 16, 2026. It’s the first time Grok has shipped audio capability outside the chat UI, and it lands as a direct competitor to ElevenLabs and OpenAI’s voice stack.
Voice roster + languages:
- Five distinct voice personalities: Ara, Eve, Leo, Rex, Sal
- Each engineered to sound “natural and expressive rather than flat or robotic”
- 20+ languages with automatic language detection
Developer controls that matter:
- Inline speech tags for pauses, laughter, whispers, emphasis
- Audio output in MP3, WAV, PCM (Linear16), G.711 μ-law, G.711 A-law
- No format conversion needed; the API slots into existing audio pipelines
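As a rough illustration of how the voice and format options above might be validated before a request is sent, here is a minimal Python sketch. The voice names come from the announcement. Everything else is an assumption for illustration: the format identifiers (`mp3`, `wav`, `pcm`, `mulaw`, `alaw`), the payload field names, and the `TtsRequest` helper are hypothetical, not the documented xAI schema. The inline speech-tag syntax is not shown because the announcement summary does not spell it out.

```python
from dataclasses import dataclass

# Voice names as listed in the announcement.
VOICES = {"Ara", "Eve", "Leo", "Rex", "Sal"}

# Hypothetical identifiers for the five output formats listed above;
# the real API may spell these differently.
FORMATS = {"mp3", "wav", "pcm", "mulaw", "alaw"}


@dataclass(frozen=True)
class TtsRequest:
    """Hypothetical helper that checks options locally before any network call."""

    text: str
    voice: str = "Ara"
    output_format: str = "mp3"

    def to_payload(self) -> dict:
        """Validate the options and return a JSON-serializable dict."""
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.voice not in VOICES:
            raise ValueError(f"unknown voice: {self.voice!r}")
        if self.output_format not in FORMATS:
            raise ValueError(f"unsupported format: {self.output_format!r}")
        # Field names are assumptions, not the documented request schema.
        return {
            "text": self.text,
            "voice": self.voice,
            "format": self.output_format,
        }
```

A telephony pipeline that already speaks G.711 could, under these assumptions, build its request with `TtsRequest("Your call is important to us.", voice="Leo", output_format="mulaw").to_payload()` and skip any transcoding step.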
Pricing:
- $4.20 per 1 million characters (Beta pricing)
- 100 concurrent requests per team
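To make the two numbers above concrete, the following Python sketch estimates cost from a character count and caps in-flight requests at the per-team limit. It assumes billing is linear in characters at the quoted beta rate. The `synthesize` argument is a hypothetical async callable standing in for whatever client code actually calls the API.

```python
import asyncio

PRICE_PER_MILLION_CHARS_USD = 4.20  # beta pricing quoted above
MAX_CONCURRENT_REQUESTS = 100       # per-team limit quoted above


def estimate_cost_usd(char_count: int) -> float:
    """Estimate cost, assuming linear per-character billing at the beta rate."""
    return char_count / 1_000_000 * PRICE_PER_MILLION_CHARS_USD


async def synthesize_all(texts, synthesize, limit=MAX_CONCURRENT_REQUESTS):
    """Run `synthesize` over all texts with at most `limit` calls in flight.

    `synthesize` is a hypothetical async function (text -> audio) supplied
    by the caller; it is not part of any published SDK.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(text):
        # Wait for a free slot so the team-wide cap is never exceeded.
        async with semaphore:
            return await synthesize(text)

    # gather preserves input order in its results.
    return await asyncio.gather(*(bounded(t) for t in texts))
```

Under that linear-billing assumption, a 250,000-character audiobook chapter would come to roughly $1.05. If several services share one team quota, a lower `limit` per service leaves headroom for the others.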
Why it matters:
Voice is the third layer that frontier labs are racing to own. ElevenLabs has led the market since 2023 with voice cloning and multilingual support at premium prices. OpenAI has shipped Advanced Voice Mode inside ChatGPT but kept TTS API access limited. Grok’s entry, with 20+ languages at $4.20 per 1M characters, significantly undercuts ElevenLabs’ Creator pricing and opens the door for applications that need personality-driven voices without the ElevenLabs premium.
Pair with: Grok Voice Mode went live on X (Android + web) three days later on March 19, 2026. It’s the consumer-facing complement to the API. See the separate news item.