xAI officially opened the Grok Text-to-Speech API to developers on March 16, 2026. It’s the first time Grok has shipped audio capability outside the chat UI, and it lands as a direct competitor to ElevenLabs and OpenAI’s voice stack.
Voice roster + languages:
- Five distinct voice personalities: Ara, Eve, Leo, Rex, Sal
- Each engineered to sound “natural and expressive rather than flat or robotic”
- 20+ languages with automatic language detection
Developer controls that matter:
- Inline speech tags for pauses, laughter, whispers, emphasis
- Audio output in MP3, WAV, PCM (Linear16), G.711 μ-law, G.711 A-law
- No format conversion needed; the API slots into existing audio pipelines
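As a rough illustration of how that format list maps onto existing pipelines, the sketch below picks an output format for a given target. The target names and the format identifier strings are hypothetical labels invented for this example, not values from the xAI API reference. The telephony split relies on the general convention that G.711 μ-law is used in North America and Japan and A-law in most other regions.

```python
from enum import Enum


class AudioFormat(Enum):
    # Identifier strings are illustrative; the API's real values may differ.
    MP3 = "mp3"
    WAV = "wav"
    PCM_LINEAR16 = "pcm"
    G711_MULAW = "mulaw"
    G711_ALAW = "alaw"


# Hypothetical pipeline targets mapped to the formats listed above.
_TARGET_TO_FORMAT = {
    "web_playback": AudioFormat.MP3,           # compressed, broadly playable
    "archival": AudioFormat.WAV,               # uncompressed with a header
    "realtime_dsp": AudioFormat.PCM_LINEAR16,  # raw 16-bit samples
    "telephony_na_jp": AudioFormat.G711_MULAW, # North America / Japan phone networks
    "telephony_intl": AudioFormat.G711_ALAW,   # most other phone networks
}


def pick_format(target: str) -> AudioFormat:
    """Return the output format for a pipeline target, or raise ValueError."""
    try:
        return _TARGET_TO_FORMAT[target]
    except KeyError:
        raise ValueError(f"unknown pipeline target: {target!r}") from None
```

Requesting the format a pipeline already consumes is what removes the conversion step: a telephony integration, for example, would ask for G.711 directly rather than transcoding from MP3.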
Pricing:
- $4.20 per 1 million characters (Beta pricing)
- 100 concurrent requests per team
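To make those two numbers concrete, here is a minimal budgeting sketch. It assumes characters are billed linearly at the quoted beta rate with no minimum charge or rounding, and that a job is dispatched in full waves of at most 100 requests; neither assumption is confirmed by the announcement, and the function names are invented for this example.

```python
from decimal import Decimal

# Figures quoted in this news item; beta pricing may change.
PRICE_PER_MILLION_CHARS = Decimal("4.20")
MAX_CONCURRENT_REQUESTS = 100


def estimate_cost_usd(char_count: int) -> Decimal:
    """Estimate cost in USD, assuming simple linear per-character billing."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return Decimal(char_count) * PRICE_PER_MILLION_CHARS / Decimal(1_000_000)


def waves_needed(request_count: int, limit: int = MAX_CONCURRENT_REQUESTS) -> int:
    """Number of dispatch waves needed to stay under the concurrency cap."""
    if request_count < 0 or limit <= 0:
        raise ValueError("request_count must be >= 0 and limit must be > 0")
    return -(-request_count // limit)  # ceiling division
```

Under these assumptions, a 250,000-character script would cost about $1.05, and a job of 250 requests would need three waves to stay within the per-team limit.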
Why it matters:
Voice is the third-layer frontier that the major AI labs are racing to own. ElevenLabs has led the market since 2023 with voice cloning and multilingual support at premium prices. OpenAI has shipped Advanced Voice Mode inside ChatGPT but kept TTS API access limited. Grok’s entry, with 20+ languages at $4.20 per 1 million characters, significantly undercuts ElevenLabs’ Creator pricing and opens the door to applications that need personality-driven voices without the ElevenLabs premium.
Pair with: Grok Voice Mode went live on X (Android + web) three days later on March 19, 2026. It’s the consumer-facing complement to the API. See the separate news item.
Sources
Primary and corroborating references used for this news item.