Kokoro TTS



7.5/10 Useful

Status: Active

Price: Free (open-source)

Best plan: Free (open-source)

Risk: Before commercial use, review license, voice rights


Editorial · no paid placements

Should you use it?

Kokoro is a free, Apache 2.0 text-to-speech model at 82M parameters. It runs locally on CPU or GPU with a ~300MB download, no API key, and no per-character fees. The v1.0 model card lists 54 voices across 8 language groups, including Brazilian Portuguese (`lang_code='p'` in the current GitHub examples). Best for offline or high-volume work; skip for voice cloning or managed SLA needs.

Buy if: Offline and local text-to-speech workflows
Pick: Free (open-source)
Skip if: Voice cloning, emotional direction, or real-time voice agents

Plan guidance

What to buy

Best plan: Free (open-source)

Price range: Free (Apache 2.0)

Upgrade only if: You need voice cloning, emotional direction, or real-time voice agents; Kokoro has no paid tier, so that means switching tools

Watch: Before commercial use, review the license and voice rights

Current pricing source: Kokoro GitHub repository

Fit

Use it for this, skip it for that

Best for

Offline and local text-to-speech workflows
High-volume narration where API costs would dominate
Privacy-constrained English speech generation
Developers embedding a small Apache 2.0 model in apps or devices

Avoid if

Voice cloning, emotional direction, or real-time voice agents
Managed enterprise APIs with uptime guarantees and support
Large multilingual voice libraries
Nontechnical teams that do not want local model setup

Watch out: Before commercial use, review license, voice rights, quality across languages, hallucinated pronunciations, model provenance, abuse/safety controls, and avoid third-party domains that imply affiliation with the Kokoro model unless the official model card links them.

Recent changes

Only what affects the decision

Jun 23, 2026 · Self-hosted: Reverified open model distribution, current `kokoro>=0.9.4` install examples, supported language-code examples, and the official source-of-truth warning (source: Kokoro GitHub repository)

Jun 8, 2026 · Self-hosted: Reverified license, the official model-card warning about scam domains, and current GitHub install examples (source: Kokoro model card)

May 13, 2026 · Self-hosted: Model remains Apache 2.0; no first-party SaaS pricing (source: Kokoro model card)

Alternatives

Best swaps

ElevenLabs

The top-ranked AI voice platform in June 2026. Eleven v3 covers 70+ languages with expressive audio tags, Flash v2.5 hits ~75ms…

$0-$990/month · 9.3/10

Whisper

OpenAI's open-weights speech-to-text baseline. MIT-licensed code and weights remain useful for self-hosted batch transcription…

Free self-host / OpenAI transcription API $0.003-$0.006 per minute; GPT-Realtime-Whisper $0.017 per minute · 9/10

Cartesia

Real-time voice stack for agents. Sonic-3.5 TTS and Ink-2 STT now form the default Line model pair for eligible voice agents…

$0-$239/month + credits · 8.5/10


Proof and score math Verified Jun 25

Proof

Why this recommendation is trusted

Evidence: Kokoro model card

Source: Registered source
Freshness: Current
Confidence: High confidence
Verified: Jun 25, 2026
Review: Aug 13, 2026
Volatility: Volatile

High-volatility evidence needs frequent review.

Editorial score

Unweighted average of 4 axes · confidence high

Utility 8/10

How much real work it can do for a competent operator, end to end.
Value 10/10

What you get for the dollar relative to the closest alternative.
Moat 5/10

How hard it would be for a competitor to replicate the underlying advantage.
Longevity 7/10

How likely the product is to still be best-in-class 24 months out.
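The headline 7.5/10 can be reproduced from the four axis scores above. A minimal check in Python, assuming the rubric is a plain arithmetic mean as the "unweighted average" label says:

```python
from statistics import mean

# Axis scores from the "Editorial score" section above.
axes = {"Utility": 8, "Value": 10, "Moat": 5, "Longevity": 7}

# Unweighted average: each axis contributes equally (a sum of 30 over 4 axes).
score = mean(list(axes.values()))
print(f"Editorial score: {score}/10")  # Editorial score: 7.5/10
```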

Verified facts

Best for: Developers experimenting with lightweight open TTS models and local/offline voice synthesis workflows.
Confidence: high · Volatile · Verified 2026-06-25 · Source: Kokoro model card

Pricing anchor: Kokoro is distributed as an open model; costs come from inference hardware/hosting and any downstream service wrapper rather than a vendor SaaS plan.
Confidence: high · Drifts · Verified 2026-06-25 · Source: Kokoro GitHub repository

Watch out for: Before commercial use, review license, voice rights, quality across languages, hallucinated pronunciations, model provenance, abuse/safety controls, and avoid third-party domains that imply affiliation with the Kokoro model unless the official model card links them.
Confidence: high · Volatile · Verified 2026-06-25 · Source: Kokoro model card

Open source or local: The Hugging Face model card and GitHub repository are the authoritative sources for model files, Apache 2.0 license notes, examples, official scam-domain warnings, and project activity.
Confidence: high · Volatile · Verified 2026-06-25 · Source: Kokoro README

Workflow surface: The Hugging Face Space is useful for quick evaluation, but production usage should verify local inference, license, voices, and latency separately.
Confidence: high · Volatile · Verified 2026-06-25 · Source: Kokoro TTS Hugging Face Space

Last commit: 10 months ago

Full review notes Long-form details, FAQ, and source history

An open-weight text-to-speech model released by hexgrad in late 2024. At 82M parameters, it placed above much larger models such as XTTS v2 (467M) and MetaVoice (1.2B) on the TTS Arena leaderboard in January 2025.

Apache 2.0 licensed. No API key. No usage caps. No network calls once the model weights and the voicepacks you use have been downloaded.

June 25, 2026 trust note: the official Hugging Face model card explicitly warns that Kokoro-looking third-party domains can be scams or unaffiliated. Treat the Hugging Face model page and the linked hexgrad/kokoro GitHub repository as the source of truth before downloading binaries, entering payment details, or trusting a hosted wrapper.
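The source-of-truth rule above can be applied mechanically before downloading anything. A minimal sketch in Python, assuming an allowlist of only the two sources this review names (the `huggingface.co/hexgrad/Kokoro-82M` model page and the `github.com/hexgrad/kokoro` repository); `OFFICIAL_PREFIXES` and `is_official_kokoro_url` are hypothetical names, and the list should only grow with domains the official model card itself links:

```python
from urllib.parse import urlparse

# Hypothetical allowlist: host -> lowercase path prefix of the two sources this
# review treats as authoritative. Add entries only if the official model card links them.
OFFICIAL_PREFIXES = {
    "huggingface.co": "/hexgrad/kokoro-82m",
    "github.com": "/hexgrad/kokoro",
}


def is_official_kokoro_url(url: str) -> bool:
    """True only for https URLs under the official model page or GitHub repository."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    # .hostname strips any "user@" part and lowercases, so lookalike hosts miss the dict.
    prefix = OFFICIAL_PREFIXES.get(parsed.hostname or "")
    if prefix is None:
        return False
    path = parsed.path.rstrip("/").lower()
    # Match the repo itself or anything inside it, but not "Kokoro-82M-fake".
    return path == prefix or path.startswith(prefix + "/")
```

Parsing the URL instead of substring matching is the point of the design: `https://github.com.evil.example/hexgrad/kokoro` and `https://github.com@evil.example/hexgrad/kokoro` both resolve to a host that is not on the list. The sketch is deliberately conservative, so it also rejects the community ONNX repositories on Hugging Face; vet those separately.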

System Verdict

Pick Kokoro if the use case is offline, high-volume, or privacy-constrained English TTS with a fixed voice. The download is ~300MB, runs on a laptop, and costs nothing past electricity. Community ONNX builds ship in 88MB-310MB size variants for mobile and browser deployment.

Skip it if the job needs voice cloning, fine-grained emotion control, or real-time streaming. ElevenLabs keeps the quality ceiling and the voice-library breadth. Cartesia owns low-latency conversational use cases. MiniMax Speech undercuts ElevenLabs on price for multilingual workloads that still want a hosted API.

Kokoro’s moat is size-efficiency, not features. The 82M parameter count means laptop-local inference at commercial-grade quality for a narrow slice of jobs.

Key Facts


Model size	82M parameters (~300MB download)
License	Apache 2.0 (commercial use permitted)
Architecture	Modified StyleTTS 2
Voices (v1.0)	54 voices across 8 languages
Languages (model card v1.0)	English (US + UK), Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, Mandarin Chinese
Current library examples	Include a Brazilian Portuguese `lang_code='p'` path; test it before production use
Inference	CPU and CUDA GPU; Apple Silicon via ONNX
Deployment formats	PyTorch, ONNX (fp32 310MB, fp16 169MB, int8 88MB)
Hosted API cost	Third-party hosts have historically charged under $1 per 1M input characters; verify each wrapper before buying
Released	November 2024; v1.0 January 2025
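The ONNX sizes in the table track numeric precision: weight storage is roughly parameter count times bytes per parameter. A back-of-envelope sketch in Python, assuming the 82M figure above and ignoring file metadata and any tensors an export keeps at higher precision:

```python
# Back-of-envelope weight size: parameters x bytes per parameter.
# Assumes exactly 82M parameters; real files add metadata and may keep
# some tensors at higher precision, so expect deviations.
PARAMS = 82_000_000
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}
MIB = 2**20

estimates = {fmt: PARAMS * nbytes / MIB for fmt, nbytes in BYTES_PER_PARAM.items()}
for fmt, size in estimates.items():
    print(f"{fmt}: ~{size:.0f} MiB")
```

This prints roughly 313, 156, and 78 MiB. The listed fp32 and fp16 files land in the same range; the int8 build coming in above 78 MiB is consistent with some layers staying unquantized, though that is an inference rather than something the builds state.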

What it actually is

A small neural TTS model that turns text into audio locally. The architecture is a modified StyleTTS 2 trained on permissive, non-copyrighted audio with IPA phoneme labels.

The Python package (pip install kokoro) wraps inference with a minimal API. ONNX builds target mobile, browser, and non-Python runtimes. A Gradio demo ships for no-code local testing.

The moat is size. At 82M parameters Kokoro takes about 300MB on disk and runs in real time on CPU. Competing open models at comparable quality (XTTS v2, Tortoise) are several times larger (XTTS v2 alone is about 5.7x, at 467M parameters) and need a GPU for acceptable latency.
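The real-time claim is easy to check on your own hardware with a real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced. A minimal harness in Python, assuming a callable that returns a flat sequence of samples at 24 kHz (the rate Kokoro's README examples write); `fake_synth` is a hypothetical stand-in so the harness runs without the model, and for Kokoro you would wrap the pipeline and concatenate the chunks it yields:

```python
import time
from typing import Callable, Sequence

SAMPLE_RATE = 24_000  # Hz; the rate Kokoro's README examples use when writing audio


def real_time_factor(synth: Callable[[str], Sequence[float]], text: str,
                     sample_rate: int = SAMPLE_RATE) -> float:
    """Return wall-clock synthesis time divided by the duration of the audio produced.

    RTF below 1.0 means the synthesizer runs faster than real time on this machine.
    """
    start = time.perf_counter()
    audio = synth(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(audio) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synth returned no audio")
    return elapsed / audio_seconds


def fake_synth(text: str) -> list[float]:
    """Hypothetical stand-in: 0.1 s of silence per word, so the harness runs without Kokoro."""
    return [0.0] * (len(text.split()) * SAMPLE_RATE // 10)


rtf = real_time_factor(fake_synth, "A quick check of the timing harness.")
print(f"RTF: {rtf:.4f}")
```

Time a few representative passages rather than one short sentence, since model warm-up and text length both move the number.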

When to pick Kokoro

Self-hosted AI stacks that must stay offline. Pair with a local LLM for end-to-end air-gapped audio pipelines.
High-volume narration where per-character fees hurt. Audiobooks, podcasts, subtitles, game VO at scale.
Privacy-sensitive text (medical, legal, financial). No outbound API call means no data egress.
Edge and mobile deployments. The int8 ONNX build is 88MB. Fits on a phone.
Research and reproducibility. Fixed weights and deterministic inference avoid the drift introduced by hosted-model upgrades.

When to pick something else

Voice cloning from a reference clip: ElevenLabs, Fish Audio, or MiniMax Speech. Kokoro ships fixed voices only.
Fine-grained emotional control: ElevenLabs v3 or MiniMax Speech-02. Kokoro’s prosody controls stay basic.
Real-time streaming for conversational agents: Cartesia is built for this. Kokoro's pipeline yields audio one text chunk at a time, and each chunk is fully synthesized before it can play.
Broad hosted multilingual coverage: ElevenLabs and MiniMax cover far more languages with managed voices, support, and native-prosody tuning.
Studio production UI with takes and timeline editing: Murf or ElevenLabs Studio. Kokoro is code-first.

Pricing

Path	Cost
Self-hosted model	Free (Apache 2.0)
Own hardware	Electricity only
Hosted API (third-party)	Historically under $1 per 1M input characters; verify each wrapper before buying
Commercial use	Permitted under Apache 2.0 without royalty

Reverified 2026-06-25 via the Kokoro-82M Hugging Face repo, the hexgrad/kokoro GitHub README, and ONNX community builds. Self-hosted inference is free; hosted wrappers price per million characters and should be checked separately.
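Per-character billing scales linearly, which is why volume decides the self-host question. A minimal sketch in Python; the $1 rate is the upper end of the under-$1 historical anchor above, and the $30 rate is back-solved from the ~$300 at 10M characters in the comparison table, so both are illustrative rather than quoted prices:

```python
def hosted_cost_usd(characters: int, usd_per_million_chars: float) -> float:
    """Per-character billing: (characters / 1M) x price per 1M characters."""
    return characters / 1_000_000 * usd_per_million_chars


monthly_chars = 10_000_000
# Illustrative rates only; check each provider's current price list.
rates = {"third-party Kokoro wrapper": 1.0, "premium hosted API": 30.0}
for label, rate in rates.items():
    print(f"{label}: ${hosted_cost_usd(monthly_chars, rate):,.2f}")
```

At 10M characters a month this prints $10.00 and $300.00, before counting the hardware and electricity a self-hosted run needs.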

Against the alternatives

	Kokoro (82M)	XTTS v2 (~467M)	ElevenLabs (hosted)
License	Apache 2.0	CPML (non-commercial by default)	Proprietary
Parameter count	82M	467M	Not disclosed
Voice cloning	No	Yes (instant)	Yes (strong hosted option)
Languages	8 (v1.0)	17	32+ (70+ on Eleven v3)
Real-time streaming	No	Limited	Yes
Emotion control	Basic	Basic	Fine-grained
Cost at 10M chars	Electricity	Electricity	~$300+ on paid tier
Best viewed as	Small, offline-first	Mid-size clone-capable	Hosted quality ceiling

Failure modes

No voice cloning. Fixed pre-trained voices only. Custom-voice work requires a different model.
Prosody is basic. No fine-grained emotion sliders. Tone is controlled mainly by text wording and punctuation.
No streaming. Each text chunk is fully synthesized before it can play, so latency is not viable for real-time agent loops.
English quality leads; other languages lag. The 8-language v1.0 list is functional but native-speaker critique can show gaps against specialist models.
No hosted first-party API. Third-party providers (Together, Replicate) exist, but there is no vendor SLA.
Scam-domain risk. The official model card warns that Kokoro-looking root domains are not owned by or affiliated with the model author unless linked from the official source. Avoid entering credentials or payment details on lookalike sites.
CPU runs are roughly real time; GPU runs are 10-20x faster. Long-document batches on CPU get slow.
Community-driven release cadence. Version bumps depend on hexgrad’s time. Update frequency is irregular.
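For long-document batches, wall-clock time is audio length multiplied by RTF. A minimal planning sketch in Python; the 10-hour audiobook and the CPU RTF of 0.8 are hypothetical inputs, and treating the 10-20x GPU figure above as a speedup over the CPU run is one reading of that claim:

```python
def batch_hours(audio_hours: float, rtf: float, speedup: float = 1.0) -> float:
    """Wall-clock hours to synthesize `audio_hours` of speech at real-time factor `rtf`.

    `speedup` divides the result, e.g. to model hardware N times faster than the run
    that produced `rtf`.
    """
    if rtf <= 0 or speedup <= 0:
        raise ValueError("rtf and speedup must be positive")
    return audio_hours * rtf / speedup


# Hypothetical inputs: a 10-hour audiobook and a CPU measured at RTF 0.8.
cpu = batch_hours(10, 0.8)
# The 10-20x GPU range above, read as a speedup over that CPU run.
gpu_fast, gpu_slow = batch_hours(10, 0.8, speedup=20), batch_hours(10, 0.8, speedup=10)
print(f"CPU: {cpu:.1f} h, GPU: {gpu_fast:.1f}-{gpu_slow:.1f} h")
```

With these inputs it prints `CPU: 8.0 h, GPU: 0.4-0.8 h`; substitute an RTF measured on your own hardware.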

Methodology

This page was produced by the aipedia.wiki editorial pipeline, an automated system that ingests vendor documentation, verifies claims against primary sources, and generates the editorial analysis shown here. No individual human wrote this review. Scoring follows the four-dimension rubric at /about/scoring/ (Utility, Value, Moat, Longevity, unweighted average). Last verified 2026-06-25 against the Kokoro-82M Hugging Face repo, VOICES.md, hexgrad GitHub, and onnx-community Kokoro-82M-v1.0-ONNX builds.

FAQ

Is Kokoro free for commercial use? Yes. The model is Apache 2.0 licensed, which allows commercial use without royalties (Hugging Face).

How does Kokoro compare to ElevenLabs? Kokoro matches ElevenLabs on fixed-voice English narration quality in blind TTS Arena tests. ElevenLabs still wins on voice cloning, emotion sliders, real-time streaming, and language breadth. Kokoro wins on cost (free vs per-character) and privacy (local vs hosted).

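How do I run Kokoro? pip install kokoro soundfile. The current GitHub README uses kokoro>=0.9.4 in examples. Basic inference (the pipeline is a generator that yields one result per text chunk):

from kokoro import KPipeline
import soundfile as sf
pipeline = KPipeline(lang_code='a')  # 'a' = American English; match the voice's first letter
for i, (graphemes, phonemes, audio) in enumerate(pipeline("Your text here.", voice='af_heart')):
    sf.write(f'kokoro_{i}.wav', audio, 24000)  # one 24 kHz WAV file per chunk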

ONNX builds exist for deployment outside Python (onnx-community).

How many voices and languages does Kokoro support? The official v1.0 release table lists 54 voices across 8 language groups: English (US, UK), Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, and Mandarin Chinese. Quality varies by language, so production multilingual buyers should test the exact voicepack and library version rather than relying on a simple language count. See the VOICES.md reference for the full list.
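Voice IDs in VOICES.md encode the language in the first letter and the gender (f/m) in the second, and the README's examples pair each voice with the matching `lang_code`. A small helper in Python, assuming that naming convention holds for every voicepack you use; `LANG_CODES` and `lang_code_for_voice` are hypothetical names:

```python
# Language letters as used by KPipeline(lang_code=...) and the first letter of voice IDs.
LANG_CODES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin Chinese",
}


def lang_code_for_voice(voice: str) -> str:
    """Return the pipeline lang_code implied by a voice ID such as 'bf_emma'."""
    prefix, sep, _name = voice.partition("_")
    # Expect a two-letter prefix: language letter, then 'f' (female) or 'm' (male).
    if not sep or len(prefix) != 2 or prefix[0] not in LANG_CODES or prefix[1] not in "fm":
        raise ValueError(f"unrecognized voice ID: {voice!r}")
    return prefix[0]


print(lang_code_for_voice("af_heart"), LANG_CODES[lang_code_for_voice("pm_alex")])
```

For example, `KPipeline(lang_code=lang_code_for_voice('bf_emma'))` builds a British English pipeline for the `bf_emma` voice.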

Which Kokoro website is official? Use the Hugging Face model card and the linked hexgrad/kokoro GitHub repository as the source of truth. The model card warns that Kokoro-looking third-party root domains are not owned by or affiliated with the author unless the official model page links them.

Can Kokoro clone my voice? No. Kokoro supports fixed voices only. For zero-shot voice cloning from a short reference clip, use ElevenLabs, Fish Audio, or MiniMax Speech.

Sources

Kokoro-82M on Hugging Face: official model, voicepacks, documentation
Kokoro VOICES.md: canonical voice/language reference
hexgrad GitHub: source code and Python library
Kokoro-82M-v1.0-ONNX: community ONNX builds for mobile and browser
Together AI hosted Kokoro-82M: hosted API pricing reference

Category: AI Voice
Compare: ElevenLabs · Cartesia · MiniMax Speech


Cite this page For journalists, researchers, and bloggers

News writers

According to aipedia.wiki Editorial at aipedia.wiki (https://aipedia.wiki/tools/kokoro/)

APA

aipedia.wiki Editorial. (2026). Kokoro TTS: Editorial Review. aipedia.wiki. Retrieved July 2, 2026, from https://aipedia.wiki/tools/kokoro/

MLA 9

aipedia.wiki Editorial. "Kokoro TTS: Editorial Review." aipedia.wiki, 2026, https://aipedia.wiki/tools/kokoro/. Accessed July 2, 2026.

Chicago

aipedia.wiki Editorial. 2026. "Kokoro TTS: Editorial Review." aipedia.wiki. https://aipedia.wiki/tools/kokoro/.

BibTeX

@misc{kokoro-tts-editorial-review-2026,
  author = {{aipedia.wiki Editorial}},
  title = {Kokoro TTS: Editorial Review},
  year = {2026},
  publisher = {aipedia.wiki},
  url = {https://aipedia.wiki/tools/kokoro/},
  note = {Accessed: 2026-07-02}
}

Spotted an error or want to share your experience with Kokoro TTS?

Every tool page is re-verified on a recurring cycle, and corrections land faster when readers flag them directly. If you spot a stale fact, a missing capability, or have used Kokoro TTS and want to share what worked or didn't, the editorial desk reviews every message sent through this form.

Email editorial@aipedia.wiki

