Kokoro-82M, an open-source, SOTA (state-of-the-art) TTS (Text-to-Speech) model 🗣️

I have been playing around with Kokoro-82M, an extremely lightweight text-to-speech model released in December 2024. It is probably the most fun model I have tried and experimented with.

A TTS model converts text into audio (speech), which opens up a wide range of exciting use cases:
1. TTS for private documents: since it is open source, you can host it locally instead of relying on external services.
2. Custom voices, with the ability to blend voices, trim output, and control the speed of generation.
3. It can save a lot of $$ while prototyping.
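To make the blending and speed controls a bit more concrete, here is a minimal sketch of the arithmetic. It assumes a voice is represented by a fixed-length style vector, that a blend is a weighted average of such vectors (which is how I understand mixes like a 50/50 blend of two voices are usually made), and that speed is applied by dividing each phoneme's predicted duration by the speed factor. blend_voices and scale_durations are hypothetical helpers written for illustration, not part of Kokoro's API, and real voicepacks are PyTorch tensors rather than Python lists.

```python
# Hypothetical helpers illustrating voice blending and speed control.
# Assumption: a "voice" is a fixed-length style vector (plain lists here,
# whereas real voicepacks are PyTorch tensors).


def blend_voices(voices: list[list[float]], weights: list[float]) -> list[float]:
    """Weighted average of equally sized voice vectors (weights are normalized)."""
    if not voices or len(voices) != len(weights):
        raise ValueError("need at least one voice and one weight per voice")
    dim = len(voices[0])
    if any(len(v) != dim for v in voices):
        raise ValueError("all voice vectors must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Element-wise weighted sum, divided by the total weight.
    return [sum(w * v[i] for v, w in zip(voices, weights)) / total for i in range(dim)]


def scale_durations(durations: list[float], speed: float) -> list[int]:
    """Divide per-phoneme frame durations by `speed`; speed > 1 means faster speech."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    # Round to whole frames and keep at least one frame per phoneme.
    return [max(1, round(d / speed)) for d in durations]


if __name__ == "__main__":
    voice_a = [0.2, -0.4, 1.0]  # hypothetical style vector
    voice_b = [0.6, 0.0, -1.0]  # hypothetical style vector
    print(blend_voices([voice_a, voice_b], [0.5, 0.5]))
    print(scale_durations([4.0, 6.0, 10.0], speed=2.0))
```

Normalizing by the sum of the weights means a 0.7/0.3 mix and a 7/3 mix produce the same blend.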

What I liked about the model:
1. The voice quality is extremely good, quite comparable to ElevenLabs and other providers.
2. Easy setup, a small memory footprint (about 350MB), and quite fast (it can even be used for real-time TTS).
3. Open source and released under the Apache 2.0 license, so it can be used for commercial purposes.
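The size and speed claims can be sanity-checked with a little arithmetic. Assuming the weights are stored as 32-bit floats, 82M parameters × 4 bytes is about 328 MB, in the same ballpark as the ~350MB figure. "Real-time" is usually measured with the real-time factor (RTF): generation time divided by the length of the audio produced, so anything below 1 keeps up with playback. The fp32 storage, the 24 kHz output rate, and the timing below are assumptions for illustration, not benchmarks.

```python
# Back-of-the-envelope numbers for a small TTS model.
# Assumptions (mine, not from the model card): weights stored as fp32,
# and audio generated at 24,000 samples per second.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_size_mb(num_params: int, dtype: str = "fp32") -> float:
    """Raw weight size in megabytes (1 MB = 10**6 bytes), ignoring file overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


def real_time_factor(generation_seconds: float, num_samples: int,
                     sample_rate: int = 24_000) -> float:
    """Wall-clock generation time divided by the duration of the audio produced.

    RTF < 1 means audio is produced faster than it plays back.
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("need a positive number of samples and sample rate")
    audio_seconds = num_samples / sample_rate
    return generation_seconds / audio_seconds


if __name__ == "__main__":
    print(weight_size_mb(82_000_000))          # 328.0
    print(weight_size_mb(82_000_000, "fp16"))  # 164.0
    # Hypothetical timing: 0.5 s to generate 10 s of audio (240,000 samples).
    print(real_time_factor(0.5, 240_000))
```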

Supported Languages: American English, British English, Japanese, Chinese

At just 350MB, it is not only very small but also very easy to get started with; you can try it on Hugging Face 🤗 Spaces:
https://lnkd.in/dzPnveRV

Or, if you want to set it up locally, check this: https://lnkd.in/d77d3ihR
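Once it runs locally, you will usually want to save the output to a file. Here is a minimal sketch using only Python's standard library wave module, assuming the generated audio comes back as mono float samples in [-1, 1] at 24 kHz (check the setup guide for the actual output format). float_to_pcm16 and write_wav are hypothetical helper names, and the sine tone only stands in for real model output.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # assumption: the rate the model's audio is generated at


def float_to_pcm16(samples):
    """Clip float samples to [-1.0, 1.0] and scale them to signed 16-bit ints."""
    return [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples to a 16-bit PCM WAV file using only the stdlib."""
    pcm = float_to_pcm16(samples)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))  # little-endian int16


if __name__ == "__main__":
    # Stand-in for model output: one second of a quiet 440 Hz tone.
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
            for n in range(SAMPLE_RATE)]
    write_wav("tone.wav", tone)
```

Clipping before scaling keeps any slightly out-of-range samples from wrapping around when packed as 16-bit integers.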

SliceOfAI