platoseed
The most emotive foundation models for voice
Miso Labs is building the world’s most emotive foundation models for voice. We believe that the next generation of AI interactions shouldn't just be functional—they should be human. By bringing warmth and lightning-fast speed to the voice layer, we empower developers to build voice agents that users truly love.
Miso Labs offers emotive foundation models for voice, enabling real-time voice agents with on-premises deployment and one-shot voice cloning. Their platform emphasizes low latency, data sovereignty, and enterprise hosting options.
Miso Labs provides voice AI models (Miso-TTS) with real-time latency of around 110 ms, one-shot voice cloning from a 10-second clip, and on-premises deployment for enterprise data control. The product is available through an API that offers public voices, bring-your-own (BYO) custom voices, and streaming over WebSocket. Pricing is plan-based (Starter, Scale, Enterprise), with included minutes and per-minute overage pricing; on-premises deployment is offered to select customers.
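As a rough illustration of how plan-based pricing with included minutes and per-minute overage typically works, here is a minimal Python sketch. The `Plan` class, the `monthly_cost` function, and every number in it are hypothetical placeholders for illustration only; they are not Miso Labs' actual prices or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A hypothetical usage-based plan; all fields are illustrative."""
    name: str
    monthly_fee: float         # flat fee, in dollars (assumed)
    included_minutes: int      # minutes covered by the flat fee (assumed)
    overage_per_minute: float  # price per extra minute, in dollars (assumed)


def monthly_cost(plan: Plan, minutes_used: int) -> float:
    """Return the flat fee plus the charge for minutes beyond the allowance."""
    if minutes_used < 0:
        raise ValueError("minutes_used must be non-negative")
    overage_minutes = max(0, minutes_used - plan.included_minutes)
    return plan.monthly_fee + overage_minutes * plan.overage_per_minute


# Placeholder numbers, not real pricing.
starter = Plan("Starter", monthly_fee=50.0, included_minutes=1000,
               overage_per_minute=0.05)

if __name__ == "__main__":
    for used in (800, 1000, 1200):
        print(f"{used} min -> ${monthly_cost(starter, used):.2f}")
```

Under this model, usage at or below the included minutes costs only the flat fee, and each additional minute is billed at the overage rate.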
Who it’s for: Enterprises and developers building voice agents that require low latency, custom or cloned voices, and on-premises deployment for data sovereignty.
The published pricing details, enterprise onboarding options, on-premises deployment, and dedicated support suggest growth traction and active enterprise engagements.
Building the most emotive voice AI models
Miso Labs released Miso One, an 8-billion-parameter text-to-speech model that generates highly expressive speech with human-like emotion at about 110 ms of latency. The model weights are open-sourced, with API access coming soon.
Formerly “Kamino Learning”, “Dalta Labs”

Foundation models that understand how something is said, capturing the emotion, tone, and intent that carry the real meaning, and that hold conversations for voice-first economies.

Nuance Labs is building the first human foundation model that understands and expresses emotion in real time across speech, facial expression, and body language.