Miso Labs

Active

The most emotive foundation models for voice

Spring 2026 · Founded 2025 · 2 people · San Francisco, CA, USA

misolabs.ai

About

Miso Labs is building the world’s most emotive foundation models for voice. We believe that the next generation of AI interactions shouldn't just be functional—they should be human. By bringing warmth and lightning-fast speed to the voice layer, we empower developers to build voice agents that users truly love.

From their website

as of Jun 7, 2026 · misolabs.ai

API/Infra · Subscription

Starter: $20/month with 150 included minutes; $0.15/min after.
Scale: $100/month with 1,000 included minutes; $0.10/min after.
Enterprise: custom pricing with on-premises deployment and dedicated support; volume pricing under $0.08/min.
Annual plans are available with some discounts. Minutes are counted
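The plan arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: overage is assumed to be billed linearly per whole minute beyond the included allowance, annual discounts and Enterprise pricing are ignored, and the names `Plan`, `monthly_cost_cents`, and `cheaper_plan` are hypothetical helpers, not part of any Miso Labs API. The listing does not say how minutes are counted, so the sketch takes a whole-minute total as input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A subscription tier; amounts are in integer cents to avoid float rounding."""
    name: str
    base_cents: int             # monthly fee
    included_minutes: int       # minutes covered by the monthly fee
    overage_cents_per_min: int  # price per minute beyond the allowance


# Figures taken from the listing; Enterprise is custom-priced and left out.
STARTER = Plan("Starter", 2000, 150, 15)
SCALE = Plan("Scale", 10000, 1000, 10)


def monthly_cost_cents(plan: Plan, minutes: int) -> int:
    """Monthly cost assuming linear per-minute overage (an assumption)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    overage_minutes = max(0, minutes - plan.included_minutes)
    return plan.base_cents + overage_minutes * plan.overage_cents_per_min


def cheaper_plan(minutes: int) -> Plan:
    """Return whichever of the two public plans costs less for this usage."""
    return min((STARTER, SCALE), key=lambda p: monthly_cost_cents(p, minutes))


if __name__ == "__main__":
    for used in (100, 200, 684, 1200):
        best = cheaper_plan(used)
        cost = monthly_cost_cents(best, used) / 100
        print(f"{used} min: {best.name} at ${cost:.2f}")
```

Under these assumptions, Starter stays cheaper through 683 minutes a month, and Scale is cheaper from 684 minutes on.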

Miso Labs offers emotive foundation models for voice, enabling real-time voice agents with on-premises deployment and one-shot voice cloning. Their platform emphasizes low latency, data sovereignty, and enterprise hosting options.

Miso Labs provides voice AI models (Miso-TTS) with real-time latency of around 110 ms, one-shot voice cloning from a 10-second clip, and on-premises deployment for enterprise data control. The product is available through an API with public and bring-your-own (BYO) custom voices and streaming over WebSocket. Pricing is plan-based (Starter, Scale, Enterprise), with included minutes and per-minute overage charges; on-premises deployment is offered to select customers.

Who it’s for: Enterprises and developers building voice agents that require low latency, custom or cloned voices, and on-premises deployment for data sovereignty.

Features

110 ms latency guarantee
10-second one-shot voice cloning
On-premises deployment for enterprise data sovereignty
Streaming + WebSocket SDK
BYO/custom voices
Dedicated voice fine-tuning
Public and custom voices

The detailed pricing page, enterprise onboarding options, on-premises deployment, and dedicated support suggest early traction and active enterprise engagement.

Founders · 2

Cassidy Dalva · Founder

Stanford

co-founder @ miso labs | stanford alum

Aoden Teo · Founder

Stanford

Math major from Stanford building the future of AI voice.

Launch · 1 launch

Miso Labs - emotive voice models · Jun 2026
Building the most emotive voice AI models
Miso Labs released Miso One, an 8-billion-parameter text-to-speech model that generates highly expressive speech with human-like emotion at 110 milliseconds latency. The model weights are open-sourced with API access coming soon.
▲ 12

Formerly “Kamino Learning”, “Dalta Labs”

B2B Infrastructure · AI

Related startups

Maya

Foundational models that understand how something is said, capturing the emotion, tone, and intent that hold the real meaning, and that hold conversations for voice-first economies.

Active

Nuance Labs

Nuance Labs is building the first human foundation model that understands and expresses emotion in real-time, across speech, facial expression, and body language.

Active
