Exla

Active

An SDK to run transformer models anywhere

Winter 2025 · Founded 2025 · 2 people · San Francisco, CA, USA

exla.ai/ · B2B

About

Exla aggressively quantizes AI models to minimize memory usage and maximize inference speed. Whether you're deploying LLMs, VLMs, VLAs, or custom models, Exla reduces memory footprint by up to 80% and accelerates inference by 3–20x, all with just a few lines of code. Schedule a call: https://cal.com/exla-ai/schedule
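To make the memory claim concrete, here is a minimal sketch of generic symmetric int8 weight quantization. It is not Exla's SDK or Exla's method: the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, chosen only for illustration. It shows how storing weights in 8 bits instead of 32 cuts weight storage by 75%, at the cost of a small rounding error.

```python
import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 values plus one scale.

    Hypothetical helper for illustration; not part of any Exla API.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = array.array(
        "b", (max(-127, min(127, round(w / scale))) for w in weights)
    )
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate floats."""
    return [v * scale for v in q]


# Made-up sample weights, stored first as 32-bit floats.
weights = [0.5, -1.0, 0.25, 0.75, -0.125, 0.0, 1.0, -0.5]
fp32 = array.array("f", weights)

q, scale = quantize_int8(weights)

fp32_bytes = fp32.itemsize * len(fp32)
int8_bytes = q.itemsize * len(q)
# Fraction of weight storage saved (ignores the single stored scale value).
saving = 1 - int8_bytes / fp32_bytes

# Worst-case reconstruction error; bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, dequantize(q, scale)))

print(f"fp32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"storage saved: {saving:.0%}, max error: {max_err:.5f}")
```

Going from 32-bit to 8-bit weights saves 75% by simple arithmetic, so savings beyond that imply fewer than 8 bits per weight or a different baseline. How Exla reaches its stated figures is not described here.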

Founders · 2

Pranav Nair, Co-Founder

Apple

CTO at Exla. Previously an OS engineer at Apple leading sleep/hibernation for all Apple devices. B.S. Computer Science from Purdue.


Viraat Das, Founder

Amazon

CEO at Exla. Previously a machine learning engineer at Amazon.


Launch

Launched on Y Combinator · Feb 2025


Optimize models to run on edge devices (e.g., NVIDIA Jetsons) with 3–20x faster inference and up to 80% lower memory requirements

Exla builds the Exla SDK to optimize transformer-based and CV models for edge devices (e.g., NVIDIA Jetsons), reducing memory usage by up to 80% and delivering 3–20x faster inference. The aim is to deploy LLMs, VLMs, VLAs, and other CV models on edge and embedded platforms.