platoseed
An SDK to run transformer models anywhere
Exla aggressively quantizes AI models to minimize memory usage and maximize inference speed. Whether you're deploying LLMs, VLMs, VLAs, or custom models, Exla reduces memory footprint by up to 80% and accelerates inference by 3-20x, all with just a few lines of code. https://cal.com/exla-ai/schedule
CTO at Exla. Previously an OS engineer at Apple leading sleep/hibernation for all Apple devices. B.S. Computer Science from Purdue.
Optimize models to run on edge devices (e.g., Jetsons) with 3-20x faster inference and up to 80% lower memory requirements
Exla builds the Exla SDK to optimize transformer-based and CV models for edge devices (e.g., NVIDIA Jetsons), reducing memory usage by up to 80% and delivering 3-20x faster inference. The aim is to deploy LLMs, VLMs, VLAs, and other CV models on edge and embedded platforms.

AI Physics Engine to replace simulations and prototypes

Artificial Specialized Intelligence