Fast, reliable, reproducible AI with GPU live migration
Cedana (YC S23) brings hyperscaler and frontier-lab orchestration capabilities to AI workflows. Our core capability is live migration for CPU and GPU workloads. This delivers cost savings of up to 80%, accelerates time to first token by 2-10x, and enables stateful reliability of training jobs even through catastrophic GPU failures. We've integrated our solution into K8s, and we support Kueue and Slurm for distributed training jobs and KServe for serving inference. OpenAI, Meta and Microsoft have flavors of these capabilities internally and we’re bringing them to everyone. Our vision is to transform cloud compute into a real-time, arbitraged commodity. https://www.cedana.ai
CEO of Cedana. Previously CEO/co-founder of Engooden, AI-powered chronic disease management proven to improve outcomes and lower costs for patients (Series B). VP of Corp Development for Petra Systems (predictive smart grid/solar company), scaled from $0 to $70M ARR. At TL Ventures ($1.6B VC fund), invested across semi, software and systems. Built a system for large-scale, automated ML and computer vision at MIT CSAIL. Patents and publications in AI and computer vision.
Co-founder/CTO at Cedana. Previously Robotics at Shopify. Previously previously Mech/Aero, published in control systems, neuromorphic computing and satellite formation flight.
Solving critical GPU performance, reliability, capacity and cost problems
Cedana provides real-time migration for compute, automatically scheduling and moving workloads across instances and vendors to maximize utilization and prevent progress loss. It supports AI training/inference, HPC, ML Ops, and more, with open-source and managed options, aiming to reduce idle resources, meet job SLAs, and enable fast suspend-resume and vendor aggregation.
From the original launch (Aug 2023) — may be outdated.
