
LLM Stats
Active
Independent AI evaluations lab
About
We build independent, contamination-proof benchmarks that measure real-world performance. LLM Stats is the most complete LLM leaderboard: we maintain the most complete archive of LLM benchmark results and also run independent evaluations that go beyond the classic benchmarks already present in most models' training data. Our mission: become the biggest community dedicated to AI transparency.
Founders · 2
Co-Founder at CallingBox. Previously, I founded LLM-Stats.com (500k MAU) and was an early employee on the LLM Observability team at Datadog. I did undergrad research on Vision Transformers for particle physics and RL for robotics.
Co-Founder @ LLM Stats. Previously a founding engineer at Micro, building the future of email (backed by a16z), and a founding engineer at Atrato Pago (W21). I built and scaled Minecraft servers in my spare time during high school.
Launch
A tool to evaluate and optimize AI agents using human feedback.
ZeroEval builds tools to evaluate and improve AI agents via human-informed feedback. It offers calibrated LLM judges that learn from production data and incorrect samples, plus Autotune for automatic evaluation and prompt optimization across multiple models.
Formerly “ZeroEval”, “CallingBox”