LLM Stats

Active

Independent AI evaluations lab

Summer 2025 · Founded 2025 · 2 people · New York, NY, USA

llm-stats.com

About

We build independent, contamination-proof benchmarks that measure real-world performance. LLM Stats is the most complete LLM leaderboard: we maintain the most complete archive of LLM benchmark results and run our own independent evaluations, which go beyond the classical benchmarks already present in the training data of most models. Our mission: become the biggest community dedicated to AI transparency.

From their website

as of Jun 7, 2026 · llm-stats.com

SaaS · Subscription · Pricing details are shown per model in the leaderboard (e.g., $/M tok) and not listed as a single site-wide price; exact subscription tiers are not provided in the text.

LLM Stats is an independent AI evaluations lab that maintains the AI Leaderboard, ranking 300+ AI models by intelligence, speed, and price. It provides continuous updates from public benchmarks and live API metrics, with filters and methodology explanations for comparing models.

The product aggregates public benchmark results and live API performance to compute the LLM Stats Score for hundreds of AI models. It presents a sortable leaderboard with filters, model details, pricing data, and methodology explanations. Users can view full leaderboards (300+ models), compare two models, explore benchmarks (e.g., MMLU, GPQA, SWE-Bench, AIME), and read model-specific pages that include performance metrics, context windows, and pricing pulled from provider lists and proxy checks. Pricing and metadata revalidate hourly, with a 7-day rolling average for live performance updates.
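The 7-day rolling average mentioned above can be illustrated with a minimal sketch. This is not LLM Stats' actual implementation, which the text does not describe; the function name `rolling_mean`, the sample data, and the choice of one measurement per day are all assumptions made for illustration.

```python
from collections import deque
from statistics import fmean


def rolling_mean(values, window=7):
    """Return the trailing mean at each position over the last `window` values.

    Hypothetical helper: until `window` values have been seen, the mean is
    taken over however many values are available so far.
    """
    recent = deque(maxlen=window)  # oldest value is dropped automatically
    smoothed = []
    for value in values:
        recent.append(value)
        smoothed.append(fmean(recent))
    return smoothed


# Made-up daily throughput samples (tokens per second), one per day.
daily_tokens_per_second = [100.0, 110.0, 90.0, 100.0, 120.0, 80.0, 100.0, 170.0]
smoothed = rolling_mean(daily_tokens_per_second)
```

With a trailing window like this, the single spike on the last day moves the reported figure from 100.0 to 110.0 rather than to 170.0, which is the usual reason for smoothing live measurements before ranking on them.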

Who it’s for: Researchers, AI practitioners, product teams evaluating LLMs, and decision-makers who compare models for reasoning, coding, writing, and other capabilities.

Features

Independent model rankings
Composite scoring from benchmarks and pricing
Live API performance metrics
Advanced filters and comparisons
Full leaderboard with 300+ models
Model detail pages with pricing and context data
Methodology explanations and FAQs

The product is active, with regular leaderboard updates, new model entries, and public-facing pricing data. Mentions of ongoing rankings and methodology indicate product traction and ongoing data collection.

Founders · 2

Jonathan Chávez · Founder

Co-Founder at CallingBox. Previously, I was the founder of LLM-Stats.com (500k MAU) and an early employee on the LLM Observability team at Datadog. I did undergrad research on Vision Transformers for particle physics and RL for robotics.

Sebastian Crossa · Co-Founder

Co-Founder @ LLM Stats. Previously founding engineer at Micro, building the future of email (backed by a16z), and founding engineer at Atrato Pago (W21). Formerly built and scaled Minecraft servers in my spare time during high school.

Launch

Launched on Y Combinator · Aug 2025

A tool to evaluate and optimize AI agents using human feedback.

ZeroEval builds tools to evaluate and improve AI agents via human-informed feedback. It offers calibrated LLM judges that learn from production data and incorrect samples, plus Autotune for automatic evaluation and prompt optimization across multiple models.