platoseed
Simulation Engine for Benchmarking AI Products
Kashikoi is a simulation engine for benchmarking AI agents. We generate CPU-friendly world models that autonomously interview agents and produce deep behavioral assessments. We built similar technology at Moveworks, where it was used to ship 250+ enterprise agents to customers daily.
Kashikoi provides an all-in-one simulation platform to build, evaluate, and test AI agents with real-world-like scenarios. It emphasizes automated, code-free benchmarking to help ship AI with confidence and reduce manual evals.
Kashikoi offers a simulation engine that connects to your AI stack via custom integrations, runs realistic scenarios, and tracks performance metrics to identify edge cases. It enables testing of agents through active simulations and provides actionable insights and synthetic data to optimize prompts, fine-tune models, and improve agent performance. The platform highlights metrics such as accuracy, context understanding, and response times, with cost-per-call and prompt-ready analytics, plus a 3-step workflow from connecting agents to deriving insights.
Who it's for: AI product teams and developers responsible for benchmarking and improving AI agents; teams seeking risk-free testing and faster shipping of AI capabilities.
Mentions of YC backing, demo booking, performance dashboards, and real-world scenario testing indicate traction and growth activity.
Founder, Kashikoi. Aaksha led the Simulation & Evaluation stack at Moveworks (ServiceNow), shipping 250+ customized, enterprise-ready agents to Fortune 500 companies and federal agencies. Aaksha has done award-winning, NSF-sponsored research on Transformers at CMU (long before OpenAI made them cool). At Apple, she shipped edge speech recognition models on 1bn+ iPhones (for the most esoteric dialects you can think of). The innovation behind this was nominated for a Best Paper at Interspeech 2021.
Autonomously interview your Agents!
Kashikoi builds a simulation engine that uses world models to benchmark GenAI agents by autonomously interviewing them and generating deep behavioral evaluations. It targets teams evaluating and improving AI agents, offering automatic prompt optimization, long-term evaluation data, and scalable test-time adaptation without hand-written prompts.
Formerly "EigenAI"

Specialized AI for Critical Industries

Build AI coworkers using natural language. It's Lovable for agents.