platoseed
Open data lakehouse for biology
Query, trace, and validate datasets and models at scale. Automate context for agents and humans. One API: lakehouse, lineage, feature store, ontologies, bio-registries & formats.
Lamin provides an open data platform and lakehouse for biology, enabling tracked data management, collaboration, and learning at scale. It emphasizes lineage, support for bio-formats, and zero lock-in across databases and storage through open standards.
Lamin combines a lakehouse with support for bio-formats and ontologies to manage datasets and models. It allows querying and batch-loading across formats such as Parquet, Zarr, AnnData, and SpatialData, while unifying metadata in relational sheets that stay synced with the datasets. It offers lineage tracking with a single line of code, dataset annotation using schemas, and fine-grained permissions administered at the database and storage level. The platform runs on Postgres or SQLite, provides unified access to storage (local, S3, GCP, Azure), and integrates with PyData and R workflows, Nextflow, and other tools. It provides both an open-source core (LaminDB) and hosted offerings (LaminHub), with options for hosted databases, SSO, audit logs, and compliance features, while keeping zero lock-in and open standards across the infrastructure.
Who it's for: Biology researchers, bioinformaticians, data scientists, wet-lab teams, and institutions that need an open, lineage-aware data lakehouse for biological datasets and models.
Signals: a pricing page, the hosted LaminHub offering, academic discounts, evidence of productized hosted plans, multiple integrations, and an open-source core.
Building open-source data infra for biology at Lamin. Previously, created Scanpy and led the build-up of Cellarity's compute platform.
