LanceDB

LanceDB is a multimodal lakehouse for AI teams that need one data layer for curation, feature engineering, search and retrieval, and model training. It is built on top of Lance, an open-source lakehouse format designed for multimodal AI data. Move from data exploration to model training on one unified platform, without managing a fragmented stack of storage, feature, retrieval, and training systems.

Build better models, faster

Training data and experimentation slow down when raw data, metadata, embeddings, features, and governance artifacts live in separate systems. LanceDB keeps them together in one versioned multimodal table, so AI teams spend less time stitching infrastructure together and more time improving datasets, testing features, and keeping GPUs fed.

Training data lifecycle: Curation, Feature Engineering, Search and Retrieval, Training

Use the same table to curate training data, add derived features, retrieve examples, and feed training jobs that rely on expensive GPUs. Training workloads can sample, shuffle, and scan projected columns from local storage or object storage, then assemble GPU-ready batches from a tagged dataset version. For a deeper look at how this works in training pipelines, start with Why LanceDB for training.

LanceDB suite

The LanceDB suite includes LanceDB OSS, an open-source embedded retrieval library, and LanceDB Enterprise, a multimodal lakehouse platform for the full AI data lifecycle. OSS is easy to set up on a local machine for search and regular-scale workflows. LanceDB Enterprise is built for teams that need scale without building bespoke infrastructure for curation, feature engineering, search and retrieval, and efficient training data access.

Why teams use LanceDB

One table for the whole AI data loop

Store images, video, audio, text, annotations, embeddings, and model-generated features together in one schema-enforced table. The same table can support dataset curation, feature backfills, experiment splits, retrieval, and training.

High-throughput data access for training

Training workloads mix fast random access with high-throughput sequential scans. LanceDB is designed for both, so teams can shuffle data into GPU-ready batches more efficiently, improve input throughput, and iterate on experiments faster.
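To make the two access patterns concrete, here is a minimal sketch in plain Python. It does not use the LanceDB API: the in-memory `table` dict and the `scan` and `shuffled_batches` helpers are hypothetical names invented for illustration. It only shows what a projected sequential scan and a seeded, shuffled random-access read look like over a columnar layout.

```python
import random

# Hypothetical in-memory columnar table: one list per column, rows align by index.
# This stands in for a real table purely for illustration.
table = {
    "id": list(range(10)),
    "label": [i % 2 for i in range(10)],
    "caption": [f"sample {i}" for i in range(10)],
    "image_bytes": [bytes([i]) * 4 for i in range(10)],
}


def num_rows(tbl):
    """Number of rows, taken from the length of any column."""
    return len(next(iter(tbl.values())))


def scan(tbl, columns, batch_size):
    """Sequential scan: yield batches in storage order, reading only `columns`."""
    for start in range(0, num_rows(tbl), batch_size):
        yield {c: tbl[c][start:start + batch_size] for c in columns}


def shuffled_batches(tbl, columns, batch_size, seed):
    """Random access: visit rows in a seeded permutation, reading only `columns`."""
    order = list(range(num_rows(tbl)))
    random.Random(seed).shuffle(order)  # same seed gives the same epoch order
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield {c: [tbl[c][i] for i in idx] for c in columns}


# Project two small columns; the large `image_bytes` column is never touched.
scanned = list(scan(table, ["id", "label"], batch_size=4))
epoch = list(shuffled_batches(table, ["id", "label"], batch_size=4, seed=0))
```

In this sketch the shuffled read visits every row exactly once per epoch and is reproducible for a fixed seed, which is the property a training loop typically wants from a permutation-based loader.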

Fast, versatile search and retrieval

Whether the end user is a human or an agent, LanceDB powers production retrieval workloads such as semantic search, hybrid search, RAG, agent memory, and recommendation systems. Retrieval runs against the same LanceDB tables used for curation, feature engineering, and training workflows.
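As a rough illustration of semantic search combined with a metadata filter, the following plain-Python sketch ranks rows by cosine similarity to a query vector. It is a brute-force toy, not LanceDB's implementation or API: the `rows` data, the `search` helper, and its `where` parameter are all assumptions made for this example.

```python
import math

# Hypothetical rows with a tiny 3-dimensional embedding each.
rows = [
    {"id": 1, "text": "a cat on a sofa", "kind": "image", "vector": [0.9, 0.1, 0.0]},
    {"id": 2, "text": "a dog in a park", "kind": "image", "vector": [0.1, 0.9, 0.0]},
    {"id": 3, "text": "kitten video", "kind": "video", "vector": [0.8, 0.2, 0.1]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(rows, query_vector, k, where=None):
    """Return the top-k rows by cosine similarity, optionally pre-filtered."""
    candidates = [r for r in rows if where is None or where(r)]
    ranked = sorted(
        candidates,
        key=lambda r: cosine(query_vector, r["vector"]),
        reverse=True,
    )
    return ranked[:k]


# Unfiltered search, then the same query restricted to image rows.
hits = search(rows, [1.0, 0.0, 0.0], k=2)
image_hits = search(rows, [1.0, 0.0, 0.0], k=2, where=lambda r: r["kind"] == "image")
```

A production system would replace the exhaustive scan with a vector index; the sketch only shows how a similarity ranking and a filter compose.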

Start with your workload

Train and fine-tune models

Learn why LanceDB works well as the data layer for training workloads.

Load data into PyTorch

Use LanceDB tables and permutations for projected, shuffled, random-access training reads.

Browse ready-to-use datasets

Explore Lance-formatted multimodal datasets with raw bytes, metadata, embeddings, and indices.

Build search and retrieval

Use vector search, full-text search, hybrid search, reranking, filtering, and SQL.
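Hybrid search combines a vector ranking with a full-text ranking. One common way to merge the two lists is reciprocal rank fusion (RRF); the sketch below shows that technique in plain Python under the assumption that each search has already returned an ordered list of ids. The function name, the constant `k=60`, and the sample rankings are illustrative choices, and this is not a statement about how LanceDB fuses results internally.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists: each id scores sum(1 / (k + rank))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a vector search and a full-text search.
vector_ranking = ["a", "b", "c"]
keyword_ranking = ["b", "d", "a"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
```

Ids that appear near the top of both lists rise in the fused order, while ids found by only one search still appear lower down.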

From local development to production scale

LanceDB OSS and LanceDB Enterprise share the same Lance format and table model. Start locally with the embedded OSS library, then move to Enterprise when your team needs distributed scale, managed infrastructure, private deployment, or higher-throughput curation, feature engineering, search and retrieval, and training workflows.

1. LanceDB OSS

The fastest way to get started is the open-source embedded library, with client SDKs in Python, TypeScript, and Rust. Run it locally in a few steps to explore datasets, curate data, and run search and retrieval workloads for agents. Start here:

Quickstart

Get started with LanceDB in minutes.

Basic Table Operations

Create tables, evolve schemas, version data, and modify rows in LanceDB.

2. LanceDB Enterprise

LanceDB Enterprise is a petabyte-scale (and beyond), distributed multimodal lakehouse platform built for search, curation, feature engineering, and high-throughput training data access workflows on top of the same core table abstraction. This eliminates the need for teams to build bespoke infrastructure to manage large multimodal datasets. To set up LanceDB Enterprise in your organization, reach out to us at contact@lancedb.com.

Built with scale, performance, and security in mind. LanceDB Enterprise is designed for very large-scale, high-performance, distributed workloads in private deployments, and can operate under strict security requirements.

Quickstart

Get started with LanceDB in minutes, including Enterprise db:// connections.
