Lance is an open-source lakehouse format, which provides the foundation for LanceDB’s capabilities. Lance combines the performance of Apache Arrow with advanced features designed specifically for AI workloads.

Lance format documentation

Learn more about the Lance format by reading the docs.

How Lance Enables the Multimodal Lakehouse

Lance is a file format, table format, and catalog spec for multimodal AI, allowing developers to build a complete open lakehouse on top of object storage to power AI workflows. The format brings high-performance vector search, full-text search, random access, and feature engineering capabilities to a single unified system, eliminating the need for multiple specialized databases. Unlike traditional vector databases that only store embeddings alongside the metadata, LanceDB’s multimodal lakehouse stores both the original data (including image, video or audio bytes) and its vector representations alongside traditional tabular data in the same efficient format.

Advantages of the Lance format

| Advantage | Description |
| --- | --- |
| Multimodal storage | Efficiently holds vectors, images, videos, audio, text, and more |
| Version control | Built-in data versioning for reproducible ML experiments and data lineage |
| ML-optimized | Designed for training and inference workloads with fast random access |
| Query performance | Columnar storage enables blazing-fast vector search and analytics |
| Cloud-native | Seamless integration with cloud object stores (S3, GCS, Azure Blob) |

Key concepts

The following concepts are core to the Lance format:
1. Data storage is columnar and is interoperable with other columnar formats (such as Parquet) via Arrow.
2. Data is divided into fragments. Each fragment is a chunk of a Lance dataset that represents a subset of its rows, and consists of multiple files, each containing several columns of that chunk.
3. Data is versioned: each insert operation creates a new version of the dataset and updates the manifest, which tracks versions via metadata.
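
The fragment layout described above can be pictured with a small toy model. This is only a conceptual sketch in plain Python, not the Lance on-disk format or API: the `DataFile` and `Fragment` classes, the `scan_column` helper, and the sample columns are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataFile:
    # Toy stand-in: one file stores a few whole columns for its fragment's rows.
    columns: Dict[str, list]


@dataclass
class Fragment:
    # Toy stand-in: a fragment covers one chunk of rows, and its columns
    # may be spread over several data files.
    files: List[DataFile]

    def num_rows(self) -> int:
        first_column = next(iter(self.files[0].columns.values()))
        return len(first_column)

    def read_column(self, name: str) -> list:
        # Only the file that holds the requested column needs to be read.
        for data_file in self.files:
            if name in data_file.columns:
                return data_file.columns[name]
        raise KeyError(name)


def scan_column(fragments: List[Fragment], name: str) -> list:
    # A column scan concatenates that column across all fragments.
    values = []
    for fragment in fragments:
        values.extend(fragment.read_column(name))
    return values


fragments = [
    Fragment([
        DataFile({"id": [1, 2], "text": ["a", "b"]}),
        DataFile({"vector": [[0.1, 0.2], [0.3, 0.4]]}),
    ]),
    Fragment([
        DataFile({"id": [3], "text": ["c"]}),
        DataFile({"vector": [[0.5, 0.6]]}),
    ]),
]

ids = scan_column(fragments, "id")
total_rows = sum(fragment.num_rows() for fragment in fragments)
print(ids, total_rows)
```

In this sketch, reading the `id` column never touches the files that hold `vector`, which is the property that makes columnar layouts attractive for analytics over wide multimodal rows.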

Data versioning

Data in Lance tables is versioned, which helps keep LanceDB scalable and consistent. Old versions are not removed immediately when new ones are created, because other clients might still be in the middle of querying them. It’s important to retain older versions for as long as they might be queried. Each version contains metadata and only the data that was added or updated in that transaction, so 100 versions are not 100 duplicates of the same data. However, they do carry 100x the metadata overhead of a single version, which can result in slower queries.
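
To make the trade-off concrete, here is a minimal sketch of append-only versioning, assuming a simplified model in which every version has a manifest listing the fragments it can see. The `ToyDataset` class and its methods are hypothetical and are not part of the Lance or LanceDB API.

```python
from typing import Dict, List, Optional, Tuple


class ToyDataset:
    """Toy model: immutable fragments plus one manifest per version."""

    def __init__(self) -> None:
        # fragment id -> rows; a fragment is written once and never copied
        self.fragments: Dict[int, List[dict]] = {}
        # manifests[v] lists the fragment ids visible in version v;
        # version 0 is the empty dataset
        self.manifests: List[Tuple[int, ...]] = [()]

    def insert(self, rows: List[dict]) -> int:
        # New rows go into a new fragment; the new manifest reuses
        # all earlier fragments by reference.
        fragment_id = len(self.fragments)
        self.fragments[fragment_id] = list(rows)
        self.manifests.append(self.manifests[-1] + (fragment_id,))
        return len(self.manifests) - 1

    def read(self, version: Optional[int] = None) -> List[dict]:
        if version is None:
            version = len(self.manifests) - 1
        return [
            row
            for fragment_id in self.manifests[version]
            for row in self.fragments[fragment_id]
        ]


ds = ToyDataset()
for i in (1, 2, 3):
    ds.insert([{"id": i}])

# Data is stored once, no matter how many versions reference it.
stored_rows = sum(len(rows) for rows in ds.fragments.values())
# Metadata grows with every version, since each manifest lists its fragments.
manifest_entries = sum(len(manifest) for manifest in ds.manifests)
print(stored_rows, manifest_entries)
```

Older versions stay readable (`ds.read(1)` still returns only the first row), the row data is never duplicated, and the number of manifest entries keeps growing with each insert. That growth is the metadata overhead that compaction and version cleanup are meant to keep in check.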

Data compaction

As you insert more data, your dataset will grow and accumulate fragments, and you’ll need to perform compaction to maintain query performance (that is, to keep latencies low). Compaction is the process of merging fragments together to reduce the amount of metadata that needs to be managed and the number of files that need to be opened while scanning the dataset.

Performance Optimization Through Compaction

Compaction performs the following tasks in the background:
  • Removes deleted rows from fragments
  • Removes dropped columns from fragments
  • Merges small fragments into larger ones
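
The three tasks above can be sketched as a single pass over a toy dataset. This is an illustrative model under simplifying assumptions, not how Lance implements compaction: the `compact` function, the dictionary-based fragment layout, and the `max_rows` parameter are hypothetical, and the sketch rewrites every fragment rather than only the small ones.

```python
from typing import Dict, List, Set


def compact(fragments: List[Dict], dropped_columns: Set[str], max_rows: int) -> List[Dict]:
    """Rewrite toy fragments: drop deleted rows and dropped columns, then merge."""
    live_rows = []
    for fragment in fragments:
        for offset, row in enumerate(fragment["rows"]):
            # Task 1: skip rows that were marked as deleted.
            if offset in fragment["deleted"]:
                continue
            # Task 2: keep only columns that still exist in the schema.
            live_rows.append(
                {name: value for name, value in row.items() if name not in dropped_columns}
            )
    # Task 3: repack the surviving rows into fewer, larger fragments.
    return [
        {"rows": live_rows[start:start + max_rows], "deleted": set()}
        for start in range(0, len(live_rows), max_rows)
    ]


fragments = [
    {"rows": [{"id": 1, "tmp": "x"}, {"id": 2, "tmp": "y"}], "deleted": {1}},
    {"rows": [{"id": 3, "tmp": "z"}], "deleted": set()},
    {"rows": [{"id": 4, "tmp": "w"}], "deleted": set()},
]

compacted = compact(fragments, dropped_columns={"tmp"}, max_rows=4)
print(len(fragments), "fragments ->", len(compacted), "fragment")
```

After this pass, a scan opens one fragment instead of three and no longer has to filter out the deleted row or skip the dropped `tmp` column.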

Data deletion and recovery

Although Lance allows you to delete rows from a dataset, it does not physically remove the data immediately. Instead, it marks the rows as deleted in a deletion file associated with the fragment that contains them. For a given version of the dataset, each fragment can have at most one deletion file (a fragment from which no rows were ever deleted has none). This is important to keep in mind because it means that the data is still there and can be recovered if needed, as long as that version still exists under your backup policy.
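
The following sketch shows why a soft delete is recoverable. It assumes a simplified model in which each version maps a fragment to at most one set of deleted row offsets, standing in for a deletion file. The `ToyTable` class and its methods are hypothetical and do not mirror the Lance or LanceDB API.

```python
from typing import Callable, Dict, FrozenSet, List


class ToyTable:
    """Toy model: immutable fragments plus per-version deletion sets."""

    def __init__(self, fragments: List[List[dict]]) -> None:
        self.fragments = fragments  # row data is never modified by a delete
        # versions[v] maps fragment index -> deleted row offsets;
        # a fragment with no deletions has no entry at all
        self.versions: List[Dict[int, FrozenSet[int]]] = [{}]

    def delete(self, predicate: Callable[[dict], bool]) -> int:
        current = self.versions[-1]
        updated = dict(current)
        for index, rows in enumerate(self.fragments):
            hits = {offset for offset, row in enumerate(rows) if predicate(row)}
            if hits:
                # Merge with earlier deletions so there is still one set per fragment.
                updated[index] = frozenset(current.get(index, frozenset()) | hits)
        self.versions.append(updated)
        return len(self.versions) - 1

    def read(self, version: int = -1) -> List[dict]:
        deletions = self.versions[version]
        return [
            row
            for index, rows in enumerate(self.fragments)
            for offset, row in enumerate(rows)
            if offset not in deletions.get(index, frozenset())
        ]


table = ToyTable([[{"id": 1}, {"id": 2}], [{"id": 3}]])
new_version = table.delete(lambda row: row["id"] == 2)

latest = table.read()      # the deleted row is hidden
recovered = table.read(0)  # the earlier version still sees every row
print(latest, recovered)
```

Because the delete only adds a deletion set in the new version, reading the earlier version returns the row untouched. Once that earlier version is cleaned up, or the fragment is rewritten by compaction, the row can no longer be recovered this way.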

Learn more about Lance

Lance is a separate open-source project. Check out its documentation to learn more.