Ingest data
We support high-throughput writes, comfortably handling 4 GB per second. Our client SDK maintains 1:1 parity with the open-source version, so existing users can migrate without refactoring.
LanceDB supports table creation using multiple data formats, including:
- Pandas DataFrames (example below)
- Polars DataFrames
- Apache Arrow Tables
For the Python SDK, you can also define tables flexibly using:
- PyArrow schemas (for explicit schema control)
- LanceModel (a Pydantic-based model for structured data validation and serialization)
This ensures compatibility with modern data workflows while maintaining performance and type safety.
Insert data
The vector column must be a PyArrow fixed-size list type (pyarrow.FixedSizeListType), for example pa.list_(pa.float32(), 128).
Using Pydantic Models
Using Nested Models
You can use nested Pydantic models to represent complex data structures. For example, you may want to store the document string and the document source name as a nested Document object:
This can be used as the type of a LanceDB table column:
This creates a struct column called document that has two subfields called content and source.
Insert large datasets
When creating a table from a large dataset in one go, it is recommended to use iterators to add the data in batches. Data will be automatically compacted for the best query performance.
Explore the full documentation in our SDK guides: Python and TypeScript.