- Ingest data into tables from JSON data (and in Python, Pandas or Polars DataFrames)
- Create empty tables by defining explicit Arrow schemas
- Vector similarity search with filtering and projections
- Filtered queries that can operate on nested structs
- Interoperate with DuckDB and run traditional SQL queries on an Arrow table (Python)
Dataset
We’ll work with this small dataset based on characters from the legends of Camelot. Note that thevector column holds 4-dimensional embeddings, and the stats column is a nested struct
with several integer fields, indicating each character’s attributes.
camelot.json
Connect to a database
We start by connecting to a LanceDB database path. If you’re using LanceDB Cloud or Enterprise, replace the local connection string with the appropriate remote URI and authentication details.Create a table and ingest data
From JSON
LanceDB stores records in Lance tables. Each row is a record and each column holds a field or related metadata. The simplest way to start is to obtain the source data as a list of JSON records that includes a vector column and any metadata fields you care about. Load the data from the JSON file: You can now create a LanceDB table from the loaded data. Use themode="overwrite" to
replace any existing table with the same name and overwrite its data (useful during
initial testing).
If you want to avoid overwriting an existing table, omit the overwrite mode.
From Pandas DataFrames
Python Only You can create LanceDB tables directly from Pandas DataFrames. Simply obtain the source data as a Pandas DataFrame, then create the table and directly ingest to it.From Polars DataFrames
Python Only You can also create LanceDB tables directly from Polars DataFrames. Simply obtain the source data as a Polars DataFrame, then create the table and directly ingest to it.From an Arrow schema
If you want to create an empty table without any data — say you want to define the schema first and then incrementally add data later — you can do so by defining an Arrow schema explicitly. Once the empty table is defined, LanceDB is ready to accept new data via theadd method, as shown in the next section.
Append data to a table
LanceDB tables are mutable, and you can append new records to existing tables. If you’re starting with a fresh session, connect to the database and open the existing table namedcamelot.
Prepare the new records to add. Here, we add two new magical characters
via the add method.
We now have two new records in the table. Let’s begin to query our data!
Vector search
It’s straightforward to run vector similarity search in LanceDB. Let’s answer some questions about the data using vector search with projections (returning only the desired columns).Q1: Who are the characters similar to “wizard”?
| name | role | description |
|---|---|---|
| Merlin | Wizard and Advisor | A powerful wizard and prophet |
| The Lady of the Lake | Mystical Guardian | A mysterious supernatural figu… |
| Morgan le Fay | Sorceress | A powerful enchantress, Arthur… |
| Queen Guinevere | Queen of Camelot | Arthur’s queen, admired for he… |
| Sir Galahad | Knight of the Round Table | The purest and most virtuous k… |
where method, where you can
pass in SQL-like expressions.
Q2: Who are the characters similar to “wizard” with high magic stats?
| name | role | description |
|---|---|---|
| Merlin | Wizard and Advisor | A powerful wizard and prophet |
| The Lady of the Lake | Mystical Guardian | A mysterious supernatural figu… |
| Morgan le Fay | Sorceress | A powerful enchantress, Arthur… |
Filtered search
You can also run traditional analytics-style search queries that do not involve vectors. For example, let’s find the strongest characters in the dataset. In the query below, we leave thesearch method empty to indicate
that we don’t want to use any vector for similarity search (in TypeScript, use query() instead),
and use the where method to filter on the strength field.
Q3: Who are the strongest characters?
| name | role | description |
|---|---|---|
| Sir Galahad | Knight of the Round Table | The purest and most virtuous k… |
| Sir Gawain | Knight of the Round Table | A noble and honorable knight k… |
| Sir Percival | Knight of the Round Table | A loyal and innocent knight wh… |
| Sir Lancelot | Knight of the Round Table | Arthur’s most skilled knight, … |
| Mordred | Traitor Knight | Arthur’s treacherous son or ne… |
SQL queries using DuckDB
Python Only You can leverage a full SQL engine like DuckDB to run more complex queries that involve sorting, aggregations, joins, and so on by converting LanceDB tables to DuckDB tables.| name | role | description |
|---|---|---|
| Sir Galahad | Knight of the Round Table | The purest and most virtuous knight, chosen to… |
| Sir Lancelot | Knight of the Round Table | Arthur’s most skilled knight, famed for unmatc… |
| Sir Gawain | Knight of the Round Table | A noble and honorable knight known for his cou… |
| Sir Percival | Knight of the Round Table | A loyal and innocent knight whose bravery and … |
| Mordred | Traitor Knight | Arthur’s treacherous son or nephew who ultimat… |
Delete data
You can delete rows from a LanceDB table using thedelete method with
a filtering expression.
Say we want to throw away Mordred, the traitor knight, from our table.
This will delete the row(s) where the role column matches “Traitor Knight”.
You can verify that the row has been deleted by running a search query again,
and confirming that Mordred no longer appears in the results.
Add column
We can also add new columns to an existing LanceDB table using theadd_columns method.
For this example, let’s add a new float column named power that shows the average
of each character’s strength, courage, magic, and wisdom stats.
The example above shows how a cast expression can be used to ensure the
average total stats is available as a float column, that is then converted to
an Arrow float type under the hood for the Lance table.
We can display the results of this column in descending order of power.
Q4: Who are the most powerful characters?Note that LanceDB’s
where only filters rows, but doesn’t sort them by applying an ORDER BY
clause that you may be used to when working with SQL databases. In TypeScript, you can sort the
returned array in application code.
Python Only
You can also sort the results after converting them to a Polars DataFrame, as shown in the example above.
Python
| name | role | description | power |
|---|---|---|---|
| Merlin | Wizard and Advisor | A powerful wizard and prophet … | 4.0 |
| Sir Galahad | Knight of the Round Table | The purest and most virtuous k… | 4.0 |
| The Lady of the Lake | Mystical Guardian | A mysterious supernatural figu… | 3.75 |
| Sir Lancelot | Knight of the Round Table | Arthur’s most skilled knight, … | 3.5 |
| Sir Gawain | Knight of the Round Table | A noble and honorable knight k… | 3.5 |
Drop column
If you want to remove or delete a column from an existing LanceDB table, you can use thedrop_columns method.
This will remove the power column we added earlier from the table schema.
Drop table
If you want to delete an entire table from the database, you can use thedrop_table method.
This will delete the camelot table from the connected LanceDB database.
See the full code for these examples (including helper functions) in the
basic_usage file for the appropriate client language in the
docs repo.What about vector indexes?
LanceDB supports vector indexes to speed up similarity search on large datasets. For datasets up to a few hundred thousand vectors, LanceDB’s highly efficient kNN (brute-force) retrieval of nearest neighbors is often sufficient. As your dataset grows larger, you can create vector indexes on your vector columns to accelerate search. See the indexing documentation for details on how to create and use vector indexes in LanceDB.What’s next?
Now that you’ve learned the basics of creating tables, adding data, running vector search, and modifying table schemas, you’re ready to explore more advanced features of LanceDB. Below are some suggested next pages.Ingesting data
Learn the different approaches to creating and ingesting data into LanceDB tables from various sources.
Update and modify tables
Learn how to update and modify existing LanceDB tables and their data.
Schema & data evolution
Understand how to evolve your table schemas and data over time with LanceDB.
Versioning and time travel
Explore LanceDB’s built-in table versioning and time travel capabilities.