In this tutorial, you’ll ingest a dataset from Hugging Face into a LanceDB Cloud table,
connect to a remote LanceDB cluster, and run some search queries.
For interactive code, check out the Python notebook or the TypeScript example.
Getting started
- Sign up for LanceDB Cloud.
- Follow this tutorial to create a LanceDB Cloud project.
1. Installation
pip install lancedb datasets
2. Connect to LanceDB
- For LanceDB Cloud users, the database URI (which starts with
db://) and API key can both be retrieved from the LanceDB Cloud UI.
- For LanceDB Enterprise users, please contact us to obtain your database URI, API key, and
host_override URL.
import lancedb
import numpy as np
import pyarrow as pa
import os
# Connect to LanceDB Cloud/Enterprise
uri = "db://your-database-uri"
api_key = "your-api-key"
region = "us-east-1"
# (Optional) For LanceDB Enterprise, set the host override to your enterprise endpoint
host_override = os.environ.get("LANCEDB_HOST_OVERRIDE")
db = lancedb.connect(
uri=uri,
api_key=api_key,
region=region,
host_override=host_override
)
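Rather than hard-coding credentials, you may prefer to read all of the connection settings from the environment. A minimal sketch (the LANCEDB_URI, LANCEDB_API_KEY, and LANCEDB_REGION variable names are assumptions for this example, not a LanceDB convention):

```python
import os

def lancedb_config():
    """Collect connection settings from environment variables.

    The variable names below are hypothetical; use whatever naming
    convention your deployment follows.
    """
    return {
        "uri": os.environ.get("LANCEDB_URI", "db://your-database-uri"),
        "api_key": os.environ.get("LANCEDB_API_KEY", ""),
        "region": os.environ.get("LANCEDB_REGION", "us-east-1"),
        # Enterprise only; stays None for LanceDB Cloud
        "host_override": os.environ.get("LANCEDB_HOST_OVERRIDE"),
    }

cfg = lancedb_config()
```

You would then pass the settings along with lancedb.connect(**cfg).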
3. Load Dataset
For large datasets, the load should be performed in batches to keep memory usage bounded.
Here we load a 1,000-row sample of a larger dataset.
from datasets import load_dataset
# Load a sample dataset from Hugging Face with pre-computed embeddings
sample_dataset = load_dataset("sunhaozhepy/ag_news_sbert_keywords_embeddings", split="test[:1000]")
print(f"Loaded {len(sample_dataset)} samples")
print(f"Sample features: {sample_dataset.features}")
print(f"Column names: {sample_dataset.column_names}")
# Preview the first sample
print(sample_dataset[0])
# Get embedding dimension
vector_dim = len(sample_dataset[0]["keywords_embeddings"])
print(f"Embedding dimension: {vector_dim}")
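The snippet above loads only the first 1,000 rows. When the full dataset is too large to materialize, a common pattern is to stream it in fixed-size batches and call table.add once per batch. A sketch of the chunking logic (the batch sizes here are arbitrary):

```python
def batched(iterable, batch_size):
    """Yield successive lists of up to batch_size items."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch

# With a real table you would stream the dataset like:
#   for chunk in batched(sample_dataset, 500):
#       table.add(chunk)
chunks = list(batched(range(1000), 300))  # 3 full batches of 300, then one of 100
```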
4. Ingest Data
import pyarrow as pa
# Create a table with the dataset
table_name = "lancedb-cloud-quickstart"
table = db.create_table(table_name, data=sample_dataset, mode="overwrite")
# Convert the vector column from a variable-length list to a fixed-size list
table.alter_columns(dict(path="keywords_embeddings", data_type=pa.list_(pa.float32(), vector_dim)))
print(f"Table '{table_name}' created successfully")
5. Build an Index
After creating a table with vector data, you’ll want to create an index to enable fast similarity searches. The index creation process optimizes the data structure for efficient vector similarity lookups, significantly improving query performance for large datasets.
Unlike in LanceDB OSS, the create_index/createIndex operation executes asynchronously in LanceDB Cloud/Enterprise. To ensure the index is fully built, you can use the wait_timeout parameter or call wait_for_index on the table.
from datetime import timedelta
# Create a vector index and wait for it to complete
table.create_index("cosine", vector_column_name="keywords_embeddings", wait_timeout=timedelta(seconds=120))
print(table.index_stats("keywords_embeddings_idx"))
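The "cosine" argument selects the distance metric the index will use. As a reminder of what that metric computes, here it is in plain numpy (an illustration of the formula, not LanceDB internals):

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for parallel vectors, 1 for orthogonal ones."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

d_parallel = cosine_distance([1.0, 0.0], [2.0, 0.0])    # 0.0
d_orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])  # 1.0
```

Because the metric depends only on direction, not magnitude, it suits embeddings whose scale carries no meaning.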
6. Vector Search
Once you have created and indexed your table, you can perform vector similarity searches.
LanceDB provides a flexible search API that allows you to find similar vectors, apply filters, and select specific columns to return. The examples below demonstrate basic vector searches as well as filtered searches that combine vector similarity with traditional SQL-style filtering.
query_dataset = load_dataset("sunhaozhepy/ag_news_sbert_keywords_embeddings", split="test[5000:5001]")
print(f"Query keywords: {query_dataset[0]['keywords']}")
query_embed = query_dataset["keywords_embeddings"][0]
# A vector search
result = (
table.search(query_embed)
.select(["text", "keywords", "label"])
.limit(5)
.to_pandas()
)
print("Search results:")
print(result)
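Conceptually, the query above ranks every row by its distance to query_embed and keeps the closest five; the index just avoids scanning every row. A brute-force numpy equivalent on toy data (illustrative only; the dimensions and vectors are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 8))   # stand-in for the table's embedding column
query = vectors[42] + 0.01            # a query very close to row 42

# Cosine distance from the query to every row, then keep the 5 smallest.
sims = (vectors @ query) / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
top5 = np.argsort(1.0 - sims)[:5]
```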
7. Filtered Search
Add a filter to your vector search query. You can use SQL-style predicates, such as a where clause, for filtering.
filtered_result = (
table.search(query_embed)
.where("label > 2")
.select(["text", "keywords", "label"])
.limit(5)
.to_pandas()
)
print("Filtered search results (label > 2):")
print(filtered_result)
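With prefiltering, the predicate is applied before the top-k limit, so you get the five nearest rows among those matching the filter rather than a filtered subset of an unfiltered top five. A plain-Python sketch of that semantics (toy rows; _distance mimics the score column LanceDB returns):

```python
rows = [
    {"keywords": "markets", "label": 0, "_distance": 0.05},
    {"keywords": "elections", "label": 3, "_distance": 0.20},
    {"keywords": "football", "label": 1, "_distance": 0.10},
    {"keywords": "startups", "label": 3, "_distance": 0.60},
    {"keywords": "science", "label": 3, "_distance": 0.80},
]

# Filter first (label > 2), then rank by distance and keep up to 5.
filtered_top = sorted(
    (r for r in rows if r["label"] > 2),
    key=lambda r: r["_distance"],
)[:5]
```

Note that the two nearest rows overall (distances 0.05 and 0.10) are excluded because their labels fail the predicate.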
What’s Next?
It’s time to use LanceDB Cloud/Enterprise in your own projects!
We’ve prepared more tutorials for you to continue learning. If you
have any questions, reach out via Discord.