Full-text search (FTS) lets you add keyword-based search, ranked with BM25, to your retrieval solutions. LanceDB can deliver 26ms query latency for full-text search; see our benchmark tests for details.
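For intuition about how BM25 ranks keyword matches, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is not LanceDB's implementation: the naive whitespace tokenizer, the parameter values (`k1=1.2`, `b=0.75`), and the IDF variant are common textbook choices assumed for illustration, and `bm25_scores` is a hypothetical helper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    Tokenization is a naive lowercase whitespace split, used only for
    illustration; a real FTS tokenizer also handles punctuation, stemming, etc.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs  # average doc length
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)  # term frequency within this document
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturate repeated terms (k1) and normalize by doc length (b)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "football match tonight",
    "basketball highlights",
    "football football transfer news",
]
print(bm25_scores("football", docs))
```

The third document scores highest because it mentions "football" twice, while the second document never mentions it and scores 0.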

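```python
import lancedb

# Connect to LanceDB Cloud
db = lancedb.connect(
    uri="db://your-project-slug",
    api_key="your-api-key",
    region="us-east-1",
)

# Open the table created in the quickstart
table_name = "lancedb-cloud-quickstart"
table = db.open_table(table_name)

# Build an FTS index on the "text" column
text_column = "text"
table.create_fts_index(text_column)

# Wait for the FTS index to be ready. wait_for_index is a helper that is
# not defined on this page; supply your own implementation before running.
fts_index_name = f"{text_column}_idx"
wait_for_index(table, fts_index_name)

# Run a keyword (BM25) query and return the top 5 matches
query_text = "football"
fts_results = (
    table.search(query_text, query_type="fts")
    .select(["text", "keywords", "label"])
    .limit(5)
    .to_pandas()
)
print(fts_results)
```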

Newly added or modified records become searchable immediately. The full-text search (FTS) index updates automatically in the background, ensuring continuous search availability without blocking queries.

Advanced Search Features

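Fuzzy Search

Fuzzy search lets you find matches even when the search terms contain typos or slight variations. LanceDB uses the classic Levenshtein distance (the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into another) to find indexed terms within a specified edit distance of each query term.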

| Parameter | Type | Default | Description |
|---|---|---|---|
| `fuzziness` | `int` | `0` | Maximum edit distance allowed for each term; `0` means exact matching. If set to `None`, it is chosen automatically from the term length: 0 for length ≤ 2, 1 for length ≤ 5, 2 for length > 5. |
| `max_expansions` | `int` | `50` | Maximum number of terms to consider for fuzzy matching. Higher values may improve recall but increase search time. |
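A minimal sketch of how these two parameters interact, assuming a plain in-memory vocabulary: `levenshtein` is the classic dynamic-programming edit distance, `auto_fuzziness` encodes the length-based rule from the table, and `expand_term` (a hypothetical helper, not a LanceDB function) collects vocabulary terms within the allowed distance and caps them at `max_expansions`. Which candidates LanceDB keeps when the cap is reached is not stated here; keeping the closest ones is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    """Length-based rule: 0 edits for <= 2 chars, 1 for <= 5, otherwise 2."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def expand_term(term, vocabulary, fuzziness=None, max_expansions=50):
    """Return up to `max_expansions` vocabulary terms within the edit distance.

    fuzziness=None applies the automatic length-based rule (hypothetical helper).
    """
    max_dist = auto_fuzziness(term) if fuzziness is None else fuzziness
    matches = [w for w in vocabulary if levenshtein(term, w) <= max_dist]
    # Assumption: prefer the closest terms when the list must be truncated
    matches.sort(key=lambda w: (levenshtein(term, w), w))
    return matches[:max_expansions]


vocabulary = ["crazily", "crazy", "dutifully", "merrily"]
print(expand_term("crazi", vocabulary))               # ['crazy']
print(expand_term("crazi", vocabulary, fuzziness=2))  # ['crazy', 'crazily']
```

Because the misspelled term `crazi` has five characters, the automatic rule allows only one edit, which reaches `crazy` but not `crazily` (two edits away); reaching `crazily` needs an explicit `fuzziness=2`.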

Let’s create a sample table and build full-text search indices to demonstrate fuzzy search capabilities and relevance boosting features.

```python
import random

import lancedb
import numpy as np
import pandas as pd

# Connect to LanceDB Cloud
db = lancedb.connect(
    uri="db://your-project-slug",
    api_key="your-api-key",
    region="us-east-1",
)

table_name = "fts-fuzzy-boosting-test"
num_rows = 100

# Word lists used to generate random four-word sentences
text_nouns = ("puppy", "car")
text2_nouns = ("rabbit", "girl", "monkey")
verbs = ("runs", "hits", "jumps", "drives", "barfs")
adv = ("crazily.", "dutifully.", "foolishly.", "merrily.", "occasionally.")
adj = ("adorable", "clueless", "dirty", "odd", "stupid")


def random_sentence(nouns):
    """Build a sentence of the form: noun verb adverb adjective."""
    return " ".join(
        [random.choice(nouns), random.choice(verbs), random.choice(adv), random.choice(adj)]
    )


vectors = [np.random.randn(128) for _ in range(num_rows)]
text = [random_sentence(text_nouns) for _ in range(num_rows)]
text2 = [random_sentence(text2_nouns) for _ in range(num_rows)]
count = [random.randint(1, 10000) for _ in range(num_rows)]

table = db.create_table(
    table_name,
    data=pd.DataFrame(
        {
            "vector": vectors,
            "id": [i % 2 for i in range(num_rows)],
            "text": text,
            "text2": text2,
            "count": count,
        }
    ),
    mode="overwrite",
)

# Build one FTS index per text column and wait until each is ready.
# wait_for_index is a helper that is not defined on this page.
table.create_fts_index("text")
wait_for_index(table, "text_idx")
table.create_fts_index("text2")
wait_for_index(table, "text2_idx")
```

To demonstrate fuzzy search’s ability to handle typos and misspellings, let’s perform a search with a deliberately misspelled word. The search engine will attempt to match similar terms within the specified edit distance.

```python
from lancedb.query import MatchQuery

print("\n=== Match Query Examples ===")

# 1. Basic match: the exact term "crazily" in the text column
print("\n1. Basic Match Query for 'crazily':")
basic_match_results = (
    table.search(MatchQuery("crazily", "text"), query_type="fts")
    .select(["id", "text"])
    .limit(100)
    .to_pandas()
)
print(basic_match_results)

# 2. Fuzzy match: the misspelled "crazi" is two edits away from "crazily"
print("\n2. Fuzzy Match Query for 'crazi' (with typo):")
fuzzy_results = (
    table.search(MatchQuery("crazi", "text", fuzziness=2), query_type="fts")
    .select(["id", "text"])
    .limit(100)
    .to_pandas()
)
print(fuzzy_results)
```

Phrase Match

Phrase matching lets you search for exact sequences of words. Unlike regular text search, which matches individual terms independently, phrase matching requires the words to appear in the specified order with no intervening terms. This is particularly useful when:

  • Searching for specific multi-word expressions
  • Matching exact titles or quotes
  • Finding precise word combinations in a specific order
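```python
from lancedb.query import PhraseQuery

# Exact phrase match: "puppy" immediately followed by "runs".
# Phrase queries need token positions stored in the FTS index. If this query
# fails because positions are missing, rebuild the index with positions, e.g.
# table.create_fts_index("text", with_position=True, replace=True)
print("\n1. Exact phrase match for 'puppy runs':")
phrase_results = (
    table.search(PhraseQuery("puppy runs", "text"), query_type="fts")
    .select(["id", "text"])
    .limit(100)
    .to_pandas()
)
print(phrase_results)
```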
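Phrase matching is typically built on a positional index that records where each token occurs in each document. The sketch below illustrates the idea in plain Python; the lowercase whitespace tokenizer and the helper names `build_positional_index` and `phrase_match` are hypothetical, and this is not LanceDB's implementation.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} using lowercase whitespace tokens."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, doc in enumerate(docs):
        for pos, term in enumerate(doc.lower().split()):
            index[term][doc_id].append(pos)
    return index


def phrase_match(index, phrase):
    """Return ids of docs where the phrase terms occur consecutively, in order."""
    terms = phrase.lower().split()
    if not terms:
        return []
    # Candidate documents must contain every term of the phrase
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])
    hits = []
    for doc_id in sorted(candidates):
        starts = set(index[terms[0]][doc_id])
        for offset, term in enumerate(terms[1:], start=1):
            # Keep start positions where `term` appears exactly `offset` later
            starts = {s for s in starts if s + offset in index[term][doc_id]}
        if starts:
            hits.append(doc_id)
    return hits


docs = ["puppy runs crazily", "runs puppy crazily", "puppy jumps runs"]
index = build_positional_index(docs)
print(phrase_match(index, "puppy runs"))  # [0]
```

Only the first document matches: the second has the words in the wrong order, and the third has a word between them.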

Search with Boosting

Boosting allows you to control the relative importance of different search terms or fields in your queries. This feature is particularly useful when you need to:

  • Prioritize matches in certain columns
  • Promote specific terms while demoting others
  • Fine-tune relevance scoring for better search results
| Parameter | Type | Default | Description |
|---|---|---|---|
| `positive` | `Query` | required | The primary query terms to match and promote in results |
| `negative` | `Query` | required | Terms to demote in the search results |
| `negative_boost` | `float` | `0.5` | Multiplier for negative matches (lower values mean stronger demotion) |
```python
from lancedb.query import MatchQuery, BoostQuery, MultiMatchQuery

# Match rows with 'runs' in text; rows that also contain 'puppy' are demoted
# (their score is multiplied by negative_boost=0.2)
print("\n=== Boost Query Example ===")
print("\nBoosting rows with 'runs' in text, demoting rows with 'puppy':")
boosting_results = (
    table.search(
        BoostQuery(
            MatchQuery("runs", "text"),
            MatchQuery("puppy", "text"),
            negative_boost=0.2,
        ),
        query_type="fts",
    )
    .select(["id", "text"])
    .limit(100)
    .to_pandas()
)
print(boosting_results)
```
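To make `negative_boost` concrete, here is a sketch of boosting-query scoring, assuming Elasticsearch-style semantics: only documents matching the positive query are returned, and the score of any that also match the negative query is multiplied by `negative_boost`. The per-document scores are made-up inputs and `boost_scores` is a hypothetical helper; LanceDB's exact scoring may differ.

```python
def boost_scores(positive, negative, negative_boost=0.5):
    """Combine per-document scores in the style of a boosting query.

    positive / negative: dicts of doc_id -> relevance score for each sub-query.
    Only documents matching the positive query are kept; those that also match
    the negative query have their score multiplied by negative_boost.
    """
    combined = {}
    for doc_id, score in positive.items():
        if doc_id in negative:
            score *= negative_boost  # demote, but do not drop
        combined[doc_id] = score
    # Highest score first, as a ranked result list
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


positive = {"a": 1.0, "b": 0.9, "c": 0.5}
negative = {"a": 2.0, "d": 1.0}
print(boost_scores(positive, negative, negative_boost=0.2))
# [('b', 0.9), ('c', 0.5), ('a', 0.2)]
```

Document `a` has the best positive score but falls to last place once the 0.2 multiplier is applied, while `d`, which matches only the negative query, is not returned at all.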

```python
# Multi-match: search one query string across several columns
print("\n=== Multi Match Query Examples ===")

# 1. Search 'crazily' in both text and text2 with equal weight
print("\n1. Searching 'crazily' in both text and text2:")
multi_match_results = (
    table.search(MultiMatchQuery("crazily", ["text", "text2"]), query_type="fts")
    .select(["id", "text", "text2"])
    .limit(100)
    .to_pandas()
)
print(multi_match_results)

# 2. Same search, giving text2 matches twice the weight of text matches
print("\n2. Searching with boosted text2 field:")
multi_match_boosting_results = (
    table.search(
        MultiMatchQuery("crazily", ["text", "text2"], boosts=[1.0, 2.0]),
        query_type="fts",
    )
    .select(["id", "text", "text2"])
    .limit(100)
    .to_pandas()
)
print(multi_match_boosting_results)
```
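A sketch of how per-field boosts can reorder results, under the assumption that a document's multi-field score is the boost-weighted sum of its per-field scores. LanceDB may combine fields differently (for example, by taking the best-scoring field), and `multi_match` is a hypothetical helper operating on made-up scores.

```python
def multi_match(per_field_scores, boosts):
    """Combine per-field relevance scores using per-field boost weights.

    per_field_scores: {field: {doc_id: score}}; boosts: {field: weight}.
    Scores are summed across fields (an assumption for illustration).
    """
    combined = {}
    for field, scores in per_field_scores.items():
        weight = boosts.get(field, 1.0)  # unlisted fields default to weight 1.0
        for doc_id, score in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + weight * score
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


field_scores = {"text": {1: 1.0, 2: 0.5}, "text2": {3: 0.75}}
print(multi_match(field_scores, {"text": 1.0, "text2": 1.0}))
# [(1, 1.0), (3, 0.75), (2, 0.5)]
print(multi_match(field_scores, {"text": 1.0, "text2": 2.0}))
# [(3, 1.5), (1, 1.0), (2, 0.5)]
```

Doubling the weight of `text2` moves document 3, which matched only in `text2`, from second place to first.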
  • Use fuzzy search when handling user input that may contain typos or variations
  • Apply field boosting to prioritize matches in more important columns
  • Combine fuzzy search with boosting for robust and precise search results