Basic Usage
Suppose we have a LanceDB table named my_table whose string column text we want to index and query via keyword search. The FTS index must be created before you can search by keyword.
Table Setup
First, open or create the table you want to search.
Construct FTS Index
Create a full-text search index on your text column.
Full-text Search
Perform full-text search and retrieve results, passing fts_columns="text" to search the text column.
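The three steps above can be sketched in Python as follows. This is a minimal sketch, assuming db is a LanceDB connection such as the one returned by lancedb.connect(...); the helper name index_and_search and the sample rows are illustrative and not part of LanceDB.

```python
def index_and_search(db, rows, query, limit=10):
    """Create my_table, index its text column, and run a keyword search.

    Assumption: db is a LanceDB connection, e.g. lancedb.connect(uri).
    """
    # Table setup: (re)create the table from a list of dicts.
    table = db.create_table("my_table", data=rows, mode="overwrite")

    # Construct the FTS index on the string column.
    table.create_fts_index("text")

    # Full-text search: query the indexed column and collect the rows.
    return (
        table.search(query, query_type="fts", fts_columns="text")
        .limit(limit)
        .to_list()
    )
```

With the lancedb package installed, this could be called as index_and_search(lancedb.connect("data/sample-lancedb"), [{"text": "puppy runs merrily"}], "puppy"), where the path is a placeholder.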
LanceDB automatically searches on the existing FTS index if the input to the search is of type str. If you provide a vector as input, LanceDB will search the ANN index instead.
Advanced Usage
Tokenize Table Data
By default, the text is tokenized by splitting on punctuation and whitespace, words longer than 40 characters are filtered out, and all words are converted to lowercase. Stemming improves search results by reducing words to their root form, e.g. “running” to “run”. LanceDB supports stemming for multiple languages. Set the base_tokenizer parameter rather than tokenizer_name, because the tokenizer cannot be customized once tokenizer_name is specified.
For example, to enable stemming for English:
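A minimal sketch, assuming table is a LanceDB table handle (for instance from db.open_table("my_table")) and that the option names listed in this section are accepted as keyword arguments by create_fts_index. The constant and helper names are illustrative.

```python
# Tokenizer options named in this section; stem=True enables stemming
# for the configured language.
ENGLISH_STEMMING_OPTIONS = {
    "base_tokenizer": "simple",
    "language": "English",
    "stem": True,
    "remove_stop_words": True,
    "ascii_folding": True,
}


def create_stemmed_index(table, column="text"):
    # tokenizer_name is deliberately left unset so the options above apply.
    # replace=True rebuilds the index if one already exists on the column.
    table.create_fts_index(column, replace=True, **ENGLISH_STEMMING_OPTIONS)
```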
Tokenizer parameters:
- base_tokenizer: "simple"
- language: English
- with_position: false
- max_token_length: 40
- lower_case: true
- stem: true
- remove_stop_words: true
- ascii_folding: true
Enable ascii_folding to remove accents, e.g. ‘é’ to ‘e’.
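To make the effect of these options concrete, here is a small pure-Python sketch of lowercasing, accent folding, splitting on punctuation and whitespace, and the 40-character length filter. It is an illustration only, not LanceDB's tokenizer: stemming and stop-word removal are omitted, and the function names are made up for this example.

```python
import re
import unicodedata


def fold_ascii(text):
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text, max_token_length=40, lower_case=True, ascii_folding=True):
    if lower_case:
        text = text.lower()
    if ascii_folding:
        text = fold_ascii(text)
    # Runs of letters and digits are tokens; everything else is a separator.
    tokens = re.findall(r"[^\W_]+", text)
    # Drop tokens that exceed the maximum length.
    return [t for t in tokens if len(t) <= max_token_length]


print(tokenize("Café-au-lait, naïve RÉSUMÉ!"))
# ['cafe', 'au', 'lait', 'naive', 'resume']
```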
Filtering Options
LanceDB full-text search supports filtering the search results by a condition; both pre-filtering and post-filtering are supported. Filters are specified with the familiar where syntax.
With pre-filtering:
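A minimal sketch, assuming table is a LanceDB table handle with an FTS index on its text column. The helper name and the example condition are illustrative; any column named in the condition (such as a category column) is hypothetical.

```python
def search_with_filter(table, query, condition, prefilter=True, limit=10):
    """Keyword search restricted by a SQL-style condition string.

    prefilter=True applies the condition before the search;
    prefilter=False applies it to the search results afterwards.
    """
    return (
        table.search(query, query_type="fts")
        .where(condition, prefilter=prefilter)
        .limit(limit)
        .to_list()
    )
```

For post-filtering, the same helper would be called with prefilter=False, for example search_with_filter(table, "puppy", "category = 'pets'", prefilter=False).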
Phrase vs. Terms Queries
For full-text search you can specify either a phrase query like "the old man and the sea", or a terms search query like old man sea.
To search for a phrase, the index must be created with with_position=True and remove_stop_words=False:
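A minimal sketch, assuming table is a LanceDB table handle and that a query string wrapped in double quotes is treated as a phrase query, as the examples above suggest. The helper names are illustrative.

```python
def create_phrase_index(table, column="text"):
    # Positions are needed to check word order, and stop words such as
    # "the" and "and" must be kept so the phrase can match literally.
    table.create_fts_index(
        column,
        with_position=True,
        remove_stop_words=False,
        replace=True,
    )


def phrase_search(table, phrase, limit=10):
    # Wrap the phrase in double quotes to request a phrase query
    # (assumed behavior, see the lead-in above).
    quoted = f'"{phrase}"'
    return table.search(quoted, query_type="fts").limit(limit).to_list()
```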
Fuzzy Search
Fuzzy search allows you to find matches even when the search terms contain typos or slight variations. LanceDB uses the classic Levenshtein distance to find similar terms within a specified edit distance.
| Parameter | Type | Default | Description |
|---|---|---|---|
| fuzziness | int | 0 | Maximum edit distance allowed for each term. If not specified, automatically set based on term length: 0 for length ≤ 2, 1 for length ≤ 5, 2 for length > 5 |
| max_expansions | int | 50 | Maximum number of terms to consider for fuzzy matching. Higher values may improve recall but increase search time |
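The two parameters can be illustrated with a small pure-Python sketch of Levenshtein distance and the length-based rule from the table. This is not LanceDB's implementation: the function names are made up, using fuzziness=None to mean "pick automatically" is a convention of this sketch, and truncating candidates in vocabulary order is a simplification of max_expansions.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Length-based rule from the table: 0 for <= 2, 1 for <= 5, else 2."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_candidates(term, vocabulary, fuzziness=None, max_expansions=50):
    if fuzziness is None:
        fuzziness = auto_fuzziness(term)
    matches = [w for w in vocabulary if levenshtein(term, w) <= fuzziness]
    # Cap the number of expanded terms, trading recall for search time.
    return matches[:max_expansions]


print(fuzzy_candidates("pupy", ["puppy", "poppy", "kitten"]))  # ['puppy']
```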
Search for Substring
LanceDB supports searching for substrings in the text column. Set the base_tokenizer parameter to "ngram" to enable this feature, and use the ngram_min_length and ngram_max_length parameters to control the length of the substrings:
| Parameter | Type | Default | Description |
|---|---|---|---|
| ngram_min_length | int | 3 | Minimum length of the n-grams to search for |
| ngram_max_length | int | 3 | Maximum length of the n-grams to search for |
| prefix_only | bool | false | Whether to only search for prefixes of the n-grams |
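As an illustration of what these parameters control, the following pure-Python sketch generates the n-grams of a single token. It is a sketch of the general technique, not LanceDB's tokenizer; the function name is made up, and the keyword names simply mirror the table above.

```python
def ngrams(token, ngram_min_length=3, ngram_max_length=3, prefix_only=False):
    """Return the n-grams of token with lengths in [min, max]."""
    grams = []
    # With prefix_only, only n-grams starting at the first character are kept.
    starts = [0] if prefix_only else range(len(token))
    for start in starts:
        for n in range(ngram_min_length, ngram_max_length + 1):
            if start + n <= len(token):
                grams.append(token[start:start + n])
    return grams


print(ngrams("lance"))                            # ['lan', 'anc', 'nce']
print(ngrams("lance", 2, 4, prefix_only=True))    # ['la', 'lan', 'lanc']
```

Because "anc" is one of the indexed n-grams of "lance", a query for that substring can match the document even though it is not a whole word.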
Example: Fuzzy Search
Generate Data
First, let’s create a table with sample text data for testing fuzzy search:
Create Table
Construct FTS Index
Create a full-text search index on the first text column:
Basic and Fuzzy Search
Now we can perform basic, fuzzy, and prefix match searches:
Basic Exact Search
Fuzzy Search with Typos
Prefix based Match
Prefix-based match allows you to search for documents containing words that start with a specific prefix.
Phrase Match
Phrase matching enables you to search for exact sequences of words. Unlike regular text search, which matches individual terms independently, phrase matching requires words to appear in the specified order with no intervening terms. Phrase matching is particularly useful for:
- Searching for specific multi-word expressions
- Matching exact titles or quotes
- Finding precise word combinations in a specific order
Flexible Phrase Match
To provide more flexible phrase matching, LanceDB supports the slop parameter. This allows you to match phrases where the terms appear close to each other, even if they are not directly adjacent or in the exact order, as long as they are within the specified slop value.
For example, the phrase query “puppy merrily” would not return any results by default. However, if you set slop=1, it will match phrases like “puppy jumps merrily”, “puppy runs merrily”, and similar variations where one word appears between “puppy” and “merrily”.
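The behavior described above can be sketched in pure Python over token positions. This uses one common definition of slop (the spread between how far each term is shifted from its expected phrase offset); it is an assumption for illustration, and LanceDB's exact matching rules may differ. The function name is made up.

```python
from itertools import product


def phrase_match(doc_tokens, phrase_tokens, slop=0):
    """True if the phrase occurs in the document within the given slop."""
    if not phrase_tokens:
        return False
    # Positions at which each phrase term occurs in the document.
    positions = [
        [i for i, tok in enumerate(doc_tokens) if tok == term]
        for term in phrase_tokens
    ]
    if any(not p for p in positions):
        return False
    for combo in product(*positions):
        if len(set(combo)) != len(combo):
            continue  # a document position may be used by one term only
        # Shift of each term relative to its offset inside the phrase.
        shifts = [pos - idx for idx, pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


doc = "the puppy jumps merrily".split()
print(phrase_match(doc, ["puppy", "merrily"]))           # False
print(phrase_match(doc, ["puppy", "merrily"], slop=1))   # True
```

Under this definition a swapped pair such as "merrily puppy" needs slop=2 to match the phrase "puppy merrily", since each term moves one position in opposite directions.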
Search with Boosting
Boosting allows you to control the relative importance of different search terms or fields in your queries. This feature is particularly useful when you need to:
- Prioritize matches in certain columns
- Promote specific terms while demoting others
- Fine-tune relevance scoring for better search results
| Parameter | Type | Default | Description |
|---|---|---|---|
| positive | Query | required | The primary query terms to match and promote in results |
| negative | Query | required | Terms to demote in the search results |
| negative_boost | float | 0.5 | Multiplier for negative matches (lower values = stronger demotion) |
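To show how the three parameters interact, here is a toy pure-Python scorer. It assumes the common convention that a document matching the negative query has its score multiplied by negative_boost, which is consistent with "lower values = stronger demotion" in the table; the term-count scoring is a stand-in for a real relevance score, and all names are illustrative.

```python
def term_score(tokens, terms):
    """Toy relevance score: number of occurrences of the query terms."""
    return float(sum(tokens.count(t) for t in terms))


def boosted_scores(docs, positive, negative, negative_boost=0.5):
    scored = []
    for doc in docs:
        tokens = doc.lower().split()
        score = term_score(tokens, positive)
        if score == 0:
            continue  # only documents matching the positive query are returned
        if term_score(tokens, negative) > 0:
            score *= negative_boost  # demote, but do not exclude
        scored.append((doc, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = ["puppy plays fetch", "puppy training puppy class", "kitten naps"]
for doc, score in boosted_scores(docs, ["puppy"], ["training"], negative_boost=0.2):
    print(doc, score)
```

Without the negative query the second document would rank first because it mentions "puppy" twice; with negative_boost=0.2 it drops below the first one but stays in the results.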
Best practices
- Use fuzzy search when handling user input that may contain typos or variations
- Apply field boosting to prioritize matches in more important columns
- Combine fuzzy search with boosting for robust and precise search results
- Create full-text search indices on text columns that will be frequently searched
- For hybrid search combining text and vectors, see our hybrid search guide
- For performance benchmarks, check our benchmark results
- For complex queries, use SQL to combine FTS with other filter conditions
Boolean Queries
LanceDB supports boolean logic in full-text search, allowing you to combine multiple queries using and and or operators. This is useful when you want to match documents that satisfy multiple conditions (intersection) or at least one of several conditions (union).
Combining Two Match Queries
In Python, you can combine two MatchQuery objects using either the and function or the & operator (e.g., MatchQuery("puppy", "text") & MatchQuery("merrily", "text")); both methods are supported and yield the same result. Note that the plain Python keyword and cannot be overloaded, so the operator form must be written with &. Similarly, you can use either the or function or the | operator to perform an or query.
In TypeScript, boolean queries are constructed using the BooleanQuery class with a list of [Occur, subquery] pairs. For example, to perform an AND query:
Each subquery is paired with an Occur value (Must, Should, or MustNot).
Which queries are allowed?
A boolean query must include at least one SHOULD or MUST clause. Queries that contain only a MUST_NOT clause are not allowed.
How to use booleans?
- Use and/& (Python), Occur.Must (TypeScript) for intersection (documents must match all queries).
- Use or/| (Python), Occur.Should (TypeScript) for union (documents must match at least one query).
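The rules above can be sketched over plain sets of matching document ids. This is a pure-Python illustration of the usual must / should / must-not semantics, not LanceDB code: the Occur enum here is a local stand-in for the library's own class, and evaluate is a made-up helper.

```python
from enum import Enum


class Occur(Enum):
    # Local stand-in for illustration, not the LanceDB class.
    MUST = "must"
    SHOULD = "should"
    MUST_NOT = "must_not"


def evaluate(clauses):
    """clauses: list of (Occur, set_of_matching_doc_ids) pairs."""
    must = [ids for occur, ids in clauses if occur is Occur.MUST]
    should = [ids for occur, ids in clauses if occur is Occur.SHOULD]
    must_not = [ids for occur, ids in clauses if occur is Occur.MUST_NOT]
    if not must and not should:
        raise ValueError("a boolean query needs at least one MUST or SHOULD clause")
    if must:
        # Intersection: documents must match every MUST clause.
        result = set(must[0]).intersection(*must[1:])
    else:
        # Union: documents must match at least one SHOULD clause.
        result = set().union(*should)
    # Remove anything matched by a MUST_NOT clause.
    return result - set().union(*must_not)


puppy, merrily, crazily = {1, 2, 3}, {2, 3, 4}, {3}
print(evaluate([(Occur.MUST, puppy), (Occur.MUST, merrily)]))      # {2, 3}
print(evaluate([(Occur.SHOULD, puppy), (Occur.SHOULD, merrily)]))  # {1, 2, 3, 4}
print(evaluate([(Occur.MUST, puppy), (Occur.MUST_NOT, crazily)]))  # {1, 2}
```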