Geneva provides three types of user-defined functions for transforming data. Each type has a different input/output cardinality and is suited to different workflows.
Choosing the Right Type
- Adding a column to each row? Use a UDF.
- Splitting each row into multiple rows? Use a Scalar UDTF.
- Computing across rows with a different output shape? Use a Batch UDTF.
At a Glance
| | UDF | Scalar UDTF | Batch UDTF |
|---|---|---|---|
| Cardinality | 1:1 | 1:N | N:M |
| Decorator | @udf | @scalar_udtf | @udtf |
| Refresh | Incremental | Incremental | Full |
| Parallelism | Fragment-parallel | Fragment-parallel | Partition-parallel |
| Inherited columns | N/A — adds to existing rows | Automatic from query | Independent output schema |
| Registration | table.add_columns() | db.create_scalar_udtf_view() | db.create_udtf_view() |
UDFs (1:1)
Standard UDFs produce exactly one output value per input row. Use them to add computed columns to existing tables or materialized views.
| id | text | embedding |
|---|---|---|
| 1 | "hello world" | → [0.12, 0.34, …] |
| 2 | "foo bar" | → [0.56, 0.78, …] |
| 3 | "baz qux" | → [0.90, 0.11, …] |
Each input row produces exactly one output value. The new column is added to the same table.
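The 1:1 shape can be sketched in plain Python. This is only an illustration of the cardinality, not the Geneva API: `text_length` is a hypothetical stand-in for a real UDF such as an embedding function, and the list of dicts stands in for a table.

```python
# Plain-Python stand-in for a 1:1 UDF; nothing here calls Geneva.
def text_length(text: str) -> int:
    """Hypothetical per-row function: one input value in, one output value out."""
    return len(text)


rows = [
    {"id": 1, "text": "hello world"},
    {"id": 2, "text": "foo bar"},
    {"id": 3, "text": "baz qux"},
]

# Applying the function adds one column; the number of rows does not change.
with_column = [{**row, "text_length": text_length(row["text"])} for row in rows]

for row in with_column:
    print(row)
```

In Geneva itself, the function would carry the `@udf` decorator and be registered with `table.add_columns()`, as listed in the table above.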
Use cases: Embeddings, data enrichment, format conversion, scoring.
See UDFs for the full guide.
Scalar UDTFs (1:N)
Scalar UDTFs expand each source row into multiple output rows. The output is a materialized view that inherits parent columns and supports incremental refresh.
Source: documents
| doc_id | title | text |
|---|---|---|
| 1 | "Intro to AI" | "Machine learning is…" |
| 2 | "Data Guide" | "Data pipelines are…" |
Derived: chunks (1:N expansion via @scalar_udtf)
| doc_id | title | chunk_index | chunk_text |
|---|---|---|---|
| 1 | "Intro to AI" | 0 | "Machine learning…" |
| 1 | "Intro to AI" | 1 | "Neural networks…" |
| 1 | "Intro to AI" | 2 | "Training data…" |
| 2 | "Data Guide" | 0 | "Data pipelines…" |
| 2 | "Data Guide" | 1 | "ETL processes…" |
Each source row produces one or more output rows. Parent columns (doc_id, title) are inherited automatically.
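A rough plain-Python sketch of the 1:N expansion follows. It does not use the Geneva API: `chunk_text`, the three-word chunk size, and the sample sentences are all assumptions made for illustration, and the parent columns are copied by hand here, whereas Geneva inherits them automatically.

```python
from typing import Iterator


def chunk_text(text: str, size: int = 3) -> Iterator[dict]:
    """Hypothetical chunker: yields one output row per group of `size` words."""
    words = text.split()
    for index, start in enumerate(range(0, len(words), size)):
        yield {
            "chunk_index": index,
            "chunk_text": " ".join(words[start:start + size]),
        }


# Illustrative source rows; the text values are made up for this sketch.
documents = [
    {"doc_id": 1, "title": "Intro to AI",
     "text": "Machine learning is a field of study in AI"},
    {"doc_id": 2, "title": "Data Guide",
     "text": "Data pipelines are essential"},
]

# Each source row expands into several output rows; doc_id and title are
# copied onto every chunk to mimic inherited parent columns.
chunks = [
    {"doc_id": doc["doc_id"], "title": doc["title"], **chunk}
    for doc in documents
    for chunk in chunk_text(doc["text"])
]

for chunk in chunks:
    print(chunk)
```

Writing the expansion as a generator keeps the per-row logic independent of how many rows it emits, which is the property that makes a 1:N function easy to run one source row at a time.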
Use cases: Document chunking, video segmentation, image tiling.
See Scalar UDTFs for the full guide.
Batch UDTFs (N:M)
Batch UDTFs read from a source table (or partition) and produce output with an arbitrary schema and row count. They always perform a full refresh.
Source: sales
| product | region | amount |
|---|---|---|
| Widget | East | 100 |
| Widget | East | 250 |
| Widget | West | 175 |
| Gadget | East | 300 |
| Gadget | West | 400 |
| Gadget | West | 150 |
Derived: sales_summary (N:M aggregation via @udtf)
| product | total_amount | avg_amount | num_sales |
|---|---|---|---|
| Widget | 525 | 175.0 | 3 |
| Gadget | 850 | 283.3 | 3 |
Six input rows become two output rows with a completely different schema. The UDTF logic alone determines the output shape: it could be fewer rows (aggregation), more rows (clustering), or the same count with different columns.
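The aggregation in the tables above can be reproduced with a short plain-Python sketch. It is not the Geneva API: `summarize` is a hypothetical function that stands in for the body of a batch UDTF, and it receives the whole source at once because the output depends on every row.

```python
from collections import defaultdict

sales = [
    {"product": "Widget", "region": "East", "amount": 100},
    {"product": "Widget", "region": "East", "amount": 250},
    {"product": "Widget", "region": "West", "amount": 175},
    {"product": "Gadget", "region": "East", "amount": 300},
    {"product": "Gadget", "region": "West", "amount": 400},
    {"product": "Gadget", "region": "West", "amount": 150},
]


def summarize(rows: list[dict]) -> list[dict]:
    """Hypothetical N:M function: reads all rows, emits one row per product."""
    amounts = defaultdict(list)
    for row in rows:
        amounts[row["product"]].append(row["amount"])
    # The output schema shares only the `product` column with the input.
    return [
        {
            "product": product,
            "total_amount": sum(values),
            "avg_amount": round(sum(values) / len(values), 1),
            "num_sales": len(values),
        }
        for product, values in amounts.items()
    ]


sales_summary = summarize(sales)
for row in sales_summary:
    print(row)
```

Because any new sale can change an existing total or average, recomputing everything is the simplest correct strategy, which fits the full-refresh behavior described for Batch UDTFs.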
Use cases: Deduplication, clustering, aggregation, cross-row joins.
See Batch UDTFs for the full guide.
API Reference
- UDF: @udf decorator and UDF class
- UDTF: @udtf, @scalar_udtf, @batch_udtf decorators and UDTF/ScalarUDTF classes
- Table: add_columns(), backfill()
- Connection: create_udtf_view(), create_scalar_udtf_view()