
# Understanding Transforms

> Understand the three types of user-defined functions in Geneva — UDFs, scalar UDTFs, and batch UDTFs — and when to use each.

Geneva provides three types of user-defined functions for transforming data. Each type has a different input/output cardinality and is suited to different workflows.

## Choosing the Right Type

* **Adding a column to each row?** Use a [**UDF**](/geneva/udfs/udfs).
* **Splitting each row into multiple rows?** Use a [**Scalar UDTF**](/geneva/udfs/scalar-udtfs).
* **Computing across rows with a different output shape?** Use a [**Batch UDTF**](/geneva/udfs/batch-udtfs).

## At a Glance

|                       | UDF                                                                                                 | Scalar UDTF                                                                                                                     | Batch UDTF                                                                                                        |
| --------------------- | --------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| **Cardinality**       | 1:1                                                                                                 | 1:N                                                                                                                             | N:M                                                                                                               |
| **Decorator**         | `@udf`                                                                                              | `@scalar_udtf`                                                                                                                  | `@udtf`                                                                                                           |
| **Refresh**           | Incremental                                                                                         | Incremental                                                                                                                     | Full                                                                                                              |
| **Parallelism**       | Fragment-parallel                                                                                   | Fragment-parallel                                                                                                               | Partition-parallel                                                                                                |
| **Inherited columns** | N/A — adds to existing rows                                                                         | Automatic from query                                                                                                            | Independent output schema                                                                                         |
| **Registration**      | [`table.add_columns()`](https://lancedb.github.io/geneva/api/table/#geneva.table.Table.add_columns) | [`db.create_scalar_udtf_view()`](https://lancedb.github.io/geneva/api/connection/#geneva.db.Connection.create_scalar_udtf_view) | [`db.create_udtf_view()`](https://lancedb.github.io/geneva/api/connection/#geneva.db.Connection.create_udtf_view) |
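The **Refresh** row reflects how output rows relate to source rows. The sketch below is a rough conceptual model in plain Python, not Geneva's implementation. Incremental refresh runs the function only on source rows it has not processed before. This works when every output row traces back to exactly one source row. Full refresh recomputes the whole output from the whole source. The names `incremental_refresh`, `full_refresh`, `split_words`, and `count_rows` are hypothetical and are used here only for illustration.

```python
from __future__ import annotations

from typing import Callable


def incremental_refresh(source: list[dict], view: dict[int, list[dict]],
                        fn: Callable[[dict], list[dict]]) -> int:
    """Apply `fn` only to source rows whose `id` is not yet in `view`.

    `view` maps each source row id to the output rows derived from it.
    Returns how many source rows were processed in this refresh.
    """
    processed = 0
    for row in source:
        if row["id"] not in view:
            view[row["id"]] = fn(row)
            processed += 1
    return processed


def full_refresh(source: list[dict],
                 fn: Callable[[list[dict]], list[dict]]) -> list[dict]:
    """Recompute the entire output from the entire source."""
    return fn(source)


def split_words(row: dict) -> list[dict]:
    # Hypothetical per-row function (1:N): one output row per word,
    # tagged with the id of the source row it came from.
    return [{"id": row["id"], "word": w} for w in row["text"].split()]


def count_rows(rows: list[dict]) -> list[dict]:
    # Hypothetical whole-table function (N:M): a single summary row.
    return [{"num_rows": len(rows)}]


source = [{"id": 1, "text": "a b"}, {"id": 2, "text": "c"}]
view: dict[int, list[dict]] = {}
first = incremental_refresh(source, view, split_words)   # processes rows 1 and 2

source.append({"id": 3, "text": "d e f"})
second = incremental_refresh(source, view, split_words)  # processes only row 3
summary = full_refresh(source, count_rows)               # rereads all three rows
```

In this model, keying the derived rows by source `id` is what makes the incremental path possible. A whole-table function's output has no such one-to-one link back to the source, so it can only be recomputed in full.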

## UDFs (1:1)

Standard UDFs produce exactly **one output value per input row**. Use them to add computed columns to existing tables or materialized views.

| id | text          | **embedding**            |
| -- | ------------- | ------------------------ |
| 1  | "hello world" | **→ \[0.12, 0.34, ...]** |
| 2  | "foo bar"     | **→ \[0.56, 0.78, ...]** |
| 3  | "baz qux"     | **→ \[0.90, 0.11, ...]** |

The new `embedding` column is added to the same table. No rows are added or removed.
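The following plain-Python model makes the 1:1 contract concrete. It does not use Geneva. `toy_embed` is a hypothetical placeholder for a real embedding model, and `add_column` is a hypothetical helper, not `table.add_columns()`. In Geneva you would decorate the function with `@udf` and register it through `table.add_columns()`. See the UDFs guide for the actual signatures.

```python
from __future__ import annotations


def toy_embed(text: str) -> list[float]:
    # Hypothetical stand-in for an embedding model: a 2-dimensional
    # "vector" made of the character count and the word count.
    return [float(len(text)), float(len(text.split()))]


def add_column(rows: list[dict], name: str, input_col: str, fn) -> list[dict]:
    # 1:1 mapping: call fn once per row and attach the result as a new column.
    # Builds new dicts, so the input rows are left unchanged.
    return [{**row, name: fn(row[input_col])} for row in rows]


table = [
    {"id": 1, "text": "hello world"},
    {"id": 2, "text": "foo bar"},
    {"id": 3, "text": "baz qux"},
]
enriched = add_column(table, "embedding", "text", toy_embed)
```

The output has exactly as many rows as the input. In this model, that means each row, and therefore each fragment, can be computed independently of the others.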

**Use cases**: Embeddings, data enrichment, format conversion, scoring.

See [UDFs](/geneva/udfs/udfs) for the full guide.

## Scalar UDTFs (1:N)

Scalar UDTFs **expand each source row into multiple output rows**. The output is a materialized view that inherits parent columns and supports incremental refresh.

**Source: `documents`**

| doc\_id | title         | text                     |
| ------- | ------------- | ------------------------ |
| 1       | "Intro to AI" | "Machine learning is..." |
| 2       | "Data Guide"  | "Data pipelines are..."  |

**Derived: `chunks`** (1:N expansion via `@scalar_udtf`)

| doc\_id | title         | chunk\_index | chunk\_text           |
| ------- | ------------- | ------------ | --------------------- |
| 1       | "Intro to AI" | 0            | "Machine learning..." |
| 1       | "Intro to AI" | 1            | "Neural networks..."  |
| 1       | "Intro to AI" | 2            | "Training data..."    |
|         |               |              |                       |
| 2       | "Data Guide"  | 0            | "Data pipelines..."   |
| 2       | "Data Guide"  | 1            | "ETL processes..."    |

Each source row produces **one or more** output rows. Parent columns (`doc_id`, `title`) are inherited automatically.
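The expansion shown in the `chunks` table can be modeled in plain Python. This sketch covers only the 1:N semantics and is not Geneva's `@scalar_udtf` API. `chunk_text` (a naive fixed-size word chunker) and `expand` are hypothetical helpers, and the document text is toy data.

```python
from __future__ import annotations

from typing import Callable, Iterator


def chunk_text(text: str, words_per_chunk: int = 2) -> Iterator[str]:
    # Hypothetical chunker: groups consecutive words into fixed-size chunks.
    words = text.split()
    for start in range(0, len(words), words_per_chunk):
        yield " ".join(words[start:start + words_per_chunk])


def expand(rows: list[dict], inherit: list[str], source_col: str,
           chunker: Callable[[str], Iterator[str]]) -> list[dict]:
    # 1:N mapping: every chunk becomes its own output row, carrying a copy
    # of the parent columns listed in `inherit`.
    out = []
    for row in rows:
        parent = {col: row[col] for col in inherit}
        for index, chunk in enumerate(chunker(row[source_col])):
            out.append({**parent, "chunk_index": index, "chunk_text": chunk})
    return out


documents = [
    {"doc_id": 1, "title": "Intro to AI",
     "text": "Machine learning neural networks training data"},
    {"doc_id": 2, "title": "Data Guide", "text": "Data pipelines ETL processes"},
]
chunks = expand(documents, ["doc_id", "title"], "text", chunk_text)
```

Each output row is derived from a single parent row. New source rows can therefore be expanded without touching existing chunks, which is consistent with the incremental refresh listed for scalar UDTFs.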

**Use cases**: Document chunking, video segmentation, image tiling.

See [Scalar UDTFs](/geneva/udfs/scalar-udtfs) for the full guide.

## Batch UDTFs (N:M)

Batch UDTFs read from a source table (or partition) and **produce output with an arbitrary schema and row count**. They always perform a full refresh.

**Source: `sales`**

| product | region | amount |
| ------- | ------ | ------ |
| Widget  | East   | 100    |
| Widget  | East   | 250    |
| Widget  | West   | 175    |
| Gadget  | East   | 300    |
| Gadget  | West   | 400    |
| Gadget  | West   | 150    |

**Derived: `sales_summary`** (N:M aggregation via `@udtf`)

| product | total\_amount | avg\_amount | num\_sales |
| ------- | ------------- | ----------- | ---------- |
| Widget  | 525           | 175.0       | 3          |
| Gadget  | 850           | 283.3       | 3          |

Six input rows become two output rows with a completely different schema. The `region` column is dropped, and three new columns are computed. The output shape is determined entirely by the UDTF logic. It could have fewer rows (aggregation), more rows (for example, a cross-row join), or the same number of rows with different columns.
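The `sales_summary` table above can be reproduced with a plain-Python aggregation. This sketch models only the N:M semantics and is not Geneva's `@udtf` API. `summarize_sales` is a hypothetical function. The second call also suggests why a batch UDTF is refreshed in full. A single new source row can change an output row that already exists, so there is no per-row result that could simply be appended.

```python
from __future__ import annotations

from collections import defaultdict


def summarize_sales(rows: list[dict]) -> list[dict]:
    # N:M mapping: reads every input row and emits one row per product,
    # with a schema unrelated to the input (the `region` column is dropped).
    totals: dict[str, int] = defaultdict(int)
    counts: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["product"]] += row["amount"]
        counts[row["product"]] += 1
    return [
        {
            "product": product,
            "total_amount": totals[product],
            "avg_amount": round(totals[product] / counts[product], 1),
            "num_sales": counts[product],
        }
        for product in totals  # insertion order: first product seen comes first
    ]


sales = [
    {"product": "Widget", "region": "East", "amount": 100},
    {"product": "Widget", "region": "East", "amount": 250},
    {"product": "Widget", "region": "West", "amount": 175},
    {"product": "Gadget", "region": "East", "amount": 300},
    {"product": "Gadget", "region": "West", "amount": 400},
    {"product": "Gadget", "region": "West", "amount": 150},
]
summary = summarize_sales(sales)

# One extra sale changes the existing Widget row instead of adding a new row,
# so the whole summary is recomputed from all source rows.
updated = summarize_sales(sales + [{"product": "Widget", "region": "West", "amount": 75}])
```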

**Use cases**: Deduplication, clustering, aggregation, cross-row joins.

See [Batch UDTFs](/geneva/udfs/batch-udtfs) for the full guide.

## API Reference

* [UDF](https://lancedb.github.io/geneva/api/udf/) — `@udf` decorator and `UDF` class
* [UDTF](https://lancedb.github.io/geneva/api/udtf/) — `@udtf`, `@scalar_udtf`, `@batch_udtf` decorators and `UDTF`/`ScalarUDTF` classes
* [Table](https://lancedb.github.io/geneva/api/table/) — `add_columns()`, `backfill()`
* [Connection](https://lancedb.github.io/geneva/api/connection/) — `create_udtf_view()`, `create_scalar_udtf_view()`
