Converting functions into UDFs
Converting your Python code to a Geneva UDF is simple. There are three kinds of UDFs you can provide: scalar UDFs, batched UDFs, and stateful UDFs. In all cases, Geneva uses the Python type hints on your functions to infer the input and output Arrow data types that LanceDB uses.
Scalar UDFs
The simplest form is a scalar UDF, which processes one row at a time. Wrapping the function with the @udf decorator is all that is needed.
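For example, a minimal sketch (assuming udf is imported from the geneva package, and hypothetical integer columns x and y):

```python
from geneva import udf

@udf
def area(x: int, y: int) -> int:
    # Geneva infers the Arrow input/output types from the type hints.
    return x * y
```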
Batched UDFs
For better performance, you can also define batched UDFs that process multiple rows at once. The inputs can be pyarrow.Arrays:
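A sketch of the Array form, reusing the hypothetical x and y columns from above:

```python
import pyarrow as pa
import pyarrow.compute as pc
from geneva import udf

@udf(data_type=pa.int64())
def batch_area(x: pa.Array, y: pa.Array) -> pa.Array:
    # Multiply the two input columns element-wise.
    return pc.multiply(x, y)
```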
Or a pyarrow.RecordBatch:
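A sketch of the RecordBatch form, where the UDF pulls its input columns from the batch by name:

```python
import pyarrow as pa
import pyarrow.compute as pc
from geneva import udf

@udf(data_type=pa.int64())
def batch_area(batch: pa.RecordBatch) -> pa.Array:
    # Look up the input columns by name on the RecordBatch.
    return pc.multiply(batch["x"], batch["y"])
```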
Note: Batched UDFs require you to specify data_type in the @udf decorator, which defines the pyarrow.DataType of the returned pyarrow.Array.
Stateful UDFs
You can also define a stateful UDF that retains its state across calls. This can be used to share code and parameterize your UDFs. In the example below, the model being used is a parameter that can be specified at UDF registration time. It can also be used to parameterize the input column names of pa.RecordBatch batched UDFs.
This can also be used to optimize expensive initialization that requires heavy resources on the distributed workers. For example, a stateful UDF can load a model onto the GPU once for all records sent to a worker, instead of once per record or per batch of records.
A stateful UDF is a callable class with a __call__() method. The __call__ method can be a scalar function or a batched function.
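A sketch of such a stateful UDF (assuming @udf can decorate a class; the load_model helper and its model API are hypothetical placeholders):

```python
from geneva import udf

@udf
class CaptionUDF:
    def __init__(self, model_name: str):
        # The model is a parameter chosen at UDF registration time.
        self.model_name = model_name
        self.model = None

    def __call__(self, image: bytes) -> str:
        # __call__ here is a scalar function; it could also be batched.
        if self.model is None:
            # Load the model once per worker instead of once per record.
            self.model = load_model(self.model_name)  # hypothetical helper
        return self.model.caption(image)
```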
Note: State is managed independently on each distributed worker.
UDF options
The udf decorator can take extra annotations that specify resource requirements and operational characteristics.
These are simply additional parameters to udf(...).
Resource requirements for UDFs
Some UDFs may require specific resources from workers, such as GPUs, CPUs, and certain amounts of RAM. You can declare these requirements by adding num_cpus, num_gpus, and memory parameters to the UDF.
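A sketch using the parameter names above (the specific values, and the units accepted by memory, are assumptions):

```python
from geneva import udf

# Request 4 CPUs, 1 GPU, and 8 GiB of RAM per worker running this UDF.
# (Assumes `memory` is specified in bytes.)
@udf(num_cpus=4, num_gpus=1, memory=8 * 1024**3)
def caption(image: bytes) -> str:
    # Placeholder body; a real UDF would run a captioning model here.
    return "a caption"
```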
Operational parameters for UDFs
UDFs can be quite varied: some are simple operations where thousands of calls complete per second, while others are slow and take 30s per row. In LanceDB, the default number of rows per fragment is 1024 * 1024. Consider a captioning UDF that takes 30s per row. It could take nearly a year (!) before any results show up (30s/row * 1,048,576 rows/fragment ≈ 31.5M s/fragment ≈ 8.7k hours/fragment ≈ 364 days/fragment). To enable this work to be parallelized, we provide a batch_size setting so the work can be split between workers, and so that partial results are checkpointed more frequently, enabling finer-grained progress tracking and job recovery.
By default, batch_size is 100 computed rows per checkpoint. So for an expensive captioning UDF that takes 30s per row, you would get a checkpoint every 3000s (50 minutes); with 100 GPUs, the job could finish in about 3.6 days. For cheap operations that compute 100 rows per second, you'd potentially be checkpointing every second. Tuning batch_size can help you see progress more frequently.
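For the slow captioning UDF above, you might shrink batch_size so checkpoints land every few minutes rather than every 50 (a sketch):

```python
from geneva import udf

# At ~30s/row, checkpoint every 10 rows (~5 minutes)
# instead of the default 100 rows (~50 minutes).
@udf(batch_size=10)
def caption(image: bytes) -> str:
    return "a caption"  # placeholder body
```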
Registering Features with UDFs
Registering a feature is done by passing a new column name and the Geneva UDF to the Table.add_columns() function.
Let’s start by obtaining the table tbl:
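A sketch, assuming a connect/open_table flow similar to LanceDB's (the URI and table name are placeholders) and that add_columns accepts a mapping of column name to UDF:

```python
import geneva

# Connect and open the table (URI and table name are placeholders).
db = geneva.connect("db://my_project")
tbl = db.open_table("images")

# Register a computed column "area" backed by the scalar UDF defined earlier.
tbl.add_columns({"area": area})
```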
Changing data in computed columns
Let’s say you backfilled data with your UDF and then noticed that your data has some issues. Here are a few scenarios:
1. All the values are incorrect due to a bug in the UDF.
2. Most values are correct, but some are incorrect due to a failure during UDF execution.
3. The values were calculated correctly, and you want to perform a second pass to fix up some of them.
In scenario 1, you'll want to fix the bug in your UDF, perform an alter_table, and then backfill.
In scenario 2, you'll most likely want to re-execute the backfill to fill in the missing values. If the error is in your code (e.g., certain cases weren't handled), you can modify the UDF, perform an alter_table, and then backfill with some filters.
In scenario 3, you have a few options:
- A) Alter your UDF to include the fixup operations, then alter_table and backfill, recalculating all the values.
- B) Build a chain of computed columns: create a new column, calculate the fixed-up values, and have your application use the new column (or a combination of it and the original). This is similar to A but avoids recalculating the original column, at the cost of more storage.
- C) Update the values in the column directly with the fixed-up values. This may be expedient, but it sacrifices reproducibility.
The next section shows you how to change your column definition by altering the UDF.
Altering UDFs
You now want to revise the code. To make the change, update the UDF used to compute the column using the alter_columns API and the updated function. The example below replaces the definition of the column area to use the area_udf_v2 function.
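A sketch of that call (the exact argument shape accepted by alter_columns is an assumption):

```python
# Repoint the computed column "area" at the revised UDF.
tbl.alter_columns({"path": "area", "udf": area_udf_v2})
```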
If you then run a backfill operation, all values will be recalculated and updated. If you only want some rows updated, you can perform a filtered backfill, targeting the specific rows that need the new updates.
For example, this filter would only update rows where area is currently null:
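A sketch, assuming backfill accepts the column name and a SQL-style filter (the where parameter name is an assumption):

```python
# Recompute only the rows where area is currently null.
tbl.backfill("area", where="area IS NULL")
```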