Geneva provides three types of user-defined functions for transforming data. Each type has a different input/output cardinality and is suited to different workflows.

Choosing the Right Type

  • Adding a column to each row? Use a UDF.
  • Splitting each row into multiple rows? Use a Scalar UDTF.
  • Computing across rows with a different output shape? Use a Batch UDTF.

At a Glance

| | UDF | Scalar UDTF | Batch UDTF |
| --- | --- | --- | --- |
| Cardinality | 1:1 | 1:N | N:M |
| Decorator | @udf | @scalar_udtf | @udtf |
| Refresh | Incremental | Incremental | Full |
| Parallelism | Fragment-parallel | Fragment-parallel | Partition-parallel |
| Inherited columns | N/A (adds to existing rows) | Automatic from query | Independent output schema |
| Registration | table.add_columns() | db.create_materialized_view(udtf=) | db.create_udtf_view() |

UDFs (1:1)

Standard UDFs produce exactly one output value per input row. Use them to add computed columns to existing tables or materialized views.

| id | text | embedding |
| --- | --- | --- |
| 1 | "hello world" | → [0.12, 0.34, …] |
| 2 | "foo bar" | → [0.56, 0.78, …] |
| 3 | "baz qux" | → [0.90, 0.11, …] |

Each input row produces exactly one output value. The new column is added to the same table.

Use cases: Embeddings, data enrichment, format conversion, scoring.

See UDFs for the full guide.
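
The 1:1 contract can be illustrated without Geneva at all. The sketch below is plain Python, not the Geneva API: `text_length` is a hypothetical stand-in for an embedding function, and `add_column` is an illustrative helper that mimics adding a computed column to a table held as a list of dicts.

```python
from typing import Any, Callable

Row = dict[str, Any]


def text_length(text: str) -> int:
    """Hypothetical 1:1 function: one input value in, one output value out."""
    return len(text)


def add_column(rows: list[Row], name: str, source: str,
               fn: Callable[[Any], Any]) -> list[Row]:
    """Illustrative helper (not a Geneva call): apply fn to each row's
    `source` value and store the result under `name` in a copy of the row."""
    return [{**row, name: fn(row[source])} for row in rows]


rows = [
    {"id": 1, "text": "hello world"},
    {"id": 2, "text": "foo bar"},
    {"id": 3, "text": "baz qux"},
]

# The row count is unchanged; each row gains exactly one new value.
with_length = add_column(rows, "text_length", "text", text_length)
```

Because every output value depends only on its own row, a function of this shape can be applied to newly added rows without recomputing the rest, which is consistent with the incremental refresh listed for UDFs above.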

Scalar UDTFs (1:N)

Scalar UDTFs expand each source row into multiple output rows. The output is a materialized view that inherits parent columns and supports incremental refresh.

Source: documents

| doc_id | title | text |
| --- | --- | --- |
| 1 | "Intro to AI" | "Machine learning is…" |
| 2 | "Data Guide" | "Data pipelines are…" |

Derived: chunks (1:N expansion via @scalar_udtf)

| doc_id | title | chunk_index | chunk_text |
| --- | --- | --- | --- |
| 1 | "Intro to AI" | 0 | "Machine learning…" |
| 1 | "Intro to AI" | 1 | "Neural networks…" |
| 1 | "Intro to AI" | 2 | "Training data…" |
| 2 | "Data Guide" | 0 | "Data pipelines…" |
| 2 | "Data Guide" | 1 | "ETL processes…" |

Each source row produces one or more output rows. Parent columns (doc_id, title) are inherited automatically.

Use cases: Document chunking, video segmentation, image tiling.

See Scalar UDTFs for the full guide.
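
The 1:N expansion can be sketched in plain Python as a generator that yields several values per source row. This is not the Geneva API: `chunk_text` and `expand` are hypothetical names, the fixed-size word chunking is an assumed strategy, and the sample documents are made up for illustration. In this sketch the parent columns are copied by hand, whereas the text above says Geneva inherits them automatically.

```python
from typing import Any, Iterator

Row = dict[str, Any]


def chunk_text(text: str, size: int) -> Iterator[tuple[int, str]]:
    """Hypothetical 1:N function: yield (chunk_index, chunk_text) pairs,
    each chunk holding at most `size` whitespace-separated words."""
    words = text.split()
    for index, start in enumerate(range(0, len(words), size)):
        yield index, " ".join(words[start:start + size])


def expand(rows: list[Row], size: int = 3) -> list[Row]:
    """Illustrative helper (not a Geneva call): emit one output row per chunk,
    copying the parent columns doc_id and title onto every chunk row."""
    out = []
    for row in rows:
        for index, chunk in chunk_text(row["text"], size):
            out.append({"doc_id": row["doc_id"], "title": row["title"],
                        "chunk_index": index, "chunk_text": chunk})
    return out


# Made-up sample data, not taken from the tables above.
documents = [
    {"doc_id": 1, "title": "Example A",
     "text": "one two three four five six seven"},
    {"doc_id": 2, "title": "Example B",
     "text": "alpha beta gamma delta"},
]

chunks = expand(documents)
```

Each chunk row depends only on its own parent row, so adding a new document would only require generating that document's chunks.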

Batch UDTFs (N:M)

Batch UDTFs read from a source table (or partition) and produce output with an arbitrary schema and row count. They always perform a full refresh.

Source: sales

| product | region | amount |
| --- | --- | --- |
| Widget | East | 100 |
| Widget | East | 250 |
| Widget | West | 175 |
| Gadget | East | 300 |
| Gadget | West | 400 |
| Gadget | West | 150 |

Derived: sales_summary (N:M aggregation via @udtf)

| product | total_amount | avg_amount | num_sales |
| --- | --- | --- | --- |
| Widget | 525 | 175.0 | 3 |
| Gadget | 850 | 283.3 | 3 |

6 input rows become 2 output rows with a completely different schema. The output shape is determined entirely by the UDTF logic: it could be fewer rows (aggregation), more rows (clustering), or the same count with different columns.

Use cases: Deduplication, clustering, aggregation, cross-row joins.

See Batch UDTFs for the full guide.
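
The sales example can be reproduced with a short plain-Python aggregation. This is a sketch of the N:M shape only, not the Geneva API: `summarize_sales` is a hypothetical function, and rounding the average to one decimal place is an assumption made to match the summary values shown above.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]


def summarize_sales(rows: list[Row]) -> list[Row]:
    """Illustrative N:M function (not a Geneva call): group rows by product
    and emit one summary row per product with its own output schema."""
    amounts: dict[str, list[int]] = defaultdict(list)
    for row in rows:
        amounts[row["product"]].append(row["amount"])
    return [
        {
            "product": product,
            "total_amount": sum(values),
            # Assumption: averages are rounded to one decimal place.
            "avg_amount": round(sum(values) / len(values), 1),
            "num_sales": len(values),
        }
        for product, values in amounts.items()
    ]


sales = [
    {"product": "Widget", "region": "East", "amount": 100},
    {"product": "Widget", "region": "East", "amount": 250},
    {"product": "Widget", "region": "West", "amount": 175},
    {"product": "Gadget", "region": "East", "amount": 300},
    {"product": "Gadget", "region": "West", "amount": 400},
    {"product": "Gadget", "region": "West", "amount": 150},
]

sales_summary = summarize_sales(sales)
```

Every summary row depends on all source rows for its product, so a single new sale changes an existing output row. That cross-row dependency is one way to see why this type recomputes its full output instead of refreshing incrementally.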