Beta — introduced in Geneva 0.11.0
Standard UDFs produce exactly one output value per input row. Scalar UDTFs enable 1:N row expansion — each source row can produce multiple output rows. The results are stored as a materialized view with MV-style incremental refresh.
| Source Table | Derived Table | Expansion |
|---|---|---|
| 1 video row | → N clip rows | Video segmentation |
| 1 document row | → N chunk rows | Text chunking |
| 1 image row | → N tile rows | Image tiling |
## Defining a Scalar UDTF
Use the `@scalar_udtf` decorator on a function that yields output rows. Geneva infers the output schema from the return type annotation.
```python
from geneva import scalar_udtf
from typing import Iterator, NamedTuple

class Clip(NamedTuple):
    clip_start: float
    clip_end: float
    clip_bytes: bytes

@scalar_udtf
def extract_clips(video_path: str, duration: float) -> Iterator[Clip]:
    """Yields multiple clips per video."""
    clip_length = 10.0
    for start in range(0, int(duration), int(clip_length)):
        end = min(start + clip_length, duration)
        clip_data = extract_video_segment(video_path, start, end)
        yield Clip(clip_start=start, clip_end=end, clip_bytes=clip_data)
```
Input parameters are bound to source columns by name — the parameter `video_path` binds to the source column `video_path`, just like standard UDFs.
A scalar UDTF can yield zero rows for a source row. The source row is still marked as processed and will not be retried on the next refresh.
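For illustration, here is the zero-row case as a plain generator (decorator omitted so it runs standalone; `min_length` is a made-up cutoff, not a Geneva parameter) — a short video simply yields nothing:

```python
from typing import Iterator, NamedTuple

class Clip(NamedTuple):
    clip_start: float
    clip_end: float

def maybe_extract_clips(duration: float, min_length: float = 5.0) -> Iterator[Clip]:
    # Short videos produce no clips at all; Geneva still marks the
    # parent row as processed, so it is not retried on the next refresh.
    for start in range(0, int(duration), 10):
        end = min(start + 10.0, duration)
        if end - start >= min_length:
            yield Clip(float(start), end)

no_clips = list(maybe_extract_clips(3.0))   # zero child rows
clips = list(maybe_extract_clips(25.0))     # three child rows
```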
### List return pattern
If you prefer to build the full list in memory rather than yielding, you can return a `list` instead of an `Iterator`:
```python
@scalar_udtf
def extract_clips(video_path: str, duration: float) -> list[Clip]:
    clips = []
    for start in range(0, int(duration), 10):
        end = min(start + 10, duration)
        clips.append(Clip(clip_start=start, clip_end=end, clip_bytes=b"..."))
    return clips
```
### Batched scalar UDTF
For vectorized processing, use `batch=True`. The function receives Arrow arrays and returns a `RecordBatch` of expanded rows:
```python
import pyarrow as pa

@scalar_udtf(batch=True)
def extract_clips(batch: pa.RecordBatch) -> pa.RecordBatch:
    """Process rows in batches. Same 1:N semantics per row."""
    ...
```
## Creating a Scalar UDTF View
Scalar UDTFs use the existing `create_materialized_view` API with a `udtf=` parameter:
```python
import geneva

db = geneva.connect("/data/mydb")
videos = db.open_table("videos")

# Create the 1:N materialized view
clips = db.create_materialized_view(
    "clips",
    query=videos.search(None).select(["video_path", "metadata"]),
    udtf=extract_clips,
)

# Populate — runs the UDTF on every source row
clips.refresh()
```
The `query` parameter controls which source columns are inherited. Columns listed in `.select()` are carried into every child row automatically.
### Inheriting source columns
```python
# Only video_path and metadata are inherited into the clips table
clips = db.create_materialized_view(
    "clips",
    query=videos.search(None).select(["video_path", "metadata"]),
    udtf=extract_clips,
)
```
## Inherited Columns
Child rows automatically include the parent's columns — no manual join required. The columns available in the child table are determined by the query's `.select()`:
**videos** table (source):

| video_path | duration | metadata |
|---|---|---|
| /v/a.mp4 | 120.0 | {fps: 30} |
| /v/b.mp4 | 60.0 | {fps: 24} |
**clips** table (derived, 1:N):

| video_path | metadata | clip_start | clip_end | clip_bytes |
|---|---|---|---|---|
| /v/a.mp4 | {fps: 30} | 0.0 | 10.0 | b"\x00\x1a…" |
| /v/a.mp4 | {fps: 30} | 10.0 | 20.0 | b"\x00\x2b…" |
| /v/a.mp4 | {fps: 30} | 20.0 | 30.0 | b"\x00\x3c…" |
| /v/b.mp4 | {fps: 24} | 0.0 | 10.0 | b"\x00\x4d…" |
| /v/b.mp4 | {fps: 24} | 10.0 | 20.0 | b"\x00\x5e…" |
The first three rows come from the `/v/a.mp4` source row, the last two from `/v/b.mp4`. Inherited columns (`video_path`, `metadata`) are carried over automatically; `clip_start`, `clip_end`, and `clip_bytes` are generated by the UDTF.
## Adding Computed Columns After Creation
Since scalar UDTF views are materialized views, you can add UDF-computed columns to the child table and backfill them:
```python
@udf(data_type=pa.list_(pa.float32(), 512))
def clip_embedding(clip_bytes: bytes) -> list[float]:
    return embed_model.encode(clip_bytes)

# Add an embedding column to the clips table
clips.add_columns({"embedding": clip_embedding})

# Backfill computes embeddings for all existing clips
clips.refresh()
```
This is a powerful pattern: expand source rows with a scalar UDTF, then enrich the expanded rows with standard UDFs.
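The shape of this pattern can be sketched with plain Python (hypothetical `expand`/`enrich` helpers and toy dict rows, no Geneva involved): a generator fans each parent row out, then a per-row function adds a computed field to every child.

```python
def expand(row):
    # UDTF-style: one parent row yields N child rows.
    for i in range(row["n"]):
        yield {**row, "part": i}

def enrich(child):
    # UDF-style: adds one computed column to each child row.
    return {**child, "label": f"{child['name']}-{child['part']}"}

parents = [{"name": "a", "n": 2}, {"name": "b", "n": 1}]
children = [enrich(c) for p in parents for c in expand(p)]
```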
## Incremental Refresh
Scalar UDTFs support incremental refresh, just like standard materialized views:
- **New source rows:** The UDTF runs on new rows, inserting child rows.
- **Deleted source rows:** Child rows linked to the deleted parent are cascade-deleted.
- **Updated source rows:** Old children are deleted, the UDTF re-runs, and new children are inserted.
```python
# Add new videos to the source table
videos.add(new_video_data)

# Incremental refresh — only processes the new videos
clips.refresh()
```
Only the new source rows are processed. Existing clips from previous refreshes are untouched.
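As a rough in-memory sketch of those three cases (a toy model for intuition, not Geneva's actual refresh machinery), keying child rows by their parent makes the cascade behavior easy to see:

```python
def refresh(view, source, udtf):
    # Deleted parents: cascade-delete their children.
    for key in list(view):
        if key not in source:
            del view[key]
    # New or updated parents: (re)run the UDTF; unchanged parents are skipped.
    for key, row in source.items():
        if key not in view or view[key]["row"] != row:
            view[key] = {"row": row, "children": list(udtf(row))}
    return view

segment = lambda duration: list(range(0, int(duration), 10))  # toy 1:N UDTF

view = {}
refresh(view, {"a": 30, "b": 10}, segment)   # initial populate
refresh(view, {"a": 20}, segment)            # b deleted, a updated
```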
## Chaining UDTF Views
Scalar UDTF views are standard materialized views, so they can serve as the source for further views:
```python
# videos → clips (1:N)
clips = db.create_materialized_view(
    "clips", query=videos.search(None), udtf=extract_clips
)

# clips → frames (1:N)
frames = db.create_materialized_view(
    "frames", query=clips.search(None), udtf=extract_frames
)
```
## Full Example: Document Chunking
```python
from geneva import connect, scalar_udtf, udf
from typing import Iterator, NamedTuple
import pyarrow as pa

class Chunk(NamedTuple):
    chunk_index: int
    chunk_text: str

@scalar_udtf
def chunk_document(text: str) -> Iterator[Chunk]:
    """Split a document into overlapping chunks."""
    words = text.split()
    chunk_size = 500
    overlap = 50
    for i, start in enumerate(range(0, len(words), chunk_size - overlap)):
        chunk_words = words[start:start + chunk_size]
        yield Chunk(chunk_index=i, chunk_text=" ".join(chunk_words))

db = connect("/data/mydb")
docs = db.open_table("documents")

# Create chunked view — inherits doc_id, title, etc. from source
chunks = db.create_materialized_view(
    "doc_chunks",
    query=docs.search(None).select(["doc_id", "title", "text"]),
    udtf=chunk_document,
)
chunks.refresh()

# Add embeddings to chunks for semantic search
@udf(data_type=pa.list_(pa.float32(), 1536))
def embed_text(chunk_text: str) -> list[float]:
    return embedding_model.encode(chunk_text)

chunks.add_columns({"embedding": embed_text})
chunks.refresh()  # Backfills embeddings on all existing chunks

# Query — parent columns available alongside chunk columns
chunks.search(None).select(["doc_id", "title", "chunk_text", "embedding"]).to_pandas()
```
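To sanity-check the overlap arithmetic outside Geneva, the same windowing can be run as plain Python (synthetic words, same `chunk_size`/`overlap` constants as above):

```python
def chunk_words(words, chunk_size=500, overlap=50):
    # Same stride as chunk_document: each chunk starts chunk_size - overlap
    # words after the previous one, so consecutive chunks share `overlap` words.
    step = chunk_size - overlap
    return [words[s:s + chunk_size] for s in range(0, len(words), step)]

words = [f"w{i}" for i in range(1000)]
chunks = chunk_words(words)  # starts at word 0, 450, and 900
```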
For a comparison of all three function types (UDFs, Scalar UDTFs, Batch UDTFs), see Understanding Transforms.