> ## Documentation Index
> Fetch the complete documentation index at: https://docs.lancedb.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Blob Types in Geneva UDFs

> Learn how to work with Lance Blobs in Geneva UDFs for handling large binary objects efficiently with lazy reading capabilities.

Geneva supports UDFs that take [Lance Blobs](https://lancedb.github.io/lance/guide/blob/) (large binary objects) as input and has the ability to write out columns with binaries encoded as Lance Blobs.  Lance blobs are an optimization intended for large objects (1's MBs -> 100MB's) and provide a file-like object that lazily reads large binary objects.

## Reading Blobs

Defining functions that read blob columns is straight forward.

For scalar UDFs, blob columns are expected to be of type `BlobFile`

<CodeGroup>
  ```python Python icon="python" theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
  from lance.blob import BlobFile

  @udf
  def work_on_udf(blob: BlobFile) -> int:
      assert isinstance(blob, BlobFile)
      data = blob.read()
      # do something intresting.

      return len(data)
  ```
</CodeGroup>

## Writing Blobs

Defining UDFs that write out `Blob`s to a new column is straightforward.   Here we add the standard metadata annotation to the UDF so that Geneva knows to write out Blobs.

For scalar udfs, your udf will return `bytes`, explicitly set the `data_type` to `pa.large_binary()`, and add the `field_metadata` that specifies blob encoding.

<CodeGroup>
  ```python Python icon="python" theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
  @udf(data_type=pa.large_binary(), field_metadata={"lance-encoding:blob": "true"})
  def generate_blob(text: str, multiplier: int) -> bytes:
      """UDF that generates blob data by repeating text."""
      return (text * multiplier).encode("utf-8")
  ```
</CodeGroup>

For `pa.RecordBatch` batched UDFs you the effort is similar:

<CodeGroup>
  ```python Python icon="python" theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
  @udf(data_type=pa.large_binary(), field_metadata={"lance-encoding:blob": "true"})
  def batch_to_blob(batch: pa.RecordBatch) -> pa.Array:
      """UDF that converts RecordBatch rows to blob data."""
      import json

      blobs = []
      for i in range(batch.num_rows):
          # do something that returns bytes
          blob_data = ...
          blobs.append(blob_data)
      return pa.array(blobs, type=pa.large_binary())
  ```
</CodeGroup>

## API Reference

* [UDF](https://lancedb.github.io/geneva/api/udf/) — `@udf` decorator for defining blob-processing functions
* [Table](https://lancedb.github.io/geneva/api/table/) — `add_columns()`, `backfill()`
