| Provider | Embeddings | Generation | Runs locally | Install extra |
|---|---|---|---|---|
| OpenAI | ✓ | ✓ | — | `geneva[udf-text-openai]` |
| Gemini | ✓ | ✓ | — | `geneva[udf-text-gemini]` |
| Sentence Transformers | ✓ | — | ✓ | `geneva[udf-text-sentence-transformers]` |

Comparing models and prompts

Because `add_columns` accepts a dictionary, you can evaluate multiple models, parameter
settings, or prompts in a single pass over your data. Each entry produces its own column,
so results sit side by side in the same table for easy comparison.
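
The dictionary pattern can be illustrated without Geneva at all. The sketch below is a plain-Python stand-in, not Geneva's API: `add_columns_sketch`, `summarize_short`, and `summarize_long` are hypothetical names invented for this example, and the two "models" are trivial string functions standing in for UDFs configured with different models or prompts.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def add_columns_sketch(
    rows: List[Row],
    columns: Dict[str, Callable[[str], object]],
    source: str = "text",
) -> List[Row]:
    """Hypothetical stand-in: apply every function in `columns` to each row in one pass."""
    out: List[Row] = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for name, fn in columns.items():
            # Each dictionary entry becomes its own output column.
            new_row[name] = fn(str(row[source]))
        out.append(new_row)
    return out


# Hypothetical "models": two variants whose outputs we want side by side.
def summarize_short(text: str) -> str:
    return text[:10]


def summarize_long(text: str) -> str:
    return text[:20]


rows = [{"text": "The quick brown fox jumps over the lazy dog"}]
compared = add_columns_sketch(
    rows,
    {"summary_short": summarize_short, "summary_long": summarize_long},
)
```

Each row in `compared` now carries the original `text` plus one column per dictionary entry, which is the side-by-side layout described above.
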

What’s included

All built-in UDFs share these capabilities:

- API key handling — Keys are captured from your local environment at UDF creation time and serialized with the UDF. No cluster-level environment configuration required.
- Retry with backoff — Transient API errors (rate limits, timeouts, server errors) are automatically retried with exponential backoff.
- Batch processing — Embedding UDFs batch multiple rows per API call for better throughput.
- L2 normalization — Embedding UDFs support optional L2 normalization via the `normalize` parameter (disabled by default since both providers return pre-normalized vectors).
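
To make three of these capabilities concrete, here is a minimal standard-library sketch of retry with exponential backoff, batching, and L2 normalization. It is not the built-in UDFs' implementation: `TransientError`, `retry_with_backoff`, `batched`, and `l2_normalize` are hypothetical helpers, and the delay and jitter values are assumptions chosen for illustration.

```python
import math
import random
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (rate limits, timeouts, server errors)."""


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on TransientError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles on each attempt; a little jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items, e.g. one slice per API call."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale `vec` to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]
```

Passing `sleep` as a parameter is a design choice for this sketch: it lets a caller or a test substitute a recorder for `time.sleep` instead of actually waiting.
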

See also

- Working with UDFs — Write custom scalar, batched, and stateful UDFs
- Error handling — Fine-grained retry and skip policies
- Working with blobs — Process binary data (images, audio, video)