Embed text and generate completions using OpenAI models.
See the API reference for all parameters.
pip install 'geneva[udf-text-openai]'
OpenAI UDFs make API calls that incur per-token costs. Each row processed results in one
or more API requests billed to your account. Review
OpenAI pricing before running on large tables.
Set the OPENAI_API_KEY environment variable before calling any factory function below.
The key is read at UDF creation time and serialized with the UDF — no cluster-level
env_vars configuration is needed.
Embeddings
Compare models by adding multiple embedding columns at once:
from geneva.udfs import openai_embedding_udf
table.add_columns({
# Default model — fast, 1536 dimensions
"embedding_small": openai_embedding_udf(
column="body",
model="text-embedding-3-small",
),
# Higher-quality model — 3072 dimensions
"embedding_large": openai_embedding_udf(
column="body",
model="text-embedding-3-large",
),
# Same large model, truncated to 256 dimensions for storage efficiency
"embedding_large_256": openai_embedding_udf(
column="body",
model="text-embedding-3-large",
output_dimensionality=256,
),
})
Generation
Generate text from OpenAI chat completion models. Supports both text and binary (image)
input columns.
See the API reference for all parameters.
Add a summary and an image caption in one call, using different models:
from geneva.udfs import openai_udf
table.add_columns({
# Fast model for bulk text summarization
"summary": openai_udf(
column="body",
prompt="Summarize this document in 3 bullet points",
model="gpt-5-mini",
),
# More capable model for nuanced image captions
"caption": openai_udf(
column="image",
prompt="Provide a 1 sentence description of the scene",
model="gpt-5",
mime_type="image/jpeg",
),
})