Embed text using any HuggingFace Sentence Transformer model locally — no API key needed.
See the API reference for all parameters.
pip install 'geneva[udf-text-sentence-transformers]'
Sentence Transformer models run locally on your workers — there are no API calls and no
per-token costs. This makes them a good fit for large-scale embedding jobs where cost is a
concern.
Embeddings
Compare a lightweight and a high-quality model side by side:
from geneva.udfs import sentence_transformer_udf
table.add_columns({
    # Lightweight default model — fast, CPU-friendly
    "embedding_mini": sentence_transformer_udf(
        column="body",
        model="sentence-transformers/all-MiniLM-L6-v2",
    ),
    # Larger model with GPU acceleration
    "embedding_bge": sentence_transformer_udf(
        column="body",
        model="BAAI/bge-large-en-v1.5",
        num_gpus=1.0,
    ),
})
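Both columns hold fixed-size float vectors (384 dimensions for all-MiniLM-L6-v2, 1024 for bge-large-en-v1.5), which are typically compared with cosine similarity. A minimal sketch in plain Python, independent of Geneva:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Note that vectors produced by one model are not comparable to vectors produced by another, so keep each similarity search within a single embedding column.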
GPU acceleration
Sentence Transformer models can run on CPU or GPU. Smaller models like all-MiniLM-L6-v2
work well on CPU, but larger models like bge-large-en-v1.5 benefit significantly from GPU
acceleration. Use the num_gpus parameter to request GPU resources for a worker:
# CPU-only (default) — suitable for lightweight models
sentence_transformer_udf(column="body", model="sentence-transformers/all-MiniLM-L6-v2")
# GPU-accelerated — recommended for larger models
sentence_transformer_udf(column="body", model="BAAI/bge-large-en-v1.5", num_gpus=1.0)
Setting num_gpus to a fractional value (e.g., 0.5) tells the Ray scheduler to co-locate multiple workers on the same physical GPU. For example, two UDFs with num_gpus=0.5 will be scheduled on a single GPU. Note that Ray does not enforce GPU memory limits — it is your responsibility to ensure the combined models fit in GPU memory.
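Since memory enforcement is left to you, a quick back-of-the-envelope check before co-locating workers can prevent out-of-memory failures. A minimal sketch — the per-model footprints below are illustrative assumptions, not measured values; profile your own models:

```python
# Illustrative GPU memory estimates in GB (weights plus activation
# headroom). These are assumptions for the sketch, NOT measurements.
APPROX_GPU_MEM_GB = {
    "sentence-transformers/all-MiniLM-L6-v2": 0.5,
    "BAAI/bge-large-en-v1.5": 2.5,
}

def fits_on_gpu(models: list[str], gpu_mem_gb: float, headroom: float = 0.9) -> bool:
    """Return True if the combined estimated footprint of the co-located
    models fits within `headroom` of the GPU's memory, reserving the
    remainder as a safety margin."""
    total = sum(APPROX_GPU_MEM_GB[m] for m in models)
    return total <= gpu_mem_gb * headroom

# Two bge-large workers sharing one 16 GB GPU via num_gpus=0.5:
print(fits_on_gpu(["BAAI/bge-large-en-v1.5"] * 2, gpu_mem_gb=16.0))  # True
```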