If you are using a remote Ray cluster, the notebook or script that packages your code must run on the same CPU architecture and OS as the cluster. By default, Ray clusters run on Linux. If you host a Jupyter service on a Mac, Geneva will attempt to deploy Mac shared libraries to a Linux cluster, which results in "Module not found" errors. Instead, use a hosted Jupyter notebook, or host your Jupyter or Python environment on a Linux VM or container.
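A quick pre-flight check can catch an OS or architecture mismatch before any code is packaged. The following is a minimal sketch using only the Python standard library. The helper names normalize_arch and host_matches_cluster are hypothetical and not part of Geneva or Ray, and the target values "Linux" and "x86_64" are assumptions you should replace with your cluster's actual platform.

```python
import platform
from typing import Optional


def normalize_arch(machine: str) -> str:
    """Map common aliases to one name, e.g. 'AMD64' -> 'x86_64', 'aarch64' -> 'arm64'."""
    aliases = {"amd64": "x86_64", "x64": "x86_64", "aarch64": "arm64"}
    name = machine.lower()
    return aliases.get(name, name)


def host_matches_cluster(
    cluster_os: str,
    cluster_arch: str,
    host_os: Optional[str] = None,
    host_arch: Optional[str] = None,
) -> bool:
    """Return True if the packaging host has the same OS and CPU architecture
    as the cluster. Host values default to the machine running this code."""
    host_os = platform.system() if host_os is None else host_os
    host_arch = platform.machine() if host_arch is None else host_arch
    same_os = host_os.lower() == cluster_os.lower()
    same_arch = normalize_arch(host_arch) == normalize_arch(cluster_arch)
    return same_os and same_arch


# Assumed cluster platform; adjust to match your own Ray cluster.
if not host_matches_cluster("Linux", "x86_64"):
    print(
        "Warning: this host is "
        f"{platform.system()}/{platform.machine()}, not Linux/x86_64. "
        "Packaged shared libraries may fail to load on the cluster."
    )
```

Running this at the top of a notebook gives an early warning on a Mac host, rather than a failure later on the workers.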
Ray Auto Connect
To execute jobs without an external Ray cluster, you can simply call the Table.backfill method. This auto-creates a local Ray cluster and is only suitable for prototyping on small datasets.
Existing Ray Cluster
Geneva can execute jobs against an existing Ray cluster. You can define a RayCluster by specifying the address of the cluster and the packages needed on your workers. This approach makes it easy to tailor resource requirements to your particular UDFs. You can then wrap your table backfill call with the RayCluster context.
Note: If your Ray cluster is managed by KubeRay, you’ll need to set up kubectl port forwarding so Geneva can connect.
For more interactive usage, you can use this pattern:
Ray on Kubernetes
Geneva uses KubeRay to deploy Ray on Kubernetes. You can define a RayCluster by specifying the pod name, the Kubernetes namespace, credentials to use for deploying Ray, and characteristics of your workers. This approach makes it easy to tailor resource requirements to your particular UDFs. You can then wrap your table backfill call with the RayCluster context.
Persistent Contexts
Geneva Execution Contexts can be reused and shared with team members using persistent Clusters and Manifests.
Define a Cluster
A Geneva Cluster represents the compute resources used for distributed execution. Calling define_cluster() stores the Cluster metadata in persistent storage. The Cluster can then be referenced by name and provisioned when creating an Execution Context.
Define a Manifest
A Geneva Manifest represents the files and dependencies used in the execution environment. Calling define_manifest() packages files in the local environment and stores the Manifest metadata and files in persistent storage.
The Manifest can then be referenced by name when creating an Execution Context. Persistent Manifests allow for deterministic execution environments that can be shared and reused.
Create an Execution Context
An Execution Context represents the concrete execution environment used to execute a distributed Job. Calling context enters a context manager that provisions an execution cluster and executes the Job using the Cluster and Manifest definitions provided. Once the Job completes, the context manager automatically de-provisions the cluster.
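The provision, execute, de-provision lifecycle described above follows the standard Python context manager pattern. The sketch below illustrates that pattern only and is not Geneva's implementation. The function execution_context, the events log, and the names "my-cluster" and "my-manifest" are hypothetical stand-ins. Tearing down in a finally clause, so that cleanup also happens when the Job fails, is a design choice of this sketch and not a documented Geneva behavior.

```python
from contextlib import contextmanager

# Records lifecycle steps so the ordering is easy to inspect.
events = []


@contextmanager
def execution_context(cluster_name: str, manifest_name: str):
    """Hypothetical stand-in for an execution context.

    Provisions on entry, yields a handle for the duration of the job,
    and de-provisions on exit, even if the body raises.
    """
    events.append(f"provision {cluster_name} with {manifest_name}")
    try:
        # The yielded dict stands in for a real cluster handle.
        yield {"cluster": cluster_name, "manifest": manifest_name}
    finally:
        events.append(f"deprovision {cluster_name}")


# Usage: the job runs inside the block; cleanup happens automatically on exit.
with execution_context("my-cluster", "my-manifest") as ctx:
    events.append(f"run job on {ctx['cluster']}")
```

After the with block exits, events holds the provision, job, and de-provision steps in that order, which mirrors the lifecycle described for an Execution Context.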