Scaling computation resources
Geneva jobs split computational work into smaller batches that are assigned to tasks, which are distributed across the cluster. As each task completes, it writes its output to a checkpoint file. If a job is interrupted or run again, Geneva checks whether a checkpoint for each computation is already present and only kicks off the computations that are missing. Computation capacity is usually the bottleneck for job execution, so to complete all of a job's tasks more quickly, you simply need to increase the CPU/GPU resources available.
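The checkpoint-resume behavior can be pictured with a short sketch. None of the names below are Geneva internals; they only illustrate the control flow of "skip the batch if its checkpoint already exists":

```python
# Illustrative sketch of the checkpoint-resume pattern described above.
# These are NOT Geneva APIs; the names are placeholders.
import os
import pickle

def expensive_computation(batch):
    # Stand-in for the user-defined function applied to each batch.
    return [x * 2 for x in batch]

def run_task(task_id: int, batch, checkpoint_dir: str):
    """Compute one batch's output, skipping work if a checkpoint already exists."""
    checkpoint_path = os.path.join(checkpoint_dir, f"task-{task_id}.ckpt")
    if os.path.exists(checkpoint_path):
        # A previous (possibly interrupted) run already finished this batch.
        with open(checkpoint_path, "rb") as f:
            return pickle.load(f)
    result = expensive_computation(batch)
    with open(checkpoint_path, "wb") as f:
        pickle.dump(result, f)
    return result
```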
GKE node pools

GKE + KubeRay can autoscale the number of VM nodes on demand. Limits on the resources provisioned are configured via node pools, which can be scaled vertically (machine type) or horizontally (number of nodes). Applying Kubernetes labels to the node pool machines lets you control which resources the different jobs in your cluster run on.
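As a rough illustration, a dedicated autoscaling GPU node pool might be created like this; the cluster name, region, machine type, accelerator, and label are all placeholders to adapt to your setup:

```bash
# Hypothetical example: an autoscaling GPU node pool with a label that
# scheduling rules can target. Names and sizes are placeholders.
gcloud container node-pools create geneva-gpu-pool \
    --cluster=my-cluster \
    --region=us-central1 \
    --machine-type=g2-standard-8 \
    --accelerator=type=nvidia-l4,count=1 \
    --enable-autoscaling --min-nodes=0 --max-nodes=8 \
    --node-labels=geneva-role=gpu-worker
```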
Options on Table.backfill(..)

The Table.backfill(..) method has several optional arguments for tuning performance. To saturate the CPUs in the cluster, the main arguments to adjust are concurrency, which controls the number of task processes, and intra_applier_concurrency, which controls the number of threads per task process.
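For example, a backfill that runs eight task processes with four applier threads each might look like the following. The connection URI, table name, and column name are placeholders, and the connect/open_table calls assume Geneva's LanceDB-style API:

```python
# Sketch only: URI, table, and column names are illustrative.
import geneva

db = geneva.connect("db://my-dataset")
tbl = db.open_table("my-table")

# 8 task processes x 4 applier threads -> up to 32 batches in flight.
tbl.backfill(
    "embedding",
    concurrency=8,
    intra_applier_concurrency=4,
)
```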
commit_granularity controls how frequently fragments are committed, so that partial results can become visible to table readers.
Setting batch_size smaller introduces finer-grained checkpoints and provides more frequent proof of life as a job executes. This is useful when the computation on your data is expensive.
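Continuing the sketch above, both knobs can be passed alongside the concurrency options; the values here are illustrative starting points, not recommendations:

```python
# Sketch only: values are illustrative, not tuned recommendations.
tbl.backfill(
    "embedding",
    commit_granularity=64,  # commit after every 64 fragments so readers see partial results sooner
    batch_size=256,         # smaller batches -> finer-grained checkpoints, more frequent proof of life
)
```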