Berkeley Lab's Information Technology Division has an opening for a Senior HPC Cluster Systems Administrator... resources, high-performance computing cluster systems, and Kubernetes clusters. This role provides extensive expertise in High...
As a Senior Cluster Site Reliability Engineer (SRE), you will help scale our research compute cluster to meet our growing needs... of HPC/batch compute frameworks (Slurm, Kueue, AWS/GCP Batch) and/or machine learning training systems (Kubeflow, MLflow...