Strong hands-on experience supporting and tuning job scheduling systems (LSF, Slurm, etc.) in HPC or silicon design environments... reliability engineering practices within HPC scheduling environments. Deep knowledge of job scheduling systems (LSF, Slurm...
critical services. Experience supporting large‑scale HPC clusters using Slurm, LSF or Kubernetes clusters, including setup... We’re looking for a Senior SRE to join our Compute Farm team and help build the next generation of our global services...