... manage, and grow a team of senior and principal engineers responsible for fabric operations, setting clear expectations... scale AI supercomputing environments, ensuring sustained GPU availability, training stability, and SLA compliance. Lead...
As a Principal Supercomputing Operations Engineer, you serve as the technical authority and strategic owner for interconnect fabric... operations across flagship AI supercomputing environments. You treat InfiniBand and GPU interconnect fabrics as a single end...