across hyperscale accelerator clusters. This role sits at the intersection of distributed systems, kernel-level performance engineering... strategy, kernel optimization, and end-to-end system performance. This is a hands-on role with direct impact on training...
intersection of distributed systems, kernel-level performance engineering, and large-scale model training. The ideal candidate can..., distributed training frameworks, or performance-critical kernels. Prior experience collaborating directly with hardware vendors...
-on experience with data and model pipelines (feature stores, registries, distributed training, inference scaling). Knowledge...). Experience optimizing GPU clusters, distributed training, or HPC environments. Familiarity with graph databases (e.g., Neo4j...