, founded by MIT CSAIL researchers. We are searching for a CUDA Kernel Engineer who has hands-on experience developing... and optimizing NVIDIA CUDA kernels from scratch. You will work on the GPU performance layer powering large-scale, high-throughput...
become experiments and products). About the Role As a Training Performance Engineer, you'll drive efficiency improvements... engineering - analyzing GPU kernel performance, collective communication throughput, investigating I/O bottlenecks, and sharding...
validation. Collaborate asynchronously with CUDA or systems engineers who handle low-level kernel optimization. Profile.... Collaborate with CUDA optimization specialists to integrate and validate kernels. Projects may involve primitives used in state...
. You'll work across the stack - from low-level kernel performance to high-level distributed execution - and collaborate... using HIP, CUDA, or Triton, and care deeply about low-level performance. Are familiar with communication libraries...