training latency and maximize GPU utilization. Tune the performance of core operators using HIP/CUDA and low-level... and techniques like kernel overlap. Prior involvement in high-performance ML infrastructure projects, especially in pre-training...
custom requirements for AI SW performance and stability, including from POC requirement to POR release, from GPU kernel..., in both English and mandarin. Excellent in GPU kernel primitive like Attention (FA, PA, MLA, linear Attn etc.), MOE, TOPK design...
Job Category: Pre Sales Job Description: Position Overview As a GPU Specialist Cloud Engineer (CE) within the...-performance computing (HPC) and Artificial Intelligence infrastructure. You are not just a generalist; you are the bridge...
-Performance Kernel Development: Design, implement, and optimize high-performance GPU kernels for AI/ML workloads to maximize... in NVIDIA CUDA or AMD HIP kernel programming. Performance Engineering: Demonstrated ability to debug and profile complex GPU...
-Performance Kernel Development: Design, implement, and optimize high-performance GPU kernels for AI/ML workloads to maximize... in NVIDIA CUDA or AMD HIP kernel programming. Performance Engineering: Demonstrated ability to debug and profile complex GPU...
and maximize GPU utilization. Tune the performance of core operators using HIP/CUDA and low-level profiling tools. Integrate... your career. The TrainingAtScale team at AMD is looking for a Training Optimization Engineer to help build and optimize...
performance. THE PERSON: You are a strong GPU performance engineer with a solid understanding of algorithms, model... development of kernel agents—tools that accelerate kernel iteration and ultimately assist humans in achieving extreme GPU...
Design, develop, and maintain high-performance software in C/C++ and Python, including GPU programming with CUDA, ROCm... toolchains. Profile workloads end-to-end, identify bottlenecks, and implement kernel-level and system-level performance...
). Experience with build and binding ecosystems: CMake, pybind11, and CI/CD for GPU workloads. Performance Engineering: Mastery... of experience in systems programming, HPC, or GPU software development, featuring at least 5 years of hands-on CUDA/C++ kernel...
training latency and maximize GPU utilization. Tune the performance of core operators using HIP/CUDA and low-level profiling... your career. The Role: The TrainingAtScale team at AMD is looking for a Training Optimization Engineer to help build...