Keywords: AI Training Engineer(Framework/Training Reliability/Stability), Location: Beijing

AI Training Engineer(Framework/Training Reliability/Stability)

cluster operations and automated remediation (health checks, drain/replace, topology-aware placement). Training stability... your career. Responsibilities Own reliability governance (standards, runbooks, SLIs/SLOs) and deliver KPI improvements...

Location: Beijing
Posted Date: 01 Feb 2026

AI Training Optimization Engineer

your career. The TrainingAtScale team at AMD is looking for a Training Optimization Engineer to help build and optimize... performance, stability, and scalability of distributed training systems. You will work closely with internal model and platform...

Location: Beijing
Posted Date: 07 Feb 2026

GPU Kernel Performance Engineer

your career. The Role: The TrainingAtScale team at AMD is looking for a Training Optimization Engineer to help build... the performance, stability, and scalability of distributed training systems. You will work closely with internal model...

Location: Beijing
Posted Date: 04 Feb 2026

Reinforcement Learning Training Optimization Engineer

your career. The Role: The TrainingAtScale team at AMD is looking for a Training Optimization Engineer to help build... the performance, stability, and scalability of distributed training systems. You will work closely with internal model...

Location: Beijing
Posted Date: 06 Mar 2026