(Linux, networking, concurrency). Hands-on large-scale distributed training experience (PyTorch Distributed/torchrun; common... cluster operations and automated remediation (health checks, drain/replace, topology-aware placement). Training stability...
your career. The TrainingAtScale team at AMD is looking for a Training Optimization Engineer to help build and optimize... development and maintenance of AMD’s internal training framework, covering pre-training, post-training, and reinforcement learning...
achieve peak performance. Strengthen AMD’s Training Ecosystem: Fill functional gaps, improve framework integration... your career. THE ROLE: As part of AMD’s Training Optimization Team, you will help customers train AI models seamlessly...
Job Category: Pre Sales Job Description: Position Overview As a GPU Specialist Cloud Engineer (CE) within the.... Optimization: Advise customers on right-sizing GPU shapes based on workload requirements (e.g., training vs. inference, FP8 vs...
We’re working on the next generation of recommendation tools and pushing the boundaries of accelerating model training... and inference on GPU. You’ll join a team of ML, HPC and Software Engineers and Applied Researchers developing a framework designed...
your career. The Role: The TrainingAtScale team at AMD is looking for a Training Optimization Engineer to help build... in the development and maintenance of AMD’s internal training framework, covering pre-training, post-training...
, or Triton. Optimize model inference and training pipelines for speed, throughput, memory efficiency, and cost across GPU... and architecture design. Familiar with inference optimization, experience developing popular inference frameworks such as TensorRT...
AI development framework (PyTorch, TensorFlow, Keras, etc.). Good analytical problem-solving skills, relevant engineering experience... of exciting challenges with ample opportunities for development and training in a truly global landscape. A culture that pioneers...
on maximizing training and inference speed while enabling effortless scalability. What You’ll Be Doing: Profile, analyze..., and optimize GPU‑accelerated code to improve training and inference performance for large‑scale recommender systems. Design...
, etc. Prior work experience in an agile environment, or introductory training on Lean/Agile. Aware of and able to apply continuous... customer. Learns organization vision statement and decision-making framework. Able to understand how team and personal goals...
for training programs to keep your skills and knowledge up to date with the latest industry trends and technologies. By joining..., with 6–8+ years of relevant experience. Proficient in Python and at least one web framework, and capable of independently...