languages. Key job responsibilities: Architect and lead the development of robust inference infrastructure for Amazon... the creation of a sophisticated ML inference service that orchestrates complex interactions between multiple models...
techniques for Physical AI. GPU-based libraries, frameworks, tools, SDKs and infrastructure for model training and inference... platform. This platform consists of three core pillars: systems for massively parallel AI training in the data center...
learning innovation. In this role, you will architect, scale, and optimize high-performance ML infrastructure used... and infrastructure for training and inference on large-scale, distributed GPU clusters. Develop internal tools and automation for ML...
analysis for AI training/inference applications. Large-Scale System Development & Debugging: Experience developing..., and distributed training functionalities. GPU Performance Analysis & Optimization Acuity: The ability to analyze profiling data...