Design and build a unified GPU inference platform for Ads, ensuring scalability, reliability, and efficiency. Optimize... model inference via batching, quantization, scheduling, memory management, runtime optimization, and kernel-level improvements...
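As a point of reference for the batching technique this posting names, here is a minimal, hypothetical sketch (the MAX_BATCH knob and drain_batch helper are illustrative, not from the posting): pending requests are drained from a queue and run through the model in one forward pass, amortizing per-call GPU overhead.

    import queue

    import torch

    MAX_BATCH = 8  # hypothetical tuning knob, not from the posting

    def drain_batch(request_q: queue.Queue, model: torch.nn.Module) -> torch.Tensor:
        """Collect up to MAX_BATCH pending request tensors and run a single
        forward pass, amortizing kernel-launch and transfer overhead."""
        batch = [request_q.get()]  # block until at least one request arrives
        while len(batch) < MAX_BATCH and not request_q.empty():
            batch.append(request_q.get_nowait())
        with torch.no_grad():
            return model(torch.stack(batch))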
Job Description: WHAT YOU DO AT AMD CHANGES EVERYTHING. At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers to PCs, gaming, and embedded systems. Grounded in a cul...
Engineer to collaborate with cross-functional teams to design and develop data infrastructure and analytics capabilities... structured data inputs for AI model training and inference (e.g., LLM applications, recommendation systems), optimizing feature...
frameworks for AMD GPUs. Your experience will be critical in enhancing GPU kernels, deep learning models, and training/inference... principles to drive continuous improvement. THE PERSON: Skilled engineer with strong technical and analytical expertise in C...
+ Applied AI Engineer) - Digital Twin & Clinical AI - Remote (Contractor) Location: Remote - Global - Philippines, Vietnam...%); a few real projects can be sufficient. Bonus “brownie points”: Experience deploying AI models/LLMs on the edge (edge inference...
As a Senior Machine Learning Systems Engineer, you'll lead efforts to scale and optimize the training system for our large-scale... What you'll do (responsibilities): You'll design, implement, and optimize large-scale machine learning systems for training and inference. You'll...
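For context on what scaling a training system across GPUs typically involves, here is a minimal sketch using PyTorch's DistributedDataParallel; this is a generic illustration of the standard pattern, not this team's framework.

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_for_ddp(model: torch.nn.Module) -> DDP:
        # One process per GPU; gradients are all-reduced across ranks
        # during backward so every replica stays in sync.
        dist.init_process_group(backend="nccl")
        local_rank = dist.get_rank() % torch.cuda.device_count()
        torch.cuda.set_device(local_rank)
        return DDP(model.cuda(local_rank), device_ids=[local_rank])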
and inference on GPU. You’ll join a team of ML, HPC, and Software Engineers and Applied Researchers developing a framework designed... In your role as a CUDA Engineer Intern, you will be profiling and investigating the performance of optimized code together...
for optimizing inference latency and cost; familiarity with GPU/TPU acceleration and distributed inference architectures; experience... Proficiency in deep learning frameworks (TensorFlow, PyTorch) and deployment tools (ONNX, tf-serving, TorchServe, Triton Inference...
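As a concrete example of the deployment tooling listed here, exporting a PyTorch model to ONNX for serving might look like the sketch below; the model and tensor names are placeholders.

    import torch

    model = torch.nn.Linear(16, 4).eval()  # placeholder model
    dummy = torch.randn(1, 16)             # example input used to trace the graph
    torch.onnx.export(
        model, dummy, "model.onnx",
        input_names=["input"], output_names=["logits"],
        dynamic_axes={"input": {0: "batch"}},  # allow variable batch size at serve time
    )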
Integration: Collaborate with software stack teams to expose optimized kernels within high-level frameworks and inference engines... kernels using OpenAI Triton or other Python-based DSLs for agile kernel development and auto-tuning. Inference Engine...
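For reference, kernel development with OpenAI Triton looks roughly like the vector-add sketch below, adapted from the standard Triton tutorial pattern rather than from this posting.

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide slice of the inputs.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard the tail of the array
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        grid = (triton.cdiv(x.numel(), 1024),)
        add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
        return out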
robust pipelines for LoRA-based model training, post-training quantization, and inference optimisation. Develop... with LoRA training, model post-processing (quantization, pruning), and on-device inference optimisation. Familiarity with image...
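As an illustration of the post-training quantization step mentioned here, one common PyTorch path is dynamic quantization, which stores Linear weights as int8 and quantizes activations on the fly; this is a generic sketch, not this team's pipeline.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()
    # Post-training dynamic quantization: no retraining required; weights are
    # converted to int8 and activations are quantized at inference time.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )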
, and a deep understanding of prompt engineering techniques. Solid problem-solving skills in LLM inference optimization, token... coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience. Experience in optimizing LLM inference...
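One core idea behind the LLM inference optimization this posting alludes to is KV caching: at each decode step, only the new token's key/value are computed and appended, rather than recomputing attention state over the whole prefix. A toy single-head sketch follows; all names are hypothetical.

    import torch

    def decode_step(q, k_new, v_new, cache):
        # Append the new token's key/value to the cache (shapes: (t, d)),
        # then attend the query (shape (d,)) over the full cached history.
        cache["k"] = torch.cat([cache["k"], k_new], dim=0)
        cache["v"] = torch.cat([cache["v"], v_new], dim=0)
        scores = cache["k"] @ q / cache["k"].shape[-1] ** 0.5  # (t,)
        weights = torch.softmax(scores, dim=0)
        return weights @ cache["v"]  # (d,)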
, transformer networks, reinforcement and transfer learning, etc.) Facility with classical methods of statistical inference...
#, and Python. You will design and implement the core inference for our exceptional OCR and document layout analysis engine... Inference Optimization Strategy: Spearhead efforts to optimize deep learning model inference for maximum speed and throughput...
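A common first step in this kind of throughput work is mixed-precision execution; below is a minimal PyTorch sketch, generic rather than this engine's actual strategy.

    import torch

    @torch.inference_mode()
    def fast_forward(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
        # Run matmuls/convolutions in fp16 on the GPU for higher throughput,
        # without modifying the stored model weights.
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            return model(x)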
innovative system optimization solutions for internal LLM workloads. - Optimize LLM inference workloads through innovative kernel..., algorithm, scheduling, and parallelization technologies. - Continuously develop and maintain internal LLM inference...
technical problems, advance state-of-the-art LLM technologies, and translate ideas into production. - Optimize LLM inference... LLM inference infrastructure. - A bachelor's degree or higher in computer science, engineering, or a related field; PhD...
SDK toolchain; implement and optimize inference drivers for large language models (LLM) and large multimodal models (LMM...; transformer and the Hugging Face ecosystem; knowledge of LLM/LMM inference engines, such as llama.cpp or ExecuTorch; experience...
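For orientation, driving an LLM through llama.cpp's Python bindings (llama-cpp-python) looks roughly like the sketch below; the model path is a placeholder, and the bindings themselves are an assumption, since the posting only names llama.cpp.

    from llama_cpp import Llama  # assumes the llama-cpp-python bindings

    llm = Llama(model_path="model.gguf", n_ctx=2048)  # placeholder GGUF model file
    out = llm("Q: What does an inference driver do? A:", max_tokens=64)
    print(out["choices"][0]["text"])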
efficient inference on both server-side and embedded targets. Experimentation, Evaluation, and Knowledge Transfer to other team...