Keywords: AI Inference Performance Engineer, Location: Santa Clara, CA

Sr AI Software Development Engineer

of developers pushing the boundaries of efficiency and performance to enable and optimize the software ecosystem for the...: We are looking for a highly motivated and skilled AI Software Engineer to join our team. You will work with a team of Software Engineers to enable...

Posted Date: 10 Jan 2026

Principal Software Engineer - Dynamo

resource management, and intelligent request handling, Dynamo achieves high-performance AI inference for demanding applications..., cache management, or high-performance networking. Understanding of LLM-specific inference challenges, such as context...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 02 Jan 2026

AI Cluster Test Automation Engineer

performance benchmarks. Experience with running inference workloads in AI clusters with different inference frameworks like vLLM..., SGLang. Running performance benchmarks for inference. Desired Skills: Understanding of High-Performance Computing...

Posted Date: 24 Dec 2025

AI Cluster Validation Engineer

inference workloads in AI clusters with different inference frameworks like vLLM, SGLang. Running performance benchmarks... for inference. Desired Skills: Understanding of High-Performance Computing application, Machine learning and GPU Programming, MPI...

Posted Date: 24 Dec 2025

AI Cluster Validation Engineer

inference workloads in AI clusters with different inference frameworks like vLLM, SGLang. Running performance benchmarks... for inference. Desired Skills: Understanding of High-Performance Computing application, Machine learning and GPU Programming, MPI...

Posted Date: 23 Dec 2025

Principal Software Engineer – Large-Scale LLM Memory and Storage Systems

NVIDIA Dynamo is a high-throughput, low-latency inference framework for serving generative AI and reasoning models... across multi-node distributed environments. Built in Rust for performance and Python for extensibility, Dynamo orchestrates GPU...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 23 Dec 2025

Distinguished Engineer - Dynamo

initiatives around resiliency, performance and scalability for Dynamo and AI inference. Build and drive Dynamo to continue being... We are currently seeking a senior-level Engineer with distinguished expertise to join the Dynamo engineering team...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 17 Dec 2025

Senior Software Engineer, L4 - Autonomous Vehicles

exposure to engage with Large-scale model inference architecture Contribute to the integration of innovative research... control, including dependencies, interface management, and performance tuning. What We Need To See: We’re...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 11 Mar 2026

Principal Engineer, Solutions Architect Lead – Industrial & Embedded IoT, Edge AI On‑Prem Appliance

to remain on premises. It combines the accessibility and performance of a datacenter inference server with the power efficiency... Appliance is a new product line under IE‑IoT BU. This advanced AI solution is designed for generative AI inference and computer...

Company: Qualcomm
Location: Santa Clara, CA
Posted Date: 03 Mar 2026

Principal Engineer, Solutions Architect Lead – Industrial & Embedded IoT, Edge AI On‑Prem Appliance

Management, Solutions, Platform SW, Performance, Security, and Research to leverage existing knowledge and infrastructure, land... that combine application, runtime, and platform considerations (performance, power, memory, cost, security). Deep hands...

Company: Qualcomm
Location: Santa Clara, CA
Posted Date: 02 Mar 2026

Senior Deep Learning Algorithm Engineer, Training Framework

, evaluation, deployment and tooling to optimize performance and user experience. In this critical role, you will expand Megatron..., meticulously analyzing and tuning performance, and expanding our toolkits and libraries to be more comprehensive and coherent...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 21 Feb 2026

Senior Deep Learning Algorithm Engineer

, evaluation, deployment and tooling to optimize performance and user experience. In this critical role, you will expand Megatron..., meticulously analyzing and tuning performance, and expanding our toolkits and libraries to be more comprehensive and coherent...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 20 Feb 2026

Senior Software Engineer – TensorRT Edge-LLM

Are you passionate about pushing the limits of real-time large language model inference? Join NVIDIA’s TensorRT Edge...-art inference framework in modern C++ that extends TensorRT with autoregressive model serving capabilities, including...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 13 Feb 2026

Software Development Engineer (Kubernetes)

performance of key applications and benchmarks. You will be a member of a core team of incredibly talented industry specialists... and scale-out inference. Develop methods and tooling to utilize dynamic resources in service of inference Support...

Posted Date: 04 Feb 2026

Senior Software Engineer, Profiling Services

‑to‑end feature delivery spanning user‑mode components, driver/platform layers, and performance counter/trace providers..., or related degree. 8+ years of system-level C/C++ development, including concurrency, memory management, and performance...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 31 Jan 2026

Sr Staff ML Engineer (Internet Security)

performance as data and threats evolve. Partner closely with Product Managers and domain experts to translate product..., model adaptation or fine-tuning, evaluation, and cost/performance optimization. Familiarity with AI agent-based approaches...

Location: Santa Clara, CA
Posted Date: 30 Jan 2026

Senior AI Engineer, World Foundation Models

performance and user-visible quality. What you'll be doing: Research, implement, and validate model architecture and algorithm... and long-horizon consistency. Improve training and inference efficiency through architectural and post-training techniques...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 26 Jan 2026

Senior Deep Learning Algorithm Engineer

performance. Ways to stand out from the crowd: Experience in building large-scale LLM inference systems, especially... outstanding engineers to join our team and help shape the future of LLM inference. Our team is dedicated to pushing the...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 20 Dec 2025

Senior Software Engineer, Deep Learning - MLIR TRT

to accelerate deep learning inference on NVIDIA hardware platforms for Physical AI. Working across a wide range of abstractions... from model fine-tuning and quantization to low-level kernel development and performance optimization. Develop workflows...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 17 Dec 2025

Senior Software Engineer, Deep Learning - Torch-TRT

) without forgoing performance Stay up to date with the latest research and innovations in deep learning, implement... with low precision inference, quantization, compression of DNNs Experience optimizing GPU workloads and or developing kernels...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 14 Dec 2025