Keywords: AI Inference Performance Engineer, Location: Santa Clara, CA

Senior Compiler Engineer, AI Inference Performance

must deliver leading inference performance, fast build time, reduced memory footprints, and ease of use in the forms of both Ahead.... Today, we are increasingly known as “the AI computing company”. We are looking for an AI & Deep Learning Compiler Engineer. NVIDIA is hiring...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 26 Feb 2026

Senior DL Algorithms Engineer - Inference Performance

inference stack to push the boundaries of inference performance. Benchmark state-of-the-art offerings in various DL models... inference. Experience with performance profiling, analysis and optimization, especially for GPU-based applications...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 20 Feb 2026

Senior DL Algorithms Engineer - Inference Performance

inference stack to push the boundaries of inference performance. Benchmark state-of-the-art offerings in various DL models... inference. Experience with performance profiling, analysis and optimization, especially for GPU-based applications...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 08 Jan 2026

AI Inference Performance Engineer

We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining the industry’s performance... in deep learning inference or high-performance systems. Deep understanding of LLM/VLM architectures and inference mechanics...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 10 Mar 2026

AI Inference Performance Engineer - New College Grad 2026

We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining the industry’s performance... in deep learning inference or high-performance systems. Deep understanding of LLM/VLM architectures and inference mechanics...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 07 Mar 2026
Salary: $124,000 - $195,500 per year

Senior Deep Learning Software Engineer, Inference and Model Optimization

opportunities. Continuously innovate on the inference performance to ensure NVIDIA's inference software solutions (TRT, TRT-LLM... focuses on optimizing generative AI models such as large language models (LLM) and diffusion models for maximal inference...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 27 Feb 2026

Senior AI Inference Compiler Engineer

The compiler must deliver leading inference performance, fast build time, reduced memory footprints, and ease of use.... Today, we are increasingly known as “the AI computing company”. We are looking for an AI & Deep Learning Compiler Engineer. NVIDIA is hiring...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 26 Feb 2026

Principal Software Engineer - AI Inference

NVIDIA is the platform for every new AI-powered application. We seek a Principal Software Engineer - AI Inference... of inference runtime architecture, GPU performance engineering, and distributed systems. You will collaborate closely with internal...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 25 Feb 2026

Senior Compiler Engineer, AI Inference Platforms

must deliver leading inference performance, fast build time, reduced memory footprints, and ease of use in the forms of both Ahead.... Today, we are increasingly known as “the AI computing company”. We are looking for an AI & Deep Learning Compiler Engineer. NVIDIA is hiring...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 24 Feb 2026

Senior System Software Engineer - Dynamo-Triton Inference Server

and NVIDIA Dynamo stacks to establish a unified, high-performance inference platform. This platform will ensure feature parity... We are looking for a Senior System Software Engineer to work on . NVIDIA is hiring software engineers for its GPU...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 20 Feb 2026

Senior Software Development Engineer – SGLang and Inference Stack

training and inference models for optimal performance on AMD hardware. Day-0 support for many SOTA models, DeepSeek 3.2, Kimi... frameworks for AMD GPUs. Your work will be instrumental in enhancing GPU kernel performance, accelerating deep learning models...

Posted Date: 12 Feb 2026

Software Development Engineer - SGLang and Inference Stack

training and inference models for optimal performance on AMD hardware. Day-0 support for many SOTA models, DeepSeek 3.2, Kimi... frameworks for AMD GPUs. Your work will be instrumental in enhancing GPU kernel performance, accelerating deep learning models...

Posted Date: 12 Feb 2026

Senior Software Engineer - Inference as a Service

stability, and deliver high-performance, low-latency inference at a massive scale. What you'll be doing: Contribute to the... into our CI/CD pipelines. Build tools and frameworks for real-time observability, performance profiling, and debugging of inference...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 12 Feb 2026

Principal Software Engineer - Inference as a Service

ensure service stability, and deliver high-performance, low-latency inference at a massive scale. What you'll be doing... stability and high availability of inference services. Optimize system performance and latency for various model types...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 11 Feb 2026

Senior Software Engineer, Deep Learning Inference - TensorRT

We are now looking for a Senior Software Engineer for Deep Learning Inference! Would you like to make a big impact...’s SDK for high-performance deep learning inference. Closely follow academic developments in the field of artificial...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 11 Feb 2026

Senior Deep Learning Software Engineer, Inference and Model Optimization

opportunities. Continuously innovate on the inference performance to ensure NVIDIA's inference software solutions (TRT, TRT-LLM... focuses on optimizing generative AI models such as large language models (LLM) and diffusion models for maximal inference...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 23 Jan 2026

Senior Software Development Engineer - SGLang and Inference Stack

and tune large-scale training and inference models for optimal performance on AMD hardware. GPU Kernel Development: Design... frameworks for AMD GPUs. Your work will be instrumental in enhancing GPU kernel performance, accelerating deep learning models...

Posted Date: 20 Dec 2025

Senior Software Development Engineer – LLM Inference Framework

You are comfortable reading and modifying large-scale inference frameworks, debugging performance across GPUs and nodes, and collaborating... your career. THE ROLE: As a senior member of the LLM inference framework team, you will be responsible for building...

Posted Date: 20 Dec 2025

Senior Software Development Engineer - LLM Kernel & Inference Systems

inference and kernel optimization for AMD GPUs. You will play a critical role in advancing high-performance LLM serving... RESPONSIBILITIES Optimize LLM Inference Frameworks Drive performance improvements in LLM inference frameworks such as vLLM, SGLang...

Posted Date: 20 Dec 2025

Senior AI Performance and Efficiency Engineer

We are seeking a Senior AI/ML Performance and Efficiency Engineer, GPU Clusters at NVIDIA to join our AI Efficiency... of modern ML techniques and tools Experience investigating, and resolving, training & inference performance end to end...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 22 Feb 2026