... Research Scientist on our Reward Models team, you'll lead research efforts to improve how we specify and learn human... frontier of reward modeling for large language models. You'll develop novel architectures and training methodologies for RLHF...
... building a future powered by AI that's as magical as it is impactful. As a Senior Research Scientist (Generative Video), you'll... and preference optimization (reward models, RLHF-style tuning, DPO variants, or preference learning). Understanding of inference...