
Distributed Training Engineer, Sora

Optimize training throughput for video models to enhance AI capabilities.
San Francisco Bay Area
Senior
$295,000 - 440,000 USD / year
4 months ago

✨ About The Role

- As a Distributed Systems/ML engineer at OpenAI, you will be improving the training throughput for internal training frameworks.
- You will enable researchers to experiment with new ideas by providing them with efficient tools and systems.
- The role involves designing, implementing, and optimizing state-of-the-art AI models.
- You will be profiling and optimizing the training framework to achieve impressive hardware efficiency.
- The job requires collaboration with researchers to develop systems-efficient video models and architectures.
- You will apply the latest techniques to the internal training framework to enhance performance.
- The position is based in San Francisco, CA, and follows a hybrid work model with 3 days in the office per week.
- Relocation assistance is offered to new employees moving to San Francisco for the role.
- The role is part of the Sora team at OpenAI, which focuses on making video a key capability of the foundation models while ensuring their reliability and safety.

⚡ Requirements

- You should have experience working with multi-modal machine learning pipelines and a passion for diving deep into systems implementations.
- Strong software engineering skills, particularly in Python, are essential for success in this role.
- You should be driven by performance optimization and have a keen eye for keeping systems both fast and maintainable.
- Understanding and optimizing training kernels should be within your skill set.
- You must be passionate about ensuring stable training dynamics in AI systems.
- The ideal candidate does not tolerate bugs in their code and strives to write bug-free machine learning code.
- You should have a deep understanding of distributed systems and the performance of supercomputers.
- Collaborating with researchers to develop efficient video models and architectures will be a key part of your job.
- You should enjoy working in a hybrid research and product team environment.

Engineering
About OpenAI
Building artificial general intelligence