✨ About The Role
- Improve training throughput in the internal training framework by optimizing the performance and hardware efficiency of training runs
- Collaborate with researchers to enable the development of next-generation AI models
- Profile and optimize the training framework, applying the latest techniques to push the field forward
- Work on distributed model execution, interfaces, and implementation for model code, training, and inference
- Prioritize maximizing training and researcher throughput to accelerate progress towards AGI
⚡ Requirements
- Experienced engineer with a background in distributed systems and machine learning, able to optimize performance while keeping code reliable
- Strong software engineering skills with proficiency in Python, able to work on large-scale ML experiments and projects
- Enjoys understanding how systems work and continuously coming up with ideas to improve efficiency while reducing complexity
- Thrives in a fast-paced environment, collaborating with researchers to develop cutting-edge AI models
- Comfortable working in a hybrid office model in San Francisco, with relocation assistance available