✨ About The Role
- Lead and manage teams responsible for building powerful APIs, orchestrating distributed systems, and moving and persisting vast amounts of data to support ML training workloads
- Optimize end-to-end systems, focusing on high-performance I/O, maximizing local performance, and scaling across compute resources
- Collaborate in a fast-paced environment to respond to the evolving needs of training system architectures, ensuring stability and performance on the newest supercomputers
- Drive the development and delivery of technology to millions of users worldwide, prioritizing safety and reliability in all aspects of the work
- Foster a diverse, equitable, and inclusive culture, promoting radical candor and challenging groupthink to accelerate progress towards artificial general intelligence
⚡ Requirements
- Experienced technical leader with a track record of managing large-scale distributed systems and teams, optimizing end-to-end systems for high performance and scalability
- Skilled in Python and Rust, or eager to learn them, with a passion for accelerating progress in AI research towards AGI
- Strong communicator with the ability to lead diverse teams, foster an inclusive culture, and drive collaboration across functions
- Proven problem-solver who takes ownership of challenges, drives innovation, and ensures reliability and scalability in system design and development
- Humble, collaborative team player with a proactive attitude, dedicated to supporting colleagues and achieving team success