✨ About The Role
- Collaborating with researchers, data scientists, and platform developers to specify the availability, performance, correctness, and efficiency requirements of the research platform
- Designing and implementing solutions to ensure scalability of infrastructure to meet increasing demands
- Developing and maintaining monitoring systems to proactively identify issues and anomalies in the production environment
- Implementing fault-tolerant and resilient design patterns to minimize service disruptions
- Building and maintaining automation tools to streamline repetitive tasks and improve system reliability
⚡ Requirements
- Experienced reliability engineer with a track record of accelerating engineering reliability in a fast-paced, rapidly scaling company
- Proficient in cloud infrastructure, specifically Azure, and experienced in collaborating with cross-functional teams to ensure reliability and scalability
- Skilled in utilizing Infrastructure as Code (IaC) principles to automate infrastructure provisioning and configuration management
- Strong problem-solving and troubleshooting skills, with excellent communication and collaboration abilities
- Bachelor's degree in Computer Science, Information Technology, or related field, or equivalent work experience