✨ About The Role
- Design, build, and manage data pipelines and core tables for OpenAI to power analyses, safety systems, and product growth
- Develop canonical datasets to track key product metrics, including user growth, engagement, and revenue (a sketch of this kind of job follows this list)
- Collaborate with various teams to understand their data needs and provide solutions
- Implement robust and fault-tolerant systems for data ingestion and processing
- Ensure the security, integrity, and compliance of data according to industry and company standards
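By way of illustration, here is a minimal PySpark sketch of the kind of canonical-dataset job described above: it derives a daily-active-users table from a raw events table. The storage paths and the `user_id`/`event_ts` columns are hypothetical placeholders, not OpenAI's actual data model.

```python
# A minimal sketch of a canonical-metrics job. The paths and the
# `user_id` / `event_ts` columns are hypothetical, for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_active_users").getOrCreate()

# Read raw product events (hypothetical location and layout).
events = spark.read.parquet("s3://warehouse/raw/events/")

# Derive a canonical daily-active-users table: one row per calendar day,
# counting each user at most once per day.
dau = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date")
    .agg(F.countDistinct("user_id").alias("daily_active_users"))
)

# Overwrite the canonical table so downstream analyses see one consistent view.
dau.write.mode("overwrite").parquet("s3://warehouse/core/daily_active_users/")
```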
⚡ Requirements
- Experienced data engineer with a minimum of 8 years in software engineering, including at least 3 years in data engineering
- Proficient in programming languages commonly used in Data Engineering such as Python, Scala, or Java
- Skilled in distributed processing frameworks such as Hadoop and Flink, and in distributed storage systems
- Strong expertise in ETL schedulers such as Airflow, Dagster, or Prefect (a minimal DAG sketch follows this list)
- Solid understanding of Spark and ability to write, debug, and optimize Spark code
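As a small illustration of the scheduler expertise listed above, the following is a minimal Airflow DAG sketch (assuming Airflow 2.4+ for the `schedule` parameter). The task bodies, IDs, and schedule are placeholders, not a prescribed pipeline.

```python
# A minimal Airflow DAG sketch. Task names, callables, and the schedule
# are illustrative placeholders only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull raw records from a source system.
    print("extracting raw events")


def transform(**context):
    # Placeholder: clean and aggregate the raw records into core tables.
    print("building core tables")


with DAG(
    dag_id="core_tables_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # run once per day
    catchup=False,      # do not backfill past runs on first deploy
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # `>>` declares the dependency: transform runs only after extract succeeds.
    extract_task >> transform_task
```

Dagster or Prefect could orchestrate the same two-step job; only the definition syntax for tasks and dependencies would differ.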