LLM/ML Engineer (Inference)

Architect and implement scalable inference systems for state-of-the-art AI models.
San Francisco, California, United States
Mid-Level
$150,000 - $240,000 USD / year
1 month ago
REDUCTO

🔮 REDUCTO - UNLOCKING DATA BEHIND COMPLEX DOCUMENTS

✨ About The Role

- Architect and implement robust, scalable inference systems for serving advanced AI models.
- Optimize model-serving infrastructure for high throughput and low latency at scale.
- Develop and integrate advanced inference optimization techniques.
- Collaborate with the research team to bring cutting-edge capabilities into production.
- Build developer tools and infrastructure that support rapid experimentation and deployment.

⚡ Requirements

- A strong background and deep expertise in Python and PyTorch.
- A solid understanding of low-level operating systems concepts, including multi-threading and memory management (essential for this role).
- Proactiveness: taking the initiative to solve problems rather than just identifying them.
- Experience with modern inference systems and the ability to build custom tooling for testing and optimization (highly valued).
- A methodical approach to debugging complex systems and the ability to prototype rapidly (crucial for this position).

Department: Engineering