Move Fast Without Breaking Things in ML
Written by Bob Nugman, ML Engineer at DoorDash, and Aparna Dhinakaran, CPO of Arize AI. In this piece, Bob and Aparna discuss the importance of reliability engineering for ML initiatives.
Machine learning is quickly becoming a key ingredient in emerging products and technologies. This has caused the field to mature rapidly as it attempts to transform the process of building ML models from an art into an engineering practice. Along the way, many companies are learning that bringing a model that works in the research lab into production is much easier said than done.
One particular challenge that ML practitioners face when deploying models into production environments is ensuring a reliable experience for their users. Imagine it’s 5 a.m. and you get an urgent message. You hop into a meeting and the technical executive is on the line. Purchases have plummeted in a new market, resulting in a material impact on revenue. Customers are complaining. Your team is investigating, but it’s unclear where to even start. Did a model drift or fail? One thing is clear: as the industry matures, ML reliability deserves to be taken seriously.