Ray + Arize: Productionize ML for Scale and Usability
If you’ve ever had the opportunity to bring a machine learning project to life from rapid prototyping all the way into production, you know that it is nothing short of yeoman’s work. Some would call it fun, some type-II fun, and others absolute hell – it all depends on your sense of humor and affinity for productionizing things.
But if you are on this hero’s journey, there are two arrows in your ML quiver that can make this process much more enjoyable and likely to hit the mark: Ray and Arize AI. This piece covers why you should consider using Ray’s distributed ML framework and ecosystem and Arize’s ML observability platform and how you can get started.
Imagine you just finished your ML prototype after weeks and weeks of trying to find the necessary data, cleaning/preprocessing all the data, trying to find the right model architecture, training, testing, and iterating over and over until you finally have a working model.
You present your model to the product team and tell an amazing story of how your model will improve the day-to-day operations, increase business KPIs, and a lot more. They love the presentation! They want to move forward and ask you the following: “How do we get this model serving the entire business line? And when the model goes into production, how will you know how it’s performing against our goals?”
What they are REALLY asking (in ML speak) is this: how do we get this off your laptop and take this into a very real production system at scale? And if issues around data quality, data drift or performance degradation happen, how will we catch and fix them quickly so that business outcomes aren’t negatively affected?
This blog is meant to help you answer these questions and start tackling these productionization tasks.
In this section, we will briefly review both the Ray and Arize technologies and the problems each one solves.
What is Ray?
Ray is an open-source project developed at UC Berkeley’s RISELab. It is a general-purpose distributed compute framework that lets you flexibly run any compute-intensive Python workload, from distributed training or hyperparameter tuning to deep reinforcement learning and production model serving.
As ML practitioners, we set out to bring value to the business through the models we build, but we often get sidetracked learning and managing the infrastructure needed to run those models at a larger scale.
This is where Ray comes in. Ray lets you run Python code in parallel and across multiple machines without confining you to a specific framework. Think of Apache Spark, but with the whole Python ecosystem available to you.
This makes it more of a general-purpose clustering and parallelization framework that can be used to build and run any type of distributed application. Because of how Ray Core is architected, it is often thought of as a framework for building frameworks.
You can break Ray down into two components. The first is Ray Core, which is the distributed computing framework itself. The second is the Ray Ecosystem, which broadly speaking is a set of task-specific libraries that come packaged with Ray.
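To make the "run Python code in parallel" idea concrete, here is a minimal sketch of a Ray Core task. The prime-counting workload and the names `count_primes` and `count_primes_parallel` are illustrative assumptions, not part of the original notebook. The plain function works on its own; wrapping it with `ray.remote` is what lets Ray schedule the chunks in parallel.

```python
import math


def count_primes(lo: int, hi: int) -> int:
    """Plain Python: count the primes in the half-open range [lo, hi)."""
    def is_prime(n: int) -> bool:
        if n < 2:
            return False
        return all(n % d for d in range(2, math.isqrt(n) + 1))

    return sum(1 for n in range(lo, hi) if is_prime(n))


def count_primes_parallel(limit: int, chunks: int = 4) -> int:
    """Split [0, limit) into chunks and count each chunk as a Ray task."""
    import ray  # imported lazily so count_primes stays usable without Ray

    ray.init(ignore_reinit_error=True)  # starts a local Ray instance if needed
    remote_count = ray.remote(count_primes)  # same function, now schedulable by Ray

    step = max(1, math.ceil(limit / chunks))
    # .remote() returns immediately with a future (an object reference).
    futures = [
        remote_count.remote(lo, min(lo + step, limit))
        for lo in range(0, limit, step)
    ]
    # ray.get blocks until every chunk has finished, then returns the results.
    return sum(ray.get(futures))
```

Calling `count_primes_parallel(1_000_000)` on a laptop uses local cores; pointed at a Ray cluster, the same code can spread the chunks across machines.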
TL;DR on Ray:
- Very intuitive to scale in a language that you’re comfortable with (going from laptop to distributed workloads in Python)
- Vast ML ecosystem; not constrained by certain technologies or frameworks
- Allows users to focus on building their ML use case, not distributed technologies
Want to go Deeper? Here are some resources:
What is Arize?
For many ML teams, the rubber meets the road once the model starts interacting with the real world. This is where Arize comes in. Arize is an ML observability platform that allows ML practitioners to easily tackle the myriad of issues they are likely to come across in the real world, such as:
- Model Performance Issues: almost all models will experience some sort of performance degradation
- Model and Data Drift: the real world or the model changes over time, putting the model’s performance at risk
- Data Quality Issues: we all know this one
- Model Explainability: knowing WHY my model is making the predictions it’s making
- Model Fairness: treating groups or protected classes equitably
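As a rough illustration of what a drift check measures, the sketch below computes the population stability index (PSI), one widely used drift metric, between a baseline sample and a production sample of a single feature. This is an assumption-laden example in plain Python: the function name `psi`, the equal-width binning, and the `eps` floor are choices made here for illustration, and it is not meant to describe how Arize computes drift internally.

```python
import math
from typing import List, Sequence


def psi(
    baseline: Sequence[float],
    production: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population stability index using equal-width bins fit on the baseline."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # degenerate baseline: every value lands in the first bin

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range are clamped to the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each fraction at eps so the log below is always defined.
        return [max(c / len(values), eps) for c in counts]

    p = bin_fractions(baseline)
    q = bin_fractions(production)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

Identical distributions score zero, and the score grows as the production data moves away from the baseline. A common rule of thumb treats values above roughly 0.25 as a significant shift, though the right threshold depends on the use case.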
When it comes to model monitoring, it’s not just that we want to be alerted when there is an issue. Once a monitor fires, we want to know where and why the issue happened, and how we can fix it quickly. Arize makes finding these issues intuitive and automated. Just like in software development, if you don’t know where the bug is or have no visibility into the problem, triaging the situation can be painfully slow and arduous.
Arize is built to do three things well. The first is to let you know when something has gone wrong. The second is to help you understand where the issue is and give you workflows to fix it quickly. Both contribute to the third, which is to continually improve ML models once they’re in production.
As you think about scaling the infrastructure around ML models, you also want to think about scaling team capabilities. If your team is spending copious amounts of time maintaining basic model analytics and systems not purpose-built for ML monitoring and observability, there is less time spent building newer, better models for the business.
TL;DR on Arize:
- Automated monitoring for issues your model will encounter in the wild
- Strong troubleshooting workflows to fix issues quickly
- Built for scale, intuition, and ease of use
Want to go Deeper? Here are some resources:
Let’s See it in Action
Below is a coded example of Ray with Arize.
It’s quite a simple example but shows the scaffolding of both of the technologies working in tandem. Let’s break the notebook up into two major parts.
The first part will likely feel familiar, which is one of the advantages of using Ray. Here, you train your model, use that model to predict on the breast cancer dataset, and calculate SHAP values.
A lot of the code should feel familiar, akin to using sklearn’s fit() and predict(). Here, you are using Ray to distribute the work to two actors.
An actor is essentially a stateful worker (or a service). When a new actor is instantiated, a new worker is created, and methods of the actor are scheduled on that specific worker and can access and mutate the state of that worker. This allows you to distribute the work needed to train, predict, and compute SHAP (or do any other action that is computationally heavy).
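The actor pattern described above can be sketched as follows. This is not the notebook's code: `ThresholdModelWorker` is a hypothetical stand-in for a real model (it "trains" by storing the mean of its inputs), chosen so the example stays small. The class is ordinary Python, and `ray.remote` turns it into an actor class whose instances each live on their own worker and keep their own state.

```python
from typing import List, Optional


class ThresholdModelWorker:
    """Toy stateful worker: fit() stores a threshold, predict() uses it."""

    def __init__(self) -> None:
        self.threshold: Optional[float] = None
        self.predictions_served = 0  # state that persists across method calls

    def fit(self, values: List[float]) -> float:
        self.threshold = sum(values) / len(values)
        return self.threshold

    def predict(self, values: List[float]) -> List[int]:
        if self.threshold is None:
            raise RuntimeError("call fit() before predict()")
        self.predictions_served += len(values)
        return [int(v > self.threshold) for v in values]

    def served(self) -> int:
        return self.predictions_served


def run_on_two_actors(
    train: List[float], batch_a: List[float], batch_b: List[float]
) -> List[List[int]]:
    """Fit two actors, then score one batch on each of them in parallel."""
    import ray  # imported lazily so the plain class is usable without Ray

    ray.init(ignore_reinit_error=True)
    actor_cls = ray.remote(ThresholdModelWorker)

    # Each .remote() call here creates a new worker holding its own instance.
    actors = [actor_cls.remote(), actor_cls.remote()]

    # Method calls are scheduled on the worker that owns the actor's state.
    ray.get([actor.fit.remote(train) for actor in actors])
    return ray.get(
        [actors[0].predict.remote(batch_a), actors[1].predict.remote(batch_b)]
    )
```

Because each actor keeps its own `threshold` and `predictions_served`, the two workers can serve different batches at the same time without sharing or locking any state.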
In the second part, you prepare your production data (the data your model predicted on) to be sent to Arize. Here, you instantiate the Arize client, define your schema, and log your predictions to your Arize account.
Whether your architecture is real-time or batch, you can log inference data to Arize to monitor and observe how the model is doing in production. In doing so, you gain visibility into when the model encounters performance degradation, drift, or data quality issues. If you come across these model issues, you will be able to quickly find and fix them with Arize.
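The data-preparation step can be sketched without committing to any particular SDK version. The helpers below are hypothetical and are not part of the Arize API: `build_inference_rows` shapes predictions into one record per inference with a unique ID, and `describe_columns` records which column plays which role, which is the kind of information a schema definition needs. The actual client instantiation and logging call are deliberately left out; consult the Arize SDK documentation for those.

```python
import uuid
from typing import Any, Dict, List, Sequence


def build_inference_rows(
    features: Sequence[Sequence[float]],
    feature_names: Sequence[str],
    predictions: Sequence[Any],
    actuals: Sequence[Any],
) -> List[Dict[str, Any]]:
    """Return one flat record per prediction, keyed by a unique prediction ID."""
    if not (len(features) == len(predictions) == len(actuals)):
        raise ValueError("features, predictions, and actuals must align")

    rows = []
    for x, pred, actual in zip(features, predictions, actuals):
        row: Dict[str, Any] = {
            # A unique ID lets a prediction be matched to a later-arriving actual.
            "prediction_id": str(uuid.uuid4()),
            "prediction": pred,
            "actual": actual,
        }
        row.update(zip(feature_names, x))
        rows.append(row)
    return rows


def describe_columns(feature_names: Sequence[str]) -> Dict[str, Any]:
    """Hypothetical column-role mapping, mirroring what a schema declares."""
    return {
        "prediction_id": "prediction_id",
        "prediction_label": "prediction",
        "actual_label": "actual",
        "features": list(feature_names),
    }
```

The resulting rows convert directly into a pandas DataFrame with `pd.DataFrame(rows)`, which suits batch-style logging.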
Food For Thought
As you think about your current ML operations, there is one thing you could probably use much more of: time. Ray and Arize can help. Instead of spending a lot of time learning how distributed technologies work or monitoring and troubleshooting models that are in production, it is worth considering offloading these tasks to technology to keep your team focused on what they do best: using deep business domain knowledge to build and deploy high-value ML models.