Arize:Observe 2023

Running AI at Scale In Hybrid Cloud: An Interactive Example with UbiOps

In this hands-on workshop, Anouk Dutree of UbiOps shows how to deploy and run a machine learning model in the cloud. UbiOps is a serverless and cloud-agnostic platform for AI & ML models, built to help data science teams run and scale models in production. During the workshop, Dutree discusses best practices when moving your models to production. We will also pay special attention to hybrid cloud setups. Python knowledge is all you need to follow along!

Anouk Dutree: Well, welcome everyone. I'm Anouk, a product owner at UbiOps, and I'm going to talk to you a little bit about running AI at scale in a hybrid cloud setting. And I'll do that with the use of an interactive example that you can either follow along with me, or you can just watch me do it and maybe try it out later for yourself.

So, as I mentioned, I'm working for UbiOps. You might know us, you might not, so I figured it might be good to do a little introduction of who we are before we start. Basically, UbiOps is an MLOps platform made for running and scaling AI and ML workloads. We're really designed for data scientists to work with us without the steep learning curve of a tool like Kubernetes, to really make sure that a data scientist can take ownership of their models and also make sure that they can go to production.

So, without further ado, this is the schedule I had in mind for today. We've got 30 minutes, so I'm gonna try to squash everything in there. The introduction you already had. I'm gonna talk a little bit about model serving in production, what that means, and some common pitfalls that I see.

And afterwards, we're just gonna have a little workshop. I'm gonna deploy a computer vision model with you, and we're gonna do that live. And after we've done that, I will also touch upon how this would work in a hybrid cloud setup. So what would be different and what would be the same?

And afterwards we can have a little Q&A. If I finish a bit early, I think we can take some questions from the comment section that I can answer here already. Otherwise, I will be available on the Arize Slack to go over all your questions there as well.

So, let's talk about bringing machine learning to production, or as I like to call it, getting machine learning from the lab to the field.

Because when you're a data scientist and you start working on a model, you've got a model that you need to build, and typically this is on your local laptop, maybe in a notebook environment, or writing Python code directly. But at some point, this of course needs to start generating value for a business.

And there's a path between these two, because something that's locally on your laptop doesn't necessarily provide value straight away. Surprisingly, I get the question quite often what the issue is with just keeping code locally. And obviously there are quite a few obstacles with keeping code locally.

Now, for some use cases, it might be fine to run something yourself manually every month, but for a lot of use cases you really need to move it to a production environment. An example where you need this is when we look at access to the necessary data. Because typically as a data scientist, you only get a subset of the production data to work on for developing your model. But when your model moves to production, it needs access to the actual production data, and this might have a big impact on your model as well. Does your laptop have access to that? Then there's the reliability piece: if it's locally on your laptop, a laptop might crash.

So what if you have a model that needs to run 24/7? But also resources: what if you don't have a powerful enough GPU for what you need to run? All of those things come together to create this issue of needing to deploy your machine learning model, or your code, to a production environment that can handle these obstacles.

Now, typically the plan that I see from companies looks something like this. We just have a data scientist developing code, we're putting that in some kind of Jupyter environment, maybe at some point we put it inside a Docker container, and then we put it on the cloud and poof, we're done.

Now, as some of you might recognize, I see that very often the path actually looks something like this. There are so many different tools in this MLOps landscape that all start coming together, that all need to be connected, to actually get from something that you wrote on your laptop to something that's actually providing value.

And a lot of these tools have a very steep learning curve and are created for software engineers, not necessarily for data scientists. It can just be quite a tricky road to cross. So what we try to do is focus on really going from code to a live app in a couple of simple steps, and that's also what we're gonna do today.

So maybe to put that in a little bit of context, I wanted to go through an example case from Bayer, because this is basically the real-life scenario of what we're gonna do in a little bit. We're gonna deploy a computer vision app together, and Bayer Crop Science did the exact same thing with us.

They've got an app called Magic Scout, which is an app for farmers worldwide. It allows farmers to take a picture of their crops, upload it to the app, and the app will tell them if their crops are healthy, or if they've got pests or weeds. It will also give them recommendations on how best to approach these problems they might be having with their crops.

So this is the setup essentially that they needed. An end user uploads a picture, which goes to the mobile application. That application sends the picture to the AI model that's hosted somewhere. The AI model sends a prediction back, the application shows these results in a nice format to the end user, and then you've got a happy end user.

And the thing they found out is that data science teams are really good at building great AI models, and their model worked really well, but the problem was that they were not necessarily that good at deploying and scaling them. So they've got the first part. But what happens when you've got all these farmers worldwide sending so many images to that model?

Like, can that model then still handle that? Maybe one per hour is fine, but what if it becomes hundreds per second? Can you then also scale it up? And this becomes an increasing issue when you're working with computer vision, because very often you need GPUs to be able to do this in a timely fashion. Especially here with a mobile application, you really need a response time at the millisecond level, or at most a couple of seconds, because if you're longer than that, a user will start to get frustrated because they think their app crashed, and that's of course not what you want.

So basically, the issue here was that they needed to solve the challenge of deploying AI, and no longer the challenge of making AI models. All of a sudden you start entering this IT world that you have to deal with: API endpoints, security and access control, monitoring (which is of course something you can very nicely do with Arize), and resources.

Do you have enough CPUs and GPUs? It's a whole different ball game than making a nice model that works well. So what we did with them is basically: they use UbiOps to host and to serve their models. The model behind the Magic Scout app is running on an installation of UbiOps that runs both in Google Cloud and AWS for them.

And by doing this, they make sure that they always have enough resources, because essentially, when Google Cloud doesn't have enough GPUs anymore for the demand their app has at that moment, we simply switch over to AWS and source GPUs from there. This way they can really handle the peak traffic that they have, because their data traffic is also very seasonal.

As you can imagine, farmers will be a lot more active in using this application in the summer than in the winter, so it's quite unpredictable how much traffic they'll have. And by having this hybrid cloud setup with them, we manage to make sure that they always have enough resources.

So it's workshop time.

It's your turn now. If you want, of course, you're also more than welcome to just watch me do it and see how it works. So what are we going to build today? We're going to build your own live prediction engine that runs in the cloud and can process data through an API. So we're gonna take a pre-trained neural network for number detection, based on the MNIST dataset, which you might recognize.

So I already prepared this for you; we're not gonna train a model today, that's something different. And then afterwards, we're gonna deploy the Python code to run this model on UbiOps. And that will look something like this: we will have an image of a number, in this case it's a five, that we can send to this API that's hosted on UbiOps.

And this model will then send back an output of "this is a five, I'm 96% sure", or "this is a four, I'm so many percent sure". So that's essentially what we're going to build, and the steps to do that are these: we're gonna write some code, or actually I wrote some code for you so that you can see how it works.

We're gonna deploy that, and then we're gonna just send some data to the API so that you can see it in action. So let's get started. Here's what you need. If you want to follow along yourself, you will need a UbiOps account, which you can create completely for free. And secondly, we need a Google Colab notebook for deploying the model.

So this is all the code that I already prepared, and you can find that either by going to this link or by scanning the QR code. Maggie also already put it in the chat, so you can see it there as well. So let me just share my screen and switch over to the notebook, and then we can do that.

Uh, let me actually first go to the web app so you can see where we go. I'll share my screen.

Okay, perfect. You are seeing my screen now. Let's first go to the web app.

So let me go full screen so you can see it, hopefully a bit better, and I'll zoom in a little bit. Okay, so this is the UbiOps web app. If you create a new account and a new project, this is roughly what you will see, but probably with less data than mine. To run the notebook, you will need something that we call an API token, because this is used to authenticate so that you can speak to UbiOps through code, using our client library, instead of having to do everything through the web app. Because we will do it through code.

So if you need an API token, I'm just gonna show you where you can make it, because I will also need one. You can head over to Permissions, then to API tokens, and click "Add token". I'm gonna give it a name, arize-observe-token, and fill in the expiration field as well. And then I can go to the next step, where I can say what permissions this API token needs.

For this case, we need project editor permissions, so I can give it that. And then I will copy my token. Once you've got the token, you can add it to the Google Colab. So let me go full screen. Okay, and then we can start. When you open the Google Colab, you should be seeing this.

So: the workshop for Arize Observe, deploying a computer vision model to the cloud with UbiOps. Here you can see again the setup of what we're actually making. So let's just go through it. First things first, we need to install the UbiOps client library. I already did this, so this should work, but if you run it, you should see something like this.

It's just collecting ubiops from PyPI; the latest version should be 3.15.1. And then everything should work. Once we've got UbiOps installed, we can continue to defining the project info and setting up a connection with UbiOps. We're gonna use that token that we made just now to basically authenticate ourselves.

If you run this cell, what you will see is a little prompt where you can enter this token in a safe way. So I'm just gonna paste my token here. And when I click enter, if everything went well, you should receive "Status: Ok". If you don't receive that, something went wrong, and it's probably your token formatting.

So make sure that your token is of the shape "Token", a space, and then your token code. Then you should have "Status: Ok". So now we've got this connection set up to UbiOps, which means that we can actually start doing something: we can start creating a deployment. A deployment is basically an object within UbiOps that serves your code.
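To make that token shape concrete, here's a tiny hypothetical helper (not part of the ubiops package, just an illustration) that catches the common formatting mistake described above:

```python
def looks_like_ubiops_token(token: str) -> bool:
    """Check the "Token <code>" shape the client library expects.

    Hypothetical helper for illustration only -- not part of ubiops.
    """
    parts = token.split(" ", 1)
    return len(parts) == 2 and parts[0] == "Token" and len(parts[1]) > 0

print(looks_like_ubiops_token("Token abc123def"))  # well-formed
print(looks_like_ubiops_token("abc123def"))        # missing the "Token " prefix
```

If this returns False for your token, fix the prefix before retrying the connection cell.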

So any kind of Python or R code. And this deployment has a defined input and output set. In our case, what we're gonna be sending is an input field that's called "image", and this is of data type file. So we're gonna be sending this image file, and as output, we will have a "prediction", which is in this case of type string.

And I also added a label, so that you can easily see it back in the web app when you log in. So if we run this cell, and if everything goes right, you should see something like this, which basically tells you the information of the deployment that was just created. Now what we can also do is go back to the web app, go to Deployments, and go full screen again.

And here you should see something like this: mnist-2023 and today's date. So this is the one that we just created. As you can see, it has the right label, but there's nothing really here yet. That's because we still need to make that. So if we go back to the Colab, full screen again, we can continue. We can now actually start preparing our code, because every deployment can have multiple versions, and the versions contain the actual code implementation.

The reason why this is split out is that if you create a new version with a new implementation, but one that performs the same action, so it still takes this input image and returns the prediction, just with a different code implementation, you can just create a new version and your API endpoint doesn't change, so you don't have to update that.

So we now still need to make our version. First of all, let's make a directory called deployment_package. I actually already have one, so that's why it's giving this error for me now. And once we've got this directory for a deployment package, we can start putting the code in there that we will deploy to UbiOps.

Now I've got this %%writefile statement, which will make sure that all the code in this cell is written to a deployment.py file. So in this cell you can see the actual code that will be run by UbiOps whenever it receives data. Here we're just importing Keras and imageio, nothing too special.

And then you can see that we've got a class called Deployment with two functions, an init function and a request function. The init function is run whenever the deployment initializes on UbiOps. So this is basically the place where you would load things into main memory, and this is what we're doing here: you can see that I'm loading in the weights of my pre-trained model.

So I've got a cnn.h5 file, which basically specifies how my neural network looks, and I'm just loading that in here so that it's ready whenever any data gets passed through this deployment instance. And then we've got the request function. The request function is the one that actually receives the input data and processes it to return our output data.

So what you can see here on line 52 is that we read in the image file that we receive. Here we use the "image" field that we retrieve from the data dictionary that's passed to this function, and we're gonna convert it to the right format so that our model can actually work with it properly.

And then we're gonna call the predict function of our model, nothing too special there. And we're gonna return the result of what number it is and how many percent sure we are, and that we will return in the return statement. So if you run this cell, then you should see something like "Writing (or overwriting) deployment_package/deployment.py".
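The shape of that deployment.py can be sketched like this. Note this is a simplified stand-in so the structure is clear without Keras installed: the model load and the predict call are stubbed out here, whereas the real notebook loads the cnn.h5 weights in the init function and calls the model's predict on the converted image.

```python
class Deployment:
    """Minimal sketch of the class UbiOps instantiates for a deployment."""

    def __init__(self, base_directory=None, context=None):
        # Runs once, when the deployment instance initializes.
        # In the real deployment.py the Keras weights (cnn.h5) are
        # loaded into main memory here. Stubbed for illustration:
        self.model = lambda image_path: ("2", 0.96)

    def request(self, data):
        # `data` is the input dictionary; this deployment defined one
        # input field called "image" (a path to the uploaded image file).
        image_path = data["image"]
        # Real code: read the image, convert it to the right format,
        # and call the model's predict function on it.
        digit, confidence = self.model(image_path)
        # Return a dict matching the defined output field "prediction".
        return {"prediction": f"This is a {digit}, I am {confidence:.0%} sure."}

print(Deployment().request({"image": "example.jpg"})["prediction"])
```

The init/request split is the key point: expensive setup happens once, and every incoming request only pays for the predict call.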

And this means that this file has now been written away to a Python file. Then we still need a requirements.txt, because as you can see, I import some packages. To make sure that I actually have access to these packages on UbiOps as well, I need to pass a requirements.txt that specifies all the libraries that need to be installed for this deployment to function.

So in my case, that's Keras and TensorFlow. When I run that, it will create a requirements.txt. And then the last thing we need for this to work is that weights file, because in the init function, we're loading the cnn.h5 file, but we don't have that yet. So I've prepared that on our Google Cloud storage.

So with this curl command, you can download the cnn.h5 file locally, and it will put it in the right place, so that now we've got a folder that contains three files: a deployment.py with the code to run, a requirements.txt with any packages that need to be installed, and the weights file that we need.
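At this point the deployment package directory should look roughly like this (file names as used in the workshop):

```
deployment_package/
├── deployment.py      # the Deployment class with init and request
├── requirements.txt   # keras, tensorflow, imageio, ...
└── cnn.h5             # the pre-trained model weights
```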

Okay, so then all that's left to do is actually push this code to UbiOps and create a new version. I can do that by running this cell. And in this cell, what you can see is that we call UbiOps' deployment version create. We made a version, in this case running Python 3.7, and we specify the memory allocation and how much it can scale.

And then we just pass that to our API to create it. We zip the directory that we made with the code, and we also upload that to this version, to make a new revision of that version that contains the actual code. So now the only thing we need to do is wait until the deployment is actually ready for use, because it needs to be built first.
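For reference, the settings that cell passes look roughly like the following. This is a hedged sketch: the dictionary keys mirror the options mentioned in the workshop (Python 3.7, memory allocation, scaling bounds), but the exact parameter names may differ, so check the ubiops client documentation; the actual API calls are left as comments because they need a live connection and token.

```python
# Illustrative version settings, mirroring what the workshop describes.
# Key names are assumptions, not the authoritative ubiops field names.
version_settings = {
    "version": "v1",
    "language": "python3.7",     # the runtime the workshop uses
    "memory_allocation": 2048,   # MB of memory per instance
    "minimum_instances": 0,      # scale to zero when idle
    "maximum_instances": 1,      # cap so you don't eat your resources
}

# The notebook then roughly does (not run here, needs a live connection):
#   api.deployment_versions_create(project_name=..., deployment_name=...,
#                                  data=version_settings)
# ...then zips deployment_package/ and uploads it as a revision.

print(version_settings["minimum_instances"], version_settings["maximum_instances"])
```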

So if I run this function, it will wait for this deployment version to be ready. Mine might be a little bit quicker than yours, because a deployment build gets quicker if it uses an environment that you already use somewhere else in the project, and I've already installed these packages very often.

So the build time is very quick. If you do it for the first time with a requirements.txt file that you've never used before, or any different kind of packages, then it might take a little bit longer, but it shouldn't take longer than a couple of minutes max for this one. And once that deployment is ready, you should get that back from the function as well.

You can actually send data to this model, and we've prepared a couple of example images that you can use. If you run this cell, you will retrieve them from our Google storage and you can use those. Now here we've got a tiny code snippet to just show you, with one of the example images, how it works.

So, this example image: if we load it, this is basically what it looks like, a handwritten two. And this we can upload to UbiOps and send to that model that we just deployed. Because if we now also go to UbiOps, we should see that it's ready. So let me quickly reload. Yeah, and here we can see the version that we just made, version v1.

If we go to it, we can see that it's available; the last build status was success, so now it's ready to accept input data. So let's go back to the Colab and let's upload it. We can upload the image to UbiOps, which we do with the UbiOps utils upload file function. This uploads it to a specific bucket in your project, and then we can use that file URI to make a request to our deployment.

So actually sending the data is what we do here, with the deployment request create call: we just pass the project name, the deployment name, and what data we want to send with it. In our case, that's this image that we wanted to process. And as you can see, it gave back the prediction: this is a two, I am a hundred percent sure of it.

Now, if you want to try this with one of the other ones, you can simply change the file path that's here. For instance, we can do a three as well. You can run that, and then we should get back the prediction: this is a three, I'm 99.93% sure. And that's basically it.

We've now deployed this computer vision model that we made to UbiOps, and it's accessible through an API. The nice thing is that with the use of an API token, this model can now be exposed to other services. I made this request through the Python client library here, but I could do the same from my terminal if I wanted to, or from a website.

So if I've got a website, for instance, where I want to host some other kind of computer vision app, where people can upload images and that image gets sent to a model, that would be the exact same setup, but then the website would make this request instead of us doing it through code here. So, let me go back to the stream and share my slides again.

So I mentioned in the beginning that I would also touch upon how this would work in a hybrid setup, because right now we deployed to our UbiOps SaaS platform, which is hosted on Google Cloud. But actually we can also just deploy to other clouds if we want to, because in UbiOps we've got this option to be connected to multiple node pools while still using the same interface.

So this is also what I mentioned that Bayer uses, for instance. If you have our UbiOps core services, what we can do is add additional node pools to your organization, for instance one in AWS, or even an on-premise cluster if you want, and connect them to your organization, so that you have access to these other node pools that you can actually deploy on as well.

And we've also got some that we offer out of the box, if they're necessary. The nice thing about this is that even though it will be hybrid cloud and you have multiple clouds, you can still have just one single unified interface, and essentially nothing changes in your workflow. The only thing that really changes is what kind of instance type you select.

So when I made the deployment before, I selected an instance type with 2048 MB. If we have an instance that's in AWS, we would append it with the AWS suffix. So then instead of saying 2048mb as the instance type, I could say 2048mb-aws, for instance, and deploy there, as long as your account has access to those. And in this way, you basically bring them together.
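So the switch between clouds is literally a naming change on the instance type. A toy illustration of that convention (the suffix format is just how it's described in this talk, so treat it as an example rather than a spec):

```python
def instance_type_for(base: str, cloud_suffix: str = "") -> str:
    """Append a node-pool suffix (e.g. 'aws') to a base instance type name."""
    return f"{base}-{cloud_suffix}" if cloud_suffix else base

print(instance_type_for("2048mb"))         # default pool
print(instance_type_for("2048mb", "aws"))  # AWS node pool
```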

And the only change is really what you can pick from a dropdown, or what you send as a parameter, while still being able to utilize these multiple cloud resources. Okay, and I see that I have a little bit of time, so let me just show you that as well. I'll share my screen again and show you how this works in the web app.

I think you should be able to see my tab, so let's just make it full screen as well, so it's a bit more visible. So I've got here the version that we already made. Now I can also duplicate that version, and I'll make a version that actually runs on AWS instead of Google. What I can do is use the same deployment package, so the exact same code.

I don't want to make changes there, but let's say I want to turn on a GPU this time. So I'll pick Python 3.7 with CUDA in there, and I'll go to the resource settings. Here you can see the NVIDIA Tesla T4, and this one's actually hosted on AWS. So now I can pick this one from the dropdown, and when I deploy this, it will be created in AWS once it's ready to be built.

So now the revision is being made. And there we go. So this one's now actually hosted somewhere else; it's no longer in Google. And that's really all it takes. There's really not any difference in how you will create a deployment or how you work with it.

Of course, there's a difference in how you access the data, but also for that, we offer connecting your own storage buckets. When you connect your own storage bucket, you can select "connect to existing cloud buckets" and select which provider it is. So you can just connect a storage bucket you have in AWS and one in Google Cloud and work with them in the same way as we just did with the files.

So also there, nothing really changes. Okay, and then I'm gonna wrap up. I still have some key takeaways in the slides, if you can bring them back up, Maggie. Thank you. So my key takeaway for today is: really think about how your infrastructure is going to scale in the early stages of development.

Like, don't leave it till the last minute, because that's something that I really see happening a lot. It's just a shame if you work so hard as a data scientist on making this great model, and the infrastructure is just not ready to actually put it in production. And oh, I see that there's a question.

Let me check. So I see from Kelvin: how do you decide when a data/model is ready to be deployed? How do you make sure it won't break or run into issues when it's deployed? That's a very good question. Something I personally really like doing here is deploying in shadow mode. So what you can do is basically run two models side by side, where one is actually connected to the output.

And the other is not, but you still log how it's performing. In UbiOps, we can do this with pipelines. You basically drag two deployments in there, run them side by side, but only connect one to the pipeline end. And that way you can monitor how this deployment is interacting with production data without any end user seeing any effects of this.

Of course, you need to be careful here with your resources, because it will require resources. So maybe it's nice to not do that for everything, but I think it's really helpful to sometimes just deploy something in shadow mode, see how it reacts, see if it performs the way you would expect, and if not, take it back down.

Iterate and deploy it again in shadow mode until you think, oh, this is actually better than the model I already had. Another way you can do this, or actually the formal way of doing this, is with champion-challenger models. There you've got your champion that you really run in production, and you've got a challenger model that you run in shadow mode, maybe even multiple.

And if one of the challengers starts to consistently perform better than the current champion, you can promote them to champion and demote the champion to challenger. That's a setup that I personally really recommend working with because I think it's very hard to get an estimate of how your model performs before you put it in production because production data is inherently pretty different from, um, historic data that you will train the model on.
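The routing idea behind shadow mode can be sketched in a few lines of plain Python (a toy illustration of the pattern, not UbiOps pipeline code): both models see the same request, only the champion's answer is returned to the user, and the challenger's answer is just logged for offline comparison.

```python
def shadow_mode_request(data, champion, challenger, shadow_log):
    """Serve the champion's prediction; log the challenger's silently."""
    result = champion(data)              # this is what the end user gets
    shadow_log.append(challenger(data))  # logged only, never returned
    return result

champion = lambda d: "3"    # stand-in models for illustration
challenger = lambda d: "8"
log = []
print(shadow_mode_request({"image": "img.jpg"}, champion, challenger, log))
print(log)  # the challenger's answers, for comparing against the champion
```

If the logged challenger answers consistently beat the champion's, you swap the roles, which is exactly the promotion step described above.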

And then you can also immediately check if you run into issues when it's deployed. So I hope that answers your question. And I see another question from Alexander: how does UbiOps scale with load? Let me see if I understand this correctly. So basically, when you create a deployment (I kind of went over it quickly now), in the notebook you will see that there's a parameter called minimum number of instances, which we set to zero, and a maximum, which we set to one.

This means that this deployment that we just made scales from zero to one, so it cannot really scale up. And this is just to make sure that you won't eat through all your resources. But if you set that maximum number of instances to a higher number, you can scale up, and we do that automatically, based on the data traffic that's being sent to your model.

So the more data you're sending to your model, the more instances we will start spinning up to accommodate that increase in data traffic. And when the data traffic slows down again, we will start scaling down and turning off these instances, so that it also scales back down to the minimum number of instances.
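As a mental model (this is my own toy illustration, not UbiOps' actual scaling algorithm), you can think of it as clamping the instance count needed for the current traffic between the configured minimum and maximum:

```python
import math

def instances_needed(requests_per_sec, per_instance_capacity,
                     min_instances, max_instances):
    """Toy autoscaler: scale instances with load, clamped to the bounds."""
    wanted = math.ceil(requests_per_sec / per_instance_capacity) if requests_per_sec else 0
    return max(min_instances, min(max_instances, wanted))

print(instances_needed(0, 10, 0, 5))   # no traffic: scale down to the minimum
print(instances_needed(25, 10, 0, 5))  # moderate traffic: a few instances
print(instances_needed(95, 10, 0, 5))  # heavy traffic: capped at the maximum
```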

And I see we're already at time, so I will answer the rest in the Slack community. Here are my key takeaways again, and here, briefly, my contact information. Feel free to get in touch with me, either through the Arize Slack community or by emailing me directly. And yeah, thank you for listening, and I hope you have a lovely rest of the event.
