Rise of the ML Engineer: Flávio Clésio, Artsy

Flávio Clésio is a Berlin-based data and machine learning (ML) engineer. One of the earliest members of a popular community for machine learning operations (MLOps) professionals, Clésio has seen a lot since entering the world of AI nearly a decade ago. Currently, he works at Artsy deploying and maintaining models that – among other things – recommend artwork to users, placing him at the crossroads of art and technology.

This interview was conducted before the current war and humanitarian crisis in Ukraine. Artsy’s content on the topic is worth highlighting, particularly “Ukraine’s Art Community Remains Defiant in the Face of the War” and “Ukraine Pavilion Curators’ Commitment to Exhibiting at the Venice Biennale in the Midst of War.”

Can you walk us through your career background?

I’m from Brazil originally and live in Germany, where I work as an ML engineer for a company called Artsy. My path into ML engineering started around eight years ago, when I made the shift from data science to a role more focused on production systems.

What kind of models is Artsy running in production and why?

We have a plethora of machine learning models. We have scores for artists based on their career stage, whether a work is highly critically acclaimed and so on, and use those to make a number of predictions.

We also have some models related to artwork similarity that we use to make recommendations. This is quite an interesting use case because our models incorporate art-specific inputs – there are some latent features on the artwork not only related with dimensions and colors but also a taxonomy that is unique to the art world, such as the period of time when the artwork was created, the region, the style, the medium and the category. You might not recommend a renaissance artist, for example, with a brand new NFT artist. We also have some artwork classification models, so we know if it’s acrylic on canvas or sculpture for example, with tagging internally to enhance the search and observability of the artwork within the company.

What are frequent pitfalls for recommendation systems based on your experience deploying and maintaining them? 

There are two frequent issues. The first is popularity effects. These plague any recommender system; you need to balance more popular items with relevance to the users across the entire catalog. The second issue that pops up a lot is around how to measure recommendation systems, not only in terms of the academic model metrics – like recall, accuracy – but also how to relate that to actual results like conversation rates or relevance to the user and how each of these translate to revenue at the end of the day.

This is a hard exercise and is not trivial. Sometimes you might have a nice model with high recall that is terrible at driving revenue. Or you might simply lack visibility into why things are going well – maybe users are clicking because of the device or because of the time of the day – and isolating those slices is difficult.

It’s all about how you experiment. You need to have a lot of variance and better isolate your sample and the results. When I was at MyHammer, we would try five or six variants and see the performance before moving forward with a model. It’s all about exploit-and-explore with recommendation systems.

In a recent piece that touches on the necessity of model monitoring, you quote Mike Tyson: “everyone has a plan until they get punched in the face.” Can you tell us more? 

When you’re putting machine learning models in production, we often imagine a happy path where everything works. Unfortunately, it’s not this easy – day one might be easy as you push a newly-trained model into production, but day two onward is hard because you need to operationalize the model and then you need to take care of things like data drift, privacy of the user, regulations, the latency of your model, the latency of your API and a host of other problems that can happen. No matter how good your model is, once the model is in production  reality invades. That’s where the analogy of getting punched in the face comes into play; there are always problems in the real world, most of them unforeseen.

In the piece I mentioned earlier, you also talk about “hyperspecialization, and the lack of cross-functionality” contributing to complacency and team dynamics that might exacerbate disconnects. How can teams overcome that? 

The way that I perceive the MLOps movement as a philosophy or community practice is as a response to what preceded it. Given data science used to predominate – with most people trained in statistics, data modeling or data wrangingling – when it came to deploy models to have automated decisions provided by some kind of platform, many did not fit well into the world of production.

Often, the root cause of this is education. Data scientists do not usually learn any kind of software engineering practices, or the software engineering lifecycle. So if you have advanced analytics and you want to predict whether a customer might churn, the question becomes how to put that inside of the platform in an automated way. The purpose of the MLOps movement is to break those silos down and to help data scientists, data analysts, or analytics engineers be a bit more end-to-end – spanning the full cycle from inception to data modeling to then putting it in production with automated decisions.

You’ve seen the ML infrastructure space really take off in the past few years, with tools available across the ML lifecycle. What’s your take on that?

I think it’s an exciting time in terms of the tooling available. When I started out in the industry, “data science” was sometimes referred to as “data mining” – and most of the time you worked with close source code without any models that you could just easily transpose into production. It was very convoluted.

Now, every day new companies emerge to tackle different problems and while it’s harder in some ways – the ecosystem is quite fragmented – it’s also better. The main challenge now is how to navigate in a huge ecosystem and how to pick the best tools to get the job done.

Any advice to those starting out in the ML career journey? What do you think the skills and requirements are — and is it a good time to enter the industry?

This is an exciting time to become a machine learning engineer because companies are becoming aware that analysis about the past is not enough – you need teams that you can trust to put automated decisions in place.

If I were a data scientist today, I would study a lot of software engineering – clean code, clean architecture, design patterns and so on – because sometimes getting a model into production quickly is more important than perfection of the model itself. Software engineers have this mindset naturally.

You keynoted a talk on security in ML at Pycon Africa last year outlining some of the latent risks in ML development. Can you walk us through a few of those?

That was an interesting one that I noticed in my prior role. By chance, I discovered that inside of scikit-learn some objects – especially models – can be changed by anyone. Once you load your pickle file, you can set some variable inside of the model so that it only gives the same answer over and over. I opened an issue to the scikit-learn project, but I was basically told “it is what it is.”

This bothered me, so I went down the rabbit hole of security to learn more. Security is important not only in terms of the objects that we develop like models, but also with adversarial machine learning where someone can just rebuild your model and poison your data – with the base of this poisoned data leading to the wrong predictions and outputs. Few people are talking about this (mostly academics), but I think the community should have more eyes on the issue.

One reason it’s so troubling to me personally is that a lot of open source projects that we’re using as a data scientist or ML engineer can have dependencies upstream where you end up bringing vulnerabilities home. We can do better, with more awareness and working to eliminate those risks.

You have such an interesting background, starting in the Brazilian Army and then as a professor and data engineer. Any lessons from those roles that inform your work in ML?

There are a lot of lessons, but I think the main one is consistency. You don’t need to operate maxed-out every day, but you need to have this kind of consistency at a high level – consistency brings on reliability and reliability brings trust. I try to translate that for my platforms and development every day.