Enhance Recommendation System Model Performance

Optimize model performance to maximize recommendation system outcomes

Recommendation systems are powerful tools used across industries, delivering an immense amount of business value. Implementing ML observability can help increase adoption and conversion, maximize sales and revenue, and positively influence user behavior.


See how Arize can help you optimize your model performance by calculating and monitoring model precision to calibrate user experience, identifying anomalous distribution behavior to monitor for model degradation, and diving into low-performing segments to understand where to begin improvements.


  • Proactively detect drift, data quality, and performance issues to ensure confidence with models in production.
  • Visualize model performance at a granular level and automatically surface the first steps to model improvement.
  • Identify the right time to retrain a model based on drifting features, predictions, and actuals.

What are Recommendation Systems?

A recommender system, or recommendation system (sometimes called a recommendation platform or engine), is a subclass of information filtering system that seeks to predict the “rating” or “preference” a user would give to an item. It can also be defined as a system that produces individualized recommendations as output, or that guides the user in a personalized way to interesting objects in a larger space of possible options.

While most people associate recommendation systems with media and e-commerce sites, they have spread to nearly every industry. From healthcare to finance, there is demand for insightful predictions that improve the customer, client, or user experience. These systems can operate on a single type of input, like music, or on multiple inputs within and across platforms, like news, books, and search queries.

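How is data for recommendation systems collected?

Recommendation systems learn from explicit data, which users provide deliberately, and implicit data, which is inferred from their behavior.

Examples of explicit data collection include the following: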

  • Asking a user to rate an item on a sliding scale.
  • Asking a user to search.
  • Asking a user to rank a collection of items from favorite to least favorite.
  • Presenting two items to a user and asking them to choose the better one.
  • Asking a user to create a list of items that they like (see Rocchio classification or other similar techniques).

Examples of implicit data collection include the following:

  • Observing the items that a user views in an online store.
  • Analyzing item/user viewing times.
  • Keeping a record of the items that a user purchases online.
  • Obtaining a list of items that a user has listened to or watched on their computer.
  • Analyzing the user’s social network and discovering similar likes and dislikes.
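
As a rough illustration of how implicit signals like the ones above can be turned into model input, the sketch below aggregates a log of user events into per-user item scores. The event names, the weights in `EVENT_WEIGHTS`, and the function `build_interactions` are all hypothetical and chosen only for this example; real systems tune such weights or learn them from data.

```python
from collections import defaultdict

# Hypothetical weights (assumed values): a purchase is treated as a
# stronger signal of interest than a view or a listen.
EVENT_WEIGHTS = {"view": 1.0, "listen": 2.0, "purchase": 5.0}


def build_interactions(events):
    """Aggregate (user, item, event_type) tuples into user -> item -> score."""
    scores = defaultdict(lambda: defaultdict(float))
    for user, item, event_type in events:
        # Unknown event types contribute nothing to the score.
        scores[user][item] += EVENT_WEIGHTS.get(event_type, 0.0)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {user: dict(items) for user, items in scores.items()}


# Made-up event log for illustration only.
events = [
    ("u1", "shoes", "view"),
    ("u1", "shoes", "purchase"),
    ("u1", "hat", "view"),
    ("u2", "hat", "view"),
    ("u2", "hat", "view"),
]
interactions = build_interactions(events)
print(interactions)
```

The resulting nested dictionary is a sparse user-item matrix: most user-item pairs never appear in it, which is the sparsity challenge that comes up whenever a catalog is large.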

How are recommendation systems evaluated with data?

There are many ways to evaluate recommendation systems. Because they are predictive models whose algorithms generally try to minimize the error of a function, they can be evaluated with traditional data science metrics that measure prediction error, that is, the gap between the model’s predictions and the actuals. Beyond prediction error, it is also important to keep precision and recall in mind.

  • Precision measures how many of the recommendations provided are relevant.
  • Recall measures how many of the relevant items are among the recommendations provided.

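It is important that both metrics perform well, since recall could be 100% while precision suffers. For example, a model that recommends every item in the catalog captures all of the relevant items but buries them among irrelevant ones.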
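
The trade-off between the two metrics can be sketched in a few lines of plain Python. This is a minimal illustration for a single user, assuming a ranked recommendation list and a known set of relevant items; the function name `precision_recall_at_k` and the toy catalog are made up for the example and are not part of any particular library.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for one user's ranked recommendations."""
    top_k = recommended[:k]
    relevant_set = set(relevant)
    hits = sum(1 for item in top_k if item in relevant_set)
    # Precision: share of the recommended items that are relevant.
    precision = hits / len(top_k) if top_k else 0.0
    # Recall: share of the relevant items that were recommended.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Toy data for illustration only.
catalog = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
relevant = ["b", "e"]

# A short, targeted list of three recommendations.
focused = precision_recall_at_k(["b", "e", "a"], relevant, k=3)

# Recommending the whole catalog: every relevant item is found,
# but most of what the user sees is irrelevant.
everything = precision_recall_at_k(catalog, relevant, k=len(catalog))

print("focused:", focused)
print("everything:", everything)
```

Both lists reach full recall here, but recommending the entire catalog drops precision to 2 relevant items out of 10, which is why the two metrics are tracked together.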

Challenges with Recommendation Systems

  • Cold start: For a new user or item, there isn’t enough data to make accurate recommendations.
  • Scalability: In many of the environments in which these systems make recommendations, there are millions of users and products. Thus, a large amount of computation power is often necessary to calculate recommendations.
  • Sparsity: The number of items sold on major e-commerce sites is extremely large. The most active users will only have rated a small subset of the overall database. Thus, even the most popular items have very few ratings.
  • Synonyms: The same or very similar items are often listed under different names or entries. Most recommender systems are unable to discover this latent association and thus treat these products differently.
  • Shilling attacks: In a recommendation system where anyone can submit ratings, people may give many positive ratings to their own items and negative ratings to their competitors’. Collaborative filtering systems often need to introduce precautions to discourage such manipulation.
  • Diversity: Collaborative filters are expected to increase diversity because they help us discover new products. Some algorithms, however, may unintentionally do the opposite. Because collaborative filters recommend products based on past sales or ratings, they usually cannot recommend products with limited historical data. This bias can then create a self-reinforcing feedback loop in which already popular products are recommended even more.
  • New-item problem: When a new item is introduced, the lack of ratings makes it difficult to recommend it to a user with confidence.
  • Non-normalized ratings: In a recommendation system that takes feedback in the form of a rating, there is nothing to stop a user from giving the same rating to every item. This could be due to either biased ratings or disinterest. For example, a user may rate every movie they enjoyed, whether mildly or greatly, as five stars, or may rate everything as three stars just to skip ahead.
  • User Explainability: It is often difficult, if not impossible, for a user to know which of their actions led to a specific recommendation, or whether there is anything they can do to stop receiving recommendations with a specific feature.
  • Model Observability: The data science team responsible for creating the recommendation engine may not be able to easily identify the features and events that led to a specific recommendation, or whether those recommendations are biased. Arize was built to help ML practitioners perform a root cause analysis and explain why a model is behaving a certain way in order to improve it.

Business Impact of Recommendation Systems

Companies usually do not publicly share the exact details of their recommendation systems or how profitable they are, but blog posts frequently offer some insight. Netflix disclosed in a blog post that “75% of what people watch is from some sort of recommendation”, and YouTube reports that 60% of the clicks on the home screen are on recommendations. In a later report on the system designed at Netflix, the authors reveal that recommendations led to a measurable increase in user engagement and that the personalization and recommendation service helped to decrease customer churn by several percentage points over the years. As a result, they estimate the business value of recommendation and personalization at more than 1 billion US dollars per year. According to a 2006 statement by Amazon’s CEO, about 35% of the company’s sales originate from cross-sales (i.e., recommendations).
