Root Mean Square Error (RMSE) In AI: What You Need To Know

Shittu Olumide, Contributor | Published August 08, 2023

Root mean square error (RMSE) is the residuals’ standard deviation, that is, the average difference between the values a statistical model predicts and the actual values. Residuals measure the distance between the data points and the regression line.

RMSE measures the degree of dispersion of these residuals, which shows how well the actual data fit a given AI model’s predictions.

The root mean square error decreases as the data points move closer to the regression line, because the model has less error. Predictions made by a model with lower error are more accurate.

RMSE values have the same units as the dependent (outcome) variable and can range from 0 to positive infinity. You can use the root mean square error to gauge the degree of error in a regression or other statistical model. A value of 0 indicates that the predicted and actual values match exactly. Low RMSE values show that the model makes accurate predictions and fits the data well; higher values imply larger errors and less accurate forecasts.

How do you calculate RMSE?

Before you jump into calculating RMSE, you must understand its formula. The RMSE formula is essentially the standard deviation formula, so it should be recognizable to anyone with training in statistics. That makes sense, because the root mean square error is the residuals’ standard deviation: it measures the spread of the differences between the observed and predicted values.

Here is the RMSE formula:

RMSE = √[ Σ(yi − ŷi)² / (N − P) ]

Where:

  • yi is the actual value of the ith observation.
  • ŷi is the predicted value for the ith observation.
  • P is the number of parameters estimated, including the constant.
  • N is the number of observations.

To determine the root mean square error, compute the residual (y − ŷ) for each observation and square it. Next, add up all the squared residuals. Divide that total by the degrees of freedom (N − P) associated with the errors in your model to get the average squared residual, also known as the mean squared error (MSE). Finally, take the square root to arrive at the RMSE.

Statistics professionals refer to the RMSE formula’s numerator, Σ(yi − ŷi)², as the sum of squared errors.
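To make these steps concrete, here is a minimal NumPy sketch of the calculation described above. The toy data and the value of P are purely illustrative; note also that many libraries (scikit-learn included) divide by N rather than by the degrees of freedom N − P:

import numpy as np

y_actual = np.array([3.0, 5.0, 7.5, 10.0])     # observed values (toy data)
y_predicted = np.array([2.5, 5.5, 7.0, 10.5])  # model predictions (toy data)

residuals = y_actual - y_predicted       # y - ŷ for each observation
sum_of_squares = np.sum(residuals ** 2)  # the numerator of the formula

N = len(y_actual)  # number of observations
P = 1              # parameters estimated, including the constant (toy value)

rmse = np.sqrt(sum_of_squares / (N - P))  # mean squared error, then its root
print(rmse)  # about 0.577 for this toy data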

Why do you need RMSE?

RMSE is helpful in many areas, particularly in regression analysis or when evaluating models that make numerical predictions. Here are some reasons why RMSE is commonly used:

  • Measure of prediction error: RMSE measures the average difference between the values a model predicts and the values actually observed in the dataset. The formula condenses all of the prediction errors into a single value by taking the square root of the average squared error.
  • Sensitivity to outliers: RMSE is sensitive to outliers in the data. Because the errors are squared, large errors have a disproportionately large effect on the total. This sensitivity can be helpful for recognizing and understanding the impact of extreme predictions or outliers on the model’s performance.
  • Interpretability: RMSE is expressed in the same units as the predicted variable, which makes it easier to interpret and compare across different models or datasets. For example, if you’re predicting house prices in dollars, the RMSE will also be in dollars, allowing you to assess the average prediction error in a meaningful and intuitive way.
  • Optimization criterion: RMSE is frequently employed as an optimization criterion during model training. Many machine learning methods try to minimize the RMSE when fitting the model to the training data, which pushes the model to make predictions as close to the actual values as possible.
  • Comparison between models: It provides a standardized measure to compare the performance of different models. When evaluating multiple models or algorithms, you can use RMSE to assess which one has lower prediction errors and is more accurate in making predictions on unseen data.

It’s important to note that RMSE is not the only metric used to evaluate models. Depending on the specific problem and context, other metrics such as Mean Absolute Error (MAE), R-squared, or precision and recall may also be relevant.

Where RMSE is useful

Root Mean Square Error (RMSE) is sometimes preferred over Mean Squared Error (MSE) because it expresses the error in the same unit as the target variable, making it easier to interpret and compare. Like MSE, RMSE penalizes larger errors more heavily than smaller ones through squaring, which makes it sensitive to outliers.
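To see this sensitivity in action, here is a small sketch on toy numbers comparing RMSE with the more outlier-tolerant MAE once a single large error appears:

import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([10.0, 10.0, 10.0, 10.0])
y_small = np.array([11.0, 9.0, 11.0, 9.0])     # four errors of 1
y_outlier = np.array([11.0, 9.0, 11.0, 19.0])  # one error of 9

print(mae(y_true, y_small), rmse(y_true, y_small))      # 1.0 and 1.0
print(mae(y_true, y_outlier), rmse(y_true, y_outlier))  # 3.0 and about 4.58

The single large error triples the MAE but more than quadruples the RMSE, which is exactly the squaring effect described above.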

Because RMSE measures the precision of prediction models, notably in statistics and machine learning, it can be helpful in the following industries:

Finance and Economics

In finance, RMSE can be used to measure the accuracy of financial forecasting models. For example, financial institutions may use RMSE to evaluate the accuracy of their stock market prediction models or economic forecasting models. A lower RMSE indicates a more accurate model, which is crucial for making informed investment decisions.

Energy

In the energy sector, RMSE can be used to assess the accuracy of energy demand or load forecasting models. Power companies rely on accurate load forecasting to optimize power generation, manage resources efficiently, and avoid under or overproduction of electricity. RMSE helps evaluate the accuracy of these forecasting models and provides insights for improving energy management strategies.

Climate Science

Climate scientists often use RMSE to assess the accuracy of climate models. These models predict future climate patterns, such as temperature, precipitation, and sea level rise. RMSE is useful for comparing the model’s predictions against observed data, helping scientists identify areas of improvement in the models and refine their projections.

Transportation and Logistics

In logistics and transportation, RMSE may be used to assess how well demand forecasting models anticipate passenger or freight volumes. This data is necessary for managing inventories, scheduling resources efficiently, and optimizing transportation routes. RMSE serves as a gauge of accuracy and helps businesses improve their forecasting models so they can make better decisions.

Retail and Sales

Retailers often employ predictive models to forecast product sales or demand. RMSE can be used to assess the accuracy of these models and identify areas where improvements are needed. Accurate sales forecasting helps retailers optimize inventory levels, plan promotions, and manage supply chains more efficiently.

Where other metrics would be better-suited to the task

Although RMSE has a broad range of applications and offers useful insight into a model’s prediction accuracy, in some circumstances other metrics may be more appropriate, depending on the precise needs of the task. Here are a few examples:

  • Mean Absolute Error (MAE): MAE measures the average absolute difference between the predicted and actual values. It is more robust to outliers since it does not punish large errors as harshly as RMSE. MAE may be preferable if robustness to outliers matters or if you want a measure that weighs all errors equally (see the sketch after this list).
  • Mean Absolute Percentage Error (MAPE): MAPE calculates the average percentage difference between the predicted and actual values. It is helpful when you want to evaluate the model’s relative performance across several data scales, and it is frequently used in forecasting tasks where percentage errors matter more than absolute errors.
  • R-squared (R²): R-squared calculates the proportion of the dependent variable’s variation that can be predicted from the independent variables. It can be understood as the fraction of variance explained by the model, showing how well the model fits the data. R-squared can complement RMSE by describing the model’s explanatory power and is useful for judging overall goodness of fit.
  • Precision, Recall, and F1 Score: These metrics are employed in classification tasks, where the main objective is to predict categorical labels rather than continuous values. If your work involves categorizing data into distinct classes or finding anomalies, precision, recall, and F1 score might be better suited than RMSE.
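As a quick reference, here is a minimal scikit-learn sketch computing the regression alternatives above on toy values (mean_absolute_percentage_error requires scikit-learn 0.24 or newer):

from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    r2_score,
)

y_true = [120.0, 150.0, 90.0, 200.0]  # toy actuals
y_pred = [118.0, 160.0, 85.0, 210.0]  # toy predictions

print("MAE: ", mean_absolute_error(y_true, y_pred))             # same units as the target
print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))  # scale-independent fraction
print("R2:  ", r2_score(y_true, y_pred))                        # proportion of variance explained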

Hands-on exercise

For the hands-on example, we will work on a retail prediction project. This project contains three datasets: stores (anonymized information about the 45 stores, indicating the type and size of each store), features (additional information on the activity in the region, department, and store for the given dates), and sales (historical sales data covering 2010-02-05 to 2012-11-01).

Making judgments based on limited historical data is one of the challenges of modeling retail data. Holidays and certain significant events only occur once a year, as does the opportunity to assess the financial effects of strategic choices. At the end of the analysis and prediction, we should get our RMSE, MSE, MAE, R2, and Adjusted R2 metrics.

Note: follow the code from top to bottom to understand the flow.

Importing the libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

features = pd.read_csv('Features data set.csv')  # regional/department/store activity
sales = pd.read_csv('sales data-set.csv')        # historical weekly sales
stores = pd.read_csv('stores data-set.csv')      # store type and size

Let’s have a quick look at one of the datasets.

[Image: preview of one of the dataframes]

Let’s explore the “features” dataframe and the “sales” dataframe. The “features” dataframe contains additional store, department, and regional activity data for the given dates, while the “sales” dataframe contains historical sales data covering 2010-02-05 to 2012-11-01.

# Parse the Date columns into datetime objects
features['Date'] = pd.to_datetime(features['Date'])
sales['Date'] = pd.to_datetime(sales['Date'])

[Image: head of the features dataframe]

We have to merge the three dataframes together.

# Join sales with features on Store/Date/IsHoliday, then attach the store metadata
df = pd.merge(sales, features, on=['Store', 'Date', 'IsHoliday'])
df = pd.merge(df, stores, on=['Store'], how='left')

[Image: seaborn heatmap of the merged dataframe]

We need to work on the Date column so that we can use the date information in the dataset.

# Split Date into year/month/day string columns
df[['year','month','day']] = df.Date.apply(lambda x: pd.Series(x.strftime("%Y,%m,%d").split(",")))

# Keep only the month: drop year and day along with the original Date column
df.drop(['year','day','Date'], axis=1, inplace=True)
df.fillna(0, inplace=True)  # replace the remaining missing values with 0

df.isnull().sum()

[Image: output of df.isnull().sum()]

We will convert the month column to integers and replace the holiday flags in the dataset with 0 or 1.

df['month'] = df['month'].astype(str).astype(int)  # month as an integer

df.IsHoliday = df.IsHoliday.replace({False: 0, True: 1})  # boolean flag -> 0/1

Let’s plot count plots to see the holidays and store types in the dataset.

fig = plt.figure(figsize=(16, 5))
fig.add_subplot(2, 2, 1)
sns.countplot(x='IsHoliday', data=df)  # holiday vs non-holiday weeks
fig.add_subplot(2, 2, 2)
sns.countplot(x='Type', data=df)       # distribution of store types

[Image: count plots of IsHoliday and Type]

df_target = df['Weekly_Sales']                # the prediction target
df_final = df.drop(['Weekly_Sales'], axis=1)  # everything else becomes a feature

# One-hot encode the categorical columns
df_final = pd.get_dummies(df_final, columns=['Store', 'Dept', 'Type'], drop_first=True)
df_final.isnull().sum()

[Image: output of df_final.isnull().sum()]

# Convert everything to float32 NumPy arrays for training
X = np.array(df_final).astype('float32')
y = np.array(df_target).astype('float32')

We now have a dataset we can work with; let’s split it into train, test, and validation sets so we can train the model and evaluate it on samples it has not seen.

from sklearn.model_selection import train_test_split

# Hold out 15% of the data, then split that holdout evenly into test and validation sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)
X_test, X_val, y_test, y_val = train_test_split(X_test, y_test, test_size=0.5)

We can now have a look at the shape of data we are working with.

print('Shape of X_test = ', X_test.shape,  '\nShape of y_test ='  , y_test.shape)
print('Shape of X_train = ', X_train.shape,  '\nShape of y_train ='  , y_train.shape)
print('Shape of X_val = ', X_val.shape,  '\nShape of y_val ='  , y_val.shape)

[Image: printed shapes of the train, test, and validation sets]

Using XGBoost, we can train (fit) a regression model on our data.

import xgboost as xgb

# Gradient-boosted trees with a squared-error regression objective
model = xgb.XGBRegressor(objective='reg:squarederror', learning_rate=0.2, max_depth=10, n_estimators=100)
model.fit(X_train, y_train)

[Image: output of fitting the XGBRegressor]

It’s time to make predictions on the test data.

y_predict = model.predict(X_test)

# For a regressor, .score() returns the R² (coefficient of determination)
result = model.score(X_test, y_test)
print("R2 score : {}".format(result))

We were able to achieve a score of about 0.9 with our predictions (for a regressor, this score is the R² rather than a classification accuracy), so now we have to get our RMSE, MSE, MAE, R-squared, and adjusted R-squared metrics from the work done so far.

from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

k = X_test.shape[1]  # number of predictors
n = len(X_test)      # number of observations

MSE = mean_squared_error(y_test, y_predict)
RMSE = np.sqrt(MSE)  # square root of the mean squared error
MAE = mean_absolute_error(y_test, y_predict)
r2 = r2_score(y_test, y_predict)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes R² for the number of predictors

print('RMSE =', RMSE, '\nMSE =', MSE, '\nMAE =', MAE, '\nR2 =', r2, '\nAdjusted R2 =', adj_r2)

[Image: printed RMSE, MSE, MAE, R2, and Adjusted R2 values]
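As an aside, if your scikit-learn version is 0.22 or newer, mean_squared_error can return the root directly, which avoids the manual square root (very recent versions also ship a dedicated root_mean_squared_error function):

# Equivalent shortcut: squared=False makes mean_squared_error return the RMSE
rmse_direct = mean_squared_error(y_test, y_predict, squared=False)
print('RMSE (direct) =', rmse_direct)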

Setting Up Arize

We will log our model data to Arize to monitor and improve the performance of the model. Arize is a machine learning observability platform that helps practitioners monitor, troubleshoot, and explain models.
With Arize, we can:

  • Track real-time model performance with support for delayed ground truth/feedback.
  • Use tracing and explainability to identify root causes of model failures/performance deterioration.
  • Compare the performance of many models.
  • Track drift, data quality, model fairness/bias metrics, and lots more.

To use Arize, we will have to sign up for free and log in to access the dashboard, since we need to grab the Space and API keys from there.

Install Arize locally on your computer.

pip install arize

After installation, we will initialize the Arize client from arize.pandas.logger so that we can call Client.log().

from arize.pandas.logger import Client
# Schema and the environment/model-type/metric enums live in arize.utils.types
# (exact import paths can vary between arize SDK versions)
from arize.utils.types import Schema, Environments, ModelTypes, Metrics

Once that is done, we will set up the Arize client; we will need the API key and the Space key, both found under the Space settings on the dashboard. We will also give our model an ID and a version via the model_id and model_version variables; these will help us identify the deployed model on the Arize dashboard.

API_KEY = 'API_KEY'      # replace with your API key from the dashboard
SPACE_KEY = 'SPACE_KEY'  # replace with your Space key from the dashboard
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

model_id = 'Retail prediction'
model_version = '1.0.0'

if SPACE_KEY == "SPACE_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ NEED TO CHANGE SPACE AND/OR API_KEY")
else:
    print("Done ✅:  Now we can start using Arize!")

Now that we are connected to Arize, we have to define the model schema. Arize organizes the model data it stores via a schema: the data consists of model records, and each record contains the model inputs, model outputs, timestamp metadata, and lots more.
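One gap worth noting before defining the schema: Arize reads predictions and actuals from columns of the logged dataframe, but df currently holds only the actual Weekly_Sales values. Here is a minimal sketch that scores every row and attaches the results; the column name Prediction is our own choice for illustration, not something from the original dataset:

# Hypothetical step: score all rows so the logged dataframe carries both
# predictions ('Prediction', our own column name) and actuals ('Weekly_Sales')
df['Prediction'] = model.predict(np.array(df_final).astype('float32'))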

Let’s define the schema:

schema = Schema(
    prediction_id_column_name="Store",
    prediction_label_column_name="Prediction",  # the model's predicted weekly sales
    actual_label_column_name="Weekly_Sales",    # the ground-truth weekly sales
    feature_column_names=[
       'Store', 'Dept', 'IsHoliday', 'Fuel_Price',
       'Unemployment', 'Type', 'Size', 'month'
       ],
)

To learn more about the model schema, check out the Arize documentation.

Finally, we log our model to Arize using the arize_client.log() function. A link will be displayed once the model has been successfully deployed; you can follow it to the deployment webpage. Note: it usually takes up to 10 minutes for Arize to populate data throughout the platform.

response = arize_client.log(
    model_id=model_id,
    model_version=model_version,
    path='inferences.bin',
    batch_id=None,
    # This is a regression model, so we validate with regression metrics
    metrics_validation=[Metrics.REGRESSION],
    environment=Environments.PRODUCTION,
    dataframe=df,
    schema=schema,
    model_type=ModelTypes.REGRESSION
)

Visualize the model performance

With the help of Arize’s intuitive dashboard, we can visualize the model performance.

[Image: model performance visualization in the Arize dashboard]

We can also set up monitors in the Monitors tab on the dashboard, including drift monitors and performance monitors.

[Image: setting up drift and performance monitors in Arize]

Summary

In conclusion, RMSE serves as a valuable metric for measuring the accuracy of a model’s predictions, allowing us to quantify the level of error between predicted and actual values. By minimizing RMSE, we can strive for more precise and reliable predictions.

By incorporating RMSE as an evaluation metric and embracing model observability, we can enhance our predictive models’ transparency, trustworthiness, and overall quality. This empowers us to make more informed decisions, detect and address model shortcomings, and build more accurate, reliable, and explainable models.

In this article, we picked a real-world scenario and calculated the RMSE by working through a retail prediction project. We created the model, predicted values, achieved a solid score, and, most importantly, logged our model to Arize for monitoring and greater observability.