Reviewing and refining models

After the first version of the model training is finished, analyze the resulting model metrics and configure new versions of the experiment until you have achieved the results you need.

When you run the experiment version, you are taken to the Models tab, where you can start analyzing the resulting model metrics. You can access Schema view and Data view by returning to the Data tab. More granular analysis can be performed in the Compare and Analyze tabs.

You will know the first version of the training is finished when all metrics populate in the Model metrics table, and a trophy icon appears next to the top model.

Information note: AutoML is continually improving its model training processes. Therefore, you might notice that the model metrics and other details shown in the images on this page are not identical to yours when you complete these exercises.

Analyzing the models from v1

Switch back to the Models tab. In the Model metrics table, the top model is marked with a trophy icon. This means that it is the top-performing model based on the F1 score.

Model metrics table showing top-performing v1 model

Model metrics table showing key model metrics.

Sort the models by performance, from highest to lowest, by clicking the F1 column header. You might choose to exclude low-performing algorithms or focus only on the best one to get faster results in the next iteration of the training. We will address this when configuring v3 in a later section.
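
If you want to reproduce the ranking metric outside of AutoML, the following is a minimal scikit-learn sketch with hypothetical labels and predictions. It only illustrates how F1 combines precision and recall; AutoML calculates these scores for you.

    from sklearn.metrics import f1_score, precision_score, recall_score

    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual churn labels (hypothetical)
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # model predictions (hypothetical)

    precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
    recall = recall_score(y_true, y_pred)        # TP / (TP + FN)

    # F1 is the harmonic mean of precision and recall
    f1_manual = 2 * precision * recall / (precision + recall)
    print(f1_manual, f1_score(y_true, y_pred))   # both are 0.8 for this toy example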

Identifying data leakage

Look at the Model insights charts on the right side of the page. These charts give you an indication of the relative importance of each feature, as well as model performance.

From the Permutation importance chart, as well as the Features list in the Experiment configuration pane, notice that this first iteration of the model is relying heavily on the DaysSinceLastService feature, with all other features having almost no significance compared to it.

Permutation importance chart in Models tab, showing data leakage

Permutation importance chart for top-performing v1 model showing feature 'DaysSinceLastService' consuming almost all of the influence for the entire model

This disparity, and the models' extremely high F1 performance scores, should be viewed as a sign that something is wrong. In this case, there was no logic defined during data collection to stop the counting of the number of days since a customer's last service ticket for customers that canceled their subscription. As a result, the model learned to associate a large number of days since last service ticket (present for customers who canceled years ago) with a value of yes in the Churned field.

This is an example of data leakage: in a real-world scenario, the model would only have access to information up until the point where the prediction is made, but the number of days in this field was collected past that point of measurement. This specific issue is known as target leakage, a form of data leakage. For more information, see Data leakage.
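
As an illustration of how this kind of leak is usually avoided at data-collection or feature-engineering time, here is a hedged pandas sketch. The column names (LastServiceDate, Churned) and the snapshot date are hypothetical; the point is simply to measure the feature relative to a fixed point in time instead of letting it keep counting after a customer cancels.

    import pandas as pd

    # Hypothetical data: column names and dates are illustrative only
    df = pd.DataFrame({
        "LastServiceDate": pd.to_datetime(["2022-11-15", "2020-03-02"]),
        "Churned": ["no", "yes"],
    })

    # Leaky version: keeps counting up to "today", so customers who canceled
    # years ago accumulate huge values that the model can latch onto
    df["DaysSinceLastService_leaky"] = (pd.Timestamp.today() - df["LastServiceDate"]).dt.days

    # Safer version: measure relative to a fixed snapshot date, using only
    # information that would be available when the prediction is made
    snapshot_date = pd.Timestamp("2023-01-01")
    df["DaysSinceLastService"] = (snapshot_date - df["LastServiceDate"]).dt.days

    print(df)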

We need to remove the "leaky" feature DaysSinceLastService from the experiment configuration, since it is skewing the resulting models. Note that in a real-life use case, the data quality and collection logic need to be investigated thoroughly before model creation to ensure that the resulting model is trained properly.

We will address this issue when configuring v2.

Configuring and running version 2

Let's configure a new version to address the data leakage.

  1. Click View configuration to expand the experiment configuration panel.

  2. Click New version.

  3. In the panel, under Features, clear the DaysSinceLastService checkbox.

  4. Click Run v2.

Removing DaysSinceLastService for v2

Experiment configuration panel showing configuration of v2. Feature 'DaysSinceLastService' is removed

Analyzing the models from v2

After the second version of the experiment has finished running, click the checkbox next to the top-performing v2 model in the Model metrics table (marked with a trophy icon). This refreshes the page with the metrics for that model.

Comparing training and holdout metrics

You can view additional metrics and compare the metrics from the cross-validation training to the holdout metrics.

  1. In the experiment, switch to the Compare tab.

    An embedded analysis opens. You can use the interactive interface to dive deeper into your comparative model analysis and uncover new insights.

  2. In the Sheets panel on the right side of the analysis, switch to the Details sheet.

  3. Look at the Model Metrics table. It shows model scoring metrics, such as F1, as well as other information.

  4. Version 1 of the training was affected by target leakage, so let's focus only on v2. Use the Version filter pane on the right side of the sheet to select the value 2.

  5. In the Columns to show section, use the filter pane to add and remove columns in the table.

  6. In the drop-down listbox, add additional metrics. Training scores for each metric are shown as values ending in Train; add some of these training metrics to the table.

You can now see the F1 metrics from the cross-validation training and compare them to the holdout metrics.

Adding and viewing training scores for comparison with the holdout scores

Using the 'Compare' tab in the experiment to view training scores alongside holdout scores
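
As a rough, stand-alone illustration of what these two sets of numbers represent, the sketch below trains a scikit-learn model on synthetic data, computes cross-validated F1 scores on the training portion, and compares them to the F1 score on a holdout split. The dataset, algorithm, and split sizes are placeholders and not what AutoML uses internally.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import cross_val_score, train_test_split

    # Synthetic stand-in for the prepared churn dataset
    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
    X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.2, random_state=42)

    model = RandomForestClassifier(random_state=42)

    # Cross-validation F1 on the training portion (what the "...Train" metrics reflect)
    cv_f1 = cross_val_score(model, X_train, y_train, cv=5, scoring="f1")

    # Holdout F1 on data the model never saw during training
    model.fit(X_train, y_train)
    holdout_f1 = f1_score(y_hold, model.predict(X_hold))

    print(f"CV F1 (mean): {cv_f1.mean():.3f}  Holdout F1: {holdout_f1:.3f}")
    # A large gap between the two can indicate overfitting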

Identifying features with low importance

Next, we should check to see if there are any features with low permutation importance. Features that have little to no influence on the model should be removed for improved prediction accuracy.

  1. In the experiment, switch back to the Models tab.

  2. Look at the Permutation importance chart. The bottom four features (StartMonth, DeviceType, CustomerTenure, and Territory) exert much less influence on the model than the other features. They are of little value for this use case and can be treated as statistical noise.

In v3, we can remove these features to see if it improves the model scores.

Models tab with top-performing v2 model selected. The Permutation importance chart shows there are features which exert little to no influence on the model.

Permutation importance chart for selected v2 model showing very low permutation importance for several features
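
For reference, the general technique can also be reproduced outside of AutoML. The sketch below, on synthetic data, shuffles one feature at a time and measures how much the F1 score drops; features whose score barely changes are the ones you would consider dropping. This illustrates the concept only and is not AutoML's internal implementation.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    # Synthetic data with only a few genuinely informative features
    X, y = make_classification(n_samples=1000, n_features=8, n_informative=3, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Shuffle each feature n_repeats times and record the average F1 drop
    result = permutation_importance(model, X_test, y_test, scoring="f1", n_repeats=10, random_state=0)

    # Features whose score barely changes when shuffled are candidates for removal
    for i, importance in enumerate(result.importances_mean):
        flag = "  <- low importance" if importance < 0.01 else ""
        print(f"feature_{i}: {importance:.4f}{flag}")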

Identifying low-performing algorithms

We can also look at the Model metrics table to see if we can remove any algorithms from the v3 training. You can remove low-performing algorithms when refining models so that the training runs faster in subsequent iterations.

  1. In the experiment, switch back to the Models tab.

  2. In the Model metrics table, use the Version filter to show only the models from v2.

  3. Look at the F1 scores for each Algorithm. If certain algorithms are creating models that score significantly lower than others, we can remove them from the next version.
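
As a rough analogue of this step outside of AutoML, the sketch below trains a few common scikit-learn classifiers on the same synthetic split and compares their holdout F1 scores, keeping only those close to the best. The candidate algorithms and the 0.05 cutoff are assumptions for illustration only.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    candidates = {
        "Random forest": RandomForestClassifier(random_state=1),
        "Logistic regression": LogisticRegression(max_iter=1000),
        "Gaussian naive bayes": GaussianNB(),
    }

    # Train each candidate and score it on the same holdout split
    scores = {}
    for name, clf in candidates.items():
        clf.fit(X_train, y_train)
        scores[name] = f1_score(y_test, clf.predict(X_test))

    # Keep only algorithms within 0.05 F1 of the best for the next version
    best = max(scores.values())
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        keep = "keep" if best - score <= 0.05 else "drop for v3"
        print(f"{name}: F1={score:.3f} ({keep})")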

Configuring and running version 3

  1. Click View configuration to expand the experiment configuration panel.

  2. Click New version.

  3. In the panel, under Features, clear the checkboxes for StartMonth, DeviceType, CustomerTenure, and Territory.

  4. Optionally, expand Algorithms and clear the checkboxes for Gaussian Naive Bayes and Logistic Regression.

  5. Click Run v3.

Analyzing the models from v3

After v3 has run, you can clear the Version filter from the Model metrics table. Select the top-performing model from v3.

Let's do some quick comparison of the models across all versions.

The first version of the training resulted in the highest scores, but because of the data leakage issue, those metrics were highly exaggerated and unrealistic predictors of performance. In v3, the F1 score of the top-performing model increased compared to that of the top-performing v2 model.

Model metrics table showing sorted F1 scores for models across all three versions. F1 score improved in v3 after removing features with low importance.

Using the Model metrics table to quickly compare models trained across each of the versions of the experiment

As explored earlier, you can switch to the Compare tab for deeper comparison of model scores.

Focusing on a specific model

At any point during model analysis, you can perform granular analysis of an individual model. Explore prediction accuracy, feature importance, and feature distribution with an interactive Qlik Sense experience.

  1. With the top-performing v3 model selected, click the Analyze tab.

    An embedded analysis opens.

  2. With the Model Overview sheet, you can analyze the prediction accuracy of the model. Analysis is enhanced by the power of selections: click a feature or predicted value to make a selection, and the data in the embedded analysis is filtered accordingly. You can drill down into specific feature values and ranges to see how feature influence and prediction accuracy change.

  3. Switching to the other sheets, you can view visualizations for prediction accuracy, feature distribution, and impact distribution (SHAP). This analytics content can help you to:

    • Uncover the key drivers influencing trends in the data.

    • Identify how specific features and cohorts are affecting predicted values and prediction accuracy.

    • Identify outliers in the data.

Analyze tab in an ML experiment

Using the 'Analyze' tab to enhance analysis with the power of selections
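
If you want to explore SHAP impact values outside of the Analyze tab, a minimal sketch with the open-source shap package and a tree-based scikit-learn model might look like the following. The dataset and model are placeholders; the Analyze tab generates equivalent content for you without any code.

    import shap
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    # Placeholder data standing in for the prepared churn dataset
    X, y = make_classification(n_samples=500, n_features=6, random_state=7)
    model = GradientBoostingClassifier(random_state=7).fit(X, y)

    # TreeExplainer computes SHAP values efficiently for tree-based models
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)

    # Summary plot: the distribution of each feature's impact on the model output
    shap.summary_plot(shap_values, X)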

Next steps

In a real-world scenario, it is important to repeat these refining steps as many times as needed before deploying your model, to ensure that you have the best possible model for your particular use case.

In this tutorial, move to the next section about deploying your model.
