Reviewing and refining model versions
After the first version of the model training has finished, analyze the resulting model metrics and configure new versions of the experiment until you achieve the results you need.
When you run an experiment version, you are taken to model view, where you can analyze the resulting model metrics. You can switch to schema or data view at any time. When you need to return to model view, click the model view icon.
You will know the first version of the training is finished when all metrics populate in the Model metrics table, and a trophy icon appears next to the top model.
Analyzing the model
In model view, you can see that the top model is marked with a trophy icon. This means that it is the top-performing model based on the F1 score.
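The F1 score used for this ranking is the harmonic mean of precision and recall. A minimal sketch of the calculation (the confusion-matrix counts below are hypothetical, not taken from this experiment):

```python
# F1 is the harmonic mean of precision and recall, computed from the
# confusion-matrix counts: true positives (tp), false positives (fp),
# and false negatives (fn).
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)  # of predicted churners, how many actually churned
    recall = tp / (tp + fn)     # of actual churners, how many were caught
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for a churn classifier.
print(round(f1_score(tp=80, fp=10, fn=20), 3))  # 0.842
```

Because it balances precision and recall, F1 is a more informative ranking metric than plain accuracy when the churned and non-churned classes are imbalanced.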
Do the following:
- In the top right of the table, click the column picker button. Here, you can view all the available metrics for your problem, and add or remove metrics as needed. Select any metrics you want to show in the table, or leave the default metrics.
- In the Model metrics table, click the Algorithm filter dropdown and select the algorithm corresponding to the top-performing model.
- Toggle on Show training data metrics.
  You can now see the metrics from the cross-validation training and compare them to the holdout metrics. For each holdout metric column, there is a corresponding 'train' column showing the equivalent metric from the training data.
- Click Clear filters and switch the Show training data metrics toggle back to off.
- Sort the models by performance, from highest to lowest, by clicking the F1 column header. You might choose to exclude low-performing algorithms, or focus only on the best one, to get faster results in the next iteration of the training. We will address this when configuring v3 in a later section.
- Scroll down below the metrics table to see visualizations for the selected model.
- Click View configuration to expand the Experiment configuration pane.
- Click New version to create a draft of the next experiment version.
- From the Permutation importance chart, as well as the Features list in the Experiment configuration pane, notice that this first iteration of the model relies heavily on the DaysSinceLastService feature, with all other features having almost no influence in comparison.
This disparity, combined with the models' extremely high performance, should be viewed as a sign that something is wrong. In this case, the data collection logic never stopped counting the days since a customer's last service ticket after that customer canceled their subscription. As a result, the model learned to associate a large number of days since the last service ticket with a value of yes in the Churned field.
This is an example of data leakage: in a real-world scenario, the model would only have access to information available up until the point the prediction is made, but the number of days in this field was collected past that point of measurement. For more information about data leakage, see Data leakage.
We need to remove the "leaky" DaysSinceLastService feature from the experiment configuration, since it is skewing the resulting models. Note that in a real-life use case, the data quality and collection logic need to be thoroughly investigated before model creation to ensure that the resulting model is trained properly.
We will address this issue in the next section, when configuring v2.
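The leakage pattern described above can be reproduced with a small sketch on synthetic, hypothetical data (not the tutorial's dataset): because the counter kept running after cancellation, a single threshold on the leaky feature "predicts" churn perfectly.

```python
import random

random.seed(0)

# Synthetic, hypothetical churn data illustrating the leakage described above:
# for churned customers, the DaysSinceLastService counter was never stopped,
# so their values are inflated far beyond those of active customers.
def make_customer(churned: bool) -> dict:
    if churned:
        days = random.randint(200, 600)  # counter kept running after cancellation
    else:
        days = random.randint(0, 90)     # normal support cadence
    return {"DaysSinceLastService": days, "Churned": churned}

data = [make_customer(i % 4 == 0) for i in range(1000)]

# A single threshold on the leaky feature separates the classes perfectly --
# a red flag that the feature encodes the outcome rather than real signal.
threshold = 150
correct = sum(
    (row["DaysSinceLastService"] > threshold) == row["Churned"] for row in data
)
print(f"accuracy from the leaky feature alone: {correct / len(data):.3f}")  # 1.000
```

When one feature alone scores this well, suspect leakage before celebrating the model.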
Configuring and running version 2
Since most of the model training will change after this data leakage issue is fixed, let's configure a new version before completing any further refinements.
Do the following:
- The Experiment configuration pane should already be open from the previous steps, with a draft of v2 ready to configure.
- Under Features in the Experiment configuration pane, clear the DaysSinceLastService checkbox.
- Click Run v2.
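The training-versus-holdout comparison from the earlier steps is worth keeping in mind between versions. A minimal sketch of two sanity checks, using hypothetical scores (the threshold values are illustrative assumptions, not product defaults):

```python
# Hedged sketch: two quick sanity checks on experiment metrics.
# A large train-holdout gap suggests overfitting; a near-perfect holdout
# score (as with v1 above) can indicate data leakage.
def sanity_check(train_f1: float, holdout_f1: float) -> list[str]:
    warnings = []
    if train_f1 - holdout_f1 > 0.10:
        warnings.append("large train-holdout gap: possible overfitting")
    if holdout_f1 > 0.99:
        warnings.append("near-perfect holdout score: check for data leakage")
    return warnings

print(sanity_check(train_f1=0.97, holdout_f1=0.71))   # flags overfitting
print(sanity_check(train_f1=1.00, holdout_f1=0.998))  # flags possible leakage
```

Either warning is a cue to revisit the data and configuration before trusting the scores.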
Configuring and running version 3
After the second version of the experiment has finished running, click the checkbox next to the top-performing v2 model in the metrics table (marked with a trophy icon). This refreshes the page with the metrics for that model.
Above the Model metrics table, click the Version filter dropdown and select 2. This allows you to focus only on the v2 model metrics.
You will see that the list of important features has changed substantially since addressing the data leakage. The top-performing model might also use a different algorithm than the top-performing model for v1.
Do the following:
- Look at the Permutation importance chart. Some features might exert far less influence on the model than the others. They are of little value for this use case and can be treated as statistical noise. You can try removing some of those features to see whether the model scores improve.
- Click View configuration to expand the Experiment configuration pane.
- Click New version to create a draft of the next experiment version.
- In the Experiment configuration pane, under Features, clear the checkboxes for one or more features that exert little to no influence on the model.
- Look at the Model metrics table. You might choose to exclude some low-performing algorithms, or focus only on the best ones, to get faster results in the next iteration of the training.
- In the Experiment configuration pane, under Algorithms, optionally clear the checkboxes for a few of the low-performing algorithms.
- Click Run v3.
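Permutation importance, shown in the chart used in the steps above, is typically computed by shuffling one feature's values and measuring how much the model's score drops. A minimal sketch with a toy model and data (an assumed, generic formulation of the technique, not the product's implementation):

```python
import random

random.seed(1)

def score(model, rows, labels):
    """Fraction of rows the model classifies correctly."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(model, rows, labels, feature, n_repeats=10):
    """Mean drop in score after shuffling one feature's column."""
    base = score(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        random.shuffle(shuffled)
        # Rebuild rows with the shuffled values substituted for this feature.
        permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
        drops.append(base - score(model, permuted, labels))
    return sum(drops) / n_repeats

# Toy data: "signal" fully determines the label; "noise" is irrelevant.
rows = [{"signal": i % 2, "noise": random.random()} for i in range(200)]
labels = [r["signal"] for r in rows]
model = lambda r: r["signal"]  # toy model that only looks at "signal"

imp_signal = permutation_importance(model, rows, labels, "signal")
imp_noise = permutation_importance(model, rows, labels, "noise")
print(imp_signal)  # large drop in score (around 0.5)
print(imp_noise)   # 0.0
```

Features whose importance sits near zero, like "noise" here, are the candidates for removal in the steps above.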
Comparing experiment versions
In the Model metrics table, click Clear filters.
After v3 has run, click the checkbox next to the top-performing v3 model to view its metrics.
Click More model filters, and select the Top performers filter. You can see metrics for the top performers of each iteration of the experiment.
The first version of the training resulted in the highest scores, but those metrics were inflated and were unrealistic predictors of performance because of the data leakage issue. In v3, the F1 score of the top-performing model improved over that of the top-performing v2 model.
In a real-world scenario, it is important to repeat these refinement steps as many times as needed before deploying your model, to ensure that you have the best possible model for your particular use case.
Continue to the next section of this tutorial to deploy your model.