Creating and configuring the experiment
The first step is to create and configure the experiment. You will use the training dataset you uploaded earlier to train the model until it is ready to be deployed for making predictions.
Creating a new experiment
Do the following:
- Go to the Create page of the Analytics activity center and select ML experiment.
- Enter a name for your experiment, for example, Customer churn tutorial.
- Optionally, add a description and tags.
- Choose a space for your experiment. It can be your personal space or a shared space.
- Click Create.
- Select the training dataset file. This will be either of the following, depending on whether you are working with CSV or QVD:
  - AutoML Tutorial - Churn data - train.csv
  - AutoML Tutorial - Churn data - train.qvd
Reviewing the data
Now you are ready to start configuring your experiment, but before you start, let's have a look at the dataset.
We start out in the Data tab. The default view is the Schema view. Here we can see a table where each row represents a column in the dataset. Statistics and insights were generated during automatic data preparation. You might have to scroll to the right-hand side of the schema to see the Insights column.
We can see that AccountID has been excluded due to high cardinality. This means that the column contains too many unique values. The feature Country has been excluded for the opposite reason: the value is the same for all rows. These two features would not provide any value to the machine learning models.
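To make the two exclusion reasons concrete, here is a minimal Python sketch of how such columns could be spotted in a small table. It is an illustration only, not the logic AutoML runs: the function name, the `max_unique_ratio` threshold, and the sample values are all made up, and every value is treated as categorical.

```python
def flag_uninformative_columns(rows, max_unique_ratio=0.9):
    """Return {column: reason} for columns that look uninformative.

    rows: list of dicts, one dict per record.
    max_unique_ratio: hypothetical threshold, not the value AutoML uses.
    """
    flagged = {}
    if not rows:
        return flagged
    for column in rows[0]:
        distinct = {row[column] for row in rows}
        if len(distinct) == 1:
            # Same value in every row: nothing for a model to learn from.
            flagged[column] = "constant"
        elif len(distinct) / len(rows) > max_unique_ratio:
            # Nearly one distinct value per row: behaves like an identifier.
            flagged[column] = "high cardinality"
    return flagged


# Made-up sample rows that reuse three column names from the tutorial dataset.
sample = [
    {"AccountID": "a1", "Country": "US", "Territory": "East"},
    {"AccountID": "a2", "Country": "US", "Territory": "West"},
    {"AccountID": "a3", "Country": "US", "Territory": "East"},
    {"AccountID": "a4", "Country": "US", "Territory": "West"},
]
print(flag_uninformative_columns(sample))
# {'AccountID': 'high cardinality', 'Country': 'constant'}
```

In this toy data, AccountID is unique in every row and Country never varies, so both are flagged, while Territory is kept.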
We can also see that the categorical feature Territory has been impact encoded. Hover over the warning and information icons for more information.
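As a rough intuition for impact encoding (often called target encoding), each category is replaced by a number derived from the target, such as the average target value observed for that category. The sketch below shows that basic idea in Python. It is a simplification under our own assumptions: the function name and sample values are hypothetical, the target is assumed to be coded as 0 and 1, and the exact method AutoML applies may differ, for example by adding smoothing.

```python
from collections import defaultdict


def impact_encode(categories, targets):
    """Map each category to the mean target value observed for it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        totals[category] += target
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


# Made-up values: 1 means the customer churned, 0 means they stayed.
territories = ["East", "West", "East", "West", "East", "East"]
churned = [1, 0, 0, 0, 1, 0]

encoding = impact_encode(territories, churned)
print(encoding)
# {'East': 0.5, 'West': 0.0}

# Replace each category with its encoded value.
encoded_column = [encoding[t] for t in territories]
```

The result is a single numeric column, which is why this approach suits categorical features with many distinct values.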
Click Data view. In this view, we can see more information about each column, including sample data.
Selecting a target
We want our machine learning model to predict customer churn, so we select Churned, the final column in the dataset, as our target.
Do the following:
- Switch back to Schema view.
- Hover over Churned and click the target icon that appears.
In the experiment configuration panel, we can now see that Churned has been selected. We can also see which features are automatically selected and excluded. Since Churned is the target, it will not be used as a feature. We can also see that this experiment will be treated as a binary classification problem.
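The relationship between the target, the features, and the problem type can be sketched in a few lines of Python. This is only an illustration of the reasoning described above, not how AutoML is implemented: the function name, the simplified two-value rule, and the sample target values are assumptions, and only a subset of the dataset's columns is shown.

```python
def summarize_configuration(columns, target, excluded, target_values):
    """Describe a hypothetical experiment configuration.

    The target is never used as a feature, and excluded columns are dropped.
    A target with exactly two distinct values is treated as binary
    classification (a simplified rule used for illustration).
    """
    features = [c for c in columns if c != target and c not in excluded]
    distinct = set(target_values)
    problem_type = "binary classification" if len(distinct) == 2 else "other"
    return {"target": target, "features": features, "problem_type": problem_type}


config = summarize_configuration(
    columns=["AccountID", "Country", "Territory", "Churned"],  # subset of columns
    target="Churned",
    excluded={"AccountID", "Country"},
    target_values=["yes", "no", "no", "yes"],  # made-up target values
)
print(config)
# {'target': 'Churned', 'features': ['Territory'], 'problem_type': 'binary classification'}
```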
Selecting features
For this first run of our experiment, we will include all features and algorithms that have been selected by default. However, if you already know from your business knowledge that certain features have no influence on the target, you could deselect them at this point to exclude them from the training.
Changing the optimization settings
Intelligent model optimization is turned on by default. With intelligent model optimization, AutoML provides automatic refinement of model training. However, the goal of this tutorial is to show you how to manually identify certain issues with your feature data and training results.
For an example of how to train models with intelligent model optimization, see Example – Training models with automated machine learning.
Let's turn intelligent model optimization off to demonstrate manual refinement.
Do the following:
- In the experiment configuration panel, expand the Model optimization section.
- Switch from Intelligent to Manual.
Training the experiment
The configuration is done and we are ready to start the training.
Do the following:
- In the bottom right corner of the experiment window, click Run experiment.
When the experiment has finished running, we can move on to the next step, which is to review the resulting model metrics.