The configuration of experiments consists of selecting the target and the features that the model will use to predict the target. You can also configure a number of optional settings.
To support you in the selection of a target, the historical dataset is analyzed and summary statistics are displayed about each column in the dataset. Several automatic preprocessing steps are applied to the dataset to make sure that only suitable data is included. For more details on the data preprocessing, see Automatic data preparation and transformation.
The default view is the schema view, where each column in your dataset is represented by a row with information and statistics. To see more information and sample data for each column, switch to the data view. Use the view icons to change between the views.
Use the pane icon to open or close the Experiment configuration side pane. The pane shows information about your experiment and its current configuration.
Selecting a target
The target column contains the values that you want the machine learning model to predict. You can change the target column until you start the first training. After that, it is locked for editing.
Do the following:
Hover over the column and click the icon that appears.
The target column is now marked as the target, and the other available columns are automatically selected as features.
When the target is selected, you can start running the first version of the experiment. Read more in Training experiments. You can do additional configuration at this point, as described below, or adjust the configuration after you have reviewed the training results.
Certain columns in your dataset may not be selectable as a target for your experiment, or may have specific processing applied to them. For explanations of common characteristics detected in training data, see Common insights found in training data.
Selecting feature columns
With the target set, you can choose which of the other available columns to include in the training of the model. Exclude any features that you don't want to be part of the model. An excluded column stays in the dataset but is not used by the training algorithm.
At the top of the Experiment configuration pane, you can see the number of cells in your dataset. If the number exceeds your dataset limit, you can exclude features to get below the limit.
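The arithmetic behind the cell limit can be sketched as follows. This is an illustration only: it assumes that the number of cells equals rows multiplied by included columns, and the function names and the numbers are hypothetical, not part of the product.

```python
def cell_count(n_rows: int, n_columns: int) -> int:
    """Cells in a rectangular dataset, assuming cells = rows x columns."""
    return n_rows * n_columns


def columns_to_exclude(n_rows: int, n_columns: int, cell_limit: int) -> int:
    """Smallest number of columns to exclude so the cell count fits the limit."""
    if n_rows <= 0:
        return 0
    max_columns = cell_limit // n_rows  # most columns that still fit the limit
    return max(0, n_columns - max_columns)


# Hypothetical dataset and limit, for illustration only.
rows, columns, limit = 50_000, 30, 1_000_000
print(cell_count(rows, columns))                 # 1500000
print(columns_to_exclude(rows, columns, limit))  # 10
```

In this hypothetical case, excluding 10 of the 30 columns brings the dataset down to 1,000,000 cells.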
You can select the feature columns in various ways:
Manually clear the checkboxes for the features you don't want to include.
Click Exclude all features and then select only the ones you want to include.
Search for features, and then exclude or include all features in the filtered search result.
After you have run the first version of the experiment, you can define the Number of top features to include.
Certain columns in your dataset may not be selectable as features for your experiment, or may have specific processing applied to them. For explanations of common characteristics detected in training data, see Common insights found in training data.
Selecting algorithms
All available algorithms are included by default, and you can exclude any algorithms that you don't want to use. Normally, you would do this as part of the model refinement, after you have seen the first training results. Read more in Refining models.
Changing feature data types
When a dataset is loaded, the columns are treated as categorical or numeric based on the data type. In some cases, you might want to change this setting.
For example, if the days of the week are represented by the numbers 1-7, each number represents a categorical value. By default, such a column is treated as a continuous, ranked numeric value, so you need to manually change its feature type to categorical.
Do the following:
In the Feature type column, click the icon for the column that you want to change.
Select a value in the list.
You can see all columns that have a changed feature type on the Experiment configuration pane under Data treatment.
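To see why the feature type matters for a column such as day of the week, consider the following minimal sketch. It compares the numeric interpretation of day numbers with a one-hot encoding, which is one common way to represent categorical values. The helper functions are hypothetical and only illustrate the idea. They do not show how the product processes data internally.

```python
DAYS = [1, 2, 3, 4, 5, 6, 7]


def one_hot(value, categories):
    """Encode a value as a 0/1 vector with one position per category."""
    return [1 if value == c else 0 for c in categories]


def differing_positions(a, b):
    """Count the positions where two encoded vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Numeric interpretation: day 7 appears six units away from day 1,
# even though the two days are adjacent in the calendar.
print(abs(7 - 1))  # 6

# Categorical interpretation: any two different days are equally distinct.
print(differing_positions(one_hot(7, DAYS), one_hot(1, DAYS)))  # 2
print(differing_positions(one_hot(3, DAYS), one_hot(4, DAYS)))  # 2
```

With the numeric type, the model sees an ordering and a magnitude that the data does not actually have. With the categorical type, each day is simply a distinct class.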
Changing the dataset
You can change the dataset until you have run the first experiment training. After that, you need to create a new experiment if you want to use a different dataset. Note that you lose any configuration that you have done when you change the dataset.
Do the following:
On the Experiment configuration pane under Training data, click Change dataset.
Select a new dataset.
Configuring hyperparameter optimization
You can optimize the model using hyperparameter optimization. Note that this is an advanced option that could increase the training time significantly. For more information, see Hyperparameter optimization.
Do the following:
On the Experiment configuration pane, expand the Model optimization section.
Select the Hyperparameter optimization checkbox.
Optionally, set a time limit for your optimization. The default time limit is one hour.
Common insights found in training data
Depending on the quality of your dataset, there might be limitations on how you can use specific parts of the data in your experiment configuration. The Insights column in schema view is helpful in identifying particular characteristics of data fields and how they will be processed by machine learning algorithms.
The following table shows possible insights that may be displayed in the schema:
|Insight|Meaning|Impact on configuration|
|---|---|---|
|Constant|The column has the same value for all rows.|The column can't be used as a target or included feature.|
|One-hot encoded|The feature type is categorical and the column has fewer than 14 unique values.|No effect on configuration.|
|Impact encoded|The feature type is categorical and the column has 14 or more unique values.|No effect on configuration.|
|High cardinality|The column has too many unique values, and can negatively affect model performance if used as a feature.|The column can't be used as a target.|
|Sparse data|The column has too many null values.|The column can't be used as a target or included feature.|
|Underrepresented class|The column has a class with fewer than 10 rows.|The column can't be used as a target, but can be included as a feature.|
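The rules in the table can be approximated in a short sketch. The thresholds of 14 unique values and 10 rows per class come from the table. The thresholds for High cardinality and Sparse data are not stated here, so the two ratios below are hypothetical placeholders, as are the function name and the choice to check Underrepresented class only for categorical columns. This is an illustration of the logic, not the product's implementation.

```python
from collections import Counter

ONE_HOT_MAX_UNIQUE = 14   # from the table: fewer than 14 unique values
MIN_ROWS_PER_CLASS = 10   # from the table: a class with fewer than 10 rows
# Hypothetical thresholds: the documentation does not state the actual values.
HIGH_CARDINALITY_RATIO = 0.9
SPARSE_NULL_RATIO = 0.5


def column_insights(values, categorical):
    """Return the insights from the table that apply to one column."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)  # rows per distinct value
    insights = []
    if total and len(non_null) == total and len(counts) == 1:
        insights.append("Constant")
    if total and (total - len(non_null)) / total > SPARSE_NULL_RATIO:
        insights.append("Sparse data")
    if len(counts) > 1 and len(counts) / len(non_null) > HIGH_CARDINALITY_RATIO:
        insights.append("High cardinality")
    if categorical and counts:
        if len(counts) < ONE_HOT_MAX_UNIQUE:
            insights.append("One-hot encoded")
        else:
            insights.append("Impact encoded")
        if min(counts.values()) < MIN_ROWS_PER_CLASS:
            insights.append("Underrepresented class")
    return insights


# Hypothetical columns, for illustration only.
print(column_insights(["a"] * 50 + ["b"] * 5, categorical=True))
# ['One-hot encoded', 'Underrepresented class']
print(column_insights([3.5] * 20, categorical=False))
# ['Constant']
```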