Configuring experiments

The configuration of experiments consists of selecting the target and the features that the model will use to predict the target. You can also configure a number of optional settings.

To support you in the selection of a target, the historical dataset is analyzed and summary statistics are displayed about each column in the dataset. Several automatic preprocessing steps are applied to the dataset to make sure that only suitable data is included. For more details on the data preprocessing, see Automatic data preparation and transformation.

After running the first version (v1), you can create new experiment versions to further refine the model training. For more information, see Refining models.

Requirements and permissions

To learn more about the user requirements for working with ML experiments, see Working with experiments.

The interface

The following sections outline how to navigate the experiment interface to configure your experiment. For more information about the interface, see Navigating the experiment interface.

Tabbed navigation

When you create an experiment, the Data tab opens. This is where you can configure the target and features for the experiment.

After you run at least one experiment version, other tabs become available. These tabs let you analyze the models trained in that version. If you need to configure subsequent versions with different feature selections, you can return to the Data tab.

Schema view and Data view

In the Data tab, you can alternate between the following views:

Schema view: The default view. In this view, each column in your dataset is represented by a row in the schema with information and statistics.
Data view: An alternative view you can use to access more information and sample data for each column.

Experiment configuration panel

Click View configuration to open the experiment configuration panel, where you can further customize the experiment training. You can open the panel from any tab.

With the experiment configuration panel, you can:

Select a target before training the first version
Add or remove features
Configure a new version of the experiment
Change or refresh the training dataset
Add or remove algorithms
Change model optimization settings

Figure: Experiment configuration panel

Selecting a target

The target column contains the values that you want the machine learning model to predict. You can change the target column until you start the first training. After that, it is locked for editing.

Do the following:

In Schema view or Data view, hover over the column.
Click the target icon that appears.

The target column is now indicated by a target symbol, and the other available columns are automatically selected as features.

Figure: Selecting the target in Schema view

Alternatively, you can select the target in the experiment configuration panel.

When the target is selected, you can start running the first version of the experiment. Read more in Training experiments. You can do additional configuration at this point, as described below, or adjust the configuration after you have reviewed the training results.

Explanations of how your data is being interpreted and processed are shown as the experiment training continues. For more information, see Interpreting dataset insights.

Determining the type of model created

The column you select as the target determines the type of model your experiment creates. This, in turn, plays a part in determining which algorithms are used to train the model. Certain columns in your dataset may not be selectable as a target for your experiment, or may have specific processing applied to them.

The model types are:

Binary classification model
Multiclass classification model
Regression model

The table below summarizes the factors in your target that determine the type of model used.

Characteristics of target column that determine model type
Model type	Number of distinct values in column	Feature type required	Additional information
Binary classification	2	Any	-
Multiclass classification	3-10	Any	A column with more than 10 distinct, non-numeric classes is not selectable as the target.
Regression	More than 10	Numeric	-
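The rules in the table above can be sketched as a small function. This is only an illustration of the table, not the product's implementation: the function name `infer_model_type` is hypothetical, and the handling of missing values and of columns with fewer than two distinct values are assumptions that the table does not cover.

```python
from typing import Optional, Sequence


def infer_model_type(target_values: Sequence[object]) -> Optional[str]:
    """Return the model type implied by a target column, or None if the
    column would not be selectable as a target under the table's rules."""
    # Assumption: missing values (None) do not count as a distinct class.
    distinct = {v for v in target_values if v is not None}
    n = len(distinct)

    if n == 2:
        return "binary classification"
    if 3 <= n <= 10:
        return "multiclass classification"
    if n > 10:
        # Regression requires a numeric column; bool is excluded on purpose.
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in distinct
        )
        return "regression" if is_numeric else None
    # Assumption: fewer than 2 distinct values leaves nothing to predict.
    return None
```

For example, a column holding only "yes" and "no" maps to binary classification, while a text column with 11 distinct classes maps to None because it is not selectable as the target.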

To see which type of model your experiment trains, click View configuration and expand Algorithms. The model type is shown in the title of the section.

Selecting feature columns

With the target set, you can choose which of the other available columns to include in the training of the model. Exclude any features that you don't want to be part of the model. An excluded column stays in the dataset but is not used by the training algorithm.

At the top of the experiment configuration panel, you can see the number of cells in your dataset. If the number exceeds your dataset limit, you can exclude features to get below the limit.
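The arithmetic behind staying under a cell limit can be sketched as follows. This assumes that the cell count is the number of rows multiplied by the number of included columns, which this page does not state explicitly. The function names and the limit of 10,000,000 cells are hypothetical values chosen for the example.

```python
def cell_count(n_rows: int, n_included_columns: int) -> int:
    """Cells in the training data, assuming cells = rows x included columns."""
    return n_rows * n_included_columns


def max_columns_within_limit(n_rows: int, cell_limit: int) -> int:
    """Largest number of included columns that keeps the cell count at or
    below the limit."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    return cell_limit // n_rows


# Hypothetical dataset: 200,000 rows and 60 included columns.
rows, columns, limit = 200_000, 60, 10_000_000
to_exclude = max(0, columns - max_columns_within_limit(rows, limit))
```

With these example numbers, the dataset has 12,000,000 cells, at most 50 columns fit under the limit, and 10 columns would need to be excluded.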

You can select the feature columns in various ways:

In Schema view and Data view

In the main views, you can:

Deselect Include all available features and then select only the ones you want to include.
Manually clear the checkboxes for the features you don't want to include.
Search for features, then exclude or include all features in the filtered search result.

In the experiment configuration panel

If you expand the experiment configuration panel, you can:

Manually clear the checkboxes for the features you don't want to include.
After you have run the first version of the experiment, you can define the Number of top features to include.

Figure: Features section in the experiment configuration panel

When you select features, they are automatically assigned a feature type. The possible feature types are:

Categorical
Numeric
Date
Free text

The feature type is assigned based on the data contained in the feature column. If a feature meets certain criteria, it might be staged to become the basis for auto-engineered features. If desired, you can change whether the feature is used for automatic feature engineering. For full details about automatic feature engineering, see Automatic feature engineering.

Certain columns in your dataset may not be selectable as features for your experiment, or may have specific processing applied to them. Explanations of how your data is being interpreted and processed are shown as you navigate experiment training. For more information, see Interpreting dataset insights.

Selecting algorithms

All available algorithms are included by default and you can exclude any algorithms that you don't want to use. Normally, you would do this as part of the model refinement when you have seen the first training results. Read more in Refining models.

Figure: Algorithms section in the experiment configuration panel

Changing feature types

When a dataset is loaded, the columns are treated as categorical, numeric, date, or free text based on the data type and other characteristics. In some cases, you might want to change this setting.

For example, if the days of the week are represented by the numbers 1-7, each number represents a categorical value. By default, it is treated as a continuous ranked numeric value, so you would need to manually change the configuration to treat it as categorical.
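To see why the feature type matters for the day-of-week example, the sketch below compares the two treatments. One-hot encoding is used here only as a common way to represent categories. This page does not say how AutoML encodes categorical features, so treat the encoding and the helper name `one_hot` as assumptions.

```python
def one_hot(values, categories):
    """Encode each value as a 0/1 vector with one position per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]


days = [1, 7, 3]  # three rows of a day-of-week column

# Numeric treatment: day 7 is "seven times" day 1, and 3 lies between them.
as_numeric = [float(d) for d in days]

# Categorical treatment: each day is its own class with no implied order.
as_categorical = one_hot(days, categories=range(1, 8))
```

In the numeric form, the model sees a ranked scale where day 7 is larger than day 1. In the categorical form, each day is an independent class, which matches what the numbers actually mean.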

When a column is identified as containing date and time information, it is used as the basis for newly generated auto-engineered features. When this happens, the original column (the parent feature) is treated as having the date feature type.

You can change the parent feature from a date feature to a categorical or numeric feature. For example, this is useful when a feature is identified as a date, but you need it to be treated as a string or number. When you do this, you can no longer use its auto-engineered features in experiment training.

Do the following:

In Schema view, locate the feature.
In the Feature type column for this feature, click the down arrow.
Select a value in the list.

You can alternatively change feature types from Data view. Locate the feature, then click the down arrow next to the current feature type. Select a value in the list.

You can see all columns that have a changed feature type in the experiment configuration panel under Data treatment.

Impact on predictions

When you manually change the feature type of a feature, and then deploy a resulting model, the feature type overrides will be applied to the feature in the apply dataset that is used in predictions made with that model.

Changing dataset

You can change the training dataset before you run the first experiment version, as well as after running any version.

If you change the dataset before running the first version, you will lose any configuration that you have done prior to changing the dataset.

Do the following:

In the experiment configuration panel under Training data, click Change dataset.
Select a new dataset.

For more information about changing and refreshing the dataset during model refinement (after running an experiment version), see Changing and refreshing the dataset.

Configuring model optimization

The following settings can be customized for optimizing your models:

Turning intelligent model optimization on or off
Turning hyperparameter optimization on or off
Turning time-aware training on or off

These options can be turned on or off for each version of the experiment that you run.

Configuring intelligent optimization

By default, the experiment uses intelligent model optimization. With intelligent model optimization, AutoML handles the model refinement process for you by iterating feature selection and applying advanced transformations to your data.

For more information about intelligent optimization, see Intelligent model optimization.

You can turn this setting off to manually refine the models you train. For example, you might want to start your model training with intelligent model optimization, then switch to manual refinement for v2 to further adjust the configuration.

Do the following:

Click View configuration.
If you have already run at least one version of the experiment, click New version.
In the panel, expand Model optimization.
Switch from Intelligent to Manual.
Using the slider, set the maximum run duration for the training.

Figure: Configuring model optimization

Configuring hyperparameter optimization

You can optimize the models using hyperparameter optimization. Note that this is an advanced option that could increase the training time significantly. Hyperparameter optimization is available if you turn off intelligent optimization.

For more information, see Hyperparameter optimization.
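As a rough picture of what a time-limited hyperparameter search does, the sketch below runs a random search until a time budget is used up and keeps the best-scoring combination. Random search is chosen only because it is simple. This page does not say which search strategy AutoML uses, and the search space, the scoring function, and all names here are hypothetical.

```python
import random
import time


def random_search(score, space, time_limit_s, rng, clock=time.monotonic):
    """Try random hyperparameter combinations until the time limit is
    reached, and return the best combination found with its score."""
    deadline = clock() + time_limit_s
    best_params, best_score = None, float("-inf")
    while clock() < deadline:
        # Draw one candidate value for every hyperparameter.
        params = {name: rng.choice(options) for name, options in space.items()}
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical search space and scoring function, for illustration only.
space = {"max_depth": [2, 4, 8], "learning_rate": [0.01, 0.1, 0.3]}


def toy_score(params):
    # Higher is better; peaks at max_depth=4 and learning_rate=0.1.
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)


best_params, best_score = random_search(
    toy_score, space, time_limit_s=0.05, rng=random.Random(0)
)
```

A longer time limit lets the search try more combinations, which is why turning this option on can increase training time significantly.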

Do the following:

Click View configuration.
If you have already run at least one version of the experiment, click New version.
In the panel, expand Model optimization.
Switch from Intelligent to Manual.
Select the Hyperparameter optimization checkbox.
Optionally, set a time limit for your optimization. The default time limit is one hour.

Figure: Configuring hyperparameter optimization

Configuring time-aware training

If you want your models to be trained with a time series dimension taken into account, activate time-aware training for the experiment version. To use this option, you need to have a column in your dataset that contains the relevant time series information.

When time-aware training is turned on, AutoML uses specialized cross-validation and null imputation processes to train the models.

For more information, see Creating time-aware models and Time-based cross-validation.
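The general idea of time-based cross-validation can be sketched as follows: rows are sorted by the date index, and each fold trains only on rows that come earlier than the rows it validates on. This is a generic expanding-window illustration, not a description of the exact procedure AutoML uses. The function name and the fold layout are assumptions.

```python
from datetime import date


def time_based_folds(dates, n_folds):
    """Yield (train_indices, validation_indices) pairs in which every
    training row is positioned before every validation row in date order."""
    # Row positions ordered by their date index value.
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    block = len(order) // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough rows for the requested number of folds")
    for k in range(1, n_folds + 1):
        train = order[: k * block]
        # The last fold also takes any leftover rows.
        end = (k + 1) * block if k < n_folds else len(order)
        yield train, order[k * block : end]


# Six rows whose date index is not in chronological order.
dates = [date(2024, 1, d) for d in (5, 1, 4, 2, 3, 6)]
folds = list(time_based_folds(dates, n_folds=2))
```

Unlike a random split, no fold here ever trains on data from after its validation period, which avoids leaking future information into training.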

Do the following:

Click View configuration.
If you have already run at least one version of the experiment, click New version.
In the panel, expand Model optimization.
Under Time-based test-train split, select the Date index to use for sorting the data.

Figure: Configuring time-aware training by selecting a column in the training data to use as an index

Viewing insights about the training data

In the Data tab of the experiment, you can view insights into the handling of the training data. This information is available in the Insights column in Schema view. The information shown depends on whether or not you have run a version with the current training data. The changes in the Insights column can help you identify why features might be unavailable for use, or why they have been automatically dropped.

For more information about what each insight means, see Interpreting dataset insights.
