Viewing insights about your training data

As you add training data and run experiment versions, you can access insights about how your data is being handled. Insights provide information about the target and features in your experiment, such as features that have been dropped, are unavailable, or will be encoded with special processing.

The Insights column is found in the Data tab when you are in Schema view. Abbreviated insights are also available in Data view. Insights are created individually for each model trained within the experiment.

Figure: Insights column in Schema view, showing insights about each feature column in the training dataset.

Insights are generated:

After you have added or changed training data, but have not run any experiment versions yet.
After each experiment version has run. A separate set of insights is created for each model trained.

The insights might differ before and after you run a version. Once training begins, AutoML preprocesses your data and can diagnose further issues with it. For more information, see Automatic data preparation and transformation.
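
As a rough illustration of how insights can shift once a version has run, the following sketch compares two hand-entered snapshots of the Insights column. The column names and the `changed_insights` helper are hypothetical, and the snapshots are typed in manually rather than read from any AutoML API. The insight labels are the ones listed under Interpreting dataset insights.

```python
def changed_insights(before, after):
    """Map each column to (insight before, insight after) where the two differ."""
    columns = sorted(set(before) | set(after))
    return {
        col: (before.get(col), after.get(col))
        for col in columns
        if before.get(col) != after.get(col)
    }


# Hypothetical snapshots noted down from Schema view before and after a run.
before = {"review_text": "Possible free text", "region": "One-hot encoded"}
after = {
    "review_text": "Free text",
    "region": "One-hot encoded",
    "refund_issued": "Target leakage",
}

for column, (old, new) in changed_insights(before, after).items():
    print(f"{column}: {old} -> {new}")
```

A column that gains an insight only after training, such as a suspected target leakage, shows `None` as its earlier value in this sketch.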

Viewing insights before training

Before you run a version of the experiment, you can analyze the Insights to see how the current training data is being interpreted. These insights could change after you run the version.

Do the following:

In an experiment, make sure you have added the training data that you want to use for the experiment version.
Open the Data tab.
Make sure you are in Schema view.
Analyze the Insights column. Tooltips provide additional context behind the insights. For further explanations of what each insight means, see Interpreting dataset insights.

Viewing the insights for a model

After the models have finished training for an experiment version, select a model and inspect how the data was handled.

Do the following:

Run an experiment version and then open the Data tab.
Select a model from the drop-down list in the toolbar.
Make sure you are in Schema view.
Analyze the Insights column. Tooltips provide additional context behind the insights. For further explanations of what each insight means, see Interpreting dataset insights.

Interpreting dataset insights

The following table describes each insight that can be displayed in Schema view.

Dataset insights in Schema view
Insight	Meaning	Impact on configuration	When the insight is determined	Additional references
Constant	The column has the same value for all rows.	The column can't be used as a target or included feature.	Before and after running the version	Cardinality
One-hot encoded	The feature type is categorical and the column has fewer than 14 unique values.	No effect on configuration.	Before and after running the version	Categorical encoding
Impact encoded	The feature type is categorical and the column has 14 or more unique values.	No effect on configuration.	Before and after running the version	Categorical encoding
High cardinality	The column has too many unique values, and can negatively affect model performance if used as a feature.	The column can't be used as a target. It will be excluded automatically as a feature, but can still be included if needed.	Before and after running the version	Cardinality
Sparse data	The column has too many null values.	The column can't be used as a target or included feature.	Before and after running the version	Imputation of nulls
Underrepresented class	The column has a class with fewer than 10 rows.	The column can't be used as a target, but can be included as a feature.	Before and after running the version	-
<number of> auto-engineered features	The column is the parent feature that can be used to generate auto-engineered features.	If this parent feature is interpreted as a date feature, it is automatically removed from the configuration. It is recommended that you instead use the auto-engineered date features that can be generated from it. It is possible to override this setting and include the feature rather than the auto-engineered features.	Before and after running the version	Automatic feature engineering
Auto-engineered feature	The column is an auto-engineered feature that can be, or has been, generated from a parent date feature. It did not appear in the original dataset.	You can remove one or more of these auto-engineered features during experiment training. If you switch the feature type of the parent feature to categorical, all auto-engineered features are removed.	Before and after running the version	Automatic feature engineering
Could not process as date	The column possibly includes date and time information, but could not be used to create auto-engineered date features.	The feature is dropped from the configuration. If auto-engineered features were previously generated from this parent feature, they are removed from future experiment versions. You can still use the feature in the experiment, but you must switch its feature type to categorical.	After running the version	Date feature engineering
Possible free text	The column could possibly be available for use as a free text feature.	The free text feature type is assigned to the column. You must run an experiment version to confirm whether the feature can be processed as free text.	Before running the version	Handling of free text data
Free text	The column has been confirmed as containing free text. It can be processed as free text.	No additional configurations are required for the feature.	After running the version	Handling of free text data
Could not process as free text	Upon further analysis, the column cannot be processed as free text.	You need to deselect the feature from the configuration for the next experiment version. If the feature does not have high cardinality, you can alternatively change the feature type to categorical.	After running the version	Handling of free text data
Target leakage	The feature is suspected of being affected by target leakage. If so, it includes information about the target column that you are trying to predict. Features with target leakage can give you a false sense of assurance about model performance. In real-world predictions, they cause the model to perform very poorly.	The feature has not been used to train the model.	After running the version	Data leakage
Low permutation importance	The feature does not have much, if any, influence on the model predictions. Removing these features improves model performance by reducing statistical noise.	The feature has not been used to train the model.	After running the version	Understanding permutation importance
Highly correlated	The feature is highly correlated with one or more other features in the experiment. Having features that are highly correlated with one another decreases model performance.	The feature has not been used to train the model. The feature with which it is highly correlated has not been dropped due to high correlation, but could have been dropped for another reason, such as low permutation importance.	After running the version	Correlation
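
To make the numeric rules in the table concrete, the following sketch applies the documented thresholds to a single categorical column: fewer than 14 unique values for one-hot encoding, 14 or more for impact encoding, and fewer than 10 rows for an underrepresented class. It is not how AutoML computes insights internally. The function name and constants are hypothetical, null handling is ignored, and High cardinality and Sparse data are left out because this page does not state their thresholds.

```python
from collections import Counter

ONE_HOT_MAX_UNIQUE = 14  # from the table: fewer than 14 unique values
MIN_CLASS_ROWS = 10      # from the table: a class with fewer than 10 rows


def sketch_insights(values):
    """Return the threshold-based insights for one categorical column."""
    counts = Counter(values)

    # Constant: every row holds the same value.
    if len(counts) <= 1:
        return ["Constant"]

    insights = []
    if len(counts) < ONE_HOT_MAX_UNIQUE:
        insights.append("One-hot encoded")
    else:
        insights.append("Impact encoded")

    # Underrepresented class: at least one value appears in too few rows.
    if min(counts.values()) < MIN_CLASS_ROWS:
        insights.append("Underrepresented class")
    return insights


# Hypothetical column: 15 rows of "gold" and 3 rows of "silver".
print(sketch_insights(["gold"] * 15 + ["silver"] * 3))
```

In this sketch a column with two values is one-hot encoded, and the value with only 3 rows also triggers the underrepresented class insight.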
