
Holdout data and cross-validation

One of the biggest challenges in predictive analytics is knowing how a trained model will perform on data that it has never seen before. Put another way, has the model learned true patterns, or has it simply memorized the training data? Holdout data and cross-validation are effective techniques for making sure that your model isn't just memorizing but is actually learning generalized patterns.

Testing models for memorization versus generalization

Asking how well a model will perform in the real world is equivalent to asking whether the model memorizes or generalizes. Memorization is the ability to remember perfectly what happened in the past. A model that memorizes might have high scores when initially trained, but its predictive accuracy will drop significantly when it is applied to new data. Instead, we want a model that generalizes. Generalization is the ability to learn and apply general patterns. By learning the true broader patterns in the training data, a generalized model can make predictions of the same quality on new data that it has not seen before.
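
To make the contrast concrete, the following sketch (an illustration assuming scikit-learn and a synthetic dataset, not part of AutoML) trains an unconstrained decision tree that tends to memorize: it scores near-perfectly on the rows it was trained on but noticeably lower on rows it has never seen.

# Illustrative only: an unconstrained decision tree that memorizes its training data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data, split into rows the model sees and rows it never sees.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.3, random_state=0)

# No depth limit, so the tree can fit the training rows perfectly (memorization).
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

print("Accuracy on training data:", model.score(X_train, y_train))  # typically close to 1.0
print("Accuracy on unseen data:", model.score(X_new, y_new))        # typically noticeably lower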

Automatic holdout data

A holdout is randomly selected data that is "hidden" from the model while it is training and then used to score the model. The holdout simulates how the model will perform on future predictions by generating accuracy metrics on data that was not used in training. It is as though we built a model, deployed it, and are monitoring its predictions relative to what actually happened—without having to wait to observe those predictions.

The dataset is split into training data and holdout data.
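
The split itself can be illustrated with a minimal sketch, assuming scikit-learn; the 80/20 ratio here is illustrative. The model is trained only on the training portion and scored on the hidden holdout.

# Illustrative only: a random 80/20 split into training data and holdout data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)

# Shuffle and hide 20 percent of the rows from training.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Scoring on the holdout approximates how the model would perform on future data.
print("Holdout accuracy:", model.score(X_holdout, y_holdout))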

Cross-validation

The practice of cross-validation is to take a dataset and randomly split it into a number of equally sized segments, called folds. The machine learning algorithm is trained on all but one fold, and the fold that was left out is then used to test the resulting model. This means that each trained model is tested on a segment of the data that it has never seen before. The process is repeated, hiding a different fold during training each time, until every fold has been used exactly once as test data and has been part of the training data in all other iterations.

The training data is split into five folds. During each iteration, a different fold is set aside to be used as test data.

The outcome of cross-validation is a set of test metrics that give a reasonable forecast of how accurately the trained model will be able to predict on data that it has never seen before.
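
As an illustration, here is a minimal five-fold cross-validation sketch, assuming scikit-learn: cross_val_score returns one test metric per held-out fold, and the mean of those metrics serves as the forecast described above.

# Illustrative only: five-fold cross-validation with one score per held-out fold.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

# cv=5 splits the data into five folds; each fold is used exactly once as test data.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())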

How automatic holdout and cross-validation work

AutoML uses five-fold cross-validation during the model training to simulate model performance. The model is then tested against a separate holdout of the training data. This generates scoring metrics to let you evaluate and compare how well different algorithms perform.

  1. Before the training of your experiment starts, all data in your dataset that has a non-null target is randomly shuffled. 20 percent of your dataset is extracted as holdout data. The remaining 80 percent of the dataset is used to train the model with cross-validation.

  2. To prepare for cross-validation, the dataset is split at random into five pieces, called folds. The model is then trained five times, each time "hiding" a different fifth of the data to test how the model performs. Training metrics are generated during the cross-validation and are the averages of the values computed for each fold.

  3. After the training, the model is applied to the holdout data. Because the holdout data, unlike the cross-validation data, has not been seen by the model at any point during training, it is ideal for validating the training performance of the model. Holdout metrics are generated during this final model evaluation. A sketch of this end-to-end flow follows the list.
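
The following sketch puts the three steps together, using pandas and scikit-learn as stand-ins for the AutoML internals. The column name "target", the 5 percent of null targets, and the random forest algorithm are illustrative assumptions, not the actual implementation.

# Illustrative only: drop null targets, shuffle, hold out 20 percent,
# cross-validate on the rest, then evaluate the final model on the holdout.
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic dataset with a "target" column; a few targets are left null on purpose.
X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])
df["target"] = y.astype(float)
df.loc[df.sample(frac=0.05, random_state=1).index, "target"] = np.nan

# Step 1: keep rows with a non-null target, shuffle, and extract a 20 percent holdout.
df = df.dropna(subset=["target"])
train_df, holdout_df = train_test_split(df, test_size=0.2, shuffle=True, random_state=1)

features = [c for c in df.columns if c != "target"]
model = RandomForestClassifier(random_state=1)

# Step 2: five-fold cross-validation on the training portion; the reported training
# metric is the average across the five folds.
cv_scores = cross_val_score(model, train_df[features], train_df["target"], cv=5)
print("Cross-validation accuracy (mean of 5 folds):", cv_scores.mean())

# Step 3: fit on the full training portion, then score on the unseen holdout.
model.fit(train_df[features], train_df["target"])
print("Holdout accuracy:", model.score(holdout_df[features], holdout_df["target"]))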

For more information about metrics used to analyze model performance, see Reviewing models.

The training data is used during the five-fold cross-validation to generate a model. After the training, the model is evaluated using the holdout data.

