Scoring multiclass classification models
Multiclass classification predicts a single discrete outcome, as in binary classification, but chooses among more than two classes. Multiclass classification models are scored by several different averages of F1 and by accuracy.
Macro F1
Macro F1 is the unweighted mean of the F1 values calculated for each class. All classes are treated equally, regardless of how many records each one contains.
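As an illustration, the following minimal sketch computes Macro F1 in plain Python. The function names and the small cat/dog/bird sample are hypothetical and chosen only for this example. Each class is scored one-vs-rest, and a class with no true or predicted records is assumed to score 0.

```python
def per_class_f1(y_true, y_pred):
    """Return {class: F1}, scoring each class against all others."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted records scores 0.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 values."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical sample: 4 cats, 2 dogs, 1 bird.
y_true = ["cat", "cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "cat", "cat", "dog", "dog", "bird", "bird"]

print(round(macro_f1(y_true, y_pred), 4))  # 0.6746
```

Here the per-class F1 values are 6/7 for cat, 1/2 for dog, and 2/3 for bird. The single bird record counts as much toward the result as the four cat records.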
Micro F1
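Micro F1 is the F1 value calculated across the entire confusion matrix: the true positives, false negatives, and false positives are totaled over all classes before F1 is computed. Because each record has exactly one actual class and one predicted class, every false positive for one class is also a false negative for another class. The two totals are therefore equal, and the Micro F1 score is the same as the global precision and the global recall.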
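The equivalence can be checked with a minimal sketch in plain Python. The function name and the cat/dog/bird sample are hypothetical and used only for illustration.

```python
def micro_counts(y_true, y_pred):
    """Total true positives, false positives, false negatives over all classes."""
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            if t == c and p == c:
                tp += 1
            elif p == c:  # predicted c, but the actual class is different
                fp += 1
            elif t == c:  # actual class is c, but something else was predicted
                fn += 1
    return tp, fp, fn


# Hypothetical sample: 4 cats, 2 dogs, 1 bird.
y_true = ["cat", "cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "cat", "cat", "dog", "dog", "bird", "bird"]

tp, fp, fn = micro_counts(y_true, y_pred)
micro_f1 = 2 * tp / (2 * tp + fp + fn)
global_precision = tp / (tp + fp)
global_recall = tp / (tp + fn)

print(tp, fp, fn)  # 5 2 2
print(round(micro_f1, 4), round(global_precision, 4), round(global_recall, 4))
# 0.7143 0.7143 0.7143
```

Each of the two wrong predictions is counted once as a false positive and once as a false negative, which is why all three values agree.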
Weighted F1
Weighted F1 uses the same F1 calculation as binary classification, applied to each class in turn (that class against all others). The per-class values are then combined as a weighted average, where the weight of each class is its number of records.
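A minimal sketch of this calculation in plain Python follows. The function names and the cat/dog/bird sample are hypothetical, and the sketch assumes the weight of a class is the number of records whose actual class it is.

```python
from collections import Counter


def f1_for_class(y_true, y_pred, c):
    """Binary-style F1 for class c against all other classes."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's record count."""
    support = Counter(y_true)  # assumption: weight = number of actual records
    total = len(y_true)
    return sum(n * f1_for_class(y_true, y_pred, c) for c, n in support.items()) / total


# Hypothetical sample: 4 cats, 2 dogs, 1 bird.
y_true = ["cat", "cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "cat", "cat", "dog", "dog", "bird", "bird"]

print(round(weighted_f1(y_true, y_pred), 4))  # 0.7279
```

In this sample the cat class contributes four sevenths of the total weight, so its F1 of 6/7 pulls the weighted score above the simple mean of the three per-class values.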
Accuracy
Accuracy is the fraction of records for which the model predicted the correct class. It is calculated as the number of predictions that exactly match the actual class divided by the number of samples.
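As a brief sketch, accuracy can be computed in a few lines of plain Python. The function name and the cat/dog/bird sample are hypothetical and used only for illustration.

```python
def accuracy(y_true, y_pred):
    """Number of exactly matching predictions divided by the number of samples."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Hypothetical sample: 5 of the 7 predictions match the actual class.
y_true = ["cat", "cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "cat", "cat", "dog", "dog", "bird", "bird"]

print(round(accuracy(y_true, y_pred), 4))  # 0.7143
```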