Factors Influencing the Resubstitution Accuracy in Multivariate Classification Analysis: Implications for Study Design in Ergonomics

Edward A. Clancy

The use of multivariate classification analysis (e.g. discriminant analysis, linear regression, logistic regression) is becoming widespread in ergonomics, as well as in numerous other disciplines. Classification analysis is frequently used to determine what combination of features (independent variables), and in what mathematical relations and proportions, defines an acceptable versus an unacceptable risk. Accurate predictive classification models can be useful in suggesting interventions that minimize illness and injury. Frequently, classification studies in the ergonomics literature report the resubstitution accuracy, i.e. the accuracy that is realized when the classifier is evaluated on the same sample that was used to generate the classification coefficients. However, it is well established that the resubstitution accuracy is optimistically biased. The extent, or magnitude, of this bias is not well understood. Thus, a Monte Carlo simulation study was conducted to investigate this bias. Random data containing no true classification power (denoted the 'Nil Model') were generated, then analyzed using discriminant analysis. For the case of two outcome groups, the true accuracy of the Nil Model is 50% (i.e. no better than flipping a fair coin). For conditions similar to those in the literature, the random data 'reported' highly accurate classification performance, with results as high as 100%. These 'reports' represent the bias artefact of resubstitution accuracy. Factors influencing the extent of the bias were studied. It was found that the resubstitution bias is reduced if the sample size is increased, the number of candidate features is decreased, the number of selected features is decreased, and the proportion of samples from each outcome group is equalized. Feature correlation did not influence resubstitution accuracy. These simulation studies indicate that reporting the resubstitution accuracy alone can be problematic.
It is suggested that research reports which incorporate classification analysis either (1) train the classification function on one data set, but report as the performance metric the classification accuracy achieved on an independent, adequately sized test data set, or (2) demonstrate that the magnitude of the resubstitution bias is minimal.
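
The kind of Nil Model experiment described above can be illustrated with a minimal sketch. It is not the study's actual simulation code. It assumes independent standard-normal features, equal group sizes, a two-group linear discriminant with pooled covariance and equal priors, and no feature selection step. All function names and default parameters (`simulate`, `n_per_group`, `p`, `trials`) are hypothetical choices made for illustration.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mean(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]

def fit_lda(x, y):
    """Two-group linear discriminant: returns weights w and threshold c."""
    groups = [[r for r, g in zip(x, y) if g == k] for k in (0, 1)]
    m0, m1 = mean(groups[0]), mean(groups[1])
    p = len(m0)
    s = [[0.0] * p for _ in range(p)]  # pooled within-group scatter
    for rows, mu in ((groups[0], m0), (groups[1], m1)):
        for r in rows:
            d = [r[j] - mu[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    dof = len(x) - 2
    s = [[v / dof for v in row] for row in s]
    w = solve(s, [m1[j] - m0[j] for j in range(p)])
    # Equal priors (an assumption): threshold at the midpoint of the group means.
    c = sum(w[j] * (m0[j] + m1[j]) / 2 for j in range(p))
    return w, c

def accuracy(w, c, x, y):
    """Fraction of samples assigned to their labelled group."""
    hits = sum((sum(wj * xj for wj, xj in zip(w, r)) > c) == (g == 1)
               for r, g in zip(x, y))
    return hits / len(y)

def nil_data(rng, n_per_group, p):
    """Nil Model: features are pure noise, so group labels carry no information."""
    x = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(2 * n_per_group)]
    y = [0] * n_per_group + [1] * n_per_group
    return x, y

def simulate(n_per_group=10, p=5, trials=200, seed=0):
    """Return (mean resubstitution accuracy, mean independent-test accuracy)."""
    rng = random.Random(seed)
    resub = test = 0.0
    for _ in range(trials):
        x, y = nil_data(rng, n_per_group, p)
        w, c = fit_lda(x, y)
        resub += accuracy(w, c, x, y)       # evaluated on the training sample
        xt, yt = nil_data(rng, 200, p)      # independent test sample
        test += accuracy(w, c, xt, yt)
    return resub / trials, test / trials

if __name__ == "__main__":
    for n in (10, 50):
        r, t = simulate(n_per_group=n)
        print(f"n per group = {n}: resubstitution = {r:.2f}, independent test = {t:.2f}")
```

Under these assumptions, the mean resubstitution accuracy should sit noticeably above 50% for small samples and move toward 50% as the sample grows, while the independent-test accuracy should stay close to 50%. Increasing `p` relative to `n_per_group` would be expected to widen the gap, in line with the factors listed above.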

Ergonomics, Vol. 40, No. 4, pp. 417-427, 1997