Overfitting is a problem in machine learning in which a model learns noise and spurious patterns in the training data, introducing errors into prediction or classification. Overfitting tends to occur when training data sets are too small, or when they contain parameters or unrelated features that are non-randomly correlated with the feature of interest. For example, an algorithm trained to read chest x-rays may learn to associate the presence or absence of a side marker with the presence or absence of pathology 1.
Strictly speaking, overfitting refers to fitting a polynomial curve to data points using a polynomial of higher degree (i.e. a more complex model) than the true underlying relationship warrants, so that the curve fits the noise rather than the signal. In the context of neural networks, classifications driven by irrelevant parameters are likewise described as examples of overfitting.
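The polynomial case can be illustrated with a minimal numpy sketch (not from the article; the data, degrees and seed below are illustrative assumptions): data drawn from a simple linear trend plus noise is fitted with both a degree-1 and a degree-9 polynomial. The complex model achieves a lower training error by interpolating the noise, but generalizes worse on held-out points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples from a simple underlying linear relationship y = 2x
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.2, size=x_train.shape)
x_test = np.linspace(0, 1, 100)
y_test = 2 * x_test  # noise-free ground truth for evaluation

def fit_and_eval(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

simple_train, simple_test = fit_and_eval(1)    # matches the true model
complex_train, complex_test = fit_and_eval(9)  # interpolates the noise

# The degree-9 fit drives training error to (near) zero
# yet produces a larger error on unseen data: overfitting.
```

The degree-9 polynomial passes through every noisy training point, so judging it by training error alone would wrongly favor the more complex model; only held-out data reveals the problem.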
There are many techniques to correct for overfitting, including regularization, dropout, early stopping and data augmentation.
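As a sketch of how regularization counteracts overfitting (again a numpy illustration under assumed toy data, not a method described in the article), the closed-form L2-regularized (ridge) least-squares solution penalizes large coefficients: increasing the penalty weight lambda shrinks the coefficient norm and trades a higher training error for a smoother, less overfitted model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting: noisy samples of a linear trend, deliberately modeled
# with an over-flexible degree-9 polynomial feature matrix.
x = np.linspace(0, 1, 10)
y = 2 * x + rng.normal(0, 0.2, size=x.shape)
X = np.vander(x, N=10)  # columns x^9, x^8, ..., x^0

def ridge_fit(lam):
    """L2-regularized least squares: w = (X'X + lam*I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

for lam in (1e-4, 1e-1, 10.0):
    w = ridge_fit(lam)
    train_mse = np.mean((X @ w - y) ** 2)
    print(f"lambda={lam:g}  ||w||={np.linalg.norm(w):.3f}  "
          f"train MSE={train_mse:.4f}")
```

Larger lambda yields smaller coefficients and a higher training error; the regularization strength that best balances this trade-off is typically chosen on a validation set rather than on the training data.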
- 1. Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLOS Medicine. 2018;15(11):e1002683. doi:10.1371/journal.pmed.1002683 - Pubmed
Related Radiopaedia articles
- artificial intelligence (AI)
- imaging data sets
- computer-aided diagnosis (CAD)
- natural language processing
- machine learning (overview)
- machine learning processes
- machine learning models
- visualizing and understanding neural networks
- common data preparation/preprocessing steps
- DICOM to bitmap conversion
- dimensionality reduction
- principal component analysis
- training, testing and validation datasets
- loss function
- optimization algorithms
- linear and quadratic
- batch normalization
- rule-based expert systems