Data augmentation

Changed by Dimitrios Toumpanakis, 15 Apr 2021


Updates to Article Attributes

Title was changed:

~~Augmentation~~Data augmentation

Body was changed:


~~Augmentation~~Data augmentation is a ~~process of artificial data generation, which produces a greater volume~~technique that increases the amount of data by adding slightly modified copies of already existing data. This increases the diversity of the training set, which helps to reduce overfitting when training a machine learning model and ~~thus increasing~~can have a positive effect on the ~~likelihood of obtaining higher~~model's predictive ~~accuracy of a predictive model~~performance.

~~Usually,~~Training with a higher volume of data ~~is likely to yield~~usually yields better predictive and more accurate models ~~from training~~, as the ~~algorithm~~model is able to see a greater variety of examples and generalize more effectively. ~~However, it is not always possible to collect a large amount of data, hence~~Data augmentation is ~~required~~used to ~~generate sufficient~~counter the problem of data ~~to train an accurate predictive model. This~~scarcity that is ~~particularly relevant for datasets with~~often encountered in machine learning.

Most applications of machine learning in radiology involve images as data. ~~There are~~ Regarding images, data augmentation can be achieved in many ~~methods of generating new training examples with images~~ways. ~~These include~~For example:

~~mirroring~~flipping/mirroring the image
rotating the image
adding noise to the image ("noise injection")
~~distorting the image~~color modification
random erasing
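
As a rough illustration of the transformations listed above, the following Python sketch applies each one to a tiny grayscale image stored as nested lists. The function names, the `[0, 1]` intensity range, and the use of intensity scaling as a grayscale stand-in for color modification are assumptions made for this example and do not come from the article. A real pipeline would normally rely on an array or imaging library instead.

```python
import random
from typing import List

Image = List[List[float]]  # rows of pixel intensities, assumed to lie in [0, 1]


def clamp(value: float) -> float:
    """Keep an intensity inside the assumed [0, 1] range."""
    return min(1.0, max(0.0, value))


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90(img: Image) -> Image:
    """Rotate clockwise by 90 degrees: reverse the row order, then transpose."""
    return [list(col) for col in zip(*img[::-1])]


def add_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Noise injection: add zero-mean Gaussian noise to every pixel."""
    return [[clamp(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def scale_intensity(img: Image, factor: float) -> Image:
    """Grayscale analogue of color modification: rescale brightness."""
    return [[clamp(p * factor) for p in row] for row in img]


def random_erase(img: Image, height: int, width: int,
                 rng: random.Random, fill: float = 0.0) -> Image:
    """Random erasing: overwrite one randomly placed rectangle with `fill`."""
    rows, cols = len(img), len(img[0])
    top = rng.randrange(rows - height + 1)
    left = rng.randrange(cols - width + 1)
    out = [row[:] for row in img]  # copy so the original stays untouched
    for r in range(top, top + height):
        for c in range(left, left + width):
            out[r][c] = fill
    return out


# Example: build several modified copies of one original image.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
original = [[0.1, 0.2], [0.3, 0.4]]
augmented = [
    flip_horizontal(original),
    rotate_90(original),
    add_noise(original, sigma=0.05, rng=rng),
    scale_intensity(original, factor=1.2),
    random_erase(original, height=1, width=1, rng=rng),
]
```

Each function returns a new image and leaves its input unchanged, so the original and its modified copies can all be added to the training set.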

~~Augmentation creates~~ ~~augmented data. Augmented~~Caution is advised when using data ~~is based~~augmentation on radiological images since some transformations can result in non-realistic images (e.g. a horizontal flip on chest X-rays introducing a systematic ~~modification~~error of ~~existing~~dextrocardia).
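
One way to act on this caution is to record, for each candidate transformation, whether it preserves anatomical left and right, and to skip the ones that do not when laterality matters. The sketch below is a minimal illustration under that assumption. The `CANDIDATES` registry, the `shift_right` transform and the `keep_laterality` flag are hypothetical names introduced here, not part of the article or of any library.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]


def flip_horizontal(img: Image) -> Image:
    """Mirror left to right; on a chest X-ray this moves the heart to the wrong side."""
    return [row[::-1] for row in img]


def shift_right(img: Image, pixels: int = 1, fill: float = 0.0) -> Image:
    """Translate the image to the right, padding the left edge with `fill`."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in img]


# Hypothetical registry: name -> (transform, does it preserve left/right anatomy?)
CANDIDATES: Dict[str, Tuple[Callable[[Image], Image], bool]] = {
    "flip_horizontal": (flip_horizontal, False),
    "shift_right": (shift_right, True),
}


def augment(img: Image, keep_laterality: bool) -> Dict[str, Image]:
    """Apply every candidate, skipping laterality-breaking ones if requested."""
    return {
        name: transform(img)
        for name, (transform, preserves_sides) in CANDIDATES.items()
        if preserves_sides or not keep_laterality
    }


# For chest radiographs, laterality matters, so the flip is left out.
chest_variants = augment([[1.0, 2.0, 3.0]], keep_laterality=True)
```

Which transformations count as realistic depends on the modality and the task, so the flags in such a registry would need to be set with clinical input.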

In the case of significant data ~~(with images often through~~scarcity, the above simple ~~linear algebra operations on~~techniques may be only of limited help. If a dataset is too small, then a transformed image set via rotation and mirroring etc. may still be too small for a given problem. In that case, a complementary solution can be the ~~whole image) as opposed to~~sourcing of entirely new and synthetic ~~data~~images through various techniques, commonly the use of generative adversarial networks.
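
The limit of simple geometric augmentation can be made concrete with a small count. The sketch below assumes rotations are restricted to multiples of 90 degrees combined with horizontal mirroring (arbitrary-angle rotation would behave differently), and the helper names are illustrative only.

```python
from typing import List, Set, Tuple

Image = List[List[int]]


def rotate_90(img: Image) -> Image:
    """Rotate clockwise by 90 degrees."""
    return [list(col) for col in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def distinct_variants(img: Image) -> Set[Tuple[Tuple[int, ...], ...]]:
    """Collect every distinct image reachable by 90-degree rotations and mirroring."""
    seen = set()
    current = img
    for _ in range(4):  # four quarter turns return to the starting orientation
        for candidate in (current, flip_horizontal(current)):
            seen.add(tuple(tuple(row) for row in candidate))
        current = rotate_90(current)
    return seen


variants = distinct_variants([[1, 2], [3, 4]])
variant_count = len(variants)  # includes the original image
```

Under these assumptions each image has at most eight variants, the original included, so a set of 50 images grows to at most 400, and all of them are still derived from the same 50 underlying cases.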

-Augmentation is a process of artificial data generation, which produces a greater volume of data, and thus increasing the likelihood of obtaining higher predictive accuracy of a predictive model. Usually, a higher volume of data is likely to yield better predictive and more accurate models from training as the algorithm is able to see a greater variety of examples. However, it is not always possible to collect a large amount of data, hence augmentation is required to generate sufficient data to train an accurate predictive model. This is particularly relevant for datasets with images. There are many methods of generating new training examples with images. These include:<ul>
-<li>mirroring the image</li>
-<li>adding noise to the image</li>
-<li>distorting the image</li>
-</ul>Augmentation creates <a title="Synthetic and augmented data" href="/articles/synthetic-and-augmented-data">augmented data</a>. Augmented data is based on systematic modification of existing data (with images often through simple linear algebra operations on the whole image) as opposed to <a title="Synthetic and augmented data" href="/articles/synthetic-and-augmented-data">synthetic data</a>.
+Data augmentation is a technique that increases the amount of data by adding slightly modified copies of already existing data. This increases the diversity of the <a title="Training, testing and validation datasets" href="/articles/training-testing-and-validation-datasets">training set</a>, which helps to reduce <a title="Overfitting" href="/articles/overfitting">overfitting</a> when training a machine learning model and can have a positive effect on the model's predictive performance. Training with a higher volume of data usually yields better predictive and more accurate models, as the model is able to see a greater variety of examples and generalize more effectively. Data augmentation is used to counter the problem of data scarcity that is often encountered in <a title="Machine learning" href="/articles/machine-learning-1">machine learning</a>. Most applications of machine learning in radiology involve images as data. Regarding images, data augmentation can be achieved in many ways. For example:<ul>
+<li>flipping/mirroring the image</li>
+<li>rotating the image</li>
+<li>adding noise to the image ("noise injection")</li>
+<li>color modification</li>
+<li>random erasing</li>
+</ul>Caution is advised when using data augmentation on radiological images since some transformations can result in non-realistic images (e.g. a horizontal flip on chest X-rays introducing a systematic error of dextrocardia). In the case of significant data scarcity, the above simple techniques may be only of limited help. If a dataset is too small, then a transformed image set via rotation and mirroring etc. may still be too small for a given problem. In that case, a complementary solution can be the sourcing of entirely new and <a title="Synthetic and augmented data" href="/articles/synthetic-and-augmented-data">synthetic images</a> through various techniques, commonly the use of <a title="Generative adversarial networks (GANs)" href="/articles/generative-adversarial-network-1">generative adversarial networks</a>.