Data augmentation

Changed by Dimitrios Toumpanakis, 15 Apr 2021

Updates to Article Attributes

Title was changed:
Augmentation → Data augmentation
Body was changed:

Data augmentation is a technique that increases the amount of data by adding slightly modified copies of already existing data. This increases the diversity of the training set, which helps to reduce overfitting when training a machine learning model and can have a positive effect on the model's predictive performance.

Training with a higher volume of data usually yields more accurate predictive models, as the model is able to see a greater variety of examples and generalize more effectively. Data augmentation is used to counter the problem of data scarcity that is often encountered in machine learning.

Most applications of machine learning in radiology involve images as data. For images, data augmentation can be achieved in many ways, for example:

  • flipping/mirroring the image
  • rotating the image
  • adding noise to the image ("noise injection")
  • color modification
  • random erasing
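
The techniques listed above can be sketched in a few lines of code. The following is a minimal, library-free illustration, assuming a grayscale image stored as a list of rows with pixel values between 0 and 1. All function names and parameter values (noise level, gain, patch size) are hypothetical choices for this sketch, and a real pipeline would normally rely on an imaging library instead.

```python
import random

def flip_horizontal(image):
    """Flipping/mirroring: reverse each row left to right."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotation: turn the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def add_noise(image, rng, sigma=0.05):
    """Noise injection: add zero-mean Gaussian noise, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]

def adjust_intensity(image, gain=1.2):
    """Color (here: intensity) modification: scale pixels, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, gain * px)) for px in row] for row in image]

def random_erase(image, rng, size=2, fill=0.0):
    """Random erasing: overwrite a randomly placed size x size patch."""
    out = [list(row) for row in image]
    top = rng.randint(0, len(out) - size)      # randint is inclusive at both ends
    left = rng.randint(0, len(out[0]) - size)
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = fill
    return out

def augment(image, rng):
    """Return the original image plus one modified copy per technique."""
    return [
        image,
        flip_horizontal(image),
        rotate_90(image),
        add_noise(image, rng),
        adjust_intensity(image),
        random_erase(image, rng),
    ]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
image = [[rng.random() for _ in range(4)] for _ in range(4)]  # stand-in image
augmented = augment(image, rng)
print(len(augmented))  # 6 images: the original plus five modified copies
```

Each function returns a new image and leaves its input untouched, so the original example stays in the training set alongside its modified copies.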

Caution is advised when using data augmentation on radiological images, since some transformations can result in non-realistic images (e.g. a horizontal flip of a chest X-ray introduces a systematic error of dextrocardia).
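
One way to act on this caution is to mark which transformations preserve left/right anatomy and to exclude the others when laterality matters. The sketch below is only an illustration of that idea: the registry, the `preserves_laterality` flag and the function names are all hypothetical, and deciding which transformations are realistic for a given modality remains a clinical judgement.

```python
def flip_horizontal(image):
    """Mirror each row left to right (swaps the patient's left and right)."""
    return [row[::-1] for row in image]

def adjust_intensity(image, gain=1.2):
    """Scale pixel values, clipped to [0, 1] (anatomy stays in place)."""
    return [[min(1.0, max(0.0, gain * px)) for px in row] for row in image]

# Hypothetical registry: name -> (function, preserves_laterality)
TRANSFORMS = {
    "flip_horizontal": (flip_horizontal, False),
    "adjust_intensity": (adjust_intensity, True),
}

def allowed_transforms(laterality_matters):
    """Return the transforms that are safe to apply for this dataset."""
    return {
        name: fn
        for name, (fn, preserves_laterality) in TRANSFORMS.items()
        if preserves_laterality or not laterality_matters
    }

# For chest X-rays, laterality matters, so the horizontal flip is dropped.
print(sorted(allowed_transforms(laterality_matters=True)))
```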

In the case of significant data scarcity, the above simple techniques may be of only limited help. If a dataset is too small, then an image set expanded via rotation, mirroring, etc. may still be too small for a given problem. In that case, a complementary solution can be the sourcing of entirely new, synthetic images through various techniques, commonly the use of generative adversarial networks.

  • -<p><strong>Augmentation </strong>is a process of artificial data generation, which produces a greater volume of data, and thus increasing the likelihood of obtaining higher predictive accuracy of a predictive model.</p><p>Usually, a higher volume of data is likely to yield better predictive and more accurate models from training as the algorithm is able to see a greater variety of examples. However, it is not always possible to collect a large amount of data, hence augmentation is required to generate sufficient data to train an accurate predictive model. This is particularly relevant for datasets with images. There are many methods of generating new training examples with images. These include:</p><ul>
  • -<li>mirroring the image</li>
  • -<li>adding noise to the image</li>
  • -<li>distorting the image</li>
  • -</ul><p>Augmentation creates <a title="Synthetic and augmented data" href="/articles/synthetic-and-augmented-data">augmented data</a>. Augmented data is based on systematic modification of existing data (with images often through simple linear algebra operations on the whole image) as opposed to <a title="Synthetic and augmented data" href="/articles/synthetic-and-augmented-data">synthetic data</a>.</p>
  • +<p><strong>Data augmentation </strong>is a technique that increases the amount of data by adding slightly modified copies of already existing data. This increases the diversity of the <a title="Training, testing and validation datasets" href="/articles/training-testing-and-validation-datasets">training set, </a>which helps to reduce <a title="Overfitting" href="/articles/overfitting">overfitting</a> when training a machine learning model and can have a positive effect on the model's predictive performance.</p><p>Training with a higher volume of data usually yields better predictive and more accurate models, as the model is able to see a greater variety of examples and generalize more effectively. Data augmentation is used to counter the problem of data scarcity that is often encountered in <a title="Machine learning" href="/articles/machine-learning-1">machine learning</a>.</p><p>Most applications of machine learning in radiology involve images as data. Regarding images, data augmentation can be achieved in many ways. For example:</p><ul>
  • +<li>flipping/mirroring the image</li>
  • +<li>rotating the image</li>
  • +<li>adding noise to the image ("noise injection")</li>
  • +<li>color modification</li>
  • +<li>random erasing</li>
  • +</ul><p>Caution is advised when using data augmentation on radiological images since some transformations can result in non-realistic images (e.g. a horizontal flip on chest X rays introducing a systematic error of dextrocardia). </p><p>In the case of significant data scarcity, the above simple techniques may be only of limited help. If a dataset is too small, then a transformed image set via rotation and mirroring etc. may still be too small for a given problem. In that case, a complementary solution can be the sourcing of entirely new and <a title="Synthetic and augmented data" href="/articles/synthetic-and-augmented-data">synthetic images</a> through various techniques, commonly the use of <a title="Generative adversarial networks (GANs)" href="/articles/generative-adversarial-network-1">generative adversarial networks</a>.</p>

Updates to Synonym Attributes

Updates to Link Attributes

Title was removed:
Augmentation
Type was removed.
Visible was set to .

Updates to Primarylink Attributes
