Existing data augmentation methods can be roughly divided into three categories: spatial transformation,
color distortion, and information dropping.
Spatial transformation covers a set of basic augmentation operations, such as random scaling, cropping, flipping, and rotation, which are widely used in model training.
Color distortion, which includes changes to brightness, hue, etc., is also used in several models. These two categories aim to transform the training data so that it better simulates real-world data by altering certain channels of information.
Information dropping has recently been widely employed for its effectiveness and/or efficiency. It includes random erasing, cutout, and hide-and-seek (HaS). The intuition is that by deleting part of the information in the image, the CNN is forced to learn from originally less salient or less important regions and to enlarge its perception field, which notably increases the robustness of the model.
1. Random Erasing Data Augmentation
In training, Random Erasing randomly selects a rectangle region in an image and erases its pixels with random values. In this process, training images with various levels of occlusion are generated, which reduces the risk of over-fitting and makes the model robust to occlusion. Random Erasing is parameter learning free, easy to implement, and can be integrated with most of the CNN-based recognition models. Random Erasing is complementary to commonly used data augmentation techniques such as random cropping and flipping, and yields consistent improvement over strong baselines in image classification, object detection and person re-identification.
Random Erasing + Random Cropping:
Random cropping is an effective data augmentation approach: it reduces the contribution of the background to the CNN decision and lets the model base its learning on the presence of parts of the object rather than the whole object. In comparison to random cropping, Random Erasing retains the overall structure of the object and occludes only some of its parts. In addition, the pixels of the erased region are re-assigned with random values, which can be viewed as adding noise to the image. In our experiments, we show that these two methods are complementary to each other for data augmentation.
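A minimal NumPy sketch of the basic erasing step, not the authors' reference implementation: the function name, the retry loop, and the default area/aspect ranges (roughly matching the commonly used 0.02–0.4 area and 0.3–3.3 aspect settings) are illustrative assumptions.

```python
import numpy as np

def random_erasing(img, p=0.5, area_range=(0.02, 0.4), aspect_range=(0.3, 3.3)):
    """Erase one random rectangle of an HxWxC uint8 image with random pixel values."""
    if np.random.rand() > p:
        return img                      # apply the augmentation only with probability p
    h, w, c = img.shape
    for _ in range(100):                # retry until a rectangle fits inside the image
        area = np.random.uniform(*area_range) * h * w
        aspect = np.random.uniform(*aspect_range)
        eh = int(round(np.sqrt(area * aspect)))
        ew = int(round(np.sqrt(area / aspect)))
        if eh < h and ew < w:
            y = np.random.randint(0, h - eh)
            x = np.random.randint(0, w - ew)
            out = img.copy()
            out[y:y + eh, x:x + ew] = np.random.randint(0, 256, (eh, ew, c), dtype=img.dtype)
            return out
    return img                          # give up if no valid rectangle was found
```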
2. Improved Regularization of Convolutional Neural Networks with Cutout
Due to the model capacity required to capture complex image representations, CNNs are often susceptible to overfitting and therefore require proper regularization in order to generalize well. This paper shows that the simple regularization technique of randomly masking out square regions of the input during training, which we call cutout, can be used to improve the robustness and overall performance of convolutional neural networks. Not only is this method extremely easy to implement, but we also demonstrate that it can be used in conjunction with existing forms of data augmentation and other regularizers to further improve model performance.
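A minimal sketch of the masking step, assuming an HxWxC image array; the 16-pixel mask size and zero fill are illustrative defaults (the paper tunes the mask size per dataset), and the square is allowed to be clipped at the border as in the original formulation.

```python
import numpy as np

def cutout(img, mask_size=16, fill=0):
    """Mask out one square region centred at a random pixel of the image."""
    h, w = img.shape[:2]
    cy, cx = np.random.randint(h), np.random.randint(w)    # random centre
    half = mask_size // 2
    y1, y2 = max(0, cy - half), min(h, cy + half)           # clip the square at borders
    x1, x2 = max(0, cx - half), min(w, cx + half)
    out = img.copy()
    out[y1:y2, x1:x2] = fill                                 # constant fill (e.g. zero or mean)
    return out
```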
3. Hide-and-Seek: Forcing a Network to be Meticulous for
Weakly-supervised Object and Action Localization
'Hide-and-Seek' is a weakly-supervised framework that aims to improve object localization in images and action localization in videos. Most existing
weakly-supervised methods localize only the most discriminative parts of an object rather than all relevant parts,
which leads to suboptimal performance. Our key idea is
to hide patches in a training image randomly, forcing the
network to seek other relevant parts when the most discriminative part is hidden. Our approach only needs to
modify the input image and can work with any network designed for object localization. During testing, we do not
need to hide any patches. Our Hide-and-Seek approach obtains superior performance compared to previous methods
for weakly-supervised object localization on the ILSVRC
dataset. We also demonstrate that our framework can be
easily extended to weakly-supervised action localization. The RGB value v of a hidden pixel is set equal to the mean RGB vector of the images over the entire dataset.
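The patch-hiding step can be sketched as follows, assuming a float HxWx3 image; the fixed 4x4 grid, the 0.5 hide probability, and the mean_rgb default are illustrative simplifications (the paper varies the grid size during training and fills hidden patches with the dataset mean, as noted above).

```python
import numpy as np

def hide_and_seek(img, grid_size=4, hide_prob=0.5, mean_rgb=(0.45, 0.45, 0.45)):
    """Split the image into grid_size x grid_size patches and hide each patch
    independently with probability hide_prob, filling it with the dataset-mean RGB."""
    out = img.astype(np.float32).copy()
    h, w = img.shape[:2]
    ph, pw = h // grid_size, w // grid_size
    for gy in range(grid_size):
        for gx in range(grid_size):
            if np.random.rand() < hide_prob:
                out[gy * ph:(gy + 1) * ph, gx * pw:(gx + 1) * pw] = mean_rgb
    return out
```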
4. GridMask Data Augmentation
We intriguingly found that a successful information dropping method should achieve a reasonable balance between deleting and reserving regional information in the images. Intuitively, the reason is twofold: if too much is deleted, the object may be removed entirely and the remaining image becomes little more than noisy context; if too much is reserved, the object is left untouched and the augmentation has little effect.
Existing information dropping algorithms have different chances of achieving this balance between deletion and reservation of continuous regions. Both cutout and random erasing delete only one continuous region of the image, so the imbalance between the two conditions is obvious: depending on its size and location, the single deleted region stands a good chance of covering either the whole object or none of it. HaS instead divides the picture evenly into small squares and deletes them randomly; it is more effective, yet still has a considerable chance of continuously deleting or reserving large regions. Unsuccessful examples of these existing methods are illustrated in the GridMask paper. We surprisingly observe that a very simple strategy balances these two conditions statistically better: using structured dropping regions, such as deleting uniformly distributed square regions. Our proposed information removal method, named GridMask, therefore deletes a set of uniformly distributed square regions whose density and size can be controlled.
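As a rough sketch of structured dropping (not the paper's exact parameterization, which additionally randomizes the grid unit size and keep ratio), the following assumes an HxWxC array; here d is the grid period and ratio is the fraction of each cell's side that gets deleted, both illustrative parameters.

```python
import numpy as np

def gridmask(img, d=32, ratio=0.5, fill=0):
    """Delete squares laid out on a regular grid: within every d x d cell,
    one square of side ratio*d is removed, at a random global offset."""
    h, w = img.shape[:2]
    sq = max(1, int(d * ratio))                              # side of each deleted square
    oy, ox = np.random.randint(d), np.random.randint(d)      # random grid offset
    mask = np.ones((h, w), dtype=bool)
    for y in range(oy - d, h, d):
        for x in range(ox - d, w, d):
            y1, y2 = max(y, 0), min(y + sq, h)
            x1, x2 = max(x, 0), min(x + sq, w)
            if y1 < y2 and x1 < x2:
                mask[y1:y2, x1:x2] = False
    out = img.copy()
    out[~mask] = fill                                         # drop the structured regions
    return out
```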
5. MixUp: Beyond Empirical Risk Minimization
Large deep neural networks are powerful, but exhibit undesirable behaviors such
as memorization and sensitivity to adversarial examples. In this work, we propose
mixup, a simple learning principle to alleviate these issues. In essence, mixup trains
a neural network on convex combinations of pairs of examples and their labels.
By doing so, mixup regularizes the neural network to favor simple linear behavior
in-between training examples. We also find that
mixup reduces the memorization of corrupt labels, increases the robustness to
adversarial examples, and stabilizes the training of generative adversarial networks.
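The whole method fits in a few lines; a sketch assuming float image arrays and one-hot label vectors, with alpha=0.2 as an illustrative Beta parameter.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Convexly combine two examples and their one-hot labels with a
    Beta(alpha, alpha)-distributed mixing weight."""
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```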
6. CutMix: Regularization Strategy to Train Strong Classifiers
with Localizable Features
Current
methods for regional dropout remove informative pixels on
training images by overlaying a patch of either black pixels or random noise. Such removal is not desirable because it leads to information loss and inefficiency during training. We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed
proportionally to the area of the patches. By making efficient use of training pixels and retaining the regularization effect of regional dropout, CutMix improves model robustness against input corruptions and its out-of-distribution detection performance.
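A minimal sketch of the cut-and-paste step for a single image pair, assuming HxWxC images and one-hot labels; the helper name is an assumption, the box is sampled with a Beta-distributed weight (commonly alpha = 1), and the mixing weight is re-computed from the clipped box area so the labels are mixed proportionally to the pasted region, as described above.

```python
import numpy as np

def cutmix(img_a, label_a, img_b, label_b, alpha=1.0):
    """Cut a random box out of img_b, paste it into img_a, and mix the one-hot
    labels in proportion to the pasted area."""
    h, w = img_a.shape[:2]
    lam = np.random.beta(alpha, alpha)
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)      # box centre
    y1, y2 = max(0, cy - cut_h // 2), min(h, cy + cut_h // 2)
    x1, x2 = max(0, cx - cut_w // 2), min(w, cx + cut_w // 2)
    out = img_a.copy()
    out[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    lam = 1.0 - ((y2 - y1) * (x2 - x1)) / (h * w)             # actual pasted-area ratio after clipping
    label = lam * label_a + (1.0 - lam) * label_b
    return out, label
```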
7. Mosaic data augmentation
Mosaic data augmentation combines 4 training images into one in certain ratios (instead of only two, as in CutMix). Mosaic is the first new data augmentation technique introduced in YOLOv4. It allows the model to learn to identify objects at a smaller scale than normal, and it also significantly reduces the need for a large mini-batch size during training.
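A rough sketch of the image-combining step only, assuming four HxWx3 uint8 inputs; the 640-pixel canvas, the gray pad value, and cropping each image to its quadrant are illustrative simplifications (a real detection pipeline would also shift and clip the bounding boxes).

```python
import numpy as np

def mosaic(imgs, out_size=640, pad_value=114):
    """Place four images into the quadrants defined by a random centre point;
    each image is cropped to fit its quadrant, the rest stays padded."""
    assert len(imgs) == 4
    s = out_size
    canvas = np.full((s, s, 3), pad_value, dtype=np.uint8)
    cy, cx = np.random.randint(s // 4, 3 * s // 4, size=2)    # random mosaic centre
    quads = [(0, 0, cy, cx), (0, cx, cy, s), (cy, 0, s, cx), (cy, cx, s, s)]
    for img, (top, left, bottom, right) in zip(imgs, quads):
        crop = img[:bottom - top, :right - left]               # crop to quadrant size
        canvas[top:top + crop.shape[0], left:left + crop.shape[1]] = crop
    return canvas
```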
8. Class label smoothing
Generally, the correct classification for a bounding box is represented as a one-hot vector of classes [0, 0, 0, 1, 0, 0, ...] and the loss function is calculated based on this representation. However, when a model becomes overly sure of a prediction close to 1.0, it is often wrong, overfit, and overlooking the complexities of other predictions in some way. Following this intuition, it is more reasonable to encode the class label representation so that it accounts for that uncertainty to some degree. Naturally, the authors choose 0.9, i.e. [0, 0, 0, 0.9, 0, ...], to represent the correct class.
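A minimal sketch of this softening, assuming one-hot NumPy targets and an illustrative eps of 0.1; this variant gives the correct class exactly 1 - eps (0.9, matching the value above) and spreads the remaining eps over the other classes, while another common formulation spreads eps uniformly over all classes.

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Soften a one-hot target: the true class keeps 1 - eps and the remaining
    eps is shared evenly among the other classes."""
    n = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + (1.0 - one_hot) * eps / (n - 1)

# Example: smooth_labels(np.array([0., 0., 0., 1., 0., 0.]))
# -> [0.02, 0.02, 0.02, 0.9, 0.02, 0.02]
```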