Existing data augmentation methods can be roughly divided into three categories: spatial transformation,
color distortion, and information dropping.
Spatial transformation covers a set of basic augmentation operations, such as random scaling, cropping, flipping, and rotation, which are widely used in model training.
Color distortion, which includes changes to brightness, hue, etc., is also used in several models. These two categories aim to transform the training data so that it better simulates real-world data by altering certain channels of information.
Information dropping has recently been widely employed for its effectiveness and/or efficiency. It includes random erasing, cutout, and hide-and-seek (HaS). The intuition is that by deleting part of the information in the image, the CNN is forced to learn from originally less salient or less important regions and to enlarge its perception field, which notably increases the robustness of the model.
1. Random Erasing Data Augmentation
In training, Random Erasing randomly selects a rectangle region in an image and erases its pixels with random values. In this process, training images with various levels of occlusion are generated, which reduces the risk of over-fitting and makes the model robust to occlusion. Random Erasing is parameter learning free, easy to implement, and can be integrated with most of the CNN-based recognition models. Random Erasing is complementary to commonly used data augmentation techniques such as random cropping and flipping, and yields consistent improvement over strong baselines in image classification, object detection and person re-identification.
Random Erasing + Random Cropping:
Random cropping is an effective data augmentation approach: it reduces the contribution of the background to the CNN decision and lets the model base its learning on the presence of parts of the object rather than the whole object. In comparison to random cropping, Random Erasing retains the overall structure of the object and occludes only some of its parts. In addition, the pixels of the erased region are re-assigned with random values, which can be viewed as adding noise to the image. In our experiments, we show that these two methods are complementary to each other for data augmentation.
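A minimal NumPy sketch of the basic erasing step, not the authors' reference implementation: the function name, the retry loop, and the default area/aspect ranges (roughly matching the commonly used 0.02–0.4 area and 0.3–3.3 aspect settings) are illustrative assumptions.

```python
import numpy as np

def random_erasing(img, p=0.5, area_range=(0.02, 0.4), aspect_range=(0.3, 3.3)):
    """Erase one random rectangle of an HxWxC uint8 image with random pixel values."""
    if np.random.rand() > p:
        return img                      # apply the augmentation only with probability p
    h, w, c = img.shape
    for _ in range(100):                # retry until a rectangle fits inside the image
        area = np.random.uniform(*area_range) * h * w
        aspect = np.random.uniform(*aspect_range)
        eh = int(round(np.sqrt(area * aspect)))
        ew = int(round(np.sqrt(area / aspect)))
        if eh < h and ew < w:
            y = np.random.randint(0, h - eh)
            x = np.random.randint(0, w - ew)
            out = img.copy()
            out[y:y + eh, x:x + ew] = np.random.randint(0, 256, (eh, ew, c), dtype=img.dtype)
            return out
    return img                          # give up if no valid rectangle was found
```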
2. Improved Regularization of Convolutional Neural Networks with Cutout
Due to the model capacity required to capture complex image representations, CNNs are often susceptible to overfitting and therefore require proper regularization in order to generalize well. This paper shows that the simple regularization technique of randomly masking out square regions of the input during training, which we call cutout, can be used to improve the robustness and overall performance of convolutional neural networks. Not only is this method extremely easy to implement, but we also demonstrate that it can be used in conjunction with existing forms of data augmentation and other regularizers to further improve model performance.
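A minimal sketch of the masking step, assuming an HxWxC image array; the 16-pixel mask size and zero fill are illustrative defaults (the paper tunes the mask size per dataset), and the square is allowed to be clipped at the border as in the original formulation.

```python
import numpy as np

def cutout(img, mask_size=16, fill=0):
    """Mask out one square region centred at a random pixel of the image."""
    h, w = img.shape[:2]
    cy, cx = np.random.randint(h), np.random.randint(w)    # random centre
    half = mask_size // 2
    y1, y2 = max(0, cy - half), min(h, cy + half)           # clip the square at borders
    x1, x2 = max(0, cx - half), min(w, cx + half)
    out = img.copy()
    out[y1:y2, x1:x2] = fill                                 # constant fill (e.g. zero or mean)
    return out
```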
3. Hide-and-Seek: Forcing a Network to be Meticulous for
Weakly-supervised Object and Action Localization
'Hide-and-Seek' is a weakly-supervised framework that aims to improve object localization in images and action localization in videos. Most existing
weakly-supervised methods localize only the most discriminative parts of an object rather than all relevant parts,
which leads to suboptimal performance. Our key idea is
to hide patches in a training image randomly, forcing the
network to seek other relevant parts when the most discriminative part is hidden. Our approach only needs to
modify the input image and can work with any network designed for object localization. During testing, we do not
need to hide any patches. Our Hide-and-Seek approach obtains superior performance compared to previous methods
for weakly-supervised object localization on the ILSVRC
dataset. We also demonstrate that our framework can be
easily extended to weakly-supervised action localization. The RGB value v of a hidden pixel is set equal to the mean RGB vector of the images over the entire dataset.
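The patch-hiding step can be sketched as follows, assuming a float HxWx3 image; the fixed 4x4 grid, the 0.5 hide probability, and the mean_rgb default are illustrative simplifications (the paper varies the grid size during training and fills hidden patches with the dataset mean, as noted above).

```python
import numpy as np

def hide_and_seek(img, grid_size=4, hide_prob=0.5, mean_rgb=(0.45, 0.45, 0.45)):
    """Split the image into grid_size x grid_size patches and hide each patch
    independently with probability hide_prob, filling it with the dataset-mean RGB."""
    out = img.astype(np.float32).copy()
    h, w = img.shape[:2]
    ph, pw = h // grid_size, w // grid_size
    for gy in range(grid_size):
        for gx in range(grid_size):
            if np.random.rand() < hide_prob:
                out[gy * ph:(gy + 1) * ph, gx * pw:(gx + 1) * pw] = mean_rgb
    return out
```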
4. GridMask Data Augmentation
We intriguingly found that a successful information dropping method should achieve a reasonable balance between deleting and reserving regional information in the images. Intuitively, the reason is twofold: if too much is deleted, the object may be removed entirely and the remaining image becomes little more than noisy context; if too much is reserved, the object is left untouched and the augmentation has little effect.
Existing information dropping algorithms have different chances of achieving this balance between deletion and reservation of continuous regions. Both cutout and random erasing delete only one continuous region of the image, so the imbalance between the two conditions is obvious: depending on its size and location, the single deleted region stands a good chance of covering either the whole object or none of it. HaS instead divides the picture evenly into small squares and deletes them randomly; it is more effective, yet still has a considerable chance of continuously deleting or reserving large regions. Unsuccessful examples of these existing methods are illustrated in the GridMask paper. We surprisingly observe that a very simple strategy balances these two conditions statistically better: using structured dropping regions, such as deleting uniformly distributed square regions. Our proposed information removal method, named GridMask, therefore deletes a set of uniformly distributed square regions whose density and size can be controlled.
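As a rough sketch of structured dropping (not the paper's exact parameterization, which additionally randomizes the grid unit size and keep ratio), the following assumes an HxWxC array; here d is the grid period and ratio is the fraction of each cell's side that gets deleted, both illustrative parameters.

```python
import numpy as np

def gridmask(img, d=32, ratio=0.5, fill=0):
    """Delete squares laid out on a regular grid: within every d x d cell,
    one square of side ratio*d is removed, at a random global offset."""
    h, w = img.shape[:2]
    sq = max(1, int(d * ratio))                              # side of each deleted square
    oy, ox = np.random.randint(d), np.random.randint(d)      # random grid offset
    mask = np.ones((h, w), dtype=bool)
    for y in range(oy - d, h, d):
        for x in range(ox - d, w, d):
            y1, y2 = max(y, 0), min(y + sq, h)
            x1, x2 = max(x, 0), min(x + sq, w)
            if y1 < y2 and x1 < x2:
                mask[y1:y2, x1:x2] = False
    out = img.copy()
    out[~mask] = fill                                         # drop the structured regions
    return out
```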
5. MixUp: Beyond Empirical Risk Minimization
Large deep neural networks are powerful, but exhibit undesirable behaviors such
as memorization and sensitivity to adversarial examples. In this work, we propose
mixup, a simple learning principle to alleviate these issues. In essence, mixup trains
a neural network on convex combinations of pairs of examples and their labels.
By doing so, mixup regularizes the neural network to favor simple linear behavior
in-between training examples. We also find that
mixup reduces the memorization of corrupt labels, increases the robustness to
adversarial examples, and stabilizes the training of generative adversarial networks.
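The whole method fits in a few lines; a sketch assuming float image arrays and one-hot label vectors, with alpha=0.2 as an illustrative Beta parameter.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Convexly combine two examples and their one-hot labels with a
    Beta(alpha, alpha)-distributed mixing weight."""
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```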
6. CutMix: Regularization Strategy to Train Strong Classifiers
with Localizable Features
Current
methods for regional dropout remove informative pixels on
training images by overlaying a patch of either black pixels or random noise. Such removal is not desirable because it leads to information loss and inefficiency during training. We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed
proportionally to the area of the patches. By making efficient use of training pixels and retaining the regularization effect of regional dropout, CutMix improves model robustness against input corruptions and its out-of-distribution detection performance.
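A minimal sketch of the cut-and-paste step for a single image pair, assuming HxWxC images and one-hot labels; the helper name is an assumption, the box is sampled with a Beta-distributed weight (commonly alpha = 1), and the mixing weight is re-computed from the clipped box area so the labels are mixed proportionally to the pasted region, as described above.

```python
import numpy as np

def cutmix(img_a, label_a, img_b, label_b, alpha=1.0):
    """Cut a random box out of img_b, paste it into img_a, and mix the one-hot
    labels in proportion to the pasted area."""
    h, w = img_a.shape[:2]
    lam = np.random.beta(alpha, alpha)
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)      # box centre
    y1, y2 = max(0, cy - cut_h // 2), min(h, cy + cut_h // 2)
    x1, x2 = max(0, cx - cut_w // 2), min(w, cx + cut_w // 2)
    out = img_a.copy()
    out[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    lam = 1.0 - ((y2 - y1) * (x2 - x1)) / (h * w)             # actual pasted-area ratio after clipping
    label = lam * label_a + (1.0 - lam) * label_b
    return out, label
```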
7. Mosaic data augmentation
Mosaic data augmentation combines 4 training images into one in certain ratios (instead of only two, as in CutMix). Mosaic is the first new data augmentation technique introduced in YOLOv4. It allows the model to learn to identify objects at a smaller scale than normal, and it also significantly reduces the need for a large mini-batch size during training.
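A rough sketch of the image-combining step only, assuming four HxWx3 uint8 inputs; the 640-pixel canvas, the gray pad value, and cropping each image to its quadrant are illustrative simplifications (a real detection pipeline would also shift and clip the bounding boxes).

```python
import numpy as np

def mosaic(imgs, out_size=640, pad_value=114):
    """Place four images into the quadrants defined by a random centre point;
    each image is cropped to fit its quadrant, the rest stays padded."""
    assert len(imgs) == 4
    s = out_size
    canvas = np.full((s, s, 3), pad_value, dtype=np.uint8)
    cy, cx = np.random.randint(s // 4, 3 * s // 4, size=2)    # random mosaic centre
    quads = [(0, 0, cy, cx), (0, cx, cy, s), (cy, 0, s, cx), (cy, cx, s, s)]
    for img, (top, left, bottom, right) in zip(imgs, quads):
        crop = img[:bottom - top, :right - left]               # crop to quadrant size
        canvas[top:top + crop.shape[0], left:left + crop.shape[1]] = crop
    return canvas
```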
8. Class label smoothing
Generally, the correct classification for a bounding box is represented as a one-hot vector of classes [0, 0, 0, 1, 0, 0, ...] and the loss function is calculated based on this representation. However, when a model becomes overly sure of a prediction close to 1.0, it is often wrong, overfit, and overlooking the complexities of other predictions in some way. Following this intuition, it is more reasonable to encode the class label representation so that it accounts for that uncertainty to some degree. Naturally, the authors choose 0.9, i.e. [0, 0, 0, 0.9, 0, ...], to represent the correct class.
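A minimal sketch of this softening, assuming one-hot NumPy targets and an illustrative eps of 0.1; this variant gives the correct class exactly 1 - eps (0.9, matching the value above) and spreads the remaining eps over the other classes, while another common formulation spreads eps uniformly over all classes.

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Soften a one-hot target: the true class keeps 1 - eps and the remaining
    eps is shared evenly among the other classes."""
    n = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + (1.0 - one_hot) * eps / (n - 1)

# Example: smooth_labels(np.array([0., 0., 0., 1., 0., 0.]))
# -> [0.02, 0.02, 0.02, 0.9, 0.02, 0.02]
```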