How to train a model on large batches when the GPU can't hold more than a few samples

How can you train a model on large batches when the GPU can't hold more than a few samples?
There are a few standard solutions:
  - Gradient accumulation: run several small forward/backward passes and apply a single optimizer step on the accumulated gradients (see the first sketch below)
  - Gradient checkpointing: trade compute for memory by recomputing intermediate activations during the backward pass instead of storing them all (see the second sketch below)
  - Distributed training: spread the batch across several GPUs or machines
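
Here is a minimal gradient accumulation sketch in PyTorch. The model, synthetic data, micro-batch size of 4, and `accumulation_steps = 8` are illustrative placeholders; the point is that gradients from several micro-batches are summed before one optimizer step, giving an effective batch size of 32.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy model and synthetic data stand in for a real workload.
model = nn.Linear(128, 10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(256, 128), torch.randint(0, 10, (256,)))
loader = DataLoader(dataset, batch_size=4)   # micro-batch that fits on the GPU

accumulation_steps = 8  # effective batch size = 4 * 8 = 32

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    inputs, targets = inputs.to(device), targets.to(device)
    loss = criterion(model(inputs), targets)
    # Scale the loss so the accumulated gradient averages over the effective batch.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()        # one weight update per effective batch
        optimizer.zero_grad()   # clear gradients before the next accumulation window
```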
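
And here is a gradient checkpointing sketch using PyTorch's built-in `torch.utils.checkpoint.checkpoint_sequential` (the 16-layer toy model and the choice of 4 segments are just for illustration, and `use_reentrant=False` assumes a reasonably recent PyTorch version). Activations are kept only at segment boundaries and recomputed inside each segment during the backward pass.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint_sequential

device = "cuda" if torch.cuda.is_available() else "cpu"

# A deep stack of layers whose stored activations would otherwise dominate memory.
model = nn.Sequential(
    *[nn.Sequential(nn.Linear(512, 512), nn.ReLU()) for _ in range(16)]
).to(device)

inputs = torch.randn(32, 512, device=device, requires_grad=True)

# Split the sequential model into 4 segments: only segment-boundary activations
# are stored; the rest are recomputed during backward, saving memory for compute.
outputs = checkpoint_sequential(model, 4, inputs, use_reentrant=False)
outputs.sum().backward()
```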
