How can you train a model on large batches when your GPU can't hold more than a few samples?
There are several solutions:
- Gradient Accumulation
- Gradient Checkpointing
- Distributed training: training on several machines
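To make the first option concrete, here is a minimal sketch of gradient accumulation in plain Python. It is an illustration, not a framework implementation: the toy model `y = w * x`, the data, and the helper names `grad_mse` and `accumulated_grad` are all assumptions made up for this example. The idea is to process a large batch as several micro-batches that fit in memory, sum their suitably weighted gradients, and only then take one optimizer step.

```python
# Toy illustration of gradient accumulation for a 1-D linear model y = w * x
# with mean-squared-error loss. All names and data here are hypothetical.

def grad_mse(w, batch):
    """Gradient of mean((w*x - y)^2) over `batch` with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Accumulate gradients over micro-batches of `data`.

    Each micro-batch gradient is weighted by its share of the full batch,
    so the accumulated value equals the full-batch mean gradient.
    """
    total = 0.0
    for start in range(0, len(data), micro_batch_size):
        micro = data[start:start + micro_batch_size]
        total += grad_mse(w, micro) * len(micro) / len(data)
    return total


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8), (6.0, 12.1)]
w = 0.5

full = grad_mse(w, data)                               # needs the whole batch at once
accum = accumulated_grad(w, data, micro_batch_size=2)  # needs only 2 samples at a time

# A single optimizer step (plain SGD) using the accumulated gradient.
lr = 0.01
w_new = w - lr * accum
```

The accumulated gradient matches the full-batch gradient up to floating-point rounding, so the update is the same as if the whole batch had fit in memory. In a deep learning framework the same pattern usually amounts to running the backward pass on each micro-batch and calling the optimizer step only once per accumulated batch.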