1) you didn't try to overfit a single batch first. (i.e. if you can't overfit a small amount of data, you've got a simple bug somewhere; see the single-batch sketch after this list, which also covers points 2-4)
2) you forgot to toggle train/eval mode for the net.
3) you forgot to .zero_grad() (in pytorch) before .backward().
4) you passed softmaxed outputs to a loss that expects raw logits (e.g. nn.CrossEntropyLoss in PyTorch applies log-softmax internally). others? :)
5) you didn't use bias=False for your Linear/Conv2d layers when using BatchNorm, or conversely forgot to include a bias for the output layer. This one won't make you silently fail, but it does add spurious parameters. (see the BatchNorm sketch after this list)
6) thinking view() and permute() are the same thing (& incorrectly using view; see the view-vs-permute sketch after this list)
7) I like to start with the simplest possible sanity checks, e.g. first training on all-zero data to see what loss I get from the base output distribution, then gradually including more inputs and scaling up the net, making sure I beat the previous result each time. (starting with a small model and a small amount of data and growing both together; I always find it really insightful; see the baseline-loss sketch after this list)
(then I turn my data back on and get the same loss :) also, if training on zeroed data produces a nice decaying loss curve, that usually indicates a not-very-clever initialization; I sometimes like to tweak the final layer biases to be close to the base distribution)
8) Choose number of filters
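A minimal single-batch sketch of points 1-4, assuming a made-up toy classifier and the standard PyTorch API: overfit one fixed batch, keep the net in train() mode while fitting (eval() for evaluation), call zero_grad() before every backward(), and pass raw logits (no softmax) to nn.CrossEntropyLoss. Sizes and hyperparameters here are invented for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 20)          # one fixed batch: 32 examples, 20 features
y = torch.randint(0, 5, (32,))   # 5 fake classes

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
criterion = nn.CrossEntropyLoss()   # expects raw logits; applies log-softmax itself (point 4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

model.train()                       # point 2: train mode while fitting
for step in range(500):
    optimizer.zero_grad()           # point 3: clear gradients before backward
    loss = criterion(model(x), y)   # no softmax on the outputs
    loss.backward()
    optimizer.step()

model.eval()                        # point 2: eval mode for inference
with torch.no_grad():
    acc = (model(x).argmax(dim=1) == y).float().mean().item()
print(f"loss {loss.item():.4f}, train acc {acc:.2f}")  # point 1: should be ~1.00
```

If this single batch doesn't reach roughly 100% accuracy and near-zero loss, the bug is in the code rather than in the data or the model size.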
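A BatchNorm sketch of point 5, assuming a small conv block: BatchNorm2d has its own learned shift (beta), so the bias of the Conv2d right before it is redundant, while an output layer that is not followed by a norm keeps its bias. Channel counts are arbitrary.

```python
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),  # bias=False: BatchNorm's beta replaces it
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

head = nn.Linear(16, 10)  # output layer: no norm follows, so keep the default bias=True
```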
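A view-vs-permute sketch of point 6: view() only reinterprets the existing memory layout, while permute() actually reorders axes, so using view() as a transpose silently scrambles the data without raising an error. Shapes are arbitrary.

```python
import torch

x = torch.arange(6).reshape(2, 3)     # [[0, 1, 2], [3, 4, 5]]

viewed = x.view(3, 2)                 # re-chunks the same memory: [[0, 1], [2, 3], [4, 5]]
permuted = x.permute(1, 0)            # a real transpose:          [[0, 3], [1, 4], [2, 5]]

print(torch.equal(viewed, permuted))  # False: same shape, different contents
```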
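A baseline-loss sketch of points 7-8, assuming made-up class priors: the entropy of the base output distribution is the loss you should see on all-zero data, and initializing the final layer bias to the log of the priors makes the very first loss land near that baseline instead of slowly decaying toward it.

```python
import torch
import torch.nn as nn

priors = torch.tensor([0.7, 0.2, 0.1])                # hypothetical imbalanced 3-class problem
base = -(priors * priors.log()).sum().item()          # entropy of the base distribution
print(f"baseline loss: {base:.4f}")

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 3))

with torch.no_grad():
    model[-1].bias.copy_(priors.log())   # tweak final biases toward the base distribution
    model[-1].weight.zero_()             # optional: make the initial prediction exactly the prior

x = torch.zeros(64, 20)                              # all-zero inputs
y = torch.multinomial(priors, 64, replacement=True)  # labels drawn from the priors
loss = nn.CrossEntropyLoss()(model(x), y)
print(f"loss on all-zero data at init: {loss.item():.4f}")  # should sit near the baseline
```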