GANs on MNIST Part 2.5

At the end of the last post, I mentioned that the DCGAN I wrote was very, very slow (around 30 min per epoch). I decided to dig a little further to find out why and fix it.

A Surprise

After profiling a few functions, I found out that my DCGAN's training loop took about 0.75 seconds to load a minibatch (of 128 MNIST images). My original GAN only took about 0.0001 seconds to do the same thing, so this is a major slowdown.

This was also a surprise, since the code for loading minibatches is exactly the same across the two modules.

Another Hint

Interestingly, I also noticed that the very first minibatch load was fast; it took about 0.0001 seconds for both the GAN and the DCGAN. The DCGAN's second minibatch load was much slower, though: 0.75 seconds. Every subsequent load took about the same amount of time.
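Per-load timings like the ones above can be collected with a small wrapper around the batch iterator. This is a minimal sketch rather than the profiling code I actually used: `timed` and `load_times` are hypothetical helper names, and the wrapper works on any Python iterable, so it assumes nothing about the data loader itself.

```python
import time
from itertools import islice
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def timed(batches: Iterable[T]) -> Iterator[Tuple[float, T]]:
    """Yield (seconds spent waiting for the batch, batch) for each batch.

    Only the time inside next() is measured; whatever the caller does
    with the batch between iterations is excluded.
    """
    iterator = iter(batches)
    while True:
        start = time.perf_counter()
        try:
            batch = next(iterator)
        except StopIteration:
            return
        yield time.perf_counter() - start, batch


def load_times(batches: Iterable[T], limit: int = 5) -> List[float]:
    """Collect the load times of the first `limit` batches (hypothetical helper)."""
    return [seconds for seconds, _ in islice(timed(batches), limit)]
```

In a training loop this would look like `for seconds, batch in timed(loader): ...`, printing `seconds` on each iteration. One caveat, assuming PyTorch on a GPU: CUDA work is queued asynchronously, so a wall-clock timer can charge earlier GPU work to whichever later call happens to block.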

GPU Memory

This led me to wonder whether I was running out of GPU memory. Using this script, I checked GPU usage across training epochs for both the original GAN and the DCGAN. The GAN plateaus at about 500 MB. The DCGAN, by contrast, reaches 7.3 GB at the second epoch (when the slowdown starts). My GPU only has 8 GB of memory, so it looked like the DCGAN was indeed saturating the GPU.
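For readers who want to run a similar check, here is a rough stand-in for that kind of script (not the script linked above). It assumes an NVIDIA GPU with `nvidia-smi` on the PATH; the function names are my own, and the parsing is kept separate so it can be tried without a GPU.

```python
import subprocess
from typing import List

# Ask nvidia-smi for used memory only, as bare numbers in MiB, one line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]


def parse_memory_used(output: str) -> List[int]:
    """Parse the query output: one integer (MiB) per non-empty line."""
    return [int(line) for line in output.splitlines() if line.strip()]


def gpu_memory_used_mib() -> List[int]:
    """Return the memory currently in use on each visible GPU, in MiB."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_used(result.stdout)
```

Calling `gpu_memory_used_mib()` once per epoch (or once per minibatch) and logging the result is enough to see whether usage plateaus or keeps climbing toward the card's limit.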

The Solution

In retrospect, this isn't surprising; the generator in my DCGAN goes from 100 to 1024 channels in its very first transpose convolution block, introducing something like $$4 \times 4 \times 100 \times 1024 \approx 1.6M$$ weights by itself.
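The arithmetic is easy to verify. The sketch below assumes a square 4 x 4 kernel and ignores bias terms; `conv_transpose_weights` is a hypothetical helper, not something from the DCGAN code.

```python
def conv_transpose_weights(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Number of kernel weights in a 2D (transposed) convolution, biases excluded."""
    return in_channels * out_channels * kernel_size * kernel_size


# First generator block: 100 input channels -> 1024 output channels, 4x4 kernel.
first_layer = conv_transpose_weights(100, 1024, 4)
print(first_layer)  # 1638400, i.e. roughly 1.6M
```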

So what if we reduce the size of the network? I divided each layer's number of output channels by 8 (so the first layer of the generator, e.g., now outputs 128 channels instead of 1024). This brought my GPU memory usage down to 1.2 GB and made the network much faster to train (epochs now take about 45 seconds instead of 30 minutes). Qualitatively, the results look similar:

Interestingly, in comparison with the original GAN, the DCGAN's generator isn't as successful in lowering its BCELoss across the training period:

However, since the discriminator is getting better at the same time, this may still represent improvement. Again, qualitatively, this seems to be the case.