GANs on MNIST Part 2

Continuing from Part 1, where we used a simple architecture to forge MNIST digits, in this post we'll use a slightly more sophisticated one. Instead of fully connected linear layers, we'll use a convolution/deconvolution setup (a DCGAN). In theory, this should work better, since convolutional layers are designed to exploit the spatial structure of images. Here's what our generated output looks like as it improves over 100 epochs:

Generator

Our generator looks like this:

import torch
import torch.nn as nn

GEN_INPUT_CHANNELS = 100

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # (N, 100, 1, 1) -> (N, 1024, 4, 4)
            nn.ConvTranspose2d(GEN_INPUT_CHANNELS, 1024, 4, 1, 0),
            nn.BatchNorm2d(1024),
            nn.ReLU(inplace=True),

            # (N, 1024, 4, 4) -> (N, 512, 8, 8)
            nn.ConvTranspose2d(1024, 512, 4, 2, 1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),

            # (N, 512, 8, 8) -> (N, 256, 16, 16)
            nn.ConvTranspose2d(512, 256, 4, 2, 1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),

            # (N, 256, 16, 16) -> (N, 1, 28, 28)
            nn.ConvTranspose2d(256, 1, 4, 2, 3),
            nn.Tanh()  # outputs in [-1, 1], matching normalized MNIST
        )

    def forward(self, x):
        return self.model(x)
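
As a quick sanity check, we can feed the generator a batch of random noise and confirm that the output shape matches MNIST. This is just an illustrative sketch (the batch size of 16 is arbitrary):

gen = Generator()

# Noise must be shaped (N, GEN_INPUT_CHANNELS, 1, 1) for ConvTranspose2d,
# not a flat (N, GEN_INPUT_CHANNELS) vector as in the fully connected version.
z = torch.randn(16, GEN_INPUT_CHANNELS, 1, 1)
fake_images = gen(z)
print(fake_images.shape)  # torch.Size([16, 1, 28, 28])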

Discriminator

And our discriminator looks like this:

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_layers = nn.Sequential(
            # (N, 1, 28, 28) -> (N, 256, 24, 24)
            nn.Conv2d(1, 256, 5),
            nn.LeakyReLU(0.2, inplace=True),

            # (N, 256, 24, 24) -> (N, 512, 20, 20)
            nn.Conv2d(256, 512, 5),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),

            # (N, 512, 20, 20) -> (N, 1024, 16, 16)
            nn.Conv2d(512, 1024, 5),
            nn.BatchNorm2d(1024),
            nn.LeakyReLU(0.2, inplace=True),

            # (N, 1024, 16, 16) -> (N, 1, 12, 12)
            nn.Conv2d(1024, 1, 5)
        )

        self.linear_layer = nn.Sequential(
            nn.Linear(144, 1),  # 144 = 12 * 12, the flattened conv output
            nn.Sigmoid()
        )

    def forward(self, x):
        temp = self.conv_layers(x)
        # flatten (N, 1, 12, 12) -> (N, 144) before the final linear layer
        return self.linear_layer(temp.view(x.size(0), 144))
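
And a similar sketch for the discriminator: it accepts the generator's (N, 1, 28, 28) output directly, so the two modules chain together without any flattening in between (reusing fake_images from the generator sketch above):

disc = Discriminator()
scores = disc(fake_images)
print(scores.shape)  # torch.Size([16, 1]); each entry is a probability in (0, 1)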

Training

Luckily, we can reuse the TrainingManager class from the last post with minimal changes (specifically, a few dimension conversions between the generator's output layer and the discriminator's input layer), as sketched below.
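
To make those dimension conversions concrete, here's a minimal sketch of what one training step might look like. Everything here is illustrative rather than the actual TrainingManager code: the function name train_step, the BCE loss, and the Adam hyperparameters (lr=2e-4, betas=(0.5, 0.999), the usual DCGAN choices) are all assumptions.

import torch.optim as optim

criterion = nn.BCELoss()
d_opt = optim.Adam(disc.parameters(), lr=2e-4, betas=(0.5, 0.999))
g_opt = optim.Adam(gen.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(real_images):  # real_images: (N, 1, 28, 28), scaled to [-1, 1]
    n = real_images.size(0)
    # The key dimension change: noise is (N, 100, 1, 1) for the conv
    # generator rather than a flat (N, 100) vector.
    z = torch.randn(n, GEN_INPUT_CHANNELS, 1, 1)

    # Train the discriminator: real images should score 1, fakes 0.
    d_opt.zero_grad()
    d_loss = (criterion(disc(real_images), torch.ones(n, 1)) +
              criterion(disc(gen(z).detach()), torch.zeros(n, 1)))
    d_loss.backward()
    d_opt.step()

    # Train the generator: try to make the discriminator score fakes as 1.
    g_opt.zero_grad()
    g_loss = criterion(disc(gen(z)), torch.ones(n, 1))
    g_loss.backward()
    g_opt.step()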

Commentary

As mentioned, this network should in theory work better than the one from the last post. And, measured by epoch, that's true. For example, here's what our vanilla GAN looked like after 10 epochs:

And here's what our DCGAN looks like after 10 epochs:

Obviously, the DCGAN result looks a lot better. However, this network takes much longer to train (at least, it does in the way that I've written it). While we could train the vanilla GAN through 100 epochs in a few minutes, the DCGAN takes 20 to 30 minutes per epoch.

Credits

My network architectures are modified versions of those from GitHub user znxlwm. You can see my code here.