Fast.ai offers a two-part deep learning course: Practical Deep Learning for Coders and Deep Learning from the Foundations. The two parts take different approaches and are intended for different audiences. In the seventh lecture of Part 1, Jeremy Howard covered several modern architectures, including Residual Networks (ResNet), U-Net, and Generative Adversarial Networks (GAN).
Generative Adversarial Networks
GANs were invented by Ian Goodfellow, one of the leading figures in modern deep learning. They can be used for a variety of tasks, such as style transfer and image-to-image translation (Pix2Pix, CycleGAN). What I’ll be experimenting with today is image restoration.
An image can be degraded in several ways, and each of them is a possible restoration target. The example Jeremy showed was turning low-resolution images into higher-resolution ones, which produces something like the following:
Jeremy also mentioned that GANs can restore more than resolution: they can clear JPEG-like artifacts, remove different kinds of noise, or even restore colors. That got me hooked, so I finished the lecture and set out to try what I’d learned, and thus came this project.
Instead of turning low-resolution images into high-resolution ones, I wanted to build a network that recolors black-and-white images. The approach still follows the way a GAN works, with a few tweaks which we’ll discuss further down.
Since this was my first time working with generative networks like GANs, I based my code heavily on a fast.ai notebook, lesson7-superres-gan.ipynb.
The code below isn’t complete; only the important blocks are shown.
The GAN Approach
A GAN is sort of like a game between two entities, one being the artist (formally generator) and the other being the critic (formally discriminator). Both of them have their own respective roles: the artist has to produce an image, while the critic has to decide whether the image produced by the artist is a real image or a fake/generated image.
Both of them have to get better at what they do: the critic has to get better at telling real images from fake ones, while the artist has to improve the images it produces in order to fool the critic. Applying this idea to image restoration is straightforward. The artist has to produce a higher-resolution image from a low-resolution one, while the critic learns to distinguish the artist’s output from genuine high-resolution images.
To apply that to color restoration, the critic no longer separates low-resolution from high-resolution images. Instead, it has to tell artist-generated images apart from real colored images, and the artist has to learn to recolor images well enough to outsmart the critic.
To build a network that can both learn to recolor images and learn to classify real versus fake images, we need to provide it with two sets of data: colored images and their corresponding black-and-white versions. We used the Oxford-IIIT Pet dataset, whose images are in color, and created a function to grayscale them. Jeremy calls a function for this kind of task a crappifier; in our case it only grayscales the images. Once we have the colored and grayscaled images, we can use them to train the network.
from PIL import Image

class crappifier(object):
    def __init__(self, path_lr, path_hr):
        self.path_lr = path_lr  # destination root for grayscale images
        self.path_hr = path_hr  # source root of the original colored images

    def __call__(self, fn, i):
        # fn is the source file; i is the item index, unused here
        dest = self.path_lr/fn.relative_to(self.path_hr)
        dest.parent.mkdir(parents=True, exist_ok=True)
        img = Image.open(fn)
        img = img.convert('L')  # 'L' = single-channel grayscale
        img.save(dest, quality=100)
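For intuition about what convert('L') does to each pixel: Pillow documents the RGB-to-L conversion as the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000. The sketch below applies that formula in plain Python. The helper names to_luma and grayscale are hypothetical, and rounding down is an assumption, so Pillow’s own result may differ by one level on some pixels.

```python
def to_luma(rgb):
    """Approximate Pillow's 'L' conversion for one (R, G, B) pixel (0-255 each)."""
    r, g, b = rgb
    # ITU-R 601-2 luma weights; integer maths, rounded down (an assumption).
    return (299 * r + 587 * g + 114 * b) // 1000


def grayscale(pixels):
    """Convert rows of (R, G, B) tuples into rows of luma values."""
    return [[to_luma(p) for p in row] for row in pixels]


tiny_image = [[(255, 0, 0), (0, 255, 0)],
              [(0, 0, 255), (255, 255, 255)]]
print(grayscale(tiny_image))  # [[76, 149], [29, 255]]
```

Pure red, green, and blue end up as quite different greys, and many different colors collapse onto the same grey value. That lost information is exactly what the generator has to guess back.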
Now we will train our generator on its own before using it in a GAN. The architecture we’ll use is a U-Net with ResNet34 as its base model, and all it’s trained to do is recolor the images so that they look more like their colored counterparts. Notice also that we’re using Mean Squared Error, or MSELossFlat, as our loss function.
# data_gen, wd and y_range are defined elsewhere in the notebook
arch = models.resnet34
loss_gen = MSELossFlat()
learn_gen = unet_learner(data_gen, arch, wd=wd, blur=True,
                         norm_type=NormType.Weight, self_attention=True,
                         y_range=y_range, loss_func=loss_gen)
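To make the loss concrete, here is a minimal sketch of what a flattened mean squared error computes: both the prediction and the target are flattened into one long list of numbers, and the squared differences are averaged. This is plain Python for illustration, not fastai’s implementation, and the names flatten and mse_flat are hypothetical.

```python
def flatten(nested):
    """Recursively flatten nested lists of numbers into one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))
        else:
            flat.append(float(item))
    return flat


def mse_flat(pred, target):
    """Mean squared error over all elements, ignoring the original shape."""
    p, t = flatten(pred), flatten(target)
    if len(p) != len(t):
        raise ValueError("prediction and target must have the same number of elements")
    return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)


# A 2x2 "image" with made-up pixel values.
pred = [[0.0, 0.5], [1.0, 1.0]]
target = [[0.0, 1.0], [1.0, 0.0]]
print(mse_flat(pred, target))  # (0 + 0.25 + 0 + 1) / 4 = 0.3125
```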
Once we have the generative model, we can train the model head for a few epochs, unfreeze, and train for several more epochs.
The generated images after a total of 5 epochs look like the following:
As you can see, the generator did poorly on some areas of the image, while it did great in others. Regardless, we’ll save those generated images to be used as the fake images dataset for the critic to learn from.
After generating the two sets of images, we’ll feed the data to a critic and let it learn to distinguish real images from artist-generated ones. Below is a sample batch of data, in which the real images are labelled simply as images and the generated ones carry their own label.
To create the critic, we’ll be using fast.ai’s built-in gan_critic, which is just a simple convolutional neural network with residual blocks. Unlike the generator, the critic uses binary cross entropy as its loss function, since there are only two possible predictions (real or generated), and we also wrap it with AdaptiveLoss.
# Critic learner: a binary real-vs-generated classifier
loss_critic = AdaptiveLoss(nn.BCEWithLogitsLoss())
learn_crit = Learner(data_crit, gan_critic(), metrics=accuracy_thresh_expand,
                     loss_func=loss_critic, wd=wd)
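As a rough sketch of what this loss does, assume the critic emits several raw scores (logits) per image while each image has a single label. The label is then repeated for every score before binary cross entropy is applied, which is my understanding of what the AdaptiveLoss wrapper handles. The code below is plain Python rather than fastai or PyTorch code; the function names and the label convention (1 = real, 0 = generated) are assumptions for illustration.

```python
import math


def bce_with_logits(logit, label):
    """Numerically stable binary cross entropy on a raw score (logit)."""
    # Equivalent to -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def adaptive_bce(outputs, labels):
    """Mean BCE where each image's single label is repeated for all its scores."""
    losses = []
    for scores, label in zip(outputs, labels):
        for s in scores:  # expand the label across this image's scores
            losses.append(bce_with_logits(s, float(label)))
    return sum(losses) / len(losses)


# Two images with three critic scores each (made-up numbers).
outputs = [[2.0, 1.5, 3.0], [-1.0, -2.5, 0.5]]
labels = [1, 0]
print(round(adaptive_bce(outputs, labels), 4))
```

A confident and correct score contributes a loss close to zero, while a confident but wrong score is penalised roughly in proportion to its magnitude.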
Once the Learner has been created, we can proceed with training the critic for several epochs.
With both the generator and the critic pretrained, we can finally use them together and start the game of outsmarting each other that defines a GAN. We will be using AdaptiveGANSwitcher, which switches from the generator to the critic, or vice versa, whenever the loss of the model being trained drops below a given threshold.
switcher = partial(AdaptiveGANSwitcher, critic_thresh=0.65)
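The switching rule can be sketched as a toy class. This is a simplified stand-in rather than fastai’s implementation: the class name ToySwitcher, starting in generator mode, and the batch losses are all assumptions. It encodes the rule that a model with no threshold hands over after every batch, while a model with a threshold keeps training until its loss drops below it.

```python
class ToySwitcher:
    """Toy stand-in for an adaptive switcher; not fastai's implementation."""

    def __init__(self, gen_thresh=None, critic_thresh=None):
        self.gen_thresh = gen_thresh
        self.critic_thresh = critic_thresh
        self.gen_mode = True  # start by training the generator (assumption)

    def on_batch_end(self, last_loss):
        # Threshold of whichever model is currently training.
        thresh = self.gen_thresh if self.gen_mode else self.critic_thresh
        # No threshold: switch after every batch. Otherwise switch only once
        # the loss has dropped below the threshold.
        if thresh is None or last_loss < thresh:
            self.gen_mode = not self.gen_mode
        return self.gen_mode


toy = ToySwitcher(critic_thresh=0.65)
history = []
for loss in [1.2, 0.9, 0.7, 0.6, 1.1]:  # made-up batch losses
    history.append("generator" if toy.gen_mode else "critic")
    toy.on_batch_end(loss)
print(history)
# ['generator', 'critic', 'critic', 'critic', 'generator']
```

With only critic_thresh set, the generator gets a single batch at a time, and the critic then trains until its loss falls below 0.65 before handing control back.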
Wrapping both the generator and the critic inside a GAN learner:
learn = GANLearner.from_learners(learn_gen, learn_crit, weights_gen=(1., 50.),
                                 show_img=False, switcher=switcher,
                                 opt_func=partial(optim.Adam, betas=(0., 0.99)),
                                 wd=wd)
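The weights_gen=(1., 50.) argument balances the two parts of the generator’s loss. Assuming the first weight scales the adversarial term (how well the critic was fooled) and the second scales the generator’s own pixel-wise loss (MSE here), the combination can be sketched as below. The function name generator_loss and the loss values are made up for illustration.

```python
def generator_loss(adversarial_loss, pixel_loss, weights_gen=(1.0, 50.0)):
    """Weighted sum of the critic-based loss and the pixel-wise loss."""
    w_adv, w_pix = weights_gen
    return w_adv * adversarial_loss + w_pix * pixel_loss


# Made-up values: the critic is mildly fooled, the pixels are fairly close.
adv, pix = 0.7, 0.02
total = generator_loss(adv, pix)
print(total)               # 1 * 0.7 + 50 * 0.02
print(50.0 * pix / total)  # share of the total that comes from the pixel term
```

Pixel-wise MSE values are typically much smaller than cross-entropy values, so a large second weight keeps the pixel term from being drowned out by the adversarial term.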
A particular callback we’ll use is GANDiscriminativeLR, which multiplies the learning rate by a constant factor whenever the critic is being trained.
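The effect of that callback can be sketched with a small helper: generator batches use the base learning rate, and critic batches use the base rate times a multiplier. The function name effective_lr and the multiplier value of 5 are assumptions for illustration, not values taken from the training run.

```python
def effective_lr(base_lr, gen_mode, mult_lr=5.0):
    """Learning rate actually applied on a batch.

    mult_lr=5.0 is an assumed value for illustration.
    """
    # Generator batches use the base rate; critic batches use a scaled one.
    return base_lr if gen_mode else base_lr * mult_lr


base_lr = 1e-4
for gen_mode in (True, False):
    name = "generator" if gen_mode else "critic"
    print(name, effective_lr(base_lr, gen_mode))
```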
Finally, we can train the GAN for 40 epochs, then switch to a larger image size and train for another 10 epochs at half the learning rate.
lr = 1e-4 learn.fit(40, lr)
learn.data = get_data(16, 192) learn.fit(10, lr/2)
The resulting training images look like the following:
As you can see, the model was able to recolor the images reasonably well. This is not bad, but GANs do have weaknesses, which we’ll discuss in a later section. Before we wrap up the GAN section, let’s feed the model external images, that is, images it hasn’t seen before.
Recoloring External Images
The following pet images were taken at random from the internet. I manually grayscaled them before letting the model predict its output.
The colors produced, especially for the animals’ fur, are less saturated than in the original images. The natural backgrounds such as grass and sky are still acceptable, although they differ from the originals.
Lastly, I tried feeding the model images that contain neither cats nor dogs: pictures of actual people. The image in the top row was already black-and-white when I received it, whereas the image in the bottom row went through the same grayscaling process as the pet images above.
There are a few things to notice about the first prediction. First, the model is biased towards green and yellow, hence the floor color in the first output. Second, aside from coloring the person in front, the model also colored the person on the phone’s screen.
The second prediction, on the other hand, colored the backdrop of mountains and sky well, but did poorly on the supposedly bright-red car and on the person, who remained mostly grey.
The most likely reason for the poor recoloring of people is the training dataset, which in this case contains only pets.
Weaknesses of GANs
GANs are well known for being difficult to handle, especially during training, hence the elaborate configuration and the many knobs we need in order to make them behave. They also take considerably longer to train than many other approaches.
Possible Replacement of GANs
As shown in the rest of Lecture 7, there are other approaches that are as good as or even better than GANs. One of them is to combine feature loss with a U-Net, which trains faster and gives better results in several cases. I have tried that approach, but won’t discuss it here.
GANs are great: the tasks they can handle vary from one architecture to another, and they are one way to let a model “dream” and show its own form of creativity. However, they have weaknesses, including long training times and the need for careful tuning. They are definitely modern, and research in this domain is still very much open and fun to do if you’re into this particular field.
That’s it! Thanks for your time and I hope you’ve learned something!