You don’t always have time to tailor the perfect dress, but you can always take your sister’s black one and apply some changes, if needed, so it fits perfectly. Well, this is more like transfer mmm…. wearing? 😕
Okay, that’s weird but anyways!
The question here is the following: can I make use of this trained CNN for a different task? Of course you can! This is called Transfer Learning 😄!
You mean somehow take the model and perform predictions on new images directly?
Exactly! You can take the model as it is or even part of it depending on the task at hand.
I assume you’re familiar with Computer Vision (CV) and CNNs, but let’s have a quick refresher. Computer Vision is simply a field of artificial intelligence where the goal of a model is to understand the actual content of an image and therefore perform a specific task such as image classification, object detection, or semantic segmentation.
To be fair, CV started as a field on its own!
Having the input data as images means we need to extract features before feeding them to the model. The best way to do so is through the convolution operation of a neural network. That’s why it’s called a convolutional neural network.
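To make the convolution operation concrete, here’s a minimal NumPy sketch — the edge-detection kernel and toy image are illustrative, not taken from any particular model:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over a 2-D image (valid padding, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel applied to an image that is dark on the left
# and bright on the right: the response peaks right over the edge.
image = np.array([
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
], dtype=float)
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)
features = conv2d(image, kernel)  # shape (2, 4); the two middle columns peak at 27
```

Each output value summarizes a small patch of the image — exactly the kind of local feature a CNN’s convolution layers learn to extract.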
In traditional machine learning, you train your model on a dataset to solve a well-defined problem. A new task means a new dataset, a new model, and the entire training process all over again.
In deep learning, training a model on a benchmark dataset such as ImageNet can take days, but you end up with a great model that can be reused! This is Transfer Learning, where learning new tasks relies on previously learned ones. It has a lot of benefits, such as shorter training time, less data required for training, and more accurate results.
This can be done in several ways. The pre-trained model (Model A in the above image) can be used as a:
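Here’s a rough sketch of what reusing a pre-trained model looks like in Keras (assuming `tensorflow.keras`; `weights=None` keeps the example offline, whereas in practice you would pass `weights="imagenet"` to actually load the pre-trained filters):

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import VGG16

# "Model A": a pre-trained backbone. In practice: weights="imagenet".
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze it: reuse its filters as a fixed feature extractor

# New task-specific head ("Model B"), e.g. a hypothetical 10-class problem.
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(256, activation="relu")(x)
outputs = layers.Dense(10, activation="softmax")(x)

model = Model(base.input, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Freezing the base keeps Model A’s filters untouched; unfreezing some of its top layers instead would give you fine-tuning.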
Going back to the ImageNet dataset released in 2009, the related annual ILSVRC competition has produced the best models for different CV tasks, such as AlexNet, VGG19, GoogLeNet, and many more.
Let’s explore some of these advanced architectures in detail.
1. AlexNet: It solves the problem of image classification, where the input is an image belonging to one of 1000 different classes and the output is a vector of 1000 class scores. The model consists of 8 layers: 5 convolution layers and 3 fully connected layers.
Wait for it… What makes AlexNet special is not its architecture, but the following features:
AlexNet’s developers also applied image augmentation and the drop-out technique to improve the model’s performance and avoid overfitting. The final model won the challenge by reducing the top-5 error from 26% to 15.3%.
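For reference, the 5-conv + 3-dense stack can be sketched in Keras roughly like this (a modern single-GPU approximation, assuming `tensorflow.keras`; the original split channels across two GPUs and used local response normalization, omitted here):

```python
from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.Input(shape=(227, 227, 3)),  # the input size that makes the strides work out
    # 5 convolution layers
    layers.Conv2D(96, 11, strides=4, activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(256, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(256, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    # 3 fully connected layers
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),  # the drop-out trick mentioned above
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1000, activation="softmax"),  # one score per ImageNet class
])
```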
The Top-5 error rate is the percentage of times the classifier failed to include the proper class among its top five guesses.
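A tiny NumPy illustration of how the top-5 error is computed (the scores below are toy values, purely for illustration):

```python
import numpy as np

def top5_error(scores, labels):
    """Fraction of samples whose true class is NOT among the 5 highest scores."""
    top5 = np.argsort(scores, axis=1)[:, -5:]  # indices of the 5 best guesses
    hits = [label in row for row, label in zip(top5, labels)]
    return 1.0 - np.mean(hits)

# Toy example: 2 samples, 10 classes.
scores = np.zeros((2, 10))
scores[0, 3] = 1.0               # sample 0: class 3 scores highest -> in the top 5
scores[1, :5] = [5, 4, 3, 2, 1]  # sample 1: its top-5 guesses are classes 0..4
labels = np.array([3, 9])        # sample 1's true class (9) is missed

print(top5_error(scores, labels))  # → 0.5
```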
Now let’s jump into VGG-16/19.
2. VGG-16/19: In simple words, it’s a CNN used for image classification. As shown in the image below, with each convolution block the spatial size shrinks while the depth (the number of feature maps) increases, until we end up with a one-dimensional vector of size 4096.
The developers suggested two architectures, one with 16 layers and the other with 19. They found that using small convolution filters and increasing the network’s depth improved the model’s performance, achieving a test accuracy of around 74% and a top-5 error of 7.3%.
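The “small filters, deeper network” idea can be sketched as stacked 3×3 convolution blocks — a rough VGG-16-style layout in `tensorflow.keras`, assumed here for illustration. Two stacked 3×3 convolutions cover the same 5×5 region as one 5×5 filter, but with fewer parameters (18C² vs. 25C² weights for C channels):

```python
from tensorflow.keras import Model, layers

def vgg_block(x, filters, convs):
    # a VGG "block": `convs` stacked 3x3 convolutions, then 2x2 max-pooling
    for _ in range(convs):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.MaxPooling2D(2)(x)

inputs = layers.Input(shape=(224, 224, 3))
x = vgg_block(inputs, 64, 2)
x = vgg_block(x, 128, 2)
x = vgg_block(x, 256, 3)
x = vgg_block(x, 512, 3)
x = vgg_block(x, 512, 3)  # 13 convolution layers so far
x = layers.Flatten()(x)
x = layers.Dense(4096, activation="relu")(x)  # the 4096-dim vector from the text
x = layers.Dense(4096, activation="relu")(x)
outputs = layers.Dense(1000, activation="softmax")(x)
model = Model(inputs, outputs)  # 16 weight layers in total: VGG-16
```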
3. GoogLeNet or InceptionV1
As you can tell from the name, it was developed by Google and consists of 22 layers, making it the deepest model at that time.
Well, at least I understand why they won the competition in 2014.
Most of the models developed back then focused on varying the kernel size for feature extraction. The Inception architecture, however, focuses on parallel processing: various feature maps are extracted concurrently. This architecture built around the inception module is the main feature that differentiates the model from others.
Take a breath, it’s getting complicated 😄
In simple terms, an inception module allows the use of multiple filter sizes on a single image block; the resulting outputs are concatenated and passed to the next layer.
Unlike the traditional sequential architecture, in the inception module convolutions with different filter sizes are performed simultaneously, and their outputs are concatenated to form the extracted features. The architecture was later improved by introducing 1×1 convolutions for dimensionality reduction, and the model reached a top-5 error of 6.67%, very close to human-level performance. In addition, the model reduced the number of parameters from 60 million (AlexNet) to 4 million.
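Here’s a sketch of one inception module using the Keras functional API (assuming `tensorflow.keras`; the filter counts mirror GoogLeNet’s first inception module and are used here just for illustration):

```python
from tensorflow.keras import Model, layers

def inception_module(x, f1, f3_in, f3, f5_in, f5, pool_proj):
    # branch 1: plain 1x1 convolution
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    # branch 2: 1x1 bottleneck (dimensionality reduction), then 3x3
    b3 = layers.Conv2D(f3_in, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(b3)
    # branch 3: 1x1 bottleneck, then 5x5
    b5 = layers.Conv2D(f5_in, 1, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(b5)
    # branch 4: 3x3 max-pooling, then 1x1 projection
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(pool_proj, 1, padding="same", activation="relu")(bp)
    # all four branches run in parallel and are concatenated channel-wise
    return layers.Concatenate()([b1, b3, b5, bp])

inputs = layers.Input(shape=(28, 28, 192))
outputs = inception_module(inputs, 64, 96, 128, 16, 32, 32)
model = Model(inputs, outputs)  # 28x28 output with 64 + 128 + 32 + 32 = 256 channels
```

Note how the 1×1 bottlenecks shrink the channel count before the expensive 3×3 and 5×5 convolutions — that’s the dimensionality-reduction trick that kept the parameter count so low.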
Let’s put it all together!
In this blog, you got introduced to the concept of transfer learning, which is taking relevant parts of a pre-trained model and applying them to new but similar tasks. Many advanced CNN architectures, such as AlexNet, VGG-16/19, and InceptionV1, can be easily implemented using the Keras library.
Maybe a few years from now, we could upload our neural networks and maximize their use! Who knows? After all, Lucy and Transcendence got us familiar with the concept 🙄!