Let’s switch brains!😄

ZAKA AI, May 14, 2022

You don’t always have time to tailor the perfect dress, but you can always take your sister’s black one and apply some changes, if needed, so it fits perfectly. Well, this is more like transfer, mmm… wearing? 😕

Okay, that’s weird, but anyway!

When it comes to deep learning, we all know that the more quality data we have, the better the model performs. Training on a large dataset such as ImageNet can give you a great Convolutional Neural Network (CNN), if you ignore the training time.

The question here is the following: Can I make use of this trained CNN for a different task? Of course, you can! This is called Transfer Learning 😄!

You mean somehow take the model and perform predictions on new images directly?

Exactly! You can take the model as it is or even part of it depending on the task at hand.

The power of Transfer Learning

Let’s take a look at the questions we’ll answer in this blog:

What is Transfer Learning?
Why do we actually need it?
What are the different methods of using pre-trained CNNs?
What are the most famous advanced CNN architectures?

Take a step back

I assume you’re familiar with Computer Vision (CV) and CNNs, but let’s have a quick refresher. Computer Vision is simply a field of artificial intelligence where the goal of a model is to understand the actual content of an image and use it to perform a specific task, such as image classification, object detection, or semantic segmentation.

To be fair, CV started as a field on its own!

Because the input data are images, we need to extract features before feeding them to the rest of the model. A very effective way to do so is the convolution operation, in which small learned filters slide over the image. That’s why such a network is called a convolutional neural network.
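To make the convolution operation a bit more concrete, here is a minimal sketch in plain Python. The toy image and the hand-written `vertical_edge` kernel are assumptions chosen for illustration; in a real CNN the kernel values are learned during training, and a library does this work far more efficiently.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image: dark left half (0), bright right half (1)
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written vertical edge detector (illustrative values only)
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map lights up only in the middle column, exactly where the image goes from dark to bright. That is the kind of feature (an edge) a convolution filter extracts.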

Can I borrow your model?

In traditional machine learning, you train your model on a dataset to solve a well-defined problem. A new task means a new dataset, a new model, and the entire training process all over again.

In deep learning, training a model on benchmark datasets such as ImageNet can take days, but you end up with a great model that can be reused! This is Transfer Learning, where learning a new task relies on previously learned tasks. It has a lot of benefits: shorter training time, less data needed for training, and often more accurate results.

Transfer of knowledge

This can be done in several ways. The pre-trained model (Model A in the above image) can be used as a:

Pre-trained classifier: Here, the model is used directly to classify new images, because the original and new tasks are closely related.
Feature extractor: Here, the base model forms the first part of the target model or can be a standalone extractor. In both cases, the output is a vector of features representing the input image. This works for tasks where the features are not specific to one dataset or task, but rather general and applicable to many of them, such as edges and corners.
Weights initialization technique: In this case, the weights of the base model, instead of random values, are the starting point for training the new model, and they adapt in response to the new problem.
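The difference between the last two options can be sketched with Keras. This is only a minimal sketch under a few assumptions: VGG16 is picked as the base model purely as an example, the names `build_transfer_model`, `num_classes`, and `fine_tune` are hypothetical, and the small head (global average pooling plus one dense layer) is just one reasonable choice among many.

```python
from tensorflow import keras


def build_transfer_model(num_classes, weights="imagenet", fine_tune=False):
    """Put a new classification head on top of a pre-trained VGG16 base.

    fine_tune=False -> feature extractor: the base weights stay frozen.
    fine_tune=True  -> weights initialization: the base weights are only
                       the starting point and keep training on the new task.
    """
    # Convolutional part of VGG16, without its 1000-class ImageNet head
    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=(224, 224, 3)
    )
    base.trainable = fine_tune

    inputs = keras.Input(shape=(224, 224, 3))
    features = base(inputs)
    pooled = keras.layers.GlobalAveragePooling2D()(features)
    # New head for the new task (hypothetical number of classes)
    outputs = keras.layers.Dense(num_classes, activation="softmax")(pooled)
    return keras.Model(inputs, outputs)
```

Calling `build_transfer_model(5)` downloads the ImageNet weights and gives you a model where only the new dense layer is trained, while `build_transfer_model(5, fine_tune=True)` lets every layer adapt, usually with a small learning rate. For the first option, the pre-trained classifier, you would instead load the model with its original head (`include_top=True`) and call `predict` on the new images directly.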

Feature extraction vs. weight initialization

Advanced CNN architectures

The ImageNet dataset was released in 2009, and the related annual competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), has produced some of the best-known models for different CV tasks, such as AlexNet, VGG19, GoogLeNet, and many more.

You can check them here.

1. AlexNet: It solves the problem of image classification, where the input is an image belonging to one of 1000 different classes and the output is a vector of 1000 numbers, one score per class. The model consists of 8 layers: 5 convolution layers and 3 fully connected layers.

AlexNet Architecture

Is it that simple?! 😏

Wait for it… What makes AlexNet special is not its architecture, but the following features:

The use of ReLU as the activation function, which speeds up training.
Multiple GPUs: In his paper, Alex Krizhevsky clearly explained a new way to parallelize the training of CNNs across multiple GPUs. The model’s neurons were split equally between 2 GPUs.
Overlapping pooling: This is the same pooling operation that we know, with only one difference: the adjacent windows over which the max is computed overlap each other. According to the paper, overlapping pooling reduced the top-1 and top-5 error rates by about 0.4% and 0.3%, respectively, and made the model slightly harder to overfit.
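Here is a small sketch of ReLU and of the two pooling styles, in one dimension to keep it readable. The helper names and the toy values are assumptions for illustration. AlexNet itself pools over 3×3 windows with a stride of 2 on two-dimensional feature maps, but the idea is the same: windows overlap whenever the stride is smaller than the window size.

```python
def relu(x):
    """ReLU keeps positive values and replaces negative ones with 0."""
    return max(0, x)


def max_pool_1d(values, window, stride):
    """Take the max over each window, moving `stride` positions at a time."""
    return [
        max(values[start:start + window])
        for start in range(0, len(values) - window + 1, stride)
    ]


activations = [relu(v) for v in [-2, 3, 1, -1, 5, 2, 0]]
print(activations)  # [0, 3, 1, 0, 5, 2, 0]

# Non-overlapping: window 2, stride 2 -> windows [0,3] [1,0] [5,2]
non_overlapping = max_pool_1d(activations, window=2, stride=2)
print(non_overlapping)  # [3, 1, 5]

# Overlapping: window 3, stride 2 -> windows [0,3,1] [1,0,5] [5,2,0]
overlapping = max_pool_1d(activations, window=3, stride=2)
print(overlapping)  # [3, 5, 5]
```

Notice that in the overlapping case, neighbouring windows share one value, so a strong activation such as the 5 can show up in more than one output.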

Non-overlapping vs. Overlapping pooling

AlexNet’s developers also applied some image augmentation and the dropout technique to improve the model’s performance and avoid overfitting. The final model won the 2012 challenge with a top-5 error of 15.3%, compared with about 26% for the runner-up.

The Top-5 error rate is the percentage of times the classifier failed to include the proper class among its top five guesses.
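As a quick sketch of how this metric could be computed, here is a plain Python version. The function name `top_k_error` and the tiny example (4 classes, so `k=2` is used instead of 5) are assumptions made to keep the numbers easy to follow.

```python
def top_k_error(scores_per_image, true_labels, k=5):
    """Fraction of images whose true class is NOT among the k highest scores."""
    misses = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Class indices sorted from highest to lowest score, keep the first k
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)


# Three images, four classes; each row holds the model's scores per class
scores = [
    [0.10, 0.60, 0.20, 0.10],  # top-2 guesses: classes 1 and 2
    [0.50, 0.10, 0.30, 0.10],  # top-2 guesses: classes 0 and 2
    [0.05, 0.05, 0.20, 0.70],  # top-2 guesses: classes 3 and 2
]
labels = [2, 1, 3]

top2 = top_k_error(scores, labels, k=2)
top1 = top_k_error(scores, labels, k=1)
print(round(top2, 3))  # 0.333 (only the second image is missed)
print(round(top1, 3))  # 0.667 (the first and second images are missed)
```

The top-5 error used in ILSVRC is the same idea with `k=5` over 1000 classes, which is why it is always lower than (or equal to) the top-1 error.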

Now let’s jump into VGG-16/19.

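2. VGG-16/19: In simple words, it’s a CNN used for image classification. As shown in the image below, the depth (number of channels) of the feature maps increases as we move through the convolution blocks, while their width and height shrink. The final feature maps are then flattened and passed through fully connected layers of size 4096.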

The developers suggested two architectures, one of 16 layers and the other of 19 layers. They found that using small convolution filters and increasing the depth of the network improved the model’s performance, achieving a test accuracy of around 74% and a top-5 error of 7.3%.
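Why do small filters help? A little arithmetic gives one intuition: two stacked 3×3 convolutions see the same 5×5 region of the input as a single 5×5 convolution, but with fewer weights. The sketch below checks this; the helper names and the channel count of 64 are assumptions for illustration, and biases are ignored.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in one convolution layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def receptive_field(num_layers, kernel_size):
    """Input region seen by one output value after stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


channels = 64  # hypothetical: same number of channels in and out

two_3x3 = 2 * conv_weights(3, channels, channels)
one_5x5 = conv_weights(5, channels, channels)

print(receptive_field(2, 3), receptive_field(1, 5))  # 5 5
print(two_3x3)  # 73728
print(one_5x5)  # 102400
```

So the stack of small filters covers the same area with roughly 28% fewer weights, and it also gets an extra non-linearity between the two layers.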

3. GoogLeNet or InceptionV1

As you can tell from the name, it was developed by Google. It consisted of 22 layers, making it the deepest model at that time.

Well, at least I understand why they won the competition in 2014.

Most models developed back then focused on varying the kernel size for feature extraction. However, the Inception architecture focuses on parallel processing and the extraction of various feature maps concurrently. Being built around the inception module is the main feature that differentiates this model from others.

Take a breath, it’s getting complicated 😄

In simple terms, an inception module applies filters of several sizes within a single block, and their outputs are then concatenated and passed to the next layer.

Unlike the traditional sequential architecture, in the inception module convolutions with different filter sizes are performed simultaneously on the same input. Their outputs are then concatenated to form the extracted features. The architecture was further improved by introducing 1×1 convolutions for dimensionality reduction, and the model reached a top-5 error of 6.67%, which was very close to human-level performance. In addition, the model reduced the number of parameters from about 60 million (AlexNet) to about 4 million.
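To get a feel for why the 1×1 convolutions matter, here is a rough sketch of the weight count for the 5×5 branch of one inception module, with and without a 1×1 reduction first. All channel counts below are illustrative assumptions (biases are ignored), so treat the numbers as an example rather than as the exact GoogLeNet configuration.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in one convolution layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


in_channels = 192  # hypothetical depth of the incoming feature maps

# 5x5 branch applied directly to all 192 input channels
direct_5x5 = conv_weights(5, in_channels, 32)

# Same branch, but a 1x1 convolution first squeezes 192 channels down to 16
reduced_5x5 = conv_weights(1, in_channels, 16) + conv_weights(5, 16, 32)

print(direct_5x5)   # 153600
print(reduced_5x5)  # 15872

# The parallel branches are concatenated along the channel axis,
# so the output depth is the sum of the branch depths (illustrative values)
branch_depths = {"1x1": 64, "3x3": 128, "5x5": 32, "pool": 32}
output_depth = sum(branch_depths.values())
print(output_depth)  # 256
```

In this example the 1×1 reduction cuts the weights of the 5×5 branch by almost a factor of ten, which hints at how the whole network can stay so small despite being deep.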

Conclusion

Let’s put it all together!

In this blog, you got introduced to the concept of transfer learning, which means taking relevant parts of a pre-trained model and applying them to new but similar tasks. Many advanced CNN architectures, such as AlexNet, VGG-16/19, and InceptionV1, can be easily implemented using the Keras library.

Maybe a few years from now, we could upload our own neural network and maximize its use! Who knows? After all, Lucy and Transcendence got us familiar with the concept 🙄!