Image Prediction Using a Pre-trained Model

This article was published as a part of the Data Science Blogathon.

Introduction

Researchers from all over the world compete to create the most accurate and efficient image recognition systems. It therefore often makes sense to reuse their designs instead of building our own neural network from scratch. Even better, researchers often share trained versions of these network architectures after training them on large data sets. We can use such pre-trained neural networks either directly or as a starting point for our own training.

Image Prediction
Source: Photo by Cristian Castillo on Unsplash

Datasets

ImageNet organizes its images according to the WordNet hierarchy (currently only the nouns), where thousands of photographs represent each node of the hierarchy. The project has significantly advanced the fields of deep learning and computer vision. The data is available to researchers free of charge for academic purposes.

Image Prediction
Source: ImageNet

The ImageNet Large Scale Visual Recognition Challenge, or ILSVRC, is an annual image recognition competition held by ImageNet. International teams from universities and businesses compete to create the most accurate image recognition models. The pre-trained models included with Keras were trained on the smaller data set used for this competition. The data set contains images of 1,000 different types of objects, including foods and animal breeds. For instance, the Granny Smith apple is one of the object types, and the data set includes more than 1,200 images of just this type of apple.

Pre-Trained Models

Some of the pre-trained models for image classification:

VGG

VGG is a deep neural network with 16 or 19 layers. In 2014, it represented the state of the art. Its convolutional neural network design is quite conventional. Because it is simple to use and understand, it is still frequently used as the foundation for other models. More recent designs, however, are typically more efficient.

ResNet-50

ResNet-50, a 50-layer neural network that represented the state of the art in 2015, can be more accurate than the VGG architecture while using less memory. ResNet employs a more intricate design in which higher layers of the network are connected not only to the layer directly beneath them but also to layers further below (skip connections).

Inception v3

Another high-performing design from 2015 is Inception v3. It has an even more intricate layout built around layers that split into several distinct paths before merging again. Together, these networks show how neural networks grew in size and complexity during 2014 and 2015 in pursuit of better accuracy. Modern neural network architectures are frequently more specialized.

What is a pre-trained model, and why use one?

  1. Pre-trained models are complex networks with a large number of parameters.
  2. Training such a network from scratch typically takes a lot of time and resources.
  3. Even if our task differs somewhat from the original one, we can remove the top layer and train only the weights of a new top layer (transfer learning).
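
As a rough illustration of the third point, the sketch below reuses VGG16 without its top layers and adds a new classifier on top. It assumes TensorFlow's bundled Keras is installed. The function name build_transfer_model, the pooling layer, and the single dense layer are illustrative choices, not something the article prescribes.

```python
from tensorflow import keras


def build_transfer_model(num_classes, weights="imagenet"):
    """Reuse VGG16's convolutional layers and add a new, trainable top layer."""
    # include_top=False drops the original 1,000-class classifier layers.
    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=(224, 224, 3)
    )
    base.trainable = False  # freeze the pre-trained weights

    inputs = keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)
    x = keras.layers.GlobalAveragePooling2D()(x)
    # Only this new layer's weights are updated during training.
    outputs = keras.layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)
```

Calling build_transfer_model(num_classes=3) would download the ImageNet weights on first use and return a model in which only the final dense layer is trainable.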

Solution

Now, let’s use a pre-trained model to recognize objects in images. I’ll be using the Keras library, as all of the pre-trained models above are included with Keras. I’ll only be covering the VGG model.

Without any further ado, let’s start.

First, let’s import all the required packages from the Keras applications module.

https://gist.github.com/callmemaze/3baef5f752bc78c0a83139888b3ae1e5

Next, let’s load the image file to process. I have used a picture of a dog, but feel free to use any picture you like. The image we are loading is too large to pass to the neural network as is. When you feed images into a neural network, the image size must match the number of input nodes in the network. For VGG, images must be 224 by 224 pixels. We therefore set the target_size parameter to that value. In addition, we’ll use the image.img_to_array method to convert the image data into an array of numbers that we can feed into the neural network.

https://gist.github.com/callmemaze/005a07ead44112ac6f76962ed0ba6dbe
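
The author’s own code lives in the gist linked above. As a stand-in, here is a minimal sketch of the loading step, assuming TensorFlow’s bundled Keras and Pillow are installed. To stay self-contained it generates a placeholder image instead of a dog photo; the file name example.png and the image contents are made up for illustration.

```python
import os
import tempfile

from PIL import Image
from tensorflow.keras.preprocessing import image

# Create a stand-in 640x480 picture so the sketch runs without a real photo.
tmp_dir = tempfile.mkdtemp()
img_path = os.path.join(tmp_dir, "example.png")  # hypothetical file name
Image.new("RGB", (640, 480), color=(120, 90, 60)).save(img_path)

# Load the file and resize it to the 224x224 input size VGG expects.
img = image.load_img(img_path, target_size=(224, 224))

# Convert the PIL image to a float array of shape (height, width, channels).
x = image.img_to_array(img)
print(x.shape)  # (224, 224, 3)
```

With a real photo, only img_path would change; the resize and conversion steps stay the same.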

Now, we add a fourth dimension to the image array. Keras expects an array of multiple images at once, so our single image becomes an array of images with a single element. Images must also be normalized before they are fed into the neural network. The VGG model has a built-in normalization function, preprocess_input, that does this. For VGG16 it converts the image from RGB to BGR and zero-centers each color channel with respect to the ImageNet data set, rather than scaling pixel values to between zero and one. All we have to do is call vgg16.preprocess_input and pass in our data, x. Let’s also visualize the image using the matplotlib library.

https://gist.github.com/callmemaze/891cfeb4faea3fca98555c4cf3d1f811
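
To make the two steps concrete, the sketch below reproduces them with NumPy alone. It is an illustrative re-implementation, not the Keras function itself. It assumes the channel means that Keras uses for VGG-style preprocessing, and the helper names add_batch_dimension and caffe_style_preprocess are made up for this example. In practice you would simply call vgg16.preprocess_input.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed to match Keras' VGG preprocessing).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def add_batch_dimension(x):
    """Turn one (height, width, channels) image into a batch of one image."""
    return np.expand_dims(x, axis=0)


def caffe_style_preprocess(batch):
    """Flip RGB to BGR, then subtract the per-channel means (no 0-1 scaling)."""
    bgr = batch[..., ::-1].astype(np.float32)
    return bgr - IMAGENET_MEAN_BGR


# A hypothetical all-white 224x224 RGB image stands in for the loaded photo.
x = np.full((224, 224, 3), 255.0, dtype=np.float32)
batch = add_batch_dimension(x)
processed = caffe_style_preprocess(batch)
print(batch.shape)      # (1, 224, 224, 3)
print(processed.shape)  # (1, 224, 224, 3)
```

Note that the processed values are centered around zero and are not confined to the range zero to one.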

Now, we’ll create a new instance of the model by creating a new VGG16 object.

https://gist.github.com/callmemaze/88684dfbe6351dbb8de576be4423bdb7

We are now ready to make a prediction using the normalized data and the neural network. We do this by calling model.predict and passing in our data. For each image, the predictions we get back are an array of 1,000 floating-point numbers. Each element is the probability that our image contains the corresponding one of the 1,000 objects the model was trained to recognize. To make things simpler, the VGG model provides a decode_predictions function that returns the names of the most likely matches. We call vgg16.decode_predictions and pass in the predictions object we just created. By default, it gives us the top five most likely matches.

https://gist.github.com/callmemaze/884dd9dc0e7d1fc7a8c8226b4e18da6f
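
Conceptually, picking the top matches means sorting the probability array and looking up the class names. The NumPy sketch below shows that idea on a tiny, made-up example; the labels and scores are hypothetical and are not real model output. The helper top_k_matches is an illustration and not part of Keras, where vgg16.decode_predictions does this for you.

```python
import numpy as np


def top_k_matches(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    order = np.argsort(probabilities)[::-1][:k]
    return [(labels[i], float(probabilities[i])) for i in order]


# Hypothetical scores for a 6-class model (the real VGG16 output has 1,000 entries).
labels = ["bulldog", "poodle", "kuvasz", "labrador", "tabby", "teapot"]
probs = np.array([0.55, 0.20, 0.10, 0.08, 0.05, 0.02])

for name, p in top_k_matches(probs, labels):
    print(f"{name}: {p:.2f}")
```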

The top prediction fits our image pretty well, in my opinion. Standard poodle, kuvasz, and Labrador retriever are a few of the other matches. All of those are plausible guesses for the image. Feel free to try again with your own photographs. It’s entertaining to see what predictions the model makes and what kinds of images confuse it.

Conclusion

We have successfully loaded a pre-trained VGG model and used it to predict the contents of an image. I’ve only given a general overview of pre-trained VGG image classification models and how to use them. Because this is a constantly expanding field, there is always a new model to look forward to and new frontiers to explore. I encourage you to test the models mentioned above on various data sets with various parameter settings and report your findings in the comments below!

Endnotes: 

  • The pre-trained models are like magic; we can download them and use them right away without any training or data of our own.
  • If the source task and the target task differ but the domains are somewhat similar, we may need to train a few layers, but it won’t be as time-consuming as starting from scratch and will require considerably less data.
  • A good use case for importing an existing model and running image prediction right away is in the early stages of prototyping or just trying a model out. However, fine-tuning a network is still the preferred course of action.
  • The neural network doesn’t typically need to be trained from scratch. Instead, we can utilize an existing neural network and modify it to solve a new problem via transfer learning.

Where is the code?

The complete code can be found on GitHub here. I used Google Colaboratory, but feel free to use whichever environment you are comfortable with. While you are there, a star would be very helpful.

Reach out to me

Please don’t be shy. Let’s connect.

Thanks for reading, and happy learning and happy coding! See you next time.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
