Convolutional Neural Networks are a type of neural network used primarily for image recognition and classification. While a simple neural network can classify an image, a large image greatly increases the number of parameters in the network, which in turn increases processing time and slows the network down. This is exactly where Convolutional Neural Networks come to the rescue. CNNs are specialised neural networks designed to process data with a grid-like input shape, such as a 2D matrix or an image. In this article, we’ll discuss what convolutional neural networks really are and how they work.

What are Convolutional Neural Networks?

Convolutional Neural Networks (also known as CNNs or ConvNets) are deep learning algorithms that take images as input, assign importance to the various features or aspects of the image, and differentiate one image from another. While filters initially had to be set manually by programmers, with adequate training, CNNs can learn the filters and unique image characteristics on their own. Convolutional Neural Networks are used primarily for image classification, image recognition, object detection, edge detection, semantic segmentation, object tracking, and video classification.

[Image: Convolutional Neural Networks. Source: Medium]

How do Convolutional Neural Networks Work?

Just like other deep learning algorithms, the architecture of a CNN is inspired by the human brain, the visual cortex to be specific. Every time we see something through our eyes, numerous layers of neurons are activated. Each layer is responsible for detecting a certain set of features, such as edges and lines. The combined output of these layers is what helps us recognise what we see. A CNN works very similarly, with different layers identifying different aspects of the input image.

[Image: Convolutional Neural Network Layers. Source: Medium]

The architecture of a convolutional neural network includes the following layers:

1. Convolution Layer

The main goal of the convolution layer is to extract low-level features like colours, corners, and edges from the input. As we go deeper, the network becomes able to identify more complex features like digits and shapes. The layer computes a dot product between two matrices: one is a restricted portion of the image (a patch), and the other is a kernel or filter, a small matrix of learnable parameters that is slid across the image.
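The sliding dot product described above can be sketched in plain NumPy. This is a minimal illustration, not how deep learning libraries implement it; note that, like most frameworks, it performs cross-correlation (the kernel is not flipped), which is what "convolution" conventionally means in CNNs. The edge-detector kernel and tiny image below are made-up examples.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: slide the kernel over the image and take
    the element-wise product-sum (dot product) at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # restricted portion of the image
            out[i, j] = np.sum(patch * kernel)  # dot product with the filter
    return out

# A vertical-edge-detecting kernel applied to a tiny image whose
# left half is dark (0) and right half is bright (1).
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)
print(conv2d(image, kernel))  # strong positive response along the vertical edge
```

In a real CNN the kernel values are not hand-crafted like this; they are the learnable parameters adjusted during training.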

After the input has passed through the entire convolution layer, we have a feature map with far fewer dimensions and parameters than the input image, and with clearer, more distinct features. Furthermore, after every convolution operation, an additional non-linear activation called ReLU (Rectified Linear Unit) is applied in order to introduce non-linearity. This is necessary because the real-world data we want the CNN to learn is non-linear, while the convolution operation itself is linear.
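ReLU itself is a one-line function: it zeroes out negative activations and leaves positive ones unchanged. A minimal sketch:

```python
import numpy as np

def relu(x):
    """ReLU activation: max(0, x) element-wise. Negative values
    become 0; positive values pass through unchanged."""
    return np.maximum(0, x)

# A hypothetical feature map containing both negative and positive activations.
feature_map = np.array([[-2.0, 0.5],
                        [ 3.0, -0.1]])
print(relu(feature_map))  # [[0.  0.5]
                          #  [3.  0. ]]
```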

2. Pooling Layer

The pooling layer is responsible for reducing the number of parameters when the feature map is still too large. It decreases the overall dimensions of the feature map while retaining the dominant features. This process of reducing dimensionality while preserving important information is called spatial pooling. The reason behind the reduction is to decrease the overall computational power and time required to process each image.
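The most common variant is max pooling, which keeps only the largest activation in each window. A minimal sketch, assuming a 2x2 window with stride 2 (the usual configuration, which halves each spatial dimension):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """2x2 max pooling: keep the largest activation in each window,
    halving each spatial dimension of the feature map."""
    h, w = feature_map.shape
    out = np.zeros((h // stride, w // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()  # dominant feature in this window
    return out

# A hypothetical 4x4 feature map reduced to 2x2.
fm = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 2, 3, 4],
], dtype=float)
print(max_pool(fm))  # [[6. 2.]
                     #  [2. 7.]]
```

The 4x4 input becomes 2x2, a 4x reduction in values to process, while each retained value is the strongest activation from its region.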

3. Fully Connected Layer

In this step, we flatten the matrix received from the pooling layer into a vector and feed it into a fully connected layer, just like in any other neural network. This layer combines the different features extracted earlier and classifies the image into one of several categories.
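Flattening and the fully connected layer can be sketched as a matrix multiplication followed by a softmax that converts class scores into probabilities. The weights below are random stand-ins; in a trained network they are learned parameters, and the shapes (a 2x2 pooled map, 3 classes) are made-up for illustration:

```python
import numpy as np

def softmax(z):
    """Convert raw class scores (logits) into probabilities."""
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
pooled = rng.standard_normal((2, 2))     # hypothetical pooled feature map
flat = pooled.flatten()                  # flatten 2x2 -> vector of length 4

n_classes = 3
W = rng.standard_normal((n_classes, flat.size))  # learnable weights (random here)
b = np.zeros(n_classes)                          # learnable biases

logits = W @ flat + b      # fully connected layer: one score per class
probs = softmax(logits)    # class probabilities, summing to 1
print(probs.argmax())      # index of the predicted category
```

The class with the highest probability is the network's prediction for the image.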

Why CNN Over Feed-Forward Neural Networks?

The most common question when it comes to CNNs is: why are we even using them over normal feed-forward neural networks? Why can’t we just flatten an image and feed it directly to the network? The reason is that simple neural networks may show some accuracy when classifying basic binary images, but they have little to no accuracy when handling complex images with pixel dependencies. Moreover, flattening a large image produces an enormous number of parameters, more than is computationally practical to train.

With a CNN, it becomes possible to successfully capture the spatial and temporal dependencies in an image by applying the relevant filters. The reduction in the total number of parameters and the reuse of weights also help the network fit the image dataset better. In simpler words, a CNN can be trained efficiently to understand even the most complex of images.

Now that you know what convolutional neural networks are and what goes on inside them, you’ll know that CNNs are working in the background whenever you see an image recognition or detection algorithm in action. To get a better understanding of CNNs and neural networks in general, it’s always best to start from the basics: machine learning. Springboard offers a comprehensive machine learning course online with 1:1 mentorship and career coaching, along with a job guarantee. Take a look at the machine learning program details and apply today!