Convolutional Neural Networks
- Admin
- Jun 11, 2021
- 3 min read

In general, neural networks are good at identifying patterns in the given data. But if a pattern can appear in different positions in the data, a model that was initially trained to recognize the pattern in one position may fail to detect it correctly elsewhere, even though the pattern is present, because the model is shift-variant. So we need a system that can detect a pattern in the data irrespective of the pattern's position.

In 1959, Hubel and Wiesel performed an experiment on a cat, showing it images at different angles and recording the stimulus signals in its brain.

Later, in 1980, Kunihiko Fukushima, a computer scientist from Japan, developed the Neocognitron model, which mimics this stimulus-response mechanism in the brain.

In this model, he proposed that the brain mainly contains two important kinds of cells:
- Us (simple) cells
- Uc (complex) cells
The Us cells capture information, while the Uc cells filter it. In layman's terms, we use a similar mechanism in neural networks: in a convolution layer, the kernel plays the role of the Us cells, looking at one block of the input at a time, while the filters act like the Uc cells, filtering the information so that only what is required is passed on to the next layers.

The figure above shows an input image of size 28*28 with channel_size = 1 (the number of channels: a grayscale image has one channel, while an RGB image has three). When we apply convolution to this image, i.e. apply 'N' filters to it, we get 'N' feature maps (the output of a convolution layer, which is just the result of applying a filter to the input, is called a feature map). Notice that after the convolution the size is 24*24*n1, where n1 is the number of filters applied during the convolution (if n1 = 10, then 10 filters were applied to the input). The size 24 comes from the formula:
Output_size = ((Input_size + (2 * Padding) - Filter_size) / Stride) + 1
Here Input_size = 28, padding = 0, filter size = 5*5, stride = 1.
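As a quick sanity check, the formula can be written as a small Python function (the name conv_output_size is just for illustration):

```python
# Minimal sketch of the output-size formula above (pure Python, no libraries).
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * padding - filter_size) // stride + 1

# Values from the example: 28*28 input, 5*5 filter, no padding, stride 1.
print(conv_output_size(input_size=28, filter_size=5, padding=0, stride=1))  # 24
```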
Refer to the images below for clarification on how a filter is applied to the given image.

For clarification on stride: in the image below, a stride of 2 is used, so the red arrow moves 2 positions for each operation.

If stride = 1, it looks like this:

Here we get the result -7 by applying the filter [[1,0,-1],[1,0,-1],[1,0,-1]] to one block of the input, [[3,1,1],[1,0,7],[2,3,5]]: we take the element-wise product and sum the values. We apply the filter to each block of the image in the same way to obtain the complete output.
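Below is a minimal NumPy sketch of that single step, plus a naive loop that slides the same filter across a full image with a configurable stride (the convolve2d helper is only for illustration, not a library function):

```python
import numpy as np

# The 3*3 filter and the 3*3 input block from the example above.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])
block = np.array([[3, 1, 1],
                  [1, 0, 7],
                  [2, 3, 5]])

# One convolution step: element-wise product followed by summation.
print((kernel * block).sum())  # -7

def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and collect one value per block."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = (patch * kernel).sum()
    return out
```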
Next, we use a downsampling mechanism such as max pooling or average pooling to downsample the feature map. We do this to reduce the size of the feature map so that, at the end, we are left with a 1*1 feature map, which we pass to an MLP (multi-layer perceptron) to get our target class.
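As a rough sketch of that pipeline, assuming PyTorch, the model below stacks convolutions and max pooling until the spatial size shrinks to 1*1 and then passes the result to a small MLP head; the layer sizes are illustrative and not taken from the figures above:

```python
import torch
import torch.nn as nn

# Illustrative CNN for a 1*28*28 input: convolutions extract feature maps,
# max pooling downsamples them, and the final 1*1 feature map feeds an MLP.
model = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=5),   # 1*28*28  -> 10*24*24
    nn.ReLU(),
    nn.MaxPool2d(2),                   # 10*24*24 -> 10*12*12
    nn.Conv2d(10, 20, kernel_size=5),  # 10*12*12 -> 20*8*8
    nn.ReLU(),
    nn.MaxPool2d(2),                   # 20*8*8   -> 20*4*4
    nn.Conv2d(20, 40, kernel_size=4),  # 20*4*4   -> 40*1*1
    nn.ReLU(),
    nn.Flatten(),                      # 40*1*1   -> 40
    nn.Linear(40, 10),                 # MLP head producing 10 class scores
)

x = torch.randn(1, 1, 28, 28)          # one grayscale image
print(model(x).shape)                  # torch.Size([1, 10])
```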
For clarification on downsampling mechanisms such as max pooling and average pooling, refer to the images below.

When we apply a pooling filter of size 2 with stride = 2, the first block is [[1,1],[5,6]]; the maximum value in this block is 6, so we take that maximum for the block. Since we take the maximum of the values, we call this max pooling. We then apply the same operation to the other three blocks of elements.
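Below is a minimal NumPy sketch of that pooling step; only the first 2*2 block, [[1,1],[5,6]], comes from the example above, while the rest of the 4*4 feature map (and the pool2d helper) is made up purely for illustration:

```python
import numpy as np

# Hypothetical 4*4 feature map; its first 2*2 block matches the example above.
feature_map = np.array([[1, 1, 2, 4],
                        [5, 6, 7, 8],
                        [3, 2, 1, 0],
                        [1, 2, 3, 4]])

def pool2d(x, size=2, stride=2, mode="max"):
    """Downsample x by taking the max (or average) of each size*size block."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            block = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = block.max() if mode == "max" else block.mean()
    return out

print(pool2d(feature_map, mode="max"))  # [[6. 8.]
                                        #  [3. 4.]]
print(pool2d(feature_map, mode="avg"))  # average pooling over the same blocks
```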