1 Introduction to Deep Learning
The history of deep learning is often traced back to 1943, when Walter Pitts and Warren McCulloch built a computer model based on the neural networks of the human brain. To mimic the human thought process, they used a combination of algorithms and mathematics called threshold logic. Since 1943, deep learning [1] has been evolving without a break. It uses multiple algorithms arranged in multiple layers to mimic the thought process: processing data, understanding human speech, and visually recognizing objects. Information is processed by passing it through multiple layers, where each layer's output acts as the input for the next layer. The first layer in a network is called the input layer, while the last is called the output layer. All the layers between the two are referred to as hidden layers. Each layer is typically a simple, uniform algorithm containing one kind of activation function. Another aspect of deep learning is feature extraction: an algorithm automatically constructs meaningful "features" from the data, which are then used for training, learning, and understanding. Ordinarily, a data scientist or a programmer is responsible for this process.
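To make the layer-by-layer flow concrete, the short sketch below chains a few dense layers so that each layer's output becomes the next layer's input; the layer sizes, random weights, and ReLU activation are illustrative assumptions rather than details taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_layer(x, weights, bias):
    # One simple, uniform layer: a linear transform followed by a ReLU activation.
    return np.maximum(0.0, weights @ x + bias)

x = rng.random(8)                                  # input layer: a feature vector of length 8

W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)    # hidden layer 1
W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)     # final (output) layer

h1 = dense_layer(x, W1, b1)                        # output of the first layer ...
out = dense_layer(h1, W2, b2)                      # ... becomes the input of the next layer
print(out)
```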
1.1 History of Deep Learning
Right now, the world is seeing a global Artificial Intelligence revolution across all industries, with deep learning as the driving factor. Google and Facebook use deep learning today, yet it did not appear overnight; rather, it evolved slowly and steadily over several decades. Many machine learning researchers have worked with great determination behind this evolution. The key discoveries of deep learning made by researchers since the 1940s are illustrated in Table 1.1.
Many researchers' contributions, direct or indirect, have influenced the growth of deep learning. Here, an attempt has been made to present the history of deep learning through some of its key moments, and to present the chronological events as accurately as possible. For more information on these events, refer to the links given in the references. All these inventions incorporate mathematics that quantifies uncertainty through probability. Introducing the concept of probability to deep learning helps deep learning-based systems act like a human with common sense: when dealing with the real world, these systems make decisions with incomplete information. Using probability in deep learning helps to model the components of uncertainty.
Table 1.1 Key discoveries of deep learning.
Year | Inventor name | Invention/technique
1943 | Warren McCulloch and Walter Pitts | Computer model based on the human brain [2]: threshold logic (a combination of algorithms and mathematics)
1957 | Frank Rosenblatt | Perceptron (binary classifier) [3] with true learning ability
1960 | Henry J. Kelley | Continuous backpropagation model for neural networks [4], used to recognize kinds of patterns
1962 | Stuart Dreyfus | Updated the neural network backpropagation model with the chain rule [5]
1965 | Alexey Grigoryevich Ivakhnenko and Valentin Grigor'evich Lapa | Multilayer neural network with activation functions and GMDH (Group Method of Data Handling)
1969 | Marvin Minsky and Seymour Papert | Proposed extending Frank Rosenblatt's perceptrons with multiple hidden layers
1970 | Seppo Linnainmaa | Derived and implemented the automatic differentiation (backpropagation) method in computer code
1971 | Alexey Grigoryevich Ivakhnenko | 8-layer deep neural network with GMDH (Group Method of Data Handling)
1980 | Kunihiko Fukushima | Neocognitron, a convolutional neural network
1982 | John Hopfield | Recurrent neural network (Hopfield network) that acts as a content-addressable memory system
1982 | Paul Werbos | Proposed the steps to use backpropagation
1985 | David H. Ackley, Geoffrey Hinton, and Terrence Sejnowski | Stochastic recurrent neural network (Boltzmann machine)
1986 | Terry Sejnowski | NETtalk, a talking neural network
1986 | Geoffrey Hinton, David Rumelhart, and Ronald Williams | Implemented backpropagation in neural networks [6]
1986 | Paul Smolensky | Updated the Boltzmann machine to the restricted Boltzmann machine, with connections only between the input and hidden layers
1989 | Yann LeCun | Implemented backpropagation in convolutional neural networks
1989 | George Cybenko | Showed that feedforward neural networks [7] can approximate continuous functions
1991 | Sepp Hochreiter | Identified the vanishing gradient problem in deep neural networks
1997 | Sepp Hochreiter and Jürgen Schmidhuber | LSTM (long short-term memory) [8]
2006 | Geoffrey Hinton, Ruslan Salakhutdinov, Simon Osindero, and Yee-Whye Teh | Deep belief network [9]
2008 | Andrew Ng | Implemented deep neural networks on GPUs
2009 | Fei-Fei Li | Launched the ImageNet dataset for deep learning
2011 | Yoshua Bengio, Antoine Bordes, and Xavier Glorot | Implemented the ReLU activation function in neural networks (rectified networks) to address the vanishing gradient problem
2012 | Alex Krizhevsky | AlexNet: a convolutional neural network (CNN) implemented on GPUs for image classification
2014 | Ian Goodfellow | Created the generative adversarial network (GAN)
2016 | DeepMind Technologies | AlphaGo, a deep reinforcement learning model
2019 | Yoshua Bengio, Geoffrey Hinton, and Yann LeCun | Received the 2018 Turing Award
1.2 A Probabilistic Theory of Deep Learning
Probability is used in science to quantify uncertainty. When machine learning and deep learning use large amounts of data to train and test a model and find patterns, the system relies on data rather than explicit logic, and uncertainty grows along with the associated probability. In deep learning, models such as the Bayesian model, the hidden Markov model, and probabilistic graphical models depend entirely on the concepts of probability. Since these systems use real-world data, they have to handle randomness with the help of such tools. Simplified versions [10] of probability and statistics are presented here in terms of deep learning. Figure 1.1 shows the MNIST digit recognition dataset, the "hello world" of deep learning, which is used here to explain the foundations of probability in terms of deep learning. This dataset is used to classify handwritten digits and label them.
Figure 1.1 MNIST data set.
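For reference, the snippet below loads MNIST and inspects its shapes and label set; using the tensorflow.keras dataset loader is one convenient option assumed here, not a requirement of the text.

```python
import numpy as np
from tensorflow.keras.datasets import mnist

# Load the dataset: training and test splits of images and their digit labels.
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape)        # (60000, 28, 28): 60,000 grayscale images, 28 x 28 pixels each
print(np.unique(y_train))   # [0 1 2 3 4 5 6 7 8 9]: the ten possible digit labels
```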
The system planned for this classification task using machine learning is not going to be a perfectly accurate one; its predictions are inherently probabilistic. To perform the task, the neural network shown in Figure 1.2 is used to process input images of 28 × 28 pixels.
The input image of size 28 × 28 is fed into the input layer of the neural network. In this layer, the input is multiplied by the weights (w) and a bias (b) is added. The layer has ten neurons, one for each digit, and each is processed further with an activation function. At the end, an output vector of length 10 is obtained, giving the probability of each digit. The index of the highest value in this output vector, along with its probability, is obtained using argmax. A detailed explanation of neural networks and their layers is given in Chapter 4. The purpose of using a neural network example in this chapter is to understand how some basic probability concepts are used in deep learning. Consider the vector given in Eq. 1.1 below.
(1.1)
Figure 1.2 Structure of neural network.
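The forward pass just described can be sketched in a few lines. The snippet below is a minimal illustration, with random weights and a random input standing in for a trained network and a real MNIST image (all variable names and values are illustrative assumptions); it produces an output probability vector of length 10 of the kind referred to in Eq. 1.1.

```python
import numpy as np

def softmax(z):
    # Subtract the maximum for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)

# A flattened 28 x 28 grayscale image; random pixels stand in for a real MNIST digit.
x = rng.random(28 * 28)

# One layer of ten neurons: weights W (10 x 784) and bias b (length 10), untrained here.
W = rng.normal(scale=0.01, size=(10, 28 * 28))
b = np.zeros(10)

# Linear step plus softmax activation gives a probability for each of the ten digits.
probabilities = softmax(W @ x + b)

# argmax returns the index (the predicted digit) holding the highest probability.
predicted_digit = np.argmax(probabilities)
print(predicted_digit, probabilities[predicted_digit])
```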
The underlying probability and distribution concepts, such as sample space, random variable, probability distribution, discrete distribution, conditional probability, normalization, joint probability, marginal probability, continuous distribution, binomial distribution, uniform distribution, normal distribution, and softmax distribution, along with how these concepts relate to the input data used by the neural network, are explained below.
Sample Space - the set of all possible outcomes (data) used in a procedure to test a hypothesis.
In terms of the MNIST data set, the input used to compute the sample space is...