Deep Learning Models

There are several types of deep learning models, including the Single Layer Perceptron Model (SLP), the Multi-Layer Perceptron Model (MLP), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Restricted Boltzmann Machines (RBMs), and Deep Belief Networks (DBNs). This chapter takes a look at each of these models in greater detail.

Single Layer Perceptron Model (SLP)

An SLP is made up of one or more artificial neurons arranged in parallel. These neurons could be of different types or all of the same type, as is the case with the Artificial Neuron Applet. Each neuron in the layer is connected to the external inputs and contributes a single output to the network.

In Deep Learning, the single layer perceptron model provides a way of modeling human categorization behavior. As such, the SLP is a central concept in any given categorization model. It is needed when modeling the retrieval stage (mapping the features of stimuli to class probabilities).

The SLP can be structured in different ways; the schematic figure below shows one such structure.

Figure: Single Layer Perceptron Model

The small circles at the top, labeled F, make up the input layer, which also takes in the bias. The middle section, comprising the large circles, is the output layer. In this layer, the mathematical notations indicate summation and the sigmoid function. These expressions do not need further explanation.

As already seen, the SLP model does not have any hidden layer. There are only the input layer and the output layer. The input nodes carry the stimulus features; in the figure shown above, they are represented by the circles in the top row. The features pass through the input nodes without being changed, and each stimulus feature is assigned its own input node. The figure above has 4 input nodes, with the node labeled ‘1’ taken as the bias node. Unlike the other nodes, which pass on the value of a feature, the bias node has a fixed output value of 1.

Each input node, the bias node included, is directly connected to the output nodes, and each of these connections has an associated weight. As a feature value travels along a connection, it is multiplied by that connection’s weight before it reaches the output node.
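To make this concrete, here is a minimal sketch in Python with NumPy. The feature values and weights are illustrative rather than taken from the figure; it shows how an output node sums the weighted features and the bias contribution and then applies the sigmoid to produce a class probability.

import numpy as np

def sigmoid(z):
    # Squash the weighted sum into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

# Three stimulus features plus a bias node whose output is fixed at 1.
features = np.array([0.4, 0.9, 0.2])
weights = np.array([0.5, -0.3, 0.8])   # one weight per feature connection
bias_weight = 0.1                      # weight on the connection from the bias node

# Each feature value is multiplied by the weight of its connection, the products
# are summed together with the bias contribution, and the sigmoid turns the
# total into a class probability.
weighted_sum = np.dot(features, weights) + 1.0 * bias_weight
class_probability = sigmoid(weighted_sum)
print(class_probability)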

Multi-layer Perceptron Model (MLP)

An MLP is a feed-forward ANN which receives a set of inputs and yields a set of outputs. The multi-layer perceptron model is made of multiple layers of nodes, connected from the input layer through to the output layer as a directed graph. The network is trained via a technique referred to as back-propagation.

An MLP is a Deep Learning method used for problems that require supervised learning. Parallel distributed processing and computational neuroscience research also make use of the model. Having started off in the 1980s, MLPs have developed rather fast, finding applications in several areas including image recognition, speech recognition, and machine translation (DeepLearning, 2016).

Continued studies on multi-layer perceptrons have shown that they can approximate the XOR operator, in addition to a host of other non-linear functions. Individual perceptrons are of most use as building blocks that are combined into larger functions, as is the case with multi-layer perceptrons.

During your initial interactions with Deep Learning, you are better off starting at the MLP level. It could be considered the ‘hello world’ of Deep Learning: even though MLPs are somewhat complex, they are simpler to understand and to get some use out of.

When the MLP model is applied to a supervised learning problem, it is trained on input-output pairs and learns to model the dependencies between the inputs and outputs. Training adjusts the parameters, the weights and biases, with the aim of minimizing error. The back-propagation technique is what relates the error back to the weights and biases so that they can be adjusted. Root mean squared error (RMSE) is the most commonly used measure of the error.
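As a rough illustration of this training loop, the sketch below trains a tiny MLP on the XOR problem mentioned earlier, using NumPy. The hidden-layer size, learning rate, and epoch count are arbitrary choices for illustration only.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR input-output pairs: a classic non-linear problem a single perceptron cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 4 units; the weights and biases are the trainable parameters.
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
lr = 0.5

for epoch in range(5000):
    # Forward pass: the inputs flow through the hidden layer to the output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Error between prediction and target; RMSE is computed for monitoring.
    error = out - y
    rmse = np.sqrt(np.mean(error ** 2))

    # Backward pass: propagate the error back and adjust weights and biases.
    d_out = error * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(rmse, out.round(2))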

Cybenko’s theorem establishes multi-layer perceptrons as a kind of universal function approximator. As such, they can be relied on, in combination with regression analysis, to build mathematical models, and they are effective classifier algorithms (SkyMind, 2014).

As already seen, the MLP is a feed-forward network. Its operation resembles a game of ping-pong or tennis: two kinds of motion alternate with some consistency. There is a regular back and forth, similar to ping-pong, of guesses and answers, where a guess tests whatever the network currently knows, while the answer is feedback indicating where it might have gone wrong.

Convolutional Neural Networks (CNNs)

The development of convolutional neural networks (CNNs) was inspired by the manner in which the brain is wired and works. Our brains constantly analyze the world around us, making predictions about whatever we see without conscious effort. When we encounter something, we recall what we have seen in the past and label it accordingly. For instance, when you observe a baby shedding tears, you quickly conclude that the child is crying, because you have seen in the past that most children shed tears while crying. This kind of labeling is possible thanks to the complex interconnections between the brain and the eyes.

Just as the brain works in collaboration with the eyes, computers are able to ‘see’ the world, although from a different perspective. Even though their perception of an image is different from ours, it is possible to train them to recognize patterns just as we do. CNNs are the artificial neural networks used to empower algorithms to identify objects and images, and a significant part of their operation lies in the ‘convolution’ part of their name.

Although fully inspired by the brain, CNNs have an architecture that is different from that of other neural networks. In a regular neural network, an input goes through a series of hidden layers in order to be transformed. Each layer has a set of neurons connected to the neurons in the layers before and after it, all the way to the output layer, where predictions are made.

That is not the case with CNNs. For starters, the layers are organized in three dimensions: width, height, and depth. The neurons of a given layer are connected to those of neighboring layers, but only to a small region of them rather than to every neuron. The output, in turn, is reduced to a single vector of probability scores.

Convolutional Neural Networks are made up of two components: the feature extraction part and the classification part. In the feature extraction part, the network performs several convolution and pooling operations that lead to the detection of features. For example, if the picture of a cow is fed into the system, features like two ears, horns, and four legs would be recognized.

The classification part serves as the classifier, assigning a probability to each class the algorithm predicts for the detected image.
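A simplified sketch of both parts follows, in plain NumPy. The random stand-in image and the hand-picked edge-detecting kernel are purely illustrative; in a real CNN the kernels are learned during training and there are many of them, stacked in several layers.

import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(0, x)

def conv2d(image, kernel):
    # Slide the kernel over the image and record one response per position.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    # Keep only the strongest response in each size-by-size window.
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % size, :w - w % size]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Feature extraction part: convolution detects local patterns, pooling
# shrinks the feature map while keeping the strongest activations.
image = rng.random((8, 8))                  # stand-in for a grayscale input image
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])     # a simple vertical-edge detector
features = max_pool(relu(conv2d(image, edge_kernel)))

# Classification part: the flattened features feed a dense layer whose
# softmax output assigns a probability score to each candidate class.
flat = features.flatten()
W, b = rng.normal(size=(flat.size, 3)), np.zeros(3)
logits = flat @ W + b
probabilities = np.exp(logits) / np.exp(logits).sum()
print(probabilities)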

Recurrent Neural Networks (RNNs)

Our thinking does not always begin from scratch. For instance, when you read a book, you understand each word partly because you still have an understanding of the words that came before it. There is persistence in your thoughts: you do not discard everything and begin thinking afresh.

One major problem with traditional neural networks is that they are unable to do this. For instance, if you wanted to classify the different scenes of a film, it was not clear how a traditional neural network could use its reasoning about earlier scenes to inform its understanding of later ones. This issue is addressed by recurrent neural networks (RNNs): loops are introduced into the network in order to ensure that information persists.

The exact definition of a recurrent neural network may vary, but the facts are that it belongs to the family of artificial neural networks and that connections exist between its units, forming a directed graph along a sequence. In so doing, the network is able to express dynamic temporal behavior for a given time sequence. In contrast to feed-forward neural networks, recurrent neural networks process input sequences based on their internal state, which some experts prefer to call memory. This feature has led to their use in a number of tasks like next-word prediction, speech recognition, image captioning, music composition, handwriting recognition, time-series anomaly detection, and stock market prediction (Britz, 2015).

RNNs are the model to consider in situations where your model requires context in order to yield an output from the input.

To better understand this, consider people who like watching movie series. The fact that you have watched up to episode 3 of a given series means you have the context up to that point in time and can relate everything new to it (Banerjee, 2018).

This type of neural network is able to remember everything it has encountered. Whereas the inputs of other neural networks are independent, the inputs in RNNs are dependent on each other. By remembering these relations, the RNN becomes much better at predicting the output. As it goes on with its training, it retains a memory of all the relations, a feature that is achieved via loops.
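The sketch below, in plain NumPy with arbitrary sizes and random weights, illustrates the loop: the hidden state computed at one time step is fed back in at the next, so each output depends on the whole sequence seen so far.

import numpy as np

rng = np.random.default_rng(2)

# A toy sequence of 4 time steps, each step a 3-dimensional input vector.
sequence = rng.random((4, 3))

hidden_size = 5
W_xh = rng.normal(size=(3, hidden_size))             # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size))   # hidden-to-hidden weights (the loop)
W_hy = rng.normal(size=(hidden_size, 2))             # hidden-to-output weights
h = np.zeros(hidden_size)                            # the 'memory' starts empty

# Each step's output depends on the current input AND on the hidden state
# carried over from the previous steps, which is how context persists.
for x in sequence:
    h = np.tanh(x @ W_xh + h @ W_hh)
    y = h @ W_hy
print(y)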

The one thing that RNNs share with traditional neural networks is the manner in which they are trained: the back-propagation technique is used. However, a few twists are added, giving rise to a technique referred to as Back-Propagation Through Time (BPTT) (Banerjee, 2018).

The reputation of RNNs has grown as researchers look to them to solve real-life problems faced by industry.

Restricted Boltzmann Machines (RBMs)

Invented in 1986 by Paul Smolensky under the name Harmonium, restricted Boltzmann machines (RBMs) have undergone numerous refinements, including fast learning algorithms. This has given them a wide range of applications in classification, topic modeling, feature learning, collaborative filtering, and dimensionality reduction. Both supervised and unsupervised techniques can be used to train them, as determined by the task to be accomplished (Skymind, n.d.).

Just as indicated by their name, RBMs are developed from Boltzmann machines with the restriction that their neurons must form a bipartite graph. This means the neurons are split into visible units and hidden units, with symmetric connections running between the two groups and no connections between nodes within a group. This contrasts with ‘unrestricted’ Boltzmann machines, in which the hidden units may also be connected to one another. The restriction is important because it makes effective and efficient training algorithms possible. When RBMs are stacked up, deep belief networks are formed.

Restricted Boltzmann Machines belong to a group of models referred to as energy-based models (EBMs). EBMs work by associating a scalar energy with each configuration of the relevant variables. Learning happens through modification of the energy function so that its shape acquires desirable properties (Oppermann, 2018).
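For a binary RBM, for instance, the energy of a joint configuration of visible units v and hidden units h is commonly written as

E(v, h) = -a^T v - b^T h - v^T W h

where a and b are the biases of the visible and hidden units and W is the matrix of connection weights between the two layers. Lowering the energy of a configuration makes it more probable under the model.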

The structure of an RBM is simple. It has two layers: the visible layer, also referred to as the input layer, and the hidden layer. Both consist of nodes in which calculations occur. The nodes are interconnected across the layers, but two nodes within the same layer cannot be connected. In other words, there is no communication within a layer, hence the ‘restricted’ part of the name. Each visible node receives a feature so that learning can happen (Oppermann, 2018).

Each hidden node, for its part, receives the inputs and multiplies them by the associated weights, namely the weights assigned to the connections between the two layers. After the multiplication, the result goes through the activation function to yield a single output for each hidden node.
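The sketch below, in NumPy with an arbitrary network size and a made-up binary input, shows this computation for all hidden nodes at once: each hidden node's activation probability is the sigmoid of its weighted sum of the visible values plus its bias.

import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

num_visible, num_hidden = 6, 3
W = rng.normal(scale=0.1, size=(num_visible, num_hidden))  # weights between the two layers
b_hidden = np.zeros(num_hidden)                            # one bias per hidden node

# A binary feature vector placed on the visible (input) layer.
visible = np.array([1, 0, 1, 1, 0, 1], dtype=float)

# Each hidden node multiplies the visible values by its connection weights,
# sums them with its bias, and passes the result through the sigmoid
# activation to yield a single output per hidden node.
hidden_probabilities = sigmoid(visible @ W + b_hidden)
hidden_sample = (rng.random(num_hidden) < hidden_probabilities).astype(float)
print(hidden_probabilities, hidden_sample)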

Restricted Boltzmann Machines undergo training to enhance the accuracy of the probabilities they assign. The contrastive divergence (CD) algorithm is usually the preferred algorithm when it comes to RBM training.
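Here is a rough sketch of a single CD-1 update on one training example, again in NumPy with illustrative sizes. It is not a full training loop; it only shows the positive phase driven by the data, the reconstruction, and the resulting parameter updates.

import numpy as np

rng = np.random.default_rng(4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

num_visible, num_hidden = 6, 3
W = rng.normal(scale=0.1, size=(num_visible, num_hidden))
a = np.zeros(num_visible)    # visible biases
b = np.zeros(num_hidden)     # hidden biases
learning_rate = 0.1

v0 = np.array([1, 0, 1, 1, 0, 1], dtype=float)   # one training example

# Positive phase: drive the hidden layer from the data and sample its states.
h0_prob = sigmoid(v0 @ W + b)
h0 = (rng.random(num_hidden) < h0_prob).astype(float)

# Negative phase: reconstruct the visible layer, then re-drive the hidden layer.
v1_prob = sigmoid(h0 @ W.T + a)
h1_prob = sigmoid(v1_prob @ W + b)

# CD-1 updates: move the parameters toward the data statistics and away
# from the reconstruction statistics.
W += learning_rate * (np.outer(v0, h0_prob) - np.outer(v1_prob, h1_prob))
a += learning_rate * (v0 - v1_prob)
b += learning_rate * (h0_prob - h1_prob)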

Deep Belief Networks (DBNs)

Just as the name suggests, a Deep Belief Network (DBN) is a subclass of Deep Neural Network. It is made up of several layers of hidden units (latent variables), with connections between the layers but no connections between units within an individual layer (Wikipedia, n.d.).

Deep Belief Networks are formed by stacking RBMs and training them in a greedy, layer-by-layer fashion. These DBNs have the ability to learn from the training data how to extract a deep hierarchical representation.

A DBN has the distinctive ability to learn, unsupervised, how to reconstruct its inputs in a probabilistic manner, with the layers acting as feature detectors. Once this learning has been finalized, a Deep Belief Network can undergo extra supervised training to make it efficient at classification.

When researchers realized that it is possible to train DBNs smoothly, layer after layer, they were able to come up with some of the most advanced Deep Learning algorithms. This progress led to the emergence of a host of applications for Deep Belief Networks, including drug discovery and electroencephalography (DeepLearning, n.d.).

Geoffrey Hinton proposed a training method for RBMs referred to as contrastive divergence (CD). This is one of the most effective training methods when it comes to learning the weights. He proposed an update equation that is applied throughout the training process. This equation is as shown below (Wikipedia, n.d.):

w_ij(t + 1) = w_ij(t) + η( <v_i h_j>_data - <v_i h_j>_reconstruction )

Here η is the learning rate, and <v_i h_j> denotes how often visible unit i and hidden unit j are active together, measured on the training data and on the network's own reconstruction respectively.

After the first RBM has been trained, a second RBM can be stacked on top of it, with this new RBM receiving as its input the hidden activations produced by the first one once it has been successfully trained. The new RBM undergoes the same training that the first one went through, and the process is repeated until desirable results are obtained.
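The sketch below illustrates this greedy, layer-by-layer stacking in NumPy. The train_rbm helper is a deliberately simplified CD-1 loop that works with activation probabilities rather than sampled states, and the data and layer sizes are arbitrary.

import numpy as np

rng = np.random.default_rng(5)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, num_hidden, epochs=10, lr=0.1):
    # Train one RBM with a simplified CD-1 and return its weights and hidden biases.
    num_visible = data.shape[1]
    W = rng.normal(scale=0.1, size=(num_visible, num_hidden))
    a, b = np.zeros(num_visible), np.zeros(num_hidden)
    for _ in range(epochs):
        for v0 in data:
            h0 = sigmoid(v0 @ W + b)          # positive phase
            v1 = sigmoid(h0 @ W.T + a)        # reconstruction
            h1 = sigmoid(v1 @ W + b)          # negative phase
            W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
            a += lr * (v0 - v1)
            b += lr * (h0 - h1)
    return W, b

# Toy binary training data: 20 examples with 8 visible units each.
data = (rng.random((20, 8)) > 0.5).astype(float)

# Greedy layer-wise stacking: train the first RBM on the raw data, then feed
# its hidden activations to a second RBM as that RBM's 'visible' input.
W1, b1 = train_rbm(data, num_hidden=6)
hidden1 = sigmoid(data @ W1 + b1)

W2, b2 = train_rbm(hidden1, num_hidden=4)
hidden2 = sigmoid(hidden1 @ W2 + b2)
print(hidden2.shape)   # each example now has a deeper, 4-dimensional representation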

The CD training process has proven to be a viable option owing to the positive results which researchers have been able to get so far.
