# Artificial neural networks in a nutshell

An artificial neural network (often shortened to ANN) is an attempt at representing, in software, the network of neurons in the brain and how it functions. It consists of interconnected neural units (*artificial neurons*), each capable of receiving input and producing output according to some rules. The fun thing about ANNs, and the whole point of them, is that they are meant to mimic "learning". Inputs to the network are weighted, and the learning mechanism consists of iteratively self-adjusting the weights so as to achieve the best possible correspondence with the desired result on training data. In this way, the network emulates the synapses of the brain in their capability to carry information from one neuron to another.

The biological metaphor behind these algorithms is more of an inspiration than a grounded reality. ANNs were conceived with the idea of mimicking how the human brain works, but in reality they are a far cry from doing this comprehensively. Besides, we do not yet know the human brain well enough anyway: it is a very complex system, hard to represent accurately in a simplified model. In fact, it is actually misleading to say that neural networks "mimic the brain", as the brain does not really work the way ANNs do. The discussion of this in the opening chapter of Chollet's book is a very good one. Among the differences:

- real neurons are slower than artificial ones, but there are a great many of them (a human brain contains on the order of 100 billion neurons), and the way they communicate is non-trivial
- real networks use energy very efficiently
- real networks can perform several highly complex operations at the same time

The figure shows the generic, schematic model of an artificial neuron. Several inputs $(x_1, \ldots, x_n)$ are streamed into the neuron, and a *transfer function* $f$ (also called *activation function*) combines them with weights $(w_1, w_2, \ldots, w_n)$, usually in a linear combination $\sum_{i=1}^{n} w_i x_i$, to determine what the neuron computes. An *output function* then produces the neuron's output based on a threshold $t$ the neuron is equipped with. See the page on the sigmoid neuron for a more precise definition of the typical activation and transfer functions, with sigmoid neurons.
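A minimal sketch of the neuron just described, assuming the transfer function is the plain linear combination of inputs and weights and the output function is a step that fires when that combination reaches the threshold $t$. The function names `transfer` and `neuron_output` are hypothetical, chosen for illustration:

```python
def transfer(inputs, weights):
    """Transfer function f: linear combination of the inputs with the weights."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    return sum(w * x for w, x in zip(weights, inputs))


def neuron_output(inputs, weights, threshold):
    """Step output function: the neuron fires (1) if the combination reaches t, else 0."""
    return 1 if transfer(inputs, weights) >= threshold else 0


# Example: three inputs, hand-picked weights, threshold t = 1.0
x = [1.0, 0.0, 1.0]
w = [0.5, 0.25, 0.75]
print(transfer(x, w))            # 1.25
print(neuron_output(x, w, 1.0))  # 1
```

Swapping the hard step for a smooth function of the combination is what leads to the sigmoid neuron.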

A (feedforward, see below) neural network. Image from Nielsen's book.

To build a network of neurons, you put several of them together in a way that lets them communicate. Neurons are grouped into *layers*, groups at the same level, so that communication passes from one layer of neurons to the next. In a network there is:

- an *input* layer: the one at the start of the communication process
- an *output* layer: the one that produces the final result at the end of the process
- one or more *hidden* layers: the layers in between, which constitute the intermediate steps

Each layer can be composed of as many neurons as you wish. In a fully connected network, if there are $n$ neurons at a given stage, each neuron in the following stage will receive $n$ inputs. In much the same way as the transfer function combines the weighted inputs into a neuron, the input to any neuron in a given layer is a weighted sum of all the outputs of the neurons in the previous layer. Learning is evaluated through a *loss function* of the network, which measures how far its output is from the ground truth.

This is no more than a super-quick, very high-level introduction to several types of neural networks, the details of which are explored elsewhere in this chapter. You can find a more comprehensive outline of the different types of networks in the Neural Network Zoo, with great coloured illustrations by F Van Veen. The article also lists some important papers about the networks it mentions.

Note that the categories of networks listed here are not necessarily mutually exclusive, because they may describe different properties of the network. For example, feedforward networks can be deep or not deep.

In a feedforward network, information flows in one direction only: the output of the neurons in a certain layer is passed forward to the neurons in the next layer, and nothing goes backwards.
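A minimal sketch of a forward pass through a small fully connected feedforward network. It assumes sigmoid activations (see the sigmoid neuron page) and a bias per neuron, which plays the role of the threshold with opposite sign. The layer sizes, the weights and the helper names `layer_forward` and `forward` are hypothetical. Each neuron takes a weighted sum of all outputs of the previous layer, and values only move from one layer to the next:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One fully connected layer: weights[j] holds the weights neuron j gives to every input."""
    return [
        sigmoid(sum(w * x for w, x in zip(w_row, inputs)) + b)
        for w_row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Feed x through each (weights, biases) layer in order; nothing flows backwards."""
    activation = x
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation


# 2 inputs -> hidden layer of 3 neurons -> output layer of 1 neuron
network = [
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.2]),  # hidden layer
    ([[1.0, -1.0, 0.5]], [0.05]),                                  # output layer
]
print(forward([1.0, 2.0], network))  # a list holding one value between 0 and 1
```

Stacking more `(weights, biases)` pairs in `network` adds hidden layers without changing the code.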

Feedforward networks of artificial neurons date back to the birth of the perceptron (see page), in the 1950s.

Recurrent networks have loops, so the output of a neuron can be fed back to the neuron itself, allowing for the dynamism which is missing in the feedforward model. These networks have time built into them: neurons fire only within a specific window of time, so that feedback is not propagated instantaneously (which would be difficult to control). Recurrent networks have a concept of *memory*, and there are several types of them. They were born in the 1980s and are particularly suited to problems with a temporal or sequential component, like those dealing with natural language.
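A minimal sketch of a simple (Elman-style) recurrent unit, assuming a tanh activation, a scalar input and a scalar state. The function name `rnn_run` and the weight values `w_in`, `w_rec` and `b` are hypothetical. The state computed at one time step is fed back in at the next, which is where the "memory" comes from:

```python
import math


def rnn_run(sequence, w_in=0.8, w_rec=0.5, b=0.0):
    """Process a sequence one element at a time; h carries information between steps."""
    h = 0.0  # initial state: nothing remembered yet
    states = []
    for x in sequence:
        # the new state depends on the current input AND on the previous state (the loop)
        h = math.tanh(w_in * x + w_rec * h + b)
        states.append(h)
    return states


print(rnn_run([1.0, 0.0, 0.0]))  # the state stays non-zero after the input drops to 0
print(rnn_run([1.0, 0.0])[-1], rnn_run([0.0, 1.0])[-1])  # same inputs, different order
```

After the first input the state decays but does not vanish, and the same inputs in a different order give a different final state: the output depends on what came before, not only on the current input.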

Deep neural networks are the beasts performing *deep learning*, the (relatively) new trend in machine learning and artificial intelligence which is starting to tackle very complicated problems with impressive results. Being deep simply means having several hidden layers, which allows for enormous complexity. Networks that are not deep are called *shallow*. Deep learning as a field is not a new concept, as it dates back to the 1980s, but it had a big resurgence in 2006, when deep networks were finally shown to be capable of learning efficiently. Before then, research on deep architectures had not reached the point where these tools could be put to any practical use, due to their time complexity and overall lack of efficiency.

Convolutional networks are deep and feedforward. In their convolutional layers, not every neuron is connected to every neuron of the previous layer, and the output is obtained via a convolution operation on the input data. Convolutional networks are well suited to vision tasks, that is, where the input data consists of images: for these sorts of tasks, a "normal" fully connected feedforward network would typically need too many operations and be too large to be of any practical use, while convolutions reduce the complexity.
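To put rough numbers on the savings, take an illustrative case (the sizes are assumptions, not a specific published architecture): a $28 \times 28$ grayscale image feeding either a fully connected layer of 30 neurons, or a convolutional layer with 20 feature maps, each using one shared $5 \times 5$ filter:

$$
\underbrace{(28 \times 28) \times 30}_{\text{fully connected}} = 23\,520 \text{ weights}
\qquad \text{vs} \qquad
\underbrace{20 \times (5 \times 5)}_{\text{convolutional}} = 500 \text{ weights}
$$

In both cases there is additionally one bias per neuron or per feature map. The saving comes from weight sharing: every position in a feature map reuses the same filter.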

The inspiration for this category of networks came from the vision systems of the biological world, which is why they were designed specifically for machine vision tasks. An image is passed to the network in small patches of input data: at the very start, a first patch of $n$ pixels goes in, then the window is shifted by one pixel and a second patch of $n$ pixels goes in, and so on. This mechanism is loosely borrowed from what the neurons in the visual cortex do: each of them deals with only a certain part of the visual field at once, that is, with a pixel and its neighbours. The first convolutional networks date from the 1990s (even though the concepts are decades older), but they became ubiquitous in the 2010s with the many visual applications they serve nowadays. In fact, they are particularly suited to image tasks as they exhibit a natural ability to capture spatial structure.
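The shift-by-one-pixel mechanism described above can be sketched as a 2D convolution with stride 1 and no padding. As is common in deep learning libraries, this is technically a cross-correlation (the kernel is not flipped). The function name, the image and the kernel are hypothetical:

```python
def convolve2d(image, kernel):
    """Slide kernel over image one pixel at a time; each output is a weighted sum of one patch."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):          # shift the window down one row at a time
        row = []
        for j in range(out_w):      # shift the window right one pixel at a time
            patch_sum = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(patch_sum)
        output.append(row)
    return output


# A 4x4 image with a vertical edge, and a 2x2 kernel that responds to left-to-right changes
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 1],
    [-1, 1],
]
print(convolve2d(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The same four kernel weights are reused at every window position, and the output lights up only where the edge is: a small, local detector applied across the whole image.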

The training process of a network is no different from the general training process of supervised algorithms: you want to minimise a cost function that gives a comprehensive measure of the mistakes (the differences between predicted and real values). You do this via gradient descent (see page), which iteratively minimises this error function, that is, the difference between the expected output and the output the network produces.
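A minimal sketch of gradient descent minimising a mean squared error cost, on the simplest possible "network": a single linear neuron $\hat{y} = w x + b$. The toy data (generated from $y = 2x + 1$), the learning rate `lr` and the number of steps are assumptions chosen for illustration:

```python
def mse(w, b, xs, ys):
    """Mean squared difference between the neuron's outputs and the true values."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # partial derivatives of the MSE with respect to w and b
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        dw = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        db = 2 * sum(errors) / n
        # step against the gradient, scaled by the learning rate
        w -= lr * dw
        b -= lr * db
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [2 * x + 1 for x in xs]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # close to 2 and 1
```

Too large a learning rate makes the updates overshoot and diverge, and too small a one makes convergence slow, which is why it is treated as a hyperparameter.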

Neural networks are equipped with a mechanism called backpropagation (see page), which acts during gradient descent and allows the weights to be adjusted iteratively in order to improve the accuracy of the results (minimising the loss function). Adjusting the weights is what the *optimiser* of the network is tasked with. What backpropagation does in practice is compute the derivatives of the cost function with respect to the weights, propagating them from the last layer back to the first. Each weight is then modified by an amount proportional to the derivative of the cost function with respect to it, which is what gradient descent uses. At the very start of training, the weights are initialised at random.
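A minimal sketch of backpropagation on a tiny network: one input, one sigmoid hidden neuron and one sigmoid output neuron, assuming a squared-error cost $C = \frac{1}{2}(a_2 - y)^2$ on a single training example. The network shape, the cost, the training example and the learning rate `lr` are assumptions for illustration. The derivative is computed at the output first, then passed back to the hidden layer via the chain rule, and the weights start at random values:

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    w1, b1, w2, b2 = params
    a1 = sigmoid(w1 * x + b1)   # hidden neuron
    a2 = sigmoid(w2 * a1 + b2)  # output neuron
    return a1, a2


def cost(params, x, y):
    _, a2 = forward(params, x)
    return 0.5 * (a2 - y) ** 2


def backprop(params, x, y):
    """Return dC/d(w1, b1, w2, b2), computed from the last layer back to the first."""
    w2 = params[2]
    a1, a2 = forward(params, x)
    # output layer: dC/dz2 = (a2 - y) * sigmoid'(z2), with sigmoid'(z) = a * (1 - a)
    delta2 = (a2 - y) * a2 * (1 - a2)
    # hidden layer: propagate delta2 backwards through the weight w2
    delta1 = delta2 * w2 * a1 * (1 - a1)
    return [delta1 * x, delta1, delta2 * a1, delta2]


random.seed(0)
params = [random.uniform(-1, 1) for _ in range(4)]  # random initialisation
x, y, lr = 1.5, 0.2, 0.5

before = cost(params, x, y)
for _ in range(200):
    grads = backprop(params, x, y)
    # gradient descent: each weight moves by an amount proportional to its derivative
    params = [p - lr * g for p, g in zip(params, grads)]
print(before, cost(params, x, y))  # the cost goes down
```

Computing `delta2` once and reusing it for the hidden layer is the whole point of backpropagation: the derivatives for earlier layers are built from those already computed for later ones.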

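It can be shown (in fact it has been mathematically proven, see Cybenko's paper) that a neural network with a single hidden layer of sigmoid neurons can approximate any continuous function on a bounded domain to arbitrary accuracy, provided it has enough hidden neurons: broadly, the more neurons, the better the approximation that can be achieved.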
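The intuition behind this result can be sketched numerically. This is an illustration, not Cybenko's proof, and the network is built by hand rather than trained. Pairs of steep sigmoid neurons in a single hidden layer form "bumps", and a weighted sum of bumps approximates a continuous function on $[0, 1]$ more closely as neurons are added. The target $\sin(2\pi x)$, the steepness factor and the bump construction are assumptions chosen for illustration:

```python
import math


def sigmoid(z):
    """Numerically stable logistic function (avoids overflow for very steep neurons)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def approximate(f, n_bumps, x):
    """One hidden layer of 2 * n_bumps steep sigmoid neurons and a linear output neuron."""
    steepness = 50.0 * n_bumps  # large hidden-layer weight, so each neuron acts like a step
    total = 0.0
    for k in range(n_bumps):
        left, right = k / n_bumps, (k + 1) / n_bumps
        # two hidden neurons: a step up at `left` minus a step up at `right` gives a bump
        bump = sigmoid(steepness * (x - left)) - sigmoid(steepness * (x - right))
        # output weight = height of the bump = target value at the interval midpoint
        total += f((left + right) / 2) * bump
    return total


def max_error(f, n_bumps, samples=1000):
    xs = [i / samples for i in range(samples + 1)]
    return max(abs(f(x) - approximate(f, n_bumps, x)) for x in xs)


def target(x):
    return math.sin(2 * math.pi * x)


for n in (5, 20, 80):
    print(n, "bumps ->", 2 * n, "hidden neurons, max error", round(max_error(target, n), 3))
```

The maximum error shrinks as the hidden layer grows, in line with "the more neurons, the better the approximation".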

- Comparison of artificial neural networks and human brains on solving number series, from a lecture at the University of Bamberg
- G Cybenko, **Approximation by superpositions of a sigmoidal function**, *Mathematics of Control, Signals and Systems*, 2, 1989
- F Chollet, **Deep Learning with Python**, *Manning*, 2017