This short article gives you a high-level overview of the AI technique known as **artificial neural networks** (ANNs). The objective is to convey intuition rather than rigour: enough, for example, to understand this Python code. After reading this, you might like to follow up with the Further Reading list below. I have made heavy use of Wikipedia (and the listed resources), but any errors are likely my own. There is a companion article over on Python3.codes: A Neural Network in Python, Part 1: sigmoid function, gradient descent & backpropagation

**Artificial Neural Networks** were first conceived of in the 1940s, but they were of only theoretical interest until some three decades later, when the invention of the backpropagation algorithm began to open up the possibilities for practical implementation in software. Along with tremendous advances in computer processing power and powerful numerical software libraries, it's now possible to write useful programs to solve a wide variety of tasks, such as computer vision and speech recognition.

A **biological neuron** is an electrically excitable cell that processes and transmits information through electrical and chemical signals. These signals between neurons occur via synapses, specialized connections with other cells. Neurons can connect to each other to form neural networks. Dendrites are hair-like extensions of the soma which act like *input channels*. These input channels receive their input through the synapses of other neurons. The soma processes these incoming signals and then sends that processed value into an output which is sent out to other neurons through the axon and the synapses.

An **artificial neural network** in AI is a program, or programming paradigm, inspired by the way the brain appears to work at a low level.

Each **neuron** receives a number of inputs. Each input’s value is multiplied by a *weighting factor* for its input channel, and all of the weighted inputs are summed. The initial weights are chosen at random.

This sum passes into an **activation function**, which determines the ‘firing value’ of the neuron. A typical activation function is the sigmoid function. A **sigmoid function** is a mathematical function having an “S”-shaped curve (**sigmoid curve**). Often, *sigmoid function* refers to the special case of the logistic function defined by the formula S(x) = 1/(1 + e^{-x}). There may be a threshold value which the sum must exceed for the neuron to fire.
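As a sketch, the weighted sum and sigmoid activation of a single neuron can be written in a few lines of Python (the input and weight values here are arbitrary illustrations):

```python
import math

def sigmoid(x):
    # logistic function: S(x) = 1 / (1 + e^(-x)), squashing any real x into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights):
    # multiply each input by its channel's weight, sum, then apply the activation
    total = sum(i * w for i, w in zip(inputs, weights))
    return sigmoid(total)

print(sigmoid(0))                              # 0.5, the midpoint of the S-curve
print(neuron_output([0.5, 0.9], [0.3, -0.2]))  # a firing value between 0 and 1
```

Note that sigmoid(0) is exactly 0.5, so a neuron whose weighted sum is zero fires at half strength.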

Neurons are grouped into **layers**, so that there is an *input layer*, an *output layer*, and in between are 1 or more *‘hidden’ layers*. The input layer is fully connected (each neuron’s output goes to all the neurons of the next layer) to the first hidden layer, which is then fully connected to the next layer, and so on.
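A fully connected forward pass through such layers can be sketched like this in Python (the layer sizes and weight values are arbitrary examples, and bias terms are omitted for brevity):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights):
    # weights[j] holds neuron j's incoming weights; full connectivity means
    # every neuron in this layer sees every value from the previous layer
    return [sigmoid(sum(i * w for i, w in zip(inputs, row))) for row in weights]

hidden_weights = [[0.1, 0.4], [-0.3, 0.2], [0.5, -0.1]]  # 3 hidden neurons x 2 inputs
output_weights = [[0.2, -0.4, 0.3]]                      # 1 output neuron x 3 hidden values

hidden = layer_forward([1.0, 0.5], hidden_weights)  # input layer -> hidden layer
output = layer_forward(hidden, output_weights)      # hidden layer -> output layer
print(output)  # one activation per output neuron, each between 0 and 1
```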

At the final layer (the output layer) these ‘guesses’ are compared to the expected results for each given input. Of course, these initial guesses are likely to be terribly wrong! So we compute the size of each error and the direction in which each weight needs to be adjusted, and **backpropagate** this information to the previous layers so that they can tweak their weights. This process is called gradient descent. Then we run the forward phase again. This is repeated many times, maybe even thousands or millions of times, and the guesses improve.

Gradient descent utilises a loss function (sometimes referred to as the **cost function** or **error function**), which maps values of one or more variables onto a real number intuitively representing some “cost” associated with the weight vector. It calculates the difference between the network’s actual output for a training example and the expected output, after the example has been propagated forward through the network.
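For example, a half-squared-error loss over a pair of output vectors might look like this in Python; the target and output values are made up for illustration:

```python
def squared_error(target, output):
    # half the squared Euclidean distance between the expected and actual
    # output vectors; the 1/2 cancels neatly when the loss is differentiated
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, output))

print(squared_error([1.0, 0.0], [0.8, 0.1]))  # 0.5 * ((0.2)^2 + (0.1)^2) = 0.025
```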

The output of a neuron depends on the weighted sum of all its inputs: y = x₁w₁ + x₂w₂, where w₁ and w₂ are the weights on the connections from the input units to the output unit (in the case where the neuron has two inputs). Therefore, the error also depends on the incoming weights to the neuron, which is ultimately what needs to be changed in the network to enable learning.

Consider an error (or *loss*, or *cost*) function E measuring the difference between two outputs. The standard choice is E(y, y′) = ½‖y − y′‖², the square of the Euclidean distance between the vectors y and y′. The error plots therefore take a parabolic shape. The factor of ½ conveniently cancels the exponent of 2 when the error function is subsequently differentiated. The partial derivative with respect to the outputs is ∂E/∂y′ = y′ − y.
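That derivative is easy to verify numerically in the scalar case; the target of 1.0, prediction of 0.3, and step size below are arbitrary choices:

```python
def loss(y, y_pred):
    # half squared error for a single output
    return 0.5 * (y_pred - y) ** 2

y, y_pred, h = 1.0, 0.3, 1e-6
numeric = (loss(y, y_pred + h) - loss(y, y_pred - h)) / (2 * h)  # central difference
analytic = y_pred - y                                            # dE/dy' = y' - y
print(numeric, analytic)  # both close to -0.7
```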

If each weight is plotted on a separate horizontal axis and the error on the vertical axis, the result is a parabolic bowl. For a neuron with k weights, the same plot would require an elliptic paraboloid of k + 1 dimensions. The backpropagation algorithm aims to find the set of weights that minimizes the error.

In pseudocode:

```
initialize network weights (often small random values)
do
    for each training example named ex
        prediction = neural-net-output(network, ex)   // forward pass
        actual = teacher-output(ex)
        compute error (prediction - actual) at the output units
        compute Δw_h for all weights from hidden layer to output layer   // backward pass
        compute Δw_i for all weights from input layer to hidden layer
        update network weights
until stopping criterion satisfied
return the network
```
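A minimal Python version of that loop, under assumptions I've chosen for illustration (a 2-3-1 network with sigmoid activations and biases, half-squared-error loss, a learning rate of 0.5, and XOR as the training set), might look like this:

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# XOR training set: each example is (inputs, expected output)
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

H = 3  # hidden-layer size (an arbitrary choice)
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]  # input -> hidden
b_h = [random.uniform(-1, 1) for _ in range(H)]
w_o = [random.uniform(-1, 1) for _ in range(H)]                      # hidden -> output
b_o = random.uniform(-1, 1)
lr = 0.5  # learning rate

def forward(x):
    h = [sigmoid(sum(xi * wi for xi, wi in zip(x, w_h[j])) + b_h[j]) for j in range(H)]
    o = sigmoid(sum(hj * wj for hj, wj in zip(h, w_o)) + b_o)
    return h, o

def total_error():
    # half squared error summed over the whole training set
    return sum(0.5 * (forward(x)[1] - t) ** 2 for x, t in examples)

initial_error = total_error()

for epoch in range(10000):
    for x, t in examples:
        h, o = forward(x)                       # forward pass
        d_o = (o - t) * o * (1 - o)             # output delta: error * sigmoid'
        d_h = [d_o * w_o[j] * h[j] * (1 - h[j]) for j in range(H)]  # backpropagated deltas
        for j in range(H):                      # gradient-descent weight updates
            w_o[j] -= lr * d_o * h[j]
            for k in range(2):
                w_h[j][k] -= lr * d_h[j] * x[k]
            b_h[j] -= lr * d_h[j]
        b_o -= lr * d_o

final_error = total_error()
print(round(initial_error, 4), "->", round(final_error, 4))  # the error shrinks as training proceeds
```

Rerunning with a different seed, layer size, or epoch count changes how quickly (and how well) the network learns, which is why training is typically repeated thousands of times.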

This structure can be shown to be capable of **learning** tasks such as performing fundamental logical operations, approximating mathematical functions, playing games, and recognising handwriting, speech, and items in pictures. The field has advanced incredibly in recent years, to the point that these programs can match or even surpass human experts. This is in contrast to ‘algorithmic’ programming, which depends on explicitly specifying the steps required to solve a problem. That approach can easily become too complex and unwieldy for practical purposes, whereas an ANN operates more like humans do: by learning from examples, lots of them.

## Further Reading

- Neural Networks and Deep Learning
- Artificial neural networks are changing the world. What are they?
- The Nature of Code: Chapter 10. Neural Networks
- Neural Networks Demystified
- A Neural Network in Python, Part 1: sigmoid function, gradient descent & backpropagation

**Artificial neural networks** (**ANNs**) or **connectionist systems** are a computational model used in computer science and other research disciplines, which is based on a large collection of simple neural units (artificial neurons), loosely analogous to the observed behavior of a biological brain's axons. Each neural unit is connected with many others, and links can enhance or inhibit the activation state of adjoining neural units. Each individual neural unit computes using a summation function. There may be a threshold function or limiting function on each connection and on the unit itself, such that the signal must surpass the limit before propagating to other neurons. These systems are self-learning and trained, rather than explicitly programmed, and excel in areas where the solution or feature detection is difficult to express in a traditional computer program.

Neural networks typically consist of multiple layers or a cube design, and the signal path traverses from the first (input) to the last (output) layer of neural units. Backpropagation is the use of forward stimulation to reset weights on the "front" neural units, and this is sometimes done in combination with training where the correct result is known. More modern networks are a bit more free-flowing in terms of stimulation and inhibition, with connections interacting in a much more chaotic and complex fashion. Dynamic neural networks are the most advanced, in that they can dynamically, based on rules, form new connections and even new neural units while disabling others.

The goal of the neural network is to solve problems in the same way that the human brain would, although several neural networks are more abstract. Modern neural network projects typically work with a few thousand to a few million neural units and millions of connections, which is still several orders of magnitude less complex than the human brain and closer to the computing power of a worm.

New brain research often stimulates new patterns in neural networks. One new approach is using connections which span much further and link processing layers, rather than always being localized to adjacent neurons. Other research explores the different types of signal over time that axons propagate; approaches such as deep learning interpolate greater complexity than a set of boolean variables being simply on or off.

Neural networks are based on real numbers, with the value of the core and of the axon typically being a representation between 0.0 and 1.0.

An interesting facet of these systems is that they are unpredictable in their success with self-learning. After training, some become great problem solvers and others don't perform as well. In order to train them, several thousand cycles of interaction typically occur.

Like other machine learning methods – systems that learn from data – neural networks have been used to solve a wide variety of tasks, like computer vision and speech recognition, that are hard to solve using ordinary rule-based programming.

Historically, the use of neural network models marked a directional shift in the late eighties from high-level (symbolic) artificial intelligence, characterized by expert systems with knowledge embodied in *if-then* rules, to low-level (sub-symbolic) machine learning, characterized by knowledge embodied in the parameters of a cognitive model with some dynamical system.