Artificial Neural Network (ANN)

 An Artificial Neural Network (ANN) is a computational model inspired by the structure and functioning of the human brain. It consists of layers of interconnected nodes (called neurons) that process information in a way similar to biological neurons. These networks are capable of learning complex patterns from data by adjusting the connections (called weights) between neurons through training.

ANNs are widely used in tasks such as image recognition, natural language processing, speech recognition, and robotics, where traditional algorithms struggle. By mimicking the brain’s ability to learn from experience, ANNs form the foundation of deep learning and have become a key tool in modern artificial intelligence applications.

From Biological to Artificial Neurons

Surprisingly, ANNs have been around for quite a while: they were first introduced back in 1943 by the neurophysiologist Warren McCulloch and the mathematician Walter Pitts. In their landmark paper “A Logical Calculus of the Ideas Immanent in Nervous Activity,” McCulloch and Pitts presented a simplified computational model of how biological neurons might work together in animal brains to perform complex computations using propositional logic. This was the first artificial neural network architecture. Since then, many other architectures have been invented, as we will see.

🧠 Landmarks in the History of Artificial Neural Networks

  1. 1943 – McCulloch & Pitts Model

    • Warren McCulloch and Walter Pitts proposed the first mathematical model of a neuron, laying the foundation for neural networks.

  2. 1958 – Perceptron by Frank Rosenblatt

    • Introduced the Perceptron algorithm, the first neural network model capable of learning.

  3. 1969 – Perceptron Criticism by Minsky & Papert

    • Published a book showing the limitations of single-layer perceptrons (e.g., inability to solve XOR), which caused interest in ANNs to decline.

  4. 1986 – Backpropagation Algorithm (Rumelhart, Hinton, and Williams)

    • Reignited interest by enabling multi-layer neural networks (MLPs) to learn through error backpropagation.

  5. 1998 – LeNet-5 by Yann LeCun

    • A successful Convolutional Neural Network (CNN) for digit recognition (used in reading ZIP codes and checks).

  6. 2006 – Deep Learning Breakthrough (Hinton et al.)

    • Geoffrey Hinton introduced Deep Belief Networks, proving that deeper networks could be trained effectively using pretraining.

  7. 2012 – AlexNet Wins ImageNet Competition

    • A deep CNN (AlexNet) dramatically outperformed other methods in image classification, marking the deep learning revolution.

  8. 2014–Present – Rapid Growth and Applications

    • Recurrent Neural Networks (RNNs), LSTMs, GANs, Transformers (e.g., BERT, GPT) became widely used in language, vision, and AI applications.



🚀 Reasons for the Advancement of Artificial Neural Networks

  1. Availability of Large Datasets (Big Data)

    • Modern ANN models require massive amounts of data to learn effectively, and the internet, sensors, and user data provide this in abundance.

  2. Improved Computational Power (GPUs/TPUs)

    • High-performance computing devices like Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) accelerate training of deep networks.

  3. Advanced Algorithms and Techniques

    • Innovations such as backpropagation, dropout, ReLU activation, batch normalization, and optimizers (e.g., Adam) made training deep networks feasible.

  4. Better Software Frameworks

    • Open-source libraries like TensorFlow, PyTorch, and Keras have made building and experimenting with neural networks easier than ever.

  5. Cloud Computing and Distributed Training

    • Neural networks can now be trained on powerful cloud infrastructures, allowing researchers and industries to scale models quickly.

  6. Breakthrough Research in Deep Learning

    • Research into CNNs, RNNs, LSTMs, GANs, and Transformers has opened the door to solving complex tasks like vision, speech, and language understanding.

  7. Strong Industry and Academic Collaboration

    • Tech giants (Google, Microsoft, OpenAI, Meta) and top universities have pushed the field forward through joint innovation.

  8. Successful Real-World Applications

    • Impressive results in image recognition, machine translation, autonomous driving, and healthcare have demonstrated the practical value of ANNs.

Biological Neurons

Before we discuss artificial neurons, let’s take a quick look at a biological neuron (represented in Figure 10-1). It is an unusual-looking cell mostly found in animal brains. It’s composed of a cell body containing the nucleus and most of the cell’s complex components, many branching extensions called dendrites, plus one very long extension called the axon. The axon’s length may be just a few times longer than the cell body, or up to tens of thousands of times longer. Near its extremity the axon splits off into many branches called telodendria, and at the tip of these branches are minuscule structures called synaptic terminals (or simply synapses), which are connected to the dendrites or cell bodies of other neurons. 



Biological neurons produce short electrical impulses called action potentials (APs, or just signals) which travel along the axons and make the synapses release chemical signals called neurotransmitters. When a neuron receives a sufficient amount of these neurotransmitters within a few milliseconds, it fires its own electrical impulses (actually, it depends on the neurotransmitters, as some of them inhibit the neuron from firing).

Thus, individual biological neurons seem to behave in a rather simple way, but they are organized in a vast network of billions, with each neuron typically connected to thousands of other neurons. Highly complex computations can be performed by a network of fairly simple neurons, much like a complex anthill can emerge from the combined efforts of simple ants. The architecture of biological neural networks (BNNs) is still the subject of active research, but some parts of the brain have been mapped, and it seems that neurons are often organized in consecutive layers, especially in the cerebral cortex (i.e., the outer layer of your brain), as shown in Figure 10-2.


Logical Computations with Neurons

McCulloch and Pitts proposed a very simple model of the biological neuron, which later became known as an artificial neuron: it has one or more binary (on/off) inputs and one binary output. The artificial neuron activates its output when more than a certain number of its inputs are active. In their paper, they showed that even with such a simplified model it is possible to build a network of artificial neurons that
computes any logical proposition you want. To see how such a network works, let’s build a few ANNs that perform various logical computations (see Figure 10-3), assuming that a neuron is activated when at least two of its inputs are active.


Let’s see what these networks do:
• The first network on the left is the identity function: if neuron A is activated, then neuron C gets activated as well (since it receives two input signals from neuron A); but if neuron A is off, then neuron C is off as well.
• The second network performs a logical AND: neuron C is activated only when both neurons A and B are activated (a single input signal is not enough to activate neuron C).
• The third network performs a logical OR: neuron C gets activated if either neuron A or neuron B is activated (or both).
• Finally, if we suppose that an input connection can inhibit the neuron’s activity (which is the case with biological neurons), then the fourth network computes a slightly more complex logical proposition: neuron C is activated only if neuron A is active and neuron B is off. If neuron A is active all the time, then you get a logical NOT: neuron C is active when neuron B is off, and vice versa. You can imagine how these networks can be combined to compute complex logical expressions
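The four networks above can be sketched in code. The following is an illustrative model (the function names are made up for this example): a neuron fires when at least two of its excitatory inputs are active, and any active inhibitory input vetoes firing, as described in the fourth network.

```python
# McCulloch-Pitts-style binary neuron: fires when at least `threshold`
# excitatory inputs are active, unless an inhibitory input is active.

def mp_neuron(excitatory, inhibitory=(), threshold=2):
    if any(inhibitory):          # an active inhibitory signal blocks firing
        return 0
    return 1 if sum(excitatory) >= threshold else 0

def identity(a):        # C receives two copies of A's signal
    return mp_neuron([a, a])

def logical_and(a, b):  # C needs both A and B to be active
    return mp_neuron([a, b])

def logical_or(a, b):   # each input sends two signals, so one active input suffices
    return mp_neuron([a, a, b, b])

def a_and_not_b(a, b):  # A excites C twice; B inhibits C
    return mp_neuron([a, a], inhibitory=[b])

print(logical_and(1, 1), logical_or(0, 1), a_and_not_b(1, 1))  # 1 1 0
```

Combining such neurons, exactly as the text suggests, lets you build up arbitrary logical expressions.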

The Perceptron

The Perceptron is one of the simplest ANN architectures, invented in 1957 by Frank Rosenblatt. It is based on a slightly different artificial neuron (see Figure 10-4) called a threshold logic unit (TLU), or sometimes a linear threshold unit (LTU). The inputs and output are numbers (instead of binary on/off values), and each input connection is associated with a weight. The TLU computes a weighted sum of its inputs $(z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = x^T w)$, then applies a step function to that sum and outputs the result: $h_w(x) = \mathrm{step}(z)$, where $z = x^T w$.

The most common step function used in Perceptrons is the Heaviside step function (see Equation 10-1). Sometimes the sign function is used instead.
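Equation 10-1 itself is not reproduced above; the standard definitions of these two step functions are:

```latex
\mathrm{heaviside}(z) =
\begin{cases}
0 & \text{if } z < 0 \\
1 & \text{if } z \ge 0
\end{cases}
\qquad\qquad
\operatorname{sgn}(z) =
\begin{cases}
-1 & \text{if } z < 0 \\
0 & \text{if } z = 0 \\
+1 & \text{if } z > 0
\end{cases}
```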


A single TLU can be used for simple linear binary classification. It computes a linear combination of the inputs, and if the result exceeds a threshold, it outputs the positive class; otherwise it outputs the negative class (just like a Logistic Regression or linear SVM classifier). Training a TLU in this case means finding the right values for $w_0, w_1$, and $w_2$.
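As a quick sketch of such a classifier (with hand-picked weights rather than trained ones, purely for illustration), a TLU with a bias input $x_0 = 1$ and a Heaviside step output can be written as:

```python
import numpy as np

# Illustrative TLU binary classifier: z = w0*1 + w1*x1 + w2*x2,
# output 1 (positive class) if z >= 0, else 0 (negative class).

def heaviside(z):
    return (z >= 0).astype(int)

def tlu_predict(X, w):
    """X: (n_samples, 2) feature matrix; w: [w0, w1, w2], w0 is the bias weight."""
    X_b = np.c_[np.ones(len(X)), X]   # prepend the bias input x0 = 1
    return heaviside(X_b @ w)

# Example: classify points above the line x1 + x2 = 1 as positive
w = np.array([-1.0, 1.0, 1.0])
X = np.array([[0.0, 0.0], [2.0, 1.0]])
print(tlu_predict(X, w))  # [0 1]
```

Training would consist of adjusting `w` until the predictions match the labels, which is exactly what the Perceptron learning rule below does.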

A Perceptron is simply composed of a single layer of TLUs, with each TLU connected to all the inputs. When all the neurons in a layer are connected to every neuron in the previous layer (i.e., its input neurons), the layer is called a fully connected layer, or a dense layer. The inputs of the Perceptron are fed to special passthrough neurons called input neurons: they output whatever input they are fed. All the input neurons form the input layer. Moreover, an extra bias feature is generally added ($x_0 = 1$): it is
typically represented using a special type of neuron called a bias neuron, which outputs 1 all the time. A Perceptron with two inputs and three outputs is represented in Figure 10-5. This Perceptron can classify instances simultaneously into three different binary classes, which makes it a multioutput classifier.


Thanks to the magic of linear algebra, Equation 10-2 makes it possible to efficiently compute the outputs of a layer of artificial neurons for several instances at once.
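Equation 10-2 is not reproduced above; the standard form for computing the outputs of a fully connected layer is:

```latex
h_{W, b}(X) = \phi(X W + b)
```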
In this equation:
• As always, $X$ represents the matrix of input features. It has one row per instance and one column per feature.
• The weight matrix $W$ contains all the connection weights except for the ones from the bias neuron. It has one row per input neuron and one column per artificial neuron in the layer.
• The bias vector $b$ contains all the connection weights between the bias neuron and the artificial neurons. It has one bias term per artificial neuron.
• The function $\phi$ is called the activation function: when the artificial neurons are TLUs, it is a step function (but we will discuss other activation functions shortly).
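The computation $\phi(XW + b)$ described by these bullets can be sketched directly in NumPy (the values below are made up for illustration):

```python
import numpy as np

# Computing a fully connected layer's outputs for several instances at once.

def step(z):
    return (z >= 0).astype(int)   # Heaviside step as the activation phi

X = np.array([[1.0, 2.0],         # 2 instances, one row each, 2 features
              [3.0, 4.0]])
W = np.array([[0.5, -1.0, 0.2],   # one row per input, one column per neuron
              [0.3,  0.4, -0.6]])
b = np.array([-1.0, 0.0, 0.5])    # one bias term per neuron

outputs = step(X @ W + b)         # shape: (2 instances, 3 neurons)
print(outputs)
```

A single matrix multiplication computes the weighted sums for every instance and every neuron in the layer at once, which is what makes this formulation efficient.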

So, how is a Perceptron trained? The Perceptron training algorithm proposed by Rosenblatt was largely inspired by Hebb’s rule: when a biological neuron often triggers another neuron, the connection between these two neurons grows stronger (“cells that fire together, wire together”); that is, the connection weight between two neurons tends to increase when they fire simultaneously. This rule later became known as Hebb’s rule (or Hebbian learning). Perceptrons are trained using a variant of this rule that takes into account the error made by the network when it makes a prediction; the Perceptron learning rule reinforces connections that help reduce the error. More specifically, the Perceptron is fed one training instance at a time, and for each instance it makes its predictions. For every output neuron that produced a wrong prediction, it reinforces the connection weights from the inputs that would have contributed to the correct prediction. The rule is shown in Equation 10-3.



The decision boundary of each output neuron is linear, so Perceptrons are incapable of learning complex patterns (just like Logistic Regression classifiers). However, if the training instances are linearly separable, Rosenblatt demonstrated that this algorithm would converge to a solution. This is called the Perceptron convergence theorem.

In their 1969 monograph Perceptrons, Marvin Minsky and Seymour Papert highlighted a number of serious weaknesses of Perceptrons—in particular, the fact that they are incapable of solving some trivial problems (e.g., the Exclusive OR (XOR) classification problem; see the left side of Figure 10-6). This is true of any other linear classification model (such as Logistic Regression classifiers), but researchers had expected much more from Perceptrons, and some were so disappointed that they dropped neural networks altogether in favor of higher-level problems such as logic, problem solving, and search.

It turns out that some of the limitations of Perceptrons can be eliminated by stacking multiple Perceptrons. The resulting ANN is called a Multilayer Perceptron (MLP). An MLP can solve the XOR problem, as you can verify by computing the output of the MLP represented on the right side of Figure 10-6: with inputs (0, 0) or (1, 1), the network outputs 0, and with inputs (0, 1) or (1, 0) it outputs 1. All connections have a weight equal to 1, except the four connections where the weight is shown. Try verifying that this network indeed solves the XOR problem!
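The exact weights from Figure 10-6 are not reproduced here, but one classic choice of weights and thresholds that makes a two-layer step-function network compute XOR is: one hidden neuron computing OR, another computing AND, and an output neuron that fires for "OR and not AND".

```python
import numpy as np

# An MLP that computes XOR (weights chosen for illustration; not necessarily
# the ones shown in Figure 10-6).

def step(z):
    return (z >= 0).astype(int)

def xor_mlp(x1, x2):
    x = np.array([x1, x2])
    h1 = step(x @ np.array([1, 1]) - 0.5)   # hidden neuron 1: OR
    h2 = step(x @ np.array([1, 1]) - 1.5)   # hidden neuron 2: AND
    return int(step(h1 - h2 - 0.5))         # output: OR and not AND

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_mlp(a, b))        # 0, 1, 1, 0
```

The key point is the extra layer: each hidden neuron still has a linear decision boundary, but combining them lets the output neuron carve out a region no single line could.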


