I am by no means an expert in machine learning or artificial intelligence, but just an avid learner and curious individual. Please use the information provided here as a supplement to further your knowledge and research.
I am currently building an artificial neural network from scratch (more specifically, a multi-layer perceptron neural network). To help clear my head and make sure I understand everything correctly (or still remember it from my machine learning subjects at university), I will be writing several blog posts about machine learning and artificial neural networks. To start the list of blog posts off, I am going to write about one of the fundamental building blocks of the neural network: the neuron.
I will be skipping the background about the neuron and how it relates to the biological neuron. Instead I will cover how it is constructed and used in an artificial neural network.
The neuron will process inputs from the previous artificial neural network layer. The previous layer could be:
- The input layer of the artificial neural network. In this case the values are taken directly from the training data, test data, or user input when using the trained artificial neural network.
- A hidden layer of the artificial neural network. In this case the values would be outputs from other neurons.
Note: There is a special input called the “bias” which always has a value of +1. The bias gives the neuron a trainable constant, which shifts the activation function (I will talk about that a little later) and gives us the best possible chance of an optimal model. You can read some more about the purpose and use of the bias here and here.
For every input to the neuron there is an associated weight. Initially the weights are generally given a random value, for example in the range 0.1 to 0.9. As the artificial neural network is trained, the weights are adjusted so that the output better matches the desired output.
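As a rough sketch, weight initialisation could look something like this (the function name and the 0.1 to 0.9 range are just illustrative, taken from the description above, and not a fixed rule):

```python
import random

def init_weights(n_inputs):
    # One weight per input, plus one extra for the bias input (x0 = +1).
    # Each weight starts as a random value in the range 0.1 to 0.9.
    return [random.uniform(0.1, 0.9) for _ in range(n_inputs + 1)]

weights = init_weights(3)
print(len(weights))  # 4 weights: one per input, plus one for the bias
```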
The first operation to obtain the neuron output is to sum the products of each input and its associated weight. Mathematically this is represented as follows:

y = w₀x₀ + ∑ᵢ₌₁ⁿ wᵢxᵢ

Where:
- y is the output.
- n is the number of inputs to the neuron.
- i is the index of the input to the neuron.
- w is a weight.
- x is an input.
You may be asking: Chris, why do we have the zero-indexed weight and input outside the actual summation when they just get added anyway? Good question, reader. Generally the zero-indexed input is the bias, which then makes the associated zero-indexed weight the bias weight.
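The summation above can be sketched in a few lines of Python (the function name and example values are my own, purely for illustration):

```python
def weighted_sum(weights, inputs):
    # Prepend the bias input x0 = +1 so that weights[0] acts as the
    # trainable constant described in the note above.
    inputs = [1.0] + list(inputs)
    # Sum of each weight multiplied by its associated input.
    return sum(w * x for w, x in zip(weights, inputs))

# weights[0] is the bias weight; the remaining two pair with the inputs.
print(weighted_sum([0.5, 0.2, 0.4], [1.0, 2.0]))  # 0.5*1 + 0.2*1.0 + 0.4*2.0 = 1.5
```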
After the summation operation is performed, an activation function is applied. The activation function is generally non-linear, which is what allows a network of neurons to model relationships more complex than a straight line. There are a variety of functions that can be used; two of the most common are the sigmoid and rectifier functions. More information about the sigmoid function can be found here. More information about the rectifier function can be found here. The value of the activation function is the output of the neuron.
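Putting the two steps together, a single neuron could be sketched like this (again, the function names are my own; this is a minimal illustration, not a full implementation):

```python
import math

def sigmoid(z):
    # Squashes the weighted sum into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def rectifier(z):
    # ReLU: passes positive values through, clamps negative values to 0.
    return max(0.0, z)

def neuron_output(weights, inputs, activation=sigmoid):
    # Weighted sum with the bias input x0 = +1, then the activation function.
    z = sum(w * x for w, x in zip(weights, [1.0] + list(inputs)))
    return activation(z)

print(sigmoid(0.0))     # 0.5
print(rectifier(-2.0))  # 0.0
```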
The next blog post will most likely be about the various layers of the neural network and how they are all interconnected. So stay tuned.