A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.
A good example is a Gmail account.
Google uses Machine Learning to automatically filter spam out of regular email.
In this case, the task T is to classify each email as spam or not spam.
The experience E is the training data, and the performance measure P needs to be defined, for example as the fraction of emails classified correctly.
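To make T, E and P concrete, here is a minimal sketch in Python. The tiny dataset, the keyword rule and the function names are all made up for illustration; this is not how Gmail filters spam, and a real filter would be evaluated on emails it has not seen before.

```python
# Hypothetical toy data: experience E is a list of (email text, is_spam) pairs.
training_data = [
    ("win a free prize now", True),
    ("meeting moved to 3pm", False),
    ("free money click here", True),
    ("lunch tomorrow?", False),
]

def learn_spam_words(examples):
    """Learn from E: collect the words that appear only in spam emails."""
    spam_words, ham_words = set(), set()
    for text, is_spam in examples:
        (spam_words if is_spam else ham_words).update(text.split())
    return spam_words - ham_words

def classify(text, spam_words):
    """Task T: label an email as spam (True) or not spam (False)."""
    return any(word in spam_words for word in text.split())

def accuracy(examples, spam_words):
    """Performance measure P: fraction of emails labelled correctly."""
    correct = sum(classify(text, spam_words) == label for text, label in examples)
    return correct / len(examples)

# With no experience the program labels everything "not spam".
accuracy_without_experience = accuracy(training_data, learn_spam_words([]))
# After seeing the training data, performance on the same task improves.
accuracy_with_experience = accuracy(training_data, learn_spam_words(training_data))
print(accuracy_without_experience, accuracy_with_experience)
```

The second number is higher than the first, which is exactly what "performance on T, as measured by P, improves with experience E" means.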

Motivation For Neural Nets
Use biology as an inspiration for the mathematical model.
Get signals from previous neurons.
Generate signals according to inputs.
Pass signals on to the next neuron.
By layering, many neurons can create a complex model.

The neuron outputs the transformed data.
Every input carries some weight with it.
But what exactly is the weight?
A weight determines the strength of the connection between two neurons.
It connects a neuron in one layer with a neuron in the next layer.
If a weight is zero, the input it belongs to has no influence on the output.
Let’s take an example.
But does data come with weights, or do we have to define them on our own?
Suppose you want to create a model to predict the weight of a coin.
There will be three inputs to our model.

A model that predicts the weight according to the inputs.

What if I happen to train a weight model that depends on two or more such inputs?
Your model might just as well be:

Suppose the weight model is something like this:

The output is some function f(z), which we call the activation function, where z = x1w1 + x2w2 + x3w3 + b.

z is the net input.
b is the bias term.
a = f(z) is the output passed to the next layer.
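The computation above can be sketched in a few lines of Python. The input values, weights and bias below are arbitrary numbers chosen for illustration, and the sigmoid is just one possible choice for f.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Compute a = f(z), where z = x1w1 + x2w2 + x3w3 + b."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias  # net input
    return activation(z)

x = [1.0, 2.0, 3.0]    # three inputs (illustrative values)
w = [0.5, 0.0, -0.5]   # the second weight is zero, so x2 has no influence
b = 1.0                # bias term

a = neuron_output(x, w, b)
print(a)

# Changing x2 does not change the output, because its weight is zero.
a_changed = neuron_output([1.0, 100.0, 3.0], w, b)
print(a_changed)
```

Here z works out to 0.5 + 0 - 1.5 + 1 = 0, so both calls return f(0) = 0.5.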

There are various activation functions.
The main purpose of an activation function is to introduce non-linearity into the output of a neuron.
Some of them are Linear, Logistic (Sigmoid), and Hyperbolic Tangent (tanh).
We will encounter cases where one activation function performs better than another.
Example: suppose that to classify cats and dogs you select a particular activation function.
With it, you achieve 52% accuracy.
But with a different activation function, you achieve 53% accuracy. Then we will definitely go for the second activation function.
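The three activation functions named above can be written down directly. This is a minimal sketch using only Python's standard library; the sample values of z are arbitrary.

```python
import math

def linear(z):
    """Linear (identity) activation: the output is unbounded."""
    return z

def sigmoid(z):
    """Logistic (sigmoid) activation: the output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent activation: the output lies in (-1, 1)."""
    return math.tanh(z)

for z in (-5.0, 0.0, 5.0):
    print(z, linear(z), round(sigmoid(z), 4), round(tanh(z), 4))
```

Which of these works best depends on the problem, so in practice the choice is usually made by comparing accuracy, as in the cats and dogs example.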

How is the activation function decided, and what is the maths behind it?
An activation function decides whether a given neuron should be activated or not.
In my previous post, I discussed the three layers of a Neural Network, namely:
Input Layer- This layer accepts the input features. It is where information from the user or dataset is provided to the Neural Network.

Hidden Layer- As the name suggests, this layer is not visible. It is where computation is performed, and the result is transferred to the next layer.
Each neuron in the hidden layer learns a different set of weights to represent a different function over the input data.

Output Layer- As the name suggests, we receive the output at this layer.

How to determine the number of input and hidden layers in a Neural Network?

Input layer- This parameter is uniquely determined once you know the shape of your training data.
To be more specific, the number of neurons comprising this layer is equal to the number of features (columns) in your data.
Each input represents one feature of the input dataset.

Some Neural Network configurations add one additional neuron for a bias term.
We will discuss the term bias later.

Output Layer- Determining its size (number of neurons) is simple. It is completely determined by the chosen model configuration.
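As a rough sketch, both sizes can be read off the data and the model configuration. The dataset below is hypothetical, and using one output neuron per class is a common convention that is assumed here, not the only option (a binary classifier can also use a single output neuron).

```python
# Hypothetical training data: 4 rows (samples), 3 columns (features).
features = [
    [2.5, 1.2, 0.7],
    [3.1, 0.9, 0.4],
    [1.8, 1.5, 0.9],
    [2.9, 1.1, 0.5],
]
labels = ["cat", "dog", "cat", "dog"]

# Input layer: one neuron per feature (column) in the data.
n_input_neurons = len(features[0])

# Output layer (assumed convention): one neuron per distinct class.
n_output_neurons = len(set(labels))

print(n_input_neurons, n_output_neurons)
```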

Now suppose that initially we don’t use any activation function.
We would receive z itself.
The value of z can be anywhere between -infinity and +infinity.
Hence we cannot decide whether the neuron should fire or not.
With the addition of an activation function, the output f(z) can be confined to a much smaller range.
We can then decide whether the neuron should fire or not.

How does a neuron get activated?
If the value of f(z) is above the threshold value, the output is 1 and the neuron is activated; otherwise, the output is 0.
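The threshold rule can be sketched as a small Python function. The threshold of 0.5 is an assumption chosen for illustration (it is a natural midpoint when f is the sigmoid); the text does not fix a particular value.

```python
def fires(activation_value, threshold=0.5):
    """Return 1 if f(z) is above the threshold (neuron activated), else 0."""
    return 1 if activation_value > threshold else 0

print(fires(0.7))  # above the threshold: the neuron is activated
print(fires(0.2))  # below the threshold: the neuron stays inactive
```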

Why do we need multiple neurons?
If we only use a single neuron and no hidden layer, the network is bound to learn only linear decision boundaries.
To learn non-linear decision boundaries when classifying the output, multiple neurons are required.
If every neuron were given the same weights, the extra neurons would not be useful to us, since they would all compute the same function.
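A classic illustration is XOR, whose two classes cannot be separated by a single straight line, so a single threshold neuron cannot represent it. The sketch below uses hand-picked weights (not learned ones, and chosen only for illustration) to show that two hidden neurons with different weights plus one output neuron are enough.

```python
def step(z):
    """Threshold activation: fire (1) when the net input is positive."""
    return 1 if z > 0 else 0

def xor_network(x1, x2):
    # Hidden neuron 1 behaves like OR: fires if at least one input is 1.
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)
    # Hidden neuron 2 behaves like AND: fires only if both inputs are 1.
    h2 = step(1.0 * x1 + 1.0 * x2 - 1.5)
    # Output neuron fires when OR is on but AND is off.
    return step(1.0 * h1 - 1.0 * h2 - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_network(x1, x2))
```

The two hidden neurons share the same input weights here but differ in their bias, which is what lets them compute different functions; if they were identical in every parameter, the second one would add nothing.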

Hidden layers are able to reduce the dimensionality of the data by learning different weights.

We need multiple neurons to classify a given dataset.

Suppose we are working on a binary classifier.
That is, something which outputs 1 for yes and 0 for no.
Now the activation function, upon receiving z, will activate neurons.
What if more than one neuron is “activated”?
To make things much more complicated we chose multiple neurons in our input layer and