Python #Deeplearning

This is the beginning of my neural networks learning. I have been reading Michael Nielsen's book for a long time, and I think it is now time to work through the examples in its first chapter.

Introduction to some functions

First I want to show the use of some functions in his example program. These usages really surprised me.

numpy.random

The first is the random module from the numpy package, which is used to initialize the weight matrices.

random.randn

Return a sample (or samples) from the “standard normal” distribution.
If positive, int_like or int-convertible arguments are provided, randn generates an array of shape (d0, d1, …, dn), filled with random floats sampled from a univariate “normal” (Gaussian) distribution of mean 0 and variance 1 (if any of the d_i are floats, they are first converted to integers by truncation). A single float randomly sampled from the distribution is returned if no argument is provided

import numpy as np
print(np.random.randn(1))
[0.34894831]
print(np.random.randn(3))
[0.34444932 0.12172097 1.14900238]
print(np.random.randn(2,3))
[[ 0.49635216  0.22762119 -0.68270641]
 [-2.13526944 -0.82040908 -0.79356388]]

random.shuffle

Modify a sequence in-place by shuffling its contents.

This function only shuffles the array along the first axis of a multi-dimensional array. The order of sub-arrays is changed but their contents remains the same.

A=np.random.randn(4,4)
print(A)
[[ 0.89532715 -2.34406351 -0.47233016 -0.1943856 ]
 [ 0.57509425 -0.84810353  1.11576561  1.33146725]
 [ 0.81883264  2.25208295 -1.52527099 -1.30444846]
 [ 1.94464225  0.29825984 -0.16625868 -0.35876162]]
np.random.shuffle(A)
print(A)
[[ 1.94464225  0.29825984 -0.16625868 -0.35876162]
 [ 0.57509425 -0.84810353  1.11576561  1.33146725]
 [ 0.81883264  2.25208295 -1.52527099 -1.30444846]
 [ 0.89532715 -2.34406351 -0.47233016 -0.1943856 ]]

zip()

Python's zip() function creates an iterator that aggregates elements from two or more iterables. The resulting iterator can be used to quickly and consistently solve common programming problems, such as creating dictionaries.

A=['1','2','3']
B=['A','B','C']
C=[1,2,3]
ABC=zip(A,B,C)
print(ABC)
<zip object at 0x000001D6AB168A08>
type(ABC)
zip
list(ABC)
[('1', 'A', 1), ('2', 'B', 2), ('3', 'C', 3)]

Matrix use

A=np.array([1,2,3,4,5,6,7])
for l in A[1:]:
    print(l)
2
3
4
5
6
7
for l in A[:-1]:
    print(l)
1
2
3
4
5
6
sizes=[2,3,4]
W=[np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
print(W)
[array([[-0.53071848, -0.26905161],
       [-0.75696575, -0.57292324],
       [-1.47093334,  0.060232  ]]), array([[ 1.03193319,  0.58177683,  0.78046451],
       [ 0.14132843, -0.90416154, -0.12645047],
       [ 1.90204955, -0.55866015,  0.39481778],
       [-0.11897701,  1.1277029 ,  0.7584795 ]])]

My understanding of neural networks

How we map the input to the output

Above, I introduced some special functions; now it's time to summarize my understanding. I will not list everything, since Michael Nielsen already gives wonderful descriptions.

In my view, the problem is this: we have an input, which is usually a one-dimensional array, and the output is also an array. What we need to do is map the input to the output correctly.

In real life we can describe and measure the world in different ways: color, sound, taste, and so on. However,

Anything is a number.

The properties of the real world can all be mapped into a number space, and what happens in the real world can be described by numbers and operations on numbers. For example, we use the coordinates (x, y, z) to describe the position of an object.

We create a neural network with many layers. From the mathematical point of view, the input data is processed from layer to layer by a matrix multiplication followed by a sigmoid transformation,

$$a^{j} = \sigma\left(W^{ij} a^{i} + b^{j}\right)$$

where $a^{i}$ is the value vector in layer $i$, $W^{ij}$ is the weight matrix between layer $i$ and layer $j$, and $b^{j}$ is a bias vector.

So the value vectors in the different layers are linked by this transformation, and this is how we obtain the output from the input.

In summary, to have a neural network means to have a series of weight matrices and bias vectors. Different weights, biases, and numbers of layers give different neural networks. Our neural network is really just a collection of matrices and vectors.
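
To make this concrete, here is a small sketch (my own, not from Nielsen's code) that pushes an input vector through the layers of a [2, 3, 1] network using only numpy:

import numpy as np

def sigmoid(z):
    return 1.0/(1.0+np.exp(-z))

sizes = [2, 3, 1]   # 2 inputs, 3 hidden neurons, 1 output
biases = [np.random.randn(y, 1) for y in sizes[1:]]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

a = np.random.randn(2, 1)              # the input column vector
for W, b in zip(weights, biases):
    a = sigmoid(np.dot(W, a) + b)      # a^j = sigma(W^ij a^i + b^j)
print(a)                               # the output, a 1x1 array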

How to measure the quality of the mapping?

To measure the quality of the mapping, we compare the output of our neural network with the actual data. For example, we can define the following quadratic cost function

$$C(w, b) = \frac{1}{2n} \sum_{x} \left\| y(x) - a \right\|^{2},$$

where $n$ is the number of training inputs, $y(x)$ is the desired output for input $x$, and $a$ is the output of the network. The smaller the difference between the output and the actual value, the better our neural network works.
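
As a small illustration (my own sketch, not part of network.py), the quadratic cost can be computed directly with numpy:

import numpy as np

def quadratic_cost(outputs, targets):
    """C = 1/(2n) * sum over inputs of ||y(x) - a||^2."""
    n = len(outputs)
    return sum(np.linalg.norm(y - a)**2 for a, y in zip(outputs, targets)) / (2.0*n)

# two training examples: network outputs vs. desired outputs
outputs = [np.array([[0.8]]), np.array([[0.1]])]
targets = [np.array([[1.0]]), np.array([[0.0]])]
print(quadratic_cost(outputs, targets))
0.0125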

How to train our neural networks?

A very important step of deep learning is training the neural network. Training means changing the weights and bias vectors so that the output gets closer to the actual result.

To realize this, we need to modify the parameters a little after each learning step. In machine learning, gradient descent (the "downhill" method) is used to optimize the parameters, and that is exactly what we do here. The difference is that the effect of the weights and biases on the output is more complex: we need to choose a direction in an abstract parameter space so that the cost function decreases, just as if we were walking downhill in that space.

So the partial derivatives must be calculated, and a small trick ensures that the change of the weights and biases always decreases the cost function.

But what's really exciting about the equation $\Delta C \approx \nabla C \cdot \Delta v$ is that it lets us see how to choose $\Delta v$ so as to make $\Delta C$ negative. In particular, suppose we choose

$$\Delta v = -\eta \nabla C,$$

where $\eta$ is a small, positive parameter (known as the learning rate). So

$$\Delta C \approx -\eta \nabla C \cdot \nabla C = -\eta \left\| \nabla C \right\|^{2},$$

which is guaranteed to be negative, and the vector $v$ should be updated like this:

$$v \rightarrow v' = v - \eta \nabla C.$$
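
To see this update rule in action, here is a tiny example of my own (not from the book) that minimizes $C(v) = v_1^2 + v_2^2$ by repeatedly stepping along $-\eta \nabla C$:

import numpy as np

def grad_C(v):
    return 2.0*v                    # gradient of C(v) = v1^2 + v2^2

v = np.array([3.0, -4.0])           # starting point
eta = 0.1                           # learning rate
for _ in range(100):
    v = v - eta*grad_C(v)           # v -> v' = v - eta * grad C
print(v)                            # very close to [0, 0], the minimum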

However, for a neural network, further derivation is needed to calculate the derivatives with respect to the weights and biases in each layer. Nielsen gives a detailed explanation and proof in his chapter on the back propagation method.

Here is a simple summary,

The backpropagation equations provide us with a way of computing the gradient of the cost function. Let’s explicitly write this out in the form of an algorithm:

  • Input $x$: Set the corresponding activation $a^{1}$ for the input layer.
  • Feedforward: For each $l = 2, 3, \ldots, L$ compute $z^{l} = w^{l} a^{l-1} + b^{l}$ and $a^{l} = \sigma(z^{l})$.
  • Output error $\delta^{L}$: Compute the vector $\delta^{L} = \nabla_{a} C \odot \sigma'(z^{L})$.
  • Backpropagate the error: For each $l = L-1, L-2, \ldots, 2$ compute $\delta^{l} = \left( (w^{l+1})^{T} \delta^{l+1} \right) \odot \sigma'(z^{l})$.
  • Output: The gradient of the cost function is given by $\frac{\partial C}{\partial b_{j}^{l}} = \delta_{j}^{l}$ and $\frac{\partial C}{\partial w_{jk}^{l}} = a_{k}^{l-1}\delta_{j}^{l}$.

Explanation of the program

Now I will focus on the program and give my own understanding of each function, following the order of the program. The first part imports the necessary packages.

"""
network.py
~~~~~~~~~~

A module to implement the stochastic gradient descent learning
algorithm for a feedforward neural network. Gradients are calculated
using backpropagation. Note that I have focused on making the code
simple, easily readable, and easily modifiable. It is not optimized,
and omits many desirable features.
"""

#### Libraries
# Standard library
import random

# Third-party libraries
import numpy as np

Then the sigmoid function and the derivative of the sigmoid function are defined.

#### Miscellaneous functions
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
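
Note that np.exp works elementwise, so sigmoid applied to a whole vector of weighted inputs returns a vector of activations. A quick check of my own (not from the book):

print(sigmoid(np.array([[-1.0], [0.0], [1.0]])))
[[0.26894142]
 [0.5       ]
 [0.73105858]]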

Then a class named Network is defined. In this class, the __init__ function is as follows:

def __init__(self, sizes):
    """The list ``sizes`` contains the number of neurons in the
    respective layers of the network. For example, if the list
    was [2, 3, 1] then it would be a three-layer network, with the
    first layer containing 2 neurons, the second layer 3 neurons,
    and the third layer 1 neuron. The biases and weights for the
    network are initialized randomly, using a Gaussian
    distribution with mean 0, and variance 1. Note that the first
    layer is assumed to be an input layer, and by convention we
    won't set any biases for those neurons, since biases are only
    ever used in computing the outputs from later layers."""
    self.num_layers = len(sizes)
    self.sizes = sizes
    self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
    self.weights = [np.random.randn(y, x)
                    for x, y in zip(sizes[:-1], sizes[1:])]

Be careful with the use of sizes[:-1] and sizes[1:], which give the list without its last element and without its first element, respectively. The use of zip was also new to me; self.weights is a list of matrices with different dimensions.
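
For example, for the three-layer network [2, 3, 1] mentioned in the docstring, the initialization produces the following shapes (a quick check of my own, not part of network.py):

sizes = [2, 3, 1]
biases = [np.random.randn(y, 1) for y in sizes[1:]]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
print([b.shape for b in biases])
[(3, 1), (1, 1)]
print([w.shape for w in weights])
[(3, 2), (1, 3)]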

The feedforward function, which updates the activations from layer to layer:

def feedforward(self, a):
    """Return the output of the network if ``a`` is input."""
    for b, w in zip(self.biases, self.weights):
        a = sigmoid(np.dot(w, a)+b)
    return a
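
As a quick usage sketch of my own (assuming network.py, or the Python 3 port mentioned at the end, is importable), feedforward maps an input column vector of the right size to the activations of the output layer:

import numpy as np
import network

net = network.Network([2, 3, 1])   # 2 inputs, 3 hidden neurons, 1 output
x = np.random.randn(2, 1)          # an input column vector
print(net.feedforward(x).shape)
(1, 1)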

Here is the main SGD function:

def SGD(self, training_data, epochs, mini_batch_size, eta,
        test_data=None):
    """Train the neural network using mini-batch stochastic
    gradient descent. The ``training_data`` is a list of tuples
    ``(x, y)`` representing the training inputs and the desired
    outputs. The other non-optional parameters are
    self-explanatory. If ``test_data`` is provided then the
    network will be evaluated against the test data after each
    epoch, and partial progress printed out. This is useful for
    tracking progress, but slows things down substantially."""
    if test_data: n_test = len(test_data)
    n = len(training_data)
    for j in xrange(epochs):
        random.shuffle(training_data)
        mini_batches = [
            training_data[k:k+mini_batch_size]
            for k in xrange(0, n, mini_batch_size)]
        for mini_batch in mini_batches:
            self.update_mini_batch(mini_batch, eta)
        if test_data:
            print "Epoch {0}: {1} / {2}".format(
                j, self.evaluate(test_data), n_test)
        else:
            print "Epoch {0} complete".format(j)

This is the main training function. The test data is used if provided. We pass in the training data, which is loaded with a predefined helper function:

>>> import mnist_loader
>>> training_data, validation_data, test_data = \
... mnist_loader.load_data_wrapper()

This function splits the training into epochs and mini-batches, and reports the progress and quality of the neural network after each epoch if test data is given. As the docstring says, each training example is a tuple (x, y) of an input and its desired output. The function update_mini_batch updates the weights and biases for a given mini-batch of training data.

The update_mini_batch function is defined as follows:

def update_mini_batch(self, mini_batch, eta):
    """Update the network's weights and biases by applying
    gradient descent using backpropagation to a single mini batch.
    The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
    is the learning rate."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    self.weights = [w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]

The gradient accumulators for the weights and biases are initialized to zero first. Then, for each example in the mini-batch, the partial derivatives are calculated with the function backprop and accumulated, and finally the new weights and biases are updated with the averaged gradients, as written out below. The most important part is the function backprop.
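
In formula form, the update performed by update_mini_batch on a mini-batch of size $m$ is the averaged gradient descent step

$$w \rightarrow w - \frac{\eta}{m} \sum_{x} \frac{\partial C_x}{\partial w}, \qquad b \rightarrow b - \frac{\eta}{m} \sum_{x} \frac{\partial C_x}{\partial b}.$$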

def backprop(self, x, y):
    """Return a tuple ``(nabla_b, nabla_w)`` representing the
    gradient for the cost function C_x. ``nabla_b`` and
    ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
    to ``self.biases`` and ``self.weights``."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    # feedforward
    activation = x
    activations = [x] # list to store all the activations, layer by layer
    zs = [] # list to store all the z vectors, layer by layer
    for b, w in zip(self.biases, self.weights):
        z = np.dot(w, activation)+b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # backward pass
    delta = self.cost_derivative(activations[-1], y) * \
        sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())
    # Note that the variable l in the loop below is used a little
    # differently to the notation in Chapter 2 of the book. Here,
    # l = 1 means the last layer of neurons, l = 2 is the
    # second-last layer, and so on. It's a renumbering of the
    # scheme in the book, used here to take advantage of the fact
    # that Python can use negative indices in lists.
    for l in xrange(2, self.num_layers):
        z = zs[-l]
        sp = sigmoid_prime(z)
        delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
    return (nabla_b, nabla_w)

This function is a direct realization of the backpropagation algorithm explained above. Finally, there are two other functions:

def evaluate(self, test_data):
    """Return the number of test inputs for which the neural
    network outputs the correct result. Note that the neural
    network's output is assumed to be the index of whichever
    neuron in the final layer has the highest activation."""
    test_results = [(np.argmax(self.feedforward(x)), y)
                    for (x, y) in test_data]
    return sum(int(x == y) for (x, y) in test_results)

def cost_derivative(self, output_activations, y):
    """Return the vector of partial derivatives \partial C_x /
    \partial a for the output activations."""
    return (output_activations-y)

Both are very easy to understand.

How to use?

This is really a good example of deep learning with neural networks. To use it, you can directly download Michael Nielsen's example code. However, it is written in Python 2; for Python 3, you can use the port by MichalDanielDobrzanski.

After downloading the repository, the file network.py is just as shown above. The following is an example session using the program:

C:\Users\xiail\Documents\Dropbox\Code\Python\Study\Neural-Networks\Study-1\DeepL
on35 (master -> origin)
λ python
Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import mnist_loader
>>> training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
>>> import network
>>> net = network.Network([784, 30, 10])
>>> net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
Epoch 0 : 8254 / 10000
Epoch 1 : 8367 / 10000
Epoch 2 : 8449 / 10000
Epoch 3 : 8483 / 10000
Epoch 4 : 8517 / 10000
Epoch 5 : 8533 / 10000
Epoch 6 : 8538 / 10000
Epoch 7 : 8541 / 10000
Epoch 8 : 9448 / 10000
Epoch 9 : 9450 / 10000
Epoch 10 : 9446 / 10000
Epoch 11 : 9475 / 10000
Epoch 12 : 9456 / 10000
Epoch 13 : 9473 / 10000
Epoch 14 : 9447 / 10000
Epoch 15 : 9483 / 10000
Epoch 16 : 9501 / 10000
Epoch 17 : 9501 / 10000
Epoch 18 : 9502 / 10000
Epoch 19 : 9501 / 10000
Epoch 20 : 9485 / 10000
Epoch 21 : 9491 / 10000
Epoch 22 : 9519 / 10000
Epoch 23 : 9499 / 10000
Epoch 24 : 9530 / 10000
Epoch 25 : 9504 / 10000
Epoch 26 : 9502 / 10000
Epoch 27 : 9521 / 10000
Epoch 28 : 9506 / 10000
Epoch 29 : 9498 / 10000
>>>