Catalog
  1. Introduction of some functions
    1.1. numpy.random
      1.1.1. random.randn
      1.1.2. random.shuffle
    1.2. zip()
    1.3. Matrix use
  2. My understanding of neural networks
    2.1. How we map the input to the output
    2.2. How to measure the quality of the mapping?
    2.3. How to train our neural networks?
  3. Explanation of the program
    3.1. How to use?
Neural Networks Learning I: Recognize Handwritten Digits

This is the beginning of my neural network learning. I have been reading Michael Nielsen’s book for a long time, and I think it is now time to work through the learning examples in the first chapter of his book.

Introduction of some functions

I first want to show how some functions are used in his example program. These usages really surprised me.

numpy.random

The first is the random module from the numpy package. It is used when initializing the weight matrices.

random.randn

Return a sample (or samples) from the “standard normal” distribution.
If positive, int_like or int-convertible arguments are provided, randn generates an array of shape (d0, d1, …, dn), filled with random floats sampled from a univariate “normal” (Gaussian) distribution of mean 0 and variance 1 (if any of the d_i are floats, they are first converted to integers by truncation). A single float randomly sampled from the distribution is returned if no argument is provided.

import numpy as np
print(np.random.randn(1))
[0.34894831]
print(np.random.randn(3))
[0.34444932 0.12172097 1.14900238]
print(np.random.randn(2,3))
[[ 0.49635216  0.22762119 -0.68270641]
 [-2.13526944 -0.82040908 -0.79356388]]

random.shuffle

Modify a sequence in-place by shuffling its contents.

This function only shuffles the array along the first axis of a multi-dimensional array. The order of sub-arrays is changed but their contents remain the same.

A=np.random.randn(4,4)
print(A)
[[ 0.89532715 -2.34406351 -0.47233016 -0.1943856 ]
 [ 0.57509425 -0.84810353  1.11576561  1.33146725]
 [ 0.81883264  2.25208295 -1.52527099 -1.30444846]
 [ 1.94464225  0.29825984 -0.16625868 -0.35876162]]
np.random.shuffle(A)
print(A)
[[ 1.94464225  0.29825984 -0.16625868 -0.35876162]
 [ 0.57509425 -0.84810353  1.11576561  1.33146725]
 [ 0.81883264  2.25208295 -1.52527099 -1.30444846]
 [ 0.89532715 -2.34406351 -0.47233016 -0.1943856 ]]

zip()

Python’s zip() function creates an iterator that aggregates elements from two or more iterables. The resulting iterator can be used to quickly and consistently solve common programming problems, such as creating dictionaries.

A=['1','2','3']
B=['A','B','C']
C=[1,2,3]
ABC=zip(A,B,C)
print(ABC)
<zip object at 0x000001D6AB168A08>
type(ABC)
zip
list(ABC)
[('1', 'A', 1), ('2', 'B', 2), ('3', 'C', 3)]

Matrix use

A=np.array([1,2,3,4,5,6,7])
for l in A[1:]:
    print(l)
2
3
4
5
6
7
for l in A[:-1]:
    print(l)
1
2
3
4
5
6
sizes=[2,3,4]
W=[np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
print(W)
[array([[-0.53071848, -0.26905161],
       [-0.75696575, -0.57292324],
       [-1.47093334,  0.060232  ]]), array([[ 1.03193319,  0.58177683,  0.78046451],
       [ 0.14132843, -0.90416154, -0.12645047],
       [ 1.90204955, -0.55866015,  0.39481778],
       [-0.11897701,  1.1277029 ,  0.7584795 ]])]

My understanding of neural networks

How we map the input to the output

Having looked at the use of these special functions, it is now time to give a summary of my understanding. I will not list everything, since Michael Nielsen gives wonderful descriptions.

In my view, the problem is that we have an input, which is usually a one-dimensional array, and the output is also an array. What we need to do is map the input to the output correctly.

In real life we can describe and measure the world in different ways: color, sound, taste, and so on. However,

Anything is a number.

The properties of the real world can all be mapped into a number space, and what happens in the real world can be described by numbers and operations on numbers. For example, we use the coordinates (x, y, z) to describe the position of an object.

We create a neural network with many layers. From the mathematical point of view, the input data $V_{in}$ is processed through the different layers by a matrix multiplication followed by a sigmoid transformation:

$$V^{i+1}=\sigma\left(\boldsymbol{W}^{i}V^{i}+\boldsymbol{b}^{i}\right)$$

where $V^{i+1}$ is the value vector in layer $i+1$, $\boldsymbol{W}^{i}$ is the weight matrix between layer $i$ and layer $i+1$, and $\boldsymbol{b}^{i}$ is a bias vector.

So the value vectors in different layers are linked by this transformation, and this is how we obtain the output from the input.

In summary, having a neural network means having a series of weight matrices and bias vectors. Different weights, biases, and numbers of layers give different neural networks. Our neural network is actually just a series of matrices and vectors.
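To make this concrete, here is a minimal sketch of such a chain of transformations in numpy. The layer sizes and input values are arbitrary choices for illustration; the real implementation is the feedforward method quoted in Section 3.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary example: 2 inputs -> 3 hidden neurons -> 1 output.
sizes = [2, 3, 1]
biases = [np.random.randn(y, 1) for y in sizes[1:]]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

v = np.random.randn(2, 1)          # input column vector V_in
for W, b in zip(weights, biases):  # V^{i+1} = sigmoid(W^i V^i + b^i)
    v = sigmoid(np.dot(W, v) + b)
print(v)                           # value vector of the final layer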

How to measure the quality of the mapping?

To measure the quality of the mapping, we compare the output of our neural network with the actual data. For example, we can define the following cost function:

$$C(w,b)=\frac{1}{2n}\sum_{x}\|y(x)-a\|^{2}$$

where $n$ is the number of training inputs, $y(x)$ is the desired output for input $x$, and $a$ is the output of the network. The smaller the difference between the output and the actual value, the better our neural network works.
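As a toy illustration of this quadratic cost (the numbers below are made up and have nothing to do with MNIST):

import numpy as np

# Toy example: desired outputs y(x) and network outputs a for two inputs x.
ys = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
outputs = [np.array([0.8, 0.1]), np.array([0.3, 0.7])]

n = len(ys)
cost = sum(np.sum((y - a) ** 2) for y, a in zip(ys, outputs)) / (2 * n)
print(cost)  # about 0.0575 -- small because the outputs are close to the targets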

How to train our neural networks?

A very important step of deep learning is to train the neural network. Training means we change the weights and bias vectors so that the output gets closer to the actual result.

To realize this, we need to modify the parameters after each learning step. In machine learning, the gradient descent ("going downhill") method is used to optimize the parameters, and this is exactly what we will do here. The difference is that the effect of the weights and biases on the output is more complex: we need to choose a direction in an abstract space that makes the cost function decrease, just as if we were walking downhill in that space.

So the partial derivatives must be calculated, and the changes of the weights and biases are chosen so that the cost function always decreases.

For a small change $\Delta v$ of the parameter vector $v$, the change of the cost is approximately $\Delta C\approx \nabla C\cdot \Delta v$. What’s really exciting about this equation is that it lets us see how to choose $\Delta v$ so as to make $\Delta C$ negative. In particular, suppose we choose

$$\Delta v=-\eta \nabla C$$

where $\eta$ is a small, positive parameter (known as the learning rate). Then $\Delta C\approx -\eta |\nabla C|^2<0$, and the vector should be updated like this:

$$v\rightarrow v^{\prime}=v-\eta \nabla C$$
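Here is a minimal sketch of this update rule on a toy cost $C(v)=v_{1}^{2}+v_{2}^{2}$, chosen only because its gradient is trivial to write down; it is not part of the network code:

import numpy as np

eta = 0.1                     # learning rate
v = np.array([2.0, -3.0])     # arbitrary starting point

def grad_C(v):
    # Gradient of the toy cost C(v) = v1^2 + v2^2.
    return 2 * v

for step in range(50):
    v = v - eta * grad_C(v)   # v -> v' = v - eta * grad C

print(v)  # very close to the minimum at (0, 0)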

However, for a neural network, a further derivation is needed to calculate the derivatives with respect to the weights and biases in each layer. Nielsen gives a detailed explanation and proof of the backpropagation method.

Here is a simple summary,

The backpropagation equations provide us with a way of computing the gradient of the cost function. Let’s explicitly write this out in the form of an algorithm:

  • Input x: Set the corresponding activation $a^{1}$ for the input layer.
  • Feedforward: For each l=2,3,…,L compute $z^{l}=w^{l}a^{l-1}+b^{l}$ and $a^{l}=\sigma(z^{l})$.
  • Output error $\delta^{L}$: Compute the vector $\delta^{L}=\nabla_{a}C\odot \sigma^{\prime}(z^{L})$.
  • Backpropagate the error: For each l=L−1,L−2,…,2 compute $\delta^{l}=((w^{l+1})^{T}\delta^{l+1})\odot \sigma^{\prime}(z^{l})$.
  • Output: The gradient of the cost function is given by $\frac{\partial C}{\partial b_{j}^{l}}=\delta_{j}^{l}$ and $\frac{\partial C}{\partial w_{jk}^{l}}=a_{k}^{l-1}\delta_{j}^{l}$.

Explanation of the program

Now I will focus on the program and give my own understanding of its functions, following the order of the program. The first step is to import the necessary packages.

"""
network.py
~~~~~~~~~~

A module to implement the stochastic gradient descent learning
algorithm for a feedforward neural network.  Gradients are calculated
using backpropagation.  Note that I have focused on making the code
simple, easily readable, and easily modifiable.  It is not optimized,
and omits many desirable features.
"""

#### Libraries
# Standard library
import random

# Third-party libraries
import numpy as np

Then the sigmoid function and the derivative of the sigmoid function are defined.

#### Miscellaneous functions
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
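For reference, the identity used in sigmoid_prime follows from differentiating the sigmoid directly:

$$\sigma(z)=\frac{1}{1+e^{-z}},\qquad \sigma^{\prime}(z)=\frac{e^{-z}}{\left(1+e^{-z}\right)^{2}}=\sigma(z)\left(1-\sigma(z)\right)$$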

Then a class named Network is defined. In this class, the __init__ method is as follows:

def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network.  For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron.  The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1.  Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

Be careful with the use of sizes[:-1] and sizes[1:], which are the list without its last element and without its first element, respectively. The use of zip is also new to me, and self.weights is a list of matrices with different dimensions.
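A quick look at what the slicing and zip actually produce, using the [2, 3, 1] example from the docstring (just the list operations, nothing network-specific):

sizes = [2, 3, 1]
print(sizes[:-1])                        # [2, 3]  -> sizes of the "from" layers
print(sizes[1:])                         # [3, 1]  -> sizes of the "to" layers
print(list(zip(sizes[:-1], sizes[1:])))  # [(2, 3), (3, 1)] -> one (x, y) pair per weight matrix

So self.weights contains matrices of shapes (3, 2) and (1, 3), and self.biases contains vectors of shapes (3, 1) and (1, 1).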

The feedforward function updates $a$ from layer $l$ to layer $l+1$, looping over all layers to return the output of the network:

def feedforward(self, a):
    """Return the output of the network if ``a`` is input."""
    for b, w in zip(self.biases, self.weights):
        a = sigmoid(np.dot(w, a)+b)
    return a

Here is the main SGD function:

def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        """Train the neural network using mini-batch stochastic
        gradient descent.  The ``training_data`` is a list of tuples
        ``(x, y)`` representing the training inputs and the desired
        outputs.  The other non-optional parameters are
        self-explanatory.  If ``test_data`` is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out.  This is useful for
        tracking progress, but slows things down substantially."""
        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in xrange(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in xrange(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print "Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test)
            else:
                print "Epoch {0} complete".format(j)

This is the main training function. Test data will be used if provided. We pass in the training data, which is loaded with a predefined helper function:

>>> import mnist_loader
>>> training_data, validation_data, test_data = \
... mnist_loader.load_data_wrapper()

This function splits the training into epochs and mini-batches and, if test data is given, reports the progress and quality of the network after each epoch. The function update_mini_batch updates the weights and biases for a given mini-batch.
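A tiny illustration of how the mini_batches list comprehension slices the (shuffled) training data; the toy list and batch size below are arbitrary, and Python 3's range stands in for the xrange of the quoted Python 2 code:

training_data = list(range(10))   # stand-in for the shuffled training data
mini_batch_size = 3
n = len(training_data)

mini_batches = [training_data[k:k+mini_batch_size]
                for k in range(0, n, mini_batch_size)]
print(mini_batches)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]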

The update_mini_batch function is defined as follows:

def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

The gradient lists for the weights and biases are first initialized to zero. Then the partial derivatives are calculated with the function backprop and accumulated over the mini-batch, and the weights and biases are updated by $w\rightarrow w-\frac{\eta}{m}\sum_{x}\nabla_{w}C_{x}$ and $b\rightarrow b-\frac{\eta}{m}\sum_{x}\nabla_{b}C_{x}$, where $m$ is the size of the mini-batch. The most important part is the function backprop:

def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x.  ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book.  Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on.  It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in xrange(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

This function is a realization of the backpropagation method explained above. Finally, here are the two other functions,

def evaluate(self, test_data):
    """Return the number of test inputs for which the neural
    network outputs the correct result. Note that the neural
    network's output is assumed to be the index of whichever
    neuron in the final layer has the highest activation."""
    test_results = [(np.argmax(self.feedforward(x)), y)
                    for (x, y) in test_data]
    return sum(int(x == y) for (x, y) in test_results)

def cost_derivative(self, output_activations, y):
    """Return the vector of partial derivatives \partial C_x /
    \partial a for the output activations."""
    return (output_activations-y)

which are very easy to understand.

How to use?

This is really a good example of deep learning with neural networks. To use it, you can directly download Michael Nielsen’s example. However, it is written in Python 2; to use Python 3, you can use the port by MichalDanielDobrzanski.

After downloading the repository, the file network.py is just like the one shown above. The following is an example run of the program:

C:\Users\xiail\Documents\Dropbox\Code\Python\Study\Neural-Networks\Study-1\DeepL
on35 (master -> origin)
λ python
Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit
 win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import mnist_loader
>>> training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
>>> import network
>>> net = network.Network([784, 30, 10])
>>> net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
Epoch 0 : 8254 / 10000
Epoch 1 : 8367 / 10000
Epoch 2 : 8449 / 10000
Epoch 3 : 8483 / 10000
Epoch 4 : 8517 / 10000
Epoch 5 : 8533 / 10000
Epoch 6 : 8538 / 10000
Epoch 7 : 8541 / 10000
Epoch 8 : 9448 / 10000
Epoch 9 : 9450 / 10000
Epoch 10 : 9446 / 10000
Epoch 11 : 9475 / 10000
Epoch 12 : 9456 / 10000
Epoch 13 : 9473 / 10000
Epoch 14 : 9447 / 10000
Epoch 15 : 9483 / 10000
Epoch 16 : 9501 / 10000
Epoch 17 : 9501 / 10000
Epoch 18 : 9502 / 10000
Epoch 19 : 9501 / 10000
Epoch 20 : 9485 / 10000
Epoch 21 : 9491 / 10000
Epoch 22 : 9519 / 10000
Epoch 23 : 9499 / 10000
Epoch 24 : 9530 / 10000
Epoch 25 : 9504 / 10000
Epoch 26 : 9502 / 10000
Epoch 27 : 9521 / 10000
Epoch 28 : 9506 / 10000
Epoch 29 : 9498 / 10000
>>>
Author: Knifelee
Link: https://knifelees3.github.io/2020/03/11/A_En_Python-Deep-Learinging-I-Recognize-Handwritten-Digits/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.