3
\$\begingroup\$

I am learning about ANN and tried it for the MINST data sets. Now I am supposted to create a neural network (ANN) with backpropagation.

The structure for the neural network I have is this the input layer consists of 784 neurons followed by one hidden layer with 10 neurons and the output layer is 10.

Data file

Code:

import numpy as np 
import pandas as pd 
#read the data from file
data = pd.read_csv('data_set_file') #use the provided file or use cloud based ones 

data = np.array(data)
m, n = data.shape
np.random.shuffle(data) 

data_dev = data[0:1000].T
Y_dev = data_dev[0]
X_dev = data_dev[1:n]
X_dev = X_dev / 255.

data_train = data[1000:m].T
Y_train = data_train[0]
X_train = data_train[1:n]
X_train = X_train / 255.

def init_parameters():
    W1 = np.random.normal(size=(10, 784)) * np.sqrt(1./(784))
    b1 = np.random.normal(size=(10, 1)) * np.sqrt(1./10)
    W2 = np.random.normal(size=(10, 10)) * np.sqrt(1./20)
    b2 = np.random.normal(size=(10, 1)) * np.sqrt(1./(784))
    return W1, b1, W2, b2

def ReLU(Z):
    return np.maximum(Z, 0)

def softmax(Z):
    exp = np.exp(Z - np.max(Z)) 
    return exp / exp.sum(axis=0)
    
def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1
    A1 = ReLU(Z1)
    Z2 = W2.dot(A1) + b2
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2

def ReLU_deriv(Z):
    return Z > 0

def one_hot(Y):
    one_hot_Y = np.zeros((Y.size, Y.max() + 1))
    one_hot_Y[np.arange(Y.size), Y] = 1
    one_hot_Y = one_hot_Y.T
    return one_hot_Y

def backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y):
    one_hot_Y = one_hot(Y)
    dZ2 = A2 - one_hot_Y
    dW2 = 1 / m * dZ2.dot(A1.T)
    db2 = 1 / m * np.sum(dZ2, axis=1).reshape(-1, 1) 
    dZ1 = W2.T.dot(dZ2) * ReLU_deriv(Z1)
    dW1 = 1 / m * dZ1.dot(X.T)
    db1 = 1 / m * np.sum(dZ1, axis=1).reshape(-1, 1) 
    return dW1, db1, dW2, db2

def update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, a):
    W1 = W1 - a * dW1 
    b1 = b1 - a * db1    
    W2 = W2 - a * dW2  
    b2 = b2 - a * db2    
    return W1, b1, W2, b2

def get_predictions(A2):
    return np.argmax(A2, 0)

def get_accuracy(predictions, Y):
    print(predictions, Y)
    return np.sum(predictions == Y) / Y.size

def gradient_descent(X, Y, a, epochs):
    W1, b1, W2, b2 = init_parameters()
    for i in range(epochs):
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y)
        W1, b1, W2, b2 = update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, a)
        if i % 10 == 0:
            print("Iteration: ", i)
            predictions = get_predictions(A2)
            print(get_accuracy(predictions, Y))
    return W1, b1, W2, b2

W1, b1, W2, b2 = gradient_descent(X_train, Y_train, 0.10, 120)

def make_predictions(X, W1, b1, W2, b2):
    _, _, _, A2 = forward_prop(W1, b1, W2, b2, X)
    predictions = get_predictions(A2)
    return predictions

dev_predictions = make_predictions(X_dev, W1, b1, W2, b2)
get_accuracy(dev_predictions, Y_dev)

Results:

Training_data accuracy

Training_data_result

Dev_data

enter image description here


If there are improvements to the code feel free to share or point it out.

Things that I am confused about:

I am supposed to split the data into 70% training data, 10% dev data and 20% test data.

(While the data is not currently split in the right percentage and the test data is missing because I am not sure get of how I should use it)

From what I have gathered the training data is used to train the network. What am I supposed to do with the other 2 I used the dev data to see how the model generalized from the training data. Should I be doing something else with the test and dev data?

I am asking this because I have this question I am trying to answer:

percentage of correctness of the total test data set (20% of all cases)

percentage of correctness of each of the classes in the test data set.

Edit:

This is how I would split the data:

#validation set (10%) 
data_dev = data[0:4200].T
Y_dev = data_dev[0]
X_dev = data_dev[1:n]
X_dev = X_dev / 255.

#yesting set(20%).
data_test = data[4200:12600].T
Y_test = data_test[0]
X_test = data_test[1:n]
X_test = X_test / 255.

#training set (70%),
data_train = data[12600:m].T
Y_train = data_train[0]
X_train = data_train[1:n]
X_train = X_train / 255.
\$\endgroup\$
1
  • \$\begingroup\$ I have looked and done some more work on this and I have learned that testing data shouldn't be labeled does that mean the edit I have provided for how I would split the data is wrong ? \$\endgroup\$ Commented May 22, 2023 at 22:44

1 Answer 1

2
+50
\$\begingroup\$

Your code has several major flaws, but first just what the heck is the dataset? What is the code all about?

I downloaded the text file, and opened it, and what the heck is it?

enter image description here

Okay, so the first column is named label, then the rest of the columns are named pixel-something. The rows seem to be all integers, and after some researching I determined the numbers to be all between 0 and 255.

The numbers are one byte, and the columns are named pixels? Oh, I got it, each row must represent a flattened grayscale image. But what exactly is the image?

The last column is pixel783, and the pixel column names start at pixel0, so there are 784 pixels per image. 784 pixels, the prime factorization of 784 is [2, 2, 2, 2, 7, 7], and, oh, the numbers can be rearranged to [28, 28], so each row starts with the label and then the bytes of a flattened 28x28 grayscale image!

But what are the images exactly?

I wrote the following code to find out:

import cv2
import numpy as np
from pathlib import Path

def to_image(data):
    return np.array(data).reshape((28, -1))

data = Path("C:/Users/Xeni/Desktop/data.csv").read_text().splitlines()[1:]
for i, row in enumerate(data):
    label, *row = list(map(int, row.split(',')))
    cv2.imwrite(f'D:/dataset_images/{label}-{i}.png', to_image(row))

enter image description here

These are 28x28 grayscale images of handwritten decimal digits that are already classified. There are 42K of them in total. So your neural network is used to classify handwritten digits! But you didn't mention the purpose of your code or the nature of the dataset anywhere in your question, you owe to clarify it.


Now onto the actual reviewing

Code execution at the top of script

data = pd.read_csv('data_set_file')

Don't do that, if you put code execution directly at the top of module, it can and will cause trouble when you try to import the module.

Instead use a main guard, so that the code will only be executed when the module is run directly, allowing the module to be imported.

Like this:

if __name__ == '__main__":
    data = pd.read_csv('data_set_file')

Naming of globals

You have a lot of global variables in lowercase, you need to convert them to UPPERCASE to denote that they are global variables.

Floating code repetition and unnecessary variable assignment

data_dev = data[0:1000].T
Y_dev = data_dev[0]
X_dev = data_dev[1:n]
X_dev = X_dev / 255.

data_train = data[1000:m].T
Y_train = data_train[0]
X_train = data_train[1:n]
X_train = X_train / 255.

Did you notice that in the above two blocks of code, the only actual difference is the index slicing? Since you already used functions, you need to put repeated code into functions, and this should be put into functions.

You can use destructuring assignment to assign a sequence to two variables. If you do a, b = (0, 1), then a == 0 and b == 1. But this doesn't work with sequences with more than two elements.

So you need to do this: a, *b = (0, 1, 2, 3, 4), then a == 0 and b == (1, 2, 3, 4). The * operator is unpacking the rest of the list, the above line first assigns the first element to a and then assigns whatever remains to b.

X_dev = data_dev[1:n]
X_dev = X_dev / 255.

You have a lot of intermediate variables like the example above. there is no need to do two assignments and do the second assignment immediately after the first assignment without ever using the first variable. And absolutely no need to assign a name and the immediately return the result.

Also adding a trailing . to ints to make the result of division floats is completely unnecessary, / is float division operator and the result is automatically float. To get int division result you have to use the floor division operator //.

Just do this: X_dev = data_dev[1:n] / 255 and this return do_something().

M, N = DATA.shape

def get_xy(start, end):
    y, *x = DATA[start:end].T[:N]
    return y, np.array(x) / 255

Y_DEV, X_DEV = get_xy(0, 4200)
Y_TEST, X_TEST = get_xy(4200, 12600)
Y_TRAIN, X_TRAIN = get_xy(12600, M)

Repetition and way too many arguments

def init_parameters():
    W1 = np.random.normal(size=(10, 784)) * np.sqrt(1./(784))
    b1 = np.random.normal(size=(10, 1)) * np.sqrt(1./10)
    W2 = np.random.normal(size=(10, 10)) * np.sqrt(1./20)
    b2 = np.random.normal(size=(10, 1)) * np.sqrt(1./(784))
    return W1, b1, W2, b2

In the above function you made several mistakes, first there is absolutely no need to use numpy.sqrt to get the square root of some number. You can use the built-in ** exponentiation operator for that. Just do n ** 0.5. Second you used (10, 1) twice and np.sqrt(1./(784)) twice, you need to create variables for them.

Third 784 is a magic number, it is the number of pixels in each image, which is equivalent to the number of columns minus one. What if the input was different? What if the input has different number of columns? Then your code stops working outright. You need to calculate that number based on the number of columns.

Fourth you used the same line multiple times, with exactly the same number of variables and positions of variables. This is a prime candidate for a for loop, you just need to extract the variables into a collection and loop through that collection and operate on them in the loop body.

And finally you are passing way too many variables around, with different groups of four variables that are always passed in the same order. You need to refactor your code into a class so that class knows the variables without passing the arguments. And you need to put them into a stack and get them from the stack to make things easier.

def sqrt_inv(n):
    return (1 / n) ** 0.5

L = N - 1
TEN_ONE, SQRT_INV_L = (10, 1), sqrt_inv(L)
PARAMETERS = (
    ((10, L), SQRT_INV_L),
    (TEN_ONE, sqrt_inv(10)),
    ((10, 10), sqrt_inv(20)),
    (TEN_ONE, SQRT_INV_L),
)

def init_parameters(self):
    self.weights = [
        np.random.normal(size=size) * scale for size, scale in PARAMETERS
    ]

I will explain the second function later.

Matrix multiplication

Don't use np.dot to get the dot product of two matrices. .dot() is six characters. You should use the matrix multiplication operator @ to do that. It is designed for exactly this purpose. And much more concise (saves five characters).

Repetition and passing way too many arguments that go around

def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1
    A1 = ReLU(Z1)
    Z2 = W2.dot(A1) + b2
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2

def backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y):
    one_hot_Y = one_hot(Y)
    dZ2 = A2 - one_hot_Y
    dW2 = 1 / m * dZ2.dot(A1.T)
    db2 = 1 / m * np.sum(dZ2, axis=1).reshape(-1, 1) 
    dZ1 = W2.T.dot(dZ2) * ReLU_deriv(Z1)
    dW1 = 1 / m * dZ1.dot(X.T)
    db1 = 1 / m * np.sum(dZ1, axis=1).reshape(-1, 1) 
    return dW1, db1, dW2, db2

def update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, a):
    W1 = W1 - a * dW1 
    b1 = b1 - a * db1    
    W2 = W2 - a * dW2  
    b2 = b2 - a * db2    
    return W1, b1, W2, b2

def gradient_descent(X, Y, a, epochs):
    W1, b1, W2, b2 = init_parameters()
    for i in range(epochs):
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y)
        W1, b1, W2, b2 = update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, a)
        if i % 10 == 0:
            print("Iteration: ", i)
            predictions = get_predictions(A2)
            print(get_accuracy(predictions, Y))
    return W1, b1, W2, b2

In the above mess you are doing way too many mistakes. You are reusing the same lines of code that have to be put into functions without putting them into functions. Passing way too many variables around, some of them are even not changed. And using two-stage intermediate assignments again. And not assigning 1/m as a global variable. And using i % 10 == 0 instead of not i % 10...

To solve the many problems, first consider update_parameters function. You are taking two groups of numbers, and assign the value of the element from the first group minus some number times the respective element from the second group with the same index, back to the element from the first group.

This is a perfect candidate for for loops, more specifically for a, b in zip(A, B):. The second group needs to be multiplied before-hand.

I have refactored the function to this:

def update_parameters(self):
    self.weights = [a - b for a, b in zip(self.weights, self.deltas)]

If you put all those functions in the same class you don't have to pass all those arguments around at all.

You have three groups of four variables that are related, and that are always passed around together. If you put each four of the same group inside a list, you can make things much easier.

But then how do you access them? Use named attributes to mask list indexing.

Like this:

class Foo:
    r = range(10)
    
    @property
    def last(self):
        return self.r[-1]

Foo().last returns 9.

But here there are 12 properties, how do we add them while avoiding repetition?

Use setattr with functions already passed to property.

groups = (
    ("weights", ("W1", "b1", "W2", "b2")),
    ("corrections", ("Z1", "A1", "Z2", "A2")),
    ("deltas", ("dW1", "db1", "dW2", "db2")),
)


def add_property(group, name, i):
    setattr(Neural_Network, name, property(lambda self: getattr(self, group)[i]))


for group, names in groups:
    for i, name in enumerate(names):
        add_property(group, name, i)

I have refactored your code completely, and it is much better now.

(By the way the naming of your variables is really bad, it is even more confusing than my variable names. But I can't fix the variable names because I know how the code works but I have no idea WHY any of your neural network code works)


Refactored code:

import numpy as np
import pandas as pd


class Neural_Network:
    @staticmethod
    def ReLU(Z):
        return np.maximum(Z, 0)

    @staticmethod
    def softmax(Z):
        exp = np.exp(Z - np.max(Z))
        return exp / exp.sum(axis=0)

    @staticmethod
    def ReLU_deriv(Z):
        return Z > 0

    @staticmethod
    def one_hot(Y):
        one_hot_Y = np.zeros((Y.size, Y.max() + 1))
        one_hot_Y[np.arange(Y.size), Y] = 1
        return one_hot_Y.T

    @staticmethod
    def calc_diff(dz, t):
        return MINV * dz @ t, MINV * np.sum(dz, axis=1).reshape(-1, 1)

    def __init__(self, X, Y, a):
        self.X = X
        self.Y = Y
        self.a = a
        self.init_parameters()

    def init_parameters(self):
        self.weights = [
            np.random.normal(size=size) * scale for size, scale in PARAMETERS
        ]

    def forward_prop(self):
        Z1 = self.W1 @ self.X + self.b1
        A1 = Neural_Network.ReLU(Z1)
        Z2 = self.W2 @ A1 + self.b2
        A2 = Neural_Network.softmax(Z2)
        self.corrections = [Z1, A1, Z2, A2]

    def backward_prop(self):
        dZ2 = self.A2 - Neural_Network.one_hot(self.Y)
        dW2, db2 = Neural_Network.calc_diff(dZ2, self.A1.T)
        dZ1 = self.W2.T @ dZ2 * Neural_Network.ReLU_deriv(self.Z1)
        dW1, db1 = Neural_Network.calc_diff(dZ1, self.X.T)
        self.deltas = [self.a * i for i in (dW1, db1, dW2, db2)]

    def update_parameters(self):
        self.weights = [a - b for a, b in zip(self.weights, self.deltas)]

    def get_predictions(self):
        self.predictions = np.argmax(self.A2, 0)

    def get_accuracy(self, Y):
        self.accuracy = np.sum(self.predictions == Y) / Y.size

    def gradient_descent(self, epochs):
        for i in range(epochs):
            self.forward_prop()
            self.backward_prop()
            self.update_parameters()
            if not i % 10:
                self.get_predictions()
                self.get_accuracy(self.Y)
                self.print_status(i)
        return self.weights

    def print_status(self, i):
        print(
            f"Iteration: {i}, Prediction: {self.predictions}, Data: {self.Y}, Accuracy: {self.accuracy}"
        )

    def make_predictions(self, X):
        self.X = X
        self.forward_prop()
        self.get_predictions()


groups = (
    ("weights", ("W1", "b1", "W2", "b2")),
    ("corrections", ("Z1", "A1", "Z2", "A2")),
    ("deltas", ("dW1", "db1", "dW2", "db2")),
)


def add_property(group, name, i):
    setattr(Neural_Network, name, property(lambda self: getattr(self, group)[i]))


for group, names in groups:
    for i, name in enumerate(names):
        add_property(group, name, i)

if __name__ == "__main__":
    DATA = pd.read_csv("C:/Users/Xeni/Desktop/data.csv")
    DATA = np.array(DATA)
    M, N = DATA.shape
    np.random.shuffle(DATA)
    MINV = 1 / M

    def get_xy(start, end):
        y, *x = DATA[start:end].T[:N]
        return y, np.array(x) / 255

    Y_DEV, X_DEV = get_xy(0, 4200)
    Y_TEST, X_TEST = get_xy(4200, 12600)
    Y_TRAIN, X_TRAIN = get_xy(12600, M)

    def sqrt_inv(n):
        return (1 / n) ** 0.5

    L = N - 1
    TEN_ONE, SQRT_INV_L = (10, 1), sqrt_inv(L)
    PARAMETERS = (
        ((10, L), SQRT_INV_L),
        (TEN_ONE, sqrt_inv(10)),
        ((10, 10), sqrt_inv(20)),
        (TEN_ONE, SQRT_INV_L),
    )
    nn = Neural_Network(X_TRAIN, Y_TRAIN, 0.10)
    nn.gradient_descent(120)
    nn.make_predictions(X_DEV)
    nn.get_accuracy(Y_DEV)
    print(nn.accuracy)
\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.