Your code has several major flaws, but first just what the heck is the dataset? What is the code all about?
I downloaded the text file, and opened it, and what the heck is it?

Okay, so the first column is named label, then the rest of the columns are named pixel-something. The rows seem to be all integers, and after some researching I determined the numbers to be all between 0 and 255.
The numbers are one byte, and the columns are named pixels? Oh, I got it, each row must represent a flattened grayscale image. But what exactly is the image?
The last column is pixel783, and the pixel column names start at pixel0, so there are 784 pixels per image. 784 pixels, the prime factorization of 784 is [2, 2, 2, 2, 7, 7], and, oh, the numbers can be rearranged to [28, 28], so each row starts with the label and then the bytes of a flattened 28x28 grayscale image!
But what are the images exactly?
I wrote the following code to find out:
import cv2
import numpy as np
from pathlib import Path
def to_image(data):
    return np.array(data).reshape((28, -1))
data = Path("C:/Users/Xeni/Desktop/data.csv").read_text().splitlines()[1:]
for i, row in enumerate(data):
    label, *row = list(map(int, row.split(',')))
    cv2.imwrite(f'D:/dataset_images/{label}-{i}.png', to_image(row))

These are 28x28 grayscale images of handwritten decimal digits that are already classified. There are 42K of them in total. So your neural network is used to classify handwritten digits! But you didn't mention the purpose of your code or the nature of the dataset anywhere in your question, you owe to clarify it.
Now onto the actual reviewing
Code execution at the top of script
data = pd.read_csv('data_set_file')
Don't do that, if you put code execution directly at the top of module, it can and will cause trouble when you try to import the module.
Instead use a main guard, so that the code will only be executed when the module is run directly, allowing the module to be imported.
Like this:
if __name__ == '__main__":
    data = pd.read_csv('data_set_file')
Naming of globals
You have a lot of global variables in lowercase, you need to convert them to UPPERCASE to denote that they are global variables.
Floating code repetition and unnecessary variable assignment
data_dev = data[0:1000].T
Y_dev = data_dev[0]
X_dev = data_dev[1:n]
X_dev = X_dev / 255.
data_train = data[1000:m].T
Y_train = data_train[0]
X_train = data_train[1:n]
X_train = X_train / 255.
Did you notice that in the above two blocks of code, the only actual difference is the index slicing? Since you already used functions, you need to put repeated code into functions, and this should be put into functions.
You can use destructuring assignment to assign a sequence to two variables.
If you do a, b = (0, 1), then a == 0 and b == 1. But this doesn't work with sequences with more than two elements.
So you need to do this: a, *b = (0, 1, 2, 3, 4), then a == 0 and b == (1, 2, 3, 4). The * operator is unpacking the rest of the list, the above line first assigns the first element to a and then assigns whatever remains to b.
X_dev = data_dev[1:n]
X_dev = X_dev / 255.
You have a lot of intermediate variables like the example above. there is no need to do two assignments and do the second assignment immediately after the first assignment without ever using the first variable. And absolutely no need to assign a name and the immediately return the result.
Also adding a trailing . to ints to make the result of division floats is completely unnecessary, / is float division operator and the result is automatically float. To get int division result you have to use the floor division operator //.
Just do this: X_dev = data_dev[1:n] / 255 and this return do_something().
M, N = DATA.shape
def get_xy(start, end):
    y, *x = DATA[start:end].T[:N]
    return y, np.array(x) / 255
Y_DEV, X_DEV = get_xy(0, 4200)
Y_TEST, X_TEST = get_xy(4200, 12600)
Y_TRAIN, X_TRAIN = get_xy(12600, M)
Repetition and way too many arguments
def init_parameters():
    W1 = np.random.normal(size=(10, 784)) * np.sqrt(1./(784))
    b1 = np.random.normal(size=(10, 1)) * np.sqrt(1./10)
    W2 = np.random.normal(size=(10, 10)) * np.sqrt(1./20)
    b2 = np.random.normal(size=(10, 1)) * np.sqrt(1./(784))
    return W1, b1, W2, b2
In the above function you made several mistakes, first there is absolutely no need to use numpy.sqrt to get the square root of some number. You can use the built-in ** exponentiation operator for that. Just do n ** 0.5.
Second you used (10, 1) twice and np.sqrt(1./(784)) twice, you need to create variables for them.
Third 784 is a magic number, it is the number of pixels in each image, which is equivalent to the number of columns minus one. What if the input was different? What if the input has different number of columns? Then your code stops working outright. You need to calculate that number based on the number of columns.
Fourth you used the same line multiple times, with exactly the same number of variables and positions of variables. This is a prime candidate for a for loop, you just need to extract the variables into a collection and loop through that collection and operate on them in the loop body.
And finally you are passing way too many variables around, with different groups of four variables that are always passed in the same order. You need to refactor your code into a class so that class knows the variables without passing the arguments. And you need to put them into a stack and get them from the stack to make things easier.
def sqrt_inv(n):
    return (1 / n) ** 0.5
L = N - 1
TEN_ONE, SQRT_INV_L = (10, 1), sqrt_inv(L)
PARAMETERS = (
    ((10, L), SQRT_INV_L),
    (TEN_ONE, sqrt_inv(10)),
    ((10, 10), sqrt_inv(20)),
    (TEN_ONE, SQRT_INV_L),
)
def init_parameters(self):
    self.weights = [
        np.random.normal(size=size) * scale for size, scale in PARAMETERS
    ]
I will explain the second function later.
Matrix multiplication
Don't use np.dot to get the dot product of two matrices. .dot() is six characters. You should use the matrix multiplication operator @ to do that. It is designed for exactly this purpose. And much more concise (saves five characters).
Repetition and passing way too many arguments that go around
def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1
    A1 = ReLU(Z1)
    Z2 = W2.dot(A1) + b2
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2
def backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y):
    one_hot_Y = one_hot(Y)
    dZ2 = A2 - one_hot_Y
    dW2 = 1 / m * dZ2.dot(A1.T)
    db2 = 1 / m * np.sum(dZ2, axis=1).reshape(-1, 1) 
    dZ1 = W2.T.dot(dZ2) * ReLU_deriv(Z1)
    dW1 = 1 / m * dZ1.dot(X.T)
    db1 = 1 / m * np.sum(dZ1, axis=1).reshape(-1, 1) 
    return dW1, db1, dW2, db2
def update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, a):
    W1 = W1 - a * dW1 
    b1 = b1 - a * db1    
    W2 = W2 - a * dW2  
    b2 = b2 - a * db2    
    return W1, b1, W2, b2
def gradient_descent(X, Y, a, epochs):
    W1, b1, W2, b2 = init_parameters()
    for i in range(epochs):
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y)
        W1, b1, W2, b2 = update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, a)
        if i % 10 == 0:
            print("Iteration: ", i)
            predictions = get_predictions(A2)
            print(get_accuracy(predictions, Y))
    return W1, b1, W2, b2
In the above mess you are doing way too many mistakes. You are reusing the same lines of code that have to be put into functions without putting them into functions. Passing way too many variables around, some of them are even not changed. And using two-stage intermediate assignments again. And not assigning 1/m as a global variable. And using i % 10 == 0 instead of not i % 10...
To solve the many problems, first consider update_parameters function. You are taking two groups of numbers, and assign the value of the element from the first group minus some number times the respective element from the second group with the same index, back to the element from the first group.
This is a perfect candidate for for loops, more specifically for a, b in zip(A, B):. The second group needs to be multiplied before-hand.
I have refactored the function to this:
def update_parameters(self):
    self.weights = [a - b for a, b in zip(self.weights, self.deltas)]
If you put all those functions in the same class you don't have to pass all those arguments around at all.
You have three groups of four variables that are related, and that are always passed around together. If you put each four of the same group inside a list, you can make things much easier.
But then how do you access them? Use named attributes to mask list indexing.
Like this:
class Foo:
    r = range(10)
    
    @property
    def last(self):
        return self.r[-1]
Foo().last returns 9.
But here there are 12 properties, how do we add them while avoiding repetition?
Use setattr with functions already passed to property.
groups = (
    ("weights", ("W1", "b1", "W2", "b2")),
    ("corrections", ("Z1", "A1", "Z2", "A2")),
    ("deltas", ("dW1", "db1", "dW2", "db2")),
)
def add_property(group, name, i):
    setattr(Neural_Network, name, property(lambda self: getattr(self, group)[i]))
for group, names in groups:
    for i, name in enumerate(names):
        add_property(group, name, i)
I have refactored your code completely, and it is much better now.
(By the way the naming of your variables is really bad, it is even more confusing than my variable names. But I can't fix the variable names because I know how the code works but I have no idea WHY any of your neural network code works)
Refactored code:
import numpy as np
import pandas as pd
class Neural_Network:
    @staticmethod
    def ReLU(Z):
        return np.maximum(Z, 0)
    @staticmethod
    def softmax(Z):
        exp = np.exp(Z - np.max(Z))
        return exp / exp.sum(axis=0)
    @staticmethod
    def ReLU_deriv(Z):
        return Z > 0
    @staticmethod
    def one_hot(Y):
        one_hot_Y = np.zeros((Y.size, Y.max() + 1))
        one_hot_Y[np.arange(Y.size), Y] = 1
        return one_hot_Y.T
    @staticmethod
    def calc_diff(dz, t):
        return MINV * dz @ t, MINV * np.sum(dz, axis=1).reshape(-1, 1)
    def __init__(self, X, Y, a):
        self.X = X
        self.Y = Y
        self.a = a
        self.init_parameters()
    def init_parameters(self):
        self.weights = [
            np.random.normal(size=size) * scale for size, scale in PARAMETERS
        ]
    def forward_prop(self):
        Z1 = self.W1 @ self.X + self.b1
        A1 = Neural_Network.ReLU(Z1)
        Z2 = self.W2 @ A1 + self.b2
        A2 = Neural_Network.softmax(Z2)
        self.corrections = [Z1, A1, Z2, A2]
    def backward_prop(self):
        dZ2 = self.A2 - Neural_Network.one_hot(self.Y)
        dW2, db2 = Neural_Network.calc_diff(dZ2, self.A1.T)
        dZ1 = self.W2.T @ dZ2 * Neural_Network.ReLU_deriv(self.Z1)
        dW1, db1 = Neural_Network.calc_diff(dZ1, self.X.T)
        self.deltas = [self.a * i for i in (dW1, db1, dW2, db2)]
    def update_parameters(self):
        self.weights = [a - b for a, b in zip(self.weights, self.deltas)]
    def get_predictions(self):
        self.predictions = np.argmax(self.A2, 0)
    def get_accuracy(self, Y):
        self.accuracy = np.sum(self.predictions == Y) / Y.size
    def gradient_descent(self, epochs):
        for i in range(epochs):
            self.forward_prop()
            self.backward_prop()
            self.update_parameters()
            if not i % 10:
                self.get_predictions()
                self.get_accuracy(self.Y)
                self.print_status(i)
        return self.weights
    def print_status(self, i):
        print(
            f"Iteration: {i}, Prediction: {self.predictions}, Data: {self.Y}, Accuracy: {self.accuracy}"
        )
    def make_predictions(self, X):
        self.X = X
        self.forward_prop()
        self.get_predictions()
groups = (
    ("weights", ("W1", "b1", "W2", "b2")),
    ("corrections", ("Z1", "A1", "Z2", "A2")),
    ("deltas", ("dW1", "db1", "dW2", "db2")),
)
def add_property(group, name, i):
    setattr(Neural_Network, name, property(lambda self: getattr(self, group)[i]))
for group, names in groups:
    for i, name in enumerate(names):
        add_property(group, name, i)
if __name__ == "__main__":
    DATA = pd.read_csv("C:/Users/Xeni/Desktop/data.csv")
    DATA = np.array(DATA)
    M, N = DATA.shape
    np.random.shuffle(DATA)
    MINV = 1 / M
    def get_xy(start, end):
        y, *x = DATA[start:end].T[:N]
        return y, np.array(x) / 255
    Y_DEV, X_DEV = get_xy(0, 4200)
    Y_TEST, X_TEST = get_xy(4200, 12600)
    Y_TRAIN, X_TRAIN = get_xy(12600, M)
    def sqrt_inv(n):
        return (1 / n) ** 0.5
    L = N - 1
    TEN_ONE, SQRT_INV_L = (10, 1), sqrt_inv(L)
    PARAMETERS = (
        ((10, L), SQRT_INV_L),
        (TEN_ONE, sqrt_inv(10)),
        ((10, 10), sqrt_inv(20)),
        (TEN_ONE, SQRT_INV_L),
    )
    nn = Neural_Network(X_TRAIN, Y_TRAIN, 0.10)
    nn.gradient_descent(120)
    nn.make_predictions(X_DEV)
    nn.get_accuracy(Y_DEV)
    print(nn.accuracy)