Things to keep in mind while building ML models from scratch

Published in Analytics Vidhya

Jan 24, 2021

Many Machine Learning beginners prefer to start with the scikit-learn library, and I was no exception. But they tend to forget the importance of building models from scratch. Starting from scratch gives you the real answers. How do you multiply the weights and the features? What about the bias? In fact, what is bias? How do you compute the sigmoid function? How do you calculate the loss? How do you find the optimal values of the weights? If you’ve had these questions, then you’re on the right track, bud! It’s time to start from scratch!

One of the most important tools you’ll have while doing this is the NumPy library. It performs computationally heavy calculations, such as operations on large matrices, very quickly. Think of it as Python’s ‘math’ module, but built for huge arrays and matrices, and far more powerful and extensive.

Here, I won’t be discussing the basics of NumPy, but problem-solving techniques that proved to be useful for me while building models from scratch using NumPy.

The Rank-1 array

You’ll often come across a shape like ‘(5,)’ when you check the shape of an array. This is called a ‘rank-1 array’. It has a single axis, so it behaves neither like a row vector nor like a column vector. That’s why it’s a good habit to use the assert statement. assert checks that the condition you give it is true; if it isn’t, Python raises an AssertionError and stops, so a wrong shape is caught immediately instead of silently producing wrong results later. Consider the following example:

import numpy as np
a = np.array([12,3,4,55,6])

At first glance, the shape of vector a looks like (5,1) or maybe (1,5) (God, I hope this doesn’t turn out to be another Yanny/Laurel dilemma, or worse, the blue/black dress!).

’Cause who wants ambiguity in programming, right?

So, we use assert to make sure that it is what it looks like:

assert a.shape == (5, 1)   # raises AssertionError

The above line raises an AssertionError because the shape of vector a is not (5,1):

print(a.shape)
# (5,)

Well, the solution to this is:

a = a.reshape((5, 1))
print(a.shape)
# (5, 1)
assert a.shape == (5, 1)   # passes silently

This time, no error is raised. Feel free to use the reshape method generously, even when it seems redundant: an explicit shape costs almost nothing and can save you from subtle bugs later.
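If you’d rather not hard-code the length, reshape also accepts -1 for one dimension and infers it from the array’s size. A minimal sketch, reusing the same values for a (the names col, row and col2 are just for illustration):

```python
import numpy as np

a = np.array([12, 3, 4, 55, 6])   # rank-1 array, shape (5,)

# -1 tells NumPy to infer that dimension from the total number of elements
col = a.reshape(-1, 1)            # column vector, shape (5, 1)
row = a.reshape(1, -1)            # row vector, shape (1, 5)
assert col.shape == (5, 1)
assert row.shape == (1, 5)

# Indexing with np.newaxis adds a length-1 axis, giving the same column vector
col2 = a[:, np.newaxis]
assert col2.shape == (5, 1)

print(col.shape, row.shape)       # (5, 1) (1, 5)
```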

Also, if you’re curious, this is what the vector a looks like now:

print(a)
# [[12]
#  [ 3]
#  [ 4]
#  [55]
#  [ 6]]

So, I guess it’s ‘Laurel’. Well, don’t stare at it too long, or it might become ‘Yanny’!
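Why does the rank-1 shape matter in practice? A quick sketch: transposing a rank-1 array does nothing, and np.dot of two rank-1 arrays quietly returns a single number, whereas an explicit (5,1) column gives the (5,5) outer product you might have expected.

```python
import numpy as np

a = np.array([12, 3, 4, 55, 6])   # rank-1, shape (5,)

# Transposing a rank-1 array is a no-op: there is only one axis
assert a.T.shape == (5,)

# np.dot of two rank-1 arrays is an inner product and returns a scalar
inner = np.dot(a, a.T)
print(inner)                      # 3230

# With an explicit column vector, col times its transpose is a (5, 5) outer product
col = a.reshape(5, 1)
outer = np.dot(col, col.T)
print(outer.shape)                # (5, 5)
```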

Broadcasting

Broadcasting is a simple but important concept. In simple words, it describes how NumPy handles arithmetic between arrays of different shapes. So, if you’ve ever faced an error that says ‘operands could not be broadcast together with shapes (4,2) (2,2)’, you’re in the right place. That error is what I get when I try the following:

mat1 = np.array([[1,2],
                 [9,8],
                 [3,4],
                 [5,6]])
mat2 = np.array([[3,5],
                 [10,2]])
ans = mat1 + mat2   # ValueError: operands could not be broadcast together

Had mat1 and mat2 been plain nested lists instead of NumPy arrays, ‘+’ would have concatenated mat2 to mat1, resulting in something like this:

[[1, 2], [9, 8], [3, 4], [5, 6], [3, 5], [10, 2]]
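Here’s that contrast in a few lines. This sketch uses two equally sized (2,2) nested lists (made-up values) so that the NumPy version can actually run:

```python
import numpy as np

list1 = [[1, 2], [9, 8]]
list2 = [[3, 5], [10, 2]]

# `+` on Python lists concatenates them
print(list1 + list2)          # [[1, 2], [9, 8], [3, 5], [10, 2]]

# `+` on NumPy arrays adds them element by element
arr_sum = np.array(list1) + np.array(list2)
print(arr_sum.tolist())       # [[4, 7], [19, 10]]
```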

But NumPy arrays are built for calculation, so ‘+’ means element-wise addition. Hence, two matrices of different orders, like mat1 of order (4,2) and mat2 of order (2,2), cannot be added. Except when I do this:

mat1 = np.array([[1,2],
                 [9,8],
                 [3,4],
                 [5,6]])
mat2 = np.array([[3,5]])
ans = mat1 + mat2
print(ans)
# [[ 4  7]
#  [12 13]
#  [ 6  9]
#  [ 8 11]]

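Notice the order of mat2 now: it’s (1,2). We get the result in ans without any error because of broadcasting. In simple terms, mat2 is repeated along its rows until its order matches mat1 (NumPy does this without actually copying the data) and is then added to mat1 element-wise, giving the above result. The general rule is that NumPy compares the two shapes dimension by dimension, starting from the last one, and each pair must either be equal or contain a 1; a missing leading dimension counts as 1. (4,2) and (1,2) pass this check, while (4,2) and (2,2) fail it because 4 and 2 differ and neither is 1. That’s why a (1,2) row, a 1D array of shape (2,) (No, not the band One Direction), or a single integer can all be added to mat1. Following is the operation with an integer: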

mat1 = np.array([[1,2],
                 [9,8],
                 [3,4],
                 [5,6]])
b = 12
ans = mat1 + b
print(ans)
# [[13 14]
#  [21 20]
#  [15 16]
#  [17 18]]
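To make the broadcasting rule concrete, here is a small, hypothetical helper (broadcast_shape is not a NumPy function) that applies it to two shapes. The last line compares it against np.broadcast_shapes, which exists in NumPy 1.20 and later; treat the helper as a sketch for understanding, not a replacement.

```python
import numpy as np

def broadcast_shape(shape1, shape2):
    """Hypothetical helper: apply NumPy's broadcasting rule to two shapes.

    Shapes are compared from the last dimension backwards; a missing leading
    dimension is treated as 1. A pair is compatible if equal or if one is 1.
    """
    ndim = max(len(shape1), len(shape2))
    # Left-pad the shorter shape with 1s so both have the same length
    s1 = (1,) * (ndim - len(shape1)) + tuple(shape1)
    s2 = (1,) * (ndim - len(shape2)) + tuple(shape2)
    result = []
    for d1, d2 in zip(s1, s2):
        if d1 == d2 or d2 == 1:
            result.append(d1)
        elif d1 == 1:
            result.append(d2)      # the size-1 dimension is stretched
        else:
            raise ValueError(f"incompatible shapes {shape1} and {shape2}")
    return tuple(result)

print(broadcast_shape((4, 2), (1, 2)))   # (4, 2): the mat2 row case
print(broadcast_shape((4, 2), (2,)))     # (4, 2): a 1D array
print(broadcast_shape((4, 2), ()))       # (4, 2): a scalar has shape ()
print(broadcast_shape((5, 1), (1, 3)))   # (5, 3): both operands stretch

try:
    broadcast_shape((4, 2), (2, 2))      # the failing mat1 + mat2 case
except ValueError as err:
    print(err)                           # incompatible shapes (4, 2) and (2, 2)

assert broadcast_shape((5, 1), (1, 3)) == np.broadcast_shapes((5, 1), (1, 3))
```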

This concept will prove instrumental when adding a bias to the dot product of the feature matrix and the weight matrix.
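A minimal sketch of that bias step, assuming made-up shapes: 4 samples with 3 features, a (3,2) weight matrix W, and one bias per output unit stored as a (1,2) row b. The bias is written once and broadcast over every row of the dot product:

```python
import numpy as np

# Hypothetical data: 4 samples x 3 features
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [1.0, 0.0, 1.0]])      # shape (4, 3)
# Hypothetical weights: 3 features -> 2 output units
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])           # shape (3, 2)
b = np.array([[1.0, -1.0]])          # shape (1, 2): one bias per output unit

# (4, 3) dot (3, 2) -> (4, 2); then (4, 2) + (1, 2) broadcasts b over all 4 rows
Z = np.dot(X, W) + b
assert Z.shape == (4, 2)
print(Z)
```

Broadcasting here is equivalent to copying b into all four rows with np.tile(b, (4, 1)) and adding, just without building that copy yourself.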

Dot Product

If you’re from a non-mathematics background, the dot product might confuse you a bit. To keep it simple, this article only covers the coding side. For two matrices, np.dot performs matrix multiplication, which is possible only when the number of columns of the first matrix equals the number of rows of the second. This is necessary because each element of the result is computed by multiplying a row of the first matrix with the corresponding column of the second, element by element, and summing the products. Thus, the result matrix has as many rows as the first matrix and as many columns as the second. For example:

first = np.array([[2,2],[5,6],[4,3],[3,1]])
second = np.array([[3,7,1],[5,5,2]])
print(first.shape, second.shape)
# (4, 2) (2, 3)

Notice what they have in common: the number of columns of first (2) equals the number of rows of second (2).

Guess the order of the result matrix. Correct! It’s (4,3).

import numpy as np
result = np.dot(first, second)
print(result)
# [[16 24  6]
#  [45 65 17]
#  [27 43 10]
#  [14 26  5]]
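To see what np.dot is doing under the hood, here is a sketch that computes the same product with explicit loops. matmul_loops is a hypothetical helper for illustration only; it is far slower than np.dot and not meant for real use.

```python
import numpy as np

def matmul_loops(A, B):
    """Hypothetical helper: matrix product with explicit loops, for illustration."""
    n_rows, n_inner = A.shape
    n_inner_b, n_cols = B.shape
    # Columns of A must equal rows of B
    assert n_inner == n_inner_b, "inner dimensions must match"
    C = np.zeros((n_rows, n_cols), dtype=np.result_type(A, B))
    for i in range(n_rows):            # each row of A
        for j in range(n_cols):        # each column of B
            # C[i, j] is the sum over k of A[i, k] * B[k, j]
            for k in range(n_inner):
                C[i, j] += A[i, k] * B[k, j]
    return C

first = np.array([[2, 2], [5, 6], [4, 3], [3, 1]])   # (4, 2)
second = np.array([[3, 7, 1], [5, 5, 2]])            # (2, 3)
C = matmul_loops(first, second)
print(C.shape)                                       # (4, 3)
assert np.array_equal(C, np.dot(first, second))
```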

We’ve used np.dot() because this is matrix multiplication. The ‘*’ operator multiplies element-wise instead, so first * second fails with the broadcasting error we discussed earlier: shapes (4,2) and (2,3) are not compatible.
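A short sketch of the difference between ‘*’, np.dot() and the @ operator (available since Python 3.5, and equivalent to np.dot for 2D arrays). The last lines show why ‘*’ is dangerous: when the shapes happen to be compatible, it silently returns an element-wise product instead of raising an error.

```python
import numpy as np

first = np.array([[2, 2], [5, 6], [4, 3], [3, 1]])   # (4, 2)
second = np.array([[3, 7, 1], [5, 5, 2]])            # (2, 3)

# @ is matrix multiplication; for 2D arrays it matches np.dot
assert np.array_equal(first @ second, np.dot(first, second))

# `*` is element-wise: (4, 2) and (2, 3) cannot be broadcast together
try:
    first * second
except ValueError as err:
    print(err)   # operands could not be broadcast together with shapes ...

# Danger: with compatible shapes, `*` gives no error, just a different answer
sq = np.array([[1, 2], [3, 4]])
print((sq * sq).tolist())   # [[1, 4], [9, 16]]  element-wise
print((sq @ sq).tolist())   # [[7, 10], [15, 22]] matrix product
```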

While solving a problem, you might be given a matrix of input features X of order, say, (5,3), and a matrix of weights w of order (1,3). At first glance, their dot product might seem impossible, since X has 3 columns but w has only 1 row. But what if we use the transpose of w? That changes the shape of w to (3,1), and a (5,3) matrix times a (3,1) matrix gives a (5,1) result.

Here’s an example:

X = np.array([[1,2,3],
              [6,5,4],
              [7,8,9],
              [12,11,10],
              [13,14,15]])
w = np.array([[2,6,4]])
w_tp = w.T
output = np.dot(X, w_tp)
print(output)
# [[ 26]
#  [ 58]
#  [ 98]
#  [130]
#  [170]]

w_tp stores the transpose of w, that is:

print(w_tp)
# [[2]
#  [6]
#  [4]]

You can also use np.transpose(w) to compute the transpose of w.
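Putting the pieces together, here is a sketch of a logistic-regression-style forward pass using the same X and w. The scalar bias b = -98.0 is a made-up value, and sigmoid is a hand-written helper, not a NumPy function; the asserts are there to catch rank-1 surprises early.

```python
import numpy as np

def sigmoid(z):
    """Hand-written logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[1, 2, 3],
              [6, 5, 4],
              [7, 8, 9],
              [12, 11, 10],
              [13, 14, 15]])          # (5, 3): 5 samples, 3 features
w = np.array([[2, 6, 4]])            # (1, 3): weights stored as a row
b = -98.0                             # hypothetical scalar bias

# (5, 3) dot (3, 1) -> (5, 1); the scalar b broadcasts over every row
z = np.dot(X, w.T) + b
assert z.shape == (5, 1)
y_hat = sigmoid(z)                    # values in [0, 1], still (5, 1)
assert y_hat.shape == (5, 1)

# np.transpose(w) and w.T are interchangeable here
assert np.array_equal(np.dot(X, np.transpose(w)), np.dot(X, w.T))
print(y_hat.ravel())
```

Because b is a plain number, it broadcasts over all five rows, and the shape stays (5,1) from z through y_hat.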

Last but definitely not least: building things from scratch takes time. But it’s all worth it when you see your model reach 60% accuracy (*secretly cries in pain*).

Thank you for reading! I hope you found it useful :)

Written by Namrata Tanwani