Introduction to Linear Algebra for Machine Learning – Part 1

It’s been a while since I posted. A lot has changed and a lot has happened, but it’s time to get back on track. I’ll be posting more, working more and updating more. Bad habits die hard and good habits take time to cultivate; for me, it’s time to put in the effort and cultivate the good ones.

Anyway, moving along, let’s discuss the most important thing in the world of Machine Learning – Mathematics! In particular, we will be discussing Linear Algebra in this and the next few posts!

LINEAR ALGEBRA

Linear algebra is a continuous rather than discrete form of mathematics, and it is used in many applications throughout engineering and the sciences. This blog isn’t about everything in engineering and science, though, so we will focus only on the linear algebra essential for Machine Learning and Deep Learning.

Everything starts from a small unit, and so does linear algebra. Let’s discuss the simplest units of linear algebra with some example uses.

SCALARS, VECTORS, MATRICES & TENSORS

Let’s go over them one by one!

  • Scalars
    • A scalar is just a single number and is usually denoted by a lowercase italic letter.
    • Example: x pounds of wheat costs $y. Here x ∈ ℝ+∪{0} and y ∈ ℝ+∪{0}.
    • This is because weight ≥ 0 and cost ≥ 0, where ℝ+ represents the set of all positive real numbers.
  • Vectors
    • An array of numbers stored in some order, where each number can be accessed using its index in the vector. In books, a vector is usually denoted by a bold lowercase letter.
    • Suppose a vector x contains the ages of n students: x_1, x_2, …, x_n. Since each age belongs to the set ℕ, we can say that x ∈ ℕ^n.
vector_x
  • Matrices
    • A 2-dimensional array of numbers arranged in a certain order is called a matrix.
    • Each element can be accessed using two indices, the first representing the row number and the second the column number.
    • If a matrix A has m rows and n columns and all of its elements A_{i,j} ∈ ℝ, then we say that A ∈ ℝ^{m×n}.
  • Tensors
    • An array of numbers arranged on a regular grid with a variable number of axes.
    • Denoted by bold uppercase letters.
    • A good example is an RGB image: each of the channels R, G and B has the same structure (a matrix), and the three are combined to form a 3D grid.

In Python

import numpy as np

# scalar
x = 10
print('Scalar', x)
# vector
x = np.random.randn(10, 1)
print('Vector\n', x)
# matrix
x = np.random.randn(5, 3)
print('Matrix\n', x)
# tensor
x = np.random.randn(3, 3, 2)
print('Tensor\n', x)

Output:

Scalar 10
Vector
 [[ 0.26404859]
 [ 0.27203541]
 [ 0.88623008]
 [-0.23407678]
 [ 0.24343701]
 [ 0.23422709]
 [-1.62698432]
 [ 0.66561386]
 [-0.92319638]
 [-1.00850356]]
Matrix
 [[ 0.83486184  0.30539286 -0.57430288]
 [ 0.39422336  1.51274928 -0.37897842]
 [ 1.56143987 -0.10864624  0.03555972]
 [-1.31741402 -1.314135    0.03261628]
 [ 0.52382211  0.04659095 -0.05408585]]
Tensor
 [[[ 0.05698093 -0.29766196]
  [ 0.02711075  0.19166628]
  [-0.9793209  -1.18313195]]

 [[-0.83811632 -0.42394832]
  [ 2.21761874  0.77650627]
  [-0.64105576  0.32665985]]

 [[-0.19459993 -0.56587196]
  [-0.87172322 -0.06711544]
  [-0.16313432 -0.51133303]]]
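To make the "number of axes" idea concrete, here is a small sketch that prints the number of axes and the shape of each kind of object. The array sizes, including the 4×6 "image", are made up for illustration. Note that the snippet above stores its vector as a 10×1 column, which NumPy treats as an array with two axes; a one-axis array of shape (10,) is the other common choice and is the one used here.

```python
import numpy as np

scalar = np.float64(10.0)        # 0 axes
vector = np.zeros(10)            # 1 axis with 10 elements
matrix = np.zeros((5, 3))        # 2 axes: 5 rows, 3 columns
image = np.zeros((4, 6, 3))      # 3 axes: hypothetical 4x6 image with R, G, B channels

for name, arr in [('scalar', scalar), ('vector', vector),
                  ('matrix', matrix), ('image', image)]:
    print(name, '-> ndim =', np.ndim(arr), ', shape =', np.shape(arr))

# Fixing the last index picks out one channel, which is an ordinary matrix.
red_channel = image[:, :, 0]
print('red channel shape:', red_channel.shape)
```

The number of indices needed to reach a single number is the same as the number of axes: none for a scalar, one for a vector, two for a matrix and three for this image tensor.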

SPECIAL OPERATIONS IN LINEAR ALGEBRA

There are many useful operations in linear algebra that come in very handy in the field of Machine Learning.

TRANSPOSE

The transpose of a matrix is the mirror image of the matrix across its main diagonal. The main diagonal is the line running from the top-left corner of the matrix down to the bottom-right.

transpose_of_matrix
Src: Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning (p. 31). MIT Press.

The transpose operation is denoted by a superscript T on the matrix it is applied to:

(A^T)_{i,j} = A_{j,i}

A vector is a matrix with just 1 column, so its transpose is a matrix with just 1 row, i.e. a row vector:

transpose_of_vector

A scalar’s transpose is the scalar itself: a^T = a

In Python

v = np.random.randn(3, 1)
print('Vector:\n', v)
print('Vector Transpose:\n', v.T)

mat = np.random.randn(3, 2)
print('\nMatrix:\n', mat)
print('Matrix Transpose:\n', mat.T)

tensor = np.random.randn(3, 2, 1)
print('\nTensor:\n', tensor)
print('Transposing 1st and 2nd axes of tensor:\n', np.transpose(tensor, (0, 2, 1)))

Output:

Vector:
 [[-0.4321856 ]
 [ 0.17133487]
 [-0.44079784]]
Vector Transpose:
 [[-0.4321856   0.17133487 -0.44079784]]

Matrix:
 [[ 0.77862041 -0.3330565 ]
 [-0.78407302 -0.91280909]
 [ 0.31670827  0.37695897]]
Matrix Transpose:
 [[ 0.77862041 -0.78407302  0.31670827]
 [-0.3330565  -0.91280909  0.37695897]]

Tensor:
 [[[ 0.04130934]
  [-0.91920283]]

 [[ 2.03426545]
  [-1.15492907]]

 [[-0.37229797]
  [-0.7149692 ]]]
Transposing 1st and 2nd axes of tensor:
 [[[ 0.04130934 -0.91920283]]

 [[ 2.03426545 -1.15492907]]

 [[-0.37229797 -0.7149692 ]]]
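The index rule for the transpose, that the element at row i and column j of the transpose is the element at row j and column i of the original, can be checked directly. This is only a sketch: the 3×2 shape and the fixed seed are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, purely for repeatability
A = rng.standard_normal((3, 2))
At = A.T

# The shape flips from (3, 2) to (2, 3).
print(A.shape, At.shape)

# Every element satisfies (A^T)[i, j] == A[j, i].
identity_holds = all(At[i, j] == A[j, i]
                     for i in range(At.shape[0])
                     for j in range(At.shape[1]))
print(identity_holds)

# Transposing twice gives back the original matrix.
print(np.array_equal(At.T, A))

# A 1-axis array has no separate row/column form, so .T leaves it unchanged.
flat = np.arange(3)
print(flat.T.shape)
```

The last check is a common surprise: to get a row vector out of a column vector in NumPy, the array needs two axes, as with the (3, 1) vector used above.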

ARITHMETIC OPERATIONS

Two vectors, matrices or tensors can be added or subtracted if they have the same shape, i.e. the same number of axes and the same number of elements along each axis.

So, it doesn’t make sense if we try to add:

or if we try to add:

but it makes sense if we try to add:

matrix_addition

Although conventional linear algebra doesn’t allow it, we can also add or subtract a vector and a matrix, or a matrix and a tensor, by using something known as broadcasting.

So, if we add:

broadcasting_1

we are basically trying to add the vector to both columns of the first matrix, i.e.,

broadcasting_2

We are broadcasting the vector to match the dimensions of the matrix. This would also work if the vector were a row vector:

broadcasting_3

What linear algebra does allow is combining a scalar with every element of a vector, matrix or tensor through addition, subtraction, multiplication or division.

In Python

mat1 = np.random.randn(3, 3)
mat2 = np.ones((3, 3))
vec1 = np.random.randn(3, 1)
scalar = 3

print('\nMatrix 1:\n', mat1)
print('\nMatrix 2:\n', mat2)
print('\nVector:\n', vec1)
print('\nScalar:\n', scalar)

print('\nMat1 + Mat2 =\n', mat1+mat2)
print('\nMat2 + vec1 =\n', mat2+vec1)
print('\nMat1 / scalar =\n', mat1/scalar)

Output

Matrix 1:
 [[-0.12962187  0.59230972 -0.24106256]
 [ 0.17427254 -0.21010996  0.55757414]
 [-0.4393248  -0.99196874 -0.21815031]]

Matrix 2:
 [[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

Vector:
 [[ 2.17789155]
 [ 2.61166418]
 [-0.82754604]]

Scalar:
 3

Mat1 + Mat2 =
 [[0.87037813 1.59230972 0.75893744]
 [1.17427254 0.78989004 1.55757414]
 [0.5606752  0.00803126 0.78184969]]

Mat2 + vec1 =
 [[3.17789155 3.17789155 3.17789155]
 [3.61166418 3.61166418 3.61166418]
 [0.17245396 0.17245396 0.17245396]]

Mat1 / scalar =
 [[-0.04320729  0.19743657 -0.08035419]
 [ 0.05809085 -0.07003665  0.18585805]
 [-0.1464416  -0.33065625 -0.07271677]]
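Since the broadcasting figures are easiest to follow with concrete numbers, here is a small sketch with a 3×2 matrix, a column vector and a row vector. The values are made up so the effect is easy to read off, and the last part shows what happens when the shapes cannot be matched.

```python
import numpy as np

M = np.arange(6).reshape(3, 2)        # shape (3, 2)
col = np.array([[10], [20], [30]])    # shape (3, 1): one value per row
row = np.array([[100, 200]])          # shape (1, 2): one value per column

col_sum = M + col    # col is stretched across both columns of M
row_sum = M + row    # row is stretched down all three rows of M
print(col_sum)
print(row_sum)

# Shapes that cannot be stretched to match raise a ValueError.
bad = np.ones((2, 1))
try:
    M + bad
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print('Cannot broadcast:', err)
```

Broadcasting only stretches axes of length 1, which is why the (3, 1) and (1, 2) vectors work with a (3, 2) matrix while the (2, 1) vector does not.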

THE SPECIAL CASE OF MATRIX MULTIPLICATION

Matrix multiplication can be performed in two ways, each with a different meaning:

  • Element-wise matrix multiplication (also called the Hadamard product): Similar to matrix addition/subtraction, we multiply the corresponding elements of two matrices of the same shape. The operation extends naturally to vectors and tensors.

  • Matrix Product for Linear Transformation: This is a special operation between two matrices. To find the element c_{i,j} of the matrix product C of two matrices A and B, we take the dot product of the ith row of A and the jth column of B. So to understand the matrix product, we first need to understand the dot product of two vectors.

Dot Product of two vectors: This operation takes two vectors and returns a single scalar. Algebraically, it is the sum of the products of the corresponding elements of two equally sized vectors. Example:

vectors_dot_product
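As a quick numeric illustration of the dot product, the sketch below computes it by hand and with NumPy for two made-up vectors of length 3.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Sum of the products of corresponding elements: 1*4 + 2*5 + 3*6 = 32
manual = sum(x * y for x, y in zip(a, b))
print(manual)          # 32
print(np.dot(a, b))    # 32
```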

Going back to the matrix product! We now know two things:

  • The dot product takes two equally sized vectors, and
  • We take the ith row of the first matrix and the jth column of the second matrix to get c_{i,j} of the resultant matrix.

We can conclude that the length of the rows of the first matrix (i.e., the number of columns in the first matrix) must equal the length of the columns of the second matrix (i.e., the number of rows in the second matrix). In other words, if A has shape m×k and B has shape k×n, then C = AB has shape m×n.

Let’s have a look at an example:

multiply_matrices
Src: Scroggs, M. mscroggs.co.uk Blog: Matrix multiplication. Retrieved from https://www.mscroggs.co.uk/blog/73

We can consider a vector as a matrix with either 1 column or 1 row, and can thus use matrix multiplication to multiply a vector by a matrix.
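The shape rule can be checked without looking at any actual values. The sketch below uses arrays of ones with arbitrary sizes and the `@` operator, which for 2-D arrays computes the same matrix product as np.dot.

```python
import numpy as np

A = np.ones((3, 5))
B = np.ones((5, 2))
x = np.ones((5, 1))      # column vector: a matrix with 1 column

print((A @ B).shape)     # (3, 2): the inner sizes (5 and 5) match
print((A @ x).shape)     # (3, 1): a matrix times a column vector

# Swapping the order breaks the rule: B has 2 columns but A has 3 rows.
try:
    B @ A
    product_failed = False
except ValueError as err:
    product_failed = True
    print('Shapes do not line up:', err)
```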

In Python

A = np.random.randn(3, 5)
B = np.random.randn(5, 2)

C = np.dot(A, B)

print('A:\n', A)
print('B:\n', B)
print('C:\n', C)

Output

A:
 [[-0.1388191   0.14745987 -0.84502098  0.46513235 -0.21720005]
 [ 0.62561826 -0.64941331  0.33531271  1.21443163 -0.91440507]
 [ 0.49542941 -1.35285908 -0.39693741 -1.38204644  0.43740322]]
B:
 [[ 0.66353444 -0.47480196]
 [-0.91456639 -1.96659148]
 [-1.21411341 -0.74356759]
 [ 1.23905566  0.83039239]
 [-1.3658308   0.53584564]]
C:
 [[ 1.6719616   0.67410514]
 [ 3.3556142   1.24923307]
 [-0.26191031  1.80717984]]
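To see how different the two kinds of multiplication are, here is a sketch that applies both to the same pair of small matrices. The values are made up so the arithmetic can be followed by hand: in NumPy, `*` is element-wise and `@` is the matrix product.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

elementwise = A * B    # multiplies matching entries
product = A @ B        # rows of A dotted with columns of B

print(elementwise)     # [[ 10  40] [ 90 160]]
print(product)         # [[ 70 100] [150 220]]
```

For instance, the top-left entry is 1*10 = 10 in the element-wise result but 1*10 + 2*30 = 70 in the matrix product.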

Most of the time when we speak of matrix multiplication, we will be referring to the latter (the matrix product) unless specified otherwise. Now let’s have a look at some of its properties.

Properties of Matrix Multiplication
  • Matrix multiplication is distributive:

A(B+C) = AB + AC

  • Matrix multiplication is associative:

A(BC) = (AB)C

  • It is NOT commutative!

AB ≠ BA (in general)

  • Transpose of a matrix product (note that the order of the factors is reversed):

(AB)^T = B^T A^T
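These properties are easy to check numerically. The sketch below uses three small integer matrices, chosen arbitrarily, so every comparison is exact. Note that the transpose of a product equals the product of the transposes in reverse order.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [1, 3]])

distributive = np.array_equal(A @ (B + C), A @ B + A @ C)
associative = np.array_equal(A @ (B @ C), (A @ B) @ C)
commutative = np.array_equal(A @ B, B @ A)
transpose_rule = np.array_equal((A @ B).T, B.T @ A.T)

print('distributive:  ', distributive)     # True
print('associative:   ', associative)      # True
print('commutative:   ', commutative)      # False
print('transpose rule:', transpose_rule)   # True
```

A check on one example does not prove the first, second and fourth properties, but a single counterexample like this one is enough to show that matrix multiplication is not commutative.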

Matrix multiplication without NumPy:

def matmul(A, B):
    assert len(A[0]) == len(B), 'Num of columns in A should be equal to number of rows in B.'
    rows_resultant = len(A)
    cols_resultant = len(B[0])
    # Shared inner dimension: number of columns in A == number of rows in B.
    eq_lens = len(B)
    C = []
    for i in range(rows_resultant):
        C.append([])
        for j in range(cols_resultant):
            C[i].append(0)
            # Dot product of row i of A with column j of B.
            for k in range(eq_lens):
                C[i][j] += A[i][k] * B[k][j]
    return C

A = np.random.randn(3, 5).tolist()
B = np.random.randn(5, 3).tolist()
C = matmul(A, B)

print('A:\n', np.asarray(A))
print('B:\n', np.asarray(B))
# Rounded to 5 decimal places for a compact display.
print('C:\n', np.round(np.asarray(C), 5))

Output:

A:
 [[ 0.04419893  0.18795041 -1.24424637  0.04571569  0.10870657]
 [ 1.34369374 -0.74967094  1.86216682  0.64519078  0.59999603]
 [-0.3734074   0.16378569 -2.02438453  1.43358335 -1.50436267]]
B:
 [[ 0.54236258  0.30535342 -0.01404141]
 [-0.96720427 -0.17220239  0.22115328]
 [-0.60043883  1.14008026 -0.54035821]
 [-0.16141727 -1.13582692 -0.6621159 ]
 [-0.92204875  0.77273183 -0.6106762 ]]
C:
 [[ 0.48167 -1.40533  0.61663]
 [-0.32163  2.39323 -1.98449]
 [ 2.01027 -5.24096  1.10484]]

SYSTEM OF EQUATIONS AS MATRIX MULTIPLICATION

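Say we have a system of m linear equations in n unknowns:

A_{1,1} x_1 + A_{1,2} x_2 + … + A_{1,n} x_n = b_1
A_{2,1} x_1 + A_{2,2} x_2 + … + A_{2,n} x_n = b_2
…
A_{m,1} x_1 + A_{m,2} x_2 + … + A_{m,n} x_n = b_m

Where,

A_{i,j} = coefficient of x_j in equation i.

b_i = result required from the weighted sum in equation i.

x_j = unknowns to be found using the system.

In terms of Machine Learning, A_{i,j} represents the jth feature value of the ith example, b_i represents the output label of the ith example and x_j represents the weight of the jth feature in the weighted sum.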

Representing this in the matrix form we get:

Ax = b

Where,

system_of_eq_matrix_prod_form
system_of_eq_as_matrices

Thus, we can use matrix-vector multiplication as a shorthand notation to represent a system of equations.

Since a system of equations can be represented as a matrix multiplication, there might be ways to solve the system using matrices and vectors. We will look into this in the next part of this blog! Till then, try implementing a simple NumPy-like class for arrays and their methods in Python from scratch.
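Here is a small sketch of that shorthand, using made-up numbers: three examples with two features each, and a weight vector that is assumed to be known already. It only shows that the single product Ax reproduces every equation’s weighted sum; it does not solve for x.

```python
import numpy as np

# Hypothetical data: 3 examples (rows) with 2 features each (columns).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([0.5, -1.0])     # one weight per feature (assumed known here)

b = A @ x                     # all three weighted sums in one product

# Row i of A dotted with x is the left-hand side of equation i.
for i in range(A.shape[0]):
    lhs = sum(A[i, j] * x[j] for j in range(A.shape[1]))
    print(f'equation {i + 1}: weighted sum = {lhs}, b_{i + 1} = {b[i]}')
```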
