Elements of linear algebra#

ECON2125/6012 Lecture 5
Fedor Iskhakov

Announcements & Reminders

  • Tests due last night: feedback soon

  • This lecture is recorded

  • Next week we return to the class

Plan for this lecture

Review of some concepts in linear algebra

  1. Vector spaces

  • Linear combinations, span, linear independence

  • Linear subspaces, bases and dimension

  • Linear maps and independence

  2. Linear equations

  • Basic operations with matrices

  • Square and invertible matrices

  • Determinant

  3. Quadratic forms

  • Eigenvalues and diagonalization

  • Symmetric matrices

  • Positive and negative definiteness

Supplementary reading:

Intro to Linear Algebra#

Linear algebra is used to study linear models

Foundational for many disciplines related to economics

  • Economic theory

  • Econometrics and statistics

  • Finance

  • Operations research

Example of linear model

Equilibrium in a single market with price \(p\)

\[\begin{split} q_d = a + b p \\ q_s = c + d p \\ q_s = q_d \end{split}\]

What price \(p\) clears the market, and at what quantity \(q = q_s = q_d\)?

Here \(a, b, c, d\) are the model parameters or coefficients

Treated as fixed for a single computation but might vary between computations to better fit the data

Example

Determination of income

\[\begin{split} C = a + b(Y - T) \\ E = C + I \\ G = T \\ Y = E % \end{split}\]

Question: discuss what the variables are

Solve for \(Y\) as a function of \(I\) and \(G\)
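One way to obtain the answer (a sketch, substituting the equations into one another and using \(T = G\)):

\[ Y = E = C + I = a + b(Y - G) + I \quad \implies \quad Y = \frac{a + I - b G}{1 - b} \qquad (b \ne 1) \]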

Bigger, more complex systems found in problems related to

  • Regression and forecasting

  • Portfolio analysis

  • Ranking systems

  • Etc., etc. — any number of applications

Definition

A general system of equations can be written as

\[\begin{split} \begin{array}{c} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1K} x_K = b_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2K} x_K = b_2 \\ \vdots \\ a_{N1} x_1 + a_{N2} x_2 + \cdots + a_{NK} x_K = b_N \end{array} \end{split}\]

Typically

  • the \(a_{ij}\) and \(b_i\) are exogenous / given / parameters

  • the values \(x_j\) are endogenous

Key question: What values of \(x_1, \ldots, x_K\) solve this system?

More often we write the system in matrix form

\[\begin{split} \left( \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1K} \\ a_{21} & a_{22} & \cdots & a_{2K} \\ \vdots & \vdots & & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NK} \end{array} \right) \left( \begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_K \end{array} \right) = \left( \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_N \end{array} \right) % \end{split}\]

or

\[ % {\bf A} {\bf x} = {\bf b} % \]

Notice the bold font, used to indicate that these symbols denote vectors and matrices.

Let’s solve this system on a computer:

import numpy as np
from scipy.linalg import solve
A = [[0, 2, 4],
     [1, 4, 8],
     [0, 3, 7]]
b = (1, 2, 0)
A, b = np.asarray(A), np.asarray(b)
x = solve(A, b)
print(f'Solution is x={x}')
Solution is x=[-1.77635684e-15  3.50000000e+00 -1.50000000e+00]

This tells us that the solution is \(x = (0, 3.5, -1.5)\)

That is,

\[\begin{split} % {\bf x} = \left( \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right) = \left( \begin{array}{c} 0 \\ 3.5 \\ -1.5 \end{array} \right) % \end{split}\]

The usual question: if computers can solve such systems so easily, why do we need to study the underlying theory?

But now let’s try this similar looking problem

Question: what changed?

import numpy as np
from scipy.linalg import solve
A = [[0, 2, 4],
     [1, 4, 8],
     [0, 3, 6]]
b = (1, 2, 0)
A, b = np.asarray(A), np.asarray(b)
x = solve(A, b)
print(f'Solution is x={x}')
Solution is x=[ 0.00000000e+00 -9.00719925e+15  4.50359963e+15]
/var/folders/5d/cjhpvdlx7lg8zt1ypw5fcyy40000gn/T/ipykernel_77087/3978225594.py:8: LinAlgWarning: Ill-conditioned matrix (rcond=4.11194e-18): result may not be accurate.
  x = solve(A, b)

This is the output that we get: LinAlgWarning: Ill-conditioned matrix

  • What does this mean?

  • How can we fix it?

We still need to understand the concepts after all!

Vector Space#

Recall that \(\mathbb{R}^N := \) set of all \(N\)-vectors

Definition

An \(N\)-vector \({\bf x}\) is a tuple of \(N\) real numbers:

\[ {\bf x} = (x_1, \ldots, x_N) \in \mathbb{R}^N \quad \text{ where } \quad x_n \in \mathbb{R} \text{ for each } n \]

We typically think of vectors as columns, and can write \({\bf x}\) vertically, like so:

\[\begin{split} % {\bf x} = \left( \begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_N \end{array} \right) % \end{split}\]
_images/vec.png

Fig. 42 Visualization of vector \({\bf x}\) in \(\mathbb{R}^2\)#

_images/vecs.png

Fig. 43 Three vectors in \(\mathbb{R}^2\)#

The vector of ones will be denoted \({\bf 1}\)

\[\begin{split} % {\bf 1} := \left( \begin{array}{c} 1 \\ \vdots \\ 1 \end{array} \right) % \end{split}\]

Vector of zeros will be denoted \({\bf 0}\)

\[\begin{split} % {\bf 0} := \left( \begin{array}{c} 0 \\ \vdots \\ 0 \end{array} \right) % \end{split}\]

Linear Operations#

Two fundamental algebraic operations:

  1. Vector addition

  2. Scalar multiplication

Definition

Sum of \({\bf x} \in \mathbb{R}^N\) and \({\bf y} \in \mathbb{R}^N\) defined by

\[\begin{split} % {\bf x} + {\bf y} = \left( \begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_N \end{array} \right) + \left( \begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_N \end{array} \right) := \left( \begin{array}{c} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_N + y_N \end{array} \right) % \end{split}\]

Example

\[\begin{split} % \left( \begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array} \right) + \left( \begin{array}{c} 2 \\ 4 \\ 6 \\ 8 \end{array} \right) := \left( \begin{array}{c} 3 \\ 6 \\ 9 \\ 12 \end{array} \right) % \end{split}\]

Example

\[\begin{split} % \left( \begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array} \right) + \left( \begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array} \right) := \left( \begin{array}{c} 2 \\ 3 \\ 4 \\ 5 \end{array} \right) % \end{split}\]
_images/vec_add.png

Fig. 44 Vector addition#

Definition

Scalar product of \(\alpha \in \mathbb{R}\) and \({\bf x} \in \mathbb{R}^N\) defined by

\[\begin{split} % \alpha {\bf x} = \alpha \left( \begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_N \end{array} \right) := \left( \begin{array}{c} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_N \end{array} \right) % \end{split}\]

Example

\[\begin{split} % 0.5 \left( \begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array} \right) := \left( \begin{array}{c} 0.5 \\ 1.0 \\ 1.5 \\ 2.0 \end{array} \right) % \end{split}\]

Example

\[\begin{split} % -1 \left( \begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array} \right) := \left( \begin{array}{c} -1 \\ -2 \\ -3 \\ -4 \end{array} \right) % \end{split}\]
_images/vec_scalar.png

Fig. 45 Scalar multiplication#

Subtraction performed element by element, analogous to addition

\[\begin{split} % {\bf x} - {\bf y} := \left( \begin{array}{c} x_1 - y_1 \\ x_2 - y_2 \\ \vdots \\ x_N - y_N \end{array} \right) % \end{split}\]

Def can be given in terms of addition and scalar multiplication:

\[ % {\bf x} - {\bf y} := {\bf x} + (-1) {\bf y} % \]
_images/vec_minus.png

Fig. 46 Difference between vectors#

Incidentally, most high level numerical libraries treat vector addition and scalar multiplication in the same way — element-wise

import numpy as np
x = np.array((2, 4, 6))
y = np.array((10, 10, 10))
print(x + y) # Vector addition

x = np.array([12, 14, 16])
print(2 * x) # Scalar multiplication
[12 14 16]
[24 28 32]

Definition

A linear combination of vectors \({\bf x}_1,\ldots, {\bf x}_K\) in \(\mathbb{R}^N\) is a vector

\[ % {\bf y} = \sum_{k=1}^K \alpha_k {\bf x}_k = \alpha_1 {\bf x}_1 + \cdots + \alpha_K {\bf x}_K % \]

where \(\alpha_1,\ldots, \alpha_K\) are scalars

Example

\[\begin{split} % 0.5 \left( \begin{array}{c} 6.0 \\ 2.0 \\ 8.0 \end{array} \right) + 3.0 \left( \begin{array}{c} 0 \\ 1.0 \\ -1.0 \end{array} \right) = \left( \begin{array}{c} 3.0 \\ 4.0 \\ 1.0 \end{array} \right) % \end{split}\]
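A quick numerical check of this example in NumPy (the vectors and scalars are exactly those above):

import numpy as np
x1 = np.array([6.0, 2.0, 8.0])
x2 = np.array([0.0, 1.0, -1.0])
y = 0.5 * x1 + 3.0 * x2   # linear combination with alpha_1 = 0.5, alpha_2 = 3.0
print(y)                  # prints [3. 4. 1.]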

Inner Product#

Definition

The inner product of two vectors \({\bf x}\) and \({\bf y}\) in \(\mathbb{R}^N\) is

\[ % {\bf x}' {\bf y} := \sum_{n=1}^N x_n y_n % \]

The notation \(\bullet'\) flips a column vector into a row vector; this will make more sense when we discuss matrices later, of which vectors are a special case.

Example

\({\bf x} = (2, 3)\) and \({\bf y} = (-1, 1)\) implies that

\[ % {\bf x}' {\bf y} = 2 \times (-1) + 3 \times 1 = 1 % \]

Example

\({\bf x} = \tfrac{1}{N} {\bf 1}\) and \({\bf y} = (y_1, \ldots, y_N)\) implies

\[ % {\bf x}' {\bf y} = \frac{1}{N} \sum_{n=1}^N y_n % \]
import numpy as np
x = np.array((1, 2, 3, 4))
y = np.array((2, 4, 6, 8))
print('Product =',x * y)
print('Inner product =',np.sum(x * y))
print('Inner product =',x @ y)
Product = [ 2  8 18 32]
Inner product = 60
Inner product = 60

Fact

For any \(\alpha, \beta \in \mathbb{R}\) and any \({\bf x}, {\bf y}, {\bf z} \in \mathbb{R}^N\), the following statements are true:

  1. \({\bf x}' {\bf y} = {\bf y}' {\bf x}\)

  2. \((\alpha {\bf x})' (\beta {\bf y}) = \alpha \beta ({\bf x}' {\bf y})\)

  3. \({\bf x}' ({\bf y} + {\bf z}) = {\bf x}' {\bf y} + {\bf x}' {\bf z}\)

For example, item 2 is true because

\[ % (\alpha {\bf x})' (\beta {\bf y}) = \sum_{n=1}^N \alpha x_n \beta y_n = \alpha \beta \sum_{n=1}^N x_n y_n = \alpha \beta ({\bf x}' {\bf y}) % \]

Exercise: Use above rules to show that \((\alpha {\bf y} + \beta {\bf z})'{\bf x} = \alpha {\bf x}' {\bf y} + \beta {\bf x}' {\bf z}\)

The next result is a generalization

Fact

Inner products of linear combinations satisfy

\[ % \left( \sum_{k=1}^K \alpha_k {\bf x}_k \right)' \left( \sum_{j=1}^J \beta_j {\bf y}_j \right) = \sum_{k=1}^K \sum_{j=1}^J \alpha_k \beta_j {\bf x}_k' {\bf y}_j % \]

(Reminder about) Norms and Distance#

The (Euclidean) norm of \({\bf x} \in \mathbb{R}^N\) is defined as

\[ % \| {\bf x} \| := \sqrt{{\bf x}' {\bf x} } = \left( \sum_{n=1}^N x_n^2 \right)^{1/2} % \]

Interpretation:

  • \(\| {\bf x} \|\) represents the "length" of \({\bf x}\)

  • \(\| {\bf x} - {\bf y} \|\) represents distance between \({\bf x}\) and \({\bf y}\)

_images/vec.png

Fig. 47 Length of red line \(= \sqrt{x_1^2 + x_2^2} =: \|{\bf x}\|\)#

\(\| {\bf x} - {\bf y} \|\) represents distance between \({\bf x}\) and \({\bf y}\)

_images/vec_minus.png

Fact

For any \(\alpha \in \mathbb{R}\) and any \({\bf x}, {\bf y} \in \mathbb{R}^N\), the following statements are true:

  1. \(\| {\bf x} \| \geq 0\) and \(\| {\bf x} \| = 0\) if and only if \({\bf x} = {\bf 0}\)

  2. \(\| \alpha {\bf x} \| = |\alpha| \| {\bf x} \|\)

  3. \(\| {\bf x} + {\bf y} \| \leq \| {\bf x} \| + \| {\bf y} \|\) (triangle inequality)

  4. \(| {\bf x}' {\bf y} | \leq \| {\bf x} \| \| {\bf y} \|\) (Cauchy-Schwarz inequality)

Fact

If \({\bf x} \in \mathbb{R}^N\) is nonzero, then the solution to the optimization problem

\[ % \max_{{\bf y}} {\bf x}'{\bf y} \quad \text{ subject to } \quad {\bf y} \in \mathbb{R}^N \text{ and } \| {\bf y} \| = 1 % \]

is \({\hat{\bf x}} := {\bf x} / \| {\bf x} \|\)

_images/max_inner_prod.png
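A sketch of why this holds, using the Cauchy-Schwarz inequality from the fact above: for any \({\bf y}\) with \(\| {\bf y} \| = 1\),

\[ {\bf x}' {\bf y} \leq | {\bf x}' {\bf y} | \leq \| {\bf x} \| \, \| {\bf y} \| = \| {\bf x} \|, \quad \text{while} \quad {\bf x}' \hat{\bf x} = \frac{{\bf x}' {\bf x}}{\| {\bf x} \|} = \| {\bf x} \| \]

so the upper bound is attained at \({\bf y} = \hat{\bf x}\)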

Span#

Let \(X \subset \mathbb{R}^N\) be any nonempty set (of points in \(\mathbb{R}^N\), i.e. vectors)

Definition

Set of all possible linear combinations of elements of \(X\) is called the span of \(X\), denoted by \(\mathrm{span}(X)\)

For finite \(X := \{{\bf x}_1,\ldots, {\bf x}_K\}\) the span can be expressed as

\[ \mathrm{span}(X):= \left\{ \text{ all } \sum_{k=1}^K \alpha_k {\bf x}_k \text{ such that } (\alpha_1,\ldots, \alpha_K) \in \mathbb{R}^K \right\} \]

We are mainly interested in the span of finite sets…

Example

Let’s start with the span of a singleton

Let \(X = \{ {\bf 1} \} \subset \mathbb{R}^2\), where \({\bf 1} := (1,1)\)

The span of \(X\) is all vectors of the form

\[\begin{split} % \alpha {\bf 1} = \left( \begin{array}{c} \alpha \\ \alpha \end{array} \right) \quad \text{ with } \quad \alpha \in \mathbb{R} % \end{split}\]

The span constitutes a line in the plane that passes through

  • the vector \({\bf 1}\) (set \(\alpha = 1\))

  • the origin \({\bf 0}\) (set \(\alpha = 0\))

_images/span_of_one_vec.png

Fig. 48 The span of \({\bf 1} := (1,1)\) in \(\mathbb{R}^2\)#

Example

Let \({\bf x}_1 = (3, 4, 2)\) and let \({\bf x}_2 = (3, -4, 0.4)\)

By definition, the span is all vectors of the form

\[\begin{split} % {\bf y} = \alpha \left( \begin{array}{c} 3 \\ 4 \\ 2 \end{array} \right) + \beta \left( \begin{array}{c} 3 \\ -4 \\ 0.4 \end{array} \right) \quad \text{where} \quad \alpha, \beta \in \mathbb{R} % \end{split}\]

It turns out to be a plane that passes through

  • the vector \({\bf x}_1\)

  • the vector \({\bf x}_2\)

  • the origin \({\bf 0}\)

_images/span_plane.png

Fig. 49 Span of \({\bf x}_1, {\bf x}_2\)#

Fact

If \(X \subset Y\), then \(\mathrm{span}(X) \subset \mathrm{span}(Y)\)

Let \(Y\) be any subset of \(\mathbb{R}^N\), and let \(X:= \{{\bf x}_1,\ldots, {\bf x}_K\}\)

If \(Y \subset \mathrm{span}(X)\), we say that the vectors in \(X\) span the set \(Y\)

Alternatively, we say that \(X\) is a spanning set for \(Y\)

A nice situation: \(Y\) is large but \(X\) is small

\(\implies\) large set \(Y\) "described" by the small number of vectors in \(X\)

Example

Consider the vectors \(\{{\bf e}_1, \ldots, {\bf e}_N\} \subset \mathbb{R}^N\), where

\[\begin{split} % {\bf e}_1 := \left( \begin{array}{c} 1 \\ 0 \\ \vdots \\ 0 \end{array} \right), \quad {\bf e}_2 := \left( \begin{array}{c} 0 \\ 1 \\ \vdots \\ 0 \end{array} \right), \; \cdots, \; {\bf e}_N := \left( \begin{array}{c} 0 \\ 0 \\ \vdots \\ 1 \end{array} \right) % \end{split}\]

That is, \({\bf e}_n\) has all zeros except for a \(1\) as the \(n\)-th element

Vectors \({\bf e}_1, \ldots, {\bf e}_N\) are called the canonical basis vectors of \(\mathbb{R}^N\)

_images/vec_canon.png

Fig. 50 Canonical basis vectors in \(\mathbb{R}^2\)#

Fact

The span of \(\{{\bf e}_1, \ldots, {\bf e}_N\}\) is equal to all of \(\mathbb{R}^N\)

_images/vec_canon_x.png

Fig. 51 Canonical basis vectors in \(\mathbb{R}^2\)#

Fact

Consider the set

\[ % P := \{ (x_1, x_2, 0) \in \mathbb{R}^3 \colon x_1, x_2 \in \mathbb{R} \} % \]

Let \({\bf e}_1\) and \({\bf e}_2\) be the canonical basis vectors in \(\mathbb{R}^3\)

Then \(\mathrm{span}\{{\bf e}_1, {\bf e}_2\} = P\)

Graphically, \(P =\) flat plane in \(\mathbb{R}^3\), where height coordinate \(x_3=0\)

_images/flat_plane_no_vecs.png
_images/flat_plane_e_vecs.png

Fig. 52 \(\mathrm{span}\{{\bf e}_1, {\bf e}_2\} = P\)#

Linear Subspaces#

Definition

A nonempty \(S \subset \mathbb{R}^N\) is called a linear subspace of \(\mathbb{R}^N\) if

\[ % {\bf x}, {\bf y} \in S \; \text{ and } \;\alpha, \beta \in \mathbb{R} \quad \implies \quad \alpha {\bf x} + \beta {\bf y} \in S % \]

In other words, \(S \subset \mathbb{R}^N\) is “closed” under vector addition and scalar multiplication

Note: Sometimes we just say subspace and drop “linear”

Example

\(\mathbb{R}^N\) itself is a linear subspace of \(\mathbb{R}^N\)

Fact

Fix \({\bf a} \in \mathbb{R}^N\) and let \(A := \{ {\bf x} \in \mathbb{R}^N \colon {\bf a }'{\bf x} = 0 \}\)

The set \(A\) is a linear subspace of \(\mathbb{R}^N\)
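To see why (a one-line check using the inner product rules above): if \({\bf x}, {\bf y} \in A\) and \(\alpha, \beta \in \mathbb{R}\), then

\[ {\bf a}'(\alpha {\bf x} + \beta {\bf y}) = \alpha \, {\bf a}' {\bf x} + \beta \, {\bf a}' {\bf y} = 0 + 0 = 0 \quad \implies \quad \alpha {\bf x} + \beta {\bf y} \in A \]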

Fact

If \(Z\) is a nonempty subset of \(\mathbb{R}^N\), then \(\mathrm{span}(Z)\) is a linear subspace

Fact

If \(S\) and \(S'\) are two linear subspaces of \(\mathbb{R}^N\), then \(S \cap S'\) is also a linear subspace of \(\mathbb{R}^N\).

Other examples of linear subspaces

  • The singleton \(\{{\bf 0}\}\) in \(\mathbb{R}^N\)

  • Lines through the origin in \(\mathbb{R}^2\) and \(\mathbb{R}^3\)

  • Planes through the origin in \(\mathbb{R}^3\)

Exercise: Let \(S\) be a linear subspace of \(\mathbb{R}^N\). Show that

  1. \({\bf 0} \in S\)

  2. If \(X \subset S\), then \(\mathrm{span}(X) \subset S\)

  3. \(\mathrm{span}(S) = S\)

Linear Independence#

Important applied questions

  • When does a set of linear equations have a solution?

  • When is that solution unique?

  • When do regression arguments suffer from collinearity?

  • How can we approximate complex functions parsimoniously?

  • What is the rank of a matrix?

  • When is a matrix invertible?

All of these questions closely related to linear independence

Definition

A nonempty collection of vectors \(X := \{{\bf x}_1,\ldots, {\bf x}_K\} \subset \mathbb{R}^N\) is called linearly independent if

\[ % \sum_{k=1}^K \alpha_k {\bf x}_k = {\bf 0} \; \implies \; \alpha_1 = \cdots = \alpha_K = 0 % \]

As we’ll see, linear independence of a set of vectors determines how large a space they span

Loosely speaking, linearly independent sets span large spaces

Example

Let \({\bf x} := (1, 2)\) and \({\bf y} := (-5, 3)\)

The set \(X = \{{\bf x}, {\bf y}\}\) is linearly independent in \(\mathbb{R}^2\)

Indeed, suppose \(\alpha_1\) and \(\alpha_2\) are scalars with

\[\begin{split} % \alpha_1 \left( \begin{array}{c} 1 \\ 2 \end{array} \right) + \alpha_2 \left( \begin{array}{c} -5 \\ 3 \end{array} \right) = {\bf 0} % \end{split}\]

Equivalently,

\[\begin{split} % \alpha_1 = 5 \alpha_2 \\ 2 \alpha_1 = -3 \alpha_2 % \end{split}\]

Then \(2(5\alpha_2) = 10 \alpha_2 = -3 \alpha_2\), implying \(\alpha_2 = 0\) and hence \(\alpha_1 = 0\)

Example

The set of canonical basis vectors \(\{{\bf e}_1, \ldots, {\bf e}_N\}\) is linearly independent in \(\mathbb{R}^N\)

As a first step to better understanding linear independence let’s look at some equivalences

Fact

Take \(X := \{{\bf x}_1,\ldots, {\bf x}_K\} \subset \mathbb{R}^N\). For \(K > 1\) all of following statements are equivalent

  1. \(X\) is linearly independent

  2. No \({\bf x}_i \in X\) can be written as linear combination of the others

  3. \(X_0 \subsetneq X \implies \mathrm{span}(X_0) \subsetneq \mathrm{span}(X)\)

  • Here \(X_0 \subsetneq X\) means \(X_0 \subset X\) and \(X_0 \ne X\)

  • We say that \(X_0\) is a proper subset of \(X\)

As an exercise, let’s show that the first two statements are equivalent. The first statement is

(1)#\[ \sum_{k=1}^K \alpha_k {\bf x}_k = {\bf 0} \; \implies \; \alpha_1 = \cdots = \alpha_K = 0 \]

and the second

(2)#\[ \text{no ${\bf x}_i \in X$ can be written as linear combination of others} \]

We now show that (1) and (2) are equivalent.

To show that (1) \(\implies\) (2) let’s suppose to the contrary that

  1. \(\sum_{k=1}^K \alpha_k {\bf x}_k = {\bf 0} \implies \alpha_1 = \cdots = \alpha_K = 0\)

  2. and yet some \({\bf x}_i\) can be written as a linear combination of the other elements of \(X\)

In particular, suppose that

\[ % {\bf x}_i = \sum_{k \ne i} \alpha_k {\bf x}_k % \]

Then, rearranging,

\[ % \alpha_1 {\bf x}_1 + \cdots + (-1) {\bf x}_i + \cdots + \alpha_K {\bf x}_K = {\bf 0} % \]

This contradicts 1., and hence (2) holds

To show that (2) \(\implies\) (1) let’s suppose to the contrary that

  1. no \({\bf x}_i\) can be written as a linear combination of others

  2. and yet \(\exists\) \(\alpha_1, \ldots, \alpha_K\) not all zero with \(\alpha_1 {\bf x}_1 + \cdots + \alpha_K {\bf x}_K = {\bf 0}\)

Suppose without loss of generality that \(\alpha_1 \ne 0\) (similar argument works for any \(\alpha_j\))

Then

\[ % {\bf x}_1 = \frac{\alpha_2}{-\alpha_1} {\bf x}_2 + \cdots + \frac{\alpha_K}{-\alpha_1} {\bf x}_K % \]

This contradicts 1., and hence (1) holds.

Let’s show one more part of the proof as an exercise:

\[ % X \text{ linearly independent } \implies \text{ proper subsets of $X$ have smaller span} % \]

Example

Dropping any of the canonical basis vectors reduces span

Consider the \(N=2\) case

We know that \(\mathrm{span} \{{\bf e}_1, {\bf e}_2\} =\) all of \(\mathbb{R}^2\)

Removing either element of \(\{{\bf e}_1, {\bf e}_2\}\) reduces the span to a line

_images/vec_h_axis.png

Fig. 53 The span of \(\{{\bf e}_1\}\) alone is the horizonal axis#

Example

As another visual example of linear independence, consider the pair

\[\begin{split} % {\bf x}_1 = \begin{pmatrix} 3 \\ 4 \\ 2 \end{pmatrix} \quad \text{and} \quad {\bf x}_2 = \begin{pmatrix} 3 \\ -4 \\ 1 \end{pmatrix} % \end{split}\]

The span of this pair is a plane in \(\mathbb{R}^3\)

But if we drop either one the span reduces to a line

_images/nonredundant1.png

Fig. 54 The span of \(\{{\bf x}_1, {\bf x}_2\}\) is a plane#

_images/nonredundant2.png

Fig. 55 The span of \(\{{\bf x}_1\}\) alone is a line#

_images/nonredundant3.png

Fig. 56 The span of \(\{{\bf x}_2\}\) alone is a line#

Linear Dependence#

Definition

If \(X\) is not linearly independent then it is called linearly dependent

We saw above that: linear independence \(\iff\) dropping any elements reduces span.

Hence \(X\) is linearly dependent when some elements can be removed without changing \(\mathrm{span}(X)\)

That is, \(\exists \, X_0 \subsetneq X \; \text{ such that } \; \mathrm{span}(X_0) = \mathrm{span}(X)\)

Example

As an example with redundancy, consider \(\{{\bf x}_1, {\bf x}_2\} \subset \mathbb{R}^2\) where

  • \({\bf x}_1 = {\bf e}_1 := (1, 0)\)

  • \({\bf x}_2 = (-2, 0)\)

We claim that \(\mathrm{span} \{{\bf x}_1, {\bf x}_2\} = \mathrm{span}\{{\bf x}_1\}\)
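To see why (a sketch): \({\bf x}_2 = -2 {\bf x}_1\), so every linear combination of the pair collapses to a multiple of \({\bf x}_1\):

\[ \alpha {\bf x}_1 + \beta {\bf x}_2 = \alpha {\bf x}_1 - 2 \beta {\bf x}_1 = (\alpha - 2\beta) {\bf x}_1 \]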

_images/vec_noncanon.png

Fig. 57 The vectors \({\bf x}_1\) and \({\bf x}_2\)#

Implications of Independence#

Let \(X := \{{\bf x}_1,\ldots, {\bf x}_K\} \subset \mathbb{R}^N\)

Fact

If \(X\) is linearly independent, then \(X\) does not contain \({\bf 0}\)

Exercise: Prove it

Fact

If \(X\) is linearly independent, then every subset of \(X\) is linearly independent

Fact

If \(X:= \{{\bf x}_1,\ldots, {\bf x}_K\} \subset \mathbb{R}^N\) is linearly independent and \({\bf z}\) is an \(N\)-vector not in \(\mathrm{span}(X)\), then \(X \cup \{ {\bf z} \}\) is linearly independent

Unique Representations#

Let

  • \(X := \{{\bf x}_1,\ldots,{\bf x}_K\} \subset \mathbb{R}^N\)

  • \({\bf y} \in \mathbb{R}^N\)

We know that if \({\bf y} \in \mathrm{span}(X)\), then exists representation

\[ % {\bf y} = \sum_{k=1}^K \alpha_k {\bf x}_k % \]

But when is this representation unique?

Answer: When \(X\) is linearly independent

Fact

If \(X = \{{\bf x}_1,\ldots, {\bf x}_K\} \subset \mathbb{R}^N\) is linearly independent and \({\bf y} \in \mathbb{R}^N\), then there is at most one set of scalars \(\alpha_1,\ldots,\alpha_K\) such that \({\bf y} = \sum_{k=1}^K \alpha_k {\bf x}_k\)
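A sketch of the argument: if \({\bf y} = \sum_{k=1}^K \alpha_k {\bf x}_k = \sum_{k=1}^K \beta_k {\bf x}_k\), then subtracting the two representations gives

\[ \sum_{k=1}^K (\alpha_k - \beta_k) {\bf x}_k = {\bf 0} \quad \implies \quad \alpha_k = \beta_k \text{ for all } k \]

where the implication uses linear independence of \(X\)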

Span and Independence#

Here’s one of the most fundamental results in linear algebra

Exchange Lemma

Let \(S\) be a linear subspace of \(\mathbb{R}^N\) that is spanned by \(K\) vectors.
Then any linearly independent subset of \(S\) has at most \(K\) vectors

Proof: Omitted

Example

If \(X := \{{\bf x}_1, {\bf x}_2, {\bf x}_3\} \subset \mathbb{R}^2\), then \(X\) is linearly dependent

  • because \(\mathbb{R}^2\) is spanned by the two vectors \({\bf e}_1, {\bf e}_2\)

_images/vecs.png

Fig. 58 Must be linearly dependent#

Example

Recall the plane (flat plane in \(\mathbb{R}^3\) where height coordinate \(x_3\) is zero)

\[ P := \{ (x_1, x_2, 0) \in \mathbb{R}^3 \colon x_1, x_2 \in \mathbb{R} \} \]

We showed before that \(\mathrm{span}\{{\bf e}_1, {\bf e}_2\} = P\) for

\[\begin{split} % {\bf e}_1 := \left( \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right), \quad {\bf e}_2 := \left( \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right) % \end{split}\]

Therefore any three vectors lying in \(P\) are linearly dependent

_images/flat_plane.png

Fig. 59 Any three vectors in \(P\) are linearly dependent#

When Do \(N\) Vectors Span \(\mathbb{R}^N\)?#

In general, linearly independent vectors have a relatively “large” span

  • No vector is redundant, so each contributes to the span

This helps explain the following fact:

Fact

Let \(X := \{ {\bf x}_1, \ldots, {\bf x}_N \}\) be any \(N\) vectors in \(\mathbb{R}^N\)

\(\mathrm{span}(X) = \mathbb{R}^N\) if and only if \(X\) is linearly independent

Example

The vectors \({\bf x} = (1, 2)\) and \({\bf y} = (-5, 3)\) span \(\mathbb{R}^2\)

Bases#

Definition

Let \(S\) be a linear subspace of \(\mathbb{R}^N\)

A set of vectors \(B := \{{\bf b}_1, \ldots, {\bf b}_K\} \subset S\) is called a basis of \(S\) if

  1. \(B\) is linearly independent

  2. \(\mathrm{span}(B) = S\)

Example

Canonical basis vectors form a basis of \(\mathbb{R}^N\)

Indeed, if \(E := \{{\bf e}_1, \ldots, {\bf e}_N\} \subset \mathbb{R}^N\), then

  • \(E\) is linearly independent – we showed this earlier

  • \(\mathrm{span}(E) = \mathbb{R}^N\) – we showed this earlier

Example

Recall the plane

\[ % P := \{ (x_1, x_2, 0) \in \mathbb{R}^3 \colon x_1, x_2 \in \mathbb{R} \} % \]

We showed before that \(\mathrm{span}\{{\bf e}_1, {\bf e}_2\} = P\) for

\[\begin{split} % {\bf e}_1 := \left( \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right), \quad {\bf e}_2 := \left( \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right) % \end{split}\]

Moreover, \(\{{\bf e}_1, {\bf e}_2\}\) is linearly independent (why?)

Hence \(\{{\bf e}_1, {\bf e}_2\}\) is a basis for \(P\)

_images/flat_plane_e_vecs.png

Fig. 60 The pair \(\{{\bf e}_1, {\bf e}_2\}\) form a basis for \(P\)#

What are the implications of \(B\) being a basis of \(S\)?

In short, every element of \(S\) can be represented uniquely from the smaller set \(B\)

In more detail:

  • \(B\) spans \(S\) and, by linear independence, every element is needed to span \(S\) — a “minimal” spanning set

  • Since \(B\) spans \(S\), every \({\bf y}\) in \(S\) can be represented as a linear combination of the basis vectors

  • By independence, this representation is unique

It’s obvious given the definition that

Fact

If \(B \subset \mathbb{R}^N\) is linearly independent, then \(B\) is a basis of \(\mathrm{span}(B)\)

Example

Let \(B := \{{\bf x}_1, {\bf x}_2\}\) where

\[\begin{split} % {\bf x}_1 = \begin{pmatrix} 3 \\ 4 \\ 2 \end{pmatrix} \quad \text{and} \quad {\bf x}_2 = \begin{pmatrix} 3 \\ -4 \\ 1 \end{pmatrix} % \end{split}\]

We saw earlier that

  • \(S := \mathrm{span}(B)\) is the plane in \(\mathbb{R}^3\) passing through \({\bf x}_1\), \({\bf x}_2\) and \({\bf 0}\)

  • \(B\) is linearly independent in \(\mathbb{R}^3\) (dropping either reduces span)

Hence \(B\) is a basis for the plane \(S\)

_images/nonredundant1.png

Fig. 61 The pair \(\{{\bf x}_1, {\bf x}_2\}\) is a basis of its span#

Fundamental Properties of Bases#

Fact

If \(S\) is a linear subspace of \(\mathbb{R}^N\) distinct from \(\{{\bf 0}\}\), then

  1. \(S\) has at least one basis, and

  2. every basis of \(S\) has the same number of elements

Dimension#

Definition

Let \(S\) be a linear subspace of \(\mathbb{R}^N\)

We now know that every basis of \(S\) has the same number of elements

This common number is called the dimension of \(S\)

Example

\(\mathbb{R}^N\) is \(N\) dimensional because the \(N\) canonical basis vectors form a basis

Example

\(P := \{ (x_1, x_2, 0) \in \mathbb{R}^3 \colon x_1, x_2 \in \mathbb{R} \}\) is two-dimensional because the first two canonical basis vectors of \(\mathbb{R}^3\) form a basis

Example

In \(\mathbb{R}^3\), a line through the origin is one-dimensional, while a plane through the origin is two-dimensional

Dimension of Spans#

Fact

Let \(X := \{{\bf x}_1,\ldots,{\bf x}_K\} \subset \mathbb{R}^N\)

The following statements are true:

  1. \(\mathrm{dim}(\mathrm{span}(X)) \leq K\)

  2. \(\mathrm{dim}(\mathrm{span}(X)) = K\) \(\;\iff\;\) \(X\) is linearly independent
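On a computer, the dimension of a span can be checked numerically: stack the vectors as columns of a matrix and compute its rank with np.linalg.matrix_rank (this function reappears later in the lecture). A small sketch, with three vectors chosen for illustration and the third deliberately equal to the sum of the first two:

import numpy as np
x1, x2 = np.array([3.0, 4.0, 2.0]), np.array([3.0, -4.0, 1.0])
x3 = x1 + x2                       # a redundant vector: a linear combination of x1 and x2
X = np.column_stack((x1, x2, x3))  # vectors as columns
print(np.linalg.matrix_rank(X))    # prints 2: dim(span) = 2 < K = 3, so the set is dependent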

Fact

If \(S\) a linear subspace of \(\mathbb{R}^N\), then

\[ % \dim(S) = N \; \iff \; S = \mathbb{R}^N % \]

Useful implications

  • The only \(N\)-dimensional subspace of \(\mathbb{R}^N\) is \(\mathbb{R}^N\)

  • To show \(S = \mathbb{R}^N\) just need to show that \(\dim(S) = N\)

Proof: omitted

Linear Maps#

In this section we investigate one of the most important classes of functions: so-called linear functions

  • Linear functions play a fundamental role in all fields of science

  • Linear functions are in one-to-one correspondence with matrices

  • Even nonlinear functions can often be rewritten as partially linear

  • The properties of linear functions are closely tied to notions such as

    • linear combinations, span

    • linear independence, bases, etc.

Definition

A function \(T \colon \mathbb{R}^K \to \mathbb{R}^N\) is called linear if

\[ % T(\alpha {\bf x} + \beta {\bf y}) = \alpha T{\bf x} + \beta T{\bf y} \qquad \forall \, {\bf x}, {\bf y} \in \mathbb{R}^K, \; \forall \, \alpha, \beta \in \mathbb{R} % \]

Notation:

  • Linear functions often written with upper case letters

  • Typically omit parenthesis around arguments when convenient

Example

\(T \colon \mathbb{R} \to \mathbb{R}\) defined by \(Tx = 2x\) is linear

Example

The function \(f \colon \mathbb{R} \to \mathbb{R}\) defined by \(f(x) = x^2\) is nonlinear

Example

Given constants \(c_1\) and \(c_2\), the function \(T \colon \mathbb{R}^2 \to \mathbb{R}\) defined by

\[ % T {\bf x} = T (x_1, x_2) = c_1 x_1 + c_2 x_2 % \]

is linear

_images/linfunc.png

Fig. 62 The graph of \(T {\bf x} = c_1 x_1 + c_2 x_2\) is a plane through the origin#

Remark: Thinking of linear functions as those whose graph is a straight line is not correct

Example

Function \(f \colon \mathbb{R} \to \mathbb{R}\) defined by \(f(x) = 1 + 2x\) is nonlinear

This kind of function is called an affine function

Let \({\bf a}_1, \ldots, {\bf a}_K\) be vectors in \(\mathbb{R}^N\)

Let \(T \colon \mathbb{R}^K \to \mathbb{R}^N\) be defined by

\[\begin{split} % T{\bf x} = T \begin{pmatrix} x_1 \\ \vdots \\ x_K \end{pmatrix} = x_1 {\bf a}_1 + \ldots + x_K {\bf a}_K % \end{split}\]

Exercise: Show that this function is linear

Remarks

  • This is a generalization of the previous linear examples

  • In a sense it is the most general representation of a linear map from \(\mathbb{R}^K\) to \(\mathbb{R}^N\)

  • It is also “the same” as the \(N \times K\) matrix with columns \({\bf a}_1, \ldots, {\bf a}_K\) — more on this later

Implications of Linearity#

Fact

If \(T \colon \mathbb{R}^K \to \mathbb{R}^N\) is a linear map and \({\bf x}_1,\ldots, {\bf x}_J\) are vectors in \(\mathbb{R}^K\), then for any linear combination we have

\[ T \left[ \alpha_1 {\bf x}_1 + \cdots + \alpha_J {\bf x}_J \right] = \alpha_1 T {\bf x}_1 + \cdots + \alpha_J T {\bf x}_J \]

Exercise: Show that if \(T\) is any linear function then \(T{\bf 0} = {\bf 0}\)

Fact

If \(T \colon \mathbb{R}^K \to \mathbb{R}^N\) is a linear map, then

\[ % \mathrm{rng}(T) = \mathrm{span}(V) \quad \text{where} \quad V := \{T{\bf e}_1, \ldots, T{\bf e}_K\} % \]

Here \({\bf e}_k\) is the \(k\)-th canonical basis vector in \(\mathbb{R}^K\)

Example

Let \(T \colon \mathbb{R}^2 \to \mathbb{R}^2\) be defined by

\[\begin{split} % T{\bf x} = T(x_1, x_2) = x_1 \begin{pmatrix} 1 \\ 2 \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ -2 \end{pmatrix} % \end{split}\]

Then

\[\begin{split} % T{\bf e}_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad \text{and} \quad T{\bf e}_2 = \begin{pmatrix} 0 \\ -2 \end{pmatrix} % \end{split}\]

Exercise: Show that \(V := \{T{\bf e}_1, T{\bf e}_2\}\) is linearly independent

We conclude that the range of \(T\) is all of \(\mathbb{R}^2\) (why?)

Definition

The null space or kernel of linear map \(T \colon \mathbb{R}^K \to \mathbb{R}^N\) is

\[ % \mathrm{kernel}(T) := \{ {\bf x} \in \mathbb{R}^K \colon T{\bf x} = {\bf 0}\} % \]

Exercise: Show that \(\mathrm{kernel}(T)\) is a linear subspace of \(\mathbb{R}^K\)

Fact

\(\mathrm{kernel}(T) = \{{\bf 0}\}\) if and only if \(T\) is one-to-one
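A sketch of one direction: suppose \(\mathrm{kernel}(T) = \{{\bf 0}\}\) and \(T{\bf x} = T{\bf y}\). By linearity,

\[ T({\bf x} - {\bf y}) = T{\bf x} - T{\bf y} = {\bf 0} \quad \implies \quad {\bf x} - {\bf y} \in \mathrm{kernel}(T) \quad \implies \quad {\bf x} = {\bf y} \]

so \(T\) is one-to-one. (The converse is similar, using \(T{\bf 0} = {\bf 0}\).)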

Linearity and Bijections#

Recall that an arbitrary function can be

  • one-to-one (injections)

  • onto (surjections)

  • both (bijections)

  • neither

For linear functions from \(\mathbb{R}^N\) to \(\mathbb{R}^N\), the first three are all equivalent!

In particular,

\[ % \text{onto } \iff \text{ one-to-one } \iff \text{ bijection} % \]

Fact

If \(T\) is a linear function from \(\mathbb{R}^N\) to \(\mathbb{R}^N\) then all of the following are equivalent:

  1. \(T\) is a bijection

  2. \(T\) is onto/surjective

  3. \(T\) is one-to-one/injective

  4. \(\mathrm{kernel}(T) = \{ {\bf 0} \}\)

  5. The set of vectors \(V := \{T{\bf e}_1, \ldots, T{\bf e}_N\}\) is linearly independent

Definition

If any one of the above equivalent conditions is true, then \(T\) is called nonsingular

  • Don’t forget: We are talking about \(\mathbb{R}^N\) to \(\mathbb{R}^N\) here

_images/linbijec.png

Fig. 63 The case of \(N=1\), nonsingular and singular#

Proof that \(T\) onto \(\iff\) \(V := \{T{\bf e}_1, \ldots, T{\bf e}_N\}\) is linearly independent

Recall that for any linear map \(T\) we have \(\mathrm{rng}(T) = \mathrm{span}(V)\)

Using this fact and the definitions,

\[\begin{split} % T \text{ is onto/surjective } \iff \mathrm{rng}(T) = \mathbb{R}^N \\ \iff \mathrm{span}(V) = \mathbb{R}^N \\ \iff V \text{ is linearly independent} % \end{split}\]

(We saw that \(N\) vectors span \(\mathbb{R}^N\) if and only if they are linearly independent)

Exercise: rest of proof

Fact

If \(T \colon \mathbb{R}^N \to \mathbb{R}^N\) is nonsingular then so is \(T^{-1}\).

What is the implication here?

If \(T\) is a bijection then so is \(T^{-1}\)

Hence the only real claim is that \(T^{-1}\) is also linear

Exercise: prove this statement

Maps Across Different Dimensions#

Remember that the above results apply to maps from \(\mathbb{R}^N\) to \(\mathbb{R}^N\)

Things change when we look at linear maps across dimensions

The general rules for linear maps are

  • Maps from lower to higher dimensions cannot be onto

  • Maps from higher to lower dimensions cannot be one-to-one

  • In either case they cannot be bijections

Fact

For a linear map \(T\) from \(\mathbb{R}^K \to \mathbb{R}^N\), the following statements are true:

  1. If \(K < N\) then \(T\) is not onto

  2. If \(K > N\) then \(T\) is not one-to-one

Example

Cost function \(c(k, \ell) = rk + w\ell\) cannot be one-to-one

_images/cost_min_2.png

Matrices and Linear Equations#

As we’ll see, there’s an isomorphic relationship between

  1. matrices

  2. linear maps

Often properties of matrices are best understood via those of linear maps


Typical \(N \times K\) matrix:

\[\begin{split} % {\bf A} = \left( \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1K} \\ a_{21} & a_{22} & \cdots & a_{2K} \\ \vdots & \vdots & & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NK} \end{array} \right) % \end{split}\]

Symbol \(a_{ij}\) stands for the element in the

  • \(i\)-th row

  • \(j\)-th column

Matrices correspond to coefficients of a linear equation

\[\begin{split} % \begin{array}{c} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1K} x_K = b_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2K} x_K = b_2 \\ \vdots \\ a_{N1} x_1 + a_{N2} x_2 + \cdots + a_{NK} x_K = b_N \end{array} % \end{split}\]
  • Given the \(a_{ij}\) and \(b_i\), what values of \(x_1, \ldots, x_K\) solve this system?

We now investigate this and other related questions

But first some background on matrices…

An \(N \times K\) matrix also called a

  • row vector if \(N = 1\)

  • column vector if \(K = 1\)

Example

\[\begin{split} % {\bf b} = \begin{pmatrix} b_1 \\ \vdots \\ b_N \end{pmatrix} \text{ is }\; N \times 1, \qquad {\bf c} = \begin{pmatrix} c_1 \cdots c_K \end{pmatrix} \text{ is }\; 1 \times K % \end{split}\]

Definition

If \(N = K\), then \({\bf A}\) is called a square matrix

We use

  • \(\mathrm{col}_k({\bf A})\) to denote the \(k\)-th column of \({\bf A}\)

  • \(\mathrm{row}_n({\bf A})\) to denote the \(n\)-th row of \({\bf A}\)

Example

\[\begin{split} % \mathrm{col}_1({\bf A}) = \mathrm{col}_1 \left( \begin{array}{ccc} a_{11} & \cdots & a_{1K} \\ a_{21} & \cdots & a_{2K} \\ \vdots & \vdots & \vdots \\ a_{N1} & \cdots & a_{NK} \end{array} \right) = \left( \begin{array}{c} a_{11} \\ a_{21} \\ \vdots \\ a_{N1} \end{array} \right) % \end{split}\]

The zero matrix is

\[\begin{split} % {\bf 0} := \left( \begin{array}{cccc} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \\ \end{array} \right) % \end{split}\]

The identity matrix is

\[\begin{split} % {\bf I} := \left( \begin{array}{cccc} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \\ \end{array} \right) % \end{split}\]

Algebraic Operations for Matrices#

Addition and scalar multiplication are also defined for matrices

Both are element by element, as in the vector case

Scalar multiplication:

\[\begin{split} % \gamma \left( \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1K} \\ a_{21} & a_{22} & \cdots & a_{2K} \\ \vdots & \vdots & & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NK} \\ \end{array} \right) := \left( \begin{array}{cccc} \gamma a_{11} & \gamma a_{12} & \cdots & \gamma a_{1K} \\ \gamma a_{21} & \gamma a_{22} & \cdots & \gamma a_{2K} \\ \vdots & \vdots & & \vdots \\ \gamma a_{N1} & \gamma a_{N2} & \cdots & \gamma a_{NK} \\ \end{array} \right) % \end{split}\]

Addition:

\[\begin{split} % \left( \begin{array}{ccc} a_{11} & \cdots & a_{1K} \\ a_{21} & \cdots & a_{2K} \\ \vdots & \vdots & \vdots \\ a_{N1} & \cdots & a_{NK} \\ \end{array} \right) + \left( \begin{array}{ccc} b_{11} & \cdots & b_{1K} \\ b_{21} & \cdots & b_{2K} \\ \vdots & \vdots & \vdots \\ b_{N1} & \cdots & b_{NK} \\ \end{array} \right) \\ := \left( \begin{array}{ccc} a_{11} + b_{11} & \cdots & a_{1K} + b_{1K} \\ a_{21} + b_{21} & \cdots & a_{2K} + b_{2K} \\ \vdots & \vdots & \vdots \\ a_{N1} + b_{N1} & \cdots & a_{NK} + b_{NK} \\ \end{array} \right) % \end{split}\]

Note that the matrices must have the same dimensions

Multiplication of matrices

Product \({\bf A} {\bf B}\): \(i,j\)-th element is inner product of \(i\)-th row of \({\bf A}\) and \(j\)-th column of \({\bf B}\)

\[\begin{split} % \left( \begin{array}{ccc} a_{11} & \cdots & a_{1K} \\ a_{21} & \cdots & a_{2K} \\ \vdots & \vdots & \vdots \\ a_{N1} & \cdots & a_{NK} \\ \end{array} \right) \left( \begin{array}{ccc} b_{11} & \cdots & b_{1J} \\ b_{21} & \cdots & b_{2J} \\ \vdots & \vdots & \vdots \\ b_{K1} & \cdots & b_{KJ} \\ \end{array} \right) = \left( \begin{array}{ccc} c_{11} & \cdots & c_{1J} \\ c_{21} & \cdots & c_{2J} \\ \vdots & \vdots & \vdots \\ c_{N1} & \cdots & c_{NJ} \\ \end{array} \right) % \end{split}\]

In this display,

\[ % c_{11} = \sum_{k=1}^K a_{1k} b_{k1} % \]

Suppose \({\bf A}\) is \(N \times K\) and \({\bf B}\) is \(J \times M\)

  • \({\bf A} {\bf B}\) defined only if \(K = J\)

  • Resulting matrix \({\bf A} {\bf B}\) is \(N \times M\)

The rule to remember

\[ % \text{product of } N \times K \text{ and } K \times M \text{ is } N \times M % \]

Fact

Important: Multiplication is not commutative: it is not in general true that \({\bf A} {\bf B} = {\bf B} {\bf A}\)

  • In fact \({\bf B} {\bf A}\) is not well-defined unless \(N = M\) also holds
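A quick numerical illustration (a sketch with two small square matrices chosen arbitrarily, so that both products are defined):

import numpy as np
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
print(A @ B)  # rows [2 1] and [4 3]
print(B @ A)  # rows [3 4] and [1 2], which is different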

Multiplication of matrix and a vector#

\[\begin{split} % {\bf A} {\bf x} = \left( \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1K} \\ a_{21} & a_{22} & \cdots & a_{2K} \\ \vdots & \vdots & & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NK} \end{array} \right) \left( \begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_K \end{array} \right) \\ = x_1 \left( \begin{array}{c} a_{11} \\ a_{21} \\ \vdots \\ a_{N1} \end{array} \right) + x_2 \left( \begin{array}{c} a_{12} \\ a_{22} \\ \vdots \\ a_{N2} \end{array} \right) + \cdots + x_K \left( \begin{array}{c} a_{1K} \\ a_{2K} \\ \vdots \\ a_{NK} \end{array} \right) \\ = \sum_{k=1}^K x_k \mathrm{col}_k({\bf A}) % \end{split}\]

Rules of matrix multiplication#

Fact

Given scalar \(\alpha\) and conformable \({\bf A}\), \({\bf B}\) and \({\bf C}\), we have

  1. \({\bf A} ({\bf B} {\bf C}) = ({\bf A} {\bf B}) {\bf C}\)

  2. \({\bf A} ({\bf B} + {\bf C}) = {\bf A} {\bf B} + {\bf A} {\bf C}\)

  3. \(({\bf A} + {\bf B}) {\bf C} = {\bf A} {\bf C} + {\bf B} {\bf C}\)

  4. \({\bf A} \alpha {\bf B} = \alpha {\bf A} {\bf B}\)

Here "conformable" means that the operation makes sense

Definition

The \(k\)-th power of a square matrix \({\bf A}\) is

\[ {\bf A}^k := \underbrace{{\bf A} \cdots {\bf A}}_{k \text{ terms}} \]

Definition

If it exists, the square root of \({\bf A}\), written \({\bf A}^{1/2}\), is defined as a matrix \({\bf B}\) such that \({\bf B}^2 = {\bf A}\)

In matrix multiplication, \({\bf I}\) is the multiplicative unit

That is, assuming conformability, we always have

\[ % {\bf A} {\bf I} = {\bf I} {\bf A} = {\bf A} % \]

Exercise: Check it using the definition of matrix multiplication

If \({\bf I}\) is \(K \times K\), then

\[ % \mathrm{col}_k({\bf I}) = {\bf e}_k = \text{ $k$-th canonical basis vector in } \mathbb{R}^K % \]
import numpy as np
A = [[2, 4],
     [4, 2]]
A = np.array(A) # Convert A to array
B = np.identity(2)
print(A,B)
print('Sum:',A+B)
print('Product:',np.dot(A,B))
[[2 4]
 [4 2]] [[1. 0.]
 [0. 1.]]
Sum: [[3. 4.]
 [4. 3.]]
Product: [[2. 4.]
 [4. 2.]]

Matrices as Maps#

Any \(N \times K\) matrix \({\bf A}\) can be thought of as a function \({\bf x} \mapsto {\bf A} {\bf x}\)

  • In \({\bf A} {\bf x}\) the \({\bf x}\) is understood to be a column vector

It turns out that every such map is linear!

To see this fix \(N \times K\) matrix \({\bf A}\) and let \(T\) be defined by

\[ % T \colon \mathbb{R}^K \to \mathbb{R}^N, \qquad T{\bf x} = {\bf A} {\bf x} % \]

Pick any \({\bf x}\), \({\bf y}\) in \(\mathbb{R}^K\), and any scalars \(\alpha\) and \(\beta\)

The rules of matrix arithmetic tell us that

\[ % T(\alpha {\bf x} + \beta {\bf y}) := {\bf A} (\alpha {\bf x} + \beta {\bf y}) = \alpha {\bf A} {\bf x} + \beta {\bf A} {\bf y} =: \alpha T{\bf x} + \beta T{\bf y} % \]

So matrices make linear functions
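Here is a small numerical check of this claim (a sketch; the matrix, vectors and scalars are chosen purely for illustration):

import numpy as np
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])        # N x K with N = 2, K = 3
x = np.array([1.0, 0.0, 2.0])
y = np.array([-1.0, 1.0, 0.5])
alpha, beta = 2.0, -3.0
lhs = A @ (alpha * x + beta * y)        # T(alpha x + beta y)
rhs = alpha * (A @ x) + beta * (A @ y)  # alpha T x + beta T y
print(np.allclose(lhs, rhs))            # prints True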

How about examples of linear functions that don’t involve matrices?

There are none!

Fact

If \(T \colon \mathbb{R}^K \to \mathbb{R}^N\) then

\[ T \text{ is linear } \; \iff \; \exists \; N \times K \text{ matrix } {\bf A} \text{ such that } T{\bf x} = {\bf A} {\bf x}, \;\forall \, {\bf x} \in \mathbb{R}^K \]

Matrix Product as Composition#

Fact

Let

  • \({\bf A}\) be \(N \times K\) and \({\bf B}\) be \(K \times M\)

  • \(T \colon \mathbb{R}^K \to \mathbb{R}^N\) be the linear map \(T{\bf x} = {\bf A}{\bf x}\)

  • \(U \colon \mathbb{R}^M \to \mathbb{R}^K\) be the linear map \(U{\bf x} = {\bf B}{\bf x}\)

The matrix product \({\bf A} {\bf B}\) corresponds exactly to the composition of \(T\) and \(U\)
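To see why (a sketch): for any \({\bf x} \in \mathbb{R}^M\),

\[ (T \circ U)({\bf x}) = T(U {\bf x}) = {\bf A}({\bf B} {\bf x}) = ({\bf A} {\bf B}) {\bf x} \]

where the last step is associativity of matrix multiplication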

This helps us understand a few things

For example, let

  • \({\bf A}\) be \(N \times K\) and \({\bf B}\) be \(J \times M\)

  • \(T \colon \mathbb{R}^K \to \mathbb{R}^N\) be the linear map \(T{\bf x} = {\bf A}{\bf x}\)

  • \(U \colon \mathbb{R}^M \to \mathbb{R}^J\) be the linear map \(U{\bf x} = {\bf B}{\bf x}\)

Then \({\bf A} {\bf B}\) is only defined when \(K = J\)

This is because \({\bf A} {\bf B}\) corresponds to \(T \circ U\)

But for \(T \circ U\) to be well defined we need \(K = J\)

Then \(U\) maps \(\mathbb{R}^M\) to \(\mathbb{R}^K\) and \(T\) maps \(\mathbb{R}^K\) to \(\mathbb{R}^N\)

Column Space#

Definition

Let \({\bf A}\) be an \(N \times K\) matrix.
The column space of \({\bf A}\) is defined as the span of its columns

\[\begin{split} % \mathrm{span}({\bf A}) = \mathrm{span} \{ \mathrm{col}_1 ({\bf A}), \ldots, \mathrm{col}_K({\bf A}) \} \\ = \text{all vectors of the form } \sum_{k=1}^K x_k \mathrm{col}_k({\bf A}) % \end{split}\]

Equivalently,

\[ % \mathrm{span}({\bf A}) := \{ {\bf A}{\bf x } \colon {\bf x} \in \mathbb{R}^K \} % \]

This is exactly the range of the associated linear map

\(T \colon \mathbb{R}^K \to \mathbb{R}^N\) defined by \(T {\bf x} = {\bf A} {\bf x}\)

Example

If

\[\begin{split} % {\bf A} = \begin{pmatrix} 1 & -5 \\ 2 & 3 \end{pmatrix} % \end{split}\]

then the span is all linear combinations

\[\begin{split} % x_1 \left( \begin{array}{c} 1 \\ 2 \end{array} \right) + x_2 \left( \begin{array}{c} -5 \\ 3 \end{array} \right) \quad \text{where} \quad (x_1, x_2) \in \mathbb{R}^2 % \end{split}\]

These columns are linearly independent (shown earlier)

Hence the column space is all of \(\mathbb{R}^2\) (why?)

Exercise: Show that the column space of any \(N \times K\) matrix is a linear subspace of \(\mathbb{R}^N\)

Rank#

Equivalent questions

  • How large is the range of the linear map \(T {\bf x} = {\bf A} {\bf x}\)?

  • How large is the column space of \({\bf A}\)?

The obvious measure of size for a linear subspace is its dimension

Definition

The dimension of \(\mathrm{span}({\bf A})\) is known as the rank of \({\bf A}\)

\[ % \mathrm{rank}({\bf A}) := \mathrm{dim}(\mathrm{span}({\bf A})) % \]

Because \(\mathrm{span}({\bf A})\) is the span of \(K\) vectors, we have

\[ % \mathrm{rank}({\bf A}) = \mathrm{dim}(\mathrm{span}({\bf A})) \leq K % \]

Definition

\({\bf A}\) is said to have full column rank if

\[ % \mathrm{rank}({\bf A}) = \text{ number of columns of } {\bf A} % \]

Fact

For any matrix \({\bf A}\), the following statements are equivalent:

  1. \({\bf A}\) is of full column rank

  2. The columns of \({\bf A}\) are linearly independent

  3. If \({\bf A} {\bf x} = {\bf 0}\), then \({\bf x} = {\bf 0}\)

Exercise: Check this, recalling that

\[ % \dim(\mathrm{span}\{{\bf a}_1, \ldots, {\bf a}_K\}) = K \, \iff \, \{{\bf a}_1, \ldots, {\bf a}_K\} \text{ linearly independent} % \]
import numpy as np
A = [[2.0, 1.0],
     [6.5, 3.0]]  # rows (and columns) are not proportional
print(f'Rank = {np.linalg.matrix_rank(A)}')
A = [[2.0, 1.0],
     [6.0, 3.0]]  # second row is 3 times the first, so the columns are dependent
print(f'Rank = {np.linalg.matrix_rank(A)}')
Rank = 2
Rank = 1

\(N \times N\) Linear Equations#

Let’s look at solving linear equations such as \({\bf A} {\bf x} = {\bf b}\)

We start with the “best” case: number of equations \(=\) number of unknowns

Thus,

  • Take \(N \times N\) matrix \({\bf A}\) and \(N \times 1\) vector \({\bf b}\) as given

  • Search for an \(N \times 1\) solution \({\bf x}\)

But does such a solution exist? If so is it unique?

The best way to think about this is to consider the corresponding linear map

\[ % T \colon \mathbb{R}^N \to \mathbb{R}^N, \qquad T{\bf x} = {\bf A} {\bf x} % \]
_images/linbijec.png

Equivalent:

  1. \({\bf A} {\bf x} = {\bf b}\) has a unique solution \({\bf x}\) for any given \({\bf b}\)

  2. \(T {\bf x} = {\bf b}\) has a unique solution \({\bf x}\) for any given \({\bf b}\)

  3. \(T\) is a bijection

We already have conditions for linear maps to be bijections

Just need to translate these into the matrix setting

Recall that \(T\) is called nonsingular if \(T\) is a linear bijection

Definition

We say that \({\bf A}\) is nonsingular if \(T: {\bf x} \to {\bf A}{\bf x}\) is nonsingular

Equivalent:

  • \({\bf x} \mapsto {\bf A} {\bf x}\) is a bijection from \(\mathbb{R}^N\) to \(\mathbb{R}^N\)

We now list equivalent conditions for nonsingularity

Fact

Let \({\bf A}\) be an \(N \times N\) matrix
All of the following conditions are equivalent

  1. \({\bf A}\) is nonsingular

  2. The columns of \({\bf A}\) are linearly independent

  3. \(\mathrm{rank}({\bf A}) = N\)

  4. \(\mathrm{span}({\bf A}) = \mathbb{R}^N\)

  5. If \({\bf A} {\bf x} = {\bf A} {\bf y}\), then \({\bf x} = {\bf y}\)

  6. If \({\bf A} {\bf x} = {\bf 0}\), then \({\bf x} = {\bf 0}\)

  7. For each \({\bf b} \in \mathbb{R}^N\), the equation \({\bf A} {\bf x} = {\bf b}\) has a solution

  8. For each \({\bf b} \in \mathbb{R}^N\), the equation \({\bf A} {\bf x} = {\bf b}\) has a unique solution

All equivalent ways of saying that \(T{\bf x} = {\bf A} {\bf x}\) is a bijection!

Example

For condition 5 the equivalence is:
if \({\bf A} {\bf x} = {\bf A} {\bf y}\), then \({\bf x} = {\bf y}\) \(\iff\) if \(T {\bf x} = T {\bf y}\), then \({\bf x} = {\bf y}\) \(\iff\) \(T\) is one-to-one \(\iff\) \(T\) is a bijection (since \(T\) is a linear map from \(\mathbb{R}^N\) to \(\mathbb{R}^N\))

Example

For condition 6 the equivalence is:
if \({\bf A} {\bf x} = {\bf 0}\), then \({\bf x} = {\bf 0}\) \(\iff\) \(\{{\bf x}: {\bf A}{\bf x} = {\bf 0}\} = \{{\bf 0}\}\) \(\iff\) \(\{{\bf x}: T{\bf x} = {\bf 0}\} = \{{\bf 0}\}\) \(\iff\) \(\mathrm{kernel}(T)=\{{\bf 0}\}\) \(\iff\) \(T\) is a bijection (since \(T\) is a linear map from \(\mathbb{R}^N\) to \(\mathbb{R}^N\))

Example

For condition 7 the equivalence is:
for each \({\bf b}\in\mathbb{R}^N\), the equation \({\bf A}{\bf x} = {\bf b}\) has a solution \(\iff\) every \({\bf b}\in\mathbb{R}^N\) has an \({\bf x}\) such that \({\bf A}{\bf x} = {\bf b}\) \(\iff\) every \({\bf b}\in\mathbb{R}^N\) has an \({\bf x}\) such that \(T{\bf x} = {\bf b}\) \(\iff\) \(T\) is onto/surjective \(\iff\) \(T\) is a bijection (since \(T\) is a linear map from \(\mathbb{R}^N\) to \(\mathbb{R}^N\))

Example

Now consider condition 2:

The columns of \({\bf A}\) are linearly independent.

Let \({\bf e}_j\) be the \(j\)-th canonical basis vector in \(\mathbb{R}^N\).

Observe that \({\bf A}{\bf e_j} = \mathrm{col}_j({\bf A})\) \(\implies\) \(T{\bf e_j} = \mathrm{col}_j({\bf A})\) \(\implies\) \(V := \{T {\bf e}_1, \ldots, T{\bf e}_N\} =\) columns of \({\bf A}\), and \(V\) is linearly independent if and only if \(T\) is a bijection

Example

Consider a one good linear market system

\[\begin{split} % q = a - b p \qquad (\text{demand}) \\ q = c + d p \qquad (\text{supply}) % \end{split}\]

Treating \(q\) and \(p\) as the unknowns, let’s write in matrix form as

\[\begin{split} % \begin{pmatrix} 1 & b \\ 1 & -d \end{pmatrix} \begin{pmatrix} q\\ p \end{pmatrix} = \begin{pmatrix} a \\ c \end{pmatrix} % \end{split}\]

A unique solution exists whenever the columns are linearly independent

  • so \((b, -d)\) is not a scalar multiple of \({\bf 1}\) \(\iff\) \(b \ne -d\)

_images/not_multiple_of_one.png

Fig. 64 \((b, -d)\) is not a scalar multiple of \({\bf 1}\)#

Example

Recall that in the introduction we tried to solve a system \({\bf A} {\bf x} = {\bf b}\) with

\[\begin{split} A = \left[\begin{array}{ccc} 0 & 2 & 4 \\ 1 & 4 & 8 \\ 0 & 3 & 6 \\ \end{array} \right] \end{split}\]

The problem is that \({\bf A}\) is singular (not nonsingular)

  • In particular, \(\mathrm{col}_3({\bf A}) = 2 \mathrm{col}_2({\bf A})\)

Inverse Matrices#

Definition

Given square matrix \({\bf A}\), suppose \(\exists\) square matrix \({\bf B}\) such that

\[{\bf A} {\bf B} = {\bf B} {\bf A} = {\bf I}\]

Then

  • \({\bf B}\) is called the inverse of \({\bf A}\), and written \({\bf A}^{-1}\)

  • \({\bf A}\) is called invertible

Fact

A square matrix \({\bf A}\) is nonsingular if and only if it is invertible

Remark: \({\bf A}^{-1}\) is just the matrix corresponding to the linear map \(T^{-1}\)

Fact

Given nonsingular \(N \times N\) matrix \({\bf A}\) and \({\bf b} \in \mathbb{R}^N\), the unique solution to \({\bf A} {\bf x} = {\bf b}\) is given by

\[ {\bf x}_b := {\bf A}^{-1} {\bf b} \]

Example

Recall the one good linear market system

\[\begin{split} % \begin{array}{c} q = a - b p \\ q = c + d p \end{array} \quad \iff \quad \begin{pmatrix} 1 & b \\ 1 & -d \end{pmatrix} \begin{pmatrix} q\\ p \end{pmatrix} = \begin{pmatrix} a \\ c \end{pmatrix} % \end{split}\]

Suppose that \(a=5\), \(b=2\), \(c=1\), \(d=1.5\)

The matrix system is \({\bf A} {\bf x} = {\bf b}\) where

\[\begin{split} % {\bf A} := \begin{pmatrix} 1 & 2 \\ 1 & -1.5 \end{pmatrix}, \; {\bf x} := \begin{pmatrix} q\\ p \end{pmatrix}, \; {\bf b} := \begin{pmatrix} 5 \\ 1 \end{pmatrix} % \end{split}\]

Since \(b \ne -d\) we can solve for the unique solution

Easy by hand but let’s try on the computer

import numpy as np
from scipy.linalg import inv
A = [[1, 2],
     [1, -1.5]]
b = [5, 1]
q, p = np.dot(inv(A), b) # A^{-1} b
print(f'q={q:.4f} p={p:.4f}')
q=2.7143 p=1.1429
_images/simple_mkt.png

Fig. 65 Equilibrium \((p^*, q^*)\) in the one good case#

Fact

In the \(2 \times 2\) case, the inverse has the form

\[\begin{split} % \left( \begin{array}{cc} a & b \\ c & d \\ \end{array} \right)^{-1} = \frac{1}{ad - bc} \left( \begin{array}{cc} d & -b \\ -c & a \\ \end{array} \right) % \end{split}\]

Example

\[\begin{split} % {\bf A} = \left( \begin{array}{cc} 1 & 2 \\ 1 & -1.5 \\ \end{array} \right) \quad \implies \quad {\bf A}^{-1} = \frac{1}{-3.5} \left( \begin{array}{cc} -1.5 & -2 \\ -1 & 1 \\ \end{array} \right) % \end{split}\]

Example

Consider the \(N\) good linear demand system

(4)#\[ % q_n = \sum_{k=1}^N a_{nk} p_k + b_n, \quad n = 1, \ldots N % \]

Task: take quantities \(q_1, \ldots, q_N\) as given and find corresponding prices \(p_1, \ldots, p_N\) — the “inverse demand curves”

We can write (4) as

\[ % {\bf q} = {\bf A} {\bf p} + {\bf b} % \]

where vectors are \(N\)-vectors and \({\bf A}\) is \(N \times N\)

If the columns of \({\bf A}\) are linearly independent then a unique solution exists for each fixed \({\bf q}\) and \({\bf b}\), and is given by

\[ % {\bf p} = {\bf A}^{-1} ({\bf q} - {\bf b}) % \]

Left and Right Inverses#

Definition

Given square matrix \({\bf A}\), a matrix \({\bf B}\) is called

  • a left inverse of \({\bf A}\) if \({\bf B} {\bf A} = {\bf I}\)

  • a right inverse of \({\bf A}\) if \({\bf A} {\bf B} = {\bf I}\)

  • an inverse of \({\bf A}\) if \({\bf A} {\bf B} = {\bf B} {\bf A} = {\bf I}\)

By definition, a matrix that is both a left inverse and a right inverse is an inverse

Fact

If square matrix \({\bf B}\) is either a left or right inverse for \({\bf A}\), then \({\bf A}\) is nonsingular and \({\bf A}^{-1} = {\bf B}\)

In other words, for square matrices, left inverse \(\iff\) right inverse \(\iff\) inverse

Rules for Inverses#

Fact

If \({\bf A}\) is nonsingular and \(\alpha \ne 0\), then

  1. \({\bf A}^{-1}\) is nonsingular and \(({\bf A}^{-1})^{-1} = {\bf A}\)

  2. \(\alpha {\bf A}\) is nonsingular and \((\alpha {\bf A})^{-1} = \alpha^{-1} {\bf A}^{-1}\)

Fact

If \({\bf A}\) and \({\bf B}\) are \(N \times N\) and nonsingular then

  1. \({\bf A} {\bf B}\) is also nonsingular

  2. \(({\bf A} {\bf B})^{-1} = {\bf B}^{-1} {\bf A}^{-1}\)

When the Conditions Fail (Singular Case)#

Suppose as before we have

  • an \(N \times N\) matrix \({\bf A}\)

  • an \(N \times 1\) vector \({\bf b}\)

We seek a solution \({\bf x}\) to the equation \({\bf A} {\bf x} = {\bf b}\)

What if \({\bf A}\) is singular?

Then \(T{\bf x} = {\bf A} {\bf x}\) is not a bijection, and in fact

  • \(T\) cannot be onto (otherwise it’s a bijection)

  • \(T\) cannot be one-to-one (otherwise it’s a bijection)

Hence neither existence nor uniqueness is guaranteed

Example

The matrix \({\bf A}\) with columns

\[\begin{split} % {\bf a}_1 := \begin{pmatrix} 3 \\ 4 \\ 2 \end{pmatrix}, \quad {\bf a}_2 := \begin{pmatrix} 3 \\ -4 \\ 1 \end{pmatrix} \quad \text{and} \quad {\bf a}_3 := \begin{pmatrix} -3 \\ 4 \\ -1 \end{pmatrix} % \end{split}\]

is singular (\({\bf a}_3 = - {\bf a}_2\))

Its column space \(\mathrm{span}({\bf A})\) is just a plane in \(\mathbb{R}^3\)

Recall \({\bf b} \in \mathrm{span}({\bf A})\)

  • \(\iff\) \(\exists \, x_1, \ldots, x_N\) such that \(\sum_{k=1}^N x_k \mathrm{col}_k({\bf A}) = {\bf b}\)

  • \(\iff\) \(\exists \, {\bf x}\) such that \({\bf A} {\bf x} = {\bf b}\)

Thus if \({\bf b}\) is not in this plane then \({\bf A} {\bf x} = {\bf b}\) has no solution

_images/not_in_span.png

Fig. 66 The vector \({\bf b}\) is not in \(\mathrm{span}({\bf A})\)#

When \({\bf A}\) is \(N \times N\) and singular how rare is scenario \({\bf b} \in \mathrm{span}({\bf A})\)?

Answer: In a sense, very rare

We know that \(\dim(\mathrm{span}({\bf A})) < N\)

Such sets are always a "very small" subset of \(\mathbb{R}^N\) in terms of "volume"

  • A \(K < N\) dimensional subspace has “measure zero” in \(\mathbb{R}^N\)

  • A “randomly chosen” \({\bf b}\) has zero probability of being in such a set

Example

Consider the case where \(N = 3\) and \(K=2\)

A two-dimensional linear subspace is a 2D plane in \(\mathbb{R}^3\)

This set has no volume because planes have no "thickness"

All this means that if \({\bf A}\) is singular then existence of a solution to \({\bf A} {\bf x} = {\bf b}\) typically fails

In fact the problem is worse — uniqueness fails as well

Fact

If \({\bf A}\) is a singular matrix and \({\bf A} {\bf x} = {\bf b}\) has a solution then it has an infinity (in fact a continuum) of solutions
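A sketch of why: if \({\bf A}\) is singular, its columns are linearly dependent, so there exists \({\bf z} \ne {\bf 0}\) with \({\bf A} {\bf z} = {\bf 0}\). If \({\bf x}\) solves the system, then for every \(\alpha \in \mathbb{R}\)

\[ {\bf A}({\bf x} + \alpha {\bf z}) = {\bf A}{\bf x} + \alpha {\bf A}{\bf z} = {\bf b} \]

so \({\bf x} + \alpha {\bf z}\) is also a solution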

Determinants#

Let \(S(N)\) be the set of all bijections from \(\{1, \ldots, N\}\) to itself

For \(\pi \in S(N)\) we define the signature of \(\pi\) as

\[ % \mathrm{sgn}(\pi) := \prod_{m < n} \frac{\pi(m) - \pi(n)}{m - n} % \]

The determinant of \(N \times N\) matrix \({\bf A}\) is then given as

\[ % \det({\bf A}) := \sum_{\pi \in S(N)} \mathrm{sgn}(\pi) \prod_{n=1}^N a_{\pi(n) n} % \]
  • You don’t need to understand or remember this for our course

Fact

In the \(N = 2\) case this definition reduces to

\[\begin{split} % \det \left( \begin{array}{cc} a & b \\ c & d \\ \end{array} \right) = ad - bc % \end{split}\]
  • Remark: But you do need to remember this \(2 \times 2\) case

Example

\[\begin{split} % \det \left( \begin{array}{cc} 2 & 0 \\ 7 & -1 \\ \end{array} \right) = (2 \times -1) - (7 \times 0) = -2 % \end{split}\]

Important facts concerning the determinant

Fact

If \({\bf I}\) is the \(N \times N\) identity, \({\bf A}\) and \({\bf B}\) are \(N \times N\) matrices and \(\alpha \in \mathbb{R}\), then

  1. \(\det({\bf I}) = 1\)

  2. \({\bf A}\) is nonsingular if and only if \(\det({\bf A}) \ne 0\)

  3. \(\det({\bf A}{\bf B}) = \det({\bf A}) \det({\bf B})\)

  4. \(\det(\alpha {\bf A}) = \alpha^N \det({\bf A})\)

  5. \(\det({\bf A}^{-1}) = (\det({\bf A}))^{-1}\)

Example

Thus singularity in the \(2 \times 2\) case is equivalent to

\[\begin{split} % \det({\bf A}) = \det \left( \begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \end{array} \right) = a_{11}a_{22} - a_{12} a_{21} = 0 % \end{split}\]

Exercise: Let \({\bf a}_i := \mathrm{col}_i({\bf A})\) and assume that \(a_{ij} \ne 0\) for each \(i, j\)

Show the following are equivalent:

  1. \(a_{11}a_{22} = a_{12} a_{21}\)

  2. \({\bf a}_1 = \lambda {\bf a}_2\) for some \(\lambda \in \mathbb{R}\)

import numpy as np
A = np.random.randn(2, 2) # Random matrix
print(f'A={A}')
print(f'det(A)={np.linalg.det(A)}')
print(f'det(inv(A))={np.linalg.det(np.linalg.inv(A))}')
A=[[ 1.54293968 -1.05347778]
 [ 0.0942124  -0.86642873]]
det(A)=-1.237596590154744
det(inv(A))=-0.808017740154701

As an exercise, let’s now show that any right inverse is an inverse

Fix square \({\bf A}\) and suppose \({\bf B}\) is a right inverse:

(5)#\[ % {\bf A} {\bf B} = {\bf I} % \]

Applying the determinant to both sides gives \(\det({\bf A}) \det({\bf B}) = 1\)

Hence \({\bf B}\) is nonsingular (why?) and we can

  1. multiply (5) by \({\bf B}\) to get \({\bf B} {\bf A} {\bf B} = {\bf B}\)

  2. then postmultiply by \({\bf B}^{-1}\) to get \({\bf B} {\bf A} = {\bf I}\)

We see that \({\bf B}\) is also a left inverse, and therefore an inverse of \({\bf A}\)

Exercise: Do the left inverse case
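
A quick numerical check of the result we just proved, using an arbitrary nonsingular matrix: the computed inverse is both a right and a left inverse.

import numpy as np
A = np.array([[2., 1.],
              [1., 3.]])                 # an arbitrary nonsingular matrix
B = np.linalg.inv(A)
print(np.allclose(A @ B, np.eye(2)))     # B is a right inverse
print(np.allclose(B @ A, np.eye(2)))     # ... and also a left inverse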

\(N \times K\) Linear Systems of Equations#

So far we have considered the nice \(N \times N\) case for equations

  • number of equations \(=\) number of unknowns

We have to deal with other cases too

Underdetermined systems:

  • eqs \(<\) unknowns

Overdetermined systems:

  • eqs \(>\) unknowns

Overdetermined Systems#

Consider the system \({\bf A} {\bf x} = {\bf b}\) where \({\bf A}\) is \(N \times K\) and \(K < N\)

  • The elements of \({\bf x}\) are the unknowns

  • More equations than unknowns (\(N > K\))

May not be able to find an \({\bf x}\) that satisfies all \(N\) equations

Let’s look at this in more detail…

Fix \(N \times K\) matrix \({\bf A}\) with \(K < N\)

Let \(T \colon \mathbb{R}^K \to \mathbb{R}^N\) be defined by \(T {\bf x} = {\bf A} {\bf x}\)

We know these to be equivalent:

  1. there exists an \({\bf x} \in \mathbb{R}^K\) with \({\bf A} {\bf x} = {\bf b}\)

  • \({\bf b}\) has a preimage under \(T\)

  • \({\bf b}\) is in \(\mathrm{rng}(T)\)

  2. \({\bf b}\) is in \(\mathrm{span}({\bf A})\)

We also know \(T\) cannot be onto (it maps a smaller space into a larger one)

Hence \({\bf b} \in \mathrm{span}({\bf A})\) will not always hold

Given our assumption that \(K < N\), how rare is the scenario \({\bf b} \in \mathrm{span}({\bf A})\)?

Answer: We talked about this before — it’s very rare

We know that \(\dim(\mathrm{rng}(T)) = \dim(\mathrm{span}({\bf A})) \leq K < N\)

A \(K < N\) dimensional subspace has “measure zero” in \(\mathbb{R}^N\)

So should we give up on solving \({\bf A} {\bf x} = {\bf b}\) in the overdetermined case?

What’s typically done is we try to find a best approximation

To define "best" we need a way of ranking approximations

The standard way is in terms of Euclidean norm

In particular, we search for the \({\bf x}\) that solves

\[ \min_{{\bf x} \in \mathbb{R}^K} \| {\bf A} {\bf x} - {\bf b}\|\]

Details later
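
As a preview, here is a minimal sketch of how this minimization can be done numerically; np.linalg.lstsq returns a minimizer of \(\| {\bf A} {\bf x} - {\bf b}\|\), and the matrix and vector below are purely illustrative:

import numpy as np
# Overdetermined system: N = 3 equations, K = 2 unknowns
A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
b = np.array([1., 2., 0.])
x, res, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print('best approximation x =', x)
print('residual norm ||Ax - b|| =', np.linalg.norm(A @ x - b))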

Underdetermined Systems#

Now consider \({\bf A} {\bf x} = {\bf b}\) when \({\bf A}\) is \(N \times K\) and \(K > N\)

Let \(T \colon \mathbb{R}^K \to \mathbb{R}^N\) be defined by \(T {\bf x} = {\bf A} {\bf x}\)

Now \(T\) maps from a larger space into a smaller one

This tells us that \(T\) is not one-to-one

Hence solutions are not in general unique

In fact the following is true

Exercise: Show that if \({\bf A} {\bf x} = {\bf b}\) has a solution and \(K > N\), then the same equation has an infinity of solutions

Remark: Working with underdetermined systems is relatively rare in economics / elsewhere

Transpose#

The transpose of \({\bf A}\) is the matrix \({\bf A}'\) defined by

\[\mathrm{col}_n({\bf A}') = \mathrm{row}_n({\bf A})\]

Example

If

(6)#\[\begin{split} % {\bf A} := \left( \begin{array}{cc} 10 & 40 \\ 20 & 50 \\ 30 & 60 \end{array} \right) \quad \text{then} \quad {\bf A}' = \left( \begin{array}{ccc} 10 & 20 & 30 \\ 40 & 50 & 60 \end{array} \right) % \end{split}\]

If

\[\begin{split} % {\bf B} := \left( \begin{array}{ccc} 1 & 3 & 5 \\ 2 & 4 & 6 \\ \end{array} \right) \quad \text{then} \quad {\bf B}' := \left( \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{array} \right) % \end{split}\]

Fact

For conformable matrices \({\bf A}\) and \({\bf B}\), transposition satisfies

  1. \(({\bf A}')' = {\bf A}\)

  2. \(({\bf A} {\bf B})' = {\bf B}' {\bf A}'\)

  3. \(({\bf A} + {\bf B})' = {\bf A}' + {\bf B}'\)

  4. \((c {\bf A})' = c {\bf A}'\) for any constant \(c\)

For each square matrix \({\bf A}\),

  1. \(\det({\bf A}') = \det({\bf A})\)

  2. If \({\bf A}\) is nonsingular then so is \({\bf A}'\), and \(({\bf A}')^{-1}= ({\bf A}^{-1})'\)

Definition

A square matrix \({\bf A}\) is called symmetric if \({\bf A}' = {\bf A}\)

Equivalent: \(a_{nk} = a_{kn}\) for all \(n, k\)

Example

\[\begin{split} % {\bf A} := \left( \begin{array}{cc} 10 & 20 \\ 20 & 50 \end{array} \right), \qquad {\bf B} := \left( \begin{array}{ccc} 1 & 2 & 3 \\ 2 & 0 & 0 \\ 3 & 0 & 2 \end{array} \right) % \end{split}\]

Exercise: For any matrix \({\bf A}\), show that \({\bf A}' {\bf A}\) and \({\bf A} {\bf A}'\) are always

  1. well-defined (multiplication makes sense)

  2. symmetric
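
A quick numerical illustration of the exercise (not a proof), using a random rectangular matrix:

import numpy as np
A = np.random.randn(3, 2)                 # any 3 x 2 matrix
S1, S2 = A.T @ A, A @ A.T                 # 2 x 2 and 3 x 3 respectively
print(np.allclose(S1, S1.T))              # A'A is symmetric
print(np.allclose(S2, S2.T))              # AA' is symmetric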

Definition

The trace of a square matrix is defined by

\[\begin{split} % \mathrm{trace} \left( \begin{array}{ccc} a_{11} & \cdots & a_{1N} \\ \vdots & & \vdots \\ a_{N1} & \cdots & a_{NN} \\ \end{array} \right) = \sum_{n=1}^N a_{nn} % \end{split}\]

Fact

\(\mathrm{trace}({\bf A}) = \mathrm{trace}({\bf A}')\)

Fact

If \({\bf A}\) and \({\bf B}\) are square matrices and \(\alpha, \beta \in \mathbb{R}\), then

\[ % \mathrm{trace}(\alpha {\bf A} + \beta {\bf B}) = \alpha \mathrm{trace}({\bf A}) + \beta \mathrm{trace}({\bf B}) % \]

Fact

When conformable, \(\mathrm{trace}({\bf A} {\bf B}) = \mathrm{trace}({\bf B} {\bf A})\)

Definition

A square matrix \({\bf A}\) is called idempotent if \({\bf A} {\bf A} = {\bf A}\)

Example

\[\begin{split} % {\bf A} := \left( \begin{array}{cc} 1 & 1 \\ 0 & 0 \end{array} \right), \qquad {\bf I} := \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right) % \end{split}\]

The next result is often used in statistics / econometrics:

Fact

If \({\bf A}\) is idempotent, then \(\mathrm{rank}({\bf A}) = \mathrm{trace}({\bf A})\)
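
For instance, the least-squares projection matrix \({\bf P} = {\bf X}({\bf X}'{\bf X})^{-1}{\bf X}'\) is idempotent, and we can check the rank-equals-trace property numerically (a sketch with a random \({\bf X}\)):

import numpy as np
X = np.random.randn(5, 2)
P = X @ np.linalg.inv(X.T @ X) @ X.T          # projection onto span(X)
print(np.allclose(P @ P, P))                  # idempotent: PP = P
print(np.linalg.matrix_rank(P), np.trace(P))  # both equal 2 (up to rounding)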

Diagonal Matrices#

Consider a square \(N \times N\) matrix \({\bf A}\)

Definition

The \(N\) elements of the form \(a_{nn}\) are called the principal diagonal

\[\begin{split} % \left( \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \\ \end{array} \right) % \end{split}\]

A square matrix \({\bf D}\) is called diagonal if all entries off the principal diagonal are zero

\[\begin{split} % {\bf D} = \left( \begin{array}{cccc} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & d_N \\ \end{array} \right) % \end{split}\]

Often written as

\[ % {\bf D} = \mathrm{diag}(d_1, \ldots, d_N) % \]

Incidentally, the same notation works in Python

import numpy as np
D = np.diag((2, 4, 6, 8, 10))
print(D)
[[ 2  0  0  0  0]
 [ 0  4  0  0  0]
 [ 0  0  6  0  0]
 [ 0  0  0  8  0]
 [ 0  0  0  0 10]]

Diagonal systems are very easy to solve

Example

\[\begin{split} % \begin{pmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} % \end{split}\]

is equivalent to

\[\begin{split} % \begin{array}{c} d_1 x_1 = b_1 \\ d_2x_2 = b_2 \\ d_3 x_3 = b_3 \end{array} % \end{split}\]
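
That is, each unknown can be read off separately as \(x_n = b_n / d_n\) (provided \(d_n \ne 0\)). A one-line numerical sketch with illustrative numbers:

import numpy as np
d = np.array([2., 4., 6.])
b = np.array([1., 2., 3.])
x = b / d                               # componentwise division solves the diagonal system
print(x, np.diag(d) @ x - b)            # residual is zero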

Fact

If \({\bf C} = \mathrm{diag}(c_1, \ldots, c_N)\) and \({\bf D} = \mathrm{diag}(d_1, \ldots,d_N)\) then

  1. \({\bf C} + {\bf D} = \mathrm{diag}(c_1 + d_1, \ldots, c_N + d_N)\)

  2. \({\bf C} {\bf D} = \mathrm{diag}(c_1 d_1, \ldots, c_N d_N)\)

  3. \({\bf D}^k = \mathrm{diag}(d^k_1, \ldots, d^k_N)\) for any \(k \in \mathbb{N}\)

  4. \(d_n \geq 0\) for all \(n\) \(\implies\) \({\bf D}^{1/2}\) exists and equals

\[\mathrm{diag}(\sqrt{d_1}, \ldots, \sqrt{d_N})\]
  5. \(d_n \ne 0\) for all \(n\) \(\implies\) \({\bf D}\) is nonsingular and

\[{\bf D}^{-1} = \mathrm{diag}(d_1^{-1}, \ldots, d_N^{-1})\]

Proofs: Check 1 and 2 directly, other parts follow

Definition

A square matrix is called lower triangular if every element strictly above the principal diagonal is zero

Example

(7)#\[\begin{split} % {\bf L} := \left( \begin{array}{ccc} 1 & 0 & 0 \\ 2 & 5 & 0 \\ 3 & 6 & 1 \end{array} \right) % \end{split}\]

Definition

A square matrix is called upper triangular if every element strictly below the principal diagonal is zero

Example

\[\begin{split} % {\bf U} := \left( \begin{array}{ccc} 1 & 2 & 3 \\ 0 & 5 & 6 \\ 0 & 0 & 1 \end{array} \right) % \end{split}\]

Definition

A matrix is called triangular if it is either upper or lower triangular

The associated linear equations are also simple to solve

Example

\[\begin{split} % \left( \begin{array}{ccc} 4 & 0 & 0 \\ 2 & 5 & 0 \\ 3 & 6 & 1 \end{array} \right) \left( \begin{array}{ccc} x_1 \\ x_2 \\ x_3 \end{array} \right) = \left( \begin{array}{c} b_1 \\ b_2 \\ b_3 \\ \end{array} \right) % \end{split}\]

becomes

\[\begin{split} % \begin{array}{c} 4x_1 = b_1 \\ 2x_1 + 5x_2 = b_2 \\ 3x_1 + 6x_2 + x_3 = b_3 \end{array} % \end{split}\]

The top equation involves only \(x_1\), so we can solve for it directly

Plug that value into the second equation and solve for \(x_2\), and so on
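
This forward-substitution procedure is available in scipy; a minimal sketch using the matrix from the example (the right-hand side here is an arbitrary illustrative choice):

import numpy as np
from scipy.linalg import solve_triangular
L = np.array([[4., 0., 0.],
              [2., 5., 0.],
              [3., 6., 1.]])
b = np.array([1., 2., 3.])
x = solve_triangular(L, b, lower=True)    # forward substitution
print('x =', x)
print('check L @ x =', L @ x)             # reproduces b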

Eigenvalues and Eigenvectors#

Let \({\bf A}\) be \(N \times N\)

In general \({\bf A}\) maps \({\bf x}\) to some arbitrary new location \({\bf A} {\bf x}\)

But sometimes \({\bf x}\) will only be scaled:

(8)#\[ % {\bf A} {\bf x} = \lambda {\bf x} \quad \text{for some scalar $\lambda$} % \]

Definition

If (8) holds and \({\bf x}\) is nonzero, then

  1. \({\bf x}\) is called an eigenvector of \({\bf A}\) and \(\lambda\) is called an eigenvalue

  2. \(({\bf x}, \lambda)\) is called an eigenpair

Clearly \(({\bf x}, \lambda)\) is an eigenpair of \({\bf A}\) \(\implies\) \((\alpha {\bf x}, \lambda)\) is an eigenpair of \({\bf A}\) for any nonzero \(\alpha\)

Example

Let

\[\begin{split} % {\bf A} := \begin{pmatrix} 1 & -1 \\ 3 & 5 \end{pmatrix} % \end{split}\]

Then

\[\begin{split} % \lambda = 2 \quad \text{ and } \quad {\bf x} = \begin{pmatrix} 1 \\ -1 \end{pmatrix} % \end{split}\]

form an eigenpair because \({\bf x} \ne {\bf 0}\) and

\[\begin{split} % {\bf A} {\bf x} = \begin{pmatrix} 1 & -1 \\ 3 & 5 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 2 \\ -2 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \lambda {\bf x} % \end{split}\]
import numpy as np
A = [[1, 2],
     [2, 1]]
eigvals, eigvecs = np.linalg.eig(A)
for i in range(eigvals.size):
  x = eigvecs[:,i]
  lm = eigvals[i]
  print(f'Eigenpair {i}:\n{lm:.5f} --> {x}')
  print(f'Check Ax=lm*x: {np.dot(A, x)} = {lm * x}')
Eigenpair 0:
3.00000 --> [0.70710678 0.70710678]
Check Ax=lm*x: [2.12132034 2.12132034] = [2.12132034 2.12132034]
Eigenpair 1:
-1.00000 --> [-0.70710678  0.70710678]
Check Ax=lm*x: [ 0.70710678 -0.70710678] = [ 0.70710678 -0.70710678]
_images/eigenvecs.png

Fig. 67 The eigenvectors of \({\bf A}\)#

Consider the matrix

\[\begin{split} % {\bf R} := \left( \begin{array}{cc} 0 & -1 \\ 1 & 0 \\ \end{array} \right) % \end{split}\]

Induces counter-clockwise rotation on any point by \(90^{\circ}\)

Hint

The columns of the matrix show where the standard basis vectors are mapped to.

Hence no nonzero point \({\bf x}\) is mapped to a scalar multiple of itself

Hence there exists no pair \(\lambda \in \mathbb{R}\) and \({\bf x} \ne {\bf 0}\) such that

\[{\bf R} {\bf x} = \lambda {\bf x}\]

In other words, no real-valued eigenpairs exist

_images/rotation_1.png

Fig. 68 The matrix \({\bf R}\) rotates points by \(90^{\circ}\)#

_images/rotation_2.png

Fig. 69 The matrix \({\bf R}\) rotates points by \(90^{\circ}\)#

But \({\bf R} {\bf x} = \lambda {\bf x}\) can hold if we allow complex values

Example

\[\begin{split} % \left( \begin{array}{cc} 0 & -1 \\ 1 & 0 \\ \end{array} \right) \begin{pmatrix} 1 \\ -i \end{pmatrix} = \begin{pmatrix} i \\ 1 \end{pmatrix} = i \begin{pmatrix} 1 \\ -i \end{pmatrix} % \end{split}\]

That is,

\[\begin{split} % {\bf R} {\bf x} = \lambda {\bf x} \quad \text{for} \quad \lambda := i \quad \text{and} \quad {\bf x} := \begin{pmatrix} 1 \\ -i \end{pmatrix} % \end{split}\]

Hence \(({\bf x}, \lambda)\) is an eigenpair provided we admit complex values

Note

We do, since this is standard, but will not go into details in this course
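
numpy handles the complex case automatically; a quick check with the rotation matrix \({\bf R}\) shows a pair of purely imaginary eigenvalues:

import numpy as np
R = np.array([[0., -1.],
              [1., 0.]])
eigvals, eigvecs = np.linalg.eig(R)
print(eigvals)            # approximately [0+1j, 0-1j], i.e. i and -i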

Fact

For any square matrix \({\bf A}\)

\[ % \lambda \text{ is an eigenvalue of } {\bf A} \; \iff \; \det({\bf A} - \lambda {\bf I}) = 0 % \]

Example

In the \(2 \times 2\) case,

\[\begin{split} % {\bf A} = \left( \begin{array}{cc} a & b \\ c & d \\ \end{array} \right) \quad \implies \quad {\bf A} - \lambda {\bf I} = \left( \begin{array}{cc} a - \lambda & b \\ c & d - \lambda \end{array} \right) % \end{split}\]
\[\begin{split} % \implies \det({\bf A} - \lambda {\bf I}) = (a - \lambda)(d - \lambda) - bc \\ = \lambda^2 - (a + d) \lambda + (ad - bc) % \end{split}\]

Hence the eigenvalues of \({\bf A}\) are given by the two roots of

\[ % \lambda^2 - (a + d) \lambda + (ad - bc) = 0 % \]

Equivalently,

\[ % \lambda^2 - \mathrm{trace}({\bf A}) \lambda + \det({\bf A}) = 0 % \]
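
As a cross-check, the roots of this quadratic can be compared with the output of np.linalg.eig; the matrix below is the one from the earlier eigenpair example:

import numpy as np
A = np.array([[1., -1.],
              [3., 5.]])
tr, det = np.trace(A), np.linalg.det(A)
roots = np.roots([1., -tr, det])            # roots of lambda^2 - trace*lambda + det
print(sorted(roots))                         # 2 and 4
print(sorted(np.linalg.eig(A)[0]))           # the same eigenvalues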

Diagonalization#

Definition

Square matrix \({\bf A}\) is said to be similar to square matrix \({\bf B}\) if

\[ % \exists \text{ invertible matrix ${\bf P}$ such that } {\bf A} = {\bf P} {\bf B} {\bf P}^{-1} % \]
_images/diagonalize.png

Fact

If \({\bf A}\) is similar to \({\bf B}\), then \({\bf A}^t\) is similar to \({\bf B}^t\) for all \(t \in \mathbb{N}\)

Definition

If \({\bf A}\) is similar to a diagonal matrix, then \({\bf A}\) is called diagonalizable

Fact

Let \({\bf A}\) be diagonalizable with \({\bf A} = {\bf P} {\bf D} {\bf P}^{-1}\) and let

  1. \({\bf D} = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)\)

  2. \({\bf p}_n := \mathrm{col}_n({\bf P})\)

Then \(({\bf p}_n, \lambda_n)\) is an eigenpair of \({\bf A}\) for each \(n\)

Fact

If \(N \times N\) matrix \({\bf A}\) has \(N\) distinct eigenvalues \(\lambda_1, \ldots, \lambda_N\), then \({\bf A}\) is diagonalizable as \({\bf A} = {\bf P} {\bf D} {\bf P}^{-1}\) where

  1. \({\bf D} = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)\)

  2. \(\mathrm{col}_n({\bf P})\) is an eigenvector for \(\lambda_n\)

Example

Let

\[\begin{split} % {\bf A} := \begin{pmatrix} 1 & -1 \\ 3 & 5 \end{pmatrix} % \end{split}\]

The eigenvalues of \({\bf A}\) are 2 and 4, while the eigenvectors are

\[\begin{split} % {\bf p}_1 := \begin{pmatrix} 1 \\ -1 \end{pmatrix} \quad \text{and} \quad {\bf p}_2 := \begin{pmatrix} 1 \\ -3 \end{pmatrix} % \end{split}\]

Hence

\[ % {\bf A} = {\bf P} \mathrm{diag}(2, 4) {\bf P}^{-1} % \]
import numpy as np
from numpy.linalg import inv
A = np.array([[1, -1],
              [3, 5]])
eigvals, eigvecs = np.linalg.eig(A)
D = np.diag(eigvals)
P = eigvecs
print('A =',A,sep='\n')
print('D =',D,sep='\n')
print('P =',P,sep='\n')
print('P^-1 =',inv(P),sep='\n')
print('P*D*P^-1 =',P@D@inv(P),sep='\n')
A =
[[ 1 -1]
 [ 3  5]]
D =
[[2. 0.]
 [0. 4.]]
P =
[[-0.70710678  0.31622777]
 [ 0.70710678 -0.9486833 ]]
P^-1 =
[[-2.12132034 -0.70710678]
 [-1.58113883 -1.58113883]]
P*D*P^-1 =
[[ 1. -1.]
 [ 3.  5.]]

The Euclidean Matrix Norm#

The concept of a norm is very helpful for working with vectors: it provides notions of distance, similarity, and convergence

How about an analogous concept for matrices?

Definition

Given an \(N \times K\) matrix \({\bf A}\), denote by \(\| {\bf A} \|\) the matrix norm of \({\bf A}\).

The Euclidean matrix norm is defined by

\[ % \| {\bf A} \| := \max \left\{ \frac{\| {\bf A} {\bf x} \|}{ \| {\bf x} \|} \,: \, {\bf x} \in \mathbb{R}^K, \; {\bf x} \ne {\bf 0} \right\} % \]

In the maximization we can restrict attention to \({\bf x}\) s.t. \(\| {\bf x} \| = 1\)

To see this let

\[ % a := \max_{{\bf x} \ne {\bf 0}} \frac{\| {\bf A} {\bf x} \|}{ \| {\bf x} \|} \qquad \text{and} \qquad b := \max_{\| {\bf x} \| = 1} \frac{\| {\bf A} {\bf x} \|}{ \| {\bf x} \|} = \max_{\| {\bf x} \| = 1} \| {\bf A} {\bf x} \| % \]

Evidently \(a \geq b\) because max is over a larger domain

To see the reverse let

  • \({\bf x}_a\) be the maximizer over \({\bf x} \ne {\bf 0}\) and let \(\alpha := 1 / \| {\bf x}_a\|\)

  • \({\bf x}_b := \alpha {\bf x}_a \)

Then

\[ % b \geq \frac{ \| {\bf A} {\bf x}_b \| }{ \| {\bf x}_b \|} = \frac{ \| \alpha {\bf A} {\bf x}_a \| }{ \| \alpha {\bf x}_a \|} = \frac{\alpha}{\alpha} \frac{\| {\bf A} {\bf x}_a \|}{ \| {\bf x}_a \|} = a % \]

Definition

If \(\| {\bf A} \| < 1\) then \({\bf A}\) is called contractive — it shrinks the norm

_images/contractive.png

The matrix norm has similar properties to the Euclidean norm

Fact

For conformable matrices \({\bf A}\) and \({\bf B}\), we have

  1. \(\| {\bf A} \| = 0\) if and only if all entries of \({\bf A}\) are zero

  2. \(\| \alpha {\bf A} \| = |\alpha| \| {\bf A} \|\) for any scalar \(\alpha\)

  3. \(\| {\bf A} + {\bf B} \| \leq \| {\bf A} \| + \| {\bf B} \|\)

  4. \(\| {\bf A} {\bf B} \| \leq \| {\bf A} \| \| {\bf B} \|\)

The last inequality is called the submultiplicative property of the matrix norm

For square \({\bf A}\) it implies that \(\|{\bf A}^k\| \leq \|{\bf A} \|^k\) for any \(k \in \mathbb{N}\)

Fact

For the diagonal matrix

\[\begin{split} % {\bf D} = \mathrm{diag}(d_1, \ldots, d_N) = \left( \begin{array}{cccc} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & d_N \\ \end{array} \right) % \end{split}\]

we have

\[ % \| {\bf D} \| = \max_n |d_n| % \]
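
In numpy, np.linalg.norm with ord=2 computes this operator norm (the largest singular value); a quick check for an illustrative diagonal matrix:

import numpy as np
D = np.diag([2., -5., 3.])
print(np.linalg.norm(D, 2))             # operator norm = max |d_n| = 5
print(np.max(np.abs(np.diag(D))))       # 5.0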

We can use matrices as elements of sequences

Let \(\{ {\bf A}_j \}\) and \({\bf A}\) be \(N \times K\) matrices

  • If \(\| {\bf A}_j - {\bf A} \| \to 0\) then we say that \({\bf A}_j\) converges to \({\bf A}\)

  • If \(\sum_{j=1}^J {\bf A}_j\) converges to some matrix \({\bf B}_{\infty}\) as \(J \to \infty\) we write

\[ % \sum_{j=1}^{\infty} {\bf A}_j = {\bf B}_{\infty} % \]

In other words,

\[ % {\bf B}_{\infty} = \sum_{j=1}^{\infty} {\bf A}_j \quad \iff \quad \lim_{J \to \infty} \; \left\| \sum_{j=1}^J {\bf A}_j - {\bf B}_{\infty} \right\| = 0 % \]

Neumann Series [optional]#

Consider the difference equation \({\bf x}_{t+1} = {\bf A} {\bf x}_t + {\bf b}\), where

  • \({\bf x}_t \in \mathbb{R}^N\) represents the values of some variables at time \(t\)

  • \({\bf A}\) and \({\bf b}\) form the parameters in the law of motion for \({\bf x}_t\)

Question of interest: is there an \({\bf x}\) such that

\[ % {\bf x}_t = {\bf x} \; \implies \; {\bf x}_{t+1} = {\bf x} % \]

In other words, we seek an \({\bf x} \in \mathbb{R}^N\) that solves the system of equations

\[ % {\bf x} = {\bf A} {\bf x} + {\bf b}, \quad \text{where} \quad {\bf A} \text{ is } N \times N \text{ and } {\bf b} \text{ is } N \times 1 % \]

We can get some insight from the scalar case \(x_{t+1} = a x_t + b\)

If \(|a| < 1\), then this equation has the solution

\[ % \bar x = \frac{b}{1-a} = b \sum_{k=0}^{\infty} a^k % \]

Does an analogous result hold in the vector case \({\bf x} = {\bf A} {\bf x} + {\bf b}\)?

Yes, if we replace condition \(|a| < 1\) with \(\| {\bf A} \| < 1\)

Let \({\bf b}\) be any vector in \(\mathbb{R}^N\) and \({\bf A}\) be an \(N \times N\) matrix

The next result is called the Neumann series lemma

Fact

If \(\| {\bf A}^k \| < 1\) for some \(k \in \mathbb{N}\), then \({\bf I} - {\bf A}\) is invertible and

\[ % ({\bf I} - {\bf A})^{-1} = \sum_{j=0}^{\infty} {\bf A}^j % \]

In this case \({\bf x} = {\bf A} {\bf x} + {\bf b}\) has the unique solution

\[ % \bar {\bf x} = \sum_{j=0}^{\infty} {\bf A}^j {\bf b} % \]

How to test the hypotheses of the Neumann series lemma?

Definition

The spectral radius of square matrix \({\bf A}\) is

\[ % \rho({\bf A}) := \max \{ |\lambda| \colon \lambda \text{ is an eigenvalue of } {\bf A}\} % \]

Here \(|\lambda|\) is the modulus of the possibly complex number \(\lambda\). That is, if \(\lambda = a + ib\), then \(|\lambda| = \sqrt{a^2 + b^2}\).
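
A minimal numerical sketch of the Neumann series lemma: for the illustrative matrix below both the spectral radius and \(\|{\bf A}\|\) are below one, so the hypothesis of the lemma holds, and a truncated series approximates \(({\bf I} - {\bf A})^{-1}\) well (truncation at 50 terms is an arbitrary choice):

import numpy as np
A = np.array([[0.3, 0.1],
              [0.2, 0.4]])
rho = max(abs(np.linalg.eigvals(A)))        # spectral radius, here below 1
inv_direct = np.linalg.inv(np.eye(2) - A)
inv_series = sum(np.linalg.matrix_power(A, j) for j in range(50))
print('spectral radius =', rho)
print(np.allclose(inv_direct, inv_series))  # the truncated series matches closely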

Quadratic Forms#

Up till now we have studied linear functions extensively

Next level of complexity is quadratic maps

Definition

Let \({\bf A}\) be \(N \times N\) and symmetric, and let \({\bf x}\) be \(N \times 1\)

The quadratic function on \(\mathbb{R}^N\) associated with \({\bf A}\) is the map

\[ % Q \colon \mathbb{R}^N \to \mathbb{R}, \qquad Q({\bf x}) := {\bf x}' {\bf A} {\bf x} = \sum_{j=1}^N \sum_{i=1}^N a_{ij} x_i x_j % \]
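
In numpy the quadratic form is simply x @ A @ x; a tiny sketch with arbitrary numbers:

import numpy as np
A = np.array([[2., 1.],
              [1., 2.]])        # symmetric
x = np.array([1., -3.])
print(x @ A @ x)                # evaluates x'Ax, here 14.0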

The properties of \(Q\) depend on \({\bf A}\)

An \(N \times N\) symmetric matrix \({\bf A}\) is called

  1. nonnegative definite if \({\bf x}' {\bf A} {\bf x} \geq 0\) for all \({\bf x} \in \mathbb{R}^N\)

  2. positive definite if \({\bf x}' {\bf A} {\bf x} > 0\) for all \({\bf x} \in \mathbb{R}^N\) with \({\bf x} \ne {\bf 0}\)

  3. nonpositive definite if \({\bf x}' {\bf A} {\bf x} \leq 0\) for all \({\bf x} \in \mathbb{R}^N\)

  4. negative definite if \({\bf x}' {\bf A} {\bf x} < 0\) for all \({\bf x} \in \mathbb{R}^N\) with \({\bf x} \ne {\bf 0}\)

_images/qform_pd.png

Fig. 70 A positive definite case: \(Q({\bf x}) = {\bf x}' {\bf I} {\bf x}\)#

_images/qform_nd.png

Fig. 71 A negative definite case: \(Q({\bf x}) = {\bf x}'(-{\bf I}){\bf x}\)#

Note that some matrices have none of these properties

  • \({\bf x}' {\bf A} {\bf x} < 0\) for some \({\bf x}\)

  • \({\bf x}' {\bf A} {\bf x} > 0\) for other \({\bf x}\)

In this case \({\bf A}\) is called indefinite

_images/qform_indef.png

Fig. 72 Indefinite quadratic function \(Q({\bf x}) = x_1^2/2 + 8 x_1 x_2 + x_2^2/2\)#

Fact

A symmetric matrix \({\bf A}\) is

  1. positive definite \(\iff\) all eigenvalues are strictly positive

  2. negative definite \(\iff\) all eigenvalues are strictly negative

  3. nonpositive definite \(\iff\) all eigenvalues are nonpositive

  4. nonnegative definite \(\iff\) all eigenvalues are nonnegative

It follows that

  • \({\bf A}\) is positive definite \(\implies\) \(\det({\bf A}) > 0\)

In particular, \({\bf A}\) is nonsingular
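
A small sketch of applying the eigenvalue test numerically (the two matrices below are just illustrative):

import numpy as np
A = np.array([[2., 1.],
              [1., 2.]])            # eigenvalues 1 and 3: positive definite
B = np.array([[0.5, 4.],
              [4., 0.5]])           # eigenvalues 4.5 and -3.5: indefinite
for M in (A, B):
    lam = np.linalg.eigvalsh(M)     # eigenvalues of a symmetric matrix (all real)
    print(lam, np.all(lam > 0))     # True only in the positive definite case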

Extra study material#

Watch the excellent video series on linear algebra by Grant Sanderson (3blue1brown) for outstanding visualisations that build strong intuition