Lecture 1: Linear Algebra: Systems of Linear Equations, Matrices, Vector Spaces, Linear Independence

Part I: Concept

Part II: Matrix Operation

(λA)ij = λ(Aij)

Part III: Vector Spaces and Groups

Image/Class/Mathematics-for-AI/2.png
Image/Class/Mathematics-for-AI/3.png
Image/Class/Mathematics-for-AI/4.png

Continuation of Notes

Lecture 2: Linear Algebra: Basis and Rank, Linear Mappings, Affine Spaces

Part I: Basis and Rank

Part II: Linear Mappings

Part III: Basis Change and Transformation Matrices

This section covers the representation of abstract linear mappings as matrices and the mechanics of changing coordinate systems.

1. The Transformation Matrix for a Linear Map

A transformation matrix provides a concrete computational representation for an abstract linear mapping, relative to chosen bases.

2. The Change of Basis Matrix

This is a special application of the transformation matrix, used to convert a vector's coordinates from one basis to another within the same vector space. This process is equivalent to finding the transformation matrix for the identity map (id: V → V, where id(x) = x).
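
A minimal numeric sketch of this idea (assuming NumPy; the basis B and vector x below are arbitrary examples, not taken from the slides): the coordinates of x with respect to a new basis are found by solving Bc = x, which is exactly the "transformation matrix of the identity map" view.

```python
import numpy as np

# Columns of B are the new basis vectors (an arbitrary example basis for R^2).
B = np.array([[1.0, 1.0],
              [0.0, 2.0]])

x = np.array([3.0, 4.0])          # coordinates of x in the standard basis

# Coordinates of the same vector x with respect to the basis B:
# solve B @ c = x (the change-of-basis / identity-map idea).
c = np.linalg.solve(B, x)

print(c)                          # coordinates in the new basis
print(B @ c)                      # reconstructs x in the standard basis
```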

3. The Theorem of Basis Change for Linear Mappings

Change-of-Basis Theorem
This theorem provides a formula to calculate the new transformation matrix for a linear map when the bases (the coordinate systems) for its domain and codomain are changed.

4. Matrix Equivalence and Similarity

These concepts formalize the idea that different matrices can represent the same underlying linear map, just with different coordinate systems.

5. Composition of Linear Maps

Part IV: Affine Spaces and Subspaces

While vector spaces and subspaces are fundamental, they are constrained by one critical requirement: they must contain the origin. Affine spaces generalize this idea to describe geometric objects like lines and planes that do not necessarily pass through the origin.

| Feature | Vector Subspace (U) | Affine Subspace (L = p + U) |
| --- | --- | --- |
| Must Contain Origin? | Yes: 0 ∈ U. | No, unless p ∈ U. |
| Closure under Addition? | Yes: if u₁, u₂ ∈ U, then u₁ + u₂ ∈ U. | No: in general, l₁ + l₂ ∉ L. |
| Closure under Scaling? | Yes: if u ∈ U, then cu ∈ U. | No: in general, cl₁ ∉ L. |
| Geometric Example | A line/plane through the origin. | Any line/plane, possibly shifted off the origin. |
| Linear System Example | Solution set of Ax = 0. | Solution set of Ax = b. |
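
The last row of the table can be checked numerically. A small sketch (assuming NumPy; the matrix A and vector b are arbitrary examples): a particular solution of Ax = b plus any element of the null space of A still solves Ax = b.

```python
import numpy as np

# Arbitrary example: one equation in R^3, so the solution set is a plane.
A = np.array([[1.0, 2.0, -1.0]])
b = np.array([3.0])

p, *_ = np.linalg.lstsq(A, b, rcond=None)   # a particular solution of Ax = b

# A basis of the null space U (solutions of Ax = 0) via the SVD:
# the rows of Vt beyond the rank span the null space of this rank-1 matrix.
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[1:, :]

# Any point p + (combination of null-space vectors) still solves Ax = b.
u = 2.0 * null_basis[0] - 1.5 * null_basis[1]
print(A @ (p + u), b)
```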

Part V: Hyperplanes

A hyperplane is a generalization of the concept of a line (in 2D) and a plane (in 3D) to vector spaces of any dimension. It is an extremely important and common special case of an affine subspace.

1. Core Intuition

A hyperplane's key function is to "slice" the entire space into two half-spaces, which makes it an ideal decision boundary in classification problems.

2. Two Equivalent Definitions of a Hyperplane

Hyperplanes can be defined in two equivalent ways: one algebraic and one geometric.

Definition 1: The Algebraic Definition (via a Single Linear Equation)

A hyperplane H in Rn is the set of all points x that satisfy a single linear equation:

a1x1 + a2x2 + ⋯ + anxn = d

where a1, …, an are coefficients that are not all zero, and d is a constant.

Using vector notation, this equation becomes much more compact:

aᵀx = d
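
As a small sketch of how this algebraic definition is used (assuming NumPy; the coefficients a and d below are arbitrary examples), the sign of aᵀx − d tells us on which side of the hyperplane a point lies:

```python
import numpy as np

# Hyperplane a^T x = d in R^3 (a and d are arbitrary example values).
a = np.array([1.0, -2.0, 0.5])
d = 1.0

def side(x):
    """Return which side of the hyperplane x lies on: +1, -1, or 0 (on it)."""
    return np.sign(a @ x - d)

print(side(np.array([1.0, 0.0, 0.0])))   # a^T x = 1.0 -> on the hyperplane (0)
print(side(np.array([3.0, 0.0, 0.0])))   # positive half-space (+1)
print(side(np.array([0.0, 1.0, 0.0])))   # negative half-space (-1)
```
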
Definition 2: The Geometric Definition (via Affine Subspaces)

A hyperplane H in an n-dimensional vector space V is an affine subspace of dimension n-1.

H = p + U

where p ∈ V is a support (translation) point and U ⊆ V is a subspace with dim(U) = n − 1.

3. The Connection Between the Definitions

These two definitions are perfectly equivalent.

4. Hyperplanes in Machine Learning

Hyperplanes are at the core of many machine learning algorithms, most famously the Support Vector Machine (SVM).

Part VI: Affine Mappings

We have established that linear mappings, of the form φ(x) = Ax, always preserve the origin (i.e., φ(0) = 0). However, many practical applications, especially in computer graphics, require transformations that include translation, which moves the origin. This more general class of transformation is called an affine mapping.

1. Core Idea: A Linear Map Followed by a Translation

An affine mapping is, in essence, a composition of a linear mapping and a translation.

2. Formal Definition

A mapping f: V → W from a vector space V to a vector space W is called an affine mapping if it can be written in the form:

f(x)=Ax+b

where A is a matrix representing a linear map and b ∈ W is a fixed translation vector.

Distinction from Linear Mappings: an affine mapping is linear if and only if b = 0; otherwise f(0) = b ≠ 0, so the origin is not preserved.

3. Key Properties of Affine Mappings

While affine maps are generally not linear (since f(x + y) ≠ f(x) + f(y)), they preserve several crucial geometric properties.

  1. Lines Map to Lines: An affine map transforms a straight line into another straight line (or, in a degenerate case, a single point if the line's direction is in the null space of A).

  2. Parallelism is Preserved: If two lines are parallel, their images under an affine map will also be parallel.

  3. Ratios of Lengths are Preserved: If a point P is the midpoint of a line segment QR, then its image f(P) will be the midpoint of the image segment f(Q)f(R). This property is vital for maintaining the relative structure of geometric shapes.

  4. Affine Combinations are Preserved: This is the most fundamental algebraic property of an affine map. If a point y is an affine combination of a set of points xᵢ (meaning y = Σαᵢxᵢ where Σαᵢ = 1), then its image f(y) is the same affine combination of the images f(xᵢ):

    f(∑αᵢxᵢ) = ∑αᵢ f(xᵢ),  provided that ∑αᵢ = 1
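
A quick numerical check of property 4 (a sketch assuming NumPy; the map f, the points, and the coefficients below are arbitrary examples):

```python
import numpy as np

# Arbitrary example affine map f(x) = A x + b in R^2.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
b = np.array([1.0, -1.0])
f = lambda x: A @ x + b

pts = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
alphas = np.array([0.5, 0.3, 0.2])        # coefficients summing to 1

y = sum(a * p for a, p in zip(alphas, pts))          # affine combination of points
lhs = f(y)                                           # f of the combination
rhs = sum(a * f(p) for a, p in zip(alphas, pts))     # same combination of the images

print(np.allclose(lhs, rhs))   # True: affine combinations are preserved
```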

4. Homogeneous Coordinates: The Trick to Unify Transformations

In fields like computer graphics, it is highly desirable to represent all transformations, including translations, with a single matrix multiplication. The standard form Ax + b requires both a multiplication and an addition, which is inconvenient for composing multiple transformations.

Homogeneous Coordinates elegantly solve this problem by adding an extra dimension, effectively turning an affine map into a linear map in a higher-dimensional space.
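
A minimal sketch of this trick (assuming NumPy; A, b, and x are arbitrary examples): appending a 1 to x and embedding A and b in a larger matrix turns Ax + b into a single matrix multiplication.

```python
import numpy as np

# Affine map f(x) = A x + b (A and b are arbitrary example values).
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # a 90-degree rotation
b = np.array([2.0, 3.0])             # a translation

# Homogeneous form: a single 3x3 matrix [[A, b], [0 0 1]].
M = np.eye(3)
M[:2, :2] = A
M[:2, 2] = b

x = np.array([1.0, 1.0])
x_h = np.append(x, 1.0)              # homogeneous coordinates of x

print(A @ x + b)                     # affine map: a multiplication plus an addition
print((M @ x_h)[:2])                 # same result with one matrix multiplication
```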

5. Summary

| Concept | Linear Mapping (Ax) | Affine Mapping (Ax + b) |
| --- | --- | --- |
| Essence | Rotation / scaling / shearing | Linear transformation + translation |
| Preserves Origin? | Yes, f(0) = 0 | No, f(0) = b in general |
| Preserves Lin. Comb.? | Yes | No |
| What is Preserved? | Lines, parallelism, linear combinations | Lines, parallelism, affine combinations |
| Representation | Matrix A | Matrix A and vector b |
| Homogeneous Form | [A 0; 0ᵀ 1] | [A b; 0ᵀ 1] |

Lecture 3: Analytic Geometry: Norms, Inner Products, and Lengths and Distances, Angles and Orthogonality

Part I: Geometric Structures on Vector Spaces

In the previous parts, we established the algebraic framework of vector spaces and linear mappings. Now, we will enrich these spaces with geometric structure, allowing us to formalize intuitive concepts like the length of a vector, the distance between vectors, and the angle between them. These concepts are captured by norms and inner products.

1. Norms

A norm is a formal generalization of the intuitive notion of a vector's "length" or "magnitude".

2. Inner Products

An inner product is a more fundamental concept than a norm. It is a function that allows us to define not only the Euclidean norm but also the angle between vectors and the notion of orthogonality (perpendicularity).

3. The Bridge: From Inner Products to Geometry

The inner product is the foundation of Euclidean geometry within a vector space. All key geometric concepts can be derived from it.
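
A small sketch of this derivation chain (assuming NumPy and the standard dot product; x and y are arbitrary examples): length, distance, and angle are all computed from the inner product alone.

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, 0.0, 1.0])

dot = x @ y                                   # inner product <x, y>
length_x = np.sqrt(x @ x)                     # norm ||x|| = sqrt(<x, x>)
distance = np.sqrt((x - y) @ (x - y))         # d(x, y) = ||x - y||
cos_angle = dot / (np.sqrt(x @ x) * np.sqrt(y @ y))
angle = np.arccos(cos_angle)                  # angle between x and y (radians)

print(length_x, distance, angle)
```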

Part II: Geometric Structures on Vector Spaces

1. Symmetric, Positive Definite (SPD) Matrices and Inner Products

In finite-dimensional vector spaces like Rn, the abstract concept of an inner product can be concretely represented and computed using a special class of matrices. These are Symmetric, Positive Definite (SPD) matrices. They are fundamental in machine learning, statistics, and optimization because they provide a way to define custom, yet valid, notions of distance, angle, and similarity, which are essential for algorithms like Support Vector Machines (with kernels), Gaussian models, and Newton's method in optimization.
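
A hedged sketch of a custom inner product ⟨x, y⟩_A = xᵀAy induced by an SPD matrix (assuming NumPy; the matrix A below is an arbitrary example of an SPD matrix):

```python
import numpy as np

# An arbitrary example of a symmetric, positive definite matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(np.all(np.linalg.eigvalsh(A) > 0))      # True: all eigenvalues are positive

def inner(x, y):
    """Inner product induced by the SPD matrix A: <x, y>_A = x^T A y."""
    return x @ A @ y

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

print(inner(x, y))                 # 1.0: x and y are *not* orthogonal under A
print(np.sqrt(inner(x, x)))        # length of x measured with this inner product
```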

Part III: Angles, Orthogonality, and Orthogonal Matrices

1. Angles and Orthogonality

Inner products not only define lengths and distances but also enable us to define the angle between vectors, thereby generalizing the concept of "perpendicularity".

2. Orthogonal Matrices

An orthogonal matrix is a special type of square matrix whose corresponding linear transformation geometrically represents a shape-preserving transformation (like a rotation or reflection) and has excellent computational properties.
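
A minimal check of these properties (assuming NumPy; the rotation angle is an arbitrary example): for an orthogonal matrix Q, QᵀQ = I, the inverse is just the transpose, and lengths are preserved.

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a 2D rotation (orthogonal)

print(np.allclose(Q.T @ Q, np.eye(2)))            # Q^T Q = I
print(np.allclose(np.linalg.inv(Q), Q.T))         # inverse equals transpose

x = np.array([3.0, 4.0])
print(np.linalg.norm(Q @ x), np.linalg.norm(x))   # lengths are preserved
```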

Part IV: Metric Spaces and the Formal Definition of Distance

Previously, we defined distance in an inner product space as d(x, y) = ||x - y||. This is a specific instance of a much more general concept called a metric. A metric formalizes the intuitive notion of "distance" between elements of any set, not just vectors.

1. The Metric Function

A metric (or distance function) on a set V is a function that quantifies the distance between any two elements of that set.

2. The Connection: From Inner Products to Metrics

The distance we derived from the inner product is a valid metric. We can prove this by showing it satisfies all three metric axioms.

3. Why is the Concept of a Metric Useful?

The power of defining a metric abstractly is that it allows us to measure "distance" in contexts far beyond standard Euclidean geometry.

4. Summary: Hierarchy of Spaces

This shows how these geometric concepts build upon each other:

Inner Product Space → Normed Space → Metric Space → Topological Space

However, the reverse is not always true. There are metrics (like edit distance) that do not come from a norm, and norms (like the L1-norm) that do not come from an inner product.

Part V: Orthogonal Projections

Orthogonal projection is a fundamental operation that finds the "closest" vector in a subspace to a given vector in the larger space. It is the geometric foundation for concepts like least-squares approximation.

1. The Concept of Orthogonal Projection

Let U be a subspace of an inner product space V (e.g., Rn with the dot product), and let x ∈ V be a vector. The orthogonal projection of x onto the subspace U, denoted π_U(x), is the unique vector in U that is "closest" to x.

This projection, which we will call p = π_U(x), is defined by two fundamental properties:

  1. Membership Property: The projection p must lie within the subspace U.

    • p ∈ U
  2. Orthogonality Property: The vector connecting the original vector x to its projection p (the "error" vector x − p) must be orthogonal to the entire subspace U.

    • (x − p) ⊥ U

This second property implies that (x − p) must be orthogonal to every vector in U. A key theorem states that this is equivalent to being orthogonal to all vectors in a basis for U.

2. Deriving the Projection Formula (The Normal Equation)

Our goal is to find an algebraic method to compute the projection vector p. The strategy is to translate the two geometric properties above into a system of linear equations.

Step 1: Express the Membership Property using a Basis

First, we need a basis for the subspace U. Let's say we find a basis {b1, b2, …, bk}. We can arrange these basis vectors as the columns of a matrix B:

B = [ b1 | b2 | ⋯ | bk ]

Since the projection p must be in the subspace U (which is the column space of B), p must be a linear combination of the columns of B. This means there must exist a unique vector of coefficients λ = (λ1, …, λk)ᵀ such that:

p = λ1 b1 + λ2 b2 + ⋯ + λk bk

In matrix form, this is written as:

p = Bλ

Our problem is now reduced from finding the vector p to finding the unknown coefficient vector λ.

Step 2: Express the Orthogonality Property as an Equation

The orthogonality property states that (x − p) ⊥ U. This means the dot product of (x − p) with every basis vector of U must be zero:

b1 · (x − p) = 0   ⟺   b1ᵀ(x − p) = 0
b2 · (x − p) = 0   ⟺   b2ᵀ(x − p) = 0
        ⋮                      ⋮
bk · (x − p) = 0   ⟺   bkᵀ(x − p) = 0

This system of equations can be written compactly in matrix form. Stacking the rows b1ᵀ, b2ᵀ, …, bkᵀ gives exactly the matrix Bᵀ, so the whole system becomes:

Bᵀ(x − p) = 0

Step 3: Combine and Solve for λ

We now have a system of two equations:

  1. p = Bλ
  2. Bᵀ(x − p) = 0

Substitute the first equation into the second:

Bᵀ(x − Bλ) = 0

Distribute Bᵀ:

Bᵀx − BᵀBλ = 0

Rearrange the terms to isolate the unknown λ:

(BᵀB)λ = Bᵀx

This final result is known as the Normal Equation. It is a system of linear equations for the unknown coefficients λ.

3. The Algorithm for Orthogonal Projection

Given a vector x and a subspace U:

  1. Find a Basis: Find a basis {b1, …, bk} for the subspace U.
  2. Form the Basis Matrix B: Create a matrix B whose columns are the basis vectors.
  3. Set up the Normal Equation: Compute the matrix BᵀB and the vector Bᵀx.
  4. Solve for λ: Solve the linear system (BᵀB)λ = Bᵀx to find the coefficient vector λ.
  5. Compute the Projection p: Calculate the final projection vector using the formula p = Bλ.

Important Note: The matrix BᵀB is square and is invertible if and only if the columns of B are linearly independent (which they are, because they form a basis).
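
The five steps above, as a hedged NumPy sketch (the basis matrix B and the vector x are arbitrary examples):

```python
import numpy as np

# Arbitrary example: project x onto the subspace U of R^3 spanned by b1, b2.
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])          # columns are the basis vectors b1, b2
x = np.array([6.0, 0.0, 0.0])

# Normal Equation: (B^T B) lambda = B^T x
lam = np.linalg.solve(B.T @ B, B.T @ x)
p = B @ lam                           # the orthogonal projection of x onto U

print(p)                              # [4., 2., -2.] for this example
print(np.allclose(B.T @ (x - p), 0))  # True: the error x - p is orthogonal to U
```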

4. Special Case: Orthonormal Basis

If the basis for U is orthonormal, the columns of B are orthonormal, so BᵀB = I. In this case the calculation simplifies dramatically: the Normal Equation gives λ = Bᵀx directly, and the projection becomes p = BBᵀx.

The matrix P = BBᵀ is called the projection matrix. This simplified formula only works when the basis is orthonormal. For a general, non-orthogonal basis, one must solve the full Normal Equation.
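
A small sketch comparing the two routes (assuming NumPy; B and x are arbitrary examples, and QR factorization is used only to obtain an orthonormal basis of the same subspace):

```python
import numpy as np

# Start from an arbitrary (non-orthonormal) basis of a 2D subspace of R^3.
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
Q, _ = np.linalg.qr(B)               # columns of Q: an orthonormal basis of the same U

x = np.array([6.0, 0.0, 0.0])

P = Q @ Q.T                          # projection matrix for the orthonormal basis
p_simple = P @ x                     # simplified formula p = B B^T x
p_normal = B @ np.linalg.solve(B.T @ B, B.T @ x)   # full Normal Equation

print(np.allclose(p_simple, p_normal))   # True: both give the same projection
```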

Lecture 4: Analytic Geometry: Orthonormal Basis, Orthogonal Complement, Inner Product of Functions, Orthogonal Projections, Rotations

Part I: Orthonormal Basis and Orthogonal Complement

1. Orthonormal Basis

2. Gram-Schmidt Process: Constructing an Orthonormal Basis

The Gram-Schmidt process is a fundamental algorithm that transforms any set of linearly independent vectors (a basis) into an orthonormal basis for the same subspace.
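
A minimal sketch of the classical Gram-Schmidt procedure (assuming NumPy and the standard dot product; the input vectors are arbitrary examples and the function name gram_schmidt is illustrative):

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal basis for span(vectors) via classical Gram-Schmidt."""
    basis = []
    for v in vectors:
        # Subtract the projections of v onto the already-built orthonormal vectors.
        w = v - sum((v @ q) * q for q in basis)
        norm = np.linalg.norm(w)
        if norm > 1e-12:             # skip (nearly) linearly dependent vectors
            basis.append(w / norm)
    return np.array(basis)

V = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0])]
Q = gram_schmidt(V)
print(np.allclose(Q @ Q.T, np.eye(3)))    # True: the returned vectors are orthonormal
```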

3. Orthogonal Complement and Decomposition

The orthogonal complement generalizes the concept of perpendicularity from single vectors to entire subspaces.

4. Applications and Examples of Orthogonal Complements

Part II: Inner Product of Functions, Orthogonal Projections, and Rotations

1. Inner Product of Functions

The concept of an inner product can be extended from the familiar space Rn to vector spaces of functions, enabling us to apply geometric intuition to abstract objects like polynomials or signals.
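
A hedged sketch of such a function inner product, ⟨f, g⟩ = ∫ f(x)g(x) dx on [−π, π], approximated here with a simple Riemann sum (assuming NumPy; the interval and the sample functions are arbitrary examples):

```python
import numpy as np

# Inner product of functions: <f, g> = integral of f(x) g(x) over [-pi, pi],
# approximated with a Riemann sum on a fine grid.
xs = np.linspace(-np.pi, np.pi, 200001)
dx = xs[1] - xs[0]

def inner(f, g):
    return np.sum(f(xs) * g(xs)) * dx

print(inner(np.sin, np.cos))        # ~0: sin and cos are orthogonal on [-pi, pi]
print(inner(np.sin, np.sin))        # ~pi: the squared "length" of sin
```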

2. Orthogonal Projections

Orthogonal projection is a core operation for finding the "best approximation" or "closest point" of a vector within a given subspace. It is the geometric foundation for solving least-squares problems.

3. Rotations

Rotations are a fundamental class of geometric transformations that preserve the shape and size of objects. In linear algebra, they are represented by a special type of matrix.

Part III: A Detailed Explanation of Orthogonal Projections

Projections are a critically important class of linear transformations with wide-ranging applications in computer graphics, coding theory, statistics, and machine learning.

1. The Importance and Concept of Orthogonal Projections

2. Formal Definition and Properties of Projections

3. Projection onto a One-Dimensional Subspace (a Line)

Image/Class/Mathematics-for-AI/5.png
Image/Class/Mathematics-for-AI/6.png

We begin by deriving the projection formula for the simplest case: projecting a vector onto a line. Unless stated otherwise, we assume the standard dot product as the inner product.
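
A tiny numeric sketch of this 1D case, using the standard formula π_U(x) = (bᵀx / bᵀb)·b for the dot product (assuming NumPy; b and x are arbitrary examples):

```python
import numpy as np

b = np.array([2.0, 1.0])             # spans the line U (arbitrary example)
x = np.array([1.0, 2.0])

p = (b @ x) / (b @ b) * b            # projection of x onto the line span{b}
print(p)                             # [1.6, 0.8]
print(np.isclose(b @ (x - p), 0.0))  # True: the error x - p is orthogonal to b
```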

4. Projection onto a General Subspace

Image/Class/Mathematics-for-AI/7.png

The three-step method for 1D projection can be generalized to any m-dimensional subspace U ⊆ Rn.

Extension: Projections between Subspaces

5. Core Application I: Gram-Schmidt Orthogonalization

Image/Class/Mathematics-for-AI/8.png
Image/Class/Mathematics-for-AI/9.png
Image/Class/Mathematics-for-AI/10.png
The Gram-Schmidt process is a classic algorithm for constructing an orthonormal basis, and its core idea is the repeated application of orthogonal projection.

Extension: Cholesky Decomposition

6. Core Application II: Projection onto an Affine Subspace

So far, we have discussed projections onto subspaces that pass through the origin. We now generalize this to affine subspaces (e.g., lines or planes that do not pass through the origin).
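
One common way to handle this, sketched below under the standard dot product (assuming NumPy; the support point x0, the direction matrix B, and x are arbitrary examples): subtract a support point of the affine subspace, project onto its direction space U, then add the support point back.

```python
import numpy as np

# Affine subspace L = x0 + U, where U = span of the columns of B.
x0 = np.array([1.0, 1.0, 1.0])       # support point (arbitrary example)
B = np.array([[1.0],
              [0.0],
              [1.0]])                 # direction(s) spanning U
x = np.array([3.0, 2.0, 0.0])

# 1) shift so the subspace passes through the origin, 2) project, 3) shift back
d = x - x0
lam = np.linalg.solve(B.T @ B, B.T @ d)
p = x0 + B @ lam                      # projection of x onto the affine subspace L

print(p)
print(np.allclose(B.T @ (x - p), 0))  # True: error is orthogonal to the directions of L
```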

7. Core Application III: Projections and Least Squares Solutions

Orthogonal projection provides a powerful geometric framework for solving inconsistent linear systems of the form Ax = b, forming the foundation of the Least Squares Method and of the Moore-Penrose pseudoinverse.
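
A hedged sketch of this connection (assuming NumPy; A and b are arbitrary examples of an inconsistent system): the normal-equation solution, the pseudoinverse solution, and NumPy's least-squares solver all agree, and Ax is the projection of b onto the column space of A.

```python
import numpy as np

# Inconsistent system Ax = b (arbitrary example: more equations than unknowns).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0, 2.0])

x_ne = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations (A^T A) x = A^T b
x_pi = np.linalg.pinv(A) @ b                  # Moore-Penrose pseudoinverse
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)  # library least-squares solver

print(np.allclose(x_ne, x_pi), np.allclose(x_ne, x_ls))
print(A @ x_ne)    # the orthogonal projection of b onto the column space of A
```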

Part IV: A Detailed Explanation of Rotations

Rotations are another important class of linear transformations, following projections, that play a central role in geometry, computer graphics, and robotics.

1. Fundamental Concepts of Rotation

2. Rotations in R2

2.1 Derivation of the R2 Rotation Matrix: Two Perspectives

Perspective 1: Basis Change (The "Columns are Transformed Basis Vectors" View)

This is the standard derivation based on the nature of linear transformations.

Perspective 2: Polar Coordinates and Trigonometric Identities (The "Direct Geometric" View)

This is a more direct geometric proof that does not rely on the idea of basis change.

  1. Represent the Vector: Express an arbitrary vector in polar coordinates: x = r cos ϕ, y = r sin ϕ.
  2. Represent the Rotation: Rotating this vector by an angle θ changes its angle to ϕ + θ. The new coordinates (x′, y′) are: x′ = r cos(ϕ + θ), y′ = r sin(ϕ + θ).
  3. Apply Angle-Sum Formulas:
    • x′ = r(cos ϕ cos θ − sin ϕ sin θ) = x cos θ − y sin θ
    • y′ = r(sin ϕ cos θ + cos ϕ sin θ) = x sin θ + y cos θ
  4. Write in Matrix Form:
    [x′]   [cos θ   −sin θ] [x]
    [y′] = [sin θ    cos θ] [y]
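
A quick numeric check that the two perspectives agree (assuming NumPy; the angle and vector below are arbitrary examples):

```python
import numpy as np

theta = 0.5
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([2.0, 1.0])
r, phi = np.hypot(*x), np.arctan2(x[1], x[0])      # polar form of x

rotated_matrix = R @ x                              # Perspective 1: matrix form
rotated_polar = r * np.array([np.cos(phi + theta),  # Perspective 2: polar form
                              np.sin(phi + theta)])

print(np.allclose(rotated_matrix, rotated_polar))   # True: both derivations agree
```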

2.2 Applications and Notes for R2 Rotations

3. Rotations in R3

Rotations in 3D are more complex than in 2D because they must occur around an axis of rotation.

3.1 Directional Convention for 3D Rotations: The Right-Hand Rule

To define "counter-clockwise," we use the Right-Hand Rule:

3.2 Fundamental Rotations about the Coordinate Axes

Any complex 3D rotation can be decomposed into a sequence of fundamental rotations about the three primary axes (x, y, z).

  1. Rotation about the x-axis (e1) by Rx(θ)

    • Description: The x-coordinate remains unchanged; the rotation occurs in the yz-plane.
    • Matrix:
      Rx(θ) = [ 1      0        0
                0      cos θ   −sin θ
                0      sin θ    cos θ ]
  2. Rotation about the y-axis (e2) by Ry(θ)

    • Description: The y-coordinate remains unchanged; the rotation occurs in the zx-plane.
    • Matrix:
      Ry(θ) = [  cos θ   0   sin θ
                 0       1   0
                −sin θ   0   cos θ ]
  3. Rotation about the z-axis (e3) by Rz(θ)

    • Description: The z-coordinate remains unchanged; the rotation occurs in the xy-plane.
    • Matrix:
      Rz(θ) = [ cos θ   −sin θ   0
                sin θ    cos θ   0
                0        0       1 ]
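
A minimal sketch of these fundamental rotation matrices (assuming NumPy; the angle is an arbitrary example): each is orthogonal with determinant 1, and sequential rotations compose by matrix multiplication but do not commute in general.

```python
import numpy as np

def Rx(t):
    return np.array([[1, 0, 0],
                     [0, np.cos(t), -np.sin(t)],
                     [0, np.sin(t),  np.cos(t)]])

def Ry(t):
    return np.array([[ np.cos(t), 0, np.sin(t)],
                     [0, 1, 0],
                     [-np.sin(t), 0, np.cos(t)]])

def Rz(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0, 0, 1]])

t = 0.9
for R in (Rx(t), Ry(t), Rz(t)):
    # Each fundamental rotation is orthogonal (R^T R = I) with determinant +1.
    print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))

# Sequential rotations compose by matrix multiplication and do not commute in general.
print(np.allclose(Rx(t) @ Rz(t), Rz(t) @ Rx(t)))   # False
```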

3.3 Trigonometric Proof of the 3D Fundamental Rotation Matrices

3.4 Sequential Rotations in 3D

4. Rotations in Higher Dimensions (Rn): Givens Rotations

5. General Properties of Rotations

Lecture 5: Matrix Decompositions: Determinant and Trace, Eigenvalues and Eigenvectors, Cholesky Decomposition, Eigendecomposition and Diagonalization