Notes on Mathematics for Computer Science
1 Calculus I (MIT 18.01)
1.1 Differentiation
1.1.1 Limit
\lim_{x\to a}f(x) = L \iff \forall \epsilon>0, \exists \delta > 0: \lvert f(x) - L\rvert < \epsilon , \forall x\in(a-\delta, a)\cup(a, a+\delta)
1.1.2 Derivative
The derivative is the slope of the line tangent to the graph of f(x).
The tangent line is the limit of the secant line as the distance between the two points goes to zero.
\lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x_0+\Delta x) - f(x_0)}{\Delta x} = f'(x_0)
Equivalent notations:
\frac{\mathrm{d}f(x)}{\mathrm{d}x}, \frac {\mathrm{d}}{\mathrm{d}x} f(x), f', Df
1.1.3 Differential notation
\mathrm{d}y = f'(x)\mathrm{d}x \quad (y = f(x))
Both \mathrm{d}y and f'(x)\mathrm{d}x are called differentials.
1.1.4 Continuity
\lim_{x\to x_0} f(x) = f(x_0) \iff f(x) \text{ is continuous at } x_0
Different kinds of discontinuity:
- Removable: the left-hand limit equals the right-hand limit, but the common value differs from f(x_0), or f(x_0) is not defined.
- Jump: both one-sided limits exist, but they are not equal.
- Infinite: one or both one-sided limits are \pm\infty.
- Other: a one-sided limit does not exist, even as an infinity.
Differentiable implies continuous:
\begin{aligned} & \text{continuity at } a: \lim_{\Delta x \to 0} f(a+\Delta x) = f(a) \iff \lim_{\Delta x \to 0} \Delta y = 0 \\ & \lim_{\Delta x \to 0} \Delta y = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} \Delta x = \left(\lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}\right) \left(\lim_{\Delta x \to 0} \Delta x\right) = \frac{\mathrm{d}y}{\mathrm{d}x} \cdot 0 = 0 \end{aligned}
1.1.5 Computations of derivatives
Basic rules:
\begin{aligned} \mathrm{d}Cu & = C \mathrm{d}u \\ \mathrm{d}(u+v) &= \mathrm{d}u + \mathrm{d}v \\ \mathrm{d}(uv) &= u \mathrm{d}v + v \mathrm{d}u \\ \mathrm{d}\frac uv &=\frac{v \mathrm{d}u-u\mathrm{d}v}{v^2} \end{aligned}
chain:
\frac{\mathrm{d}y}{\mathrm{d}x}=\frac{\mathrm{d}y}{\mathrm{d}u}\cdot\frac{\mathrm{d}u}{\mathrm{d}x} \quad y=f(u) \text{ and } u = g(x)
or in the form:
[f(g(x))]' = (f\circ g)'(x) = f'(g(x)) \cdot g'(x)
1.1.6 Higher derivative notation
\begin{aligned} f^{''}(x) & \quad y^{''} & \quad \frac{\mathrm{d}^2y}{\mathrm{d} x^2} & \quad \frac {\mathrm{d}^2}{\mathrm{d} x ^ 2} f(x) \\ f^{(n)}(x) & \quad y^{(n)} & \quad \frac{\mathrm{d}^ny}{\mathrm{d} x^n} & \quad \frac {\mathrm{d}^n}{\mathrm{d} x ^ n} f(x) \end{aligned}
1.1.7 Implicit differentiation
Use the chain rule, treating the dependent variable as an implicit function of x. Example: differentiate \tan^{-1}x by writing \tan y = x. A related trick is logarithmic differentiation:
\frac{f'(t)}{f(t)} = \frac{\mathrm{d}}{\mathrm{d}t}\ln (f(t))
1.1.8 Derivative table
| Func | Derivative | Domain |
|---|---|---|
| C | 0 | x \in \mathbb{R} |
| x^n | nx^{n-1} | n \in \mathbb{R} |
| a^x | \ln a\cdot a^x | a>0, a\neq 1 |
| \log_ax | \frac 1 {\ln a} \cdot\frac{1}{x} | a>0, a\neq 1, x>0 |
| \sin x | \cos x | x \in \mathbb{R} |
| \cos x | -\sin x | x \in \mathbb{R} |
| \tan x | \frac{1}{\cos^2x}, \sec^2x | x\neq \frac{\pi}{2}+k\pi, k\in\mathbb{Z} |
| \sec x | \sec x \tan x | x\neq \frac{\pi}{2}+k\pi, k\in\mathbb{Z} |
| \sin^{-1}x | \frac{1}{\sqrt{1-x^2}} | x\in(-1,1) |
| \tan^{-1}x | \frac{1}{1+x^2} | x \in \mathbb{R} |
1.1.9 L'Hospital's rule
If f(a) = g(a) = 0, both f and g are differentiable at x=a, and g'(a) \neq 0, then:
\lim_{x\to a}\frac{f(x)}{g(x)} = \lim_{x\to a} \frac{\frac{f(x)}{x-a}}{\frac{g(x)}{x-a}} = \frac{\lim_{x\to a}\frac{f(x) - f(a)}{x-a}}{\lim_{x\to a}\frac{g(x) - g(a)}{x-a}} =\frac{f'(a)}{g'(a)}
L'Hospital's rule also applies to limits of the form \frac{\infty}{\infty} and to limits as x\to\pm\infty.
1.2 Applications of Differentiation
1.2.1 Linear approximation
f(x) \approx f(x_0) + f'(x_0)(x-x_0), \quad x\approx x_0
1.2.2 Newton's method
To solve f(x)=0:
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}
The mathematical theory providing conditions under which Newton's method is guaranteed to succeed can be found in numerical analysis.
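A minimal numerical sketch of the iteration above (the test function f(x) = x^2 - 2, the starting point, and the tolerance are illustrative choices):

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Solve f(x) = 0 by Newton's method: x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    return x  # may not have converged; see the convergence conditions mentioned above

# Example: solve x^2 - 2 = 0, i.e. approximate sqrt(2).
print(newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0))  # ~1.4142135623730951
```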
1.2.3 Mean value theorem
f(x) \text{ is continuous on } [a, b] \land f(x) \text{ is differentiable on } (a, b) \implies \exists c \in (a, b): f'(c)=\frac{f(b) - f(a)}{b-a}
1.2.4 Anti-derivative table
| Func | Anti-derivative | Domain |
|---|---|---|
| x^n | \frac{x^{n+1}}{n+1}+c | n \neq -1 |
| \frac 1x | \ln\lvert x\rvert+c | x\neq0 |
| \sin x | -\cos x +c | x \in \mathbb{R} |
| \tan x | -\ln\lvert\cos x\rvert + c | x\neq \frac{\pi}{2}+k\pi, k\in\mathbb{Z} |
| \sec x | \ln(\tan x+\sec x) +c | x\in(-\frac{\pi}{2}+2k\pi, \frac{\pi}{2}+2k\pi),k\in\mathbb{Z} |
| \frac{1}{\cos^2x}, \sec^2x | \tan x+c | x\neq \frac{\pi}{2}+k\pi, k\in\mathbb{Z} |
| \frac{1}{\sqrt{1-x^2}} | \sin^{-1}x | - |
| \frac{1}{1+x^2} | \tan^{-1}x | - |
1.2.5 Differential in economics
Economics law: let C(x) be the total cost function, so \frac{C(x)}{x} is the average cost:
\frac{\mathrm{d}}{\mathrm{d}x}\frac{C(x)}{x} = 0 \iff C'(x)=\frac{C(x)}{x}
That is, if average cost reaches a minimum, then marginal cost = average cost.
Similarly, if profit reaches a maximum, then marginal revenue = marginal cost.
Elasticity of demand: E(p)=-\frac p x \frac{\mathrm{d}x}{\mathrm{d}p}.
1.3 Integration
1.3.1 Riemann sum
\sum_{i=1}^{n}f(c_i)\Delta x, \quad c_i \in [x_{i - 1}, x_i], x_i = a + \frac{b-a} n i
1.3.2 Definite integral
\lim_{n\to \infty}\sum_{i=1}^{n}f(c_i)\Delta x = \int^b_af(x) \mathrm d x
1.3.3 First fundamental theorem of calculus
F'(x) = f(x) \land f(x) \text{ is continuous} \implies \int^b_af(x) \mathrm d x=F(b)-F(a)
1.3.4 Second fundamental theorem of calculus
F(x) = \int_a^x f(t)\mathrm d t \land f(x) \text{ is continuous} \implies F'(x) = f(x)
1.3.5 Substitution
Given f(x) = g(u(x)):
\int g(u) \mathrm{d} u = \int g(u(x))u'(x)\mathrm{d}x = \int f(x)u'(x) \mathrm{d}x
For definite:
\int_{x_1}^{x2} f(x)u'(x) \mathrm{d}x = \int_{u(x_1)}^{u(x_2)} g(u) \mathrm{d} u
1.3.6 Trigonometric function integration
\int \sin^n(x)\cos^m(x) \mathrm d x
- If n or m is odd: pull out one factor of the odd-powered function and substitute u = the other function (e.g. for odd n, u = \cos x).
- If n and m are both even: use the double angle formulas until the first case applies.
\int \sec^n(x)\tan^m(x) \mathrm d x
- n is even, u = \tan x.
- m is odd, u = \sec x.
- n is odd, m is even: use \tan^2 x + 1 = \sec^2 x to rewrite everything in powers of \sec x, then use the result for \int \sec ^n x \mathrm d x.
1.3.7 Partial fractions and integration by parts
\frac{P(x)}{Q(x)} can be split into partial fractions in a systematic way.
Integration by parts:
(uv)' = u'v+uv' \iff uv'= (uv)' - u'v \iff \int uv' \mathrm d x = uv - \int u'v \mathrm d x
Examples:
- \int (\ln x)^n \mathrm d x
- \int x^n \mathrm e ^ x \mathrm d x
- \int \sin x \mathrm e ^ x \mathrm d x
- \int \cos x \mathrm e ^ x \mathrm d x
- \int \sec ^n x \mathrm d x
1.3.8 Improper integral
\int_{a}^{\infty} f(x) \mathrm d x = \lim_{b\to\infty}\int_{a}^{b} f(x) \mathrm d x
is said to converge if the limit exists.
1.4 Applications of Integral
1.4.1 Area between 2 curves
A = \int_a^b(f(x)-g(x))\mathrm d x
1.4.2 Volume by slice method
To get the volume of a function revolved about x-axis:
\mathrm d V = \pi f(x)^2 \mathrm d x
1.4.3 Volume by disk method
To get the volume of a function revolved about y-axis:
\mathrm d V = \pi x^2 \mathrm dy
1.4.4 Volume by cylinder method
To get the volume of a function revolved about y-axis:
\mathrm d V = 2\pi x y \mathrm dx
1.4.5 Weighted average
\frac{\int_a^bf(x)w(x)\mathrm d x}{\int_a^bw(x)\mathrm d x}
1.4.6 Simpson's method
For numerical integration:
\int_{a}^{b} f(x) \mathrm d x = \frac{b - a}{6}[f(a) + 4f(\frac{a+b}{2}) + f(b)] + \mathcal{O}((b-a)^5)
For the \mathcal{O}((b-a)^5) error term, see Formulas for the Error in Simpson's Rule.
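A hedged sketch of the composite form of the rule, splitting [a, b] into n even subintervals (the test integrand is an arbitrary example):

```python
import math

def simpson(f, a, b, n=100):
    """Composite Simpson's rule over [a, b] with n subintervals (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Example: integrate sin(x) from 0 to pi (exact value 2).
print(simpson(math.sin, 0.0, math.pi))  # ~2.0
```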
1.4.7 Arc length
L = \int \mathrm d s =\int_{x_0}^{x_1}\sqrt{1+(\frac{dy}{dx})^2}\mathrm d x
1.4.8 Surface area of a function revolved about x-axis
S = \int 2\pi y \mathrm d s
1.4.9 Area in polar coordinates
S = \int^{\theta_2}_{\theta_1}\frac 1 2 r^2 \mathrm d \theta
1.5 Series
1.5.1 Sequence
A sequence is said to be bounded if there are 2 numbers A, B such that A\leq x_n \leq B for every n.
A sequence is said to be convergent if it has a limit, aka, \forall \epsilon>0, \exists n_0: |x_n-L|<\epsilon, \forall n>n_0.
\mathrm{convergent} \implies \mathrm{bounded}
An increasing sequence converges if and only if it is bounded.
1.5.2 Series
s_n = \sum_{i=1}^{n} a_i
is the n-th partial sum of the infinite series \sum_i a_i (or simply series); the a_i are called terms.
The series converges if and only if the sequence of partial sums \{s_n\} converges.
Geometric series:
\begin{aligned} a_n & = x ^n \\ s_n & = \frac{1-x^{n+1}}{1-x} \end{aligned}
It converges for |x| < 1.
1.5.3 Ratio test and root test
If \sum a_n is a series of positive terms, given L = \lim_{n\to \infty}\frac {a_{n+1}}{a_n}, then:
- L < 1 \to \text{converge}
- L > 1 \to \text{diverge}
- L=1 \to \text{inconclusive}: e.g. a_n=\frac 1 {n^p}
Root test, by comparison with a geometric series (which converges for r\in (0,1)): given L = \lim_{n\to \infty} (a_n)^{\frac 1 n}, then:
- L < 1 \to \text{converge}
- L > 1 \to \text{diverge}
- L=1 \to \text{inconclusive}
1.5.4 Alternating series test
\sum_{n=1}^{\infty}(-1)^{n+1}a_n (with a_n \geq 0) converges if:
- a_n is non-increasing
- a_n \to 0
A series \sum a_n is said to be absolutely convergent if \sum |a_n| converges.
\text{absolute convergence} \to \text{convergence}
1.5.5 Power series
s(x) = \sum_{n=0}^{\infty}a_nx^n
where the a_n are constants and x is variable. Geometric series is the simplest power series.
Every power series \sum a_nx^n has a radius of convergence R, where 0 \leq R \leq \infty, with the property that the series converges absolutely if |x| < R and diverges if |x| > R:
R = \lim_{n\to\infty}|\frac{a_n}{a_{n+1}}|
Within the radius of convergence, the power series is continuous, differentiable, and integrable (term by term).
1.5.6 Integral comparison
Consider a positive, non-increasing function f(x) > 0:
Let s_n=\sum_{i=1}^n f(i), I_n=\int_{1}^{n} f(x)\mathrm dx:
Since f is non-increasing:
f(i+1) \leq f(x) \leq f(i), x\in[i, i+1] \implies \sum_{i=2}^{n} f(i)\ \leq \ \int_{1}^{n} f(x)\,dx\ \leq \ \sum_{i=1}^{n-1} f(i)
Equivalently:
s_n - f(1) \leq I_n \leq s_{n-1} \iff f(n) \leq s_n - I_n \leq f(1)
Similarly, for an increasing function f:
f(1) \leq s_n - I_n \leq f(n)
So, for s_n = \sum_i f(i) and \int_1^\infty f(x)\mathrm d x, either both converge or both diverge.
1.5.7 Euler's constant
As in the integral comparison, let F(n) = \sum_{i=1}^n f(i) - \int_1^nf(x) \mathrm d x; then 0 \leq F(n) \leq f(1). For non-increasing f, F(n) is decreasing, so L = \lim_{n\to\infty}F(n) exists. An application is:
\text{Euler's constant } \gamma = \lim_{n\to\infty} (\sum_{i=1}^{n} \frac 1 i - \int_{1}^{n}\frac 1 x \mathrm d x)
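A quick numerical check of this limit (the cutoff n = 10^6 is an arbitrary choice):

```python
import math

def euler_gamma_estimate(n):
    """Approximate gamma as the harmonic sum H_n minus ln(n)."""
    harmonic = sum(1.0 / i for i in range(1, n + 1))
    return harmonic - math.log(n)

print(euler_gamma_estimate(10**6))  # ~0.577216, close to gamma ~ 0.5772157
```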
1.5.8 Taylor's series
At x=0:
f(x)=\sum_{i=0}^{n}\frac{f^{(i)}(0)}{i!}x^i +R_n(x)
Or x=x_0:
f(x)=\sum_{i=0}^{\infty}\frac{f^{(i)}(x_0)}{i!}(x-x_0)^i
Derivative remainder of Taylor's formula:
R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}x^{n+1}, c \in(0,x)
The Taylor series converges to f(x) if and only if:
\lim_{n\to\infty}R_n(x)=0
Comparing the Taylor series of \sin x, \cos x, \mathrm e^x gives Euler's formula:
\mathrm e ^ {ix}=\cos x + i \sin x \implies \mathrm e ^ {i\pi} + 1 = 0
1.6 Misc
1.6.1 \mathrm{e}
\frac {\mathrm{d}} {\mathrm{d} x}a^x = a^x \to a=\mathrm{e}\iff \lim_{\Delta x \to0}\frac{\mathrm{e}^{\Delta x}-1}{\Delta x} = 1 \iff \mathrm{e}=\lim_{n\to \infty} (1+\frac 1 n)^n, n=\frac 1 {\Delta x}
It is easy to prove that f(x)=c\,\mathrm{e}^x are the only functions satisfying f'(x)=f(x) (including the trivial case c=0).
1.6.2 \int_{-\infty}^{\infty}e^{-x^2}\mathrm d x
Compute the same volume two ways:
- V = \text{volume under } z = e^{-r^2}, r = \sqrt{x^2+y^2}; \text{ by shells, } V = \int_0^\infty 2\pi r e^{-r^2}\mathrm d r = \pi.
- By slicing, V = (\int_{-\infty}^{\infty}e^{-x^2}\mathrm d x)^2, \text{ hence } \int_{-\infty}^{\infty}e^{-x^2}\mathrm d x = \sqrt{\pi}.
1.6.3 Maximum overhang
1.6.4 Real number system
The axiomatic approach (I prefer this Quora answer):
They are an Abelian Group under addition:
- Commutative x+y=y+x
- Associative (x+y)+z=x+(y+z)
- Identity x+0=0+x=x
- Inverse \forall x \exists y:x+y=0
Excluding zero they are an Abelian Group under multiplication:
- Commutative x\cdot y=y\cdot x
- Associative (x\cdot y)\cdot z=x\cdot (y\cdot z)
- Identity x\cdot1=1\cdot x=x
- Inverse \forall x\neq0\exists y:x\cdot y=1
Multiplication distributes over addition:
- x\cdot (y+z)=(x\cdot y)+(x\cdot z)
- (x+y)\cdot z=(x\cdot z)+(y\cdot z)
They are totally ordered:
- x<y or y<x or x=y
- x<y\to x+z<y+z
- x\geq 0\land y\geq 0\to xy\geq 0
They are Dedekind-complete (have the least-upper-bound property):
- Every non-empty subset that is bounded above has a least upper bound.
It turns out that any two structures satisfying these axioms are isomorphic, so we call any such structure the real numbers, (\mathbb{R},+,\cdot ). In short, \mathbb{R} is the Dedekind-complete ordered field.
2 Calculus II (MIT 18.02)
2.1 Vectors and Matrices
2.1.1 Product
\begin{aligned} \vec A \cdot \vec B & = \sum a_ib_i = |\vec A||\vec B| \cos \theta \\ \vec A \times \vec B & = \begin{vmatrix} \vec i & \vec j & \vec k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = \vec i \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix} - \vec j \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix} + \vec k \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} \\ |\vec A \times \vec B| & = \text{area of space parallelogram} = |\vec A||\vec B|\sin \theta\\ \frac{\vec A \times \vec B}{|\vec A \times \vec B|} &= \hat n\text{ perpendicular to plane formed by } \vec A \text{ and } \vec B \\ \end{aligned}
\frac{\mathrm d}{\mathrm dt}\bigl(\vec{A} \times \vec{B}\bigr) = \frac{\mathrm d\vec{A}}{\mathrm dt} \times \vec{B} \;+\; \vec{A} \times \frac{\mathrm d\vec{B}}{\mathrm dt}
2.1.2 Determinant
\begin{aligned} |\det(\vec A, \vec B)| & = \text {area of parallelogram} \\ |\det(\vec A, \vec B, \vec C)| & = \text {volume of parallelepiped} \end{aligned}
2.1.3 Inverse matrix
\mathbf{A}^{-1} = \frac{1}{\det({\mathbf{A}})}\text{adj}(\mathbf{A})
2.2 Partial Derivatives
2.2.1 Partial derivatives
To treat all other variables as constant:
f_x = \frac{\partial f}{\partial x} = \lim_{\Delta x \to 0} \frac{f(x_0+\Delta x, y_0) - f(x_0, y_0)}{\Delta x}
2.2.2 Least square
To find the best-fit line y = ax + b for data points (x_i, y_i):
\text{min } [ D_{a, b} = \sum_i (y_i - (ax_i + b)) ^ 2 ] \iff \frac{\partial D}{\partial a} = \frac {\partial D}{\partial b} = 0
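Setting both partial derivatives to zero gives a 2-by-2 linear system (the normal equations); a small sketch solving it directly, with made-up data points:

```python
def best_fit_line(xs, ys):
    """Solve dD/da = dD/db = 0, i.e. the 2x2 normal equations, for y = a*x + b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

# Example: noisy points roughly on y = 2x + 1.
print(best_fit_line([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.9]))  # ~(1.97, 1.07)
```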
2.2.3 Min/max
Look at the second derivatives f_{xx} = \frac{\partial^2 f}{\partial x^2}, f_{xy}, f_{yx}, f_{yy}; fact (Clairaut's theorem): f_{xy} = f_{yx}.
Given f and critical point (x_0, y_0), let A = f_{xx}(x_0, y_0), B = f_{xy}(x_0, y_0), C = f_{yy}(x_0, y_0), then:
- AC - B^2 > 0: A>0 local min, A<0 local max
- AC - B^2 < 0: saddle
- AC - B^2 = 0: see higher derivative
2.2.4 Taylor's theorem
For f:\mathbb{R}^n\to\mathbb{R}, which is k-times continuously differentiable at \mathbf{x_0} \in \mathbb{R}^n, there exist functions h_\alpha:\mathbb{R}^n\to\mathbb{R}, where |\alpha| = k, such that:
f(\mathbf{x}) = \sum_{|\alpha| \leq k} \frac {D^\alpha f(\mathbf x_0)}{\alpha!}(\mathbf{x} - \mathbf{x_0})^\alpha + \sum_{|\alpha| = k}h_\alpha(\mathbf x)(\mathbf{x} - \mathbf{x_0})^\alpha \\ \text{ and } \lim_{\mathbf{x}\to\mathbf{x_0}}h_\alpha(\mathbf{x}) = 0
2.2.5 Total differential
For f = f(x, y, z):
\mathrm d f = f_x \mathrm d x + f_y \mathrm d y + f_z \mathrm d z
2.2.6 Chain rule
For f = f(x, y, z), x = x(t), y = y(t), z = z(t):
\frac{\mathrm d f}{\mathrm d t} = f_x \frac{\mathrm d x}{\mathrm d t} + f_y \frac{\mathrm d y}{\mathrm d t} + f_z \frac{\mathrm d z}{\mathrm d t}
For w = f(x, y), x = x(u, v), y = y(u, v):
\begin{aligned} \mathrm d w & = f_x \mathrm d x + f_y \mathrm d y = f_x (x_u \mathrm d u + x_v \mathrm d v) + f_y (y_u \mathrm d u + y_v \mathrm d v) \\ \implies \frac{\partial f}{\partial u} & = f_x x_u + f_y y_u = \frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u} \\ \end{aligned}
2.2.7 Gradient vector
For w = w(x, y, z), x = x(t), y = y(t), z = z(t), to be perpendicular to the level surfaces w = c:
\nabla w = \langle w_x, w_y, w_z \rangle
2.2.8 Rate of change in an arbitrary direction
For w as move in some direction:
\left.\frac{\mathrm d w}{\mathrm d s}\right|_{\hat{u}} = \nabla w \cdot \hat{u} = \left|\nabla w\right|\cos \theta
2.2.9 Lagrange multipliers
To get min/max of f = f(x, y, z) given g(x, y, z) = c:
\nabla f = \lambda \nabla g
2.2.10 Partial derivative with non-independent variables
If f = f(x, y, z), g(x, y, z) = c, ask for (\frac{\partial f}{\partial z})_{y}:
\begin{aligned} & \mathrm d g = 0 \to g_x\mathrm d x + g_y\mathrm d y + g_z \mathrm d z = 0 \\ & \mathrm d y = 0 \to g_x\mathrm d x + g_z \mathrm d z = 0 \iff dx = -\frac{g_z \mathrm d z}{g_x} \\ & \begin{aligned} \mathrm d f &= f_x\mathrm d x + f_y\mathrm d y + f_z \mathrm d z \\ & = f_x\mathrm d x + f_z \mathrm d z \\ & = f_x(-\frac{g_z \mathrm d z}{g_x}) + f_z \mathrm d z \\ & = \frac{f_zg_x-f_xg_z}{g_x}\mathrm d z \end{aligned} \\ & \implies (\frac{\partial f}{\partial z})_{y} = \frac{g_xf_z-f_xg_z}{g_x} \end{aligned}
2.3 Double Integral in Plane
2.3.1 Double integral
For the volume below z = f(x, y) over plane region R:
\iint _{R}f(x, y)\mathrm d A = \int_{x_{\min}}^{x_{\max}}\int_{y_{\min}(x)}^{y_{\max}(x)} f(x, y) \mathrm dy \mathrm d x
2.3.2 Polar coordinates
\mathrm d A = r\mathrm d r \mathrm d \theta
2.3.3 Area of region
\iint_R1\mathrm d A
2.3.4 Center of mass
With density function \delta(x, y):
\begin{aligned} \overline x = \frac{1}{\iint_R \delta \mathrm d A} \iint_R x\delta \mathrm d A \\ \overline y = \frac{1}{\iint_R \delta \mathrm d A} \iint_R y\delta \mathrm d A \end{aligned}
2.3.5 Rotational moment of inertia
I = \iint_R (\text{distance to axis})^2\delta \mathrm d A
2.3.6 Jacobian matrix
For u=u(x, y), v = v(x, y):
\mathbf{J} = \frac{\partial(u, v)}{\partial(x, y)}= \begin{vmatrix} u_x & u_y \\ v_x & v_y \end{vmatrix}
More generally for \mathbf{f}:\mathbb{R}^n\rightarrow\mathbb{R}^m: \mathbf{x}\in\mathbb{R}^n, \mathbf{f(x)} \in \mathbb{R}^m:
\mathbf{J}_{ij} = \frac{\partial f_i}{\partial x_j}
2.3.7 Change of variables
For u=u(x, y), v = v(x, y):
\mathrm d u \mathrm d v = \left|\mathbf{J}\right| \mathrm d x \mathrm d y
2.4 Line Integral in Plane
2.4.1 Vector fields
\vec{F} = M \mathbf{\hat{i}} + N \mathbf{\hat{j}}, M = M(x,y), N = N(x, y)
2.4.2 Line integral
Work is force times distance, W = \vec{F}\cdot \Delta \vec{r} for a small motion \Delta \vec{r}; the total work is:
\begin{aligned} W &= \lim_{\Delta \vec{r}\to0}\sum_i{\vec F \cdot \Delta \vec{r_i}}\\ &= \int_C\vec{F}\cdot \mathrm d \vec{r} \\ &= \int_C M \mathrm d x + N \mathrm d y \\ &= \int_C \vec F \cdot \mathbf{\hat T} \mathrm d s \end{aligned}
2.4.3 Fundamental theorem of calculus for line integrals
\vec F = \nabla f is called gradient field, f is called potential function.
\int_C \nabla f \cdot \mathrm d \vec r = f(P_1) - f(P_0), C \text{ is curve from } P_0 \text{ to } P_1 \text{ .}
Consequences:
- Path independence
- Conservativeness
N_x = M_y {\iff}^* \vec F \text{ is a gradient field } \iff \vec F \text{ is conservative: } \oint_C \vec F \cdot \mathrm d \vec r = 0 \text{ for any closed curve}
* only holds if \vec F is defined everywhere or in a simply-connected region, see below.
2.4.4 Curl
curl(\vec F) = N_x - M_y
2.4.5 Green's theorem
\begin{aligned} \oint_C \vec F \cdot \mathrm d \vec r = \iint_R curl(\vec F) \mathrm d A \iff & \oint_C M \mathrm d x + N \mathrm d y = \iint_R (N_x - M_y)\mathrm d A \\ \iff & \oint_C M \mathrm d x = \iint_R - M_y \mathrm d A \land \oint_C N \mathrm d y = \iint_R N_x \mathrm d A \end{aligned}
2.4.6 Flux
\begin{aligned} F &= \lim_{\Delta \vec{r}\to0}\sum_i{(\vec F \cdot \mathbf{\hat n}) \Delta \vec{s_i}} \\ &= \int_C \vec F \cdot \mathbf{\hat n} \mathrm d s \\ &=\int_C \vec F \cdot \langle \mathrm d y, -\mathrm d x \rangle \\ &= \int_C -N\mathrm d x + M \mathrm d y \end{aligned}
2.4.7 Div
div(\vec F) = M_x + N_y
2.4.8 Green's theorem for flux
\oint_C \vec F \cdot \mathbf{\hat n} \mathrm d s = \iint_R div(\vec F) \mathrm d A \iff \oint_C -N\mathrm d x + M \mathrm d y = \iint_R M_x + N_y\mathrm d A
A region R is simply connected if, for any closed curve in R, the interior of the curve is entirely contained in R.
2.5 Triple Integrals and Surface Integrals in 3D-Space
2.5.1 Triple integrals
\iiint_R f \mathrm d V
Applications are similar to those of 2D integrals: mass, center of mass, moment of inertia, etc.
2.5.2 Cylindrical and spherical coordinates
Cylindrical coordinates (r, \theta, z)
\begin{aligned} x &= r \cos \theta \\ y &= r \sin \theta \end{aligned}
\mathrm d V = r\mathrm dr\mathrm d \theta \mathrm d z
Spherical coordinates (\rho, \phi, \theta):
\begin{array}{l} \begin{aligned} \rho &= \text{distance to origin} \\ \phi &= \text{angle down from } z \\ \theta &= \text{same as cylindrical} \end{aligned} \end{array} \implies \begin{array}{l} \begin{aligned} z &= \rho \cos \phi \\ r &= \rho \sin \phi \end{aligned} \end{array}
\begin{aligned} \mathrm d S & = \rho^2\sin\phi \mathrm d \phi \mathrm d \theta \quad \text{(on a sphere of radius } \rho\text{)}\\ \mathrm d V & = \rho^2\sin\phi\mathrm d \rho \mathrm d \phi \mathrm d \theta \\ \end{aligned}
2.5.3 Vector fields in 3D space
\vec{F} = P \mathbf{\hat{i}} + Q \mathbf{\hat{j}} + R \mathbf{\hat{k}}, P = P(x,y, z), Q = Q(x, y, z), R = R(x, y, z)
2.5.4 Flux in 3D space
F = \iint_S \vec{F}\cdot\mathbf{\hat{n}}\mathrm d S
Notation:
\mathbf{\hat{n}}\mathrm d S = \mathrm d \vec{S}
For a surface z = f(x, y):
\mathbf{\hat{n}}\mathrm d S = \pm \langle -f_x, -f_y, 1 \rangle \mathrm d x \mathrm d y
For a parametric surface x = x(u, v), y = y(u, v), z = z(u, v):
\mathbf{\hat{n}}\mathrm d S = \pm \langle \frac{\partial x}{\partial u}, \frac{\partial y}{\partial u} , \frac{\partial z}{\partial u} \rangle \times \langle \frac{\partial x}{\partial v}, \frac{\partial y}{\partial v}, \frac{\partial z}{\partial v} \rangle \mathrm d u \mathrm d v = \pm \frac{\partial\vec{r}}{\partial u}\times\frac{\partial\vec{r}}{\partial v} \mathrm d u \mathrm d v
For a level surface w(x, y, z) = c:
\begin{aligned} \text{Normal vector: } & \mathbf{N} = \nabla w \\ \text{Unit normal vector: } & \mathbf{\hat n} = \pm \frac{\mathbf N}{|\mathbf N|} \\ \text{Project to x-y plane (for instance): } & \Delta A = \cos \alpha \Delta S = \frac{\mathbf{\hat n}\cdot \mathbf{\hat{k}}}{|\mathbf{\hat n}| |\mathbf{\hat{k}}|} \Delta S =\mathbf{\hat n}\cdot \mathbf{\hat{k}} \Delta S \\ \implies & \mathrm dS = \frac{1}{\mathbf{\hat n}\cdot \mathbf{\hat{k}}}\mathrm d x \mathrm d y \\ \implies & \mathrm dS = \pm \frac{|\mathbf N|}{\mathbf N \cdot \mathbf{\hat{k}}} \mathrm d x \mathrm d y \\ \implies & \mathbf{\hat{n}}\mathrm d S = \pm \frac{\nabla w}{\nabla w \cdot \hat k} \mathrm d x \mathrm d y \end{aligned}
2.5.5 Divergence theorem
For \vec F = P\vec i + Q \vec j + R \vec k:
div(\vec{F}) = P_x + Q_y + R_z = \nabla \cdot \vec F
For a closed surface S bounding region R, with normal pointing outwards, and \vec F vector field defined and differentiable over all of R:
\oiint_S \vec{F}\cdot\mathbf{\hat{n}}\mathrm d S = \iiint_R \nabla \cdot \vec F \mathrm d V
2.5.6 Diffusion equation
For motion of smoke in (immobile) air, where u(x, y, z, t) is concentration of smoke, \mathbf{F} is flow of smoke:
\begin{aligned} & \mathbf{F} = -k \nabla u \\ & \iint_S \mathbf{F}\cdot \mathbf{\hat{n}} \mathrm d S = -\frac{\mathrm d}{\mathrm d t}\iiint_Ru\mathrm d V \\ \implies & \frac{\partial u}{\partial t} = - div \mathbf{F} = k \nabla^2 u = k(\frac{\partial ^ 2 u}{\partial x ^ 2} + \frac{\partial ^ 2 u}{\partial y ^ 2} + \frac{\partial ^ 2 u}{\partial z ^ 2}) \end{aligned}
2.5.7 Curl in 3D space
curl(\vec F) = \nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{\hat i} & \mathbf{\hat j} & \mathbf{\hat k} \\ \frac \partial {\partial x} & \frac \partial {\partial y} & \frac \partial {\partial z} \\ P & Q & R \end{vmatrix} = (R_y- Q_z)\mathbf{\hat i} + (P_z - R_x)\mathbf{\hat j} + (Q_x - P_y) \mathbf{\hat k}
2.5.8 Stokes theorem
For a closed curve C and any surface S bounded by C (with compatible orientations):
\oint_C \mathbf{F}\cdot \mathrm d \vec{r} = \iint_S (\nabla \times \mathbf{F})\cdot \mathbf{\hat n} \mathrm d S
3 Linear Algebra (MIT 18.06)
3.1 Vectors
3.1.1 Notation
Two separate numbers v_1 and v_2 produce a two-dimensional vector \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}; it can also be written as \mathbf{v} = (v_1, v_2).
3.1.2 Dot product
\mathbf{v}\cdot \mathbf{w} = \sum_i v_iw_i
3.1.3 Length
\lVert \mathbf{v}\rVert = \sqrt{\mathbf{v}\cdot\mathbf{v}}
3.1.4 Angle
\cos \theta = \frac{\mathbf{v}\cdot\mathbf{w}}{\lVert\mathbf{v}\rVert \lVert\mathbf{w}\rVert}
3.2 Matrices
3.2.1 Multiplication
\begin{aligned} \mathbf C &= \mathbf{AB} \\ \mathbf C_{ij} &= \sum_{k} \mathbf{A}_{ik}\mathbf B_{kj} \end{aligned}
Fundamental law of matrix multiplication:
(\mathbf{AB})\mathbf C = \mathbf{A} (\mathbf{BC})
3.2.2 Matrix elementary transformations
- Row interchange, R_i \leftrightarrow R_j ;
- Row scale, R_i \rightarrow kR_i ;
- Row addition, R_i \rightarrow R_i + kR_j .
3.2.3 Gauss elimination (for \mathbf{Ax} = \mathbf b)
Use elementary row operations to reduce the matrix to an upper triangular matrix \mathbf U. Each single step can be expressed as one matrix, so \Pi_{i} \mathbf{E_i} \cdot \mathbf A = \mathbf U.
\mathbf E \left[ \mathbf{A}, \mathbf b \right] = \left[ \mathbf {U}, \mathbf{Eb} \right] \implies \mathbf{Ux} = \mathbf{Eb}
3.2.4 Gauss-Jordan elimination (for \mathbf{A^{-1}})
Use elementary row operations to reduce the matrix to the identity matrix. Again, the combined transformation is one matrix: \Pi_{i} \mathbf{E_i} \cdot \mathbf A = \mathbf I \iff \Pi_{i} \mathbf{E_i} = \mathbf{A^{-1}}.
\mathbf E \left[ \mathbf{A}, \mathbf I \right] = \left[ \mathbf{I}, \mathbf A^{-1} \right]
3.2.5 \mathbf {PA} = \mathbf{LU} factorization
Every non-singular square matrix can be factored as \mathbf {PA} = \mathbf{LU}, where \mathbf P is a permutation matrix.
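A compact sketch of the factorization with partial pivoting, using plain Python lists (for real work a library routine would be used; the test matrix is illustrative):

```python
def plu(A):
    """Return P, L, U as lists of lists with P A = L U, using partial pivoting."""
    n = len(A)
    U = [row[:] for row in A]
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n):
        # Pivot: bring the largest |entry| in column k (rows k..n-1) to the diagonal.
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        U[k], U[p] = U[p], U[k]
        P[k], P[p] = P[p], P[k]
        L[k][:k], L[p][:k] = L[p][:k], L[k][:k]
        # Eliminate below the pivot, recording the multipliers in L.
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return P, L, U

P, L, U = plu([[0.0, 2.0], [1.0, 3.0]])
print(P, L, U)  # P swaps the rows, L = I, U = [[1, 3], [0, 2]]
```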
3.3 Vector Space and Subspace
3.3.1 Space of vectors
The vector space \mathbb{R}^n consists of all column vectors \mathbf{v} with n components.
Linear combinations of vectors of space \mathbf S must be in \mathbf S.
3.3.2 Subspaces
A subspace of a vector space is a set of vectors (including \mathbf 0) that satisfies two requirements: If \mathbf{v} and \mathbf{w} are vectors in the subspace and c is any scalar, then
- \mathbf{v} + \mathbf{w} is in the subspace
- c\mathbf{v} is in the subspace.
A subspace of \mathbf R ^n is a vector space inside \mathbf R ^n.
Intersection of subspaces is a subspace.
3.3.3 Column space
The column space C(\mathbf{A}) contains all combinations of the columns of \mathbf A: a subspace of \mathbf R ^ m.
The system \mathbf{A}\mathbf{x} = \mathbf{b} is solvable \iff \mathbf{b} is in the column space of \mathbf{A}.
3.3.4 Null space
The null space N(\mathbf{A}) contains all solutions \mathbf{x} to \mathbf A\mathbf{x}=\mathbf{0}, which contains \mathbf{x}=\mathbf{0}.
3.3.5 Echelon matrix \mathbf{R}
R = \begin{bmatrix} 1 & 0 & a & 0 & c \\ 0 & 1 & b & 0 & d \\ 0 & 0 & 0 & 1 & e \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad s_1 = \begin{bmatrix} -a \\ -b \\ 1 \\ 0 \\ 0 \end{bmatrix} \quad s_2 = \begin{bmatrix} -c \\ -d \\ 0 \\ -e \\ 1 \end{bmatrix}
The reduced row echelon form \mathbf{R} = \mathbf{rref}(A) has all pivots as 1, with 0 above and below.
If column j of \mathbf{R} is free, there is a special solution to A\mathbf{x}=\mathbf{0} with x_j = 1, as in s_1, s_2 above.
Number of pivots = number of nonzero rows in \mathbf R = rank r; there are then n-r free columns.
3.3.6 \mathbf A\mathbf{x}=\mathbf{b}
x_p\text{ (one particular solution)}+\text{ linear combinations of } x_n \text{ (in the null space)}
\mathbf E \left[ \mathbf{A}, \mathbf b \right] = \left[ \mathbf{R}, \mathbf d\right]: \mathbf{A}\mathbf{x}=\mathbf{b} is solvable \iff all zero rows of \mathbf{R} have zeros in \mathbf d.
3.3.7 Cases for \mathbf A\mathbf{x}=\mathbf{b}
| Symbol | Meaning |
|---|---|
| \mathbf{I} | Invertible matrix |
| \mathbf{P} | Column permutation matrix |
| \mathbf F | Free columns |
| Condition | \mathbf R | Solution | Note |
|---|---|---|---|
| n = m = r | \left[ \mathbf{I} \right] | 1 | square, invertible |
| r = m, r < n | \left[ \mathbf{I} \quad \mathbf{F} \right]\mathbf{P} | \infty | - |
| r < m, r = n | \left[ \begin{array}{c} \mathbf{I} \\ \mathbf{0} \end{array} \right] | 0/1 | 1 \iff all zero rows of \mathbf{R} have zeros in \mathbf d |
| r < m, r < n | \left[ \begin{array}{cc} \mathbf{I} & \mathbf{F} \\ \mathbf{0} & \mathbf{0} \end{array} \right]\mathbf{P} | 0/\infty | \infty \iff all zero rows of \mathbf{R} have zeros in \mathbf d |
3.3.8 Independence, basis, dimension
3.3.8.1 Independence
Independent columns of \mathbf A : The only solution to \mathbf A\mathbf x = \mathbf 0 is \mathbf x = \mathbf 0 \iff r= n. The nullspace is \mathbf Z = \{\mathbf 0\}.
Independent vectors: The only zero combination \sum c_i \mathbf v_i = \mathbf 0 has all c's = 0.
3.3.8.2 Basis
The vectors \mathbf v_1, \ldots, \mathbf v_k span the space \mathbf S if \mathbf S = all combinations of the \mathbf v 's.
The vectors \mathbf v_1, \ldots, \mathbf v_k are a basis for \mathbf S if they are independent and they span \mathbf S .
3.3.8.3 Dimension
The dimension of a space \mathbf S is the number of vectors in every basis for \mathbf S; for the column space of a matrix this is the rank r.
3.3.9 Dimensions of the four subspaces
| Name | Notation | Subspace of | Dimension |
|---|---|---|---|
| Column space | C(\mathbf{A}) | \mathbb{R}^m | r |
| Row space | C(\mathbf{A}^\mathtt{T}) | \mathbb{R}^n | r |
| Null space | N(\mathbf{A}) | \mathbb{R}^n | n - r |
| Left null space | N(\mathbf{A}^\mathtt{T}) | \mathbb{R}^m | m-r |
Elimination produces bases for the row space and nullspace of \mathbf{A}: They are the same as for \mathbf{R}.
Elimination often changes the column space and left nullspace (but dimensions don't change).
3.3.10 Rank 1 matrices
Rank one matrices: \mathbf{A} = \mathbf{u}\mathbf{v}^\mathtt{T} = \text{column times row}: C(\mathbf{A}) has basis \mathbf{u}, C(\mathbf{A}^\mathtt{T}) has basis \mathbf{v}.
3.4 Orthogonality
3.4.1 Vector
\mathbf v ^\mathtt{T} \mathbf w = 0 \iff \lVert \mathbf{v} \rVert^2 + \lVert \mathbf{w} \rVert^2 = \lVert \mathbf{v} + \mathbf{w} \rVert^2
3.4.2 Subspace
Subspaces \mathbf{V} and \mathbf{W} are orthogonal when \mathbf{v}^\mathtt{T} \mathbf{w} = 0 for every \mathbf{v} in \mathbf{V} and every \mathbf{w} in \mathbf{W}.
The row space of \mathbf{A} is orthogonal to the nullspace, as:
\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \\ \vdots \\ \mathbf{r}_m \end{bmatrix} \mathbf{x} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
Similarly, the column space is orthogonal to N(\mathbf{A}^\mathtt{T}).
Nullspace and row space are orthogonal complements in \mathbb{R}^n.
3.4.3 Project to vector
To project \mathbf b to \mathbf a, assume the projection is \mathbf p = p\mathbf a, then:
\mathbf a ^\mathtt{T} (\mathbf b - \mathbf p) = \mathbf 0 \iff p = \frac{\mathbf a ^\mathtt{T}\mathbf b}{\mathbf a^\mathtt{T} \mathbf a} \iff \mathbf p = \mathbf a \frac{\mathbf a ^\mathtt{T}\mathbf b}{\mathbf a^\mathtt{T} \mathbf a}
Then the projection matrix: \mathbf P = \frac{\mathbf a \mathbf a ^\mathtt{T}}{\mathbf a^\mathtt{T} \mathbf a}, \quad r(\mathbf P) = 1, \mathbf P^\mathtt{T} = \mathbf P, \mathbf P^n = \mathbf P
3.4.4 Project to subspace
This generalizes projecting onto a vector.
To project \mathbf b to subspace \mathbf A with basis \mathbf a_{1\dots m}, assume the projection is \mathbf p, then:
\mathbf p = \sum_{i = 1}^{m}x_i\mathbf a _i = \mathbf A \mathbf{\hat x}
\mathbf a_i^\mathtt{T}(\mathbf b - \mathbf p) = 0, \forall i \in [1..m] \iff \begin{bmatrix} \mathbf{a}_1^\mathtt{T} \\ \mathbf{a}_2^\mathtt{T} \\ \mathbf{a}_3^\mathtt{T} \\ \vdots \\ \mathbf{a}_m^\mathtt{T} \end{bmatrix} (\mathbf b - \mathbf p) = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \iff \mathbf A^\mathtt{T}(\mathbf b - \mathbf p) = \mathbf 0
\begin{aligned} \implies & \mathbf A^\mathtt{T}(\mathbf b - \mathbf A \mathbf{\hat x}) = \mathbf 0 \iff \mathbf{\hat x} = (\mathbf A ^\mathtt{T} \mathbf A) ^ {-1} \mathbf A ^\mathtt{T} \mathbf b \\ \implies & \mathbf p = \mathbf A (\mathbf A ^\mathtt{T} \mathbf A) ^ {-1} \mathbf A ^\mathtt{T} \mathbf b \\ \implies & \mathbf P = \mathbf A (\mathbf A ^\mathtt{T} \mathbf A) ^ {-1} \mathbf A ^\mathtt{T} \end{aligned}
3.4.5 Invertibility of \mathbf{A}^\mathtt{T} \mathbf{A}
\mathbf{A} has independent columns, i.e. \text{r}(\mathbf A) = n \implies \mathbf{A}^\mathtt{T} \mathbf{A} is invertible:
\begin{aligned} & \mathbf{A}^\mathtt{T} \mathbf{A} \text{ is invertible } \iff (\mathbf{A}^\mathtt{T} \mathbf{A} \mathbf x = \mathbf 0 \implies \mathbf x = \mathbf 0 ) \\ & \mathbf{A}^\mathtt{T} \mathbf{A} \mathbf x = \mathbf 0 \implies \mathbf x ^\mathtt{T} \mathbf{A}^\mathtt{T} \mathbf{A} \mathbf x = 0 \implies (\mathbf{A} \mathbf x)^\mathtt{T} \mathbf{A} \mathbf x = 0 \implies \mathbf{A} \mathbf x = \mathbf 0 \overset{\text{r}(A) = n} \implies \mathbf x = \mathbf 0\\ \end{aligned}
3.4.6 Least squares approximations
\text {if } \mathbf {Ax} = \mathbf{b} \text{ cannot be solved, } \text{min} \lVert \mathbf {A \hat x} - \mathbf{b} \rVert \iff \mathbf A \mathbf{ \hat x } = \mathbf {p} \implies \mathbf{ \hat x } = (\mathbf A ^\mathtt{T} \mathbf A) ^ {-1} \mathbf A ^\mathtt{T} \mathbf b
3.4.7 Orthogonal matrices
The columns \textbf q_{1 \dots n} are orthonormal if \mathbf q_i^\mathtt{T}\mathbf q _j = \begin{cases} 0,\quad i \ne j \\ 1,\quad i = j \end{cases}. Then \mathbf Q ^ \mathtt T \mathbf Q = \mathbf I, \lVert \mathbf Q \mathbf x \rVert = \lVert \mathbf x \rVert, and the matrix \mathbf P projecting \mathbf b to subspace \mathbf Q is \mathbf Q \mathbf Q ^\mathtt{T}.
If Q is also square, then \mathbf Q ^ \mathtt T \mathbf Q = \mathbf I and \mathbf{Q}^\mathtt T = \mathbf{Q}^{-1}.
3.4.8 \mathbf A = \mathbf Q \mathbf R decomposition
The Gram-Schmidt process transforms any matrix with independent columns into \mathbf Q with \mathbf Q ^ \mathtt T \mathbf Q = \mathbf I:
\begin{aligned} & \mathbf{A} = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_{n} \end{bmatrix}, \mathbf{Q} = \begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_{n} \end{bmatrix} \\ & \mathbf q_i = \mathbf a_i - \sum_{j = 1}^{i-1}\frac{\mathbf q_j^\mathtt T \mathbf a_i}{\mathbf q_j^\mathtt T \mathbf q_j}\mathbf q_j = \mathbf a_i - \sum_{j = 1}^{i-1}(\mathbf q_j^\mathtt T \mathbf a_i) \mathbf q_j \text{ (for normalized } \mathbf q_j\text{)}, \quad \mathbf q_i \leftarrow \frac{\mathbf q_i}{\lVert \mathbf q_i \rVert} \end{aligned}
\mathbf A \mathbf{ \hat x } = \mathbf {p} \implies \mathbf{ \hat x } = (\mathbf A ^\mathtt{T} \mathbf A) ^ {-1} \mathbf A ^\mathtt{T} \mathbf b, \quad \mathbf A = \mathbf Q \mathbf R \implies \mathbf{ \hat x } = \mathbf R ^{-1} \mathbf Q ^ \mathtt {T} \mathbf b
\mathbf A = \mathbf Q \mathbf R decomposition is also the method of finding the eigen values numerically, which is covered in chapter 11.3.
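A minimal numpy sketch of Gram-Schmidt QR and its least-squares use \hat{\mathbf x} = \mathbf R^{-1}\mathbf Q^\mathtt T\mathbf b (the matrix and right-hand side are an illustrative example):

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt: A (m x n, independent columns) -> Q (orthonormal columns), R (upper triangular)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        v = A[:, i].copy()
        for j in range(i):
            R[j, i] = Q[:, j] @ A[:, i]   # component of a_i along q_j
            v -= R[j, i] * Q[:, j]
        R[i, i] = np.linalg.norm(v)
        Q[:, i] = v / R[i, i]
    return Q, R

# Least squares via A = QR: x_hat = R^{-1} Q^T b.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])
Q, R = gram_schmidt_qr(A)
x_hat = np.linalg.solve(R, Q.T @ b)
print(x_hat)  # [5. -3.], same as the normal-equation solution (A^T A)^{-1} A^T b
```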
3.5 Determinants
3.5.1 Basic properties
- The determinant of the n by n identity matrix is 1.
- The sign of determinant reverses when two rows are exchanged.
- The determinant is a linear function of each row separately, while all other rows stay fixed:
\begin{vmatrix} ta & tb \\ c & d \end{vmatrix} = t \begin{vmatrix} a & b \\ c & d \end{vmatrix} , \quad \begin{vmatrix} a + a' & b + b' \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} + \begin{vmatrix} a' & b' \\ c & d \end{vmatrix}
3.5.2 Derived properties
- If two rows of \mathbf{A} are equal, then \det \mathbf{A} = 0.
- Subtracting a multiple of one row from another row leaves \det \mathbf{A} unchanged.
- A matrix with a row of zeros has \det \mathbf{A} = 0.
- If \mathbf{A} is triangular then \det \mathbf{A} = a_{11}a_{22}\cdots a_{nn} = \text{product of diagonal entries}.
- If \mathbf{A} is singular then \det \mathbf{A} = 0. If \mathbf{A} is invertible then \det \mathbf{A} \neq 0.
- The determinant of \mathbf{AB} is \det \mathbf{A} times \det \mathbf{B}: \lvert \mathbf{AB} \rvert = \lvert \mathbf{A} \rvert \lvert \mathbf{B} \rvert.
- The transpose \mathbf{A}^\text T has the same determinant as \mathbf{A}.
3.5.3 Big formula
\begin{aligned} \det \mathbf{A} & = \text{sum over all } n! \text{ column permutations } P = (\alpha, \beta, \dots, \omega) \\ &= \sum (\det P) a_{1\alpha} a_{2\beta} \cdots a_{n\omega} \end{aligned}
(\det P) is the sign.
3.5.4 Cofactor
C_{ij} = (-1)^{i+j} \det \mathbf M_{ij}, \quad \det \mathbf{A} = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in}
\mathbf M_{ij} is the matrix deleting row i and column j.
3.5.5 Cramer's rule
To solve \mathbf A \mathbf x = \mathbf b:
\mathbf{A} \begin{bmatrix} x_1 & 0 & 0 \\ x_2 & 1 & 0 \\ x_3 & 0 & 1 \end{bmatrix} = \begin{bmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{bmatrix} = \mathbf{B}_1 \implies x_1 = \frac {|\mathbf B _1|}{|\mathbf A|} \implies x_i = \frac {|\mathbf B _i|}{|\mathbf A|}
Let \mathbf b be one column of \mathbf I:
(\mathbf A^{-1})_{ij} = \frac{C_{ji}}{\det \mathbf{A}}
3.5.6 Area
\mathbf A is an n \times n matrix whose rows/columns are the edge vectors of a parallelogram (parallelepiped in higher dimensions); |\det \mathbf A| is its area (volume).
3.6 Eigen
3.6.1 Eigen vector and eigen value
\mathbf A \mathbf x = \lambda \mathbf x
\mathbf x is the eigen vector, \lambda is the eigen value.
\mathbf A \mathbf x = \lambda \mathbf x, \mathbf x \neq \mathbf 0 \implies (\mathbf A - \lambda \mathbf I)\mathbf x = \mathbf 0 \implies \det (\mathbf A - \lambda \mathbf I) = 0
3.6.2 \text{tr} and \text{det}
\begin{aligned} \text{tr} & \mathbf A = \sum_i\mathbf A_{ii} = \sum_i \lambda_i \\ \det & \mathbf A = \prod_i \lambda_i \end{aligned}
3.6.3 Diagonalization
\mathbf S is the column matrix of eigen vectors of \mathbf A:
\mathbf A \mathbf S = \mathbf S \mathbf{\Lambda} \implies \mathbf A = \mathbf S \mathbf{\Lambda} \mathbf S ^{-1}, \mathbf{\Lambda} = \mathbf S ^{-1} \mathbf A \mathbf S
If \mathbf A has n distinct eigen values, then \mathbf A must be diagonalizable. If not, \mathbf A might still be diagonalizable.
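A quick numpy check of \mathbf A = \mathbf S \mathbf{\Lambda} \mathbf S ^{-1} and of the trace/determinant identities above (the matrix is an arbitrary example with distinct eigen values):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
lams, S = np.linalg.eig(A)      # columns of S are eigen vectors, lams the eigen values
Lam = np.diag(lams)
print(lams)                                           # e.g. [5. 2.]
print(np.allclose(A, S @ Lam @ np.linalg.inv(S)))     # True: A = S Lambda S^{-1}
print(np.isclose(np.trace(A), lams.sum()),
      np.isclose(np.linalg.det(A), lams.prod()))      # True True
```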
3.6.4 Symmetric matrices
The eigen values are real, and the eigen vectors can be chosen as perpendicular, and:
\mathbf A = \mathbf Q \mathbf{\Lambda} \mathbf Q ^{-1} = \mathbf Q \mathbf{\Lambda} \mathbf Q ^{\text T}
Proof of real eigen values, given \textbf A is real and symmetric:
\begin{aligned} & \mathbf A \mathbf x = \lambda \mathbf x \\ {\text{take conjugate, multiply }\mathbf {\overline x^{\text T}}} \quad \implies & \mathbf A \mathbf {\overline x} = \overline \lambda \mathbf {\overline x}, \quad \mathbf {\overline x^{\text T}} \mathbf A \mathbf x = \lambda \mathbf {\overline x^{\text T}} \mathbf x \\ {\text{use symmetry, multiply } \mathbf x} \quad \implies & \mathbf {\overline x ^{\text T}} \mathbf A = \mathbf {\overline x ^{\text T}} \overline \lambda, \quad \mathbf {\overline x^{\text T}} \mathbf A \mathbf x = \overline \lambda \mathbf {\overline x ^{\text T}} \mathbf x \\ {\text{left are same}} \quad \implies & \overline \lambda = \lambda \implies \lambda \in \mathbb{R} \end{aligned}
Proof of perpendicularity, given \mathbf S\mathbf x=\lambda_1\mathbf x, \mathbf S\mathbf y=\lambda_2\mathbf y, \lambda_1 \neq \lambda_2:
\lambda_1\mathbf x^\text T \mathbf y = (\lambda_1\mathbf x)^\text T \mathbf y = (\mathbf S \mathbf x) ^ \text T \mathbf y = \mathbf x ^ \text T \mathbf S \mathbf y = \mathbf x ^ \text T\lambda_2\mathbf y = \lambda_2 \mathbf x ^ \text T \mathbf y \implies \mathbf x ^ \text T \mathbf y = 0
For a repeated eigenvalue, pick any basis of its eigenspace (whose dimension equals the multiplicity of the eigenvalue) and apply the Gram-Schmidt process to obtain perpendicular eigenvectors.
3.6.5 Positive definite matrices
Positive definite matrices are symmetric matrices with:
- all eigen values > 0; or
- all pivots > 0; or
- all upper left determinants > 0; or
- \textbf x ^ \text T \mathbf S \mathbf x > 0, \forall x \neq \textbf 0.
- \mathbf A ^ \text T \mathbf A, if \mathbf A is column independent.
Application for positive definite matrices:
f(x_1, \dots, x_n) has a minimum at x=x_0 if \left. \frac{\partial f}{\partial x_i} \right|_{x_0} = 0 for all i and the matrix \textbf S(x_0), defined by \left. S_{ij}(x_0) = \frac{\partial ^2f}{\partial x_i\partial x_j}\right|_{x_0}, is positive definite.
3.6.6 Similar matrices
For n\times n matrices \mathbf A, \mathbf B, if there is an invertible \mathbf M such that \mathbf B = \mathbf M ^ {-1} \mathbf A \mathbf M, then \mathbf A and \mathbf B are similar. Similar matrices have the same eigen values.
3.6.7 Jordan form
If there are repeated eigen values, the matrices split into more than one similarity class. E.g., for 2 \times 2 matrices with eigen values 4, 4:
\begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix}, \begin{bmatrix} 4 & 1 \\ 0 & 4 \end{bmatrix}
\begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix} is only similar to itself. The size of the set is 1.
\begin{bmatrix} 4 & 1 \\ 0 & 4 \end{bmatrix} is similar to many other matrices; it is the Jordan form of its class.
Jordan block:
J_i = \begin{bmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ 0 & 0 & \lambda & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & 1 \\ 0 & 0 & 0 & \cdots & \lambda \end{bmatrix}
Every square matrix \mathbf{A} is similar to a Jordan matrix \mathbf{J}.
3.7 Linear Transformation
3.7.1 Linear
A transformation T is linear if:
T(\mathbf{v} + \mathbf{w}) = T(\mathbf{v}) + T(\mathbf{w}) \land T(c\mathbf{v}) = cT(\mathbf{v})
If we wish to know T(\mathbf{v}) for all vectors \mathbf{v} in \mathbb{R}^n, we just need to know T(\mathbf{v}_1), T(\mathbf{v}_2), ..., T(\mathbf{v}_n) for any basis \mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_n of the input space:
\begin{aligned} \mathbf{v} &= c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n \\ T(\mathbf{v}) &= c_1T(\mathbf{v}_1) + c_2T(\mathbf{v}_2) + \cdots + c_nT(\mathbf{v}_n) \end{aligned}
c_i are the coordinates.
3.7.2 Matrix for linear transformation
For the matrix \mathbf A_{m\times n} of the transformation T: \mathbb{R}^n\to\mathbb{R}^m, \mathbf v_i, \mathbf w_i are bases for \mathbb{R}^n,\mathbb{R}^m:
T(\mathbf v_i) = \sum_{j=1}^{m}\mathbf{A}_{ji}\mathbf w_j
\mathbf A_{m \times n}\mathbf {c}_{1..n} = \mathbf{c'}_{1..m}.
3.7.3 Change of basis
Let the columns of matrix \mathbf{W} be the basis vectors of the new basis. Then if \mathbf{x} is a vector in the old basis, we can convert it to a vector \mathbf{c} in the new basis using the relation:
\mathbf{x} = \mathbf{W} \mathbf{c}.
If transformation T has the matrix \mathbf{A} when working with the basis \mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_8 and T has the matrix \mathbf{B} when working with the basis \mathbf{w}_1, \mathbf{w}_2, ..., \mathbf{w}_8, it turns out that \mathbf{A} and \mathbf{B} must be similar matrices. In other words:
\mathbf{B} = \mathbf{W}^{-1} \mathbf{A} \mathbf{W}
for some change of basis matrix \mathbf{W}.
Note: basis could be chosen as eigenvectors, then the matrix would be the eigenvalue diagonal matrix.
3.8 SVD
SVD is a generalization of the diagonalization process, but it applies to all m \times n matrices, regardless of whether they are square or non-square, symmetric or non-symmetric.
For any m \times n matrix \mathbf{A}, the SVD theorem guarantees that \mathbf{A} can be decomposed as: \mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T where:
- \mathbf{U} is an m \times m orthogonal (or unitary) matrix,
- \mathbf{\Sigma} is an m \times n diagonal matrix with non-negative real numbers (the singular values),
- \mathbf{V} is an n \times n orthogonal (or unitary) matrix.
We can use \mathbf{A}^\text{T} \mathbf{A} and \mathbf{A}\mathbf{A}^\text{T} to find \mathbf V and \mathbf U.
3.8.1 Pseudo inverse
Two-sided inverse for a full-rank square matrix \mathbf A (m = n = r):
\mathbf A ^{-1}\mathbf A = \mathbf A \mathbf A ^ {-1} = \mathbf I
Left inverse for rectangular matrix \mathbf A (r = n < m):
\mathbf A ^{-1}_\text{left} \mathbf A = \mathbf I _ {n\times n}
Since \mathbf A ^ \text T \mathbf A is n \times n and full rank:
\mathbf A ^{-1}_\text{left} = (\mathbf A ^ \text T \mathbf A)^{-1}\textbf{A}^\text T
Similarly, the right inverse (r = m < n):
\mathbf A ^{-1}_\text{right} = \textbf{A}^\text T (\mathbf A \mathbf A ^ \text T)^{-1}
If \mathbf{x} \neq \mathbf{y} are vectors in the row space of \mathbf{A}, then \mathbf{A}\mathbf{x} \neq \mathbf{A}\mathbf{y} in the column space of \mathbf{A}.
The pseudoinverse \mathbf{A}^+ of \mathbf{A} is the matrix for which \mathbf{x} = \mathbf{A}^+ \mathbf{A} \mathbf{x} for all \mathbf{x} in the row space of \mathbf{A}:
\mathbf{A}^+ = \mathbf{V} \mathbf{\Sigma}^+ \mathbf{U}^\text T
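A small numpy sketch computing \mathbf{A}^+ from the SVD (the rank-1 test matrix is illustrative):

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """A^+ = V Sigma^+ U^T: invert the nonzero singular values and transpose the shape."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_plus = np.array([1.0 / x if x > tol else 0.0 for x in s])
    return Vt.T @ np.diag(s_plus) @ U.T

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])            # rank 1, so not invertible
A_plus = pinv_via_svd(A)
print(np.allclose(A_plus, np.linalg.pinv(A)))   # True: matches numpy's pinv
print(np.allclose(A @ A_plus @ A, A))           # True: defining property of the pseudoinverse
```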
3.9 Complex Matrices
Length:
\lVert \mathbf z \rVert ^ 2 = \overline {\mathbf z} ^ \text T \mathbf{z} = \mathbf z ^ \text H \mathbf z
Inner product:
\mathbf x ^ \text H \mathbf y = \sum \overline x_i y_i
3.9.1 Unit root
\omega^n=1 \implies \omega_k = \mathrm{e}^{\mathrm i\cdot2\pi\cdot\frac{k}{n}}, 0 \leq k \leq n - 1, k\in \mathbb Z
3.9.2 Hermitian matrices \mathbf S = \mathbf S ^ \text H
The eigen values are real, and the eigen vectors can be chosen as perpendicular.
3.9.3 Unitary matrices
A unitary matrix is a complex square matrix that has orthonormal columns.
\mathbf Q ^ \text H = \mathbf Q ^ {-1}
3.9.4 Fourier matrix
Fourier matrix is like:
\mathbf F_n = \frac{1}{\sqrt{n}} \begin{pmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & \omega & \omega^2 & \dots & \omega^{n-1} \\ 1 & \omega^2 & \omega^4 & \dots & \omega^{2(n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \dots & \omega^{(n-1)(n-1)} \end{pmatrix}, \omega = \mathrm{e}^{\mathrm i\cdot2\pi\cdot\frac{1}{n}}
Or:
Or entrywise, for 0 \leq x, y \leq n-1: (\mathbf F_n)_{xy} = \frac{1}{\sqrt{n}} \omega^{xy}
The Fourier matrix is a unitary matrix since, for columns 0\leq p, q\leq n -1:
\begin{aligned} & \mathbf F_p ^ \text H \mathbf F_q = \frac 1 n \sum_{k=0}^{n-1}\overline{\omega ^ {pk}}\omega ^{qk}, \quad \overline{\omega ^ {pk}} = \mathrm{e}^{\mathrm -i\cdot2\pi\cdot\frac{pk}{n}} = \omega^{-pk} \\ \implies & \mathbf F_p ^ \text H \mathbf F_q = \frac 1 n \sum_{k=0}^{n-1}\omega^{-pk}\omega ^{qk} = \frac 1 n\sum_{k=0}^{n-1}\omega^{(q-p)k} \\ \implies & \mathbf F_p ^ \text H \mathbf F_q = \begin{cases} \frac 1 n \sum_{k=0}^{n-1} \omega ^0 = 1, \quad p = q \\ \frac 1 n \sum_{k=0}^{n-1} \omega ^ {(q - p)k} = \frac 1 n \frac{1 - \omega^{(q-p)n}}{1 - \omega^{q-p}} = 0, \quad p \neq q \\ \end{cases} \end{aligned}
So,
\mathbf F_n^{-1} = \mathbf{F}^\text H \iff F^{-1}_{xy} = \frac{1}{\sqrt{n}} \omega^{-xy}
3.9.5 FFT
For \textbf y = \mathbf F_n\textbf a (coefficient representation to point-value representation):
\begin{aligned} \sqrt{n} \cdot \textbf y_j & = \sum_{k=0}^{n-1} \omega_n^{jk}\cdot \textbf a_k \\ & = \sum_{k = 0}^{\frac n 2 -1}\omega^{2jk}_{n}\textbf a_{2k} + \omega_n^j\sum_{k = 0}^{\frac n 2 -1}\omega^{2jk}_{n}\textbf a_{2k+1} \\ & = \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k} + \omega_n^j \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k+1}, \quad 0 \leq j \leq n - 1\\ \end{aligned}
Let j = j+\frac n 2 to last line of above:
\begin{aligned} \sqrt{n} \cdot \textbf y _{j+\frac n 2} & = \sum_{k = 0}^{\frac n 2 -1}\omega^{(j+\frac n 2)k}_{\frac n 2}\textbf a_{2k} + \omega_n^{j+\frac n 2} \sum_{k = 0}^{\frac n 2 -1}\omega^{(j+\frac n 2)k}_{\frac n 2}\textbf a_{2k+1}, \quad 0 \leq j \leq \frac n 2 - 1\\ \end{aligned}
And there is \omega^{(j+\frac n 2)k}_{\frac n 2} = \omega_{\frac n 2}^{jk +\frac n 2 k} = \omega_{\frac n 2}^{jk}\cdot(\omega_{\frac n 2}^{\frac n 2})^k = \omega_{\frac n 2}^{jk} and \omega_n^{j+\frac n 2}= \omega_n^{j}\cdot\omega^{\frac n 2}_n = -\omega_n^j:
\begin{aligned} \sqrt{n} \cdot \textbf y _{j+\frac n 2} & = \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k} -\omega_n^j \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k+1}, \quad 0 \leq j \leq \frac n 2 - 1\\ \end{aligned}
So:
\begin{aligned} \sqrt{n} \cdot\textbf y _j & = \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k} + \omega_n^j \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k+1} = [\textbf F_{\frac n 2}a']_j + \omega_n^j[ F_{\frac n 2}a'']_j = \textbf{y}'_j + \omega_n^j \textbf{y}''_j, 0 \leq j \leq \frac n 2 - 1 \\ \sqrt{n} \cdot \textbf y _{j+\frac n 2} & = \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k} -\omega_n^j \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k+1} = [\textbf F_{\frac n 2}a']_j - \omega_n^j[ F_{\frac n 2}a'']_j = \textbf{y}'_j - \omega_n^j\textbf{y}''_j, 0 \leq j \leq \frac n 2 - 1\\ \end{aligned}
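A minimal recursive sketch of the radix-2 split above; it returns the unnormalized sums \sqrt n \cdot \textbf y (the plain DFT with \omega = \mathrm e^{2\pi\mathrm i/n}), for n a power of two:

```python
import cmath

def fft(a):
    """Recursive radix-2 FFT of the coefficient list a (len(a) must be a power of two).
    Returns the unnormalized values y_j = sum_k omega_n^{jk} a_k (the sqrt(n)*y above)."""
    n = len(a)
    if n == 1:
        return list(a)
    y_even = fft(a[0::2])   # F_{n/2} applied to a'  (even-index coefficients)
    y_odd = fft(a[1::2])    # F_{n/2} applied to a'' (odd-index coefficients)
    y = [0j] * n
    for j in range(n // 2):
        w = cmath.exp(2j * cmath.pi * j / n)      # omega_n^j
        y[j] = y_even[j] + w * y_odd[j]
        y[j + n // 2] = y_even[j] - w * y_odd[j]
    return y

print(fft([1, 2, 3, 4]))  # approximately [10, -2-2j, -2, -2+2j]: matches the direct DFT sums
```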
4 Discrete Mathematics (MIT 6.042)
4.1 Proofs
4.1.1 Propositions
A proposition is a statement that is either true or false.
A predicate is a proposition whose truth depends on the value of variable(s).
A mathematical proof of a proposition is a chain of logical deductions leading to the proposition from a base set of axioms.
Axioms should be complete and consistent, but see Gödel's incompleteness theorems.
4.1.1.1 Notations
| English | Symbolic |
|---|---|
| NOT(P) | \neg P (alternatively, \overline{P}) |
| P AND Q | P \wedge Q |
| P OR Q | P \vee Q |
| P IMPLIES Q | P \rightarrow Q |
| if P then Q | P \rightarrow Q |
| P IFF Q | P \leftrightarrow Q |
| P XOR Q | P \oplus Q |
A proposition is valid if every setting of the variables makes the proposition true.
A proposition is satisfiable if some setting of the variables makes the proposition true. The general problem of deciding whether a proposition is satisfiable is called SAT.
A function-like notation is used to denote a predicate supplied with specific variable values:
P(n)::= n\text{ is a perfect square}
4.1.1.2 Algebra
Commutative:
\begin{aligned} & A \lor B = B \lor A \\ & A \land B = B \land A \end{aligned}
Associative:
\begin{aligned} & (A \lor B) \lor C = A \lor (B \lor C) \\ & (A \land B) \land C = A \land (B \land C) \end{aligned}
Distributive:
\begin{aligned} & A \land (B \lor C) = (A \land B) \lor (A \land C) \\ & A \lor (B \land C) = (A \lor B) \land (A \lor C) \end{aligned}
De Morgan's:
\begin{aligned} & \neg (A \land B) = \neg A \lor \neg B \\ & \neg (A \lor B) = \neg A \land \neg B \end{aligned}
4.1.1.3 DNF and CNF
Every proposition is equivalent to one in conjunctive normal form (CNF) and to one in disjunctive normal form (DNF).
DNF is an \text{OR} of \text{AND}s, e.g. (A \land B) \lor C.
A full DNF is a DNF where every minterm (a conjunction of literals) contains all the variables. It can be read off from the truth table of the formula (see the sketch after the conversion steps below).
To convert a proposition to full DNF:
- Expand the proposition to DNF, using algebra above.
- Add missing variables, e.g. A\land B = A \land B \land (C \lor \neg C) = (A \land B \land C) \lor (A \land B \land \neg C). Or in other notation, AB = AB(C + \overline C) = ABC + AB\overline C.
- Remove the duplicates.
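A tiny sketch that reads the full DNF off the truth table (the example formula (A \land B) \lor C and the string output format are illustrative choices):

```python
from itertools import product

def full_dnf(variables, formula):
    """Return the full DNF of `formula` (a boolean function) as a string:
    one minterm per satisfying row of the truth table."""
    minterms = []
    for values in product([True, False], repeat=len(variables)):
        if formula(*values):
            literals = [v if val else f"!{v}" for v, val in zip(variables, values)]
            minterms.append("(" + " & ".join(literals) + ")")
    return " | ".join(minterms) if minterms else "False"

# Example: (A AND B) OR C.
print(full_dnf(["A", "B", "C"], lambda a, b, c: (a and b) or c))
# (A & B & C) | (A & B & !C) | (A & !B & C) | (!A & B & C) | (!A & !B & C)
```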
4.1.2 Patterns of proof
Logical deductions, or inference rules, are used to prove new propositions from previously proved ones:
\begin{aligned} & \text{Rule.} \quad \frac{P, \quad P \rightarrow Q}{Q} \\ & \text{Rule.} \quad \frac{\quad P \rightarrow Q, \quad Q \rightarrow R}{\quad P \rightarrow R} \\ & \text{Rule.} \quad \frac{\text{NOT}(P) \rightarrow \text{NOT}(Q)}{\quad Q \rightarrow P} \end{aligned}
4.1.2.1 Methods
- By cases
- For P\implies Q:
- assume P is true, then prove Q with that assumption
- prove the contrapositive
- For P \iff Q:
- prove P \implies Q \land Q \implies P
- construct \iff chains
- By contradiction
4.1.2.2 Set
Unique, unordered.
- Complement: A^c = D - A, where D is the domain (universal set)
- Power: \mathcal P(A)
- Empty: \emptyset
Commutative Laws: \begin{aligned} & A \cup B = B \cup A \\ & A \cap B = B \cap A \end{aligned}
Associative Laws: \begin{aligned} & (A \cup B) \cup C = A \cup (B \cup C) \\ & (A \cap B) \cap C = A \cap (B \cap C) \end{aligned}
Distributive Laws: \begin{aligned} & A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \\ & A \cup (B \cap C) = (A \cup B) \cap (A \cup C) \end{aligned}
De Morgan's Laws: \begin{aligned} & (A \cap B)^c = A^c \cup B^c \\ & (A \cup B)^c = A^c \cap B^c \end{aligned}
4.1.2.3 Sequence
Ordered.
- Empty: \lambda
Sequences can be generated by the Cartesian (cross) product of sets:
\begin{aligned} \mathop{\Large\times}_{i=1}^{n} A_i &= \{(a_1, a_2, \dots, a_n) \mid a_i \in A_i, \text{ for } i = 1, 2, \dots, n \} \\ \lvert \mathop{\Large\times}_{i=1}^{n} A_i \rvert &= \prod_{i=1}^{n} \left| A_i \right| \end{aligned}
4.1.3 Induction
\begin{aligned} \text{Induction Rule} & \quad \frac{\text P(0), \quad \forall n \in \mathbb{N}.\text P(n)\rightarrow \text P(n+1)} {\forall m \in \mathbb{N}. \text P(m)} \\ \text{Strong Induction Rule} & \quad \frac{\text P(0), \quad [\forall k \leq n \in \mathbb{N}. \text P(k) ]\rightarrow \text P(n+1)} {\forall m \in \mathbb{N}. \text P(m)} \end{aligned}
4.1.3.1 Well Ordering Principle
A set is well-ordered if every non-empty subset has a least element under a given ordering.
\mathbb N is well-ordered.
To prove that P(n) is true for all n \in \mathbb{N} using WOP:
- Define the set C of counterexamples, C ::= \{n \in \mathbb N \mid P(n) \text{ is false}\}.
- Per proof by contradiction, assume C is nonempty.
- By WOP, there will be a smallest element n in C.
- Reach a contradiction: show that P(n) is actually true, or that there is an element of C smaller than n.
- Conclude that C must be empty.
4.1.3.2 Invariant
To prove that some property X holds for every step of a process:
- Define P(t) to be the predicate that X holds immediately after step t.
- Show that P(0) is true, namely that X holds for the start state.
- Show that
\forall t \in \mathbb{N}. \ P(t) \implies P(t + 1), namely, that for any t \geq 0, if X holds immediately after step t, it must also hold after the following step.
4.2 Number Theory
4.2.1 Basic
\begin{aligned} & a \mid b \iff k\cdot a=b, k \in \mathbb Z\\ & a \mid b \land a \mid c \iff a \mid sb+tc, s,t\in\mathbb Z \\ & \text{rem}(a, b) = a - \lfloor a/b \rfloor b \\ & a \equiv b \mod n \iff \text{rem}(a, n) = \text{rem}(b, n) \end{aligned}
4.2.2 GCD
\gcd(a, b) = \gcd(b, r), r = \text{rem}(a, b)
Proof:
\begin{aligned} & \text{let } A = \{x \mid (x \mid a \land x \mid b)\}, B = \{x \mid (x \mid b \land x \mid r)\}, g \in A \\ \implies & g \mid a \land g \mid b \\ \implies & g \mid sa+tb, s,t\in \mathbb Z \\ \implies & s = 1, t = -\lfloor a/b \rfloor, g\mid r \\ \implies & A \subseteq B \\ \text{similarly, } & B \subseteq A \\ \implies & \gcd(a, b) = \gcd(b, r) \end{aligned}
4.2.3 Bézout's identity
\exists s, t \in \mathbb{Z}, sa + tb = k \iff \gcd(a, b) \mid k
4.2.3.1 Proof
- \exists s, t \in \mathbb{Z}, sa + tb = k \implies \gcd(a, b) \mid k:
\begin{aligned} & \text{let } g = \gcd(a, b) \\ \implies & a = a'g, b = b'g \\ \implies & sa'g + tb'g = g(sa' + tb') = k \\ \implies & g \mid k \end{aligned}
- \gcd(a, b) \mid k \implies \exists s, t \in \mathbb{Z}, sa + tb = k:
\begin{aligned} & \text{let } A = \{x \mid x = sa + tb, s, t \in \mathbb{Z}\} \\ \implies & \{a, b\} \subseteq A \\ & \text{let } d_0 = \min A_+ ; d_0 = x_0a+y_0b\\ & \text{consider } a = q d_0 + r, r \in [0, d_0) \\ \implies & r = a - qd_0 \\ \implies & r \in A \\ & \text{as either } r > 0 \text{ or } r = 0; \text{ but } d_0 = \min A_+, r < d_0 \\ \implies & r = 0 \\ \implies & d_0 \mid a \\ & \text{similarly, } d_0 \mid b \\ \implies & \gcd(a, b) \geq d_0 \\ & \text{consider } d \mid a, d \mid b \\ \implies & d_0 = x_0a'd + y_0b'd \\ \implies & d \mid d_0 \\ \implies & \gcd(a, b) \leq d_0 \\ \implies & d_0 = \gcd(a, b) \\ \implies & \exists s, t \in \mathbb{Z}, sa + tb = \gcd(a, b) \end{aligned}
4.2.3.2 Calc
The extended Euclidean algorithm finds s and t (a code sketch follows the cases below).
- b\neq 0, let q = \lfloor a/b \rfloor:
\begin{aligned} s' b + t' r = \gcd(b, r), r = a - q b \\ s' b + t' (a - qb) = \gcd(b, r) \\ t'a + (s' - t' q)b=\gcd(b, r) \end{aligned} \implies \begin{cases} s = t' \\ t = s'-t' \lfloor a/b \rfloor \end{cases}
- b = 0 \implies s = 1, t = 0.
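A recursive sketch following exactly the two cases above:

```python
def extended_gcd(a, b):
    """Return (g, s, t) with s*a + t*b = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, s1, t1 = extended_gcd(b, a % b)   # s'*b + t'*r = g, where r = a - floor(a/b)*b
    return g, t1, s1 - t1 * (a // b)     # s = t', t = s' - t' * floor(a/b)

g, s, t = extended_gcd(240, 46)
print(g, s, t, s * 240 + t * 46)  # 2 -9 47 2
```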
4.2.4 Inverse
\gcd(k, n) = 1 \implies \exists s, t\in \mathbb Z, sk + tn = 1 \implies sk \equiv 1 \mod n
4.2.5 Euler's theorem
Totient function:
\phi (n) = \lvert \{ k \mid 1 \leq k \leq n, k\in \mathbb Z \land \gcd(k, n) = 1\} \rvert
Euler's theorem:
\gcd(k,n) = 1 \implies k^{\phi(n)} \equiv 1 \mod n
Lemma: \gcd(k, n) = 1 \implies a \equiv b \mod n \iff ak \equiv bk \mod n
Reduced residue system is closed under multiplication:
\begin{aligned} & \text{let } \mathbb Z_n^* = \{x \mid \gcd(x, n) = 1, x < n\} \\ & \text{consider } z_i \in \mathbb Z_n^* \\ \implies & \gcd(z_i, n) = 1, \gcd(k, n) = 1 \\ \implies & \gcd(kz_i, n) = 1 \\ \implies & \text{rem}(kz_i, n) \in Z_n^* \\ & \text{consider lemma above} \\ \implies & Z_n^* = \{\text{rem}(kz, n) \mid z\in Z_n^*\} \end{aligned}
Proof:
\begin{aligned} & Z_n^* = \{\text{rem}(kz, n) \mid z\in Z_n^*\} \\ \implies & \prod_{z_i \in Z_n^*} z_i \equiv \prod_{z_i \in Z_n^*} kz_i \mod n \\ \text{consider } & \prod_{z_i \in Z_n^*} kz_i \equiv k^{\phi(n)} \prod_{z_i \in Z_n^*} z_i \mod n \\ \text{consider } & \gcd(z_i, n) = 1 \\ \implies & \gcd (\prod_{z_i \in Z_n^*} z_i, n) = 1 \\ \implies & k^{\phi(n)}\equiv 1 \mod n \end{aligned}
4.2.6 Fermat's little theorem
For prime p:
\begin{aligned} & \phi(p) = p-1 \\ & p \nmid k \implies \begin{cases} & k^{p-1} \equiv 1 \mod p \\ & k^{-1} \equiv k^{p-2} \mod p \end{cases} \\ & \phi(pq) = (p-1)(q-1), \quad p, q\text{ are distinct primes} \end{aligned}
4.2.7 Chinese remainder theorem
Given n pairwise coprime moduli (\forall i \neq j, m_i \perp m_j) and M = \prod m_i, for
\begin{cases} x \equiv a_1 \pmod{m_1} \\ x \equiv a_2 \pmod{m_2} \\ \vdots \\ x \equiv a_n \pmod{m_n} \end{cases}
there exists a unique solution x \pmod M.
Lemma:
\forall i, x \equiv y \mod m_i \implies \forall i, m_i \mid (x - y) \implies M \mid (x - y) \implies x \equiv y \mod M
Solution, M_i = \frac M {m_i}:
x \equiv \sum_i a_i M_i y_i \mod M, \quad {M_i} {y_i} \equiv 1 \mod m_i
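A short sketch of this construction; it reuses an extended-gcd helper to compute the inverses y_i and assumes the moduli are pairwise coprime:

```python
def extended_gcd(a, b):
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - t * (a // b)

def crt(residues, moduli):
    """Solve x = a_i (mod m_i) for pairwise coprime m_i; return x mod M."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for a, m in zip(residues, moduli):
        Mi = M // m
        _, y, _ = extended_gcd(Mi, m)   # y = Mi^{-1} (mod m)
        x += a * Mi * y
    return x % M

print(crt([2, 3, 2], [3, 5, 7]))  # 23: x = 2 (mod 3), 3 (mod 5), 2 (mod 7)
```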
4.2.8 RSA
p, q are two distinct primes; n = pq.
\begin{aligned} \text{public key } & (e, n), e \perp \phi(n), e < \phi(n) \\ \text{private key } & (d, n), ed \equiv 1 \mod \phi(n) \end{aligned}
For message m, m < n:
\begin{aligned} & c \equiv m ^ e \mod n \\ & m \equiv c ^ d \mod n \end{aligned}
Proof for m \perp n:
\begin{cases} ed \equiv 1 \mod \phi(n) \implies ed = 1 + k\phi(n) \\ m ^ {\phi(n)} \equiv 1 \mod n \implies m^{k\phi(n)} \equiv 1 \mod n \end{cases} \implies m ^ {ed} = m ^{1 + k\phi(n)} \equiv m \mod n
For m \not\perp n:
\begin{aligned} & n = pq \\ \text{consider } & m = kp, m < n \\ \implies & \begin{cases} m \equiv 0 \mod p \\ m \perp q \end{cases} \\ & m \equiv 0 \mod p \\ \implies & m^{ed} \equiv m \mod p \\ & m \perp q \\ \implies & m ^ {\phi(q)} \equiv 1 \mod q \\ \implies & m ^{q-1} \equiv 1 \mod q \\ \implies & m^{ed} \equiv m^{1 + k\phi(n)} \equiv m \cdot m^{(q-1)\cdot(p-1)k} \equiv m \cdot \left( m^{q-1}\right)^{(p-1)k} \equiv m \mod q \\ \text{consider }&\text{lemma of CRT} \\ \implies & m^{ed} \equiv m \mod n \end{aligned}
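A toy numeric instance of the scheme (tiny textbook primes, purely illustrative; real keys are enormously larger):

```python
p, q = 61, 53
n = p * q                         # 3233
phi_n = (p - 1) * (q - 1)         # 3120
e = 17                            # public exponent, coprime to phi_n
d = pow(e, -1, phi_n)             # private exponent: e*d ≡ 1 (mod phi_n)

m = 65                            # message, m < n
c = pow(m, e, n)                  # encrypt: c ≡ m^e (mod n)
assert pow(c, d, n) == m          # decrypt: c^d ≡ m (mod n)
```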
4.3 Graph
4.3.1 Introduction
Undirected graph G = (V, E) consists of a non-empty set V and a set E of unordered pairs from V.
A simple graph is a graph with no self-loops and no duplicate edges.
4.3.2 Isomorphism
Two graphs G=(V_G,E_G) and H=(V_H,E_H) are isomorphic if there exists a bijection f:V_G\to V_H such that for all u,v\in V_G:
(u,v)\in E_G \iff (f(u),f(v))\in E_H
Ref: Graph Isomorphism in Quasipolynomial Time.
4.3.3 Coloring
A coloring assigns colors to vertices so that adjacent vertices receive different colors.
\chi(G) is the minimum number of colors needed.
\forall k \geq 3, deciding whether \chi (G) \leq k is NP-complete.
4.3.3.1 DSATUR algorithm
The saturation of a vertex is the number of distinct colors present in its neighborhood.
- At each step, select the uncolored vertex with the highest saturation; break ties by choosing the vertex with the larger degree.
- Once a vertex is selected, it is assigned the smallest available color not used by its colored neighbors.
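A short Python sketch of the two rules above on an adjacency-set representation (illustrative and unoptimized):

```python
def dsatur_coloring(adj):
    """adj: dict vertex -> set of neighbours. Returns dict vertex -> color."""
    color = {}
    while len(color) < len(adj):
        def saturation(v):                  # distinct colors among colored neighbours
            return len({color[u] for u in adj[v] if u in color})
        uncolored = [v for v in adj if v not in color]
        v = max(uncolored, key=lambda v: (saturation(v), len(adj[v])))  # ties -> degree
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:                    # smallest color unused by neighbours
            c += 1
        color[v] = c
    return color

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}   # triangle with a pendant vertex
print(dsatur_coloring(adj))                          # uses 3 colors
```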
4.3.3.2 Upper bound
| Graph Class | \chi | Key Theorem / Complexity |
|---|---|---|
| trees & bipartite graphs | \chi = 2 | BFS, Linear-time |
| planar graphs | \chi \leq 4 | Four-Color Theorem, 3-colorability is NP-C |
| graphs with max degree \Delta | \chi \leq \Delta + 1 | Greedy coloring |
4.3.4 Bipartite
A bipartite graph is a graph whose vertices can be partitioned into two sets L, R such that every edge has one endpoint in each set.
A graph is bipartite if and only if it contains no odd-length cycle.
4.3.5 Matching
A matching is a set of edges that share no common vertices.
A matching that covers all vertices of L is called an L-perfect (L-saturating) matching; a perfect matching covers every vertex of the graph.
4.3.5.1 Hall's theorem
A bipartite graph G = (X, Y, E) has a matching that covers X if and only if for every subset S \subseteq X:
|N(S)| \;\ge\; |S|
N(S) is the neighborhood of S in Y:
N(S) = \{\,y \in Y : \exists\,x \in S \text{ with }(x,y)\in E\}
4.3.5.2 Stable matching
Given two groups of agents (commonly called men and women), each agent ranks the members of the opposite group by preference. The stable matching problem asks for a matching between the two groups such that no two agents would both rather be paired with each other than with their assigned partners. Such a pair is called a blocking pair, and a matching that admits one is unstable.
The Gale-Shapley algorithm solves it.
Initialization:
- Mark every man and woman as free
- Each man keeps a pointer to the next woman on his list (initially the first).
Main Loop (while some man is free), for each currently free man m:
- Let w be the highest-ranked woman on m's list he has not yet proposed to.
- m proposes to w and advances his pointer.
- If w is free, she accepts;
- If w is engaged to man m', compare w's preferences for m vs. m'.
- If she prefers m: She breaks off with m' and becomes engaged to m; m' becomes free again.
- Otherwise, she rejects m, and m remains free to try his next choice.
Important properties:
- O(n^2), in the worst case each man proposes to every woman once.
- Stable, no man-woman pair who both prefer each other to their current partners in result.
- Proposer optimal, every man receives his best achievable partner among all stable matchings.
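A compact sketch of the proposal loop (0-indexed agents; preference lists give the most preferred first; names are illustrative):

```python
from collections import deque

def gale_shapley(men_pref, women_pref):
    """Return dict woman -> engaged man; men propose, so the result is man-optimal."""
    n = len(men_pref)
    rank = [[0] * n for _ in range(n)]           # rank[w][m] = w's ranking of m
    for w, prefs in enumerate(women_pref):
        for r, m in enumerate(prefs):
            rank[w][m] = r
    next_choice = [0] * n                        # pointer into each man's list
    fiance = {}                                  # woman -> man
    free = deque(range(n))
    while free:
        m = free.popleft()
        w = men_pref[m][next_choice[m]]          # best woman not yet proposed to
        next_choice[m] += 1
        if w not in fiance:
            fiance[w] = m                        # w was free: accept
        elif rank[w][m] < rank[w][fiance[w]]:    # w prefers m to current partner
            free.append(fiance[w])
            fiance[w] = m
        else:
            free.append(m)                       # rejected: m proposes again later
    return fiance

men = [[0, 1, 2], [1, 0, 2], [0, 1, 2]]
women = [[1, 0, 2], [0, 1, 2], [0, 1, 2]]
print(gale_shapley(men, women))
```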
4.3.5.3 Other matching problems
| Problem | Solution | Time Complexity |
|---|---|---|
| Maximum cardinality | Edmonds' Blossom | O(V^3) |
| Maximum cardinality (bipartite) | Hopcroft-Karp (Dinic) | O(\sqrt{V}\,E) |
| Maximum weight (bipartite) | MCMF | O(VE\log V) |
| Maximum weight (complete bipartite) | Hungarian | O(V^3) |
To use MCMF for maximum weight matching on a bipartite graph G = (L \cup R, E), construct the flow network as follows:
| Edge | Capacity | Cost |
|---|---|---|
| S \rightarrow l, \quad l \in L | 1 | 0 |
| l \rightarrow r, \quad (l, r) \in E | 1 | C-w(l, r) |
| l \rightarrow T, \quad l \in L | 1 | C |
| r \rightarrow T, \quad r \in R | 1 | 0 |
Here C \geq \max_{(l,r)\in E} w(l, r) keeps all costs nonnegative; the l \to T edge lets a left vertex stay unmatched at the same cost as a weight-zero match, so a min-cost flow of value |L| maximizes the total matched weight.
4.3.6 Selected problems
A walk is a sequence of vertices in which consecutive vertices are adjacent.
A path is a walk without duplicate vertices.
For an undirected graph with adjacency matrix G, the number of walks of length k between i and j is the entry G^{k}_{i, j}.
A directed graph is strongly connected if for every pair of vertices u, v there is a directed path from u to v.
DAG is a directed graph without any directed cycles.
An Euler walk is a walk that traverses every edge of the graph exactly once.
An Euler tour is an Euler walk that starts and ends at the same vertex.
A connected graph has an Euler tour if and only if every vertex has even degree.
A Hamiltonian cycle in a graph G is a cycle that visits every node of G exactly once. A tournament graph always contains a Hamiltonian path (see 4.3.7).
For a weighted graph, finding a Hamiltonian cycle with the minimum total weight is the travelling salesperson problem (TSP).
Euclidean TSP in fixed dimension d can be approximated to within a factor of 1+\epsilon in polynomial time: A Polynomial-Time Approximation Scheme (PTAS) for Euclidean TSP.
4.3.7 Tournament graph
A tournament graph is a complete directed graph: for every pair of vertices u, v, exactly one of the directed edges u \to v or v \to u exists.
Every tournament graph contains a Hamiltonian path.
Base: trivial case.
Induction:
For T_{n+1}, delete any node v; then T_{n+1} - v has a Hamiltonian path u_1 \to u_2 \to \cdots \to u_n. Then either:
- v beats all u
- all u beats v
- \exists i, \, u_i \to v \land v \to u_{i+1}
A Hamiltonian path can be constructed in all three cases; a sketch of the insertion construction follows.
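A minimal sketch (beats(u, v) is assumed to answer whether the edge u \to v exists):

```python
def tournament_ham_path(vertices, beats):
    """Insert vertices one by one, keeping a Hamiltonian path of those seen so far."""
    path = []
    for v in vertices:
        i = 0
        while i < len(path) and beats(path[i], v):   # skip the prefix that beats v
            i += 1
        path.insert(i, v)   # path[i-1] -> v (if any) and v -> path[i] (if any) both exist
    return path

# example tournament: "u beats v" iff u > v, so the path is descending
print(tournament_ham_path([2, 0, 3, 1], lambda u, v: u > v))   # [3, 2, 1, 0]
```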
4.3.8 Planar graph
In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints.
Euler's formula:
If a connected graph has a planar embedding, then:
v - e + f = 2
where v, e, f are the numbers of vertices, edges, and faces (including the outer, unbounded face).
4.3.9 Minimum spanning tree
A connected and acyclic graph is a tree. A leaf is a degree-one node of a tree.
An MST is a spanning tree with the smallest sum of edge weights.
Algorithm:
- Start with empty set E.
- Repeatedly add the minimum-weight edge that doesn't form a cycle with E.
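This greedy rule is Kruskal's algorithm; a minimal sketch with a union-find structure to detect cycles:

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v) on vertices 0..n-1. Returns the MST edge list."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x
    mst = []
    for w, u, v in sorted(edges):            # smallest weight first
        ru, rv = find(u), find(v)
        if ru != rv:                         # adding (u, v) creates no cycle
            parent[ru] = rv
            mst.append((u, v, w))
    return mst

print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
```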
4.3.9.1 Lemma of cut
A cut partitions the vertex set V into two non-empty sets. Among the edges crossing the cut, any minimum-weight edge is called a light edge.
Cut property: for every cut, any of its light edges belongs to some MST.
- Let T be an MST and e \notin T an edge.
- Adding e to T forms a unique cycle C.
- If some edge e' \in C satisfies w(e') \geq w(e), then T' = T - e' + e is also an MST.
4.3.9.2 Proof
Invariant: after the n-th step of the loop, there exists an MST T^\star_{n} such that the edge set E built so far satisfies E \subseteq T^\star_n.
Base: \emptyset \subseteq T^\star.
Induction:
- If the new edge e \in T^\star_{n}, take T^\star_{n+1} = T^\star_{n}.
- If the new edge e \notin T^\star_{n}, adding e to T^{\star}_{n} forms a cycle. Since e is a light edge of the cut separating the two components it joins, some edge e' of that cycle crosses the same cut with w(e') \geq w(e), so T^\star_{n+1} = T^\star_{n} - e' + e is an MST containing E \cup \{e\}.
4.3.10 Communication networks
For circuit switching, the goal of the network is to get data from input to output considering switch size / switch count / diameter and congestion.
The congestion of a network is obtained by routing each input-output pattern in the best possible way and taking, over all patterns, the largest number of paths that must pass through a single switch.
4.3.10.1 Butterfly network
The butterfly network is a communication network with N = 2^m input/output.
It has 2^m rows and m+1 columns.
Connection pattern:
b_1\cdots b_{j+1}\cdots b_m, \, j \to \begin{cases} b_1\cdots b_{j+1}\cdots b_m, \, j + 1 \\ b_1\cdots \overline{b_{j+1}}\cdots b_m, \, j + 1 \end{cases}
Routing method from (x_i)_2 to (y_i)_2:
(x_i, \,0) \to (y_1x_2\cdots x_m, \, 1)\to(y_1y_2\cdots x_m, \, 2) \to \cdots \to(y_i, m)
4.3.10.2 Comparison
| Network for N nodes | Diameter | Switch Size | Switches | Congestion |
|---|---|---|---|---|
| Complete binary tree | 2\log N + 2 | 3\times3 | 2N - 1 | N |
| 2-D array | 2N | 2\times2 | N^2 | 2 |
| Butterfly | \log N + 2 | 2\times2 | N(\log N + 1) | \sqrt{N} or \tfrac{\sqrt{N}}{2} |
| Benes | 2\log N + 1 | 2\times2 | 2N\log N | 1 |
4.4 Relation
4.4.1 Definition
Given sets A and B, a binary relation R: A \to B from A to B is a subset of A \times B. The sets A and B are called the domain and codomain of R.
aRb or a \sim_R b can be used for (a,b) \in R.
| Property | Description | Symbol | Cardinality |
|---|---|---|---|
| Total | every element of A maps to some element of B | - | - |
| Surjective | every element of B is mapped to by some element of A | A \twoheadrightarrow B | \mid A \mid \ge \mid B \mid |
| Injective | every element of B is mapped to at most once | A \hookrightarrow B | \mid A \mid \le \mid B \mid |
| Function | every element of A maps to at most one element of B | A \to B | - |
| Bijective | total, surjective, injective, and function | A \xrightarrow{\sim} B | \mid A \mid = \mid B \mid |
4.4.2 Relations on a single set
| Property | Formal Condition | Graph Interpretation |
|---|---|---|
| Reflexive | \forall x \in A,\; xRx | Every node has a self-loop |
| Irreflexive | \neg \exists x \in A,\; xRx | No self-loop |
| Symmetric | \forall x,y \in A,\; xRy \implies yRx | Every directed edge x \to y is paired with y \to x |
| Antisymmetric | \forall x,y \in A,\; (xRy \land yRx) \implies x = y | No bi-directional edges between distinct nodes |
| Asymmetric | \neg \exists x,y \in A:\; xRy \land yRx | No self-loop, and no bi-directional edges between nodes |
| Transitive | \forall x,y,z \in A,\; (xRy \land yRz) \implies xRz | For any walk v_{1}\to\cdots\to v_k, there is a direct edge v_1 \to v_k |
Note: Irreflexive + Antisymmetric = Asymmetric.
4.4.3 Equivalence relations
A relation is an equivalence relation if it is reflexive, symmetric, and transitive.
Given an equivalence relation R : A \to A, the equivalence class of an element x \in A is the set of all elements of A related to x by R. It is denoted as [x]:
[x] = \{ y \mid xRy \}.
A partition of a finite set A is a collection of disjoint, nonempty subsets A_1, A_2, \ldots, A_n whose union is all of A. The subsets are usually called the blocks of the partition.
The equivalence classes of an equivalence relation on a set A form a partition of A.
4.4.4 Partial orders
A relation R on a set A is a weak partial order, noted as \preceq, if it is transitive, antisymmetric, and reflexive, e.g. \leq.
A relation R is a strong partial order, noted as \prec, if it is transitive and asymmetric, e.g. <.
A total order is a partial order in which every pair of distinct elements is comparable.
4.4.5 DAG
Given a partial order \preceq on a set A, the pair (A, \preceq) is called a partially ordered set or poset.
The digraph of a strict partial order is acyclic, i.e. a DAG; a weak partial order adds only the self-loops.
Given a digraph G = (V, E), the transitive closure of G is the digraph G^+ = (V, E^+) where
E^+ = \{ u \to v \mid \text{there is a directed path from } u \text{ to } v \text{ in } G \}.
4.4.6 Topological sort
A topological sort of a poset (A, \preceq) is a total order (A, \preceq_T) such that:
a \preceq b \implies a \preceq_T b
Every finite poset has a topological sort.
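One standard way to produce such a total order is Kahn's algorithm on the relation's DAG; a minimal sketch (the adjacency dict maps an element to its immediate successors):

```python
from collections import deque

def topological_sort(adj):
    """adj: dict u -> iterable of successors. Returns one linearization, or raises."""
    indeg = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indeg[v] = indeg.get(v, 0) + 1
    queue = deque(u for u, d in indeg.items() if d == 0)   # minimal elements
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj.get(u, ()):
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(indeg):
        raise ValueError("cycle detected: not a partial order")
    return order

print(topological_sort({'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}))
```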
4.4.7 Parallel task scheduling
A chain in a poset is a subset of elements that are pairwise comparable, i.e. totally ordered by \preceq.
For any finite poset (A, \preceq) whose longest chain has length t, it is possible to partition A into t subsets A_1, \dots, A_t such that for every a \in A_i, every b \prec a lies in \cup_{j=1}^{i-1}A_j.
Constructive proof is to partition via rank: r(a)\;=\;\bigl(\text{length of the longest chain ending at }a\bigr):
A_i \;:=\; \{\,a\in A : r(a)=i\}, \qquad i=1,2,\dots,t
An antichain in a poset is a set of elements such that any two distinct elements in the set are incomparable.
A_i are antichains.
For all t > 0, every poset with n elements has either a chain of size greater than t or an antichain of size at least n / t.
4.5 Sum and Asymptotics
4.5.1 Basics
A closed form is an expression that does not use summation or product symbols.
4.5.2 Harmonic number
H_n = \sum_{i = 1}^{n} \frac 1 i
By integral comparison, H_n \sim \ln n.
4.5.3 n!
4.5.3.1 Trivial
\begin{aligned} & \ln(n!) = \sum_{i=1}^{n}\ln i, \; \int_1^n \ln x \,\mathrm d x = n\ln n - n + 1 \\ \implies & n\ln n - n + 1 \leq \sum_{i=1}^{n}\ln i\leq n\ln n - n + 1 + \ln n \\ \implies & \frac{n^n}{\mathrm e ^ {n - 1}} \leq n! \leq \frac{n^{n+1}}{\mathrm e^{n-1}} \end{aligned}
4.5.3.2 Stirling's formula
For all n \ge 1:
\begin{aligned} & n! \;=\; \sqrt{2\pi n}\,\Bigl(\frac{n}{e}\Bigr)^{\!n}\,e^{\epsilon(n)}, \;\; \frac{1}{12n + 1} \;\le\; \epsilon(n) \;\le\; \frac{1}{12n} \\ \iff & \ln(n!) = n \ln n - n + \frac 1 2 \ln(2\pi n) + \frac 1 {12n} + \mathcal{O}(\frac 1 {n^3}) \end{aligned}
4.5.4 Asymptotic notation
| Notation | Read as | Definition |
|---|---|---|
| O | big oh | f(x)=O(g(x))\iff\exists\,C>0,x_0\text{ s.t. }\forall x>x_0,\lvert f(x)\rvert\le C\lvert g(x)\rvert. |
| o | little oh | f(x)=o(g(x))\iff\displaystyle\lim_{x\to\infty}\frac{f(x)}{g(x)}=0. |
| \Omega | big omega | f(x)=\Omega(g(x))\iff\exists\,C>0,x_0\text{ s.t. }\forall x>x_0,\lvert f(x)\rvert\ge C\lvert g(x)\rvert. |
| \omega | little omega | f(x)=\omega(g(x))\iff\displaystyle\lim_{x\to\infty}\frac{f(x)}{g(x)}=\infty. |
| \Theta | theta | f(x)=\Theta(g(x))\iff f(x)=O(g(x)) \land f(x)=\Omega(g(x)). |
| \sim | asymptotically equivalent | f(x)\sim g(x)\iff\displaystyle\lim_{x\to\infty}\frac{f(x)}{g(x)}=1. |
4.6 Recurrence
4.6.1 Linear recurrence
Homogeneous linear recurrences are like:
f(n) = \sum_{i=1}^{d}a_if(n - i)
The characteristic equation:
P(r)=r^d-a_1 r^{d-1}-a_2 r^{d-2}-\cdots-a_{d-1} r-a_d=0
If r_1,\dots,r_s are the distinct roots with multiplicities m_1,\dots,m_s (\sum m_j=d), then
f(n)=\sum_{j=1}^s \Big(\sum_{k=0}^{m_j-1} c_{j,k}\, n^{k}\Big)\, r_j^{\,n}
with constants c_{j,k} determined by the d initial values.
4.6.1.1 Numerical solution
To compute f(n) numerically, write the recurrence in companion-matrix form:
s_n=\begin{bmatrix} f(n)\\ f(n-1)\\ \vdots\\ f(n-d+1) \end{bmatrix},\quad s_{d-1}=\begin{bmatrix}f(d-1)\\ \vdots\\ f(0)\end{bmatrix}, \quad C=\begin{bmatrix} a_1 & a_2 & \cdots & a_{d-1} & a_d\\ 1 & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}
Then:
\begin{aligned} & s_{n+1}=C\,s_n \\ \implies & s_n=C^{\,n-(d-1)}\,s_{d-1}\ \ (n\ge d-1) \\ \implies & f(n)=e_1^\top\,C^{\,n-(d-1)}\,s_{d-1},\qquad e_1=[1,0,\dots,0]^\top \end{aligned}
If there is no repeated root:
C=V\Lambda V^{-1},\quad \Lambda=\mathrm{diag}(r_1,\dots,r_d)
Then
C^m=V\Lambda^m V^{-1}=V\,\mathrm{diag}(r_1^m,\dots,r_d^m)\,V^{-1}
With state s_n=[f(n),f(n-1),\dots,f(n-d+1)]^\top and s_{d-1} fixed by the initial values,
s_n=C^{\,n-(d-1)} s_{d-1}=V\Lambda^{\,n-(d-1)} V^{-1}s_{d-1}
Taking the first coordinate gives
f(n)=\sum_{j=1}^{d} A_j\, r_j^{\,n}, \quad A = V^{-1}s_{d-1}, \quad V = \begin{bmatrix} r_1^{d-1}&\cdots&r_d^{d-1}\\ r_1^{d-2}&\cdots&r_d^{d-2}\\ \vdots&&\vdots\\ 1&\cdots&1 \end{bmatrix}
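A minimal sketch of the companion-matrix computation with repeated squaring, using Fibonacci (f(n) = f(n-1) + f(n-2)) as the example:

```python
def mat_mult(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, m):
    """A^m by repeated squaring."""
    R = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
    while m:
        if m & 1:
            R = mat_mult(R, A)
        A = mat_mult(A, A)
        m >>= 1
    return R

def linear_recurrence(coeffs, init, n):
    """f(n) = sum_i coeffs[i-1]*f(n-i); init = [f(d-1), ..., f(0)] (the vector s_{d-1})."""
    d = len(coeffs)
    if n < d:
        return init[d - 1 - n]
    C = [coeffs] + [[int(j == i) for j in range(d)] for i in range(d - 1)]
    s_n = mat_mult(mat_pow(C, n - (d - 1)), [[x] for x in init])
    return s_n[0][0]                       # first coordinate of s_n is f(n)

print(linear_recurrence([1, 1], [1, 0], 10))   # Fibonacci: f(10) = 55
```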
4.6.1.2 Non-Homogeneous
For f(n) = \sum_{i=1}^{d}a_if(n - i) + g(n):
As in the homogeneous case, solve:
P(r)=r^d-a_1r^{d-1}-a_2r^{d-2}-\cdots-a_d=0.
If r_1,\dots,r_s are the distinct roots with multiplicities m_1,\dots,m_s (\sum m_j=d), then
f_h(n)=\sum_{j=1}^s \Big(\sum_{k=0}^{m_j-1} c_{j,k}\, n^{k}\Big)\, r_j^{\,n}
Guess g(n):
| g(n) | Try f_p(n) of the form |
|---|---|
| constant C | A |
| polynomial n^p | degree-p polynomial in n |
| r^n | A\,r^n |
| n^p r^n | (degree-p polynomial)\cdot r^n |
| A\cos(\omega n)+B\sin(\omega n) | A'\cos(\omega n)+B'\sin(\omega n) |
Resonance rule: if the base (e.g., r or e^{\pm i\omega}) is a root of P with multiplicity m, multiply the whole trial form by n^{m}.
f(n)=f_h(n)+f_p(n)
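A short worked example of the recipe (a Tower-of-Hanoi-style recurrence):
\begin{aligned} & f(n) = 2f(n-1) + 1, \quad f(0) = 0 \\ & P(r) = r - 2 = 0 \implies f_h(n) = c\cdot 2^n \\ & g(n) = 1 \text{ is constant and } 1 \text{ is not a root} \implies f_p(n) = A: \; A = 2A + 1 \implies A = -1 \\ & f(n) = c \cdot 2^n - 1, \; f(0) = 0 \implies c = 1 \implies f(n) = 2^n - 1 \end{aligned}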
4.6.2 Akra-Bazzi method
Ref:
- Notes on Better Master Theorems for Divide-and-Conquer Recurrences
- On the Solution of Linear Recurrence Equations
Suppose that the function T : \mathbb{R} \to \mathbb{R} satisfies the recurrence
T(x) \;=\; \begin{cases} \text{nonnegative and bounded}, & 0 \le x \le x_0\\[6pt] \displaystyle \sum_{i=1}^k a_i\,T\bigl(b_i x + h_i(x)\bigr) \;+\; g(x), & x > x_0 \end{cases}
where
- a_1,\dots,a_k are positive constants.
- b_1,\dots,b_k are constants in the interval [0,1].
- x_0 is large enough so that T is well-defined.
- g(x) is nonnegative and \lvert g'(x)\rvert is bounded by a polynomial.
- \lvert h_i(x)\rvert = O\bigl(x/\log^2 x\bigr) for each i.
Then
T(x) \;=\; \Theta\!\Bigl(x^p\Bigl(1 + \int_{1}^{x} \frac{g(u)}{u^{p+1}} \,du\Bigr)\Bigr), \; \text{s.t. } \sum_{i=1}^k a_i\,b_i^p \;=\; 1
| g(x) | \displaystyle \int_{1}^{x}\frac{g(u)}{u^{p+1}}du | T(x) |
|---|---|---|
| O\big(x^{p-\varepsilon}\big), \varepsilon>0 | O(1) | \Theta\big(x^{p}\big) |
| \Theta\Big(\dfrac{x^{p}}{\log^{1+\delta}x}\Big), \delta>0 | \Theta(1) | \Theta\big(x^{p}\big) |
| \Theta\Big(\dfrac{x^{p}}{\log x}\Big) | \Theta(\log\log x) | \Theta\big(x^{p}\log\log x\big) |
| \Theta\Big(\dfrac{x^{p}}{\log^{c}x}\Big), 0<c<1 | \Theta\big(\log^{1-c}x\big) | \Theta\big(x^{p}\log^{1-c}x\big) |
| \Theta\big(x^{p}\big) | \Theta(\log x) | \Theta\big(x^{p}\log x\big) |
| \Theta\big(x^{p}(\log x)^k\big), k>-1 | \Theta\big((\log x)^{k+1}\big) | \Theta\big(x^{p}(\log x)^{k+1}\big) |
| \Theta\big(x^{p}(\log x)^k(\log\log x)^m\big) , k>-1 | \Theta\big((\log x)^{k+1}(\log\log x)^m\big) | \Theta\big(x^{p}(\log x)^{k+1}(\log\log x)^m\big) |
| \Theta\big(x^{p+\varepsilon}\big), \varepsilon>0 | \Theta\big(x^{\varepsilon}\big) | \Theta\big(x^{p+\varepsilon}\big) |
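A small numeric sketch for finding the exponent p with \sum_i a_i b_i^p = 1 by bisection (illustrative; the example recurrence T(x) = T(x/5) + T(7x/10) + x is the median-of-medians selection recurrence):

```python
def akra_bazzi_p(a, b, lo=-10.0, hi=10.0, iters=200):
    """Solve sum_i a[i]*b[i]**p = 1 for p; assumes a[i] > 0 and 0 < b[i] < 1,
    so the left-hand side is strictly decreasing in p."""
    f = lambda p: sum(ai * bi ** p for ai, bi in zip(a, b)) - 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid               # sum still above 1: p must be larger
        else:
            hi = mid
    return (lo + hi) / 2

# T(x) = T(x/5) + T(7x/10) + x  =>  (1/5)^p + (7/10)^p = 1, p ≈ 0.84
p = akra_bazzi_p([1, 1], [1 / 5, 7 / 10])
print(round(p, 4))
# g(x) = x = Theta(x^{p+eps}), so by the last table row T(x) = Theta(x).
```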
4.6.3 Master theorem
T is a recurrence of the form:
T(n) = a \, T\left(\frac{n}{b}\right) + f(n)
| Condition for f(n) | T(n) |
|---|---|
| O\!\left(n^{\log_b a-\epsilon}\right) for some \epsilon>0 | \Theta\!\left(n^{\log_b a}\right) |
| \Theta\!\left(n^{\log_b a}\,\log^{k} n\right) for some k\ge 0 | \Theta\!\left(n^{\log_b a}\,\log^{k+1} n\right) |
| \Omega\!\left(n^{\log_b a+\epsilon}\right) for some \epsilon>0; af(n/b)\le cf(n) for some c<1 and large n | \Theta\!\left(f(n)\right) |
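Some standard instances, for reference:
\begin{aligned} T(n) &= 2T(n/2) + \Theta(n) && \implies T(n) = \Theta(n\log n) \quad (\text{merge sort, case 2 with } k=0) \\ T(n) &= T(n/2) + \Theta(1) && \implies T(n) = \Theta(\log n) \quad (\text{binary search}) \\ T(n) &= 2T(n/2) + \Theta(n^2) && \implies T(n) = \Theta(n^2) \quad (\text{case 3}) \end{aligned}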
4.7 Counting
4.7.1 Rules
4.7.1.1 Product
|\mathop{\Large\times}_{i}P_i| = \prod_i|P_i|
4.7.1.2 Sum (inclusion-exclusion)
\left| \bigcup_{i=1}^n P_i \right| = \sum_{\emptyset \neq I \subseteq \{1,\dots,n\}} (-1)^{|I|+1} \left| \bigcap_{i \in I} P_i \right|
The above inclusion-exclusion identity holds for any finitely additive measure \mu (e.g., counting measure, probability).
Application (Euler's totient function):
Let S_i be the set of integers \le n that are divisible by the prime p_i, |S_i|=\frac{n}{p_i},|S_i\cap S_j|=\frac{n}{p_i p_j}:
|S|=n\Bigl(\sum_i\frac1{p_i}-\sum_{i<j}\frac1{p_i p_j}+\cdots+(-1)^{m-1}\frac1{p_1\cdots p_m}\Bigr)
\phi(n)=n-|S| = n\Bigl(1-\sum_i\frac1{p_i}+\sum_{i<j}\frac1{p_i p_j}-\cdots+(-1)^m\frac1{p_1\cdots p_m}\Bigr)=n\prod_{i=1}^m\Bigl(1-\frac1{p_i}\Bigr)
4.7.1.3 Division
If f : A \to B is k-to-1, then:
|A| = k \cdot |B|
E.g.:
- n-objects permutation on a ring is \frac{n!}n = (n-1)!
- k-objects subsets over n-objects is \frac{n!}{k!(n-k)!}
- the (k_1, k_2, \dots, k_m)-split of A is a sequence (A_1, A_2, \dots, A_m) where A_i\cap A_j = \emptyset \land |A_i| = k_i \land \sum k_i = n; the number of splits is {n!} / {\prod k_i !}
4.7.1.4 Bookkeeper
Let l_1, l_2, \dots, l_m be distinct symbols. Suppose we form sequences (or arrangements) of length n = k_1 + k_2 + \cdots + k_m, where:
- symbol l_1 occurs exactly k_1 times,
- symbol l_2 occurs exactly k_2 times,
- \dots
- symbol l_m occurs exactly k_m times.
Then the number of distinct sequences is given by:
\binom{n}{k_1, k_2, \dots, k_m} = \frac{n!}{k_1! k_2! \cdots k_m!}
4.7.1.5 Binomial and multinomial
Binomial:
For all n \in \mathbb{N} and a, b \in \mathbb{R}:
(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{\,n-k} b^{\,k}
Multinomial:
(x_1 + x_2 + \cdots + x_m)^n = \sum_{k_1+k_2+\cdots+k_m=n} \binom{n}{k_1, k_2, \dots, k_m} x_1^{k_1} x_2^{k_2} \cdots x_m^{k_m}
4.7.1.6 Pascal's identity
\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}
4.7.1.7 Pigeonhole principle
\text{If } n \text{ pigeons are placed into } m \text{ holes, then some hole contains at least } \left\lceil \frac{n}{m} \right\rceil \text{ pigeons.}
4.7.2 Generating functions
Generating functions transform sequences to functions:
\langle g_0, g_1, g_2, g_3, \dots \rangle \;\longleftrightarrow\; g_0 + g_1x + g_2x^2 + g_3x^3 + \cdots
Some examples:
\begin{aligned} \langle 1, 1, 1, 1, \dots \rangle \;& \longleftrightarrow\; 1 + x + x^2 + x^3 + x^4 + \cdots = \frac{1}{1-x} \\ \langle 1, -1, 1, -1, \dots \rangle \;& \longleftrightarrow\; 1 - x + x^2 - x^3 + x^4 - \cdots = \frac{1}{1+ x} \\ \langle 1, 0, 1, 0, 1, 0, \dots \rangle \;& \longleftrightarrow\; 1 + x^2 + x^4 + x^6 + x^8 + \cdots = \frac{1}{1-x^2} \\ \langle 1, a, a^2, a^3, \dots \rangle \;& \longleftrightarrow\; 1 + ax + a^2x^2 + a^3x^3 + \cdots = \frac{1}{1-ax} \\ \end{aligned}
4.7.2.1 Rules
Scaling:
\langle s_0, s_1, s_2, \dots \rangle = F(x) \iff \langle cs_0, cs_1, cs_2, \dots \rangle = cF(x)
Addition:
\begin{aligned} & \langle s_0, s_1, s_2, \dots \rangle = F(x), \; \langle t_0, t_1, t_2, \dots \rangle = G(x) \\ \iff & \langle s_0 + t_0, s_1 + t_1, s_2 + t_2, \dots \rangle = F(x) + G(x) \end{aligned}
Right shifting by k leading zeros:
\begin{aligned} & \langle f_0, f_1, f_2, \dots \rangle \longleftrightarrow F(x) \\ \iff & \langle \underbrace{0,0,\ldots,0}_{k \ \text{zeroes}}, f_0, f_1, f_2, \ldots \rangle \;\longleftrightarrow\; x^k \cdot F(x) \end{aligned}
Differentiation:
\begin{aligned} \langle f_n \rangle \;\longleftrightarrow\; F(x) \iff \langle (n+1)f_{n+1} \rangle \;\longleftrightarrow\; F'(x) \end{aligned}
Product:
\begin{aligned} & \langle a_n \rangle \, \longleftrightarrow\, A(x), \, \langle b_n \rangle \, \longleftrightarrow\, B(x) \\ \iff & \langle c_n \rangle \longleftrightarrow A(x)\cdot B(x), \, c_n = \sum_{k = 0}^{n}a_k\cdot b_{n-k} \end{aligned}
Summation is a special case of product, b_n = 1, B(x) = \frac 1 {1 - x}:
\begin{aligned} & \langle a_n \rangle \, \longleftrightarrow\, A(x) \\ \iff & \langle s_n \rangle \longleftrightarrow \frac {A(x)}{1-x}, \; s_n = \sum_{k = 0}^{n} a_k \end{aligned}
4.7.2.2 From generating function to sequence
The sequence f_n for function F(x) is:
f_n = \frac{F^{(n)}(0)}{n!}
which is from Taylor's series.
Partial fractions:
[x^n]\frac{c x^a}{(1-\alpha x)^b} = \frac{c \,(n-a+b-1)!}{(n-a)! \,(b-1)!} \, \alpha^{\,n-a}, \quad n \geq a
General linear recurrence: \begin{aligned} & f_n = a_1 f_{n-1} + a_2 f_{n-2} + \cdots + a_d f_{n-d} + g_n, \; n \geq d \\ \implies & F(x) = \frac{h(x) + G(x)}{1 - a_1 x - a_2 x^2 - \cdots - a_d x^d}, \; G(x) = \sum_{n=d}^\infty g_n x^n , \; h(x) = \sum_{i=0}^{d-1} h_i x^i \text{, with the } h_i \text{ determined by the initial values } f_0, \dots, f_{d-1} \end{aligned}
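A quick sanity check with sympy, assuming it is available: expanding F(x) = x/(1 - x - x^2) (the Fibonacci case f_n = f_{n-1} + f_{n-2}, h(x) = x) recovers the Fibonacci numbers as coefficients.

```python
import sympy as sp

x = sp.symbols('x')
F = x / (1 - x - x**2)                        # generating function of Fibonacci
poly = sp.series(F, x, 0, 10).removeO()       # expand up to x^9
print([poly.coeff(x, n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```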
4.7.3 Sets
A set A is infinite \iff there is a bijection between A and A \cup \{b\} for some new element b \notin A.
Set C is countable \iff it is finite \lor \mathbb N \xrightarrow{\sim} C
If sets A, B are countable, then A \cup B and A \times B are countable \implies \mathbb Q is countable.
4.7.4 Cantor's theorem
For any set A, the power set \mathcal P(A) has strictly larger cardinality than A itself.
\mathcal P(A) is at least as big as A: there is an injection f: A \;\longrightarrow\; \mathcal P(A) given by f(a) = \{a\}.
\mathcal P(A) is strictly bigger:
Assume there is a surjection g: A \;\longrightarrow\; \mathcal P(A).
Define: A_g \;=\;\{\,a\in A : a\notin g(a)\}\;\subseteq A.
By surjectivity, pick a_0\in A with g(a_0)=A_g.
- If a_0\in A_g: by the definition of A_g, a_0\notin g(a_0). But g(a_0)=A_g, so this says a_0\notin A_g. Contradiction.
- If a_0\notin A_g: by the definition of A_g, a_0\in g(a_0). With g(a_0)=A_g, this gives a_0\in A_g. Contradiction.
Both branches contradict, so no such surjective g exists. Therefore |\mathcal P(A)|>|A|.
4.8 Probability
4.8.1 Basic
A countable sample space S is a nonempty countable set.
- An element w \in S is called an outcome,
- A subset E \subseteq S is called an event.
A probability function on a sample space S is a total function \Pr : S \to \mathbb{R}
such that:
- \Pr[w] \geq 0 \quad \forall w \in S
- \sum_{w \in S} \Pr[w] = 1
A sample space with a probability function, (S, \Pr), is a probability space.
Rules:
\begin{aligned} & \Pr [E] = \sum_{w \in E} \Pr[w] \\ & \Pr[\bar A] = 1 - \Pr[A] \\ \end{aligned}
For \Pr[\bigcup_i E_i], the inclusion-exclusion identity of 4.7.1.2 applies.
4.8.2 Conditional
\begin{aligned} \Pr[A \mid B] & = \frac{\Pr[A \cap B]}{\Pr[B]} \\ \Pr\!\left[\bigcap_{i=1}^n E_i\right] & = \prod_{k=1}^n \Pr\!\left[E_k \;\middle|\; \bigcap_{i=1}^{k-1} E_i\right] \end{aligned}
Bayes' rule:
\Pr[B \mid A] = \frac{\Pr[A \mid B] \cdot \Pr[B]}{\Pr[A]}
If E_1, E_2, \dots, E_n are disjoint events whose union is the entire sample space S (that is, they form a partition of S), then for any event A:
\Pr[A] = \sum_{i=1}^n \Pr[A \mid E_i] \cdot \Pr[E_i]
4.8.3 Independence
Events A and B are independent if either:
- \Pr[B] = 0, or
- \Pr[A \mid B] = \Pr[A].
Or:
Events A and B are independent \iff \Pr[A \cap B] = \Pr[A] \cdot \Pr[B] .
4.8.3.1 Mutual independence
A set of events E_1, E_2, \dots, E_n is mutually independent if for every subset S \subseteq \{1,2,\dots,n\}:
\Pr\!\left[\bigcap_{j \in S} E_j \right] \;=\; \prod_{j \in S} \Pr[E_j]
Or, for every subset S \subseteq \{1,\dots,n\} \setminus \{i\}:
\Pr[E_i \mid \bigcap_{j \in S} E_j] = \Pr[E_i]
4.8.4 Random variables
A random variable R on a probability space (S, \Pr) is a total function R : S \;\to\; \mathbb{R},
where:
- S = the sample space (the set of all possible outcomes),
- \Pr = the probability function on S,
- \mathbb{R} = the real numbers.
Thus, each outcome w \in S is assigned a real number R(w).
4.8.4.1 Independence
If each R_i is discrete with countable range S_i, then the following are equivalent:
- R_1,\dots,R_n are mutually independent.
- For all x_i\in S_i,
\Pr[R_1=x_1,\dots,R_n=x_n]=\prod_{i=1}^n \Pr[R_i=x_i]
4.8.4.2 Distribution functions
Let R be a random variable with codomain V. The probability density function (pdf) of R is a function \text{PDF}_R : V \;\to\; [0,1]:
\text{PDF}_R(x) = \begin{cases} \Pr[R = x], & \text{if } x \in \operatorname{range}(R) \\ 0, & \text{if } x \notin \operatorname{range}(R) \end{cases}
- Non-negativity: \text{PDF}_R(x) \geq 0 \quad \forall x \in V ,
- Normalization: \sum_{x \in \operatorname{range}(R)} \text{PDF}_R(x) = 1 .
The cumulative distribution function (cdf) is:
\text{CDF}_R(x)=\Pr[R\le x] = \sum_{y\le x} \Pr[R = y]=\sum_{y\le x}\text{PDF}_R(y)
4.8.4.3 Binomial distribution
Think of n repeated, identical trials with only two outcomes each (success/failure). Let X be the number of successes in the n trials. Then X\sim\mathrm{Binomial}(n,p).
\Pr[X=k]=\binom{n}{k}p^{k}(1-p)^{n-k}, \quad k=0,1,\dots,n
4.8.4.3.1 Approximation
For k=\alpha n with 0<\alpha<1:
\binom{n}{\alpha n}\ \lesssim \frac{2^{\,nH(\alpha)}}{\sqrt{2\pi\,\alpha(1-\alpha)\,n}}
H(\alpha)=\alpha\log_2\!\frac{1}{\alpha}+(1-\alpha)\log_2\!\frac{1}{1-\alpha} is the binary entropy function.
So:
\begin{aligned} f_{n,p}(\alpha n)\ & \lesssim \frac{2^{\,n\big(\alpha\log_2\frac{p}{\alpha}+(1-\alpha)\log_2\frac{1-p}{1-\alpha}\big)}} {\sqrt{2\pi\,\alpha(1-\alpha)\,n}}\ \\ \end{aligned}
4.8.4.3.2 Sampling
Let X_1,\dots,X_n \stackrel{\text{i.i.d.}}{\sim}\mathrm{Bernoulli}(p) with unknown p\in(0,1). Define \hat p=\frac1n\sum_{i=1}^n X_i.
Given accuracy \varepsilon\in(0,\tfrac12) and confidence level 1-\alpha, find the smallest n such that
\sup_{p\in(0,1)} \Pr\big[|\hat p-p|>\varepsilon\big]\ \le\ \alpha
Non-asymptotic (guaranteed) choice via Hoeffding:
For any p\in(0,1),
\Pr\!\big[|\hat p-p|>\varepsilon\big]\ \le\ 2e^{-2n\varepsilon^2}
So it suffices to make 2e^{-2n\varepsilon^2}\le \alpha, i.e.
\boxed{\quad n \;\ge\; \frac{1}{2\varepsilon^2}\,\ln\frac{2}{\alpha}\quad}
A concrete integer choice is n=\left\lceil \frac{1}{2\varepsilon^2}\ln\frac{2}{\alpha}\right\rceil. This bound is uniform in p and needs no approximations.
Chebyshev (very conservative):
\operatorname{Var}[\hat p]=\tfrac{p(1-p)}{n}\le \tfrac{1}{4n}, hence
\Pr\!\big[|\hat p-p|>\varepsilon\big]\ \le\ \frac{1}{4n\varepsilon^2} \quad\Rightarrow\quad n \;\ge\; \frac{1}{4\alpha\varepsilon^2}
Normal/CLT heuristic (often near-minimal):
Worst variance occurs near p=\tfrac12, so
\Pr\!\big[|\hat p-p|>\varepsilon\big]\ \approx\ 2\Phi\!\big[-2\varepsilon\sqrt n\big]\ \le\ \alpha \;\Rightarrow\; \boxed{\; n \approx \frac{z_{1-\alpha/2}^2}{4\varepsilon^2}\;}
where z_{1-\alpha/2} is the standard normal quantile.
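A tiny sketch comparing the three prescriptions for \varepsilon = 0.01 and \alpha = 0.05 (the normal quantile z_{0.975} \approx 1.96 is hard-coded rather than computed):

```python
from math import ceil, log

eps, alpha = 0.01, 0.05
z = 1.959964                                        # z_{1 - alpha/2} for alpha = 0.05

n_hoeffding = ceil(log(2 / alpha) / (2 * eps**2))   # guaranteed, uniform in p
n_chebyshev = ceil(1 / (4 * alpha * eps**2))        # very conservative
n_clt       = ceil(z**2 / (4 * eps**2))             # heuristic, near-minimal

print(n_hoeffding, n_chebyshev, n_clt)              # 18445 50000 9604
```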
4.8.5 Expectation
\mathbb{E}[R] = \sum_{w\in S} R(w)\,\Pr[w]
4.8.5.1 Geometric distribution
Let X_1, X_2, \dots \sim \mathrm{Bernoulli}(p) be independent trials and let T\in\{1,2,\dots\} be the index of the first success:
\Pr[T=t]=p(1-p)^{t-1},\quad \mathbb{E}[T]=\frac1p
Memorylessness (discrete, unique to geometric):
\Pr[T>m+n\mid T>m]=\Pr[T>n],\qquad \Pr[T>n]=(1-p)^n
For a non-repairable item that, in each discrete time step, fails independently with probability p, the time-to-failure (in steps) is T\sim\mathrm{Geom}(p) as above. The Mean Time To Failure (MTTF) equals the expectation:
\mathrm{MTTF}=\mathbb{E}[T]=\frac1p \text{ steps}
4.8.5.2 Rules
Conditional:
Let R be a discrete random variable, and let A be an event with \Pr[A] > 0. The conditional expectation of R given A is:
\mathbb{E}[R \mid A] \;=\; \sum_{r \in \operatorname{range}(R)} r \cdot \Pr[R = r \mid A]
Total:
Let R be a random variable on a sample space S, and let \{A_1, A_2, \dots\} be a partition of S:
\mathbb{E}[R] \;=\; \sum_i \mathbb{E}[R \mid A_i] \, \Pr[A_i]
Linearity:
For any random variables R_1, R_2, \dots, R_k and constants a_1, a_2, \dots, a_k \in \mathbb{R}:
\mathbb{E}\!\left[\sum_{i=1}^k a_i R_i \right] \;=\; \sum_{i=1}^k a_i \, \mathbb{E}[R_i]
Linearity works even if the R_i's are dependent.
Products:
If R_1,\dots,R_k are mutually independent (and have finite means), then
\mathbb{E}\!\left[\prod_{i=1}^k R_i\right]=\prod_{i=1}^k \mathbb{E}[R_i]
Function:
For an affine function f(x) = ax + b:
\mathbb{E}[f(T)] = f(\mathbb{E}[T])
4.8.6 Deviation
The variance of a random variable R:
\operatorname{Var}[R] \;=\; \mathbb{E}\!\big[(R - \mathbb{E}[R])^2\big] = \mathbb{E}[R^2] - (\mathbb{E}[R])^2
The standard deviation \sigma_R of a random variable R:
\sigma_R = \sqrt{\mathrm{Var}[R]}
4.8.6.1 Rules
Linear Transformations:
\mathrm{Var}[aR + b] = a^2 \, \mathrm{Var}[R]
If R_1 and R_2 are independent random variables:
\mathrm{Var}[R_1 + R_2] \;=\; \mathrm{Var}[R_1] + \mathrm{Var}[R_2]
If R_1, R_2, \dots, R_n are pairwise independent random variables:
\mathrm{Var}[\sum_i R_i] = \sum_i \mathrm{Var}[R_i]
4.8.6.2 Markov's theorem
If R is a nonnegative random variable, then for all x > 0:
\Pr[R \geq x] \;\leq\; \frac{\mathbb{E}[R]}{x}
4.8.6.3 Chebyshev's theorem
For any random variable R, any \alpha>0, and x>0:
\Pr\big[|R|\ge x\big] =\Pr\big[|R|^\alpha \ge x^\alpha\big] \le \frac{\mathbb{E}[|R|^\alpha]}{x^\alpha}
Let C = R - \mathbb E [R], \; \alpha = 2, and recall the variance:
\Pr[|C|\ge x] = \Pr[|R-\mathbb{E}[R]|\ge x] \le \frac{\operatorname{Var}[R]}{x^2}
Let x = c \sigma_R:
\Pr\big[|R-\mathbb{E}R|\ge c\,\sigma_R\big] \;\le\;\frac{\mathrm{Var}[R]}{(c\,\sigma_R)^2} =\frac{\sigma_R^2}{c^2\,\sigma_R^2} =\frac{1}{c^2}
4.8.6.4 Cantelli's inequality
Let \mu=\mathbb{E}[R] and \sigma^2=\mathrm{Var}[R]. Let Z=R-\mu.
For any a\ge 0 and any t>0:
\{Z\ge t\}\subseteq \{Z+a\ge t+a\}\subseteq \{(Z+a)^2\ge (t+a)^2\}
By Markov:
\Pr[Z\ge t]\le \frac{\mathbb{E}[(Z+a)^2]}{(t+a)^2} = \frac{\sigma^2+a^2}{(t+a)^2}
Minimize the RHS over a\ge 0: f(a)=(\sigma^2+a^2)/(t+a)^2 has minimum at a^*=\sigma^2/t, giving:
\Pr[Z\ge t]\le \frac{\sigma^2}{t^2+\sigma^2}
With t=c\,\sigma:
\Pr\big[R-\mathbb{E}[R] \ge c\,\sigma_R\big]\le \frac{1}{1+c^2}
Applying the same argument to -Z:
\Pr\big[R-\mathbb{E}[R] \le -c\,\sigma_R\big]\le \frac{1}{1+c^2}
4.8.6.5 Chernoff bound
Let T_1,\dots,T_n be mutually independent with 0\le T_i\le1, let T=\sum_i T_i, and set \mu=\mathbb{E}[T]=\sum_i \mathbb{E}[T_i]. For any c\ge1:
\Pr[T\ge c\mu]\;\le\;e^{-k\mu}\quad \text{where }k=c\ln c-c+1
Proof:
For any \lambda>0, by Markov on e^{\lambda T}:
\Pr[T\ge c\mu]=\Pr[e^{\lambda T}\ge e^{\lambda c\mu}] \le \frac{\mathbb{E}[e^{\lambda T}]}{e^{\lambda c\mu}}
Mutual independence gives \mathbb{E}[e^{\lambda T}]=\prod_i \mathbb{E}[e^{\lambda T_i}].
For x\in[0,1], convexity of e^{\lambda x} yields:
e^{\lambda x}\le (1-x)e^{0}+x e^{\lambda}=1+x\,(e^{\lambda}-1)
With \mu_i=\mathbb{E}[T_i], 1+x \le \exp(x):
\mathbb{E}[e^{\lambda T_i}] \le 1+\mu_i\,(e^{\lambda}-1) \le \exp\!\big(\mu_i (e^{\lambda}-1)\big)
Multiplying,
\mathbb{E}[e^{\lambda T}] \le \exp\!\Big((e^{\lambda}-1)\sum_i \mu_i\Big)=\exp\!\big((e^{\lambda}-1)\mu\big)
Therefore
\Pr[T\ge c\mu]\le \exp\!\big((e^{\lambda}-1)\mu-\lambda c\mu\big)
For \lambda\ge0: f(\lambda)=(e^{\lambda}-1)\mu-\lambda c\mu is minimized at \lambda=\ln c:
\Pr[T\ge c\mu]\le \exp\!\big(-(c\ln c-c+1)\mu\big)=e^{-k\mu}
4.8.6.6 Murphy's law
Let A_1,\dots,A_n be mutually independent events and let T=\sum_{i=1}^n \mathbf{1}_{A_i} be the number that occur. Let \mu=\mathbb{E}[T]=\sum_{i=1}^n \Pr[A_i]:
\Pr[T=0]=\Pr\!\Big[\bigcap_{i=1}^n \overline{A_i}\Big] =\prod_{i=1}^n (1-\Pr[A_i]) \;\le\; \prod_{i=1}^n e^{-\Pr[A_i]} = e^{-\sum_i \Pr[A_i]} = e^{-\mu}
The last inequality uses 1-x\le e^{-x} for all x.
4.8.7 Random walk
We consider a one-dimensional Markov chain (simple random walk with absorbing boundaries) on the finite state space
\{0,1,\dots,w\}, \quad w\ge 2
The process starts at some initial state n\in\{1,\dots,w-1\}. At each step:
- with probability p\in(0,1), the position increases by +1;
- with probability q:=1-p, the position decreases by -1.
We are interested in the absorption probabilities at the two boundaries. Define
\begin{aligned} a_n & = \Pr\!\bigl[\text{hit }w\text{ before }0 \,\big|\, X_0=n\bigr] \\ b_n & = \Pr\!\bigl[\text{hit }0\text{ before }w \,\big|\, X_0=n\bigr] \\ c_n & = \Pr\!\bigl[\text{never hit } \{0,w\} \,\big|\, X_0=n\bigr] \end{aligned}
4.8.7.1 Recurrence
By conditioning on the first step from state n,
a_n = p\,a_{n+1} + q\,a_{n-1}, \quad 1\le n \le w-1 ; \quad a_0=0, a_w=1
4.8.7.1.1 p \neq q
Try a_n=\lambda^n. Substituting yields the characteristic equation
p\lambda^2 - \lambda + q = 0 \implies \begin{cases} \lambda_1 = 1\\ \lambda_2 = \frac{q}{p} \end{cases} \implies \boxed{\,a_n = \frac{1-\left(\tfrac{q}{p}\right)^n}{1-\left(\tfrac{q}{p}\right)^w}\,}
4.8.7.1.2 p = q
The characteristic equation has a repeated root \lambda=1:
\boxed{\,a_n = \frac{n}{w}\,}
4.8.7.2 Destiny
From any state 1\le i \le w-1, there exists a path to an absorbing boundary in at most w-1 steps (e.g., always step left or always step right). The probability of such a path is at least
\rho := \min\{p^{\,w-1},\,q^{\,w-1}\} > 0
Therefore, the probability of not being absorbed within the next w-1 steps is at most 1-\rho. Repeating over independent blocks of length w-1,
\Pr[\text{not absorbed after }k(w-1)\text{ steps}] \le (1-\rho)^k \xrightarrow{k\to\infty} 0
Thus absorption occurs with probability one, so c_n=0:
\boxed{\,b_n=1-a_n\,}
4.8.7.3 Lifespan
Let T be the number of steps until absorption and T_n=\mathbb E[T\mid X_0=n]. By conditioning on the first step,
T_n=1+p\,T_{n+1}+q\,T_{n-1},\qquad 1\le n\le w-1,\quad(q=1-p)
with boundary conditions T_0=0,\;T_w=0.
4.8.7.3.1 p \neq q
Let r = \frac q p; the homogeneous part is T_n^{(h)}=A+B\,r^n.
For the particular part, try a linear form Cn:
Cn=1+pC(n+1)+qC(n-1)=1+Cn+(p-q)C \Rightarrow C=\frac{1}{q-p}
Thus T_n=A+B r^n+\dfrac{n}{q-p}.
Use the boundary conditions T_0 = T_w = 0 to determine A and B:
\boxed{\,T_n=\frac{1}{q-p}\left(n-\;w\,\frac{1-(q/p)^n}{1-(q/p)^w}\right)\,},\qquad (p\neq q).
4.8.7.3.2 p = q
The homogeneous solution is A+Bn. Try a quadratic particular solution Cn^2:
(T_{n+1}-2T_n+T_{n-1})=2C=-2\;\Rightarrow\;C=-1
So T_n=A+Bn-n^2. Apply T_0=0\Rightarrow A=0 and T_w=0\Rightarrow Bw-w^2=0\Rightarrow B=w:
\boxed{\,T_n=n(w-n)\,}
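A quick Monte Carlo check of the closed forms for a_n and T_n in the biased case (illustrative; estimates fluctuate with the number of trials):

```python
import random

def simulate(n, w, p, trials=50_000):
    hits_w = steps = 0
    for _ in range(trials):
        x, t = n, 0
        while 0 < x < w:
            x += 1 if random.random() < p else -1
            t += 1
        hits_w += (x == w)
        steps += t
    return hits_w / trials, steps / trials

p, n, w = 0.4, 3, 10
q = 1 - p
r = q / p
a_n = (1 - r**n) / (1 - r**w)                      # probability of hitting w before 0
T_n = (n - w * (1 - r**n) / (1 - r**w)) / (q - p)  # expected time to absorption
print(simulate(n, w, p))                            # ≈ (0.042, 12.9)
print(a_n, T_n)
```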
4.8.8 Applications
4.8.8.1 Birthday paradox and hash collision
Let N be the number of possible "birthdays" (e.g., N=365), and let k samples be drawn uniformly and independently from \{1,\dots,N\}, then:
\Pr[\text{no-coll}] =\frac{(N)_k}{N^k} =\frac{N!}{(N-k)!\,N^k}
Take logs and apply Stirling's formula:
\begin{aligned} \ln \Pr_{\text{no-coll}} &\approx\big[N\ln N-N+\tfrac12\ln(2\pi N)+\tfrac{1}{12N}\big] \\ &\quad-\big[(N-k)\ln(N-k)-(N-k)+\tfrac12\ln(2\pi(N-k))+\tfrac{1}{12(N-k)}\big] -k\ln N \\ & \approx -\frac{k(k-1)}{2N}\;-\;\frac{k(k-1)(2k-1)}{12N^2} \;+\;\mathcal O\!\left(\frac{k^4}{N^3}\right) \end{aligned}
So, \Pr_{\text{coll}}\approx 1-\exp\!\left(-\frac{k(k-1)}{2N}\right), \;\; k=o (\sqrt N).
For a hash function with range size N, the number of evaluations k after which a collision exists with probability \Pr is approximately:
k \;\approx\; \sqrt{\,2\ln\!\frac{1}{1-\Pr} }\sqrt N
\Pr = 10^{-6} \implies k \approx 0.0014 \sqrt N.
4.8.8.2 Nontransitive dice
Let each die D = \{d_1,\dots,d_m\} be a multiset of integers with uniform distribution.
For two dice D_i, D_j:
P[D_i \succ D_j] = \frac{1}{m^2}\sum_{a=1}^m \sum_{b=1}^m \mathbf{1}[d_{ia} > d_{jb}]
A collection of dice \{D_1, \dots, D_k\} is nontransitive if there exists a cyclic ordering such that:
P[D_1 \succ D_2] > \tfrac{1}{2}, \quad P[D_2 \succ D_3] > \tfrac{1}{2}, \quad \dots, \quad P[D_k \succ D_1] > \tfrac{1}{2}
For k \geq 3, nontransitive sets of dice \{D_i\} exist. Ref: Maximally Nontransitive Dice.
4.8.8.3 Coupon collector problem
To collect all n distinct coupon types, each draw gives a uniformly random type, independent of previous draws. What is the expected number of draws needed to collect all the coupons?
- Think of collecting as stages. After you've already seen k distinct types, there are n-k unseen types left.
- On the next draw, the chance of getting a new type is \frac{n-k}{n}.
- The number of draws until the first success, when each draw succeeds with probability p, is geometric with expectation 1/p. So the expected number of extra draws to go from k types to k+1 types is
\mathbb{E}[X_k]=\frac{1}{(n-k)/n}=\frac{n}{\,n-k\,}
Then:
\mathbb{E}[T] = \sum_{i = 0}^{n - 1} \mathbb E [X_i] = n H_n \sim n\ln n
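A quick empirical check of \mathbb E[T] = nH_n (illustrative):

```python
import random

def coupon_draws(n):
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))   # one uniformly random coupon type
        draws += 1
    return draws

n, trials = 50, 10_000
avg = sum(coupon_draws(n) for _ in range(trials)) / trials
expected = n * sum(1 / i for i in range(1, n + 1))   # n * H_n ≈ 225.0
print(round(avg, 1), round(expected, 1))
```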
4.8.8.4 Penney's game
The original HHT vs HTT problem:
Two players watch a single run of fair coin tosses. A wins if the pattern HHT appears first; B wins if HTT appears first.
Model the process by states defined as "the longest suffix of the tosses so far that is also a prefix of either target pattern".
Let p_0 be A's win probability from the start, and p_H, p_{HH}, p_{HT} from states ending with (H, HH, HT) respectively. The fair-coin transitions give:
- p_0=\tfrac12 p_H+\tfrac12 p_0\Rightarrow p_0=p_H
- p_H=\tfrac12 p_{HH}+\tfrac12 p_{HT}
- p_{HH}=\tfrac12\cdot 1+\tfrac12 p_{HH}\Rightarrow p_{HH}=1
- p_{HT}=\tfrac12\cdot 0+\tfrac12 p_H=\tfrac12 p_H
Solving yields p_H=\tfrac23, hence p_0=\tfrac23.
The length-3 'counter sequence' rule:
In Penney's game, the second player can always pick a length-3 pattern that beats the first player's choice with probability at least 2/3. If the first player chooses S=a_1a_2a_3 (each a_i\in\{H,T\}), the counter is
S^\star=\overline{a_2}a_1a_2
This construction exploits prefix-suffix overlaps: whenever play has lined up the opponent's first two symbols, your counter tends to complete one step sooner.
For patterns of length \geq 3, a winning counter sequence always exists: Optimal Penney Ante Strategy via Correlation Polynomial Identities.
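A small simulation of the HHT vs HTT duel and of the counter-sequence rule (illustrative):

```python
import random

def penney_winner(a, b):
    """Flip a fair coin until pattern a or b appears; return which one came first."""
    run = ""
    while True:
        run += random.choice("HT")
        if run.endswith(a):
            return a
        if run.endswith(b):
            return b

trials = 50_000
wins = sum(penney_winner("HHT", "HTT") == "HHT" for _ in range(trials))
print(wins / trials)                 # ≈ 2/3, matching the analysis above

def counter(s):                      # against S = a1 a2 a3, play (not a2) a1 a2
    return ("T" if s[1] == "H" else "H") + s[:2]

print(counter("HHT"))                # 'THH'
```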