Note of Mathematics for Computer Science

1 Calculus I (MIT 18.01)

1.1 Differentiation

1.1.1 Limit

\lim_{x\to a}f(x) = L \iff \forall \epsilon>0, \exists \delta > 0: \lvert f(x) - L\rvert < \epsilon , \forall x\in(a-\delta, a)\cup(a, a+\delta)

1.1.2 Derivative

The derivative is the slope of the line tangent to the graph of f(x).

The tangent line is the limit of secant lines, as the distance between the two points goes to zero.

\lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x_0+\Delta x) - f(x_0)}{\Delta x} = f'(x_0)

Equivalent notations:

\frac{\mathrm{d}f(x)}{\mathrm{d}x}, \frac {\mathrm{d}}{\mathrm{d}x} f(x), f', Df

1.1.3 Differential notation

\mathrm{d}y = f'(x)\mathrm{d}x \quad (y = f(x))

Both \mathrm{d}y and f'(x)\mathrm{d}x are called differentials.

1.1.4 Continuity

\lim_{x\to x_0} f(x) = f(x_0) \iff f(x) \text{ is continuous at } x_0

Different kinds of discontinuity:

Differentiable implies continuous as:

\begin{aligned} & \text{continuity at } a: \lim_{\Delta x \to 0} f(a+\Delta x) = f(a) \iff \lim_{\Delta x \to 0} \Delta y = 0, \quad \Delta y = f(a+\Delta x) - f(a) \\ & \lim_{\Delta x \to 0} \Delta y = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} \Delta x = \left(\lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}\right) \left(\lim_{\Delta x \to 0} \Delta x\right) = \frac{\mathrm{d}y}{\mathrm{d}x} \cdot 0 = 0 \end{aligned}

1.1.5 Computations of derivatives

basic:

\begin{aligned} \mathrm{d}Cu & = C \mathrm{d}u \\ \mathrm{d}(u+v) &= \mathrm{d}u + \mathrm{d}v \\ \mathrm{d}(uv) &= u \mathrm{d}v + v \mathrm{d}u \\ \mathrm{d}\frac uv &=\frac{v \mathrm{d}u-u\mathrm{d}v}{v^2} \end{aligned}

chain:

\frac{\mathrm{d}y}{\mathrm{d}x}=\frac{\mathrm{d}y}{\mathrm{d}u}\cdot\frac{\mathrm{d}u}{\mathrm{d}x} \quad y=f(u) \text{ and } u = g(x)

or in the form:

[f(g(x))]' = (f\circ g)'(x) = f'(g(x)) \cdot g'(x)

1.1.6 Higher derivative notation

\begin{aligned} f^{''}(x) & \quad y^{''} & \quad \frac{\mathrm{d}^2y}{\mathrm{d} x^2} & \quad \frac {\mathrm{d}^2}{\mathrm{d} x ^ 2} f(x) \\ f^{(n)}(x) & \quad y^{n} & \quad \frac{\mathrm{d}^ny}{\mathrm{d} x^n} & \quad \frac {\mathrm{d}^n}{\mathrm{d} x ^ n} f(x) \end{aligned}

1.1.7 Implicit differentiation

Use the chain rule and treat the dependent variable as an implicit function of x. Example: finding \frac{\mathrm d}{\mathrm dx}\tan^{-1}x from \tan y = x.

Logarithmic differentiation:

\frac{f'(t)}{f(t)} = \frac{\mathrm{d}}{\mathrm{d}t}\ln (f(t))

1.1.8 Derivative table

| Func | Derivative | Domain |
| --- | --- | --- |
| C | 0 | x \in \mathbb{R} |
| x^n | nx^{n-1} | n \in \mathbb{R} |
| a^x | \ln a\cdot a^x | a > 0, a\neq 1 |
| \log_ax | \frac 1 {\ln a} \cdot\frac{1}{x} | a > 0, a\neq 1 |
| \sin x | \cos x | x \in \mathbb{R} |
| \cos x | -\sin x | x \in \mathbb{R} |
| \tan x | \frac{1}{\cos^2x} = \sec^2x | x\neq \frac{\pi}{2}+k\pi, k\in\mathbb{Z} |
| \sec x | \sec x \tan x | x\neq \frac{\pi}{2}+k\pi, k\in\mathbb{Z} |
| \sin^{-1}x | \frac{1}{\sqrt{1-x^2}} | x\in(-1,1) |
| \tan^{-1}x | \frac{1}{1+x^2} | x \in \mathbb{R} |

1.1.9 L'Hospital's rule

If f(a) = g(a) = 0, f and g are differentiable at x=a, and g'(a) \neq 0, then:

\lim_{x\to a}\frac{f(x)}{g(x)} = \lim_{x\to a} \frac{\frac{f(x)}{x-a}}{\frac{g(x)}{x-a}} = \frac{\lim_{x\to a}\frac{f(x) - f(a)}{x-a}}{\lim_{x\to a}\frac{g(x) - g(a)}{x-a}} =\frac{f'(a)}{g'(a)}

L'Hospital's rule also applies to limits of the form \frac{\infty}{\infty} and to limits as x\to\pm\infty.

1.2 Applications of Differentiation

1.2.1 Linear approximation

f(x) \approx f(x_0) + f'(x_0)(x-x_0), \quad x\approx x_0

1.2.2 Newton's method

To solve f(x)=0:

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}

The mathematical theory providing conditions under which Newton's method is guaranteed to succeed can be found in numerical analysis.
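A minimal Python sketch of the iteration (my own illustration, not from the course), using the hypothetical example f(x) = x^2 - 2 so the root is \sqrt 2:

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: iterate x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: solve x^2 - 2 = 0, i.e. approximate sqrt(2).
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)  # ~1.4142135623730951
```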

1.2.3 Mean value theorem

f(x) \text{ continuous on } [a, b] \land f(x) \text{ differentiable on } (a, b) \implies \exists c \in (a, b): f'(c)=\frac{f(b) - f(a)}{b-a}

1.2.4 Anti-derivative table

| Func | Anti-derivative | Domain |
| --- | --- | --- |
| x^n | \frac{x^{n+1}}{n+1}+c | n \neq -1 |
| \frac 1x | \ln\lvert x\rvert+c | x\neq0 |
| \sin x | -\cos x +c | x \in \mathbb{R} |
| \tan x | -\ln\lvert\cos x\rvert + c | x\neq \frac{\pi}{2}+k\pi, k\in\mathbb{Z} |
| \sec x | \ln(\tan x+\sec x) +c | x\in(-\frac{\pi}{2}+2k\pi, \frac{\pi}{2}+2k\pi),k\in\mathbb{Z} |
| \frac{1}{\cos^2x} = \sec^2x | \tan x+c | x\neq \frac{\pi}{2}+k\pi, k\in\mathbb{Z} |
| \frac{1}{\sqrt{1-x^2}} | \sin^{-1}x +c | x\in(-1,1) |
| \frac{1}{1+x^2} | \tan^{-1}x +c | x \in \mathbb{R} |

1.2.5 Differential in economics

Economics law; let C(x) be the total cost function, so \frac{C(x)}{x} is the average cost:

\frac{\mathrm{d}}{\mathrm{d}x}\frac{C(x)}{x} = 0 \iff C'(x)=\frac{C(x)}{x}

That is, when average cost reaches its minimum, marginal cost = average cost.

And, if profit reaches its maximum, then marginal revenue = marginal cost.

Elasticity of demand, where x = x(p) is the quantity demanded at price p: E(p)=-\frac p x \frac{\mathrm{d}x}{\mathrm{d}p}.

1.3 Integration

1.3.1 Riemann sum

\sum_{i=1}^{n}f(c_i)\Delta x, \quad c_i \in [x_{i - 1}, x_i], \ x_i = a + \frac{b-a} n i, \ \Delta x = \frac{b-a}{n}

1.3.2 Definite integral

\lim_{n\to \infty}\sum_{i=1}^{n}f(c_i)\Delta x = \int^b_af(x) \mathrm d x
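As an illustration (my own sketch), a midpoint Riemann sum converging to \int_0^1 x^2 \mathrm d x = \frac 1 3:

```python
def riemann_sum(f, a, b, n):
    """Midpoint Riemann sum: sum f(c_i) * dx with c_i the midpoint of each slice."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(n))

for n in (10, 100, 1000):
    print(n, riemann_sum(lambda x: x * x, 0.0, 1.0, n))  # -> 1/3 as n grows
```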

1.3.3 First fundamental theorem of calculus

F'(x) = f(x) \land f(x) \text{ is continuous} \implies \int^b_af(x) \mathrm d x=F(b)-F(a)

1.3.4 Second fundamental theorem of calculus

F(x) = \int_a^x f(t)\mathrm d t \land f(x) \text{ is continuous} \implies F'(x) = f(x)

1.3.5 Substitution

Given f(x) = g(u(x)):

\int g(u) \mathrm{d} u = \int g(u(x))u'(x)\mathrm{d}x = \int f(x)u'(x) \mathrm{d}x

For definite:

\int_{x_1}^{x_2} f(x)u'(x) \mathrm{d}x = \int_{u(x_1)}^{u(x_2)} g(u) \mathrm{d} u

1.3.6 Trigonometric function integration

\int \sin^n(x)\cos^m(x) \mathrm d x

  1. Either n or m is odd: peel one factor off the odd power and substitute u for the other function (e.g., if n is odd, u = \cos x).
  2. n and m are both even: use the double-angle formulas until case 1 applies.

\int \sec^n(x)\tan^m(x) \mathrm d x

  1. n is even, u = \tan x.
  2. m is odd, u = \sec x.
  3. n is odd, m is even: a. use \tan^2 x = \sec^2 x - 1 to convert everything to powers of \sec x; b. use the result for \int \sec ^n x \mathrm d x.

1.3.7 Partial fractions and integration by parts

\frac{P(x)}{Q(x)} can be split into partial fractions in a systematic way.

Integration by parts:

(uv)' = u'v+uv' \iff uv'= (uv)' - u'v \iff \int uv' \mathrm d x = uv - \int u'v \mathrm d x

Examples:

  1. \int (\ln x)^n \mathrm d x
  2. \int x^n \mathrm e ^ x \mathrm d x
  3. \int \sin x \mathrm e ^ x \mathrm d x
  4. \int \cos x \mathrm e ^ x \mathrm d x
  5. \int \sec ^n x \mathrm d x

1.3.8 Improper integral

\int_{a}^{\infty} f(x) \mathrm d x = \lim_{b\to\infty}\int_{a}^{b} f(x) \mathrm d x = \lim_{b\to\infty}\left(F(b)-F(a)\right)

is said to converge if the limit exists.

1.4 Applications of Integral

1.4.1 Area between 2 curves

A = \int_a^b(f(x)-g(x))\mathrm d x

1.4.2 Volume by slice method

To get the volume of the region under y = f(x) revolved about the x-axis:

\mathrm d V = \pi f(x)^2 \mathrm d x

1.4.3 Volume by disk method

To get the volume of the region bounded by x = x(y) revolved about the y-axis (disks perpendicular to the y-axis):

\mathrm d V = \pi x^2 \mathrm dy

1.4.4 Volume by cylinder method

To get the volume of the region under y = f(x) revolved about the y-axis (cylindrical shells):

\mathrm d V = 2\pi x y \mathrm dx

1.4.5 Weighted average

\frac{\int_a^bf(x)w(x)\mathrm d x}{\int_a^bw(x)\mathrm d x}

1.4.6 Simpson's method

For numerical integration:

\int_{a}^{b} f(x) \mathrm d x = \frac{b - a}{6}[f(a) + 4f(\frac{a+b}{2}) + f(b)] + \mathcal{O}((b-a)^5)

For the \mathcal{O}((b-a)^5) single-interval error term (the composite rule has error \mathcal{O}(h^4)): Formulas for the Error in Simpson's Rule.
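A hedged Python sketch of the composite rule built from the formula above; the function name and the test integrand are my own choices:

```python
import math

def simpson(f, a, b, n=100):
    """Composite Simpson's rule: apply (h/6)[f(l) + 4 f(mid) + f(r)] on n subintervals."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        l = a + i * h
        r = l + h
        total += (h / 6.0) * (f(l) + 4.0 * f((l + r) / 2.0) + f(r))
    return total

print(simpson(math.sin, 0.0, math.pi))  # ~2.0, the exact value of the integral
```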

1.4.7 Arc length

L = \int \mathrm d s =\int_{x_0}^{x_1}\sqrt{1+(\frac{dy}{dx})^2}\mathrm d x

1.4.8 Surface area of a function revolved about x-axis

S = \int 2\pi y \mathrm d s

1.4.9 Area in polar coordinates

S = \int^{\theta_2}_{\theta_1}\frac 1 2 r^2 \mathrm d \theta

1.5 Series

1.5.1 Sequence

A sequence is said to be bounded if there are 2 numbers A, B such that A\leq x_n \leq B for every n.

A sequence is said to be convergent if it has a limit L, i.e., \forall \epsilon>0, \exists n_0: |x_n-L|<\epsilon, \forall n>n_0.

\mathrm{convergent} \implies \mathrm{bounded}

An increasing sequence converges if and only if it is bounded.

1.5.2 Series

s_n = \sum_{i=1}^{n} a_i

is the n-th partial sum of the infinite series (or simply series) \sum_i a_i; the a_i are called terms.

The series converges if and only if the sequence \{s_n\} of partial sums converges.

Geometric series:

\begin{aligned} a_n & = x ^n \\ s_n & = \frac{1-x^{n+1}}{1-x} \end{aligned}

It converges for |x| < 1.

1.5.3 Ratio test and root test

Ratio test: if \sum a_n is a series of positive terms and L = \lim_{n\to \infty}\frac {a_{n+1}}{a_n} exists, then the series converges for L < 1, diverges for L > 1, and the test is inconclusive for L = 1.

Root test: comparing with the geometric series (which converges for r\in (0,1)), if L = \lim_{n\to \infty} (a_n)^{\frac 1 n} exists, then the series converges for L < 1, diverges for L > 1, and the test is inconclusive for L = 1.

1.5.4 Alternating series test

For \sum_{n=1}^{\infty}(-1)^{n+1}a_n with a_n > 0: if a_n is decreasing and \lim_{n\to\infty}a_n = 0, the series converges.

A series \sum a_n is said to be absolutely convergent if \sum |a_n| converges.

\text{absolute convergence} \to \text{convergence}

1.5.5 Power series

\sum_{n=0}^{\infty}a_nx^n

where the a_n are constants and x is a variable. The geometric series is the simplest power series.

Every power series \sum a_nx^n has a radius of convergence R, where 0 \leq R \leq \infty, with the property that the series converges absolutely if |x| < R and diverges if |x| > R:

R = \lim_{n\to\infty}|\frac{a_n}{a_{n+1}}|

Within the radius of convergence, the power series is continuous, differentiable, and integrable (term by term).

1.5.6 Integral comparison

Consider a positive, non-increasing function f(x) > 0:

Let s_n=\sum_{i=1}^n f(i), I_n=\int_{1}^{n} f(x)\mathrm dx:

As decreasing:

f(i+1) \leq f(x) \leq f(i), x\in[i, i+1] \implies \sum_{i=2}^{n} f(i)\ \leq \ \int_{1}^{n} f(x)\,dx\ \leq \ \sum_{i=1}^{n-1} f(i)

Equivalently:

s_n - f(1) \leq I_n \leq s_{n-1} \iff f(n) \leq s_n - I_n \leq f(1)

Similarly, for an increasing function f:

f(1) \leq s_n - I_n \leq f(n)

So, for s_n = \sum_i f(i) and \int_1^\infty f(x)\mathrm d x, either both converge or both diverge.

1.5.7 Euler's constant

As in the integral comparison above, let F(n) = \sum_{i=1}^n f(i) - \int_1^nf(x) \mathrm d x; then 0 \leq F(n) \leq f(1), F(n) is decreasing, and L = \lim_{n\to\infty}F(n) exists. The application is:

\text{Euler's constant } \gamma = \lim_{n\to\infty} (\sum_{i=1}^{n} \frac 1 i - \int_{1}^{n}\frac 1 x \mathrm d x)
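A quick numerical check of this limit (my own sketch):

```python
import math

def euler_gamma_estimate(n):
    """F(n) = sum_{i=1}^{n} 1/i - integral_1^n dx/x = H_n - ln(n)."""
    harmonic = sum(1.0 / i for i in range(1, n + 1))
    return harmonic - math.log(n)

for n in (10, 1000, 100000):
    print(n, euler_gamma_estimate(n))  # approaches gamma ~ 0.5772156649
```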

1.5.8 Taylor's series

At x=0:

f(x)=\sum_{i=0}^{n}\frac{f^{(i)}(0)}{i!}x^i +R_n(x)

Or x=x_0:

f(x)=\sum_{i=0}^{\infty}\frac{f^{(i)}(x_0)}{i!}(x-x_0)^i

The Lagrange (derivative) form of the remainder in Taylor's formula:

R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}x^{n+1}, c \in(0,x)

The Taylor series converges to f(x) if and only if:

\lim_{n\to\infty}R_n(x)=0

Comparing the Taylor series of \sin x, \cos x, and \mathrm e^x gives Euler's formula:

\mathrm e ^ {ix}=\cos x + i \sin x \implies \mathrm e ^ {i\pi} + 1 = 0

1.6 Misc

1.6.1 \mathrm{e}

\frac {\mathrm{d}} {\mathrm{d} x}a^x = a^x \to a=\mathrm{e}\iff \lim_{\Delta x \to0}\frac{\mathrm{e}^{\Delta x}-1}{\Delta x} = 1 \iff \mathrm{e}=\lim_{n\to \infty} (1+\frac 1 n)^n, n=\frac 1 {\Delta x}

It is easy to prove that f(x)=c\,\mathrm{e}^x are the only functions satisfying f'(x)=f(x) (the trivial f(x)=0 is the case c=0).

1.6.2 \int_{-\infty}^{\infty}e^{-x^2}\mathrm d x

Compute the volume V under z = \mathrm e^{-r^2}, r = \sqrt{x^2+y^2}, in two ways:

  1. Revolving about the z-axis (shells): V = \int_0^\infty 2\pi r\, \mathrm e^{-r^2}\mathrm d r = \pi.
  2. Slicing at fixed y: V = (\int_{-\infty}^{\infty}\mathrm e^{-x^2}\mathrm d x)^2.

\implies \int_{-\infty}^{\infty}\mathrm e^{-x^2}\mathrm d x = \sqrt{\pi}

1.6.3 Maximum overhang

Paterson et al. (2007)

1.6.4 Real number system

The axiomatic approach (I prefer this quora answer):

They are an Abelian Group under addition:

Excluding zero they are an Abelian Group under multiplication:

Multiplication distributes over addition:

They are totally ordered:

They are Dedekind-complete (have the least-upper-bound property):

It turns out that any two structures satisfying these axioms are isomorphic, so we call any such structure the real numbers, (\mathbb{R},+,\cdot ). In short, \mathbb{R} is the Dedekind-complete ordered field.

2 Calculus II (MIT 18.02)

2.1 Vectors and Matrices

2.1.1 Product

\begin{aligned} \vec A \cdot \vec B & = \sum a_ib_i = |\vec A||\vec B| \cos \theta \\ \vec A \times \vec B & = \begin{vmatrix} \vec i & \vec j & \vec k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = \vec i \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix} - \vec j \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix} + \vec k \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} \\ |\vec A \times \vec B| & = \text{area of space parallelogram} = |\vec A||\vec B|\sin \theta\\ \frac{\vec A \times \vec B}{|\vec A \times \vec B|} &= \hat n\text{ perpendicular to plane formed by } \vec A \text{ and } \vec B \\ \end{aligned}

\frac{\mathrm d}{\mathrm dt}\bigl(\vec{A} \times \vec{B}\bigr) = \frac{\mathrm d\vec{A}}{\mathrm dt} \times \vec{B} \;+\; \vec{A} \times \frac{\mathrm d\vec{B}}{\mathrm dt}

2.1.2 Determinant

\begin{aligned} |\det(\vec A, \vec B)| & = \text {area of parallelogram} \\ |\det(\vec A, \vec B, \vec C)| & = \text {volume of parallelepiped} \end{aligned}

2.1.3 Inverse matrix

\mathbf{A}^{-1} = \frac{1}{\det({\mathbf{A}})}\text{adj}(\mathbf{A})

2.2 Partial Derivatives

2.2.1 Partial derivatives

Treat all other variables as constants:

f_x = \frac{\partial f}{\partial x} = \lim_{\Delta x \to 0} \frac{f(x_0+\Delta x, y_0) - f(x_0, y_0)}{\Delta x}

2.2.2 Least square

To find the best-fit line y = ax + b for x_i, y_i:

\text{min } [ D_{a, b} = \sum_i (y_i - (ax_i + b)) ^ 2 ] \iff \frac{\partial D}{\partial a} = \frac {\partial D}{\partial b} = 0
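A short sketch of solving \frac{\partial D}{\partial a} = \frac{\partial D}{\partial b} = 0 through the equivalent normal equations; it assumes NumPy, and the data points are made up:

```python
import numpy as np

# Made-up data points (x_i, y_i) roughly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Setting dD/da = dD/db = 0 gives the normal equations A^T A [a, b]^T = A^T y,
# where each row of A is [x_i, 1].
A = np.column_stack([x, np.ones_like(x)])
a, b = np.linalg.solve(A.T @ A, A.T @ y)
print(a, b)  # best-fit slope and intercept
```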

2.2.3 Min/max

Look at the second derivatives f_{xx} = \frac{\partial^2 f}{\partial x^2}, f_{xy}, f_{yx}, f_{yy}; fact (Clairaut's theorem): f_{xy} = f_{yx}.

Given f and critical point (x_0, y_0), let A = f_{xx}(x_0, y_0), B = f_{xy}(x_0, y_0), C = f_{yy}(x_0, y_0), then:

  1. AC - B^2 > 0, A > 0: local minimum.
  2. AC - B^2 > 0, A < 0: local maximum.
  3. AC - B^2 < 0: saddle point.
  4. AC - B^2 = 0: the test is inconclusive.

2.2.4 Taylor's theorem

For f:\mathbb{R}^n\to\mathbb{R} that is k-times continuously differentiable at \mathbf{x_0} \in \mathbb{R}^n, there exist functions h_\alpha:\mathbb{R}^n\to\mathbb{R}, where |\alpha| = k, such that:

f(\mathbf{x}) = \sum_{|\alpha| \leq k} \frac {D^\alpha f(\mathbf x_0)}{\alpha!}(\mathbf{x} - \mathbf{x_0})^\alpha + \sum_{|\alpha| = k}h_\alpha(\mathbf x)(\mathbf{x} - \mathbf{x_0})^\alpha \\ \text{ and } \lim_{\mathbf{x}\to\mathbf{x_0}}h_\alpha(\mathbf{x}) = 0

2.2.5 Total differential

For f = f(x, y, z):

\mathrm d f = f_x \mathrm d x + f_y \mathrm d y + f_z \mathrm d z

2.2.6 Chain rule

For f = f(x, y, z), x = x(t), y = y(t), z = z(t):

\frac{\mathrm d f}{\mathrm d t} = f_x \frac{\mathrm d x}{\mathrm d t} + f_y \frac{\mathrm d y}{\mathrm d t} + f_z \frac{\mathrm d z}{\mathrm d t}

For w = f(x, y), x = x(u, v), y = y(u, v):

\begin{aligned} \mathrm d w & = f_x \mathrm d x + f_y \mathrm d y = f_x (x_u \mathrm d u + x_v \mathrm d v) + f_y (y_u \mathrm d u + y_v \mathrm d v) \\ \implies \frac{\partial f}{\partial u} & = f_x x_u + f_y y_u = \frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u} \\ \end{aligned}

2.2.7 Gradient vector

For w = w(x, y, z), the gradient vector is perpendicular to the level surfaces w = c:

\nabla w = \langle w_x, w_y, w_z \rangle

2.2.8 Rate of change in an arbitrary direction

The rate of change of w as we move in the direction of a unit vector \hat{u}:

\left.\frac{\mathrm d w}{\mathrm d s}\right|_{\hat{u}} = \nabla w \cdot \hat{u} = \left|\nabla w\right|\cos \theta

2.2.9 Lagrange multipliers

To get min/max of f = f(x, y, z) given g(x, y, z) = c:

\nabla f = \lambda \nabla g

2.2.10 Partial derivative with non-independent variables

If f = f(x, y, z), g(x, y, z) = c, ask for (\frac{\partial f}{\partial z})_{y}:

\begin{aligned} & \mathrm d g = 0 \to g_x\mathrm d x + g_y\mathrm d y + g_z \mathrm d z = 0 \\ & \mathrm d y = 0 \to g_x\mathrm d x + g_z \mathrm d z = 0 \iff dx = -\frac{g_z \mathrm d z}{g_x} \\ & \begin{aligned} \mathrm d f &= f_x\mathrm d x + f_y\mathrm d y + f_z \mathrm d z \\ & = f_x\mathrm d x + f_z \mathrm d z \\ & = f_x(-\frac{g_z \mathrm d z}{g_x}) + f_z \mathrm d z \\ & = \frac{f_zg_x-f_xg_z}{g_x}\mathrm d z \end{aligned} \\ & \implies (\frac{\partial f}{\partial z})_{y} = \frac{g_xf_z-f_xg_z}{g_x} \end{aligned}

2.3 Double Integral in Plane

2.3.1 Double integral

For the volume below z = f(x, y) over plane region R:

\iint _{R}f(x, y)\mathrm d A = \int_{x_{\min}}^{x_{\max}}\int_{y_{\min}(x)}^{y_{\max}(x)} f(x, y) \mathrm dy \mathrm d x

2.3.2 Polar coordinates

\mathrm d A = r\mathrm d r \mathrm d \theta

2.3.3 Area of region

\iint_R1\mathrm d A

2.3.4 Center of mass

With density function \delta(x, y):

\begin{aligned} \overline x = \frac{1}{\iint_R \delta \mathrm d A} \iint_R x\delta \mathrm d A \\ \overline y = \frac{1}{\iint_R \delta \mathrm d A} \iint_R y\delta \mathrm d A \end{aligned}

2.3.5 Rotational moment of inertia

I = \iint_R (\text{distance to axis})^2\delta \mathrm d A

2.3.6 Jacobian matrix

For u=u(x, y), v = v(x, y):

\mathbf{J} = \frac{\partial(u, v)}{\partial(x, y)}= \begin{vmatrix} u_x & u_y \\ v_x & v_y \end{vmatrix}

More generally for \mathbf{f}:\mathbb{R}^n\rightarrow\mathbb{R}^m: \mathbf{x}\in\mathbb{R}^n, \mathbf{f(x)} \in \mathbb{R}^m:

\mathbf{J}_{ij} = \frac{\partial f_i}{\partial x_j}

2.3.7 Change of variables

For u=u(x, y), v = v(x, y):

\mathrm d u \mathrm d v = \left|\mathbf{J}\right| \mathrm d x \mathrm d y

2.4 Line Integral in Plane

2.4.1 Vector fields

\vec{F} = M \mathbf{\hat{i}} + N \mathbf{\hat{j}}, M = M(x,y), N = N(x, y)

2.4.2 Line integral

Work is force times distance; for a small motion \Delta \vec{r}, \Delta W = \vec{F}\cdot \Delta \vec{r}, so the total work is:

\begin{aligned} W &= \lim_{\Delta \vec{r}\to0}\sum_i{\vec F \cdot \Delta \vec{r_i}}\\ &= \int_C\vec{F}\cdot \mathrm d \vec{r} \\ &= \int_C M \mathrm d x + N \mathrm d y \\ &= \int_C \vec F \cdot \mathbf{\hat T} \mathrm d s \end{aligned}

2.4.3 Fundamental theorem of calculus for line integrals

\vec F = \nabla f is called gradient field, f is called potential function.

\int_C \nabla f \cdot \mathrm d \vec r = f(P_1) - f(P_0), C \text{ is curve from } P_0 \text{ to } P_1 \text{ .}

Consequences:

  1. Path indenpendence
  2. Conservativeness

N_x = M_y {\iff}^* \vec F \text{ is a gradient field } \iff \vec F \text{ is conservative: } \oint_C \vec F \cdot \mathrm d \vec r = 0 \text{ for any closed curve}

* only holds if \vec F is defined everywhere or in a simply-connected region, see below.

2.4.4 Curl

curl(\vec F) = N_x - M_y

2.4.5 Green's theorem

\begin{aligned} \oint_C \vec F \cdot \mathrm d \vec r = \iint_R curl(\vec F) \mathrm d A \iff & \oint_C M \mathrm d x + N \mathrm d y = \iint_R (N_x - M_y)\mathrm d A \\ \iff & \oint_C M \mathrm d x = \iint_R - M_y \mathrm d A \land \oint_C N \mathrm d y = \iint_R N_x \mathrm d A \end{aligned}

2.4.6 Flux

\begin{aligned} F &= \lim_{\Delta \vec{r}\to0}\sum_i{(\vec F \cdot \mathbf{\hat n}) \Delta \vec{s_i}} \\ &= \int_C \vec F \cdot \mathbf{\hat n} \mathrm d s \\ &=\int_C \vec F \cdot \langle \mathrm d y, -\mathrm d x \rangle \\ &= \int_C -N\mathrm d x + M \mathrm d y \end{aligned}

2.4.7 Div

div(\vec F) = M_x + N_y

2.4.8 Green's theorem for flux

\oint_C \vec F \cdot \mathbf{\hat n} \mathrm d s = \iint_R div(\vec F) \mathrm d A \iff \oint_C -N\mathrm d x + M \mathrm d y = \iint_R M_x + N_y\mathrm d A

A region R is simply connected if, for any closed curve in R, the interior of the curve is entirely contained in R.

2.5 Triple Integrals and Surface Integrals in 3D-Space

2.5.1 Triple integrals

\iiint_R f \mathrm d V

Applications are similar to 2-space integrals: mass, center of mass, inertia, etc.

2.5.2 Cylindrical and spherical coordinates

Cylindrical coordinates (r, \theta, z)

\begin{aligned} x &= r \cos \theta \\ y &= r \sin \theta \end{aligned}

\mathrm d V = r\mathrm dr\mathrm d \theta \mathrm d z

Spherical coordinates (\rho, \phi, \theta):

\begin{array}{l} \begin{aligned} \rho &= \text{distance to origin} \\ \phi &= \text{angle down from } z \\ \theta &= \text{same as cylindrical} \end{aligned} \end{array} \implies \begin{array}{l} \begin{aligned} z &= \rho \cos \phi \\ r &= \rho \sin \phi \end{aligned} \end{array}

\begin{aligned} \mathrm d S & = \rho^2\sin\phi \mathrm d \phi \mathrm d \theta \quad (\text{on a sphere } \rho = \text{const})\\ \mathrm d V & = \rho^2\sin\phi\mathrm d \rho \mathrm d \phi \mathrm d \theta \\ \end{aligned}

2.5.3 Vector fields in 3D space

\vec{F} = P \mathbf{\hat{i}} + Q \mathbf{\hat{j}} + R \mathbf{\hat{k}}, P = P(x,y, z), Q = Q(x, y, z), R = R(x, y, z)

2.5.4 Flux in 3D space

F = \iint_S \vec{F}\cdot\mathbf{\hat{n}}\mathrm d S

Notation:

\mathbf{\hat{n}}\mathrm d S = \mathrm d \vec{S}

For the surface z = f(x, y):

\mathbf{\hat{n}}\mathrm d S = \pm \langle -f_x, -f_y, 1 \rangle \mathrm d x \mathrm d y

For a parametric surface x = x(u, v), y = y(u, v), z = z(u, v):

\mathbf{\hat{n}}\mathrm d S = \pm \langle \frac{\partial x}{\partial u}, \frac{\partial y}{\partial u} , \frac{\partial z}{\partial u} \rangle \times \langle \frac{\partial x}{\partial v}, \frac{\partial y}{\partial v}, \frac{\partial z}{\partial v} \rangle \mathrm d u \mathrm d v = \pm \frac{\partial\vec{r}}{\partial u}\times\frac{\partial\vec{r}}{\partial v} \mathrm d u \mathrm d v

For a level surface w(x, y, z) = c:

\begin{aligned} \text{Normal vector: } & \mathbf{N} = \nabla w \\ \text{Unit normal vector: } & \mathbf{\hat n} = \pm \frac{\mathbf N}{|\mathbf N|} \\ \text{Project to x-y plane (for instance): } & \Delta A = \cos \alpha \Delta S = \frac{\mathbf{\hat n}\cdot \mathbf{\hat{k}}}{|\mathbf{\hat n}| |\mathbf{\hat{k}}|} \Delta S =\mathbf{\hat n}\cdot \mathbf{\hat{k}} \Delta S \\ \implies & \mathrm dS = \frac{1}{\mathbf{\hat n}\cdot \mathbf{\hat{k}}}\mathrm d x \mathrm d y \\ \implies & \mathrm dS = \pm \frac{|\mathbf N|}{\mathbf N \cdot \mathbf{\hat{k}}} \mathrm d x \mathrm d y \\ \implies & \mathbf{\hat{n}}\mathrm d S = \pm \frac{\nabla w}{\nabla w \cdot \hat k} \mathrm d x \mathrm d y \end{aligned}

2.5.5 Divergence theorem

For \vec F = P\vec i + Q \vec j + R \vec k:

div(\vec{F}) = P_x + Q_y + R_z = \nabla \cdot \vec F

For a closed surface S bounding region R, with normal pointing outwards, and \vec F vector field defined and differentiable over all of R:

\oiint_S \vec{F}\cdot\mathbf{\hat{n}}\mathrm d S = \iiint_R \nabla \cdot \vec F \mathrm d V

2.5.6 Diffusion equation

For motion of smoke in (immobile) air, where u(x, y, z, t) is concentration of smoke, \mathbf{F} is flow of smoke:

\begin{aligned} & \mathbf{F} = -k \nabla u \\ & \iint_S \mathbf{F}\cdot \mathbf{\hat{n}} \mathrm d S = -\frac{\mathrm d}{\mathrm d t}\iiint_Ru\mathrm d V \\ \implies & \frac{\partial u}{\partial t} = - div \mathbf{F} = k \nabla^2 u = k(\frac{\partial ^ 2 u}{\partial x ^ 2} + \frac{\partial ^ 2 u}{\partial y ^ 2} + \frac{\partial ^ 2 u}{\partial z ^ 2}) \end{aligned}

2.5.7 Curl in 3D space

curl(\vec F) = \nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{\hat i} & \mathbf{\hat j} & \mathbf{\hat k} \\ \frac \partial {\partial x} & \frac \partial {\partial y} & \frac \partial {\partial z} \\ P & Q & R \end{vmatrix} = (R_y- Q_z)\mathbf{\hat i} + (P_z - R_x)\mathbf{\hat j} + (Q_x - P_y) \mathbf{\hat k}

2.5.8 Stokes theorem

For a closed curve C and any surface S bounded by C (with compatible orientations):

\oint_C \mathbf{F}\cdot \mathrm d \vec{r} = \iint_S (\nabla \times \mathbf{F})\cdot \mathbf{\hat n} \mathrm d S

3 Linear Algebra (MIT 18.06)

3.1 Vectors

3.1.1 Notation

Two separate numbers v_1 and v_2 produce a two-dimensional vector \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}; it can be written as \mathbf{v} = (v_1, v_2).

3.1.2 Dot product

\mathbf{v}\cdot \mathbf{w} = \sum_i v_iw_i

3.1.3 Length

\lVert \mathbf{v}\rVert = \sqrt{\mathbf{v}\cdot\mathbf{v}}

3.1.4 Angle

\cos \theta = \frac{\mathbf{v}\cdot\mathbf{w}}{\lVert\mathbf{v}\rVert \lVert\mathbf{w}\rVert}

3.2 Matrices

3.2.1 Multiplication

\begin{aligned} \mathbf C &= \mathbf{AB} \\ \mathbf C_{ij} &= \sum_{k} \mathbf{A}_{ik}\mathbf B_{kj} \end{aligned}

Fundamental law of matrix multiplication:

(\mathbf{AB})\mathbf C = \mathbf{A} (\mathbf{BC})

3.2.2 Matrix elementary transformations

  1. Row interchange, R_i \leftrightarrow R_j ;
  2. Row scale, R_i \rightarrow kR_i ;
  3. Row addition, R_i \rightarrow R_i + kR_j .

3.2.3 Gauss elimination (for \mathbf{Ax} = \mathbf b)

Use the elementary transformations to make the matrix an upper triangular matrix. Each single step of the transformation can be expressed as one matrix, \Pi_{i} \mathbf{E_i} \cdot \mathbf A = \mathbf U.

\mathbf E \left[ \mathbf{A}, \mathbf b \right] = \left[ \mathbf {U}, \mathbf{Eb} \right] \iff \mathbf{Ux} = \mathbf{Eb}

3.2.4 Gauss-Jordan elimination (for \mathbf{A^{-1}})

Use the elementary transformations to make the matrix an identity matrix. Again, the transformation can be expressed as one matrix, \Pi_{i} \mathbf{E_i} \cdot \mathbf A = \mathbf I \iff \Pi_{i} \mathbf{E_i} = \mathbf{A^{-1}}.

\mathbf E \left[ \mathbf{A}, \mathbf I \right] = \left[ \mathbf{I}, \mathbf A^{-1} \right]
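A minimal Gauss-Jordan sketch in plain Python (my own illustration; it assumes \mathbf A is invertible and adds partial pivoting for stability):

```python
def gauss_jordan_inverse(A):
    """Row-reduce the augmented matrix [A | I] to [I | A^{-1}]."""
    n = len(A)
    # Build the augmented matrix [A | I].
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Row interchange: pick the largest pivot in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Row scale: make the pivot 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Row addition: clear the column above and below the pivot.
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

print(gauss_jordan_inverse([[2, 1], [1, 1]]))  # [[1, -1], [-1, 2]]
```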

3.2.5 \mathbf {PA} = \mathbf{LU} factorization

All non-singular square matrices can be factorized in the form \mathbf {PA} = \mathbf{LU}, where \mathbf P is a permutation matrix.

3.3 Vector Space and Subspace

3.3.1 Space of vectors

The vector space \mathbb{R}^n consists of all column vectors \mathbf{v} with n components.

Linear combinations of vectors of space \mathbf S must be in \mathbf S.

3.3.2 Subspaces

A subspace of a vector space is a set of vectors (including \mathbf 0) that satisfies two requirements: if \mathbf{v} and \mathbf{w} are vectors in the subspace and c is any scalar, then \mathbf{v} + \mathbf{w} and c\mathbf{v} are also in the subspace.

A subspace of \mathbf R ^n is a vector space inside \mathbf R ^n.

Intersection of subspaces is a subspace.

3.3.3 Column space

The column space C(\mathbf{A}) contains all combinations of the columns of \mathbf A: a subspace of \mathbf R ^ m.

The system \mathbf{A}\mathbf{x} = \mathbf{b} is solvable \iff \mathbf{b} is in the column space of \mathbf{A}.

3.3.4 Null space

The null space N(\mathbf{A}) contains all solutions \mathbf{x} to \mathbf A\mathbf{x}=\mathbf{0}, which contains \mathbf{x}=\mathbf{0}.

3.3.5 Echelon matrix \mathbf{R}

R = \begin{bmatrix} 1 & 0 & a & 0 & c \\ 0 & 1 & b & 0 & d \\ 0 & 0 & 0 & 1 & e \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad s_1 = \begin{bmatrix} -a \\ -b \\ 1 \\ 0 \\ 0 \end{bmatrix} \quad s_2 = \begin{bmatrix} -c \\ -d \\ 0 \\ -e \\ 1 \end{bmatrix}

The reduced row echelon form \mathbf{R} = \mathbf{rref}(A) has all pivots as 1, with 0 above and below.

If column j of \mathbf{R} is free, there is a particular solution to A\mathbf{x}=\mathbf{0} with x_j = 1; as above.

Number of pivots = number of nonzero rows in \mathbf R = rank r; then there are n-r free columns.

3.3.6 \mathbf A\mathbf{x}=\mathbf{b}

x_p\text{ (one particular solution)}+\text{ linear combinations of } x_n \text{ (in the null space)}

\mathbf E \left[ \mathbf{A}, \mathbf b \right] = \left[ \mathbf{R}, \mathbf d\right]: \mathbf{A}\mathbf{x}=\mathbf{b} (equivalently \mathbf{R}\mathbf{x}=\mathbf{d}) is solvable \iff all zero rows of \mathbf{R} have zeros in \mathbf d.

3.3.7 Cases for \mathbf A\mathbf{x}=\mathbf{b}

| Symbol | Meaning |
| --- | --- |
| \mathbf{I} | Invertible matrix |
| \mathbf{P} | Column permutation matrix |
| \mathbf F | Free columns |

| Condition | \mathbf R | Solutions | Note |
| --- | --- | --- | --- |
| n = m = r | \left[ \mathbf{I} \right] | 1 | square, invertible |
| r = m, r < n | \left[ \mathbf{I} \quad \mathbf{F} \right]\mathbf{P} | \infty | - |
| r < m, r = n | \left[ \begin{array}{c} \mathbf{I} \\ \mathbf{0} \end{array} \right] | 0/1 | 1 \iff all zero rows of \mathbf{R} have zeros in \mathbf d |
| r < m, r < n | \left[ \begin{array}{cc} \mathbf{I} & \mathbf{F} \\ \mathbf{0} & \mathbf{0} \end{array} \right]\mathbf{P} | 0/\infty | \infty \iff all zero rows of \mathbf{R} have zeros in \mathbf d |

3.3.8 Independence, basis, dimension

3.3.8.1 Independence

Independent columns of \mathbf A : The only solution to \mathbf A\mathbf x = \mathbf 0 is \mathbf x = \mathbf 0 \iff r= n. The nullspace is \mathbf{Z} = \{\mathbf 0\}.

Independent vectors: The only zero combination \sum c_i \mathbf v_i = \mathbf 0 has all c's = 0.

3.3.8.2 Basis

The vectors \mathbf v_1, \ldots, \mathbf v_k span the space \mathbf S if \mathbf S = all combinations of the \mathbf v 's.

The vectors \mathbf v_1, \ldots, \mathbf v_k are a basis for \mathbf S if they are independent and they span \mathbf S .

3.3.8.3 Dimension

The dimension of a space \mathbf S is the number of vectors in every basis for \mathbf S; for the column (or row) space of a matrix, this is the rank r.

3.3.9 Dimensions of the four subspaces

| Name | Notation | Subspace of | Dimension |
| --- | --- | --- | --- |
| Column space | C(\mathbf{A}) | \mathbb{R}^m | r |
| Row space | C(\mathbf{A}^\mathtt{T}) | \mathbb{R}^n | r |
| Null space | N(\mathbf{A}) | \mathbb{R}^n | n - r |
| Left null space | N(\mathbf{A}^\mathtt{T}) | \mathbb{R}^m | m - r |

Elimination produces bases for the row space and nullspace of \mathbf{A}: They are the same as for \mathbf{R}.

Elimination often changes the column space and left nullspace (but dimensions don't change).

3.3.10 Rank 1 matrices

Rank one matrices: \mathbf{A} = \mathbf{u}\mathbf{v}^\mathtt{T} = \text{column times row}: C(\mathbf{A}) has basis \mathbf{u}, C(\mathbf{A}^\mathtt{T}) has basis \mathbf{v}.

3.4 Orthogonality

3.4.1 Vector

\mathbf v ^\mathtt{T} \mathbf w = 0 \iff \lVert \mathbf{v} \rVert^2 + \lVert \mathbf{w} \rVert^2 = \lVert \mathbf{v} + \mathbf{w} \rVert^2

3.4.2 Subspace

Subspaces \mathbf{V} and \mathbf{W} are orthogonal when \mathbf{v}^\mathtt{T} \mathbf{w} = 0 for every \mathbf{v} in \mathbf{V} and every \mathbf{w} in \mathbf{W}.

The row space of \mathbf{A} is orthogonal to the nullspace, as:

\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \\ \vdots \\ \mathbf{r}_m \end{bmatrix} \mathbf{x} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}

Similarly, the column space is orthogonal to N(\mathbf{A}^\mathtt{T}).

Nullspace and row space are orthogonal complements in \mathbb{R}^n.

3.4.3 Project to vector

To project \mathbf b to \mathbf a, assume the projection is \mathbf p = p\mathbf a, then:

\mathbf a ^\mathtt{T} (\mathbf b - \mathbf p) = \mathbf 0 \iff p = \frac{\mathbf a ^\mathtt{T}\mathbf b}{\mathbf a^\mathtt{T} \mathbf a} \iff \mathbf p = \mathbf a \frac{\mathbf a ^\mathtt{T}\mathbf b}{\mathbf a^\mathtt{T} \mathbf a}

Then the projection matrix: \mathbf P = \frac{\mathbf a \mathbf a ^\mathtt{T}}{\mathbf a^\mathtt{T} \mathbf a}, \quad r(\mathbf P) = 1, \mathbf P^\mathtt{T} = \mathbf P, \mathbf P^n = \mathbf P

3.4.4 Project to subspace

This is a generalization of projecting onto a vector.

To project \mathbf b onto the subspace spanned by the columns \mathbf a_{1\dots m} of \mathbf A, assume the projection is \mathbf p, then:

\mathbf p = \sum_{i = 1}^{m}x_i\mathbf a _i = \mathbf A \mathbf{\hat x}

\mathbf a_i^\mathtt{T}(\mathbf b - \mathbf p) = 0, \forall i \in [1..m] \iff \begin{bmatrix} \mathbf{a}_1^\mathtt{T} \\ \mathbf{a}_2^\mathtt{T} \\ \mathbf{a}_3^\mathtt{T} \\ \vdots \\ \mathbf{a}_m^\mathtt{T} \end{bmatrix} (\mathbf b - \mathbf p) = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \iff \mathbf A^\mathtt{T}(\mathbf b - \mathbf p) = \mathbf 0

\begin{aligned} \implies & \mathbf A^\mathtt{T}(\mathbf b - \mathbf A \mathbf{\hat x}) = \mathbf 0 \iff \mathbf{\hat x} = (\mathbf A ^\mathtt{T} \mathbf A) ^ {-1} \mathbf A ^\mathtt{T} \mathbf b \\ \implies & \mathbf p = \mathbf A (\mathbf A ^\mathtt{T} \mathbf A) ^ {-1} \mathbf A ^\mathtt{T} \mathbf b \\ \implies & \mathbf P = \mathbf A (\mathbf A ^\mathtt{T} \mathbf A) ^ {-1} \mathbf A ^\mathtt{T} \end{aligned}

3.4.5 Invertibility of \mathbf{A}^\mathtt{T} \mathbf{A}

\mathbf{A} has independent columns or \text{r}(A) = n \implies \mathbf{A}^\mathtt{T} \mathbf{A} is invertible:

\begin{aligned} & \mathbf{A}^\mathtt{T} \mathbf{A} \text{ is invertible } \iff (\mathbf{A}^\mathtt{T} \mathbf{A} \mathbf x = \mathbf 0 \implies \mathbf x = \mathbf 0 ) \\ & \mathbf{A}^\mathtt{T} \mathbf{A} \mathbf x = \mathbf 0 \implies \mathbf x ^\mathtt{T} \mathbf{A}^\mathtt{T} \mathbf{A} \mathbf x = 0 \implies (\mathbf{A} \mathbf x)^\mathtt{T} \mathbf{A} \mathbf x = 0 \implies \mathbf{A} \mathbf x = \mathbf 0 \overset{\text{r}(A) = n} \implies \mathbf x = \mathbf 0\\ \end{aligned}

3.4.6 Least squares approximations

\text {if } \mathbf {Ax} = \mathbf{b} \text{ cannot be solved, } \text{min} \lVert \mathbf {A \hat x} - \mathbf{b} \rVert \iff \mathbf A \mathbf{ \hat x } = \mathbf {p} \implies \mathbf{ \hat x } = (\mathbf A ^\mathtt{T} \mathbf A) ^ {-1} \mathbf A ^\mathtt{T} \mathbf b

3.4.7 Orthogonal matrices

The columns \textbf q_{1 \dots n} are orthonormal if \mathbf q_i^\mathtt{T}\mathbf q _j = \begin{cases} 0,\quad i \ne j \\ 1,\quad i = j \end{cases}. Then \mathbf Q ^ \mathtt T \mathbf Q = \mathbf I, \lVert \mathbf Q \mathbf x \rVert = \lVert \mathbf x \rVert, and the matrix \mathbf P projecting \mathbf b to subspace \mathbf Q is \mathbf Q \mathbf Q ^\mathtt{T}.

If \mathbf Q is also square, then \mathbf Q ^ \mathtt T \mathbf Q = \mathbf I gives \mathbf{Q}^\mathtt T = \mathbf{Q}^{-1}.

3.4.8 \mathbf A = \mathbf Q \mathbf R decomposition

The Gram-Schmidt process transforms any matrix with independent columns into a matrix \mathbf Q with orthonormal columns, \mathbf Q ^ \mathtt T \mathbf Q = \mathbf I.

\begin{aligned} & \mathbf{A} = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_{n} \end{bmatrix}, \mathbf{Q} = \begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_{n} \end{bmatrix} \\ & \mathbf q_i = \mathbf a_i - \sum_{j = 1}^{i-1}\frac{\mathbf q_j^\mathtt T \mathbf a_i}{\mathbf q_j^\mathtt T \mathbf q_j}\mathbf q_j = \mathbf a_i - \sum_{j = 1}^{i-1}(\mathbf q_j^\mathtt T \mathbf a_i) \mathbf q_j \text{ (for normalized } \mathbf q_j\text{)}, \quad \mathbf q_i \leftarrow \frac{\mathbf q_i}{\lVert \mathbf q_i \rVert} \end{aligned}

\mathbf A \mathbf{ \hat x } = \mathbf {p} \implies \mathbf{ \hat x } = (\mathbf A ^\mathtt{T} \mathbf A) ^ {-1} \mathbf A ^\mathtt{T} \mathbf b, \quad \mathbf A = \mathbf Q \mathbf R \implies \mathbf{ \hat x } = \mathbf R ^{-1} \mathbf Q ^ \mathtt {T} \mathbf b

The \mathbf A = \mathbf Q \mathbf R decomposition is also the basis of the QR algorithm for finding the eigen values numerically, which is covered in chapter 11.3.
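A classical Gram-Schmidt sketch assuming NumPy and independent columns (numerically, modified Gram-Schmidt or Householder reflections are preferred; this only illustrates the formula above):

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt: Q has orthonormal columns, R is upper triangular, A = Q R."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        v = A[:, i].copy()
        for j in range(i):
            R[j, i] = Q[:, j] @ A[:, i]   # projection coefficient q_j^T a_i
            v -= R[j, i] * Q[:, j]        # subtract the projection onto q_j
        R[i, i] = np.linalg.norm(v)
        Q[:, i] = v / R[i, i]
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(2)))  # True True
```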

3.5 Determinants

3.5.1 Basic properties

  1. The determinant of the n by n identity matrix is 1.
  2. The sign of determinant reverses when two rows are exchanged.
  3. The determinant is a linear function of each row separately, with all other rows staying fixed:

\begin{vmatrix} ta & tb \\ c & d \end{vmatrix} = t \begin{vmatrix} a & b \\ c & d \end{vmatrix} , \quad \begin{vmatrix} a + a' & b + b' \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} + \begin{vmatrix} a' & b' \\ c & d \end{vmatrix}

3.5.2 Derived properties

  1. If two rows of \mathbf{A} are equal, then \det \mathbf{A} = 0.
  2. Subtracting a multiple of one row from another row leaves \det \mathbf{A} unchanged.
  3. A matrix with a row of zeros has \det \mathbf{A} = 0.
  4. If \mathbf{A} is triangular then \det \mathbf{A} = a_{11}a_{22}\cdots a_{nn} = \text{product of diagonal entries}.
  5. If \mathbf{A} is singular then \det \mathbf{A} = 0. If \mathbf{A} is invertible then \det \mathbf{A} \neq 0.
  6. The determinant of \mathbf{AB} is \det \mathbf{A} times \det \mathbf{B}: \lvert \mathbf{AB} \rvert = \lvert \mathbf{A} \rvert \lvert \mathbf{B} \rvert.
  7. The transpose \mathbf{A}^\text T has the same determinant as \mathbf{A}.

3.5.3 Big formula

\begin{aligned} \det \mathbf{A} & = \text{sum over all } n! \text{ column permutations } P = (\alpha, \beta, \dots, \omega) \\ &= \sum (\det P) a_{1\alpha} a_{2\beta} \cdots a_{n\omega} \end{aligned}

(\det P) is the sign.

3.5.4 Cofactor

\begin{aligned} C_{ij} &= (-1)^{i+j} \det \mathbf M_{ij} \\ \det \mathbf{A} &= a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} \end{aligned}

\mathbf M_{ij} is the matrix obtained by deleting row i and column j of \mathbf{A}.

3.5.5 Cramer's rule

To solve \mathbf A \mathbf x = \mathbf b:

\mathbf{A} \begin{bmatrix} x_1 & 0 & 0 \\ x_2 & 1 & 0 \\ x_3 & 0 & 1 \end{bmatrix} = \begin{bmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{bmatrix} = \mathbf{B}_1 \implies x_1 = \frac {|\mathbf B _1|}{|\mathbf A|} \implies x_i = \frac {|\mathbf B _i|}{|\mathbf A|}

Let \mathbf b be one column of \mathbf I:

(\mathbf A^{-1})_{ij} = \frac{C_{ji}}{\det \mathbf{A}}

3.5.6 Area

If the rows (or columns) of an n \times n matrix \mathbf A are the edge vectors of a parallelepiped, then |\det \mathbf A| is its volume (its area when n = 2).

3.6 Eigen

3.6.1 Eigen vector and eigen value

\mathbf A \mathbf x = \lambda \mathbf x

\mathbf x is the eigen vector, \lambda is the eigen value.

\mathbf A \mathbf x = \lambda \mathbf x \implies (\mathbf A - \lambda \mathbf I)\mathbf x = \mathbf 0 \overset{\mathbf x \neq \mathbf 0}{\implies} \det (\mathbf A - \lambda \mathbf I) = 0

3.6.2 \text{tr} and \text{det}

\begin{aligned} \text{tr} & \mathbf A = \sum_i\mathbf A_{ii} = \sum_i \lambda_i \\ \det & \mathbf A = \prod_i \lambda_i \end{aligned}

3.6.3 Diagonalization

\mathbf S is the column matrix of eigen vectors of \mathbf A:

\mathbf A \mathbf S = \mathbf S \mathbf{\Lambda} \implies \mathbf A = \mathbf S \mathbf{\Lambda} \mathbf S ^{-1}, \mathbf{\Lambda} = \mathbf S ^{-1} \mathbf A \mathbf S

If \mathbf A has n distinct eigen values, then \mathbf A must be diagonalizable. If some eigen values are repeated, \mathbf A may or may not be diagonalizable.

3.6.4 Symmetric matrices

The eigen values are real, and the eigen vectors can be chosen as perpendicular, and:

\mathbf A = \mathbf Q \mathbf{\Lambda} \mathbf Q ^{-1} = \mathbf Q \mathbf{\Lambda} \mathbf Q ^{\text T}

Proof of real eigen values, given \textbf A is real and symmetric:

\begin{aligned} & \mathbf A \mathbf x = \lambda \mathbf x \\ {\text{take conjugate, multiply }\mathbf {\overline x^{\text T}}} \quad \implies & \mathbf A \mathbf {\overline x} = \overline \lambda \mathbf {\overline x}, \quad \mathbf {\overline x^{\text T}} \mathbf A \mathbf x = \lambda \mathbf {\overline x^{\text T}} \mathbf x \\ {\text{use symmetry, multiply } \mathbf x} \quad \implies & \mathbf {\overline x ^{\text T}} \mathbf A = \mathbf {\overline x ^{\text T}} \overline \lambda, \quad \mathbf {\overline x^{\text T}} \mathbf A \mathbf x = \overline \lambda \mathbf {\overline x ^{\text T}} \mathbf x \\ {\text{left are same}} \quad \implies & \overline \lambda = \lambda \implies \lambda \in \mathbb{R} \end{aligned}

Proof of perpendicularity, given \mathbf S\mathbf x=\lambda_1\mathbf x, \mathbf S\mathbf y=\lambda_2\mathbf y, \lambda_1 \neq \lambda_2:

\lambda_1\mathbf x^\text T \mathbf y = (\lambda_1\mathbf x)^\text T \mathbf y = (\mathbf S \mathbf x) ^ \text T \mathbf y = \mathbf x ^ \text T \mathbf S \mathbf y = \mathbf x ^ \text T\lambda_2\mathbf y = \lambda_2 \mathbf x ^ \text T \mathbf y \implies \mathbf x ^ \text T \mathbf y = 0

For repeated eigenvalues, pick any basis of the corresponding eigenspace (whose dimension equals the multiplicity of the eigenvalue) and apply Gram-Schmidt to it to obtain perpendicular eigenvectors.

3.6.5 Positive definite matrices

Positive definite matrices are symmetric matrices with:

  1. all eigen values > 0; or
  2. all pivots > 0; or
  3. all upper left determinants > 0; or
  4. \textbf x ^ \text T \mathbf S \mathbf x > 0, \forall x \neq \textbf 0.
  5. \mathbf A ^ \text T \mathbf A is positive definite, if \mathbf A has independent columns.

Application for positive definite matrices:

f(x_1, \dots, x_n) has a minimum at \mathbf x=\mathbf x_0 if \left. \frac{\partial f}{\partial x_i} \right|_{\mathbf x_0} = 0 for all i and the matrix \textbf S(\mathbf x_0), where \left. S_{ij}(\mathbf x_0) = \frac{\partial ^2f}{\partial x_i\partial x_j}\right|_{\mathbf x_0}, is positive definite.

3.6.6 Similar matrices

For n\times n matrices \mathbf A, \mathbf B, if there is an invertible \mathbf M with \mathbf B = \mathbf M ^ {-1} \mathbf A \mathbf M, then \mathbf A, \mathbf B are similar. They have the same eigen values.

3.6.7 Jordan form

If there are repeated eigen values, then there may be more than one set (similarity class) of such matrices. E.g., for 2 \times 2 matrices with eigen values 4, 4 there are 2 such sets:

\begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix}, \begin{bmatrix} 4 & 1 \\ 0 & 4 \end{bmatrix}

\begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix} is only similar to itself. The size of the set is 1.

\begin{bmatrix} 4 & 1 \\ 0 & 4 \end{bmatrix} is similar to many other matrices; it is called the Jordan form.

Jordan block:

J_i = \begin{bmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ 0 & 0 & \lambda & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & 1 \\ 0 & 0 & 0 & \cdots & \lambda \end{bmatrix}

Every square matrix \mathbf{A} is similar to a Jordan matrix \mathbf{J}.

3.7 Linear Transformation

3.7.1 Linear

A transformation T is linear if:

T(\mathbf{v} + \mathbf{w}) = T(\mathbf{v}) + T(\mathbf{w}) \land T(c\mathbf{v}) = cT(\mathbf{v})

If we wish to know T(\mathbf{v}) for all vectors \mathbf{v} in \mathbb{R}^n, we just need to know T(\mathbf{v}_1), T(\mathbf{v}_2), ..., T(\mathbf{v}_n) for any basis \mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_n of the input space:

\begin{aligned} \mathbf{v} &= c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n \\ T(\mathbf{v}) &= c_1T(\mathbf{v}_1) + c_2T(\mathbf{v}_2) + \cdots + c_nT(\mathbf{v}_n) \end{aligned}

c_i are the coordinates.

3.7.2 Matrix for linear transformation

For the matrix \mathbf A_{m\times n} of the transformation T: \mathbb{R}^n\to\mathbb{R}^m, \mathbf v_i, \mathbf w_i are bases for \mathbb{R}^n,\mathbb{R}^m:

T(\mathbf v_i) = \sum_{j=1}^{m}\mathbf{A}_{ji}\mathbf w_j

\mathbf A_{m \times n}\mathbf {c}_{1..n} = \mathbf{c'}_{1..m}.

3.7.3 Change of basis

Let the columns of matrix \mathbf{W} be the basis vectors of the new basis. Then if \mathbf{x} is a vector in the old basis, we can convert it to a vector \mathbf{c} in the new basis using the relation:

\mathbf{x} = \mathbf{W} \mathbf{c}.

If transformation T has the matrix \mathbf{A} when working with the basis \mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_n and T has the matrix \mathbf{B} when working with the basis \mathbf{w}_1, \mathbf{w}_2, ..., \mathbf{w}_n, it turns out that \mathbf{A} and \mathbf{B} must be similar matrices. In other words:

\mathbf{B} = \mathbf{W}^{-1} \mathbf{A} \mathbf{W}

for some change of basis matrix \mathbf{W}.

Note: basis could be chosen as eigenvectors, then the matrix would be the eigenvalue diagonal matrix.

3.8 SVD

SVD is a generalization of the diagonalization process, but it applies to all m \times n matrices, regardless of whether they are square or non-square, symmetric or non-symmetric.

For any m \times n matrix \mathbf{A}, the SVD theorem guarantees that \mathbf{A} can be decomposed as \mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^\text{T}, where \mathbf{U} is an m \times m orthogonal matrix (its columns are eigenvectors of \mathbf{A}\mathbf{A}^\text{T}), \mathbf{\Sigma} is an m \times n diagonal matrix of non-negative singular values, and \mathbf{V} is an n \times n orthogonal matrix (its columns are eigenvectors of \mathbf{A}^\text{T}\mathbf{A}).

We can use \mathbf{A}^\text{T} \mathbf{A} and \mathbf{A}\mathbf{A}^\text{T} to find \mathbf V and \mathbf U.

3.8.1 Pseudo inverse

Two-sided inverse for a full-rank square matrix \mathbf A (m = n = r):

\mathbf A ^{-1}\mathbf A = \mathbf A \mathbf A ^ {-1} = \mathbf I

Left inverse for rectangular matrix \mathbf A (r = n < m):

\mathbf A ^{-1}_\text{left} \mathbf A = \mathbf I _ {n\times n}

Since \mathbf A ^ \text T \mathbf A is n \times n and full rank:

\mathbf A ^{-1}_\text{left} = (\mathbf A ^ \text T \mathbf A)^{-1}\textbf{A}^\text T

Similarly, the right inverse for r = m < n:

\mathbf A ^{-1}_\text{right} = \textbf{A}^\text T (\mathbf A \mathbf A ^ \text T)^{-1}

If \mathbf{x} \neq \mathbf{y} are vectors in the row space of \mathbf{A}, then \mathbf{A}\mathbf{x} \neq \mathbf{A}\mathbf{y} in the column space of \mathbf{A}.

The pseudoinverse \mathbf{A}^+ of \mathbf{A} is the matrix for which \mathbf{x} = \mathbf{A}^+ \mathbf{A} \mathbf{x} for all \mathbf{x} in the row space of \mathbf{A}:

\mathbf{A}^+ = \mathbf{V} \mathbf{\Sigma}^+ \mathbf{U}^\text T
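A NumPy sketch of building \mathbf{A}^+ from the SVD; np.linalg.pinv computes the same thing and is used here only as a check:

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """Pseudoinverse A^+ = V Sigma^+ U^T, inverting only the nonzero singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_plus = np.array([1.0 / x if x > tol else 0.0 for x in s])
    return Vt.T @ np.diag(s_plus) @ U.T

A = np.array([[1.0, 2.0], [2.0, 4.0], [0.0, 1.0]])  # a 3x2 example
print(np.allclose(pinv_via_svd(A), np.linalg.pinv(A)))  # True
```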

3.9 Complex Matrices

Length:

\lVert \mathbf z \rVert ^ 2 = \overline {\mathbf z} ^ \text T \mathbf{z} = \mathbf z ^ \text H \mathbf z

Inner product:

\mathbf x ^ \text H \mathbf y = \sum \overline x_i y_i

3.9.1 Unit root

\omega^n=1 \implies \omega_k = \mathrm{e}^{\mathrm i\cdot2\pi\cdot\frac{k}{n}}, 0 \leq k \leq n - 1, k\in \mathbb Z

3.9.2 Hermitian matrices \mathbf S = \mathbf S ^ \text H

The eigen values are real, and the eigen vectors can be chosen as perpendicular.

3.9.3 Unitary matrices

A unitary matrix is a complex square matrix that has orthonormal columns.

\mathbf Q ^ \text H = \mathbf Q ^ {-1}

3.9.4 Fourier matrix

Fourier matrix is like:

\mathbf F_n = \frac{1}{\sqrt{n}} \begin{pmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & \omega & \omega^2 & \dots & \omega^{n-1} \\ 1 & \omega^2 & \omega^4 & \dots & \omega^{2(n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \dots & \omega^{(n-1)(n-1)} \end{pmatrix}, \omega = \mathrm{e}^{\mathrm i\cdot2\pi\cdot\frac{1}{n}}

Or, entrywise (0-indexed):

\mathbf F_{xy} = \frac{1}{\sqrt{n}} \omega^{xy}

The Fourier matrix is a unitary matrix since, for columns 0\leq p, q\leq n -1:

\begin{aligned} & \mathbf F_p ^ \text H \mathbf F_q = \frac 1 n \sum_{k=0}^{n-1}\overline{\omega ^ {pk}}\omega ^{qk}, \quad \overline{\omega ^ {pk}} = \mathrm{e}^{\mathrm -i\cdot2\pi\cdot\frac{pk}{n}} = \omega^{-pk} \\ \implies & \mathbf F_p ^ \text H \mathbf F_q = \frac 1 n \sum_{k=0}^{n-1}\omega^{-pk}\omega ^{qk} = \frac 1 n\sum_{k=0}^{n-1}\omega^{(q-p)k} \\ \implies & \mathbf F_p ^ \text H \mathbf F_q = \begin{cases} \frac 1 n \sum_{k=0}^{n-1} \omega ^0 = 1, \quad p = q \\ \frac 1 n \sum_{k=0}^{n-1} \omega ^ {(q - p)k} = \frac 1 n \frac{1 - \omega^{(q-p)n}}{1 - \omega^{q-p}} = 0, \quad p \neq q \\ \end{cases} \end{aligned}

So,

\mathbf F_n^{-1} = \mathbf{F}^\text H \iff F^{-1}_{xy} = \frac{1}{\sqrt{n}} \omega^{-xy}

3.9.5 FFT

For \textbf y = \mathbf F_n\textbf a (coefficient representation to point-value representation):

\begin{aligned} \sqrt{n} \cdot \textbf y_j & = \sum_{k=0}^{n-1} \omega_n^{jk}\cdot \textbf a_k \\ & = \sum_{k = 0}^{\frac n 2 -1}\omega^{2jk}_{n}\textbf a_{2k} + \omega_n^j\sum_{k = 0}^{\frac n 2 -1}\omega^{2jk}_{n}\textbf a_{2k+1} \\ & = \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k} + \omega_n^j \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k+1}, \quad 0 \leq j \leq n - 1\\ \end{aligned}

Let j = j+\frac n 2 to last line of above:

\begin{aligned} \sqrt{n} \cdot \textbf y _{j+\frac n 2} & = \sum_{k = 0}^{\frac n 2 -1}\omega^{(j+\frac n 2)k}_{\frac n 2}\textbf a_{2k} + \omega_n^{j+\frac n 2} \sum_{k = 0}^{\frac n 2 -1}\omega^{(j+\frac n 2)k}_{\frac n 2}\textbf a_{2k+1}, \quad 0 \leq j \leq \frac n 2 - 1\\ \end{aligned}

And there is \omega^{(j+\frac n 2)k}_{\frac n 2} = \omega_{\frac n 2}^{jk +\frac n 2 k} = \omega_{\frac n 2}^{jk}\cdot(\omega_{\frac n 2}^{\frac n 2})^k = \omega_{\frac n 2}^{jk} and \omega_n^{j+\frac n 2}= \omega_n^{j}\cdot\omega^{\frac n 2}_n = -\omega_n^j:

\begin{aligned} \sqrt{n} \cdot \textbf y _{j+\frac n 2} & = \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k} -\omega_n^j \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k+1}, \quad 0 \leq j \leq \frac n 2 - 1\\ \end{aligned}

So, with \textbf a' the even-indexed and \textbf a'' the odd-indexed coefficients:

\begin{aligned} \sqrt{n} \cdot\textbf y _j & = \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k} + \omega_n^j \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k+1} = [\textbf F_{\frac n 2}a']_j + \omega_n^j[ F_{\frac n 2}a'']_j = \textbf{y}'_j + \omega_n^j \textbf{y}''_j, 0 \leq j \leq \frac n 2 - 1 \\ \sqrt{n} \cdot \textbf y _{j+\frac n 2} & = \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k} -\omega_n^j \sum_{k = 0}^{\frac n 2 -1}\omega^{jk}_{\frac n 2}\textbf a_{2k+1} = [\textbf F_{\frac n 2}a']_j - \omega_n^j[ F_{\frac n 2}a'']_j = \textbf{y}'_j - \omega_n^j\textbf{y}''_j, 0 \leq j \leq \frac n 2 - 1\\ \end{aligned}
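A recursive radix-2 Python sketch of the split above (my own illustration). It computes the unnormalized DFT y_j = \sum_k a_k\omega_n^{jk}, i.e. \sqrt n \cdot \mathbf F_n \mathbf a, and assumes n is a power of 2:

```python
import cmath

def fft(a):
    """Evaluate y_j = sum_k a_k * omega_n^{jk} by splitting the coefficients
    into even- and odd-indexed halves; n must be a power of 2."""
    n = len(a)
    if n == 1:
        return a[:]
    y_even = fft(a[0::2])   # F_{n/2} applied to a'
    y_odd = fft(a[1::2])    # F_{n/2} applied to a''
    y = [0j] * n
    for j in range(n // 2):
        w = cmath.exp(2j * cmath.pi * j / n)   # omega_n^j
        y[j] = y_even[j] + w * y_odd[j]
        y[j + n // 2] = y_even[j] - w * y_odd[j]
    return y

print([round(abs(v), 6) for v in fft([1, 1, 1, 1])])  # [4.0, 0.0, 0.0, 0.0]
```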

4 Discrete Mathematics (MIT 6.042)

4.1 Proofs

4.1.1 Propositions

A proposition is a statement that is either true or false.

A predicate is a proposition whose truth depends on the value of variable(s).

A mathematical proof of a proposition is a chain of logical deductions leading to the proposition from a base set of axioms.

Axioms should be complete and consistent, but see Gödel's incompleteness theorems.

4.1.1.1 Notations

| English | Symbolic |
| --- | --- |
| NOT(P) | \neg P (alternatively, \overline{P}) |
| P AND Q | P \wedge Q |
| P OR Q | P \vee Q |
| P IMPLIES Q | P \rightarrow Q |
| if P then Q | P \rightarrow Q |
| P IFF Q | P \leftrightarrow Q |
| P XOR Q | P \oplus Q |

A proposition is valid if all settings of the variables make the proposition true.

A proposition is satisfiable if some setting of the variables makes the proposition true. The general problem of deciding whether a proposition is satisfiable is called SAT.

A function-like notation is used to denote a predicate supplied with specific variable values:

P(n)::= n\text{ is a perfect square}

4.1.1.2 Algebra

Commutative:

\begin{aligned} & A \lor B = B \lor A \\ & A \land B = B \land A \end{aligned}

Associative:

\begin{aligned} & (A \lor B) \lor C = A \lor (B \lor C) \\ & (A \land B) \land C = A \land (B \land C) \end{aligned}

Distributive:

\begin{aligned} & A \land (B \lor C) = (A \land B) \lor (A \land C) \\ & A \lor (B \land C) = (A \lor B) \land (A \lor C) \end{aligned}

De Morgan's:

\begin{aligned} & \neg (A \land B) = \neg A \lor \neg B \\ & \neg (A \lor B) = \neg A \land \neg B \end{aligned}

4.1.1.3 DNF and CNF

Every proposition is equivalent to a formula in conjunctive normal form (CNF) and to one in disjunctive normal form (DNF).

DNF is like (A \land B) \lor C, \text{OR} of \text{AND}.

A full DNF is a DNF where every minterm (literals connected by \land) contains all the variables. It can be read directly off the truth table of the formula, as in the sketch after the list below.

To convert a proposition to full DNF:

  1. Expand the proposition to DNF, using algebra above.
  2. Add missing variables, e.g. A\land B = A \land B \land (C \lor \neg C) = (A \land B \land C) \lor (A \land B \land \neg C). Or in other notation, AB = AB(C + \overline C) = ABC + AB\overline C.
  3. Remove the duplicates.
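A small Python sketch of reading the full DNF off the truth table; the formula is passed as a function and the variable names are arbitrary:

```python
from itertools import product

def full_dnf(variables, formula):
    """Build the full DNF from the truth table: one minterm per satisfying row."""
    minterms = []
    for values in product([True, False], repeat=len(variables)):
        env = dict(zip(variables, values))
        if formula(env):
            literals = [v if env[v] else f"~{v}" for v in variables]
            minterms.append("(" + " & ".join(literals) + ")")
    return " | ".join(minterms) if minterms else "False"

# Example: A AND B  ==  (A & B & C) | (A & B & ~C) after adding the missing variable C.
print(full_dnf(["A", "B", "C"], lambda e: e["A"] and e["B"]))
```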

4.1.2 Patterns of proof

Logical deductions, or inference rules, are used to prove new propositions using previously proved ones:

\begin{aligned} & \text{Rule.} \quad \frac{P, \quad P \rightarrow Q}{Q} \\ & \text{Rule.} \quad \frac{\quad P \rightarrow Q, \quad Q \rightarrow R}{\quad P \rightarrow R} \\ & \text{Rule.} \quad \frac{\text{NOT}(P) \rightarrow \text{NOT}(Q)}{\quad Q \rightarrow P} \end{aligned}

4.1.2.1 Methods

4.1.2.2 Set

Unique, unordered.

Commutative Laws: \begin{aligned} & A \cup B = B \cup A \\ & A \cap B = B \cap A \end{aligned}

Associative Laws: \begin{aligned} & (A \cup B) \cup C = A \cup (B \cup C) \\ & (A \cap B) \cap C = A \cap (B \cap C) \end{aligned}

Distributive Laws: \begin{aligned} & A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \\ & A \cup (B \cap C) = (A \cup B) \cap (A \cup C) \end{aligned}

De Morgan's Laws: \begin{aligned} & (A \cap B)^c = A^c \cup B^c \\ & (A \cup B)^c = A^c \cap B^c \end{aligned}

4.1.2.3 Sequence

Ordered.

Sequences can be generated by the Cartesian product of sets:

\begin{aligned} \mathop{\Large\times}_{i=1}^{n} A_i &= \{(a_1, a_2, \dots, a_n) \mid a_i \in A_i, \text{ for } i = 1, 2, \dots, n \} \\ \lvert \mathop{\Large\times}_{i=1}^{n} A_i \rvert &= \prod_{i=1}^{n} \left| A_i \right| \end{aligned}

4.1.3 Induction

\begin{aligned} \text{Induction Rule} & \quad \frac{\text P(0), \quad \forall n \in \mathbb{N}.\text P(n)\rightarrow \text P(n+1)} {\forall m \in \mathbb{N}. \text P(m)} \\ \text{Strong Induction Rule} & \quad \frac{\text P(0), \quad [\forall k \leq n \in \mathbb{N}. \text P(k) ]\rightarrow \text P(n+1)} {\forall m \in \mathbb{N}. \text P(m)} \end{aligned}

4.1.3.1 Well Ordering Principle

A set is well-ordered if every non-empty subset has a least element under a given ordering.

\mathbb N is well-ordered.

To prove that P(n) is true for all n \in \mathbb{N} using WOP:

  1. Define the set of counterexamples C ::= \{n \in \mathbb{N} \mid \neg P(n)\}.
  2. Assume for contradiction that C is non-empty; by WOP, C has a least element m.
  3. Reach a contradiction, typically by showing P(m) holds or by finding a counterexample smaller than m.
  4. Conclude that C is empty, so P(n) holds for all n \in \mathbb{N}.

4.1.3.2 Invariant

To prove that some property X holds for every step of a process:

  1. Show that X holds for the initial state (base case).
  2. Show that if X holds before any step, it still holds after that step (inductive step).

4.2 Number Theory

4.2.1 Basic

\begin{aligned} & a \mid b \iff \exists k \in \mathbb Z, b = k\cdot a\\ & a \mid b \land a \mid c \implies a \mid sb+tc, \forall s,t\in\mathbb Z \\ & \text{rem}(a, b) = a - \lfloor a/b \rfloor b \\ & a \equiv b \mod n \iff \text{rem}(a, n) = \text{rem}(b, n) \end{aligned}

4.2.2 GCD

\gcd(a, b) = \gcd(b, r), r = \text{rem}(a, b)

Proof:

\begin{aligned} & \text{let } A = \{x \mid (x \mid a \land x \mid b)\}, B = \{x \mid (x \mid b \land x \mid r)\}, g \in A \\ \implies & g \mid a \land g \mid b \\ \implies & g \mid sa+tb, s,t\in \mathbb Z \\ \implies & s = 1, t = -\lfloor a/b \rfloor, g\mid r \\ \implies & A \subseteq B \\ \text{similarly, } & B \subseteq A \\ \implies & \gcd(a, b) = \gcd(b, r) \end{aligned}

4.2.3 Bézout's identity

\exists s, t \in \mathbb{Z}, sa + tb = k \iff \gcd(a, b) \mid k

4.2.3.1 Proof

  1. \exists s, t \in \mathbb{Z}, sa + tb = k \implies \gcd(a, b) \mid k:

\begin{aligned} & \text{let } g = \gcd(a, b) \\ \implies & a = a'g, b = b'g \\ \implies & sa'g + tb'g = g(sa' + tb') = k \\ \implies & g \mid k \end{aligned}

  2. \gcd(a, b) \mid k \implies \exists s, t \in \mathbb{Z}, sa + tb = k:

\begin{aligned} & \text{let } A = \{x \mid x = sa + tb, s, t \in \mathbb{Z}\} \\ \implies & \{a, b\} \subseteq A \\ & \text{let } d_0 = \min A_+ ; d_0 = x_0a+y_0b\\ & \text{consider } a = q d_0 + r, r \in [0, d_0) \\ \implies & r = a - qd_0 \\ \implies & r \in A \\ & \text{as either } r > 0 \text{ or } r = 0; \text{ but } d_0 = \min A_+, r < d_0 \\ \implies & r = 0 \\ \implies & d_0 \mid a \\ & \text{similarily, } d_0 \mid b \\ \implies & \gcd(a, b) \geq d_0 \\ & \text{consider } d \mid a, d \mid b \\ \implies & d_0 = x_0a'd + y_0b'd \\ \implies & d \mid d_0 \\ \implies & \gcd(a, b) \leq d_0 \\ \implies & d_0 = \gcd(a, b) \\ \implies & \exists s, t \in \mathbb{Z}, sa + tb = \gcd(a, b) \end{aligned}

4.2.3.2 Calc

The extended Euclidean algorithm can find s and t (a sketch follows the two cases below).

  1. b\neq 0, let q = \lfloor a/b \rfloor:

\begin{aligned} s' b + t' r = \gcd(b, r), r = a - q b \\ s' b + t' (a - qb) = \gcd(b, r) \\ t'a + (s' - t' q)b=\gcd(b, r) \end{aligned} \implies \begin{cases} s = t' \\ t = s'-t' \lfloor a/b \rfloor \end{cases}

  2. b = 0 \implies s = 1, t = 0.
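A recursive Python sketch of the two cases above:

```python
def extended_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0                     # case 2: gcd(a, 0) = a = 1*a + 0*b
    g, s1, t1 = extended_gcd(b, a % b)     # case 1: recurse on (b, r)
    return g, t1, s1 - (a // b) * t1       # s = t', t = s' - t' * floor(a/b)

g, s, t = extended_gcd(240, 46)
print(g, s, t, s * 240 + t * 46)  # 2 -9 47 2
```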

4.2.4 Inverse

\gcd(k, n) = 1 \implies \exists s, t\in \mathbb Z, sk + tn = 1 \implies sk \equiv 1 \mod n

4.2.5 Euler's theorem

Totient func:

\phi (n) = \lvert \{ k \mid 1 \leq k \leq n, k\in \mathbb Z \land \gcd(k, n) = 1\} \rvert

Euler's theorem:

\gcd(k,n) = 1 \implies k^{\phi(n)} \equiv 1 \mod n

Lemma: \gcd(k, n) = 1 \implies a \equiv b \mod n \iff ak \equiv bk \mod n

Reduced residue system is closed under multiplication:

\begin{aligned} & \text{let } \mathbb Z_n^* = \{x \mid \gcd(x, n) = 1, x < n\} \\ & \text{consider } z_i \in \mathbb Z_n^* \\ \implies & \gcd(z_i, n) = 1, \gcd(k, n) = 1 \\ \implies & \gcd(kz_i, n) = 1 \\ \implies & \text{rem}(kz_i, n) \in Z_n^* \\ & \text{consider lemma above} \\ \implies & Z_n^* = \{\text{rem}(kz, n) \mid z\in Z_n^*\} \end{aligned}

Proof:

\begin{aligned} & Z_n^* = \{\text{rem}(kz, n) \mid z\in Z_n^*\} \\ \implies & \prod_{z_i \in Z_n^*} z_i \equiv \prod_{z_i \in Z_n^*} kz_i \mod n \\ \text{consider } & \prod_{z_i \in Z_n^*} kz_i \equiv k^{\phi(n)} \prod_{z_i \in Z_n^*} z_i \mod n \\ \text{consider } & \gcd(z_i, n) = 1 \\ \implies & \gcd (\prod_{z_i \in Z_n^*} z_i, n) = 1 \\ \implies & k^{\phi(n)}\equiv 1 \mod n \end{aligned}

4.2.6 Fermat's little theorem

For prime p:

\begin{aligned} & \phi(p) = p-1 \\ & p \nmid k \implies \begin{cases} & k^{p-1} \equiv 1 \mod p \\ & k^{-1} = k^{p-2} \end{cases} \\ & \phi(pq) = (p-1)(q-1), \quad p, q\text{ are prime} \end{aligned}

4.2.7 Chinese remainder theorem

There are n moduli m_1, \dots, m_n, pairwise coprime (\forall i \neq j, m_i \perp m_j), and M = \prod m_i. For

\begin{cases} x \equiv a_1 \pmod{m_1} \\ x \equiv a_2 \pmod{m_2} \\ \vdots \\ x \equiv a_n \pmod{m_n} \end{cases}

There exists a unique solution x \pmod M.

Lemma:

\forall i, x \equiv y \mod m_i \implies \forall i, m_i \mid (x - y) \implies M \mid (x - y) \implies x \equiv y \mod M

Solution, M_i = \frac M {m_i}:

x \equiv \sum_i a_i M_i y_i \mod M, \quad {M_i} {y_i} \equiv 1 \mod m_i
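A short Python sketch of this solution formula; it uses pow(M_i, -1, m_i) (Python 3.8+) for the modular inverse y_i:

```python
from math import prod

def crt(residues, moduli):
    """Solve x = a_i (mod m_i) for pairwise coprime m_i; return x mod M."""
    M = prod(moduli)
    x = 0
    for a_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        y_i = pow(M_i, -1, m_i)      # M_i * y_i = 1 (mod m_i)
        x += a_i * M_i * y_i
    return x % M

print(crt([2, 3, 2], [3, 5, 7]))  # 23, the classic example
```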

4.2.8 RSA

p, q are 2 primes; n = pq.

\begin{aligned} \text{public key } & (e, n), e \perp \phi(n), e < \phi(n) \\ \text{private key } & (d, n), ed \equiv 1 \mod \phi(n) \end{aligned}

For message m, m < n:

\begin{aligned} & c \equiv m ^ e \mod n \\ & m = c ^ d \mod n \end{aligned}

Proof for m \perp n:

\begin{cases} ed \equiv 1 \mod \phi(n) \implies ed = 1 + k\phi(n) \\ m ^ {\phi(n)} \equiv 1 \mod n \implies m^{k\phi(n)} \equiv 1 \mod n \end{cases} \implies m ^ {ed} = m ^{1 + k\phi(n)} \equiv m \mod n

For m \not\perp n:

\begin{aligned} & n = pq \\ \text{consider } & m = kp, m < n \\ \implies & \begin{cases} m \equiv 0 \mod p \\ m \perp q \end{cases} \\ & m \equiv 0 \mod p \\ \implies & m^{ed} \equiv m \mod p \\ & m \perp q \\ \implies & m ^ {\phi(q)} \equiv 1 \mod q \\ \implies & m ^{q-1} \equiv 1 \mod q \\ \implies & m^{ed} \equiv m^{1 + k\phi(n)} \equiv m \cdot m^{(q-1)\cdot(p-1)k} \equiv m \cdot \left( m^{q-1}\right)^{(p-1)k} \equiv m \mod q \\ \text{consider }&\text{lemma of CRT} \\ \implies & m^{ed} \equiv m \mod n \end{aligned}
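A toy numeric sketch of the scheme with tiny made-up primes (real RSA uses large random primes and padding; pow(e, -1, phi) needs Python 3.8+):

```python
# Toy RSA with tiny primes p, q (illustration only).
p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent: e*d = 1 (mod phi)

m = 65                         # message, m < n
c = pow(m, e, n)               # encryption: c = m^e mod n
assert pow(c, d, n) == m       # decryption: c^d mod n recovers m
print(n, e, d, c)
```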

4.3 Graph

4.3.1 Introduction

Undirected graph G = (V, E) consists of a non-empty set V and a set E of unordered pairs from V.

A simple graph is a graph without self-loops or duplicate edges.

4.3.2 Isomorphism

Two graphs G=(V_G,E_G) and H=(V_H,E_H) are isomorphic if there exists a bijection f:V_G\to V_H such that for all u,v\in V_G:

(u,v)\in E_G \iff (f(u),f(v))\in E_H

Ref: Graph Isomorphism in Quasipolynomial Time.

4.3.3 Coloring

Coloring assigns colors to vertices so that adjacent vertices have different colors.

\chi(G) is the minimum number of colors needed.

\forall k \geq 3, deciding whether \chi (G) \leq k is NP-complete.

4.3.3.1 DSATUR algorithm

The saturation of a vertex is the number of distinct colors present in its neighborhood; DSATUR repeatedly colors an uncolored vertex of maximum saturation (breaking ties by degree) with the smallest available color.
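A greedy DSATUR sketch in Python (my own illustration; the graph is an adjacency dict of sets and ties are broken by degree):

```python
def dsatur(adj):
    """Greedy DSATUR coloring: repeatedly color the uncolored vertex whose
    neighborhood already uses the most distinct colors (max saturation)."""
    color = {}
    while len(color) < len(adj):
        # Saturation = number of distinct colors among already-colored neighbors.
        def saturation(v):
            return len({color[u] for u in adj[v] if u in color})
        v = max((v for v in adj if v not in color),
                key=lambda v: (saturation(v), len(adj[v])))
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# The odd cycle C5 needs 3 colors.
c5 = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
coloring = dsatur(c5)
print(coloring, 1 + max(coloring.values()))  # ... 3
```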

4.3.3.2 Upper bound

| Graph Class | \chi | Key Theorem / Complexity |
| --- | --- | --- |
| trees & bipartite graphs | \chi = 2 | BFS, linear time |
| planar graphs | \chi \leq 4 | Four-Color Theorem; 3-colorability is NP-C |
| graphs with max degree \Delta | \chi \leq \Delta + 1 | - |

4.3.4 Bipartite

A bipartite graph is a graph whose vertices can be partitioned into 2 sets L, R such that every edge has one endpoint in each set.

If there is no odd-length circle, then the graph is bipartite.
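
A minimal BFS 2-coloring sketch that tests bipartiteness by looking for an odd cycle; is_bipartite is an illustrative name.

```python
from collections import deque

def is_bipartite(adj):
    # adj: dict vertex -> iterable of neighbours (undirected graph).
    side = {}                       # vertex -> 0 or 1 (the two parts L, R)
    for s in adj:
        if s in side:
            continue
        side[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in side:
                    side[v] = 1 - side[u]
                    queue.append(v)
                elif side[v] == side[u]:
                    return False    # odd cycle found
    return True

print(is_bipartite({0: [1], 1: [0, 2], 2: [1]}))        # True  (path)
print(is_bipartite({0: [1, 2], 1: [0, 2], 2: [0, 1]}))  # False (triangle)
```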

4.3.5 Matching

A matching is a set of edges that share no common vertices.

A perfect matching on L (an L-saturating matching) is a matching that covers all vertices of L.

4.3.5.1 Hall's theorem

A bipartite graph G = (X, Y, E) has a matching that covers X if and only if for every subset S \subseteq X:

|N(S)| \;\ge\; |S|

N(S) is the neighborhood of S in Y:

N(S) = \{\,y \in Y : \exists\,x \in S \text{ with }(x,y)\in E\}

4.3.5.2 Stable matching

Given two groups of agents (commonly called men and women), each agent ranks the members of the opposite group by preference. The stable matching problem asks for a pairing (a matching) between the two groups such that no two agents would both rather be with each other than with their assigned partners. Such a pair, if it exists, is called a blocking pair, and the matching is then unstable.

The Gale-Shapley algorithm solves it.

Initialization: every man and every woman starts free, and each man keeps a pointer to the next woman on his preference list.

Main loop (while some man is free), for each currently free man m:

  1. m proposes to the highest-ranked woman w he has not proposed to yet.
  2. If w is free, she tentatively accepts m.
  3. Otherwise w keeps whichever of m and her current partner she prefers; the other man becomes free.

Important properties:

  1. O(n^2): in the worst case each man proposes to every woman once.
  2. Stable: the result contains no man-woman pair who both prefer each other to their current partners.
  3. Proposer-optimal: every man receives his best achievable partner among all stable matchings.
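
A minimal Python sketch of the proposal loop described above; gale_shapley is an illustrative name and expects complete preference lists.

```python
from collections import deque

def gale_shapley(men_pref, women_pref):
    # men_pref / women_pref: dict agent -> list of the other side, best first.
    rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_pref.items()}
    next_proposal = {m: 0 for m in men_pref}   # index of the next woman to propose to
    engaged_to = {}                            # woman -> current partner
    free_men = deque(men_pref)
    while free_men:
        m = free_men.popleft()
        w = men_pref[m][next_proposal[m]]
        next_proposal[m] += 1
        if w not in engaged_to:
            engaged_to[w] = m                  # w accepts her first proposal
        elif rank[w][m] < rank[w][engaged_to[w]]:
            free_men.append(engaged_to[w])     # w trades up; her old partner is free again
            engaged_to[w] = m
        else:
            free_men.append(m)                 # w rejects m
    return {m: w for w, m in engaged_to.items()}

men = {'a': ['x', 'y'], 'b': ['x', 'y']}
women = {'x': ['b', 'a'], 'y': ['a', 'b']}
print(gale_shapley(men, women))   # {'b': 'x', 'a': 'y'}
```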

4.3.5.3 Other matching problems

| Problem | Solution | Time complexity |
| --- | --- | --- |
| Maximum cardinality | Edmonds' Blossom | O(V^3) |
| Maximum cardinality (bipartite) | Hopcroft-Karp (Dinic) | O(\sqrt{V}\,E) |
| Maximum weight (bipartite) | MCMF | O(VE\log V) |
| Maximum weight (complete bipartite) | Hungarian | O(V^3) |

To use MCMF to solve maximum weight matching on a bipartite graph G = (L \cup R, E ), construct the flow network as follows, where S is the source, T is the sink, and C is a constant with C \ge \max_{(l,r)} w(l, r):

| Edge | Capacity | Cost |
| --- | --- | --- |
| S \rightarrow l, \quad l \in L | 1 | 0 |
| l \rightarrow r, \quad (l, r) \in E | 1 | C-w(l, r) |
| r \rightarrow T, \quad r \in R | 1 | 0 |

4.3.6 Selected problems

A walk is a sequence of vertices in which consecutive vertices are adjacent.

A path is a walk without repeated vertices.

The number of length-k walks between i and j in an undirected graph with adjacency matrix G is the (i, j) entry of G^k, i.e. \left(G^k\right)_{i,j}.

A directed graph is strongly connected if for every pair u, v there is a directed path from u to v (and from v to u).

DAG is a directed graph without any directed cycles.

An Euler walk is a walk that traverses every edge in a graph exactly once.

An Euler tour is an Euler walk that starts and ends at the same vertex.

A connected graph has an euler tour if and only if every vertex has even degree.

A Hamiltonian cycle in a graph G is a cycle that visits every node in G exactly once. There is always a Hamiltonian path in a tournament graph.

For a weighted graph, to find the hamiltonian cycle with minimum sum of weights is called the travelling salesperson problem.

Euclidean TSP in fixed dimension d can be approximated to within 1+\epsilon of optimal in polynomial time: A Polynomial-Time Approximation Scheme (PTAS) for Euclidean TSP.

4.3.7 Tournament graph

A tournament graph is a complete directed graph: for every pair u, v, exactly one of the directed edges u \to v or v \to u exists.

Every tournament graph contains a hamiltonian path.

Base: trivial case.

Induction:

For T_{n+1}, delete any node v; then there is a Hamiltonian path u_1 \to u_2 \to \cdots \to u_n in T_{n+1} - v. Then either:

  1. v beats all u_i
  2. all u_i beat v
  3. \exists i, \, u_i \to v \land v \to u_{i+1}

A Hamiltonian path can be constructed in all three cases.

4.3.8 Planar graph

In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints.

Euler's formula:

If a connected graph has a planar embedding, then:

v - e + f = 2

4.3.9 Minimum spanning tree

A connected and acyclic graph is a tree. A leaf is a degree-one node of a tree.

An MST is a spanning tree with the smallest sum of edge weights.

Algorithm:

  1. Start with an empty edge set E.
  2. Repeatedly add the minimum-weight edge that doesn't form a cycle with E (this is Kruskal's algorithm; a sketch follows below).
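
A minimal Python sketch of the algorithm above with a simple union-find; the names are illustrative.

```python
def kruskal(n, edges):
    # edges: list of (weight, u, v) with vertices 0..n-1. Returns MST edge list.
    parent = list(range(n))

    def find(x):                       # union-find root with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):      # scan edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # adding (u, v) does not create a cycle
            parent[ru] = rv
            mst.append((w, u, v))
    return mst

edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
print(kruskal(4, edges))   # [(1, 0, 1), (2, 1, 2), (4, 2, 3)], total weight 7
```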

4.3.9.1 Lemma of cut

A cut partitions the vertex set V into 2 non-empty sets. Among the edges crossing the cut, any minimum-weight edge is called a light edge.

Cut property: for every cut, any of its light edges belongs to some MST.

4.3.9.2 Proof

Invariant: after the n-th step of the loop, there exists an MST T^\star_{n} such that the edge set E built so far satisfies E \subseteq T^\star_n.

Base: \emptyset \subseteq T^\star.

Induction:

  1. If the new edge e \in T^\star_{n}, take T^\star_{n+1} = T^\star_{n}.
  2. If e \notin T^\star_{n}, adding e to T^\star_{n} forms a cycle. Since e is a light edge of the cut separating the component of E that e connects from the rest of V, the cycle contains another edge e' crossing that cut with w(e') \ge w(e), so T^\star_{n+1} = T^\star_{n} - e' + e is a spanning tree of weight no greater than that of T^\star_{n}, hence an MST.

4.3.10 Communication networks

For circuit switching, the goal of the network is to get data from inputs to outputs while trading off switch size / switch count / diameter and congestion.

The congestion is defined as the largest number of paths passing through a single switch, for the worst-case routing pattern routed in the best possible way.

4.3.10.1 Butterfly network

The butterfly network is a communication network with N = 2^m input/output.

It has 2^m rows and m+1 columns.

Connection pattern:

b_1\cdots b_{j+1}\cdots b_m, \, j \to \begin{cases} b_1\cdots b_{j+1}\cdots b_m, \, j + 1 \\ b_1\cdots \overline{b_{j+1}}\cdots b_m, \, j + 1 \end{cases}

Routing from input x = x_1x_2\cdots x_m to output y = y_1y_2\cdots y_m corrects one address bit per level:

(x_1x_2\cdots x_m, \,0) \to (y_1x_2\cdots x_m, \, 1)\to(y_1y_2x_3\cdots x_m, \, 2) \to\cdots\to(y_1y_2\cdots y_m, \, m)

4.3.10.2 Comparison

| Network for N nodes | Diameter | Switch size | Switches | Congestion |
| --- | --- | --- | --- | --- |
| Complete binary tree | 2\log N + 2 | 3\times3 | 2N - 1 | N |
| 2-D array | 2N | 2\times2 | N^2 | 2 |
| Butterfly | \log N + 2 | 2\times2 | N(\log N + 1) | \sqrt{N} or \tfrac{\sqrt{N}}{2} |
| Benes | 2\log N + 1 | 2\times2 | 2N\log N | 1 |

4.4 Relation

4.4.1 Definition

Given sets A and B, a binary relation R: A \to B from A to B is a subset of A \times B. The sets A and B are called the domain and codomain of R.

aRb or a \sim_R b can be used for (a,b) \in R.

| Property | Description | Symbol | Cardinality |
| --- | --- | --- | --- |
| Total | every element of A maps to some element of B | - | - |
| Surjective | every element of B is mapped to by some element of A | A \twoheadrightarrow B | \mid A \mid \ge \mid B \mid |
| Injective | every element of B is mapped to at most once | A \hookrightarrow B | \mid A \mid \le \mid B \mid |
| Function | every element of A maps to at most one element of B | A \to B | - |
| Bijective | total, surjective, injective, and function | A \xrightarrow{\sim} B | \mid A \mid = \mid B \mid |

4.4.2 Relations on a single set

| Property | Formal condition | Graph interpretation |
| --- | --- | --- |
| Reflexive | \forall x \in A,\; xRx | every node has a self-loop |
| Irreflexive | \neg \exists x \in A,\; xRx | no self-loops |
| Symmetric | \forall x,y \in A,\; xRy \implies yRx | every directed edge x \to y is paired with y \to x |
| Antisymmetric | \forall x,y \in A,\; (xRy \land yRx) \implies x = y | no bi-directional edges between distinct nodes |
| Asymmetric | \neg \exists x,y \in A:\; xRy \land yRx | no self-loops and no bi-directional edges |
| Transitive | \forall x,y,z \in A,\; (xRy \land yRz) \implies xRz | for any walk v_{1}\cdots v_k, there is a direct edge v_1 \to v_k |

Note: Irreflexive + Antisymmetric = Asymmetric.

4.4.3 Equivalence relations

A relation is an equivalence relation if it is reflexive, symmetric, and transitive.

Given an equivalence relation R : A \to A, the equivalence class of an element x \in A is the set of all elements of A related to x by R. It is denoted as [x]:

[x] = \{ y \mid xRy \}.

A partition of a finite set A is a collection of disjoint, nonempty subsets A_1, A_2, \ldots, A_n whose union is all of A. The subsets are usually called the blocks of the partition.

The equivalence classes of an equivalence relation on a set A form a partition of A.

4.4.4 Partial orders

A relation R on a set A is a weak partial order, denoted \preceq, if it is transitive, antisymmetric, and reflexive, e.g. \leq.

R is a strong (strict) partial order, denoted \prec, if it is transitive and asymmetric, e.g. <.

A total order is a partial order in which every pair of distinct elements is comparable.

4.4.5 DAG

Given a partial order \preceq on a set A, the pair (A, \preceq) is called a partially ordered set or poset.

Viewed as a digraph, a strict partial order \prec is acyclic (a DAG).

Given a digraph G = (V, E), the transitive closure of G is the digraph G^+ = (V, E^+) where

E^+ = \{ u \to v \mid \text{there is a directed path from } u \text{ to } v \text{ in } G \}.

4.4.6 Topological sort

A topological sort of a poset (A, \preceq) is a total order (A, \preceq_T) such that:

a \preceq b \implies a \preceq_T b

Every finite poset has a topological sort.
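
A minimal Python sketch of topological sorting by repeatedly removing minimal elements (Kahn's algorithm); topological_sort is an illustrative name.

```python
from collections import deque

def topological_sort(adj):
    # adj: dict node -> list of successors (edges of a DAG). Returns one total order.
    indegree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] += 1
    ready = deque(u for u in adj if indegree[u] == 0)   # minimal elements
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(adj):
        raise ValueError("not a DAG")
    return order

# a ⪯ b, a ⪯ c, b ⪯ d, c ⪯ d
print(topological_sort({'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}))
```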

4.4.7 Parallel task scheduling

A chain in a poset is a subset of elements that is totally ordered by \preceq.

For any finite poset (A, \preceq) whose longest chain has length t, A can be partitioned into t subsets A_1, A_2, \dots, A_t such that for every a \in A_i, every b \prec a lies in \cup_{1}^{i-1}A_j.

Constructive proof is to partition via rank: r(a)\;=\;\bigl(\text{length of the longest chain ending at }a\bigr):

A_i \;:=\; \{\,a\in A : r(a)=i\}, \qquad i=1,2,\dots,t

An antichain in a poset is a set of elements such that any two elements in the set are incomparable.

A_i are antichains.

For all t > 0, every poset with n elements has either a chain of size greater than t or an antichain of size at least n / t.

4.5 Sum and Asymptotics

4.5.1 Basics

A closed form is an expression that does not make use of summation or product symbols.

4.5.2 Harmonic number

H_n = \sum_{i = 1}^{n} \frac 1 i

By integral comparison, H_n \sim \ln n.

4.5.3 n!

4.5.3.1 Trivial

By integral comparison:

\begin{aligned} & \ln(n!) = \sum_{i=1}^{n}\ln(i), \; \int_1^n \ln(x) \,\mathrm d x = n\ln(n) - n + 1 \\ \implies & n\ln(n) - n + 1 \leq \sum_{i=1}^{n}\ln(i)\leq n\ln(n) - n + 1 + \ln(n) \\ \implies & \frac{n^n}{\mathrm e ^ {n - 1}} \leq n! \leq \frac{n^{n+1}}{\mathrm e^{n-1}} \end{aligned}

4.5.3.2 Stirling's formula

For all n \ge 1:

\begin{aligned} & n! \;=\; \sqrt{2\pi n}\,\Bigl(\frac{n}{e}\Bigr)^{\!n}\,e^{\epsilon(n)}, \;\; \frac{1}{12n + 1} \;\le\; \epsilon(n) \;\le\; \frac{1}{12n} \\ \iff & \ln(n!) = n \ln n - n + \frac 1 2 \ln(2\pi n) + \frac 1 {12n} + \mathcal{O}(\frac 1 {n^3}) \end{aligned}

4.5.4 Asymptotic notation

| Notation | Read as | Definition |
| --- | --- | --- |
| O | big oh | f(x)=O(g(x))\iff\exists\,C>0,x_0\text{ s.t. }\forall x>x_0,\lvert f(x)\rvert\le C\lvert g(x)\rvert |
| o | little oh | f(x)=o(g(x))\iff\lim_{x\to\infty}\frac{f(x)}{g(x)}=0 |
| \Omega | big omega | f(x)=\Omega(g(x))\iff\exists\,C>0,x_0\text{ s.t. }\forall x>x_0,\lvert f(x)\rvert\ge C\lvert g(x)\rvert |
| \omega | little omega | f(x)=\omega(g(x))\iff\lim_{x\to\infty}\frac{f(x)}{g(x)}=\infty |
| \Theta | theta | f(x)=\Theta(g(x))\iff f(x)=O(g(x)) \land f(x)=\Omega(g(x)) |
| \sim | asymptotically equivalent | f(x)\sim g(x)\iff\lim_{x\to\infty}\frac{f(x)}{g(x)}=1 |

4.6 Recurrence

4.6.1 Linear recurrence

Homogeneous linear recurrences are like:

f(n) = \sum_{i=1}^{d}a_if(n - i)

The characteristic equation:

P(r)=r^d-a_1 r^{d-1}-a_2 r^{d-2}-\cdots-a_{d-1} r-a_d=0

If r_1,\dots,r_s are the distinct roots with multiplicities m_1,\dots,m_s (\sum m_j=d), then

f(n)=\sum_{j=1}^s \Big(\sum_{k=0}^{m_j-1} c_{j,k}\, n^{k}\Big)\, r_j^{\,n}

with constants c_{j,k} determined by the d initial values.

4.6.1.1 Numerical solution

To evaluate the recurrence numerically (the roots r_i are the eigenvalues of the companion matrix C):

s_n=\begin{bmatrix} f(n)\\ f(n-1)\\ \vdots\\ f(n-d+1) \end{bmatrix},\quad s_{d-1}=\begin{bmatrix}f(d-1)\\ \vdots\\ f(0)\end{bmatrix}, \quad C=\begin{bmatrix} a_1 & a_2 & \cdots & a_{d-1} & a_d\\ 1 & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}

Then:

\begin{aligned} & s_{n+1}=C\,s_n \\ \implies & s_n=C^{\,n-(d-1)}\,s_{d-1}\ \ (n\ge d-1) \\ \implies & f(n)=e_1^\top\,C^{\,n-(d-1)}\,s_{d-1},\qquad e_1=[1,0,\dots,0]^\top \end{aligned}

If there is no repeated root:

C=V\Lambda V^{-1},\quad \Lambda=\mathrm{diag}(r_1,\dots,r_d)

Then

C^m=V\Lambda^m V^{-1}=V\,\mathrm{diag}(r_1^m,\dots,r_d^m)\,V^{-1}

With state s_n=[f(n),f(n-1),\dots,f(n-d+1)]^\top and s_{d-1} fixed by the initial values,

s_n=C^{\,n-(d-1)} s_{d-1}=V\Lambda^{\,n-(d-1)} V^{-1}s_{d-1}

Taking the first coordinate gives

f(n)=\sum_{j=1}^{d} A_j\, r_j^{\,n}, \quad A = V^{-1}s_{d-1}, \quad V = \begin{bmatrix} r_1^{d-1}&\cdots&r_d^{d-1}\\ r_1^{d-2}&\cdots&r_d^{d-2}\\ \vdots&&\vdots\\ 1&\cdots&1 \end{bmatrix}
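
A small pure-Python sketch of the companion-matrix evaluation above, using fast matrix exponentiation (exact integer arithmetic) instead of the eigendecomposition; the names are illustrative.

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_pow(M, k):
    # Fast exponentiation: O(d^3 log k) to evaluate the recurrence at index n.
    d = len(M)
    R = [[int(i == j) for j in range(d)] for i in range(d)]   # identity
    while k:
        if k & 1:
            R = mat_mul(R, M)
        M = mat_mul(M, M)
        k >>= 1
    return R

def linear_recurrence(a, init, n):
    # f(n) = a[0] f(n-1) + ... + a[d-1] f(n-d); init = [f(d-1), ..., f(0)] = s_{d-1}.
    d = len(a)
    if n < d:
        return init[d - 1 - n]
    C = [a] + [[int(j == i) for j in range(d)] for i in range(d - 1)]   # companion matrix
    S = mat_pow(C, n - (d - 1))
    return sum(S[0][j] * init[j] for j in range(d))   # first row of C^{n-d+1} s_{d-1}

# Fibonacci: f(n) = f(n-1) + f(n-2), f(1) = 1, f(0) = 0.
print(linear_recurrence([1, 1], [1, 0], 10))   # 55
```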

4.6.1.2 Non-Homogeneous

For f(n) = \sum_{i=1}^{d}a_if(n - i) + g(n):

As in the homogeneous case, solve:

P(r)=r^d-a_1r^{d-1}-a_2r^{d-2}-\cdots-a_d=0.

If r_1,\dots,r_s are the distinct roots with multiplicities m_1,\dots,m_s (\sum m_j=d), then

f_h(n)=\sum_{j=1}^s \Big(\sum_{k=0}^{m_j-1} c_{j,k}\, n^{k}\Big)\, r_j^{\,n}

Guess a particular solution f_p(n) from the form of g(n):

| g(n) | Try f_p(n) of the form |
| --- | --- |
| constant C | A |
| polynomial n^p | degree-p polynomial in n |
| r^n | A\,r^n |
| n^p r^n | (degree-p polynomial)\cdot r^n |
| A\cos(\omega n)+B\sin(\omega n) | A'\cos(\omega n)+B'\sin(\omega n) |

Resonance rule: if the base (e.g., r or e^{\pm i\omega}) is a root of P with multiplicity m, multiply the whole trial form by n^{m}.

f(n)=f_h(n)+f_p(n)

4.6.2 Akra-Bazzi method

Ref:

  1. Notes on Better Master Theorems for Divide-and-Conquer Recurrences
  2. On the Solution of Linear Recurrence Equations

Suppose that the function T : \mathbb{R} \to \mathbb{R} satisfies the recurrence

T(x) \;=\; \begin{cases} \text{nonnegative and bounded}, & 0 \le x \le x_0\\[6pt] \displaystyle \sum_{i=1}^k a_i\,T\bigl(b_i x + h_i(x)\bigr) \;+\; g(x), & x > x_0 \end{cases}

where

  1. a_1,\dots,a_k are positive constants.
  2. b_1,\dots,b_k are constants in the interval [0,1].
  3. x_0 is large enough so that T is well-defined.
  4. g(x) is nonnegative and \lvert g'(x)\rvert is bounded by a polynomial.
  5. \lvert h_i(x)\rvert = O\bigl(x/\log^2 x\bigr) for each i.

Then

T(x) \;=\; \Theta\!\Bigl(x^p\Bigl(1 + \int_{1}^{x} \frac{g(u)}{u^{p+1}} \,du\Bigr)\Bigr), \; \text{s.t. } \sum_{i=1}^k a_i\,b_i^p \;=\; 1

| g(x) | \int_{1}^{x}\frac{g(u)}{u^{p+1}}\,du | T(x) |
| --- | --- | --- |
| O\big(x^{p-\varepsilon}\big), \varepsilon>0 | O(1) | \Theta\big(x^{p}\big) |
| \Theta\Big(\dfrac{x^{p}}{\log^{1+\delta}x}\Big), \delta>0 | \Theta(1) | \Theta\big(x^{p}\big) |
| \Theta\Big(\dfrac{x^{p}}{\log x}\Big) | \Theta(\log\log x) | \Theta\big(x^{p}\log\log x\big) |
| \Theta\Big(\dfrac{x^{p}}{\log^{c}x}\Big), 0<c<1 | \Theta\big(\log^{1-c}x\big) | \Theta\big(x^{p}\log^{1-c}x\big) |
| \Theta\big(x^{p}\big) | \Theta(\log x) | \Theta\big(x^{p}\log x\big) |
| \Theta\big(x^{p}(\log x)^k\big), k>-1 | \Theta\big((\log x)^{k+1}\big) | \Theta\big(x^{p}(\log x)^{k+1}\big) |
| \Theta\big(x^{p}(\log x)^k(\log\log x)^m\big), k>-1 | \Theta\big((\log x)^{k+1}(\log\log x)^m\big) | \Theta\big(x^{p}(\log x)^{k+1}(\log\log x)^m\big) |
| \Theta\big(x^{p+\varepsilon}\big), \varepsilon>0 | \Theta\big(x^{\varepsilon}\big) | \Theta\big(x^{p+\varepsilon}\big) |
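
A minimal Python sketch for finding the exponent p numerically by bisection, assuming a_i > 0 and 0 < b_i < 1 so that \sum_i a_i b_i^p is strictly decreasing in p; akra_bazzi_p is an illustrative name.

```python
def akra_bazzi_p(a, b, lo=-10.0, hi=10.0, tol=1e-12):
    # Solve sum(a_i * b_i**p) = 1 for p by bisection.
    def g(p):
        return sum(ai * bi**p for ai, bi in zip(a, b)) - 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid        # sum still too large: p must grow
        else:
            hi = mid
    return (lo + hi) / 2

# Merge sort: T(x) = 2 T(x/2) + x  ->  p = 1, so T(x) = Theta(x log x).
print(round(akra_bazzi_p([2], [0.5]), 6))            # 1.0
# T(x) = T(3x/4) + T(x/4) + x  ->  p = 1 as well.
print(round(akra_bazzi_p([1, 1], [0.75, 0.25]), 6))  # 1.0
```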

4.6.3 Master theorem

T is a recurrence of the form:

T(n) = a \, T\left(\frac{n}{b}\right) + f(n)

| Condition for f(n) | T(n) |
| --- | --- |
| O\!\left(n^{\log_b a-\epsilon}\right) for some \epsilon>0 | \Theta\!\left(n^{\log_b a}\right) |
| \Theta\!\left(n^{\log_b a}\,\log^{k} n\right) for some k\ge 0 | \Theta\!\left(n^{\log_b a}\,\log^{k+1} n\right) |
| \Omega\!\left(n^{\log_b a+\epsilon}\right) for some \epsilon>0; af(n/b)\le cf(n) for some c<1 and large n | \Theta\!\left(f(n)\right) |

4.7 Counting

4.7.1 Rules

4.7.1.1 Product

|\mathop{\Large\times}_{i}P_i| = \prod_i|P_i|

4.7.1.2 Sum

\left| \bigcup_{i=1}^n P_i \right| = \sum_{\emptyset \neq I \subseteq \{1,\dots,n\}} (-1)^{|I|+1} \left| \bigcap_{i \in I} P_i \right|

The above inclusion-exclusion identity holds for any finitely additive measure \mu (e.g., counting measure, probability).

Application (Euler's totient function):

Let p_1, \dots, p_m be the distinct prime factors of n, S_i the set of integers \le n divisible by p_i, and S = \bigcup_i S_i; then |S_i|=\frac{n}{p_i}, |S_i\cap S_j|=\frac{n}{p_i p_j}, and so on:

|S|=n\Bigl(\sum_i\frac1{p_i}-\sum_{i<j}\frac1{p_i p_j}+\cdots+(-1)^{m-1}\frac1{p_1\cdots p_m}\Bigr)

\phi(n)=n-|S| = n\Bigl(1-\sum_i\frac1{p_i}+\sum_{i<j}\frac1{p_i p_j}-\cdots+(-1)^m\frac1{p_1\cdots p_m}\Bigr)=n\prod_{i=1}^m\Bigl(1-\frac1{p_i}\Bigr)

4.7.1.3 Division

If f : A \to B is k-to-1, then:

|A| = k \cdot |B|

E.g.:

  1. the number of ring (circular) arrangements of n objects is \frac{n!}n = (n-1)!
  2. the number of k-object subsets of n objects is \frac{n!}{k!(n-k)!}
  3. a (k_1, k_2, \dots, k_m)-split of A is a sequence (A_1, A_2, \dots, A_m) with A_i\cap A_j = \emptyset, |A_i| = k_i, and \sum k_i = n; the number of splits is {n!} / {\prod k_i !}

4.7.1.4 Bookkeeper

Let l_1, l_2, \dots, l_m be distinct symbols. Suppose we form sequences (or arrangements) of length n = k_1 + k_2 + \cdots + k_m in which the symbol l_i appears exactly k_i times.

Then the number of distinct sequences is given by:

\binom{n}{k_1, k_2, \dots, k_m} = \frac{n!}{k_1! k_2! \cdots k_m!}

4.7.1.5 Binomial and multinomial

Binomial:

For all n \in \mathbb{N} and a, b \in \mathbb{R}:

(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{\,n-k} b^{\,k}

Multinomial:

(x_1 + x_2 + \cdots + x_m)^n = \sum_{k_1+k_2+\cdots+k_m=n} \binom{n}{k_1, k_2, \dots, k_m} x_1^{k_1} x_2^{k_2} \cdots x_m^{k_m}

4.7.1.6 Pascal's identity

\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}

4.7.1.7 Pigeonhole principle

\text{If } n \text{ pigeons are placed into } m \text{ holes, then some hole contains at least } \left\lceil \frac{n}{m} \right\rceil \text{ pigeons.}

4.7.2 Generating functions

Generating functions transform sequences to functions:

\langle g_0, g_1, g_2, g_3, \dots \rangle \;\longleftrightarrow\; g_0 + g_1x + g_2x^2 + g_3x^3 + \cdots

Some examples:

\begin{aligned} \langle 1, 1, 1, 1, \dots \rangle \;& \longleftrightarrow\; 1 + x + x^2 + x^3 + x^4 + \cdots = \frac{1}{1-x} \\ \langle 1, -1, 1, -1, \dots \rangle \;& \longleftrightarrow\; 1 - x + x^2 - x^3 + x^4 - \cdots = \frac{1}{1+ x} \\ \langle 1, 0, 1, 0, 1, 0, \dots \rangle \;& \longleftrightarrow\; 1 + x^2 + x^4 + x^6 + x^8 + \cdots = \frac{1}{1-x^2} \\ \langle 1, a, a^2, a^3, \dots \rangle \;& \longleftrightarrow\; 1 + ax + a^2x^2 + a^3x^3 + \cdots = \frac{1}{1-ax} \\ \end{aligned}

4.7.2.1 Rules

Scaling:

\langle s_0, s_1, s_2, \dots \rangle = F(x) \iff \langle cs_0, cs_1, cs_2, \dots \rangle = cF(x)

Addition:

\begin{aligned} & \langle s_0, s_1, s_2, \dots \rangle = F(x), \; \langle t_0, t_1, t_2, \dots \rangle = G(x) \\ \iff & \langle s_0 + t_0, s_1 + t_1, s_2 + t_2, \dots \rangle = F(x) + G(x) \end{aligned}

Right shifting by k leading zeros:

\begin{aligned} & \langle f_0, f_1, f_2, \dots \rangle \longleftrightarrow F(x) \\ \iff & \langle \underbrace{0,0,\ldots,0}_{k \ \text{zeroes}}, f_0, f_1, f_2, \ldots \rangle \;\longleftrightarrow\; x^k \cdot F(x) \end{aligned}

Differentiation:

\begin{aligned} \langle f_n \rangle \;\longleftrightarrow\; F(x) \iff \langle (n+1)f_{n+1} \rangle \;\longleftrightarrow\; F'(x) \end{aligned}

Product:

\begin{aligned} & \langle a_n \rangle \, \longleftrightarrow\, A(x), \, \langle b_n \rangle \, \longleftrightarrow\, B(x) \\ \iff & \langle c_n \rangle \longleftrightarrow A(x)\cdot B(x), \, c_n = \sum_{k = 0}^{n}a_k\cdot b_{n-k} \end{aligned}

Summation is a special case of product, b_n = 1, B(x) = \frac 1 {1 - x}:

\begin{aligned} & \langle a_n \rangle \, \longleftrightarrow\, A(x) \\ \iff & \langle s_n \rangle \longleftrightarrow \frac {A(x)}{1-x}, \; s_n = \sum_{k = 0}^{n} a_k \end{aligned}

4.7.2.2 From generating function to sequence

The sequence f_n for function F(x) is:

f_n = \frac{F^{(n)}(0)}{n!}

which follows from the Taylor series.

Partial fractions:

[x^n]\frac{c\, x^a}{(1-\alpha x)^b} = \binom{n-a+b-1}{b-1}\, c\, \alpha^{\,n-a} = \frac{c \,(n-a+b-1)!}{(n-a)! \,(b-1)!} \, \alpha^{\,n-a}, \quad n \ge a

General linear recurrence: \begin{aligned} & f_n = a_1 f_{n-1} + a_2 f_{n-2} + \cdots + a_d f_{n-d} + g_n, \; n \geq d \\ \implies & F(x) = \frac{h(x) + G(x)}{1 - a_1 x - a_2 x^2 - \cdots - a_d x^d}, \; G(x) = \sum_{n=d}^\infty g_n x^n , \; h(x) = \sum_{i=0}^{d-1} h_i x^i \end{aligned} where the coefficients h_i of h(x) are determined by the initial values f_0, \dots, f_{d-1}.
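
A small Python sketch of coefficient extraction from a rational generating function F(x) = P(x)/Q(x): expanding Q(x)F(x) = P(x) gives a recurrence for the coefficients. The function name is illustrative.

```python
def rational_gf_coeffs(P, Q, n_terms):
    # Coefficients of F(x) = P(x) / Q(x) with Q[0] != 0;
    # P, Q are coefficient lists [c0, c1, c2, ...].
    f = []
    for n in range(n_terms):
        pn = P[n] if n < len(P) else 0
        # Q(x) F(x) = P(x)  =>  sum_k Q[k] * f[n-k] = P[n]
        acc = sum(Q[k] * f[n - k] for k in range(1, min(n, len(Q) - 1) + 1))
        f.append((pn - acc) / Q[0])
    return f

# Fibonacci: F(x) = x / (1 - x - x^2)
print(rational_gf_coeffs([0, 1], [1, -1, -1], 10))
# [0.0, 1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]
```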

4.7.3 Sets

Set A is infinite \iff there is a bijection A \xrightarrow{\sim} A \cup \{b\} for some new element b \notin A.

Set C is countable \iff it is finite \lor \mathbb N \xrightarrow{\sim} C

If sets A, B are countable, then A \cup B and A \times B are countable \implies \mathbb Q is countable.

4.7.4 Cantor's theorem

For any set A, the power set \mathcal P(A) has strictly larger cardinality than A itself.

\mathcal P(A) is at least as big as A: there is an injection f: A \;\longrightarrow\; \mathcal P(A) given by f(a) = \{a\}.

\mathcal P(A) is strictly bigger:

Assume there was a surjection: g: A \;\longrightarrow\; \mathcal P(A):

Define: A_g \;=\;\{\,a\in A : a\notin g(a)\}\;\subseteq A.

By surjectivity, pick a_0\in A with g(a_0)=A_g.

  1. If a_0\in A_g: by the definition of A_g, a_0\notin g(a_0). But g(a_0)=A_g, so this says a_0\notin A_g. Contradiction.
  2. If a_0\notin A_g: by the definition of A_g, a_0\in g(a_0). With g(a_0)=A_g, this gives a_0\in A_g. Contradiction.

Both cases yield a contradiction, so no such surjection g exists. Therefore |\mathcal P(A)|>|A|.

4.8 Probability

4.8.1 Basic

A countable sample space S is a nonempty countable set.

A probability function on a sample space S is a total function \Pr : S \to \mathbb{R}

such that \Pr[w] \ge 0 for every w \in S, and \sum_{w \in S} \Pr[w] = 1.

A sample space with a probability function, (S, \Pr), is a probability space.

Rules:

\begin{aligned} & \Pr [E] = \sum_{w \in E} \Pr[w] \\ & \Pr[\bar A] = 1 - \Pr[A] \\ \end{aligned}

For a union of pairwise disjoint events, \Pr\!\big[\bigcup_i S_i\big] = \sum_i \Pr[S_i]; the inclusion-exclusion rule (4.7.1.2) covers the general case.

4.8.2 Conditional

\begin{aligned} \Pr[A \mid B] & = \frac{\Pr[A \cap B]}{\Pr[B]} \\ \Pr\!\left[\bigcap_{i=1}^n E_i\right] & = \prod_{k=1}^n \Pr\!\left[E_k \;\middle|\; \bigcap_{i=1}^{k-1} E_i\right] \end{aligned}

Bayes' rule:

\Pr[B \mid A] = \frac{\Pr[A \mid B] \cdot \Pr[B]}{\Pr[A]}

If E_1, E_2, \dots, E_n are disjoint events whose union is the entire sample space S (that is, they form a partition of S), then for any event A:

\Pr[A] = \sum_{i=1}^n \Pr[A \mid E_i] \cdot \Pr[E_i]

4.8.3 Independence

Events A and B are independent if either:

  1. \Pr[B] = 0, or
  2. \Pr[A \mid B] = \Pr[A].

Or:

Events A and B are independent \iff \Pr[A \cap B] = \Pr[A] \cdot \Pr[B] .

4.8.3.1 Mutual independence

A set of events E_1, E_2, \dots, E_n is mutually independent if for every subset S \subseteq \{1,2,\dots,n\}:

\Pr\!\left[\bigcap_{j \in S} E_j \right] \;=\; \prod_{j \in S} \Pr[E_j]

Or, for every subset S \subseteq \{1,\dots,n\} \setminus \{i\}:

\Pr[E_i \mid \bigcap_{j \in S} E_j] = \Pr[E_i]

4.8.4 Random variables

A random variable R on a probability space (S, \Pr) is a total function R : S \;\to\; \mathbb{R}.

Thus, each outcome w \in S is assigned a real number R(w).

4.8.4.1 Independence

If each R_i is discrete with countable range S_i, then the following are equivalent:

  1. R_1,\dots,R_n are mutually independent.
  2. For all x_i\in S_i,

\Pr[R_1=x_1,\dots,R_n=x_n]=\prod_{i=1}^n \Pr[R_i=x_i]

4.8.4.2 Distribution functions

Let R be a random variable with codomain V. The probability density function (pdf) of R is a function \text{PDF}_R : V \;\to\; [0,1]:

\text{PDF}_R(x) = \begin{cases} \Pr[R = x], & \text{if } x \in \operatorname{range}(R) \\ 0, & \text{if } x \notin \operatorname{range}(R) \end{cases}

The cumulative distribution function (cdf) is:

\text{CDF}_R(x)=\Pr[R\le x] = \sum_{y\le x} \Pr[R = y]=\sum_{y\le x}\text{PDF}_R(y)

4.8.4.3 Binomial distribution

Think of n repeated, identical trials with only two outcomes each (success/failure). Let X be the number of successes in the n trials. Then X\sim\mathrm{Binomial}(n,p).

\Pr[X=k]=\binom{n}{k}p^{k}(1-p)^{n-k}, \quad k=0,1,\dots,n

4.8.4.3.1 Approximation

For k=\alpha n with 0<\alpha<1:

\binom{n}{\alpha n}\ \lesssim \frac{2^{\,nH(\alpha)}}{\sqrt{2\pi\,\alpha(1-\alpha)\,n}}

H(\alpha)=\alpha\log_2\!\frac{1}{\alpha}+(1-\alpha)\log_2\!\frac{1}{1-\alpha} is the binary entropy function.

So:

\begin{aligned} f_{n,p}(\alpha n)\ & \lesssim \frac{2^{\,n\big(\alpha\log_2\frac{p}{\alpha}+(1-\alpha)\log_2\frac{1-p}{1-\alpha}\big)}} {\sqrt{2\pi\,\alpha(1-\alpha)\,n}}\ \\ \end{aligned}

4.8.4.3.2 Sampling

Let X_1,\dots,X_n \stackrel{\text{i.i.d.}}{\sim}\mathrm{Bernoulli}(p) with unknown p\in(0,1). Define \hat p=\frac1n\sum_{i=1}^n X_i.

Given accuracy \varepsilon\in(0,\tfrac12) and confidence level 1-\alpha, find the smallest n such that

\sup_{p\in(0,1)} \Pr\big[|\hat p-p|>\varepsilon\big]\ \le\ \alpha

Non-asymptotic (guaranteed) choice via Hoeffding:

For any p\in(0,1),

\Pr\!\big[|\hat p-p|>\varepsilon\big]\ \le\ 2e^{-2n\varepsilon^2}

So it suffices to make 2e^{-2n\varepsilon^2}\le \alpha, i.e.

\boxed{\quad n \;\ge\; \frac{1}{2\varepsilon^2}\,\ln\frac{2}{\alpha}\quad}

A concrete integer choice is n=\left\lceil \frac{1}{2\varepsilon^2}\ln\frac{2}{\alpha}\right\rceil. This bound is uniform in p and needs no approximations.

Chebyshev (very conservative):

\operatorname{Var}[\hat p]=\tfrac{p(1-p)}{n}\le \tfrac{1}{4n}, hence

\Pr\!\big[|\hat p-p|>\varepsilon\big]\ \le\ \frac{1}{4n\varepsilon^2} \quad\Rightarrow\quad n \;\ge\; \frac{1}{4\alpha\varepsilon^2}

Normal/CLT heuristic (often near-minimal):

Worst variance occurs near p=\tfrac12, so

\Pr\!\big[|\hat p-p|>\varepsilon\big]\ \approx\ 2\Phi\!\big[-2\varepsilon\sqrt n\big]\ \le\ \alpha \;\Rightarrow\; \boxed{\; n \approx \frac{z_{1-\alpha/2}^2}{4\varepsilon^2}\;}

where z_{1-\alpha/2} is the standard normal quantile.
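
A small Python sketch comparing the three sample-size formulas above; sample_sizes is an illustrative name and uses only the standard library.

```python
from math import ceil, log
from statistics import NormalDist

def sample_sizes(eps, alpha):
    # Samples needed so that |p_hat - p| <= eps with probability >= 1 - alpha.
    hoeffding = ceil(log(2 / alpha) / (2 * eps**2))      # guaranteed
    chebyshev = ceil(1 / (4 * alpha * eps**2))           # very conservative
    z = NormalDist().inv_cdf(1 - alpha / 2)
    clt = ceil(z**2 / (4 * eps**2))                      # heuristic, not a guarantee
    return hoeffding, chebyshev, clt

print(sample_sizes(0.01, 0.05))   # roughly (18445, 50000, 9604)
```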

4.8.5 Expectation

\mathbb{E}[R] = \sum_{w\in\mathcal{S}} R(w)\,\Pr[w]

4.8.5.1 Geometric distribution

Run independent \mathrm{Bernoulli}(p) trials and let T\in\{1,2,\dots\} be the index of the first success:

\Pr[T=t]=p(1-p)^{t-1},\quad \mathbb{E}[T]=\frac1p

Memorylessness (discrete, unique to geometric):

\Pr[T>m+n\mid T>m]=\Pr[T>n],\qquad \Pr[T>n]=(1-p)^n

For a non-repairable item that, in each discrete time step, fails independently with probability p, the time-to-failure (in steps) is T\sim\mathrm{Geom}(p) as above. The Mean Time To Failure (MTTF) equals the expectation:

\mathrm{MTTF}=\mathbb{E}[T]=\frac1p \text{ steps}

4.8.5.2 Rules

Conditional:

Let R be a discrete random variable, and let A be an event with \Pr[A] > 0. The conditional expectation of R given A is:

\mathbb{E}[R \mid A] \;=\; \sum_{r \in \operatorname{range}(R)} r \cdot \Pr[R = r \mid A]

Total:

Let R be a random variable on a sample space S, and let \{A_1, A_2, \dots\} be a partition of S:

\mathbb{E}[R] \;=\; \sum_i \mathbb{E}[R \mid A_i] \, \Pr[A_i]

Linearity:

For any random variables R_1, R_2, \dots, R_k and constants a_1, a_2, \dots, a_k \in \mathbb{R}:

\mathbb{E}\!\left[\sum_{i=1}^k a_i R_i \right] \;=\; \sum_{i=1}^k a_i \, \mathbb{E}[R_i]

Linearity works even if the R_i's are dependent.

Products:

If R_1,\dots,R_k are mutually independent (and have finite means), then

\mathbb{E}\!\left[\prod_{i=1}^k R_i\right]=\prod_{i=1}^k \mathbb{E}[R_i]

Function:

For an affine function f(x) = ax + b:

\mathbb{E}[f(T)] = f(\mathbb{E}[T])

4.8.6 Deviation

The variance of a random variable R:

\operatorname{Var}[R] \;=\; \mathbb{E}\!\big[(R - \mathbb{E}[R])^2\big] = \mathbb{E}[R^2] - (\mathbb{E}[R])^2

The standard deviation \sigma_R of a random variable R:

\sigma_R = \sqrt{\mathrm{Var}[R]}

4.8.6.1 Rules

Linear Transformations:

\mathrm{Var}[aR + b] = a^2 \, \mathrm{Var}[R]

If R_1 and R_2 are independent random variables:

\mathrm{Var}[R_1 + R_2] \;=\; \mathrm{Var}[R_1] + \mathrm{Var}[R_2]

If R_1, R_2, \dots, R_n are pairwise independent random variables:

\mathrm{Var}[\sum_i R_i] = \sum_i \mathrm{Var}[R_i]

4.8.6.2 Markov's theorem

If R is a nonnegative random variable, then for all x > 0:

\Pr[R \geq x] \;\leq\; \frac{\mathbb{E}[R]}{x}

4.8.6.3 Chebyshev's theorem

For any random variable R, any \alpha>0, and x>0:

\Pr\big[|R|\ge x\big] =\Pr\big[|R|^\alpha \ge x^\alpha\big] \le \frac{\mathbb{E}[|R|^\alpha]}{x^\alpha}

Let C = R - \mathbb E [R], \; \alpha = 2, and recall the variance:

\Pr[|C|\ge x] = \Pr[|R-\mathbb{E}[R]|\ge x] \le \frac{\operatorname{Var}[R]}{x^2}

Let x = c \sigma_R:

\Pr\big[|R-\mathbb{E}R|\ge c\,\sigma_R\big] \;\le\;\frac{\mathrm{Var}[R]}{(c\,\sigma_R)^2} =\frac{\sigma_R^2}{c^2\,\sigma_R^2} =\frac{1}{c^2}

4.8.6.4 Cantelli's inequality

Let \mu=\mathbb{E}[R] and \sigma^2=\mathrm{Var}[R]. Let Z=R-\mu.

For any a\ge 0 and any t>0:

\{Z\ge t\}\subseteq \{Z+a\ge t+a\}\subseteq \{(Z+a)^2\ge (t+a)^2\}

By Markov:

\Pr[Z\ge t]\le \frac{\mathbb{E}[(Z+a)^2]}{(t+a)^2} = \frac{\sigma^2+a^2}{(t+a)^2}

Minimize the RHS over a\ge 0: f(a)=(\sigma^2+a^2)/(t+a)^2 has minimum at a^*=\sigma^2/t, giving:

\Pr[Z\ge t]\le \frac{\sigma^2}{t^2+\sigma^2}

With t=c\,\sigma:

\Pr\big[R-\mathbb{E}[R] \ge c\,\sigma_R\big]\le \frac{1}{1+c^2}

Applying the same argument to -Z:

\Pr\big[R-\mathbb{E}[R] \le -c\,\sigma_R\big]\le \frac{1}{1+c^2}

4.8.6.5 Chernoff bound

Let T_1,\dots,T_n be mutually independent with 0\le T_i\le1; set T=\sum_i T_i and \mu=\mathbb{E}[T]=\sum_i \mathbb{E}[T_i]. For any c\ge1:

\Pr[T\ge c\mu]\;\le\;e^{-k\mu}\quad \text{where }k=c\ln c-c+1

Proof:

For any \lambda>0, by Markov on e^{\lambda T}:

\Pr[T\ge c\mu]=\Pr[e^{\lambda T}\ge e^{\lambda c\mu}] \le \frac{\mathbb{E}[e^{\lambda T}]}{e^{\lambda c\mu}}

Mutual independence gives \mathbb{E}[e^{\lambda T}]=\prod_i \mathbb{E}[e^{\lambda T_i}].

For x\in[0,1], convexity of e^{\lambda x} yields:

e^{\lambda x}\le (1-x)e^{0}+x e^{\lambda}=1+x\,(e^{\lambda}-1)

With \mu_i=\mathbb{E}[T_i], 1+x \le \exp(x):

\mathbb{E}[e^{\lambda T_i}] \le 1+\mu_i\,(e^{\lambda}-1) \le \exp\!\big(\mu_i (e^{\lambda}-1)\big)

Multiplying,

\mathbb{E}[e^{\lambda T}] \le \exp\!\Big((e^{\lambda}-1)\sum_i \mu_i\Big)=\exp\!\big((e^{\lambda}-1)\mu\big)

Therefore

\Pr[T\ge c\mu]\le \exp\!\big((e^{\lambda}-1)\mu-\lambda c\mu\big)

For \lambda\ge0: f(\lambda)=(e^{\lambda}-1)\mu-\lambda c\mu is minimized at \lambda=\ln c:

\Pr[T\ge c\mu]\le \exp\!\big(-(c\ln c-c+1)\mu\big)=e^{-k\mu}

4.8.6.6 Murphy's law

Let A_1,\dots,A_n be mutually independent events and let T=\sum_{i=1}^n \mathbf{1}_{A_i} be the number that occur. Let \mu=\mathbb{E}[T]=\sum_{i=1}^n \Pr[A_i]:

\Pr[T=0]=\Pr\!\Big[\bigcap_{i=1}^n \overline{A_i}\Big] =\prod_{i=1}^n (1-\Pr[A_i]) \;\le\; \prod_{i=1}^n e^{-\Pr[A_i]} = e^{-\sum_i \Pr[A_i]} = e^{-\mu}

This uses 1-x\le e^{-x}, valid for all x.

4.8.7 Random walk

We consider a one-dimensional Markov chain (simple random walk with absorbing boundaries) on the finite state space

\{0,1,\dots,w\}, \quad w\ge 2

The process starts at some initial state n\in\{1,\dots,w-1\}. At each step it moves from state x to x+1 with probability p and to x-1 with probability q = 1 - p; states 0 and w are absorbing.

We are interested in the absorption probabilities at the two boundaries. Define

\begin{aligned} a_n & = \Pr\!\bigl[\text{hit }w\text{ before }0 \,\big|\, X_0=n\bigr] \\ b_n & = \Pr\!\bigl[\text{hit }0\text{ before }w \,\big|\, X_0=n\bigr] \\ c_n & = \Pr\!\bigl[\text{never hit } \{0,w\} \,\big|\, X_0=n\bigr] \end{aligned}

4.8.7.1 Recurrence

By conditioning on the first step from state n,

a_n = p\,a_{n+1} + q\,a_{n-1}, \quad 1\le n \le w-1 ; \quad a_0=0, a_w=1

4.8.7.1.1 p \neq q

Try a_n=\lambda^n. Substituting yields the characteristic equation

p\lambda^2 - \lambda + q = 0 \implies \begin{cases} \lambda_1 = 1\\ \lambda_2 = \frac{q}{p} \end{cases} \implies \boxed{\,a_n = \frac{1-\left(\tfrac{q}{p}\right)^n}{1-\left(\tfrac{q}{p}\right)^w}\,}

4.8.7.1.2 p = q

The characteristic equation has a repeated root \lambda=1:

\boxed{\,a_n = \frac{n}{w}\,}

4.8.7.2 Destiny

From any state 1\le i \le w-1, there exists a path to an absorbing boundary in at most w-1 steps (e.g., always step left or always step right). The probability of such a path is at least

\rho := \min\{p^{\,w-1},\,q^{\,w-1}\} > 0

Therefore, the probability of not being absorbed within the next w-1 steps is at most 1-\rho. Repeating over independent blocks of length w-1,

\Pr[\text{not absorbed after }k(w-1)\text{ steps}] \le (1-\rho)^k \xrightarrow{k\to\infty} 0

Thus absorption occurs with probability one, so c_n=0:

\boxed{\,b_n=1-a_n\,}

4.8.7.3 Lifespan

Let T_n=\mathbb E[T\mid X_0=n]. By conditioning on the first step,

T_n=1+p\,T_{n+1}+q\,T_{n-1},\qquad 1\le n\le w-1,\quad(q=1-p)

with boundary conditions T_0=0,\;T_w=0.

4.8.7.3.1 p \neq q

Let r = \frac q p, homogeneous part is T_n^{(h)}=A+B\,r^n.

For the particular part, try a linear term Cn:

Cn=1+pC(n+1)+qC(n-1)=1+Cn+(p-q)C \Rightarrow C=\frac{1}{q-p}

Thus T_n=A+B r^n+\dfrac{n}{q-p}.

Use the boundary conditions T_0 = 0, T_w = 0 to determine A and B:

\boxed{\,T_n=\frac{1}{q-p}\left(n-\;w\,\frac{1-(q/p)^n}{1-(q/p)^w}\right)\,},\qquad (p\neq q).

4.8.7.3.2 p = q

The homogeneous solution is A+Bn. Try a quadratic particular solution Cn^2:

(T_{n+1}-2T_n+T_{n-1})=2C=-2\;\Rightarrow\;C=-1

So T_n=A+Bn-n^2. Apply T_0=0\Rightarrow A=0 and T_w=0\Rightarrow Bw-w^2=0\Rightarrow B=w:

\boxed{\,T_n=n(w-n)\,}
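
A quick Python simulation sketch to sanity-check a_n and T_n against the closed forms above; the parameters and names are illustrative.

```python
import random

def simulate_walk(n, w, p, trials=100_000):
    # Estimate a_n = Pr[hit w before 0] and T_n = E[absorption time] empirically.
    hits, total_steps = 0, 0
    for _ in range(trials):
        x, steps = n, 0
        while 0 < x < w:
            x += 1 if random.random() < p else -1
            steps += 1
        hits += (x == w)
        total_steps += steps
    return hits / trials, total_steps / trials

# Fair walk, w = 10, start at n = 3: a_n = n/w = 0.3, T_n = n(w-n) = 21.
print(simulate_walk(3, 10, 0.5))
```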

4.8.8 Applications

4.8.8.1 Birthday paradox and hash collision

Let N be the number of possible "birthdays" (e.g., N=365), and let k samples be drawn uniformly and independently from \{1,\dots,N\}, then:

\Pr[\text{no-coll}] =\frac{(N)_k}{N^k} =\frac{N!}{(N-k)!\,N^k}

Take logs and apply Stirling's formula:

\begin{aligned} \ln \Pr_{\text{no-coll}} &\approx\big[N\ln N-N+\tfrac12\ln(2\pi N)+\tfrac{1}{12N}\big] \\ &\quad-\big[(N-k)\ln(N-k)-(N-k)+\tfrac12\ln(2\pi(N-k))+\tfrac{1}{12(N-k)}\big] -k\ln N \\ & \approx -\frac{k(k-1)}{2N}\;-\;\frac{k(k-1)(2k-1)}{12N^2} \;+\;\mathcal O\!\left(\frac{k^4}{N^3}\right) \end{aligned}

So, \Pr_{\text{coll}}\approx 1-\exp\!\left(-\frac{k(k-1)}{2N}\right), \;\; k=o (\sqrt N).

For a hash function with range size N, to reach collision probability \Pr the required number of evaluations k is approximately:

k \;\approx\; \sqrt{\,2\ln\!\frac{1}{1-\Pr} }\sqrt N

\Pr = 10^{-6} \implies k \approx 0.0014 \sqrt N.
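
A small Python sketch computing the exact no-collision product and the approximate k for a target collision probability; the function names are illustrative.

```python
from math import sqrt, log

def collision_prob(N, k):
    # Exact Pr[at least one collision] among k uniform samples from N values.
    p_no_coll = 1.0
    for i in range(k):
        p_no_coll *= (N - i) / N
    return 1 - p_no_coll

def k_for_collision_prob(N, target):
    # Approximate number of samples so that Pr[collision] ≈ target.
    return sqrt(2 * log(1 / (1 - target))) * sqrt(N)

print(collision_prob(365, 23))          # ≈ 0.507 (classic birthday paradox)
print(k_for_collision_prob(365, 0.5))   # ≈ 22.5
```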

4.8.8.2 Nontransitive dice

Let each die D = \{d_1,\dots,d_m\} be a multiset of integers, with each face equally likely.

For two dice D_i, D_j:

P[D_i \succ D_j] = \frac{1}{m^2}\sum_{a=1}^m \sum_{b=1}^m \mathbf{1}[d_{ia} > d_{jb}]

A collection of dice \{D_1, \dots, D_k\} is nontransitive if there exists a cycle such that:

P[D_1 \succ D_2] > \tfrac{1}{2}, \quad P[D_2 \succ D_3] > \tfrac{1}{2}, \quad \dots, \quad P[D_k \succ D_1] > \tfrac{1}{2}

For k \geq 3, nontransitive collections \{D_1, \dots, D_k\} exist. Ref: Maximally Nontransitive Dice.
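
A small Python sketch that checks the pairwise beat probabilities; the dice values are the well-known Efron dice (not from this note), each beating the next in the cycle with probability 2/3.

```python
from itertools import product

def beat_prob(d1, d2):
    # Pr[D1 > D2] for two fair dice given as lists of face values.
    wins = sum(a > b for a, b in product(d1, d2))
    return wins / (len(d1) * len(d2))

# Efron's dice: A > B > C > D > A, each with probability 2/3.
A = [4, 4, 4, 4, 0, 0]
B = [3, 3, 3, 3, 3, 3]
C = [6, 6, 2, 2, 2, 2]
D = [5, 5, 5, 1, 1, 1]
for x, y in [(A, B), (B, C), (C, D), (D, A)]:
    print(beat_prob(x, y))   # 0.666... each
```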

4.8.8.3 Coupon collector problem

To collect all n distinct coupon types: each draw gives a random coupon, each type equally likely and independent of previous draws. What is the expected number of draws to collect all the coupons? Let X_k be the number of draws needed to see a new type when k distinct types have already been collected; X_k is geometric with success probability \frac{n-k}{n}, so:

\mathbb{E}[X_k]=\frac{1}{(n-k)/n}=\frac{n}{\,n-k\,}

Then:

\mathbb{E}[T] = \sum_{i = 0}^{n - 1} \mathbb E [X_i] = n H_n \sim n\ln n
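
A quick Python simulation sketch comparing the empirical mean with n H_n; the names and parameters are illustrative.

```python
import random

def coupon_collector_draws(n):
    # Number of uniform draws needed to see all n coupon types at least once.
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

n, trials = 50, 10_000
avg = sum(coupon_collector_draws(n) for _ in range(trials)) / trials
print(avg, n * sum(1 / i for i in range(1, n + 1)))   # empirical vs n*H_n ≈ 224.96
```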

4.8.8.4 Penney's game

The original HHT vs HTT problem:

Two players watch a single run of fair coin tosses. A wins if the pattern HHT appears first; B wins if HTT appears first.

Model the process by states defined as "the longest suffix of the running tosses that is also a prefix of either target".

Let p_0 be A's win probability from the start, and p_H, p_{HH}, p_{HT} from states ending with (H, HH, HT) respectively. The fair-coin transitions give:

\begin{aligned} p_0 &= \tfrac12 p_H + \tfrac12 p_0 \\ p_H &= \tfrac12 p_{HH} + \tfrac12 p_{HT} \\ p_{HH} &= \tfrac12 p_{HH} + \tfrac12 \cdot 1 \\ p_{HT} &= \tfrac12 p_H + \tfrac12 \cdot 0 \end{aligned}

Solving yields p_H=\tfrac23, hence p_0=\tfrac23.

The length-3 'counter sequence' rule:

In Penney's game, the second player can always pick a length-3 pattern that beats the first player's choice with probability at least 2/3. If the first player chooses S=a_1a_2a_3 (each a_i\in\{H,T\}), the counter is

S^\star=\overline{a_2}a_1a_2

This construction exploits prefix-suffix overlaps: whenever play has lined up the opponent's first two symbols, your counter tends to complete one step sooner.

For patterns of length \geq 3, there is always a counter sequence: Optimal Penney Ante Strategy via Correlation Polynomial Identities.
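
A small Python simulation sketch of the counter-sequence rule above for fair coin flips; the function names are illustrative.

```python
import random

def counter(pattern):
    # Second player's reply: flip the middle symbol, then copy the first two.
    flip = {"H": "T", "T": "H"}
    return flip[pattern[1]] + pattern[:2]

def penney_win_prob(pattern_a, pattern_b, trials=100_000):
    # Empirical Pr[pattern_a appears before pattern_b] in fair coin flips.
    wins = 0
    for _ in range(trials):
        window = ""
        while True:
            window = (window + random.choice("HT"))[-3:]
            if window == pattern_a:
                wins += 1
                break
            if window == pattern_b:
                break
    return wins / trials

s = "HTT"
print(counter(s))                        # "HHT"
print(penney_win_prob(counter(s), s))    # ≈ 2/3
```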