Linear Vector Spaces, Bases, Frames, Matrices, Probability, and Random Variables - Prof. C, Study notes of Electrical and Electronics Engineering

Notes on various topics in linear algebra, probability theory, and statistics, including linear vector spaces, bases and frames, matrices, probability, and random variables. It covers concepts such as vector spaces, norms, frames and bases, eigenvectors and eigenvalues, probability theory, random variables, and distribution functions.

Typology: Study notes

Pre 2010

Uploaded on 08/05/2009

koofers-user-j3z
Notes for ECE8833 Lecture #2 - Math review (1/08/2009)

1. Linear vector spaces

We will view vectors as collections of numbers $x = [x_1, x_2, \ldots, x_n]$, so that $x \in \mathbb{R}^n$.

We need to define a way to measure length and distance. This is usually based on the $\ell_2$ norm:
$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i$, implying that $\|x\|_2 = \sqrt{\langle x, x \rangle} = \left(\sum_{i=1}^{n} x_i^2\right)^{1/2}$

Vectors are orthogonal if $\langle x, y \rangle = 0$, and orthonormal if they also have $\|x\|_2 = \|y\|_2 = 1$.

We will often want to generalize this to the p-norms, where we define length by
$\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$

Special cases of interest:
$\|x\|_1 = \sum_{i=1}^{n} |x_i|$
$\|x\|_\infty = \lim_{p \to \infty} \|x\|_p = \max_i |x_i|$
$\|x\|_0 = \lim_{p \to 0} \|x\|_p^p = \sum_i I(x_i \neq 0)$

The p-norms are convex for $1 \le p < \infty$.
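The norms above can be checked numerically. The following is a minimal sketch in plain Python; the function names `p_norm`, `inf_norm`, and `l0_count` and the test vector are illustrative assumptions, not part of the original notes.

```python
def p_norm(x, p):
    """Return (sum_i |x_i|^p)^(1/p) for a finite p > 0."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def inf_norm(x):
    """Limit of the p-norm as p -> infinity: the largest magnitude."""
    return max(abs(xi) for xi in x)

def l0_count(x):
    """Limit of ||x||_p^p as p -> 0: the number of nonzero entries."""
    return sum(1 for xi in x if xi != 0)

x = [3.0, -4.0, 0.0]
print(p_norm(x, 1))              # l1 norm: sum of magnitudes
print(p_norm(x, 2))              # l2 norm: Euclidean length
print(inf_norm(x))               # largest magnitude
print(l0_count(x))               # number of nonzero entries
print(p_norm(x, 50))             # approaches inf_norm(x) as p grows
print(p_norm(x, 0.01) ** 0.01)   # ||x||_p^p approaches l0_count(x) as p shrinks
```

Note that the last line raises the p-norm back to the power p; it is this quantity, rather than the p-norm itself, that tends to the count of nonzero entries.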

2. Bases and frames

A set of $n$ vectors $\{\varphi_i\}$, where $\varphi_i \in \mathbb{R}^n$, is called a basis (or complete set) if for every $x$ there exists a set of coefficients $\{c_i\}$ such that we can write
$x = \sum_{i=1}^{n} c_i \varphi_i$.

Note that $\{\varphi_i\}$ must be linearly independent, meaning that $\varphi_i \neq \sum_{j \neq i} c_j \varphi_j$ for any set of coefficients $\{c_j\}$.

Note that $\{\varphi_i\}$ does not need to be orthonormal (or even orthogonal), but if it is, we call it an orthonormal basis (ONB) and $c_i = \langle x, \varphi_i \rangle$ is the unique set of coefficients representing $x$.

In an ONB, we have Parseval’s theorem, which tells us that the energy in the coefficients is the same as the energy in the signal, $\|x\|_2^2 = \sum_{i=1}^{n} c_i^2$.

A set of $m$ vectors $\{\varphi_i\}$, where $\varphi_i \in \mathbb{R}^n$, is called a frame (or overcomplete set if $m > n$) if for every $x$
$A\|x\|_2^2 \le \sum_{i=1}^{m} |\langle x, \varphi_i \rangle|^2 \le B\|x\|_2^2$
for some constants $0 < A \le B < \infty$. For the finite cases we consider, the main concern is $A > 0$.
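For an ONB the frame inequality holds with A = B = 1, which is Parseval's theorem. A small sketch of the ONB expansion in plain Python follows; the rotated basis and the test vector are arbitrary choices made for illustration.

```python
import math

def inner(x, y):
    """Standard inner product <x, y> = sum_i x_i y_i."""
    return sum(xi * yi for xi, yi in zip(x, y))

# An orthonormal basis of R^2: the standard basis rotated by 30 degrees.
theta = math.radians(30)
basis = [
    [math.cos(theta), math.sin(theta)],
    [-math.sin(theta), math.cos(theta)],
]

x = [2.0, -1.0]

# In an ONB the coefficients are just inner products: c_i = <x, phi_i>.
coeffs = [inner(x, phi) for phi in basis]

# Rebuild x = sum_i c_i phi_i.
x_rebuilt = [sum(c * phi[k] for c, phi in zip(coeffs, basis)) for k in range(2)]

energy_signal = inner(x, x)                  # ||x||^2
energy_coeffs = sum(c * c for c in coeffs)   # sum_i c_i^2

print(coeffs)
print(x_rebuilt)
print(energy_signal, energy_coeffs)
```

The two energies agree up to floating-point rounding, and the rebuilt vector matches `x`.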

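A frame is a generalization of a basis, and it is no longer a basis when $m > n$. We can still find coefficients such that
$x = \sum_{i=1}^{m} c_i \varphi_i$
but the solution is no longer unique. There does exist a set of vectors (called the canonical dual set) such that
$x = \sum_{i=1}^{m} \langle x, \tilde{\varphi}_i \rangle \varphi_i$
but in general $\tilde{\varphi}_i \neq \varphi_i$.

Special cases: $A = B$ is called a tight frame (with $A = B = m/n$ when every $\varphi_i$ has unit norm), and in this case the dual set is easy to find, $\tilde{\varphi}_i = \frac{1}{A} \varphi_i$. $A = B = 1$ with unit-norm $\varphi_i$ implies that the frame is an ONB.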
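As a concrete tight frame, consider three unit-norm vectors in R^2 spaced 120 degrees apart, so that m = 3, n = 2, and A = B = m/n = 3/2. The sketch below (plain Python, with an arbitrary test vector chosen for illustration) checks the frame sum and the reconstruction through the scaled dual set.

```python
import math

def inner(x, y):
    """Standard inner product <x, y> = sum_i x_i y_i."""
    return sum(xi * yi for xi, yi in zip(x, y))

# m = 3 unit-norm vectors in R^2 (n = 2), spaced 120 degrees apart.
angles = [math.radians(a) for a in (90, 210, 330)]
frame = [[math.cos(a), math.sin(a)] for a in angles]
m, n = len(frame), 2

x = [1.5, -0.5]

# Frame sum: sum_i |<x, phi_i>|^2, to be compared with A * ||x||^2.
frame_sum = sum(inner(x, phi) ** 2 for phi in frame)
A = m / n

# Dual set of a tight frame: phi_tilde_i = (1 / A) * phi_i.
dual = [[v / A for v in phi] for phi in frame]

# Reconstruction x = sum_i <x, phi_tilde_i> phi_i.
x_rebuilt = [sum(inner(x, d) * phi[k] for d, phi in zip(dual, frame))
             for k in range(n)]

print(frame_sum, A * inner(x, x))
print(x_rebuilt)
```

The three vectors are linearly dependent, so the expansion coefficients are not unique; the dual set simply picks one valid choice.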

3. Matrices

Given a square $n \times n$ matrix $A$, an eigenvector $v$ satisfies $Av = \lambda v$ for some scalar eigenvalue $\lambda$.

Eigendecomposition of a matrix is $A = P D P^{-1}$, where $D$ is a diagonal matrix of all eigenvalues and $P$ is a matrix with eigenvectors on the columns.

When $A$ is symmetric, the eigenvectors can be chosen to be orthonormal. This implies that $P^{-1} = P^T$, so we can write $A = \sum_{i} \lambda_i v_i v_i^T$.

A matrix $A$ is called positive definite if $x^T A x > 0$ for all $x \neq 0$. This implies that $\lambda_i > 0$.
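To make the symmetric case concrete, the sketch below verifies a hand-computed eigendecomposition of a 2 × 2 symmetric matrix in plain Python. The matrix and its eigenpairs are illustrative assumptions; the code checks them rather than computing them.

```python
import math

def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]

A = [[2.0, 1.0],
     [1.0, 2.0]]

# Hand-computed eigenpairs of A, with unit-norm eigenvectors.
s = 1.0 / math.sqrt(2.0)
eigenpairs = [
    (3.0, [s, s]),
    (1.0, [s, -s]),
]

# Check A v = lambda v for each pair.
for lam, v in eigenpairs:
    print(matvec(A, v), [lam * vi for vi in v])

# Rebuild A as the sum over i of lambda_i v_i v_i^T.
A_rebuilt = [[sum(lam * v[r] * v[c] for lam, v in eigenpairs) for c in range(2)]
             for r in range(2)]
print(A_rebuilt)

# Both eigenvalues are positive, so the quadratic form is positive for nonzero x.
x = [1.0, -3.0]
quad_form = sum(xi * yi for xi, yi in zip(x, matvec(A, x)))
print(quad_form)
```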
4. What is probability?

Frequentist view: the relative number of occurrences of an event as the number of trials goes to infinity. It makes us feel good to relate probability to experiments, but this is kind of a fallacy since we’re relying on a limiting argument.
Bayesian view: a measure of our belief or certainty of a potential outcome (“70% chance of rain today”). More abstract and related only to the outcome of a single event, but a powerful tool to make decisions with.
Axiomatic view: none of this matters... let’s just define some functions and move on.
5. Sample space

Consider an experiment with a set of possible outcomes $\Omega$, which could be a continuous or discrete set. $\Omega$ is called the sample space.
EX: $\Omega = \{1, 2, 3, 4, 5, 6\}$, $\Omega = \{\text{red}, \text{blue}\}$, or $\Omega = \mathbb{R}^N$
6. Probability

Consider a subset of the possible outcomes (an event), $A \subseteq \Omega$. Define $\Pr[A]$ as the probability that $A$ occurs. Probability satisfies the following axioms:

$p_X(x_i) = \sum_j p_{X,Y}(x_i, y_j)$

Expectations add: $E[X + Y] = E[X] + E[Y]$

Covariance is a measure of joint variance:
$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$

Variances DO NOT ADD in general:
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + 2\,\mathrm{Cov}(X, Y) + \mathrm{Var}(Y)$

Conditional probabilities: What is the probability distribution on $X$ if we know that $Y = y$?
$p_{X|Y}(x \mid Y = y) = p_{X,Y}(x, y) / p_Y(y)$

Independence: Two RVs are independent if $p_{X,Y}(x, y) = p_X(x)\,p_Y(y)$. This also implies $p_{X|Y}(x \mid y) = p_X(x)$. In other words, the outcome of $Y$ does not affect $X$. When two RVs are independent, $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.

Correlation: Related to covariance,
$\mathrm{Cor}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$
If $E[XY] = E[X]E[Y]$, then $\mathrm{Cor}(X, Y) = 0$ and we call these RVs uncorrelated. Being uncorrelated is a weaker notion than independence: independence implies that the RVs are uncorrelated, but not the other way around.
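These identities can be checked exactly on a small discrete example. The sketch below uses plain Python with joint pmfs stored as dictionaries; both pmfs and all helper names are hypothetical, chosen only for illustration. The second pmf is a standard example of two RVs with zero covariance that are not independent.

```python
def expect(pmf, f):
    """E[f(X, Y)] under a joint pmf stored as {(x, y): probability}."""
    return sum(p * f(x, y) for (x, y), p in pmf.items())

def marginals(pmf):
    """Marginal pmfs p_X and p_Y, summing the joint pmf over the other variable."""
    p_x, p_y = {}, {}
    for (x, y), p in pmf.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return p_x, p_y

def covariance(pmf):
    """Cov(X, Y) = E[XY] - E[X] E[Y]."""
    return (expect(pmf, lambda x, y: x * y)
            - expect(pmf, lambda x, y: x) * expect(pmf, lambda x, y: y))

def variance_of(pmf, f):
    """Var(f(X, Y)) = E[f^2] - (E[f])^2."""
    return expect(pmf, lambda x, y: f(x, y) ** 2) - expect(pmf, f) ** 2

def is_independent(pmf, tol=1e-12):
    """True if p_{X,Y}(x, y) = p_X(x) p_Y(y) at every point of the grid."""
    p_x, p_y = marginals(pmf)
    return all(abs(pmf.get((x, y), 0.0) - p_x[x] * p_y[y]) < tol
               for x in p_x for y in p_y)

# Hypothetical joint pmf on a 2 x 2 grid (numbers chosen only for illustration).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

p_x, p_y = marginals(joint)
cov_xy = covariance(joint)
var_sum = variance_of(joint, lambda x, y: x + y)
var_sum_formula = (variance_of(joint, lambda x, y: x) + 2 * cov_xy
                   + variance_of(joint, lambda x, y: y))

# Conditional pmf p_{X|Y}(x | Y = 1) = p_{X,Y}(x, 1) / p_Y(1).
cond_x_given_y1 = {x: joint[(x, 1)] / p_y[1] for x in p_x}

# Uncorrelated but dependent: X uniform on {-1, 0, 1} and Y = X^2.
dependent = {(-1, 1): 1 / 3, (0, 0): 1 / 3, (1, 1): 1 / 3}

print(p_x, p_y)
print(cov_xy, var_sum, var_sum_formula)
print(cond_x_given_y1)
print(covariance(dependent), is_independent(dependent))
```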

Bayes theorem

$p(x \mid y) = \frac{p(x, y)}{p(y)}$, $\quad p(y \mid x) = \frac{p(x, y)}{p(x)}$

This implies the product rule/chain rule:
$p(x, y) = p(y \mid x)\,p(x) = p(x \mid y)\,p(y)$

Put all together:
$p(x \mid y) = \frac{p(y \mid x)\,p(x)}{p(y)}$

Think about this in the context of decision making. Consider a system with unknown input $X$ and observed output $Y$. This says that we can reason about which input was most likely if we know the system behavior and the probability distribution on the input.
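The decision-making reading can be sketched with a hypothetical binary channel: the input X is 0 or 1 with a known prior, and the channel flips the input with a fixed probability. All numbers and function names below are illustrative assumptions.

```python
# Hypothetical binary channel: all numbers are illustrative assumptions.
prior = {0: 0.8, 1: 0.2}   # p(x), the distribution on the unknown input
flip = 0.1                 # probability that the channel flips the input

def likelihood(y, x):
    """p(y | x): the system behavior."""
    return 1.0 - flip if y == x else flip

def posterior(y):
    """p(x | y) = p(y | x) p(x) / p(y), with p(y) = sum_x p(y | x) p(x)."""
    joint = {x: likelihood(y, x) * prior[x] for x in prior}   # p(x, y)
    evidence = sum(joint.values())                            # p(y)
    return {x: joint[x] / evidence for x in joint}

post = posterior(1)                          # we observed Y = 1
most_likely_input = max(post, key=post.get)  # input with the largest posterior
print(post, most_likely_input)
```

Even though the prior favors X = 0, observing Y = 1 through a fairly reliable channel makes X = 1 the more likely input in this example.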
Random vectors

We can extend the idea of multiple random variables to a random vector $X = [X_1, \ldots, X_N]^T$. Many of the notions we have already defined extend exactly as you would expect.

PDF: $p(X) = p(X_1, \ldots, X_N)$

Expectation: $E[X] = \int_{-\infty}^{\infty} X\,p(X)\,dX = \mu$

Covariance matrix: $K_X = E\left[(X - E[X])(X - E[X])^T\right] = E\left[XX^T\right] - \mu\mu^T$

$[K_X]_{i,j} = \mathrm{Cov}(X_i, X_j)$ and $[K_X]_{i,i} = \mathrm{Var}(X_i)$

$K_X$ is a symmetric, positive semidefinite matrix (more on this later).

If $M$ is a $K \times N$ matrix, then multiplying $X$ by $M$ affects the mean and covariance:
$E[MX] = M\mu$
$\mathrm{Var}(MX) = M K_X M^T$
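The effect of a linear map on the mean and covariance can be verified exactly for a discrete random vector. The sketch below is plain Python with small hand-written matrix helpers; the support points, probabilities, and the matrix M are hypothetical values chosen for illustration.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def matmul(A, B):
    """Matrix product of A and B, both stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def mean_and_cov(points, probs):
    """Mean vector and covariance E[X X^T] - mu mu^T of a discrete random vector."""
    dim = len(points[0])
    mu = [sum(p * x[i] for x, p in zip(points, probs)) for i in range(dim)]
    cov = [[sum(p * x[i] * x[j] for x, p in zip(points, probs)) - mu[i] * mu[j]
            for j in range(dim)] for i in range(dim)]
    return mu, cov

# Hypothetical random vector X in R^2 taking four values (illustrative numbers).
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]
probs = [0.4, 0.1, 0.2, 0.3]

# A 3 x 2 matrix M, so M X is a random vector in R^3.
M = [[1.0, 1.0],
     [2.0, -1.0],
     [0.0, 3.0]]

mu, K = mean_and_cov(points, probs)

# Statistics of M X computed directly from the transformed outcomes ...
mu_direct, K_direct = mean_and_cov([matvec(M, x) for x in points], probs)

# ... and from the formulas E[M X] = M mu and Var(M X) = M K M^T.
mu_formula = matvec(M, mu)
K_formula = matmul(matmul(M, K), transpose(M))

print(mu_direct, mu_formula)
print(K_direct)
print(K_formula)
```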

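Common continuous distributions

(a) Uniform (continuous): $X \sim \mathrm{Uniform}(a, b)$
$p(x) = \begin{cases} 1/(b - a) & a < x < b \\ 0 & \text{otherwise} \end{cases}$

(b) Exponential: $X \sim \mathrm{Exponential}(\lambda)$
$p(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$

(c) Laplacian: $X \sim \mathrm{Laplacian}(\mu, \sigma^2)$
$p(x) = \frac{1}{\sqrt{2\sigma^2}}\, e^{-|x - \mu| / \sqrt{\sigma^2 / 2}}$

(d) Gaussian/Normal: $X \sim \mathcal{N}(\mu, \sigma^2)$
$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x - \mu)^2 / (2\sigma^2)}$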
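As a sanity check on these densities, the sketch below integrates each one numerically with a simple midpoint rule and also confirms that the Laplacian, parameterized by its variance, has that variance. The parameter values, integration limits, and helper names are arbitrary assumptions for illustration.

```python
import math

def uniform_pdf(x, a, b):
    return 1.0 / (b - a) if a < x < b else 0.0

def exponential_pdf(x, lam):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def laplacian_pdf(x, mu, var):
    # Parameterized by mean mu and variance var, so the scale is sqrt(var / 2).
    scale = math.sqrt(var / 2.0)
    return math.exp(-abs(x - mu) / scale) / math.sqrt(2.0 * var)

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(f, lo, hi, steps=100000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

mu, var = 1.0, 4.0
areas = {
    "uniform": integrate(lambda x: uniform_pdf(x, 0.0, 2.0), -1.0, 3.0),
    "exponential": integrate(lambda x: exponential_pdf(x, 1.5), 0.0, 40.0),
    "laplacian": integrate(lambda x: laplacian_pdf(x, mu, var), -60.0, 60.0),
    "gaussian": integrate(lambda x: gaussian_pdf(x, mu, var), -40.0, 40.0),
}

# Second central moment of the Laplacian; should be close to var.
laplacian_variance = integrate(
    lambda x: (x - mu) ** 2 * laplacian_pdf(x, mu, var), -60.0, 60.0)

print(areas)
print(laplacian_variance)
```

Each area should come out close to 1, with the small residual due to the finite integration range and step size.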

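Central Limit Theorem (CLT)

Kolmogorov: “CLT is a dangerous tool in the hands of amateurs”

Given $\{X_i\}$ i.i.d. RVs with $E[X_i] = 0$ and $\mathrm{Var}(X_i) = \sigma^2$, the distribution of $\frac{1}{\sqrt{N}} \sum_{i=1}^{N} X_i$ goes to $\mathcal{N}(0, \sigma^2)$ as $N \to \infty$. (The normalization must be $1/\sqrt{N}$: with $1/N$ the variance of the average is $\sigma^2/N$, which shrinks to zero.)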
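A quick simulation illustrates the theorem with the sum scaled by 1/sqrt(N), the normalization under which the limiting variance equals σ². The sketch below uses only the Python standard library; the choice of Uniform(-1, 1) draws, N = 50, the number of repetitions, and the seed are all arbitrary assumptions.

```python
import random
import statistics

random.seed(0)

def normalized_sum(n_terms):
    """(1 / sqrt(N)) times the sum of N i.i.d. zero-mean Uniform(-1, 1) draws."""
    total = sum(random.uniform(-1.0, 1.0) for _ in range(n_terms))
    return total / n_terms ** 0.5

sigma2 = 1.0 / 3.0   # variance of a single Uniform(-1, 1) draw
samples = [normalized_sum(50) for _ in range(20000)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.pvariance(samples)

# For a Gaussian, about 68.3% of the mass lies within one standard deviation.
within_one_sigma = sum(abs(s) <= sigma2 ** 0.5 for s in samples) / len(samples)

print(sample_mean, sample_var, within_one_sigma)
```

The sample variance should land near 1/3 and the one-sigma fraction near the Gaussian value, even though each individual draw is uniform.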
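Common discrete distributions

(a) Bernoulli: $X \sim \mathrm{Bernoulli}(p)$
$p(x) = \begin{cases} 1 - p & x = 0 \\ p & x = 1 \\ 0 & \text{otherwise} \end{cases}$

(b) Uniform (discrete): $X \sim \mathrm{Uniform}(m)$
$p(k) = \begin{cases} 1/m & k \in \{0, 1, \ldots, m - 1\} \\ 0 & \text{otherwise} \end{cases}$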
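The two pmfs translate directly into code. The following sketch, in plain Python, checks that each sums to one and compares an empirical Bernoulli frequency with p; the parameter values, sample size, and seed are illustrative assumptions.

```python
import random

def bernoulli_pmf(x, p):
    """p(x) for X ~ Bernoulli(p)."""
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0

def discrete_uniform_pmf(k, m):
    """p(k) for X ~ Uniform(m) on {0, 1, ..., m - 1}."""
    return 1.0 / m if k in range(m) else 0.0

random.seed(1)
p, m = 0.3, 6

total_bernoulli = sum(bernoulli_pmf(x, p) for x in (0, 1))
total_uniform = sum(discrete_uniform_pmf(k, m) for k in range(m))

# Draw Bernoulli(p) samples by thresholding a uniform random number.
draws = [1 if random.random() < p else 0 for _ in range(10000)]
freq_one = sum(draws) / len(draws)

print(total_bernoulli, total_uniform, freq_one)
```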