ENCS5341 Machine Learning and Data Science
Linear Algebra and Probability Review
Yazan Abu Farha - Birzeit University
Slides are based on the Stanford CS229 course

Linear Algebra

Diagonal Matrices
• A diagonal matrix is a matrix where all non-diagonal elements are 0. This is typically denoted D = diag(d_1, d_2, ..., d_n), with D_ij = d_i if i = j, and D_ij = 0 if i ≠ j.
• For example, the identity matrix is I = diag(1, 1, ..., 1).

Vector-Vector Products
• Inner product (dot product): for x, y ∈ ℝ^n, x^T y = ∑_{i=1}^n x_i y_i, a scalar.
• Outer product: for x ∈ ℝ^m and y ∈ ℝ^n, x y^T ∈ ℝ^{m×n}, with (x y^T)_ij = x_i y_j.

Matrix-Vector Product
• If we write A ∈ ℝ^{m×n} by rows, then we can express Ax as the vector whose i-th entry is the inner product of the i-th row of A with x: (Ax)_i = a_i^T x.
• It is also possible to multiply on the left by a row vector, y^T = x^T A. Expressing A in terms of its rows, y^T = ∑_{i=1}^m x_i a_i^T, so y^T is a linear combination of the rows of A.

Matrix-Matrix Multiplication (different views)
1. As a set of vector-vector products (dot products): the (i, j) entry of AB is the inner product of the i-th row of A and the j-th column of B.
2. As a sum of outer products: AB = ∑_{k=1}^n a_k b_k^T, where the a_k are the columns of A and the b_k^T are the rows of B.
3. As a set of matrix-vector products: the j-th column of AB is A times the j-th column of B.
4. As a set of vector-matrix products: the i-th row of AB is the i-th row of A times B.

Norms
• A norm of a vector, ‖x‖, is informally a measure of the "length" of the vector.
• More formally, a norm is any function f : ℝ^n → ℝ that satisfies 4 properties:
  1. Non-negativity: f(x) ≥ 0 for all x ∈ ℝ^n.
  2. Definiteness: f(x) = 0 if and only if x = 0.
  3. Homogeneity: f(tx) = |t| f(x) for all t ∈ ℝ.
  4. Triangle inequality: f(x + y) ≤ f(x) + f(y).

Examples of Norms
• The commonly used Euclidean or ℓ2 norm: ‖x‖_2 = √(∑_{i=1}^n x_i^2).
• The ℓ1 norm: ‖x‖_1 = ∑_{i=1}^n |x_i|.
• The ℓ∞ norm: ‖x‖_∞ = max_i |x_i|.

The Inverse of a Square Matrix
• The inverse of a square matrix A ∈ ℝ^{n×n} is denoted A^{-1}, and is the unique matrix such that A^{-1}A = I = AA^{-1}.
• We say that A is invertible or non-singular if A^{-1} exists, and non-invertible or singular otherwise.
• In order for a square matrix A to have an inverse A^{-1}, A must be full rank.
• Properties (assuming A, B ∈ ℝ^{n×n} are non-singular):
  - (A^{-1})^{-1} = A
  - (AB)^{-1} = B^{-1} A^{-1}
  - (A^{-1})^T = (A^T)^{-1}. For this reason this matrix is often denoted A^{-T}.

Probability

Definitions, Axioms, and Corollaries
• Performing an experiment → an outcome.
• Sample space (S): the set of all possible outcomes of an experiment.
• Event (E): a subset of S (E ⊆ S).
• Probability (Bayesian definition): a number between 0 and 1 to which we ascribe meaning, i.e., our belief that an event E occurs.
• Frequentist definition of probability: P(E) = lim_{n→∞} n(E)/n, where n(E) is the number of times E occurs in n trials.

• Axiom 1: 0 ≤ P(E) ≤ 1.
• Axiom 2: P(S) = 1.
• Axiom 3: If E and F are mutually exclusive (E ∩ F = ∅), then P(E ∪ F) = P(E) + P(F).
• Corollary 1: P(E^c) = 1 − P(E) (= P(S) − P(E)).
• Corollary 2: If E ⊆ F, then P(E) ≤ P(F).
• Corollary 3: P(E ∪ F) = P(E) + P(F) − P(E ∩ F) (the inclusion-exclusion principle).

Conditional Probability and Bayes' Rule
• For any events A, B such that P(B) ≠ 0, we define
  P(A | B) := P(A ∩ B) / P(B).
• Let's apply conditional probability to obtain Bayes' rule:
  P(B | A) = P(B ∩ A) / P(A) = P(A ∩ B) / P(A) = P(B) P(A | B) / P(A).
• Conditioned Bayes' rule: given events A, B, C,
  P(A | B, C) = P(B | A, C) P(A | C) / P(B | C).

Chain Rule
• For any n events A_1, ..., A_n, the joint probability can be expressed as a product of conditionals:
  P(A_1 ∩ A_2 ∩ ... ∩ A_n) = P(A_1) P(A_2 | A_1) P(A_3 | A_2 ∩ A_1) ... P(A_n | A_{n-1} ∩ A_{n-2} ∩ ... ∩ A_1).

Independence
• Events A and B are independent if P(A ∩ B) = P(A) P(B). We denote this as A ⊥ B.
• From this, we know that if A ⊥ B,
  P(A | B) = P(A ∩ B) / P(B) = P(A) P(B) / P(B) = P(A).
• Implication: if two events are independent, observing one event does not change the probability that the other event occurs.
• In general, events A_1, ..., A_n are mutually independent if
  P(∩_{i∈S} A_i) = ∏_{i∈S} P(A_i) for every subset S ⊆ {1, ..., n}.
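The four views of matrix-matrix multiplication above can be checked numerically. The following is a minimal NumPy sketch, not part of the original slides; the matrix shapes and random values are arbitrary illustrative choices. All four constructions agree with A @ B.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # A in R^{3x4}
B = rng.standard_normal((4, 2))   # B in R^{4x2}

# View 1: entry (i, j) is the dot product of row i of A and column j of B.
C1 = np.array([[A[i] @ B[:, j] for j in range(B.shape[1])]
               for i in range(A.shape[0])])

# View 2: sum of outer products of the columns of A with the rows of B.
C2 = sum(np.outer(A[:, k], B[k]) for k in range(A.shape[1]))

# View 3: column j of AB is the matrix-vector product A @ B[:, j].
C3 = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

# View 4: row i of AB is the vector-matrix product A[i] @ B.
C4 = np.vstack([A[i] @ B for i in range(A.shape[0])])

assert all(np.allclose(C, A @ B) for C in (C1, C2, C3, C4))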
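The three example norms can likewise be computed either directly from their formulas or with np.linalg.norm; this small sketch (the vector is an arbitrary example) confirms the two routes agree.

import numpy as np

x = np.array([3.0, -4.0, 1.0])

l2 = np.sqrt(np.sum(x ** 2))   # Euclidean / l2 norm
l1 = np.sum(np.abs(x))         # l1 norm
linf = np.max(np.abs(x))       # l-infinity norm

assert np.isclose(l2, np.linalg.norm(x, 2))
assert np.isclose(l1, np.linalg.norm(x, 1))
assert np.isclose(linf, np.linalg.norm(x, np.inf))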
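The inverse properties can be verified with np.linalg.inv. The sketch below (illustrative, not from the slides) uses random Gaussian matrices, which are full rank with probability 1; a singular input would instead raise np.linalg.LinAlgError.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
I = np.eye(3)

A_inv = np.linalg.inv(A)
assert np.allclose(A_inv @ A, I) and np.allclose(A @ A_inv, I)

# (AB)^{-1} = B^{-1} A^{-1}
assert np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ A_inv)

# (A^{-1})^T = (A^T)^{-1}, often denoted A^{-T}
assert np.allclose(A_inv.T, np.linalg.inv(A.T))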
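Finally, a worked instance of Bayes' rule. The numbers below are hypothetical (a made-up diagnostic test, not from the slides); the point is only that P(A | B) follows from P(B | A), the prior P(A), and the total probability P(B).

# Hypothetical diagnostic test: A = "has condition", B = "test positive".
p_a = 0.01                 # P(A): prior prevalence (assumed value)
p_b_given_a = 0.95         # P(B | A): sensitivity (assumed value)
p_b_given_not_a = 0.05     # P(B | not A): false-positive rate (assumed value)

# Total probability: P(B) = P(B | A) P(A) + P(B | not A) P(not A).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A | B) = P(B | A) P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(A | B) = {p_a_given_b:.3f}")   # ~0.161: most positives are false alarms here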
Random Variables
• A random variable X is a variable that probabilistically takes on different values. It maps outcomes to real values.
• X takes on values in Val(X) ⊆ ℝ, also called the support Sup(X).
• X = k is the event that the random variable X takes on the value k.
• Discrete RVs: Val(X) is a set, and P(X = k) can be nonzero.
• Continuous RVs: Val(X) is a range; P(X = k) = 0 for all k, but P(a ≤ X ≤ b) can be nonzero.

Probability Density Function (PDF)
• The PDF of a continuous RV is simply the derivative of the CDF:
  f_X(x) = dF_X(x)/dx.
• Thus,
  P(a ≤ X ≤ b) = F_X(b) − F_X(a) = ∫_a^b f_X(x) dx.
• A valid PDF must be such that:
  - f_X(x) ≥ 0 for all real numbers x;
  - ∫_{-∞}^{∞} f_X(x) dx = 1.

Expectation
• Let g be an arbitrary real-valued function.
• If X is a discrete RV with PMF p_X:
  E[g(X)] := ∑_{x∈Val(X)} g(x) p_X(x).
• If X is a continuous RV with PDF f_X:
  E[g(X)] := ∫_{-∞}^{∞} g(x) f_X(x) dx.
• Intuitively, expectation is a weighted average of the values of g(x), weighted by the probability of x.

Properties of Expectation
• For any constant a ∈ ℝ and arbitrary real function f:
  - E[a] = a;
  - E[a f(X)] = a E[f(X)].
• Linearity of expectation: given n real-valued functions f_1(X), ..., f_n(X),
  E[∑_{i=1}^n f_i(X)] = ∑_{i=1}^n E[f_i(X)].

Joint and Marginal Distributions
• Joint PMF for discrete RVs X, Y:
  p_XY(x, y) = P(X = x, Y = y).
  Note that ∑_{x∈Val(X)} ∑_{y∈Val(Y)} p_XY(x, y) = 1.
• Marginal PMF of X, given the joint PMF of X and Y:
  p_X(x) = ∑_y p_XY(x, y).
• Joint PDF for continuous X, Y:
  f_XY(x, y) = ∂²F_XY(x, y) / ∂x∂y.
  Note that ∫_{-∞}^{∞} ∫_{-∞}^{∞} f_XY(x, y) dx dy = 1.
• Marginal PDF of X, given the joint PDF of X and Y:
  f_X(x) = ∫_{-∞}^{∞} f_XY(x, y) dy.
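A quick numeric sketch of the PDF/CDF relationship. The exponential distribution and the interval [a, b] are illustrative choices, not from the slides; the check is that integrating the PDF over [a, b] recovers F(b) − F(a).

import numpy as np

# Exponential RV with rate lam: F(t) = 1 - exp(-lam*t), f(t) = lam * exp(-lam*t).
lam = 2.0
F = lambda t: 1.0 - np.exp(-lam * t)   # CDF
f = lambda t: lam * np.exp(-lam * t)   # PDF = dF/dt

a, b, dx = 0.5, 1.5, 1e-5
x = np.arange(a, b, dx)
riemann = np.sum(f(x)) * dx            # Riemann sum of the PDF over [a, b]
assert np.isclose(riemann, F(b) - F(a), atol=1e-4)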
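Both expectation formulas can be evaluated directly. The sketch below uses illustrative inputs (a fair die for the discrete case, a uniform density on [0, 1] for the continuous case) and computes E[g(X)] as a probability-weighted sum and as a numeric integral.

import numpy as np

# Discrete: fair six-sided die, g(x) = x^2.
vals = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
assert np.isclose(pmf.sum(), 1.0)      # valid PMF
e_g = np.sum(vals ** 2 * pmf)          # E[X^2] = 91/6, about 15.17

# Continuous: uniform density f(x) = 1 on [0, 1], g(x) = x, so E[X] = 0.5.
dx = 1e-4
x = np.arange(0.0, 1.0, dx)
f = np.ones_like(x)
e_x = np.sum(x * f) * dx               # Riemann-sum approximation of the integral
assert np.isclose(e_x, 0.5, atol=1e-3)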
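A last sketch for joint and marginal PMFs, using a small hypothetical table of joint probabilities (the values are made up for illustration): marginalizing means summing the joint PMF over the other variable, and the conditional probability definition from earlier applies unchanged.

import numpy as np

# Hypothetical joint PMF: rows index X in {0, 1}, columns index Y in {0, 1, 2}.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.05, 0.30]])
assert np.isclose(p_xy.sum(), 1.0)     # joint PMF sums to 1

p_x = p_xy.sum(axis=1)                 # marginal of X: sum over y
p_y = p_xy.sum(axis=0)                 # marginal of Y: sum over x
assert np.isclose(p_x.sum(), 1.0) and np.isclose(p_y.sum(), 1.0)

# Conditional PMF P(Y | X = 0), via P(A | B) = P(A and B) / P(B).
p_y_given_x0 = p_xy[0] / p_x[0]
assert np.isclose(p_y_given_x0.sum(), 1.0)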