

CS 188 Final Cheat Sheet covering key concepts
Typology: Study notes


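Gibbs sampling
· fix evid. var.
· init: set all other (non-evid.) vars to rand. values
· repeat: choose a non-evid. var and resample it given the curr. values of all other vars
· ex:
  Sample: S ~ P(S | c, r, ...)
  Sample: C ~ P(C | s, r)
  Sample: W ~ P(W | s, r)
· encoding belief at a timestep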
Markov model
· all we need to know is the initial dist. P(W_0) and the trans. model P(W_{i+1} | W_i)
· P(X_1, ..., X_n) = P(X_1) Π_i P(X_{i+1} | X_i)
· ex: weather forecasting: calc. prob. & form dist. of W_0, use that to calc. W_1, and so on
· each step is a matrix mult. by the trans. model (m × n)
· or: sample w_0, then gen. subseq. states by sampling a new val. from the trans. model at each time step; a bunch of samples approximates the distribution
· State var W_i (the weather on day i)
· Evidence var F_i (the forecast on day i)
· Sensor model: P(F_i | W_i)
· Transition model: P(W_{i+1} | W_i)
like
(transition model)
above
· Comput. the belief at time t given the evid. seen so far:
· belief dist. of W_i given f_1, ..., f_i: B(W_i) = P(W_i | f_1, ..., f_i)
· B'(W_i) = P(W_i | f_1, ..., f_{i-1})
· Step ① time elapse upd: B'(W_{i+1}) = Σ_{w_i} P(W_{i+1} | w_i) B(w_i)
· Step ② obs. upd (you see some evid.): B(W_{i+1}) ∝ P(f_{i+1} | W_{i+1}) B'(W_{i+1})
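The two belief updates above (time elapse, then observation), commonly called the forward algorithm, can be sketched in plain Python. This is a minimal sketch: the function names and the sun/rain numbers are hypothetical and only illustrate the formulas; the first forecast is assumed to be observed after one transition from the prior.

```python
def time_elapse(belief, transition):
    """B'(W_{i+1}) = sum over w_i of P(W_{i+1} | w_i) * B(w_i)."""
    n = len(belief)
    # transition[a][b] = P(W_{i+1} = b | W_i = a)
    return [sum(belief[a] * transition[a][b] for a in range(n)) for b in range(n)]


def observe(belief_prime, sensor, f):
    """B(W_{i+1}) is proportional to P(f | W_{i+1}) * B'(W_{i+1}); normalize at the end."""
    # sensor[b][f] = P(F = f | W = b)
    unnorm = [sensor[b][f] * belief_prime[b] for b in range(len(belief_prime))]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def forward(prior, transition, sensor, forecasts):
    """Apply one time elapse update and one observation update per forecast."""
    belief = list(prior)
    for f in forecasts:
        belief = observe(time_elapse(belief, transition), sensor, f)
    return belief


# Hypothetical numbers: states 0 = sun, 1 = rain; forecasts 0 = "good", 1 = "bad".
transition = [[0.8, 0.2], [0.4, 0.6]]
sensor = [[0.9, 0.1], [0.3, 0.7]]
prior = [0.5, 0.5]

belief_after_two_days = forward(prior, transition, sensor, [0, 1])
```

Only the normalized product matters in the observation step, which is why the update is written with ∝ rather than =.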
Particle Filtering
· exact inference is too slow when the domain grows too large, so approx. the HMM belief w/ a set of particles (samples)
· ① Time elapse upd: every iteration, use trans. model to move each particle: x_{t+1} ~ P(X_{t+1} | x_t)
· ② Obs. upd (obs. e_{t+1}): mult. weight of each sample by prob. of seeing curr. evid. given state of that sample: P(e_{t+1} | x_{t+1})
· Normalize all weights
· Resample: draw n particles w/ replacement from the weighted dist.
· Your sensor is never going to give a ...
· if you have enough particles, the estimate approaches the exact prob.
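One particle filtering iteration (move, weight, normalize, resample) can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names, the sun/rain model, and its probabilities are hypothetical, and the fallback for the case where every weight is zero is one possible choice rather than a prescribed rule.

```python
import random

# Hypothetical HMM: hidden weather state, observed forecast.
TRANSITION = {"sun": {"sun": 0.8, "rain": 0.2}, "rain": {"sun": 0.4, "rain": 0.6}}
SENSOR = {"sun": {"good": 0.9, "bad": 0.1}, "rain": {"good": 0.3, "bad": 0.7}}


def sample_transition(state, rng):
    """Time elapse for one particle: sample x_{t+1} ~ P(X_{t+1} | x_t)."""
    nxt = TRANSITION[state]
    return rng.choices(list(nxt), weights=list(nxt.values()), k=1)[0]


def particle_filter_step(particles, evidence, rng):
    # 1. Time elapse update: move every particle with the transition model.
    moved = [sample_transition(x, rng) for x in particles]
    # 2. Observation update: weight each particle by P(evidence | its state).
    weights = [SENSOR[x][evidence] for x in moved]
    total = sum(weights)
    if total == 0:
        # Assumption: if no particle is consistent with the evidence, keep them unweighted.
        return moved
    # 3. Normalize the weights.
    weights = [w / total for w in weights]
    # 4. Resample n particles with replacement from the weighted distribution.
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles):
    """Approximate belief: fraction of particles in each state."""
    counts = {}
    for p in particles:
        counts[p] = counts.get(p, 0) + 1
    return {s: c / len(particles) for s, c in counts.items()}


rng = random.Random(0)
particles = [rng.choice(["sun", "rain"]) for _ in range(5000)]
particles = particle_filter_step(particles, "good", rng)
belief = estimate(particles)
```

With a uniform start, the exact belief in "sun" after one "good" forecast is 0.54 / 0.66 ≈ 0.82 for these numbers, so the particle estimate should land close to that value.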
Machine Learning
· core idea: we give machines access to data & they learn for themselves
· a dataset can be divided into:
  Training Set: used to fit the model
  Validation Set: used to tune hyperparam. (learning rate, model struct., etc.)
  Test Set: used to test the entire model
· types of machine learning prob.
  Regression: try to est. some numerical val. from data
    ex: given a bunch of points on a plot, find the line of best fit
  Classification: try to sort data into discrete classes
    ex: pixels are features in an img. of a # & we pred. what # it is
  Clustering: try to group similar ex. of data into clusters naturally
    ex: 2D data
· Types of learning
  Supervised: data has labels, e.g. classification
  Unsupervised: data has no labels, e.g. clustering; don't know exactly what you're looking for, but want to see what struct. naturally appears
· Perceptron alg.
  1. init. all weights w
  2. for each training sample w/ features f(x) and true label y*:
     (a) classify the sample using the curr. weights:
         y = classify(x) = +1 if activation_w(x) = w^T f(x) > 0
                           -1 if activation_w(x) = w^T f(x) < 0
     (b) compare the pred. label y to the true label y*: if they match, do nothing; if the res. happens to be incorrect, upd. the weights by w = w + y* f(x)
  3. if you went through every training sample w/o having to upd. your weights (all samples pred. correctly), terminate. Else, repeat step 2
  · the line drawn is def. by w^T f(x) = 0: > 0 perceptron is pos., < 0 perceptron is neg.
· many prob. are non-linear, which a single perceptron cannot separate
Naive Bayes
· Goal: create a model that can pred. a label y given features
· ex: spam filter
  label y: spam or ham
  features F_1, F_2, ..., F_n: whether word i appears in the email
· Label email based on the higher of these two prob:
  P(y = spam | F_1 = f_1, ..., F_n = f_n)
  P(y = ham | F_1 = f_1, ..., F_n = f_n)
· don't assume that there is any type of relationship b/w words (features are cond. ind. given the label)
· we can get the dist. for y by multiplying P(y) by the prob. of each of the feature var. given y => P(y, f_1, ..., f_n) = P(y) Π_i P(f_i | y)
· Generalized: prediction(f_1, ..., f_n) = argmax_y P(y | F_1 = f_1, ..., F_n = f_n) = argmax_y P(y) Π_i P(F_i = f_i | y)
Neural Networks
· instead of having binary (-1, 1) choices after each node, we use a non-linear activation func.
· nodes still dot prod. their inputs w/ their own weight vectors
· use gradient descent to upd. all weights
· the non-linearity allows us to pred. more complex classes
· if we come up w/ a neural network that is complex enough, we can come up w/ a boundary that inc. all circles of the class we're trying to classify
Planning
Maximum Likelihood Estimation
· How to est. CPTs since we don't actually know them: parameter est. w/ MLE
· find prob. (CPT val.) θ such that we maximize the likelihood of observing our observations
· Ans. is actually fairly intuitive:
  P_MLE(F_i = f | y = spam) = (# ex. w/ y = spam and F_i = f) / (# ex. w/ y = spam)
  that gives us the prob., given that an email is spam, that it has some word in it
· NN layers: each input is a feat.; for each node, mult. inputs/feat. by weights; 1st layer is made of a bunch of nodes; each edge is a weight
· w/o a non-linearity (aka activation) the network is still linear b/c we're only mult. #s by features
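The counting estimate and the argmax rule for the spam filter can be put together in a short sketch. Everything named here (`fit_mle`, `predict`, the four toy emails, the vocabulary) is hypothetical and only illustrates the idea; no smoothing is applied, so an unseen word/label pair gets probability 0.

```python
from collections import Counter, defaultdict


def fit_mle(examples):
    """examples: list of (label, set_of_words). Returns (prior, cond).

    prior[y]      = (# examples with label y) / (# examples)
    cond(word, y) = (# examples with label y containing word) / (# examples with label y)
    """
    label_counts = Counter(label for label, _ in examples)
    word_counts = defaultdict(Counter)
    for label, words in examples:
        word_counts[label].update(words)
    prior = {y: c / len(examples) for y, c in label_counts.items()}

    def cond(word, y):
        return word_counts[y][word] / label_counts[y]

    return prior, cond


def predict(words, prior, cond, vocab):
    """argmax over y of P(y) * product over i of P(F_i = f_i | y)."""
    best, best_score = None, -1.0
    for y in prior:
        score = prior[y]
        for w in vocab:
            p = cond(w, y)
            # F_i = 1 if the word appears in the email, else 0.
            score *= p if w in words else (1.0 - p)
        if score > best_score:
            best, best_score = y, score
    return best


# Hypothetical toy data set.
examples = [
    ("spam", {"free", "money"}),
    ("spam", {"free", "offer"}),
    ("ham", {"meeting", "today"}),
    ("ham", {"free", "today"}),
]
vocab = {"free", "money", "offer", "meeting", "today"}
prior, cond = fit_mle(examples)
label = predict({"free", "money"}, prior, cond, vocab)
```

Because plain MLE assigns probability 0 to anything unseen, a single unseen word can zero out a whole class; that is a property of this sketch, not a fix for it.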
Perceptron
· idea: linearly separates data by a set of weights
· if data is linearly separable, the alg. will perfectly classify the data
· to find boundaries w^T f(x) = 0 that don't have to cross the origin, add a bias feat. that always has val. of 1
· creating perceptron model: we have a bunch of data on a kind of graph, where each pt. is pos. or neg., so we want to build a model that can classify them; the perceptron figures out what line you can draw to perfectly div. the 2 classes
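The perceptron training loop (classify with the current weights, update only on a mistake, stop after a full pass with no updates) can be sketched as below. The function names, the `max_passes` cap, the choice to treat an activation of exactly 0 as the negative class, and the four 2D points are assumptions made for illustration.

```python
def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))


def classify(w, f):
    """+1 if w^T f(x) > 0, else -1 (assumption: activation 0 counts as -1)."""
    return 1 if dot(w, f) > 0 else -1


def train_perceptron(samples, max_passes=100):
    """samples: list of (features, label) with label in {+1, -1}.

    A bias feature that is always 1 is appended so the boundary
    w^T f(x) = 0 does not have to cross the origin.
    """
    data = [(list(f) + [1.0], y) for f, y in samples]
    w = [0.0] * len(data[0][0])
    for _ in range(max_passes):
        updated = False
        for f, y_true in data:
            if classify(w, f) != y_true:
                # Wrong prediction: w = w + y* f(x)
                w = [wi + y_true * fi for wi, fi in zip(w, f)]
                updated = True
        if not updated:
            # A full pass with no updates: every sample is classified correctly.
            break
    return w


# Hypothetical linearly separable 2D data.
samples = [([2, 1], 1), ([3, 2], 1), ([-1, -1], -1), ([-2, 0], -1)]
weights = train_perceptron(samples)
```

The `max_passes` cap is there because the loop only terminates on its own when the data is linearly separable.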
Logistic Reg.
· instead of simply using w^T x (binary perceptron), apply the logistic func. on w^T x, so we can pred. a prob.
· if we can get a set of weights, the logistic func. gives us a prob. for each class
· How do we compute the weights? Use gradient descent
· see also: multi-class logistic regression, using softmax func.
Sampling (cont. from inference by enum.)
· inference by enum. may have to compute a lot; instead, use a large # of samples
Prior sampling
· ex: gen. samples of (C, S, R, W), e.g. (c, ¬s, r, w), and use them to est. P(W = w)
· For i = 1, 2, ..., n (in topological order): sample X_i from P(X_i | parents(X_i))
· Return (x_1, ..., x_n)
Rejection sampling
· idea: we can imm. stop gen. a sample as soon as it becomes inconsist. w/ our evid.
· will still discard most samples but takes less time to gen. those
· input: evidence e_1, ..., e_k
· For i = 1, 2, ..., n: sample X_i from P(X_i | parents(X_i)); if x_i is not consist. w/ evid., reject: return & no sample is gen. in this cycle
· ex: we want P(C | r, w): can throw away samples w/ ¬r or ¬w
Gradient Ascent / Descent
· given an acc. measure, in order to come up w/ a better network we inc. our acc. or dec. our loss
· Goal: want to find param. that optimize the loss
· if a formula for the global optimum does not exist, can use gradient ascent/descent
· Observation: gradient is the dir. of steepest ascent; following the gradient, we can chase maxima/minima
· Gradient ascent:
    rand. initialize w
    while w not converged do:
        w = w + α ∇_w f(w)
    end
· Gradient descent:
    rand. initialize w
    while w not converged do:
        w = w - α ∇_w f(w)
    end
Training Neural Networks
· set the weights to some initial val.
· input training data, run forward pass to generate val. at all nodes, calc. loss func.
· run backwards pass: calc. the gradient of loss w/ respect to each of the weights
· use gradient descent to upd. all weights
· repeat w/ more data
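Rejection sampling for a query such as P(C | r, w) can be sketched as follows. This is a minimal sketch: the network layout (C is a parent of S and R, which are parents of W), the CPT numbers, and all function names are assumptions chosen for illustration, not values taken from the notes.

```python
import random

# Hypothetical network in topological order:
# (variable, parents, CPT mapping parent values -> P(variable = True))
NET = [
    ("C", (), {(): 0.5}),
    ("S", ("C",), {(True,): 0.1, (False,): 0.5}),
    ("R", ("C",), {(True,): 0.8, (False,): 0.2}),
    ("W", ("S", "R"), {(True, True): 0.99, (True, False): 0.9,
                       (False, True): 0.9, (False, False): 0.01}),
]


def rejection_sample(evidence, rng):
    """Sample variables in topological order; stop as soon as one contradicts the evidence."""
    sample = {}
    for var, parents, cpt in NET:
        p_true = cpt[tuple(sample[p] for p in parents)]
        sample[var] = rng.random() < p_true
        if var in evidence and sample[var] != evidence[var]:
            return None  # reject: no sample is generated in this cycle
    return sample


def estimate(query, evidence, n, rng):
    """Fraction of accepted samples in which the query variable is True."""
    kept = []
    for _ in range(n):
        s = rejection_sample(evidence, rng)
        if s is not None:
            kept.append(s)
    if not kept:
        return None
    return sum(s[query] for s in kept) / len(kept)


rng = random.Random(0)
p_c_given_r_w = estimate("C", {"R": True, "W": True}, 20000, rng)
```

For these CPT numbers the exact answer is 0.3636 / 0.4581 ≈ 0.79, and roughly half of the 20000 attempts are rejected, which illustrates why unlikely evidence makes this method wasteful.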
· Bayes net: directed acyclic graph; use a cond. prob. table (CPT) to encode relationships b/w var.
· Prob. of an assignment: P(x_1, x_2, ..., x_n) = Π_i P(x_i | parents(X_i))
· ex: Alarm (A) goes off if there's a burglary (B) or earthquake (E), res. in John (J) or Mary (M) calling
· CPTs: P(B), P(E), P(A | B, E), P(J | A), P(M | A)
· Uses: P(a, j, m, b, e) = P(b) · P(e) · P(a | b, e) · P(j | a) · P(m | a)
Likelihood weighting
· idea: rejection sampling may reject a lot of samples if evid. is unlikely
· fix the evid. var. while we sample & weight each sample by the prob. of the evid. given its parents' val.
· all samples are used!
· evid. inf. the sampling of downstream var. in the graph, but not upstream var.; want to see all evid., not just that which inf. downstream vars.
Logistic func.
· takes in some input from -∞ to +∞ and maps it to a val. from 0 to 1
· can be used as activation func., which you can apply imm. after a linear layer, to pred. the class of something
· ex: if our neural network output is close to 0, sigmoid func. will map it to a probability like 0.5, saying it's 50/50; if given -∞, it will likely be class 0; if given +1000, it will likely be class 1 b/c very big
Ind. in Bayes Net
· a node is cond. ind. of all its ancestor var. (non-descendants) in the graph given all of its parents
· a node is cond. ind. of all other var. given its Markov blanket (consisting of its parents, children, and its children's other parents)
Variable elimination
· input: evidence e_1, ..., e_k
· for each hidden var. X_i: take all factors inv. the var. X_i, mult. them, then sum out X_i to elim. it
· Factors: unnorm. prob., proportional to actual prob., but doesn't sum to 1
· vs. inference by enum.: enum. carries more unnec. terms
Logistic Regression:
· h_w(x) = 1 / (1 + e^{-w^T x})
· tune our weights such that when we plug in x we get a val. from 0 to 1, like a prob. dist.
· upd.: set w = w + α ∇_w (log-likelihood)
· like a layer in NN: a network w/ just 1 layer
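Logistic regression trained by gradient ascent on the log-likelihood can be sketched like this. It is a minimal sketch under stated assumptions: the names `sigmoid`, `train_logistic`, and `predict_prob`, the learning rate, the step count, and the 1D toy data are all hypothetical; the gradient used is Σ (y - h_w(x)) x, averaged over the samples.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow for very negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(data, alpha=0.5, steps=2000):
    """data: list of (features, label) with label in {0, 1}; a bias feature 1 is appended."""
    rows = [(list(x) + [1.0], y) for x, y in data]
    w = [0.0] * len(rows[0][0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in rows:
            # Gradient of the log-likelihood for one sample: (y - h_w(x)) * x
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += err * xj
        # Gradient ascent step: w = w + alpha * gradient (averaged over samples)
        w = [wi + alpha * g / len(rows) for wi, g in zip(w, grad)]
    return w


def predict_prob(w, x):
    """h_w(x): predicted probability of class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])))


# Hypothetical 1D data: small x is class 0, large x is class 1.
data = [([0.0], 0), ([1.0], 0), ([3.0], 1), ([4.0], 1)]
w = train_logistic(data)
```

A fixed step count stands in for the "while w not converged" test to keep the sketch short; a real stopping rule would compare successive weight vectors or likelihood values.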