CS 188 Final Cheat Sheet, Study notes of Computer Science

University of California - Berkeley Computer Science

CS 188 Final Cheat Sheet covering key concepts

Typology: Study notes

2024/2025

Uploaded on 12/10/2025

tamnhi-vu 🇺🇸




Gibbs sampling

· Random initialization: states are complete assignments to all variables.
· Fix the evidence variables (they cannot change) and randomly set the non-evidence variables.
· Generate subsequent states by looping through the non-evidence variables: sample a new value for the chosen variable to generate a new sample.
· Example: sample $S \sim P(S \mid c, r, w)$, sample $C \sim P(C \mid s, r)$, sample $W \sim P(W \mid s, r)$.
· $P(X_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(X_i \mid \text{markov-blanket}(X_i))$
· Considers both downstream and upstream evidence.
· Gives us a bunch of samples that must match the evidence.

HMM (Hidden Markov Model)

· Observe evidence at each timestep and incorporate it into our model.
· State variable: random variable encoding our belief at a timestep.
· Evidence variable: random variable encoding the observation at the same timestep.
· Markov assumption of transition models: if we are trying to figure out the probability of $G_3$, all we need to know is $G_2$; knowing $G_1$ and $G_0$ adds nothing more. This simplifies the model.
· Represent an HMM using an initial distribution, a transition model, and a sensor model.
· Matrix multiplication with the transition model: $(m \times n)(n \times p)$. Use it for a basic HMM like the one in the example; it gets computationally expensive when the model is more complex.

Example: weather forecasting

$W_0 \to W_1 \to W_2 \to W_3 \to \dots$, with forecasts $F_1, F_2, F_3, \dots$

· ① Calculate the probabilities and form the distribution of $W_0$, then use that to calculate $W_1$ (time elapse update). ② Incorporate the forecast (observation update), and so on for the following days.
· State variable $W_i$ (the weather on day $i$).
· Evidence variable $F_i$ (the forecast on day $i$).
· Initial distribution $P(W_0)$.
· Sensor model $P(F_i \mid W_i)$.
· Transition model $P(W_{i+1} \mid W_i)$.
· 2 things needed to define an HMM:
  1) The probability of each state given the previous one, $P_1 \mid P_0$, $P_2 \mid P_1$, ... (the transition model).
  2) The evidence probability, which becomes the sensor model: the probability that you see some evidence at time $t$ given that there is some state at time $t$.

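Forward Algorithm

For the HMM above, the update works in 2 steps:

$B(W_{i+1}) \propto P(f_{i+1} \mid W_{i+1}) \sum_{w_i} P(W_{i+1} \mid w_i) \, B(w_i)$

· Step ①, time elapse update: advance the model's state by one timestep.
  $B'(W_{i+1}) = \sum_{w_i} P(W_{i+1} \mid w_i) \, B(w_i)$
· Step ②, observation update: incorporate the new evidence.
  $B(W_{i+1}) \propto P(f_{i+1} \mid W_{i+1}) \, B'(W_{i+1})$
· $B(W_i) = P(W_i \mid f_1, \dots, f_i)$: belief distribution at time $i$ given the observed evidence $f_1, \dots, f_i$.
· $B'(W_i) = P(W_i \mid f_1, \dots, f_{i-1})$
· Similar to exact inference with a Bayes net.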
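The two forward-algorithm steps can be sketched in a few lines of Python. This is only an illustration: the two weather states, the forecast values, and every probability in the tables are made-up assumptions, and the function names `time_elapse`, `observe`, and `forward` are hypothetical.

```python
# Hypothetical two-state weather HMM; all numbers are illustrative assumptions.
STATES = ["sun", "rain"]
INITIAL = {"sun": 0.5, "rain": 0.5}                  # P(W_0)
TRANSITION = {"sun": {"sun": 0.8, "rain": 0.2},      # P(W_{i+1} | W_i)
              "rain": {"sun": 0.4, "rain": 0.6}}
SENSOR = {"sun": {"good": 0.9, "bad": 0.1},          # P(F_i | W_i)
          "rain": {"good": 0.3, "bad": 0.7}}


def time_elapse(belief):
    """Step 1: B'(W_{i+1}) = sum over w_i of P(W_{i+1} | w_i) * B(w_i)."""
    return {nxt: sum(TRANSITION[cur][nxt] * belief[cur] for cur in STATES)
            for nxt in STATES}


def observe(predicted, forecast):
    """Step 2: B(W_{i+1}) is proportional to P(f_{i+1} | W_{i+1}) * B'(W_{i+1})."""
    unnormalized = {s: SENSOR[s][forecast] * predicted[s] for s in STATES}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


def forward(forecasts):
    """Run both steps once per observed forecast, starting from P(W_0)."""
    belief = dict(INITIAL)
    for forecast in forecasts:
        belief = observe(time_elapse(belief), forecast)
    return belief


belief = forward(["good", "good", "bad"])
```

Normalizing inside `observe` is what turns the proportionality in step ② into an actual distribution.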

Particle Filtering

· Use a set of samples (particles) to represent the belief state of the HMM.
· Stores $n$ particles ($n \ll |X|$).
· Step ①, prediction: sample the next state from the transition model.
  $x_{t+1} \sim P(X_{t+1} \mid x_t)$
· Step ②, update: observe $e_{t+1}$ and weight each sample based on the evidence.
  $w = P(e_{t+1} \mid x_{t+1})$
  Fitting the data better means getting a higher weight.
· Normalize the weights across all particles.
· Resample: $n$ times, sample with replacement and get new particles. This avoids tracking weighted samples.
· Repeat for the next time step.
· Exact inference is infeasible when the domain of the variables grows too large.
· Step ② is similar to likelihood weighting from Bayes nets.
· At every iteration, use the transition model to get a new set of samples, then update the way we sample by multiplying by the probability of seeing the current evidence given the state of the sample.
· If your sample ends up on a state with 0 probability given the evidence (maybe because your sensor is never going to give that reading there), the weight $w = P(e_{t+1} \mid x_{t+1}) = 0$ helps you eliminate that sample when resampling.
· Weighting happens one time for each time step.
· Particle filtering is not perfectly accurate, but if you have enough particles it gets close to the exact probabilities.
· Better for complex models with a bunch of state variables (5 or 6) and a bunch of evidence variables.
· Example particle: state $(3, 3)$ with weight $w = 0.4$.
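The predict, weight, and resample cycle above could look roughly like this in Python. It is a sketch under assumptions: the two-state weather model and its probabilities are invented for illustration, and `sample_from`, `particle_filter_step`, and `estimate` are hypothetical helper names.

```python
import random

# Hypothetical two-state weather HMM; all numbers are illustrative assumptions.
STATES = ["sun", "rain"]
TRANSITION = {"sun": {"sun": 0.8, "rain": 0.2},      # P(X_{t+1} | x_t)
              "rain": {"sun": 0.4, "rain": 0.6}}
SENSOR = {"sun": {"good": 0.9, "bad": 0.1},          # P(e_t | x_t)
          "rain": {"good": 0.3, "bad": 0.7}}


def sample_from(dist, rng):
    """Draw one key from a {value: probability} dictionary."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]


def particle_filter_step(particles, evidence, rng):
    # Prediction: move every particle through the transition model.
    moved = [sample_from(TRANSITION[x], rng) for x in particles]
    # Update: weight each particle by P(evidence | particle state).
    weights = [SENSOR[x][evidence] for x in moved]
    if sum(weights) == 0:
        raise ValueError("every particle is inconsistent with the evidence")
    # Resample n times with replacement; choices() treats weights as relative,
    # so the normalization step is implicit here.
    return rng.choices(moved, weights=weights, k=len(particles))


def estimate(particles):
    """Approximate belief: fraction of particles sitting in each state."""
    return {s: particles.count(s) / len(particles) for s in STATES}


rng = random.Random(0)
particles = [rng.choice(STATES) for _ in range(5000)]   # uniform initial belief
for evidence in ["good", "good", "bad"]:
    particles = particle_filter_step(particles, evidence, rng)
belief = estimate(particles)
```

With more particles the estimate tracks the exact forward-algorithm belief more closely, at the cost of more sampling work per time step.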

Machine Learning

· Core idea: we give machines access to data and they learn for themselves.
· Data is often split into training, validation, and test sets:
  Training set: used to fit the model.
  Validation set: used to tune hyperparameters (learning rate, model structure, etc.).
  Test set: used to test the entire model.
· Some types of machine learning problems:
  * Regression: try to estimate some numerical value from data. Given a bunch of data on a plot, find the line of best fit. Ex: take features of houses and find the line of best fit to find what the price of each house should be.
  * Classification: try to classify data into discrete classes. Ex: pixels are the features in an image of a number, and the classes are what we're trying to predict (0-9).
  * Clustering: try to group similar data into clusters naturally.
· Types of learning:
  Supervised: training data has labels, e.g. classification (ex: digits).
  Unsupervised: training data has no labels, e.g. clustering. We don't know exactly what we're looking for, but want to see what structure appears naturally.

Perceptron Algorithm

Given a set of features $f(x)$ and a set of classes $y \in \{-1, +1\}$:

1) Initialize all weights.
2) For each training sample:
  (a) Classify the sample using the current weights. The class predicted by the weights $w$ is
      $y = \text{classify}(x) = +1$ if $\text{activation}_w(x) = w^T f(x) > 0$
      $y = \text{classify}(x) = -1$ if $\text{activation}_w(x) = w^T f(x) < 0$
  (b) Compare the predicted label $y$ to the true label $y^*$. If they match, do nothing; otherwise update your weights: $w \leftarrow w + y^* f(x)$.
3) If you went through every training sample without having to update your weights (all samples predicted correctly), then terminate. Else, repeat step 2.

· The weights define the line drawn. Ex. with 2D data: where $w^T f(x) > 0$ the perceptron is positive, where $w^T f(x) < 0$ the perceptron is negative.
· If the result happens to be incorrect, we update our weights by adding the true class times the features.
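The perceptron steps above can be sketched as follows. The toy data set, the choice to start from all-zero weights, the `max_passes` safety cap, and treating an activation of exactly 0 as class -1 are assumptions made for this illustration; the notes only define the positive and negative cases.

```python
def activation(weights, features):
    """w^T f(x): dot product of the weights and the feature vector."""
    return sum(w * f for w, f in zip(weights, features))


def classify(weights, features):
    # Assumption: an activation of exactly 0 is treated as class -1.
    return 1 if activation(weights, features) > 0 else -1


def train_perceptron(samples, max_passes=100):
    """samples: list of (feature_vector, true_label), labels in {-1, +1}."""
    weights = [0.0] * len(samples[0][0])      # assumption: start from zeros
    for _ in range(max_passes):               # safety cap, not part of the notes
        updated = False
        for features, true_label in samples:
            if classify(weights, features) != true_label:
                # Wrong prediction: w <- w + y* f(x)
                weights = [w + true_label * f
                           for w, f in zip(weights, features)]
                updated = True
        if not updated:                       # a full pass with no updates
            break
    return weights


# Hypothetical 2D data; the last feature is a constant bias feature of 1.
DATA = [([2.0, 1.0, 1.0], 1), ([1.0, 3.0, 1.0], 1),
        ([-1.0, -2.0, 1.0], -1), ([-2.0, -1.0, 1.0], -1)]
weights = train_perceptron(DATA)
```

Because this toy data is linearly separable, the loop stops after a pass in which no weight changes.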

Neural Networks

· Motivation: most problems are non-linear.
· A common type of neural network is the multilayer perceptron.
· Instead of having binary $-1$, $1$ choices after each node, we use a non-linear activation function.
· Nodes still take the dot product of their inputs with their own weight vectors.
· Use gradient descent to update all weights: backpropagation.
· Applying a non-linearity allows us to predict more complex classes. If we come up with a neural network that is complex enough, we can come up with a boundary that includes all circles and no crosses, creating multiple lines around the class we are trying to classify.

Naive Bayes Classification

· Goal: create a model that can predict a label $y$ given features, where we assume all features are independent and affected only by the label.
· Structure: label $Y \to$ features $F_1, F_2, \dots, F_n$.
· Ex: spam filter.
  $y$ is in $\{\text{spam}, \text{ham}\}$.
  $F_i$ in $\{0, 1\}$ is whether word $i$ appears in the email.
· Label the email based on the higher of these two probabilities:
  $P(Y = \text{ham} \mid F_1 = f_1, \dots, F_n = f_n)$
  $P(Y = \text{spam} \mid F_1 = f_1, \dots, F_n = f_n)$
· We don't assume that there is any type of relationship between words.
· We can get the distribution for $y$ by multiplying the prior of $y$ by the probability of each of the feature variables given $y$.
· Generalized:
  $\text{pred}(f_1, \dots, f_n) = \arg\max_y P(Y = y \mid F_1 = f_1, \dots, F_n = f_n) = \arg\max_y P(Y = y) \prod_i P(F_i = f_i \mid Y = y)$
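As a rough illustration of the spam-filter prediction rule, the sketch below fits the probabilities by counting and picks the label with the higher score. The training emails are invented, the names `fit` and `predict` are hypothetical, and add-one smoothing is an extra assumption (not in the notes) used so that an unseen word does not force a probability of exactly 0.

```python
from collections import Counter, defaultdict

# Hypothetical training emails: (set of words present, label).
TRAIN = [
    ({"free", "money"}, "spam"),
    ({"free", "offer"}, "spam"),
    ({"money", "offer", "free"}, "spam"),
    ({"meeting", "notes"}, "ham"),
    ({"lunch", "meeting"}, "ham"),
]
VOCAB = sorted(set().union(*(words for words, _ in TRAIN)))


def fit(train):
    """Estimate P(Y) and P(F_i = 1 | Y) from counts."""
    label_counts = Counter(label for _, label in train)
    word_counts = defaultdict(Counter)
    for words, label in train:
        word_counts[label].update(words)
    prior = {y: c / len(train) for y, c in label_counts.items()}
    # Assumption: add-one smoothing over the two outcomes of each binary feature.
    likelihood = {y: {w: (word_counts[y][w] + 1) / (label_counts[y] + 2)
                      for w in VOCAB}
                  for y in label_counts}
    return prior, likelihood


def predict(words, prior, likelihood):
    """Return the argmax label and the unnormalized score of every label."""
    scores = {}
    for y in prior:
        score = prior[y]
        for w in VOCAB:
            p = likelihood[y][w]
            score *= p if w in words else (1.0 - p)   # F_i = 1 or F_i = 0
        scores[y] = score
    return max(scores, key=scores.get), scores


prior, likelihood = fit(TRAIN)
label, scores = predict({"free", "money"}, prior, likelihood)
```

The scores are unnormalized; dividing each by their sum would give the posterior distribution over the two labels.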

Maximum Likelihood Estimation

· How to estimate CPTs, since we don't actually know them: parameter estimation with MLE.
· Find the probabilities (CPT values) $\theta = P(\cdot)$ such that we maximize the likelihood of observing our observations, $P(\text{observations} \mid \theta)$.
· Each edge is a CPT.
· The answer is actually fairly intuitive. Given data $(F, Y)$:
  $P(Y = y) = \text{MLE}(\theta \mid (F, Y)) = \dfrac{\#\ \text{examples with}\ Y = y}{\text{total}\ \#\ \text{examples}}$
  $P(F_i = f \mid Y = y) = \dfrac{\#\ \text{examples with}\ Y = y\ \text{and}\ F_i = f}{\#\ \text{examples with}\ Y = y}$
· Look at a ton of emails that have already been classified for us, and find each edge (CPT): what is the probability, given that an email is spam, that it has some word in it?

Neural network diagram notes

· Features are the inputs; each input (feature) is multiplied by a weight, and this is done for each node.
· The 1st layer is made of a bunch of nodes.
· Without an activation the network is still linear, because we're only multiplying numbers by features.
· Sigmoid function, aka the activation.

Binary Perceptron

· Idea: linearly separates the data with a boundary defined by a set of weights.
· If the data is linearly separable, the algorithm will perfectly classify the data.
· To find boundaries $w^T f(x) = 0$ that don't have to cross the origin, incorporate a "bias" feature that always has a value of 1.
· Creating a perceptron model: we have a bunch of data with features on some kind of graph, where each point is either positive or negative. Given just the features, we want to build a model that can take these features and give us an estimate for the class.
· The perceptron figures out what line you can draw to perfectly divide the 2 classes.

Inference by Enumeration

· Answer a query by summing the joint distribution over the hidden variables.

Prior Sampling

· Algorithm: randomly generate samples, discard every sample inconsistent with the combination of evidence, then calculate the probability.
· May have to compute a large number of samples for unlikely scenarios.
· Ex: generate samples such as $(c, \neg s, r, w)$, then compute $P(w)$ from them.

Logistic Regression (cont.)

· Idea: instead of simply using $w^T x$ (binary perceptron) and classifying, apply the sigmoid function on $w^T x$.
· Results are always between 0 and 1, so it can predict probabilities, unlike the binary perceptron.
· If we can get a set of weights, the logistic regression form can give us a good classification for a set of data points.
· How do we compute the weights? Use gradient descent.
· Multiclass logistic regression: use the softmax function.
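A minimal sketch of fitting logistic regression weights with gradient descent is given below. The one-feature data set, the learning rate, the step count, and the use of the mean log loss as the objective are all assumptions chosen for illustration; `sigmoid`, `predict_proba`, and `train` are hypothetical names.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """Apply the sigmoid to w^T x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train(data, learning_rate=0.5, steps=2000):
    """Gradient descent on the mean log loss (an assumed loss function)."""
    weights = [0.0] * len(data[0][0])
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for x, y in data:
            error = predict_proba(weights, x) - y      # d(loss)/d(w^T x)
            for j, xj in enumerate(x):
                gradient[j] += error * xj
        weights = [w - learning_rate * g / len(data)
                   for w, g in zip(weights, gradient)]
    return weights


# Hypothetical data: (features, label); the second feature is a bias of 1.
DATA = [([0.0, 1.0], 0), ([1.0, 1.0], 0), ([3.0, 1.0], 1), ([4.0, 1.0], 1)]
weights = train(DATA)
p_low = predict_proba(weights, [0.0, 1.0])
p_high = predict_proba(weights, [4.0, 1.0])
```

Unlike the binary perceptron, the outputs `p_low` and `p_high` are probabilities, so a threshold such as 0.5 is still needed to turn them into class labels.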

Gradient Ascent / Descent

· Given an accuracy measure or a loss, we want to come up with a better network, i.e. decrease our loss.
· Goal: find the parameters that maximize an objective function or minimize a loss function.
· If a closed-form formula for the global optimum does not exist, we can use gradient ascent/descent.
· Observation: the gradient is the direction of steepest increase. By repeatedly following the gradient, we can chase maxima/minima.
· Gradient ascent:
    randomly initialize $w$
    while $w$ not converged do:
        $w \leftarrow w + \alpha \nabla_w f(w)$
    end
· Gradient descent:
    randomly initialize $w$
    while $w$ not converged do:
        $w \leftarrow w - \alpha \nabla_w f(w)$
    end

Training Neural Networks

· Set the weights to some initial values.
· Input the training data, run a forward pass to generate the values at all nodes, and calculate the loss function on the final output.
· Run a backwards pass: calculate the gradient of the loss with respect to each of the weights.
· Use gradient descent to update all weights.
· Repeat with more data.

Prior Sampling (algorithm)

    For $i = 1, 2, \dots, n$ (in topological order):
        sample $x_i$ from $P(X_i \mid \text{Parents}(X_i))$
    Return $(x_1, x_2, \dots, x_n)$

· Ex: from samples such as $(c, \neg s, r, w)$ and $(c, s, \neg r, w)$, estimate $P(W = w)$ as the fraction of samples with $W = w$.

Rejection Sampling

· Idea: we can immediately stop generating a sample as soon as it becomes inconsistent with our evidence.
· We will not accept any sample that is inconsistent with the evidence. We still discard most samples, but it takes less time to generate those.
· Algorithm (input: evidence $e_1, \dots, e_k$; the input is the same for likelihood weighting and Gibbs):
    For $i = 1, 2, \dots, n$:
        sample $x_i$ from $P(X_i \mid \text{Parents}(X_i))$
        if $x_i$ is not consistent with the evidence:
            reject: return, and no sample is generated in this cycle
    Return $(x_1, x_2, \dots, x_n)$
· Ex: we want $P(C \mid r, w)$, so we can throw away samples with $\neg r$ or $\neg w$.
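The rejection sampling loop above might be sketched like this for the query $P(C \mid r, w)$. The network shape (C is a parent of S and R, which are the parents of W) and every CPT value are assumptions made for the example, as are the names `rejection_sample` and `estimate`.

```python
import random

# Hypothetical CPTs for a network C -> S, C -> R, (S, R) -> W.
P_C = 0.5                                            # P(+c)
P_S_GIVEN_C = {True: 0.1, False: 0.5}                # P(+s | C)
P_R_GIVEN_C = {True: 0.8, False: 0.2}                # P(+r | C)
P_W_GIVEN_SR = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.9, (False, False): 0.01}   # P(+w | S, R)

# Variables in topological order with a function giving P(var = True | parents).
STEPS = [
    ("C", lambda s: P_C),
    ("S", lambda s: P_S_GIVEN_C[s["C"]]),
    ("R", lambda s: P_R_GIVEN_C[s["C"]]),
    ("W", lambda s: P_W_GIVEN_SR[(s["S"], s["R"])]),
]


def rejection_sample(evidence, rng):
    """Return one full sample, or None as soon as it contradicts the evidence."""
    sample = {}
    for name, prob_true in STEPS:
        sample[name] = rng.random() < prob_true(sample)
        if name in evidence and sample[name] != evidence[name]:
            return None                      # reject: stop generating early
    return sample


def estimate(query, evidence, n, rng):
    """Fraction of accepted samples in which the query variable is True."""
    attempts = (rejection_sample(evidence, rng) for _ in range(n))
    kept = [s for s in attempts if s is not None]
    return sum(s[query] for s in kept) / len(kept), len(kept)


rng = random.Random(0)
p_cloudy, kept = estimate("C", {"R": True, "W": True}, 20000, rng)
```

The second return value shows how many of the attempts survived, which makes the cost of unlikely evidence visible.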

Likelihood Weighting

· Idea: rejection sampling may reject a lot of samples if the evidence is unlikely, so let's fix the evidence variables while we sample, and weight each sample by the probability of the evidence given its parents.
· All samples are used!
· Downstream variables are influenced by upstream evidence, but upstream variables are not influenced by downstream evidence. We want a sample to see all of its evidence, not just the evidence that influences downstream variables.

Logistic Function

· Takes in some input from $-\infty$ to $\infty$ and maps it to a value from 0 to 1.
· Uses: as an activation function, which you can apply immediately after a linear layer (don't forget to include the bias value), and to predict the class of something.
· Ex: whatever value our neural network produces, the sigmoid function will map it to a value from 0 to 1. If given $-\infty$, it will likely be class 0; if given $+1000$, it will likely be class 1 because it is very big; anything in the middle like 0 will be interpreted as a probability like 0.5, saying it's 50/50.

Bayes Net

· Directed acyclic graph; uses conditional probability tables (CPTs) to encode relationships between variables.
· Calculate the probability of an assignment:
  $P(x_1, x_2, \dots, x_n) = \prod_i P(x_i \mid \text{Parents}(X_i))$
· Ex: an Alarm (A) goes off if there's a burglary (B) or an earthquake (E), resulting in John (J) or Mary (M) calling.
· CPTs: $P(B)$, $P(E)$, $P(A \mid B, E)$, $P(J \mid A)$, $P(M \mid A)$
· $P(+b, -e, +a, +j, +m) = P(+b) \cdot P(-e) \cdot P(+a \mid +b, -e) \cdot P(+j \mid +a) \cdot P(+m \mid +a)$

Independence in Bayes Nets

1) Each node is conditionally independent of all its ancestor nodes (non-descendants) in the graph, given all of its parents.
2) Each node is conditionally independent of all other variables given its Markov blanket (consisting of its parents, children, and children's other parents).
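The alarm-network product of CPT entries can be checked numerically with a short sketch. The CPT numbers below are illustrative assumptions (the notes give no values), and `prob` and `joint` are hypothetical helper names.

```python
# Hypothetical CPT values for the alarm network; the notes do not give numbers.
P_B = 0.001                                          # P(+b)
P_E = 0.002                                          # P(+e)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(+a | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(+j | A)
P_M = {True: 0.70, False: 0.01}                      # P(+m | A)


def prob(p_true, value):
    """P(X = value) for a binary variable, given P(X = True)."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Product of each variable's CPT entry given its parents."""
    return (prob(P_B, b) * prob(P_E, e) * prob(P_A[(b, e)], a)
            * prob(P_J[a], j) * prob(P_M[a], m))


# P(+b, -e, +a, +j, +m) = P(+b) P(-e) P(+a | +b, -e) P(+j | +a) P(+m | +a)
p = joint(b=True, e=False, a=True, j=True, m=True)
```

Summing `joint` over all 32 assignments gives 1, which is a quick sanity check that the CPTs define a valid distribution.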

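Variable Elimination

· Eliminate a hidden variable $X$ by joining (multiplying) all factors involving $X$, the variable you're trying to eliminate, and then summing out $X$.
· Uses smaller factors than inference by enumeration.
· Factors: unnormalized probabilities. They are proportional to the actual probabilities, but don't sum to 1 like a probability distribution should.
· Move unnecessary terms outside the sum.

Logistic Regression

· $h_w(x) = \dfrac{1}{1 + e^{-w^T x}}$
· We want to fine tune our weights such that when we pass $w^T x$, which is a single layer in a NN, into the sigmoid function, we get a value from 0 to 1.
· Similar to a neural network with just 1 layer.

Likelihood Weighting (algorithm)

    Input: evidence $e_1, \dots, e_k$
    $w = 1.0$
    For $i = 1, 2, \dots, n$:
        if $X_i$ is an evidence variable:
            $x_i$ = observed value for $X_i$
            set $w = w \cdot P(x_i \mid \text{Parents}(X_i))$
        else:
            sample $x_i$ from $P(X_i \mid \text{Parents}(X_i))$
    Return $(x_1, x_2, \dots, x_n)$, $w$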
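The likelihood weighting algorithm above can be sketched as follows. The four-variable network (C is a parent of S and R, which are the parents of W), its CPT values, and the helper names `prob_true`, `weighted_sample`, and `estimate` are assumptions made for this illustration.

```python
import random

# Hypothetical CPTs for a network C -> S, C -> R, (S, R) -> W.
P_C = 0.5                                            # P(+c)
P_S_GIVEN_C = {True: 0.1, False: 0.5}                # P(+s | C)
P_R_GIVEN_C = {True: 0.8, False: 0.2}                # P(+r | C)
P_W_GIVEN_SR = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.9, (False, False): 0.01}   # P(+w | S, R)
ORDER = ["C", "S", "R", "W"]                         # topological order


def prob_true(name, sample):
    """P(name = True | parents), read from the CPTs above."""
    if name == "C":
        return P_C
    if name == "S":
        return P_S_GIVEN_C[sample["C"]]
    if name == "R":
        return P_R_GIVEN_C[sample["C"]]
    return P_W_GIVEN_SR[(sample["S"], sample["R"])]


def weighted_sample(evidence, rng):
    """Fix evidence variables, sample the rest, and accumulate the weight."""
    sample, weight = {}, 1.0
    for name in ORDER:
        p = prob_true(name, sample)
        if name in evidence:
            sample[name] = evidence[name]
            weight *= p if evidence[name] else 1.0 - p
        else:
            sample[name] = rng.random() < p
    return sample, weight


def estimate(query, evidence, n, rng):
    """Weighted fraction of samples in which the query variable is True."""
    numerator = denominator = 0.0
    for _ in range(n):
        sample, weight = weighted_sample(evidence, rng)
        denominator += weight
        if sample[query]:
            numerator += weight
    return numerator / denominator


rng = random.Random(0)
p_cloudy = estimate("C", {"R": True, "W": True}, 20000, rng)
```

Every generated sample contributes to the estimate, in contrast to rejection sampling, where samples that contradict the evidence are thrown away.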
