Machine Learning 10-601, Lecture notes of Artificial Intelligence

Notes from a lecture on Machine Learning given by Tom M. Mitchell at Carnegie Mellon University. The lecture covers graphical models and Bayes nets, including inference (belief propagation, Monte Carlo methods, and variational methods for tractable approximate solutions) and learning. It works through an example of generating samples from a joint distribution and using them to estimate marginals, and introduces the EM algorithm for learning from partly observed data. It also lists the midterm logistics. Readings include Bishop chapter 8.

Machine Learning 10-601
Tom M. Mitchell
Machine Learning Department
Carnegie Mellon University
February 25, 2015
Today:
Graphical models
Bayes Nets:
Inference
Learning
EM
Readings:
Bishop chapter 8
Mitchell chapter 6


Midterm

  • In class on Monday, March 2
  • Closed book
  • You may bring an 8.5x11 “cheat sheet” of notes
  • Covers all material through today
  • Be sure to come on time. We’ll start precisely at 12 noon

What You Should Know

  • Bayes nets are a convenient representation for encoding dependencies / conditional independence
  • BN = Graph plus parameters of CPDs
    • Defines joint distribution over variables
    • Can calculate everything else from that
    • Though inference may be intractable
  • Reading conditional independence relations from the graph
    • Each node is conditionally independent of its non-descendants, given only its parents
    • X and Y are conditionally independent given Z if Z D-separates every path connecting X to Y
    • Marginal independence: special case where Z={}

Inference in Bayes Nets

  • In general, intractable (NP-hard)
  • For certain cases, tractable
    • Assigning probability to fully observed set of variables
    • Or if just one variable unobserved
    • Or for singly connected graphs (i.e., no undirected loops)
      • Belief propagation
  • Sometimes use Monte Carlo methods
    • Generate many samples according to the Bayes Net distribution, then count up the results
  • Variational methods for tractable approximate solutions

Prob. of joint assignment: easy

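  • Suppose we are interested in joint assignment <F=f,A=a,S=s,H=h,N=n>
  • What is P(f,a,s,h,n)? Multiply each node's CPD entry given its parents: P(f,a,s,h,n) = P(f) P(a) P(s|f,a) P(h|s) P(n|s)
  • Let’s use p(a,b) as shorthand for p(A=a, B=b)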
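The product of CPD entries can be sketched in a few lines of Python. The CPT numbers below are hypothetical, chosen only for illustration, and the names `bern` and `joint` are not from the lecture. The network structure assumed is the one used throughout: F and A are parents of S, and S is the parent of H and N.

```python
# Hypothetical CPT values, for illustration only (not from the lecture).
P_F1 = 0.1                                                      # P(F=1)
P_A1 = 0.2                                                      # P(A=1)
P_S1 = {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}    # P(S=1 | F=f, A=a)
P_H1 = {0: 0.1, 1: 0.8}                                         # P(H=1 | S=s)
P_N1 = {0: 0.1, 1: 0.7}                                         # P(N=1 | S=s)


def bern(p_one, value):
    """P(V=value) for a binary variable V with P(V=1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(f, a, s, h, n):
    """P(f,a,s,h,n) = P(f) P(a) P(s|f,a) P(h|s) P(n|s): one CPT lookup per node."""
    return (bern(P_F1, f) * bern(P_A1, a) * bern(P_S1[(f, a)], s)
            * bern(P_H1[s], h) * bern(P_N1[s], n))


print(joint(1, 0, 1, 1, 0))
```

The cost is one lookup and one multiplication per variable, which is why this case is easy.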

Prob. of marginals: not so easy

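  • How do we calculate P(N=n)? Sum the joint over every value of the other variables: P(N=n) = Σ_f Σ_a Σ_s Σ_h P(f,a,s,h,n)
  • The number of terms in the sum grows exponentially with the number of variables summed out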
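A brute-force version of this marginal, as a sketch: it enumerates all 2^4 settings of F, A, S, H and adds up the joint. The CPT values are again hypothetical and the function names are illustrative. With five binary variables this is trivial, but the same loop over k summed-out binary variables has 2^k terms.

```python
from itertools import product

# Hypothetical CPT values, for illustration only (not from the lecture).
P_F1, P_A1 = 0.1, 0.2                                           # P(F=1), P(A=1)
P_S1 = {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}    # P(S=1 | F=f, A=a)
P_H1 = {0: 0.1, 1: 0.8}                                         # P(H=1 | S=s)
P_N1 = {0: 0.1, 1: 0.7}                                         # P(N=1 | S=s)


def bern(p_one, value):
    """P(V=value) for a binary variable V with P(V=1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(f, a, s, h, n):
    """Joint probability as a product of one CPT entry per node."""
    return (bern(P_F1, f) * bern(P_A1, a) * bern(P_S1[(f, a)], s)
            * bern(P_H1[s], h) * bern(P_N1[s], n))


def marginal_n(n):
    """P(N=n), summing the joint over all 2**4 settings of F, A, S, H."""
    return sum(joint(f, a, s, h, n) for f, a, s, h in product((0, 1), repeat=4))


print(marginal_n(1))
```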

Generating a sample from joint distribution: easy

How can we generate random samples drawn according to P(F,A,S,H,N)?

Hint: to draw a random sample of F according to P(F=1) = θ_{F=1}:

  • draw a value of r uniformly from [0,1]
  • if r<θ_{F=1} then output F=1, else F=0

Solution:

  • draw a random value f for F, using its CPD
  • then draw values for A, for S|A,F, for H|S, for N|S

Generating a sample from joint distribution: easy

Note we can estimate marginals like P(N=n) by generating many samples from the joint distribution, then counting the fraction of samples for which N=n.

Similarly for anything else we care about, e.g. P(F=1|H=1, N=0)

→ weak but general method for estimating any probability term…
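The sampling recipe and the counting estimate can be sketched as follows. The CPT values are hypothetical, and `sample_joint` and `estimate` are illustrative names. Conditional probabilities are estimated here by simply discarding samples that disagree with the evidence, which is one simple way to do the counting and not necessarily the one intended in the lecture.

```python
import random

# Hypothetical CPT values, for illustration only (not from the lecture).
P_F1, P_A1 = 0.1, 0.2                                           # P(F=1), P(A=1)
P_S1 = {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}    # P(S=1 | F=f, A=a)
P_H1 = {0: 0.1, 1: 0.8}                                         # P(H=1 | S=s)
P_N1 = {0: 0.1, 1: 0.7}                                         # P(N=1 | S=s)


def sample_joint(rng):
    """Draw one (f, a, s, h, n), sampling every node after its parents."""
    f = int(rng.random() < P_F1)            # r < theta -> 1, else 0
    a = int(rng.random() < P_A1)
    s = int(rng.random() < P_S1[(f, a)])
    h = int(rng.random() < P_H1[s])
    n = int(rng.random() < P_N1[s])
    return f, a, s, h, n


def estimate(num_samples, event, given=lambda x: True, seed=0):
    """Fraction of samples satisfying `event` among those satisfying `given`."""
    rng = random.Random(seed)
    hits = kept = 0
    for _ in range(num_samples):
        x = sample_joint(rng)
        if given(x):
            kept += 1
            hits += event(x)
    return hits / kept if kept else float("nan")


# Estimate P(N=1), then P(F=1 | H=1, N=0); tuple order is (f, a, s, h, n).
print(estimate(100_000, lambda x: x[4] == 1))
print(estimate(100_000, lambda x: x[0] == 1,
               given=lambda x: x[3] == 1 and x[4] == 0))
```

The method is "weak" in the sense that rare evidence leaves few usable samples, so the conditional estimate is noisier than the marginal one for the same number of draws.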

Learning of Bayes Nets

  • Four categories of learning problems
    • Graph structure may be known/unknown
    • Variable values may be fully observed / partly unobserved
  • Easy case: learn parameters when graph structure is known and data is fully observed
  • Interesting case: graph structure known, data partly observed
  • Gruesome case: graph structure unknown, data partly unobserved

Learning CPTs from Fully Observed Data

[Figure: Bayes net with edges Flu → Sinus, Allergy → Sinus, Sinus → Headache, Sinus → Nose]

  • Example: Consider learning a parameter such as θ_{s|ij} ≡ P(S=1|F=i,A=j)
  • Max Likelihood Estimate is θ_{s|ij} = Σ_k δ(f_k=i, a_k=j, s_k=1) / Σ_k δ(f_k=i, a_k=j), where k indexes the kth training example and δ(x) = 1 if x=true, = 0 if x=false
  • Remember why?
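With fully observed data, the maximum likelihood estimate of a CPT entry is a ratio of counts. A minimal sketch for the Sinus CPT, assuming each training example is a dict of binary values (the data layout and the name `mle_sinus_cpt` are illustrative assumptions):

```python
def mle_sinus_cpt(examples):
    """Estimate theta[(i, j)] = P(S=1 | F=i, A=j) from fully observed examples.

    `examples` is a list of dicts with binary values under keys "F", "A", "S".
    """
    theta = {}
    for i in (0, 1):
        for j in (0, 1):
            matching = [e for e in examples if e["F"] == i and e["A"] == j]
            # Leave the entry undefined if this parent setting never occurs.
            if matching:
                theta[(i, j)] = sum(e["S"] for e in matching) / len(matching)
    return theta


examples = [
    {"F": 1, "A": 0, "S": 1},
    {"F": 1, "A": 0, "S": 0},
    {"F": 1, "A": 0, "S": 1},
    {"F": 0, "A": 0, "S": 0},
]
print(mle_sinus_cpt(examples))
```

In practice a parent setting with few or no examples gives an unreliable or undefined estimate, which is one reason to smooth the counts with a prior.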

Estimate from partly observed data

  • What if F, A, H, N are observed, but not S?
  • Can’t calculate the MLE from counts, because the counts need the value of S in every training example
  • Let X be all observed variable values (over all examples)
  • Let Z be all unobserved variable values
  • Can’t calculate MLE: θ ← argmax_θ log P(X,Z|θ), because Z is unknown
  • WHAT TO DO?

Estimate from partly observed data

  • EM seeks* to estimate: θ ← argmax_θ' E_{Z|X,θ}[log P(X,Z|θ')]
    • * EM is guaranteed to find a local maximum, not necessarily the global one

EM Algorithm - Informally

EM is a general procedure for learning from partly observed data.

Given observed variables X, unobserved Z (X={F,A,H,N}, Z={S})

Begin with arbitrary choice for parameters θ

Iterate until convergence:

  • E Step: estimate the values of unobserved Z, using θ
  • M Step: use observed values plus E-step estimates to derive a better θ

Guaranteed to find local maximum. Each iteration increases the likelihood of the observed data, P(X|θ), or leaves it unchanged.
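A minimal sketch of this loop for the case Z={S}, assuming binary variables and examples stored as tuples (f, a, h, n). The function names, the parameter layout, and the starting values in the usage lines are illustrative assumptions. P(F) and P(A) are left out because F and A are fully observed, so plain counts suffice for them.

```python
import math


def bern(p_one, value):
    """P(V=value) for a binary variable V with P(V=1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def e_step(x, theta):
    """For x = (f, a, h, n), return P(S=1 | x, theta) and P(h, n | f, a, theta)."""
    f, a, h, n = x
    w = [bern(theta["S"][(f, a)], s) * bern(theta["H"][s], h) * bern(theta["N"][s], n)
         for s in (0, 1)]
    return w[1] / (w[0] + w[1]), w[0] + w[1]


def weighted_rate(weights, values):
    """Weighted fraction of ones in `values`."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def em(data, theta, iterations=15):
    """Run EM; return the final theta and the log likelihood before each update."""
    trace = []
    for _ in range(iterations):
        # E step: soft estimate of S for every example under the current theta.
        results = [e_step(x, theta) for x in data]
        q1 = [q for q, _ in results]
        q0 = [1.0 - q for q in q1]
        # Observed-data log likelihood, up to the P(f), P(a) terms,
        # which do not depend on the parameters being learned here.
        trace.append(sum(math.log(p) for _, p in results))
        # M step: the usual count ratios, with expected values of S as soft counts.
        new_s = {}
        for fa, old in theta["S"].items():
            qs = [q for q, x in zip(q1, data) if (x[0], x[1]) == fa]
            new_s[fa] = sum(qs) / len(qs) if qs else old
        hs = [x[2] for x in data]
        ns = [x[3] for x in data]
        theta = {"S": new_s,
                 "H": {0: weighted_rate(q0, hs), 1: weighted_rate(q1, hs)},
                 "N": {0: weighted_rate(q0, ns), 1: weighted_rate(q1, ns)}}
    return theta, trace


# Hypothetical data and arbitrary starting parameters.
data = [(1, 0, 1, 1), (1, 0, 1, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 1, 1, 1),
        (0, 1, 0, 1), (0, 0, 1, 0), (1, 1, 1, 1), (0, 0, 0, 0), (0, 1, 0, 0)]
init = {"S": {(0, 0): 0.3, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.7},
        "H": {0: 0.2, 1: 0.7}, "N": {0: 0.2, 1: 0.6}}
theta, trace = em(data, init)
print(trace[0], trace[-1])
```

Different starting values for θ can lead to different local maxima, so it is common to run EM from several initializations and keep the best result.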

EM Algorithm - Precisely

EM is a general procedure for learning from partly observed data.

Given observed variables X, unobserved Z (X={F,A,H,N}, Z={S})

Define Q(θ'|θ) = E_{P(Z|X,θ)}[log P(X,Z|θ')]

Iterate until convergence:

  • E Step: Use X and current θ to calculate P(Z|X,θ)
  • M Step: Replace current θ by θ ← argmax_θ' Q(θ'|θ)

Guaranteed to find local maximum. Each iteration increases (or leaves unchanged) the observed-data log likelihood log P(X|θ).
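For the running example with Z={S}, the two steps take a concrete form. The block below is a worked sketch under the assumption that all variables are binary and that the network factors as P(f) P(a) P(s|f,a) P(h|s) P(n|s); the symbol E[s_k] for the E-step posterior is introduced here for illustration.

```latex
\begin{align*}
% E step: posterior over the hidden Sinus value for training example k
E[s_k] \;=\; P(S{=}1 \mid f_k, a_k, h_k, n_k, \theta)
  &= \frac{P(S{=}1 \mid f_k, a_k, \theta)\, P(h_k \mid S{=}1, \theta)\, P(n_k \mid S{=}1, \theta)}
          {\sum_{s \in \{0,1\}} P(s \mid f_k, a_k, \theta)\, P(h_k \mid s, \theta)\, P(n_k \mid s, \theta)} \\[6pt]
% M step: the fully observed count ratio, with s_k replaced by its expectation
\theta_{s|ij}
  &\leftarrow \frac{\sum_k \delta(f_k{=}i,\, a_k{=}j)\, E[s_k]}
                   {\sum_k \delta(f_k{=}i,\, a_k{=}j)}
\end{align*}
```

The M step has the same shape as the fully observed maximum likelihood estimate, except that the unobserved indicator for S=1 is replaced by its expected value from the E step.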