Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Bayesian Approach to Model Fitting in Neural Networks and Machine Learning, Slides of Computer Networks

An introduction to the bayesian framework for model fitting in neural networks and machine learning. The bayesian approach assumes a prior distribution for all parameters and combines it with the likelihood term to obtain a posterior distribution. The example of coin tossing is used to illustrate the concept. The document also discusses the advantages of using a distribution over parameter values and the use of bayes theorem and maximum likelihood learning.

Typology: Slides

2012/2013

Uploaded on 04/27/2013

sajid 🇮🇳

4.6

(7)

128 documents

1 / 13

This page cannot be seen from the preview

Don't miss anything!

Introduction to Neural Networks

and Machine Learning

Lecture 10: The Bayesian way to fit models

Docsity.com

Partial preview of the text

Download Bayesian Approach to Model Fitting in Neural Networks and Machine Learning and more Slides Computer Networks in PDF only on Docsity!

Introduction to Neural Networks

and Machine Learning

Lecture 10: The Bayesian way to fit models

The Bayesian framework

The Bayesian framework assumes that we always

have a prior distribution for everything.

The prior may be very vague.
When we see some data, we combine our prior

distribution with a likelihood term to get a posterior

distribution.

The likelihood term takes into account how probable the

observed data is given the parameters of the model.

It favors parameter settings that make the data likely.
It fights the prior
With enough data the likelihood terms always win.

Some problems with picking the parameters that are most likely to generate the data

What if we only tossed the coin once and we

got 1 head?

Is p=1 a sensible answer?
- Surely p=0.5 is a much better answer.
Is it reasonable to give a single answer?
If we don’t have much data, we are unsure about

p.

Our computations of probabilities will work much

better if we take this uncertainty into account.

Using a distribution over parameter values

Start with a prior distribution over p. In this case we used a uniform distribution.
Multiply the prior probability of each parameter value by the probability of observing a head given that value.
Then scale up all of the probability densities so that their integral comes to 1. This gives the posterior distribution.

probability density

area=

0 1

probability density

Lets do it another 98 times

After 53 heads and 47 tails we get a very sensible posterior distribution that has its peak at 0. (assuming a uniform prior).

probability density

area=

0 1

Bayes Theorem

∫

p W p D W

p D

p W p D W p W D

p D p W D p D W p W p D W

Prior probability of weight vector W

Posterior probability of weight vector W given training data D

Probability of observed data given W

joint probability

conditional probability

Why we maximize sums of log probs

p ( D | W ) p ( d | W ) c

= ∏ c

We want to maximize the product of the probabilities of the outputs on all the different training cases - Assume the output errors on different training cases, c, are independent.
Because the log function is monotonic, it does not change where the maxima are. So we can maximize sums of log probabilities

log p ( D | W ) log p ( d | W ) c

= ∑ c

A even cheaper trick

Suppose we completely ignore the prior over

weight vectors

This is equivalent to giving all possible weight vectors the same prior probability density.
Then all we have to do is to maximize:
This is called maximum likelihood learning. It is

very widely used for fitting models in statistics.

log p ( D | W ) log p ( D | W ) c

= ∑ c

Supervised Maximum Likelihood Learning

Finding a set of weights, W, that minimizes the

squared errors is exactly the same as finding a W that maximizes the log probability that the model would produce the desired outputs on all the training cases.

Bayesian Approach to Model Fitting in Neural Networks and Machine Learning, Slides of Computer Networks

Related documents

Partial preview of the text

Download Bayesian Approach to Model Fitting in Neural Networks and Machine Learning and more Slides Computer Networks in PDF only on Docsity!

Introduction to Neural Networks

and Machine Learning

Lecture 10: The Bayesian way to fit models

The Bayesian framework

distribution with a likelihood term to get a posterior

distribution.

observed data is given the parameters of the model.

p.

better if we take this uncertainty into account.

noise is added to the model’s actual output.

because we are assuming it’s the same in all cases.

So it just scales the squared error.