Pattern Recognition Problem Set 3: Estimating Probabilities and Decision Making

Problem Set 3 for CS 4803B/8803B: Pattern Recognition. The set includes problems on estimating probabilities using the multinomial distribution and maximum likelihood estimation, and on decision making based on normally distributed figures of merit from two different tests. Zaphod Beeblebrox, the owner of an ion drive delta boat manufacturing business, is seeking help to minimize mistakes and operating costs using these tests.

CS 4803B/8803B: Pattern Recognition
Problem Set 3
Date: Feb 22, 2001 Due: March 1, 2001
1. It is suspected that a certain four-sided die is unfair. By rolling the die many times, we wish to estimate the probability of each of the four outcomes. Let $\{P_i\}$, $i = 1, \ldots, 4$, denote the unknown probabilities.

Suppose the die is rolled $N$ times. If the probabilities to be estimated were known, then the probability of observing exactly $y_1$ 1’s, $y_2$ 2’s, ..., and $y_4$ 4’s in the $N$ rolls would be:

$$P(y_1, \ldots, y_4; P_1, \ldots, P_4) = \frac{N!}{y_1!\, y_2! \cdots y_4!}\, P_1^{y_1} P_2^{y_2} \cdots P_4^{y_4}.$$

This distribution is called the multinomial distribution.

Suppose exactly $y_i$ occurrences of $i$ are observed in the $N$ rolls, where $i = 1, \ldots, 4$. Find the maximum likelihood estimates $\{\hat{P}_i\}$ for the probabilities $\{P_i\}$, $i = 1, \ldots, 4$, in terms of the $y_i$’s.
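A candidate answer to Problem 1 can be sanity-checked numerically. The Python sketch below evaluates the log of the multinomial probability given above and compares the relative frequencies $y_i / N$ (the standard maximum likelihood result for a multinomial) against randomly drawn probability vectors. The counts and all function names are made up for illustration and are not part of the problem set; the check is a numerical spot check, not a derivation.

```python
import math
import random


def multinomial_log_likelihood(counts, probs):
    """Log of N!/(y_1!...y_k!) * P_1^y_1 ... P_k^y_k for the given counts."""
    n = sum(counts)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(y + 1) for y in counts)
    # Terms with y == 0 contribute nothing, so skip them to avoid log(0).
    return log_coeff + sum(y * math.log(p) for y, p in zip(counts, probs) if y > 0)


def relative_frequencies(counts):
    """Candidate estimate: the observed fraction of each outcome."""
    n = sum(counts)
    return [y / n for y in counts]


def random_probability_vector(k, rng):
    """Random strictly positive vector of length k that sums to 1."""
    raw = [rng.random() + 1e-9 for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical data: counts of 1's, 2's, 3's and 4's in N = 100 rolls.
counts = [12, 31, 25, 32]
candidate = relative_frequencies(counts)
best = multinomial_log_likelihood(counts, candidate)

# Count how many random probability vectors achieve a higher likelihood.
rng = random.Random(0)
trials = 10_000
beaten = sum(
    multinomial_log_likelihood(counts, random_probability_vector(4, rng)) > best
    for _ in range(trials)
)
print("candidate estimate:", candidate)
print("random vectors with higher likelihood:", beaten, "of", trials)
```

A random search like this cannot prove optimality; the written solution still needs the constrained maximization (for example with a Lagrange multiplier enforcing that the $P_i$ sum to 1).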
2. DHS Problem 3.7
3. DHS Problem 3.15
4. (Any similarity in this problem with Douglas Adams’ characters is purely the result of accidental exposure to the Infinite Improbability Drive.)

Zaphod Beeblebrox owns an ion drive delta boat (henceforth called iddb) manufacturing business in the desert of Damogran. Damogran is so inconveniently arranged, with vast stretches of water separating large desert islands, that the only convenient surface transport is the iddb. No wonder Zaphod owns half the Universe!

These ion drives are so sophisticated that each boat costs Zaphod 19000 Damogran Units (also called DamU) to make. Furthermore, extensive field tests have shown that 20% of the iddbs manufactured are in fact defective in a way that is not easy to determine quickly; these defective units will fail before the warranty expires. Zaphod wants to devise automated tests that will allow him to predict whether a given iddb will fail. So he hires a group of eminent scientists (including Ford Prefect and Marvin), and they come up with two test procedures.

The Sub-etha Senso-matic NDIR test gives a normally distributed figure of merit (these people never heard of Gauss and hence called it the normal distribution). The NDIR test gives a figure of merit with a mean of 8 and a standard deviation of 2 for good iddbs, and a figure with a mean of 0 and a standard deviation of 3 for bad iddbs. The Sub-etha Meso-morphic NDCR test also gives a normally distributed figure of merit. The NDCR test gives a number with a mean of 10 and a standard deviation of 3 for good iddbs, and a number with a mean of 0 and a standard deviation of 4 for bad iddbs.


(a) Zaphod is confused because these tests do not seem to give a steady answer, and he is afraid that he will make mistakes. So he calls you to help him make the fewest possible mistakes. You look at the problem and tell him that you’ll design a simple software program that looks at these numbers and tells whether the iddb should be classified as good or bad. Explain your approach to Zaphod and make neat plots so that he can understand how the decision is being made.

(b) Zaphod finds out that it costs him 25000 DUs in liability suits for selling bad iddbs, and 5000 DUs to do corrective work on an iddb that the decision-making software classified as bad. He wants to minimize the operating cost, so he calls you once again. Explain your new decision-making process to him. Once again, use plots and graphs to clearly indicate the decision boundaries to Zaphod.

(c) The government of Damogran (which sends the coast guard to rescue people from damaged iddbs) wants to reduce the cost of its rescue operations and hence is considering new legislation to make manufacturers more accountable. Zaphod finds that the pending legislation would require him to sell no more than 0.01% bad iddbs. Make the necessary changes in your decision-making software to handle the requirements of this new legislation. For this problem you may want to assume that Zaphod will do either the NDCR or the NDIR test, but not both.
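As a starting point for parts (a) and (b), the Python sketch below finds where a weighted good-iddb density equals a weighted bad-iddb density for a single test, which gives the decision boundaries. For part (a) the weights are the priors (0.8 and 0.2, from the 20% defect rate). For part (b) the weights rest on an assumed reading of the costs that the problem does not spell out: every iddb classified as bad incurs the 5000 DU repair cost, a repaired iddb causes no liability, and a bad iddb that is sold costs 25000 DU. Under that reading, "good" is chosen when 5000 × P(good) × p(x|good) exceeds (25000 − 5000) × P(bad) × p(x|bad). All function and variable names are illustrative.

```python
import math

PRIOR_GOOD, PRIOR_BAD = 0.8, 0.2
NDIR = {"good": (8.0, 2.0), "bad": (0.0, 3.0)}    # (mean, std) per class
NDCR = {"good": (10.0, 3.0), "bad": (0.0, 4.0)}

# Part (b) weights under the ASSUMED cost model described in the lead-in.
LIABILITY, REPAIR = 25000.0, 5000.0
W_GOOD_COST = REPAIR * PRIOR_GOOD
W_BAD_COST = (LIABILITY - REPAIR) * PRIOR_BAD


def log_weighted_density(x, mean, std, weight):
    """log(weight * N(x; mean, std^2))."""
    return (math.log(weight) - math.log(std) - 0.5 * math.log(2 * math.pi)
            - (x - mean) ** 2 / (2 * std ** 2))


def decision_boundaries(good, bad, w_good, w_bad):
    """Sorted x values where w_good * p(x|good) == w_bad * p(x|bad).

    Taking logs of both sides gives a quadratic a*x^2 + b*x + c = 0.
    """
    (mg, sg), (mb, sb) = good, bad
    a = 1 / (2 * sb ** 2) - 1 / (2 * sg ** 2)
    b = mg / sg ** 2 - mb / sb ** 2
    c = (mb ** 2 / (2 * sb ** 2) - mg ** 2 / (2 * sg ** 2)
         + math.log(w_good / sg) - math.log(w_bad / sb))
    if a == 0:
        # Equal variances: a single linear boundary (assumes the means differ).
        return [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # the weighted densities never cross
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


def classify(x, test, w_good, w_bad):
    """Pick the class with the larger weighted density at x."""
    g = log_weighted_density(x, *test["good"], w_good)
    b = log_weighted_density(x, *test["bad"], w_bad)
    return "good" if g > b else "bad"


ndir_min_error = decision_boundaries(NDIR["good"], NDIR["bad"], PRIOR_GOOD, PRIOR_BAD)
ndir_min_cost = decision_boundaries(NDIR["good"], NDIR["bad"], W_GOOD_COST, W_BAD_COST)
print("NDIR boundaries, fewest mistakes:", ndir_min_error)
print("NDIR boundaries, assumed cost model:", ndir_min_cost)
```

Because the bad iddbs have the larger standard deviation in both tests, the quadratic has two roots and the "good" region is the interval between them. The same functions can be called with `NDCR` to compare the two tests, and the boundaries can be drawn as vertical lines on a plot of the weighted densities.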

For the following part, either both the tests (NDCR and NDIR) are to be performed in parallel or only one of them is performed. Do not perform sequential tests.

(d) Zaphod discovers that it costs him 1000 DUs per iddb to run the NDIR test and 500 DUs per iddb to run the NDCR test. Assume the costs of repair and field failure from part (b). If he wants to minimize his entire cost of operation, what should he do:

  • Only the NDIR test?
  • Only the NDCR test?
  • Both the tests?

Justify your answer to Zaphod with plots, graphs, and numbers.
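One possible way to put numbers on part (d) is to integrate the expected cost per iddb numerically for each option. The Python sketch below does this under several assumptions that the problem leaves open: every iddb classified as bad costs 5000 DU to repair and then causes no liability, a bad iddb that is sold costs 25000 DU, the 19000 DU manufacturing cost is the same for every option and is left out, and the two test results are treated as conditionally independent given whether the iddb is good or bad. The names, the integration range, and the grid sizes are illustrative choices.

```python
import math

PRIOR_GOOD, PRIOR_BAD = 0.8, 0.2
LIABILITY, REPAIR = 25000.0, 5000.0
TESTS = {
    "NDIR": {"good": (8.0, 2.0), "bad": (0.0, 3.0), "cost": 1000.0},
    "NDCR": {"good": (10.0, 3.0), "bad": (0.0, 4.0), "cost": 500.0},
}
LO, HI = -30.0, 40.0  # integration range covering all four densities


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def midpoints(lo, hi, n):
    """Cell midpoints and cell width for a midpoint-rule integration."""
    step = (hi - lo) / n
    return [lo + (i + 0.5) * step for i in range(n)], step


def pointwise_cost(joint_good, joint_bad):
    """Cost density at one measurement: the cheaper of selling and repairing."""
    sell = LIABILITY * joint_bad
    repair = REPAIR * (joint_good + joint_bad)
    return min(sell, repair)


def expected_cost_single(name, n=4000):
    test = TESTS[name]
    xs, dx = midpoints(LO, HI, n)
    total = 0.0
    for x in xs:
        joint_good = PRIOR_GOOD * normal_pdf(x, *test["good"])
        joint_bad = PRIOR_BAD * normal_pdf(x, *test["bad"])
        total += pointwise_cost(joint_good, joint_bad) * dx
    return total + test["cost"]


def expected_cost_both(n=400):
    # ASSUMPTION: the two figures of merit are independent given the class.
    xs, dx = midpoints(LO, HI, n)
    ir, cr = TESTS["NDIR"], TESTS["NDCR"]
    ir_good = [normal_pdf(x, *ir["good"]) for x in xs]
    ir_bad = [normal_pdf(x, *ir["bad"]) for x in xs]
    cr_good = [normal_pdf(x, *cr["good"]) for x in xs]
    cr_bad = [normal_pdf(x, *cr["bad"]) for x in xs]
    total = 0.0
    for i in range(n):
        for j in range(n):
            joint_good = PRIOR_GOOD * ir_good[i] * cr_good[j]
            joint_bad = PRIOR_BAD * ir_bad[i] * cr_bad[j]
            total += pointwise_cost(joint_good, joint_bad)
    return total * dx * dx + ir["cost"] + cr["cost"]


costs = {
    "no test": min(LIABILITY * PRIOR_BAD, REPAIR),  # sell all or repair all
    "NDIR only": expected_cost_single("NDIR"),
    "NDCR only": expected_cost_single("NDCR"),
    "both tests": expected_cost_both(),
}
for option, cost in costs.items():
    print(f"{option}: {cost:.1f} DU per iddb (excluding manufacturing)")
```

Running both tests can never increase the decision cost itself, so the comparison comes down to whether the reduction outweighs the extra testing fee. A different reading of the repair and liability costs would change `pointwise_cost` and possibly the ranking.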
5. We’ll do this one in class. But for extra credit, you might want to try it...

We return to the fishing scenario in Problem Set 2, Problem 7. By now you’ve constructed the optimal Bayesian classifier to decide if a fish is a carp or a bass, based on the uncertain class information you received from the two experts. You draw two fish from the pond, which weigh 1.5 pounds and 5 pounds, respectively. If you applied your classifier independently to each fish, they would both be labeled bass. However, this may not seem right, and indeed it isn’t (remember: only one expert is right about the pond). In this problem we will figure out what the optimal classification of the two fish should be.

(a) The weight of a fish in the pond can be considered a random variable. If I draw two fish, the weights correspond to two random variables. Are these two variables