Random Sampling: Estimating Probabilities and Expected Values - Prof. L. Dawson, Study notes of Mathematics

An introduction to random sampling and bootstrapping, two statistical methods used to estimate probabilities and expected values of a random variable. The document covers the concept of random samples, the importance of random sampling, and the use of histograms to approximate probability mass functions (p.m.f.s) for discrete random variables and probability density functions (p.d.f.s) for continuous random variables. It also introduces bootstrapping and demonstrates how to use Excel functions such as VLOOKUP and RANDBETWEEN to generate bootstrap random samples. The objectives include understanding the definitions of random sampling and bootstrapping, using random sampling to estimate probabilities and expected values, and using Excel functions to generate bootstrap random samples.

Typology: Study notes

Pre 2010

Uploaded on 08/31/2009


Random Sampling
Math 115A
Spring 2008
Dawson

Why random samples?

• We would like to have probability information for our random variable X.
• Often we don’t know the distribution of X, or even its expected value.
• To estimate the expected value, we use random sampling.
  ◦ We may approximate the distribution if we can take a large enough sample of n independent observations of X.
  ◦ We call these n independent observations of X, {x_1, x_2, x_3, …, x_n}, a random sample of size n.

Approximating p.m.f.s

• To approximate a probability mass function:
  ◦ Group the data according to values of X
  ◦ Create a histogram for these groupings
  ◦ Calculate the relative frequencies

[Figure: histogram titled “Sample Data”; horizontal axis: Stoppages (0 to 14); vertical axis: Relative Frequency (0.000 to 0.200)]

Discrete Random Variable

• Example: Let X be the denomination of a chip selected at random from a box that has 4 $1 chips, 3 $5 chips, 2 $25 chips, and 1 $100 chip. There are twenty observations of X given below. Use the sample to plot an approximation of the p.m.f. of X and to estimate E(X). (Note: data changed from in class to match next slide.)

$25  $  $25  $1  $5  $25  $1  $1  $1  $5  $5  $1  $25  $1  $100  $1  $1  $25  $5  $
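The procedure can be tried out on simulated draws from the same box. The sketch below does not use the twenty observations listed above (some of them are cut off in this copy); instead it generates its own samples, tabulates relative frequencies, and compares the sample mean with the exact value of E(X), which is (4·1 + 3·5 + 2·25 + 1·100)/10 = 16.9. The function names, the seed, and the sample sizes are assumptions made for the illustration.

```python
import random
from collections import Counter

# Contents of the box from the example: denomination -> number of chips.
BOX = {1: 4, 5: 3, 25: 2, 100: 1}


def exact_pmf(box):
    """Exact p.m.f. of X: every chip in the box is equally likely."""
    total = sum(box.values())
    return {value: count / total for value, count in box.items()}


def draw_sample(box, n, rng):
    """Simulate n independent observations of X (the chip is put back each time)."""
    values = list(box)
    weights = [box[v] for v in values]
    return rng.choices(values, weights=weights, k=n)


def relative_frequencies(sample):
    """Group the data by value and divide each count by the sample size."""
    counts = Counter(sample)
    return {value: counts[value] / len(sample) for value in sorted(counts)}


def sample_mean(sample):
    """Estimate E(X) by the average of the observations."""
    return sum(sample) / len(sample)


if __name__ == "__main__":
    rng = random.Random(115)  # arbitrary seed, used only for repeatability
    pmf = exact_pmf(BOX)
    exact_mean = sum(v * p for v, p in pmf.items())
    for n in (20, 2000):
        sample = draw_sample(BOX, n, rng)
        print(n, relative_frequencies(sample), round(sample_mean(sample), 2))
    print("exact p.m.f.:", pmf, "exact E(X):", round(exact_mean, 2))
```

With only 20 observations the relative frequencies can sit noticeably far from the exact p.m.f.; the larger sample is included to show the approximation tightening.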

Approximating p.d.f.s

• To approximate a probability density function:
  ◦ Group the data according to the range of values of X
  ◦ Create a histogram for these groupings
  ◦ Calculate the relative frequencies
  ◦ Divide the relative frequencies by the bin width to calculate the p.d.f. heights
  ◦ Create a line plot connecting the midpoints of each of the columns

Approximating p.d.f.s

[Figure: two histograms with bins labeled by their midpoints (1, 3, 5, 7, 9, 11, 13). In the first, a bar is marked Rel. Freq. = 0.25. In the second, the same bar has height 0.25 / 2 = 0.125, so that l × w = 0.125 × 2 = 0.25.]

• In a plot of relative frequencies, the probability is given by the height of the rectangle. But that should not be true for a p.d.f.; it must be the area that is equal to the probability.
• By dividing the relative frequency by the bin width, we are ensuring that the area of each rectangle (length times width) is equal to the relative frequency.
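The arithmetic behind this rescaling is small enough to check directly. The sketch below uses what appear to be the figure's numbers for one bar (relative frequency 0.25, bin width 2); the remaining relative frequencies and the function names are hypothetical, chosen only so that the list sums to 1.

```python
def pdf_heights(relative_freqs, bin_width):
    """Divide each relative frequency by the bin width to get the p.d.f. height."""
    return [rf / bin_width for rf in relative_freqs]


def total_area(heights, bin_width):
    """Sum of the rectangle areas (height times width)."""
    return sum(h * bin_width for h in heights)


if __name__ == "__main__":
    # Hypothetical relative frequencies for bins of width 2; they sum to 1.
    rel_freqs = [0.25, 0.30, 0.20, 0.15, 0.10]
    heights = pdf_heights(rel_freqs, bin_width=2)
    print(heights[0])                       # 0.25 / 2
    print(heights[0] * 2)                   # area of the first rectangle
    print(round(total_area(heights, 2), 10))  # total area under the histogram
```

Each rectangle's area equals its relative frequency again, so the areas add up to 1, which is what a density requires.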

Continuous Random Variables

• Example: A bus arrives every 10 minutes. Let W be the waiting time (in minutes) until the next bus. Fifty observations of W are given below. Use the sample to plot an approximation of the p.d.f. of W and to estimate E(W).

9.7  0.8  6.5  5.5  9.
3.4  7.5  4.8  0.2  8.
7.0  2.0  0.6  2.0  3.
2.4  8.0  6.2  6.1  5.
0.3  4.5  9.4  9.2  3.
0.9  5.9  7.8  5.6  1.
2.3  1.3  9.8  7.2  6.
6.6  5.1  3.7  0.5  5.
8.2  3.5  3.3  3.1  7.
3.8  6.4  4.4  8.7  9.

Continuous Random Variables

Example (cont’d.):

Relative Frequency = frequency divided by total. Height = relative frequency divided by bin width.

Bin    Bin Midpoint    Frequency    Relative Frequency    Height
2      1               9            0.
       3                            0.
       5                            0.
       7                            0.
       9                            0.
Sum                    50           1.00                  .
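A table of this shape can be built mechanically from any sample. The sketch below does so for simulated waiting times rather than the fifty observations above (several of which are cut off in this copy). It assumes, purely for illustration, that W is uniform on the interval from 0 to 10 minutes, and it assigns a value on a bin boundary to the bin that starts there, which may differ from the convention used in the course spreadsheet. All names and the seed are assumptions.

```python
import random


def histogram_table(data, low, high, bin_width):
    """Return rows of (midpoint, frequency, relative frequency, p.d.f. height).

    Assumes every value lies between low and high.
    """
    n_bins = round((high - low) / bin_width)
    counts = [0] * n_bins
    for x in data:
        # A value equal to `high` is placed in the last bin.
        index = min(int((x - low) // bin_width), n_bins - 1)
        counts[index] += 1
    rows = []
    for i, freq in enumerate(counts):
        midpoint = low + (i + 0.5) * bin_width
        rel_freq = freq / len(data)
        rows.append((midpoint, freq, rel_freq, rel_freq / bin_width))
    return rows


def grouped_mean(rows):
    """Estimate the expected value from the table: sum of midpoint times relative frequency."""
    return sum(midpoint * rel_freq for midpoint, _, rel_freq, _ in rows)


if __name__ == "__main__":
    rng = random.Random(2008)  # arbitrary seed
    waits = [rng.uniform(0, 10) for _ in range(50)]  # assumed uniform model
    rows = histogram_table(waits, low=0, high=10, bin_width=2)
    for midpoint, freq, rel_freq, height in rows:
        print(f"{midpoint:4.0f} {freq:4d} {rel_freq:6.2f} {height:6.3f}")
    print("sample mean:", round(sum(waits) / len(waits), 2))
    print("grouped estimate:", round(grouped_mean(rows), 2))
```

The sample mean uses the raw observations, while the grouped estimate uses only the table; the two usually differ slightly because grouping replaces each value by its bin midpoint.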

Bootstrapping

• Def: Bootstrapping is sampling with replacement from one sample to generate new samples, typically for the purpose of estimating probabilities and parameters.
• Why?
  ◦ We are not very likely to have a large enough sample set, due to costs and time.
  ◦ We can simulate a larger data set by sampling from our original data; this is bootstrapping.
• We will use the Excel functions VLOOKUP and RANDBETWEEN to help with bootstrapping.
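Outside Excel, the definition above amounts to a few lines of code. The following is a minimal sketch in Python, not the course's spreadsheet method: it draws bootstrap samples with replacement from one original sample and looks at how the sample mean varies across them. The data values, function names, and seed are hypothetical.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) values from data, with replacement."""
    return rng.choices(data, k=len(data))


def bootstrap_means(data, n_resamples, rng):
    """Mean of each of n_resamples bootstrap samples."""
    means = []
    for _ in range(n_resamples):
        resample = bootstrap_sample(data, rng)
        means.append(sum(resample) / len(resample))
    return means


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed
    original = [9.7, 0.8, 6.5, 5.5, 3.4, 7.5, 4.8, 0.2]  # hypothetical sample
    means = bootstrap_means(original, n_resamples=1000, rng=rng)
    print("original mean:", round(sum(original) / len(original), 3))
    print("smallest and largest bootstrap mean:",
          round(min(means), 3), round(max(means), 3))
```

Because the draws are made with replacement, a bootstrap sample typically repeats some original values and omits others, which is what makes the resampled means vary.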

VLOOKUP

• VLOOKUP is found under the “Formulas” tab, “Lookup & Reference”.

RANDBETWEEN

• RANDBETWEEN returns a random integer between the two numbers you input (both endpoints included).
• RANDBETWEEN is found under the “Formulas” tab, “Math & Trig”.
• Bottom: the lower integer of the interval in which you are searching
• Top: the upper integer of the interval in which you are searching
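For readers more used to code than to spreadsheets, the two Excel functions have rough counterparts in Python. The sketch below is an analogy under stated assumptions, not a description of Excel's internals: `randbetween` wraps the standard-library `random.randint`, which also includes both endpoints, and `vlookup_exact` imitates only an exact-match lookup on the first column of a small table. Both function names and the table contents are hypothetical.

```python
import random


def randbetween(bottom, top, rng=random):
    """Random integer N with bottom <= N <= top (both ends included)."""
    return rng.randint(bottom, top)


def vlookup_exact(key, table, col_index):
    """Return entry number col_index (1-based) of the first row whose first entry equals key."""
    for row in table:
        if row[0] == key:
            return row[col_index - 1]
    raise KeyError(key)  # no matching row in the table


if __name__ == "__main__":
    # Hypothetical table: (row number, time between calls in minutes).
    table = [(1, 0.4), (2, 1.7), (3, 0.2)]
    row_number = randbetween(1, len(table))
    print(row_number, vlookup_exact(row_number, table, 2))
```

Choosing a random row number and then looking up that row's value is one draw with replacement from the table, which is the building block of a bootstrap sample.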

Example

• The times at which 270 calls arrived at a company’s switchboard are shown in the sheet Log of Phone Log.xls.
• Let T be the random variable that gives the time (in minutes) until the arrival of the first call and between the arrivals of successive calls.
• The 270 times in the sheet Log determine 270 time intervals, which may be assumed to represent independent observations of T.
• Let L be the random variable that gives the arrival time of the last call in a run of 15 calls, starting at 9 a.m.

Example (cont’d.)

• Use VLOOKUP and RANDBETWEEN to generate 20 observations of L.
• In the sheet Random of My Phone Log.xls, RANDBETWEEN is used to randomly select an integer between 1 and 270.
• In the sheet Times of My Phone Log.xls, VLOOKUP is used to find the corresponding time between calls.
• In the sheet Start Times, the time between calls is added to the beginning of the hour (or the arrival time of the previous call).
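The three spreadsheet steps translate into a short simulation. The sketch below is one possible rendering in Python, not the workbook itself: the list `INTERVALS` is a hypothetical stand-in for the 270 times between calls (the real values are in Phone Log.xls and are not reproduced here), and the function names and seed are assumptions.

```python
import random

# Hypothetical times between calls, in minutes. The real data has 270 entries.
INTERVALS = [0.4, 1.7, 0.2, 3.1, 0.9, 2.5, 0.6, 1.2]


def simulate_last_arrival(intervals, calls_per_run, rng):
    """One observation of L, in minutes after the start of the run."""
    elapsed = 0.0
    for _ in range(calls_per_run):
        row = rng.randint(1, len(intervals))  # plays the role of RANDBETWEEN
        elapsed += intervals[row - 1]         # plays the role of VLOOKUP
    return elapsed


def clock_time(minutes_after_nine):
    """Format a number of minutes after 9 a.m. as H:MM:SS."""
    total_seconds = round(minutes_after_nine * 60)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{9 + hours}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    rng = random.Random(270)  # arbitrary seed
    observations = [simulate_last_arrival(INTERVALS, 15, rng) for _ in range(20)]
    for minutes in observations:
        print(clock_time(minutes))
    print("estimated E(L), minutes after 9 a.m.:",
          round(sum(observations) / len(observations), 2))
```

Each observation of L is the sum of 15 bootstrapped intervals, so the 20 observations can then be treated like any other random sample, for example to estimate E(L) or the probability that the fifteenth call arrives before a given time.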