Statistics and Probability: Mean, Standard Deviation, and Normal Distribution - Prof. I.D., Exams of Statistics

This document covers various concepts in statistics and probability, including calculating the mean and standard deviation from given data sets, understanding qualitative and quantitative variables, exploring relationships between variables using graphs, and the normal distribution. It also includes exercises on finding probabilities for continuous random variables and combining random variables.

Typology: Exams

Pre 2010

Uploaded on 08/26/2009

koofers-user-0y8
UCLA Stat13 Statistical Methods - Midterm Review
Chapter 1 – What is Statistics?
In early 1997 Whitcoulls bookstores conducted a nation-wide survey. Whitcoulls’ shoppers were invited
to fill in a short survey. They were asked to list, in order, their three favourite books. Using the results,
Whitcoulls published the list “New Zealand’s 100 Favourite Books”. The top twenty books from this list
are given below.
No. Book No. Book
1 The Lord of the Rings 11 April Fool’s Day
2 The Power of One 12 Complete Winnie the Pooh
3 Pride and Prejudice 13 The Runaway
4 The Bible 14 Clan of the Cave Bear
5 Wild Swans 15 Long Walk to Freedom
6 The Horse Whisperer 16 Sleepers
7 Cross Stitch 17 Jane Eyre
8 Goosebumps 18 Gone With the Wind
9 The Bone People 19 Wuthering Heights
10 The Hobbit 20 The English Patient
1. Based on the information stated above, the two most obvious types of non-sampling errors that may
be present in this survey are:
(1) self-selection bias and interviewer effects.
(2) random errors and non-response bias.
(3) self-selection bias and question effects.
(4) selection bias and self-selection bias.
(5) selection bias and chance errors.
2. A student wants to select and read any 4 books from this top twenty books list. Choose a simple
random sample of 4 books for this student. To select the sample you must use the thirty random
digits given below. Start at the beginning of the line of random digits given below.
87105 75663 05103 47781 00910 21112
The four books in the random sample are:
(1) Wild Swans, April Fool’s Day, The Hobbit, Goosebumps.
(2) The Hobbit, Wild Swans, The Hobbit, The Power of One.
(3) The Hobbit, Wild Swans, The Power of One, April Fool’s Day.
(4) Goosebumps, Cross Stitch, The Hobbit, Wild Swans.
(5) Goosebumps, Cross Stitch, The Lord of the Rings, Wild Swans.
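A minimal Python sketch of the random-digit procedure, assuming the usual textbook convention (not spelled out in the question) of labelling the books 01 to 20 in table order, reading the digits in non-overlapping pairs, and skipping 00, labels above 20, and repeats. The function name `srs_from_digits` is illustrative only.

```python
# Books labelled 1-20 in the order of the Whitcoulls table above.
BOOKS = [
    "The Lord of the Rings", "The Power of One", "Pride and Prejudice",
    "The Bible", "Wild Swans", "The Horse Whisperer", "Cross Stitch",
    "Goosebumps", "The Bone People", "The Hobbit", "April Fool's Day",
    "Complete Winnie the Pooh", "The Runaway", "Clan of the Cave Bear",
    "Long Walk to Freedom", "Sleepers", "Jane Eyre", "Gone With the Wind",
    "Wuthering Heights", "The English Patient",
]


def srs_from_digits(digits: str, population: int, k: int) -> list[int]:
    """Pick k distinct labels in 1..population by reading two-digit groups."""
    stream = digits.replace(" ", "")
    chosen: list[int] = []
    for i in range(0, len(stream) - 1, 2):
        label = int(stream[i:i + 2])
        # Skip 00, labels outside the population, and labels already chosen.
        if 1 <= label <= population and label not in chosen:
            chosen.append(label)
            if len(chosen) == k:
                break
    return chosen


digits = "87105 75663 05103 47781 00910 21112"
sample = srs_from_digits(digits, population=20, k=4)
print([BOOKS[n - 1] for n in sample])
```

With the thirty digits given, the pairs read are 87, 10, 57, 56, 63, 05, 10, 34, 77, 81, 00, 91, 02, 11, so under this convention the first four usable labels are 10, 05, 02 and 11.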
3. Consider the following three studies:
Study 1: An animal researcher was interested in cats’ abilities to survive surprisingly high falls if
they had time to twist round and prepare for the impact. Vets in New York City recorded
incidents of cats falling out of apartment windows. The data was divided into three
groups: cats that fell from one or two storeys above the ground; cats that fell from three to
five storeys above the ground and cats that fell from six or more storeys above the ground.
The proportion of cats that survived in each group was then compared.
Study 2: A random sample of 100 students is asked to keep a diary in which they record their
clothing expenditures for the next three months. The expenditures of males and females
are then compared.
Study 3: A sample of 50 shoppers at an appliance store is split into two groups. One group is shown
a television commercial for a new range of appliances that has been filmed in the same
style as previous television commercials for the store. The second group is shown a
television commercial for the same new range of appliances that has been filmed in a
totally new style. An hour after viewing the commercial, each of the shoppers was asked
what they could recall about the new range of appliances and a score based on their
recollection was recorded. The recall scores were then compared for the two groups.
(i) For each study, describe what “treatment” is being compared.
Study 1:
Study 2:
Study 3:
(ii) Which of the three studies would be described as experiments and which would be described as
observational studies?
Study 1:
Study 2:
Study 3:
(iii) For the studies that are observational, briefly explain why an experiment could not be carried
out instead.

In 1950 two hundred employees from the Christchurch Firestone Tire and Rubber Company became part of a cancer study. These employees were observed until 1996 and any occurrences of cancer within this group were recorded. This study is best called:
(1) a double-blind experiment.
(2) a randomised experiment.
(3) a sample survey.
(4) a retrospective observational study.
(5) a prospective observational study.

Which one of the following statements is false?
(1) Non-sampling errors are often bigger than the random sampling errors in surveys.
(2) People will sometimes answer a question differently for different interviewers.
(3) Sophisticated sampling projections can always correct the results if the population you are sampling from is different to the one of interest.
(4) Slight changes in the wording of questions can often make a big change to survey results.
(5) Non-response can cause bias in surveys because non-respondents can behave differently from people who respond.

A TIME daily poll on the Internet invited readers to make a choice from a given list of options, in response to the following question:
“Three times in the last five months, children went on killing sprees. What is fuelling this bizarre and tragic trend?”
As of 2 June 1998, the largest proportion of respondents (29%) chose the option:
“Nurture: The American family is crumbling; permissive parents are raising wild children.”
We wish to use this percentage as an estimate of the proportion of all Americans who believe that nurture is the cause. Which one of the following is not a potential source of non-sampling error in this survey?
(1) Question effects.
(2) Self-selection bias.
(3) Selection bias.
(4) Non-response bias.
(5) Transferring findings.

Television polls have become commonplace in New Zealand over the last few years. A television sports programme often runs polls on questions such as: “Do you approve or disapprove of Wayne Smith as the All Black coach?” Viewers are then invited to phone in their vote at a cost of approximately 99 cents per minute. Identify two sources of bias in this form of survey.

TIME magazine, 20 December 1993, reported that 70% of Americans answered “Yes” to the question “Do you favour stricter gun-control laws?” The figure was obtained from a telephone poll of 500 adult Americans. Are the following statements true or false? Explain briefly.
(i) The sample was too small to provide any useful results.
(ii) The survey does not take into account the views of homeless people.
(iii) The survey may be inaccurate due to non-response bias.
(iv) The survey should be repeated so that it includes a control group.
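As a rough check on statement (i), here is a sketch of the approximate 95% margin of error for a sample proportion. It assumes the common rule of thumb of ±2 standard errors (the course may state the multiplier differently) and treats the poll as a simple random sample, so it captures sampling error only. The function name is illustrative.

```python
import math


def approx_margin_of_error(p_hat: float, n: int, multiplier: float = 2.0) -> float:
    """Approximate margin of error for a sample proportion: multiplier * se."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return multiplier * standard_error


moe = approx_margin_of_error(0.70, 500)
print(f"70% +/- {100 * moe:.1f} percentage points")
```

On these assumptions the margin is roughly ±4 percentage points, which bears on statement (i); it says nothing about non-sampling errors such as those raised in (ii) and (iii).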

Two drugs are to be compared. A group of 20 people are each randomly allocated one of the two drugs. Neither the people who were treated nor the doctor who administered the drugs knew who got which drug. Which best describes this situation?
(1) An observational study.
(2) A double blind experiment.
(3) A sample survey.
(4) A case-control study.
(5) A block design.

Draw a box plot for the following set of data:
Five-number summary: (18, 22, 29, 38, 78)
Do you agree with the following statements? Discuss.
(1) The distribution from which this sample is drawn is highly skewed.
(2) The interquartile range is 21.
(3) There are no observations greater than 78.
(4) The observation 78 is an outside value for the box plot representing the above data.
(5) The observation 18 is an outside value for the box plot representing the above data.
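A short sketch of the box-plot checks, assuming outside values are those lying more than 1.5 × IQR beyond the quartiles (the common convention; check the course notes for the exact rule used). The helper name `box_plot_limits` is hypothetical.

```python
def box_plot_limits(five_number_summary):
    """Return the IQR and the 1.5 x IQR limits used to flag outside values."""
    minimum, lower_q, _median, upper_q, maximum = five_number_summary
    iqr = upper_q - lower_q
    lower_limit = lower_q - 1.5 * iqr
    upper_limit = upper_q + 1.5 * iqr
    return {
        "iqr": iqr,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "min_is_outside": minimum < lower_limit,
        "max_is_outside": maximum > upper_limit,
    }


limits = box_plot_limits((18, 22, 29, 38, 78))
print(limits)
```

For (18, 22, 29, 38, 78) this gives IQR = 38 − 22 = 16 and limits of −2 and 62, so under this rule 78 would be flagged as an outside value and 18 would not.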

The five-number summary for a set of data is:
Which one of the following is false?
(1) Each of the whiskers on the box plot of the data must be greater than 42 units in length.
(2) It is not possible to determine the mean of the data from this five-number summary.
(3) At least half of the observations are between 22 and 50 inclusive.
(4) The interquartile range is 28.
(5) None of the observations in the data set is an outside value on the box plot of the data.

Questions 9 to 11 refer to the following information.
The stem-and-leaf plot below shows the annual salaries for the 21 employees in the engineering department of the Technitron company.
Stem-and-leaf plot of SALARY
n = 21
Units: 4 | 7 = $47,
9. The median for the SALARY data set is:
10. The upper quartile for the SALARY data set is:
11. Which one of the following statements is true?
(1) The stem-and-leaf plot is drawn incorrectly because the second to last line should have been omitted, as there are no data values on it.
(2) The stem-and-leaf plot is drawn incorrectly as there is a 0 missing on the second to last line.
(3) The stem-and-leaf plot is drawn correctly despite the fact that there is only one row for stem 2.
(4) The stem-and-leaf plot has been drawn correctly because the length of the plot is such that there is one stem-digit with more leaf-digits than any other stem-digit.
(5) The stem-and-leaf plot is drawn incorrectly because 4 | 7 in the units statement is not a data value.


Chapter 3 – Exploratory Tools for Relationships

Section A: Types of Variables
1. (a) Quantitative variables are ______________________ and counts.
(b) Qualitative variables describe ______________________.
Quantitative variables can be either discrete or continuous.
(a) Variables with few repeated values are treated as ______________________.
(b) Variables with many repeated values are treated as ______________________.
Qualitative variables can be either categorical or ordinal.
(a) Variables with order are called ______________________.
(b) Variables without order are called ______________________.
(a) To explore the relationship between two quantitative variables we use a ______________________.
(b) To explore relationships between a qualitative variable and a quantitative variable we use ___________ plots, ______________ plots and ___________ plots.
(c) To explore the relationship between two qualitative variables we use a ______________________ of ______________________.

Section B: Two Variables
Questions 1 and 2 refer to the following information.
TVNZ News, 5 August 1997, reported that smoking is on the increase in the high socio-economic group in the USA. It was claimed that the advertising and fashion industries are responsible for this increase.
The data shown in the table below is a subset of the data from a study on a large number of people. Each person has measurements made on variables that describe some aspect of their image.
ID    Gender    Weight (kg)    Socio-Ec Status    Smoking Status    Age
      Female                   High               Smoker
      Male                     Low                Smoker
      Male                     Middle             Non-smoker
      Female                   Middle             Non-smoker
Table 1: Data on People’s Images
1. The most appropriate way to begin to explore the relationship between Socio-Economic Status and Smoking Status is to construct a:
(1) two-way table of counts with Socio-Economic Status for the row values and Smoking Status for the column values.
(2) dot plot of Socio-Economic Status for each level of Smoking Status, using the same scale for each plot.
(3) box plot of Socio-Economic Status for each level of Smoking Status, using the same scale for each plot.
(4) frequency table for each of these two variables.
(5) scatter plot of Socio-Economic Status against Smoking Status.
2. The most appropriate way to begin to explore the relationship between Weight and Smoking Status is to construct a:
(1) two-way table of counts with Weight for the row values and Smoking Status for the column values.
(2) dot plot of Weight for each level of Smoking Status, using the same scale for each plot.
(3) box plot of Weight for each level of Smoking Status, using the same scale for each plot.
(4) frequency table for each of these two variables.
(5) scatter plot of Weight against Smoking Status.

Chapter 4 – Probabilities and Proportions

In 1995 there were 2011 students enrolled in either Stat … or Stat … Statistics at one University campus. The numbers of female and male students are given in the following table.
          Females    Males    Total
Stat …
Stat …
Total
(a) Convert the above table of counts into a probability table (to 4 dp).
          Females    Males    Total
Stat …
Stat …
Total
(b) One of the 2011 students is chosen at random. What is the probability that the student chosen is:
(i) a male taking Stat …?
(ii) a female?
(iii) a female taking Stat …?
(c) Given that a student is taking Stat …, what is the probability that they are male?
(d) What is the probability that a randomly chosen male student is taking Stat …?

Consider drivers stopped at random for breath testing. Below is a partially completed probability table providing information about such drivers, with regard to their age (40 or under, over 40) and whether they were (or were not) wearing seat belts.
                         40 or under    Over 40    Total
Wearing a seat belt
Not wearing seat belt
Total
(a) Complete the table.
(b) What is the probability that a driver stopped at random is not wearing a seat belt?
(c) If a driver stopped at random is not wearing a seat belt, then what is the probability the driver is over 40?
(d) What is the probability that a driver stopped at random is 40 or under?

Chapter 5 – Discrete Random Variables

Section A: Discrete Random Variables
1. Consider the experiment of tossing two fair coins. The sample space is {HH, HT, TH, TT}. Let random variable X be the number of tails.
(a) The probability function for this experiment is:
x
pr(X = x)
(b) Find the probability that:
(i) X is more than 1
(ii) X is at least 1
(iii) X is at most 2
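Enumerating the sample space makes the probability function explicit. A sketch using exact fractions; the helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of tossing two fair coins: HH, HT, TH, TT.
outcomes = ["".join(o) for o in product("HT", repeat=2)]

# Probability function of X = number of tails.
pmf: dict[int, Fraction] = {}
for outcome in outcomes:
    x = outcome.count("T")
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(outcomes))


def prob(event) -> Fraction:
    """Sum pr(X = x) over the values x for which event(x) is true."""
    return sum((p for x, p in pmf.items() if event(x)), Fraction(0))


print(pmf)
print(prob(lambda x: x > 1), prob(lambda x: x >= 1), prob(lambda x: x <= 2))
```

This gives pr(X = 0) = 1/4, pr(X = 1) = 1/2 and pr(X = 2) = 1/4, so the three probabilities in (b) are 1/4, 3/4 and 1.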

Random variable Y has the following probability function:
y
pr(Y = y)
Find the probability that:
(a) Y is more than 12
(b) Y is no more than 10
(c) Y is at least 6
(d) Y is at least 6 and at most 12
(e) Y is at least 10 and at most 12
(f) Y is more than 13 but less than 25.

Section B: Binomial Distribution
1. The owner of a small bookshop estimates that 30% of the customers who enter the shop purchase at least one book. At 10.30am on a particular day there are 7 potential customers in the shop. Assuming that these customers can be regarded as a random sample from the population of all potential customers, calculate the probability that at least two of these people purchase at least one book.
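Assuming the Binomial model the question implies (n = 7 independent customers, each buying with probability 0.3), the complement rule gives pr(X ≥ 2) = 1 − pr(X = 0) − pr(X = 1). A sketch using `math.comb` (Python 3.8+); `binom_pmf` is an illustrative helper name.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability function: pr(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 7, 0.30
p_at_least_two = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)
print(round(p_at_least_two, 4))
```

On this model the probability is about 0.6706.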

The manufacturer of disk drives for a well-known brand of computers expects 5% of the drives to malfunction during the computer’s warranty period. Let X be the number of disk drives, in a batch of 10 randomly selected disk drives, which malfunction during this period. X has a Binomial distribution.
(a) Identify n and p, the parameters of the Binomial random variable.
(b) In the context of this exercise, state the assumptions required for X to have a Binomial distribution.
(c) Are the assumptions satisfied here?
(d) Calculate the probability that:
(i) no disk drive will malfunction during the warranty period.
(ii) exactly one disk drive will malfunction during the warranty period.
(iii) at least two disk drives will malfunction during the warranty period.
(iv) between 2 and 5 (inclusive) disk drives will malfunction during the warranty period.
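A sketch of part (d), assuming the Binomial model stated in the question holds with n = 10 and p = 0.05. Part (iii) uses the complement rule; part (iv) sums the probability function from 2 to 5. The helper name is illustrative.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability function: pr(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.05
p_none = binom_pmf(0, n, p)                                    # (i)
p_one = binom_pmf(1, n, p)                                     # (ii)
p_two_or_more = 1 - p_none - p_one                             # (iii)
p_two_to_five = sum(binom_pmf(k, n, p) for k in range(2, 6))  # (iv), 2..5 inclusive

for label, value in [("(i)", p_none), ("(ii)", p_one),
                     ("(iii)", p_two_or_more), ("(iv)", p_two_to_five)]:
    print(label, round(value, 4))
```

This gives roughly 0.5987, 0.3151, 0.0861 and 0.0861; the last two agree to four decimal places because pr(X > 5) is tiny here.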

The medical records of a group of diabetic patients presenting at a clinic showed that 50 presented as serious cases, while 36 presented as mild cases. Of the 31 patients aged under 40, 16 presented as mild cases.
(a) Present this information in the table below.
(b) A patient is chosen at random. Find the probabilities that:
(i) the patient is under 40 and has a mild case.
(ii) the patient is at least 40 years old or has a serious case.
(iii) the patient has a serious case and is at least 40 years old.
(c) Of those presenting with serious cases, what proportion are aged under 40?
(d) Of those aged at least 40, what proportion present with mild cases?
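Filling in the two-way table only needs subtraction from the totals given. A sketch using exact fractions, assuming every patient is either mild or serious and either under 40 or at least 40 (so the categories are complementary).

```python
from fractions import Fraction

# Counts given in the question.
serious, mild = 50, 36
under40, under40_mild = 31, 16

total = serious + mild                             # 86 patients
under40_serious = under40 - under40_mild           # 15
aged40plus_mild = mild - under40_mild              # 20
aged40plus_serious = serious - under40_serious     # 35
aged40plus = aged40plus_mild + aged40plus_serious  # 55

answers = {
    "(b)(i) P(under 40 and mild)": Fraction(under40_mild, total),
    "(b)(ii) P(40+ or serious)": Fraction(aged40plus + under40_serious, total),
    "(b)(iii) P(serious and 40+)": Fraction(aged40plus_serious, total),
    "(c) P(under 40 | serious)": Fraction(under40_serious, serious),
    "(d) P(mild | 40+)": Fraction(aged40plus_mild, aged40plus),
}
for label, value in answers.items():
    print(f"{label}: {value} = {float(value):.3f}")
```

The completed counts are 16 mild and 15 serious under 40, and 20 mild and 35 serious aged at least 40 (86 patients in all).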

A bank classifies borrowers as high-risk or low-risk. Of all its loans, 5% are in default. Forty percent (40%) of those loans in default are to high-risk borrowers, while 77% of loans not in default are to low-risk borrowers.
(a) Complete the table.
(b) What percentage of loans is made to borrowers in the high-risk category?
(c) What is the probability that a high-risk borrower will default on his or her loan?
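A sketch of the probability table built from the three percentages, assuming every loan is either high-risk or low-risk. Joint probabilities come from multiplying by the conditional percentages; the rest follows by subtraction.

```python
# Probabilities given in the question.
p_default = 0.05
p_high_given_default = 0.40
p_low_given_not_default = 0.77

# Joint probabilities for the 2 x 2 table (default status by risk category).
p_default_and_high = p_default * p_high_given_default              # 0.02
p_default_and_low = p_default - p_default_and_high                 # 0.03
p_not_default = 1 - p_default                                      # 0.95
p_not_default_and_low = p_not_default * p_low_given_not_default    # 0.7315
p_not_default_and_high = p_not_default - p_not_default_and_low     # 0.2185

# (b) Marginal probability that a loan is high-risk.
p_high = p_default_and_high + p_not_default_and_high               # 0.2385
# (c) Conditional probability of default given a high-risk borrower.
p_default_given_high = p_default_and_high / p_high

print(f"P(high-risk) = {p_high:.4f}")
print(f"P(default | high-risk) = {p_default_given_high:.4f}")
```

On these figures about 23.85% of loans go to high-risk borrowers, and P(default | high-risk) = 0.02 / 0.2385 ≈ 0.0839.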

According to recent figures from the National Centre of Educational Statistics (US), 17.5% of all bachelor’s degrees are in business. 27% of bachelor’s degrees in business are obtained by women and 48.75% of other degrees are obtained by men.
(a) Complete the table.
(b) What is the probability that a randomly selected recent bachelor’s degree graduate will be a man?
(c) What is the probability that a randomly selected recent bachelor’s degree graduate will be a man with a degree in business?
(d) What is the probability that a randomly selected female recent bachelor’s degree graduate will have a degree in business?
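The same table-building approach applies here. A sketch, assuming every degree is either "business" or "other" and every graduate is recorded as a man or a woman.

```python
# Percentages given in the question, as probabilities.
p_business = 0.175
p_woman_given_business = 0.27
p_man_given_other = 0.4875

# Joint probabilities for the 2 x 2 table (degree type by sex).
p_business_and_woman = p_business * p_woman_given_business    # 0.04725
p_business_and_man = p_business - p_business_and_woman        # 0.12775
p_other = 1 - p_business                                      # 0.825
p_other_and_man = p_other * p_man_given_other                 # 0.4021875
p_other_and_woman = p_other - p_other_and_man                 # 0.4228125

p_man = p_business_and_man + p_other_and_man                  # (b)
p_woman = p_business_and_woman + p_other_and_woman
p_business_given_woman = p_business_and_woman / p_woman       # (d)

print(f"(b) P(man) = {p_man:.4f}")
print(f"(c) P(man and business) = {p_business_and_man:.4f}")
print(f"(d) P(business | woman) = {p_business_given_woman:.4f}")
```

This gives P(man) ≈ 0.5299, P(man and business) ≈ 0.1278 and P(business | woman) ≈ 0.1005.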

A drinking pattern found by a survey is that 19% of male drinkers and 10% of female drinkers drink alcohol daily. Also, 51% of all drinkers are male (a ‘drinker’ was defined as someone who had consumed alcohol in the previous 12 months).
If a drinker from this survey who drinks alcohol daily is selected at random, the probability that this person is female is:
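This is a Bayes' rule calculation: weight each sex's daily-drinking rate by its share of drinkers, then take the female share of the total. A minimal sketch using only the three figures in the question.

```python
p_male = 0.51
p_daily_given_male = 0.19
p_daily_given_female = 0.10

p_female = 1 - p_male
# Law of total probability: P(daily) = P(daily | M)P(M) + P(daily | F)P(F).
p_daily = p_daily_given_male * p_male + p_daily_given_female * p_female
# Bayes' rule: P(F | daily) = P(daily | F)P(F) / P(daily).
p_female_given_daily = p_daily_given_female * p_female / p_daily

print(round(p_female_given_daily, 4))
```

Here P(daily) = 0.0969 + 0.049 = 0.1459, so the probability is 0.049 / 0.1459 ≈ 0.3358.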

A medical trial was conducted to investigate whether a new drug extended the life of a patient who had lung cancer. Assume that the survival time (in months) for patients on this drug is Normally distributed with a mean of 31.1 months and a standard deviation of 16.0 months.
(a) Use the following STATA output to answer the questions below.
(i) Calculate the probability that a patient survives for no more than one year.
(ii) Calculate the proportion of patients who survive for between one year and two years.
(iii) Calculate the number of months beyond which 80% of the patients survive.
(iv) Calculate the range of the central 80% of survival times.
(b) A sample of survival times is taken for 38 patients on this drug. Plots of these 38 survival times are shown below. Use these plots to comment on the validity of the assumption that the survival time is Normally distributed.
Comment:
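In place of the STATA output, a sketch of part (a) using Python's standard-library `statistics.NormalDist` (Python 3.8+), whose `cdf` and `inv_cdf` methods play the roles of the cumulative and inverse cumulative tables. It assumes the Normal(31.1, 16.0) model stated in the question.

```python
from statistics import NormalDist

# Model from the question: survival time in months ~ Normal(31.1, 16.0).
survival = NormalDist(mu=31.1, sigma=16.0)

p_at_most_12 = survival.cdf(12)                       # (a)(i)
p_12_to_24 = survival.cdf(24) - survival.cdf(12)      # (a)(ii)
# (a)(iii): 80% of patients survive beyond the 20th percentile.
months_80_survive = survival.inv_cdf(0.20)
# (a)(iv): the central 80% lies between the 10th and 90th percentiles.
central_80 = (survival.inv_cdf(0.10), survival.inv_cdf(0.90))

print(f"P(X <= 12) = {p_at_most_12:.4f}")
print(f"P(12 < X <= 24) = {p_12_to_24:.4f}")
print(f"80% survive beyond {months_80_survive:.2f} months")
print(f"Central 80%: {central_80[0]:.2f} to {central_80[1]:.2f} months")
```

On this model the answers come out near 0.116, 0.212, 17.6 months, and 10.6 to 51.6 months. The same model also puts about 2.6% of its probability below zero months, which is worth keeping in mind when judging part (b).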

The designer of a new aircraft’s cockpit wants to position a switch so that most pilots can reach it without having to change positions. Suppose that among airline pilots the maximum distance (measured from the back of the seat) that can be reached without moving the seat is approximately Normally distributed with mean μ = 125cm and standard deviation σ = 10cm.

[STATA output: Normal Cumulative Distribution Function table (columns x and P(X <= x)) and Normal Inverse Cumulative Distribution Function table (columns P(X <= x) and x)]

Use the STATA output above to answer the following questions.
(a) If the switch is placed 120cm from the back of the seat, what proportion of pilots will be able to reach it without moving the seat?
(b) What is the maximum distance from the back of the seat that the switch could be placed if it is required that 95% of pilots be able to reach it without moving the seat?
(c) (i) If a pilot has a z-score of 1.5, what does this mean in this context?
(ii) To what maximum reach does a z-score of 1.5 correspond?
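A sketch of these three parts with `statistics.NormalDist`, assuming a pilot can reach the switch exactly when their maximum reach is at least the switch distance. For (b), 95% must reach it, so the switch can go no further than the 5th percentile of maximum reach.

```python
from statistics import NormalDist

# Maximum reach in cm ~ Normal(125, 10), as stated in the question.
reach = NormalDist(mu=125, sigma=10)

# (a) Proportion whose maximum reach is at least 120 cm.
p_reach_120 = 1 - reach.cdf(120)
# (b) 95% can reach distances up to the 5th percentile of maximum reach.
max_distance_95 = reach.inv_cdf(0.05)
# (c)(ii) A z-score of 1.5 lies 1.5 standard deviations above the mean.
reach_at_z_1_5 = reach.mean + 1.5 * reach.stdev

print(f"(a) {p_reach_120:.4f}")
print(f"(b) {max_distance_95:.2f} cm")
print(f"(c)(ii) {reach_at_z_1_5:.1f} cm")
```

On this model about 69% of pilots can reach 120 cm, the 95% requirement caps the distance at roughly 108.6 cm, and a z-score of 1.5 corresponds to a maximum reach of 140 cm.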

Section C: Combining Random Variables
Formulae for Combining Random Variables (an extract from the formulae appendix)
For any constants $a$ and $b$:
$E(aX + b) = a\,E(X) + b$
$\mathrm{sd}(aX + b) = |a|\,\mathrm{sd}(X)$
If $X_1$ and $X_2$ are independent random variables:
$E(a_1X_1 + a_2X_2) = a_1E(X_1) + a_2E(X_2)$
$\mathrm{sd}(a_1X_1 + a_2X_2) = \sqrt{a_1^2\,\mathrm{sd}(X_1)^2 + a_2^2\,\mathrm{sd}(X_2)^2}$
If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with mean $\mu$ and standard deviation $\sigma$:
$E(X_1 + X_2 + \cdots + X_n) = n\mu$
$\mathrm{sd}(X_1 + X_2 + \cdots + X_n) = \sqrt{n}\,\sigma$

If $Y = a_1X_1 + a_2X_2$ is written as $Y = a_1 \times X_1 + a_2 \times X_2$, complete the following by filling in the gaps:
(a) W = … X … Y
(b) T = … X … Y
(c) V = … Y … X

X and Y are independent random variables. X has a mean of 1 and a standard deviation of 2, and Y has a mean of 3 and a standard deviation of 3. Suppose W = … Y … X. The standard deviation of W, σ_W, is:

X is a random variable with a mean of 2 and a standard deviation of 2 and Y is a random variable with a mean of 3 and a standard deviation of 4. If X and Y are independent random variables and W = … X … Y, then the standard deviation of W, σ_W, is:


A university professor keeps records of his travel time while he is driving between his home and the university. Over a long period of time he has found that his morning travel times are approximately Normally distributed with a mean of 31 minutes and a standard deviation of 3 minutes. His return journey in the evening is also Normally distributed but with a mean of 35.5 minutes and a standard deviation of 3.5 minutes. Use the STATA output below to answer the following questions.
(a) Find the probability that on a typical day he spends more than one hour travelling to and from work.
(b) Find the probability that on a given day his morning journey is longer than his evening journey.
(c) On what proportion of days is the evening journey more than five minutes longer than the morning journey?
(d) Over a five-day working week, what is the distribution of the total time for:
(i) morning journeys?
(ii) evening journeys?
(iii) all journeys?

[STATA output: five Normal Cumulative Distribution Function tables, each with columns x and P(X <= x)]
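A sketch applying the Section C formulae with Python's `statistics.NormalDist`, under the added assumption that morning and evening times are independent (the question does not say so, but the sd formulae require it). It also assumes the five days in (d) are independent of one another.

```python
import math
from statistics import NormalDist

morning_mean, morning_sd = 31.0, 3.0
evening_mean, evening_sd = 35.5, 3.5

# Independence: variances add for both the sum and the difference.
combined_sd = math.sqrt(morning_sd**2 + evening_sd**2)
total = NormalDist(morning_mean + evening_mean, combined_sd)  # morning + evening
diff = NormalDist(morning_mean - evening_mean, combined_sd)   # morning - evening

p_over_hour = 1 - total.cdf(60)      # (a)
p_morning_longer = 1 - diff.cdf(0)   # (b) morning - evening > 0
p_evening_5_longer = diff.cdf(-5)    # (c) evening - morning > 5, i.e. diff < -5

# (d) Sum of 5 independent days: mean n * mu, sd sqrt(n) * sigma.
days = 5
week_morning = NormalDist(days * morning_mean, math.sqrt(days) * morning_sd)
week_evening = NormalDist(days * evening_mean, math.sqrt(days) * evening_sd)
week_all = NormalDist(days * total.mean, math.sqrt(days) * total.stdev)

print(f"(a) {p_over_hour:.4f}  (b) {p_morning_longer:.4f}  (c) {p_evening_5_longer:.4f}")
for name, dist in [("morning", week_morning), ("evening", week_evening), ("all", week_all)]:
    print(f"(d) {name}: Normal(mean={dist.mean:.1f}, sd={dist.stdev:.2f})")
```

Under these assumptions the sum and difference both have sd √(3² + 3.5²) ≈ 4.61 minutes, giving roughly 0.921 for (a), 0.164 for (b) and 0.457 for (c); the weekly totals have means 155, 177.5 and 332.5 minutes with sds of about 6.71, 7.83 and 10.31 minutes.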