









Various concepts in statistics and probability, including calculating mean and standard deviation from given data sets, understanding qualitative and quantitative variables, exploring relationships between variables using graphs, and the normal distribution. It also includes exercises on finding probabilities for continuous random variables and combining random variables.
Typology: Exams










In early 1997 Whitcoulls bookstores conducted a nation-wide survey. Whitcoulls’ shoppers were invited to fill in a short survey. They were asked to list, in order, their three favourite books. Using the results, Whitcoulls published the list “New Zealand’s 100 Favourite Books”. The top twenty books from this list are given below.
No.  Book                        No.  Book
     The Lord of the Rings            April Fool’s Day
     The Power of One                 Complete Winnie the Pooh
     Pride and Prejudice              The Runaway
     The Bible                        Clan of the Cave Bear
     Wild Swans                       Long Walk to Freedom
     The Horse Whisperer              Sleepers
     Cross Stitch                     Jane Eyre
     Goosebumps                       Gone With the Wind
     The Bone People                  Wuthering Heights
     The Hobbit                       The English Patient
Based on the information stated above, the two most obvious types of non-sampling errors that may be present in this survey are:
(1) self-selection bias and interviewer effects.
(2) random errors and non-response bias.
(3) self-selection bias and question effects.
(4) selection bias and self-selection bias.
(5) selection bias and chance errors.
A student wants to select and read any 4 books from this top twenty books list. Choose a simple random sample of 4 books for this student. To select the sample you must use the thirty random digits given below. Start at the beginning of the line of random digits given below.
The four books in the random sample are:
(1) Wild Swans, April Fool’s Day, The Hobbit, Goosebumps.
(2) The Hobbit, Wild Swans, The Hobbit, The Power of One.
(3) The Hobbit, Wild Swans, The Power of One, April Fool’s Day.
(4) Goosebumps, Cross Stitch, The Hobbit, Wild Swans.
(5) Goosebumps, Cross Stitch, The Lord of the Rings, Wild Swans.
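The selection procedure described in this question can be sketched in Python. This is only an illustration under assumptions: it follows the common textbook convention of reading the digits in two-digit pairs, keeping pairs from 01 to 20, and skipping out-of-range pairs and repeats. The function name `srs_from_digits` and the digit string are hypothetical; the digits are not the thirty digits from the exam paper.

```python
def srs_from_digits(digits, population_size, k):
    """Pick k distinct labels in 1..population_size by reading two-digit pairs."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        # Keep the pair only if it names a unit and has not been chosen already.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == k:
            break
    return chosen

# Hypothetical digit string (assumption), NOT the digits printed in the exam.
example_digits = "0793051120051988"
sample = srs_from_digits(example_digits, population_size=20, k=4)
print(sample)  # labels of the selected books
```

Skipping repeats matters because a simple random sample is drawn without replacement, so a list that contains the same book twice cannot be the sample.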
Consider the following three studies:
Study 1: An animal researcher was interested in cats’ abilities to survive surprisingly high falls if they had time to twist round and prepare for the impact. Vets in New York City recorded incidents of cats falling out of apartment windows. The data was divided into three groups: cats that fell from one or two storeys above the ground; cats that fell from three to five storeys above the ground and cats that fell from six or more storeys above the ground. The proportion of cats that survived in each group was then compared.
Study 2: A random sample of 100 students is asked to keep a diary in which they record their clothing expenditures for the next three months. The expenditures of males and females are then compared.
Study 3: A sample of 50 shoppers at an appliance store is split into two groups. One group is shown a television commercial for a new range of appliances that has been filmed in the same style as previous television commercials for the store. The second group is shown a television commercial for the same new range of appliances that has been filmed in a totally new style. An hour after viewing the commercial, each of the shoppers was asked what they could recall about the new range of appliances and a score based on their recollection was recorded. The recall scores were then compared for the two groups.
(i) For each study, describe what “treatment” is being compared.
Study 1:
Study 2:
Study 3:
(ii) Which of the three studies would be described as experiments and which would be described as observational studies?
Study 1:
Study 2:
Study 3:
(iii) For the studies that are observational, briefly explain why an experiment could not be carried out instead.
In 1950 two hundred employees from the Christchurch Firestone Tire and Rubber Company became part of a cancer study. These employees were observed until 1996 and any occurrences of cancer within this group were recorded. This study is best called:
(1) a double-blind experiment.
(2) a randomised experiment.
(3) a sample survey.
(4) a retrospective observational study.
(5) a prospective observational study.
Which one of the following statements is false?
(1) Non-sampling errors are often bigger than the random sampling errors in surveys.
(2) People will sometimes answer a question differently for different interviewers.
(3) Sophisticated sampling projections can always correct the results if the population you are sampling from is different to the one of interest.
(4) Slight changes in the wording of questions can often make a big change to survey results.
(5) Non-response can cause bias in surveys because non-respondents can behave differently from people who respond.
A TIME daily poll on the Internet invited readers to make a choice from a given list of options, in response to the following question:
“Three times in the last five months, children went on killing sprees. What is fuelling this bizarre and tragic trend?”
As of 2 June 1998, the largest proportion of respondents (29%) chose the option:
“Nurture: The American family is crumbling; permissive parents are raising wild children.”
We wish to use this percentage as an estimate of the proportion of all Americans who believe that Nurture is the cause.
Which one of the following is not a potential source of non-sampling error in this survey?
(1) Question effects.
(2) Self-selection bias.
(3) Selection bias.
(4) Non-response bias.
(5) Transferring findings.
Television polls have become commonplace in New Zealand over the last few years. A television sports programme often runs polls on questions such as: “Do you approve or disapprove of Wayne Smith as the All Black coach?” Viewers are then invited to phone in their vote at a cost of approximately 99 cents per minute. Identify two sources of bias in this form of survey.
TIME magazine, 20 December 1993, reported that 70% of Americans answered “Yes” to the question “Do you favour stricter gun-control laws?” The figure was obtained from a telephone poll of 500 adult Americans. Are the following statements true or false? Explain briefly.
(i) The sample was too small to provide any useful results.
(ii) The survey does not take into account the views of homeless people.
(iii) The survey may be inaccurate due to non-response bias.
(iv) The survey should be repeated so that it includes a control group.
Two drugs are to be compared. A group of 20 people are each randomly allocated one of the two drugs. Neither the people who were treated nor the doctor who administered the drugs knew who got the drug. Which best describes this situation?
(1) An observational study.
(2) A double blind experiment.
(3) A sample survey.
(4) A case-control study.
(5) A block design.
Draw a box plot for the following set of data:
Five-number summary: (18, 22, 29, 38, 78)
Do you agree with the following statements? Discuss.
(1) The distribution from which this sample is drawn is highly skewed.
(2) The interquartile range is 21.
(3) There are no observations greater than 78.
(4) The observation 78 is an outside value for the box plot representing the above data.
(5) The observation 18 is an outside value for the box plot representing the above data.
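The statements about the interquartile range and outside values can be checked directly from the five-number summary (18, 22, 29, 38, 78). The short sketch below assumes the common 1.5 × IQR rule for outside values; the course may define outside values differently, so treat the fences as an assumption.

```python
# Five-number summary taken from the question.
minimum, q1, median, q3, maximum = 18, 22, 29, 38, 78

iqr = q3 - q1                   # interquartile range
lower_fence = q1 - 1.5 * iqr    # assumed 1.5 * IQR rule
upper_fence = q3 + 1.5 * iqr

max_is_outside = maximum > upper_fence
min_is_outside = minimum < lower_fence

print("IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("78 outside?", max_is_outside, "| 18 outside?", min_is_outside)
```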
The five-number summary for a set of data is:
Which one of the following is false?
(1) Each of the whiskers on the box plot of the data must be greater than 42 units in length.
(2) It is not possible to determine the mean of the data from this five-number summary.
(3) At least half of the observations are between 22 and 50 inclusive.
(4) The interquartile range is 28.
(5) None of the observations in the data set is an outside value on the box plot of the data.
Questions 9 to 11 refer to the following information.
The stem-and-leaf plot below shows the annual salaries for the 21 employees in the engineering department of the Technitron company.
Stem-and-leaf plot of SALARY
n = 21
Units: 4 | 7 = $47,
The median for the SALARY data set is:
The upper quartile for the SALARY data set is:
Which one of the following statements is true?
(1) The stem-and-leaf plot is drawn incorrectly because the second to last line should have been omitted, as there are no data values on it.
(2) The stem-and-leaf plot is drawn incorrectly as there is a 0 missing on the second to last line.
(3) The stem-and-leaf plot is drawn correctly despite the fact that there is only one row for stem 2.
(4) The stem-and-leaf plot has been drawn correctly because the length of the plot is such that there is one stem-digit with more leaf-digits than any other stem-digit.
(5) The stem-and-leaf plot is drawn incorrectly because 4 | 7 in the units statement is not a data value.
Section A: Types of Variables
1.
(a) Quantitative variables are _____________________________________________ and counts.
(b) Qualitative variables describe _________________
Quantitative variables can be either discrete or continuous.
(a) Variables with few repeated values are treated as ____________________________________.
(b) Variables with many repeated values are treated as __________________________________.
Qualitative variables can be either categorical or ordinal.
(a) Variables with order are called __________________________________.
(b) Variables without order are called __________________________________.
(a) To explore the relationship between two quantitative variables we use a __________________
(b) To explore relationships between a qualitative variable and a quantitative variable we use ___________ plots, ________________________________ plots and _______________ plots.
(c) To explore the relationship between two qualitative variables we use a ___________________ __________________ of _________________________.
Section B: Two Variables
Questions 1 and 2 refer to the following information.
TVNZ News, 5 August 1997, reported that smoking is on the increase in the high socio-economic group in the USA. It was claimed that the advertising and fashion industries are responsible for this increase.
The data shown in the table below is a subset of the data from a study on a large number of people. Each person has measurements made on variables that describe some aspect of their image.
Gender   Weight (kg)   Socio-Ec Status   Smoking Status   Age
Female                 High              Smoker
Male                   Low               Smoker
Male                   Middle            Non-smoker
Female                 Middle            Non-smoker
Table 1: Data on People’s Images
The most appropriate way to begin to explore the relationship between Socio-Economic Status and Smoking Status is to construct a:
(1) two-way table of counts with Socio-Economic Status for the row values and Smoking Status for the column values.
(2) dot plot of Socio-Economic Status for each level of Smoking Status, using the same scale for each plot.
(3) box plot of Socio-Economic Status for each level of Smoking Status, using the same scale for each plot.
(4) frequency table for each of these two variables.
(5) scatter plot of Socio-Economic Status against Smoking Status.
The most appropriate way to begin to explore the relationship between Weight and Smoking Status is to construct a:
(1) two-way table of counts with Weight for the row values and Smoking Status for the column values.
(2) dot plot of Weight for each level of Smoking Status, using the same scale for each plot.
(3) box plot of Weight for each level of Smoking Status, using the same scale for each plot.
(4) frequency table for each of these two variables.
(5) scatter plot of Weight against Smoking Status.
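As an illustration of what a two-way table of counts looks like, the sketch below tallies the four (Socio-Ec Status, Smoking Status) pairs that survive in Table 1. Four rows are far too few to say anything about the relationship; the point is only the mechanics of cross-tabulating two qualitative variables. The variable names are illustrative.

```python
from collections import Counter

# (Socio-Ec Status, Smoking Status) pairs from the four rows of Table 1.
rows = [
    ("High", "Smoker"),
    ("Low", "Smoker"),
    ("Middle", "Non-smoker"),
    ("Middle", "Non-smoker"),
]

counts = Counter(rows)
status_levels = ["High", "Middle", "Low"]
smoking_levels = ["Smoker", "Non-smoker"]

# Print one row per socio-economic level, one column per smoking level.
print("Status", *smoking_levels, sep="\t")
for status in status_levels:
    print(status, *(counts[(status, s)] for s in smoking_levels), sep="\t")
```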
In 1995 there were 2011 students enrolled in either Stat or Stat Statistics at one University campus. The numbers of female and male students are given in the following table.
         Females   Males   Total
Stat
Stat
Total
(a) Convert the above table of counts into a probability table (to 4 dp).
         Females   Males   Total
Stat
Stat
Total
(b) One of the 2011 students is chosen at random. What is the probability that the student chosen is:
(i) a male taking Stat
(ii) a female?
(iii) a female taking Stat
(c) Given that a student is taking Stat, what is the probability that they are male?
(d) What is the probability that a randomly chosen male student is taking Stat
Consider drivers stopped at random for breath testing. Below is a partially completed probability table providing information about such drivers, with regards to their age (40 or under, over 40) and whether they were (or were not) wearing seat belts.
                        40 or under   Over 40   Total
Wearing a seat belt
Not wearing seat belt
Total
(a) Complete the table.
(b) What is the probability that a driver stopped at random is not wearing a seat belt?
(c) If a driver stopped at random is not wearing a seat belt, then what is the probability the driver is over 40?
(d) What is the probability that a driver stopped at random is 40 or under?
Section A: Discrete Random Variables
Consider the experiment of tossing two fair coins. The sample space is {HH, HT, TH, TT}. Let random variable X be the number of tails.
(a) The probability function for this experiment is:
x
pr(X = x)
(b) Find the probability that:
(i) X is more than 1
(ii) X is at least 1
(iii) X is at most 2
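The probability function for the two-coin experiment can be built by enumerating the sample space {HH, HT, TH, TT}, with each outcome having probability 1/4 because the coins are fair. The sketch below does this with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Enumerate the sample space: HH, HT, TH, TT.
outcomes = ["".join(pair) for pair in product("HT", repeat=2)]

# X = number of tails; each outcome is equally likely.
pmf = {}
for outcome in outcomes:
    x = outcome.count("T")
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(outcomes))

p_more_than_1 = sum(p for x, p in pmf.items() if x > 1)
p_at_least_1 = sum(p for x, p in pmf.items() if x >= 1)
p_at_most_2 = sum(p for x, p in pmf.items() if x <= 2)

print(pmf)
print(p_more_than_1, p_at_least_1, p_at_most_2)
```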
Random variable Y has the following probability function:
y
pr(Y = y)
Find the probability that:
(a) Y is more than 12
(b) Y is no more than 10
(c) Y is at least 6
(d) Y is at least 6 and at most 12
(e) Y is at least 10 and at most 12
(f) Y is more than 13 but less than 25.
Section B: Binomial Distribution
The owner of a small bookshop estimates that 30% of the customers who enter the shop purchase at least one book. At 10.30am on a particular day there are 7 potential customers in the shop. Assuming that these customers can be regarded as a random sample from the population of all potential customers, calculate the probability that at least two of these people purchase at least one book.
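Under the stated random-sample assumption, the number of buyers among the 7 customers is Binomial with n = 7 and p = 0.3, and "at least two" is most easily found through the complement. A minimal sketch of that arithmetic:

```python
n, p = 7, 0.30  # 7 customers, each buys with probability 0.3

p_none = (1 - p) ** n                 # pr(X = 0)
p_one = n * p * (1 - p) ** (n - 1)    # pr(X = 1)
p_at_least_two = 1 - p_none - p_one   # pr(X >= 2) via the complement

print(round(p_at_least_two, 4))
```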
The manufacturer of disk drives for a well-known brand of computers expects 5% of the drives to malfunction during the computer’s warranty period. Let X be the number of disk drives, in a batch of 10 randomly selected disk drives, which malfunction during this period. X has a Binomial distribution.
(a) Identify n and p, the parameters of the Binomial random variable.
(b) In the context of this exercise, state the assumptions required for X to have a Binomial distribution.
(c) Are the assumptions satisfied here?
(d) Calculate the probability that:
(i) no disk drive will malfunction during the warranty period.
(ii) exactly one disk drive will malfunction during the warranty period.
(iii) at least two disk drives will malfunction during the warranty period.
(iv) between 2 and 5 (inclusive) disk drives will malfunction during the warranty period.
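For part (d), the probabilities follow from the Binomial probability function with n = 10 and p = 0.05. The helper `binomial_pmf` below is an illustrative name, not something defined in the course material; it is a direct transcription of pr(X = k) = C(n, k) p^k (1 - p)^(n - k).

```python
from math import comb

def binomial_pmf(k, n, p):
    """pr(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.05  # batch of 10 drives, 5% malfunction rate

p_zero = binomial_pmf(0, n, p)
p_one = binomial_pmf(1, n, p)
p_at_least_two = 1 - p_zero - p_one
p_two_to_five = sum(binomial_pmf(k, n, p) for k in range(2, 6))  # k = 2, 3, 4, 5

for label, value in [("none", p_zero), ("exactly one", p_one),
                     ("at least two", p_at_least_two), ("2 to 5", p_two_to_five)]:
    print(f"{label}: {value:.4f}")
```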
The medical records of a group of diabetic patients presenting at a clinic showed that 50 presented as serious cases, while 36 presented as mild cases. Of the 31 patients aged under 40, 16 presented as mild cases.
(a) Present this information in the table below.
(b) A patient is chosen at random. Find the probabilities that:
(i) the patient is under 40 and has a mild case.
(ii) the patient is at least 40 years old or has a serious case.
(iii) the patient has a serious case and is at least 40 years old.
(c) Of those presenting with serious cases, what proportion are aged under 40?
(d) Of those aged at least 40, what proportion present with mild cases?
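The two-way table can be completed by subtraction from the totals given in the question, after which each probability is a cell or margin count divided by the relevant total. The sketch below assumes every patient is classified as exactly one of serious or mild, so the group size is 50 + 36 = 86.

```python
from fractions import Fraction

# Counts stated in the question.
total_serious, total_mild = 50, 36
under40_total, under40_mild = 31, 16

# Remaining cells by subtraction (assumes serious/mild covers everyone).
grand_total = total_serious + total_mild
under40_serious = under40_total - under40_mild
over40_mild = total_mild - under40_mild
over40_serious = total_serious - under40_serious
over40_total = over40_mild + over40_serious

p_under40_and_mild = Fraction(under40_mild, grand_total)
# "At least 40 OR serious" is everyone except the under-40 mild cases.
p_over40_or_serious = 1 - p_under40_and_mild
p_serious_and_over40 = Fraction(over40_serious, grand_total)
prop_under40_given_serious = Fraction(under40_serious, total_serious)
prop_mild_given_over40 = Fraction(over40_mild, over40_total)

print(under40_serious, over40_mild, over40_serious, over40_total, grand_total)
print(p_under40_and_mild, p_over40_or_serious, p_serious_and_over40)
print(prop_under40_given_serious, prop_mild_given_over40)
```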
A bank classifies borrowers as high-risk or low-risk. Of all its loans, 5% are in default. Forty percent (40%) of those loans in default are to high-risk borrowers, while 77% of loans not in default are to low-risk borrowers.
(a) Complete the table.
(b) What percentage of loans is made to borrowers in the high-risk category?
(c) What is the probability that a high-risk borrower will default on his or her loan?
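One way to fill in the probability table is to multiply each conditional percentage by the margin it is conditional on, then add cells to obtain the other margin. The sketch below follows that route with the figures from the question; the variable names are illustrative.

```python
p_default = 0.05
p_high_given_default = 0.40
p_low_given_no_default = 0.77

# Joint probabilities (cells of the table).
p_default_and_high = p_default * p_high_given_default
p_no_default_and_high = (1 - p_default) * (1 - p_low_given_no_default)

# (b) Margin: proportion of all loans made to high-risk borrowers.
p_high = p_default_and_high + p_no_default_and_high

# (c) Conditional probability of default given high-risk.
p_default_given_high = p_default_and_high / p_high

print(f"high-risk share: {p_high:.4f}")
print(f"pr(default | high-risk): {p_default_given_high:.4f}")
```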
According to recent figures from the National Centre of Educational Statistics (US), 17.5% of all bachelor’s degrees are in business. 27% of bachelor’s degrees in business are obtained by women and 48.75% of other degrees are obtained by men.
(a) Complete the table.
(b) What is the probability that a randomly selected recent bachelor’s degree graduate will be a man?
(c) What is the probability that a randomly selected recent bachelor’s degree graduate will be a man with a degree in business?
(d) What is the probability that a randomly selected female recent bachelor’s degree graduate will have a degree in business?
A drinking pattern found by a survey is that 19% of male drinkers and 10% of female drinkers drink alcohol daily. Also, 51% of all drinkers are male (a ‘drinker’ was defined as someone who had consumed alcohol in the previous 12 months).
The probability that a randomly selected drinker from this survey who drinks alcohol daily is female is:
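This is a reverse-conditioning question: the survey gives pr(daily | sex) and asks for pr(female | daily). A minimal sketch of the calculation, using only the percentages stated above and assuming every drinker is classified as either male or female:

```python
p_male = 0.51
p_female = 1 - p_male          # assumes drinkers are either male or female
p_daily_given_male = 0.19
p_daily_given_female = 0.10

# Total probability of drinking daily, then condition on it.
p_daily = p_male * p_daily_given_male + p_female * p_daily_given_female
p_female_given_daily = p_female * p_daily_given_female / p_daily

print(round(p_female_given_daily, 4))
```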
A medical trial was conducted to investigate whether a new drug extended the life of a patient who had lung cancer. Assume that the survival time (in months) for patients on this drug is Normally distributed with a mean of 31.1 months and a standard deviation of 16.0 months.
(a) Use the following output to answer the questions below.
(i) Calculate the probability that a patient survives for no more than one year.
(ii) Calculate the proportion of patients who survive for between one year and two years.
(iii) Calculate the number of months beyond which 80% of the patients survive.
(iv) Calculate the range of the central 80% of survival times.
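The printed output for part (a) is not available here, but the same quantities can be computed with Python's standard-library `statistics.NormalDist` (Python 3.8 or later). The sketch below assumes one year is 12 months and two years is 24 months, since the survival time is measured in months.

```python
from statistics import NormalDist

survival = NormalDist(mu=31.1, sigma=16.0)  # survival time in months

# (i) pr(X <= 12): survives no more than one year.
p_within_one_year = survival.cdf(12)

# (ii) pr(12 < X <= 24): survives between one and two years.
p_one_to_two_years = survival.cdf(24) - survival.cdf(12)

# (iii) 80% survive beyond this value, so it is the 20th percentile.
months_exceeded_by_80pct = survival.inv_cdf(0.20)

# (iv) Central 80%: from the 10th to the 90th percentile.
central_low = survival.inv_cdf(0.10)
central_high = survival.inv_cdf(0.90)

print(round(p_within_one_year, 4), round(p_one_to_two_years, 4))
print(round(months_exceeded_by_80pct, 2))
print(round(central_low, 2), round(central_high, 2))
```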
(b) A sample of survival times is taken for 38 patients on this drug. Plots of these 38 survival times are shown below. Use these plots to comment on the validity of the assumption that the survival time is Normally distributed.
Comment:
The designer of a new aircraft’s cockpit wants to position a switch so that most pilots can reach it without having to change positions. Suppose that among airline pilots the distribution of the maximum distance (measured from the back of the seat) that can be reached without moving the seat is approximately Normally distributed with mean μ = 125cm and standard deviation σ = 10cm.
[Output: Normal Cumulative Distribution Function table (x, P(X <= x)) and Normal Inverse Cumulative Distribution Function table (P(X <= x), x), each headed by a mean and a standard deviation.]
Use the output above to answer the following questions.
(a) If the switch is placed 120cm from the back of the seat, what proportion of pilots will be able to reach it without moving the seat?
(b) What is the maximum distance from the back of the seat that the switch could be placed if it is required that 95% of pilots be able to reach it without moving the seat?
(c)
(i) If the pilot has a z-score of 1.5, what does this mean in this context?
(ii) To what maximum reach does a z-score of 1.5 correspond?
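A pilot can reach the switch when their maximum reach is at least the switch distance, so part (a) needs an upper-tail probability and part (b) needs the 5th percentile rather than the 95th. The sketch below reproduces these calculations with `statistics.NormalDist` in place of the printed output; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 125.0, 10.0
reach = NormalDist(mu=mu, sigma=sigma)  # maximum reach in cm

# (a) pr(reach >= 120): upper tail at the switch distance.
p_can_reach_120 = 1 - reach.cdf(120)

# (b) 95% must have reach >= d, so d is the 5th percentile of reach.
max_distance_95 = reach.inv_cdf(0.05)

# (c)(ii) A z-score of 1.5 is 1.5 standard deviations above the mean.
reach_at_z = mu + 1.5 * sigma

print(round(p_can_reach_120, 4), round(max_distance_95, 2), reach_at_z)
```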
Section C: Combining Random Variables
Formulae for Combining Random Variables (An extract from the formulae appendix)
For any constants $a$ and $b$:
$E(aX + b) = a\,E(X) + b$
$\mathrm{sd}(aX + b) = |a|\,\mathrm{sd}(X)$
If $X_1$ and $X_2$ are independent random variables:
$E(a_1X_1 + a_2X_2) = a_1E(X_1) + a_2E(X_2)$
$\mathrm{sd}(a_1X_1 + a_2X_2) = \sqrt{a_1^2\,\mathrm{sd}(X_1)^2 + a_2^2\,\mathrm{sd}(X_2)^2}$
If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with mean $\mu$ and standard deviation $\sigma$:
$E(X_1 + X_2 + \cdots + X_n) = n\mu$
$\mathrm{sd}(X_1 + X_2 + \cdots + X_n) = \sigma\sqrt{n}$
If $Y = a_1X_1 + a_2X_2$ is written as
complete the following by filling in the gaps:
(a)
(b)
(c)
X and Y are independent random variables. X has a mean of 1 and a standard deviation of 2, and Y has a mean of 3 and a standard deviation of 3. Suppose W = … The standard deviation of W, σW, is:
X is a random variable with a mean of 2 and a standard deviation of 2 and Y is a random variable with a mean of 3 and a standard deviation of 4. If X and Y are independent random variables and W = … then the standard deviation of W, σW, is:
A university professor keeps records of his travel time while he is driving between his home and the university. Over a long period of time he has found that his morning travel times are approximately Normally distributed with a mean of 31 minutes and a standard deviation of 3 minutes. His return journey in the evening is also Normally distributed but with a mean of 35.5 minutes and a standard deviation of 3.5 minutes. Use the output on the next page to answer the following questions.
(a) Find the probability that on a typical day he spends more than one hour travelling to and from work.
(b) Find the probability that on a given day his morning journey is longer than his evening journey.
(c) On what proportion of days is the evening journey more than five minutes longer than the morning journey?
(d) Over a five-day working week, what is the distribution of the total time for:
(i) morning journeys?
(ii) evening journeys?
(iii) all journeys?
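These parts apply the formulae for combining random variables to sums and differences of Normal travel times. The sketch below assumes that the morning and evening journeys are independent of each other and from day to day; the question does not state this, so it is an assumption. It uses `statistics.NormalDist` in place of the printed output.

```python
from math import sqrt
from statistics import NormalDist

morning_mu, morning_sd = 31.0, 3.0
evening_mu, evening_sd = 35.5, 3.5

# sd of a sum or difference of two independent variables (a1, a2 = +1 or -1).
combined_sd = sqrt(morning_sd**2 + evening_sd**2)

total = NormalDist(morning_mu + evening_mu, combined_sd)       # morning + evening
difference = NormalDist(morning_mu - evening_mu, combined_sd)  # morning - evening

p_total_over_hour = 1 - total.cdf(60)       # (a) more than 60 minutes in total
p_morning_longer = 1 - difference.cdf(0)    # (b) morning - evening > 0
p_evening_5_longer = difference.cdf(-5)     # (c) morning - evening < -5

# (d) Five-day totals: mean n * mu, sd sigma * sqrt(n) for each journey type.
days = 5
week_morning = (days * morning_mu, morning_sd * sqrt(days))
week_evening = (days * evening_mu, evening_sd * sqrt(days))
week_all = (week_morning[0] + week_evening[0],
            sqrt(week_morning[1]**2 + week_evening[1]**2))

print(round(p_total_over_hour, 4), round(p_morning_longer, 4),
      round(p_evening_5_longer, 4))
print(week_morning, week_evening, week_all)
```

The mean of the difference, 31 - 35.5 = -4.5 minutes, is why a negative mean appears among the Normal distributions used for parts (b) and (c).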
[Output: six Normal Cumulative Distribution Function tables, each headed by a mean and a standard deviation and listing x against P(X <= x).]