















































































Material Type: Notes; Class: Statistics and Applications; Subject: Statistics ; University: University of Nebraska - Lincoln; Term: Spring 2005;
















































































Chapter 3: Random Variables and Probability Distributions
Diagram from Chapter 1:
[Diagram: Population, then “Take Sample”, then Sample; “Inference” leads from the Sample back to the Population]
In Chapter 2, we learned about some of the basics of
probability. In this chapter, we are going to learn more
about probability and how we can use “probability
distributions”. One can think of probability distributions
as population quantities since they summarize the possible
values that a random variable can take on.
3.1-3.2: Concept of a Random Variable and Discrete
Probability Distributions
Example: Fifty numbers from 0 to 9 (fifty_numbers_ch3.xls)
Suppose I draw 50 numbers at random from 0, 1, …, 9
with replacement and each number has an equal chance
of being drawn. Notice that for each draw, the sample
space is S = {0, 1, …, 9}. Below are the results.
Below is a table summarizing frequencies and
percentages for each category:
Frequency Percentage
Let X = the number selected on one draw from 0, 1, …, 9.
X is called a random variable because it can change
from draw to draw in a random manner which is
controlled by a probability structure.
Notice a capital X is used!
We know before the experiment of drawing numbers that
X can be any number from 0, 1, …, 9, and we know the
percentage of draws (probability) we expect X to be any
of these numbers. Thus, we can talk about a probability
distribution for X before the experiment. Note that this is
for the population!
x P(X=x)
0 0.1
1 0.1
2 0.1
3 0.1
4 0.1
5 0.1
6 0.1
7 0.1
8 0.1
9 0.1
Notes:
A lowercase “x” is used to denote the values that X
can take on. Be careful!!!
$\sum_{x=0}^{9} P(X = x) = 1$
Values of X not listed above have a probability of 0; for
example, P(X=4.2) = 0.
Remember that P(X=7) = 0.1 is the proportion of draws we
expect to be a 7 if the experiment is repeated an infinite
number of times. In our sample of 50 draws, the proportion
of times a 7 occurred was 0.06.
The random variable used in this example is called a
discrete random variable since there are a finite
number of values that it can take on – 0, 1, …, 9.
Section 3.3 will discuss continuous random variables
where there are an infinite number of values a random
variable can take on within a particular region.
More generally, a random variable is discrete if the
set of possible values for X is “countable”. Countable
means the set is “finite” or “countably infinite”.
When there are a finite number of values for X
(i.e., you can count all possible values), the
random variable is discrete (as in this example).
A random variable is also called “discrete” if the set
of possible values of X is “countably infinite”.
There will be more on this at the end of the
section.
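The notes above can be checked with a short sketch. It is written in Python rather than Excel, which is an assumption of this example and not part of the course materials; the names pmf and prob are hypothetical. Exact fractions are used so that the probabilities add to exactly 1.

```python
from fractions import Fraction

# Probability distribution from the example:
# each of 0, 1, ..., 9 has probability 1/10.
pmf = {x: Fraction(1, 10) for x in range(10)}

def prob(x):
    """Return P(X = x); values not listed in the distribution have probability 0."""
    return pmf.get(x, Fraction(0))

total = sum(pmf.values())   # sum of P(X = x) over x = 0, ..., 9
print(total)
print(prob(7))
print(prob(4.2))            # 4.2 is not a possible value of X
```

Looking up a value that is not one of 0, 1, …, 9 falls through to the default of 0, which mirrors the note that unlisted values have probability 0.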
In order to make this more general (we need to for
continuous random variables), the notation of f(x) is
Select Tools > Data Analysis > Random Number
Generation from the main Excel menu bar.
The Random Number Generation window will appear.
To generate a sample of 50, the Number of Variables
is set to 1, and the Number of Random Numbers is set
to 50.
Select Discrete as the distribution since there are a
discrete number of values of X.
Select the area in the spreadsheet where the
probability distribution is located. For example, cell
F14 contains the “0” and cell G23 contains the last
“0.1”. The “:” in the list of cells means “to”. The “$” signs
are not needed to make this work (see window below).
Select a cell for the output to begin the Output Range.
Select OK.
The final Random Number Generation window is shown
below with part of the output (see 50 random numbers
column)
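For readers without Excel, the same kind of sample can be sketched in Python. This is only an illustration and not the method used in fifty_numbers_ch3.xls; the seed value is an arbitrary assumption so that the run is repeatable, and the numbers drawn will not match the Excel output.

```python
import random

values = list(range(10))   # possible values of X: 0, 1, ..., 9
probs = [0.1] * 10         # P(X = x) for each value, as in the distribution table

rng = random.Random(2005)  # arbitrary seed (an assumption) so the sample is repeatable

# Draw 50 values with replacement according to the discrete distribution.
sample = rng.choices(values, weights=probs, k=50)
print(sample)
```

The values and probs lists play the role of the two spreadsheet columns selected for the Discrete distribution, and k=50 corresponds to the Number of Random Numbers.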
The Histogram window will now appear. Fill in the
Input Range with the cell addresses of the sample,
the Bin Range with the cell addresses of the possible
values that X can take on, and the Output Range with
an empty spot in the spreadsheet where the Excel
generated output will go. Below is what the window
should look like:
Select OK to produce the table below. The titles of the
table can be changed to whatever you desire.
Bin Frequency
0 8
1 4
2 5
3 10
4 5
5 3
6 0
7 3
8 5
9 7
More 0
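The Frequency column can be turned into sample proportions and compared with the population probability of 0.1. The sketch below uses Python as an assumed stand-in for a spreadsheet formula; the frequencies are copied from the table above, and the variable names are hypothetical.

```python
# Frequencies copied from the Histogram output above (bins 0 through 9).
frequency = {0: 8, 1: 4, 2: 5, 3: 10, 4: 5, 5: 3, 6: 0, 7: 3, 8: 5, 9: 7}

n = sum(frequency.values())   # sample size
proportion = {x: count / n for x, count in frequency.items()}

# Print each value with its frequency and its sample proportion.
for x in sorted(frequency):
    print(x, frequency[x], proportion[x])
```

The sample proportions differ from 0.1 because they describe one sample of 50 draws, while the probability distribution describes the population.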
Let’s review what has happened here.
The random variable X denotes the number which is
drawn from 0, 1, …, 9.
The probability distribution assigns probabilities to
particular values that X can take on. For this example,
all 10 possible values have a probability of 0.1. By
specifying the probability distribution, we are specifying
the possible values that can exist in the population along
with how often they occur.
The value “observed” from the first draw was 3. The
value “observed” from the second draw was 1. All of
these numbers constitute a sample of size 50 from a
population which has the specified probability
distribution.
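The link between the sample and the population can be illustrated by drawing larger and larger samples and watching the sample proportions move toward 0.1. This is a minimal simulation sketch in Python, not part of the original notes; the function name, the seed, and the sample sizes are assumptions chosen for illustration.

```python
import random
from collections import Counter

def relative_frequencies(n, seed=1):
    """Draw n values from 0..9 with equal probability; return the proportion of each value."""
    rng = random.Random(seed)   # fixed seed (an assumption) for repeatable output
    counts = Counter(rng.randrange(10) for _ in range(n))
    return {x: counts[x] / n for x in range(10)}

for n in (50, 5000, 100000):
    freqs = relative_frequencies(n)
    # Largest gap between a sample proportion and the population probability 0.1.
    worst = max(abs(p - 0.1) for p in freqs.values())
    print(n, round(worst, 4))
```

With only 50 draws the proportions can be far from 0.1, as in the sample above where a 6 never occurred; the gap typically shrinks as the number of draws grows.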
x P(X=x) = f(x)
is used instead. What do you expect would happen
with the sample? Try it out for homework.
In order to plot the probability distributions for discrete
random variables, the following steps can be performed.
Select Insert > Chart from the main Excel menu bar.
Below is the window that will appear.
We want to make a column chart since this will plot
each of the probabilities for a specific X value as a bar.
Select Next > to go on to the next step.
Select the Series tab at the top of the window. Select
Add to add a set of values to appear on the plot.
Put in the range of the probabilities in the Values box
and put in the range of X values in the Category (X)
axis labels box. Below is what the finished window will
look like. Select Next > to go on to the next step.
The next window asks you where you want to put the
plot. Select Finish to put it in the current sheet.
The plot should now appear in the sheet. You can
move it anywhere by clicking on it and moving it to a
new location while holding down the left mouse button.
Various items can be changed in the plot by selecting
the items and making the appropriate changes.
o Change the title: Select the title so that it is
highlighted (black squares around it) and then
click in the text area. Type in a new name.
o Change the y-axis scale and gridlines: Select the
y-axis (black squares will appear on it), right click
on it to bring up a short cut menu, and select
Format axis. Select whatever option you want to
change from the Format axis window.
o Change the gridlines?
o Remove the gray background?
Below is the final plot:
[Figure: column chart titled “Probability distribution for X”, with X (0 to 9) on the horizontal axis and f(x) on the vertical axis]
Note that it may be better to represent each probability
with just a line instead of a bar (see for example p. 68).
This cannot be done easily in Excel. Below is a plot
done in the statistical software package called R which
demonstrates this:
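[Figure: R plot with a vertical line at each value of X, with X on the horizontal axis and f(x) on the vertical axis]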
The names probability distribution function and
cumulative distribution function are often abbreviated as
PDF and CDF, respectively.
Example: Fifty numbers from 0 to 9 (fifty_numbers_ch3.xls)
x P(X=x) = f(x) F(x)
Questions:
What is F(0)?
What is F(1.5)?
What is F(1.99999)?
What is F(2)?
What is F(-1)?
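One way to explore these questions is to compute F(x) directly. The sketch below assumes the usual definition F(x) = P(X ≤ x), which is not restated in this part of the notes, and uses Python with hypothetical names pmf and cdf.

```python
from fractions import Fraction

# f(x) = P(X = x) = 1/10 for x = 0, 1, ..., 9.
pmf = {x: Fraction(1, 10) for x in range(10)}

def cdf(x):
    """F(x) = P(X <= x): add up f(t) over all possible values t with t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))

# Evaluate F at the points asked about in the questions above.
for x in (0, 1.5, 1.99999, 2, -1):
    print(x, cdf(x))
```

Note that F is defined for every real number x, not only for the values X can take on: it stays constant between possible values and jumps at each one.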
Plotting the cumulative distribution function for discrete
random variables in Excel can be a little more difficult.
Because of this, you will not be responsible for knowing
how to do it. Please see fifty_numbers_ch3.xls for an
example of how to do it. Below is the plot.
[Figure: plot titled “CDF”, with X (-1 to 10) on the horizontal axis and F(x) (0 to 1) on the vertical axis]
Example: Let’s Play Plinko! (plinko.xls in Chapter 4)
From Haws (1995):
The contestant drops a circular disk down a board
with nails arranged in the pattern indicated in figure
the left or to the right at each nail it encounters. The
contestant wins the amount of cash indicated on the
reservoir in which the disk lands.