





Material Type: Notes; Class: Statistical Methods; Subject: Statistics; University: SUNY Institute of Technology at Utica-Rome; Term: Unknown 1989;
Typology: Study notes






numerical outcomes. For example, as I mentioned in a previous lecture, I planted 30 trees in my backyard a few weeks ago. If I wait until early July and count the number of plants that are still alive, I obtain a number. This is a discrete situation (random variable) because the possible outcomes are whole numbers (0, 1, …, 30). When we count things, we use discrete models. Other examples of discrete experiments would be: count how many phone calls are received in Utica between noon and 2 PM tomorrow; draw a liter of water from a swamp and count the number of insect larvae in your sample; take a minute right now and count how many heartbeats you feel in the next 60 seconds.
These situations are different from experiments which involve length or time. We call these random variables continuous because they involve physical quantities which can assume any value on a continuum. For example, I could consider the height of each plant. Admittedly, my ruler doesn’t allow me to measure infinitely finely; however, the concept of height is a continuous one. This lecture attempts to remind you of some discrete ideas and to begin introducing continuous ones as well.
which are traditionally called success or failure, and if these outcomes are independent from trial to trial, then the probability of obtaining r successes on n trials is
$P(r) = C_{n,r}\, p^{r} (1-p)^{n-r}$
Here 𝑝 is the constant probability of success and 1 – 𝑝 = 𝑞 is the constant probability of failure.
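As a sketch (not part of the original notes), the binomial formula translates directly into a few lines of Python. The name `binom_pmf` is my own choice, and `math.comb` needs Python 3.8 or later. With n = 15 and p = 0.2 it should reproduce the smoker table in the example that follows.

```python
from math import comb


def binom_pmf(r: int, n: int, p: float) -> float:
    """Binomial probability of exactly r successes in n independent trials.

    Implements P(r) = C(n, r) * p**r * (1 - p)**(n - r).
    """
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Smoker example: 15 people sampled, each a smoker with probability 0.2
n, p = 15, 0.2
table = [binom_pmf(r, n, p) for r in range(n + 1)]

for r, prob in enumerate(table):
    print(f"r = {r:2d}   P(r) = {prob:.3f}")
```

The probabilities over r = 0, 1, …, 15 should add up to 1, which is a handy sanity check on any table you build this way.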
Example: Suppose that a population is composed of 20% smokers and 80% nonsmokers. You form a random sample of 15 individuals. What is the probability that, in your sample, exactly 4 will be smokers? At least 4 will be smokers? At least 6 will be nonsmokers?
Binomial probabilities, n = 15, p = 0.2 (r = number of smokers):
r      0      1      2      3      4      5      6      7      8      9      10     11     12     13     14     15
P(r)   0.035  0.132  0.231  0.250  0.188  0.103  0.043  0.014  0.003  0.001  0.000  0.000  0.000  0.000  0.000  0.000
So, the probability that exactly 4 will be smokers is 0.188. To get the probability of at least 4, add up the individual probabilities for r = 4, 5, …, 15:
$P(\text{at least 4 smokers}) = 0.188 + 0.103 + 0.043 + 0.014 + 0.003 + 0.001 + 0.000 + \cdots + 0.000 \approx 0.352$
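A quick way to double-check this sum, sketched in Python under the same assumptions as the table (n = 15, p = 0.2), is to add the upper tail directly and compare it with the complement 1 − P(0, 1, 2 or 3 smokers). The helper `binom_pmf` is my own name for the binomial formula.

```python
from math import comb


def binom_pmf(r: int, n: int, p: float) -> float:
    """P(r) = C(n, r) * p**r * (1 - p)**(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 15, 0.2

# "At least 4 smokers" means r = 4, 5, ..., 15: add those probabilities
at_least_4 = sum(binom_pmf(r, n, p) for r in range(4, n + 1))

# Same answer via the complement: 1 - P(0, 1, 2 or 3 smokers)
via_complement = 1 - sum(binom_pmf(r, n, p) for r in range(4))

print(f"P(at least 4 smokers) = {at_least_4:.3f}")
```

The complement route needs only four terms instead of twelve, which is why it is often the faster hand calculation.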
The next one is a little tricky. Notice that exactly 6 nonsmokers is the same as 9 smokers (there are 15 people, 6 + 9 = 15) and exactly 7 nonsmokers is the same as 8 smokers (7 + 8 = 15), so “at least 6 will be nonsmokers” is the same as 9 or 8 or … or 0 smokers, that is, at most 9 smokers.
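The smokers/nonsmokers switch can be checked numerically. This sketch (same hypothetical `binom_pmf` helper as before) adds P(0 to 9 smokers) and, separately, counts nonsmokers directly with success probability 0.8; the two totals should agree.

```python
from math import comb


def binom_pmf(r: int, n: int, p: float) -> float:
    """P(r) = C(n, r) * p**r * (1 - p)**(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p_smoker = 15, 0.2

# At least 6 nonsmokers  <=>  at most 15 - 6 = 9 smokers
at_most_9_smokers = sum(binom_pmf(r, n, p_smoker) for r in range(0, 10))

# Direct check, counting nonsmokers instead (success probability 0.8)
at_least_6_nonsmokers = sum(binom_pmf(k, n, 1 - p_smoker) for k in range(6, n + 1))

print(f"{at_most_9_smokers:.4f}  {at_least_6_nonsmokers:.4f}")
```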
This gives us $P(\text{at least 6 nonsmokers}) = P(\text{at most 9 smokers}) \approx 1$. The expected value we talked about in the previous lecture is “easy” to calculate: $\mu = E(X) = \sum_{r} r\,p(r)$.
I’ll assume you are using Excel for this:
r     p(r)    r·p(r)
0     0.035   0
1     0.132   0.132
2     0.231   0.462
3     0.250   0.750
4     0.188   0.750
5     0.103   0.516
6     0.043   0.258
7     0.014   0.097
8     0.003   0.028
9     0.001   0.006
10    0.000   0.001
11    0.000   0.000
12    0.000   1.15E-05
13    0.000   7.16E-07
14    0.000   2.75E-08
15    0.000   4.92E-10
sums  1       3
If you noticed that 20% of 15 is equal to 3, you’ve found a nice result: for a binomial random variable the expected value is simply $\mu = np = 15(0.2) = 3$. This gives us a fast way to calculate expected values, but only for the binomial case.
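If you would rather not use Excel, here is a small Python sketch of the same computation: build the r·p(r) column, add it up, and compare with the binomial shortcut np. The helper name `binom_pmf` is my own.

```python
from math import comb


def binom_pmf(r: int, n: int, p: float) -> float:
    """P(r) = C(n, r) * p**r * (1 - p)**(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 15, 0.2
probs = [binom_pmf(r, n, p) for r in range(n + 1)]

# Definition: mu = sum over r of r * p(r)  (the r*p(r) column of the table)
mu_from_table = sum(r * pr for r, pr in enumerate(probs))

# Binomial shortcut: mu = n * p
mu_shortcut = n * p

print(mu_from_table, mu_shortcut)
```

Both numbers should be 3 up to floating-point rounding. The definition works for any discrete distribution; the shortcut only for the binomial.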
As far as spread goes, we can define the standard deviation for a discrete probability distribution in a similar way to the expected value:
$\sigma = \sqrt{\sum_{r} (r-\mu)^2\, p(r)}$
A little scary. Luckily we have a quick formula, but it is only good for the binomial random variable:
$\sigma = \sqrt{np(1-p)}$
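To see that the “scary” definition and the binomial shortcut agree, here is a Python sketch that computes both. The function names are hypothetical; the function takes any n and p, so it could also be used to check your answer to the die exercise below.

```python
from math import comb, sqrt


def binom_pmf(r: int, n: int, p: float) -> float:
    """P(r) = C(n, r) * p**r * (1 - p)**(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def binomial_sd_two_ways(n: int, p: float):
    """Return (sd from the definition, sd from the binomial shortcut)."""
    probs = [binom_pmf(r, n, p) for r in range(n + 1)]
    mu = sum(r * pr for r, pr in enumerate(probs))
    # Definition: sigma = sqrt( sum over r of (r - mu)^2 * p(r) )
    sd_definition = sqrt(sum((r - mu) ** 2 * pr for r, pr in enumerate(probs)))
    # Binomial-only shortcut: sigma = sqrt(n * p * (1 - p))
    sd_shortcut = sqrt(n * p * (1 - p))
    return sd_definition, sd_shortcut


print(binomial_sd_two_ways(15, 0.2))  # smoker example
```

For the smoker example both values come out to the square root of 2.4, about 1.549.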
One last example and the third “class presentation” for this week: Suppose you will roll a fair die 10 times and you consider a “4” to be a success. Calculate the probabilities of obtaining 0, 1, …, 10 successes. Set these up in a table and show how to calculate the standard deviation using the formula $\sigma = \sqrt{\sum_{r} (r-\mu)^2\, p(r)}$. Also show that this gives the same result as multiplying $n$, $p$ and $1-p$ together and taking the square root.
One of the most important continuous distributions in probability and statistics is the normal
distribution. As we will see later, the distribution is found to describe a wide variety of natural
phenomena and also models many sampling situations. For example, as we will see in the next
lecture, a binomial random variable (discrete) may be fairly well approximated by the normal
distribution (continuous) when 𝑛𝑝 > 5 and 𝑛(1 − 𝑝) > 5.
Important Properties Of The Normal Distribution:
It is symmetrical about its mean
It is "bell shaped"
It is defined on the whole real line.
It has a wild formula: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
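For anyone curious, the wild formula is only one line of Python. This sketch checks two of the listed properties: symmetry about the mean, and total area 1. Cutting the infinite real line off at μ ± 10σ is my own simplification (the area outside that range is negligible), and the function names are hypothetical.

```python
from math import exp, pi, sqrt


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """f(x) = 1 / sqrt(2*pi*sigma^2) * exp(-(1/2) * ((x - mu) / sigma)^2)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / sqrt(2 * pi * sigma**2)


def area_under(mu: float, sigma: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the total area over mu +/- 10 sigma."""
    a, b = mu - 10 * sigma, mu + 10 * sigma
    h = (b - a) / steps
    return h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(steps))


# Symmetric about the mean, and the total area is (very nearly) 1
print(normal_pdf(110, 100, 15), normal_pdf(90, 100, 15))
print(area_under(100, 15))
```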
Some of us have to work with this formula all the time. In STA100 we see it once and run away
quickly. Luckily for us the probabilities we need from this distribution are tabulated. Before looking at
the tables, remember the empirical rule. If a set of data is approximately normally distributed we will
see
Approximately 68% of the data within one standard deviation of the mean.
Approximately 95% of the data within two standard deviations of the mean.
Approximately 99.7% of the data within three standard deviations of the mean.
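Where do 68%, 95% and 99.7% come from? For a normal variable, the probability of landing within k standard deviations of the mean works out to erf(k/√2), and Python’s standard `math.erf` lets us check the rule in a few lines. The function name `prob_within` is hypothetical.

```python
from math import erf, sqrt


def prob_within(k: float) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for a normally distributed X.

    Standardizing gives P(-k < Z < k) = erf(k / sqrt(2)).
    """
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

The results do not depend on μ or σ, which is why the empirical rule applies to every normal distribution.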
The graph below shows us what a typical normal distribution looks like and how we might use it.
One more, the second-to-last (penultimate) “student presentation” for this week. This may take a few minutes, but it will give us some data to play with. Find 30 people willing to give you 1 minute of their time (at work, in a crowded theater, wherever you’d like). For each of these 30 people, time their number of
[Figure: Binomial Probabilities n = 50, p = 0.5 with Normal Distribution Superimposed (legend: normal, binomial)]
As a quick example, suppose we have a population whose histogram looks as shown:
From the figure we can see that the smallest data points in the population are near zero and the largest are near 2 (I’m of course looking along the horizontal axis for this). Also, since the graph gets higher to the right we have more data points near 2 than near 0. From this figure, if we want the probability that a data point falls between 0 and 1.5 (or, put another way, the proportion of data between 0 and 1.5) we must find the area under the curve between 0 and 1.5. Recalling that the area
under a triangle is $A = \frac{1}{2}\,\text{base}\cdot\text{height}$, we see that the total area under the curve is $\frac{1}{2}(2)(1) = 1$, which is what we want (100% of the data lie between 0 and 2). The area between 0 and 1.5 is $P(0 < X < 1.5) = \frac{1}{2}(1.5)(0.75) = 0.5625$, meaning that a little more than half the population lies between 0 and 1.5.
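The same triangle calculation can be sketched in Python. I am assuming, as the areas above imply, that the curve is the straight line from (0, 0) to (2, 1), i.e. f(x) = x/2 on [0, 2]; that formula is my reading of the figure, not something stated in the notes. A crude numerical sum (midpoint rule) is included to show that “probability = area under the curve” holds even without the triangle formula.

```python
def f(x: float) -> float:
    """Assumed density from the figure: straight line from (0, 0) up to (2, 1)."""
    return x / 2 if 0 <= x <= 2 else 0.0


def triangle_area(b: float) -> float:
    """Area under f from 0 to b (0 <= b <= 2): 1/2 * base * height = 1/2 * b * f(b)."""
    return 0.5 * b * f(b)


def numeric_area(a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule check of the same area."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


print(triangle_area(2.0))  # total area
print(triangle_area(1.5))  # P(0 < X < 1.5)
print(numeric_area(0.0, 1.5))
```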
When a curve is more complicated than a triangle, it becomes challenging to obtain areas/probabilities. Luckily, someone has done the work and placed all the areas under the normal curve we could reasonably want in a table. If you look inside the front cover of your book, you should see the famous “z-table”. The pictures which accompany the table show you how to obtain areas.
[Figure: Simple Probability Density Function (the triangular density on 0 to 2 from the example above)]
For example, if you have a data set which follows a standard normal curve (bell shaped, centered at x = 0 and with standard deviation 1), you can find out what proportion of the data sit below 2.25. The table lists “z values” as X.XX, meaning it considers z values in the form of units, tenths and hundredths. Our number 2.25 is equal to 2.2 + 0.05, so look down the first column until you see 2.2 (about 2/3 of the way down the second page of the table). Now look across the columns until you find the area under the column labeled 0.05. You should find an area of 0.9878.
Try this again for a z value of -1.38 and you should obtain an area of 0.0838. We are almost done. If you want to know what proportion of the area under a standard normal distribution lies between 1.45 and 2.20, you can set up a little table:
z2 = 2.20    A2 = 0.9861
z1 = 1.45    A1 = 0.9265
             A  = A2 − A1 = 0.0596
We would write $P(1.45 < Z < 2.20) = A_2 - A_1 = 0.0596$.
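If you don’t have the book handy, the z-table values can be reproduced with Python’s standard `math.erf`, since the area to the left of z under the standard normal curve is (1 + erf(z/√2))/2. A minimal sketch; the name `phi` is my own.

```python
from math import erf, sqrt


def phi(z: float) -> float:
    """Area under the standard normal curve to the left of z (what the z-table lists)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


print(f"{phi(2.25):.4f}")   # area below 2.25
print(f"{phi(-1.38):.4f}")  # area below -1.38

a2, a1 = phi(2.20), phi(1.45)
print(f"P(1.45 < Z < 2.20) = {a2 - a1:.4f}")
```

Rounded to four places, these should match the table lookups: 0.9878, 0.0838 and 0.0596.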
At this point we can say that if our data is mound shaped and centered at the origin with a spread (standard deviation) of 1, then we can find areas/proportions/probabilities. Suppose you instead have IQ data. A common IQ test has a mean or center of 100 (meaning 100 is the average IQ) and a spread of 15. If these scores are normally distributed, then the histogram would look like this:
[Figure: Histogram of IQ Scores, μ = 100 and σ = 15]
Suppose you would like to know what proportion of the population has an IQ between 90 and 120. The Z-table can help us, but first we have to rephrase our question. It turns out that we can use our Z-
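A minimal sketch of the usual approach, assuming we convert each IQ endpoint to a z value with z = (x − μ)/σ and then subtract standard normal areas exactly as in the 1.45 to 2.20 example. Here `math.erf` stands in for the printed z-table, and the function names are hypothetical.

```python
from math import erf, sqrt


def phi(z: float) -> float:
    """Standard normal area to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_prob_between(lo: float, hi: float, mu: float, sigma: float) -> float:
    """P(lo < X < hi) for X ~ Normal(mu, sigma), via z = (x - mu) / sigma."""
    z_lo = (lo - mu) / sigma
    z_hi = (hi - mu) / sigma
    return phi(z_hi) - phi(z_lo)


# IQ example: mean 100, standard deviation 15
p_iq = normal_prob_between(90, 120, mu=100, sigma=15)
print(f"{p_iq:.4f}")
```

With a printed table you would round the z values to two decimals (about -0.67 and 1.33), so a hand answer may differ slightly in the last digit.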