















Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Material Type: Notes; Class: INTRO BUSINESS STAT; Subject: Statistics; University: University of Pennsylvania; Term: Unknown 1998;
Typology: Study notes
1 / 23
This page cannot be seen from the preview
Don't miss anything!
















STAT 102
Stat 102 has concentrated on analysis of explanatory and predictive models with one dependent variable ( Y ) and one or more predictive variables or factors (often denoted by X ). Data Structures examined include situations described as:
Ordinary linear regression: Y continuous ; One continuous x -variable. Multiple linear regression: Y continuous ; Several continuous x -variables Polynomial regression: Like multiple regression, but with some polynomial terms ANOVA: Y continuous ; One or more qualitative X-factors Regression and ANOVA combined: Logistic Regression Y qualitative ; One or more continuous X-factors (We also included a qualitative X-factor (sex) in one example)
Example Wine and Music: Do shoppers buy more wine when appropriate music is played? Data: A supermarket in N. Ireland played no background music, French music, or Italian music at random times of the day. Here is the number of purchases for different combination of type of music and type of wine: 2-way table of results
Type of Wine By Type of Music Count (^) French Italian None Total French 39 30 30 99 Italian 1 19 11^31 Other 35 35 43 113 Total 75 84 84 243
Taken from Moore and McCabe, Introduction to the Practice of Statistics, 5th^ edition. Original source, Ryan, Northrup-Clewes, Knox and Thurnham (1998). The effect of in-store music on consumer choice of wine. Proc. Nutrition Soc. 57. p1069.
Test of Independence Null Hypothesis: In words -- H 0 : Row effects are independent of Column effects Symbolically – The (population) probabilities for the row effects (Wine) are
denoted by^1 2 3
def for ,
P BuyingFrenchWine Similar
The probabilities for the column effects (Music) are 1 2 3
, are similarly defined
H 0 is that the probability of a count in the i , j cell is the product, for example Pr (^) Frenc h& None (^) Pr count in cell 1 , (^3) 1 3 P 13. Ha is that H 0 is not true.
Computations (cont) We can put this together to get estimates for what the cell counts should be if H 0 is true. The expected (under H 0 ) cell counts would be Expected ij Eij N (^) P ˆ ij. Pearson’s Chi-Square statistic for testing H 0 is
2 ij ij ij (^) ij
Under H 0 it has a Chi-Square distribution with
In our example, df 2 2 4. Your book has tables of the Chi-Square distribution, or use JMP. We REJECT when χ^2 C , with C as found in the table.
Computations (summary)
Get the table. It has I rows and J columns and entries Nij.
Calculate Eij , the Expected (under H 0 ) cell counts. Looking back
we see they have the formula j ij E Ni^ j N Ni N N
Calculate
2 2 Observed Expected Expected
ij ij ij ij ij (^) ij ij
Square table with this df. Reject whenever χ^2 C.
Analysis in JMP (cont) JMP also gives the value of (^) χ^2 , and the resulting P-value: Test ChiSquare Prob>ChiSq Likelihood Ratio 21.875 0. Pearson 18.279 0. If you calculated by hand you would get this from the table on p 9 as 18.279 2.3337 7.6724 .0004 .5209 ....
( You can ignore the Likelihood Ratio statistic. It belongs to a different statistical method. It comes from a different set of formulas than Pearson’s Chi-Square. The P-values from the two tests are nearly always very similar. )
The result of this calculation is that we Reject H 0. (We do so at any reasonable level of significance since P-value = .0011). This should be followed by a summary of the nature of the effects we have detected.
Nature of the Effects We have Rejected H 0. Therefore we believe choice of wine and nature of music are related to each other. What is the relation? There are formal ways to investigate this [see the Appendix to this lecture] , but for Stat 102 we recommend looking at the numbers and use common-sense. You can look at the cell counts (as on p6), and/or at the cell Chi- Square values (as on p9) and/or at row or column % (as below). Used carefully, any of these looks should yield similar conclusions. Here is the table of counts and column %:
“When French music is playing customers seem very much less likely to purchase Italian wine than they are to purchase Italian wine when Italian music is playing.”
Also (from the last two columns ) “There doesn’t seem to be much difference in distribution of wine choice between times when Italian music is playing and when no music is playing. (There’s a slightly greater % of people choosing Italian wine when Italian music is playing, but it’s not a large difference.)”
Second Example Income and Job Satisfaction About 50 years ago, a classic sociological study looked at the relationship between a worker’s salary and their job satisfaction. One two-way table from this study looked at the results from a survey of worker’s income and a questionnaire judging their job satisfaction. Annual job INCOME was divided into 4 categories <$6,000, $6,000 - $15,000, $15,000 - $25,000, >$25,000. Their SATISFACTION was graded into four categories 1 = very dissatisfied, 2 = somewhat dissatisfied, 3 = somewhat satisfied, 4 = very satisfied. [By accident, both examples in this lecture have square contingency tables. But that’s not necessary; rectangular tables are also common].
Data table and results of the analysis follow:
An Aside The Mosaic Plot in JMP provides a very nice visualization for this data: Mosaic Plot
Satisfaction
(1) 0-6 (2) 6- (3) 15-25(4) 25+ Income
2
3
4
You can clearly see how the % of very dissatisfied workers goes down as earnings go up, and the % of very satisfied workers increases with income.
Pearson Chi-Square Test of Independence
Test of H 0 : Satisfaction is independent of Income
df = 33 = 9.
Results from JMP:
Test ChiSquare Prob>ChiSq Likelihood Ratio 12.037 0. Pearson 11.989 0. The P-value is 0.2140. We cannot reject the null hypothesis.
[We have to regretfully conclude that numbers like those in the table might well have occurred by chance. BUT note that not only do we have numbers like these, but they occur in a very obvious pattern. Our test isn’t looking for any special pattern. Might there be a better test to use on this data? See the Appendix .]
GLM (log-link) Wine and Music Data
Whole Model Test Model - LogLikelihood L-R ChiSquare DF Prob>ChiSq Difference 38.99 77.98 8 <. Full 21. Reduced 60. Effect Tests Source DF L-R ChiSquare Prob>ChiSq Wine 2 68.33 <. Music 2 11.31 0. Music*Wine 4 21.88 0.
These tables show that various things we can see in the table are statistically significant. In particular, the interaction is significant.
It is this interaction term that describes the dependence of Wine and Music. The significance of this term is another confirmation of the result of our earlier Pearson Chi-Square test.
Parameter Estimates This analysis yields a parameter estimates table that one can use to estimate the values of relevant quantities.
For example one can estimate a quantity like E Log (^) N (^) 12 N 2 ,
and hence also E N (^) 12 N 2 . This corresponds to the proportion of
Italian Music listeners who purchase French wine.
Parameter Estimates Term Estimate Std Error L-R ChiSquare Prob>ChiSq Intercept 2.96 0.13 51.04 <. Wine[French] 0.52 0.14 20.25 <. Wine[Italian] - 1.18 0.24 67.05 <. Music[French] - 0.56 0.24 10.64 0. Music[Italian] 0.34 0.14 7.05 0. Music[French]Wine[French] 0.73 0.25 15.21 <. Music[French]Wine[Italian] - 1.22 0.46 16.53 <. Music[Italian]Wine[French] - 0.42 0.17 7.82 0. Music[Italian]Wine[Italian] 0.83 0.26 15.21 <.