12. Moderation | Passion-Driven Statistics, Lecture notes of Statistics

The effect of a moderating variable is often characterized statistically as an interaction; that is, a third variable that affects the direction and/or strength ...

Typology: Lecture notes

2022/2023

Uploaded on 03/01/2023

agrata
agrata 🇺🇸

4

(7)

258 documents

1 / 35

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
12. Moderation
Section 12.1: Moderation with ANOVA
Section 12.2: Graphing Moderation with ANOVA
Section 12.3: Moderation with Chi-Square
Section 12.4: Graphing Moderation with Chi-Square
Section 12.5: Moderation with Correlation
Section 12.6: Graphing Moderation with Correlation
Section 12.1: Moderation with ANOVA
Statistical interaction describes a relationship between two variables that is dependent upon, or
moderated by, a third variable. For instance do you prefer ketchup or soy sauce? Obviously, your
answer depends on what food you’re eating. If you’re eating sushi then your preference may be soy
sauce. If you’re having a burger and fries, you may prefer ketchup.
In this case, the third variable is referred to as the moderating variable or simply the moderator.
The effect of a moderating variable is often characterized statistically as an interaction; that is, a
third variable that affects the direction and/or strength of the relation between your explanatory,
or X variable, and your response, or Y variable.
What if the population we are studying has different subgroups? Could it be that, like the soy sauce
ketchup example, different subgroups could have a moderating effect on our association of interest?
To explore this idea we’re going to use a hypothetical study and some made up data. In our
imaginary study we’re looking at two diets and their effect on weight loss. Diet A is a low-
carbohydrate plan. Diet B is a low fat plan. Our hypothetical study also recorded data on which
exercise program participants chose; cardiovascular exercise or weight training. Our variables of
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23

Partial preview of the text

Download 12. Moderation | Passion-Driven Statistics and more Lecture notes Statistics in PDF only on Docsity!

12. Moderation

Section 12 .1: Moderation with ANOVA Section 12 .2: Graphing Moderation with ANOVA Section 12.3: Moderation with Chi-Square Section 12.4: Graphing Moderation with Chi-Square Section 12.5: Moderation with Correlation Section 12.6: Graphing Moderation with Correlation

Section 12 .1: Moderation with ANOVA

Statistical interaction describes a relationship between two variables that is dependent upon, or moderated by, a third variable. For instance do you prefer ketchup or soy sauce? Obviously, your answer depends on what food you’re eating. If you’re eating sushi then your preference may be soy sauce. If you’re having a burger and fries, you may prefer ketchup. In this case, the third variable is referred to as the moderating variable or simply the moderator. The effect of a moderating variable is often characterized statistically as an interaction; that is, a third variable that affects the direction and/or strength of the relation between your explanatory, or X variable, and your response, or Y variable. What if the population we are studying has different subgroups? Could it be that, like the soy sauce ketchup example, different subgroups could have a moderating effect on our association of interest? To explore this idea we’re going to use a hypothetical study and some made up data. In our imaginary study we’re looking at two diets and their effect on weight loss. Diet A is a low- carbohydrate plan. Diet B is a low fat plan. Our hypothetical study also recorded data on which exercise program participants chose; cardiovascular exercise or weight training. Our variables of

interest are Diet and Weight Loss. We’ve added this third variable, Exercise Plan to help us understand moderation, or statistical interaction. So what is the association between diet plan A and B, our explanatory variable, and weight loss, our quantitative response variable? This table shows our hypothetical data showing diet, weight loss, and exercise plan. Since we have a categorical explanatory variable, diet plan A or B, and a quantitative response variable, that is weight loss, we will of course need to use analysis of variance to evaluate the association.

Here we see that the average one month weight loss for diet A is about 14.7 pounds and that the average one month weight loss for diet B is about 9.3 pounds. So, in conjunction with the significant p-value, we can say that diet plan A is associated with significantly greater weight loss than diet plan B. Here we show the finding graphically as a bar chart with diet, the explanatory variable, on the x- axis and the mean weight loss, our response variable, on the y-axis. But what about our third variable, exercise program? Would we get the same results in terms of the association between diet and weight loss for those participants using cardio and those participants using weight training? In statistics, moderation occurs when the relationship between two variables depends on a third variable. In this case, the third variable is referred to as the moderating variable, or simply the

moderator. The effect of the moderating variable is often characterized statistically as an interaction. That is, a third variable that effects the direction and or strength of the relation between your explanatory and response variables. So, does type of exercise program affect the direction or strength of the relationship between diet and weight loss? The standard way of asking this question in the context of analysis of variance is to move to the use of a two way or two factor analysis of variance, rather than the one way or one factor ANOVA that we've been using. Instead, we're going to take a less-standard approach that can be consistently used across each of the inferential tools, that is ANOVA, Chi-Square, and Pearson Correlation. In each of these contexts, we're actually going to be asking the question, is our explanatory variable associated with our response variable, for each population sub-group or each level of our third variable? That is, are diet type and weight loss associated for those doing the cardio exercise program? And are diet and weight loss associated for those using the weight-training program? Here are the results of the example analysis. The ANOVA table examining the relationship between diet and weight loss, for those in the cardio exercise group, shows a large f-value and a significant associated p-value.

However, the means show that the association is in the opposite direction. For those involved in weight training, diet B is associated with greater weight loss, at 11.5 pounds, compared to diet A, which is only 8.8 pounds. Here, these results are shown graphically. As you can see, the relationship between diet and weight loss depends on which exercise program is being used. When using cardio, diet A is significantly better for weight loss than diet B. When using weights, diet B is significantly better for weight loss than diet A. Thus, we can say there is a

significant statistical interaction between the variables diet and weight loss, and the type of exercise (our third variable) moderates the association between diet and weight loss. Suppose we did not evaluate exercise as a possible moderator and instead focused only on the association between diet and weight loss for the entire population. Based on this graph, obviously, we would've incorrectly concluded that diet A is better than diet B. As we now know, that is true only if we're looking at the cardio group. Let’s return to our ANOVA example in which we asked ‘Is major depression associated with a smoking quantity among current young adult smokers’? Since we are testing for moderation we need to include a third variable of Sex to ask the question for each population sub-group or each level of our third variable. In hypothesis testing terms, “Are the mean number of cigarettes smoked per month equal or not equal for those individuals with and without major depression?” and “Does this association differ by sex? That is, does it differ for males vs. females?” The explanatory variable here is categorical with two levels, MAJORDEPLIFE, that is, the presence or absence of major depression. The response variable of NUMCIGMO_EST, smoking quantity, measured by the number of cigarettes smoked per month ranges from 1 to 2940. The third variable SEX of the participant where 1=Male and 2=Female.

  1. Go to Data > Sort Cases.
  1. Using the arrow move the third variable from the window on the left to the Groups Based on: window on the right. Click Compare groups and File is already sorted , then OK.
  2. Go to Analyze > Compare Means > One-Way ANOVA…
  1. From the variables list on the left, use the arrows to put your categorical explanatory variable, MAJORDEPLIFE, in the Factor: window and your quantitative response variable, NUMCIGMO_EST, in the Dependent List: window.
  2. Click the Options… button and check the box by Descriptive. Click Continue then OK.

You should recall in our original ANOVA the calculated F statistic found in the F column in this output, is 3.5 5. The significance, probability, or p value, associated with this F statistic, is labeled Sig. , and as you can see, the p value is .060, just over our p .05 cut point. Our calculated F statistic for males found in the top row of the F column in this output, is 2.028. The significance, probability, or p-value, associated with this F statistic, is labeled Sig. , and as you can see, the p value is .155, over our .05 cut point. Our calculated F statistic for females found in the bottom row of the F column in this output, is 4.824. The significance, probability, or p-value, associated with this F statistic, is labeled Sig. , and as you can see, the p value is .028, below the p .05 cut point. As you can see, the relationship between major life depression and smoking quantity depends on your sex. For males we did not find a statistically significant relationship, that is, no significant difference in amount of cigarettes smoked when comparing those with and without major depression. For females we did find a statistically significant relationship, that is, a significant difference in amount of cigarettes smoked when comparing those with and without major depression. More specifically, females with major depression smoked significantly more cigarettes than those without major depression. Thus, we can say there is a significant statistical interaction between the sex and major depression when predicting smoking quantity. In other words, sex (our third variable) moderates the association between major depression and smoking quantity.

Section 12.2: Graphing Moderation with ANOVA

We have already sorted the data and split the data, therefore, if we complete the appropriate steps for a bivariate graph for a categorical explanatory variable and a quantitative response variable we will get two graphs, one for males and one for females.

  1. Click Graphs > Legacy Dialogs.
  1. Click on the graph left of Simple > Define.
  2. In the top middle Bars Represent box Click Other statistic (e.g., mean). Using the arrow directly below, move the Quantitative Response Variable from the left window to below Variable:. Use the next arrow down to move the Categorical Explanatory Variable to the Category Axis: window. Click Titles… in the upper right corner.

Using the Chart Editor previously explained make the appropriate changes to your graphs. Suppose we did not evaluate sex as a possible moderator and instead focused only on the association between major depression and smoking quantity for the entire population. Based on this p-value’s and graphs, obviously, we would've incorrectly concluded that participants with major depression smoke significantly more than those that do not have major depression. As we now know, that is true only if we're looking at females.

Section 12.3: Moderation with Chi-Square

Now let's evaluate third variables as potential moderators in the context of chi-square tests of independence. Using the NESARC data and asking the question, is smoking associated with nicotine dependence? Let’s return to our Chi-Square example using the NESARC data in which we asked ‘Is smoking frequency associated with nicotine dependence among current young adult smokers’? Since we are testing for moderation we need to include a third variable of Sex to ask the question for each population sub-group or each level of our third variable. In hypothesis testing terms, Or in hypothesis testing terms, “Is smoking frequency and nicotine dependence independent or dependent?; That is, are the rates of nicotine dependence equal or not equal among individuals from my different smoking frequency categories. And is the answer to this question similar or different for males and females? For this analysis, we're going to use the same categorical explanatory variable and categorical response variable we used when learning about Chi-Square in SPSS tutorial 10. If you recall the categorical explanatory variable has 6 levels, the number of days smoked per month, which we called USFREQMO, with the following categorical values: smoking approximately 1 day/month, 2. days/ month, 5 days/month, 14 days/month, 22 days/month and 30 days/month. The response variable, called TAB12MDX, is categorical with 2 levels--the presence or absence of nicotine dependence in the past 12 months. The third variable SEX of the participant where 1=Male and 2=Female.

  1. Go to Data > Sort Cases.
  1. Using the arrow move the third variable from the window on the left to the Groups Based on: window on the right. Click Compare groups and File is already sorted , then OK.
  2. Click Analyze > Descriptive Statistics > Crosstabs.
  1. Using the arrows move your categorical explanatory variable to the window labeled Column(s): and your categorical response variable to the window labeled Row(s): then click Statistics.
  2. Check Chi-square then click Continue.