Stat 462, March 3

Example: A study is done to compare three metal alloys used to make welds that join pipes together.

Y = a measure of the strength of the weld
X = diameter of the weld
Alloy = type of alloy used to make the weld (Alloy 1, 2, or 3)

[Figure: Strength versus Diameter, with the alloys indicated by different plotting symbols.]

Strength and Diameter are related, and there are differences among the alloys. There also appears to be an interaction, because the slopes differ for the three alloys: the slope is steeper for alloy 2.

Alloy is categorical, so the numerical codes 1, 2, 3 are arbitrary. To put Alloy into a regression model, create two indicator variables:

$A_1 = 1$ if the observation is alloy 1, and 0 otherwise.
$A_2 = 1$ if the observation is alloy 2, and 0 otherwise.

General rule: if a categorical variable has $k$ categories, then $k-1$ indicator variables will fully describe the variable.

In our example, we could create $A_3 = 1$ if the observation is alloy 3, and 0 otherwise. But notice that $A_1 + A_2 + A_3 = 1$ for any observation. Thus $A_3 = 1 - A_1 - A_2$, meaning that $A_3$ is perfectly predictable from the values of $A_1$ and $A_2$, and it would be redundant as a predictor in a regression equation that already contains an intercept (the intercept column together with $A_1$, $A_2$, $A_3$ would be linearly dependent).

In the alloy problem, a "no interaction" model is

$$E(Y) = \beta_0 + \beta_1 X + \beta_2 A_1 + \beta_3 A_2.$$

Given the plot above, this model is almost surely wrong. An interaction model is

$$E(Y) = \beta_0 + \beta_1 X + \beta_2 A_1 + \beta_3 A_2 + \beta_4 X A_1 + \beta_5 X A_2.$$

Notice that the interaction terms are products of the alloy indicators and $X$ = diameter.

Understanding the meaning of the coefficients

When indicator variables are present in the model, the data analyst must give careful consideration to the correct interpretation of the coefficients multiplying the predictors. To do this, consider each category of the categorical variable separately.
For a specific category, determine the values of all indicator variables, substitute these values into the equation (the model for $E(Y)$), and reduce it as far as possible. When this has been done for each category, compare the resulting equations to determine what the individual coefficients measure.

Alloy example: no-interaction model

The model for average $Y$ is $E(Y) = \beta_0 + \beta_1 X + \beta_2 A_1 + \beta_3 A_2$.

Alloy 1. For this alloy, $A_1 = 1$ and $A_2 = 0$, so
$$E(Y) = \beta_0 + \beta_1 X + \beta_2(1) + \beta_3(0) = (\beta_0 + \beta_2) + \beta_1 X.$$

Alloy 2. For this alloy, $A_1 = 0$ and $A_2 = 1$, so
$$E(Y) = \beta_0 + \beta_1 X + \beta_2(0) + \beta_3(1) = (\beta_0 + \beta_3) + \beta_1 X.$$

Alloy 3. For this alloy, $A_1 = 0$ and $A_2 = 0$, so
$$E(Y) = \beta_0 + \beta_1 X + \beta_2(0) + \beta_3(0) = \beta_0 + \beta_1 X.$$

$\beta_1$ = the slope between $Y$ and $X$, regardless of alloy. This is what "no interaction" means: the slope between $Y$ and $X$ is the same for each alloy, so the model consists of three parallel lines.

$\beta_2$ = the difference between the intercepts for alloys 1 and 3. More generally, it is the difference between $E(Y)$ for alloys 1 and 3 at any specified value of $X$.

$\beta_3$ = the difference between the intercepts for alloys 2 and 3. More generally, it is the difference between $E(Y)$ for alloys 2 and 3 at any specified value of $X$.

MINITAB RESULTS INCLUDING GRAPH OF ESTIMATED MODEL

[Figure: graph of the estimated no-interaction model.]

The regression equation is
Y = -57.3 + 6.04 X + 12.0 A1 + 29.8 A2

Predictor      Coef   SE Coef      T      P
Constant     -57.27     16.43  -3.49  0.004
X            6.0425    0.8956   6.75  0.000
A1           12.009     4.866   2.47  0.027
A2           29.798     4.597   6.48  0.000
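The indicator coding described earlier can be sketched in Python as follows, assuming alloy 3 is the baseline category (both indicators 0), as in these notes. The helper names `alloy_indicators` and `interaction_row` and the diameter value 20.0 are hypothetical illustrations, not part of the study or of Minitab. The sketch also checks the redundancy argument for $A_3$.

```python
def alloy_indicators(alloy: int) -> tuple[int, int]:
    """Map an alloy code (1, 2, or 3) to the indicators (A1, A2).

    Alloy 3 is the baseline category: A1 = A2 = 0.
    """
    if alloy not in (1, 2, 3):
        raise ValueError(f"unknown alloy code: {alloy}")
    return int(alloy == 1), int(alloy == 2)


def interaction_row(x: float, alloy: int) -> list[float]:
    """Predictor values multiplying beta0..beta5 in the interaction model
    E(Y) = b0 + b1*X + b2*A1 + b3*A2 + b4*X*A1 + b5*X*A2."""
    a1, a2 = alloy_indicators(alloy)
    return [1.0, x, float(a1), float(a2), x * a1, x * a2]


# A third indicator A3 would be redundant: A1 + A2 + A3 = 1 for every
# observation, so A3 always equals the intercept column minus A1 and A2.
for alloy in (1, 2, 3):
    a1, a2 = alloy_indicators(alloy)
    a3 = int(alloy == 3)
    assert a1 + a2 + a3 == 1

# Hypothetical diameter X = 20.0 (illustrative only, not a value from the study).
for alloy in (1, 2, 3):
    print(f"alloy {alloy}:", interaction_row(20.0, alloy))
```

For alloy 1 only the $X A_1$ column is nonzero among the interaction terms, so substituting into the interaction model gives the line $(\beta_0 + \beta_2) + (\beta_1 + \beta_4) X$; for alloy 3 both interaction columns are zero and the line is $\beta_0 + \beta_1 X$.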
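As a minimal sketch of the substitution step applied to the estimates, the Python below plugs each alloy's indicator values into the fitted no-interaction equation to recover the three estimated parallel lines. It uses only the coefficients and standard errors printed in the Minitab output above; the helper name `fitted_line` is hypothetical. It also checks that the T column is Coef divided by SE Coef.

```python
# Estimated coefficients and standard errors copied from the Minitab output
# (no-interaction model).
coef = {"Constant": -57.27, "X": 6.0425, "A1": 12.009, "A2": 29.798}
se = {"Constant": 16.43, "X": 0.8956, "A1": 4.866, "A2": 4.597}


def fitted_line(alloy: int) -> tuple[float, float]:
    """Return (intercept, slope) of the estimated line for alloy 1, 2, or 3."""
    a1, a2 = int(alloy == 1), int(alloy == 2)
    intercept = coef["Constant"] + coef["A1"] * a1 + coef["A2"] * a2
    # No interaction terms, so every alloy shares the same slope on X.
    return intercept, coef["X"]


for alloy in (1, 2, 3):
    b, m = fitted_line(alloy)
    print(f"Alloy {alloy}: Y-hat = {b:.3f} + {m:.4f} X")

# t statistic = Coef / SE Coef; rounding to two decimals reproduces the T column.
for name in coef:
    print(name, round(coef[name] / se[name], 2))
```

The estimated intercepts are -45.261 (alloy 1), -27.472 (alloy 2), and -57.27 (alloy 3), all with slope 6.0425. The alloy 1 and alloy 2 intercepts exceed the alloy 3 intercept by exactly the A1 and A2 coefficients, which is the estimated version of the interpretation of $\beta_2$ and $\beta_3$ given above.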