






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Enter or copy the data from the table above into a blank Excel spreadsheet as shown here. Page 6. 2. Click on Data→ Data Analysis and, in the Data Analysis pop ...
Typology: Study notes
1 / 10
This page cannot be seen from the preview
Don't miss anything!







Simple Linear and Multiple Regression
In this tutorial, we will be covering the basics of linear regression, doing both simple and multiple regression models. The following data gives us the selling price, square footage, number of bedrooms, and age of house (in years) that have sold in a neighborhood in the past six months.
Selling Price Square Footage Bedrooms Age 64000 1670 2 30 59000 1339 2 25 61500 1712 3 30 79000 1840 3 40 87500 2300 3 18 92500 2234 3 30 95000 2311 3 19 113000 2377 3 7 115000 2736 4 10 138000 2500 3 1 142500 2500 4 3 144000 2479 3 3 145000 2400 3 1 147500 3124 4 0 144000 2500 3 2 155500 4062 4 10 165000 2854 3 3
We need to develop three simple regression models to predict the selling price based on each of the individual factors and determine which one is the best model. Next, we will develop a model to predict the selling price of a house based on the square footage, number of bedrooms, and age and will discuss if all three variables should be included and if it is a better model than just the three simple regression models.
To use Excel for regression, we do not want to use the Excel QM module, but rather will be using the data analysis add-in. To check and be sure that it is activated, go to File Options Add-ins. An Excel Options window will appear as shown here.
If not, click the “Go” button at the bottom of the window next to “Manage Excel Add-Ins” and simply tick the box next to Analysis ToolPak and Analysis ToolPak VBA then click OK.
Once you have the Add-ins in place, you are ready to get started.
Repeat steps 2-5, but select C3-C20 for the number of bedrooms and put the Output Range as F23, then, repeat steps 2-5 again but select D3-D20 for Age, and put the Output Range as F43.
You should now have all three simple regression models. Click here to download the completed sample spreadsheet so you can compare it to yours.
The key parts of this output are as follows (using the square footage example):
Under the “Regression Statistics” Multiple R – the correlation coefficient – notes the strength of the relationship – in this case, 0.80358 – a pretty strong positive relationship. R squared – the amount of variability in the dependent variable explained by the independent variable(s). In this case, 0.6457 – again, a pretty strong number – almost 65% of the variability in purchase price is explained by square footage. Adjusted R squared – this is when you have more than one independent variable and have adjusted the R squared value for the number of independent variables. Use this when looking at a multiple regression model.
Under the ANOVA Tables Significance F – this tests the significance of the overall model. We look for this to be less than 0.05. If it is less than 0.05, we can reject the null hypothesis and determine that the model is statistically valid. In this case, it’s 0.000102, so we have a valid model. Intercept Coefficient – this is the intercept for our line if we were to plot it out. With X as zero, this is where the line crosses the Y axis. Here its 2367. So a house with zero square feet will sell for $2,367. X Coefficient – this is the coefficient for our independent variable for the linear equation. It is the slope of our line or the amount that our dependent variable changes for every $1 change in our independent variable. For every increase in square footage by one, our price will change by this amount, or $46.6. X P-Value – this tests the significance of the variable. We look for this to be less than 0.05. If it less than 0.05, we can reject the null hypothesis and determine that the variable is statistically significant. It’s 0.000102, so we have a significant variable.
Looking at the multiple model, this is even better. Both Multiple R and R-Square are higher, even when adjusting for the number of dependent variables. What is interesting here is that the number of bedrooms is not significant in this model, so that should not be included in the final model.
This concludes the tutorial on both simple and multiple regression models.