Download Excel Basics - Econometrics - Lecture Notes and more Study notes Econometrics and Mathematical Economics in PDF only on Docsity!
Excel Basics
Part I: Basics
Starting Excel:
- Click the “start” button on the lower left of the screen.
- Click “programs”, then slide your cursor up to “Microsoft Office 97”, and then over to the Microsoft Excel icon.
Cells: The heart of the Excel workbook is the cells in the central part of the spreadsheet. Each cell has an “address” which is denoted by a row number and column letter. For example, the cell in the top left of the screen is cell A1. Notice that if you click your mouse pointer on cell A1, then A1 appears in the Name Box , which always contains the address of the active cell (or the upper left cell of a highlighted range of cells).
Text Entry: Text is entered in cells simply by typing in the active cell. The active cell is the cell where your cursor is currently located, and will be highlighted by a black line around it. The words or characters you are typing are shown in the formula bar near the top of the screen.
Example:
- In cell A1, type the words “ECN 422-001 Rocks”
- Press enter, or click on another cell.
- The words should now appear in cell A1.
- You can change the width of the entire A column by clicking one the border between the heading for column A and column B (located immediately above the cells) and dragging to the desired width. This way, when you enter information in cell B1, you do not cover some of the information contained in A1. If you will not be entering anything in cell B1 then this action is unnecessary.
- You can edit the contents of the cell simply by clicking on it again, and performing the edit in the formula bar by moving the cursor to the desired location and deleting, retyping, or adding characters.
Data Entry: Numerical data entry works just like text entry. You can use the number keys at the top of your keyboard, or use the number pad to the right (be sure the Num Lock is engaged).
Saving Periodically, you will want to save what you have done onto your disk as follows:
- Click “file”, and then “save as”
- Designate A: drive in the “save in” area
- Name your file in the “file name” area.
- Click “save”.
Simply clicking the save icon on the main toolbar (it looks like a computer disk) can now perform future saves (which you should do frequently).
Adding the data analysis tool pack Under “tools” in the main tool bar at the top of the screen see if the option “data analysis” is available (it should be at the bottom of the list). If data analysis is not there do the following: a. Under “tools” select “add-ins” b. Click on the box for “analysis toolpak” c. Click “OK” d. Go back and check to see that “data analysis” is now an option under tools.
Other cool stuff: Customize your toolbar (best to do this on your own personal computer rather than a lab computer) by doing the following:
- Click “View” at the top of the main window.
- Click “Toolbars”
- Click “Customize” on the right.
- On the left under categories is a list of all types of toolbars. When you highlight a type, all the icons in that category appear in the right part of the window. Add any icon to your existing toolbar by simply dragging it up to the toolbar space – careful not to pack too may in, or they will disappear off the ends of the screen.
Most of the icons in “file” and “edit” should already be on your toolbar. There is some useful stuff in “formatting”. I like to separate my raw data from everything else (means, sums, whatever) with lines or shading to make it neat, so dragging some of these tools up to your toolbar is a good idea. When you save on disk, the toolbar should be saved with the file.
Using data from the ECN 422 web page Each Excel file on the web page contains two worksheets. The first is the raw data. There is also a “variable names” sheet that gives an interpretation of the abbreviated data names and tells you a little about where the data came from.
When you click on the link on the web page, a browser version of MS Excel will open up. You can work in that version, but it will not operate exactly the same as the actual version of Excel. What I suggest you do is immediately save the file to a disk or to the hard drive as an excel file, and then close the browser version and open the actual version of excel to access the file.
In any case, I would recommend making a copy of the entire data set, just in case yours gets damaged or lost. If you do not know how to do this, see me for help.
We’re going to use a data set called “House Prices and 6 Characteristics” for this lab. Go to the ECN 422 webpage ( http://www.csb.uncwil.edu/people/schuhmannp/ecn422.htm ) and scroll down to data sets (below the link for this handout).
- Click on the link for housedata for excel basics lab.
- you should see a pop-up window that gives you options to save or open. If you click open, Excel should launch. If you click save, you can save the file to a disk. If you open, follow step 3, if you save, just close the window and launch excel, then open the file you saved. Skip to step 5.
- Save the file by clicking on “file” then “save as”, then name the file and drive where you want to save it. Make sure you save the file as an excel workbook.
- Now close the browser version of excel and launch the real version of excel (start, programs, Microsoft excel). Open the file that you just saved.
- Click the copy icon (it looks like two pieces of paper and is below the main tool bar).
- Move into the graphing sheet and put your cursor in cell A1.
- Click the paste icon (it looks like a clipboard with a piece of paper in front of it).
- The data set should now appear in the graphing sheet.
We’re going to construct a frequency histogram for price – this will give us a good idea of what the distribution of price looks like. To do this, we have to create classes of price. For example, one class might be price of $49 or less. Another might be price between $50 and $99 etc. Let’s create classes with a range of $50 as above by doing the following:
- In the graphing sheet that you created, type the word “class” in cell I1 (or any cell to the right of the data). Underneath the word class, type the upper bounds on each class of price. For example, the first upper bound would be 49 (this should go in cell I2), the next would be 99 (this should go in cell H3), and so on till 249. Your table of class upper bounds will look like this: Class 49 99 149 199 249
- At the main tool bar, select “Tools” then “data analysis” (if its not there, go back and do the steps under part I letter G), then “Histogram”, then click “OK”.
- A dialogue box should appear. Put your cursor in the little window for “input range” then highlight the column of data for price (should be column A) including the label “Price”. Note: another option here is to type the letters of the cells corresponding to the range of data as: A1:A482 (read “A through A82).
- Now put your cursor in the little window for “Bin Range” and highlight the column with the classes we just typed (including the word “class”). Again, another option is to type the range, so here you’d type I1:I6.
- Click the check box for “Labels” (because we have the labels “price” and “class” in our columns).
- Put your cursor in the little window for output range and click (or type the name of) a cell somewhere to the right of your data (example K2).
- Click the check boxes for “Cumulative Percentage” and “Chart Output”.
- Click “OK”
You should get two things. First, a numerical chart of the frequency and cumulative percentage of each class (example, two houses have price in the 0-49 class, and this makes up 2.47% of the data), and second, a bar chart of the same thing.
To change the size of the chart (you’ll want to make it bigger), simply click on its outside lines, and drag to a new size.
You can also go back and re-format any component of the chart. To format the axes (scale, font or numbers), double click on the axis you want to format and a dialogue box will appear giving you options. Try changing the scale of one of your axes and see how the chart changes. To format the main plot area, double click in the main area. A different box with options will appear. To change the data points (marker pattern or size), double click on any data point and change the options. For example, the default
marker is a pink square. Try changing this to a circle by double clicking a data point, the selecting the pattern tab. You can change style and size, then click “OK”.
You can also change the labels for the classes (the number 49 isn’t very informative because it doesn’t explicitly say $0-$49). Do this by changing the numbers in the cells of the chart that excel created to whatever you want. Note that the corresponding labels in the histogram will change as well.
Now we’ll create a pie chart for one of our categorical variables “nbhood”. First, we need to make a list of the percentage and count of each type of nbhood we have in our data. Somewhere to the right of the data and histogram label three blank columns “Nbhood”, “percent”, and “count”. Under nbhood, list the three types of nbhoods, each in its own cell. These are A, B, and C. Now, to count the number of observations we have for each type of nbhood, we first want to sort the data by nbhood.
- After highlighting all the data, on the main toolbar, click “Data”, then “Sort”.
- At the bottom of the dialogue box, make sure the check dot for “header row” is selected.
- In the “sort by” window, choose “nbhood”.
- Click “OK”
Now that the data are sorted, you can simply count up the number of each type. As far as I can tell, Excel does not have a good way of doing this for categorical data (text). However, numeric, version of the nbhood column, and get excel to do the counting for you. To do this (handy idea if you have a really big data set and don’t want to count) do the following:
- Highlight column H by clicking on the gray box with the letter H on it.
- On the main toolbar, click “Insert”, then column. You should get a new, blank column to the right of the “nbhoods” column.
- Copy the entire nbhoods column into this column by highlighting it (click the gray G box), then click the copy icon.
- Paste this into the new column. Now columns G and H should be identical.
- Highlight the new one by clicking the gray H box.
- On the main toolbar click “Edit” then “Replace”.
- A dialogue box should appear with spaces for “Find what” and “Replace with”. Type “A” in the “find what” space, and the number 1 in the “Replace with space”.
- Click “Replace All”
- All entries for NBHOOD type “A” should have been replaced with the number 1. Do the same for B and C, replacing them with 2 and 3 respectively.
- Now, put your cursor in the cell where you want to place the count for NBHOOD type A. If you’ve already counted the number of type A homes, simply type the number in the cell (it should be 26). To make excel do the count for you, click on the “function wizard” icon on the main toolbar (it looks like: fx ), under “function category”, choose “statistical”. Under “function name” choose “COUNTIF”, then click “OK”.
- With your cursor in the “Range” window, highlight the column with the numeric version of “Nbhood”.
- In the “Criteria” window, type the number 1.
- Click OK. You should get the number 26 in the cell where you want the count to go.
- Now do the same for the other two categories (counts should be 26, 36 and 19 for A, B, and C).
- Now calculate the percentage for each one: a. With your cursor in the cell where you want the percent of type A homes, type the following: =26/81. This gives you 0.320988 or 32.1% of the homes are in NBHOOD type A. b. Do the same for the other categories.
Part III. Calculating Descriptive Statistics for Your Data Set Copy the entire data set (but not any of the charts or graphs) into a new sheet labeled “Dstats”.
A. Calculating the Mean of a Sample Using Auto sum (Ok to skip to part C) The auto sum button ∑ adds up all the numerical values in a given column. To use this button, simply highlight the set of cells you wish to sum (click the top cell in the group and then drag over the range of cells, releasing the mouse button when the set of cells is highlighted), and then click the auto sum button.
- Use the auto sum button to add the price column. Notice that the sum appears in the cell below the last value in the summation group. Note: if you don’t want it to be located here you can move it by cutting and pasting: highlight the cell and cut by clicking the scissors icon, then paste to the cell you want to move the value to by highlighting that cell and pressing the paste icon, which looks like a clipboard.
- Now, to find the mean, we need to divide this sum by the number of values in the range (81):
- Highlight a cell where you want the mean to appear (E.g.: the cell beneath the sum)
- Type the following without the quotes: “ =A83/81 ”
- Press enter.
- The mean price should appear in the cell where you wrote the formula.
Note: instead of typing the cell address A83, you can simply click on the cell A83 when you come to that part of your formula. Delete the mean you just calculated and try it this way.
B. Calculating Mean, Variance and Standard Deviation Using the Function Wizard (Ok to skip to part C) Using the auto sum button for summing values is handy, but a better way to calculate a mean (and other descriptive statistics for that matter), is to use the function wizard button fx .. This button does a lot of different things – most of which we won’t use.
Calculate the mean value for price by doing the following:
- Highlight the blank cell where you want the mean to appear (E.g.: the cell beneath the last value in the column or two cells below).
- Click the function wizard button.
- In “function category”, on the left, click on “statistical” – a list of statistical functions will appear on the right under “function name”.
- Click on “average”, which is the second one down (note that when you highlight “average”, a brief description appears in the gray portion of the bottom of the window – use this feature to check out what other functions do).
- At the bottom of the window, click “next”.
- The cursor should be blinking in a box labeled “number 1” – Excel is asking you for the cells you want to average. You can enter them individually by clicking each one in a new number box, or simply highlight the entire range in the “number 1” box.
- Highlight the cells which contain the values you want to average (note: if the function wizard window is in your way, you can move it around by clicking on the top blue stripe without releasing and dragging the window to a more convenient location).
- Click “finish” at the bottom of the window.
- The mean value should appear in the blank cell you chose to begin with. Now calculate the variance and standard deviation (^) for price by doing the following;
- Highlight the blank cell where you want the statistic to appear (e.g. the cell below the mean value).
- Click the function wizard button.
- In “function category”, on the left, click on “statistical”.
- Under “function name”, scroll down and click on “VAR” for variance or “STDEV” for standard deviation.
- At the bottom of the window, click “next”.
- Highlight the cells which contain the values you want to perform the calculation over.
- Click “finish” at the bottom of the window.
- The standard deviation should appear in the blank cell you chose to begin with.
C. Calculating Descriptive Statistics Using the Data Analysis Tool
- Under “tools” at the top of the screen select “data analysis”.
- Select “Descriptive Statistics” then click “OK”
- With the cursor blinking in “input range”, highlight all data cells (including the label) for the price column (click on the gray A above the column).
- Click on the box for “labels in first row”.
- Click on the dot for “output range” then put your cursor in the corresponding box and click a cell somewhere to the right of the data.
- check the box for “summary statistics”.
- Click “OK”
- A list of statistics should appear.
Part IV. Confidence Intervals and Hypothesis Testing for a Population Mean
We can test a hypothesis about a population mean using a confidence interval or a t-test (assuming the true population standard deviation is not known). Excel will not do either of these automatically (it will do a CI assuming the population standard deviation is known, but we’re not interested in that for today), so, we have to write equations in excel to perform the necessary calculations.
Open the data set called “vacancy data” from the ECN 422 webpage.
We’re going to test the hypotheses that the mean vacancy rate is equal to 10 and that mean rent per square foot is equal to 19, both at the 95% confidence level.
- Calculate the critical value of t (needed for both the CI and the t-test) as follows: with your cursor in a blank cell, click on the “functions wizard” icon on the main toolbar (it looks like fx ). Under function category, choose “statistical” and under function name, choose “TINV”. For probability, enter 0.05 and enter the degrees of freedom for this problem.
- Now you are ready to calculate the CI or the t-statistic. Do both by writing the formulas in excel. You may wish to first create labels in adjacent cells (eg: t-stat = , CI upper = , CI lower =). For example, to calculate the lower bound for the 95% CI for the mean vacancy rate, you would type the following (without quotes): “=11.333–(2.045231*1.087371)”
Here is what my new sheet looks like:
Vacancy Rent Rent Tcrit 2. Mean 11.33333 Mean 17. Standard Error 1.087371 Standard Error 0.482946 Vacancy LB 16. Median 12 Median 17.41 Vacancy UB 18. Mode 3 Mode #N/A Standard Deviation
5.955776 Standard Deviation
2.645206 we are 95% confident that the true rent is between 16.07 and 18.05 -> since the hypothesized value (19) is not within this CI, we can reject the null hypothesis
Sample Variance 35.47126 Sample Variance 6. Kurtosis -1.14126 Kurtosis -0. Skewness -0.21612 Skewness -0. Range 18 Range 10.72 Tstat -4. Minimum 2 Minimum 10. Maximum 20 Maximum 21.62 since abs value Tstat > abs value Tcrit, Sum 340 Sum 511.73 we accept the null hypothesis Count 30 Count 30
Vacancy Tcrit 2.
Vacancy LB 9. Vacancy UB 13.
we are 95% confident that the true mean vacancy rate is between 9.11 and 13.56 -> since the hypothesized value (10) is within this CI, we can accept the null hypothesis
Tstat 1.
since abs value of Tstat < abs value Tcrit, we accept the null hypothesis
Part V: Scatter Plots and Correlation
Set-up: Open the data for this exercise Open the data set called “baseball data” from the 422 web page.
- Insert 2 new worksheets and re-save the file as an Excel workbook with a new name on a different disk.
- Label the new sheets “scatter” and “correlation”.
A. Constructing a scatter plot for two variables:
- Copy all data into the scatter sheet, then delete columns F→ L. We will not be using these variables for this lab.
- Copy and paste the “ATTEND” column to the right of each of the other variables. You will have to insert blank columns first (highlight the column where you want to put a new blank column, and click “insert” then “column”). The new data set up should look like this:
POP ATTEND CAPACITY ATTEND PRIORWIN ATTEND CURNTWIN ATTEND TEAMS ATTEND 4353 2704.794 52.806 2704.794 92 2704.794 104 2704.794 3 2704. 2803 2110.009 43.737 2110.009 89 2110.009 89 2110.009 2 2110. 9120 1821.815 57.545 1821.815 91 1821.815 87 1821.815 7 1821. 2763 1661.618 33.583 1661.618 84 1661.618 86 1661.618 2 1661. 2174 2045.784 53.197 2045.784 98 2045.784 85 2045.784 1 2045. … … … … … ... … … … …
- On the main top toolbar, click the “chart wizard” icon (it looks like a small colorful bar graph).
- Choose XY (scatter), and select the option on the right with no connecting lines, then click “next”.
- With your cursor in the input range box, highlight the POP and ATTEND columns including labels, and select the option for “series in columns”. Click “next”.
- There are five tabs to choose from: “Title” allows you to write in a title for your graph and put labels on the axes (you should title it something like “POP vs. ATTEND”, and label the X axis POP and the Y axis ATTEND) “Axes” allows you to specify the ranges for the values on the axes, “Gridlines” allows you to have varying amounts of detail in gridlines, and “Legend” simply places the chart legend at various positions in the graph. → Play with each of these and familiarize yourself with the options available, then when you’re happy with what you have, click “next”.
- The next box simply asks you where you want to put the chart – either as a new sheet or embedded in an existing sheet. Select an option, and then click “finish”. The completed chart should appear.
- To change the size of the chart, simply click on its outside lines, and drag to a new size.
- You can still go back and re-format any component of the chart. To format the axes (scale, font or numbers), double click on the axis you want to format and a dialogue box will appear giving you options. Try changing the scale of one of your axes and see how the chart changes. To format the main plot area, double click in the main area. A different box with options will appear. To change the data points (marker pattern or size), double click on any data point and change the options. For example, the default marker is a triangle. Try changing this to a circle by double clicking a data point, the selecting the pattern tab. You can change style and size, then click “OK”.
Repeat the above procedure for ATTEND and each other variable, so that you have a total of 5 scatter plots. An important thing to note here is that we are assuming that attendance is dependent upon the other