Using Pivot Tables for Data Organization-Data Warehouse-Lab Mannual, Lecture notes of Data Warehousing

Topics include in this course are Data Warehousing Concepts, Design and Development, Extraction, Transformation and Loading, OLAP Technology, Data Mining Techniques: Classification, Clustering and Decision Tree, Advanced Topics. This lecture handout includes: Using, Pivot, Tables, Data, Organization, Analysis, Excel, Summary, Aggregates, Spreadsheet, Status

Typology: Lecture notes

2011/2012

Uploaded on 08/08/2012

sharib_sweet
sharib_sweet 🇮🇳

4.2

(50)

102 documents

1 / 11

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1-1
DB 430- Data Warehousing and Data Mining
DB 430 Data Warehousing and Data Mining
Lab # 1
Using Pivot Tables for Data Organization
Objectives
Introduction to Pivot Tables
Data Organization and Analysis Using Pivot Tables
docsity.com
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Using Pivot Tables for Data Organization-Data Warehouse-Lab Mannual and more Lecture notes Data Warehousing in PDF only on Docsity!

DB 430- Data Warehousing and Data Mining

DB 430 – Data Warehousing and Data Mining

Lab # 1

Using Pivot Tables for Data Organization

Objectives

 Introduction to Pivot Tables

 Data Organization and Analysis Using Pivot Tables

1-2 Pivot Tables in MS Excel

DB 430- Data Warehousing and Data Mining

Use a Pivot Table in Excel 2007 for Data Organization

Pivot tables are an Excel feature that you should learn how to use. Instead of analyzing rows upon rows of records, a pivot table can aggregate your data and show a new perspective and few clicks. You can also move columns to rows or vice versa.

What is a Pivot Table?

You might think of a pivot table as a user created summary table of your original spreadsheet. You create the table by defining which fields to view and how the data should be displayed. Based on your field selections, Excel aggregates and organizes the data so you see a different view of your data.

As example, a sample spreadsheet of 4000 fictitious voters is provided to you in this lab, which includes the following data fields:

 Voter ID  Party Affiliation  Their precinct  Age group  When they last voted  Years they’ve been registered  Ballot status

1-4 Pivot Tables in MS Excel

DB 430- Data Warehousing and Data Mining

A pivot tables allow you to group the spreadsheet or external data source by any of your data fields. The thumbnail below shows a count of voters by party by precinct.

Using a pivot table, you can continue to slice the data by selecting additional fields from the PivotTable Field List. For example, we can take the same data and segment by voter age group.

DB 430- Data Warehousing and Data Mining

Understanding the Pivot Table Structure

In the thumbnail above, the main areas of the pivot table are labeled.

  1. PivotTable Field List – this section in the top right displays the fields in your spreadsheet. You may check a field or drag it to a quadrant in the lower portion.
  2. The lower right quadrants - this area defines where and how the data shows on your pivot table. You can have a field show in either a column or row. You may also indicate if the data should be counted, summed, averaged, filtered and so on.
  3. The red outlined area to the left is the result of your selections from (1) and (2). You’ll see that the only difference we made in the last pivot table was to drag the AGE GROUP field underneath the PRECINCT field in the Row Labels quadrant.

Example -1: How to Create a Pivot Table

Note: Excel sheet for this example will be provided to you by instructor.

There are several ways to build a pivot table. Excel has logic that knows the field type and will try to place it in the correct row or column if you check the box. For example, data that is numeric such as Precinct counts tends to appear to the right in columns. Data, which is textual, such as Party would appear in rows.

While you can simply check fields to display and let Excel build your pivot table, one way is to use the “drag and drop” method. This is partly because we want to visualize our data in columns and rows. It may also be easier if you have fields, which can appear to be numbers like a precinct value.

  1. Open your original spreadsheet and remove any blank rows or columns.
  2. Make sure each column has a heading, as it will be carried over to the Field List.
  3. Make sure your cells are properly formatted for their data type.
  4. Highlight your data range
  5. Click the Insert tab.
  6. Select the PivotTable button from the Tables group.
  7. Select PivotTable from the list.

DB 430- Data Warehousing and Data Mining

  1. Double-check your Table/Range: value.
  2. Select the radio button for New Worksheet.
  3. Click OK.

A new worksheet opens with a blank pivot table. You’ll see that the fields from our source spreadsheet were carried over to the PivotTable Field List.

  1. Drag an item such as PRECINCT from the PivotTable Field List down to the Row Labels quadrant. The left side of your Excel spreadsheet should show a row for each precinct value. You should also see a checkmark appear next to PRECINCT.

1-8 Pivot Tables in MS Excel

DB 430- Data Warehousing and Data Mining

  1. The next step is to ask what you would like to know about each precinct. Drag the PARTY field from the PivotTable Field List to the Column Labels quadrant. This will provide an additional column for each party. Note that you won’t see any numerical data.
  2. To see the count for each party, we need to drag the same field to the Values quadrant. In this case, Excel determines we want a Count of PARTY. We could double-click the entry and choose another Field Setting. Excel has also added Grand Totals.

1-10 Pivot Tables in MS Excel

DB 430- Data Warehousing and Data Mining

sort by the month first. You might prefer to move the data so the election dates are in a chronological order.

Prefer not adding fields to a pivot table. It’s easier to add the fields first to your source spreadsheet. The reason is you might get items out of sync if you move data unless you make them a calculated field.

Pivot tables may not make the election data exciting, but it can make the analysis process easier. Without these tables, you’d probably spend more time filtering, sorting and subtotaling. The other benefit is that it’s easy to start over by deselecting fields or moving them to another location. Feel free to download the tutorial spreadsheet below and play with the data. This will probably be the only time you’re allowed to manipulate election data.

DB 430- Data Warehousing and Data Mining

Exercise Problems

Problem 1: Construct a pivot table with CardiologyCategorical.xls database file. Make angina and thal row attributes and class a column attribute. Place class, angina , and thal in the data area. Specify slope , sex , and #colored vessels as page variables. Use the pivot table to answer the following questions:

  1. How many healthy males are in the database?
  2. How many healthy females have three or more colored vessels?
  3. Determine values for #colored vessels and angina that are sufficient for defining sick individual.
  4. Verify or refute the hypothesis: The majority of individuals with #colored vessels = 0 are healthy.
  5. Verify or refute the hypothesis: A typical healthy individual will show no symptoms of angina and will have a value of normal for attribute thal.

Problem 2: Construct a pivot table with CreditCardPromotion.xls database file to answer the following questions:

  1. How many cardholders did not purchase a single promotional offering?
  2. How many cardholders took advantage of the magazine and watch promotions but did not purchase the life insurance promotion?
  3. How many male cardholders make between $50000 and $60000?
  4. Verify or refute the hypothesis: Individuals who purchased all three promotional offerings also purchased credit card insurance.