Customer Demographics & Credit Balances: Exploratory Data Analysis for AJ Davis, Exams of Mathematics

A detailed guide for a statistics project on exploratory data analysis for a department store chain named AJ Davis. The project analyzes five variables for a sample of 50 credit customers: location, income, household size, years lived in the current location, and credit balance. The goal is to process, organize, present, and summarize the data using MINITAB, and to analyze the relationships between the variables. The project, part of Statistics MATH 533 (2024), is worth 100 points and is due by the end of Week 2.

Typology: Exams

2023/2024

Available from 04/26/2024

willis-william-1
Introduction
PROJECT PART A: Exploratory Data Analysis
Course Project: AJ DAVIS DEPARTMENT STORES
AJ DAVIS is a department store chain, which has many credit customers and wants
to find out more information about these customers. A sample of 50 credit
customers is selected with data collected on the following five variables.
1. Location (rural, urban, suburban)
2. Income (in $1,000's—be careful with this)
3. Size (household size, meaning number of people living in the household)
4. Years (the number of years that the customer has lived in the current location)
5. Credit balance (the customer's current credit card balance on the store's credit
card, in $).
The data is available in Doc Sharing Course Project Data Set as an Excel file. You
are to copy and paste the data set into a MINITAB worksheet.
Open the file MATH533 Project Consumer.xls from the Course Project Data
Set folder in Doc Sharing.
For each of the five variables, process, organize, present, and summarize the
data. Analyze each variable by itself using graphical and numerical
techniques of summarization. Use MINITAB as much as possible, explaining
what the printout tells you. You may wish to use some of the following
graphs: stem-leaf diagram, frequency or relative frequency table, histogram,
boxplot, dotplot, pie chart, bar graph. Caution: Not all of these are
appropriate for each of these variables, nor are they all necessary. More is
not necessarily better. In addition, be sure to find the appropriate measures
of central tendency and measures of dispersion for the above data. Where
appropriate use the five number summary (the Min, Q1, Median, Q3, Max).
Once again, use MINITAB as appropriate, and explain what the results mean.
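As a cross-check on the MINITAB output, the numerical summaries for a single quantitative variable can be sketched in a few lines of Python. This is only an illustration: the `sizes` list is hypothetical and is not the AJ Davis data, and `five_number_summary` is a helper name introduced here. Quartile conventions vary between packages, so the quartiles may differ slightly from what MINITAB prints.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(data, n=4)
    return data[0], q1, median, q3, data[-1]


# Hypothetical household sizes, NOT the AJ Davis sample.
sizes = [2, 3, 1, 4, 2, 5, 3, 2, 6, 4, 3]

print("five-number summary:", five_number_summary(sizes))
print("mean:", statistics.mean(sizes))           # measure of central tendency
print("sample std dev:", statistics.stdev(sizes))  # measure of dispersion
```

The five-number summary suits the quantitative variables (income, size, years, credit balance); for location, which is categorical, a frequency table is the more natural summary.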
Analyze the connections or relationships between the variables. There are
10 pairings here (location and income, location and size, location and years,
location and credit balance, income and size, income and years, income and
balance, size and years, size and credit balance, years and Credit Balance).
Use graphical as well as numerical summary measures. Explain what you
see. Be sure to consider all 10 pairings. Some variables show clear
relationships, while others do not.
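The count of 10 pairings is simply the number of ways to choose 2 of the 5 variables. The sketch below enumerates them and computes a Pearson correlation for one quantitative pair. The `income` and `balance` values are hypothetical, not the AJ Davis data, and `pearson_r` is a helper written for this illustration. Correlation only applies to pairs of quantitative variables; pairings that involve location are better compared with grouped summaries or side-by-side boxplots.

```python
from itertools import combinations
from math import sqrt

variables = ["Location", "Income", "Size", "Years", "Credit Balance"]

# Every unordered pair of the five variables.
pairings = list(combinations(variables, 2))
print(len(pairings))  # 10


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values: income in $1,000s, credit balance in $.
income = [30, 45, 52, 61, 38]
balance = [3100, 3900, 4300, 5000, 3500]
print(round(pearson_r(income, balance), 3))
```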
Prepare your report in Microsoft Word (or some other word processing
package), integrating your graphs and tables with text explanations and interpretations.
Be sure that you have graphical and numerical back up for your explanations
and interpretations. Be selective in what you include in the report. I'm not
looking for a 20-page report on every variable and every possible relationship
(that's 15 things to do). Rather, what I want you to do is to highlight what you

see for three individual variables (no more than one graph for each, one or two measures of central tendency and variability (as appropriate), and two or three sentences of interpretation). For the 10 pairings, identify and report only on three of the pairings, again using graphical and numerical summary (as appropriate), with

Total 100 100 A quality paper will meet or exceed all of the above requirements.

Project Part B: Hypothesis Testing and Confidence Intervals
Your manager has speculated the following.
  a. The average (mean) annual income was greater than $45,000.
  b. The true population proportion of customers who live in a suburban area is less than 45%.
  c. The average (mean) number of years lived in the current home is greater than 8 years.
  d. The average (mean) credit balance for rural customers is less than $3,200.
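To make the structure of these tests concrete, here is a minimal sketch for speculation b, a lower-tailed claim about a proportion (H0: p = 0.45 against Ha: p < 0.45). It uses the large-sample z test and a standard large-sample confidence interval. The count of 15 suburban customers out of 50 is hypothetical, not taken from the AJ Davis data, and both function names are introduced here for illustration. The speculations about means (a, c, d) follow the same pattern with a t statistic, which MINITAB reports directly.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test_lower(successes, n, p0):
    """Lower-tailed one-sample z test for a proportion (Ha: p < p0)."""
    p_hat = successes / n
    se_null = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / se_null
    p_value = NormalDist().cdf(z)      # area to the left of z
    return p_hat, z, p_value


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval for a proportion."""
    p_hat = successes / n
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical count: 15 of 50 sampled customers are suburban.
suburban_count, n = 15, 50
alpha = 0.05

p_hat, z, p_value = proportion_z_test_lower(suburban_count, n, p0=0.45)
low, high = proportion_ci(suburban_count, n)

print(f"p_hat = {p_hat:.2f}, z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

A p-value below α means the sample gives evidence for the manager's claim; otherwise the claim is not supported, which is different from being disproved.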

  1. Using the sample data, perform the hypothesis test for each of the above situations in order to see if there is evidence to support your manager’s belief in each case A–D. In each case, use the Seven Elements of a Test of Hypothesis in Section 6.2 of your textbook with α = .05, and explain your conclusion in simple terms. Also, be sure to compute the p-value and interpret it.
  2. Follow this up with computing 95% confidence intervals for each of the variables described in A–D, and again interpreting these intervals.
  3. Write a report to your manager about the results, distilling down the results in a way that would be understandable to someone who does not know statistics. Clear explanations and interpretations are critical.
  4. All DeVry University policies are in effect, including the plagiarism policy.
  5. Project Part B report is due by the end of Week 6.
  6. Project Part B is worth 100 total points. See the grading rubric below.

Submission: The report from Part 3 and all of the relevant work done in the hypothesis testing (including MINITAB) in Part 1 and the confidence intervals (MINITAB) in Part 2 as an appendix.

Format for report:
A. Summary report (about one paragraph on each of the speculations, A–D)
B. Appendix with all of the steps in hypothesis testing (the format of the Seven Elements of a Test of Hypothesis, in Section 6.2 of your textbook) for each speculation A–D, as well as the confidence intervals, including all MINITAB output

Project Part B: Grading Rubric
Category                                       Points   %     Description
Addressing each speculation (20 points each)                  hypothesis test, interpretation, confidence interval, and interpretation
Summary report                                 20       20    one paragraph on each of the speculations
Total                                          100      100   A quality paper will meet or exceed all of the above requirements.

Project Part C: Regression and Correlation Analysis
Using MINITAB, perform the regression and correlation analysis for the data on Income (Y), the dependent variable, and Credit Balance (X), the independent variable, by answering the following.

  1. Generate a scatterplot for income ($1,000) versus credit balance ($), including the graph of the best fit line. Interpret.
  2. Determine the equation of the best fit line, which describes the
  4. Determine the coefficient of determination. Interpret.
  5. Test the utility of this regression model (use a two-tailed test with α = .05). Interpret your results, including the p-value.
  6. Based on your findings in 1–5, what is your opinion about using credit balance to predict income? Explain.
  7. Compute the 95% confidence interval for beta-1 (the population slope). Interpret this interval.
  8. Using an interval, estimate the average income for customers that have a credit balance of $4,000. Interpret this interval.
  9. Using an interval, predict the income for a customer that has a credit balance of $4,000. Interpret this interval.
  10. What can we say about the income for a customer that has a credit balance of $10,000? Explain your answer.
To try to improve the model, we fit a multiple regression model predicting income based on credit balance, years, and size.
  11. Using MINITAB, run the multiple regression analysis using the variables Credit Balance, Years, and Size to predict Income. State the equation for this multiple regression model.
  12. Perform the Global Test for Utility (F-Test). Explain your conclusion.
  13. Perform the t-test on each independent variable. Explain your conclusions and clearly state how you should proceed. In particular, state which independent variables we should keep and which we should discard.
  14. Is this multiple regression model better than the linear model that we generated in parts 1–10? Explain.
All DeVry University policies are in effect, including the plagiarism policy.
Project Part C report is due by the end of Week 7.
Project Part C is worth 100 total points. See the grading rubric below.
Summarize your results from 1–14 in a report that is 3 pages or less in length and explains and interprets the results in ways that are understandable to someone who does not know statistics.

Submission: The summary report + all of the work done in 1–14 (MINITAB Output + interpretations) as an appendix.

Format:
A. Summary Report
B. Points 1–14 addressed with appropriate output, graphs, and interpretations. Be sure to number each point 1–14.

Project Part C: Grading Rubric
Category                              Points   %     Description
Questions 1– and 14 (5 points each)                  addressed with appropriate output, graphs, and interpretations
Question 13                           15       15    addressed with appropriate output, graphs, and interpretations
Summary                               20       20    writing, grammar, clarity, logic, and cohesiveness
Total                                 100      100   A quality paper will meet or exceed all of the above requirements.
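For the simple regression in Part C, the quantities MINITAB reports (best fit line and coefficient of determination) come from the usual least-squares formulas. The sketch below is one way to reproduce them by hand. The `balance` and `income` values are hypothetical, not the AJ Davis data, and `fit_line` is a helper name introduced here.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variation in y explained by x
    return intercept, slope, r_squared


# Hypothetical values: credit balance in $ (X), income in $1,000s (Y).
balance = [3100, 3500, 3900, 4300, 5000]
income = [30, 38, 45, 52, 61]

b0, b1, r2 = fit_line(balance, income)
print(f"income = {b0:.3f} + {b1:.5f} * balance, R^2 = {r2:.3f}")

# Point estimate of income at a credit balance of $4,000.
print(f"fitted income at $4,000: {b0 + b1 * 4000:.2f}")
```

The confidence and prediction intervals asked for in the assignment add a margin around this point estimate, and they are easiest to take from the MINITAB regression output.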