Statistics Review: Sampling, Data Types, Graphs, and Descriptive Statistics, Study notes of Statistics

An overview of statistics, focusing on sampling methods, data types, describing data using graphs, and quantitative data descriptive statistics. It covers the concepts of populations and samples, parameters and statistics, different types of samples, and data types. The document also explains how to describe data using graphs for both categorical and quantitative data, and introduces measures of center and spread for quantitative data.

Typology: Study notes

2010/2011

Uploaded on 10/23/2011

meg6492 🇺🇸

STAT 2000
Franklin
STUDY GUIDE FOR TEST 1!
* Check eLC for Review Questions w/ answers! Really helpful practice!
* Review session Sept. 12th from 6-8 in Fine Arts Building 300. (Check eLC for more details on
review sessions.) There will be a handout posted for this review session. Work the
problems!
* Read handout on eLC about what to do on Test Day.
What is statistics?
-The science of designing studies and analyzing the data that those studies produce. It is
the science of learning from data.
Section 1.2: We learn about populations using samples
-population : the total set of subjects in which we are interested. EX: the entire voting
public
-sample: a subset of the population for whom we have data. EX: 200 randomly selected
voters
-subject: entities that we measure in a study. EX: each voter in the sample
-parameter: a numerical value summarizing the population data. EX: percentage of
voters for candidate A in the entire population
-statistic: a numerical value summarizing the sample data. EX: percentage of voters
voting for candidate A in our sample (the 200 randomly selected voters)
* know the difference between a parameter and a statistic!
Notation: different symbols are used to distinguish a summary of a sample (statistic) from the
same summary of a population (parameter)
-population mean (parameter) = μ (mu)
-sample mean (statistic) = x̄ (x-bar)
-population proportion (parameter) = p
-sample proportion (statistic) = p̂ (p-hat)
* A statistic is descriptive if it summarizes the actual data in the sample.
* A statistic that makes a conclusion about the population is inferential.
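The parameter/statistic distinction above can be sketched in a few lines of Python. This is only an illustration: the "population" of voter ages is made up, and the names `population`, `sample`, `mu`, `x_bar`, `p`, and `p_hat` are chosen here to match the notation in the notes.

```python
import random
from statistics import mean

# Hypothetical population: ages of 1,000 voters (made-up numbers for illustration).
rng = random.Random(0)
population = [rng.randint(18, 90) for _ in range(1000)]

# Parameter: a number that summarizes the whole population (here the mean, mu).
mu = mean(population)

# Sample: a simple random sample of 200 subjects drawn from the population.
sample = rng.sample(population, 200)

# Statistic: the same kind of summary, computed from the sample only (x-bar).
x_bar = mean(sample)

# Proportions work the same way: p for the population, p-hat for the sample.
p = sum(age >= 65 for age in population) / len(population)
p_hat = sum(age >= 65 for age in sample) / len(sample)

print(f"mu = {mu:.2f}, x-bar = {x_bar:.2f}")
print(f"p = {p:.3f}, p-hat = {p_hat:.3f}")
```

Printing x-bar on its own is a descriptive use of the statistic. Using x-bar to say something about mu is an inferential use.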
Different Types of Samples: (be able to identify the type of sample used!)
Simple Random Sample (SRS): each possible sample of size n is equally likely to be
chosen.
o Example: Writing the names of students on pieces of paper, putting them into
a hat, and drawing names
o Advantage: Tends to be a good reflection of the population.
Systematic Sample: using a sampling frame or list, generate a starting point at
random and then select every kth (every other, every 3rd, etc.) subject on the list.


o Example: Selecting every other person on a list.
o Advantage: Easy to conduct and gives a more evenly spread sample.
Stratified Random Sample: divide the population into groups (strata) and take a simple random sample from each group.
o Example: Freshmen, Sophomores, Juniors, Seniors
o Advantage: There will be enough subjects in each of the groups being compared.
Cluster Random Sample: divide the population into clusters, take a simple random sample of the clusters, and use the subjects in the selected clusters as the sample.
o Example: clusters of different majors
o Advantage: No sampling frame of individual subjects is needed.
Convenience Sample: subjects are selected at the convenience of the researcher, with no pattern or attempt at accurate representation.
o Example: Choosing the first 50 people in line to go into a store.
o Advantage: Convenient and simple!
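Here is a minimal Python sketch of four of the sample types above (simple random, systematic, stratified, and convenience). The sampling frame of 40 students and every function name are made up for illustration. This is one possible way to code the ideas, not an official procedure from the course.

```python
import random

rng = random.Random(42)

# Hypothetical sampling frame: 40 students, 10 per class year (made-up data).
years = ["Freshman", "Sophomore", "Junior", "Senior"]
frame = [(f"student_{i:02d}", years[i // 10]) for i in range(40)]

def simple_random_sample(frame, n):
    # Every possible group of n subjects is equally likely to be chosen.
    return rng.sample(frame, n)

def systematic_sample(frame, k):
    # Random starting point among the first k subjects, then every k-th subject.
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(frame, n_per_stratum):
    # Group subjects by stratum (class year), then take an SRS inside each group.
    strata = {}
    for subject in frame:
        strata.setdefault(subject[1], []).append(subject)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen

def convenience_sample(frame, n):
    # Take whoever comes first on the list: easy, but not random.
    return frame[:n]

srs = simple_random_sample(frame, 8)
systematic = systematic_sample(frame, 4)
stratified = stratified_sample(frame, 2)
convenience = convenience_sample(frame, 8)

for name, chosen in [("SRS", srs), ("Systematic", systematic),
                     ("Stratified", stratified), ("Convenience", convenience)]:
    print(name, [subject[0] for subject in chosen])
```

Notice that the convenience sample here contains only freshmen, while the stratified sample always has 2 students from each class year.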

* Know how to use a random number table. First, pick a random starting point anywhere on the table. Then read left to right, assigning the random digits to your sample (you can read single digits, pairs of digits, etc.). Skip any number that repeats one you have already used!
Section 2.1: What are the types of data?
2 types of variables: (know the difference!)

- If the variable of interest can be summarized as a word or category, it is categorical.
o EX: a person’s eye color
- If the variable of interest can be summarized as a number, it is quantitative.
o EX: the oven temperature needed for a recipe
- A quantitative variable can be discrete or continuous.
o A quantitative variable is discrete if it can only take on a countable number of values, usually whole numbers. There can be no values in between.
EX: the number of living grandparents a person has (you can’t have 2 and a half grandparents!)
o A quantitative variable is continuous if it can take any value in an interval, with any number of decimal places.
EX: a person’s height
Proportions and Percentages
- Frequency: the number of occurrences
- Frequency table: lists the number of observations for each category of data
- Relative Frequency: the proportion or percent of observations within a category
o = frequency / total number of observations (to make this a percentage, multiply by 100)
o EX: proportion = .30, percentage = 30%
Section 2.2: How to describe data using graphs (know different types, what they look like, when to use them, etc.)

Dot Plot: The values of the variable run along the horizontal axis, and each observation is shown as a dot above its value, so the stack of dots shows how many times that value appeared in the data set.
Histogram: The quantitative equivalent of a bar graph. The “categories” on the horizontal axis are intervals of numerical values.
Stem-and-Leaf Plot: Each observation is split into a stem (all digits except the last) and a leaf (the ones-place, farthest right digit). The leaves are placed to the right of their stem, like the bars of a “vertical bar chart”, and must go in ascending order. (In the example plot, the first number on the stem-and-leaf plot represents 199.)
Shape of a Graph
Symmetrical/Normal: the two sides of the distribution are roughly mirror images of each other about the middle.
Skewed left: the left “tail” is stretched out longer than the right tail. (I try to remember this by knowing that if most of the data is on the right side of the graph, it’s skewed left, so it’s the opposite of what you would expect it to be.)
Skewed right: the right “tail” is stretched out longer than the left tail. (Again, opposite of what you would expect. If most of the data is on the left side of the graph, it’s skewed right.)
Section 2.3: How do we describe the center of quantitative data?
Mean: the average of the data set (the sum of the observations divided by the number of observations)
Median: the value that occupies the middle position when the data are ranked in ascending order. Separates the top and bottom 50% of the data.
Mode: the value that occurs most often in the data set; the highest frequency
(These can all be found using StatCrunch! After you enter your data: STAT > Summary Stats > Columns) cool stuff :)
Also really helpful to know…
Outlier: a data point that is ridiculously far away from the other data points. Outliers can mess with summary statistics.

- The mean, range, and standard deviation are affected by outliers, so they are not resistant.
- The median and mode are not strongly affected by outliers, so they are resistant.
Section 2.4: How can we describe the spread of quantitative data?
Range: the difference between the largest and smallest observations. (max. – min.)
Deviation from the mean: the difference between an observation x and the mean. (x – mean)
Sample Variance (s^2): the sum of the squared deviations divided by n-1, which is roughly the average squared deviation. (sum of all deviations squared / (n-1))
Standard Deviation (s): measures roughly the average distance of an observation in a distribution from the mean. (square root of the sample variance)
Z-score: measures the number of standard deviations that an observation falls from the mean. = (observation – mean) / standard deviation
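The measures of center and spread above can be worked through on a tiny data set. The quiz scores below are made up, and the sketch uses Python's built-in `statistics` module as a stand-in for StatCrunch. The last part adds one outlier to show why the mean is not resistant but the median is.

```python
from statistics import mean, median, mode, variance

# Hypothetical quiz scores (made-up data for illustration).
scores = [70, 75, 75, 80, 85]

x_bar = mean(scores)                      # 77
med = median(scores)                      # 75
most_common = mode(scores)                # 75
data_range = max(scores) - min(scores)    # 15

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
deviations = [x - x_bar for x in scores]
s2 = sum(d ** 2 for d in deviations) / (len(scores) - 1)   # 130 / 4 = 32.5

# Standard deviation: square root of the sample variance.
s = s2 ** 0.5

# Z-score of the highest score: (observation - mean) / standard deviation.
z_85 = (85 - x_bar) / s

print(f"mean={x_bar}, median={med}, mode={most_common}, range={data_range}")
print(f"s^2={s2}, s={s:.2f}, z(85)={z_85:.2f}")

# Add one outlier: the mean jumps, the median barely moves.
with_outlier = scores + [200]
print(f"with outlier: mean={mean(with_outlier)}, median={median(with_outlier)}")
```

With the outlier of 200 added, the mean goes from 77 to 97.5, while the median only goes from 75 to 77.5.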

Identifying Potential Outliers:

- 1.5 x interquartile range (IQR)
o Find IQR = (Q3 – Q1)
o Find 1.5 x IQR
o Find lower boundary: Q1 – 1.5 x IQR
o Find upper boundary: Q3 + 1.5 x IQR
o If an observation is less than the lower boundary or greater than the upper boundary, the observation is classified as a potential outlier.
Chapter 3 – Association: Contingency, Correlation, and Regression
Response Variable: a variable that can be explained by, or is determined by, another variable. This is the y-variable, the variable that goes on the vertical axis of a graph.
Explanatory Variable: explains, or affects, the response variable. This is the x-variable, the variable that goes on the horizontal axis of a graph.
Association: an association exists between 2 variables if a particular value of one variable is more likely to occur with certain values of the other variable.
Lurking Variable: a variable that is related to the response or explanatory variable and can influence their association, but is not one of the variables being studied.
Section 3.2: How can we explore the association between two quantitative variables?
Scatterplot: a graphical display for 2 quantitative variables. The explanatory variable is on the horizontal axis and the response variable is on the vertical axis.
Positive association: as X increases, Y tends to increase
Negative association: as X increases, Y tends to decrease
No association: as X increases, there is no definite shift in the values of Y
We can calculate the correlation to determine if there is a linear relationship between the variables.

Linear correlation: when the data tend to follow a straight-line path; can be positive or negative, and strong or weak
No correlation: as X increases, there is no definite shift in the values of Y
Correlation (r): the numerical measure of the direction and strength of the linear relationship between X and Y. r is always between -1 and 1; values near -1 or 1 mean a strong linear relationship, and values near 0 mean a weak one.
This can be done with StatCrunch too :) After entering your data: STAT > Summary Stats > Correlation
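To see what a correlation calculation does behind the scenes, here is a small Python sketch. The notes do not give a formula for r, so this assumes the standard textbook one: multiply the z-scores of X and Y for each point, add them up, and divide by n - 1. The data sets and the function name `correlation` are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    # r = sum of (z-score of x) * (z-score of y), divided by n - 1.
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical data: hours studied (explanatory) and three possible responses.
hours = [1, 2, 3, 4, 5]
score_up = [52, 60, 71, 80, 87]          # tends to rise with hours
score_down = [9, 7, 6, 4, 1]             # tends to fall with hours
perfect_line = [2 * h + 1 for h in hours]  # exactly on a straight line

r_up = correlation(hours, score_up)
r_down = correlation(hours, score_down)
r_line = correlation(hours, perfect_line)

print(f"positive association: r = {r_up:.3f}")
print(f"negative association: r = {r_down:.3f}")
print(f"perfect straight line: r = {r_line:.3f}")
```

The sign of r matches the direction of the association, and points that fall exactly on an upward-sloping line give r = 1 (up to rounding).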

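Section 3.1: How can we explore the association between 2 categorical variables?
Contingency (2-way) Table: a table that relates 2 categorical variables. The rows list the categories of one variable, the columns list the categories of the other, and each cell gives the number of observations with that combination.
Conditional Proportion: the proportion for a value of one variable, given a specific value of the other variable.
Relative Risk: the ratio of two conditional proportions (the proportion with the outcome in one group divided by the proportion with the outcome in the other group); tells us how many times more likely the outcome is for one group than the other.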
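A short Python sketch of a contingency table, conditional proportions, and relative risk. The counts, group names, and the helper `conditional_proportion` are all made up for illustration. Relative risk is computed here as a ratio of two conditional proportions.

```python
# Hypothetical 2-way table (made-up counts): rows = group, columns = outcome.
table = {
    "treatment": {"sick": 10, "healthy": 90},
    "placebo":   {"sick": 25, "healthy": 75},
}

def conditional_proportion(table, group, outcome):
    # Proportion with this outcome, GIVEN that the subject is in this group:
    # cell count divided by the row total.
    row = table[group]
    return row[outcome] / sum(row.values())

p_sick_treatment = conditional_proportion(table, "treatment", "sick")  # 10 / 100
p_sick_placebo = conditional_proportion(table, "placebo", "sick")      # 25 / 100

# Relative risk: ratio of the two conditional proportions.
relative_risk = p_sick_placebo / p_sick_treatment

print(f"P(sick | treatment) = {p_sick_treatment:.2f}")
print(f"P(sick | placebo)   = {p_sick_placebo:.2f}")
print(f"relative risk (placebo vs. treatment) = {relative_risk:.2f}")
```

In this made-up table, getting sick is 2.5 times as likely in the placebo group as in the treatment group.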