






An overview of statistics, focusing on sampling methods, data types, describing data using graphs, and quantitative data descriptive statistics. It covers the concepts of populations and samples, parameters and statistics, different types of samples, and data types. The document also explains how to describe data using graphs for both categorical and quantitative data, and introduces measures of center and spread for quantitative data.







Franklin STUDY GUIDE FOR TEST 1!
- Statistics: the science of designing studies and analyzing the data that those studies produce. It is the science of learning from data.
Section 1.2: We learn about populations using samples
- population: the total set of subjects in which we are interested. EX: the entire voting public
- sample: a subset of the population for whom we have data. EX: 200 randomly selected voters
- subject: an entity that we measure in a study. EX: each voter in the sample
- parameter: a numerical value summarizing the population data. EX: percentage of voters for candidate A in the entire population
- statistic: a numerical value summarizing the sample data. EX: percentage of voters for candidate A in our sample (the 200 randomly selected voters)
*** Know the difference between a parameter and a statistic! ***
Notation: different symbols are used to differentiate between a value for a sample and the same value for a population
- population mean (parameter) = μ (mu)
- sample mean (statistic) = x̄ (x-bar)
- population proportion (parameter) = p
- sample proportion (statistic) = p̂ (p-hat)
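To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population of 10,000 voters, the 55% support figure, and the random seed are all made up for illustration; only the sample size of 200 comes from the example above.

```python
import random

# Hypothetical population: 10,000 voters, 1 = votes for candidate A, 0 = does not.
population = [1] * 5500 + [0] * 4500

# Parameter: summarizes the whole population (usually unknown in practice).
p = sum(population) / len(population)

# Sample: 200 randomly selected voters (fixed seed so the run is repeatable).
rng = random.Random(1)
sample = rng.sample(population, 200)

# Statistic: summarizes only the sample; it estimates p and changes from sample to sample.
p_hat = sum(sample) / len(sample)

print(f"parameter p     = {p:.3f}")
print(f"statistic p-hat = {p_hat:.3f}")
```

Changing the seed gives a different p-hat each time, while p stays fixed. That is the whole difference: the parameter describes the population, the statistic describes one sample.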
o Example: Selecting every other person on a list.
o Advantage: Easy to conduct, and the sample is spread evenly across the list.
Stratified Random Sample: Divide the population into groups (strata) and take a simple random sample from each group.
o Example: Freshmen, Sophomores, Juniors, Seniors
o Advantage: There will be enough subjects in each of the groups being compared.
Cluster Random Sample: Divide the population into clusters, take a simple random sample of the clusters, and use all the subjects in the chosen clusters.
o Example: clusters of different majors
o Advantage: There does not need to be a sampling frame of subjects.
Convenience Sampling: subjects are selected at the convenience of the researcher, with no pattern or attempt at accurate representation.
o Example: Choosing the first 50 people in line to go into a store.
o Advantage: Convenient and simple!
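The sampling methods above can be sketched in a few lines of Python. This is only an illustration: the list of 400 students, the group labels, and the sample sizes are hypothetical. The cluster step follows the usual textbook definition, which is to randomly choose whole clusters and keep every subject in them.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 400 students, each with a class year and a major.
years = ["Freshman", "Sophomore", "Junior", "Senior"]
majors = ["Biology", "History", "Math", "Music", "Physics"]
students = [
    {"id": i, "year": years[i % 4], "major": majors[i % 5]}
    for i in range(400)
]

# Systematic sample: every k-th subject on the list, from a random starting point.
k = 10
start = rng.randrange(k)
systematic = students[start::k]

# Stratified random sample: a simple random sample from EVERY stratum (class year).
stratified = []
for year in years:
    stratum = [s for s in students if s["year"] == year]
    stratified.extend(rng.sample(stratum, 10))

# Cluster random sample: a simple random sample OF clusters (majors),
# then every subject in the chosen clusters is included.
chosen_majors = rng.sample(majors, 2)
cluster = [s for s in students if s["major"] in chosen_majors]

# Convenience sample: whoever is easiest to reach, here the first 50 on the list.
convenience = students[:50]

print(len(systematic), len(stratified), len(cluster), len(convenience))
```

Note the difference between the stratified and cluster steps: stratified sampling takes some subjects from every group, while cluster sampling takes all subjects from some groups.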
- If the variable of interest can be summarized as a word or category, it is categorical.
o EX: a person’s eye color
- If the variable of interest can be summarized as a number, it is quantitative.
o EX: the oven temperature needed for a recipe
- A quantitative variable can be discrete or continuous.
o A quantitative variable is discrete if it can only take on a countable number of values, usually whole numbers. There can be no numbers in between. EX: the number of living grandparents a person has (you can’t have 2 and a half grandparents!)
o A quantitative variable is continuous if it can take any value in an interval, including decimals. EX: a person’s height
Proportions and Percentages
- Frequency: the number of observations in a category
- Frequency table: lists the number of observations for each category of data
- Relative Frequency: the proportion or percent of observations within a category
o = frequency / total number of observations (to make this a percentage, multiply by 100)
- EX: Proportion = .30, Percentage = 30%
Section 2.2: How to describe data using graphs (know the different types, what they look like, when to use them, etc.)
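Many graphs for categorical data are drawn from the same counts that go into a frequency table. The frequency and relative frequency definitions above can be sketched in Python as follows; the eye-color data are made up for illustration.

```python
from collections import Counter

# Hypothetical categorical data: eye color for 10 people.
eye_colors = ["brown"] * 6 + ["blue"] * 3 + ["green"] * 1

# Frequency table: number of observations in each category.
frequency = Counter(eye_colors)
total = sum(frequency.values())

# Relative frequency: frequency divided by the total number of observations.
relative = {color: count / total for color, count in frequency.items()}

for color, count in frequency.items():
    print(f"{color:6s} frequency={count}  proportion={relative[color]:.2f}  "
          f"percentage={relative[color] * 100:.0f}%")
```

The proportions always add up to 1 (and the percentages to 100%), which is a quick check that no category was missed.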
Dot Plot: The horizontal axis shows the values that occur in the data, and above each value is one dot for each time that value appears in the data set.
Histogram: The quantitative equivalent of a bar graph. The “categories” on the horizontal axis are intervals of numerical values.
Stem-and-Leaf Plot: The ones-place digit (farthest right digit) of each observation is placed to the right of a vertical bar. These are the “leaves”; the remaining digits, to the left of the bar, form the “stem”. Leaves must go in ascending order. (In the example plot, the first number on the stem-and-leaf plot represents 199.)
Shape of a Graph
Symmetrical/Normal: the two sides of the distribution about the middle are roughly mirror images.
Skewed left: the left “tail” is stretched out longer than the right tail. (I try to remember this by knowing that if most of the data is on the right side of the graph, it’s skewed left, so it’s the opposite of what you would expect.)
Skewed right: the right “tail” is stretched out longer than the left tail. (Again, the opposite of what you would expect. If most of the data is on the left side of the graph, it’s skewed right.)
Section 2.3: How do we describe the center of quantitative data?
Mean: the average of the data set (the sum of the observations divided by the number of observations)
Median: the value that occupies the middle position when the data are ranked in ascending order. It separates the top and bottom 50% of the data.
Mode: the value that occurs most often in the data set; the value with the highest frequency
(These can all be found using StatCrunch! After you enter your data: STAT > Summary Stats > Columns)
Also really helpful to know…
Outlier: a data point that falls far away from the other data points. Outliers can distort summaries of the data.
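As a rough illustration of the three measures of center, and of how a single outlier affects each of them, here is a minimal sketch using Python's standard `statistics` module. The data values are made up.

```python
import statistics

# Hypothetical data set, and the same data with one outlier added.
scores = [2, 3, 3, 4, 5]
with_outlier = scores + [50]

mean_before = statistics.mean(scores)
median_before = statistics.median(scores)
mode_before = statistics.mode(scores)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)
mode_after = statistics.mode(with_outlier)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before} -> {median_after}")
print(f"mode:   {mode_before} -> {mode_after}")
```

The mean is pulled far toward the outlier, while the median moves only slightly and the mode does not move at all.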
- The mean, range, and standard deviation are affected by outliers, so they are not resistant.
- The median and mode are not strongly affected by outliers, so they are resistant.
Section 2.4: How can we describe the spread of quantitative data?
Range: the difference between the largest and smallest observations (max - min).
Deviation from the mean: the difference between an observation x and the mean (x - x̄).
Sample Variance (s^2): the sum of all the squared deviations divided by n - 1.
o s^2 = sum of (x - x̄)^2 / (n - 1)
Standard Deviation (s): measures roughly the average distance of an observation in a distribution from the mean. It is the square root of the sample variance.
Z-score: measures the number of standard deviations that an observation falls from the mean.
o z = (observation - mean) / standard deviation
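The spread formulas above can be worked through step by step. The following Python sketch uses a small made-up data set and computes each quantity by hand, then checks the variance against the standard library's `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

data_range = max(data) - min(data)        # max - min
mean = sum(data) / n                      # x-bar
deviations = [x - mean for x in data]     # x - x-bar for each observation

# Sample variance: sum of squared deviations divided by n - 1.
variance = sum(d ** 2 for d in deviations) / (n - 1)

# Standard deviation: square root of the sample variance.
std_dev = math.sqrt(variance)

# Z-score of the largest observation: (observation - mean) / standard deviation.
z_max = (max(data) - mean) / std_dev

print(f"range={data_range}, mean={mean}, s^2={variance:.3f}, s={std_dev:.3f}, z={z_max:.2f}")

# Cross-check against the standard library.
assert math.isclose(variance, statistics.variance(data))
```

The deviations themselves always sum to zero, which is why they are squared before being added up.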
Identifying Potential Outliers:
- 1.5 x interquartile range (IQR) rule
o Find IQR = Q3 - Q1
o Find 1.5 x IQR
o Find lower boundary: Q1 - 1.5 x IQR
o Find upper boundary: Q3 + 1.5 x IQR
o If an observation is less than the lower boundary or greater than the upper boundary, the observation is classified as a potential outlier.
Chapter 3 - Association: Contingency, Correlation, and Regression
Response Variable: a variable that can be explained by, or is determined by, another variable. This is the y-variable, the variable that goes on the vertical axis of a graph.
Explanatory Variable: explains, or affects, the response variable. This is the x-variable, the variable that goes on the horizontal axis of a graph.
Association: an association exists between 2 variables if a particular value for one variable is more likely to occur with certain values of the other variable.
Lurking Variable: a variable that is related to the response or explanatory variable, but is not one of the variables being studied.
Section 3.2: How can we explore the association between two quantitative variables?
Scatterplot: a graphical display for 2 quantitative variables. The explanatory variable is on the horizontal axis and the response variable is on the vertical axis.
- Positive association: as X increases, Y increases
- Negative association: as X increases, Y decreases
- No association: as X increases, there is no definite shift in the values of Y
We can calculate the correlation to measure how strong the linear relationship between the variables is.
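Going back to the 1.5 x IQR rule for potential outliers in Section 2.4, the steps can be sketched in Python as below. Two things here are assumptions: the data are made up, and the quartiles are found as the medians of the lower and upper halves of the sorted data, which is one common textbook convention (software such as StatCrunch may use a slightly different quartile rule).

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least 2 values; for an odd count the overall median
    is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return statistics.median(lower_half), statistics.median(upper_half)

# Hypothetical data set with one suspiciously large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

q1, q3 = quartiles(data)
iqr = q3 - q1
lower_boundary = q1 - 1.5 * iqr
upper_boundary = q3 + 1.5 * iqr   # note the PLUS for the upper boundary

potential_outliers = [x for x in data if x < lower_boundary or x > upper_boundary]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"boundaries: {lower_boundary} to {upper_boundary}")
print(f"potential outliers: {potential_outliers}")
```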
Linear correlation: when the data tend to follow a straight-line path; it can be positive or negative, and strong or weak.
No correlation: as X increases, there is no definite shift in the values of Y.
Correlation (r): the numerical measure of the strength and direction of the linear relationship between X and Y. r always falls between -1 and 1.
This can be done with StatCrunch too! After entering your data: STAT > Summary Stats > Correlation
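For anyone curious what the software is doing, here is a minimal Python sketch of the correlation r, computed as the sum of the products of the z-scores of X and Y divided by n - 1. The function name `correlation` and the data values are made up for illustration.

```python
import math

def correlation(xs, ys):
    """Correlation r: sum of (z-score of x) * (z-score of y), divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (dividing by n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    z_products = [
        ((x - mean_x) / s_x) * ((y - mean_y) / s_y)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)

# Hypothetical data: Y tends to increase as X increases (positive association).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(f"r = {r:.3f}")

# A perfectly straight increasing line gives r = 1; a decreasing one gives r = -1.
r_up = correlation(x, [2 * v + 1 for v in x])
r_down = correlation(x, [-3 * v for v in x])
print(f"r_up = {r_up:.3f}, r_down = {r_down:.3f}")
```

A value of r near 0 means there is no linear relationship, although the variables could still be related in some curved way.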
Section 3.1: How can we explore the association between 2 categorical variables?
Contingency (2-way) Table: a table that relates 2 categorical variables.
Conditional Proportion: the proportion for a value of one variable, given a specific value of the other variable.
Relative Risk: the ratio of the conditional proportions of an outcome for two groups; tells us how many times more likely the outcome is for one group than for the other.
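A small worked example may help with conditional proportions and relative risk. The Python sketch below uses a made-up 2-way table; the group names, outcome labels, and counts are all hypothetical.

```python
# Hypothetical contingency (2-way) table: rows are groups, columns are outcomes.
table = {
    "group A": {"yes": 30, "no": 70},
    "group B": {"yes": 10, "no": 90},
}

def conditional_proportion(table, group, outcome):
    """Proportion with the given outcome, GIVEN membership in the group."""
    row = table[group]
    return row[outcome] / sum(row.values())

p_yes_given_a = conditional_proportion(table, "group A", "yes")
p_yes_given_b = conditional_proportion(table, "group B", "yes")

# Relative risk: ratio of the two conditional proportions.
relative_risk = p_yes_given_a / p_yes_given_b

print(f"P(yes | group A) = {p_yes_given_a:.2f}")
print(f"P(yes | group B) = {p_yes_given_b:.2f}")
print(f"relative risk    = {relative_risk:.1f}")
```

With these counts the outcome is about 3 times as likely in group A as in group B. If the two conditional proportions were equal, the relative risk would be 1 and the table would show no association.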