AP Statistics unit 1, Study notes of Mathematics

AP Statistics unit 1 notes on variables

Typology: Study notes

2025/2026

Uploaded on 02/28/2026

shreya-chatterjee-1

Unit 1- Exploring Data
1.01 - Classifying Variables
Data set includes info about individuals (doesn’t always have to be a person)
Variables - characteristics evaluated or collected
E.g. data set about pounds of meat consumed by each tiger in a zoo on a daily basis - individual
is tiger and variable is meat consumption in pounds
Categorical vs. quantitative variables - quality vs quantity
Numbers don’t always mean quantities (e.g. area codes are numbers, but they label categories, so they are categorical rather than quantitative)
Categorical variable
Gender
Zip code
Free-time activities
Quantitative variable
Age
Household income
Weight of a box of food
Displaying data:
Categorical: usually pie charts and bar graphs
Too many categories make these displays difficult to read and therefore ineffective
two-way/contingency tables - display categorical data when more than 1 categorical
variable is collected
Marginal distributions - the distribution of one variable on its own, found by putting each row total or column total over the grand total
The totals in the bottom row and rightmost column are the marginal totals
Conditional distributions - the distribution of one variable among only the individuals in one category of the other variable - calculated by
putting the identified count over the total for that category (e.g. not concerned at all females = 11/255
= 4.3%)
Percentage of a column/row
By using conditional distributions and making percents, you can compare groups of different sizes
(e.g. the number of female participants isn’t equal to the number of male participants, but most of
both groups are somewhat concerned)
Association - statistical relationship between values of two variables
Association isn’t the same as correlation (correlation describes only the linear relationship between two quantitative variables)
When the distribution of one variable is the same for the categories of another variable, the
variables are independent and there is no association.
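
The marginal and conditional calculations above can be sketched in Python. This is only an illustration: the cell count of 11 and the female total of 255 come from the notes, while every other count, the male column, and the category names are invented to make a complete table.

```python
# Hypothetical two-way table of counts: rows are concern levels, columns are groups.
# Only the 11 and the female total of 255 come from the notes; the rest is invented.
table = {
    "not concerned at all": {"female": 11, "male": 20},
    "somewhat concerned": {"female": 150, "male": 110},
    "very concerned": {"female": 94, "male": 70},
}
groups = ["female", "male"]

# Marginal totals: column totals, row totals, and the grand total.
column_totals = {g: sum(row[g] for row in table.values()) for g in groups}
row_totals = {level: sum(row.values()) for level, row in table.items()}
grand_total = sum(column_totals.values())

# Marginal distribution of concern level: each row total over the grand total.
marginal = {level: total / grand_total for level, total in row_totals.items()}

# Conditional distribution of concern level within each group:
# each cell over its own column total.
conditional = {
    g: {level: row[g] / column_totals[g] for level, row in table.items()}
    for g in groups
}

print(f"{conditional['female']['not concerned at all']:.1%}")  # 4.3%
for g in groups:
    print(g, {level: f"{p:.1%}" for level, p in conditional[g].items()})
```

If the conditional percents printed for the two groups were identical, the table would show no association between group and concern level.
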
Graphs:

  • Bar graph (one categorical var and one group)
  • Segmented bar graph (one categorical var and two or more groups)
    • Mosaic plot - to reflect group size

1.02 - Describing Data

SOCS

When working with quantitative data, you will be asked to make observations and analyze information from a data set. A simple mnemonic that you can use to look strategically at data is SOCS. Shape, Outliers, Center, and Spread are the key features you must be able to discuss both on your AP Exam and in the assessments for this course. If a set of data has a symmetric distribution, then use the mean for the measure of center and the standard deviation for spread. For a skewed distribution, use the median for center and the interquartile range, or IQR, for spread.

Data can be described by the number of peaks in a display. These peaks are possible modes. Data organized into a dotplot with one clear highest point are called unimodal. Data that have exactly two clear modes, shown by two peaks of similar size on the graph, are called bimodal. Multimodal data have multiple modes, shown by more than two peaks of similar size on a graph. Finally, uniform data don't appear to have any distinct modes; there are no clear peaks on the graph.

Measures of center and spread are ways that data can be analyzed. Note which of these are resistant and which are impacted by outliers. Practice using technology to calculate these key characteristics of data to avoid errors and save time, but make sure you can also locate and use the appropriate formula for your calculations.
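
As a rough illustration of resistance, the sketch below uses Python's standard `statistics` module on an invented data set, then adds one extreme value. The numbers are hypothetical and chosen only to make the effect easy to see.

```python
from statistics import mean, median

# Hypothetical data set, then the same data with one extreme value added.
data = [4, 5, 5, 6, 7]
with_outlier = data + [45]

# How far each measure of center moves when the outlier is added.
mean_shift = mean(with_outlier) - mean(data)
median_shift = median(with_outlier) - median(data)

print("mean:  ", mean(data), "->", mean(with_outlier))
print("median:", median(data), "->", median(with_outlier))
print("shift in mean:", mean_shift, "shift in median:", median_shift)
```

Here the mean moves from 5.4 to 12 while the median moves only from 5 to 5.5, which is why the median is called resistant and the mean is not.
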

1.03 - Displaying Data

SOCS

S - Shape

  • Symmetric/roughly symmetric
    • Unimodal, bimodal, multimodal
  • Skewed left
  • Skewed right

O - Outliers

  • Outlier < Q1 - 1.5(IQR)
    • Any value lower than the calculated value is an outlier
  • Outlier > Q3 + 1.5(IQR)
    • Any value bigger than the calculated value is an outlier

C - Center

  • Mean - nonresistant
  • Median - resistant

S - Spread
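
The IQR and the 1.5(IQR) outlier rule from the list above can be sketched as follows. Two things here are assumptions, not part of the notes: the data are invented, and the quartiles are found as the medians of the lower and upper halves (leaving out the middle value when the count is odd). Calculators and software may use slightly different quartile rules.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method (an assumption)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Skip the middle value when n is odd so both halves have the same size.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

def iqr_fences(values):
    """Return the lower and upper cutoffs from the 1.5(IQR) rule."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(values):
    """Return every value below the lower fence or above the upper fence."""
    low, high = iqr_fences(values)
    return [v for v in values if v < low or v > high]

# Hypothetical data set.
sample = [3, 4, 4, 5, 5, 6, 6, 7, 20]
print(quartiles(sample))   # Q1, median, Q3
print(iqr_fences(sample))  # lower fence, upper fence
print(outliers(sample))    # values flagged as outliers
```

For this sample, Q1 = 4 and Q3 = 6.5, so the IQR is 2.5, the fences are 0.25 and 10.25, and only the value 20 is flagged.
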

time is about five days for the treatment group versus about nine days for the control group. It appears the drug had a positive effect on patient recovery.

One key feature of data displays is skew. Data that are skewed to the right have the "tail" of the data on the right side. Data that are skewed to the left have the "tail" of the data on the left side. Data that are not skewed right or left and are relatively evenly spaced on either side of the middle are referred to as being symmetric.

Dotplots, stemplots, histograms, and boxplots are all tools that can be used to display data in a way that allows statisticians to observe key characteristics quickly. When organizing a data set for display, keep in mind that you'll need not only to create a graph but also to discuss its important features and how they may relate to other data sets for comparison purposes. Use the SOCS strategy in your work to stay organized and avoid missing key components.

Percentiles and z-scores are methods of standardizing raw data. Normally distributed information can be converted to a standard scale that can be compared against other standardized data, even if the original information did not use the same units of measure. Both percentiles and z-scores are considered a type of transformation.

When performing transformations with addition and subtraction, measures of center and position (mean, median, quartiles, and percentiles) are increased or decreased by the same value, whereas measures of spread (standard deviation, range, and IQR) are not affected. When performing transformations with multiplication and division, measures of center and position (mean, median, quartiles, and percentiles) and measures of spread are multiplied or divided by the same value. With multiplication or division by a positive value, the shape of the distribution is not changed.
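
A quick way to check these rules is to transform a small data set and compare summaries. The sketch below uses invented values and Python's standard `statistics` module. It uses the population standard deviation (`pstdev`), though the same pattern holds for the sample standard deviation.

```python
from statistics import mean, median, pstdev

def summary(values):
    """Collect a few measures of center and spread for one data set."""
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": pstdev(values),
        "range": max(values) - min(values),
    }

data = [2, 4, 6, 8]               # hypothetical values
shifted = [x + 10 for x in data]  # add 10 to every value
scaled = [x * 3 for x in data]    # multiply every value by 3

original = summary(data)
after_shift = summary(shifted)
after_scale = summary(scaled)

print(original)
print(after_shift)  # center moves up by 10, spread is unchanged
print(after_scale)  # center and spread are both multiplied by 3
```
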

Z-score formula

z = (x - μ) / σ

z = z-score, σ = standard deviation, μ = mean, x = value of the variable
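
As a small worked example of the z-score formula, the sketch below compares two scores measured on different scales. The function name `z_score` and all of the numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical scores on two tests with different scales.
z_test_a = z_score(85, mean=75, sd=5)       # (85 - 75) / 5
z_test_b = z_score(620, mean=500, sd=100)   # (620 - 500) / 100

print(z_test_a, z_test_b)
```

The first score is 2 standard deviations above its mean and the second is 1.2 above, so the first is the stronger result relative to its own distribution even though the raw numbers can't be compared directly.
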

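68-95-99.7 empirical rule - in a Normal distribution, about 68% of observations fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3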

A Normal distribution is a type of density curve in which the observations are approximately symmetric around the center. Normal distributions are not a typical occurrence in data collection, but many observations can be made about a Normal distribution. Mean, median, and mode are all approximately the same.

The z-score can be used for the purpose of standardizing distributions. Converting every value in a distribution to a z-score produces a distribution with a mean of zero and a standard deviation of one. Standardized values can then be used to compare different data sets.
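
To see why standardized data have a mean of zero and a standard deviation of one, the sketch below converts every value of an invented data set to a z-score. It uses the population standard deviation (`pstdev`) for both steps, which is an assumption made for this illustration.

```python
from statistics import mean, pstdev

data = [10, 12, 14, 16, 18]  # hypothetical values

mu = mean(data)
sigma = pstdev(data)

# Standardize: subtract the mean, then divide by the standard deviation.
z_values = [(x - mu) / sigma for x in data]

print(z_values)
print(mean(z_values), pstdev(z_values))  # approximately 0 and 1
```
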

Using the Standard Normal Probabilities table or a calculator, the area under the Normal curve can be found as a decimal proportion. This proportion can be converted to a percentage to show how much of a distribution falls within a specified range.
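
In place of a table or calculator, the same areas can be found with `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). The helper name `area_between` is invented for this sketch.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def area_between(low_z, high_z):
    """Proportion of the standard Normal curve between two z-scores."""
    return standard_normal.cdf(high_z) - standard_normal.cdf(low_z)

# Area within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    proportion = area_between(-k, k)
    print(f"within {k} SD: {proportion:.4f} = {proportion:.1%}")
```

The three proportions come out to about 0.6827, 0.9545, and 0.9973, which is where the 68-95-99.7 rule gets its name.
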