Statistics summary 2025, Schemes and Mind Maps of Mathematical Methods

Statistics note. Summary of full course.

Typology: Schemes and Mind Maps

2025/2026

Uploaded on 04/06/2026

saumya-23

HSCI 190 rewritten notes

The whole course in one idea

Statistics is a way of making sense of variation. People, measurements, outcomes, and samples differ, and statistics gives you a method for collecting data, summarizing it, comparing it, and deciding what conclusions are reasonable. The course also emphasizes that statistics is not just calculation. It is about choosing the right method, interpreting results properly, and communicating them honestly.

A major framework in the course is PPDAC:

Problem: What question are you trying to answer?
Plan: What information do you need and how will you get it?
Data: Collect good-quality information.
Analysis: Organize, graph, and test the data.
Conclusion: Interpret the findings and decide what they mean.

That cycle matters because the course keeps coming back to the idea that “doing stats” is not just plugging numbers into a formula. Bad planning or bad sampling can ruin a study before the math even starts.

Module 1: describing data

1) Descriptive vs inferential statistics

Descriptive statistics tell you what your data looks like. They summarize the data you actually collected.

Inferential statistics use a sample to say something about a larger population. That is the branch used later in the course for probability, hypothesis testing, t tests, ANOVA, correlation, and regression.

2) Variables and levels of measurement

A variable is something you measure. The way a variable is measured determines what kind of statistics and graphs make sense. Module 01 uses four levels of measurement:

Nominal: categories with no order (example: gender)

Ordinal: categories with an order (example: BMI classification)

Interval: numeric scale where differences are meaningful, but zero does not mean “none” (example: time on a 24-hour clock)

Ratio: numeric scale with a real zero (example: newborn weight)

A simpler way to think about it:

● Categorical data: labels or groups
● Scale data: numbers measured on a scale

This matters because not every graph or statistic works for every data type.

3) Frequency and tables

When you summarize data in a table, you may see:

Absolute frequency: the count
Relative frequency: the proportion or percentage

So if 8 out of 20 people have a symptom, absolute frequency is 8 and relative frequency is 40%. Module 01 also emphasizes that how numbers are presented can change how people interpret them.
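Here is a minimal sketch of how a frequency table is built from raw responses. It uses Python's `collections.Counter`. The 20 yes/no responses are hypothetical and chosen to match the 8-out-of-20 example.

```python
from collections import Counter

# Hypothetical data: did each of 20 people report the symptom?
responses = ["yes"] * 8 + ["no"] * 12

counts = Counter(responses)          # absolute frequencies (counts)
n = len(responses)
for category, count in counts.items():
    relative = count / n             # relative frequency (proportion)
    print(f"{category}: {count} ({relative:.0%})")
```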

4) Measures of central tendency

These describe the “middle” of the data.

Mean: the arithmetic average; add all values and divide by how many there are.

Median: the middle value when the data are ordered

Mode: the most common value

The important idea is not just what they are, but when they work best.

● The mean is common and useful, but it is sensitive to outliers.
● The median is more resistant to extreme values.
● The mode is the most frequent value, but it is often less useful for further calculations.
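The next sketch shows why the median is called "resistant". One extreme value is added to a small hypothetical data set, and the three measures are compared before and after. It uses Python's standard-library `statistics` module.

```python
import statistics

# Hypothetical length-of-stay data (days); the extra value 30 is an extreme case
stays = [2, 3, 3, 4, 5]
stays_with_outlier = stays + [30]

for label, data in [("without outlier", stays), ("with outlier", stays_with_outlier)]:
    print(label,
          "mean =", round(statistics.mean(data), 2),
          "median =", statistics.median(data),
          "mode =", statistics.mode(data))
```

The single outlier more than doubles the mean (3.4 to about 7.83). The median only moves from 3 to 3.5, and the mode stays at 3.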

Module 02 covers a long list of graphs and expects you to match them to the data type. These include bar graphs, histograms, frequency polygons, cumulative frequency polygons, one-way scatter plots, boxplots, two-way scatter plots, and line graphs.

A plain-language way to think about them:

Bar graph: best for categories. Bars are separated because categories are distinct.

Histogram: best for scale data grouped into intervals. Bars touch because the scale is continuous.

Frequency polygon: like a histogram drawn as connected points; useful when comparing groups.

Cumulative frequency polygon: shows how totals build up across the scale

Boxplot: good for showing the median, spread, and possible outliers

Scatter plot: used for pairs of numeric variables

Line graph: often used when something changes across an ordered sequence, especially over time

3) Shape of a distribution and skew

A distribution is the overall shape of the data.

If a histogram has a long tail on one side, it is skewed.

● Negative skew: tail goes left
● Positive skew: tail goes right

The module explicitly notes that for skewed data, the median is often the best measure of central tendency because it is less affected by extreme values.

4) Outliers

An outlier is a value that sits far away from the rest of the data. Module 02 covers what outliers are, how they affect conclusions, how to identify them visually and statistically, and when to keep or remove them.

Why they matter:

● they can pull the mean toward them
● they can inflate the SD
● they can make a data set look more or less variable
● they may be a true extreme value or an error

The right move is not “always delete them.” Sometimes they are meaningful and should stay.
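The notes do not say which statistical rule the module uses to flag outliers. The sketch below assumes the common 1.5 × IQR fence rule, and the reaction-time data are hypothetical. Different software computes quartiles slightly differently, so the fences can shift a little. Flagging a value is only the first step; deciding whether to keep it is still a judgment call.

```python
import statistics

def iqr_outliers(data):
    """Flag values outside the 1.5 x IQR fences (one common rule of thumb)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles; conventions differ between software
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

# Hypothetical reaction times (seconds)
times = [12, 14, 14, 15, 16, 16, 17, 18, 19, 45]
print(iqr_outliers(times))
```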

5) Population, sample, and sampling

A population is the full group you care about. A sample is the smaller set you actually study.

Sampling methods matter because your sample needs to represent the population as well as possible. Module 02 distinguishes:

Random sampling: selection is based on chance
Non-random sampling: not everyone has the same chance of being selected
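Here is a minimal sketch of simple random sampling using Python's `random.Random.sample`. It draws 25 of 500 hypothetical patient IDs without repeats, and every ID has the same chance of selection. The seed is only there so the example is reproducible.

```python
import random

# Hypothetical sampling frame: IDs for a population of 500 patients
population = list(range(1, 501))

rng = random.Random(42)                # seeded only for reproducibility
sample = rng.sample(population, k=25)  # simple random sample, no repeats
print(sorted(sample))
```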

6) Bias

Module 02 emphasizes that the most dangerous mistakes often come from bad samples, not bad calculations. It specifically discusses representative samples, random sampling, transparency, and forms of bias such as sampling bias, response bias, survivorship bias, and recall bias.

The Wakefield MMR example is used to show how a biased sample and recall bias can lead to dangerous conclusions in health care.

Module 3: sampling distributions, probability, z scores, and hypothesis testing

This is where the course shifts fully into inferential statistics. Module 03 covers sample statistics versus population parameters, sampling distributions, the standard error of the mean, probability rules, probability distributions, the normal distribution, z scores, the central limit theorem, confidence intervals, and hypothesis testing.

1) Parameter vs statistic

A parameter describes a population. A statistic describes a sample.

Examples:

● population mean = μ
● sample mean = x̄

● about 68% within 1 SD of the mean
● about 95% within 2 SD
● about 99.7% within 3 SD
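These percentages can be checked against the standard normal curve. The sketch below uses Python's `statistics.NormalDist`. It also shows that the exact multiplier for 95% is about 1.96 SD, which is why "2 SD" is only an approximation.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area under the standard normal curve within k SD of the mean
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")

# The exact multiplier that captures the middle 95%
print(round(std_normal.inv_cdf(0.975), 2))
```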

7) Z scores

A z score tells you how far a value is from the mean in units of standard deviation. Module 03 explicitly teaches using z scores, z charts, and area under the curve to find probability.

Which value? Any single observed score, sample mean, or measurement you want to compare to a distribution. A z score standardizes that value so you can ask how unusual it is. For a sample mean, divide by the standard error rather than the SD.

Plain version:

● z = 0 means right at the mean
● positive z means above the mean
● negative z means below the mean
● larger absolute z means more unusual
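Here is a minimal sketch of z = (x − μ) / σ, followed by the area-under-the-curve step, in place of a z chart. The newborn-weight mean and SD are hypothetical values chosen for round numbers.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical example: newborn weights with mean 3400 g and SD 500 g
z = z_score(4400, mu=3400, sigma=500)
p_above = 1 - NormalDist().cdf(z)  # area to the right of z under the standard normal curve
print(round(z, 2), round(p_above, 4))
```

A 4400 g newborn is 2 SD above the mean, and only about 2.3% of values would be that high or higher.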

8) Central limit theorem

The central limit theorem (CLT) says that if the sample size is large enough, the sampling distribution of the mean becomes approximately normal, even if the original population is not normal. The module notes that when n is large, the sample means cluster around the population mean, the spread of the sampling distribution is the standard error (σ/√n), and the shape becomes approximately normal.
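A quick simulation makes the CLT concrete. The sketch below draws many samples from a clearly skewed population and looks at the distribution of the sample means. The population is assumed to be exponential with mean 10 and SD 10, and the sample size and number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(1)  # seeded for reproducibility

# Hypothetical right-skewed population: exponential with mean 10 (its SD is also 10)
def draw_sample(n):
    return [rng.expovariate(1 / 10) for _ in range(n)]

n = 40
sample_means = [statistics.mean(draw_sample(n)) for _ in range(2000)]

print("mean of sample means:", round(statistics.mean(sample_means), 2))  # near 10
print("SD of sample means:", round(statistics.stdev(sample_means), 2))   # near the SE
print("theoretical SE = sigma / sqrt(n):", round(10 / n ** 0.5, 2))
```

A histogram of `sample_means` would look roughly bell-shaped, even though single draws from this population are strongly right-skewed.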

That theorem is why so much later inference works.

9) Estimation and confidence intervals

Module 03 divides inference into:

Estimation: estimate an unknown population parameter
Hypothesis testing: test a claim about a population parameter using sample data and probability

Within estimation, it distinguishes:

Point estimate: one best guess
Interval estimate: a range likely to contain the parameter

That interval is your confidence interval (CI), commonly at 95%.
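Here is a minimal sketch of a 95% CI for a mean: point estimate ± critical value × standard error. It assumes the population SD is known (a z interval). When σ is unknown, as is usual, the same shape applies, but the t critical value from Module 04 replaces the z value. The blood-pressure readings and σ = 15 are hypothetical.

```python
from statistics import NormalDist, mean

def z_confidence_interval(data, sigma, level=0.95):
    """CI for a population mean when the population SD (sigma) is known."""
    n = len(data)
    x_bar = mean(data)                              # point estimate
    se = sigma / n ** 0.5                           # standard error of the mean
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return x_bar - z_crit * se, x_bar + z_crit * se

# Hypothetical systolic BP readings (mmHg), assuming sigma = 15 is known
readings = [118, 125, 130, 122, 128, 135, 120, 126,
            131, 124, 129, 133, 119, 127, 132, 121]
low, high = z_confidence_interval(readings, sigma=15)
print(f"95% CI: {low:.1f} to {high:.1f} mmHg")
```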

10) Hypothesis testing

This is the backbone of later modules.

Null hypothesis (H₀): usually “no difference” or “no effect”
Alternative hypothesis (H₁): the claim that there is a difference or effect

You also need:

alpha (α): the significance level, usually 0.05
p-value: the probability of seeing results at least as extreme as these if the null hypothesis were true
one-tailed vs two-tailed tests
rejection region: where results are extreme enough to reject H₀
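The pieces above fit together in a one-sample, two-tailed z test. The sketch assumes σ is known, and the numbers are hypothetical. The decision rule compares the p-value with α.

```python
from statistics import NormalDist

def two_tailed_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Test H0: mu = mu0 against a two-sided alternative when sigma is known."""
    z = (x_bar - mu0) / (sigma / n ** 0.5)  # test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails beyond |z|
    return z, p, p <= alpha                 # True means "reject H0"

# Hypothetical: sample of 36 with mean 103 vs a claimed population mean of 100 (sigma = 9)
z, p, reject = two_tailed_z_test(103, 100, 9, 36)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Here z = 2.0 falls inside the two-tailed rejection region (|z| > 1.96), so the p-value (about 0.046) is below α = 0.05.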

11) Type I error, Type II error, and power

Type I error: reject H₀ when it is actually true (a false positive)

Type II error: fail to reject H₀ when it is actually false (a false negative)

The module ties Type I error to alpha and Type II error to beta, and it also introduces power in the hypothesis-testing section. Power is 1 − β, the probability of correctly rejecting a false H₀.
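Power is easiest to see with a simple case. The sketch below computes the power of a one-sided (upper-tail) z test with known σ. The formula is standard for this case, but the choice of test and all the numbers are illustrative assumptions. When the true mean equals μ₀, "power" collapses to α, the Type I error rate.

```python
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tail z test of H0: mu = mu0 when the true mean is mu1."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha)           # rejection region: z > z_crit
    shift = (mu1 - mu0) / (sigma / n ** 0.5)  # where the test statistic centres if mu = mu1
    return 1 - std.cdf(z_crit - shift)        # P(reject H0 | H1 true) = 1 - beta

for n in (10, 25, 50):
    print(n, round(power_one_sided_z(100, 105, 15, n), 3))
```

Larger samples shrink the standard error, so the same true difference is detected more often.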

12) Why Module 03 matters

The module conclusion says this is the foundation of inferential statistics. Later tests are basically new ways of applying the same logic: sampling variation, probability, distributions, and hypothesis testing.

Module 4: Student’s t distribution, comparing two groups, and categorical analyses

Module 04 says directly that it builds on z distributions and introduces two major variations: the Student’s t distribution and the chi square distribution. It covers one-sample t tests, independent and dependent t tests, nonparametric alternatives, contingency tables, chi square tests, expected counts, and categorical analyses.

1) Why the t distribution exists

The z distribution assumes you know the population standard deviation, but in real life that is rare. The Student’s t distribution was created for situations where the population SD is unknown, especially with smaller samples.
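Here is a minimal sketch of the one-sample t statistic. The sample SD (n − 1 denominator) stands in for the unknown σ, and df = n − 1. It also shows that a dependent (paired) t test is a one-sample t test on the differences against 0. The before/after scores are hypothetical. The standard library has no t distribution, so the p-value would come from a t table or statistical software.

```python
import statistics

def one_sample_t(data, mu0):
    """t statistic and degrees of freedom for H0: mu = mu0 (population SD unknown)."""
    n = len(data)
    se = statistics.stdev(data) / n ** 0.5  # sample SD replaces sigma
    return (statistics.mean(data) - mu0) / se, n - 1

# Hypothetical paired design: scores before and after an intervention for 6 people
before = [20, 22, 19, 24, 21, 23]
after = [23, 25, 20, 27, 24, 25]
diffs = [a - b for a, b in zip(after, before)]

# Paired t test = one-sample t test on the differences, tested against 0
t, df = one_sample_t(diffs, 0)
print(f"t = {t:.2f}, df = {df}")
```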

● Wilcoxon Signed Rank for paired/dependent data
● Mann-Whitney U for unpaired/independent data

7) Categorical analyses and chi square

Module 04 then shifts from comparing means to comparing counts or frequencies.

A two-way contingency table organizes counts across two categorical variables. The chi square distribution and chi square test are used to analyze frequency data between groups. The module also teaches degrees of freedom, expected counts, interpreting chi square tests, and alternatives like Fisher’s Exact Test and McNemar’s Test.
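Here is a minimal sketch of the chi square steps for a 2 × 2 table. It computes expected counts from the row and column totals, sums (O − E)² / E, and uses df = (rows − 1)(columns − 1). The table is hypothetical. This is the uncorrected Pearson statistic; some software applies a continuity correction to 2 × 2 tables by default. The p-value line relies on a df = 1 shortcut (chi square with 1 df is a squared standard normal), so it does not apply to larger tables.

```python
from statistics import NormalDist

# Hypothetical 2 x 2 table: rows = exposed / not exposed, columns = disease / no disease
observed = [[30, 70],
            [15, 85]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total  # expected count under H0
        chi_sq += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
p = 2 * (1 - NormalDist().cdf(chi_sq ** 0.5))  # valid only because df = 1 here
print(f"chi-square = {chi_sq:.2f}, df = {df}, p = {p:.4f}")
```

All expected counts here are at least 22.5. With very small expected counts, Fisher's Exact Test is the usual alternative.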

A plain-language way to think about chi square:

● t test asks whether means differ
● chi square asks whether frequencies or proportions differ

Module 5: comparing more than two groups

Module 05 introduces ANOVA and related tests for situations where you have more than two groups. It covers the F distribution, one-way ANOVA, post-hoc tests such as Tukey’s HSD, Kruskal-Wallis, repeated-measures ANOVA, Friedman, and two-way ANOVA with main and interaction effects.

1) Why not just run many t tests?

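The module explicitly asks this. Every extra t test adds another chance of a false positive, so running many t tests inflates the overall Type I error. ANOVA tests all the group means at once with a single F test, which keeps the overall Type I error at α.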
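The inflation can be put into numbers with the formula 1 − (1 − α)^m, the chance of at least one false positive across m tests when every H₀ is true. The formula assumes the tests are independent. Pairwise t tests on shared groups are not fully independent, so treat the output as a rough guide.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** m

for groups in (3, 4, 5):
    m = groups * (groups - 1) // 2  # number of pairwise t tests
    print(groups, "groups:", m, "tests, familywise error =",
          round(familywise_error(0.05, m), 3))
```

With 5 groups (10 pairwise tests), the chance of at least one false positive is already around 40%, far above the intended 5%.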

2) F distribution and F ratio

ANOVA uses an F ratio and the F distribution. The F distribution is right-skewed, only has positive values, has total area 1, and its shape depends on two degrees of freedom.

In a one-way ANOVA, the first (numerator) degrees of freedom are the number of groups minus 1 (k − 1). The second (denominator) degrees of freedom are the total number of observations minus the number of groups (N − k).
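Here is a minimal sketch of how the F ratio is built in a one-way ANOVA. It splits the variation into between-group and within-group parts, divides each by its degrees of freedom to get mean squares, and takes the ratio. The three groups of pain scores are hypothetical.

```python
import statistics

def one_way_anova_f(groups):
    """Return F, df_between, df_within for a one-way ANOVA on a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(g) for g in groups]
    k, n_total = len(groups), len(all_values)

    # Between-group variation: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)  # MS_between / MS_within
    return f, df_between, df_within

# Hypothetical pain scores under three treatments
a = [4, 5, 6, 5]
b = [6, 7, 8, 7]
c = [8, 9, 10, 9]
print(one_way_anova_f([a, b, c]))
```

A large F means the group means differ more than the within-group noise would suggest. A post-hoc test is still needed to say which groups differ.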

3) One-way ANOVA

Used when you compare more than two independent groups on one outcome. Module 05 covers variables, hypotheses, assumptions, conducting the test, interpreting the output, visualizing ANOVAs, and reporting results.

4) Assumptions for one-way ANOVA

The module highlights normality, no problematic outliers, independence, and equal variances, using tools like Shapiro-Wilk (for normality) and Levene’s test (for equal variances).

5) Post-hoc comparisons

ANOVA can tell you that at least one group differs, but not exactly where the differences are. For that, you use post-hoc comparisons, often Tukey’s HSD. These are pairwise comparisons that control the increased Type I error that comes from multiple comparisons.

6) Kruskal-Wallis

This is the nonparametric alternative to a one-way ANOVA when assumptions are not met. Module 05 covers its assumptions, calculation, interpretation, and reporting.

7) Repeated-measures ANOVA

Used when the same participants are measured across multiple conditions or time points. Module 05 covers repeated-measures terminology, assumptions, sphericity, interpreting output, and what to do when sphericity is broken.

8) Friedman test

This is the nonparametric alternative to repeated-measures ANOVA covered in Module 05.

9) Two-way ANOVA

Used when there are two grouping factors. Module 05 teaches main effects and interaction effects and includes graph interpretation.

A plain-language version:

● main effect = one factor matters on its own
● interaction effect = the effect of one factor depends on the level of the other factor

The module also notes advanced ANOVA types exist, but for this course the focus is one-way ANOVA, one-way repeated-measures ANOVA, and two-way ANOVA, with awareness of Kruskal-Wallis and Friedman.

Module 06 then introduces simple linear regression. It covers response and explanatory variables, plotting data, linear relationships, residuals, the regression equation, least squares, y-intercept and slope, assumptions, interpretation, coefficient of determination, prediction, and extrapolation.

A plain-language way to separate them:

● correlation asks: are these variables related?
● regression asks: can one variable help predict the other?
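The two questions use closely related arithmetic. The sketch computes Pearson's r (strength and direction of a linear relationship) and the least-squares slope and intercept (the prediction line) from the same sums. The exercise and heart-rate data are hypothetical. Python 3.10+ also offers `statistics.correlation` and `statistics.linear_regression`; the manual version just shows the formulas.

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation: strength and direction of a linear relationship."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def least_squares_line(x, y):
    """Slope and y-intercept of the line that minimises the sum of squared residuals."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

# Hypothetical data: weekly exercise hours (x) and resting heart rate (y)
hours = [0, 1, 2, 3, 4, 5]
heart_rate = [78, 76, 73, 72, 69, 66]
r = pearson_r(hours, heart_rate)
slope, intercept = least_squares_line(hours, heart_rate)
print(f"r = {r:.3f}; predicted y = {intercept:.2f} + ({slope:.2f})x")
```

Predictions are only trustworthy inside the observed range of x (0 to 5 hours here). Using the line outside that range is extrapolation.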

5) Coefficient of determination

The module specifically teaches interpreting the coefficient of determination (R²). It tells you the proportion of the variation in the outcome that is explained by the predictor in a simple linear regression model. In simple linear regression, R² equals the square of the correlation coefficient r.
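Here is a minimal sketch of R² as 1 − (unexplained variation / total variation). It uses the residuals from a least-squares line fitted to hypothetical exercise and heart-rate data.

```python
import statistics

def r_squared(x, y):
    """Coefficient of determination for a simple least-squares line."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))  # unexplained variation
    ss_tot = sum((b - my) ** 2 for b in y)                                 # total variation in y
    return 1 - ss_res / ss_tot

# Hypothetical data: weekly exercise hours (x) and resting heart rate (y)
hours = [0, 1, 2, 3, 4, 5]
heart_rate = [78, 76, 73, 72, 69, 66]
print(round(r_squared(hours, heart_rate), 3))
```

An R² of about 0.99 would mean nearly all the variation in heart rate in this sample is explained by exercise hours. That still says nothing about causation.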

6) Statistical considerations

The last section is broader and less formula-heavy. It includes:

● study design and methods
● sample size
● data quality and machine learning
● meaning of significance
● statistical vs clinical vs biological significance
● data reproducibility
● replication crisis
● causes and solutions

The module explicitly says the goal here is more awareness than memorizing every detail, but you do need to understand the difference between statistical significance and real-world importance, and you need awareness of reproducibility problems in modern science.

The “which test do I use?” backbone

A good stripped-down decision path from these modules is:

If you are describing one variable, think:

● level of measurement
● table or graph
● mean/median/mode
● range/IQR/SD

If you are judging how unusual one value is, think:

● z score
● probability
● normal curve

If you are estimating a population value, think:

● point estimate
● confidence interval

If you are testing one sample against a known value, think:

● one-sample t test

If you are comparing two means, think:

● dependent t test if paired
● independent t test if unpaired
● Wilcoxon Signed Rank (paired) or Mann-Whitney U (unpaired) if assumptions fail

If you are comparing two categorical variables or proportions, think:

● chi square
● Fisher’s Exact if expected counts are small, or McNemar if the data are paired

If you are comparing more than two means, think:

● one-way ANOVA
● post-hoc test if significant
● Kruskal-Wallis if assumptions fail

If you are comparing repeated measures across more than two conditions, think:

● repeated-measures ANOVA
● Friedman if assumptions fail

If you have two factors, think:

● two-way ANOVA
● main effects and interaction effects

If you are studying a relationship between two continuous variables, think:

● Pearson or Spearman correlation
● regression if prediction is the goal
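The decision path can also be written as a small lookup function for self-quizzing. The function name, its parameters, and the goal labels are hypothetical. Choosing Spearman when assumptions fail is an added assumption, since the notes only say "Pearson or Spearman". It is a study aid, not a substitute for checking the study design.

```python
def suggest_test(goal, groups=2, paired=False, assumptions_met=True):
    """Hypothetical study aid mirroring the course's 'which test do I use?' path."""
    if goal == "compare_means":
        if groups == 1:
            return "one-sample t test"
        if groups == 2:
            if paired:
                return "dependent t test" if assumptions_met else "Wilcoxon Signed Rank"
            return "independent t test" if assumptions_met else "Mann-Whitney U"
        if paired:  # same participants across more than two conditions
            return "repeated-measures ANOVA" if assumptions_met else "Friedman"
        return "one-way ANOVA + post-hoc" if assumptions_met else "Kruskal-Wallis"
    if goal == "compare_proportions":
        return "McNemar" if paired else "chi square (Fisher's Exact if expected counts are small)"
    if goal == "relationship":
        return "Pearson correlation" if assumptions_met else "Spearman correlation"
    if goal == "prediction":
        return "simple linear regression"
    raise ValueError(f"unknown goal: {goal!r}")

print(suggest_test("compare_means", groups=3, assumptions_met=False))
```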

Final compact summary

Modules 1 and 2 are about describing data honestly: what kind it is, what the center and spread are, what the graph should look like, whether outliers are present, and whether the sample is trustworthy. Modules 3 to 6 are about making inferences: how sample results vary, how probability works, how z and t distributions support inference, how to compare means and counts, how to compare more than two groups, how to study relationships and prediction, and how to think critically about significance and reproducibility in health research.
