

Condensed cheat sheet for basic statistics
Typology: Cheat Sheet


Hypothesis Testing: One-Sample T-test
● A hypothesis test uses evidence from a sample to evaluate whether the null hypothesis (H₀) is a reasonable explanation for the observed data. The outcome is always to reject H₀ or fail to reject H₀; you never accept or prove the null hypothesis.
● H₀ is a statement about population parameters or distributions. The alternative hypothesis (Hₐ) includes all values not specified by H₀ and often reflects the research hypothesis.
● There are one-sided tests (left- and right-tailed) and two-sided tests. For a symmetric test statistic such as t, the two-tailed p-value is twice the one-tailed p-value when the sample result lies in the direction stated by Hₐ.
  ○ Two-tailed: μ ≠ μ₀ | Right-tailed: μ > μ₀ | Left-tailed: μ < μ₀
● The assumptions are normality (the sample observations come from a normal distribution) and independence (the sample observations are independent of one another).
● The alpha level α is the threshold below which a p-value is small enough to reject the null hypothesis. We reject H₀ if p-value < α and fail to reject H₀ if p-value ≥ α.
● Type I error (α): H₀ is true, but we mistakenly reject it | Type II error (β): H₀ is false, but we fail to reject it.
  ○ For a fixed sample size, lowering α reduces the Type I error rate but increases the Type II error rate, and vice versa. Increasing the sample size is the usual way to reduce both.
● Power is the probability of rejecting a false null hypothesis and is defined as 1 − β. A high-power test is more likely to detect a true effect when one exists.
Dependent (Paired) Samples T-test: Compares the means of 2 groups measured on the same continuous variable, where observations can be meaningfully paired (e.g., before vs. after treatment).
● The null hypothesis typically states that the population mean of the paired differences equals zero (H₀: μd = 0). The sample mean difference d̄ estimates μd.
● The assumptions are normality (of the paired differences) and independence (of the pairs). To decide whether you can use a paired samples t-test:
  ○ If the number of pairs n ≥ 30, you can rely on the Central Limit Theorem, and the test is likely to be robust to violations of normality.
  ○ If n < 30, you must check the normality of the differences using the skewness statistic or the Shapiro-Wilk test.
    ■ If normality holds, use the paired t-test. If it does not, the paired t-test may be invalid.
● When interpreting results, first determine the test type.
  ○ If the test is 1-tailed (Hₐ: μd > 0 or Hₐ: μd < 0), check whether the mean difference goes in the hypothesized direction.
    ■ If yes, check whether (2-tailed p-value)/2 < α (e.g., .05).
      ● If yes: Statistically significant; report the result and calculate the effect size.
      ● If no: Not significant; report the result.
    ■ If no (the difference goes in the opposite direction): Not significant; report the result.
  ○ If the test is 2-tailed (Hₐ: μd ≠ 0), check whether the 2-tailed p-value < .05. An equivalent check is whether 0 lies outside the 95% confidence interval.
    ■ If yes → Statistically significant; report the result, determine the direction (μd > 0 or < 0), and calculate the effect size.
    ■ If no → Not significant; report the result.
Independent Samples T-test (Two-sample T-test): Compares the means of two populations whose individuals cannot be meaningfully paired. For example, a drug trial comparing a treatment group and a control group uses an independent samples test. The goal is to estimate the difference between population means (μ₁ − μ₂) using the difference between sample means (ȳ₁ − ȳ₂).
● Assumptions include normality, independence, and independent groups (scores in one group do not depend on the other).
● The null hypothesis is that the difference between the population means equals a hypothesized value Δ₀ (H₀: μ₁ − μ₂ = Δ₀). Commonly, Δ₀ = 0.
● Before running the test, perform Levene's test for equality of variances to decide which formulas to use for the standard error and the degrees of freedom.
  ○ If we fail to reject Levene's null hypothesis of equal variances (i.e., if p ≥ 0.05), we use the pooled formula for populations with equal variances.
  ○ If we reject it (i.e., if p < 0.05), we use the formula for populations with unequal variances (Welch's t-test).
● Assumption checks and result interpretation follow the same procedures as described for the dependent t-test, with the sample-size check applied to each group.
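The three t-tests above reduce to a few formulas, which can be sketched with only Python's standard library. This is a minimal sketch under stated assumptions, not a substitute for a statistics package. The function names (`one_sample_t`, `paired_t`, `independent_t`, `t_two_tailed_p`) are hypothetical. The p-value is approximated by integrating the t density numerically with Simpson's rule. The caller is assumed to set `equal_var` from the outcome of Levene's test. In practice, SciPy's `scipy.stats.ttest_1samp`, `ttest_rel`, `ttest_ind` (with `equal_var=False` for Welch's test) and `levene` cover the same steps.

```python
import math
from statistics import mean, stdev, variance


def t_two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed p-value P(|T| >= |t|) for Student's t with df
    degrees of freedom, via Simpson's-rule integration of the t density."""
    x_max = abs(t)
    # log of Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), the density's constant
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = x_max / steps  # steps is even, as Simpson's rule requires
    area = pdf(0.0) + pdf(x_max)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # approximates P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * area)


def one_sample_t(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)) with df = n - 1."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return t, n - 1


def paired_t(before, after):
    """A paired t-test is a one-sample t-test of the differences against mu_d = 0."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)


def independent_t(x, y, equal_var=True):
    """Pooled (equal variances) or Welch (unequal variances) t statistic and df."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)
    diff = mean(x) - mean(y)
    if equal_var:
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        a, b = v1 / n1, v2 / n2
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))  # Welch-Satterthwaite
    return diff / se, df
```

For example, with hypothetical toy data, `one_sample_t([1, 2, 3, 4, 5], 2)` returns t = √2 ≈ 1.414 with df = 4. Passing that to `t_two_tailed_p` gives a two-tailed p-value between .2 and .3, so H₀ is not rejected at α = .05. For a one-tailed test, halve the two-tailed p-value only when the sample result lies in the direction stated by Hₐ.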
One-way ANOVA: Used when comparing the means of three or more groups defined by a single factor (e.g., treatment type).
● H₀ states that all population means are equal (μ₁ = μ₂ = ... = μₖ). Hₐ states that at least one mean differs.
● The assumptions include normality, independence, independent groups, and homogeneity of variances (the populations have the same variance).
● The ANOVA test compares the between-group variance (MSB) with the within-group variance (MSW).
  ○ MSB reflects the spread of the sample means across treatment groups, while MSW reflects the spread of scores within each treatment group.
  ○ ANOVA uses the F-ratio (MSB/MSW), which follows an F-distribution when the assumptions are met and the null hypothesis is true.
    ■ If MSB ≤ MSW (i.e., F ≤ 1), we fail to reject the null hypothesis because the variation we see is close to what we would expect if the null were true.
    ■ If MSB > MSW (i.e., F > 1), we may reject the null hypothesis, but only if F is large enough that p < α. A large F means the variation we see would be unusual if the null were true.
● The effect size η² (eta-squared) is calculated as SSB/SST, the proportion of total variance explained by the groups. Cohen's guidelines are used to categorize the size of the effect:
  ○ η² < 0.01 negligible, 0.01 ≤ η² < 0.06 small, 0.06 ≤ η² < 0.14 medium, and η² ≥ 0.14 large
● Post-hoc multiple comparison tests identify which pairs of means differ while controlling the Type I error rate. Tukey's Honestly Significant Difference (HSD) test compares all pairs after a significant ANOVA, while the Bonferroni adjustment is used for planned comparisons.
● To decide whether you can use a one-way ANOVA:
  ○ If n₁, n₂, ..., nₖ ≥ 30, you can rely on the CLT and proceed with checking the remaining assumptions.
  ○ If any sample size is < 30 → check normality.
    ■ If |skewness ratio| < 2 (skewness divided by its standard error) or the Shapiro-Wilk test gives p > 0.05 → normality is tenable → proceed with checking the remaining assumptions.
    ■ If neither criterion is met (|skewness ratio| ≥ 2 and Shapiro-Wilk p ≤ 0.05) → normality is not tenable → do not use the one-way ANOVA, because the result may not be valid.
  ○ If n₁ = n₂ = ... = nₖ → the ANOVA is fairly robust to unequal variances, so homogeneity of variance is not a concern → proceed with the one-way ANOVA.
  ○ If the group sizes are unequal → homogeneity of variance is a concern → perform Levene's test.
    ■ If Levene's test p > .05 → homogeneity of variance is tenable → proceed with the one-way ANOVA.
    ■ If Levene's test p ≤ .05 → homogeneity of variance is not tenable → do not use the one-way ANOVA, because the test result may not be valid.
● When interpreting results, first determine whether the ANOVA test is significant.
  ○ If the ANOVA p < .05, the results are statistically significant.
    ■ If k > 2 → conduct post-hoc tests, interpret the result, and calculate the effect size.
    ■ If k = 2 → use the sample means to interpret the result and calculate the effect size.
  ○ If the ANOVA p ≥ .05, the results are not statistically significant; report the result.
Two-way ANOVA: Examines how two factors affect a dependent variable and whether they interact, by partitioning variability into main and interaction effects.
● The null hypotheses state that there is no interaction, no main effect of Factor 1 (equal row means), and no main effect of Factor 2 (equal column means). The alternative hypotheses state that an interaction exists and that at least one row or column mean differs.
● The assumptions include normality, independence, independent groups, and homogeneity of variances.
● An interaction occurs when the effect of one factor depends on the level of the other. Interaction plots help visualize this:
  ○ Parallel lines indicate no interaction, even if they are curved or have kinks. Nonparallel or crossing lines suggest a possible interaction. However, lines may cross without a significant interaction, or fail to cross despite one.
● If the interaction is significant, interpretation should focus on the interaction rather than the main effects.
● Effect sizes show the proportion of total variance explained by each effect:
  ○ Interaction effect: η²INT = SSINT / SST | Factor 1 effect: η²F1 = SSF1 / SST | Factor 2 effect: η²F2 = SSF2 / SST
● Assumption checks follow the same procedures as described for the one-way ANOVA. When interpreting results, first determine whether the interaction effect is significant.
  ○ If p < 0.05 for the interaction effect, the interaction is statistically significant → conduct a simple effects analysis, interpret the interaction, and calculate its effect size. Do not interpret the main effects.
  ○ If p ≥ 0.05 for the interaction → check whether p < 0.05 for either main effect.
    ■ If yes, that main effect is statistically significant → check whether its factor has k > 2 levels.
      ● If k > 2, conduct post-hoc tests, interpret the result, and calculate the effect size.
      ● If k = 2, use the sample means to interpret the result and calculate the effect size.
    ■ If not, the results are not statistically significant; report the result.
Correlation and Simple Regression
● A scatterplot displays paired data values (x, y) as individual points on a grid, with the IV on the x-axis (horizontal) and the DV on the y-axis (vertical).
● The direction of the association between the two variables can be positive (bottom left → upper right) or negative (upper left → bottom right). The form of the association can be linear (points follow a consistent straight-line pattern) or nonlinear (points follow a curved or bending pattern). The strength of the association can be strong (points are tightly clustered in a clear pattern, either linear or nonlinear) or weak (points form a vague cloud with a barely discernible pattern).
● Pearson's correlation coefficient (r) measures the strength and direction of a linear relationship between two variables. Its range is −1 ≤ r ≤ 1.
  ○ The sign of r indicates the direction (positive or negative), while the absolute value |r| indicates the strength: |r| close to 1 → strong and |r| close to 0 → weak. There is no universal cutoff for a "strong" r.
● Regression analysis is used when a moderate or strong linear association exists, and the fitted model can be used for prediction.
  ○ A linear model equation is ŷ = a + bx, where ŷ is the predicted value of y, a is the y-intercept, and b is the slope.
  ○ The slope b represents the change in the predicted value of y for a one-unit increase in x. The intercept a represents the predicted value of y when x equals zero. If zero is not a meaningful value for x, the intercept is simply a starting point and not a meaningful prediction.
● Residual (e): the difference between an observed and a predicted value, e = y − ŷ.
● The coefficient of determination (R²) is the proportion of variation in y explained by the model and ranges from 0 to 1. It is used as a measure of effect size.
● The regression assumptions include the following:
  ○ Linearity.
  ○ Independence: the error terms must be independent.
  ○ Equal variance: the variance of the error terms should be the same for all values of x.
  ○ Normality: the error terms around the regression line should follow a normal distribution.
● Regression inference focuses on testing whether the population slope equals zero (H₀: β = 0), meaning there is no linear relationship between x and y. A t-statistic is used to test this hypothesis and to construct confidence intervals for the slope. The model is statistically significant if the F-test rejects the null hypothesis. In simple regression F = t², so the F-test and the slope t-test give the same p-value.
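The one-way ANOVA and simple regression sections above rest on the same idea. The total sum of squares (SST) is split into an explained part (SSB for ANOVA, the regression sum of squares for regression) and an error part. The effect size is then the explained share, η² = SSB/SST or R². The following is a minimal standard-library sketch of those computations on small illustrative data. The function names `one_way_anova` and `simple_regression` are hypothetical. The sketch returns test statistics but not p-values; those would come from F and t tables or from a library such as SciPy (`scipy.stats.f_oneway`, `scipy.stats.linregress`).

```python
import math
from statistics import mean


def one_way_anova(*groups):
    """Return (F, eta_squared) for k independent groups."""
    means = [mean(g) for g in groups]
    all_obs = [y for g in groups for y in g]
    grand = mean(all_obs)
    k, n_total = len(groups), len(all_obs)
    # Between-group SS: spread of the group means around the grand mean
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within-group SS: spread of scores around their own group mean
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return msb / msw, ssb / (ssb + ssw)  # F = MSB/MSW; eta^2 = SSB/SST with SST = SSB + SSW


def simple_regression(x, y):
    """Least-squares line y_hat = a + b*x, plus r, R^2, slope t, and F."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    r = sxy / math.sqrt(sxx * sst)  # Pearson's r
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # e = y - y_hat
    sse = sum(e ** 2 for e in residuals)
    r_squared = 1 - sse / sst  # equals r**2 in simple regression
    mse = sse / (n - 2)
    t_slope = b / math.sqrt(mse / sxx)  # tests H0: population slope = 0, df = n - 2
    f_stat = (sst - sse) / mse          # regression F with (1, n - 2) df; equals t_slope**2
    return {"a": a, "b": b, "r": r, "r_squared": r_squared,
            "residuals": residuals, "t": t_slope, "F": f_stat}
```

With toy inputs, `one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9])` gives F = 27 and η² = 0.9. `simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])` gives ŷ = 2.2 + 0.6x with R² = 0.6, t ≈ 2.12 and F = 4.5 = t².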
Multiple Choice Question
Hypothesis Testing (One-sample T-test): Test whether the mean gestation time of 70 pregnancies (x̄ = 260.31 days) differs from the known population mean (μ = 266 days).
Dependent Samples T-test: A researcher wants to know how large the average difference in dexterity is between the dominant and non-dominant hands in children.
Independent Samples T-test: Use NHANES data to conduct an independent samples t-test comparing mean systolic blood pressure between males and females.