STS Pearson & Regression Analysis, Study notes of Statistics

Study notes for Pearson correlation and Regression Analysis

Typology: Study notes

2024/2025

Uploaded on 11/03/2025

hoseok-ko
📘 PEARSON CORRELATION COEFFICIENT (r) – COMPLETE REVIEW NOTES

🧠 1. Overview of Inferential Statistics

🔹 Definition

Inferential statistics is the branch of statistics used to make conclusions about a population based on a sample. It allows researchers to analyze, test hypotheses, and predict outcomes using data.

🔹 Purpose

● To draw inferences about a population from a sample.
● To measure the reliability of conclusions.
● To analyze relationships and make predictions through tests and models.

2. Two Major Types of Inferential Statistics

| Type | Description | Examples |
| --- | --- | --- |
| Regression Analysis | Determines how one variable changes with respect to another. | Simple, Multiple, Logistic Regression |
| Hypothesis Testing | Tests assumptions about a population using sample data. | z-Test, t-Test, ANOVA |

🧩 Hypothesis Testing Components

● Null Hypothesis (H₀): No effect or difference exists.
● Alternative Hypothesis (H₁): There is an effect or difference.
● Decision Basis: The test statistic is compared with a critical value at a chosen confidence level (usually 95%).
● Types: Left-tailed, right-tailed, or two-tailed.
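As a rough illustration of the decision basis, the sketch below compares a z test statistic with the two-tailed critical value at 95% confidence. The function name `two_tailed_z_decision` and the sample statistics 2.30 and 1.20 are hypothetical, chosen only for this example; the notes do not prescribe a particular implementation.

```python
from statistics import NormalDist


def two_tailed_z_decision(z_stat: float, confidence: float = 0.95) -> str:
    """Compare a z test statistic with the two-tailed critical value."""
    alpha = 1.0 - confidence
    # Critical value: the z-score that leaves alpha / 2 in each tail
    # (about 1.96 when the confidence level is 95%).
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    if abs(z_stat) > z_crit:
        return "reject H0"
    return "fail to reject H0"


# Hypothetical test statistics, for illustration only.
print(two_tailed_z_decision(2.30))
print(two_tailed_z_decision(1.20))
```

A left-tailed or right-tailed test would instead put all of alpha in one tail and compare the signed statistic with that single critical value.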

📉 3. Regression Analysis (Brief Review)

🔹 Definition

A statistical method that examines the relationship between dependent (Y) and independent (X) variables.

🔹 Purpose

● To measure how changes in X affect Y
● To predict outcomes
● To test relationships and influences

🔹 Regression Coefficient

Represents how much Y changes for every one-unit change in X.

📊 4. PEARSON CORRELATION COEFFICIENT (r)

🔹 Definition

A statistical measure that indicates the strength and direction of a linear relationship between two numerical variables.

Developed by:

● Sir Francis Galton (late 1800s): Introduced the idea of “co-relation” when studying heredity.
● Karl Pearson (1896): Formalized the product-moment correlation coefficient, now known as Pearson’s r.

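5. Formula

\( r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \)

Where:

● n = number of data pairs
● x, y = individual data values
● Σ = summation symbol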
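The sum-based formula for r (the same one listed in the Formula Summary) can be sketched in a few lines of Python. This is a minimal sketch; the function name `pearson_r` and the tiny data sets are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the raw sums n, Σx, Σy, Σxy, Σx², Σy²."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator


# Made-up data: y doubles x exactly, then falls as x rises.
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [6, 4, 2]))
```

The two calls show the extremes of the range: a perfect increasing line and a perfect decreasing line.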

💡 7. Key Concepts and Characteristics

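| Concept | Description |
| --- | --- |
| Range | -1 ≤ r ≤ +1 |
| Direction | Positive (+) means the variables move together; negative (–) means they move in opposite directions |
| Strength | Closer to ±1 means a stronger linear relationship; closer to 0 means a weaker one |
| Unit Independence | r is unit-free, unaffected by the scale of measurement |
| Symmetry | r(X,Y) = r(Y,X) |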
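Symmetry and unit independence can be checked numerically. The sketch below uses the mean-centered form of r, which is algebraically equivalent to the sum-based formula; the helper name `centered_r` and the hours/scores data are made up for illustration.

```python
from math import sqrt


def centered_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, not taken from the notes.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 74]
minutes = [60 * h for h in hours]  # same quantity, different unit

r_hours = centered_r(hours, scores)
r_swapped = centered_r(scores, hours)     # symmetry: r(X,Y) = r(Y,X)
r_minutes = centered_r(minutes, scores)   # unit independence

print(r_hours, r_swapped, r_minutes)
```

All three printed values agree up to floating-point rounding, since swapping the variables or rescaling a variable by a positive constant leaves r unchanged.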

📏 8. Importance of Pearson’s r

  1. Checks Linearity: Indicates whether a linear relationship exists (required for linear regression).
  2. Avoids Misleading Models: If r ≈ 0, a straight-line regression will predict poorly, although the variables may still be related non-linearly.
  3. Validates Regression Assumptions: Ensures data meet linearity conditions.
  4. Quantifies Relationship Strength: Measures how well one variable predicts another.

9. Assumptions of Pearson’s Correlation

| Assumption | Explanation |
| --- | --- |
| Independence | Each observation must be independent. |
| Linearity | Relationship between variables must be linear (check via scatterplot). |
| Homoscedasticity | Spread of points around the line should be uniform (rectangular scatter). |

📚 10. Example Problem

Situation:

A teacher wants to find if hours studied affect exam scores of 5 students.

Student X = Study Hours Y = Exam Score

● If regression analysis is appropriate

📍 When to Use Pearson’s r

● Both variables are quantitative (numerical)
● Relationship is linear
● No extreme outliers are present

📍 When NOT to Use It

● One or both variables are categorical (use Chi-square instead)
● Relationship is non-linear
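The non-linear case is worth a quick illustration. In the sketch below (made-up data), y is completely determined by x through y = x², yet the numerator of r, nΣxy − (Σx)(Σy), is exactly zero, so r = 0.

```python
# Made-up data: a perfect but non-linear (quadratic) dependence.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

n = len(xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Numerator of Pearson's r: n*Σxy - (Σx)(Σy).
numerator = n * sum_xy - sum(xs) * sum(ys)

print(numerator)
```

Because neither variable is constant, the denominator of r is non-zero, so a zero numerator gives r = 0 even though the relationship is exact. This is why a scatterplot should accompany r.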

🧾 13. Formula Summary

| Concept | Formula |
| --- | --- |
| Pearson’s r | \( r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \) |
| Regression Equation | \( Y = a + bX \) |
| Slope (b) | \( b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \) |
| Intercept (a) | \( a = \frac{\sum y - b\sum x}{n} \) |

🧭 14. Real-World Applications

● Predicting academic performance from study habits
● Measuring the link between income and spending
● Studying the relation between advertising and sales
● Health studies: correlation between age and blood pressure

✅ FINAL SUMMARY

| Key Point | Essence |
| --- | --- |
| Pearson’s r | Measures strength & direction of linear relationship |
| Range | -1 ≤ r ≤ +1 |
| Significance | High values of r (close to ±1) indicate a strong linear relationship |
| Interpretation | Positive = direct relation; Negative = inverse |
| Purpose | Foundation for regression, data analysis, and prediction |

📘 REGRESSION ANALYSIS

Regression Analysis – a statistical method used to study the relationship between dependent (Y) and independent (X) variables. It shows how Y changes when X changes.

Purpose:

  1. Identify relationships between variables
  2. Predict outcomes
  3. Test hypotheses
  4. Control variables in experiments or data

Steps:

  1. Prepare the data (x, y) and compute \( \sum x \), \( \sum y \), \( \sum x^2 \), \( \sum y^2 \), \( \sum xy \).
  2. Compute b (slope): \( b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \)
  3. Compute a (intercept): \( a = \frac{\sum y - b\sum x}{n} \)
  4. Form the equation: Ŷ = a + bX
  5. Substitute X to predict Y.
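
The five steps map directly onto code. The following is a minimal sketch, assuming plain Python lists as input; the names `fit_line` and `predict` and the small data set are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Ŷ = a + bX."""
    n = len(xs)
    # Step 1: the sums needed by the slope and intercept formulas.
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Step 2: slope.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Step 3: intercept.
    a = (sum_y - b * sum_x) / n
    return a, b


def predict(a, b, x):
    """Steps 4 and 5: form Ŷ = a + bX and substitute X."""
    return a + b * x


# Made-up data lying exactly on the line y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b, predict(a, b, 5))
```

The slope formula divides by nΣx² − (Σx)², which is zero when every x is identical, so the sketch assumes at least two distinct x values.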

🔹 Example

Predict exam score if a student studies 7 hours.

| X (Hours) | Y (Score) |
| --- | --- |
| 2 | 65 |
| 3 | 70 |
| 5 | 75 |
| 6 | 85 |
| 8 | 90 |

a = 56.5789, b = 4.2544

Ŷ = 56.5789 + 4.2544X

For X = 7: Ŷ = 56.5789 + 4.2544(7) ≈ 86.36

Interpretation: A student who studies 7 hours is predicted to score about 86.36.
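
For reference, the intermediate arithmetic behind these values can be written out from the five data pairs in the example. The sums below were computed from that table; rounding to four decimal places is an assumption made to match the figures quoted in the notes.

```latex
\begin{aligned}
n &= 5,\quad \sum x = 24,\quad \sum y = 385,\quad \sum xy = 1945,\quad \sum x^2 = 138 \\[4pt]
b &= \frac{5(1945) - (24)(385)}{5(138) - (24)^2}
   = \frac{9725 - 9240}{690 - 576}
   = \frac{485}{114} \approx 4.2544 \\[4pt]
a &= \frac{385 - \tfrac{485}{114}(24)}{5}
   \approx \frac{282.8947}{5} \approx 56.5789 \\[4pt]
\hat{Y}(7) &= 56.5789 + 4.2544(7) \approx 86.36
\end{aligned}
```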

📉 PEARSON CORRELATION (r)

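Formula:

\( r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \)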

Interpretation:

| Value of r | Interpretation |
| --- | --- |
| +0.90 to +1.00 | Very strong positive |
| +0.70 to +0.89 | Strong positive |
| +0.50 to +0.69 | Moderate positive |
| +0.30 to +0.49 | Weak positive |
| 0 | No correlation |
| -0.30 to -0.49 | Weak negative |
| -0.50 to -0.69 | Moderate negative |
| -0.70 to -0.89 | Strong negative |
| -0.90 to -1.00 | Very strong negative |
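
The interpretation bands can be turned into a small lookup. This is only a sketch: the function name `interpret_r` is hypothetical, the bands are treated as contiguous (for example, 0.895 counts as strong), negative values are assumed to mirror the positive bands, and non-zero values below 0.30 in absolute value, which the table does not label, are reported as unlabeled.

```python
def interpret_r(r: float) -> str:
    """Map a Pearson's r value onto the interpretation bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= 0.90:
        strength = "Very strong"
    elif size >= 0.70:
        strength = "Strong"
    elif size >= 0.50:
        strength = "Moderate"
    elif size >= 0.30:
        strength = "Weak"
    else:
        # Assumption: the table gives no label for 0 < |r| < 0.30.
        return "Not labeled in the table (|r| below 0.30)"
    return f"{strength} {direction}"


print(interpret_r(0.95))
print(interpret_r(-0.75))
print(interpret_r(0.35))
```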

Purpose:

● Tests whether there is a linear relationship between X and Y
● Checks whether the regression model is valid

🧮 CALCULATOR TECHNIQUE

  1. MODE → STAT (3), select (2) for linear regression
  2. Enter x and y
  3. To find:
     ○ a: Shift + 1 + 5 + 1
     ○ b: Shift + 1 + 5 + 2
     ○ r: Shift + 1 + 5 + 3
  4. For prediction: enter value of x → Shift + 1 + 5 +

🧠 KEY TAKEAWAYS

1. Understanding Regression

● Explains the relationship between dependent and independent variables
● Used for prediction and inference

2. Types

| Type | When to Use |
| --- | --- |
| Simple Linear | One predictor |
| Multiple Linear | Two or more predictors |
| Logistic | Binary/categorical dependent variable |

3. Computation Methods

● Manual – least squares formulas
● Calculator – direct computation
● Excel – fast analysis with graphs

4. Applications

● Predicting sales or grades
● Estimating costs or productivity
● Forecasting trends or risks