Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

STATS 305 Notes1, Study notes of Calculus

Stanford University Calculus

These are class notes for STAT 305, beautifully scribed by Eric Min and made available online for students. The notes cover various topics including ANOVA, hypothesis testing, lab reliability, contrasts, multiple comparisons, simple regression, errors in variables, and random effects. The notes were taken during the Autumn 2013 semester at the Department of Statistics, Stanford University. detailed explanations, examples, and formulas for each topic.

Typology: Study notes

2012/2013

Uploaded on 05/11/2023

alpana 🇺🇸

4.9

(13)

249 documents

1 / 148

This page cannot be seen from the preview

Don't miss anything!

STATS 305 Notes1

Art Owen2

Autumn 2013

1The class notes were beautifully scribed by Eric Min. He has kindly allowed his notes to be placed online

for stat 305 students. Reading these at leasure, you will spot a few errors and omissions due to the hurried

nature of scribing and probably my handwriting too. Reading them ahead of class will help you understand

the material as the class proceeds.

2Department of Statistics, Stanford University.

Discover Study notes of Calculus Stanford University

Partial preview of the text

Download STATS 305 Notes1 and more Study notes Calculus in PDF only on Docsity!

STATS 305 Notes^1

Art Owen^2

Autumn 2013

(^1) The class notes were beautifully scribed by Eric Min. He has kindly allowed his notes to be placed online for stat 305 students. Reading these at leasure, you will spot a few errors and omissions due to the hurried nature of scribing and probably my handwriting too. Reading them ahead of class will help you understand the material as the class proceeds. (^2) Department of Statistics, Stanford University.

0.0: Chapter 0:

1 Overview
- 1.1 The Math of Applied Statistics
- 1.2 The Linear Model
  - 1.2.1 Other Extensions
- 1.3 Linearity
- 1.4 Beyond Simple Linearity
  - 1.4.1 Polynomial Regression
  - 1.4.2 Two Groups
  - 1.4.3 k Groups
  - 1.4.4 Different Slopes
  - 1.4.5 Two-Phase Regression
  - 1.4.6 Periodic Functions
  - 1.4.7 Haar Wavelets
  - 1.4.8 Multiphase Regression
- 1.5 Concluding Remarks
2 Setting Up the Linear Model
- 2.1 Linear Model Notation
- 2.2 Two Potential Models
  - 2.2.1 Regression Model
  - 2.2.2 Correlation Model
- 2.3 TheLinear Model
- 2.4 Math Review
  - 2.4.1 Quadratic Forms
3 The Normal Distribution
- 3.1 Friends of N (0, 1)
  - 3.1.1 χ
  - 3.1.2 t-distribution
  - 3.1.3 F -distribution
- 3.2 The Multivariate Normal
  - 3.2.1 Linear Transformations
  - 3.2.2 Normal Quadratic Forms
  - 3.2.3 Rotation
  - 3.2.4 More on Independence
  - 3.2.5 Conditional Distributions
- 3.3 Non-Central Distributions
  - 3.3.1 Non-Central χ 0.0: CONTENTS Chapter 0: CONTENTS
  - 3.3.2 Non-Central F Distribution
  - 3.3.3 Doubly Non-Central F Distribution
4 Linear Least Squares
- 4.1 The Best Estimator
  - 4.1.1 Calculus Approach
  - 4.1.2 Geometric Approach
  - 4.1.3 Distributional Theory of Least Squares
- 4.2 The Hat Matrix and t-tests
- 4.3 Distributional Results
  - 4.3.1 Distribution of ˆε
- 4.4 Applications
  - 4.4.1 Another Approach to the t-statistic
- 4.5 Examples of Non-Uniqueness
  - 4.5.1 The Dummy Variable Trap
  - 4.5.2 Correlated Data
- 4.6 Extra Sum of Squares
- 4.7 Gauss Markov Theorem
- 4.8 Computing LS Solutions
  - 4.8.1 The SVD
- 4.9 R^2 and ANOVA Decomposition
5 Intro to Statistics
- 5.1 Mean and Variance
- 5.2 A Staircase
- 5.3 Testing
  - 5.3.1 One Sample t-test
- 5.4 Practical vs. Statistical Significance
- 5.5 Confidence Intervals
6 Power and Correlations
- 6.1 Significance
  - 6.1.1 Finding Non-Significance
  - 6.1.2 Brief History of the t-test
- 6.2 Power
  - 6.2.1 Power Example
- 6.3 Variance Estimation
- 6.4 Correlations in Yi
  - 6.4.1 Autoregressive Model
  - 6.4.2 Moving Average Model
  - 6.4.3 Non-Normality
  - 6.4.4 Outliers
7 Two-Sample Tests
- 7.1 Setup
- 7.2 Welch’s t
- 7.3 Permutation Tests
8 k Groups Chapter 0: CONTENTS 0.0: CONTENTS
- 8.1 ANOVA Revisited
  - 8.1.1 Cell Means Model
  - 8.1.2 Effects Model
- 8.2 Hypothesis Testing
- 8.3 Lab Reliability
- 8.4 Contrasts
  - 8.4.1 Another Example
  - 8.4.2 The Most Sensitive Contrast
- 8.5 Some Recap
  - 8.5.1 The “Cheat” Contrast
- 8.6 Multiple Comparisons
  - 8.6.1 t-tests
  - 8.6.2 Fisher’s Least Significant Difference (LSD)
  - 8.6.3 Bonferroni’s Union Bound
  - 8.6.4 Tukey’s Standardized Range Test
  - 8.6.5 Scheff´e
  - 8.6.6 Benjamini and Hochberg
- 8.7 Creating Groups
9 Simple Regression
- 9.1 Regression Fallacy
- 9.2 The Linear Model
  - 9.2.1 The Easier Way
- 9.3 Variance of Estimates
  - 9.3.1 βˆ
  - 9.3.2 βˆ
  - 9.3.3 βˆ 0 + βˆ 1 x
- 9.4 Simultaneous Bands
- 9.5 Calibration
- 9.6 R^2 for x ∈ R
- 9.7 Regression through the Origin
  - 9.7.1 The Futility of R
10 Errors in Variables
- 10.1 The Normal Case
11 Random Effects
- 11.1 The Model
- 11.2 Estimation
- 11.3 Two-Factor ANOVA
  - 11.3.1 Fixed × Fixed
  - 11.3.2 Random × Random
  - 11.3.3 Random × Fixed
  - 11.3.4 Other Remarks
12 Multiple Regression
- 12.1 Example: Body Fat
- 12.2 Some Considerations 0.0: CONTENTS Chapter 0: CONTENTS
  - 12.2.1 A “True” βj
  - 12.2.2 Naive Face-Value Interpretation
  - 12.2.3 Wrong Signs
  - 12.2.4 Correlation, Association, Causality
13 Interplay between Variables
- 13.1 Simpson’s Paradox
  - 13.1.1 Hospitals
  - 13.1.2 Smoking and Cancer
- 13.2 Competition and Collaboration
  - 13.2.1 Competition
  - 13.2.2 Collaboration
14 Automatic Variable Selection
- 14.1 Stepwise Methods
- 14.2 Mallow’s Cp
- 14.3 Least Squares Cross-Validation
  - 14.3.1 The Short Way
  - 14.3.2 Algebraic Aside
- 14.4 Generalized CV
- 14.5 AIC and BIC
  - 14.5.1 AIC vs. BIC
- 14.6 Ridge Regression
  - 14.6.1 Optimization
  - 14.6.2 A Bayesian Connection
  - 14.6.3 An Alternative
- 14.7 Principal Components
- 14.8 L 1 Penalties
15 Causality
- 15.1 Regression Discontinuity Designs
16 Violations of Assumptions
- 16.1 Bias
  - 16.1.1 Detection
  - 16.1.2 Transformations
- 16.2 Heteroskedasticity
  - 16.2.1 A Special Case
  - 16.2.2 Consequences
  - 16.2.3 Detection
- 16.3 Non-Normality
  - 16.3.1 Detection
- 16.4 Outliers
  - 16.4.1 Detection
  - 16.4.2 Potential Solutions
17 Bootstrapped Regressions
- 17.1 Bootstrapped Pairs Chapter 0: CONTENTS 0.0: CONTENTS
- 17.2 Bootstrapped Residuals
  - 17.2.1 Unequal Variances
- 17.3 Wild Bootstrap
- 17.4 Weighted Likelihood Bootstrap
18 Course Summary
- 18.1 The Steps

0.0: CONTENTS Chapter 0: CONTENTS

1.3: Linearity Chapter 1: Overview

Proof.

E[(Y − m(x))^2 |X = x] =E[(Y − μ(x) + μ(x) − m(x))^2 |X = x] =E[(Y − μ(x))^2 |X = x]

2E[(Y − μ(x))(μ(x) − m(x))|X = x]
(μ(x) − m(x))^2 =V(Y |X = x) + 2(0) + (μ(x) − m(x))^2 =V(Y |X = x) + Bias^2

This is the standard bias-variance trade-off. We cannot change variance. (We also assume that y has a finite variance.) However, our choice of m(x) can minimize the bias.

1.2.1 Other Extensions

The proof above is slightly unsatisfactory since it already “knows” the conclusion. Instead, we could take a first-order condition:

d dm E[(Y − m)^2 |X = x] = 0 ⇒ m(x) = E[Y |X = x]

This yields an extremum which must obviously be a minimum.

We were dealing with mean squared error above. What about absolute error? For that, E[(Y − m)|X = x] is minimized by using m = med(Y |X = x) (the median). In some cases, we may want V(Y |X = x), or a quantile like Q^0.^999 (Y |X = x).

1.3 Linearity

Suppose we have data on boiling points of water at different levels of air pressure.

We can fit a line through this data and toss out all other potential information.

yi = β 0 + β 1 xi + εi

By drawing a single line, we also assume that the residual errors are independent and have the same variance. That is,

ε ∼ (0, σ^2 )

Chapter 1: Overview 1.4: Beyond Simple Linearity

Boiling point

Air pressure

Boiling point

Air pressure

Figure 1.1: Sample data of boiling points at different air pressures, with and without linear fit.

Doing so overlooks the possibility that each point’s error may actually not be random, but the effect of other factors we do not analyze directly (such as whether the experiment was done in the sun, done indoors, used a faulty thermometer, etc.). Nonetheless, the linear model is powerful in that it summarizes the data using three values: β 0 , β 1 , εi.

But what about the data in Figure 1.2?

Figure 1.2: Other data with a fitted line.

This is also a simple linear model; the key difference is that the variance in the residuals here is much, much higher.

1.4 Beyond Simple Linearity

The definition of a linear model goes further than a straight line. More generally,

yi = β 0 + βixi 1 + ... + βpxip + εi has p predictors and p − 1 parameters

Chapter 1: Overview 1.4: Beyond Simple Linearity

1.4.3 k Groups

The logic above extends when having more than two groups. Let:

x 1 =

1 if group 2 0 otherwise

x 2 =

1 if group 3 0 otherwise

... xk− 1 =

1 if group k 0 otherwise

where one group is chosen as a point of reference. Then, we get that

E[Y ] = β 0 + β 1 x 1 + β 2 xx + ... + βk− 1 xk− 1

where group 1 has mean β 0 and group j > 1 has mean β 0 + βj− 1.

Another way to consider k groups is through the cell mean model, which we express as the following:

E[Y ] = β 11 {x = 1} + β 21 {x = 2} + ... + βk 1 {x = k}

Note that the cell mean model has no intercept term.

1.4.4 Different Slopes

Suppose we are comparing a treatment and control group. It could be the case that both groups experience the same effect based on time, in which case the slopes of their two lines are the same (Figure 1.3a). But what if the two groups also have different slopes (Figure 1.3b)?

𝐸[𝑦]

𝑥

T’s gain T’s overall gain

C’s gain

(a) Same slopes can be dealt with using a dummy variable.

𝐸[𝑦]

𝑥 (b) Different slopes can be dealt with using interac- tions.

Figure 1.3: Two cases of different slopes.

E[Y ] = β 0 + β 1 x + β 2 z + β 2 xz

1.4: Beyond Simple Linearity Chapter 1: Overview

where x is time and z is an indicator for treatment. This allows for the two groups to have different slopes.

1.4.5 Two-Phase Regression

There may be cases where the slope of the line changes at a certain point. For instance, the performance of an average human kidney begins to decline at age 40. See Figure 1.5b. How can we express these kinds of situations?

E[Y ] = β 0 + β 1 x + β 2 (x − t)+ + εi where z+ = max(0, z) =

z z ≥ 0 0 z < 0

𝑧+

𝑧

Figure 1.4: Visualization of z+.

𝑡 0

𝛽 0 + 𝛽 1 𝑡 + 𝛽 2 𝑡 − 𝑡 0 +

(a) Accounting for sudden increase at a known t 0.

𝛽 0 + 𝛽 1 𝑡 − (^40) +

(b) Accounting for decline in kidney performance at

Figure 1.5: Examples of two-phase regression models.

1.4.6 Periodic Functions

What about cyclical data, such as calendar time? December 31 is not that distinct from January

How can we deal with an x that has a torus-like shape?

1.5: Concluding Remarks Chapter 1: Overview

This can approximate piecewise functions and may be preferable to methods such as (excessive) polynomial regression.

Figure 1.7: Example of multiphase regression.

1.5 Concluding Remarks

Despite these models’ differences, the mathematics underlying them is all linear and practically the same.

Then what are examples of non-linear models?

yi = β 0

1 − e−β^1 ti

εi is not linear because β 1 is in an exponent.

yi = β 0 + β 1 xi 1 + β 2 (xi − β 3 )+ + εi is not linear because of β 3.
yi +

∑k j=1 βj^ e

− 12 ||xi−μj ||^2 is almost linear but has a small Gaussian bump in the middle.

Chapter 2 Setting Up the Linear Model

Last time, we discussed the big picture of applied statistics, with focus on the linear model.

Next, we will delve into more of the basic math, probability, computations, and geometry of the linear model. These components will be similar or the same across all forms of the model. We will then explore the actual ideas behind statistics; there, things will differ model by model.

There are six overall tasks we would like to perform:

Estimate β
Test, e.g. β 7 = 0
Predict yn+1 given xn+
Estimate σ^2
Check the model’s assumptions
Make a choice among linear models

Before anything, let’s set up some notation.

2.1 Linear Model Notation

Xi ∈ Rd^ Yi ∈ R

Yi =

∑^ p

Zij βj + εi where Zij = jth^ function of Xi

We also call these jth^ functions features. Note that the dimensions of X (d) do not have to be equal to the number of features (p). For example,

Zi =

[

1 xi 1 ... xip

]

Zi =

[

1 xi 1 xi 2 x^2 i 1 x^2 i 2

]

(quadratic regression)

In the first case, p = d + 1. In the second, p = 5 and d = 2. Features are not the same as variables.

Chapter 2: Setting Up the Linear Model 2.4: Math Review

Yi =

∑^ p

Zij βj + εi where εi i.i.d. ∼ N (0, σ^2 )

Note that the i.i.d. notation for εi is interchangeable with “ind.” However, both of these are

distinct from εi i.i.d. ∼ N (0, σ^2 i ) (where each error term has its own variance) and εi ind. ∼ (0, σ^2 ) (where the distribution is unknown).

Second,

Y =

y 1 y 2 .. . yn

Z =

z 11 ... zip .. .

zn 1 ... znp

 β^ =

β 1 .. . βp

 ε^ =

ε 1 .. . εp

Third,

A = Zβ + ε where ε ∼ N (0, σ^2 I)

We can also express this third version as A ∼ N (Zβ, σ^2 I).

Let’s dig deeper into the vector/matrix form of the regression model. But first, we require some probability and matrix algebra review.

2.4 Math Review

Suppose that A ∈ Rm×n^ is a random m×n matrix. Expected values on this matrix work component- wise:

E

x 11 ... x 1 n .. .

xm 1 ... xmn

E[x 11 ] ... E[x 1 n] .. .

E[xm 1 ] ... E[xmn]

Now, suppose there exist two non-random matrices A and B. Then,

E[AX] = A E[X] E[XB] = E[X] B

Say that A ∈ Rn^ and A ∈ Rn^ are random column vectors. Then,

2.4: Math Review Chapter 2: Setting Up the Linear Model

Cov(X, Y ) ≡ E[(X − E[X])(Y − E[Y ])T^ ] ∈ Rn×m
Var(Y ) = Cov(Y, Y ) ∈ Rm×m
- Variances are on the diagonal, and covariances are on the off-diagonal.
Cov(AX, BY ) = A Cov(X, Y ) BT
V(AX + b) = AV(X)AT^ (where A is random and b is non-random)

The variance matrix V(X) is positive semi-definite. That is, for a vector C,

0 ≤ V(CT^ X) = CT^ V(X)C

Indeed, V(X) is positive definite unless V(CT^ X) = 0 for some C 6 = 0.

2.4.1 Quadratic Forms

Suppose that A ∈ Rn×n. Written explicitly, the quadratic form is a scalar value defined as

xT^ Ax =

∑^ m

∑^ n

Aij xixj

We can assume symmetry for A; Aij = Aji. (If it is not symmetric, we can instead use A+A T

This doesn’t look directly relevant to us. So why do we care about quadratic forms? Consider how we estimate variance. We know that

σˆ^2 ∝

∑^ n

(yi − Y )^2 where ¯y =

∑^ n

In matrix notation, this variance estimate can be written as:

  

y 1 .. . yn

1 − (^1) n − (^) n^1 ... − (^) n^1 − (^1) n 1 − (^) n^1

− (^1) n ... ... 1 − (^) n^1

y 1 .. . yn

∑^ n

yi(yi − y¯)

(^1) This is true because xT^ Ax = (xT^ Ax)T^ = xT^ AT^ x = xT

( 1 2 A^ +

1 2 A

) x

The first equality is true because the transpose of a scalar is itself. The second equality is from the fact that we are averaging two quantities which are themselves equal. Based on this, we know that only the symmetric part of A contributes to the quadratic form.

STATS 305 Notes1, Study notes of Calculus

Related documents

Partial preview of the text

Download STATS 305 Notes1 and more Study notes Calculus in PDF only on Docsity!

STATS 305 Notes^1

Art Owen^2

Autumn 2013

1.2.1 Other Extensions

1.4.3 k Groups

1.4.4 Different Slopes

1.4.5 Two-Phase Regression

1.4.6 Periodic Functions

Chapter 2

Setting Up the Linear Model

[

]

[

]

Y =

Z =

E

E[AX] = A E[X] E[XB] = E[X] B

2.4.1 Quadratic Forms