Multivariate Regression and Principal Component Regression/Analysis: An Introduction

pca2.mcd

Multivariate Regression, leading up to Principal Component Regression/Analysis

-- an introductory tutorial to some of the most important ideas in multivariate regression.

Instructor: Nam Sun Wang

Multivariate Regression.

Let us expand the number of independent variables and dependent variables. Here, we are given a set of data consisting of a series of m+1 independent variables x<0>, x<1>, ..., x<m>, and l+1 dependent variables y<0>, y<1>, ..., y<l>. An example is how the quality, thickness, and strength of a paper product (Y) depend on water content, source of fiber, digestion temperature, pH, etc. (X). Another example is how the yield and composition in a chemical reactor (Y) depend on stirrer speed, feed flow rate, reactant concentrations, ... (X). The chemical composition (Y) measured with a chemical sensor may be related to the response of an array sensor (X). The mechanical or chemical property of a material (Y) may depend on its color spectrum (X).

An economic example may be how the stock price and trading volume (Y) depend on the prevailing interest rate, the company's earnings, the quarter in the calendar, ... (X). The gross national product (Y) may depend on a country's population, literacy rate, average age, level of rainfall, ... (X). The probability of death, and thus the premium of a life insurance policy (Y), may depend on the many attributes of the insured (X). The salary and popularity of a football player (Y) may depend on his height, weight, running speed, strength, running yards gained, passing yards gained, number of touchdowns, number of fumbles, hours of practice per day, ... (X). The standardized test scores or the grade point average of a student (Y) may depend on the number of hours spent in school, amount of daily TV time, the household income, gender, the time of the day the test is taken, and maybe even the number of whip lashes received since one's birth or the average number of glasses of milk one consumes daily (X). Furthermore, the dependent variables may themselves be closely correlated with one another, as a student's standardized test scores and grade point average may be. The examples are endless.
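To make the data layout above concrete, here is a minimal sketch in Python with NumPy (the worksheet itself is a Mathcad file, so this is a translation, not the author's code). It assumes the observations are stacked as rows, so that X has m+1 columns and Y has l+1 columns, and it assumes a linear model Y ≈ X B with no intercept term. The sizes n, m, l, the synthetic data, and the names B_true and B_hat are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50         # number of observations (hypothetical)
m, l = 3, 1    # m+1 = 4 independent variables, l+1 = 2 dependent variables

# Column j of X plays the role of x<j>; column k of Y plays the role of y<k>.
X = rng.normal(size=(n, m + 1))

# Synthetic "true" coefficients and slightly noisy responses (assumed linear model).
B_true = rng.normal(size=(m + 1, l + 1))
Y = X @ B_true + 0.01 * rng.normal(size=(n, l + 1))

# Ordinary least-squares estimate of B in Y ≈ X B, fitted for all
# dependent variables at once (one column of B_hat per column of Y).
B_hat, residuals, rank, singular_values = np.linalg.lstsq(X, Y, rcond=None)

print("X shape:", X.shape)          # (n, m+1)
print("Y shape:", Y.shape)          # (n, l+1)
print("B_hat shape:", B_hat.shape)  # (m+1, l+1)
```

Each column of B_hat holds the coefficients relating one dependent variable to all of the independent variables. To allow a nonzero intercept, one common choice is to append a column of ones to X before fitting.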

What we include as an independent variable need not actually affect the dependent variables in any way. The choice is not necessarily a reflection of what we believe affects the process. If we so desire, we can throw in everything that may remotely affect the dependent variables. One thing regression tells us is whether there is indeed any correlation between the independent and the dependent variables. A word of caution: existence of a correlation does not imply the existence of an actual connection or the existence of a direct cause-effect relationship. It is often true that "look and thou shall find." To judge whether a particular degree of correlation is significant, we need to resort to tools from probability, hypothesis testing, metrics, reliability, controlled experimentation, etc. In addition, we need to worry about a lot of other things: how to obtain representative samples of adequate size, how to define the domain of validity to avoid extrapolation, and how to detect and rectify outliers and gross errors -- none of which will be addressed in this worksheet.
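The "look and thou shall find" caution can be illustrated numerically. The following is a minimal sketch in Python with NumPy, not part of the original worksheet: it generates a dependent variable and many candidate independent variables that are unrelated by construction, then reports the strongest sample correlation found. The sample size, the number of candidates, and all variable names are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)

n_obs = 20           # few observations (hypothetical)
n_candidates = 1000  # many candidate independent variables (hypothetical)

# y and every column of X are drawn independently: no real connection exists.
y = rng.normal(size=n_obs)
X = rng.normal(size=(n_obs, n_candidates))

# Pearson correlation of each column of X with y, computed by
# standardizing (population standard deviation) and averaging the products.
y_std = (y - y.mean()) / y.std()
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
r = X_std.T @ y_std / n_obs

best = int(np.argmax(np.abs(r)))
print(f"strongest correlation: column {best}, r = {r[best]:.3f}")
```

With so many candidates and so few observations, at least one column will usually show a sizable correlation with y purely by chance, which is why a significance test or a controlled experiment is needed before reading anything into such a result.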

