
Canonical Correlation Analysis - Applied Multivariate Analysis | STAT 636, Study notes of Descriptive statistics

Texas A&M University (A&M), Descriptive statistics

Material Type: Notes; Class: APPLIED MULTIVARIATE ANLY; Subject: STATISTICS; University: Texas A&M University; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009

koofers-user-vq3-1 🇺🇸


LECTURE 9

CANONICAL CORRELATION ANALYSIS

Introduction

The concept of canonical correlation arises when we want to quantify the associations between two sets of variables.

For example, suppose that the first set of variables, labeled 'arithmetic', records $x_1$, the speed of an individual in working problems, and $x_2$, the accuracy. The second set of variables, labeled 'reading', consists of $x_3$, reading speed, and $x_4$, comprehension. We can examine the six pairwise correlations, but in addition we ask whether it makes sense to say that arithmetic is correlated with reading.

The answer is given by considering a linear combination of the arithmetic variables, say $u$, and a linear combination of the reading variables, say $v$, and using their correlation to represent the association between the groups. Thus we construct

$$u = a_1 x_1 + a_2 x_2 \quad \text{and} \quad v = b_1 x_3 + b_2 x_4$$

and we seek coefficients so that this correlation is maximized.

(NOTE: Every text I know of uses $u$ and $v$ for these variables. SAS PROC CANCORR uses $v$ and $w$. That is OK, but don't get confused.)

Development

Suppose we have a vector of variables, $\mathbf{x}$, that consists of two sets of variables, $\mathbf{x}_1$ and $\mathbf{x}_2$, where $\mathbf{x}_1$ has length $p_1$ and $\mathbf{x}_2$ has length $p_2$. Assume that $p_1 \le p_2$. To develop the notation, let

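$$\mathbf{x} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix}, \qquad E[\mathbf{x}] = \boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix} \quad \text{and} \quad \operatorname{Var}(\mathbf{x}) = \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.$$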

The matrix $\Sigma_{12}$ gives the covariances between the variables in set one and set two, and in correlation form it gives the correlations. When $p_1$ and $p_2$ are moderately large, examining the $p_1 p_2$ correlations and drawing conclusions is not an easy task. As an alternative, we consider linear combinations

$$u = \mathbf{a}^T \mathbf{x}_1 \quad \text{and} \quad v = \mathbf{b}^T \mathbf{x}_2.$$

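Note that

$$\operatorname{Var}[u] = \mathbf{a}^T \Sigma_{11} \mathbf{a}, \qquad \operatorname{Var}[v] = \mathbf{b}^T \Sigma_{22} \mathbf{b}, \qquad \operatorname{Cov}[u, v] = \mathbf{a}^T \Sigma_{12} \mathbf{b}.$$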

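We want to determine the vectors $\mathbf{a}$ and $\mathbf{b}$ so that

$$\operatorname{Corr}[u, v] = \frac{\mathbf{a}^T \Sigma_{12} \mathbf{b}}{\sqrt{\mathbf{a}^T \Sigma_{11} \mathbf{a}}\,\sqrt{\mathbf{b}^T \Sigma_{22} \mathbf{b}}}$$

is as large as possible. To this end, we determine $\mathbf{a}$ and $\mathbf{b}$ as the solution to the problem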


$$\text{maximize } \mathbf{a}^T \Sigma_{12} \mathbf{b} \quad \text{subject to} \quad \mathbf{a}^T \Sigma_{11} \mathbf{a} = 1, \quad \mathbf{b}^T \Sigma_{22} \mathbf{b} = 1.$$
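To make the objective concrete, here is a short NumPy sketch that evaluates $\operatorname{Corr}[u, v]$ from the covariance blocks for any candidate $\mathbf{a}$ and $\mathbf{b}$. It borrows the correlation blocks from the reading/arithmetic example later in these notes. The helper name `corr_uv` is hypothetical. It does not come from the notes or from any library.

```python
import numpy as np

# Correlation blocks from the reading/arithmetic example in these notes:
# set 1 = (x1, x2) arithmetic, set 2 = (x3, x4) reading.
R11 = np.array([[1.0, 0.4], [0.4, 1.0]])
R12 = np.array([[0.5, 0.6], [0.3, 0.4]])
R22 = np.array([[1.0, 0.2], [0.2, 1.0]])


def corr_uv(a, b, S11, S12, S22):
    """Corr[u, v] for u = a^T x1 and v = b^T x2 (hypothetical helper)."""
    cov_uv = a @ S12 @ b   # a^T Sigma_12 b
    var_u = a @ S11 @ a    # a^T Sigma_11 a
    var_v = b @ S22 @ b    # b^T Sigma_22 b
    return cov_uv / np.sqrt(var_u * var_v)


# Weight only the two speed variables: recovers the single correlation r_13.
speed_only = corr_uv(np.array([1.0, 0.0]), np.array([1.0, 0.0]), R11, R12, R22)
# Coefficients of the first canonical pair reported in the example.
first_pair = corr_uv(np.array([0.856, 0.278]), np.array([0.545, 0.737]), R11, R12, R22)
# Same directions, rescaled (a times 10, b times 0.2).
rescaled = corr_uv(np.array([8.56, 2.78]), np.array([0.109, 0.1474]), R11, R12, R22)
print(speed_only, first_pair, rescaled)
```

Putting all the weight on the two speed variables gives back the pairwise correlation .5. The first canonical pair from the example gives about .739. The third call shows that the ratio does not change when $\mathbf{a}$ or $\mathbf{b}$ is rescaled, so fixing both variances at 1 in the constrained problem loses nothing.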

The variables so determined are called the first pair of canonical variables, $u_1$ and $v_1$. The second pair of canonical variables, $u_2$ and $v_2$, is similarly determined by linear combinations of $\mathbf{x}_1$ and $\mathbf{x}_2$ with unit variance and maximum correlation among all variables that are uncorrelated with the first pair. This reminds us of the discussion of principal components and leads to the determination of eigenvalues and eigenvectors.
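The constrained problem can also be solved by whitening each set and taking a singular value decomposition. This is a swapped-in, equivalent formulation, not the derivation these notes follow. Substitute $\mathbf{c} = \Sigma_{11}^{1/2}\mathbf{a}$ and $\mathbf{d} = \Sigma_{22}^{1/2}\mathbf{b}$. The constraints then become $|\mathbf{c}| = |\mathbf{d}| = 1$, and the maximum of $\mathbf{c}^T K \mathbf{d}$, with $K = \Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}$, is the largest singular value of $K$. Later singular pairs give the later, mutually uncorrelated canonical pairs. The following is a sketch, and the helper names are hypothetical:

```python
import numpy as np

R11 = np.array([[1.0, 0.4], [0.4, 1.0]])
R12 = np.array([[0.5, 0.6], [0.3, 0.4]])
R22 = np.array([[1.0, 0.2], [0.2, 1.0]])


def inv_sqrt(S):
    """Symmetric inverse square root S^{-1/2} of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def canonical_svd(S11, S12, S22):
    """Canonical correlations and coefficient matrices via SVD of the whitened
    cross-covariance K = S11^{-1/2} S12 S22^{-1/2}."""
    W1, W2 = inv_sqrt(S11), inv_sqrt(S22)
    K = W1 @ S12 @ W2
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    A = W1 @ U      # columns a_i, with a_i^T S11 a_i = 1
    B = W2 @ Vt.T   # columns b_i, with b_i^T S22 b_i = 1
    return s, A, B


s, A, B = canonical_svd(R11, R12, R22)
print(np.round(s, 4))
print(np.round(A.T @ R12 @ B, 4))   # diagonal: Corr(u_i, v_j) = 0 for i != j
```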

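The solution leads us to the stationary equations

$$\Sigma_{12}\mathbf{b} - \lambda \Sigma_{11}\mathbf{a} = \mathbf{0}, \qquad \Sigma_{21}\mathbf{a} - \theta \Sigma_{22}\mathbf{b} = \mathbf{0},$$

where $\lambda$ and $\theta$ are Lagrange multipliers for the two constraints.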

Multiplying the first equation by $\mathbf{a}^T$ and the second by $\mathbf{b}^T$, and using the constraints, shows that

$$\lambda = \theta = \mathbf{a}^T \Sigma_{12} \mathbf{b}.$$

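The stationary equations therefore form a homogeneous linear system in $(\mathbf{a}, \mathbf{b})$. This system has a nonzero solution only if its determinant vanishes, so we seek $\lambda$ such that

$$\begin{vmatrix} -\lambda \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & -\lambda \Sigma_{22} \end{vmatrix} = 0.$$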

The following result is useful. If the matrix $A$ is written in partitioned form as

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$$

then, provided the inverses exist,

$$|A| = |A_{11}|\,\left|A_{22} - A_{21} A_{11}^{-1} A_{12}\right| = |A_{22}|\,\left|A_{11} - A_{12} A_{22}^{-1} A_{21}\right|.$$
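A quick numerical check of both forms of this identity follows. It uses the 4 × 4 correlation matrix from the example later in these notes, split at $p_1 = 2$. The helper name `partitioned_dets` is hypothetical.

```python
import numpy as np


def partitioned_dets(M, p1):
    """Both Schur-complement forms of |M| for the split at row/column p1."""
    A11, A12 = M[:p1, :p1], M[:p1, p1:]
    A21, A22 = M[p1:, :p1], M[p1:, p1:]
    # |A11| |A22 - A21 A11^{-1} A12|
    first = np.linalg.det(A11) * np.linalg.det(A22 - A21 @ np.linalg.solve(A11, A12))
    # |A22| |A11 - A12 A22^{-1} A21|
    second = np.linalg.det(A22) * np.linalg.det(A11 - A12 @ np.linalg.solve(A22, A21))
    return first, second


R = np.array([[1.0, 0.4, 0.5, 0.6],
              [0.4, 1.0, 0.3, 0.4],
              [0.5, 0.3, 1.0, 0.2],
              [0.6, 0.4, 0.2, 1.0]])
first, second = partitioned_dets(R, p1=2)
print(first, second, np.linalg.det(R))   # all three agree up to round-off
```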

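Applying the second form of this to our matrix, we have

$$\begin{aligned}
\begin{vmatrix} -\lambda\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & -\lambda\Sigma_{22} \end{vmatrix}
&= |-\lambda\Sigma_{22}|\,\left|-\lambda\Sigma_{11} - \Sigma_{12}(-\lambda\Sigma_{22})^{-1}\Sigma_{21}\right| \\
&= (-1)^{p_2}\lambda^{p_2 - p_1}\,|\Sigma_{22}|\,\left|\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} - \lambda^2 \Sigma_{11}\right| \\
&= (-1)^{p_2}\lambda^{p_2 - p_1}\,|\Sigma_{22}|\,|\Sigma_{11}|\,\left|\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} - \lambda^2 I\right|.
\end{aligned}$$

Hence the nonzero solutions satisfy $\left|\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} - \lambda^2 I\right| = 0$. In other words, $\lambda^2$ is an eigenvalue of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, and $\mathbf{a}$ is a corresponding eigenvector. The largest eigenvalue, $\lambda_1^2$, gives the first canonical correlation $\lambda_1$. We can continue this for all nonzero eigenvalues.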
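The sketch below checks this numerically on the example correlation matrix, using $R$ in place of $\Sigma$. It forms $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and takes the square roots of its eigenvalues as the canonical correlations. It then confirms that the partitioned determinant vanishes at those values but not at an arbitrary $\lambda$. The helper name `block_det` is hypothetical.

```python
import numpy as np

R = np.array([[1.0, 0.4, 0.5, 0.6],
              [0.4, 1.0, 0.3, 0.4],
              [0.5, 0.3, 1.0, 0.2],
              [0.6, 0.4, 0.2, 1.0]])
p1 = 2
S11, S12 = R[:p1, :p1], R[:p1, p1:]
S21, S22 = R[p1:, :p1], R[p1:, p1:]

# M = Sigma11^{-1} Sigma12 Sigma22^{-1} Sigma21, computed with solve() rather
# than explicit inverses. Its eigenvalues are real (M is similar to a symmetric
# positive semidefinite matrix), so any round-off imaginary part is dropped.
M = np.linalg.solve(S11, S12 @ np.linalg.solve(S22, S21))
lam2 = np.sort(np.linalg.eigvals(M).real)[::-1]   # lambda_i^2, largest first
lam = np.sqrt(np.clip(lam2, 0.0, None))           # canonical correlations


def block_det(l):
    """Determinant of [[-l*S11, S12], [S21, -l*S22]] as a function of l."""
    return np.linalg.det(np.block([[-l * S11, S12], [S21, -l * S22]]))


print("lambda^2:", lam2)
print("lambda:  ", lam)
print("det at lambda_i:", [block_det(l) for l in lam])
print("det at 0.5:     ", block_det(0.5))
```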

Summary

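The canonical variable pairs, $u_i = \mathbf{a}_i^T \mathbf{x}_1$ and $v_i = \mathbf{b}_i^T \mathbf{x}_2$, as determined have the following properties:

$$\operatorname{Corr}(u_i, v_i) = \lambda_i, \qquad \operatorname{Corr}(u_i, u_j) = 0, \qquad \operatorname{Corr}(v_i, v_j) = 0, \qquad \operatorname{Corr}(u_i, v_j) = 0 \quad \text{for } i \ne j.$$

These properties can be summarized by the correlation matrix

$$R_{uv} = \begin{bmatrix} I_{p_1} & \operatorname{Diag}(\lambda_i) \\ \operatorname{Diag}(\lambda_i) & I_{p_2} \end{bmatrix}.$$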

Example

Returning to the reading-arithmetic example, suppose the sample correlation matrix is given by

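$$R = \begin{bmatrix} 1 & .4 & .5 & .6 \\ .4 & 1 & .3 & .4 \\ .5 & .3 & 1 & .2 \\ .6 & .4 & .2 & 1 \end{bmatrix}, \qquad R_{11} = \begin{bmatrix} 1 & .4 \\ .4 & 1 \end{bmatrix}, \qquad R_{12} = \begin{bmatrix} .5 & .6 \\ .3 & .4 \end{bmatrix},$$

$$R_{22} = \begin{bmatrix} 1 & .2 \\ .2 & 1 \end{bmatrix}, \qquad R_{21} = \begin{bmatrix} .5 & .3 \\ .6 & .4 \end{bmatrix}.$$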

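Note that it is best to apply the results to standardized data, $z_j = (x_j - \mu_j)/\sigma_j$, and hence we use the correlation matrix. We may then compute

$$A = R_{11}^{-1} R_{12} R_{22}^{-1} R_{21} = \begin{bmatrix} .452 & .289 \\ .146 & .095 \end{bmatrix}$$

and

$$B = R_{22}^{-1} R_{21} R_{11}^{-1} R_{12} = \begin{bmatrix} .206 & .251 \\ .278 & .340 \end{bmatrix}.$$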

The eigenvalues of these two matrices are the same: $\lambda_1^2 = .5457$ and $\lambda_2^2 = .0009$. The eigenvectors of $A$ and $B$ are the columns of the matrices $\operatorname{Vec}_A$ and $\operatorname{Vec}_B$.
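This step is sketched for the example below, assuming NumPy is available. The variable names are illustrative. The code forms $A$ and $B$ with `np.linalg.solve` instead of explicit inverses and confirms that the two matrices share the eigenvalues .5457 and .0009.

```python
import numpy as np

R11 = np.array([[1.0, 0.4], [0.4, 1.0]])
R12 = np.array([[0.5, 0.6], [0.3, 0.4]])
R22 = np.array([[1.0, 0.2], [0.2, 1.0]])
R21 = R12.T

# A = R11^{-1} R12 R22^{-1} R21 and B = R22^{-1} R21 R11^{-1} R12
A = np.linalg.solve(R11, R12 @ np.linalg.solve(R22, R21))
B = np.linalg.solve(R22, R21 @ np.linalg.solve(R11, R12))

# Eigenvalues sorted largest first; they are real for this problem.
eig_A = np.sort(np.linalg.eigvals(A).real)[::-1]
eig_B = np.sort(np.linalg.eigvals(B).real)[::-1]

print(np.round(A, 3))
print(np.round(B, 3))
print(np.round(eig_A, 4), np.round(eig_B, 4))
```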

Recall that we have specified that the variances of the $u_i$ and $v_i$ must be one. That is,

$$\mathbf{a}_i^T R_{11} \mathbf{a}_i = 1 \quad \text{and} \quad \mathbf{b}_i^T R_{22} \mathbf{b}_i = 1.$$

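The eigenvectors as determined are normalized to have length one, but they do not satisfy this condition, so they must be rescaled. Each column $\mathbf{e}$ of $\operatorname{Vec}_A$ is divided by $\sqrt{\mathbf{e}^T R_{11} \mathbf{e}}$, and each column of $\operatorname{Vec}_B$ by $\sqrt{\mathbf{e}^T R_{22} \mathbf{e}}$. The scaled eigenvectors are therefore given by

$$A = \operatorname{Vec}_A\, D_A^{-1/2} \quad \text{and} \quad B = \operatorname{Vec}_B\, D_B^{-1/2},$$

where $D_A = \operatorname{diag}(\operatorname{Vec}_A^T R_{11} \operatorname{Vec}_A)$ and $D_B = \operatorname{diag}(\operatorname{Vec}_B^T R_{22} \operatorname{Vec}_B)$.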

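Thus,

$$A = \begin{bmatrix} .856 & -.677 \\ .278 & 1.056 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} .545 & -.863 \\ .737 & .706 \end{bmatrix}.$$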
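Here is a NumPy sketch of the rescaling step. The helper name `scaled_eigenvectors` is hypothetical. Eigenvector signs from a numerical routine are arbitrary, so the code adopts one convention, which is an assumption rather than part of the notes: the first entry of each $\mathbf{a}_i$ is positive, and $\mathbf{b}_i$ is then oriented so that $\operatorname{Corr}(u_i, v_i) > 0$.

```python
import numpy as np

R11 = np.array([[1.0, 0.4], [0.4, 1.0]])
R12 = np.array([[0.5, 0.6], [0.3, 0.4]])
R22 = np.array([[1.0, 0.2], [0.2, 1.0]])
R21 = R12.T

A_mat = np.linalg.solve(R11, R12 @ np.linalg.solve(R22, R21))
B_mat = np.linalg.solve(R22, R21 @ np.linalg.solve(R11, R12))


def scaled_eigenvectors(M, S):
    """Eigenvectors of M (largest eigenvalue first), each column rescaled so
    that e^T S e = 1, i.e. the canonical variable has unit variance."""
    w, V = np.linalg.eig(M)
    order = np.argsort(w.real)[::-1]
    V = V[:, order].real                    # unit-length columns (Vec_A or Vec_B)
    scale = np.sqrt(np.diag(V.T @ S @ V))   # sqrt(e_j^T S e_j) for each column
    return V / scale                        # divides column j by scale[j]


A_coef = scaled_eigenvectors(A_mat, R11)
B_coef = scaled_eigenvectors(B_mat, R22)

# Sign convention (assumption): first entry of each a_i positive, then flip
# b_i if needed so that Corr(u_i, v_i) = a_i^T R12 b_i is positive.
A_coef = A_coef * np.sign(A_coef[0, :])
B_coef = B_coef * np.sign(np.diag(A_coef.T @ R12 @ B_coef))

print(np.round(A_coef, 3))
print(np.round(B_coef, 3))
```

Under this convention the second columns come out as the negatives of the second pair above. Flipping the signs of $u_2$ and $v_2$ together leaves their correlation unchanged. The last digit may also differ slightly from the values above because of rounding.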

It follows that the first canonical pair is defined by

$$u_1 = .856 z_1 + .278 z_2, \qquad v_1 = .545 z_3 + .737 z_4,$$

with correlation $\lambda_1 = \sqrt{.5457} \approx .739$.

The second canonical pair is defined by

$$u_2 = -.677 z_1 + 1.056 z_2, \qquad v_2 = -.863 z_3 + .706 z_4,$$

with correlation $\lambda_2 = \sqrt{.0009} = .03$.
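The sketch below checks the summary properties for the example. It plugs the rounded coefficients of the two pairs into $\operatorname{Var}(u_1, u_2, v_1, v_2) = W^T R W$, where $W$ is the block-diagonal matrix with blocks $A$ and $B$, and rescales the result to a correlation matrix. Up to rounding, the output should show the pattern $R_{uv}$ from the summary, with $\lambda_1 \approx .739$ and $\lambda_2 \approx .03$. The variable names are illustrative.

```python
import numpy as np

R = np.array([[1.0, 0.4, 0.5, 0.6],
              [0.4, 1.0, 0.3, 0.4],
              [0.5, 0.3, 1.0, 0.2],
              [0.6, 0.4, 0.2, 1.0]])

# Canonical coefficients reported in the notes (columns = pairs 1 and 2).
A = np.array([[0.856, -0.677],
              [0.278, 1.056]])
B = np.array([[0.545, -0.863],
              [0.737, 0.706]])

# (u1, u2, v1, v2) = W^T z with W = blockdiag(A, B).
Z = np.zeros((2, 2))
W = np.block([[A, Z], [Z, B]])
C = W.T @ R @ W                  # covariance matrix of (u1, u2, v1, v2)
d = np.sqrt(np.diag(C))
corr = C / np.outer(d, d)        # rescale to a correlation matrix
print(np.round(corr, 3))
```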

We see that the first pair captures most of the relation between arithmetic and reading. The canonical variate for arithmetic, $u_1$, places over three times as much weight on speed as it does on accuracy. The canonical variate for reading, $v_1$, puts more weight on comprehension than on speed, in the proportion 4:3. Note that this does not say, for example, that speed is three times as important as accuracy in arithmetic. It simply says that if we are asking for a measure of the relation between arithmetic and reading, these functions provide the essential component of that relation.

Interpretation of Canonical Variables

In general, the canonical variables are artificial and may have no physical meaning. Interpretation is often aided by computing the correlations between the original variables and the canonical variables. To do this, note that the canonical variables are related to the original variables by the equations

$$\mathbf{u} = A^T \mathbf{z}_1 \quad \text{and} \quad \mathbf{v} = B^T \mathbf{z}_2.$$
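The preview of the notes ends here. What follows is a hedged sketch of where this leads, using a standard result rather than the notes' own continuation. Because each $z_j$ has unit variance, $\operatorname{Cov}(\mathbf{z}_1, \mathbf{u}) = R_{11}A$, $\operatorname{Cov}(\mathbf{z}_2, \mathbf{u}) = R_{21}A$, $\operatorname{Cov}(\mathbf{z}_2, \mathbf{v}) = R_{22}B$ and $\operatorname{Cov}(\mathbf{z}_1, \mathbf{v}) = R_{12}B$. Dividing by the standard deviations of $\mathbf{u}$ and $\mathbf{v}$ turns these into correlations. The code uses the example's rounded coefficients, and the names are illustrative.

```python
import numpy as np

R11 = np.array([[1.0, 0.4], [0.4, 1.0]])
R12 = np.array([[0.5, 0.6], [0.3, 0.4]])
R22 = np.array([[1.0, 0.2], [0.2, 1.0]])
R21 = R12.T

A = np.array([[0.856, -0.677], [0.278, 1.056]])   # columns a_1, a_2
B = np.array([[0.545, -0.863], [0.737, 0.706]])   # columns b_1, b_2

sd_u = np.sqrt(np.diag(A.T @ R11 @ A))   # close to 1 by construction
sd_v = np.sqrt(np.diag(B.T @ R22 @ B))

corr_z1_u = (R11 @ A) / sd_u   # arithmetic variables vs u1, u2
corr_z2_u = (R21 @ A) / sd_u   # reading variables vs u1, u2
corr_z2_v = (R22 @ B) / sd_v   # reading variables vs v1, v2
corr_z1_v = (R12 @ B) / sd_v   # arithmetic variables vs v1, v2

for name, m in [("z1,u", corr_z1_u), ("z2,u", corr_z2_u),
                ("z2,v", corr_z2_v), ("z1,v", corr_z1_v)]:
    print(name, np.round(m, 3).tolist())
```

Both arithmetic variables correlate strongly with $u_1$ (about .97 for speed and .62 for accuracy). The cross-correlations are the within-set ones shrunk by the canonical correlation. For example, $\operatorname{Corr}(\mathbf{z}_2, u_1) \approx \lambda_1 \operatorname{Corr}(\mathbf{z}_2, v_1)$, which follows from the stationary equation $\Sigma_{21}\mathbf{a} = \lambda \Sigma_{22}\mathbf{b}$.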
