Memory System Performance, Lecture Slide - Computer Science, Slides of Introduction to Computers

Carnegie Mellon University (CMU), Introduction to Computers

Impact of cache parameters, impact of memory reference patterns, matrix multiply, transpose, memory mountain range

Typology: Slides

2010/2011

Uploaded on 10/07/2011

rolla45 🇺🇸


Memory System Performance

October 29, 1998

Topics

•Impact of cache parameters

•Impact of memory reference patterns

–matrix multiply

–transpose

–memory mountain range

15-213

class20.ppt


CS 213 F’


Basic Cache Organization

Address space

•N = 2^n bytes

•Address: n = t + s + b bits (t tag bits, s set index bits, b block offset bits)

Cache (C = S x E x B bytes)

•S = 2^s sets

•E blocks/set

•B = 2^b bytes (block size)

Cache block (cache line)

•Valid bit: 1 bit

•Tag: t bits

•Data: B bytes


Cache Performance Metrics

Miss Rate

•fraction of memory references not found in cache (misses/references)

•Typical numbers: 5-10% for L1, 1-2% for L2

Hit Time

•time to deliver a block in the cache to the processor (includes time to determine whether the block is in the cache)

•Typical numbers: 1 clock cycle for L1, 3-8 clock cycles for L2

Miss Penalty

•additional time required because of a miss

•Typically 10-30 cycles for main memory


Impact of Cache and Block Size

Cache Size

•Effect on miss rate: larger is better

•Effect on hit time: smaller is faster

Block Size

•Effect on miss rate

–Big blocks help exploit spatial locality

–For given cache size, can hold fewer big blocks than little ones, though

•Effect on miss penalty: longer transfer time


Impact of Write Strategy

Write-through or write-back?

Advantages of Write Through

•Read misses are cheaper. Why?

•Simpler to implement.

•Requires a write buffer to pipeline writes

Advantages of Write Back

•Reduced traffic to memory

–Especially if bus used to connect multiple processors or I/O devices

•Individual writes performed at the processor rate


Qualitative Cache Performance Model

Compulsory Misses

•First access to line not in cache

•Also called “Cold start” misses

Capacity Misses

•Active portion of memory exceeds cache size

Conflict Misses

•Active portion of address space fits in cache, but too many lines map to same cache entry

•Direct mapped and set associative placement only


Interactions Between Program & Cache

Major Cache Effects to Consider

•Total cache size

–Try to keep heavily used data in highest level cache

•Block size (sometimes referred to as “line size”)

–Exploit spatial locality

Example Application

•Multiply n X n matrices

•O(n^3) total operations

•Accesses

–n reads per source element

–n values summed per destination

»But may be able to hold in register

/* ijk */
for (i=0; i<n; i++) {
  for (j=0; j<n; j++) {
    sum = 0.0;
    for (k=0; k<n; k++)
      sum += a[i][k] * b[k][j];
    c[i][j] = sum;
  }
}

Variable sum held in register


Matmult Performance (Sparc20)

[Figure: mflops (d.p.), 0 to 20, versus matrix size (n), 50 to 200, with one curve per loop ordering: ikj, kij, ijk, jik, jki, kji]

•As matrices grow in size, exceed cache capacity

•Different loop orderings give different performance

–Cache effects

–Whether or not can accumulate in register


Matrix multiplication (ijk)

/* ijk */
for (i=0; i<n; i++) {
  for (j=0; j<n; j++) {
    sum = 0.0;
    for (k=0; k<n; k++)
      sum += a[i][k] * b[k][j];
    c[i][j] = sum;
  }
}

Inner loop:

•A (i,*): Row-wise

•B (*,j): Column-wise

•C (i,j): Fixed

Approx. Miss Rates


Matrix multiplication (jik)

/* jik */
for (j=0; j<n; j++) {
  for (i=0; i<n; i++) {
    sum = 0.0;
    for (k=0; k<n; k++)
      sum += a[i][k] * b[k][j];
    c[i][j] = sum;
  }
}

Inner loop:

•A (i,*): Row-wise

•B (*,j): Column-wise

•C (i,j): Fixed

Approx. Miss Rates


Matrix multiplication (ikj)

/* ikj */
for (i=0; i<n; i++) {
  for (k=0; k<n; k++) {
    r = a[i][k];
    for (j=0; j<n; j++)
      c[i][j] += r * b[k][j];
  }
}

Inner loop:

•A (i,k): Fixed

•B (k,*): Row-wise

•C (i,*): Row-wise

Approx. Miss Rates


Matrix multiplication (jki)

/* jki */
for (j=0; j<n; j++) {
  for (k=0; k<n; k++) {
    r = b[k][j];
    for (i=0; i<n; i++)
      c[i][j] += a[i][k] * r;
  }
}

Inner loop:

•A (*,k): Column-wise

•B (k,j): Fixed

•C (*,j): Column-wise

Approx. Miss Rates


Summary of Matrix Multiplication

(L = loads, S = stores, MR = misses per inner loop iteration)

ijk (L=2, S=0, MR=1.25)

for (i=0; i<n; i++) {
  for (j=0; j<n; j++) {
    sum = 0.0;
    for (k=0; k<n; k++)
      sum += a[i][k] * b[k][j];
    c[i][j] = sum;
  }
}

jik (L=2, S=0, MR=1.25)

for (j=0; j<n; j++) {
  for (i=0; i<n; i++) {
    sum = 0.0;
    for (k=0; k<n; k++)
      sum += a[i][k] * b[k][j];
    c[i][j] = sum;
  }
}

kij (L=2, S=1, MR=0.5)

for (k=0; k<n; k++) {
  for (i=0; i<n; i++) {
    r = a[i][k];
    for (j=0; j<n; j++)
      c[i][j] += r * b[k][j];
  }
}

ikj (L=2, S=1, MR=0.5)

for (i=0; i<n; i++) {
  for (k=0; k<n; k++) {
    r = a[i][k];
    for (j=0; j<n; j++)
      c[i][j] += r * b[k][j];
  }
}

jki (L=2, S=1, MR=2.0)

for (j=0; j<n; j++) {
  for (k=0; k<n; k++) {
    r = b[k][j];
    for (i=0; i<n; i++)
      c[i][j] += a[i][k] * r;
  }
}

kji (L=2, S=1, MR=2.0)

for (k=0; k<n; k++) {
  for (j=0; j<n; j++) {
    r = b[k][j];
    for (i=0; i<n; i++)
      c[i][j] += a[i][k] * r;
  }
}


Matmult performance (DEC5000)

[Figure: mflops (d.p.) versus matrix size (n), 50 to 200, with one curve per loop ordering: ikj, kij, ijk, jik, jki, kji]
