Understanding Memory Bandwidth and Data Size in Computer Architecture - Prof. William D. G, Exams of Computer Science

University of Illinois - Urbana-Champaign Computer Science

Prof. William D. Gropp

Covers the importance of memory in computer architecture performance, focusing on memory impact and instruction execution. Topics include the role of memory in performance bounds, refining performance bounds, memory bandwidth versus data size, and the impact of the memory hierarchy. It also discusses spatial and temporal locality, the effects of virtual memory, and traps for the unwary.

Typology: Exams

Pre 2010

Uploaded on 03/16/2009

koofers-user-zlp 🇺🇸

Computer Architecture and

Performance:

Memory Impact;

Instruction Execution

William Gropp

Importance of Memory in Performance Bounds

• We have seen:

♦ Loads and stores can be as important as floating point operations

♦ Simple models that look at just sustained memory bandwidth (and ignore details of cache effects) can provide useful bounds on performance

- Recall the sparse matrix-multiply example
- True for problems where the majority of data accesses are consecutive

♦ Note that this is a bound, a guaranteed-not-to-exceed value for the performance
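
A bandwidth-only bound of this kind can be sketched in a few lines of C. This is an illustration under stated assumptions, not code from the lecture: the function names are hypothetical, and the example kernel (y[i] = y[i] + alpha * x[i]) is chosen only because its memory traffic is easy to count.

```c
#include <assert.h>
#include <stddef.h>

/* Lower bound on run time in seconds from memory traffic alone:
 * bytes moved divided by sustained memory bandwidth.
 * (A lower bound on time is an upper bound on performance.) */
double memory_time_bound(double bytes_moved, double sustained_bytes_per_s)
{
    assert(sustained_bytes_per_s > 0.0);
    return bytes_moved / sustained_bytes_per_s;
}

/* Hypothetical example kernel: y[i] = y[i] + alpha * x[i].
 * Each iteration loads x[i] and y[i] and stores y[i],
 * so it moves three doubles per element. Cache effects are ignored. */
double axpy_bytes_moved(size_t n)
{
    return 3.0 * (double)n * (double)sizeof(double);
}
```

For example, `memory_time_bound(axpy_bytes_moved(n), bw)` gives the shortest time the kernel could take if the data streams from memory at `bw` bytes per second; the real time can be longer but not shorter under this model.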

Memory Bandwidth vs Data Size

[Figure: memory bandwidth versus data size, with regions marked for the cache levels and main memory.]

Impact of Memory Hierarchy

[Figure: STREAM performance in MB/s (1000 to 15000) versus data size in bytes (10^3 to 10^8, logarithmic axis), with the cache-level regions marked.]
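
A measurement in the spirit of STREAM can be sketched as follows. This is a simplified illustration, not the STREAM benchmark itself: the function names are hypothetical, `clock()` measures CPU time with coarse resolution, and the real benchmark takes more care with timing and array sizes.

```c
#include <stddef.h>
#include <time.h>

/* One STREAM-style "triad" pass: a[i] = b[i] + scalar * c[i]. */
void triad(double *a, const double *b, const double *c, double scalar, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}

/* Time `reps` triad passes over arrays of length n and return the
 * observed rate in MB/s (1 MB = 1e6 bytes). Each element costs two
 * loads and one store. Returns 0 if the run was too short for
 * clock() to measure. */
double triad_mb_per_s(double *a, const double *b, const double *c,
                      size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        triad(a, b, c, 3.0, n);
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    double bytes = 3.0 * (double)sizeof(double) * (double)n * (double)reps;
    return seconds > 0.0 ? bytes / seconds / 1.0e6 : 0.0;
}
```

Calling `triad_mb_per_s` for a range of `n` values, from arrays that fit in cache to arrays much larger than cache, should show the rate dropping in steps as the data outgrows each level of the hierarchy.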

Breaking the Model: TLB

• Adding virtual memory requires an extremely fast way to convert virtual addresses to physical addresses

♦ The Translation Lookaside Buffer (TLB) is a cache that performs this translation

♦ However, with typical page sizes, the TLB does not provide fast translation for all memory in cache

- Cost of an occasional TLB miss in consecutive accesses (for data in memory and not on disk) is relatively small
- Cost for non-consecutive addresses can be very large

♦ Partial fix: Specify larger pages

- No standard way to do this in the language (or among flavors of Unix)

♦ Algorithmic fix: Change order of accesses

- No standard way to control in the language
- Depends on page and cache line size
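
The effect of access order can be illustrated with two traversals of the same matrix. This is a minimal sketch assuming a row-major (C layout) matrix stored in one contiguous array; the function names are hypothetical.

```c
#include <stddef.h>

/* Consecutive accesses: the inner loop walks adjacent addresses,
 * so a page (and a cache line) is fully used before moving on. */
double sum_row_order(const double *m, size_t rows, size_t cols)
{
    double sum = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += m[i * cols + j];
    return sum;
}

/* Strided accesses: the inner loop jumps cols * sizeof(double) bytes
 * per step. For large cols, successive accesses can land on
 * different pages, so TLB misses become frequent. */
double sum_column_order(const double *m, size_t rows, size_t cols)
{
    double sum = 0.0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            sum += m[i * cols + j];
    return sum;
}
```

Both functions visit every element exactly once; only the order differs. In floating point the two orders can round differently, but for the same data they compute the same mathematical sum.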

What’s Next?

• There are many details that we’ve ignored

♦ Can more than one operation take place at a time?

♦ Does each assignment require a store into memory?

♦ What about the other operations (loop counts and tests, array indexing, etc.)?

• Before answering these, let’s revisit the CPU

More Details

• Can more than one operation take place at a time?

♦ Yes, if they involve different functional units

♦ Or if there are multiple units of the same type, as long as enough units are available

- Architecture Feature: Quickest way to add to peak floating point performance is to add floating point units
- Algorithm and programming language must make use of these
- Discussion Question: Are there natural ways to use and express this?
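
One way a program can expose work for multiple floating point units is to keep independent chains of operations. The sketch below is an illustration, not the lecture's answer to the discussion question; the function names are hypothetical.

```c
#include <stddef.h>

/* One accumulator: every add depends on the result of the previous add. */
double sum_one_chain(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two accumulators: the adds into s0 and s1 are independent of each
 * other, so hardware with two floating point adders may overlap them. */
double sum_two_chains(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)          /* leftover element when n is odd */
        s0 += a[i];
    return s0 + s1;
}
```

The two versions add the elements in a different order, so their results can differ in the last bits for general floating point data. That reordering is the reason a compiler usually will not make this change unless told it may.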

More Details (2)

• Does each assignment require a store into memory?

• Consider this code in C:

double sum = 0;
for (int i = 0; i < n; i++) {
    sum = sum + a[i];
}

• The value “sum” may be stored in a register, requiring no load or store.

♦ Making use of registers can be crucial in achieving high performance

♦ Recall the CPU diagram: most operations take place between operands in registers

Perils of Aliasing

• They do not compute the same value!

• Consider this usage of the routines

♦ Sum( &a[2], a, 3 );

♦ In the first case, the routine computes

- a[2] + a[0] + a[1] + a[2] + a[0] + a[1]
- Why?

♦ In the second case, the routine computes

- a[0] + a[1] + a[2]

• When two variables may describe overlapping memory regions, they are said to alias one another

♦ Programming languages with pointers often permit aliasing (how can they prevent it?)

♦ The potential for aliasing can force the compiler to store a value (or in a different example, load it) even though the programmer does not intend to use aliased data

♦ Discussion Question: Is this a flaw in the programming model? If so, how would you fix it?
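
The two routines being compared are not shown in this preview. The following pair is a hypothetical reconstruction that reproduces the two results quoted above for the call Sum( &a[2], a, 3 ); the names and exact bodies are assumptions, not the lecture's code.

```c
/* Hypothetical version 1: accumulates through the pointer s.
 * Because s might point into a, the compiler must store *s on every
 * iteration and reload a[i] afterwards. */
void sum_through_pointer(double *s, const double *a, int n)
{
    for (int i = 0; i < n; i++)
        *s += a[i];
}

/* Hypothetical version 2: accumulates in a local variable that can
 * live in a register, with a single store at the end. */
void sum_in_local(double *s, const double *a, int n)
{
    double t = 0.0;
    for (int i = 0; i < n; i++)
        t += a[i];
    *s = t;
}
```

When *s starts at zero and does not overlap a, both versions leave the same value in *s. With s = &a[2], version 1 modifies a[2] while it is still being read, which doubles the total, while version 2 reads all of a before writing. In C99 and later, the `restrict` qualifier lets the programmer promise that pointers do not alias.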

More Details (4)

• What about the other operations (loop counts and tests, array indexing, etc.)?

♦ Operations on integers are relatively fast in modern CPUs

- Exceptions include integer divide and modulus

♦ Branches (conditional jumps to other parts of the code, such as at a loop test) are also relatively expensive

♦ However, most are still faster than an L cache miss

Some Rules for Bounding Performance

• Most importantly, remember: the goal is to create an effective (but possibly approximate) bound on performance, not an estimate!

♦ Discussion Question: What’s the difference?

• Count the number of operations in each functional unit category:

♦ Loads/Stores

♦ Floating Point (add, subtract, multiply; divides are a special subcase)

♦ Other operations (integer arithmetic, branches, comparisons, etc.)

• For each of these, compute the time they will take

• The bound on the time is the max of these three

♦ Note: not really a bound because we’ve ignored any dependencies between the different operations

♦ You can refine each of these by including more detail

- Refine load/store by considering cache
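
The counting rule above can be sketched as a small C function. The struct and function names are hypothetical, and the cost per operation is left to the caller in whatever unit is convenient (cycles or seconds).

```c
/* One value per functional unit category. Used both for operation
 * counts and for the cost of a single operation. */
typedef struct {
    double loads_stores;
    double floating_point;
    double other;          /* integer arithmetic, branches, comparisons */
} Category;

/* Time bound = max over the three categories of (count * cost per op).
 * As the slide notes, this ignores dependencies between categories. */
double time_bound(Category counts, Category cost_per_op)
{
    double t_mem   = counts.loads_stores   * cost_per_op.loads_stores;
    double t_fp    = counts.floating_point * cost_per_op.floating_point;
    double t_other = counts.other          * cost_per_op.other;

    double bound = t_mem;
    if (t_fp > bound)
        bound = t_fp;
    if (t_other > bound)
        bound = t_other;
    return bound;
}
```

Taking the max rather than the sum reflects the assumption that the three kinds of unit can work at the same time, so the busiest one sets the pace.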

Another Example: Matrix-Matrix Multiply (ddot form)

do i=1,n
  do j=1,n
    do k=1,n
      c(i,j) = c(i,j) + a(i,k) * b(k,j)
    enddo
  enddo
enddo

• Like transpose, but two new features:

♦ Perform a calculation (we’ll see why this is important later)

♦ Reuse of data: n^2 data used for n^3 operations

Reusing Data

• Load data into register

• Use several times (each load, even from cache, is at least a cycle)

• Use loop unrolling to expose register use

♦ …
  c(i,j) += a(i,k) * b(k,j)
  c(i+1,j) += a(i+1,k) * b(k,j)
  c(i,j+1) += a(i,k) * b(k,j+1)
  c(i+1,j+1) += a(i+1,k) * b(k,j+1)

• Each a(i,k) etc. is used twice

♦ Cuts the number of loads in half

♦ But requires enough registers to hold all items

- 4 registers for a(i,k), a(i+1,k), b(k,j), b(k,j+1) plus 2 registers for i, j, and 4 registers for address of a(i,k), address of b(k,j), address of c(i,j), and address of c(i,j+1).
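
The unrolled update can be written out in C as a sketch. It assumes row-major n x n matrices stored in flat arrays, an even n, and that c does not overlap a or b; the function name is hypothetical, and the slides use Fortran-style indexing rather than this layout.

```c
#include <assert.h>
#include <stddef.h>

/* C += A * B with the i and j loops unrolled by 2.
 * Assumes n is even and c does not alias a or b. */
void matmul_unrolled_2x2(size_t n, const double *a, const double *b, double *c)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        for (size_t j = 0; j < n; j += 2) {
            /* The 2x2 block of c stays in local variables (registers). */
            double c00 = c[i * n + j];
            double c01 = c[i * n + j + 1];
            double c10 = c[(i + 1) * n + j];
            double c11 = c[(i + 1) * n + j + 1];
            for (size_t k = 0; k < n; k++) {
                /* Four loads feed four multiply-adds: each value is used twice. */
                double a0 = a[i * n + k];
                double a1 = a[(i + 1) * n + k];
                double b0 = b[k * n + j];
                double b1 = b[k * n + j + 1];
                c00 += a0 * b0;
                c10 += a1 * b0;
                c01 += a0 * b1;
                c11 += a1 * b1;
            }
            c[i * n + j] = c00;
            c[i * n + j + 1] = c01;
            c[(i + 1) * n + j] = c10;
            c[(i + 1) * n + j + 1] = c11;
        }
    }
}
```

Without unrolling, each multiply-add needs its own loads of a(i,k) and b(k,j), eight loads for four updates. Here the same four updates need four loads.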

Blocking for Cache

• Reuse data in cache by blocking

♦ Block for each level of memory hierarchy
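
Blocking for one cache level can be sketched as below, assuming row-major n x n matrices in flat arrays that do not overlap. The function name and the block size parameter `bs` are hypothetical; a good value of `bs` depends on the cache size and is not given in the slides.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* C += A * B, computed one bs x bs block at a time so that the
 * blocks of a, b, and c being worked on can stay in cache. */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    assert(bs > 0);
    for (size_t ii = 0; ii < n; ii += bs) {
        size_t i_end = min_size(ii + bs, n);
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t k_end = min_size(kk + bs, n);
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t j_end = min_size(jj + bs, n);
                /* Multiply one block of a by one block of b. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];  /* reused across the j loop */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

Blocking for several levels of the hierarchy would nest this idea, with a larger block size for each larger, slower level.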