
Architecture of Parallel Computers - Problem Set 2 | ECE 506

North Carolina State University (NCSU), Electrical and Electronics Engineering

Prof. Gehringer

Summer 2002. Material type: assignment; Class: Architecture of Parallel Computers; Subject: Electrical and Computer Engineering.


CSC/ECE 506: Architecture of Parallel Computers

Problem Set 2

Due Friday, June 28, 2002

Problems 3, 4, and 5 will be graded. There are 45 points on these problems. Note: You must do all the problems, even the non-graded ones. If you do not do some of them, half as many points as they are worth will be subtracted from your score on the graded problems.

Problem 1. (25 points) As described at the end of Lecture 12, there are three ways of organizing the addresses in interleaved memory:

• Fine interleaving, or low-order interleaving, which distributes the addresses so that consecutive addresses are located in consecutive modules.
• Coarse interleaving, or high-order interleaving, which distributes the addresses so that each module contains consecutive addresses.
• A combination in which both low- and high-order interleaving are used.

(a) Suppose that a 16-megaword memory is built from 1M-bit chips, so that there are at least 2^20 addresses per module. How many different interleaved organizations can be constructed? For each organization, give the format of an address. This will require you to specify how many bits there are in (up to) three fields: the group number, the module number, and the address within the module.
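As a rough illustration of how the three organizations carve up an address, here is a minimal Python sketch. It assumes the smallest configuration the problem allows: 2^24 word addresses (16M words) spread over 16 modules of 2^20 words each, so 4 address bits select a module. The function `split_address` and its parameter `low_bits` are hypothetical names introduced here; `low_bits` is how many of those 4 module-selection bits come from the low end of the address, the rest forming a group number at the high end.

```python
def split_address(addr: int, addr_bits: int = 24, module_sel_bits: int = 4,
                  low_bits: int = 4) -> tuple[int, int, int]:
    """Split a word address into (group, module, offset) fields.

    Assumed layout, from most to least significant bit:
        [group: module_sel_bits - low_bits][offset][module: low_bits]
    low_bits == module_sel_bits gives pure low-order (fine) interleaving;
    low_bits == 0 gives pure high-order (coarse) interleaving, where the
    group number alone identifies the module.
    """
    if not 0 <= low_bits <= module_sel_bits < addr_bits:
        raise ValueError("invalid field widths")
    offset_bits = addr_bits - module_sel_bits
    module = addr & ((1 << low_bits) - 1)                   # lowest bits
    offset = (addr >> low_bits) & ((1 << offset_bits) - 1)  # middle bits
    group = addr >> (low_bits + offset_bits)                # highest bits
    return group, module, offset


# Consecutive addresses land in different modules under low-order interleaving...
print([split_address(a, low_bits=4) for a in range(3)])  # [(0, 0, 0), (0, 1, 0), (0, 2, 0)]
# ...but stay in the same module under high-order interleaving.
print([split_address(a, low_bits=0) for a in range(3)])  # [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
```

Varying `low_bits` between 0 and `module_sel_bits` sweeps from coarse to fine interleaving; intermediate values give the combined scheme.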

(b) Choosing one of the above organizations involves a tradeoff between bandwidth and reliability. Which of your organizations optimizes accesses to consecutive words of memory?

(c) Some of the organizations are less reliable than others, because the failure of a single memory module scatters “holes” of unusable words throughout the address space. Which scheme does not suffer from this problem?

(d) Assume that a program is referencing every third word. For the S-access and C-access strategies, what is the throughput of each organization from part (a)? Express the throughput in words/access. For example, if a memory is 16-way low-order interleaved and 16 consecutive words are referenced, the throughput is 16 words/access. If only the even-numbered words are referenced, however, the throughput is only 8 words/access, because only 8 of the 16 words delivered by the memory modules are used.
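The worked example in part (d) can be checked with a small simulation. The model is an assumption made here, not taken from the lecture: an m-way low-order-interleaved memory returns one aligned group of m consecutive words per access, and a group is fetched once if any referenced word falls in it. It does not distinguish the S-access and C-access strategies or model high-order interleaving; `words_per_access` is a hypothetical helper.

```python
def words_per_access(modules: int, stride: int, n_refs: int) -> float:
    """Average number of *used* words per access for a strided reference stream.

    Assumed model: each access delivers the aligned group of `modules`
    consecutive words, and each group touched by the stream is fetched once.
    """
    if modules < 1 or stride < 1 or n_refs < 1:
        raise ValueError("modules, stride and n_refs must be positive")
    addresses = (i * stride for i in range(n_refs))  # word addresses 0, s, 2s, ...
    groups = {a // modules for a in addresses}       # distinct aligned groups fetched
    return n_refs / len(groups)


print(words_per_access(16, 1, 1600))  # 16.0: all 16 delivered words are used
print(words_per_access(16, 2, 800))   # 8.0: only the even-numbered words are used
```

Both calls reproduce the figures quoted in part (d); other strides can be tried the same way under the same simplified model.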

(e) Repeat part (d), but assume that the program is referencing every second word.


Problem 2. (30 points) This problem examines the LRU cache-management implementation using status flip-flops, as covered in Lecture 11. The diagram is reproduced below.

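[Figure: LRU status flip-flop circuit from Lecture 11. Recoverable labels: D flip-flops with clock inputs (ck); gating signals C, NX, NY, NZ; register bits I0, I1, X0, X1, Y0, Y1, Z0, Z1, W0, W1.]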

The example design suffers from several faults:

Redundant logic. Note that C appears:

° twice as a factor in and₁,
° three times as a factor in and₂, and
° four times as a factor in and₃.

Susceptibility to hazards (races). It is generally considered bad practice to build a shift register whose elements are not all clocked from the same logic signal. Several different faults can occur:

° Value flushed through two stages on one clock. Consider the case where the block that has just been accessed (I) is not X. Then the I value should move to the X-register, and the value in the X-register should move to the Y-register. But if the flip-flops are fast and and₁ is slow, the I value will be loaded into the X-register and then, when and₁ switches, also clocked into the Y-register.

° Clock chopped off by a change in the register value used to gate it. Also consider that and₁ includes the term (I ≠ X). In the case above, at the start of the cycle I does not equal X, so and₁ produces a 1. As soon as the X-register loads, I equals X and and₁ turns off. This cuts short the clock pulse from and₁, so it may not reliably load the Y-register when it should.

° Clock enabled by a change in the register value used to gate it. Consider the case where the block that has just been accessed (I) is Y. Then the I value should move to the X-register, and the value in the X-register should move to the Y-register; the Z- and W-registers should not change. At the start of the cycle, (I ≠ Y) = 0, which prevents the Z-register from changing. But if the Y-register loads from the X-register before the end of the cycle, (I ≠ Y) becomes 1, and the Z-register takes on a new, undesired value.

The second hazard, clock chopping, is avoided if the circuit is implemented with master-slave flip-flops, but the other two problems are only alleviated if the clock period is long.
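To make the intended behavior of the status registers concrete, the Python sketch below models them as a list of block numbers. It assumes four registers, with index 0 standing for the X-register (most recently used) and the last index for the W-register (least recently used), and it only models hits (the accessed block must already be in the list). `lru_update` and `rippled_update` are hypothetical names. `lru_update` computes every register's next value from the old state, which is what clocking all registers together achieves. `rippled_update` models the worst case of the first hazard above, where each register is clocked only after the one before it has already changed, so the accessed value is flushed through several stages.

```python
def lru_update(stack: list[int], accessed: int) -> list[int]:
    """Next register contents when every register is clocked together.

    stack[0] models the X-register (MRU), stack[-1] the W-register (LRU).
    `accessed` must already be present (hits only).
    """
    pos = stack.index(accessed)  # register currently holding I
    # Registers between X and that position shift one place toward W;
    # registers after that position keep their values.
    return [accessed] + stack[:pos] + stack[pos + 1:]


def rippled_update(stack: list[int], accessed: int) -> list[int]:
    """Faulty update: each register loads after its predecessor has changed."""
    regs = list(stack)
    pos = regs.index(accessed)
    regs[0] = accessed            # X-register loads I first
    for i in range(1, pos + 1):
        regs[i] = regs[i - 1]     # sees the *new* value one stage up
    return regs


print(lru_update([0, 1, 2, 3], 1))      # [1, 0, 2, 3]: Z and W unchanged
print(rippled_update([0, 1, 2, 3], 1))  # [1, 1, 2, 3]: block 0 is lost
```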

(a) Calculate the number of misses, the miss rate, and the total time it will take to handle all of these misses with a FIFO page-replacement policy.

(b) Calculate the number of misses, the miss rate, and the total time it will take to handle all of these misses with an LRU page-replacement policy.

(c) If we increase the cache size by one line under the LRU page-replacement policy, what will the new number of misses be? What happens if we double the cache size (to 6 lines)? Will this change obey the 30% rule?
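The reference string for these parts is not included in this text, so the Python sketch below counts FIFO and LRU misses for any string you supply; the string in the usage lines is a hypothetical stand-in (the classic example often used to illustrate Belady's anomaly under FIFO). `fifo_misses` and `lru_misses` are hypothetical helpers. The miss rate is then misses divided by the number of references, and the total miss-handling time is misses times whatever miss penalty the problem specifies.

```python
from collections import OrderedDict, deque


def fifo_misses(refs: list[int], lines: int) -> int:
    """Count misses with FIFO replacement in a fully associative cache of `lines` lines."""
    resident: set[int] = set()
    order: deque[int] = deque()       # arrival order, oldest at the left
    misses = 0
    for r in refs:
        if r in resident:
            continue                  # hit: FIFO does not reorder on a hit
        misses += 1
        if len(resident) == lines:
            resident.remove(order.popleft())  # evict the oldest arrival
        resident.add(r)
        order.append(r)
    return misses


def lru_misses(refs: list[int], lines: int) -> int:
    """Count misses with LRU replacement in a fully associative cache of `lines` lines."""
    cache: OrderedDict[int, None] = OrderedDict()  # least recently used first
    misses = 0
    for r in refs:
        if r in cache:
            cache.move_to_end(r)      # hit: mark as most recently used
            continue
        misses += 1
        if len(cache) == lines:
            cache.popitem(last=False) # evict the least recently used line
        cache[r] = None
    return misses


refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]   # hypothetical reference string
print(fifo_misses(refs, 3), fifo_misses(refs, 4))  # 9 10
print(lru_misses(refs, 3), lru_misses(refs, 4))    # 10 8
```

With this particular string, adding a line makes FIFO worse while LRU improves, which is why the choice of policy matters for part (c).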

Problem 5. (15 points) Consider a system with a byte-addressed two-level cache having the following characteristics:

L1 cache: size 4KB; direct mapped; sector size 2 blocks/sector; block size 1 word/block; write through, no write allocate; average miss rate N/A.

L2 cache: size 160KB; 4-way set associative; sector size 2 blocks/sector; block size 1 word/block; write through, write allocate.

The system has a 40-bit physical address space and a 52-bit virtual address space. One word is 4 bytes.

(a) What is the total number of bits within each L1 cache block, including status bits?

(b) What is the total number of bits within each L1 cache sector, including status bits?

(c) What is the total number of bits within each L2 set, including status bits?

(d) Now consider only the L2 cache. Suppose that on any miss, a whole block is read, that the processor sends references to its cache at a rate of 10^9 words per second, and that 30% of the references are writes. The bus cannot read or write more than one word at a time. Calculate the bandwidth that a single processor uses to the memory system. What percentage of the available bandwidth is this?
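Parts (a) through (c) start from how an address is split into tag, index, and offset. A minimal Python sketch, assuming byte addressing, one tag per sector, a power-of-two sector size and set count, and a physically addressed cache (so the 40-bit physical address is used); `field_bits` is a hypothetical helper. Status bits (valid, dirty, replacement) are not counted, since they depend on the policies listed above.

```python
def field_bits(cache_bytes: int, ways: int, sector_bytes: int,
               addr_bits: int) -> tuple[int, int, int]:
    """Return (tag, index, offset) bit widths for a sectored cache.

    Assumptions: byte addressing, one tag per sector, and power-of-two
    sector size and set count. Status bits are not included.
    """
    if cache_bytes % (ways * sector_bytes):
        raise ValueError("cache size must be a multiple of ways * sector size")
    sets = cache_bytes // (ways * sector_bytes)
    for n in (sets, sector_bytes):
        if n < 1 or n & (n - 1):
            raise ValueError("sketch assumes power-of-two set count and sector size")
    index = sets.bit_length() - 1           # bits that select the set
    offset = sector_bytes.bit_length() - 1  # block-within-sector + byte-within-block bits
    return addr_bits - index - offset, index, offset


# L1: 4KB, direct mapped (1 way), 2 blocks/sector x 4 bytes/block = 8-byte sectors,
# assuming the cache is indexed and tagged with the 40-bit physical address.
print(field_bits(4 * 1024, 1, 8, 40))  # (28, 9, 3)
```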
