

Summer 2001. Material Type: Assignment; Professor: Gehringer; Class: Architecture of Parallel Computers; Subject: Electrical and Computer Engineering; University: North Carolina State University.


Problems 1 and 4 will be graded. There are 45 points on these problems. Note: You must do all the problems, even the non-graded ones. If you skip any of them, half as many points as they are worth will be subtracted from your score on the graded problems.
Problem 1. (25 points) Consider a multiprocessor with a sequentially consistent memory system. Each processor has a cache implementing a basic 3-state write-invalidate protocol (similar to the one in Figure 5.13, p. 295 of Culler, Singh & Gupta) and a one-entry write buffer between the CPU and the cache, so that stores need not block the processor.
Upon a store, if the referenced cache block isn’t in the exclusive state, the register value is transferred into the write buffer, and the protocol action needed to obtain ownership is launched. Once the processor obtains ownership of the block, the value is transferred from the write buffer to the cache, and the entry is removed from the write buffer. Load instructions stall the processor (upon a cache miss); store instructions don’t stall the processor as long as the SC memory model can be obeyed.
Assume the following conditions:
The trace at the right gives the interleaved order in which the instructions from the two processors (P1 and P2) were executed. Assume that I0 starts at time 0 and that successive instructions start execution on consecutive cycles as long as there is no memory-model-induced stalling. U and V are in different cache blocks. Initially, both processors have U and V in the Shared state in their respective caches.
You should give a timing diagram (x-axis: time; y-axis: instructions I0 through I9) showing when each instruction starts and finishes execution. Also give a table with the following fields: instruction, consistency actions, miss type/hit, and reason for stall (if any).
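As an aid for filling in the consistency-actions column, here is a minimal sketch of the 3-state (MSI) write-invalidate transitions for a single cache block. It assumes the Modified/Shared/Invalid states and the BusRd/BusRdX bus transactions of the basic MSI protocol as described in CS&G, with no separate upgrade transaction; the function names `processor_access` and `snoop` are hypothetical, and the write buffer itself is not modeled.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    S = "Shared"
    I = "Invalid"


def processor_access(state, op):
    """Return (next_state, bus_transaction or None) for a local 'read' or 'write'."""
    if op == "read":
        if state is State.I:
            return State.S, "BusRd"      # read miss: fetch a shared copy
        return state, None               # read hit in S or M: no bus action
    if op == "write":
        if state is State.M:
            return State.M, None         # write hit: block already owned exclusively
        return State.M, "BusRdX"         # S or I: obtain ownership, invalidating other copies
    raise ValueError(f"unknown op {op!r}")


def snoop(state, bus_txn):
    """Return (next_state, flushes_data) when another cache's transaction is observed."""
    if bus_txn == "BusRd":
        if state is State.M:
            return State.S, True         # supply the dirty data and downgrade to Shared
        return state, False
    if bus_txn == "BusRdX":
        return State.I, state is State.M  # invalidate; flush only if this copy was dirty
    raise ValueError(f"unknown bus transaction {bus_txn!r}")
```

For example, when P1 stores to U while both caches hold U in the Shared state, `processor_access(State.S, "write")` yields `(State.M, "BusRdX")`, and P2's `snoop(State.S, "BusRdX")` yields `(State.I, False)`: P2's copy is invalidated and no data is flushed. In the problem's setting, the stored value would sit in the write buffer until that BusRdX completes.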
Problem 2. (20 points) This problem should be solved using the MESI protocol for a bus-based shared-memory multiprocessor. Assume the following:
- Direct-mapped cache organization
- P1 and P2 each have exactly 2 cache lines
- Cache-block size: 4 words
- Cache-to-cache block transfer takes 4 cycles
- Read/write hit (when no bus action is needed) takes 1 cycle
- Invalidation takes 2 cycles
- Memory-to-cache block transfer takes 8 cycles
B1 and B2 are two memory blocks that map to the same cache line. They contain the data items W, X, Y, Z and P, Q, R, S, respectively, as shown below. Each data item is one word.
You are given the following trace of memory accesses from two processors, P1 and P2. Assume that the accesses occur strictly sequentially, in the textual order shown at the right. Also assume that whenever a bus action is required, the time for the bus action is in addition to the time needed to satisfy the processor request (read/write).
Determine the total time needed to execute the given memory-access sequence. Assume that the caches are initially empty. Clearly show the state transition for the affected cache blocks in the caches of P1 and P2 after each access. Whenever there is a cache miss, classify it as a cold miss, coherence miss, capacity miss, or conflict miss. For coherence misses, indicate which are due to false sharing and which are due to true sharing. Assume that Q and Z are private variables for P1.
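The cost rules listed above can be sketched as a small two-processor simulator. This is one possible reading of the problem, not the official one: it assumes a read miss is served cache-to-cache whenever the other processor holds a valid copy (otherwise from memory), that a write to a block held elsewhere pays one invalidation in addition to any block transfer, and that writing back an evicted Modified block costs nothing extra, since the problem gives no figure for it. The names `Cache` and `access` are hypothetical, and addresses are word addresses.

```python
from dataclasses import dataclass, field

# Timing parameters from the problem statement (cycles).
HIT = 1
INVALIDATE = 2
CACHE_TO_CACHE = 4
MEMORY_TO_CACHE = 8
NUM_LINES = 2      # direct-mapped, 2 lines per cache
BLOCK_WORDS = 4    # 4-word blocks


@dataclass
class Cache:
    # line index -> (block number, MESI state); a missing key means Invalid
    lines: dict = field(default_factory=dict)

    def state_of(self, block):
        entry = self.lines.get(block % NUM_LINES)
        return entry[1] if entry and entry[0] == block else "I"

    def set_state(self, block, state):
        # Overwrites whatever block occupied the line (eviction; write-back cost ignored).
        self.lines[block % NUM_LINES] = (block, state)


def access(caches, p, op, word_addr):
    """Apply one 'read' or 'write' by processor index p; return the cycles it costs."""
    block = word_addr // BLOCK_WORDS
    me = caches[p]
    others = [c for i, c in enumerate(caches) if i != p]
    holders = [c for c in others if c.state_of(block) != "I"]
    state = me.state_of(block)
    cycles = HIT
    if op == "read":
        if state == "I":
            cycles += CACHE_TO_CACHE if holders else MEMORY_TO_CACHE
            for c in holders:
                c.set_state(block, "S")          # E/M holders downgrade to Shared
            me.set_state(block, "S" if holders else "E")
    elif op == "write":
        if state == "I":
            cycles += CACHE_TO_CACHE if holders else MEMORY_TO_CACHE
        if holders:
            cycles += INVALIDATE                 # one bus invalidation for the other copies
            for c in holders:
                c.lines.pop(block % NUM_LINES)
        me.set_state(block, "M")
    else:
        raise ValueError(f"unknown op {op!r}")
    return cycles
```

For instance, with hypothetical word addresses in which B1 occupies words 0 to 3 and B2 occupies words 8 to 11, both blocks map to line 0, so alternating accesses to them evict each other. The miss classification asked for above still has to be done by hand.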
Problem 3. (15 points) [CS&G 5.9] Consider the following conditions, proposed as sufficient conditions for SC:
Are these conditions indeed sufficient to guarantee SC executions? If so, say why. If not, construct a counterexample, and say why the conditions that were listed in the chapter are indeed sufficient in that case. Hint: Think about how these conditions are different from the ones in the chapter.
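Because the list of proposed conditions did not survive in this copy, the sketch below only illustrates the test a counterexample has to pass: enumerate every interleaving that respects each processor's program order and collect the results SC permits. It uses the classic store-buffer (Dekker-style) litmus test as an assumed example; the helper names `interleavings` and `sc_outcomes` are hypothetical.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of a and b that preserves each sequence's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        chosen = set(slots)              # positions taken by a's operations
        merged, ia, ib = [], 0, 0
        for k in range(n):
            if k in chosen:
                merged.append(a[ia])
                ia += 1
            else:
                merged.append(b[ib])
                ib += 1
        yield merged


def sc_outcomes(p1, p2):
    """Collect every (r1, r2) result that some SC interleaving can produce."""
    results = set()
    for order in interleavings(p1, p2):
        mem = {"x": 0, "y": 0}           # both flags start at 0
        regs = {}
        for kind, var, reg in order:
            if kind == "st":
                mem[var] = 1             # every store in this test writes 1
            else:
                regs[reg] = mem[var]
        results.add((regs["r1"], regs["r2"]))
    return results


# Store-buffer (Dekker-style) litmus test.
P1 = [("st", "x", None), ("ld", "y", "r1")]   # x = 1; r1 = y
P2 = [("st", "y", None), ("ld", "x", "r2")]   # y = 1; r2 = x
```

`sc_outcomes(P1, P2)` returns {(0, 1), (1, 0), (1, 1)}. A set of conditions that lets a load complete before the same processor's earlier store to a different location has been performed could produce r1 == r2 == 0, an outcome no SC interleaving yields; that is one common shape for a counterexample.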
Problem 4. (20 points) Consider a 4-node CC-NUMA DSM machine that uses a memory-based directory protocol, with an average network transaction time between nodes of 20 μs.
(a) Compute the remote memory access time and draw a network transaction diagram for a write miss on node 1 to a remote memory block on node 2 that is dirty on node 3, for each of the following directory protocol optimizations:
(b) Compare the performance of the above three protocol optimizations.
(c) Discuss methods for achieving coherence by serialization to a memory location in this system.
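The list of optimizations for part (a) is missing from this copy. Assuming they are the three forwarding variants usually discussed for memory-based directories (strict request-reply, intervention forwarding, and reply forwarding), and counting only the network transactions on the critical path at 20 μs each (no directory or cache occupancy, no overlap), the arithmetic can be sketched as follows. The message sequences are assumptions, and revision messages sent to the home node off the critical path are omitted.

```python
NETWORK_TRANSACTION_US = 20  # average per-transaction time given in the problem

# Assumed critical-path messages (sender, receiver) for a write miss from requester
# node 1 to a block whose home is node 2 and which is dirty in node 3.
CRITICAL_PATHS = {
    "strict request-reply": [(1, 2), (2, 1), (1, 3), (3, 1)],
    "intervention forwarding": [(1, 2), (2, 3), (3, 2), (2, 1)],
    "reply forwarding": [(1, 2), (2, 3), (3, 1)],
}


def remote_access_time_us(path, per_hop_us=NETWORK_TRANSACTION_US):
    """Latency if each inter-node transaction costs per_hop_us and none overlap."""
    return len(path) * per_hop_us


for name, path in CRITICAL_PATHS.items():
    # Render the node sequence, e.g. 1 -> 2 -> 3 -> 1, as a skeleton transaction diagram.
    hops = " -> ".join(str(node) for node in [path[0][0]] + [dst for _, dst in path])
    print(f"{name:24s} {remote_access_time_us(path):3d} us  ({hops})")
```

Under these assumptions, the first two variants each take 4 × 20 = 80 μs and reply forwarding takes 3 × 20 = 60 μs; the printed node sequences can serve as the skeleton of each network transaction diagram.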
Problem 5. (20 points) When blocks in the cache are tagged by virtual addresses, an inverse TLB may be used to determine whether a given physical address is found in the cache.
Draw a diagram including the TLB, cache, inverse TLB, and main memory. Annotate that diagram as follows: Assume that process P1 has been referencing page frame 6 using (virtual) page number 27. Then a process switch occurs to process P2, which uses the same page (not just the same page frame!) but calls it page 15. Call your first diagram “before,” and then draw an “after” diagram showing any changes in the cache, TLB, inverse TLB, and physical memory. Describe the order in which they changed (which changed first, which changed second, and so on) and why.
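A minimal sketch of the inverse-TLB lookup described in Problem 5, assuming page-granularity bookkeeping and a 4 KiB page size (both illustrative, not from the problem): the structure maps a physical page frame to the process and virtual page number under whose tag that frame's data currently sits in the virtually tagged cache. The class `InverseTLB` and its methods are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class InverseTLB:
    """Map a physical page frame to the (pid, virtual page) tagging its data in the cache."""

    def __init__(self):
        self._by_frame = {}

    def record(self, frame, pid, vpn):
        # Called when data from `frame` is cached under virtual page `vpn` of process `pid`.
        self._by_frame[frame] = (pid, vpn)

    def forget(self, frame):
        # Called when the frame's lines leave the cache or are retagged.
        self._by_frame.pop(frame, None)

    def lookup(self, paddr):
        """Return (pid, vpn) if the frame holding paddr is cached under a virtual tag, else None."""
        return self._by_frame.get(paddr // PAGE_SIZE)
```

With `record(6, "P1", 27)`, a lookup of any physical address in frame 6 returns ("P1", 27). This is the kind of check that lets the hardware discover that data it is about to fetch under a new virtual name is already cached under an old one (a synonym), instead of caching a second copy.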