Computer Architecture: Window Size, Multithreading, and Cache Penalty Calculation - Prof. , Assignments of Electrical and Electronics Engineering

Solutions to homework problems related to computer architecture concepts such as window size, multithreading, and cache penalty calculation. It includes detailed explanations and formulas for each problem.

Typology: Assignments

Pre 2010

Uploaded on 03/11/2009

koofers-user-td0-2

Homework 3
Brief Solution
Problem 1
1. [1] The set of instructions that is examined for simultaneous execution is called the window.
[2] The window size determines how many instructions can be examined to find independent instructions that can be issued together, so it directly limits the issue rate of the processor, though it is not the only factor. Conversely, from a practical point of view, the useful window size is limited by the issue rate: if the issue rate is low, a large window is mostly wasted.
[2] Window size is limited by the storage needed to hold the instructions and by the number of comparisons needed to determine dependences among them. Issue rate is further limited by other factors such as true data dependences among instructions in the window and the limited number of functional units, register file ports, and instruction commit bandwidth.
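A minimal Python sketch of the dependence check behind the window-size limit. It assumes each instruction has one destination and two source registers, and that only true (read-after-write) dependences are checked, as if register renaming had removed the others. `Instr` and `scan_window` are hypothetical names; real issue logic performs these comparisons in parallel hardware, not in a loop.

```python
from typing import List, NamedTuple, Tuple

class Instr(NamedTuple):
    dest: str              # destination register
    srcs: Tuple[str, ...]  # source registers

def scan_window(window: List[Instr]) -> Tuple[List[int], int]:
    """Return the indices of instructions with no RAW dependence on an earlier
    instruction in the window, plus the number of register comparisons made."""
    independent, comparisons = [], 0
    for i, ins in enumerate(window):
        depends = False
        for earlier in window[:i]:
            for src in ins.srcs:
                comparisons += 1  # one comparator per (source, older destination) pair
                if src == earlier.dest:
                    depends = True
        if not depends:
            independent.append(i)
    return independent, comparisons

window = [
    Instr("r1", ("r2", "r3")),
    Instr("r4", ("r1", "r5")),  # needs r1 from instruction 0
    Instr("r6", ("r7", "r8")),
    Instr("r9", ("r4", "r6")),  # needs r4 and r6
]
print(scan_window(window))  # ([0, 2], 12)

# The comparator count grows with the square of the window size.
for n in (4, 16, 64):
    dummy = [Instr(f"d{i}", (f"s{i}a", f"s{i}b")) for i in range(n)]
    print(n, scan_window(dummy)[1])  # n * (n - 1) comparisons
```

Under these assumptions a window of n instructions needs n(n - 1) comparisons, which is why the comparison logic, and not only the storage, caps practical window sizes.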
2. [2] Fine-grained multithreading switches between threads on every instruction, so the execution of multiple threads is interleaved at the granularity of one instruction. Its advantage is that it can hide the throughput losses that arise from both short and long stalls. Its disadvantage is that it slows down the execution of each individual thread.
[2] Coarse-grained multithreading switches threads only on costly stalls. Its advantage over fine-grained multithreading is that thread switching no longer needs to be essentially free, and it is much less likely to slow the processor down. Its main drawback is that its ability to overcome throughput losses, especially from short stalls, is limited.
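The trade-off between the two schemes can be seen in a toy cycle-level model. This is a sketch under assumed parameters, not a model of any real processor: a single-issue pipeline, each instruction tagged with the number of stall cycles that follow it, coarse-grained switching only on stalls of at least `long_stall` cycles, and a fixed `switch_cost` per switch. The function `run` and all of its parameters are hypothetical.

```python
from typing import List, Tuple

def run(threads: List[List[int]], policy: str,
        long_stall: int = 4, switch_cost: int = 1) -> Tuple[int, str]:
    """Toy single-issue pipeline shared by hardware threads.

    threads[t][i] = stall cycles thread t suffers after issuing its i-th
    instruction. Returns (total cycles, trace): the trace holds the id of the
    issuing thread per cycle, '-' for an idle cycle, 's' for a switch cycle.
    """
    n = len(threads)
    pc = [0] * n      # next instruction index of each thread
    ready = [0] * n   # earliest cycle each thread may issue again

    def done(t: int) -> bool:
        return pc[t] >= len(threads[t])

    cycle, cur, last, trace = 0, 0, -1, []
    while not all(done(t) for t in range(n)):
        if policy == "fine":
            # Round-robin every cycle, skipping stalled or finished threads.
            order = [(last + 1 + k) % n for k in range(n)]
            pick = next((t for t in order if not done(t) and ready[t] <= cycle), None)
        else:  # "coarse": stay on the current thread
            if done(cur):
                cur = next(t for t in range(n) if not done(t))
            pick = cur if ready[cur] <= cycle else None

        if pick is None:
            trace.append("-")  # no thread can issue: a wasted cycle
            cycle += 1
            continue

        stall = threads[pick][pc[pick]]
        pc[pick] += 1
        ready[pick] = cycle + 1 + stall
        trace.append(str(pick))
        cycle += 1
        last = pick

        if policy == "coarse" and stall >= long_stall:
            # Costly stall: switch to the next unfinished thread, paying switch_cost.
            candidates = ((pick + 1 + k) % n for k in range(n - 1))
            nxt = next((t for t in candidates if not done(t)), None)
            if nxt is not None:
                trace.extend("s" * switch_cost)
                cycle += switch_cost
                cur = nxt
    return cycle, "".join(trace)

short_stalls = [[1, 1, 1], [1, 1, 1]]
no_stalls = [[0, 0, 0], [0, 0, 0]]
print(run(short_stalls, "fine"))    # (6, '010101')
print(run(short_stalls, "coarse"))  # (10, '0-0-01-1-1')
print(run(no_stalls, "fine"))       # (6, '010101')
print(run(no_stalls, "coarse"))     # (6, '000111')
```

With only short stalls, fine-grained interleaving fills every bubble (6 cycles versus 10). With no stalls, thread 0 issues its last instruction in cycle 2 under coarse-grained switching but only in cycle 4 under fine-grained interleaving, which is the per-thread slowdown described above.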
3. [1] Since there is only one TLB, every switch between processes requires flushing it. Interleaving the execution of both processes would therefore cause far too many TLB flushes.
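A small sketch of why a single untagged TLB penalizes interleaving two processes. It assumes, for illustration only, that the TLB entries carry no process identifiers (so every change of process flushes it) and that the TLB has unlimited capacity (so the only misses are the ones forced by flushes). `tlb_misses` is a hypothetical helper.

```python
from typing import Iterable, Optional, Set, Tuple

def tlb_misses(refs: Iterable[Tuple[str, int]]) -> Tuple[int, int]:
    """Count (misses, flushes) for one TLB without process tags.

    refs is a sequence of (process id, virtual page) references. Assumed:
    unlimited capacity, and every change of process flushes the TLB.
    """
    tlb: Set[int] = set()
    owner: Optional[str] = None
    misses = flushes = 0
    for pid, page in refs:
        if pid != owner:
            if owner is not None:
                tlb.clear()  # the old process's translations are invalid for the new one
                flushes += 1
            owner = pid
        if page not in tlb:
            misses += 1
            tlb.add(page)
    return misses, flushes

interleaved = [("A", 0), ("B", 0)] * 4     # process switch on every reference
blocked = [("A", 0)] * 4 + [("B", 0)] * 4  # a single process switch
print(tlb_misses(interleaved))  # (8, 7)
print(tlb_misses(blocked))      # (2, 1)
```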
Problem 2
1. [5] CPI of A = 2 + (20% * 15% * 10 + 30% * 15% * 6) + (15% + 5%) * 5% * 50 = 3.07 (cycles)
15% of instructions are loads; 20% of the loads closely follow a store (store->load delay, 10 cycles) and 30% are closely followed by another instruction (load->other delay, 6 cycles). Loads and stores (15% + 5%) miss the cache 5% of the time, with a 50-cycle miss penalty.
Stall penalty due to store->load = 0.15*0.20*10 = 0.3
Stall penalty due to load->other = 0.15*0.30*6 = 0.27
Stall penalty due to cache misses = (0.15+0.05)*0.05*50 = 0.5
So, total penalty = 0.3 + 0.27 + 0.5 = 1.07
CPI of A = 2 + 1.07 = 3.07
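The arithmetic for machine A can be checked with a few lines of Python. The variable names are hypothetical labels for the numbers in the formula above; nothing beyond those numbers is assumed.

```python
def cpi_machine_a() -> float:
    base_cpi = 2.0
    load, store = 0.15, 0.05                 # fraction of instructions
    store_load = load * 0.20 * 10            # 20% of loads pay the store->load delay (10 cycles)
    load_other = load * 0.30 * 6             # 30% of loads pay the load->other delay (6 cycles)
    cache_miss = (load + store) * 0.05 * 50  # 5% miss rate, 50-cycle miss penalty
    return base_cpi + store_load + load_other + cache_miss

print(f"CPI of A = {cpi_machine_a():.2f}")  # CPI of A = 3.07
```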
2. [5] CPI of B = 2 + [80% * (20% * 15% * 1 + 30% * 15% * 1) + 20% * (20% * 15% * 10 + 30% * 15% * 6)] * 50% + (20% * 15% * 10 + 30% * 15% * 6) * 50% + 2% * 50% * 20% * 50 + 3% * 20% * 50% * 50 = 2.622 (cycles)
20% of instructions access data memory: 15% loads + 5% stores. 20% of loads incur the store->load delay (10 cycles) and 30% incur the load->other delay (6 cycles).
50% of data memory accesses are stack accesses, handled by the stack cache.
For stack accesses, the stack cache prediction is correct 80% of the time, in which case the store->load and load->other delays drop to 1 cycle; in the remaining 20% the delays stay at 10 and 6 cycles.
The miss rate is 2% for the stack cache and 3% for non-stack data accesses.

Stack accesses:
Penalty due to store->load and load->other = 0.15*0.50*{0.80*(0.20*1+0.30*1) + 0.20*(0.20*10+0.30*6)} = 0.03 + 0.057 = 0.087
Penalty due to memory stalls = (0.15+0.05)*0.50*0.02*50 = 0.1
Total penalty for stack accesses = 0.087 + 0.1 = 0.187
Non-stack accesses:
Penalty due to store->load and load->other = 0.15*0.50*(0.20*10+0.30*6) = 0.285
Penalty due to memory stalls = (0.15+0.05)*0.50*0.03*50 = 0.15
Total penalty for non-stack accesses = 0.285 + 0.15 = 0.435

CPI of B = 2 + total penalty for stack + total penalty for non-stack = 2 + 0.187 + 0.435 = 2.622
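The same breakdown for machine B, as a sketch that only re-encodes the fractions and penalties listed above (stack/non-stack split, 80% correct stack-cache prediction, 2% and 3% miss rates); the names are hypothetical.

```python
from typing import Tuple

def cpi_machine_b() -> Tuple[float, float, float]:
    base_cpi = 2.0
    load, store = 0.15, 0.05
    mem = load + store             # 20% of instructions access data memory
    stack = 0.50                   # fraction of data accesses served by the stack cache
    slow = 0.20 * 10 + 0.30 * 6    # store->load and load->other delays per load
    fast = 0.20 * 1 + 0.30 * 1     # same hazards when the stack cache predicts correctly
    stack_hazard = load * stack * (0.80 * fast + 0.20 * slow)
    stack_miss = mem * stack * 0.02 * 50
    other_hazard = load * (1 - stack) * slow
    other_miss = mem * (1 - stack) * 0.03 * 50
    stack_pen = stack_hazard + stack_miss  # 0.087 + 0.1
    other_pen = other_hazard + other_miss  # 0.285 + 0.15
    return base_cpi + stack_pen + other_pen, stack_pen, other_pen

cpi, stack_pen, other_pen = cpi_machine_b()
print(f"stack {stack_pen:.3f}, non-stack {other_pen:.3f}, CPI of B = {cpi:.3f}")
# stack 0.187, non-stack 0.435, CPI of B = 2.622
```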

3. [5] Instruction execution rate of A = 4000 MHz / 3.07 = 1302.93 MIPS
Instruction execution rate of B = 3500 MHz / 2.622 = 1334.86 MIPS
Thus machine B is slightly faster.
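The comparison uses native MIPS = clock rate (MHz) / CPI. A short sketch with the clock rates and CPIs from the solution; the comparison is only meaningful because both machines are assumed to execute the same instructions.

```python
def mips(clock_mhz: float, cpi: float) -> float:
    """Native MIPS: millions of instructions completed per second."""
    return clock_mhz / cpi

rate_a = mips(4000, 3.07)
rate_b = mips(3500, 2.622)
print(f"A: {rate_a:.2f} MIPS, B: {rate_b:.2f} MIPS")  # A: 1302.93 MIPS, B: 1334.86 MIPS
print("B is faster" if rate_b > rate_a else "A is faster")
```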

Problem 3
Note: The answer is NOT unique.

1. [5] Fully associative > 2-way set-associative > direct-mapped
Each cache holds four blocks. Entries list the blocks in a set from most to least recently used (LRU replacement).

Request sequence      A     B     C     D     F     B     F
Dir-map (set 0)       A     A     A     A     A     A     A
Dir-map (set 1)       -     B     B     B     F     B     F
Dir-map (set 2)       -     -     C     C     C     C     C
Dir-map (set 3)       -     -     -     D     D     D     D
Dir-map (hit/miss)    miss  miss  miss  miss  miss  miss  miss
Set-asso (set 0)      A     A     CA    CA    CA    CA    CA
Set-asso (set 1)      -     B     B     DB    FD    BF    FB
Set-asso (hit/miss)   miss  miss  miss  miss  miss  miss  hit
Full-asso (set 0)     A     BA    CBA   DCBA  FDCB  BFDC  FBDC
Full-asso (hit/miss)  miss  miss  miss  miss  miss  hit   hit

Misses: direct-mapped 7, 2-way set-associative 6, fully associative 5.
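The table can be reproduced with a small LRU cache simulator. The block addresses A=0, B=1, C=2, D=3, F=5 are an assumption, chosen so that the set index (address mod number of sets) matches the set assignment in the table; `simulate` is a hypothetical helper and the cache is taken to hold four blocks.

```python
from collections import OrderedDict
from typing import Dict, List

def simulate(seq: str, num_blocks: int, ways: int, addr: Dict[str, int]) -> List[str]:
    """LRU set-associative cache: ways=1 is direct-mapped, ways=num_blocks is fully associative."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # each set: least recently used first
    result = []
    for block in seq:
        s = sets[addr[block] % num_sets]
        if block in s:
            s.move_to_end(block)       # mark as most recently used
            result.append("hit")
        else:
            if len(s) == ways:
                s.popitem(last=False)  # evict the least recently used block
            s[block] = True
            result.append("miss")
    return result

ADDR = {"A": 0, "B": 1, "C": 2, "D": 3, "F": 5}  # assumed block addresses
for name, ways in (("direct-mapped", 1), ("2-way", 2), ("fully associative", 4)):
    r = simulate("ABCDFBF", 4, ways, ADDR)
    print(f"{name}: {r.count('miss')} misses  {' '.join(r)}")
```

Changing `ADDR` or the sequence is a quick way to check other candidate answers, since the problem notes the answer is not unique.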