Exercises Computation II 5EIB0: Answers and Solutions, Exercises of Computer Science

Technische Universiteit Eindhoven Computer Science

Computer Architecture Questions

Typology: Exercises

2018/2019

Uploaded on 06/24/2019

kristikapllani 🇳🇱

2 documents

Exercises Computation II 5EIB0

Answers

Answer 1

               N_instructions   CPI   T_cycle   T_execution
------------------------------------------------------------------
Single cycle   10000            1     2.0 ns    20000 ns
Multi cycle    10000            4     0.4 ns    16000 ns
Pipelined      10000            1     0.4 ns     4000 ns
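
The three execution times follow from T_execution = N_instructions * CPI * T_cycle. A minimal Python sketch that recomputes the rows; the function name execution_time_ns and the dictionary layout are illustrative choices, not part of the exercise:

```python
def execution_time_ns(n_instructions, cpi, t_cycle_ns):
    """Total execution time in ns: N_instructions * CPI * T_cycle."""
    return n_instructions * cpi * t_cycle_ns


# (N_instructions, CPI, T_cycle in ns), values taken from the table
designs = {
    "Single cycle": (10000, 1, 2.0),
    "Multi cycle": (10000, 4, 0.4),
    "Pipelined": (10000, 1, 0.4),
}

for name, (n, cpi, t_cycle) in designs.items():
    print(f"{name:12s} {execution_time_ns(n, cpi, t_cycle):8.0f} ns")
```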

Answer 2

a. CPI_ideal = 1/3

b. CPI_branch = CPI_ideal + f_branch * f_wrong * BranchPenalty

= 0.33 + 0.15 * 0.05 * 19

= 0.476
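
The same calculation as a short Python sketch. The function and parameter names are illustrative; the numbers are the ones used in the answer:

```python
def cpi_with_branches(cpi_ideal, f_branch, f_wrong, branch_penalty):
    """CPI including the cost of mispredicted branches."""
    return cpi_ideal + f_branch * f_wrong * branch_penalty


cpi_branch = cpi_with_branches(
    cpi_ideal=1 / 3,      # three instructions per cycle in the ideal case
    f_branch=0.15,        # fraction of instructions that are branches
    f_wrong=0.05,         # fraction of branches that are mispredicted
    branch_penalty=19,    # penalty in cycles per misprediction
)
print(f"CPI_branch = {cpi_branch:.3f}")
```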

Answer 3

6 stall cycles.

lw $t0, 0($t2)

2 stall cycles

lw $t1, 4($t0)

2 stall cycles

sub $s5, $t1, $t2

2 stall cycles

sw $s5, 4($t0)

Answer 4

An extra (third) read port is needed.

Check the MIPS pipelined datapath figures!

Answer 5

CPI = CPI_ideal + f_inst * I_missrate * I_misspenalty

+ f_data * D_missrate * D_misspenalty

= 2 + 1 * 0.05 * 20 + 0.3 * 0.1 * 20 = 3.6 cycles

Slowdown is T_new / T_old

= (N_instr_new * CPI_new * T_cycle_new) / (N_instr_old * CPI_old * T_cycle_old)

= CPI_new / CPI_ideal = 3.6 / 2.0 = 1.8

(so if the ideal cache program would take 1000 cycles, the real one takes 1800 cycles, or an 80 % slowdown)

Note that N_instr and T_cycle do not change.
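
A small Python sketch of this CPI and slowdown calculation. The function name and keyword arguments are illustrative; it assumes, as the answer does, that the instruction count and cycle time stay the same:

```python
def cpi_with_misses(cpi_ideal, i_missrate, i_penalty,
                    f_data, d_missrate, d_penalty, f_inst=1.0):
    """CPI including instruction-cache and data-cache miss stalls."""
    instruction_stalls = f_inst * i_missrate * i_penalty
    data_stalls = f_data * d_missrate * d_penalty
    return cpi_ideal + instruction_stalls + data_stalls


cpi_ideal = 2.0
cpi_real = cpi_with_misses(cpi_ideal, i_missrate=0.05, i_penalty=20,
                           f_data=0.3, d_missrate=0.1, d_penalty=20)

# N_instr and T_cycle cancel out, so the slowdown is just the CPI ratio.
slowdown = cpi_real / cpi_ideal
print(f"CPI = {cpi_real:.1f}, slowdown = {slowdown:.1f}")
```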

Answer 6

Tag bits = 32 - index - word_offset - byte_offset = 32 - 8 - 1 - 2 = 21 bits

Cache size = 4 * 2^8 * (valid bit + tag bits + block bits)

= 2^10 * (1 + 21 + 64)

= 86 kbit
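
The bit counts can be checked with a short Python sketch. The parameters (32-bit addresses, 8 index bits, 1 word-offset bit, 2 byte-offset bits, 4 ways) are read off the numbers in the answer; the function name cache_bits is an illustrative choice:

```python
def cache_bits(address_bits, index_bits, word_offset_bits,
               byte_offset_bits, ways):
    """Return (tag bits, total cache size in bits) for a set-associative cache."""
    tag_bits = address_bits - index_bits - word_offset_bits - byte_offset_bits
    # Data bits per block: words per block * bytes per word * 8
    block_bits = (2 ** word_offset_bits) * (2 ** byte_offset_bits) * 8
    entries = ways * 2 ** index_bits
    # Each entry stores one valid bit, the tag and the data block.
    total_bits = entries * (1 + tag_bits + block_bits)
    return tag_bits, total_bits


tag_bits, total_bits = cache_bits(32, 8, 1, 2, ways=4)
print(f"tag bits = {tag_bits}, cache size = {total_bits // 1024} kbit")
```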

Answer 7

The data memory access pattern is: 100, 108, 104, 112, 108, 116, 112, 120

This maps to words: 0, 2, 1, 3, 2, 4, 3, 5

1-word block: M M M M H M H M -> 25% hit rate

2-word block: M M H H H M H H -> 62.5% hit rate

4-word block: M H H H H M H H -> 75% hit rate

Note: there are no capacity or conflict misses (making the cache smaller and/or the offset of the second load bigger can introduce these misses).
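
The hit/miss patterns can be reproduced with a minimal Python sketch. It assumes, as the note says, that no block is ever evicted, so a set of already-loaded blocks is enough; it also follows the answer in treating address 100 as word 0. The function name hit_pattern is illustrative:

```python
def hit_pattern(byte_addresses, base, block_words, word_bytes=4):
    """Return a list of 'H'/'M' for a cache that never evicts a block."""
    loaded_blocks = set()
    pattern = []
    for address in byte_addresses:
        word = (address - base) // word_bytes   # word index relative to base
        block = word // block_words             # block that holds this word
        pattern.append("H" if block in loaded_blocks else "M")
        loaded_blocks.add(block)
    return pattern


accesses = [100, 108, 104, 112, 108, 116, 112, 120]
for block_words in (1, 2, 4):
    pattern = hit_pattern(accesses, base=100, block_words=block_words)
    hit_rate = 100 * pattern.count("H") / len(pattern)
    print(f"{block_words}-word block: {' '.join(pattern)} -> {hit_rate}%")
```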

The peak DDR3 bandwidth =

#Partitions * #bytes/transfer * #transfers/clock * #clocks/sec =

8 * 8 * 2 * 1G = 128 GB/sec

a. Total DDR3 RAM size = 8 * 256 MB = 2048 MB

Single-precision (SP) floating-point numbers are 32 bits (4 bytes).

So, if we want 3 n*n SP matrices, the maximum n follows from

3 * n^2 * 4 <= 2048 * 1024 * 1024

n_max = 13377
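
A short Python sketch of the bandwidth and matrix-size calculation. The variable names are illustrative; the numbers are those used in the answer:

```python
import math

# Peak bandwidth: partitions * bytes/transfer * transfers/clock * Gclocks/s
peak_bandwidth_gb_s = 8 * 8 * 2 * 1

# Total memory: 8 partitions of 256 MB each, in bytes
total_bytes = 8 * 256 * 1024 * 1024

# Three n*n matrices of 4-byte floats must fit: 3 * n^2 * 4 <= total_bytes
n_max = math.isqrt(total_bytes // (3 * 4))

print(f"peak bandwidth = {peak_bandwidth_gb_s} GB/s, n_max = {n_max}")
```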

b. For each element of the result, we need n multiply-adds

For each row of the result, we need n * n multiply-adds

For the entire result matrix, we need n * n * n multiply-adds

Thus, with n = 13377, n^3 is about 2393 G multiply-adds, taken here as 2393 GFlop.

Per multiply-add we need to load 2 source operands, 4 bytes each.

Now a discussion is needed about the bottleneck: either processing or memory bandwidth. Without caching, memory clearly determines the execution speed; with optimal caching (using tiling), it is the processing.

b.a. Assuming cache : loading of 2 matrices and storing of 1 to the graphics memory. That is 3 * n^

= 512 GB of data =>

t_memory = 512 / 128 = 4 seconds.

t_processing = 2393 / 192 = 12.46 seconds

t_total = 16.46 seconds.

b.b. No cache: 2393 GFlops require 2393 * 2 * 4 Gbytes = 19144 Gbytes (note: storing the result can in this case be neglected) =>

t_memory = 2393 * 2 * 4 / 128 = 149.6 seconds

t_total = 149.6 + 12.5 = 162.1 seconds
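
The no-cache estimate can be sketched in Python as follows. The constant names are illustrative; 192 is the processing rate used in the answer (presumably the peak throughput given in the exercise statement, which is not reproduced here), and 128 GB/s is the peak bandwidth computed earlier:

```python
GIGA_OPS = 2393            # multiply-adds, in units of 10^9
BYTES_PER_OP = 2 * 4       # two 4-byte source operands per multiply-add
PEAK_BANDWIDTH_GB_S = 128  # from the bandwidth calculation
PEAK_GOPS_PER_S = 192      # processing rate used in the answer (assumed peak)

# Without a cache every operand is fetched from memory.
traffic_gb = GIGA_OPS * BYTES_PER_OP
t_memory = traffic_gb / PEAK_BANDWIDTH_GB_S
t_processing = GIGA_OPS / PEAK_GOPS_PER_S

print(f"traffic = {traffic_gb} GB")
print(f"t_memory = {t_memory:.1f} s, t_processing = {t_processing:.2f} s")
```

As in the answer, the total is the sum of the two times, which treats memory transfer and processing as not overlapping.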

Answer 13

2D grid/mesh: n^2 nodes, n=4 in picture

Diameter: n = 4

Nodal degree: 4 (assuming unidirectional links)

Network Bandwidth: 2*P*B = 2*n^2*B = 2*16*B = 32*B

Bisection Bandwidth: 2*n*B = 8*B

n-cube tree: 2^n nodes, n=3 in picture

Diameter: n = 3

Nodal degree: 2*n = 2*3 = 6 (bidirectional links)

Network Bandwidth: N_links*B = 2^n * 2n / 2 * B = 24*B

Bisection Bandwidth: 2^n * 2 / 2 * B = 2^n*B = 8*B

Scalability

pro mesh: constant nodal degree (so cheap), easier to lay out in 2 dimensions
pro cube: short diameter
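
The n-cube figures can be checked by building the graph explicitly. This is a minimal Python sketch, assuming nodes are numbered 0 to 2^n - 1, neighbours differ in exactly one address bit, and every bidirectional link counts as two unidirectional links of bandwidth B; the function name hypercube_stats is illustrative:

```python
from collections import deque


def hypercube_stats(n):
    """Return (diameter, unidirectional links, unidirectional bisection links)."""
    nodes = range(2 ** n)
    # Neighbours of v: flip each of the n address bits in turn.
    neighbors = {v: [v ^ (1 << bit) for bit in range(n)] for v in nodes}

    # Breadth-first search from node 0; the cube is symmetric, so the
    # largest distance from node 0 equals the diameter.
    distance = {0: 0}
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in neighbors[v]:
            if w not in distance:
                distance[w] = distance[v] + 1
                queue.append(w)
    diameter = max(distance.values())

    links = sum(len(adjacent) for adjacent in neighbors.values())

    # Cut the cube in two halves on the highest address bit.
    top_bit = 1 << (n - 1)
    bisection = sum(1 for v in nodes for w in neighbors[v]
                    if (v & top_bit) != (w & top_bit))
    return diameter, links, bisection


print(hypercube_stats(3))
```

For n = 3 this gives a diameter of 3, 24 unidirectional links and 8 links across the bisection, matching the values above in units of B.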

Answer 14

a. In a shared memory system: by using (regular) loads and stores.

In a message passing system: by sending and receiving messages.

b. Yes, the address space can be fully shared while the memories are physically distributed. This means that loads and stores can address all locations, including those in other cores.

c. Pros of shared memory:

Cons of shared memory: