


A university homework assignment for the course CDA 5155, Fall 2008. The assignment covers instruction-level parallelism, cache performance, and the memory hierarchy. Students answer problems on these topics, including calculating the average CPI for different machines, determining cache hit ratios for different cache organizations, and explaining the relation between page size and L1 cache size. The assignment is due on November 6, 2008, and includes three problems.



You may neither give nor receive help in completing this assignment. Submit your work as a PDF on the e-Learning website before the deadline. Include the following sentence in bold at the top of your submission: “I have neither given nor received any unauthorized aid on this assignment.”
Problem 1
Problem 2
A traditional way to improve cache performance is to optimize memory accesses with distinct purposes separately, such as instruction accesses versus data accesses. This technique can be taken further by splitting data memory accesses into sub-categories such as stack versus non-stack accesses. Such approaches are sometimes called region caching, in reference to the different address-space regions that are conventionally devoted to instructions, the stack, static data, and the heap. In this problem, you will calculate and compare the performance of two machines, A and B.
In the given benchmark suite, 15% of instructions are loads and 5% are stores. On both machines, the average CPI for the suite is 2, excluding the “store → load” and “load → other” latencies (explained below) and excluding data-miss stalls. Assume that 20% of loads are close enough to the store they depend on to incur the maximum “store → load” latency on either machine, and that the remaining loads are far enough away to incur no such latency. Similarly, assume that 30% of loads are followed closely enough by a data-dependent instruction to incur the maximum “load → other” latency on either machine, and that the rest incur no latency. Finally, assume that 50% of data memory accesses are stack accesses.
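The CPI bookkeeping described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the assignment's solution: the machine A and B latencies, miss rates, and miss penalty are not given in this excerpt, so the hypothetical function `average_cpi` takes them as parameters. It assumes each latency adds that many stall cycles to every affected load, that stack and non-stack accesses may have different miss rates, and that every data miss costs one fixed penalty.

```python
# Fractions stated in the problem text.
LOAD_FRAC = 0.15        # loads per instruction
STORE_FRAC = 0.05       # stores per instruction
BASE_CPI = 2.0          # CPI excluding dependence latencies and data misses
STORE_LOAD_FRAC = 0.20  # loads that incur the maximum store->load latency
LOAD_OTHER_FRAC = 0.30  # loads that incur the maximum load->other latency
STACK_FRAC = 0.50       # data accesses that go to the stack


def average_cpi(store_load_lat, load_other_lat,
                stack_miss_rate, nonstack_miss_rate, miss_penalty):
    """Hypothetical model: all machine-specific inputs are assumptions.

    Latencies and miss_penalty are extra stall cycles; miss rates are
    per data access.
    """
    # Dependence stalls, charged per instruction via the load fraction.
    dep_stalls = LOAD_FRAC * (STORE_LOAD_FRAC * store_load_lat
                              + LOAD_OTHER_FRAC * load_other_lat)
    # Data accesses per instruction, split between stack and non-stack.
    data_refs = LOAD_FRAC + STORE_FRAC
    miss_rate = (STACK_FRAC * stack_miss_rate
                 + (1 - STACK_FRAC) * nonstack_miss_rate)
    miss_stalls = data_refs * miss_rate * miss_penalty
    return BASE_CPI + dep_stalls + miss_stalls


if __name__ == "__main__":
    # Placeholder parameters only; substitute the real machine A/B values.
    machine_a = average_cpi(1, 2, 0.05, 0.05, 20)
    machine_b = average_cpi(2, 1, 0.02, 0.08, 20)
    print(f"A: {machine_a:.3f}  B: {machine_b:.3f}")
```

Comparing A and B then reduces to plugging each machine's latencies and per-region miss rates into the same formula; the ratio of the two CPIs gives the speedup when clock rates are equal.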
Problem 3
Request sequence | Dir-map (set 0) | Dir-map (set 1) | Dir-map (set 2)
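A trace table with columns like these (request sequence, then the contents of each direct-mapped set) can be filled in mechanically. The sketch below is illustrative only: the actual request sequence and number of sets are not in this excerpt (at least sets 0 to 2 appear in the column headers), so the hypothetical `direct_mapped_trace` takes both as inputs. It assumes requests are block addresses, one block per set, and set index = block mod number of sets.

```python
from typing import List, Optional, Tuple


def direct_mapped_trace(requests: List[int],
                        num_sets: int) -> Tuple[List[List[Optional[int]]], int]:
    """Simulate a direct-mapped cache holding one block per set.

    Returns a snapshot of every set's contents after each request
    (one row of the trace table) and the total number of hits.
    """
    sets: List[Optional[int]] = [None] * num_sets  # None = empty set
    snapshots: List[List[Optional[int]]] = []
    hits = 0
    for block in requests:
        idx = block % num_sets  # the set this block maps to
        if sets[idx] == block:
            hits += 1
        else:
            sets[idx] = block  # miss: evict whatever occupied the set
        snapshots.append(list(sets))
    return snapshots, hits


if __name__ == "__main__":
    # Hypothetical request sequence, not the one from the assignment.
    seq = [0, 1, 0, 4, 1]
    rows, hits = direct_mapped_trace(seq, 3)
    for req, row in zip(seq, rows):
        print(req, row)
    print(f"hit ratio = {hits}/{len(seq)}")
```

The hit ratio is the hit count divided by the number of requests; running the same sequence through a set-associative or fully associative model would give the comparison across organizations.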
Problem 4