Selected Question answers, Exercises of Computer Architecture and Organization

Sample question answers regarding computer architecture

Typology: Exercises

2017/2018

Uploaded on 04/05/2018

roshan-koju-1

Computer architecture questions
These questions were collected from previous exams and tests, so you will find
a new set of processor specifications inserted at various locations: the questions
following use those processor specifications. You will also find some essentially
identical questions! “Re-use” is a well-established software engineering principle:
we use it for exam questions too!
Except where otherwise indicated, use the following operating system and processor
characteristics in all questions.
Your operating system uses 8 kbyte pages. The machine you are using has a 4-way
set associative 32 kbyte unified L1 cache and a 64 entry fully associative TLB.
Cache lines contain 32 bytes. Integer registers are 32 bits wide. Physical addresses
are also 32 bits. It supports virtual addresses of 46 bits. 1 Gbyte of main memory is
installed.
a) Give one advantage of a direct mapped cache.
b) What is the main disadvantage of a direct mapped cache?
c) How many sets does the cache contain?
d) How many comparators does the cache require?
e) How many bits do these comparators work on?
f) Your program is a text processor for large documents: in an initial check, it scans
the document looking for illegal characters. For an 8 Mbyte document, what
would you expect the L1 cache hit rate to be during the initial check? (You are
expected to do a calculation and give an approximate numeric answer!)
g) Your program manipulates large arrays of data. In order to achieve consistently good
performance, you should avoid one thing. What is it? (Be precise – a numeric
answer relevant to the processor described above and an explanation is required
here.)
h) What is the alternative to a unified cache? What advantages does it provide?
i) In addition to data and tags, a cache will have additional bits associated with each
entry. List these bits and add a short phrase describing the purpose of each bit (or
set of bits). (In all cases, make your answers concise: simply list any differences
from a preceding answer.)
(i) A set-associative write-back cache
(ii) A set-associative write-through cache
(iii) A direct mapped cache
(iv) A fully associative cache
j) 32 processes are currently running. If the OS permitted each process to use the
maximum possible address space, how many page table entries are required?
(i) Conventional page tables
(ii) Inverted page tables
k) Draw a diagram showing how the bits of a virtual address are used to generate a
32-bit physical address.
l) “A program which simply copies a large block of data from one memory location
to another exhibits little locality of reference, therefore its performance is not
improved by the presence of a cache.” Comment on this statement. Is it strictly
true, mostly true or not true at all? Explain your answer. Assume you are running
programs on the processor described at the beginning of this section.
m) You are advising a team of programmers writing a large scientific simulation program. The team mainly consists of CS graduates who skipped any study of computer architecture in their degrees. Performance is critical. List some simple things that you would advise them to do when writing code for this system. Provide a one-sentence explanation for each point of advice. (1 mark for each valid piece of advice, 1 for explaining it and 1 for adding a number that makes the advice specific to the processor described earlier.)
------------------------------------------------------------------------------------------------------------
Except where otherwise indicated, use the following operating system and processor characteristics in all questions.
Your operating system uses 8 kbyte pages. The machine you are using has a 4-way set associative 32 kbyte unified L1 cache and a 128 entry fully associative TLB. Integer registers are 32 bits wide. Physical addresses are also 32 bits. It supports virtual addresses of 44 bits. The bus is 64 bits wide: the most usual bus transaction has four data cycles. 1 Gbyte of main memory is installed.
n) Why would a processor execute both statements s1 and s2 from the compound statement: if ( condition ) s1; else s2;

o) What would you expect the cache line length to be?
p) How many comparators does the cache require?
q) How many bits do these comparators work on?
r) How many comparators does the TLB require?
s) How many bits do these comparators work on?
t) Your program is a text processor for large documents: in an initial check, it scans the document looking for illegal characters. For an 8 Mbyte document, what would you expect the hit rates to be during the initial check? (You are expected to do a calculation and give an approximate numeric answer!)
(i) Cache
(ii) TLB
u) Under what conditions would you expect to achieve 100% TLB hits? (Two answers required. For one, you are expected to do a calculation and give an approximate numeric answer!)
v) Why are caches built with long (i.e. more than 8 byte) lines? (Two reasons needed.)
w) Your program manipulates large arrays of data. In order to achieve consistently good performance, you should avoid one thing. What is it? (Be precise – a numeric answer relevant to the processor described above and an explanation is required here.)
x) What is the maximum number of page faults that can be generated by a single memory access? Explain your answer. (Assume the page fault handler is locked in memory and no other page faults are generated for pages of instructions.)
y) A system interface unit will often change the order in which memory accesses generated by the program are placed on the system bus. Give two examples of such re-orderings and explain why the order is changed.
z) Why does a read transaction check the write queue in a system interface unit?
aa) If you had only a limited number of transistors available for improving branch performance, what prediction logic would you add? Why will it work?
bb) Why does successful branch prediction improve the performance of a processor?
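Several answers for the two processor descriptions above are plain arithmetic. The sketch below works them through, assuming byte-addressed memory, a physically tagged L1, a document scan that reads one byte at a time with no prefetching, and TLB tags that hold only the virtual page number (no process-ID bits). Both descriptions share the same L1 geometry once the line length in (o) is deduced from the bus. For question (j) of the first description, the inverted table is sized by the 1 Gbyte actually installed; sizing it by the full 32-bit physical space would give 2^19 entries instead. The helper names log2_exact and cache_geometry are our own.

```python
KB, MB, GB = 2**10, 2**20, 2**30


def log2_exact(n: int) -> int:
    """log2 of a power of two (all sizes in these questions are powers of two)."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def cache_geometry(capacity: int, line: int, ways: int, phys_bits: int):
    """Return (sets, comparators, tag_bits, conflict_stride) for a set-associative cache."""
    sets = capacity // (line * ways)
    tag_bits = phys_bits - log2_exact(sets) - log2_exact(line)
    comparators = ways                   # one tag comparator per way, all used in parallel
    conflict_stride = capacity // ways   # addresses a multiple of this apart share a set
    return sets, comparators, tag_bits, conflict_stride


# (o) 64-bit bus x four data cycles = one 32-byte line per transaction.
line = (64 // 8) * 4
sets, comparators, tag_bits, stride = cache_geometry(32 * KB, line, 4, 32)
print(sets, comparators, tag_bits, stride)   # 256 4 19 8192

# (f), (t) Sequential byte scan of an 8 Mbyte document:
# one miss per new 32-byte line, one TLB miss per new 8 kbyte page.
page = 8 * KB
cache_hit = 1 - 1 / line
tlb_hit = 1 - 1 / page
print(cache_hit, round(tlb_hit, 5))          # 0.96875 0.99988

# (r), (s), (u) 128-entry fully associative TLB, 44-bit virtual addresses.
tlb_comparators = 128                        # one comparator per entry
tlb_tag_bits = 44 - log2_exact(page)         # 31-bit virtual page number
tlb_coverage = 128 * page                    # address space mapped at once
print(tlb_comparators, tlb_tag_bits, tlb_coverage // MB)   # 128 31 1

# (j) First description: 46-bit virtual addresses, 32 processes.
vpn_bits = 46 - log2_exact(page)             # 33
conventional_entries = 32 * 2**vpn_bits      # one full table per process
inverted_entries = GB // page                # one entry per installed page frame
print(conventional_entries == 2**38, inverted_entries)     # True 131072
```

The conflict stride is the number to quote for (g) and (w): addresses that are a multiple of 8 kbytes apart all fall in one set, so a fifth such line evicts one of the four already held there.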

If you mark I/O buffers like this, then writes to them always go directly to memory (not wasting cache space) and they don’t need to be invalidated when new DMA operations are performed. Most OSs will copy data from read buffers to a user’s address space immediately after it’s been read from the device, so there’s no advantage in caching it: it’s only ever read once.
jj) What benefit would you expect from a fully associative cache (compared to other cache organizations)?
kk) Despite this, fully associative data caches are rarely found. Why?
ll) TLBs are often fully associative caches. Referring to your answer to the previous question, explain why.
mm) What distinguishes write-through and write-back caches?
nn) Your processor has a 64 kbyte 8-way set associative L1 cache with 32 byte lines. When writing programs that need to perform well on this machine, list two simple things you could do to get the maximum performance. (Your answer may mention things you would not do if you prefer!)
oo) How many sets does this cache have?
pp) What is the relationship between addresses of lines in the same set?
qq) Why is the previous answer relevant to ensuring that programs run efficiently?
rr) What is a potential pitfall of writing a program that uses the answer to (pp) above to ensure good performance?
ss) Your program spends most of its time scanning through documents which are usually about 2 Mbytes long, looking for key words. What would you expect the L1 cache hit rate to be? Your answer should consider only the hit rate for the document data, i.e. it can ignore small perturbations caused by hits or misses in program code, OS interrupts, etc.
tt) An OS supports pages of 4 kbytes. Virtual addresses are 44 bits long. The TLB has 84 entries and is fully associative. What is the TLB coverage?
uu) “TLBs are just caches.” What is the ‘data’ stored in a TLB?
vv) Does this TLB present the same problem that questions (g), (h) and (i) refer to? Why?
ww) How many bits does the tag in this TLB have?
xx) How does an OS share pages between different processes or users?
yy) What benefits result from sharing pages? At least two answers required.
zz) Give an example of an instruction sequence which contains a data dependency. Indicate the dependency present.
aaa) Give an example of an instruction sequence which benefits from value forwarding hardware. Indicate why value forwarding helps.
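For (oo), (pp), (ss), (tt) and (ww), a short sketch under the same assumptions as before (byte addressing, power-of-two sizes, a byte-by-byte scan, and a TLB tag that is just the virtual page number). The same_set helper is a hypothetical illustration of how the set index is taken from an address, not a real API.

```python
KB = 2**10

# nn)-ss): 64 kbyte, 8-way set associative, 32-byte lines.
capacity, line, ways = 64 * KB, 32, 8
sets = capacity // (line * ways)        # (oo) 256 sets
same_set_stride = capacity // ways      # (pp) lines 8 kbytes apart fall in the same set


def same_set(a: int, b: int) -> bool:
    """True if byte addresses a and b select the same set of this cache."""
    return (a // line) % sets == (b // line) % sets


print(sets, same_set_stride)                                # 256 8192
print(same_set(0x0000, 0x2000), same_set(0x0000, 0x2020))   # True False

# (ss) A byte-by-byte scan of a 2 Mbyte document misses once per 32-byte line.
scan_hit_rate = 1 - 1 / line                                # 0.96875

# tt), ww): 4 kbyte pages, 44-bit virtual addresses, 84-entry fully associative TLB.
page, va_bits, tlb_entries = 4 * KB, 44, 84
coverage = tlb_entries * page                               # (tt) 344064 bytes
tlb_tag_bits = va_bits - (page.bit_length() - 1)            # (ww) 44 - 12 = 32
print(coverage // KB, tlb_tag_bits)                         # 336 32
```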


Except where otherwise stated, assume that all caches in the following questions have lines of 32 bytes and a total capacity of 64kbytes.

bbb) Why would a cache be built with such a long line? (Two reasons needed.)
ccc) If the cache is direct mapped, how many comparators are needed?
ddd) If the cache is fully associative, how many comparators does it need?
eee) If the cache is 8-way set associative, how many sets does it have?
fff) If the cache is 4-way set associative, what is the relationship between lines in the same set?
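For (ccc) to (fff), the number of comparators equals the number of ways searched in parallel, and addresses a multiple of capacity/ways apart land in the same set. A small sketch comparing the organisations, assuming the 64 kbyte cache with 32-byte lines stated above (the organisation helper is our own):

```python
capacity, line = 64 * 2**10, 32
lines = capacity // line                 # 2048 lines in total


def organisation(ways: int):
    """Return (sets, comparators, same_set_stride) for a cache with `ways` lines per set."""
    sets = lines // ways
    comparators = ways                   # every way in the selected set is checked at once
    same_set_stride = capacity // ways   # addresses this many bytes apart share a set
    return sets, comparators, same_set_stride


for name, ways in [("direct mapped", 1), ("4-way", 4), ("8-way", 8), ("fully associative", lines)]:
    print(name, *organisation(ways))
# direct mapped 2048 1 65536
# 4-way 512 4 16384
# 8-way 256 8 8192
# fully associative 1 2048 32   (stride = one line: every line is in the single set)
```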

ggg) Virtual addresses have 48 bits and physical addresses have 32 bits. The OS uses a page size of 16 kbytes. The cache is 4-way set associative. How many tags are present in the whole cache?
hhh) How many bits are needed for each tag?
iii) How many additional bits are needed per line? Indicate the purpose of each bit.
jjj) Draw a diagram showing how the bits of a 40-bit virtual address are used to generate a 32-bit physical address. Assume the cache is 4-way set associative and the OS has set the page size to 8 kbytes.
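A sketch of (ggg), (hhh) and the bit fields asked for in (jjj), assuming the cache is indexed and tagged with the physical address. The one-entry page_table dictionary and its values are made up purely for illustration. With 8 kbyte pages the 9-bit set index occupies bits 5 to 13 of the physical address, so its top bit comes from the translated frame number rather than from the page offset.

```python
KB = 2**10
capacity, line, ways, pa_bits = 64 * KB, 32, 4, 32
lines = capacity // line                          # 2048
sets = lines // ways                              # 512
offset_bits = (line - 1).bit_length()             # 5
index_bits = (sets - 1).bit_length()              # 9
tags_in_cache = lines                             # (ggg) one tag per line
tag_bits = pa_bits - index_bits - offset_bits     # (hhh) 32 - 9 - 5 = 18

PAGE_BITS = 13                                    # (jjj) 8 kbyte pages: 40-bit VA = 27-bit VPN + 13-bit offset


def translate(va: int, page_table: dict) -> int:
    """Virtual -> physical: replace the VPN with a frame number, keep the page offset."""
    vpn, page_offset = va >> PAGE_BITS, va & ((1 << PAGE_BITS) - 1)
    pfn = page_table[vpn]                         # supplied by the TLB or a page-table walk
    return (pfn << PAGE_BITS) | page_offset


def cache_fields(pa: int):
    """Split a physical address into (tag, set index, byte offset within the line)."""
    offset = pa & (line - 1)
    index = (pa >> offset_bits) & (sets - 1)
    tag = pa >> (offset_bits + index_bits)
    return tag, index, offset


page_table = {0x1234567: 0x4ABC}                  # hypothetical mapping: VPN -> 19-bit PFN
va = (0x1234567 << PAGE_BITS) | 0x1F4F
pa = translate(va, page_table)
tag, index, offset = cache_fields(pa)
print(tags_in_cache, tag_bits, hex(pa), tag, index, offset)
```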

kkk) What fields would you expect to find in a page table entry? Add a short phrase indicating the purpose of each field.

lll) Why does a system interface unit provide separate queues for read and write transactions?
mmm) Why does a read transaction check the write queue in a system interface unit?
nnn) If a processor with a simple branch predictor sees a conditional branch, under what circumstances can it make a prediction which has a high probability of being successful? Why?
ooo) Why is the branch processing unit placed as early in the pipeline as possible?
ppp) Irrespective of the number and power of the individual processing elements in a parallel processor, what factor primarily determines whether the processor will be efficient?
qqq) Under what conditions would you expect to achieve 100% TLB hits? (Two answers required. For one, you are expected to do a calculation and give an approximate numeric answer!)
rrr) Why are caches built with long (i.e. more than 8 byte) lines? (Two reasons needed.)
sss) A superscalar processor has 6 functional units. What determines the maximum number of instructions that this processor can start every cycle?
ttt) List the functions of the instruction issue unit of a superscalar processor. No marks for “issue instructions” (somewhat obvious!): list the other functions that the IIU performs.
uuu) You are trying to estimate the performance on your application of a superscalar processor with 8 functional units and a clock speed of 2 GHz. It’s a conventional RISC machine. You decide to start by working out how many instructions the processor can complete every second. The application is a commercial one with no floating point operations. What question do you need to ask before you can make this estimate? (Alternatively: what piece of information do you need to find in the processor’s data sheets?)

vvv) Have dataflow architectures disappeared in the way of the dinosaurs? Explain your answer.
www) Describe a situation in which it is beneficial for an OS to share pages between different processes or users.

gggg) Describe a scenario in which you would prefer a write-through cache to a write-back one. Explain why a write-through cache should perform better.

hhhh) Give one advantage and one disadvantage of a direct mapped cache.

Except where otherwise stated, assume that all caches in the following questions are 4-way set-associative, have lines of 64 bytes and a total capacity of 64 kbytes.
iiii) The system bus is 64 bits wide. How many bus clock cycles are used to transfer data for the most common bus transaction?
jjjj) The system has a split address and data bus. List the overhead bus cycles needed for the common bus transaction. An overhead cycle is one which transfers no data. A simple name implying a function for each cycle will suffice. The list is started for you:
Address Bus Request
kkkk) The cache is write-through. The machine emits 64-bit addresses. One bit is used for an LRU algorithm. What is the total number of bits in the cache? Count all overhead bits. Since you do not have a calculator, show your working leading to a numeric expression which would, if fed into a calculator, give the final answer.

llll) What is the relationship between lines in the same set?
mmmm) Virtual addresses have 48 bits and physical addresses have 32 bits. The OS uses a page size of 16 kbytes. How many tags are present in the whole cache?
nnnn) What fields would you expect to find in a page table entry? Add a short phrase indicating the purpose of each field.
oooo) Why does a read transaction check the write queue in a system interface unit?
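Working for (iiii), (kkkk), (llll) and (mmmm) with 64-byte lines, 4 ways and 64 kbytes. The question does not say how the LRU bit is allocated; this sketch assumes one LRU bit and one valid bit per line, and no dirty bit because the cache is write-through. Changing those assumptions changes the per-line overhead accordingly.

```python
KB = 2**10
capacity, line, ways = 64 * KB, 64, 4
lines = capacity // line                     # 1024 lines, so 1024 tags (mmmm)
sets = lines // ways                         # 256 sets
same_set_stride = capacity // ways           # (llll) lines 16 kbytes apart share a set

# (iiii) 64-bit bus: 8 bytes per data cycle, one line per transaction.
data_cycles = line // (64 // 8)              # 8

# (kkkk) 64-bit addresses; assumed per line: 1 valid bit + 1 LRU bit, no dirty bit.
offset_bits = (line - 1).bit_length()        # 6
index_bits = (sets - 1).bit_length()         # 8
tag_bits = 64 - index_bits - offset_bits     # 50
bits_per_line = line * 8 + tag_bits + 1 + 1  # data + tag + valid + LRU
total_bits = lines * bits_per_line           # 2**10 * (2**9 + 50 + 2)
print(data_cycles, same_set_stride, tag_bits, bits_per_line, total_bits)
# 8 16384 50 564 577536
```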

An OS uses 8kbyte pages. It’s running on a system with a 44-bit virtual address space. A page table entry requires 4 bytes. Physical addresses are 32 bits. How much space is needed for the page table for each user?

pppp) If the page tables are inverted and the system can handle 256 simultaneous processes, how much space is needed for page tables?
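The conventional per-user table size follows directly from the figures given: 44 - 13 = 31 bits of virtual page number, so 2^31 entries of 4 bytes. For (pppp), an inverted table has one entry per physical page frame, so its size does not depend on the 256 processes. This sketch assumes the inverted table is sized for the full 32-bit physical address space and assumes 8-byte entries, since a process identifier plus a 31-bit page number will not fit in the 4-byte entry quoted above; both are assumptions to adjust.

```python
KB, MB, GB = 2**10, 2**20, 2**30
page_bits = 13                                  # 8 kbyte pages
va_bits, pa_bits = 44, 32

# Conventional (linear) page table: one entry per virtual page, per user.
entries_per_user = 2 ** (va_bits - page_bits)   # 2**31
per_user_bytes = entries_per_user * 4           # 4-byte entries, as stated
print(per_user_bytes // GB, "Gbytes per user")  # 8 Gbytes per user

# Inverted page table: one entry per physical page frame, shared by all processes.
frames = 2 ** (pa_bits - page_bits)             # 2**19
ASSUMED_IPT_ENTRY_BYTES = 8                     # assumption: PID + VPN + flags need > 4 bytes
inverted_bytes = frames * ASSUMED_IPT_ENTRY_BYTES
print(frames, inverted_bytes // MB, "Mbytes in total")   # 524288 4 Mbytes in total
```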

qqqq) Show how the address emitted by a program running on the system in Q4 is translated into a physical address.

rrrr) Discuss the benefits (if any) of having separate TLBs for instructions and data.

ssss) A system with a 64 kbyte cache exhibits a hit rate of 95% on a benchmark program. The cache access time is 1.8 cycles, so the pipeline is stalled for 1 cycle. Increasing the cache size to 128 kbytes increases the hit rate to 98% and the cache access time to 2.0 cycles. The access time for a main memory access is 15 cycles. Is increasing the cache size a good idea?
tttt) Your processor’s L1 cache contains 16 kbytes of data; it is organized as an 8-way set associative cache with lines of 64 bytes each. The processor has a 64-bit data bus.
(a) How many data cycles would you expect in the most common bus transaction?
(b) How many sets does this cache contain?
(c) You are designing a program to process matrices: what situation would you look out for? Be precise – supply a number in your answer!
(d) How many comparators are required?
(e) How many bits will be in each tag?
(f) How many tags will this cache hold?
(g) For an image processing program that works its way sequentially through 2 Mbyte monochrome images (each pixel is one byte), what would you expect the hit rate to be?
(h) If the image is stored in row-major order and a program processes the image column-by-column, what would you expect the hit rate to be?
(i) For an engineering program that processes streams of double precision floats that have been captured on disc, what would you expect the hit rate to be?
uuuu) List the advantages of separate instruction and data caches.
vvvv) The OS manages pages of 8 kbytes. The TLB has 128 entries.
(a) What is the coverage of this TLB?
(b) Your program needs to multiply matrices which contain 100x100 doubles. How would you expect the TLB to perform? Would there be any advantage to padding the matrices out to 128x elements? (Don’t forget the cache!)
wwww) Why does a typical branch predictor count the number of times that it predicted the branch direction successfully?
xxxx) Most elements of a typical processor are replicated 2^k times where k is an integer. Which of the following need to be 2^k in size? Interpret ‘need’ here to mean either that a considerable amount of extra circuitry would be required or that software would become considerably more complicated.
(a) Maximum physical memory supported by a processor
(b) Actual amount of physical memory installed in a processor
(c) Number of lines in a
(i) direct mapped cache
(ii) fully associative cache
(iii) set-associative cache
(d) Number of entries in a TLB
(e) Size of a page
(f) Number of entries in a page table
(g) Number of data phases in the most common bus transaction
yyyy) On a system with a 128 kbyte L1 cache, for a program with a working data set of 2 Mbytes:
(a) Calculate the expected cache hit rate when no assumptions can be made about data access patterns.
(b) Would you expect the actual hit rate to be better or worse than this? Why?
Better – this assumes perfectly random access to everything. Loop variables, constants, etc. are likely to have much better hit rates.
zzzz) Your system has a 4-way set associative cache with 4 32-bit words per cache line.
(a) If the total cache size is 64 kbytes, how many sets are there?
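A sketch of the arithmetic behind (ssss), (tttt), (vvvv), (yyyy) and (zzzz). Assumptions not stated in the questions: a miss costs the cache access time plus the 15-cycle memory access; physical addresses are 32 bits for the tag in (tttt)(e); scans read one element at a time with no prefetching; and the random-access estimate in (yyyy)(a) is simply the fraction of the working set that fits in the cache. Part (h) depends on the image dimensions and is left out. The amat helper is our own.

```python
KB, MB = 2**10, 2**20


def amat(hit_time: float, hit_rate: float, miss_penalty: float) -> float:
    """Average memory access time, assuming every miss adds miss_penalty cycles."""
    return hit_time + (1 - hit_rate) * miss_penalty


# (ssss) 64 kbyte vs 128 kbyte cache, 15-cycle main memory.
small = amat(1.8, 0.95, 15)                     # about 2.55 cycles
large = amat(2.0, 0.98, 15)                     # about 2.30 cycles
print(round(small, 2), round(large, 2))

# (tttt) 16 kbyte, 8-way, 64-byte lines, 64-bit data bus.
capacity, line, ways = 16 * KB, 64, 8
data_cycles = line // 8                         # (a) 8
sets = capacity // (line * ways)                # (b) 32
conflict_stride = capacity // ways              # (c) rows a multiple of 2 kbytes (256 doubles) collide
comparators = ways                              # (d) 8
tag_bits = 32 - (sets - 1).bit_length() - (line - 1).bit_length()   # (e) 21 with 32-bit addresses
tags = capacity // line                         # (f) 256
pixel_scan_hit = 1 - 1 / line                   # (g) one-byte pixels: 63/64
double_stream_hit = 1 - 8 / line                # (i) 8-byte doubles: 7/8

# (vvvv) 128-entry TLB, 8 kbyte pages; one 100x100 matrix of 8-byte doubles.
tlb_coverage = 128 * 8 * KB                     # (a) 1 Mbyte
matrix_bytes = 100 * 100 * 8                    # 80000 bytes
pages_per_matrix = -(-matrix_bytes // (8 * KB)) # ceiling division: 10 if page-aligned

# (yyyy)(a) Random accesses over a 2 Mbyte working set with a 128 kbyte cache.
random_hit = (128 * KB) / (2 * MB)              # 0.0625

# (zzzz) Four 32-bit words per line, 4-way, 64 kbytes.
zline = 4 * 4
zsets = 64 * KB // (zline * 4)                  # 1024
print(data_cycles, sets, tag_bits, tags, tlb_coverage // MB, pages_per_matrix, random_hit, zsets)
```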

1. Superscalar processors

  1. Draw a diagram showing how the instruction fetch and execution units of a superscalar processor are connected. Show the widths of the datapath (in words - not bits; your diagram should be relevant to a 32-bit or 64-bit processor). Which factor primarily determines performance: the instruction issue width (number of instructions issued per cycle) or the number of functional units?
  2. List the capabilities of the instruction fetch/despatch unit needed to make an effective superscalar processor.

2. Branch Prediction

  1. Why does branch prediction speed up a processor? Two reasons – one to do with the effect of branches on performance, the other to do with the likelihood that prediction is possible.
  2. If you only had a few transistors to implement a branch prediction system, what would you do? Why would it be effective?
  3. In addition to the answer that you almost certainly gave for the previous question, describe a scenario where you would expect branch prediction to be successful.
  4. Describe the status bits in a branch target buffer.
  5. Does it make sense to have both branch prediction and speculative execution in the same processor? Explain your answer.
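One widely used choice for the status bits in a branch target buffer is a 2-bit saturating counter per branch, which also bears on wwww) above. The sketch below is a hypothetical software model of that scheme (the class and its unbounded dictionary are our own, not a hardware description): the counter moves one step towards "taken" or "not taken" after each outcome, so a single loop exit does not flip the prediction for the next pass through the loop.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, initial: int = 1):
        self.initial = initial          # assumed start state: weakly not taken
        self.counters = {}              # branch address -> counter (a real BTB is finite)

    def predict(self, pc: int) -> bool:
        return self.counters.get(pc, self.initial) >= 2

    def update(self, pc: int, taken: bool) -> None:
        c = self.counters.get(pc, self.initial)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


# A loop-closing branch: taken 9 times, then falls through; the loop runs 3 times.
predictor = TwoBitPredictor()
outcomes = ([True] * 9 + [False]) * 3
correct = 0
for taken in outcomes:
    correct += predictor.predict(0x400) == taken
    predictor.update(0x400, taken)
print(correct, "of", len(outcomes))    # 26 of 30
```

After the first pass the only mispredictions are the loop exits; a 1-bit scheme would also mispredict the first branch of every later pass.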

Atomic Instructions

8. Is an atomic instruction, such as a test-and-set instruction, necessary for (a) a single processor running a multi-threaded OS and (b) a shared memory parallel processor? In each case, explain your answer.

  1. A computer bus must support READ and WRITE commands. List some other commands that it must support. Consider also the situation when the processors have snooping caches.

3. Programming Model

  1. What does the ‘Shared Memory’ programming model imply?
  2. Distinguish between a ‘Uniform Memory Access’ system and a ‘Non-Uniform Memory Access’ system.
  3. When using a shared memory, cache coherence transactions are expensive and could potentially clog up a bus so much that the bandwidth available to useful transactions becomes very low. A hacker (who’s spent his time reading the manual for his processor on the net instead of going to SE363 lectures) discovers that there’s an instruction that will disable the cache and decides that this will solve the problem. Why is this likely to be a bad idea?
  4. (State transition diagram for the MESI protocol inserted here.)
  5. Explain the significance of each state in a MESI protocol.
  6. Using the diagram, describe a scenario that would lead to the …………… transition in the diagram.
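To check a scenario against the diagram, it can help to write the MESI transitions out as a function. This is a minimal sketch of the common textbook form of the protocol; the event names (PrRd and PrWr for the local processor, BusRd, BusRdX and BusUpgr for snooped bus traffic) and the "Flush" action are a naming convention we have assumed, and the inserted diagram may use different labels or allow extra transitions such as cache-to-cache supply from E.

```python
def mesi_next(state: str, event: str, shared: bool = False):
    """Return (next_state, bus_action). `shared` is the snoop result on a read miss."""
    if state not in ("M", "E", "S", "I"):
        raise ValueError(f"unknown state {state!r}")
    if event == "PrRd":
        if state == "I":
            return ("S" if shared else "E"), "BusRd"
        return state, None                      # read hit in M, E or S: no bus traffic
    if event == "PrWr":
        if state == "I":
            return "M", "BusRdX"                # read for ownership
        if state == "S":
            return "M", "BusUpgr"               # invalidate the other copies
        return "M", None                        # E -> M silently; M stays M
    if event == "BusRd":                        # another processor reads the line
        if state == "M":
            return "S", "Flush"                 # supply / write back the dirty data
        if state == "E":
            return "S", None
        return state, None                      # S stays S, I stays I
    if event in ("BusRdX", "BusUpgr"):          # another processor is about to write
        return "I", ("Flush" if state == "M" else None)
    raise ValueError(f"unknown event {event!r}")


# Scenario: P0 reads a line nobody else holds, writes it, then P1 reads it.
p0 = "I"
p0, _ = mesi_next(p0, "PrRd", shared=False)     # I -> E
p0, _ = mesi_next(p0, "PrWr")                   # E -> M, no bus traffic
p0, action = mesi_next(p0, "BusRd")             # P1's read is snooped: M -> S
print(p0, action)                               # S Flush
```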

4. Other Parallel Processors

  1. Why does a VLIW machine need a good optimizing compiler?
  2. Where can you find a small dataflow machine in every high performance processor?