











Material Type: Notes; Class: COMPUTER ORGNIZATN; Subject: Computer Science; University: University of Maryland; Term: Fall 2003;
Typology: Study notes












What do we use for accessing small amounts of data quickly?
  Registers (32 in MIPS)
Why not store all data and instructions in registers?
  Too much overhead for addressing; lose speed advantage
  Register file can use a 5-to-32 decoder or a 32-to-1 MUX to select a register
Memory (RAM) is organized in larger quantities
Vcc, Vdd: voltages (5 volts and ground), necessary to power the chip, but do not affect logic
A9-0: address input (10 bits)
  Why 10 bits? A 1K chip holds 2^10 = 1024 bytes, so 10 bits select one of them
R/W': read/not write, selects read or write
  1: read, 0: write
CE: chip enable, allows read or write; when 0, neither read nor write
  also called chip select
D7-0: 8 bits of data read or to be written
  may be 8 bidirectional lines, or 16 separate lines (8 in, 8 out; pins on the chip)
Where do the inputs come from? The CPU
How do we get larger amounts of memory?
  Think of memory as an abstraction
  4K memory: like a byte array M[0..4095]
  Give memory the index of a byte, get the data value back
  Use four 1K chips
Each chip has elements C[0] up to C[1023]
Chips are numbered 00, 01, 10, 11
Each chip contains 1024 of the elements (0 to 1023, 1024 to 2047, etc.)
Where is element M[1025]? Chip 01 at index 1
Where is element M[3071]? Chip 10 at index 1023
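The two lookups above can be checked with a small sketch. It assumes the contiguous layout just described (chip 00 holds bytes 0 to 1023, chip 01 holds 1024 to 2047, and so on); the names CHIP_SIZE and locate are illustrative only and do not come from the notes.

```python
CHIP_SIZE = 1024   # bytes per chip (1K x 8)
NUM_CHIPS = 4      # 4K memory built from four chips


def locate(i):
    """Return (chip number, index within chip) for byte M[i].

    Assumes the contiguous layout: the upper 2 address bits pick the
    chip and the lower 10 bits pick the byte inside that chip.
    """
    if not 0 <= i < NUM_CHIPS * CHIP_SIZE:
        raise ValueError("address outside the 4K memory")
    chip, index = divmod(i, CHIP_SIZE)
    return chip, index


for i in (1025, 3071):
    chip, index = locate(i)
    print(f"M[{i}] -> chip {chip:02b}, index {index}")
```

With this layout the chip number is just the top two bits of the 12-bit address, which is why a 2-to-4 decoder on those bits would be enough to drive the chip enables.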
Need to generate signals to determine which addresses to access in memory
Logic for chip enable
  CPU can generate 3 control signals:
    B, which indicates that the CPU wants to access a byte
    H, which indicates that the CPU wants to access a halfword
    W, which indicates that the CPU wants to access a word
  Which chips are enabled?
    B: 1 chip
    H: 2 chips
    W: all 4 chips
  Address patterns (consecutive bytes are now interleaved across the chips):
    addresses on chip 00 end in 00 (divisible by 4)
    addresses on chip 01 end in 01 (congruent to 1 mod 4)
    addresses on chip 10 end in 10 (congruent to 2 mod 4)
    addresses on chip 11 end in 11 (congruent to 3 mod 4)
  Logic for chip 00:
    If W = 1, then all four chips are enabled
    If B = 1, select chip 00 when A1 A0 = 00
    If H = 1, select chip 00 if A1 = 0 (already know that A0 = 0, since halfword addresses are even)
    Boolean expression: CE = W + (H * A1') + (B * A1' A0')
Notice address bits A9-0 of each chip:
  To get M[i], access index i / 4
  Same as shifting to the right by 2 bits, or accessing bits 11-2 from the address bus
Chip 10: CE = W + (H * A1) + (B * A1 A0')
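The chip-enable logic can be sketched in software to see which chips respond to a given access. The expressions for chips 00 and 10 are the ones given in the notes; those for chips 01 and 11 are an assumption, filled in by following the same pattern. The function names are illustrative, and halfword and word accesses are assumed to be aligned.

```python
def chip_enables(addr, B=0, H=0, W=0):
    """Return [CE00, CE01, CE10, CE11] for an access at addr.

    Exactly one of B, H, W is expected to be 1. Halfword and word
    accesses are assumed to be aligned.
    """
    a0 = addr & 1
    a1 = (addr >> 1) & 1
    n0, n1 = 1 - a0, 1 - a1                # complemented bits A0', A1'
    ce00 = W | (H & n1) | (B & n1 & n0)    # from the notes
    ce01 = W | (H & n1) | (B & n1 & a0)    # assumed, same pattern
    ce10 = W | (H & a1) | (B & a1 & n0)    # from the notes
    ce11 = W | (H & a1) | (B & a1 & a0)    # assumed, same pattern
    return [ce00, ce01, ce10, ce11]


def chip_index(addr):
    """Index inside each chip: address bits 11-2, i.e. addr // 4."""
    return (addr >> 2) & 0x3FF


print(chip_enables(6, B=1))   # byte at address 6 (binary ...110)
print(chip_enables(4, H=1))   # halfword at address 4
print(chip_enables(8, W=1))   # word at address 8
print(chip_index(1025))
```

A byte access enables one chip, a halfword access enables the pair 00/01 or 10/11, and a word access enables all four, matching the counts listed above.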
Invalid addresses: we assume that the other 20 bits of the address are all 0
Memory hierarchy
  As distance from the CPU increases, so does size
Fig 7.
To take advantage of:
  temporal locality: keep recently used data closer to the CPU
  spatial locality: when moving data to a higher level, move a contiguous block
Miss: requested data not found in current level
Hit: requested data is found in current level
Hit rate: how often a requested data item is found in a given level of the hierarchy
If the hit rate at the top levels of the hierarchy is large, then the average access time will be close to the fastest access time
Cache: level in hierarchy between CPU and main memory
Data is transferred only between adjacent levels:
  When a miss occurs at one level of the hierarchy, data is transferred from the next lower level
  Minimum unit of data transferred: block
Performance depends on speed of hits and misses
  Hit time: time to access upper level, including determining hit or miss
  Miss penalty: time to access lower level to get data
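Hit time and miss penalty combine into an average access time through the standard relation: average = hit time + miss rate * miss penalty. The sketch below uses made-up numbers (a 1-cycle hit and a 100-cycle miss penalty, neither from the notes) only to show how quickly the average approaches the hit time as the hit rate rises.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average access time = hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical values, chosen only for illustration.
HIT_TIME = 1        # cycles
MISS_PENALTY = 100  # cycles

for hit_rate in (0.80, 0.95, 0.99):
    t = average_access_time(HIT_TIME, 1.0 - hit_rate, MISS_PENALTY)
    print(f"hit rate {hit_rate:.2f}: about {t:.1f} cycles on average")
```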
Issues
  How much data to transfer between levels
  Policy to replace data in upper levels
  Policy to update data in each level
Analogy
  Need 10 books for a term paper
  Fig. 7.
  Bring all 10 books back to your desk, instead of going back to the library 10 times
Example:
Fig 7.
Cache of 8 words
  Each data word is mapped to the cache location given by the last 3 bits of its address
  For example, all the gray words have addresses ending in 001, mapped to cache block 001
Each cache location can contain several possible data words
  If a word is in the cache, how do we know which one it is?
  Add tags to the cache entries
  Tag needs to contain only the upper bits of the address
  In the example, only need the upper 2 bits for the tag
Also need to recognize whether a block is empty: valid bit
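A minimal sketch of this 8-word direct-mapped cache follows. It assumes 5-bit word addresses (3 index bits plus the 2 tag bits mentioned above) and tracks only hits and misses, not the data itself. The class and method names are illustrative, not from the notes.

```python
NUM_BLOCKS = 8    # cache of 8 words
INDEX_BITS = 3    # low 3 address bits select the cache block


class DirectMappedCache:
    def __init__(self):
        self.valid = [False] * NUM_BLOCKS   # valid bit per block
        self.tag = [0] * NUM_BLOCKS         # upper address bits per block

    def access(self, addr):
        """Return True on a hit; on a miss, load the word and return False."""
        index = addr & (NUM_BLOCKS - 1)     # last 3 bits of the address
        tag = addr >> INDEX_BITS            # remaining upper bits
        if self.valid[index] and self.tag[index] == tag:
            return True
        self.valid[index] = True            # fill (or overwrite) the block
        self.tag[index] = tag
        return False


cache = DirectMappedCache()
trace = [0b00001, 0b00001, 0b01001, 0b00001]
for addr in trace:
    outcome = "hit" if cache.access(addr) else "miss"
    print(f"{addr:05b}: {outcome}")
```

Addresses 00001 and 01001 both end in 001, so they compete for the same block: the third access evicts the first word, and the fourth access misses again even though most of the cache is still empty.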
How to manage the cache efficiently?
  Temporal locality
    copy data into the cache when accessed
  Spatial locality
    copy a 2^k-byte block of data that includes the accessed data item
How to choose the range of addresses to copy?
  Might choose data from addr - delta to addr + delta, where delta is 2^(k-1)
  However, this is not so convenient to manage
  Instead, use all addresses with the same upper n-k bits
Example: want to access data with address A31-0. Copy the 32 bytes with addresses:
  A31-5 00000
  A31-5 00001
  A31-5 00010
  ...
  A31-5 11111
These 32 bytes are called a cache line. The upper 27 bits are the tag.
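Splitting an address into tag and offset is a shift and a mask. The sketch below assumes 32-byte cache lines (k = 5) as in the example; the function names and the sample address 0x1234 are illustrative.

```python
OFFSET_BITS = 5                    # 2^5 = 32 bytes per cache line
LINE_SIZE = 1 << OFFSET_BITS


def split_address(addr):
    """Return (tag, offset): upper 27 bits and lower 5 bits of addr."""
    tag = addr >> OFFSET_BITS
    offset = addr & (LINE_SIZE - 1)
    return tag, offset


def line_addresses(addr):
    """Return the range of byte addresses in the line containing addr."""
    base = addr & ~(LINE_SIZE - 1)  # clear the offset bits
    return range(base, base + LINE_SIZE)


tag, offset = split_address(0x1234)
line = line_addresses(0x1234)
print(f"tag = {tag:#x}, offset = {offset}")
print(f"line covers {line[0]:#x} to {line[-1]:#x}")
```

Every address in the returned range shares the same tag, which is what makes the "same upper n-k bits" rule easier to manage than a window centered on the accessed address.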
Cache misses: types
  compulsory: first reference to a data item
  capacity: not enough space in the cache
    too few slots
    cache line too small
  conflict: space is available elsewhere, but another data block is already stored at that location
    2 cache lines map to the same cache slot
Instruction and data cache
  instructions and data have different access patterns, use different memory areas
  use separate instruction and data caches
  one reason to have separate instruction and data memories in the datapath
Modifying data in the cache
  write-back: update main memory only when the block is removed from the cache
    saves the time required to write main memory for each store
  write-through: update main memory at the same time as the cache
    save time by continuing execution while the main memory write completes
Replacement policy
  how to choose the cache line to replace
  LRU: least recently used: slot which has not been used in the longest time
  LFU: least frequently used
  FIFO: first in, first out: slot which has been in the cache the longest
  Random: may be only 10% worse than LRU
  may require additional hardware to keep track
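The difference between the two write policies can be seen by counting main-memory writes. The sketch below is a deliberately tiny model: a cache with a single slot, a made-up sequence of stores, and a final flush of any dirty block so that both policies leave memory up to date. None of these details come from the notes.

```python
def memory_writes(stores, policy):
    """Count main-memory writes for a one-slot cache.

    stores: block numbers written by the CPU, in order.
    policy: "write-through" or "write-back".
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError("unknown policy")
    writes = 0
    cached = None       # block currently held in the single slot
    dirty = False       # only meaningful for write-back
    for block in stores:
        if block != cached:
            if policy == "write-back" and dirty:
                writes += 1             # evicted dirty block is written back
            cached, dirty = block, False
        if policy == "write-through":
            writes += 1                 # every store also goes to memory
        else:
            dirty = True                # memory is updated later, on eviction
    if policy == "write-back" and dirty:
        writes += 1                     # assumed final flush
    return writes


stores = [7, 7, 7, 7, 3, 3]             # hypothetical store sequence
print(memory_writes(stores, "write-through"))
print(memory_writes(stores, "write-back"))
```

Repeated stores to the same block cost one memory write each under write-through but only one write in total under write-back, which is the saving the notes describe.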
More flexible cache management: fully-associative
  Assume the cache consists of 2^7 = 128 slots, with 2^5 = 32 bytes per cache line
  Address A31-0 consists of tag bits A31-5 and offset A4-0
If data is not in the cache, pick a slot
  Fully-associative cache: may go in any slot
  Pick one with V = 0
  If none, evict a slot using the replacement policy (LRU, FIFO, etc.)
How do we know if a cache line is in the cache?
  Must search every slot, but hardware can search in parallel, unlike software
  Compare the address tag bits with the tags of each slot in the cache
  Can be done using a comparator (combinational circuit using XNOR gates)
  Must also have valid bit V = 1
Complexity of the hardware to manage a fully-associative cache slows down the speed of the cache, so it is not generally used
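A software sketch of the fully-associative lookup follows, assuming 128 slots, 32-byte lines, and LRU replacement. A dictionary lookup stands in for the parallel tag comparators, and a tag that is absent from the dictionary plays the role of a slot with V = 0. The class name and the small 2-slot demo are illustrative.

```python
from collections import OrderedDict

NUM_SLOTS = 128     # 2^7 slots
OFFSET_BITS = 5     # 2^5 = 32 bytes per cache line


class FullyAssociativeCache:
    def __init__(self, num_slots=NUM_SLOTS):
        self.num_slots = num_slots
        self.lines = OrderedDict()   # tag -> None, least recently used first

    def access(self, addr):
        """Return True on a hit; on a miss, load the line and return False."""
        tag = addr >> OFFSET_BITS            # tag bits A31-5
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.num_slots:
            self.lines.popitem(last=False)   # no free slot: evict the LRU line
        self.lines[tag] = None
        return False


# Tiny 2-slot cache so that evictions are easy to follow.
cache = FullyAssociativeCache(num_slots=2)
for addr in (0, 32, 0, 64, 0, 32):
    outcome = "hit" if cache.access(addr) else "miss"
    print(f"address {addr:3d}: {outcome}")
```

In the demo, the access to address 64 evicts the line holding address 32 (the least recently used), so address 0 still hits afterwards while address 32 misses.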
Direct-mapped: location determined directly by block number
Fig 7.
  12 % 8 = 4
Set-associative (2-way): search 2 tags
  12 % 4 = 0
Fully-associative: search all tags
Range of associativity for an 8-block cache
  direct-mapped = 1-way set-associative
  2-way
  4-way
  fully-associative = 8-way
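The placement rule across this range can be sketched as one function: the number of sets is the number of blocks divided by the associativity, and a memory block maps to set (block number mod number of sets). Block 12 is the example used in the notes; the function name is illustrative.

```python
NUM_BLOCKS = 8   # total blocks in the cache


def set_index(block_number, ways):
    """Return the set that block_number maps to in a ways-way cache."""
    if NUM_BLOCKS % ways != 0:
        raise ValueError("ways must divide the number of blocks")
    num_sets = NUM_BLOCKS // ways
    return block_number % num_sets


for ways in (1, 2, 4, 8):
    num_sets = NUM_BLOCKS // ways
    print(f"{ways}-way: {num_sets} set(s), block 12 -> set {set_index(12, ways)}, "
          f"search {ways} tag(s)")
```

At 1-way the set index is the block's only possible location (direct-mapped); at 8-way there is a single set, so every tag must be searched (fully-associative).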
Fig 7.