








An in-depth exploration of cache memory, its purpose, and various cache mapping schemes such as direct mapped, set associative, and multi-level caches. It covers topics like temporal and spatial locality, cache layout, cache size calculation, and cache write policies. The document also discusses the advantages and disadvantages of each cache mapping scheme.
Typology: Study notes









CIT 595
Address: unique (k-bit) identifier of location
Contents: m-bit value stored in location
There is a bound on how fast we can access data from memory
This latency inherently slows down the overall processing speed of the processor
Once the power is off, the information is lost
RAM - Random Access Memory
Access time is the same for all locations, hence Random Access
Memory can be read and written
The instructions and/or data are stored here when executing your programs
E.g. Magnetic disk, ROMs, Flash RAM
SR Flip-Flop, D Flip-Flop
Consists of 8 transistors per cell (1 NAND/NOR gate requires 4 transistors)
Can be optimized to use 6 transistors per cell
Kinds of RAM: Type II
DRAM - Dynamic RAM
Capacitor is used to store charge
Transistor acts as a switch which allows data to be read or written
Charge on capacitor needs to be sensed for 0 or 1
Capacitors slowly leak their charge over time and hence must be refreshed every few milliseconds to prevent data loss
DRAM vs. SRAM Technology
DRAM is denser
Stores more bits per surface area
4 MB of SRAM costs about the same as 1 GB of DRAM
DRAM is ~250x cheaper than SRAM
SRAM has faster access time
SRAM access time is 3ns to 10ns
DRAM access time is 30ns to 90ns
DRAM is ~10x slower than SRAM
Performance/Cost/Capacity
In general
Slow memory is cheap and has more storage capacity
Fast memory is expensive and has less storage capacity
Ideal Goals
Memory that operates at processor speeds
i.e. the time it takes to compute a basic operation
Don't want memory access time to dominate the clock cycle time or add to CPI
Memory as large as needed for all running programs
Memory that is cost effective
So how do we get the best of everything?
Use a Memory Hierarchy
Memory Hierarchy
To provide the best performance at the lowest cost, memory is organized in a hierarchical fashion
Small, fast storage elements are near the CPU
Larger, slower memory is accessed through the data bus
Larger, (almost) permanent storage in the form of disk and media storage is still further from the CPU
Each level of memory keeps a subset of the data contained in the lower memory level
Basic Cache Organization
Memory is divided into blocks
Each block contains a fixed number of words
Word = size of the data stored in one location, e.g. 8 bits, 16 bits, etc.
One block is the minimum unit of transfer between main memory and cache
Hence, each location in the cache stores 1 block
(plus some extra information, covered later)
[Figure: Main Memory (Block 0, Block 1, each made up of words such as Word 0 to Word 3) and Cache]
Cache Mapping Scheme
The main memory address generated by the processor cannot be used directly to access the cache
Hence a mapping scheme is required that converts the generated main memory address into a cache location
It also determines where the block will be placed when it is originally copied into the cache
Address Conversion to Cache Location
Address conversion is done by giving special significance to the bits of the main memory address
The address is split into distinct groups called fields
Just like instruction decoding is done based on certain bit fields
The fields are a way to find:
Which cache location?
Which word in the block?
Whether it is the right data we are looking for (some kind of unique identifier)
Mapping Scheme 1: Direct Mapped Cache
Direct Mapped Scheme: Address Conversion
n-bit main memory address (most significant bits first): | Tag | Block | Word |
Word = which word in the block?
Block = which location in the cache?
Tag = unique identifier w.r.t. one block
Note: the tag is used to distinguish whether main memory block 7 or 17 is stored in cache block 7
Cache Layout
E.g. cache with 4 blocks and 8 words per block
[Figure: cache table with columns Block No. (0 to 3), Tag, and Data]
Example of Direct Mapped Scheme
Suppose our memory consists of $2^{14}$ words, the cache has $16 = 2^4$ blocks, and each block holds 8 words
Thus main memory is divided into $2^{14} / 2^3 = 2^{11}$ blocks
Of the 14-bit address, we need 4 bits for the block field, 3 bits for the word, and the tag is what's left over (14 - 4 - 3 = 7 bits)
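The field split in this example can be sketched in a few lines of Python. This is only an illustration under the field widths given above; the function name split_address is made up for this sketch, and the address 3AB (hexadecimal) is used as a sample input.

```python
ADDR_BITS = 14   # 2^14 words of main memory
BLOCK_BITS = 4   # 16 cache blocks
WORD_BITS = 3    # 8 words per block
TAG_BITS = ADDR_BITS - BLOCK_BITS - WORD_BITS  # what is left over


def split_address(addr):
    """Split a 14-bit address into its (tag, block, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)                    # lowest 3 bits
    block = (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)   # middle 4 bits
    tag = addr >> (WORD_BITS + BLOCK_BITS)                  # remaining high bits
    return tag, block, word


print(TAG_BITS)              # 7
print(split_address(0x3AB))  # (7, 5, 3)
```

So address 3AB refers to word 3 of the block held in cache block 5, and the stored tag must equal 7 for the access to find it there.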
Direct Mapped Cache with 16 blocks
[Figure: cache table with columns Block No. (0 to 15), Tag, and Data]
Direct Mapped Cache with address 3AB
[Figure: cache table with columns Block No. (0 to 15), Tag, and Data]
Disadvantage of Direct Mapped Cache
If a program repeatedly alternates between addresses that map to the same cache block, the cache will continually evict and replace blocks
This is known as thrashing
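A small simulation can make thrashing concrete. The sketch below assumes the 16-block, 8-words-per-block direct mapped cache used in these notes; count_misses is a hypothetical helper written for this illustration, and it tracks only tags, not data.

```python
NUM_BLOCKS = 16
WORDS_PER_BLOCK = 8


def count_misses(addresses):
    """Count misses in a direct mapped cache that starts out empty."""
    stored_tag = [None] * NUM_BLOCKS        # None means nothing cached yet
    misses = 0
    for addr in addresses:
        block_number = addr // WORDS_PER_BLOCK   # main memory block
        index = block_number % NUM_BLOCKS        # cache block it maps to
        tag = block_number // NUM_BLOCKS
        if stored_tag[index] != tag:
            misses += 1
            stored_tag[index] = tag              # evict and replace
    return misses


a = 0x3AB      # maps to cache block 5
b = a + 128    # 16 blocks * 8 words further on: also cache block 5
print(count_misses([a, b] * 4))      # 8: every access misses
print(count_misses([a, a + 8] * 4))  # 2: neighbouring blocks do not conflict
```

The two conflicting addresses evict each other on every access, while two addresses in neighbouring blocks miss only once each.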
Calculating Cache Size
Whenever cache size is mentioned, it is stated as the capacity of data that the cache holds
Tag storage is considered overhead
Suppose our memory consists of $2^{14}$ locations (or words), the cache has $16 = 2^4$ blocks, and each block holds 8 words
Cache Size = # of Blocks * Block Size
There are 16 locations in the cache, so # of Blocks = 16
Each block stores 8 words
Assume 1 word is 8 bits, then Block size = 8 bytes
Cache size = 16 x 8 bytes = 128 bytes
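The same arithmetic, written out as a short Python sketch. The 7-bit tag width is not stated on this slide; it is derived from the earlier direct mapped example (14 address bits minus 4 block bits and 3 word bits) and is included only to show what is left out of the quoted size.

```python
num_blocks = 16
words_per_block = 8
bits_per_word = 8   # assumption from the notes: 1 word is 8 bits
tag_bits = 7        # derived: 14 address bits - 4 block bits - 3 word bits

data_bits = num_blocks * words_per_block * bits_per_word
tag_overhead_bits = num_blocks * tag_bits

print(data_bits // 8)      # 128 bytes: the quoted cache size
print(tag_overhead_bits)   # 112 bits of tag storage, not counted in the size
```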
Address Breakup
Why is the address broken up in a particular manner?
There is less variation in the higher order bits than in the middle order bits
If the higher order bits (i.e. the bits used for the tag) were used to determine the cache location (block), then consecutive addresses would map to the same location in the cache
The middle bits are preferred as they cause less thrashing
Address fields (most significant bits first): | Tag | Block | Word |
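To see why the middle bits are the better choice, the sketch below compares the two options for 16 consecutive main memory blocks. It assumes the 14-bit address with a 4-bit block field and 3-bit word field from the earlier example; both function names are invented for this illustration, and the high-bits variant is a hypothetical alternative, not a scheme the notes propose.

```python
WORD_BITS, BLOCK_BITS, ADDR_BITS = 3, 4, 14


def index_from_middle_bits(addr):
    """Cache block taken from the bits just above the word field."""
    return (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)


def index_from_high_bits(addr):
    """Hypothetical alternative: cache block taken from the top 4 bits."""
    return addr >> (ADDR_BITS - BLOCK_BITS)


# First word of 16 consecutive main memory blocks: addresses 0, 8, 16, ...
addresses = [block * 8 for block in range(16)]

middle = sorted({index_from_middle_bits(a) for a in addresses})
high = sorted({index_from_high_bits(a) for a in addresses})

print(len(middle))  # 16: each block gets its own cache location
print(high)         # [0]: all 16 blocks compete for one cache location
```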
Valid Cache block
How do we know whether the block in cache is valid or not?
For example:
When the processor just starts up, the cache will be empty and the tag fields in each location will be meaningless
Thus tag fields must be ignored initially when the cache is starting to fill up
For validity, another bit called the valid bit is added to each cache location to indicate whether the block contains valid information
0 = not valid, 1 = valid
All blocks at start up are not valid
When data from main memory is brought into a particular cache block, the valid bit for that block is set
The valid bit counts towards the overhead bits
Direct Mapped Cache with Valid (V) Field
[Figure: cache table with columns Block No. (0 to 15), V, Tag, and Data; V is 0 for blocks that have not been filled]
Address 3AB is referenced for the first time. The entire block is brought into cache block 5.
Hit or Miss in the Cache
Hit means that we actually found the data in the cache
A hit occurs when the valid bit = 1 and the tag in the cache matches the tag field of the address
If either condition does not hold, then we did not find the data in the cache
This is known as a miss in the cache
On a miss, the data is brought from main memory into the cache, and the valid bit is set
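The hit test can be sketched as a small Python class. This is a simplified model, assuming the 16-block direct mapped cache with 8 words per block from the earlier example; the class and method names are made up here, and the data itself is not modeled, only the valid bit and tag.

```python
WORD_BITS, BLOCK_BITS = 3, 4
NUM_BLOCKS = 1 << BLOCK_BITS


class DirectMappedCache:
    def __init__(self):
        self.valid = [0] * NUM_BLOCKS   # all blocks start out not valid
        self.tag = [0] * NUM_BLOCKS     # meaningless until valid = 1

    def access(self, addr):
        """Return 'hit' or 'miss', filling the cache block on a miss."""
        block = (addr >> WORD_BITS) & (NUM_BLOCKS - 1)
        tag = addr >> (WORD_BITS + BLOCK_BITS)
        if self.valid[block] == 1 and self.tag[block] == tag:
            return "hit"
        # Miss: the block would be fetched from main memory here
        self.valid[block] = 1
        self.tag[block] = tag
        return "miss"


cache = DirectMappedCache()
print(cache.access(0x3AB))  # miss: first reference, block 5 was not valid
print(cache.access(0x3AC))  # hit: same block, different word
print(cache.access(0x3AB))  # hit
```

Note that address 0 also misses on a cold cache even though the stored tag happens to be 0, because the valid bit is checked first.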
Mapping Scheme 2: Fully Associative Cache
Scheme 3: Address Conversion
Like direct-mapped cache, except that the middle bits of the main memory address indicate the set in the cache
K-Set Associative Cache Example
Suppose we have a main memory of $2^{14}$ locations
Map this memory to a 2-way set associative cache having 16 blocks, where each block contains 8 words
Number of Sets = Number of Blocks in cache / Blocks per set (K)
Since this is a 2-way cache, each set consists of 2 blocks, and there are 8 sets, i.e. 16/2 = 8 sets
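Carrying the example one step further, the address fields for this cache can be worked out as in the sketch below. The set and tag widths are derived here rather than quoted from the notes (8 sets need 3 bits, leaving 14 - 3 - 3 = 8 tag bits), and split_address is a name chosen for this illustration.

```python
ADDR_BITS = 14
WORD_BITS = 3      # 8 words per block
NUM_BLOCKS = 16
K = 2              # blocks per set (2-way)

num_sets = NUM_BLOCKS // K                    # 16 / 2 = 8 sets
SET_BITS = num_sets.bit_length() - 1          # 3 bits, since 8 = 2^3
TAG_BITS = ADDR_BITS - SET_BITS - WORD_BITS   # 8 bits


def split_address(addr):
    """Split a 14-bit address into its (tag, set, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_index = (addr >> WORD_BITS) & (num_sets - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_index, word


print(num_sets, SET_BITS, TAG_BITS)  # 8 3 8
print(split_address(0x3AB))          # (14, 5, 3)
```

Compared with the direct mapped split of the same address, one bit has moved from the index field to the tag, because there are half as many sets as there were blocks.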
Advantages & Disadvantages of Set Associative Cache
Advantage
Unlike direct mapped cache, there is less thrashing
If an address maps to a set, there is a choice for placing the new block and evicting an old block
Disadvantage
Tags of each block in a set need to be matched (in parallel) to figure out whether the data is present in the cache
Cost for matching is less than fully associative but more than direct mapped, i.e. k comparators
Contributes to access time
If all slots in the set are filled, then we need an algorithm that will decide which old block to evict (like fully associative)
Adds to design complexity
Replacement Algorithm/Policy
Optimal Goal
Keep blocks required in the near future
Replace the block that will not be used for the longest period of time
Least Recently Used (LRU)
Evicts the block that has been unused for the longest period of time
Disadvantage: complexity
LRU has to maintain an access history for each block, which will slow down the cache
Usually some approximation is used
E.g. Not Most Recently Used (NMRU)
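As an illustration of the access history LRU has to keep, here is a minimal software sketch of one cache set. It is not how hardware implements LRU; the LRUSet class is invented for this example and uses Python's OrderedDict to remember the order of use.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with `ways` slots, evicting the least recently used tag."""

    def __init__(self, ways):
        self.ways = ways
        self.tags = OrderedDict()   # least recently used first

    def access(self, tag):
        """Return (hit, evicted_tag); evicted_tag is None if nothing was evicted."""
        if tag in self.tags:
            self.tags.move_to_end(tag)   # record this use
            return True, None
        evicted = None
        if len(self.tags) == self.ways:
            evicted, _ = self.tags.popitem(last=False)  # least recently used
        self.tags[tag] = True
        return False, evicted


s = LRUSet(ways=2)
for tag in [1, 2, 1, 3]:
    print(tag, s.access(tag))
# The access to tag 3 evicts tag 2, because tag 1 was used more recently.
```

Every hit has to update the ordering, which is the bookkeeping cost that motivates cheaper approximations such as NMRU.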
Replacement Algorithm/Policy (contd..)
First-in, first-out (FIFO)
FIFO evicts the block that has been in the cache the longest, regardless of when it was last used
Easy to implement compared to LRU
Does not always match temporal locality
Random Replacement
It picks a block at random and replaces it with a new block
Can evict a block that will be needed often or needed soon, but it never thrashes
Difficult to implement a truly random replacement
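The FIFO behaviour described above, and the way it can ignore temporal locality, can be seen in a short sketch. FIFOSet is a hypothetical class written for this example; it models a single set and tracks tags only.

```python
from collections import deque


class FIFOSet:
    """One cache set with `ways` slots, evicting the oldest arrival."""

    def __init__(self, ways):
        self.ways = ways
        self.tags = deque()   # oldest arrival on the left

    def access(self, tag):
        """Return (hit, evicted_tag); evicted_tag is None if nothing was evicted."""
        if tag in self.tags:
            return True, None              # a hit does not change the order
        evicted = None
        if len(self.tags) == self.ways:
            evicted = self.tags.popleft()  # has been in the cache the longest
        self.tags.append(tag)
        return False, evicted


s = FIFOSet(ways=2)
results = [s.access(tag) for tag in [1, 2, 1, 3]]
print(results[-1])  # (False, 1): tag 1 is evicted even though it was just used
```

No per-access bookkeeping is needed on a hit, which is why FIFO is simpler to build, but a block that was used a moment ago can still be the one thrown out.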
What about blocks that have been written to?
While your program is running, it will modify some locations
We need to keep main memory and cache consistent if we are modifying data
Update cache and memory, both at the same time
Update cache, and then memory at a later time
The two choices are known as Cache Write Policies
Cache Write Policies
Write-Through
Update cache and main memory simultaneously on every write
Advantage
Keeps cache and main memory consistent at all times
Disadvantage
All writes require main memory access (bus transaction)
Slows down the system
This is what we were avoiding in the first place when we decided to introduce the cache
Cache Write Policies (contd..)
Write Back or Copy Back
Modified data is written back to main memory when the block is going to be evicted (removed) from cache
Advantage
Faster than write-through, since time is not spent accessing main memory on every write
Disadvantage
Need an extra bit in the cache to indicate which block has been modified
Like the valid bit, another bit called the Dirty Bit is introduced to indicate a modified cache block
0 = Not Dirty, 1 = Dirty (modified)
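The difference between the two write policies shows up in how often main memory is written. The sketch below is a deliberately simplified model, assuming repeated writes to a single cached block followed by its eviction; the function name memory_writes and the policy strings are made up for this example.

```python
def memory_writes(policy, num_writes):
    """Count main memory writes for `num_writes` writes to ONE cached block,
    followed by that block being evicted."""
    dirty = 0
    writes_to_memory = 0
    for _ in range(num_writes):
        if policy == "write-through":
            writes_to_memory += 1   # every write also goes to main memory
        elif policy == "write-back":
            dirty = 1               # only mark the block as modified
        else:
            raise ValueError(policy)
    if policy == "write-back" and dirty == 1:
        writes_to_memory += 1       # written back once, at eviction
    return writes_to_memory


print(memory_writes("write-through", 100))  # 100
print(memory_writes("write-back", 100))     # 1
```

The dirty bit is what lets write-back skip the memory update entirely when a block is evicted without ever having been modified.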
Multi-level Caches (contd..)
If the cache system uses an inclusive cache, the same data may be present at multiple levels of cache
Strictly inclusive caches guarantee that all data in a smaller cache also exists at the next higher level
Exclusive caches permit only one copy of the data
The tradeoffs in choosing one over the other involve weighing the variables of access time, memory size, and circuit complexity
Instruction and Data Caches
Separate caches can be used for instructions and for data
This is called a Harvard cache
EAT
The performance of hierarchical memory is measured by its effective access time (EAT)
EAT is a weighted average that takes into account the hit ratio and relative access times of successive levels of memory
The EAT for a two-level memory is given by:
$\text{EAT} = H \times \text{AccessTime}_{i} + (1 - H) \times \text{AccessTime}_{i+1}$
H is the hit rate, i.e. the fraction of the time data is found in level i
This equation can be extended to any number of memory levels
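The two-level formula translates directly into code. In the sketch below the function name is invented and the numbers are hypothetical: a 95% hit rate is assumed, with access times of 10 ns and 90 ns picked from the SRAM and DRAM ranges quoted earlier in these notes.

```python
def effective_access_time(hit_rate, t_level_i, t_level_next):
    """EAT for a two-level memory: weighted average of the two access times."""
    return hit_rate * t_level_i + (1 - hit_rate) * t_level_next


# Hypothetical inputs: 95% hit rate, 10 ns cache, 90 ns main memory
eat = effective_access_time(0.95, 10, 90)
print(round(eat, 2))  # 14.0 (nanoseconds)
```

Even with a main memory nine times slower than the cache, a high hit rate keeps the average access time close to the cache's own.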
Example of EAT
Review of Cache Organization
Q1: Where can a block be placed in the cache level? Mapping scheme
Q2: How is a block found if it is in the cache? Mapping scheme
Q3: If the cache is full, where do we put the new block, i.e. which old block should we replace? Block replacement policy
Q4: If we write to a block in cache, should we update the main memory at the same time? Write policy