Memory Hierarchy - Intro to Computer Architecture - Lecture Notes, Study notes of Computer Architecture and Organization

Main points covered in these Intro to Computer Architecture lecture notes: Memory Hierarchy, Reasonable Cost Per Bit, von Neumann Bottleneck, Typical System View, Cache and Main Memory, Cache Sizes of Some Processors, Main Idea of Cache, Access Time, System Bus, Addressed Memory Value.

Typology: Study notes

2012/2013

Uploaded on 05/06/2013

anurati
Memory Hierarchy
Goal: “Fast”, “unlimited” storage at a reasonable cost per bit.
Recall the von Neumann bottleneck - single, relatively slow path between the CPU and main
memory.
Cache - 1

Typical system view of the memory hierarchy

Figure 4.3 Cache and Main Memory


Main Idea of a Cache - keep a copy of frequently used information as “close” (w.r.t. access time) to the processor as possible.

[Figure: CPU, cache, and Memory connected by the System Bus]

Steps when the CPU generates a memory request:

1) Check the (faster) cache first.

2) If the addressed memory value is in the cache (called a hit), there is no need to access memory.

3) If the addressed memory value is NOT in the cache (called a miss), transfer the block of memory containing the reference to the cache. (The CPU is stalled while this occurs.)

4) The cache supplies the requested memory value to the CPU.
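The four steps above can be sketched as a toy model. This is only an illustration under stated assumptions: the names `read`, `BLOCK_SIZE`, and `stats` are hypothetical, memory is a plain list of words, and the cache is unbounded (no replacement), which a real cache is not.

```python
# Hypothetical toy model: memory is a list of words, the cache is a dict
# mapping block number -> list of the words in that block.
BLOCK_SIZE = 4  # words per block (assumed for illustration)

def read(address, cache, memory, stats):
    block_number = address // BLOCK_SIZE
    offset = address % BLOCK_SIZE
    # Step 1: check the cache first.
    if block_number in cache:
        # Step 2: hit, so no memory access is needed.
        stats["hits"] += 1
    else:
        # Step 3: miss, so transfer the whole block from memory to the cache.
        stats["misses"] += 1
        start = block_number * BLOCK_SIZE
        cache[block_number] = memory[start:start + BLOCK_SIZE]
    # Step 4: the cache supplies the requested value.
    return cache[block_number][offset]

memory = list(range(100, 132))  # 32 words of made-up data
cache = {}
stats = {"hits": 0, "misses": 0}
values = [read(a, cache, memory, stats) for a in (0, 1, 2, 3, 4, 0)]
print(values, stats)
```

Addresses 1 to 3 hit because the miss on address 0 already brought in their whole block.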

Effective Memory Access Time

Suppose that the hit time is 5 ns, the cache miss penalty is 160 ns, and the hit rate is 99%.

Effective Access Time = (hit time * hit probability) + (miss penalty * miss probability)

Effective Access Time = 5 * 0.99 + 160 * (1 - 0.99) = 4.95 + 1.6 = 6.55 ns
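The same calculation as a small function, to make it easy to try other hit rates. The function name and parameter names are made up for this sketch; the first call uses the numbers from the example above, and the extra hit rates in the loop are illustrative only.

```python
def effective_access_time(hit_time_ns, miss_penalty_ns, hit_rate):
    """Weighted average of hit time and miss penalty, as in the formula above."""
    miss_rate = 1.0 - hit_rate
    return hit_time_ns * hit_rate + miss_penalty_ns * miss_rate

# Values from the example: 5 ns hit time, 160 ns miss penalty, 99% hit rate.
eat = effective_access_time(5, 160, 0.99)
print(f"{eat:.2f} ns")

# Illustrative sweep (these extra hit rates are not from the notes).
for rate in (0.99, 0.95, 0.90):
    print(rate, round(effective_access_time(5, 160, rate), 2))
```

Even a small drop in hit rate raises the effective access time noticeably, because the miss penalty is so much larger than the hit time.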


One way to reduce the miss penalty is to not have the cache wait for the whole block to be read from memory before supplying the accessed memory word.

Figure 4.5 Cache Read Operation


Cache - Small, fast memory between the CPU and RAM/main memory.

Example:

• 32-bit address

• 512 KB (2^19 bytes) cache

• 8 bytes per block/line

• byte-addressable memory

Number of cache lines = (size of cache) / (size of line) = 2^19 / 2^3 = 2^16

Three Types of Cache:

1) Direct-mapped - a memory block maps to a single cache line

[Figure: memory blocks mapped onto cache lines; each cache line holds a tag and a block]

32-bit address: tag | line # | offset
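A sketch of how a direct-mapped cache could split an address into tag, line #, and offset for the example above (32-bit address, 512 KB cache, 8-byte lines). The field widths are derived from those parameters rather than quoted from the notes, and `split_address` is a hypothetical helper name.

```python
CACHE_SIZE = 2**19    # 512 KB cache, from the example
LINE_SIZE = 2**3      # 8 bytes per line, from the example
ADDRESS_BITS = 32

NUM_LINES = CACHE_SIZE // LINE_SIZE                 # 2^16 lines
OFFSET_BITS = LINE_SIZE.bit_length() - 1            # 3 bits select a byte in the line
LINE_BITS = NUM_LINES.bit_length() - 1              # 16 bits select the cache line
TAG_BITS = ADDRESS_BITS - LINE_BITS - OFFSET_BITS   # remaining 13 bits

def split_address(address):
    """Return (tag, line, offset) for a byte address."""
    offset = address & (LINE_SIZE - 1)
    line = (address >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = address >> (OFFSET_BITS + LINE_BITS)
    return tag, line, offset

tag, line, offset = split_address(0x12345678)
print(hex(tag), hex(line), offset)

# Two addresses one cache size apart land on the same line with different tags.
print(split_address(0x12345678 + CACHE_SIZE))
```

Addresses that differ by a multiple of the cache size compete for the same line, which is why the tag must be stored and compared.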


2) Fully-Associative Cache - a memory block can map to any cache line

[Figure: any memory block can be placed in any cache line; each cache line holds a tag and a block]

32-bit address: tag | offset

Advantage: Flexibility in what is kept in the cache

Disadvantage: Complex circuitry is needed to compare all tags in the cache with the tag in the target address

Therefore, fully-associative caches are expensive and slower, so they are used only for small caches (say 8-64 lines).

Replacement algorithms - on a miss in a full cache, we must select a block in the cache to replace

• LRU - replace the cache block that has not been used for the longest time (needs additional bits)

• Random - select a block randomly (only slightly worse than LRU)

• FIFO - select the block that has been in the cache for the longest time (slightly worse than LRU)
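To see how LRU and FIFO can differ, here is a minimal simulation of a small fully-associative cache. The function name, the 3-line cache size, and the reference string are all made up for illustration; Random is left out to keep the result deterministic.

```python
from collections import OrderedDict

def simulate(block_refs, num_lines, policy):
    """Count hits for a fully-associative cache with `num_lines` lines."""
    cache = OrderedDict()  # keys are block numbers; order encodes age
    hits = 0
    for block in block_refs:
        if block in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(block)   # mark as most recently used
            # FIFO ignores hits: order reflects load time only.
        else:
            if len(cache) == num_lines:
                cache.popitem(last=False)  # evict the entry at the front
            cache[block] = True
    return hits

refs = [1, 2, 3, 1, 4, 1, 2, 5, 1]  # hypothetical block reference string
lru_hits = simulate(refs, 3, "LRU")
fifo_hits = simulate(refs, 3, "FIFO")
print("LRU hits:", lru_hits, "FIFO hits:", fifo_hits)
```

On this particular reference string, FIFO evicts block 1 even though it was just used, while LRU keeps it. Other reference strings can give different results.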


Figure 4.16 Varying Associativity over Cache Size

[Plot: hit ratio vs. cache size (bytes) from 1k to 1M, with curves for direct, 2-way, 4-way, 8-way, and 16-way associativity]


Block/Line Size

The larger the line size:

• fewer cache lines for the same size cache

• tends to improve the hit rate, since more neighboring words are brought in when a miss occurs

• larger miss penalty since more words are read from memory when a miss occurs
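A rough sketch of the trade-off for a 512 KB cache. The line count follows directly from cache size / line size. The miss-penalty model, however, is purely hypothetical: the 100 ns latency and the 8 bytes per 10 ns transfer rate are invented numbers, not figures from the notes.

```python
CACHE_SIZE = 2**19  # 512 KB cache

def num_lines(cache_size, line_size):
    """Number of cache lines for a given line size (both in bytes)."""
    return cache_size // line_size

def miss_penalty_ns(line_size, latency_ns=100, bytes_per_transfer=8, transfer_ns=10):
    """Hypothetical model: fixed latency plus one bus transfer per 8 bytes."""
    return latency_ns + (line_size // bytes_per_transfer) * transfer_ns

for line_size in (8, 16, 32, 64):
    print(line_size, num_lines(CACHE_SIZE, line_size), miss_penalty_ns(line_size))
```

Under these assumed numbers, going from 8-byte to 64-byte lines cuts the number of lines by a factor of 8 while the miss penalty grows.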

Number of Caches:

Issues:

• Number of cache levels

[Figure: CPU, L1 cache (64KB), L2 cache (512KB), and Memory in sequence]

• unified vs. split caches

split caches - separate smaller caches for data and instructions

unified cache - data and instructions in the same cache

Advantages of each:

split caches - reduce contention for “memory” between instruction and data accesses

unified caches - balance the load between instructions and data automatically

(e.g., a tight loop might need more data blocks than instruction blocks)


Write Policy - do we keep the cache and memory copies of a block identical?

[Figure: two CPUs, each with its own cache holding a copy of X; Memory holds X = 5; an I/O device can also read and write memory]

Just reading a shared variable causes no problems - all caches have the same value

Writing can cause a “cache-coherency problem”

[Figure: CPU 0 executes X:=X+2 and CPU 1 executes X:=X+1, each on its own cached copy of X; Memory holds X = 5; an I/O device can also read and write memory]

Write Policies

write back - the CPU only changes its local cache copy until that block is replaced, then it is written back to memory (an UPDATE/DIRTY bit is associated with each cache line to indicate whether it has been modified). If we assume that CPU 0 writes the block back to memory before CPU 1, then X’s resulting value will be 6, thereby discarding the effect of “X:=X+2”.

Disadvantage(s) of writeback?

Advantage(s) of writeback?
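The lost update in the write-back example can be reproduced with a toy model. `WriteBackCache` is a hypothetical class that holds a single variable; real caches track whole blocks and many lines.

```python
class WriteBackCache:
    """Toy single-variable write-back cache (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory
        self.name = None
        self.value = None
        self.dirty = False   # the UPDATE/DIRTY bit

    def load(self, name):
        self.name = name
        self.value = self.memory[name]

    def write(self, value):
        self.value = value   # change only the local cache copy
        self.dirty = True

    def write_back(self):
        if self.dirty:       # only modified lines are written to memory
            self.memory[self.name] = self.value
            self.dirty = False

memory = {"X": 5}
cache0, cache1 = WriteBackCache(memory), WriteBackCache(memory)
cache0.load("X")
cache1.load("X")
cache0.write(cache0.value + 2)   # CPU 0: X := X + 2, so 7 in cache 0 only
cache1.write(cache1.value + 1)   # CPU 1: X := X + 1, so 6 in cache 1 only
cache0.write_back()              # memory X becomes 7
cache1.write_back()              # memory X becomes 6; CPU 0's update is lost
print(memory["X"])
```

Swapping the two `write_back` calls leaves 7 in memory instead, so the final value depends on write-back order.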

write through - on a write to a cache block, write to the main memory copy to keep it up to date

To avoid stalling the CPU on a write, a write buffer can be used so that the CPU continues execution without waiting for the write to complete.

write miss options:

write allocate - the block written to is read into the cache before updating

no-write allocate - no block is allocated in the cache and only the lower-level memory is modified
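One way to compare the two write policies is to count how many writes reach memory. The sketch below assumes a tiny fully-associative cache with FIFO replacement and write allocate; those choices, the function name, and the write sequence are all assumptions made for illustration.

```python
def count_memory_writes(writes, policy, num_lines=2):
    """Count writes that reach memory for a sequence of block writes."""
    cache = {}           # block -> dirty flag; dicts keep insertion order
    memory_writes = 0
    for block in writes:
        if block not in cache:
            if len(cache) == num_lines:
                victim = next(iter(cache))   # oldest block (FIFO)
                if cache.pop(victim):        # dirty victim is written back
                    memory_writes += 1
            cache[block] = False             # write allocate
        if policy == "write-through":
            memory_writes += 1               # every write updates memory
        else:                                # "write-back"
            cache[block] = True              # only mark the line dirty
    # Flush remaining dirty lines so both policies leave memory up to date.
    memory_writes += sum(cache.values())
    return memory_writes

writes = [0, 0, 0, 1, 1, 2, 0]  # hypothetical sequence of written blocks
through = count_memory_writes(writes, "write-through")
back = count_memory_writes(writes, "write-back")
print("write-through:", through, "write-back:", back)
```

Repeated writes to the same block cost one memory write each under write through, but are combined into a single write under write back.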


Cache Coherency Solutions

a) bus watching with write through / Snoopy caches - caches eavesdrop on the bus for other caches’ write requests. If a cache contains a block written by another cache, it takes some action, such as invalidating its cache copy.

[Figure: two CPUs, each with its own cache holding a copy of X; Memory holds X = 5; an I/O device can also read and write memory]

The MESI protocol (Modified, Exclusive, Shared, Invalid) is a common cache-coherency protocol.
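A minimal sketch of bus watching with write through and invalidation. The `Bus` and `SnoopyCache` classes are hypothetical; this is not an implementation of the full protocol, and the two updates run one after the other rather than truly in parallel.

```python
class Bus:
    """Shared bus: carries write-through writes and lets caches snoop them."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def broadcast_write(self, writer, name, value):
        self.memory[name] = value          # write through to memory
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(name)    # other caches see the write

class SnoopyCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, name):
        if name not in self.lines:         # miss: fetch from memory
            self.lines[name] = self.bus.memory[name]
        return self.lines[name]

    def write(self, name, value):
        self.lines[name] = value
        self.bus.broadcast_write(self, name, value)

    def snoop_write(self, name):
        self.lines.pop(name, None)         # invalidate the stale copy

memory = {"X": 5}
bus = Bus(memory)
cache0, cache1 = SnoopyCache(bus), SnoopyCache(bus)
cache0.read("X")
cache1.read("X")
cache0.write("X", cache0.read("X") + 2)   # CPU 0: X := X + 2
cache1.write("X", cache1.read("X") + 1)   # CPU 1 re-fetches X after invalidation
print(memory["X"], cache0.read("X"))
```

Because CPU 1's stale copy is invalidated, its read misses and fetches the updated value, so both updates take effect in this sequential run.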

b) noncachable memory - part of memory is designated as noncachable, so the memory copy is the only copy. Accesses to this part of memory always generate a “miss”.

[Figure: two CPUs with caches; Memory holds X = 5 in a region marked as the noncachable part of memory; an I/O device can also read and write memory]

c) Hardware transparency - additional hardware is added to update all caches and memory at once on a write.
