Memory Hierarchy: Understanding Different Types of Memory and Their Characteristics, Slides of Computer Fundamentals

An in-depth look at several memory technologies, including SRAM, DRAM, and hard disks, covering their capacities, latencies, costs, and access mechanisms. The document also discusses pipelining as a technique for increasing throughput and cache memory as a way to improve average access time.

Typology: Slides

2012/2013

Uploaded on 04/18/2013

palmoni

L15 – Memory Hierarchy 1
6.004 – Spring 2009 4/2/09
The Memory Hierarchy

Lab #5 due tonight

What we want in a memory

[Figure: the BETA connected to MEMORY. Signals: PC, INST, MADDR, MDATA. Memory ports: ADDR/DOUT and ADDR/DIN/DOUT.]

            Capacity        Latency   Cost
Register    100’s of bits   20 ps     $$$$
SRAM        100’s Kbytes    1 ns      $$$
DRAM        100’s Mbytes    40 ns     $
Hard disk*  100’s Gbytes    10 ms     ¢
Want        1’s Gbytes      1 ns      cheap

* non-volatile

SRAM Memory Cell

[Figure: 6-T SRAM cell between word line N and word line N+1. A static bistable storage element connects through access FETs to the bit line and its complement. Annotations: “Good, but slow 0”, “Slow and almost 1”, “Strong 1”, “Strong 0”.]

There are two bit-lines per column: one supplies the bit, the other its complement.

On a Read Cycle:
A single word line is activated (driven to “1”), and the access transistors enable the selected cells, and their complements, onto the bit lines.

Doesn’t this violate our static discipline?

Writes are similar to reads, except the bit-lines are driven with the desired value of the cell. The writing has to “overpower” the original contents of the memory cell.

Multiport SRAMs
(a.k.a. Register Files)

One can increase the number of SRAM ports by adding access transistors. By carefully sizing the inverter pair, so that one is strong and the other weak, we can assure that our WRITE bus will only fight with the weaker one, and the READs are driven by the stronger one, thus minimizing both access and write times.

[Figure: multiport SRAM cell with one write port (write, wd) and two read ports (read0/rd0, read1/rd1). Labeled sizings: PU = 2/1, PD = 4/1; PU = 2/2, PD = 2/3; 4/1; 5/1; 2/1; 2/1.]

This transistor isolates the storage node so that it won’t flip unintentionally.


1-T Dynamic RAM

Six transistors/cell may not sound like much, but they can add up quickly. What is the fewest number of transistors that can be used to store a bit?

[Figure: 1-T DRAM cell. An access FET, gated by the word line, connects the bit line to an explicit storage capacitor whose other plate is tied to V_REF.]

C in storage capacitor determined by:

$C = \frac{\varepsilon A}{d}$

  • more area (larger A)
  • better dielectric (larger ε)
  • thinner film (smaller d)

[Figure: cell cross-section showing the TiN top electrode (V_REF), Ta2O5 dielectric, W bottom electrode, poly word line, and access FET.]

Can’t we get rid of the explicit cap?
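To make the three knobs in C = εA/d concrete, here is a minimal sketch that evaluates the parallel-plate formula and shows how each change scales the capacitance. The relative permittivity, plate area, and film thickness below are hypothetical values chosen for illustration; they are not taken from the slides.

```python
# Parallel-plate estimate: C = eps_r * eps_0 * A / d.
EPS_0 = 8.854e-12  # vacuum permittivity in F/m


def plate_capacitance(eps_r, area_m2, thickness_m):
    """Return the parallel-plate capacitance in farads."""
    return eps_r * EPS_0 * area_m2 / thickness_m


# Hypothetical cell: every number here is an assumption, not slide data.
EPS_R = 25.0        # assumed relative permittivity of the dielectric
AREA = 1e-12        # assumed plate area: 1 square micrometer
THICKNESS = 10e-9   # assumed film thickness: 10 nm

base = plate_capacitance(EPS_R, AREA, THICKNESS)
print(f"baseline capacitance: {base * 1e15:.1f} fF")

# Each of the three tricks scales C linearly.
print("more area (2x A):      ", plate_capacitance(EPS_R, 2 * AREA, THICKNESS) / base)
print("better dielectric (2x):", plate_capacitance(2 * EPS_R, AREA, THICKNESS) / base)
print("thinner film (d / 2):  ", plate_capacitance(EPS_R, AREA, THICKNESS / 2) / base)
```

Under these assumptions, each change roughly doubles the stored charge for a given voltage, which is why all three appear on the slide as ways to build a usable storage capacitor in a small cell.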

Tricks for increasing throughput

but, alas, not latency

[Figure: memory array of one-bit cells with 2^N rows (Row 1, Row 2, ..., Row 2^N) and 2^M columns (Col. 1, Col. 2, Col. 3, ..., Col. 2^M). A Row Address Decoder takes N address bits and drives the word lines; the bit lines feed a Column Multiplexer/Shifter that takes M address bits and produces the output D. Multiplexed Address (row first, then column). Timing labels: t1, t2, t3, t4.]

The first thing that should pop into your mind when asked to speed up throughput…

PIPELINING

Synchronous DRAM (SDRAM)

Double-clocked Synchronous DRAM (DDRAM)

[Figure: timing diagram of Clock and Data out.]
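The multiplexed address (row first, then column) can be sketched as a simple bit split. This is an illustration only: the function name `split_address` is hypothetical, and it assumes the N row bits are the high-order bits of the cell address, which the slide does not state.

```python
def split_address(addr, n_row_bits, m_col_bits):
    """Split an (N+M)-bit cell address into (row, column).

    Assumption: the row number occupies the high-order N bits and the
    column number the low-order M bits.
    """
    if not 0 <= addr < (1 << (n_row_bits + m_col_bits)):
        raise ValueError("address does not fit in N+M bits")
    row = addr >> m_col_bits                 # sent first, to the row decoder
    col = addr & ((1 << m_col_bits) - 1)     # sent second, to the column mux
    return row, col


# A tiny 2^3-row by 2^2-column array.
row, col = split_address(0b10110, n_row_bits=3, m_col_bits=2)
print(f"row={row}, col={col}")
```

Because a whole row is selected before the column is chosen, consecutive accesses that share a row number only need a new column address, which is one reason the array organization lends itself to pipelined, burst-style transfers.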

Hard Disk Drives

Typical high-end drive:
  • Average latency = 4 ms
  • Average seek time = 9 ms
  • Transfer rate = 20M bytes/sec
  • Capacity = 100-500G byte
  • Cost = ~$1/Gbyte
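As a rough illustration of what these figures imply, the sketch below estimates the time for one random disk access. It assumes the total is simply seek time plus rotational latency plus transfer time, and it reads "20M bytes/sec" as 20e6 bytes per second; both are simplifying assumptions, and the function names are hypothetical.

```python
AVG_SEEK_S = 9e-3              # average seek time from the slide
AVG_LATENCY_S = 4e-3           # average (rotational) latency from the slide
TRANSFER_BYTES_PER_S = 20e6    # "20M bytes/sec", read here as 20e6


def access_time_s(n_bytes):
    """Estimated time to position the head and then transfer n_bytes."""
    return AVG_SEEK_S + AVG_LATENCY_S + n_bytes / TRANSFER_BYTES_PER_S


def effective_rate(n_bytes):
    """Bytes per second actually delivered for one random access."""
    return n_bytes / access_time_s(n_bytes)


for size in (4096, 1_000_000):
    t = access_time_s(size)
    print(f"{size:>9} bytes: {t * 1e3:.2f} ms, {effective_rate(size) / 1e6:.2f} MB/s")
```

Under this model a small read is dominated by the 13 ms of positioning, so the delivered rate is far below the raw transfer rate; large sequential transfers come much closer to it.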

Quantity vs Quality…

Your memory system can be
  • BIG and SLOW... or
  • SMALL and FAST.

[Figure: plot with axes Access Time and $/MB, showing TAPE, DISK, DRAM, and SRAM.]

We’ve explored a range of circuit-design trade-offs.

Is there an ARCHITECTURAL solution to this DILEMMA?

[Figure: hard disk drive geometry, showing the shaft, tracks, sectors, a cylinder, and zoned-bit recording. Figure by MIT OpenCourseWare.]

The Cache Idea:
Program-Transparent Memory Hierarchy

Cache contains TEMPORARY COPIES of selected main memory locations... e.g. Mem[100] = 37

GOALS:
  1) Improve the average access time
  2) Transparency (compatibility, programming ease)

[Figure: CPU, "CACHE", and DYNAMIC RAM "MAIN MEMORY" in series. The cache holds the copy 100: 37; a fraction (1.0-α) of references continues to main memory.]

HIT RATIO (α): Fraction of refs found in CACHE.

MISS RATIO (1-α): Remaining references.

Challenge: make the hit ratio as high as possible.

$t_{ave} = \alpha t_c + (1-\alpha)(t_c + t_m) = t_c + (1-\alpha)\,t_m$

where t_c is the cache access time and t_m is the main memory access time.
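The average access time relation can be checked numerically. The sketch below is a minimal illustration with hypothetical function names; it evaluates both the expanded form and the simplified form t_c + (1-α)·t_m, using the 4 ns and 40 ns figures that appear in the example on the next slide.

```python
def average_access_time(hit_ratio, t_cache, t_main):
    """Simplified form: t_ave = t_c + (1 - alpha) * t_m."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit ratio must be between 0 and 1")
    return t_cache + (1.0 - hit_ratio) * t_main


def average_access_time_expanded(hit_ratio, t_cache, t_main):
    """Expanded form: hits cost t_c, misses cost t_c + t_m."""
    return hit_ratio * t_cache + (1.0 - hit_ratio) * (t_cache + t_main)


# Times in nanoseconds; sweep a few hit ratios to see the trend.
for alpha in (0.5, 0.9, 0.99):
    t = average_access_time(alpha, t_cache=4.0, t_main=40.0)
    print(f"alpha = {alpha:.2f}: t_ave = {t:.2f} ns")
```

The sweep shows why the hit ratio matters so much: every miss pays the full main memory time on top of the cache lookup.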

How High of a Hit Ratio?

Suppose we can easily build an on-chip static memory with a 4 ns access time, but the fastest dynamic memories that we can buy for main memory have an average access time of 40 ns. How high of a hit rate do we need to sustain an average access time of 5 ns?

$\alpha = 1 - \frac{t_{ave} - t_c}{t_m} = 1 - \frac{5 - 4}{40} = 97.5\%$
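The same arithmetic can be written as a small helper that solves for the hit ratio. This is a sketch with a hypothetical function name, using the numbers from the slide (times in nanoseconds).

```python
def required_hit_ratio(t_ave, t_cache, t_main):
    """Solve t_ave = t_c + (1 - alpha) * t_m for alpha."""
    if t_ave < t_cache:
        raise ValueError("average time cannot be below the cache access time")
    return 1.0 - (t_ave - t_cache) / t_main


alpha = required_hit_ratio(t_ave=5.0, t_cache=4.0, t_main=40.0)
print(f"required hit ratio: {alpha:.1%}")
```

Note how little slack there is: the target average of 5 ns leaves only 1 ns of budget for misses, so no more than 1 reference in 40 may go to main memory.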

The Cache Principle

Find “Bitdiddle, Ben”

5-Minute Access Time:

5-Second Access Time:

ALGORITHM: Look nearby for the requested information first; if it’s not there, check secondary storage.

Basic Cache Algorithm

ON REFERENCE TO Mem[X]: Look for X among cache tags...

HIT: X = TAG(i), for some cache line i
  • READ: return DATA(i)
  • WRITE: change DATA(i); Start Write to Mem[X]

MISS: X not found in TAG of any cache line
  • REPLACEMENT SELECTION: Select some line k to hold Mem[X] (Allocation)
  • READ: Read Mem[X]; Set TAG(k)=X, DATA(k)=Mem[X]
  • WRITE: Start Write to Mem[X]; Set TAG(k)=X, DATA(k)= new Mem[X]

[Figure: CPU connected to a cache of Tag/Data lines (A: Mem[A], B: Mem[B]), backed by MAIN MEMORY.]

QUESTION: How do we “search” the cache?

Figure by MIT OpenCourseWare
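The hit and miss cases of the basic cache algorithm can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `SimpleCache` is hypothetical, a Python dict stands in for the tag search (the question the slide leaves open), writes go to main memory immediately, and the replacement choice is first-in-first-out, which the slide does not specify.

```python
class SimpleCache:
    """Sketch of the basic cache algorithm; self.lines maps TAG -> DATA."""

    def __init__(self, memory, num_lines):
        self.memory = memory        # backing "main memory": dict of address -> value
        self.num_lines = num_lines  # how many lines the cache can hold
        self.lines = {}             # dicts keep insertion order, used for FIFO
        self.hits = 0
        self.misses = 0

    def _allocate(self, x):
        # REPLACEMENT SELECTION: if the cache is full, evict the oldest line
        # (FIFO is an assumption made for this sketch).
        if len(self.lines) >= self.num_lines:
            oldest = next(iter(self.lines))
            del self.lines[oldest]

    def read(self, x):
        if x in self.lines:               # HIT: X = TAG(i) for some line i
            self.hits += 1
            return self.lines[x]
        self.misses += 1                  # MISS: read Mem[X], then fill a line
        self._allocate(x)
        self.lines[x] = self.memory[x]
        return self.lines[x]

    def write(self, x, value):
        if x in self.lines:               # HIT: change DATA(i)
            self.hits += 1
        else:                             # MISS: pick a line k for Mem[X]
            self.misses += 1
            self._allocate(x)
        self.lines[x] = value
        self.memory[x] = value            # "Start Write to Mem[X]", done at once here


memory = {100: 37, 104: 5, 108: 9}
cache = SimpleCache(memory, num_lines=2)
first = cache.read(100)    # miss: fetched from main memory
second = cache.read(100)   # hit: served from the cache
cache.write(104, 6)        # miss: allocated, and memory updated
third = cache.read(108)    # miss: cache is full, so address 100 is evicted
print(first, second, third, cache.hits, cache.misses)
```

The program never sees which case occurred; only the access time differs, which is the transparency goal stated earlier in the deck.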


Associativity: Parallel Lookup

Find “Bitdiddle, Ben”

Nope, “Smith”

Nope, “Jones”

Nope, “Bitwit”

HERE IT IS!

Fully-Associative Cache

[Figure: three cache lines, each holding a TAG and Data. The Incoming Address is compared (=?) against every TAG; the comparison results produce HIT and select Data Out.]

The extreme in associativity:
  • All comparisons made in parallel
  • Any data item could be located in any cache location

Direct-Mapped Cache
(non-associative)

Find “Bitdiddle, Ben”

[Figure: lookup table with entries labeled Y, Z, A, B.]

NO Parallelism:

Look in JUST ONE place, determined by parameters of incoming request (address bits)

... can use ordinary RAM as table

Need: Address Mapping Function!

Maps incoming BIG address to small CACHE address… tells us which single cache location to use

Direct Mapped: just use a subset of incoming address bits!

Collision when several addresses map to same cache line.
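A direct-mapped lookup can be sketched with two ordinary arrays and a bit mask. This is a minimal illustration, not the lecture's design: the class name is hypothetical, addresses are treated as word addresses, the low-order bits are assumed to form the cache index, and the remaining high-order bits are stored as the tag.

```python
class DirectMappedCache:
    """Each address maps to exactly one line, chosen by its low-order bits."""

    def __init__(self, memory, index_bits):
        self.memory = memory                    # dict of address -> value
        self.index_bits = index_bits
        self.tags = [None] * (1 << index_bits)  # ordinary RAM used as the table
        self.data = [None] * (1 << index_bits)

    def _split(self, addr):
        # Address mapping function: a subset of the incoming address bits.
        index = addr & ((1 << self.index_bits) - 1)
        tag = addr >> self.index_bits
        return index, tag

    def read(self, addr):
        """Return (value, hit) for a read of the given address."""
        index, tag = self._split(addr)
        if self.tags[index] == tag:             # look in JUST ONE place
            return self.data[index], True
        self.tags[index] = tag                  # miss: replace whatever was there
        self.data[index] = self.memory[addr]
        return self.data[index], False


memory = {a: a * 10 for a in range(16)}
cache = DirectMappedCache(memory, index_bits=2)
results = [cache.read(a) for a in (1, 1, 5, 1)]
print(results)
```

Addresses 1 and 5 share index 1 in this four-line cache, so reading 5 evicts 1 and the final read of 1 misses again: a collision, even though three other lines sit empty.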

The Problem with Collisions

Find “Bitwit”

Find “Bituminous”

Find “Bitdiddle”

Nope, I’ve got “BITWIT” under “B”

PROBLEM: Contention among B’s.... each competes for same cache line!
  • CAN’T cache both “Bitdiddle” & “Bitwit”

... Suppose B’s tend to come at once?

[Figure: lookup table with entries labeled Y, Z, A, B.]

Figure by MIT OpenCourseWare

BETTER IDEA: File by LAST letter!
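Filing by the last letter corresponds to choosing the cache line from low-order address bits instead of high-order ones. The sketch below compares the two choices on a hypothetical access pattern (a loop over four neighboring addresses). The 8-bit address width, the four-line cache, and all function names are assumptions made for illustration.

```python
ADDR_BITS = 8    # assumed address width
INDEX_BITS = 2   # assumed cache size: 2^2 = 4 lines


def high_bits(addr):
    """Index by the FIRST letter: the top INDEX_BITS of the address."""
    return addr >> (ADDR_BITS - INDEX_BITS)


def low_bits(addr):
    """Index by the LAST letter: the bottom INDEX_BITS of the address."""
    return addr & ((1 << INDEX_BITS) - 1)


def count_misses(addresses, index_of):
    """Run a direct-mapped cache holding one address per line; count misses."""
    lines = [None] * (1 << INDEX_BITS)
    misses = 0
    for addr in addresses:
        i = index_of(addr)
        if lines[i] != addr:
            misses += 1
            lines[i] = addr
    return misses


# Neighboring addresses tend to "come at once", like the B's in the analogy.
trace = [0, 1, 2, 3] * 10
print("index by high bits:", count_misses(trace, high_bits), "misses")
print("index by low bits: ", count_misses(trace, low_bits), "misses")
```

With high-order indexing all four neighbors compete for the same line and every access misses; with low-order indexing each neighbor gets its own line and only the first pass misses.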