Memory Organization - Computer Organization II | ECE 366, Study notes of Computer Architecture and Organization

Material Type: Notes; Class: Computer Organization II; Subject: Electrical and Computer Engr; University: University of Illinois - Chicago; Term: Unknown 1989;


Uploaded on 07/23/2009

EECS 366: Computer Architecture
Instructor: Shantanu Dutt
Department of EECS
University of Illinois at Chicago
Lecture Notes # 16
Memory Organization
© Shantanu Dutt, UIC

Memory Hierarchy Design

Many programs need large amounts of memory, as the size of the problems they solve increases. To solve the problem quickly, fast access is needed to all this data.

One solution is, of course, to build very large fast memory units capable of storing 1000s of MBytes. As we saw, fast memory (static memory, for example) consumes too much VLSI area and power, so that large memory of this kind is impractical to realize.

Furthermore, even if it becomes feasible to build large amounts of fast memory, it is well known that access to this memory gets slower as it gets larger.

Fortunately, there is a way out! Because of the locality property of most programs, it is not necessary to have large amounts of fast memory for quick access to large amounts of data:

(1) Temporal Locality: An item just referenced will be referenced again soon.

(2) Spatial Locality: When an item is referenced, nearby items in memory will also be referenced soon.
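The effect of the two locality properties can be illustrated with a small simulation. The following is a minimal sketch and not part of the original notes: it assumes a hypothetical fully associative upper level with LRU replacement, and `hit_rate`, `temporal`, and `spatial` are illustrative names. It measures the fraction of references that are found in the small upper level.

```python
from collections import OrderedDict

def hit_rate(trace, block_size, num_frames):
    """Fraction of word references in `trace` that hit in a small upper level.

    Assumption: the upper level is fully associative with LRU replacement.
    """
    resident = OrderedDict()              # block number -> None, oldest first
    hits = 0
    for addr in trace:
        block = addr // block_size
        if block in resident:
            hits += 1
            resident.move_to_end(block)   # mark as most recently used
        else:
            if len(resident) >= num_frames:
                resident.popitem(last=False)   # evict least recently used
            resident[block] = None
    return hits / len(trace)

# Temporal locality: the same 3 words are referenced again and again.
temporal = [0, 1, 2] * 10
# Spatial locality: one sweep over 64 consecutive words, 4 words per block.
spatial = list(range(64))

print(hit_rate(temporal, block_size=1, num_frames=4))   # 27 of 30 references hit
print(hit_rate(spatial, block_size=4, num_frames=4))    # 48 of 64 references hit
```

In both traces only 4 block frames are needed to serve most references, even though the sweep touches 64 distinct words.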


Memory Hierarchy Design (contd.)

In principle, there can be multiple levels in the memory hierarchy.

[Figure: The Memory Hierarchy. Upper levels are faster and more expensive; lower levels are slower and less expensive.]


Memory Hierarchy Design (contd.)

An upper level generally holds a subset of the data contained in the next lower level, and its contents also belong to the overall memory address space.

An exception is the register level, all of whose data may not be contained in the cache at all times. Also, the register file is not part of the memory address space: registers are addressed by a different address that pertains to the register file only, and data transfers between the register file and the slower levels are handled explicitly by the program using LOADs and STOREs.

The rest of the levels share a common memory address space, and data transfers between them are "automatic" and transparent to the program; they are handled either by hardware (cache–main mem. hierarchy) or by the operating system (main mem.–secondary storage hierarchy).


General Definitions and Principles of Memory Hierarchy (contd.)

Consider any 2 adjacent levels in the memory hierarchy: 

Miss penalty: Time to replace a block in the upper level by a needed block that is not in that level. Since there can be hits or misses at lower levels while obtaining the required block, the miss penalty $t_{mp}(1)$ for the upper-most level (level 1) is given by:

$$t_{mp}(1) = t_{br}(2,1) + m_2\,t_{br}(3,2) + m_2 m_3\,t_{br}(4,3) + \cdots$$

where $m_i$ is the miss rate in level $i$, and $t_{br}(i+1,i)$ is the block replacement time from level $i+1$ to level $i$.

The average memory access time for the CPU is given by

$$t_{avg} = t_{hit}(1) + m_1\,t_{mp}(1)$$

where $t_{hit}(1)$ is the hit time of level 1.

The block replacement time is

$$t_{br}(i+1,i) = t_a(i+1) + (b_i - 1)\,t_w(i+1)$$

that is, the access time $t_a(i+1)$ (time to access the 1st word of the block in the lower level) plus the transfer time (time to access the remaining words), where $b_i$ is the block size in the upper level $i$ and $t_w(i+1)$ is the transfer time per word from level $i+1$.

For example, for main memory (MM) there is an initial time required to search for the block/page location, and, as we saw, refreshing also affects the average time to access MM. Both contribute to the initial access time to MM. However, the entire row is stored in the row register after spending the time to access the first word, and the required block is part of this row. Thus the rest of the words in the block can be sent in a shorter time per word, which is the per-word transfer time that enters the block replacement time.
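The definitions above can be turned into a short calculation. The following is a minimal sketch under stated assumptions: the function names are illustrative, the 4-word cache block and 2K-word page match the sizes mentioned in these notes, and every time and miss rate is a hypothetical value chosen only for the arithmetic. The miss penalty is computed from the bottom level upward: the penalty seen by a level is the block replacement time from the level below, plus that lower level's miss rate times its own penalty.

```python
def block_replacement_time(first_word_access, block_words, time_per_word):
    """Access time for the 1st word plus transfer time for the remaining words."""
    return first_word_access + (block_words - 1) * time_per_word

def miss_penalty(lower_levels):
    """Miss penalty of level 1.

    lower_levels: list of (replacement_time_into_level_above, miss_rate)
    for levels 2, 3, ... from top to bottom. The last level is assumed
    never to miss, so its miss rate should be 0.
    """
    penalty = 0.0
    for replacement_time, miss_rate in reversed(lower_levels):
        penalty = replacement_time + miss_rate * penalty
    return penalty

def average_access_time(hit_time, miss_rate, penalty):
    """Hit time of level 1 plus its miss rate times its miss penalty."""
    return hit_time + miss_rate * penalty

# Hypothetical numbers, in clock cycles (cc's).
mm_to_cache = block_replacement_time(10, 4, 2)            # 10 + 3*2 = 16
disk_to_mm = block_replacement_time(100_000, 2048, 1)     # 100000 + 2047 = 102047
penalty = miss_penalty([(mm_to_cache, 0.001), (disk_to_mm, 0.0)])
print(penalty)                                  # 16 + 0.001 * 102047
print(average_access_time(1, 0.05, penalty))    # 1 + 0.05 * penalty
```

Even with a main-memory miss rate of 0.1%, the disk term dominates the penalty in this sketch, which is why the miss rates at the lower levels matter.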

Example: There are 3 levels in the memory hierarchy: cache, MM, and secondary storage. The cache block size is 4 words and the MM page size is 2K words.

[The remaining parameter values (times in cc's and miss rates) and the computed average time taken by the CPU to access a word are missing from the source.]


General Definitions and Principles of Memory Hierarchy (contd.)

Effect of Block Size:

The larger the block size, the better the anticipation of nearby items to be referenced soon (spatial locality).

However, beyond a certain block size, the concept of spatial locality is stretched. Note that while a program may access almost all items in a small or medium-size block, it later accesses a random next block, not necessarily the one following the current one; spatial locality is punctuated by random accesses (for example, due to branches).

Thus for large block sizes, a block will contain many useless data items that the program might not access in the near future. Since the space on the upper level is limited, the larger the block size, the smaller the # of blocks. Hence the miss rate increases when the next random block is accessed by the program.


Effect of Block Size (contd.)

[Figure: Effect of block size on the miss pattern of a program that alternately works on A and on C.

(a) Program Structure: four adjacent 16-word regions A, B, C, D.

(b) Miss pattern with block size = 32 words: Initial A access, miss, A & B loaded; work on A. Next access is C, miss, C & D loaded; work on C. Next access is A, miss, A & B loaded; work on A. 2 misses per iteration.

(c) Miss pattern with block size = 16 words: Initial A access, miss, A loaded (other block empty); work on A. Next access is C, miss, C loaded; work on C. Next access is A, hit; work on A. Next access is C, hit. 0 misses per iteration.]
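The miss patterns in the figure can be reproduced with a short simulation. This is a minimal sketch under assumptions not stated in the notes: the upper level holds 32 words in total, is fully associative with LRU replacement, and the regions A, B, C, D sit at word addresses 0, 16, 32, 48. The function name `misses_per_iteration` is illustrative.

```python
from collections import OrderedDict

def misses_per_iteration(block_size, capacity_words=32, iterations=4):
    """Count misses in each loop iteration of a program that works on
    region A (words 0..15) and then on region C (words 32..47)."""
    num_frames = capacity_words // block_size
    resident = OrderedDict()                  # block number -> None, oldest first
    trace = list(range(0, 16)) + list(range(32, 48))
    counts = []
    for _ in range(iterations):
        misses = 0
        for addr in trace:
            block = addr // block_size
            if block in resident:
                resident.move_to_end(block)   # mark as most recently used
            else:
                misses += 1
                if len(resident) >= num_frames:
                    resident.popitem(last=False)   # evict least recently used
                resident[block] = None
        counts.append(misses)
    return counts

print(misses_per_iteration(32))   # one 32-word frame: A and C keep evicting each other
print(misses_per_iteration(16))   # two 16-word frames: misses only in the first iteration
```

With 32-word blocks, half of every loaded block (B or D) is never used, and the single frame forces 2 misses in every iteration; with 16-word blocks both A and C stay resident after the first pass.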


General Definitions and Principles of Memory Hierarchy (contd.)

What the CPU does on a miss in the upper level:

(1) If the miss penalty is a few 10s of clock cycles (cc's), then the CPU waits (ex., cache miss).

(2) If the miss penalty is 100s to 1000s of cc's (as in a main-memory miss or page fault), the CPU is interrupted on a miss, and another process starts executing. When the requested block is brought in, this is noted in the previous process's status, so that it can start re-executing at a later stage (when the current process is done or it also has a miss).

Block transfer mechanism:

(1) Done in hardware for a penalty of a few 10s of cc's (cache).

(2) Done in software (the O.S. could do this) for a main-mem. miss: the O.S. sets up the appropriate disk interface for a DMA and leaves the CPU; the CPU executes another process, while the transfer from disk to main-mem. takes place simultaneously.


Some Basic Issues in Memory Hierarchies

Again we consider 2 adjacent levels of the hierarchy:

  1. Block Placement: Where can a block be placed in the upper level?
  2. Block Identification: How is a block found in the upper level?
  3. Block Replacement: Which block to replace during a miss?
  4. Write Strategy: What happens on a write to the upper level, and how is this percolated to the lower level?


Some Basic Issues in Memory Hierarchies

(1) Block Placement (contd.):

FA and DM are special cases of set-associative. In FA, there is only one set containing all the blocks of the upper level. In DM, there are as many sets as there are blocks, each containing exactly 1 block.

FA has the most flexibility in placing a block, while DM has the least.
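Treating FA and DM as special cases of set-associative placement can be sketched in a few lines. This is an illustration only: `candidate_frames` is a hypothetical helper, the set is chosen as block # mod (number of sets), and the assumption that the frames of a set are numbered contiguously is made just for display. The 8-frame upper level and block 14 are example values.

```python
def candidate_frames(block_number, num_frames, ways):
    """Frames of the upper level in which `block_number` may be placed.

    ways = 1          -> direct mapped (DM)
    ways = num_frames -> fully associative (FA)
    anything between  -> set associative (SA)
    Assumption: frames of one set are numbered contiguously.
    """
    num_sets = num_frames // ways
    set_index = block_number % num_sets
    first = set_index * ways
    return list(range(first, first + ways))

print(candidate_frames(14, num_frames=8, ways=1))   # DM: exactly one frame
print(candidate_frames(14, num_frames=8, ways=2))   # 2-way SA: the two frames of set 2
print(candidate_frames(14, num_frames=8, ways=8))   # FA: any frame
```

The number of candidate frames equals the associativity, which is the sense in which FA is the most flexible and DM the least.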


Some Basic Issues in Memory Hierarchies (contd.)

(2) Block Identification:

Associative or content-addressable memory (CAM): stores the block # or tags of resident blocks for each set.

The index, which is the $\log_2 s$ rightmost bits of the block # (for $s$ sets), determines which set of the CAM to search for the rest of the block # (the tag). This is generally used in the cache–main-mem. hierarchy.

[Figure (a): Block identification in different cache types, for block 14 in an 8-block cache. Direct mapped (DM): search only in tag position 14 mod 8 = 6. 2-way set associative (SA), sets 0 to 3: search everywhere within set 14 mod 4 = 2. Fully associative (FA): search everywhere. Search is performed in parallel in FA and SA caches for speed.]

[Figure (b): Different portions of an address: the block # (tag followed by index) and the block offset/word #. The index (block # mod s) is used to select the set (in DM and SA), the tag is used to check all blocks in the "indexed" set, and the word # is used to select the word in the block.]
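The split of an address into tag, index, and word # can be written out directly. The following is a minimal sketch: `split_address` is an illustrative name, and the block size of 16 words is an assumption chosen for the example; the block 14 cases correspond to the DM, 2-way SA, and FA searches in an 8-block cache.

```python
def split_address(address, words_per_block, num_sets):
    """Return (tag, index, word) for a word address.

    word  : position of the word inside its block
    index : block # mod num_sets, selects the set to search
    tag   : remaining high-order part of the block #, compared in the set
    """
    word = address % words_per_block
    block = address // words_per_block
    index = block % num_sets
    tag = block // num_sets
    return tag, index, word

# Word 3 of block 14, assuming 16 words per block.
address = 14 * 16 + 3
print(split_address(address, 16, num_sets=8))   # DM with 8 blocks: index 6
print(split_address(address, 16, num_sets=4))   # 2-way SA with 8 blocks: index 2
print(split_address(address, 16, num_sets=1))   # FA: one set, the tag is the whole block #
```

When the number of sets and the block size are powers of two, the modulo and division reduce to selecting bit fields, which is why hardware simply wires the low-order bits of the block # to the set decoder.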


Some Basic Issues in Memory Hierarchies (contd.)

CAMs:

Hardware complexity: The parallel search logic has complexity $\Theta(m \cdot 2^r)$ for a FA cache, where $2^r$ is the size of the cache in blocks, and $m$ is the # of bits in the block #. This can be prohibitive for large $m$ and $r$.

For a SA cache, we have one such CAM of size $\Theta((m-l)\,2^{r-l})$ for each of the $2^l$ sets. So the total CAM size is $\Theta((m-l)\,2^r)$. However, there is only one parallel search logic of size $\Theta((m-l)\,2^{r-l})$, which is used to search only the indexed set.

[Figure: Organization of a SA cache with cache size = 2**r = 1024 blocks, # of sets = 2**l = 32, and set size = 2**(r-l) = 32 blocks (r = 10, l = 5). Address bits 23 to 9 are the tag (m-l = 15 bits), bits 8 to 4 are the index (5 bits), and bits 3 to 0 are the word #; tag and index together form the 20-bit block #. The index drives l-to-2**l (5-to-32) decoders that select the set in the tag store and in the data store; the search logic compares the 15-bit tag against the 32 tags of that set, and a 2**(r-l)-to-1 (32-to-1) mux selects the matching 512-bit data block.]

There is only one equality comparator in a DM cache; thus its complexity is $\Theta(m-r)$.

Time complexity of search: $\Theta(\log m)$ for FA, $\Theta(\log(m-l))$ for SA, and $\Theta(\log(m-r))$ for DM.
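To make the comparison concrete, the sizes can be evaluated for the three organizations. This is a rough sketch under stated assumptions: the cache has 2**r blocks and 2**l sets, the block # has m bits, and hardware cost is approximated by counting tag bits that must be compared or stored, which is only a proxy for the actual logic. The function name is illustrative; r = 10 and m = 20 are example values, with l = 0 for FA, l = 5 for SA, and l = r for DM.

```python
def tag_and_search_sizes(m, r, l):
    """Tag width and bit counts for a cache with 2**r blocks and 2**l sets,
    where the block # has m bits (l = 0 is FA, l = r is DM)."""
    tag_bits = m - l                    # the index uses l bits of the block #
    blocks_per_set = 2 ** (r - l)
    return {
        "tag_bits": tag_bits,
        "blocks_per_set": blocks_per_set,
        # Tags compared in parallel within the one indexed set.
        "search_logic_bits": tag_bits * blocks_per_set,
        # Tags stored for the whole cache.
        "tag_store_bits": tag_bits * 2 ** r,
    }

for name, l in [("FA", 0), ("SA", 5), ("DM", 10)]:
    print(name, tag_and_search_sizes(m=20, r=10, l=l))
```

For these values the parallel search logic shrinks from 20480 compared bits (FA) to 480 (SA with 32 sets) to a single 10-bit comparator (DM), while the stored tags shrink far less.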
