





















These are the lecture slides of Computer Systems, which include Writing to Cache, Memory Access, Simple Direct-Mapped Cache, Inconsistent Memory, Write-Through Caches, Write-Back Caches, Finishing Write Back, Write Misses, etc. Key points are: Memory Hierarchies and Caches, Memory Systems, Cache Introduction, Introducing Caches, Principle of Locality, Spatial Locality, Temporal Locality, Locality in Program, Kinds of Caches.
Typology: Slides






















We’ve already seen how to make a fast processor. How can we supply the CPU with enough data to keep it busy?
Part of CS410 focuses on memory and input/output issues, which are frequently bottlenecks that limit the performance of a system.
We’ll start off by looking at memory systems and then turn to I/O.
– How caches can dramatically improve the speed of memory accesses.
– How virtual memory provides security and ease of programming.
[Figure: Processor, Memory, and Input/Output, connected.]
Unfortunately there is a tradeoff between speed, cost and capacity.
Storage        Speed      Cost        Capacity
Static RAM     Fastest    Expensive   Smallest
Dynamic RAM    Slow       Cheap       Large
Hard disks     Slowest    Cheapest    Largest
Fast memory is too expensive for most people to buy a lot of.
But dynamic memory has a much longer delay than other functional units in a datapath. If every lw or sw accessed dynamic memory, we’d have to either increase the cycle time or stall frequently.
Here are rough estimates of some current storage parameters.*
Storage        Delay                Cost/MB     Capacity
Static RAM     1-10 cycles          ~$          128KB-2MB
Dynamic RAM    100-200 cycles       ~$0.        128MB-4GB
Hard disks     10,000,000 cycles    ~$0.0005    20GB-400GB
*These numbers are a couple of years old now, but the ratios are still about right. More recent numbers are in Sec. 5.1 of the book.
The principle of temporal locality says that if a program accesses one memory address, there is a good chance that it will access the same address again.
Loops are excellent examples of temporal locality in programs.
– The loop body will be executed many times.
– The computer will need to access those same few locations of the instruction memory repeatedly.
For example:
Loop: lw   $t0, 0($s1)
      add  $t0, $t0, $s
      sw   $t0, 0($s1)
      addi $s1, $s1, -
      bne  $s1, $0, Loop
Each instruction is fetched again on every loop iteration.
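As a rough illustration of temporal locality in instruction fetches, the sketch below replays the fetch addresses of a five-instruction loop like the one above. The base address 0x400000 and the iteration count of 100 are made-up values for illustration; the only assumption taken from MIPS is that each instruction occupies 4 bytes.

```python
from collections import Counter

def fetch_trace(loop_addresses, iterations):
    """Return the instruction addresses fetched when the loop body runs `iterations` times."""
    trace = []
    for _ in range(iterations):
        trace.extend(loop_addresses)
    return trace

# Hypothetical byte addresses for the five loop instructions
# (lw, add, sw, addi, bne), 4 bytes each, from an arbitrary base.
LOOP_ADDRESSES = [0x400000 + 4 * n for n in range(5)]

trace = fetch_trace(LOOP_ADDRESSES, iterations=100)
counts = Counter(trace)

print("total fetches:      ", len(trace))
print("distinct addresses: ", len(counts))
print("fetches per address:", sorted(set(counts.values())))
```

Under these assumptions, 500 fetches touch only 5 distinct addresses, which is why keeping those few locations in fast memory pays off.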
The principle of spatial locality says that if a program accesses one memory address, there is a good chance that it will also access other nearby addresses.
      sub  $sp, $sp, 16
      sw   $ra, 0($sp)
      sw   $s0, 4($sp)
      sw   $a0, 8($sp)
      sw   $a1, 12($sp)
Nearly every program exhibits spatial locality, because instructions are usually executed in sequence: if we execute an instruction at memory location i, then we will probably also execute the next instruction, at the following memory location.
Code fragments such as loops exhibit both temporal and spatial locality.
sum = 0;
for (i = 0; i < MAX; i++)
    sum = sum + a[i];

employee.name = "Homer Simpson";
employee.boss = "Mr. Burns";
employee.age = 45;
When the CPU reads location i from main memory, a copy of that data is placed in the cache.
But instead of just copying the contents of location i, we can copy several values into the cache at once, such as the four bytes from locations i through i + 3.
– If the CPU later needs locations i + 1, i + 2 or i + 3, it can read that data from the cache and not the slower main memory.
– Instead of reading just one array element at a time, the cache might actually be loading four array elements at once.
Again, the initial load incurs a performance penalty, but we’re gambling on spatial locality and the chance that the CPU will need the extra data.
[Figure: CPU, a little static RAM (cache), lots of dynamic RAM.]
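To make the gamble on spatial locality concrete, here is a minimal sketch that counts misses for a sequential scan when each miss loads four bytes at once, compared with loading one byte at a time. The function name, the 64-byte scan, the never-evicting cache, and the alignment of blocks to multiples of four are all simplifying assumptions, not details from the slides.

```python
BLOCK_BYTES = 4  # the text's example: locations i through i + 3 are copied together

def count_misses(addresses, block_bytes=BLOCK_BYTES):
    """Count misses for a cache that loads one whole block on every miss.

    Assumptions for this sketch: the cache never evicts anything, and
    blocks are aligned to multiples of block_bytes.
    """
    cached_blocks = set()
    misses = 0
    for addr in addresses:
        block = addr // block_bytes      # which block this address belongs to
        if block not in cached_blocks:   # miss: fetch the whole block
            misses += 1
            cached_blocks.add(block)
    return misses

# Read a 64-byte array one byte at a time, in order.
scan = list(range(64))
misses_with_blocks = count_misses(scan, block_bytes=4)
misses_single_byte = count_misses(scan, block_bytes=1)

print("misses with 4-byte blocks:", misses_with_blocks)
print("misses with 1-byte blocks:", misses_single_byte)
```

With four-byte blocks only every fourth access goes to main memory in this scan; the other three find their data already in the cache.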
The general idea behind caches is used in many other situations.
Networks are probably the best example.
– Networks have relatively high “latency” and low “bandwidth,” so repeated data transfers are undesirable.
– Web browsers store your most recently accessed web pages on your hard disk.
– Administrators can set up a network-wide cache, and companies like Akamai also provide caching services.
A few other examples:
– Many processors have a “translation lookaside buffer,” which is a cache dedicated to virtual memory support.
– Disk blocks, like directories, may be kept in main memory... and that data may then in turn be stored in the CPU cache!
Caches are divided into blocks, which may be of various sizes.
– The number of blocks in a cache is usually a power of 2.
– For now we’ll say that each block contains one byte. This won’t take advantage of spatial locality, but we’ll do that next time.
Here is an example cache with eight blocks, each holding one byte.
Block index    8-bit data
000
001
010
011
100
101
110
111
When we copy a block of data from main memory to the cache, where exactly should we put it?
How can we tell whether data is already in the cache, or if it has to be fetched from main memory first?
To load a new block from main RAM, we’d have to replace one of the existing blocks in the cache... which one?
How should write operations be handled by the memory system?
[Figure: main memory addresses 0-15 mapped onto cache indices 0-3.]
[Figure: the same mapping in binary, with memory addresses 0000-1111 and cache indices 00, 01, 10, 11.]
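The address and index listings above pair a 16-location memory with a four-block cache, but the mapping rule itself did not survive in the text. The sketch below assumes the usual direct-mapped rule, where the cache index is the memory address modulo the number of blocks, which for a power-of-2 block count is the same as taking the low-order address bits. The function names are hypothetical.

```python
NUM_BLOCKS = 4    # cache indices 00, 01, 10, 11
MEMORY_SIZE = 16  # memory addresses 0000 through 1111

def cache_index(address, num_blocks=NUM_BLOCKS):
    """Assumed direct-mapped rule: index = address mod number of blocks."""
    return address % num_blocks

def cache_index_bits(address, num_blocks=NUM_BLOCKS):
    """Same index taken from the low-order bits; valid only for a power-of-2 block count."""
    return address & (num_blocks - 1)

# Group every memory address by the cache block it would map to.
mapping = {}
for address in range(MEMORY_SIZE):
    mapping.setdefault(cache_index(address), []).append(address)

for index, addresses in sorted(mapping.items()):
    print(f"index {index:02b}: " + " ".join(f"{a:04b}" for a in addresses))
```

Under this assumption, addresses 0000, 0100, 1000 and 1100 all share index 00, so the last two address bits alone pick the cache block.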