Memory Hierarchy and Design, Thesis of Computer Science

This document describes the memory hierarchy of computers.

Typology: Thesis

2016/2017

Uploaded on 05/03/2017

arsalan-hussain
Faraz Idris Khan
Memory Hierarchy
Design

Advanced Cache Optimizations

The optimizations fall into five categories:

- Reducing the hit time: small and simple first-level caches, and way prediction.
- Increasing cache bandwidth: pipelined caches, multibanked caches, and nonblocking caches. These techniques have varying impacts on power consumption.
- Reducing the miss penalty: critical word first and merging write buffers. These optimizations have little impact on power.
- Reducing the miss rate: compiler optimizations. Obviously, any improvement at compile time improves power consumption.
- Reducing the miss penalty or miss rate via parallelism: hardware prefetching and compiler prefetching. These optimizations generally increase power consumption, primarily due to prefetched data that are unused.
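Three of these categories map onto the terms of the standard average memory access time formula, AMAT = hit time + miss rate × miss penalty. The formula is not stated in this section; the sketch below is an illustration with made-up cycle counts, showing which term each category attacks. Bandwidth optimizations do not map onto a single term of this simple model.

```python
# Average memory access time (AMAT), in clock cycles:
#   AMAT = hit time + miss rate * miss penalty
# All numbers below are hypothetical, chosen only to illustrate the formula.

def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Return the average memory access time in clock cycles."""
    return hit_time + miss_rate * miss_penalty


# Baseline: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
baseline = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)

# Each optimization category improves one term (illustrative values).
faster_hit = amat(hit_time=0.8, miss_rate=0.05, miss_penalty=100)
lower_miss_rate = amat(hit_time=1, miss_rate=0.03, miss_penalty=100)
lower_penalty = amat(hit_time=1, miss_rate=0.05, miss_penalty=60)

for name, value in [("baseline", baseline), ("hit time", faster_hit),
                    ("miss rate", lower_miss_rate), ("miss penalty", lower_penalty)]:
    print(f"{name:>12}: {value:.2f} cycles")
```

Because the miss penalty is large relative to the hit time, small changes in miss rate or miss penalty move AMAT far more than the same relative change in hit time.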

Second Optimization: Way Prediction to Reduce Hit Time

Way prediction: extra bits are kept in the cache to predict the way (the block within the set) of the next cache access. This prediction means the multiplexor is set early to select the desired block, and only a single tag comparison is performed that clock cycle, in parallel with reading the cache data.
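The mechanism can be sketched as a small simulation. This is a minimal model under stated assumptions, not a hardware design: a 2-way set-associative cache with 4 sets, one predictor entry per set, block addresses as input, and an assumed policy that a mispredicted access checks the other way in a following cycle ("slow hit") and retrains the predictor. On a miss it fills the non-predicted way, which for two ways behaves like LRU because the predicted way is always the most recently used one.

```python
# Way prediction in a 2-way set-associative cache (hypothetical sizes).

class WayPredictedCache:
    def __init__(self, num_sets: int = 4, num_ways: int = 2):
        self.num_sets = num_sets
        self.num_ways = num_ways
        # tags[s][w]: tag stored in way w of set s (None means invalid).
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # predicted[s]: the extra prediction bits, naming the way to try first.
        self.predicted = [0] * num_sets
        self.fast_hits = 0   # hit in the predicted way: one tag compare
        self.slow_hits = 0   # hit in another way: extra cycle
        self.misses = 0

    def access(self, block_addr: int) -> str:
        s = block_addr % self.num_sets
        tag = block_addr // self.num_sets
        p = self.predicted[s]
        # Multiplexor set early to way p; a single tag comparison.
        if self.tags[s][p] == tag:
            self.fast_hits += 1
            return "fast hit"
        # Misprediction: check the remaining ways in a later cycle.
        for w in range(self.num_ways):
            if w != p and self.tags[s][w] == tag:
                self.predicted[s] = w          # retrain the predictor
                self.slow_hits += 1
                return "slow hit"
        # Miss: fill a way other than the predicted one, then predict it.
        victim = (p + 1) % self.num_ways
        self.tags[s][victim] = tag
        self.predicted[s] = victim
        self.misses += 1
        return "miss"


cache = WayPredictedCache()
# Blocks 0 and 4 both map to set 0 (4 sets), so they share a set.
results = [cache.access(b) for b in [0, 4, 0, 0, 4, 4]]
print(results)
```

Repeated accesses to the same block become fast hits once the predictor is trained; alternating between the two ways of a set produces slow hits.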

Third Optimization: Pipelined Cache Access to Increase Cache Bandwidth

Pipelined cache access: the effective latency of a first-level cache hit can be multiple clock cycles, giving a fast clock cycle time and high bandwidth but slow hits.
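The trade-off can be illustrated with an idealized timing model. The assumptions are loud ones: the hit path takes `hit_ns` nanoseconds, pipelining splits it into perfectly balanced stages with no latch overhead, and a new access can start every cycle. Real pipelines add latch overhead, so actual hit latency grows somewhat.

```python
# Idealized timing: pipelined vs. unpipelined first-level cache access.

def unpipelined_time_ns(n_accesses: int, hit_ns: float) -> float:
    # The clock cycle equals the whole hit time; accesses do not overlap.
    return n_accesses * hit_ns


def pipelined_time_ns(n_accesses: int, hit_ns: float, stages: int) -> float:
    cycle_ns = hit_ns / stages  # shorter clock cycle
    # The first access takes `stages` cycles; each later one completes
    # one cycle after the previous one.
    return (stages + n_accesses - 1) * cycle_ns


hit_ns, stages = 2.0, 4
print("100 accesses, unpipelined:", unpipelined_time_ns(100, hit_ns), "ns")
print("100 accesses, pipelined:  ", pipelined_time_ns(100, hit_ns, stages), "ns")
# A single hit still takes 2.0 ns, but it now spans 4 clock cycles.
print("single hit, pipelined:    ", pipelined_time_ns(1, hit_ns, stages), "ns")
```

Throughput rises nearly fourfold while the latency of one hit, measured in cycles, grows from 1 to 4: this is the "high bandwidth but slow hits" trade-off.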

Fifth Optimization: Multibanked Caches to Increase Cache Bandwidth

Divide the cache into independent banks that can support simultaneous accesses.

Multiple banks are also a way to reduce power consumption, both in caches and in DRAM.

Sequential interleaving: a simple mapping that works well is to spread the addresses of the block sequentially across the banks. For example, with four banks, bank 0 has all blocks whose address modulo 4 is 0, bank 1 has all blocks whose address modulo 4 is 1, and so on.
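Sequential interleaving reduces to a modulo operation on the block address. The sketch below uses four banks to match the modulo-4 example; the function names are illustrative only.

```python
# Sequential interleaving across banks.
NUM_BANKS = 4


def bank_of(block_addr: int, num_banks: int = NUM_BANKS) -> int:
    """Bank holding this block: block address modulo number of banks."""
    return block_addr % num_banks


def offset_in_bank(block_addr: int, num_banks: int = NUM_BANKS) -> int:
    """Position of the block within its bank."""
    return block_addr // num_banks


for block in range(8):
    print(f"block {block} -> bank {bank_of(block)}, slot {offset_in_bank(block)}")

# Any four consecutive blocks land in four different banks,
# so they can be accessed simultaneously.
consecutive = [bank_of(b) for b in range(12, 16)]
print("banks for blocks 12..15:", consecutive)
```

A sequential stream therefore touches every bank in turn, which is why this simple mapping works well for bandwidth.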

Sixth Optimization: Critical Word First and Early Restart to Reduce Miss Penalty

Two strategies:

- Critical word first: request the missed word first from memory and send it to the processor as soon as it arrives; let the processor continue execution while filling the rest of the words in the block.
- Early restart: fetch the words in normal order, but as soon as the requested word of the block arrives, send it to the processor and let the processor continue execution.
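The benefit of the two strategies can be compared with a simple timing model. The assumptions are hypothetical: memory delivers the first word `latency` cycles after the request, then one further word per cycle. For critical word first, the rest of the block is assumed to arrive in wrap-around order after the requested word.

```python
# Cycle at which the requested word reaches the processor after a miss.
POLICIES = ("wait for block", "early restart", "critical word first")


def delivery_order(policy: str, requested: int, words: int) -> list:
    if policy == "critical word first":
        # Requested word first, then wrap around the block (assumption).
        return [(requested + i) % words for i in range(words)]
    return list(range(words))  # normal order


def restart_cycle(policy: str, requested: int, words: int, latency: int) -> int:
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    order = delivery_order(policy, requested, words)
    if policy == "wait for block":
        return latency + words - 1  # processor waits for the last word
    # Processor restarts as soon as the requested word arrives.
    return latency + order.index(requested)


# Hypothetical miss: 8-word block, word 5 requested, 10-cycle latency.
for policy in POLICIES:
    print(f"{policy:>20}: cycle {restart_cycle(policy, 5, 8, 10)}")
```

The gain grows with block size: with large blocks, waiting for the whole block wastes many cycles, while critical word first makes the restart independent of which word was requested.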

Eighth Optimization: Compiler Optimizations to Reduce Miss Rate

The increasing performance gap between processors and main memory has inspired compiler writers to scrutinize the memory hierarchy to see if compile-time optimizations can improve performance. The goal is to maximize accesses to the data loaded into the cache before the data are replaced. Two such techniques are:

- Loop interchange
- Blocking
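Both transformations can be sketched on row-major 2D arrays. Python lists of rows stand in for C arrays here, so this sketch shows the loop structure only; Python itself will not exhibit the cache effect. Loop interchange swaps the loops so the inner loop walks along a row (sequential in memory). Blocking computes a matrix product on `b x b` submatrices so each piece can stay cache-resident while it is reused. The blocking factor `b` is a hypothetical tuning parameter.

```python
# Loop interchange and blocking on row-major 2D arrays (illustrative).

def sum_column_order(x):
    # Before interchange: the inner loop strides down a column.
    total = 0
    for j in range(len(x[0])):
        for i in range(len(x)):
            total += x[i][j]
    return total


def sum_row_order(x):
    # After interchange: the inner loop walks along a row.
    total = 0
    for i in range(len(x)):
        for j in range(len(x[0])):
            total += x[i][j]
    return total


def matmul_naive(y, z):
    n = len(y)
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            r = 0
            for k in range(n):
                r += y[i][k] * z[k][j]
            x[i][j] = r
    return x


def matmul_blocked(y, z, b):
    # Blocking: work on b x b submatrices of z and strips of y,
    # accumulating partial sums into x.
    n = len(y)
    x = [[0] * n for _ in range(n)]
    for jj in range(0, n, b):
        for kk in range(0, n, b):
            for i in range(n):
                for j in range(jj, min(jj + b, n)):
                    r = 0
                    for k in range(kk, min(kk + b, n)):
                        r += y[i][k] * z[k][j]
                    x[i][j] += r
    return x


y = [[i * 5 + j for j in range(5)] for i in range(5)]
z = [[(i + 2 * j) % 7 for j in range(5)] for i in range(5)]
print(matmul_blocked(y, z, 2) == matmul_naive(y, z))
```

Both transformations compute the same results as the originals; only the order of memory accesses changes, which is what lets the compiler apply them safely.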

Ninth Optimization: Hardware Prefetching of Instructions and Data to Reduce Miss Penalty or Miss Rate

Nonblocking caches effectively reduce the miss penalty by overlapping execution with memory access. Another approach is to prefetch items before the processor requests them.
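One simple hardware scheme, one-block-lookahead (next-block) prefetching, is sketched below as an illustration; the section does not say which prefetcher it has in mind. On a demand miss to block b, the hardware also fetches block b + 1. Assumptions: unlimited cache capacity (no evictions), 64-byte blocks, and prefetches that complete before they are needed.

```python
# Next-block prefetching on a demand miss (idealized).
BLOCK_SIZE = 64  # bytes, assumed


def count_demand_misses(addresses, prefetch: bool) -> int:
    present = set()  # blocks currently in the cache (no capacity limit)
    misses = 0
    for addr in addresses:
        block = addr // BLOCK_SIZE
        if block not in present:
            misses += 1
            present.add(block)
            if prefetch:
                present.add(block + 1)  # fetch the next block too
    return misses


trace = range(0, 512, 8)  # sequential 8-byte loads spanning 8 blocks
print("no prefetch:", count_demand_misses(trace, prefetch=False))
print("next-block :", count_demand_misses(trace, prefetch=True))
```

On a purely sequential stream this halves the demand misses. The prefetched blocks cost extra memory traffic, which is why prefetching tends to increase power consumption when the prefetched data go unused.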

DRAM Technology

As DRAMs grew in capacity, the cost of a package with all the necessary address lines became an issue. The solution was to multiplex the address lines, thereby cutting the number of address pins in half. One half of the address is sent first, during the row access strobe (RAS); the other half, sent during the column access strobe (CAS), follows it.
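Address multiplexing can be sketched as splitting one address into a row half and a column half that share the same pins. The geometry is hypothetical (12 row bits and 12 column bits, so 2^24 locations), and placing the row in the upper bits is an assumption made for illustration.

```python
# DRAM address multiplexing: one address sent in two halves.

def split_address(addr: int, row_bits: int, col_bits: int) -> tuple:
    row = (addr >> col_bits) & ((1 << row_bits) - 1)  # sent with RAS
    col = addr & ((1 << col_bits) - 1)                # sent with CAS
    return row, col


def address_pins(row_bits: int, col_bits: int, multiplexed: bool) -> int:
    # Multiplexed: the widest half sets the pin count.
    return max(row_bits, col_bits) if multiplexed else row_bits + col_bits


row, col = split_address(0xABC123, row_bits=12, col_bits=12)
print(f"RAS row = {row:#x}, CAS col = {col:#x}")
print("pins without multiplexing:", address_pins(12, 12, multiplexed=False))
print("pins with multiplexing:   ", address_pins(12, 12, multiplexed=True))
```

With equal row and column widths, the pin count drops from 24 to 12, the halving described above.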

Improving Memory Performance Inside a DRAM Chip

- First, DRAMs added timing signals that allow repeated accesses to the row buffer without another row access time.
- Synchronous DRAM (SDRAM): the second major change was to add a clock signal to the DRAM interface, so that repeated transfers would not bear the overhead of synchronizing with the memory controller.
- Third, to overcome the problem of getting a wide stream of bits from the memory without having to make the memory system too large as memory system density increased, DRAMs were made wider.
- Double data rate (DDR): the fourth major DRAM innovation to increase bandwidth is to transfer data on both the rising edge and the falling edge of the DRAM clock signal, thereby doubling the peak data rate.
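The effect of DDR on peak bandwidth is simple arithmetic: clock rate × transfers per clock × bus width. As a worked example, DDR3-1600 modules use an 800 MHz clock and a 64-bit bus, which is why they are labeled PC3-12800. The function name is illustrative, and these are peak figures only; sustained bandwidth is lower.

```python
# Peak DRAM transfer rate in MB/s (1 MB = 10**6 bytes).

def peak_bandwidth_mb_s(clock_mhz: float, bus_width_bits: int,
                        transfers_per_clock: int) -> float:
    return clock_mhz * transfers_per_clock * bus_width_bits / 8


# Same 800 MHz clock and 64-bit bus, single vs. double data rate.
print("single data rate:", peak_bandwidth_mb_s(800, 64, 1), "MB/s")
print("double data rate:", peak_bandwidth_mb_s(800, 64, 2), "MB/s")
```

Transferring on both clock edges doubles the peak rate without raising the clock frequency.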

Virtual Memory

Paged virtual memory means that every memory access logically takes at least twice as long: one memory access to obtain the physical address and a second access to get the data. However, if the accesses have locality, then the address translations for the accesses must also have locality. Keeping these address translations in a special cache, the translation lookaside buffer (TLB), means a memory access rarely requires a second access to translate the address.

A TLB entry is like a cache entry where the tag holds portions of the virtual address and the data portion holds a physical page address, protection field, valid bit, and usually a use bit and a dirty bit.
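A TLB can be sketched as a small cache keyed by virtual page number. This is a minimal model under hypothetical assumptions: 4 KiB pages, a fully associative TLB with LRU replacement, and a single-level page table held in a dict. Each entry carries the fields listed above.

```python
from collections import OrderedDict
from dataclasses import dataclass

OFFSET_BITS = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << OFFSET_BITS) - 1


@dataclass
class TLBEntry:
    ppn: int                          # data: physical page number
    protection: str = "rw"            # data: protection field
    valid: bool = True                # valid bit
    use: bool = False                 # use (reference) bit
    dirty: bool = False               # dirty bit


class TLB:
    """Fully associative TLB; each entry's tag is the virtual page number."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # hypothetical: vpn -> ppn
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn (tag) -> TLBEntry, LRU order
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr, write=False):
        vpn, offset = vaddr >> OFFSET_BITS, vaddr & PAGE_MASK
        entry = self.entries.get(vpn)
        if entry is not None and entry.valid:
            self.hits += 1
        else:
            # TLB miss: the extra memory access to read the page table.
            self.misses += 1
            if vpn not in self.entries and len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            entry = TLBEntry(ppn=self.page_table[vpn])
            self.entries[vpn] = entry
        self.entries.move_to_end(vpn)             # mark most recently used
        if write and "w" not in entry.protection:
            raise PermissionError(f"write to read-only page {vpn:#x}")
        entry.use = True
        entry.dirty = entry.dirty or write
        return (entry.ppn << OFFSET_BITS) | offset


tlb = TLB(page_table={0: 7, 1: 3, 2: 9})
pa1 = tlb.translate(0x0123)              # miss: page 0 -> frame 7
pa2 = tlb.translate(0x1FFF, write=True)  # miss: page 1 -> frame 3
pa3 = tlb.translate(0x0456)              # hit: page 0 already cached
print(hex(pa1), hex(pa2), hex(pa3), "hits:", tlb.hits, "misses:", tlb.misses)
```

The third access reuses the translation cached by the first, so only the first touch of each page pays for the page-table access. This is the locality argument above in miniature.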