memory hierarchy and design, Thesis of Computer Science

FAST - National University of Computer and Emerging Sciences (NUCES), Computer Science

This document describes the memory hierarchy of computers.

Typology: Thesis

2016/2017

Uploaded on 05/03/2017

arsalan-hussain 🇵🇰


Faraz Idris Khan

Memory Hierarchy Design


Advanced Cache Optimizations

The advanced cache optimizations fall into five categories:
- Reducing the hit time: small and simple first-level caches and way prediction.
- Increasing cache bandwidth: pipelined caches, multibanked caches, and nonblocking caches. These techniques have varying impacts on power consumption.
- Reducing the miss penalty: critical word first and merging write buffers. These optimizations have little impact on power.
- Reducing the miss rate: compiler optimizations. Any improvement made at compile time also improves power consumption.
- Reducing the miss penalty or miss rate via parallelism: hardware prefetching and compiler prefetching. These optimizations generally increase power consumption, primarily because of prefetched data that are never used.
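The first four categories each target one term of the standard average memory access time relation, AMAT = hit time + miss rate × miss penalty. The minimal sketch below simply evaluates that relation; the cycle counts and miss rates are illustrative assumptions, not measurements from any real machine.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit_time + miss_rate * miss_penalty (same time unit throughout)."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be a fraction between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Assumed numbers: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
baseline = amat(1, 0.05, 100)
# With these numbers, halving the miss penalty and halving the miss rate help equally.
lower_penalty = amat(1, 0.05, 50)
lower_rate = amat(1, 0.025, 100)
```

Reducing the hit time helps on every access, while the miss-side optimizations only pay off in proportion to how often misses occur.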

Second Optimization: Way Prediction to Reduce Hit Time

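Way prediction: extra bits are kept in the cache to predict the way (the block within the set) of the next cache access. This prediction means the multiplexor is set early to select the desired block, and only a single tag comparison is performed that clock cycle, in parallel with reading the cache data. On a misprediction, the other blocks are checked for a match in the next clock cycle.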
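A small simulation can make the fast-hit/slow-hit distinction concrete. This is a sketch under stated assumptions: the class name WayPredictedCache is hypothetical, the cache is 2-way set-associative with one predicted way per set, and on a miss it simply evicts the non-predicted way (a simplification, not a replacement policy from the slides).

```python
class WayPredictedCache:
    """Hypothetical 2-way set-associative cache with one predicted way per set."""

    def __init__(self, num_sets, block_size=64):
        self.num_sets = num_sets
        self.block_size = block_size
        self.tags = [[None, None] for _ in range(num_sets)]  # stored tag for way 0 and way 1
        self.predicted = [0] * num_sets  # the "extra bits": predicted way for each set
        self.stats = {"fast_hit": 0, "slow_hit": 0, "miss": 0}

    def access(self, addr):
        block = addr // self.block_size
        s = block % self.num_sets
        tag = block // self.num_sets
        way = self.predicted[s]
        if self.tags[s][way] == tag:
            result = "fast_hit"  # single tag comparison; multiplexor already set to this way
        elif self.tags[s][1 - way] == tag:
            result = "slow_hit"  # misprediction: other way checked in the next cycle
            self.predicted[s] = 1 - way
        else:
            result = "miss"
            victim = 1 - way  # simplification: evict the non-predicted way
            self.tags[s][victim] = tag
            self.predicted[s] = victim  # predict the block that was just filled
        self.stats[result] += 1
        return result
```

For example, with four sets and 64-byte blocks, addresses 0 and 256 map to the same set; the access sequence 0, 256, 0, 0 produces a miss, a miss, a slow hit (the prediction pointed at 256's way), and then a fast hit once the prediction has been corrected.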

Third Optimization: Pipelined Cache Access to Increase Cache Bandwidth

Pipelined cache access: the cache access is split across pipeline stages, so the effective latency of a first-level cache hit can be multiple clock cycles, giving a fast clock cycle time and high bandwidth but slow hits.
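The latency/bandwidth trade-off can be seen with simple arithmetic. The sketch below assumes an idealized pipeline (one access issued per cycle, no stalls, no pipeline-latch overhead), and the 900 ps of cache work is an assumed figure chosen for illustration.

```python
def pipelined_cache_timing(num_accesses, stages, cycle_ps):
    """Idealized pipelined cache: one access issued per cycle, no stalls, no latch overhead.

    Returns (hit latency in ps, time to complete all accesses in ps).
    """
    if num_accesses < 1:
        raise ValueError("need at least one access")
    hit_latency_ps = stages * cycle_ps
    # Fill the pipeline once, then one access completes every cycle.
    total_ps = (stages + num_accesses - 1) * cycle_ps
    return hit_latency_ps, total_ps


# Same 900 ps of cache work: unpipelined (1 stage) versus split into 3 stages of 300 ps.
unpipelined = pipelined_cache_timing(100, stages=1, cycle_ps=900)
pipelined = pipelined_cache_timing(100, stages=3, cycle_ps=300)
```

Under these assumptions each hit still takes 900 ps in both designs, but the pipelined cache finishes 100 accesses in 30,600 ps instead of 90,000 ps: the hit is no faster (in real hardware it is slightly slower because of latch overhead), while bandwidth roughly triples.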

Fifth Optimization: Multibanked Caches to Increase Cache Bandwidth

Divide the cache into independent banks that can support simultaneous accesses.

Multiple banks are also a way to reduce power consumption, both in caches and in DRAM.

Sequential interleaving: a simple mapping that works well is to spread the addresses of the blocks sequentially across the banks. With four banks, bank 0 has all blocks whose address modulo 4 is 0, and bank 1 has all blocks whose address modulo 4 is 1.
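Sequential interleaving is just a modulo on the block address. The minimal sketch below assumes four banks, as in the example above; the function names are illustrative.

```python
def bank_of(block_address, num_banks=4):
    """Sequential interleaving: consecutive block addresses go to consecutive banks."""
    return block_address % num_banks


def blocks_in_bank(bank, num_blocks, num_banks=4):
    """All block addresses below num_blocks that map to the given bank."""
    return [b for b in range(num_blocks) if bank_of(b, num_banks) == bank]
```

Because consecutive blocks land in different banks, a sequential stream of block accesses touches every bank in turn, which is what lets the banks serve accesses simultaneously.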

Sixth Optimization: Critical Word First and Early Restart to Reduce Miss Penalty

Two strategies:
- Critical word first: request the missed word first from memory and send it to the processor as soon as it arrives; let the processor continue execution while filling the rest of the words in the block.
- Early restart: fetch the words in normal order, but as soon as the requested word of the block arrives, send it to the processor and let the processor continue execution.
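The benefit of each strategy depends on where the requested word sits in the block. The sketch below assumes a simple timing model (the first word arrives after a fixed latency, then one word per cycle); the policy names and parameters are illustrative, not taken from the slides.

```python
def resume_cycle(requested_word, block_words, first_word_latency, policy):
    """Cycle at which the processor can resume after a miss.

    Assumed model: the first transferred word arrives at first_word_latency,
    and each following word arrives one cycle later.
    """
    if not 0 <= requested_word < block_words:
        raise ValueError("requested_word must index a word in the block")
    if policy == "wait_for_block":
        return first_word_latency + block_words - 1  # wait for the last word of the block
    if policy == "early_restart":
        return first_word_latency + requested_word  # words arrive in order 0, 1, 2, ...
    if policy == "critical_word_first":
        return first_word_latency  # the requested word is transferred first
    raise ValueError(f"unknown policy: {policy}")
```

With an assumed 8-word block, a 10-cycle first-word latency, and word 5 requested, the processor resumes at cycle 17 without either optimization, at cycle 15 with early restart, and at cycle 10 with critical word first. Both help more as blocks get larger.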

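Eighth Optimization: Compiler Optimizations to Reduce Miss Rate

The increasing performance gap between processors and main memory has inspired compiler writers to scrutinize the memory hierarchy to see whether compile-time optimizations can improve performance. The goal is to maximize accesses to the data loaded into the cache before the data are replaced. Two techniques:
- Loop interchange: exchange the nesting of loops so that data are accessed in the order in which they are stored in memory.
- Blocking: operate on submatrices (blocks) small enough to stay in the cache, rather than on entire rows or columns.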
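Python itself will not expose hardware cache effects, so the sketch below models them instead: count_misses is a hypothetical fully associative LRU cache model fed with the word addresses of a row-major array, used to compare the two loop orders. matmul_blocked shows the blocking transformation on a matrix multiply; it computes the same result as the naive triple loop but works on b × b submatrices. All names and sizes are illustrative assumptions.

```python
from collections import OrderedDict


def count_misses(addresses, words_per_block, num_blocks):
    """Fully associative LRU cache model; addresses are word indices."""
    cache = OrderedDict()  # block number -> None, least recently used first
    misses = 0
    for addr in addresses:
        block = addr // words_per_block
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
        else:
            misses += 1
            cache[block] = None
            if len(cache) > num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


def column_order(rows, cols):
    # Before interchange: the inner loop walks down a column (stride = cols words).
    return [i * cols + j for j in range(cols) for i in range(rows)]


def row_order(rows, cols):
    # After interchange: the inner loop walks along a row (stride = 1 word).
    return [i * cols + j for i in range(rows) for j in range(cols)]


def matmul_blocked(x, y, b):
    """Blocked n x n matrix multiply: each pass reuses a b x b submatrix of y."""
    n = len(x)
    z = [[0] * n for _ in range(n)]
    for jj in range(0, n, b):
        for kk in range(0, n, b):
            for i in range(n):
                for j in range(jj, min(jj + b, n)):
                    r = 0
                    for k in range(kk, min(kk + b, n)):
                        r += x[i][k] * y[k][j]
                    z[i][j] += r  # accumulate the partial sum for this block of k
    return z
```

With an assumed 16 × 16 array, 4-word blocks, and a 4-block cache, the column-order loop misses on all 256 accesses, while the interchanged row-order loop misses only 64 times (once per block), because each loaded block is fully used before it is replaced.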

Ninth Optimization: Hardware Prefetching of Instructions and Data to Reduce Miss Penalty or Miss Rate

Nonblocking caches effectively reduce the miss penalty by overlapping execution with memory access. Another approach is to prefetch items before the processor requests them.
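One common form of hardware prefetching fetches the next consecutive block on a miss. The sketch below assumes that next-block policy and an unbounded cache (so only cold misses are counted); it illustrates the idea and does not model any particular processor.

```python
def count_demand_misses(block_trace, prefetch_next=False):
    """Count demand misses; the cache is unbounded, so only cold misses occur (simplification)."""
    present = set()
    misses = 0
    for block in block_trace:
        if block not in present:
            misses += 1
            present.add(block)
            if prefetch_next:
                present.add(block + 1)  # assumed policy: prefetch the next block on a miss
    return misses
```

For a sequential trace of 16 blocks, this model reports 16 misses without prefetching and 8 with next-block prefetching. For a trace with no spatial pattern, such as blocks 0, 10, 20, 30, prefetching removes no misses and only fetches unused data, which is the power cost mentioned earlier.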

DRAM Technology

As DRAMs grew in capacity, the cost of a package with all the necessary address lines became an issue. The solution was to multiplex the address lines, cutting the number of address pins in half:
- One half of the address is sent first, during the row access strobe (RAS).
- The other half of the address, sent during the column access strobe (CAS), follows it.
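Address multiplexing amounts to splitting the cell address into a row half and a column half that share the same pins. The sketch below assumes a square array (equal row and column bits); the function names are illustrative.

```python
def split_dram_address(addr, column_bits):
    """Split a cell address into (row, column) halves sent on the same pins."""
    row = addr >> column_bits  # upper half, sent first with RAS
    column = addr & ((1 << column_bits) - 1)  # lower half, sent second with CAS
    return row, column


def address_pins(address_bits, multiplexed=True):
    """Address pins needed; multiplexing halves the count (rounding up for odd widths)."""
    return (address_bits + 1) // 2 if multiplexed else address_bits
```

For example, a chip with 2^20 cells needs 20 address bits; multiplexed, only 10 pins are required, and address (5 << 10) | 7 is sent as row 5 followed by column 7.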

Improving Memory Performance Inside a DRAM Chip

- First, DRAMs added timing signals that allow repeated accesses to the row buffer without another row access time.
- Synchronous DRAM (SDRAM): the second major change was to add a clock signal to the DRAM interface, so that the repeated transfers would not bear the overhead of synchronizing with the memory controller.
- Third, to overcome the problem of getting a wide stream of bits from the memory without having to make the memory system too large as memory system density increased, DRAMs were made wider.
- Double data rate (DDR): the fourth major DRAM innovation to increase bandwidth is to transfer data on both the rising edge and the falling edge of the DRAM clock signal, thereby doubling the peak data rate.
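The doubling from DDR follows directly from the peak-bandwidth arithmetic. The 133 MHz clock and 64-bit (8-byte) bus in the sketch below are assumed example values.

```python
def peak_bandwidth_mb_s(clock_mhz, bus_bytes, transfers_per_clock):
    """Peak rate in MB/s: clock rate x transfers per clock x bytes per transfer."""
    return clock_mhz * transfers_per_clock * bus_bytes


# Assumed example: 133 MHz DRAM clock and an 8-byte (64-bit) bus.
sdram = peak_bandwidth_mb_s(133, 8, transfers_per_clock=1)  # data on the rising edge only
ddr = peak_bandwidth_mb_s(133, 8, transfers_per_clock=2)  # data on both clock edges
```

With the same clock and bus width, DDR transfers twice per cycle, so the peak rate goes from 1064 MB/s to 2128 MB/s. This is a peak figure; sustained bandwidth is lower because of row activations and other overheads.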

Virtual Memory

Paged virtual memory means that every memory access logically takes at least twice as long: one memory access to obtain the physical address and a second access to get the data.

If the accesses have locality, then the address translations for the accesses must also have locality. Keeping these address translations in a special cache, the translation lookaside buffer (TLB), means that a memory access rarely requires a second access to translate the address.

A TLB entry is like a cache entry where the tag holds portions of the virtual address and the data portion holds a physical page address, protection field, valid bit, and usually a use bit and a dirty bit.
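The sketch below shows how a TLB lookup avoids the extra translation access. It rests on several assumptions that are not in the slides: 4 KiB pages, a single-level page table held in a dict (so a TLB miss costs exactly one extra memory access), and an unbounded TLB with no replacement. TLBEntry and TLB are hypothetical names whose fields mirror the entry described above.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed 4 KiB pages


@dataclass
class TLBEntry:
    physical_page: int  # data portion: physical page address
    protection: str = "rw"  # protection field
    valid: bool = True
    use: bool = False
    dirty: bool = False


class TLB:
    def __init__(self, page_table):
        self.page_table = page_table  # assumed single-level: virtual page -> physical page
        self.entries = {}  # tag (virtual page number) -> TLBEntry
        self.translation_accesses = 0  # extra memory accesses spent reading the page table

    def translate(self, vaddr, write=False):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        entry = self.entries.get(vpn)
        if entry is None or not entry.valid:
            self.translation_accesses += 1  # TLB miss: one extra access to the page table
            entry = TLBEntry(self.page_table[vpn])
            self.entries[vpn] = entry
        entry.use = True
        if write:
            entry.dirty = True
        return entry.physical_page * PAGE_SIZE + offset
```

Because of locality, repeated accesses to the same page hit in the TLB: in this model, translating two addresses on virtual page 0 costs only one page-table access, and a write to a new page costs one more and sets that entry's dirty bit.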
