Computer Architecture Notes, Lecture notes of Computer Architecture and Organization

University of North Texas (UNT), Computer Architecture and Organization

Slides for the Hennessy book on computer architecture, 5th edition.

Typology: Lecture notes

2016/2017

Uploaded on 02/28/2017

Sadman.Ahmed 🇺🇸

CSCE 4610 Feb 02, 2017 1

CSCE 4610: Computer Architecture

Review

Instruction sets

Function call, register windows

Dataflow

HW #2 (Due Feb 7) A4, A7 (a), A18

A7(b) is bonus

Dinero and trace driven simulations

So, each instruction and data item (variable) will have an address

instruction addresses are in the program counter

data addresses are generated using Load and Store instructions

These addresses are “virtual” addresses, which will be mapped to physical addresses during execution

Each address is either 32 or 64 bits and refers to a byte

For a 4-byte data type (integer), the byte address will look like

xxxxx.........xxxx 00

That is, the two low-order bits are zero and the address is on a 4-byte boundary
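
The boundary rule can be checked with a bit mask. The following is a minimal sketch; the helper names `is_aligned` and `align_down` are illustrative and not from the slides, and the sample addresses are arbitrary 32-bit values.

```python
def is_aligned(addr: int, size: int = 4) -> bool:
    """True when addr is a multiple of size (size assumed to be a power of two)."""
    return addr & (size - 1) == 0


def align_down(addr: int, size: int = 4) -> int:
    """Clear the low-order bits so the result sits on a size-byte boundary."""
    return addr & ~(size - 1)


print(is_aligned(0xB01AA050))        # True: the two low-order bits are 00
print(is_aligned(0xB01AA073))        # False: the two low-order bits are 11
print(hex(align_down(0xB01AA073)))   # 0xb01aa070
```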


Dynamic data relates to malloc, new, etc

Text segment = your code

All these addresses are “virtual”

At runtime, physical pages are allocated for the virtual pages

So we need to understand the difference between a virtual and physical address

Addresses generated by your program (either an instruction address or an address generated by a Load or Store) are virtual addresses
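
As a rough illustration of physical pages being allocated for virtual pages at runtime, here is a small sketch. The 4 KB page size, the dictionary used as a page table, and the allocate-on-first-touch policy are all assumptions made for the example; the slides do not specify any of them.

```python
OFFSET_BITS = 12                 # assumed 4 KB pages (not stated in the notes)
PAGE_SIZE = 1 << OFFSET_BITS


class ToyPageTable:
    """Hypothetical page table: virtual page number -> physical frame number."""

    def __init__(self):
        self.mapping = {}
        self.next_free_frame = 0

    def translate(self, vaddr: int) -> int:
        vpn = vaddr >> OFFSET_BITS            # virtual page number
        offset = vaddr & (PAGE_SIZE - 1)      # byte within the page, unchanged
        if vpn not in self.mapping:           # first touch: allocate a frame
            self.mapping[vpn] = self.next_free_frame
            self.next_free_frame += 1
        return (self.mapping[vpn] << OFFSET_BITS) | offset


pt = ToyPageTable()
print(hex(pt.translate(0xB01AA050)))   # 0x50   (first page touched gets frame 0)
print(hex(pt.translate(0xB01AA073)))   # 0x73   (same virtual page, same frame)
print(hex(pt.translate(0xB2FE3057)))   # 0x1057 (new virtual page gets frame 1)
```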


Typical Memory Hierarchy (page 72)

[Figure: typical access latency, in processor cycles for a 4 GHz processor, on an axis from 2^1 to 2^23, for L1 CACHE, SRAM LAST LEVEL CACHE, EDRAM, DRAM, PCM, FLASH, and HARD DRIVE; regions labeled MAIN MEMORY SYSTEM and HIGH PERFORMANCE DISK SYSTEM]

Figure 1: Latency of different technologies in memory hierarchy. Numbers accurate within a factor of two.

There is active research on PCM, and several PCM prototypes have been proposed, each optimizing for some important device characteristics (such as density, latency, bandwidth, or lifetime). While the PCM technology matures, and becomes ready to be used as a complement to DRAM, we believe that system architecture solutions can be explored to make these memories part of the main memory to improve system performance. The objective of this paper is to study the design trade-offs in integrating the most promising emerging memory technology, PCM, into the main memory system.

To be independent of the choice of a specific PCM prototype, we use an abstract memory model that is D times denser than DRAM and S times slower than DRAM. We show that for currently projected values of PCM (S ≈ 4, D ≈ 4), a main memory system using PCM can reduce page faults by 5X, and hence execute applications with much larger working sets. However, because PCM is slower than DRAM, main memory access time is likely to increase linearly with S, which increases the overall execution time. Therefore, we believe that PCM is unlikely to be a drop-in replacement for DRAM. We show that by having a small DRAM buffer in front of the PCM memory, we can make the effective access time and performance closer to a DRAM memory. We study the design issues in such a hybrid memory architecture and show how a two-level memory system can be managed. Our evaluations for a baseline system of 16-cores with 8GB DRAM show that PCM-based hybrid memory can provide a speedup of 3X while incurring only 13% area overhead. The speedup is within 10% of an expensive DRAM only system which would incur 4X

2. BACKGROUND AND MOTIVATION

With increasing number of processors in the computer system, the pressure on the memory system to satisfy the demand of all concurrently executing applications (threads) has increased as well. Furthermore, critical computing applications are becoming more data-centric than compute-centric [9]. One of the major challenges in the design of large-scale, high-performance computer systems is maintaining the performance growth rate of the system memory. Typically, the disk is five orders of magnitude slower than the rest of the system [6] making frequent misses in system main memory a major bottleneck to system performance. Furthermore, main memory consisting entirely of DRAM is already hitting the power and cost limits [15]. Exploiting emerging memory technologies, such as Phase-Change Memory (PCM) and Flash, become crucial to be able to build larger capacity memory systems in the future while remaining within the overall system cost and power budgets. In this section, we first present a brief description of the Phase-Change Memory technology, and highlight the strengths of PCM that makes it a promising candidate for main memory of high-performance servers. We present a simple model that is useful in describing such emerging memory technologies for use in computer architecture studies.

2.1 What is Phase-Change Memory?

PCM is a type of non-volatile memory that exploits the property of chalcogenide glass to switch between two states, amorphous and crystalline, with the application of heat using electrical pulses.

New technologies may impact the hierarchies

Our goal is to keep needed data and instructions in higher level memories (like L1 cache) so that the average memory access time is reduced

3D DRAMs

How to find the average number of cycles (CPI) to access memory for Load and Store?

Average memory access cycles


Consider an example with 64-byte blocks or lines

In a 64K Byte cache we will have 1024 cache lines (each with 64 bytes)

A straightforward address translation looks like

16-bit block ID (or tag) | 10-bit block address | 6-bit byte address

[Figure: cache array with columns "Tags of data currently in Cache" and "Data in Cache (64 bytes per line)"]

Why do we need Tag bits?

This type of cache is called a DIRECT mapped cache
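
The 16/10/6 split can be expressed with shifts and masks. This is a minimal sketch assuming a 32-bit address; the function name `split_address` and the sample address are illustrative choices, not part of the slides.

```python
OFFSET_BITS = 6    # 64-byte line  -> 6-bit byte address
INDEX_BITS = 10    # 1024 lines    -> 10-bit block address


def split_address(addr: int):
    """Split a 32-bit address into (tag, line index, byte offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)   # remaining 16 high-order bits
    return tag, index, offset


tag, line, byte = split_address(0xB01AA050)
print(hex(tag), line, byte)   # 0xb01a 641 16
```

The tag is what distinguishes the 2^16 different blocks that share one line index, which is why it must be stored alongside the data.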


A given address can be placed in only a specific cache block.

All addresses with the same 10-bit block address conflict for the same cache block; that is, 2^16 different addresses map to the same cache line

Can we say that a given address can be placed in one of 2 locations?

We pair cache blocks and call them sets (two blocks per set)

Then we map an address to a set so that it can be in either of the blocks of the set

There are 512 sets (each with two lines)

One line from each group is a set

We need 9 bits to select a set

17-bit block ID (or tag) | 9-bit set address | 6-bit byte address


An example: Consider the following sequence of addresses (all are hex numbers).

B01AA050; B01AA073; B2FE3057; B4FFD85F; B01BA04E.

Let us consider where they fall if we have a 1-way set associative (direct mapped) cache with 64-byte cache blocks and a 64K cache (1024 lines)

B01AA050 = (1011 0000 0001 1010) 1010000001 010000 Line 641, byte 16 (cold miss)

B01AA073 = (1011 0000 0001 1010) 1010000001 110011 Line 641, byte 51 (hit)

B2FE3057 = (1011 0010 1111 1110) 0011000001 010111 Line 193, byte 23 (cold miss)

B4FFD85F = (1011 0100 1111 1111) 1101100001 011111 Line 865, byte 31 (cold miss)

B01BA04E = (1011 0000 0001 1011) 1010000001 001110 Line 641, byte 14 (conflict miss)

2-way set associative cache (512 sets)

B01AA050 = (1011 0000 0001 1010 1) 010000001 010000 Set 129, byte 16 (cold miss)

B01AA073 = (1011 0000 0001 1010 1) 010000001 110011 Set 129, byte 51 (hit)

B2FE3057 = (1011 0010 1111 1110 0) 011000001 010111 Set 193, byte 23 (cold miss)

B4FFD85F = (1011 0100 1111 1111 1) 101100001 011111 Set 353, byte 31 (cold miss)

B01BA04E = (1011 0000 0001 1011 1) 010000001 001110 Set 129, byte 14 (miss, but no conflict: it goes into the second line of the set)
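
The walk-through can be replayed with a tiny trace-driven simulator, in the spirit of the Dinero exercises. This is only a sketch: the function name `simulate` is hypothetical, LRU replacement within a set is an assumption (the slides do not name a replacement policy), and the trace uses the five addresses as they appear in the per-address lines, ending in B01BA04E.

```python
from collections import OrderedDict


def simulate(addresses, ways, cache_bytes=64 * 1024, block_bytes=64):
    """Return (set index, byte offset, outcome) for each address in the trace."""
    num_sets = cache_bytes // block_bytes // ways
    offset_bits = block_bytes.bit_length() - 1      # 64 -> 6
    sets = [OrderedDict() for _ in range(num_sets)]  # per set: tags, oldest first
    results = []
    for addr in addresses:
        block = addr >> offset_bits
        index = block % num_sets
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)                  # mark as most recently used
            outcome = "hit"
        else:
            if len(lines) == ways:                  # set full: evict LRU line
                lines.popitem(last=False)
                outcome = "miss (evicts a line)"
            else:
                outcome = "miss (free line)"
            lines[tag] = None
        results.append((index, addr & (block_bytes - 1), outcome))
    return results


trace = [0xB01AA050, 0xB01AA073, 0xB2FE3057, 0xB4FFD85F, 0xB01BA04E]
for ways in (1, 2):
    print(f"{ways}-way:")
    for addr, (index, byte, outcome) in zip(trace, simulate(trace, ways)):
        print(f"  {addr:08X} -> set/line {index}, byte {byte}, {outcome}")
```

With one way, the last access evicts the line loaded by the first; with two ways it finds a free line in the same set.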


We can reduce cache conflicts by increasing the set size (set associativity)

For example, consider 4-way associative (each set has 4 cache lines)

One block from each group forms a set.

There are 256 sets and we need 8 bits to select a set

18-bit block ID (or tag) | 8-bit set address | 6-bit byte address
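
The field widths follow directly from the cache size, the block size, and the associativity. The sketch below assumes 32-bit addresses and power-of-two sizes; `field_widths` is an illustrative name, not something defined in the course material.

```python
def field_widths(cache_bytes: int, block_bytes: int, ways: int, addr_bits: int = 32):
    """Return (tag bits, set-index bits, byte-offset bits); sizes must be powers of two."""
    num_lines = cache_bytes // block_bytes
    num_sets = num_lines // ways
    offset_bits = block_bytes.bit_length() - 1   # log2(block size)
    index_bits = num_sets.bit_length() - 1       # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


for ways in (1, 2, 4):
    print(ways, field_widths(64 * 1024, 64, ways))
# 1 (16, 10, 6)
# 2 (17, 9, 6)
# 4 (18, 8, 6)
```

Each doubling of associativity halves the number of sets, so one bit moves from the set address into the tag.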


Table from B-24

Observations

Larger caches: overall miss rate decreases, with fewer capacity misses

Higher associativity: fewer conflict misses, so the capacity miss contribution increases

Cold misses do not change; their relative contribution increases with higher associativity and capacity


Reading vs Writing

To read: locate the cache line and read the line (and use the needed byte)

If not in cache, “cache miss”

You can fetch data and tag together and ignore the data if the tag does not match

To write: we need to fetch the tag, check if it is a hit, and only then write the data

Should we also update the lower-level caches and main memory?

On a cache miss, do we simply update the lower-level memories (L2 cache, main memory), or bring the line into the cache and then update only the cache, or both the cache and the lower-level memories?

Write-through and write-back


Reading vs Writing

On a miss, do we bring the data to cache and then modify (allocate) or only modify in memory (no-allocate)?

Look at the choices

Write-through, no allocate. Hit: write to cache, write through to memory. Miss: write to memory (no allocate).

Write-through, allocate. Hit: write to cache, write through to memory. Miss: load cache (allocate), write to cache, write through to memory.

Write-back, no allocate. Hit: write only to cache, set dirty bit. Miss: write to memory (no allocate).

Write-back, allocate. Hit: write only to cache, set dirty bit. Miss: load cache (allocate), write only to cache, set dirty bit.

Processors listed: Intel i486, Pentium, Motorola MC 68040, Motorola MC, Motorola MC, Power PC 603
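
To compare the four combinations, one can count how many writes reach memory for a short sequence of stores. The sketch below is a deliberately simplified model with several assumptions not taken from the slides: a 4-line direct-mapped cache, a trace of block numbers containing only writes, dirty lines flushed at the end, and a hypothetical function name `count_memory_writes`.

```python
def count_memory_writes(block_writes, write_back, allocate, num_lines=4):
    """Count writes that reach memory for a trace of written block numbers."""
    cache = {}        # line index -> (block number, dirty flag)
    mem_writes = 0
    for block in block_writes:
        index = block % num_lines
        resident = cache.get(index)
        hit = resident is not None and resident[0] == block
        if not hit:
            if not allocate:
                mem_writes += 1                 # miss, no allocate: write memory only
                continue
            if resident is not None and resident[1]:
                mem_writes += 1                 # write the dirty victim back first
            cache[index] = (block, False)       # load the line (allocate)
        if write_back:
            cache[index] = (block, True)        # write only the cache, set dirty bit
        else:
            mem_writes += 1                     # write through to memory
    # Final flush so that write-back is charged for lines that are still dirty.
    mem_writes += sum(1 for _, dirty in cache.values() if dirty)
    return mem_writes


writes = [0, 0, 0, 4, 0]   # blocks 0 and 4 share a line in this 4-line cache
print(count_memory_writes(writes, write_back=False, allocate=False))  # 5
print(count_memory_writes(writes, write_back=False, allocate=True))   # 5
print(count_memory_writes(writes, write_back=True, allocate=False))   # 5
print(count_memory_writes(writes, write_back=True, allocate=True))    # 3
```

In this toy trace only write-back with allocate reduces memory traffic, because repeated stores to block 0 are absorbed by the cache until the line is evicted.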


Write-through using write buffers

Cache is updated immediately and the new value is placed in a buffer, awaiting write to Memory (or L2 cache)

Write buffer entry: Address = 1000, Data to be written = value

Cache performance evaluation:

Cache miss rate

Miss Penalty

Memory Access time = (time on hit) + (miss rate) * (miss penalty)
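
The access-time formula is easy to evaluate for a given design point. In this small sketch the function name and the sample numbers (1-cycle hit, 5% miss rate, 100-cycle penalty) are made up for illustration and do not come from the slides.

```python
def memory_access_time(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Memory access time = (time on hit) + (miss rate) * (miss penalty)."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
print(memory_access_time(hit_time=1, miss_rate=0.05, miss_penalty=100))
```

With these assumed values the average comes to about 6 cycles, so even a small miss rate dominates the hit time when the penalty is large.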
