Computer Architecture Notes, Lecture notes of Computer Architecture and Organization

Slides for the Hennessy book on computer architecture, 5th edition.

Typology: Lecture notes

2016/2017

Uploaded on 02/28/2017

Sadman.Ahmed
CSCE 4610 Feb 02, 2017 1
CSCE 4610: Computer Architecture
Review
Instruction sets
Function call, register windows
Dataflow
HW #2 (Due Feb 7) A4, A7 (a), A18
A7(b) is bonus
Dinero and trace driven simulations
So, each instruction and data item (variable) will have an address
Instruction addresses are in the program counter
Data addresses are generated using Load and Store instructions
These addresses are “virtual” addresses, which will be mapped to physical addresses during execution
Each address is either 32 or 64 bits and refers to a byte
For a 4-byte data type (integer), your byte address will look like
xxxxx.........xxxx 00
That is, addresses will be on a 4-byte boundary
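The "low two bits are 00" rule for 4-byte data can be checked with a bit mask. The following is a minimal Python sketch; the function names `is_aligned` and `align_down` are illustrative and do not come from the slides.

```python
def is_aligned(address: int, size: int = 4) -> bool:
    """Return True if a byte address falls on a `size`-byte boundary.

    `size` must be a power of two. For size 4 this checks that the
    two low-order address bits are 00.
    """
    return address & (size - 1) == 0


def align_down(address: int, size: int = 4) -> int:
    """Clear the low-order bits to get the enclosing aligned address."""
    return address & ~(size - 1)
```

For example, `is_aligned(0x1000)` is true, `is_aligned(0x1002)` is false, and `align_down(0x1003)` gives `0x1000`.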
Dynamic data relates to malloc, new, etc
Text segment = your code
All these addresses are “virtual”
At runtime, physical pages are allocated for the virtual pages
So we need to understand the difference between a virtual and a physical address
Addresses generated by your program (either an instruction address, or an address generated by a Load or Store) are virtual addresses
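To make the virtual/physical distinction concrete, here is a minimal Python sketch of page-based translation. The 4 KiB page size, the `page_table` contents, and the function name `translate` are all assumptions made for illustration; the slides do not specify them.

```python
PAGE_SIZE = 4096   # assumed page size (4 KiB); not stated in the notes
OFFSET_BITS = 12   # log2(PAGE_SIZE)

# Hypothetical page table: virtual page number -> physical page number.
page_table = {0x00400: 0x1A2B3, 0x10010: 0x00042}


def translate(virtual_address: int) -> int:
    """Map a virtual byte address to a physical byte address."""
    vpn = virtual_address >> OFFSET_BITS          # virtual page number
    offset = virtual_address & (PAGE_SIZE - 1)    # unchanged by translation
    if vpn not in page_table:
        # No physical page has been allocated for this virtual page yet.
        raise KeyError(f"page fault: virtual page {vpn:#x} is not mapped")
    return (page_table[vpn] << OFFSET_BITS) | offset
```

Only the page number is replaced; the offset within the page is the same in the virtual and the physical address.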


Typical Memory Hierarchy (page 72)

Figure 1: Latency of different technologies in memory hierarchy. Numbers accurate within a factor of two. [Axis: typical access latency in processor cycles for a 4 GHz processor, from 2^1 to 2^23. Labels: L1 cache, SRAM, last level cache, EDRAM, DRAM, PCM, Flash, hard drive; main memory system; high performance disk system.]

There is active research on PCM, and several PCM prototypes have been proposed, each optimizing for some important device characteristics (such as density, latency, bandwidth, or lifetime). While the PCM technology matures, and becomes ready to be used as a complement to DRAM, we believe that system architecture solutions can be explored to make these memories part of the main memory to improve system performance. The objective of this paper is to study the design trade-offs in integrating the most promising emerging memory technology, PCM, into the main memory system.

To be independent of the choice of a specific PCM prototype, we use an abstract memory model that is D times denser than DRAM and S times slower than DRAM. We show that for currently projected values of PCM (S ≈ 4, D ≈ 4), a main memory system using PCM can reduce page faults by 5X, and hence execute applications with much larger working sets. However, because PCM is slower than DRAM, main memory access time is likely to increase linearly with S, which increases the overall execution time. Therefore, we believe that PCM is unlikely to be a drop-in replacement for DRAM. We show that by having a small DRAM buffer in front of the PCM memory, we can make the effective access time and performance closer to a DRAM memory. We study the design issues in such a hybrid memory architecture and show how a two-level memory system can be managed. Our evaluations for a baseline system of 16-cores with 8GB DRAM show that PCM-based hybrid memory can provide a speedup of 3X while incurring only 13% area overhead. The speedup is within 10% of an expensive DRAM only system which would incur 4X

2. BACKGROUND AND MOTIVATION

With increasing number of processors in the computer system, the pressure on the memory system to satisfy the demand of all concurrently executing applications (threads) has increased as well. Furthermore, critical computing applications are becoming more data-centric than compute-centric [9]. One of the major challenges in the design of large-scale, high-performance computer systems is maintaining the performance growth rate of the system memory. Typically, the disk is five orders of magnitude slower than the rest of the system [6] making frequent misses in system main memory a major bottleneck to system performance. Furthermore, main memory consisting entirely of DRAM is already hitting the power and cost limits [15]. Exploiting emerging memory technologies, such as Phase-Change Memory (PCM) and Flash, become crucial to be able to build larger capacity memory systems in the future while remaining within the overall system cost and power budgets. In this section, we first present a brief description of the Phase-Change Memory technology, and highlight the strengths of PCM that makes it a promising candidate for main memory of high-performance servers. We present a simple model that is useful in describing such emerging memory technologies for use in computer architecture studies.

2.1 What is Phase-Change Memory?

PCM is a type of non-volatile memory that exploits the property of chalcogenide glass to switch between two states, amorphous and crystalline, with the application of heat using electrical pulses.

New technologies may impact the hierarchies
Our goal is to keep needed data and instructions in higher level memories (like L1 cache) so that the average memory access time is reduced
3D DRAMs
How to find the average number of cycles (CPI) to access memory for Load and Store?
Average memory access cycles


Consider an example with 64-byte blocks (lines)
A 64 KB cache then has 1024 cache lines (each with 64 bytes)
A straightforward address translation splits a 32-bit address into
16-bit block ID (or tag) | 10-bit block address | 6-bit byte address
The cache holds the tags of the data currently in the cache, and the data itself (64 bytes per line)
Why do we need tag bits?
Caches of this type are called DIRECT mapped caches
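The 16/10/6 split can be written out with shifts and masks. This is a minimal Python sketch assuming 32-bit addresses; the constant and function names are illustrative, not from the slides.

```python
OFFSET_BITS = 6    # 64-byte lines -> 6-bit byte address
INDEX_BITS = 10    # 1024 lines    -> 10-bit block address
# The remaining 16 bits of a 32-bit address form the tag.


def split_direct_mapped(address: int) -> tuple:
    """Split a 32-bit address into (tag, line, byte offset)."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    line = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, line, offset
```

For instance, `split_direct_mapped(0xB01AA050)` and `split_direct_mapped(0xB01BA050)` both select line 641 and byte 16, and differ only in the tag (`0xB01A` versus `0xB01B`). The stored tag is what lets the cache tell which of the two blocks currently occupies that line.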


A given address can be placed in only one specific cache block
All addresses with the same 10-bit block address conflict for the same cache block; that is, 2^16 different blocks (one per tag value) map to the same cache line
Can we say that a given address can be placed in one of 2 locations?
We pair cache blocks and call them sets (two blocks per set)
Then we map an address to a set so that it can be in either of the blocks of the set
Split the 1024 lines into two groups of 512; one line from each group forms a set
There are 512 sets (each with two lines)
We need 9 bits to select a set
17-bit block ID (or tag) | 9-bit set address | 6-bit byte address


An example: consider the following sequence of addresses (all are hex numbers)
B01AA050; B01AA073; B2FE3057; B4FFD85F; B01BA04E
Let us see where they fall in a 1-way set associative (direct mapped) cache with 64-byte cache blocks and a 64 KB capacity (1024 lines)
B01AA050 = (1011 0000 0001 1010) 1010000001 010000   Line 641, byte 16 (cold miss)
B01AA073 = (1011 0000 0001 1010) 1010000001 110011   Line 641, byte 51 (hit)
B2FE3057 = (1011 0010 1111 1110) 0011000001 010111   Line 193, byte 23 (cold miss)
B4FFD85F = (1011 0100 1111 1111) 1101100001 011111   Line 865, byte 31 (cold miss)
B01BA04E = (1011 0000 0001 1011) 1010000001 001110   Line 641, byte 14 (miss; conflicts with the B01A block already in line 641 and evicts it)
2-way set associative cache (512 sets)
B01AA050 = (1011 0000 0001 1010 1) 010000001 010000   Set 129, byte 16 (cold miss)
B01AA073 = (1011 0000 0001 1010 1) 010000001 110011   Set 129, byte 51 (hit)
B2FE3057 = (1011 0010 1111 1110 0) 011000001 010111   Set 193, byte 23 (cold miss)
B4FFD85F = (1011 0100 1111 1111 1) 101100001 011111   Set 353, byte 31 (cold miss)
B01BA04E = (1011 0000 0001 1011 1) 010000001 001110   Set 129, byte 14 (miss, but no conflict: it fills the other block of set 129)
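The worked example can be replayed with a small trace-driven simulator. The sketch below is illustrative only: it assumes LRU replacement within a set (the slides do not name a replacement policy), and the names `TRACE` and `simulate` are made up here. A miss is labelled "cold miss" the first time a block is touched and "non-cold miss" otherwise.

```python
# The five addresses from the worked example (last one has tag B01B).
TRACE = [0xB01AA050, 0xB01AA073, 0xB2FE3057, 0xB4FFD85F, 0xB01BA04E]


def simulate(addresses, num_lines=1024, ways=1, line_bytes=64):
    """Replay a trace; return a list of (set index, byte offset, outcome)."""
    num_sets = num_lines // ways
    sets = [[] for _ in range(num_sets)]  # tags per set, least recently used first
    seen_blocks = set()                   # blocks referenced at least once
    results = []
    for address in addresses:
        block = address // line_bytes
        index = block % num_sets
        tag = block // num_sets
        resident = sets[index]
        if tag in resident:
            resident.remove(tag)          # move to most-recently-used position
            resident.append(tag)
            outcome = "hit"
        else:
            outcome = "non-cold miss" if block in seen_blocks else "cold miss"
            if len(resident) == ways:
                resident.pop(0)           # evict the least recently used tag
            resident.append(tag)
        seen_blocks.add(block)
        results.append((index, address % line_bytes, outcome))
    return results
```

Appending one more reference to B01AA050 (an extra access, not part of the slide's trace) shows the effect of the conflict: `simulate(TRACE + [0xB01AA050], ways=1)` ends with a non-cold miss in line 641 because B01BA04E evicted the block, while `simulate(TRACE + [0xB01AA050], ways=2)` ends with a hit in set 129 because both blocks fit in the set.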


We can reduce cache conflicts by increasing the set size (set associativity)
For example, consider a 4-way set associative cache (each set has 4 cache lines)
One block from each group forms a set
There are 256 sets and we need 8 bits to select a set
18-bit block ID (or tag) | 8-bit set address | 6-bit byte address
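The tag, set, and byte field widths for any associativity follow from the cache size and line size. The following Python sketch assumes 32-bit addresses and power-of-two sizes; `field_widths` is an illustrative name.

```python
def field_widths(cache_bytes=64 * 1024, line_bytes=64, ways=1, address_bits=32):
    """Return (tag bits, set-index bits, byte-offset bits).

    All sizes are assumed to be powers of two.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1   # log2(line_bytes)
    index_bits = num_sets.bit_length() - 1      # log2(num_sets)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits
```

With the 64 KB cache and 64-byte lines used in these notes, `ways=1` gives (16, 10, 6), `ways=2` gives (17, 9, 6), and `ways=4` gives (18, 8, 6). Each doubling of associativity moves one bit from the set address into the tag.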


Table from B-24
Observations
Larger caches: overall miss rate decreases, with fewer capacity misses
Higher associativity: fewer conflict misses, so the capacity miss contribution increases
Cold misses do not change; their relative contribution increases with higher associativity and capacity


Reading vs Writing
To read: locate the cache line and read the line (and use the needed byte)
If not in cache, “cache miss”
You can fetch the data and the tag together and ignore the data if the tag does not match
To write: we need to fetch the tag, check for a hit, and only then write the data
Should we update lower level caches and main memory also?
On a cache miss, do we simply update the lower level memories (L2 cache, main memory), or bring the line into the cache and then update only the cache, or both the cache and the lower level memories?
Write-through and write-back


Reading vs Writing
On a miss, do we bring the data into the cache and then modify it (allocate), or only modify it in memory (no-allocate)?
Look at the choices
Write-through, no allocate
Hit: write to cache, write through to memory
Miss: write to memory (no allocate)
Write-through, allocate
Hit: write to cache, write through to memory
Miss: load cache (allocate), write to cache, write through to memory
Write-back, no allocate
Hit: write only to cache, set dirty bit
Miss: write to memory (no allocate)
Write-back, allocate
Hit: write only to cache, set dirty bit
Miss: load cache (allocate), write only to cache, set dirty bit
Processors using these policies include the Intel i486, Pentium, Motorola MC 68040, and Power PC 603
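The four hit/miss combinations can be expressed as one write routine with two switches. This is a simplified Python sketch under several assumptions: the cache is a dictionary from block number to a line record, memory is a dictionary from byte address to value, evictions and dirty-line write-backs are not modelled, and the name `write` and its parameters are illustrative.

```python
def write(cache, memory, address, value,
          policy="write-through", allocate=False, line_bytes=64):
    """Apply one write under the chosen hit policy and miss policy."""
    block = address // line_bytes
    hit = block in cache
    if not hit and allocate:
        # Miss with allocate: load the line from memory, then proceed as a hit.
        base = block * line_bytes
        cache[block] = {
            "data": {a: memory.get(a, 0) for a in range(base, base + line_bytes)},
            "dirty": False,
        }
        hit = True
    if hit:
        cache[block]["data"][address] = value
        if policy == "write-back":
            cache[block]["dirty"] = True   # memory is updated later, on eviction
            return
    # Write-through hit, or any miss without allocation: update memory now.
    memory[address] = value
```

With `policy="write-back"` and `allocate=True`, a write to an uncached address leaves memory unchanged and marks the newly loaded line dirty; with `policy="write-through"` and `allocate=False`, the same write goes straight to memory and the cache stays empty.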


Write-through using write buffers
Cache is updated immediately and the new value is placed in a buffer, awaiting the write to memory (or L2 cache)
Write buffer entry: address = 1000, data to be written = value
Cache performance evaluation:
Cache miss rate
Miss penalty
Average memory access time = (time on hit) + (miss rate) * (miss penalty)
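The access-time formula translates directly into code. This is a minimal Python sketch; the function name and the sample numbers in the note below are hypothetical and are not measurements from the course.

```python
def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """(time on hit) + (miss rate) * (miss penalty), all times in cycles."""
    return hit_time + miss_rate * miss_penalty
```

As an illustration, a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty give an average of about 6 cycles per access; halving the miss rate to 2.5% brings it down to about 3.5 cycles.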