Central Processing Unit - Computer Design and Organisation - Lecture Slides, Slides of Computer Science

These are the Lecture Slides of Computer Design and Organisation, which include Brick and Mortar, Silicon Manufacturing, Cost of Production, Mortar Chips, Fixed Set of Functions, Standard Interface, Benefits of Brick and Mortar, Chip Design etc. Key points are: Central Processing Unit, Memory Hierarchy, Multiprocessors, Multithreading, Technological Improvements, Moore’s Law, Intel Microprocessor Speeds, Power Dissipation, Processor-Memory Performance Gap

Typology: Slides

2012/2013

Uploaded on 03/22/2013

dhirendra

Computer Design and Organization

• Architecture = Design + Organization + Performance
• Topics in this class:
  • Central processing unit: deeply pipelined, multiple instr. per cycle, exploitation of instruction level parallelism (in-order and out-of-order), support for speculation (branch prediction, spec. loads).
  • Memory hierarchy: multi-level cache hierarchy, includes hardware and software assists for enhanced performance
  • Multiprocessors: SMPs and CMPs; cache coherence and synchronization
  • Multithreading: Fine, coarse and SMT
  • Some “advanced” topic: current research in dept.

Technological improvements

  • CPU:
    • Annual rate of speed improvement was 35% before 1985 and 60% from 1985 until 2003
    • Slightly faster than the increase in the number of transistors on-chip (Moore’s law)
  • Memory:
    • Annual rate of speed improvement (decrease in latency) is < 10%
    • Density quadruples every 3 years
  • I/O:
    • Access time has improved by 30% in 10 years
    • Density improves by 50% every year
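
To put the rates above on a common footing, the following minimal Python sketch converts "quadruples in 3 years" into an equivalent annual rate and compounds the CPU and memory rates over a decade. The function names are illustrative, and the 10% memory figure is taken as the upper bound quoted above (the slide only says "< 10%"), so the resulting gap is a lower estimate.

```python
def annual_rate_from_multiple(multiple: float, years: float) -> float:
    """Equivalent compound annual growth rate for a total growth `multiple` over `years`."""
    return multiple ** (1.0 / years) - 1.0


def compound(rate: float, years: float) -> float:
    """Total growth factor after `years` at a constant annual `rate`."""
    return (1.0 + rate) ** years


if __name__ == "__main__":
    # Memory density: x4 in 3 years is roughly 59% per year.
    print(f"memory density: {annual_rate_from_multiple(4, 3):.1%} per year")

    # CPU speed at 60%/year versus memory latency at an assumed 10%/year.
    gap = compound(0.60, 10) / compound(0.10, 10)
    print(f"CPU/memory speed gap after 10 years: about {gap:.0f}x")
```

Even with the optimistic 10% memory figure, the ratio grows by well over an order of magnitude in ten years, which is the processor-memory performance gap mentioned in the topic list.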

Evolution of Intel Microprocessor Speeds

[Figure: Intel microprocessor clock speed in MHz (vertical axis, 0 to 4000) by year (horizontal axis: 1971, 1974, 1979, 1982, 1985, 1989, 1993, 1997, 1998, 1999, 2000, 2001, 2002, 2003).]

Power Dissipation

Performance evaluation basics

• Performance is inversely proportional to execution time

• Elapsed time includes:
  • user + system; I/O; memory accesses; CPU per se

• CPU execution time (for a given program) depends on 3 factors:
  • Number of instructions executed
  • Clock cycle time (or rate)
  • CPI: number of cycles per instruction (or its inverse, IPC)

CPU execution time = Instruction count × CPI × clock cycle time
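
As a worked instance of the formula above, here is a small Python sketch. The function name and the sample numbers (one billion instructions, CPI of 1.5, 2 GHz clock) are made up for illustration and do not come from the slides.

```python
def cpu_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU execution time in seconds.

    Instruction count x CPI x clock cycle time, where the clock cycle
    time is the inverse of the clock rate.
    """
    clock_cycle_time = 1.0 / clock_rate_hz
    return instruction_count * cpi * clock_cycle_time


if __name__ == "__main__":
    # Hypothetical program: 1e9 instructions, CPI = 1.5, 2 GHz clock.
    print(f"{cpu_time(10**9, 1.5, 2e9):.2f} s")
```

Halving any one of the three factors halves the execution time, which is why the rest of the course looks at each of them separately.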

Components of the CPI

• CPI for single instruction issue with ideal pipeline = 1

• Previous formula can be expanded to take into account classes of instructions
  • For example in RISC machines: branches, f.p., load-store.
  • For example in CISC machines: string instructions

CPI = Σ_i (CPI_i × f_i), where f_i is the frequency of instructions in class i

• We’ll talk about “contributions to the CPI” from, e.g.:
  • memory hierarchy
  • branch (misprediction)
  • hazards etc.
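
The weighted sum over instruction classes can be sketched in Python as follows. The instruction mix and per-class CPI values are invented for illustration; only the formula itself comes from the slide.

```python
def average_cpi(mix: dict) -> float:
    """Average CPI as the sum over classes of (frequency x per-class CPI).

    `mix` maps a class name to a (frequency, cpi) pair; frequencies must sum to 1.
    """
    total_frequency = sum(freq for freq, _ in mix.values())
    if abs(total_frequency - 1.0) > 1e-9:
        raise ValueError("class frequencies must sum to 1")
    return sum(freq * cpi for freq, cpi in mix.values())


# Hypothetical RISC-style mix: (frequency, CPI of that class).
EXAMPLE_MIX = {
    "alu": (0.50, 1.0),
    "load-store": (0.30, 2.0),
    "branch": (0.15, 3.0),
    "f.p.": (0.05, 4.0),
}

if __name__ == "__main__":
    print(f"average CPI = {average_cpi(EXAMPLE_MIX):.2f}")
```

In this made-up mix, load-store instructions contribute 0.6 cycles per instruction, more than the ALU class, even though they are less frequent.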

Computer design: Make the common case fast

  • Amdahl’s law (speedup)

Speedup = (performance with enhancement) / (performance base case)

or equivalently

Speedup = (exec. time base case) / (exec. time with enhancement)

  • Application to parallel processing
    • s: fraction of the program that is sequential
    • Speedup S is at most 1/s
    • That is, if 20% of your program is sequential, the speedup is at most 5, even with an infinite number of processors
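
The bound of 1/s is the limit of the usual finite-processor form of Amdahl's law, S(p) = 1 / (s + (1 - s)/p), which the slide does not spell out. A minimal Python sketch, assuming the parallel part scales perfectly across p processors:

```python
def amdahl_speedup(sequential_fraction: float, processors: int) -> float:
    """Speedup with `processors` processors when a fraction of the work is sequential.

    Assumes the remaining (1 - s) of the work is split evenly with no overhead.
    """
    s = sequential_fraction
    return 1.0 / (s + (1.0 - s) / processors)


def max_speedup(sequential_fraction: float) -> float:
    """Upper bound on the speedup as the number of processors grows without limit."""
    return 1.0 / sequential_fraction


if __name__ == "__main__":
    for p in (2, 4, 16, 1024):
        print(f"p = {p:5d}: speedup = {amdahl_speedup(0.2, p):.2f}")
    print(f"limit: {max_speedup(0.2):.1f}")
```

With s = 0.2, four processors already give only 2.5, half of the limit of 5.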

Pipelining

  • One instruction/result every cycle (ideal)
    • Not achieved in practice because of hazards
  • Increases throughput (with respect to a non-pipelined implementation)
    • Throughput = number of results/second
  • Speed-up (over a non-pipelined implementation)
    • In the ideal case, with n stages, the speed-up will be close to n. n can’t be made too large because of physical limitations, load balancing between stages, and hazards
  • Might slightly increase the latency of individual instructions (pipeline overhead)
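
Why the ideal speed-up is "close to n" rather than exactly n can be seen with a simple cycle count. The sketch below is in Python and assumes perfectly balanced stages of one cycle each, no hazards, and no pipeline-register overhead; the function name is illustrative.

```python
def pipeline_speedup(n_stages: int, n_instructions: int) -> float:
    """Ideal speed-up of an n-stage pipeline over a non-pipelined implementation.

    Non-pipelined: each instruction takes n_stages cycles, one after another.
    Pipelined: the first instruction takes n_stages cycles to fill the pipe,
    then one instruction completes per cycle.
    """
    unpipelined_cycles = n_stages * n_instructions
    pipelined_cycles = n_stages + (n_instructions - 1)
    return unpipelined_cycles / pipelined_cycles


if __name__ == "__main__":
    for count in (1, 10, 1000):
        print(f"{count:5d} instructions: speed-up = {pipeline_speedup(5, count):.2f}")
```

The speed-up approaches the number of stages only for long instruction streams; the fill time of the pipe is never recovered.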

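[Figure: five-stage pipelined datapath with stages IF, ID/RR, EXE, Mem, WB, separated by the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB. Labeled components: PC, Inst. mem., Regs., three ALUs (one with a zero output), sign extension, Data mem.; the pipeline registers carry (PC), (Rd), data and control. Five instructions in progress; one of each color.]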

Hazards

• Structural hazards
  – Resource conflict (mostly in multiple instruction issue machines; also for resources which are used for more than one cycle)

• Data dependencies
  – Most common is RAW, but also WAR and WAW in out-of-order (OOO) execution

• Control hazards
  – Branches and other flow-of-control disruptions

• Consequence: stalls in the pipeline
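
The three kinds of data dependency can be told apart by comparing the destination and source registers of two instructions in program order. The Python sketch below is illustrative only; the `Instr` type and `dependencies` function are hypothetical names, not part of the course material.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read


def dependencies(first: Instr, second: Instr) -> List[str]:
    """Classify the data dependencies of `second` on `first` (program order)."""
    kinds = []
    if first.dest in second.srcs:
        kinds.append("RAW")  # second reads what first writes
    if second.dest in first.srcs:
        kinds.append("WAR")  # second overwrites what first reads
    if first.dest == second.dest:
        kinds.append("WAW")  # both write the same register
    return kinds


if __name__ == "__main__":
    add = Instr("add", "R1", ("R2", "R3"))
    sub = Instr("sub", "R4", ("R1", "R2"))
    print(dependencies(add, sub))
```

In an in-order single pipeline only the RAW case can cause a stall; WAR and WAW matter once instructions can complete out of order.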

Example of structural hazard

• For a single issue machine: common data and instruction memory (unified cache)
  – Pipeline stall on every load-store instruction (control easy to implement)

• Better solutions
  – Separate I-cache and D-cache
  – Instruction buffers
  – Both + sophisticated instruction fetch unit!

• Will see more cases in multiple issue machines
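
The cost of stalling on every load-store can be expressed as a contribution to the CPI. The Python sketch below assumes a one-cycle stall per load-store and an ideal base CPI of 1; the 30% load-store frequency in the example is an invented figure, not one from the slides.

```python
def cpi_with_unified_memory(load_store_fraction: float,
                            base_cpi: float = 1.0,
                            stall_cycles: int = 1) -> float:
    """CPI when each load-store stalls instruction fetch for `stall_cycles` cycles.

    Models a single memory port shared by instruction fetch and data access.
    """
    return base_cpi + load_store_fraction * stall_cycles


if __name__ == "__main__":
    # Hypothetical mix in which 30% of instructions are loads or stores.
    print(f"CPI = {cpi_with_unified_memory(0.30):.2f}")
```

Separate I-cache and D-cache remove this term entirely, which is why they are listed as the better solution.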

Data hazards

• Data dependencies between instructions that are in the pipe at the same time.

• For a single pipeline with in-order issue: Read After Write (RAW) hazard

    Add R1, R2, R3   # R1 is result register
    Sub R4, R1, R2   # conflict with R1
    Add R3, R5, R1   # conflict with R1
    Or  R6, R1, R2   # conflict with R1
    Add R5, R2, R1   # R1 OK now (5 stage pipe)
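
The pattern in the listing (three conflicting instructions, the fourth safe) follows from stage timing: the producer writes R1 in WB, its fifth cycle, while a consumer issued d instructions later reads its operands in ID, which is cycle d + 2 on the producer's clock. The read comes too early whenever d + 2 <= 5, that is, for d <= 3. The Python sketch below encodes this as a hazard window, assuming no forwarding and a register file in which a write is not visible to a read in the same cycle; the names are illustrative.

```python
# (opcode, destination, sources), as in the listing above.
PROGRAM = [
    ("add", "R1", ("R2", "R3")),
    ("sub", "R4", ("R1", "R2")),
    ("add", "R3", ("R5", "R1")),
    ("or",  "R6", ("R1", "R2")),
    ("add", "R5", ("R2", "R1")),
]


def raw_conflicts(program, producer_index, hazard_window=3):
    """Indices of later instructions that read the producer's result too early.

    `hazard_window` is the number of following instructions whose ID stage
    comes before the producer's write-back is visible (3 under the
    assumptions stated in the text above).
    """
    dest = program[producer_index][1]
    conflicts = []
    last = min(len(program), producer_index + 1 + hazard_window)
    for j in range(producer_index + 1, last):
        _, other_dest, sources = program[j]
        if dest in sources:
            conflicts.append(j)
        if other_dest == dest:
            break  # register redefined; later readers depend on instruction j
    return conflicts


if __name__ == "__main__":
    print(raw_conflicts(PROGRAM, 0))
```

Passing `hazard_window=2` models a register file that is written in the first half of the cycle and read in the second half, in which case the `Or` instruction no longer conflicts.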

Forwarding

• Result of ALU operation is known at the end of the EXE stage

• Forwarding between:
  – EXE/MEM pipeline register to ALU input for instructions i and i+1
  – MEM/WB pipeline register to ALU input for instructions i and i+2
    • Note that if the same register has to be forwarded, forward the last one to be written
  – Forwarding through the register file (write 1st half of cycle, read 2nd half of cycle)

• Needs a “forwarding box” in the Control Unit
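
The selection logic of such a forwarding box can be sketched as a priority check. The Python below is a simplified illustration, not the course's actual design: registers are plain integers, the function and parameter names are hypothetical, and special cases such as a hardwired zero register are ignored.

```python
from typing import Optional


def forward_select(src_reg: int,
                   ex_mem_rd: Optional[int], ex_mem_writes: bool,
                   mem_wb_rd: Optional[int], mem_wb_writes: bool) -> str:
    """Choose where an ALU input for source register `src_reg` should come from.

    EX/MEM is checked first because it holds the result of the more recent
    instruction, i.e. the last one to write the register.
    """
    if ex_mem_writes and ex_mem_rd == src_reg:
        return "EX/MEM"   # producer is the immediately preceding instruction
    if mem_wb_writes and mem_wb_rd == src_reg:
        return "MEM/WB"   # producer is two instructions back
    return "REGFILE"      # no hazard: value read from the register file


if __name__ == "__main__":
    # Both pipeline registers hold a pending write to register 1:
    # the younger result in EX/MEM wins.
    print(forward_select(1, ex_mem_rd=1, ex_mem_writes=True,
                         mem_wb_rd=1, mem_wb_writes=True))
```

The order of the two checks is what implements the note about forwarding the last value written.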

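[Figure: pipeline timing diagram (stages IF, ID, EXE, MEM, WB) for the sequence below, marking where R1 becomes available and where it is needed.]

    Add R1, R2, R3   # R1 available here
    Sub R4, R1, R2   # R1 needed here
    Add R3, R5, R1
    Or  R6, R1, R2
    Add R5, R1, R2   # OK w/o forwarding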