Central Processing Unit - Computer Design and Organisation - Lecture Slides, Slides of Computer Science

Aligarh Muslim University Computer Science

These are the Lecture Slides of Computer Design and Organisation which includes Brick and Mortar, Silicon Manufacturing, Cost of Production, Mortar Chips, Fixed Set of Functions, Standard Interface, Benefits of Brick and Mortar, Chip Design etc. Key important points are: Central Processing Unit, Memory Hierarchy, Multiprocessors, Multithreading, Technological Improvements, Moore’s Law, Intel Microprocessor Speeds, Power Dissipation, Processor-Memory Performance Gap

Typology: Slides

2012/2013

Uploaded on 03/22/2013

dhirendra 🇮🇳


Computer Design and Organization

• Architecture = Design + Organization + Performance

• Topics in this class:

– Central processing unit: deeply pipelined, multiple instructions per cycle, exploitation of instruction-level parallelism (in-order and out-of-order), support for speculation (branch prediction, speculative loads)

– Memory hierarchy: multi-level cache hierarchy, including hardware and software assists for enhanced performance

– Multiprocessors: SMPs and CMPs; cache coherence and synchronization

– Multithreading: fine-grained, coarse-grained, and SMT

– Some “advanced” topics: current research in the department

Technological improvements

CPU:
- Annual rate of speed improvement was 35% before 1985 and 60% from 1985 until 2003
- Slightly faster than the increase in the number of transistors on-chip (Moore’s law)
Memory:
- Annual rate of speed improvement (decrease in latency) is < 10%
- Density quadruples every 3 years
I/O:
- Access time has improved by 30% in 10 years
- Density improves by 50% every year
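
To see how these annual rates compound, here is a minimal Python sketch. The function names are hypothetical, and the 10% memory figure is used as an upper bound because the slide only says "< 10%"; treat the result as a rough illustration of the processor-memory gap, not as measured data.

```python
def cumulative_factor(annual_rate: float, years: int) -> float:
    """Total improvement factor after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


def equivalent_annual_rate(factor: float, years: float) -> float:
    """Annual rate that yields `factor` total improvement over `years` years."""
    return factor ** (1.0 / years) - 1.0


# CPU speed: 60% per year from 1985 until 2003 (rate taken from the slide).
cpu_1985_2003 = cumulative_factor(0.60, 2003 - 1985)

# Memory latency: at most 10% per year (upper bound; the slide says "< 10%").
mem_1985_2003 = cumulative_factor(0.10, 2003 - 1985)

# Ratio of the two: a lower bound on how far the CPU pulled ahead of memory.
gap = cpu_1985_2003 / mem_1985_2003

# "Density quadruples every 3 years" expressed as an annual rate.
dram_density_rate = equivalent_annual_rate(4.0, 3)

print(f"CPU factor: {cpu_1985_2003:.0f}x, memory factor: {mem_1985_2003:.1f}x")
print(f"Gap (at least): {gap:.0f}x, density growth per year: {dram_density_rate:.1%}")
```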

Evolution of Intel Microprocessor Speeds

[Figure: Intel microprocessor clock speed by year. x-axis: Year (1971 to 2003); y-axis: Speed (MHz), 500 to 4000.]

Power Dissipation

Performance evaluation basics

• Performance is inversely proportional to execution time

• Elapsed time includes:

– user + system; I/O; memory accesses; CPU per se

• CPU execution time (for a given program) depends on 3 factors:

– Number of instructions executed
– Clock cycle time (or rate)
– CPI: number of cycles per instruction (or its inverse, IPC)

CPU execution time = Instruction count * CPI * clock cycle time
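
The formula above can be checked with a short Python sketch. The function name and the sample numbers (10^9 instructions, CPI of 1.5, 2 GHz clock) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU execution time = instruction count * CPI * clock cycle time.

    The clock cycle time is the inverse of the clock rate, so we divide by the rate.
    """
    clock_cycle_time = 1.0 / clock_rate_hz
    return instruction_count * cpi * clock_cycle_time


# Hypothetical program: 1e9 instructions, CPI = 1.5, 2 GHz clock.
t = cpu_time_seconds(1e9, 1.5, 2e9)
print(f"CPU time: {t:.2f} s")  # 1e9 * 1.5 / 2e9 = 0.75 s
```

Halving the CPI or doubling the clock rate each halve the execution time, which is why the three factors are treated as separate design levers.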

Components of the CPI

• CPI for single instruction issue with an ideal pipeline = 1

• The previous formula can be expanded to take into account classes of instructions

– For example, in RISC machines: branches, f.p., load-store
– For example, in CISC machines: string instructions

$\mathrm{CPI} = \sum_i \mathrm{CPI}_i \times f_i$, where $f_i$ is the frequency of instructions in class $i$

• We’ll talk about “contributions to the CPI” from, e.g.:

– memory hierarchy
– branch (misprediction)
– hazards, etc.
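
As a small illustration of the per-class CPI sum, the Python sketch below computes a weighted CPI. The instruction mix and per-class CPI values are invented for the example and do not come from the slides.

```python
from typing import Dict, Tuple


def weighted_cpi(classes: Dict[str, Tuple[float, float]]) -> float:
    """Return sum over classes of f_i * CPI_i.

    `classes` maps a class name to (frequency f_i, CPI_i).
    Frequencies are expected to sum to 1.
    """
    total_freq = sum(f for f, _ in classes.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(f * cpi for f, cpi in classes.values())


# Hypothetical RISC-style mix: (frequency, CPI) per class.
mix = {
    "alu": (0.5, 1.0),
    "load_store": (0.3, 2.0),
    "branch": (0.2, 1.5),
}
print(weighted_cpi(mix))  # 0.5*1.0 + 0.3*2.0 + 0.2*1.5
```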

Computer design: Make the common case fast

Amdahl’s law (speedup)

Speedup = (performance with enhancement) / (performance base case)

Or equivalently:

Speedup = (exec. time base case) / (exec. time with enhancement)

Application to parallel processing
- s: fraction of the program that is sequential
- Speedup S is at most 1/s
- That is, if 20% of your program is sequential, the speedup is at most 5, even with an infinite number of processors
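
The bound can be explored numerically with the sketch below. It uses the usual finite-processor form of Amdahl's law, S(n) = 1 / (s + (1 - s)/n), which the slide does not write out but which tends to the stated limit 1/s as n grows. The function names are hypothetical.

```python
def amdahl_speedup(sequential_fraction: float, processors: int) -> float:
    """Speedup with `processors` processors when a fraction s of the work is sequential."""
    s = sequential_fraction
    return 1.0 / (s + (1.0 - s) / processors)


def amdahl_limit(sequential_fraction: float) -> float:
    """Upper bound on speedup (infinite processors): 1 / s."""
    return 1.0 / sequential_fraction


# The slide's example: 20% of the program is sequential.
for n in (2, 4, 16, 1024):
    print(n, round(amdahl_speedup(0.2, n), 3))
print("limit:", amdahl_limit(0.2))
```

Even with 16 processors the speedup in this example is only 4, already close to the ceiling of 5.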

Pipelining

One instruction/result every cycle (ideal)
- Not in practice because of hazards
Increase throughput (with respect to a non-pipelined implementation)
- Throughput = number of results/second
Speed-up (over a non-pipelined implementation)
- In the ideal case, with n stages, the speed-up will be close to n. Can’t make n too large: physical limitations, load balancing between stages, and hazards
Might slightly increase the latency of individual instructions (pipeline overhead)
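
A rough way to see why the ideal speed-up is "close to n" rather than exactly n is to count cycles. The sketch below assumes perfectly balanced stages, no hazards, and no pipeline-register overhead; all function names are hypothetical.

```python
def unpipelined_cycles(n_stages: int, n_instructions: int) -> int:
    """Each instruction occupies the whole datapath for n_stages cycles."""
    return n_stages * n_instructions


def pipelined_cycles(n_stages: int, n_instructions: int) -> int:
    """The first instruction takes n_stages cycles; each later one finishes 1 cycle after."""
    return n_stages + n_instructions - 1


def ideal_speedup(n_stages: int, n_instructions: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts under the ideal assumptions."""
    return unpipelined_cycles(n_stages, n_instructions) / pipelined_cycles(
        n_stages, n_instructions
    )


# With 5 stages, the speed-up approaches 5 as the instruction count grows.
for k in (1, 10, 1000):
    print(k, round(ideal_speedup(5, k), 3))
```

The fill time of the pipeline (n - 1 cycles) is what keeps the ratio below n; hazards and stage imbalance reduce it further in practice.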

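[Figure: five-stage pipelined datapath. Stages: IF, ID/RR, EXE, Mem, WB. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Components: PC, instruction memory, registers, ALU, data memory; data and control paths are shown. Five instructions in progress; one of each color.]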

Hazards

• Structural hazards

– Resource conflict (mostly in multiple instruction issue machines; also for resources which are used for more than one cycle)

• Data dependencies

– Most commonly RAW, but also WAR and WAW in OOO execution

• Control hazards

– Branches and other flow-of-control disruptions

• Consequence: stalls in the pipeline

Example of structural hazard

• For a single-issue machine: common data and instruction memory (unified cache)

– Pipeline stall on every load-store instruction (control easy to implement)

• Better solutions

– Separate I-cache and D-cache

– Instruction buffers

– Both + sophisticated instruction fetch unit!

• Will see more cases in multiple issue machines
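
The cost of stalling on every load-store can be expressed as a contribution to the CPI. The sketch below assumes a base CPI of 1 (ideal single-issue pipeline) and one stall cycle per load-store; the 30% load-store frequency is a hypothetical value, not a figure from the slides.

```python
def cpi_with_structural_stalls(
    load_store_fraction: float,
    stall_cycles_per_load_store: float = 1.0,  # assumption: one lost fetch slot
    base_cpi: float = 1.0,                     # ideal single-issue pipeline
) -> float:
    """Effective CPI when each load-store stalls the pipeline on a unified memory."""
    return base_cpi + load_store_fraction * stall_cycles_per_load_store


# Hypothetical mix: 30% of instructions are loads or stores.
unified = cpi_with_structural_stalls(0.30)
# Separate I-cache and D-cache: no structural stall on memory access.
split = cpi_with_structural_stalls(0.30, stall_cycles_per_load_store=0.0)
print(unified, split, unified / split)
```

Under these assumptions the unified-memory design runs about 1.3 times slower, which is the motivation for separate I- and D-caches.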

Data hazards

• Data dependencies between instructions that are in the pipe at the same time.

• For a single pipeline with in-order issue: Read After Write hazard (RAW)

Add R1, R2, R3 #R1 is result register

Sub R4, R1, R2 #conflict with R1

Add R3, R5, R1 #conflict with R1

Or R6, R1, R2 #conflict with R1

Add R5, R2, R1 #R1 OK now (5 stage pipe)
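
The conflicts marked in the sequence above can be found mechanically. The Python sketch below is a simplified detector: it encodes each instruction as (opcode, dest, src1, src2) and flags a RAW hazard when a source register matches the destination of one of the previous `window` instructions. The window of 3 is an assumption chosen to match the example (no forwarding, 5-stage pipe); the names are hypothetical.

```python
from typing import List, Tuple

Instr = Tuple[str, str, str, str]  # (opcode, dest, src1, src2)


def raw_hazards(program: List[Instr], window: int = 3) -> List[Tuple[int, int, str]]:
    """Return (producer index, consumer index, register) for each RAW hazard.

    A consumer within `window` instructions of the producer is assumed to
    read the register before the producer has written it back.
    """
    hazards = []
    for i, (_, dest, _, _) in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, later_dest, src1, src2 = program[j]
            if dest in (src1, src2):
                hazards.append((i, j, dest))
            if later_dest == dest:
                break  # register overwritten: later readers depend on instruction j
    return hazards


program = [
    ("Add", "R1", "R2", "R3"),
    ("Sub", "R4", "R1", "R2"),
    ("Add", "R3", "R5", "R1"),
    ("Or", "R6", "R1", "R2"),
    ("Add", "R5", "R2", "R1"),
]
print(raw_hazards(program))
```

For this sequence the detector reports the three instructions after the first Add as conflicting on R1, and none for the last Add, in line with the comments in the listing.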

Forwarding

• Result of ALU operation is known at end of EXE stage

• Forwarding between:

– EXE/MEM pipeline register to ALU input for instructions i and i+1

– MEM/WB pipeline register to ALU input for instructions i and i+2

Note that if the same register could be forwarded from both, forward the one written last

– Forwarding through register file (write 1st half of cycle, read 2nd half of cycle)

• Need for a “forwarding box” in the Control Unit

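[Figure: pipeline timing diagram for the Add, Sub, ADD, OR sequence from the Data hazards slide. Annotations: “R1 available here” on the first Add; “R1 needed here” on the Sub; “OK w/o forwarding” on the OR.]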
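
The selection rule in the forwarding box can be sketched as a small Python function. This is a simplified illustration, not the control logic of any specific processor: it ignores details such as a hard-wired zero register or write-enable signals, and the function and label names are hypothetical.

```python
from typing import Optional


def forward_source(
    src_reg: str,
    ex_mem_dest: Optional[str],
    mem_wb_dest: Optional[str],
) -> str:
    """Choose where an ALU input operand should come from.

    EX/MEM holds the result of the immediately preceding instruction, so it
    is checked first: when both pipeline registers hold the same destination,
    the most recently written value wins.
    """
    if ex_mem_dest is not None and src_reg == ex_mem_dest:
        return "EX/MEM"
    if mem_wb_dest is not None and src_reg == mem_wb_dest:
        return "MEM/WB"
    return "REGFILE"  # no hazard, or covered by write-then-read in the register file


# Sub R4, R1, R2 right after Add R1, R2, R3: R1 comes from EX/MEM.
print(forward_source("R1", ex_mem_dest="R1", mem_wb_dest=None))
# Two instructions after the producer: R1 comes from MEM/WB.
print(forward_source("R1", ex_mem_dest="R4", mem_wb_dest="R1"))
```

Checking EX/MEM before MEM/WB is what implements the note about forwarding the value written last.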
