Pipelining in CDA 4150: Advantages, Disadvantages, and Challenges, Study notes of Computer Architecture and Organization

University of Central Florida (UCF), Computer Architecture and Organization

These notes cover pipelining in a microprocessor for CDA 4150. They discuss the advantages of a multiple cycle design, such as a shorter cycle time and reuse of functional units, and its disadvantages, such as more timing paths to analyze and the additional registers required. They introduce pipeline stages, the execution time of each instruction, and the potential throughput increase, and they cover the major hurdles of pipelining: data, control, and structural hazards.

Typology: Study notes

Pre 2010

Uploaded on 11/08/2009

koofers-user-25e 🇺🇸


CDA 4150 – Pipelining

           

Single-Cycle implementation has poor performance

• Cycle time longer than necessary for all but slowest instruction

Solution: break the instruction into smaller steps

• Execute each step in one clock cycle

• Cycle time: time it takes to execute the longest step

• Design all the steps to have similar length

Advantages of the multiple cycle processor

• Cycle time is much shorter

• Functional units can be used more than once per instruction (less hardware)

Disadvantages of the multiple cycle processor

• More timing paths to analyze and tune

• Additional registers to store intermediate data values

[Figure: datapath divided into five stages: Instruction Fetch (IF), Decode (RD), Execute (EX), Memory (MEM), Writeback (WB). Labeled blocks: regs, alu, a 16-to-32-bit extender, smd, lmd.]


Time taken for 1 instruction

• Add up the execution times of each phase

• Each phase may take different amounts of time

• One instruction executes at a time

Example

• Pick some execution times out of the air

• t_fetch = 60 ns, t_decode = 30 ns, t_exec = 50 ns, t_mem = 80 ns, t_wb = 20 ns

• Total execution time per instruction = 240ns
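
The arithmetic in this example can be checked with a short Python sketch. The stage times are the ones given above; the clock period set by the slowest phase, the variable names, and the choice to ignore latch overhead are assumptions added for illustration.

```python
# Stage latencies in nanoseconds, taken from the example above.
stage_ns = {"fetch": 60, "decode": 30, "exec": 50, "mem": 80, "wb": 20}

# With one instruction executing at a time, the phases simply add up.
total_ns = sum(stage_ns.values())

# If each phase becomes a clocked step, the clock must fit the slowest phase
# (assumption: no latch or clock-skew overhead).
cycle_ns = max(stage_ns.values())

# Best-case gain if a new instruction could start on every clock.
ideal_speedup = total_ns / cycle_ns

print(f"per-instruction time: {total_ns} ns")
print(f"clock period: {cycle_ns} ns")
print(f"ideal speedup: {ideal_speedup:.1f}x")
```

With these unbalanced times the best case is 3x rather than 5x, which is why the steps should be designed to have similar length.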

We can execute multiple instructions at the same time!

Each instruction will be in a different phase of execution

Throughput can ideally increase by a factor equal to the number of pipeline stages

Overlap different steps for consecutive instructions

Steps are called pipeline stages
Need latches after each stage to hold control/data for later stages

A new instruction enters the pipeline at IF on each clock

Each instruction takes 5 clocks to complete execution and leave the pipeline
Potential throughput of 1 instruction per clock (CPI = 1)
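
The overlap described above can be visualized with a small Python sketch that prints which stage each instruction occupies on each clock. It assumes an ideal pipeline with no stalls; the function name `pipeline_chart` and the `.` marker for an idle slot are made up for this illustration.

```python
STAGES = ["IF", "RD", "EX", "MEM", "WB"]

def pipeline_chart(num_instructions):
    """Return one row per instruction; row[c] is the stage occupied in clock c + 1."""
    total_clocks = num_instructions + len(STAGES) - 1
    chart = []
    for i in range(num_instructions):
        row = ["."] * total_clocks
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i enters IF one clock after instruction i - 1
        chart.append(row)
    return chart

chart = pipeline_chart(4)
for i, row in enumerate(chart, start=1):
    print(f"I{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Four instructions finish in 8 clocks instead of the 20 needed when they run one at a time, and the rate approaches one instruction per clock as the instruction count grows.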

[Figure: pipelined datapath with stages Instruction Fetch (IF), Decode (RD), Execute (EX), Memory (MEM), Writeback (WB)]

Hazards: the major hurdle of pipelining

• Situations where the next instruction cannot execute in its clock cycle

• They reduce the performance of pipelining

Speedup = Pipeline depth/(1 + pipeline stalls/inst)

Want incredibly long pipelines, with no pipeline stalls

Good luck!
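
The speedup formula can be evaluated directly. The sketch below plugs a few stall rates into it for a 5-stage pipeline; the stall rates are arbitrary illustrative values, not measurements.

```python
def pipeline_speedup(depth, stalls_per_instruction):
    """Speedup = pipeline depth / (1 + pipeline stalls per instruction)."""
    return depth / (1.0 + stalls_per_instruction)

# Illustrative stall rates only.
for stalls in (0.0, 0.25, 0.5, 1.0):
    speedup = pipeline_speedup(5, stalls)
    print(f"depth=5, stalls/inst={stalls}: speedup={speedup:.2f}")
```

Even one stall per instruction on average halves the benefit of the pipeline depth.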

Long pipes increase likelihood of hazards

• Let’s look at pipeline resources used by instruction class

Pipe stage   ALU               Memory                    Branch
IF           Fetch-PC          Fetch-PC                  Fetch-PC
             Inst Cache        Inst Cache                Inst Cache
RD           Register Read     Register Read             Register Read
EX           ALU               ALU (address)             ALU (dest addr)
                                                         Compare logic
                                                         Fetch-PC (taken)
MEM          N/A               Cache Tags                N/A
                               Cache Data
WB           PC                PC                        PC
             Register Write    Register Write (Load)

Three classes of hazards

Data hazards
- One instruction has a source operand that is the result of a previous instruction still in the pipeline (Read-After-Write: RAW)
- There are other types of data hazards (later)
Control hazards
- The execution of an instruction depends on the resolution of a previous branch instruction in the pipeline
- Becomes a big problem with deep pipelines
Structural hazards
- Two or more instructions in the pipeline require the same hardware resource to progress
- Most common instance is a non-pipelined functional unit (e.g., a multiplier)
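
As a rough illustration of the RAW case, the Python sketch below scans a short instruction sequence for a source register that was written by one of the two preceding instructions. The `Instr` tuple, the `window` of 2 instructions, and the register operands in the sample program are all assumptions made for this example.

```python
from typing import List, NamedTuple, Tuple

class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]

def find_raw_hazards(program: List[Instr], window: int = 2):
    """Return (producer_index, consumer_index, register) for each source operand
    whose nearest writer is at most `window` instructions earlier."""
    hazards = []
    for j, consumer in enumerate(program):
        for src in consumer.srcs:
            # Walk backwards to find the most recent writer of this register.
            for i in range(j - 1, max(-1, j - window - 1), -1):
                if program[i].dest == src:
                    hazards.append((i, j, src))
                    break
    return hazards

# Hypothetical program: each add depends on the one before it.
program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("add", "r4", ("r1", "r5")),
    Instr("add", "r6", ("r4", "r7")),
]
hazards = find_raw_hazards(program)
print(hazards)
```

The right window depends on how many clocks separate register read from register write in a given pipeline, so it is left as a parameter.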

[Figure sequence: the pipelined datapath (IF, RD, EX, MEM, WB) over successive clocks as a chain of dependent add instructions moves through it: the first writes r1, the second reads r1 and writes r4, the third reads r4 and writes r6. The last frame labels two forwarding paths: ALU bypass and Mem to ALU bypass.]
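
One way to read the two bypass labels is sketched below in Python: the path used depends on how far the consumer is behind the producer. The mapping of distances to paths is an assumed interpretation of the figure, and the sketch further assumes ALU instructions only and a register file that is written before it is read within a clock.

```python
def choose_bypass(distance):
    """Pick where a dependent ALU instruction gets its operand from.

    distance = how many instructions after the producer the consumer issues.
    The mapping is an assumed reading of the two labels in the figure.
    """
    if distance == 1:
        # Producer is in MEM while the consumer is in EX.
        return "ALU bypass"
    if distance == 2:
        # Producer is in WB while the consumer is in EX.
        return "Mem to ALU bypass"
    # Assumption: the register file is written before it is read in a clock.
    return "register file"

for d in (1, 2, 3):
    print(d, "->", choose_bypass(d))
```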

When a branch instruction is executed, execution of subsequent instructions depends on whether the branch is taken and the location of the destination

A simple but effective approach is to assume the branch is not taken and follow the sequential path

The branch is resolved at the end of EX

If taken, cancel instructions in the sequential path and start fetching from the destination on the next clock
- This results in a 2-clock delay for taken branches
If not taken, continue sequentially

Instruction \ Clock   1    2    3    4    5    6
I1                    IF   RD   EX   MEM  WB
beq $t0,$t1,L1             IF   RD   EX   MEM  WB
I3                              IF   RD   --   --
I4                                   IF   --   --
L1: I5                                    IF   RD
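
The cost of the 2-clock delay for taken branches can be estimated with a short Python sketch. The branch frequency and taken rate below are hypothetical numbers chosen for illustration, and the function name is made up; only the 2-clock penalty comes from the text above.

```python
def effective_cpi(branch_fraction, taken_fraction, taken_penalty=2, base_cpi=1.0):
    """Average clocks per instruction when only taken branches pay a penalty."""
    return base_cpi + branch_fraction * taken_fraction * taken_penalty

# Hypothetical workload: 20% of instructions are branches, 60% of those are taken.
cpi = effective_cpi(branch_fraction=0.20, taken_fraction=0.60)
print(f"effective CPI = {cpi:.2f}")
```

Under these assumed rates the pipeline averages about 1.24 clocks per instruction instead of 1, and the loss grows with a deeper pipeline because the penalty grows.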

Load Delay

• Explicit 1-instruction delay in the MIPS ISA

If no instruction can be scheduled following the load, a nop is required
- MIPS == "Microprocessor without Interlocked Pipeline Stages"
But other implementations may have different load delays!

Branch Delay

• Explicit 1-instruction delay in MIPS, HP-PA, SPARC

For MIPS, if no instruction can be scheduled, a nop is required
- Scheduled instruction must be safe to execute whether or not the branch is taken (assembler schedules)

For HP-PA/SPARC the instruction following the branch is conditionally executed or squashed
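
The fallback case for the load delay, where nothing useful can be scheduled and a nop is required, can be sketched in Python as follows. The `(op, dest, sources)` tuple format, the `lw` mnemonic check, and the sample program are assumptions for this illustration; it does not attempt to reorder instructions.

```python
def fill_load_delay_slots(program):
    """Insert a nop after any load whose result is read by the very next instruction.

    Each instruction is a hypothetical (op, dest, sources) tuple.
    """
    nop = ("nop", None, ())
    out = []
    for idx, instr in enumerate(program):
        out.append(instr)
        op, dest, _ = instr
        if op == "lw" and idx + 1 < len(program):
            _, _, next_srcs = program[idx + 1]
            if dest in next_srcs:
                out.append(nop)  # nothing was scheduled, so pad the delay slot
    return out

program = [
    ("lw", "r1", ("r2",)),
    ("add", "r3", ("r1", "r4")),  # reads r1 right after the load
    ("lw", "r5", ("r2",)),
    ("add", "r6", ("r3", "r4")),  # does not read r5
]
scheduled = fill_load_delay_slots(program)
for op, dest, srcs in scheduled:
    print(op, dest, srcs)
```

A scheduler would first try to move an independent instruction into the slot; the nop is only needed when that fails.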

[Figure sequence: the pipelined datapath (IF, RD, EX, MEM, WB) over successive clocks as bgez r1,offset and the nop that follows it move through the pipeline]

Non-pipelined, multi-cycle functional units

• Integer multiply, divide

Can also have structural hazards on data cache

• Loads access tags/data in MEM

• Stores access tags in MEM, data in WB

• What if a load follows a store?

Structural hazards are detected in decode, and the affected instruction stalls there

The only way to remove them is to add functional units

• Or pipeline them

• Or dual-port them (caches)
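
The question of a load following a store can be worked through with a small Python sketch. It takes the stage usage from the bullets above (loads use cache data in MEM, stores in WB) and assumes a single-ported data array and one instruction issued per clock with no stalls; the names and the 0-based clock numbering are made up for this example.

```python
# Clock offset from IF (= 0) at which each class uses the data-cache data array:
# loads in MEM, stores in WB. Offsets: IF=0, RD=1, EX=2, MEM=3, WB=4.
DATA_ARRAY_STAGE = {"load": 3, "store": 4}

def data_cache_conflicts(program):
    """Return (clock, earlier_index, later_index) for each clock in which two
    instructions need the single-ported data array at the same time."""
    users = {}  # clock -> index of the instruction already using the data array
    conflicts = []
    for idx, kind in enumerate(program):
        if kind not in DATA_ARRAY_STAGE:
            continue
        clock = idx + DATA_ARRAY_STAGE[kind]  # assumes one issue per clock
        if clock in users:
            conflicts.append((clock, users[clock], idx))
        else:
            users[clock] = idx
    return conflicts

print(data_cache_conflicts(["store", "load"]))
print(data_cache_conflicts(["load", "store"]))
```

A store immediately followed by a load puts both data-array accesses in the same clock, which is the structural hazard; the reverse order does not collide under these assumptions.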
