Single-Cycle and Multicycle Data Paths in Computer Architecture - Prof. B. Parhami, Study notes of Electrical and Electronics Engineering

These notes cover the design and implementation of single-cycle and multicycle data paths in computer architecture, including control signals, instruction execution steps, and performance estimation. They also discuss pipelining concepts, pipeline stalls (bubbles), and pipelined data path design.

Typology: Study notes

Pre 2010

Uploaded on 09/17/2009

Feb. 2007 Computer Architecture, Data Path and Control Slide 1
Part IV
Data Path and Control


About This Presentation

This presentation is intended to support the use of the textbook Computer Architecture: From Microprocessors to Supercomputers, Oxford University Press, 2005, ISBN 0-19-515455-X. It is updated regularly by the author as part of his teaching of the upper-division course ECE 154, Introduction to Computer Architecture, at the University of California, Santa Barbara. Instructors can use these slides freely in classroom teaching and for other educational purposes. Any other use is strictly prohibited. © Behrooz Parhami

Edition   Released    Revised     Revised     Revised     Revised
First     July 2003   July 2004   July 2005   Mar. 2006   Feb. 2007


IV Data Path and Control

Topics in This Part

Chapter 13  Instruction Execution Steps
Chapter 14  Control Unit Synthesis
Chapter 15  Pipelined Data Paths
Chapter 16  Pipeline Performance Limits

Design a simple computer (MicroMIPS) to learn about:

  • Data path – part of the CPU where data signals flow
  • Control unit – guides data signals through data path
  • Pipelining – a way of achieving greater performance


13 Instruction Execution Steps

A simple computer executes instructions one at a time

  • Fetches an instruction from the location pointed to by PC
  • Interprets and executes the instruction, then repeats
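
The fetch, interpret, execute, repeat cycle can be sketched as a loop. This is a minimal Python sketch of a toy machine, not MicroMIPS itself: the `run` helper, the tuple encoding of instructions, and the `halt` pseudo-instruction are all illustrative assumptions.

```python
# Toy fetch-execute loop. Instructions are tuples here (an assumption made
# for readability); real MicroMIPS instructions are 32-bit words.

def run(program, max_steps=100):
    """Execute instructions one at a time until 'halt' or max_steps."""
    regs = [0] * 32                    # register file; register 0 stays 0
    pc = 0                             # byte address of the next instruction
    for _ in range(max_steps):
        instr = program[pc // 4]       # fetch from the location pointed to by PC
        pc += 4                        # default: advance to the next word
        name, *args = instr            # interpret
        if name == "addi":             # execute
            rt, rs, imm = args
            regs[rt] = (regs[rs] + imm) & 0xFFFFFFFF
        elif name == "add":
            rd, rs, rt = args
            regs[rd] = (regs[rs] + regs[rt]) & 0xFFFFFFFF
        elif name == "halt":           # hypothetical stop marker
            break
        else:
            raise ValueError(f"unsupported instruction: {name}")
        regs[0] = 0                    # writes to register 0 are discarded
    return regs, pc

program = [("addi", 8, 0, 5), ("addi", 9, 0, 7), ("add", 10, 8, 9), ("halt",)]
regs, pc = run(program)
print(regs[10], pc)
```

The loop body mirrors the two bullets: one fetch through PC, then one interpret-and-execute step, repeated until the program stops.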

Topics in This Chapter

13.1 A Small Set of Instructions
13.2 The Instruction Execution Unit
13.3 A Single-Cycle Data Path
13.4 Branching and Jumping
13.5 Deriving the Control Signals
13.6 Performance of the Single-Cycle Design


The MicroMIPS Instruction Set

Class             Instruction               Usage               op   fn
Copy              Load upper immediate      lui   rt,imm        15
Arithmetic        Add                       add   rd,rs,rt       0   32
                  Subtract                  sub   rd,rs,rt       0   34
                  Set less than             slt   rd,rs,rt       0   42
                  Add immediate             addi  rt,rs,imm      8
                  Set less than immediate   slti  rt,rs,imm     10
Logic             AND                       and   rd,rs,rt       0   36
                  OR                        or    rd,rs,rt       0   37
                  XOR                       xor   rd,rs,rt       0   38
                  NOR                       nor   rd,rs,rt       0   39
                  AND immediate             andi  rt,rs,imm     12
                  OR immediate              ori   rt,rs,imm     13
                  XOR immediate             xori  rt,rs,imm     14
Memory access     Load word                 lw    rt,imm(rs)    35
                  Store word                sw    rt,imm(rs)    43
Control transfer  Jump                      j     L              2
                  Jump register             jr    rs             0    8
                  Branch less than 0        bltz  rs,L           1
                  Branch equal              beq   rs,rt,L        4
                  Branch not equal          bne   rs,rt,L        5
                  Jump and link             jal   L              3
                  System call               syscall              0   12

Table 13.


13.2 The Instruction Execution Unit

Fig. 13.
Abstract view of the instruction execution unit for MicroMIPS.
For naming of instruction fields, see Fig. 13.1.

[Figure: blocks PC, Next addr, Instr cache, Reg file, ALU, Data cache, and Control; signal labels inst, op, fn, jta, imm, rs,rt,rd, (rs), (rt), Address, and Data.]

Instruction formats (inst: instruction, 32 bits)

Format   Bits 31-26   25-21              20-16                15-11         10-6     5-0
R        op           rs                 rt                   rd            sh       fn
         Opcode       Source 1 or base   Source 2 or dest'n   Destination   Unused   Opcode ext
         6 bits       5 bits             5 bits               5 bits        5 bits   6 bits
I        op           rs                 rt                   imm: Operand / Offset, 16 bits
J        op           jta: Jump target address, 26 bits

Instruction groups labeled on the figure: bltz, jr; beq, bne; 12 A/L, lui, lw, sw; j, jal; syscall (22 instructions in all).
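
Extracting the named fields from a 32-bit instruction word is a matter of shifts and masks. The Python sketch below assumes the standard MIPS bit positions (op in bits 31-26, rs 25-21, rt 20-16, rd 15-11, sh 10-6, fn 5-0); the `decode` helper and its dictionary layout are illustrative, not part of the slides.

```python
def decode(inst):
    """Split a 32-bit instruction word into its named fields."""
    return {
        "op":  (inst >> 26) & 0x3F,       # bits 31-26: opcode
        "rs":  (inst >> 21) & 0x1F,       # bits 25-21: source 1 or base
        "rt":  (inst >> 16) & 0x1F,       # bits 20-16: source 2 or destination
        "rd":  (inst >> 11) & 0x1F,       # bits 15-11: destination (R format)
        "sh":  (inst >> 6) & 0x1F,        # bits 10-6: unused here
        "fn":  inst & 0x3F,               # bits 5-0: opcode extension (R format)
        "imm": inst & 0xFFFF,             # bits 15-0: operand / offset (I format)
        "jta": inst & 0x03FFFFFF,         # bits 25-0: jump target address (J format)
    }

# Encode "add rd=10, rs=8, rt=9" by hand (op = 0, fn = 32), then decode it.
word = (0 << 26) | (8 << 21) | (9 << 16) | (10 << 11) | (0 << 6) | 32
fields = decode(word)
print(hex(word), fields["rs"], fields["rt"], fields["rd"], fields["fn"])
```

All fields are extracted unconditionally, as in the hardware; the control unit decides from op and fn which of them are meaningful for a given instruction.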


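An ALU for MicroMIPS

Fig. 10.
A multifunction ALU with 8 control signals (2 for function class,
1 arithmetic, 3 shift, 2 logic) specifying the operation.

[Figure: inputs x and y feed an adder (x ± y, selected by Add'Sub), a shifter (shifts y by a constant amount or by a variable amount taken from the 5 LSBs of x), and a logic unit. A four-way multiplexer controlled by Function class produces the 32-bit output s. A 32-input NOR produces Zero, and Ovfl is the overflow output. The shorthand ALU symbol has inputs x, y, and Func, and outputs s, Ovfl, and Zero. The labels lui and imm also appear on the figure.]

Control signal    Code    Meaning
Function class    00      Shift
                  01      Set less
                  10      Arithmetic
                  11      Logic
Add'Sub           0 or 1  Add or subtract
Logic function    00      AND
                  01      OR
                  10      XOR
                  11      NOR
Shift function    00      No shift
                  01      Logical left
                  10      Logical right
                  11      Arith right
Const'Var         0 or 1  Constant amount or variable amount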
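
A behavioral model helps check the function-class and logic-function codes. The Python sketch below is a simplified illustration under stated assumptions: the `alu` function and its argument names are hypothetical, the shift class is left out, and "set less" is taken to be the sign bit of x − y (suggested by the MSB label in the figure), which ignores overflow.

```python
MASK = 0xFFFFFFFF  # 32-bit word mask

def alu(x, y, fn_class, add_sub=0, logic_fn=0):
    """Return (s, zero, ovfl) for 32-bit inputs x and y.

    fn_class: 0b01 set less, 0b10 arithmetic, 0b11 logic (0b00 shift omitted).
    add_sub:  0 add, 1 subtract.  logic_fn: 0 AND, 1 OR, 2 XOR, 3 NOR.
    """
    x &= MASK
    y &= MASK
    # Adder: x + y, or x - y computed as x + (not y) + 1.
    operand = (~y & MASK) if add_sub else y
    result = (x + operand + (1 if add_sub else 0)) & MASK
    # Signed overflow: both adder inputs share a sign that the result lacks.
    ovfl = bool(~(x ^ operand) & (x ^ result) & 0x80000000)
    if fn_class == 0b10:                  # Arithmetic
        s = result
    elif fn_class == 0b01:                # Set less: sign bit of the adder output
        s = (result >> 31) & 1
    elif fn_class == 0b11:                # Logic
        s = [x & y, x | y, x ^ y, ~(x | y) & MASK][logic_fn]
    else:
        raise ValueError("shift class is not modeled in this sketch")
    return s, s == 0, ovfl

print(alu(5, 3, 0b10, add_sub=1))         # subtract
print(alu(3, 5, 0b01, add_sub=1))         # set less than
print(alu(0b1100, 0b1010, 0b11, logic_fn=2))  # XOR
```

Set less reuses the subtract path, which is why slt and slti need Add'Sub = 1 together with function class 01.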


13.4 Branching and Jumping

Fig. 13.

Next-address logic for MicroMIPS (see top part of Fig. 13.3).

[Figure: the 30 MSBs of (PC) enter an adder with carry-in 1, producing IncrPC. A branch condition checker examines (rs) and (rt) according to BrType and produces BrTrue, which governs whether the sign-extended (SE) 16-bit imm is added. A four-input multiplexer controlled by PCSrc selects NextPC from the adder output, the 4 MSBs of (PC) joined with the 26-bit jta, the 30 MSBs of (rs), and SysCallAddr.]

Update options for PC (lowest 2 bits of PC always 00):

(PC)31:2 + 1            Default option
(PC)31:2 + 1 + imm      When instruction is branch and condition is met
(PC)31:28 | jta         When instruction is j or jal
(rs)31:2                When the instruction is jr
SysCallAddr             Start address of an operating system routine
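
The PC update options can be checked with a small model that works on 30-bit word addresses. This Python sketch makes several assumptions: the PCSrc codes (00 adder, 01 jta, 10 rs, 11 SysCallAddr) and BrType codes (01 beq, 10 bne, 11 bltz) are read off the control-signal settings table, the function names are hypothetical, and the default `syscall_addr` is a made-up placeholder.

```python
MASK30 = 0x3FFFFFFF  # a word address is the 30 MSBs of a byte address

def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def branch_true(br_type, rs_val, rt_val):
    """Branch condition checker (assumed codes: 01 beq, 10 bne, 11 bltz)."""
    if br_type == 0b01:
        return rs_val == rt_val
    if br_type == 0b10:
        return rs_val != rt_val
    if br_type == 0b11:
        return bool(rs_val & 0x80000000)   # sign bit set means (rs) < 0
    return False                           # 00: not a branch

def next_pc(pc, pc_src, br_type=0, imm=0, jta=0, rs_val=0, rt_val=0,
            syscall_addr=0x100):           # placeholder address, not from the slides
    """Return the next byte address of PC."""
    pc30 = (pc >> 2) & MASK30
    incr_pc = (pc30 + 1) & MASK30          # (PC)31:2 + 1
    if pc_src == 0b00:                     # default, or branch when condition is met
        offset = sign_extend16(imm) if branch_true(br_type, rs_val, rt_val) else 0
        nxt = (incr_pc + offset) & MASK30
    elif pc_src == 0b01:                   # j, jal: 4 MSBs of PC joined with jta
        nxt = (pc30 & 0x3C000000) | (jta & 0x03FFFFFF)
    elif pc_src == 0b10:                   # jr: (rs)31:2
        nxt = (rs_val >> 2) & MASK30
    else:                                  # syscall
        nxt = (syscall_addr >> 2) & MASK30
    return nxt << 2                        # lowest 2 bits of PC always 00

print(hex(next_pc(0x400000, 0b00)))
print(hex(next_pc(0x400000, 0b00, br_type=0b01, imm=3, rs_val=7, rt_val=7)))
print(hex(next_pc(0x40000000, 0b01, jta=0x100)))
```

Because the adder works on word addresses, the branch offset imm counts instructions rather than bytes.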


Control Signal Settings

Instruction               op      fn      RegWrite  RegDst  RegInSrc  ALUSrc  Add'Sub  LogicFn  FnClass  DataRead  DataWrite  BrType  PCSrc
Load upper immediate      001111          1         00      01        1                         00       0         0          00      00
Add                       000000  100000  1         01      01        0       0                 10       0         0          00      00
Subtract                  000000  100010  1         01      01        0       1                 10       0         0          00      00
Set less than             000000  101010  1         01      01        0       1                 01       0         0          00      00
Add immediate             001000          1         00      01        1       0                 10       0         0          00      00
Set less than immediate   001010          1         00      01        1       1                 01       0         0          00      00
AND                       000000  100100  1         01      01        0                00       11       0         0          00      00
OR                        000000  100101  1         01      01        0                01       11       0         0          00      00
XOR                       000000  100110  1         01      01        0                10       11       0         0          00      00
NOR                       000000  100111  1         01      01        0                11       11       0         0          00      00
AND immediate             001100          1         00      01        1                00       11       0         0          00      00
OR immediate              001101          1         00      01        1                01       11       0         0          00      00
XOR immediate             001110          1         00      01        1                10       11       0         0          00      00
Load word                 100011          1         00      00        1       0                 10       1         0          00      00
Store word                101011          0                           1       0                 10       0         1          00      00
Jump                      000010          0                                                              0         0                  01
Jump register             000000  001000  0                                                              0         0                  10
Branch on less than 0     000001          0                                                              0         0          11      00
Branch on equal           000100          0                                                              0         0          01      00
Branch on not equal       000101          0                                                              0         0          10      00
Jump and link             000011          1         10      10                                           0         0          00      01
System call               000000  001100  0                                                              0         0                  11

Table 13.


Control Signals in the Single-Cycle Data Path

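Fig. 13.

Key elements of the single-cycle MicroMIPS data path.

[Figure: the data path (PC, Next addr, Instr cache, Reg file, ALU, Data cache, and the sign extender SE) annotated with its control signals: Br&Jump at the next-address logic; RegDst, RegWrite, and RegInSrc at the register file; ALUSrc and ALUFunc at the ALU, which also outputs ALUOvfl; DataRead and DataWrite at the data cache. RegDst selects among rt, rd, and 31 (inputs 0, 1, 2). RegInSrc selects the register input among the data cache output (Data out), ALU out, and Incr PC (inputs 0, 1, 2).]

ALUFunc comprises Add'Sub, LogicFn, and FnClass; Br&Jump comprises PCSrc and BrType.

ALUFunc examples:
lui   x xx 00
slt   1 xx 01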


Control Signal Generation

Auxiliary signals identifying instruction classes

arithInst = addInst ∨ subInst ∨ sltInst ∨ addiInst ∨ sltiInst
logicInst = andInst ∨ orInst ∨ xorInst ∨ norInst ∨ andiInst ∨ oriInst ∨ xoriInst
immInst = luiInst ∨ addiInst ∨ sltiInst ∨ andiInst ∨ oriInst ∨ xoriInst

Example logic expressions for control signals

RegWrite = luiInst ∨ arithInst ∨ logicInst ∨ lwInst ∨ jalInst
ALUSrc = immInst ∨ lwInst ∨ swInst
Add'Sub = subInst ∨ sltInst ∨ sltiInst
DataRead = lwInst
PCSrc0 = jInst ∨ jalInst ∨ syscallInst     (PCSrc0 is the low-order bit of PCSrc)

[Figure: the Control block, whose inputs are the decoded instruction signals addInst, subInst, jInst, sltInst, and so on.]
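
These OR expressions translate directly into code. The Python sketch below evaluates them for one mnemonic at a time; the `control_signals` function and its dictionary output are illustrative assumptions, and only the five example signals listed here are modeled.

```python
def control_signals(name):
    """Evaluate the example control expressions for one instruction mnemonic."""
    def inst(*names):
        # True when the current instruction is one of the named ones,
        # mimicking the OR of one-hot signals such as addInst or lwInst.
        return name in names

    arith_inst = inst("add", "sub", "slt", "addi", "slti")
    logic_inst = inst("and", "or", "xor", "nor", "andi", "ori", "xori")
    imm_inst = inst("lui", "addi", "slti", "andi", "ori", "xori")

    return {
        "RegWrite": int(inst("lui", "lw", "jal") or arith_inst or logic_inst),
        "ALUSrc":   int(imm_inst or inst("lw", "sw")),
        "Add'Sub":  int(inst("sub", "slt", "slti")),
        "DataRead": int(inst("lw")),
        "PCSrc0":   int(inst("j", "jal", "syscall")),
    }

for mnemonic in ("lw", "sw", "slti", "jal"):
    print(mnemonic, control_signals(mnemonic))
```

In hardware each instruction signal is a wire from the instruction decoder, so every expression is a single OR gate over a few of those wires.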


Putting It All Together

[Figure: composite view that combines the single-cycle data path with its control signals (Fig. 13.), the Control block driven by the decoded instruction signals (addInst, subInst, jInst, sltInst, and so on), the multifunction ALU (Fig. 10.), and the next-address logic (Fig. 13.).]


Performance Estimation for Single-Cycle MicroMIPS

Fig. 13.
The MicroMIPS data path unfolded (by depicting the register write step as a separate block) so as to better visualize the critical-path latencies.

Step                 Latency
Instruction access   2 ns
Register read        1 ns
ALU operation        2 ns
Data cache access    2 ns
Register write       1 ns
Total                8 ns

Single-cycle clock = 125 MHz

[Figure: five unfolded copies of the data path, each starting at PC, for ALU-type, Load, Store, Branch (and jr), and Jump (except jr & jal) instructions; the blocks a class does not need are marked "Not used".]

Instruction class   Latency
R-type              6 ns
Load                8 ns
Store               7 ns
Branch              5 ns
Jump                3 ns
Weighted mean       6.36 ns
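
The per-class latencies and the clock rate follow from the step latencies. In the Python sketch below, the assignment of steps to instruction classes is inferred so that the sums match the figures quoted here, and the instruction mix (44% R-type, 24% load, 12% store, 18% branch, 2% jump) is an assumption that is not stated in these notes; it is chosen because it reproduces the 6.36 ns weighted mean.

```python
# Step latencies in ns, as listed above.
STEP_NS = {"instr_access": 2, "reg_read": 1, "alu": 2, "data_cache": 2, "reg_write": 1}

# Inferred: which steps each instruction class uses.
STEPS_USED = {
    "R-type": ["instr_access", "reg_read", "alu", "reg_write"],
    "Load":   ["instr_access", "reg_read", "alu", "data_cache", "reg_write"],
    "Store":  ["instr_access", "reg_read", "alu", "data_cache"],
    "Branch": ["instr_access", "reg_read", "alu"],
    "Jump":   ["instr_access", "reg_read"],
}

# Assumed instruction mix (fractions of executed instructions).
MIX = {"R-type": 0.44, "Load": 0.24, "Store": 0.12, "Branch": 0.18, "Jump": 0.02}

latency_ns = {cls: sum(STEP_NS[s] for s in steps) for cls, steps in STEPS_USED.items()}
cycle_ns = max(latency_ns.values())        # the clock must fit the slowest class
clock_mhz = 1000 / cycle_ns                # 1 / (8 ns) expressed in MHz
weighted_mean_ns = sum(MIX[cls] * latency_ns[cls] for cls in MIX)

print(latency_ns)
print(cycle_ns, clock_mhz, round(weighted_mean_ns, 2))
```

The gap between the 8 ns cycle and the 6.36 ns weighted mean is the time a single-cycle design wastes on instructions that finish early.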


How Good is Our Single-Cycle Design?

Step                 Latency
Instruction access   2 ns
Register read        1 ns
ALU operation        2 ns
Data cache access    2 ns
Register write       1 ns
Total                8 ns

Single-cycle clock = 125 MHz

Clock rate of 125 MHz not impressive.

How does this compare with current processors on the market?

Not bad, where latency is concerned.

A 2.5 GHz processor with 20 or so pipeline stages has a latency of about

0.4 ns/cycle × 20 cycles = 8 ns

Throughput, however, is much better for the pipelined processor:

  • Up to 20 times better with single issue
  • Perhaps up to 100 times better with multiple issue
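
The latency and throughput comparison is simple arithmetic. The Python sketch below assumes an ideal pipeline that completes one instruction per cycle per issue slot, with no stalls; the issue width of 5 is a hypothetical value chosen only because it reproduces the "up to 100 times" figure.

```python
import math

single_cycle_ns = 8.0          # single-cycle design: one instruction every 8 ns
pipelined_clock_ghz = 2.5
stages = 20
issue_width = 5                # hypothetical multiple-issue width

cycle_ns = 1 / pipelined_clock_ghz               # 0.4 ns per cycle
pipelined_latency_ns = cycle_ns * stages         # time for one instruction to pass through

# Ideal throughput in instructions per ns.
single_throughput = 1 / single_cycle_ns
pipelined_throughput = 1 / cycle_ns
single_issue_gain = pipelined_throughput / single_throughput
multiple_issue_gain = issue_width * single_issue_gain

print(round(pipelined_latency_ns, 3), round(single_issue_gain, 3),
      round(multiple_issue_gain, 3))
```

Both gains are upper bounds, since stalls lower the throughput of a real pipeline.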