The Single vs Multi Cycle Implementation Design | CS 0447, Study notes of Computer Science

Material Type: Notes; Professor: Childers; Class: COMPUTR ORGZTN & ASSMBLY LANG; Subject: Computer Science; University: University of Pittsburgh; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-d2b
koofers-user-d2b 🇺🇸

10 documents

1 / 9

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
69
Single vs. Multi-cycle Implementation
Single cycle design is simple
But it’s inefficient
Why?
All instructions have same clock cycle length - they
all take the same amount of time regardless of what
they actually do
Clock cycle determined by longest path
Load: uses IM, RF, ALU, DM, RF in sequence
But others may be shorter
R-type (arithmetic): use IM, RF, ALU, RF
70
Single vs. Multi-cycle Implementation
For this simple version, the multi-cycle
implementation could be as much as 1.27 times
faster (for a typical instruction mix)
Suppose we had floating point operations
Floating point has very high latency
E.g., floating-point multiply may be 16 ns vs
integer add may be 2 ns
So, clock cycle constrained by 16 ns of FP
Suppose a program doesn’t do ANY floating point?
Performance penalty is too big to tolerate
pf3
pf4
pf5
pf8
pf9

Partial preview of the text

Download The Single vs Multi Cycle Implementation Design | CS 0447 and more Study notes Computer Science in PDF only on Docsity!

Single vs. Multi-cycle Implementation

  • Single cycle design is simple
  • But it’s inefficient
  • Why?
  • All instructions have same clock cycle length - they all take the same amount of time regardless of what they actually do
  • Clock cycle determined by longest path
    • Load: uses IM, RF, ALU, DM, RF in sequence
  • But others may be shorter
    • R-type (arithmetic): use IM, RF, ALU, RF

Single vs. Multi-cycle Implementation

  • For this simple version, the multi-cycle implementation could be as much as 1.27 times faster (for a typical instruction mix)
  • Suppose we had floating point operations
    • Floating point has very high latency
    • E.g., floating-point multiply may be 16 ns vs integer add may be 2 ns
    • So, clock cycle constrained by 16 ns of FP
  • Suppose a program doesn’t do ANY floating point?
    • Performance penalty is too big to tolerate

Multi-cycle Implementation

  • Single memory unit (I and D), single ALU
  • Several temporary registers (IR, MDR, A, B, ALUOut)
  • Temporaries hold output value of element so the output value can be used on subsequent cycle
  • Values needed by subsequent instruction stored in programmer visible state (memory, RF)

A single ALU

  • Single ALU must accomodate all inputs that used to go to three different ALUs in the single cycle **implementation
  1. Multiplexor on first input to ALU to select A register** **(from RF) or the PC
  2. Multiplexor on second input to ALU to select from the** constant 4 (PC increment), sign-extended value, shifted offset field, and RF input
  • Trade-off: Additional multiplexors (and time) but only a single ALU since it can be shared across cycles

Multi-cycle Steps - Instruction Decode

  • Don’t yet know what instruction is
    • Decode the instruction concurrently with RF read
    • Optimistically read registers
    • Optimistically compute branch target
    • We’ll select the right answer on next cycle
  • Decode and Register File Read A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2);

Multi-cycle Steps - Execution

  • Operation varies based on instruction decode
  • Memory reference: ALUOut = A + sign-extend(IR[15-0]);
  • Arithmetic-logical instruction: ALUOut = A op B;
  • Branch: if (A == B) PC = ALUOut;
  • Jump: PC = PC[31-28] || IR[25-0] << 2)

Multi-cycle Steps - Memory / Completion

  • Load/store accesses memory or arithmetic writes result to the register file
  • Memory reference: MDR = Memory[ALUOut]; (load) or Memory[ALUOut] = B; (store)
  • Arithmetic-logical instruction: Reg[IR[15-11]] = ALUOut;

Multi-cycle Steps - Read completion

  • Finish a memory read by writing read value into the register file
  • Load operation: Reg[IR[20-16]] = MDR;

Multi-cycle Control

  • How are the control signals set in each state?
  • What are the transitions between states? (i.e., what state is next?)
  • Control signals
    • IorD, MemRead, MemWrite, IRWrite, RegDst
    • MemtoReg, RegWrite, ALUSrcA
    • ALUSrcB, ALUOp
    • PCWrite
  • Transitions from Decode based on Opcode
  • Transitions from Eff. Addr. happen on load/store

Multi-cycle Control

  • What are the control signals in each state for instrs:
    • Arithmetic
    • Load
    • Store
    • Branch
    • Jump

Control for each instruction type?

STATE CONTROL FETCH DECODE STATE 3 STATE 4 STATE 5 IorD MemRead MemWrite IRWrite RegDst MemToReg RegWrite ALUSrcA ALUSrcB ALUOp PCWrite

Control for addition (arithmetic)

STATE CONTROL FETCH DECODE EXE ALU WB ALU STATE 5 IorD 0 0 0 0 MemRead 1 0 0 0 MemWrite 0 0 0 0 IRWrite 1 0 0 0 RegDst X X X 1 MemToReg X X X 0 RegWrite 0 0 0 1 ALUSrcA 0 0 1 X ALUSrcB 01 11 00 X ALUOp 00 00 10 X PCWrite 1 0 0 0