
















Material Type: Notes; Professor: Yalamanchili; Class: Adv Computer Architecure; Subject: Electrical & Computer Engr; University: Georgia Institute of Technology-Main Campus; Term: Fall 2003;

















ECE 4100/6100: Yalamanchili
Fall 2003
• Analysis
• Hazard resolution
• Buffering state
• Software support
• Issue restrictions to reduce overhead
• Live with imprecise exceptions
• Reading: A.8 and first few pages of 3.2
• Goal: further increase issue rate to approach CPI of 1
• What do we need to do?
  – Enforce data dependencies
  – Prevent WAW and WAR hazards
  → centralized control, complete pipeline state
Instruction Execution State and Control
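The three data hazards named above can be told apart by comparing the registers two instructions read and write. The following is a minimal sketch, not the lecture's own formulation; the `Instr` record and the `hazards` function are hypothetical names, and the sample instructions are illustrative only.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    op: str
    dst: Optional[str]       # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]    # registers read


def hazards(first: Instr, second: Instr) -> Set[str]:
    """Hazards between two instructions, with `first` earlier in program order."""
    found = set()
    if first.dst is not None and first.dst in second.srcs:
        found.add("RAW")     # second reads what first writes (true dependence)
    if second.dst is not None and second.dst in first.srcs:
        found.add("WAR")     # second overwrites a register first still reads
    if first.dst is not None and first.dst == second.dst:
        found.add("WAW")     # both write the same register
    return found


div = Instr("DIV.D", "F0", ("F2", "F4"))
add = Instr("ADD.D", "F6", ("F0", "F8"))
sub = Instr("SUB.D", "F8", ("F10", "F14"))
print(hazards(div, add), hazards(add, sub))  # prints {'RAW'} {'WAR'}
```

Only RAW reflects real dataflow; WAR and WAW come from reusing register names, which is why they must be prevented once instructions can complete out of order.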
• Allow bypassing in ID of independent (in terms of dataflow) instructions
  – Localize stalls → stall only data-dependent instructions
• Other hazards cause stalls
• Break ID into issue and read operand (RO) steps
  – Permit independent instructions to bypass in RO
  – Check for structural hazards in issue stage
• Enforce WAR during write back
  – Detect and enforce hazards as late as possible
• High bandwidth to and from the register file
• No forwarding (will solve later)
• Retain name dependencies and resulting stalls (will solve later)
• Data structures keep global status that can be queried by the control logic
• Scoreboard implementation is as complex as one functional unit
• Store the functional unit that will deliver the contents
[Figure: functional unit status entry with fields Name, dst reg, src, src, "Source registers have value?", "Function unit producing value"]
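The status fields above can be sketched as a small data structure, together with the checks made at issue, read operands, and write back. This is a simplified illustration under stated assumptions, not the lecture's implementation: the class and method names (`UnitStatus`, `Scoreboard`, `issue`, and so on) are hypothetical, execution latency is not modeled, and every instruction is assumed to have one destination and two source registers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class UnitStatus:
    busy: bool = False
    name: Optional[str] = None    # operation occupying the unit
    dst: Optional[str] = None     # destination register
    src1: Optional[str] = None    # source registers
    src2: Optional[str] = None
    prod1: Optional[str] = None   # unit producing each source (None: no pending producer)
    prod2: Optional[str] = None
    ready1: bool = False          # source value available and not yet read
    ready2: bool = False


class Scoreboard:
    def __init__(self, units: List[str]) -> None:
        self.units: Dict[str, UnitStatus] = {u: UnitStatus() for u in units}
        self.producer: Dict[str, str] = {}   # register -> unit that will write it

    def issue(self, unit: str, name: str, dst: str, src1: str, src2: str) -> bool:
        # Stall on a structural hazard (unit busy) or a WAW hazard (dst pending).
        if self.units[unit].busy or dst in self.producer:
            return False
        p1, p2 = self.producer.get(src1), self.producer.get(src2)
        self.units[unit] = UnitStatus(True, name, dst, src1, src2,
                                      p1, p2, p1 is None, p2 is None)
        self.producer[dst] = unit
        return True

    def read_operands(self, unit: str) -> bool:
        # RAW: wait until both sources have been produced.
        s = self.units[unit]
        if not (s.busy and s.ready1 and s.ready2):
            return False
        s.ready1 = s.ready2 = False          # operands have now been read
        return True

    def can_write_back(self, unit: str) -> bool:
        # WAR: wait while another unit has dst as a ready but unread source.
        dst = self.units[unit].dst
        for other, s in self.units.items():
            if other == unit or not s.busy:
                continue
            if (s.src1 == dst and s.ready1) or (s.src2 == dst and s.ready2):
                return False
        return True

    def write_back(self, unit: str) -> bool:
        if not self.can_write_back(unit):
            return False
        for s in self.units.values():        # wake up consumers waiting on this unit
            if s.prod1 == unit:
                s.prod1, s.ready1 = None, True
            if s.prod2 == unit:
                s.prod2, s.ready2 = None, True
        del self.producer[self.units[unit].dst]
        self.units[unit] = UnitStatus()
        return True
```

In this sketch a `False` return stands for a stall in that stage, so hazards are detected as late as each stage allows: structural and WAW at issue, RAW at read operands, WAR at write back.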
[Figure: pipeline stages Instructions → Issue → RO → Execute → Write Back, labeled WAR]
• Function unit latencies: FPADD = 2 cycles, FPMULT = 5 cycles, FPDIV = 15 cycles, FPLOAD = 2 cycles, Integer = 1 cycle
• Cannot read and write a register in the same cycle
• All units except FPDIV are pipelined
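  DIV.D  F0, F2, F4
  ADD.D  F6, F0, F8
  S.D    F6, 0(R1)
  SUB.D  F8, F10, F14
  MUL.D  F6, F10, F8

WAR: F8 (read by ADD.D, written by SUB.D)
WAW: F6 (written by ADD.D and MUL.D)

After renaming with temporaries S and T:

  DIV.D  F0, F2, F4
  ADD.D  S, F0, F8
  S.D    S, 0(R1)
  SUB.D  T, F10, F14
  MUL.D  F6, F10, T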
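The renaming step can be sketched in a few lines. This is a hedged illustration rather than the scheme on the slide: the `rename` function and the `T0`, `T1`, ... names are hypothetical, and it renames every destination, which is more aggressive than introducing only S and T. The third `DIV.D` operand is assumed to be F4.

```python
from itertools import count
from typing import List, Optional, Tuple

Instr = Tuple[str, Optional[str], Tuple[str, ...]]   # (op, dst, srcs)


def rename(program: List[Instr]) -> List[Instr]:
    """Give every destination a fresh name and redirect later reads to it."""
    fresh = (f"T{i}" for i in count())
    current = {}                 # architectural register -> latest fresh name
    renamed = []
    for op, dst, srcs in program:
        # Sources read the most recent name; untouched registers keep their own.
        new_srcs = tuple(current.get(s, s) for s in srcs)
        new_dst = dst
        if dst is not None:
            new_dst = next(fresh)
            current[dst] = new_dst
        renamed.append((op, new_dst, new_srcs))
    return renamed


program = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("S.D", None, ("F6", "R1")),      # a store writes no register
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]
for instr in rename(program):
    print(instr)
```

Because no two instructions share a destination after renaming, only the RAW dependences remain: `ADD.D` on `DIV.D`, `S.D` on `ADD.D`, and `MUL.D` on `SUB.D`.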
• Compiler-based renaming
• Compiler analysis to provide analysis beyond code block
• May extend capabilities beyond that of the compiler (# of reservation stations)
• Note that many forms of storage are used in register renaming
• LD/SD buffers act as reservation stations for memory units
• Instruction execution cannot start until all preceding branches are resolved
[Figure labels: Register value; Reservation stations; Values]
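How reservation stations hold either a value or the identity of its producer can be sketched as follows. This is a minimal illustration in the style of Tomasulo's algorithm, assuming two-source instructions and ignoring execution timing; the names `Station`, `ReservationStations`, `issue`, and `broadcast` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Station:
    op: str
    vj: Optional[float] = None   # operand values, once captured
    vk: Optional[float] = None
    qj: Optional[str] = None     # tags of the stations that will produce them
    qk: Optional[str] = None

    def ready(self) -> bool:
        return self.qj is None and self.qk is None


class ReservationStations:
    def __init__(self, regs: Dict[str, float]) -> None:
        self.regs = dict(regs)
        self.reg_tag: Dict[str, str] = {}    # register -> tag of pending producer
        self.stations: Dict[str, Station] = {}

    def _operand(self, reg: str) -> Tuple[Optional[float], Optional[str]]:
        tag = self.reg_tag.get(reg)
        # Capture the value now if it exists, otherwise remember who produces it.
        return (None, tag) if tag is not None else (self.regs[reg], None)

    def issue(self, tag: str, op: str, dst: str, src1: str, src2: str) -> None:
        st = Station(op)
        st.vj, st.qj = self._operand(src1)
        st.vk, st.qk = self._operand(src2)
        self.stations[tag] = st
        self.reg_tag[dst] = tag   # later readers of dst wait on this tag (renaming)

    def broadcast(self, tag: str, value: float) -> None:
        # A result goes to every waiting station and to the register file.
        for st in self.stations.values():
            if st.qj == tag:
                st.vj, st.qj = value, None
            if st.qk == tag:
                st.vk, st.qk = value, None
        for reg, pending in list(self.reg_tag.items()):
            if pending == tag:
                self.regs[reg] = value
                del self.reg_tag[reg]
        del self.stations[tag]
```

Capturing operand values at issue is what removes WAR hazards in this sketch: a later write to a source register cannot change a value the station already holds.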
• Detection of RAW dependencies through memory

  SD  F6, 44(R4)
  LD  F8, 32(R8)    RAW dependency?

• Loads must be checked with preceding stores (RAW)
• Stores must be checked with preceding loads and stores (WAR and WAW)
• A simple scheme: all effective address calculations are performed in program order
  – Buffers' A field stores effective address
  – Can use forwarding directly to/from load/store buffers
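Whether the SD/LD pair above has a RAW dependency depends only on the run-time values of R4 and R8. A minimal sketch of the address comparison, assuming same-size aligned accesses so that exact address equality is enough; the function names and the register values are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def effective_address(regs: Dict[str, int], offset: int, base: str) -> int:
    return regs[base] + offset


def memory_hazards(kind: str, addr: int,
                   earlier: List[Tuple[str, int]]) -> Set[str]:
    """Hazards of a new access against earlier, still-pending accesses.

    `earlier` holds (kind, effective address) pairs in program order, as
    recorded in the load/store buffers' address fields.
    """
    found = set()
    for prev_kind, prev_addr in earlier:
        if prev_addr != addr:
            continue                  # different locations: no dependence
        if kind == "load" and prev_kind == "store":
            found.add("RAW")
        elif kind == "store" and prev_kind == "load":
            found.add("WAR")
        elif kind == "store" and prev_kind == "store":
            found.add("WAW")
    return found


regs = {"R4": 100, "R8": 112}         # hypothetical register contents
pending = [("store", effective_address(regs, 44, "R4"))]    # SD F6, 44(R4)
load_addr = effective_address(regs, 32, "R8")               # LD F8, 32(R8)
print(memory_hazards("load", load_addr, pending))           # prints {'RAW'}
```

With these values both addresses are 144, so the load must wait for, or be forwarded from, the store. Computing addresses in program order guarantees that every earlier address is known when the check is made.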
• Reading: 3.6
• Increase the issue rate!
• Now issue multiple instructions/cycle
  – Issue restrictions simplify control
• Increase in
  – Forwarding logic complexity
  – Importance of branch prediction mechanisms
  – Hardware for concurrent decoding and execution
[Figure: multiple issue taxonomy. Superscalar: statically scheduled or dynamically scheduled. VLIW/EPIC: statically scheduled.]
• Issue packet
• Issue restrictions
  – Motivation
  – Match the hardware
  – Trade-off complexity vs. performance
  – Enforcement
  – Impact on penalties
• Multiple issue
  – Checking within and across packets
  – Pipelining the issue logic
[Figure: issue packet of instructions i, i+1, i+2, i+3; issue and fetch multiple instructions per clock cycle]
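Checking within a packet can be sketched as a scan that stops at the first instruction that cannot issue with those ahead of it. This is an illustrative model of in-order issue only, assuming each instruction names the kind of functional unit it needs; `issuable_prefix` and the unit names are hypothetical, and checks against instructions from earlier packets are left out.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, Optional[str], Tuple[str, ...]]   # (unit kind, dst, srcs)


def issuable_prefix(packet: List[Instr], free_units: Dict[str, int]) -> int:
    """Number of leading instructions of the packet that can issue together."""
    units = dict(free_units)          # work on a copy of the unit counts
    written = set()                   # registers written earlier in the packet
    issued = 0
    for unit, dst, srcs in packet:
        if units.get(unit, 0) == 0:
            break                     # structural hazard: no free unit
        if written & set(srcs):
            break                     # RAW on an instruction in the same packet
        if dst in written:
            break                     # WAW within the packet
        units[unit] -= 1
        if dst is not None:
            written.add(dst)
        issued += 1
    return issued


packet = [
    ("fp", "F0", ("F2", "F4")),
    ("int", "R1", ("R1",)),
    ("fp", "F6", ("F0", "F8")),       # depends on the first instruction
    ("mem", None, ("F6", "R1")),
]
print(issuable_prefix(packet, {"fp": 2, "int": 1, "mem": 1}))  # prints 2
```

Stopping at the first blocked instruction keeps issue in order. The number of comparisons grows with the packet width, which is one reason the issue logic may need to be pipelined.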
• Widen the issue logic
• Boost instruction issue (remain in-order!) by using reservation stations to move dependence handling to run-time
• Match between available functional units, distribution of dependencies, and amount of real work determines achievable performance
• Examples