Kilo-Instruction Processors - Computer Systems Architecture - Lecture Slides, Slides for Computer Architecture and Organization. Alagappa University

KILO-INSTRUCTION PROCESSORS

Docsity.com

Introduction

Memory Wall

• Performance improvement of high-frequency microprocessors is seriously limited by main-memory access latency

[Figure: Processor-Memory Performance Gap. Performance (log scale, 1 to 1000) versus time (1980 to 2000). CPU performance grows 60%/yr (“Moore’s Law”), RAM performance grows 7%/yr, so the gap grows about 50% per year.]
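
The 50% figure follows from the two growth rates in the plot. A minimal sketch of the arithmetic, assuming both curves start at the same normalized performance of 1 (the function and constant names are illustrative only):

```python
# Annual improvement rates read off the figure (CPU 60%/yr, RAM 7%/yr).
CPU_GROWTH = 0.60
RAM_GROWTH = 0.07


def gap_after(years: int) -> float:
    """Ratio of CPU to RAM performance after `years`, both starting at 1."""
    return ((1 + CPU_GROWTH) / (1 + RAM_GROWTH)) ** years


# Yearly growth of the gap itself: 1.60 / 1.07 - 1, which is close to 0.50.
yearly_gap_growth = gap_after(1) - 1

if __name__ == "__main__":
    print(f"gap grows about {yearly_gap_growth:.0%} per year")
    for years in (5, 10, 20):
        print(f"after {years:2d} years the gap is about {gap_after(years):,.0f}x")
```

Because the gap compounds, even a modest-looking yearly difference turns into orders of magnitude over the two decades the plot covers.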

Reducing Memory Latency

Cache memory hierarchies

• Cache memory hierarchies
  – First-level (L1) cache built into the processor core
    • Takes 1-3 processor clock cycles to access
  – If there is a miss in the L1 cache → the on-chip L2 cache is accessed, on the order of 10 processor cycles
  – Accessing main memory takes at least on the order of 100 processor cycles
• Prefetching data from memory to the cache
  – Prefetch addresses are hard to predict
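
To see how these latencies combine, here is a rough sketch of an average-access-time calculation. The latencies are taken from the bullets above (2 cycles is picked from the 1-3 range); the miss rates are hypothetical values chosen only for illustration and do not come from the slides.

```python
def average_access_cycles(l1_cycles, l2_cycles, mem_cycles,
                          l1_miss_rate, l2_miss_rate):
    """Average cycles per memory access for a two-level cache hierarchy.

    Every access pays the L1 latency; L1 misses also pay the L2 latency;
    L2 misses additionally pay the main-memory latency.
    """
    return l1_cycles + l1_miss_rate * (l2_cycles + l2_miss_rate * mem_cycles)


if __name__ == "__main__":
    # Hypothetical miss rates: 5% of accesses miss L1, 20% of those miss L2.
    avg = average_access_cycles(l1_cycles=2, l2_cycles=10, mem_cycles=100,
                                l1_miss_rate=0.05, l2_miss_rate=0.20)
    print(f"average access time: {avg:.2f} cycles")
```

The average stays low only while the miss rates stay low; each access that does reach main memory still stalls its dependents for about 100 cycles, which is the problem the rest of the lecture addresses.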

[Figure: processor pipeline stages (Next IP, Fetch, Drive, Alloc., Rename, Queue, Schedule, Dispatch, Reg. Read, Execute, Flags, Br. chk, Drive) shown against the memory hierarchy: L1 Instr., L1 Data, L2, Memory.]

Out-of-order superscalar processors

Sequence of instructions containing data cache misses

Kilo-Instruction Processors

Definition

• An out-of-order superscalar processor that supports thousands of “in-flight instructions”

• Intelligent use of resources

Scalability

• Thousands of in-flight instructions and in-order commit make designs impractical:
  – ROB: needs to maintain a copy of every in-flight instruction
  – IQs: instructions depending on long-latency instructions remain in these queues for a long time
  – LSQs: instructions remain in the queue until commit
  – Registers: a new physical register for each instruction producing a new value
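
The pressure that in-order commit puts on the ROB can be illustrated with a toy timing model. This is only a sketch under strong assumptions that are not in the slides: all instructions are independent, the machine is 4-wide, every 100th instruction is a load that misses, and a miss costs 100 cycles (the "order of 100" quoted earlier). The function name and parameters are hypothetical.

```python
def cycles_to_commit(n_instr, rob_size, width=4, miss_every=100, miss_latency=100):
    """Cycles to commit n_instr independent instructions with in-order commit."""
    enter = [0] * n_instr   # cycle each instruction is allocated a ROB entry
    commit = [0] * n_instr  # cycle each instruction commits (program order)
    for i in range(n_instr):
        # At most `width` instructions enter the ROB per cycle.
        fetch_ready = enter[i - width] + 1 if i >= width else 0
        # A ROB entry frees only when the instruction rob_size earlier commits.
        slot_free = commit[i - rob_size] if i >= rob_size else 0
        enter[i] = max(fetch_ready, slot_free)
        latency = miss_latency if i % miss_every == 0 else 1
        done = enter[i] + latency
        # In-order commit: nothing commits before its predecessor.
        commit[i] = max(done, commit[i - 1]) if i > 0 else done
    return commit[-1]


if __name__ == "__main__":
    for rob_size in (64, 256, 1024):
        print(rob_size, cycles_to_commit(2000, rob_size))
```

In this model a small ROB fills up behind each miss and the misses are served one after another, while a large ROB lets several misses overlap. The cost of that large window is exactly the list above: every queue and register file has to scale with it.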

Efficient Kilo-Instruction Processor Design

– Multi-checkpointing the ROB
  • Out-of-order commit
– Early release of resources
  • Ephemeral registers
  • Load queues

Checkpointing

• The ROB allows the correct state to be restored at any instruction, which is not necessary
• Checkpoint → a snapshot of the processor state taken at a specific instruction of the program being executed (processor state is checkpointed only for a subset of instructions)
• With this snapshot the processor can restore state to that point in case of an exception or
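
A minimal sketch of the snapshot-and-restore idea, assuming the checkpointed state is just a register rename map (a real processor state holds more than this). The class and method names are hypothetical, and freeing of physical registers is not modeled.

```python
class CheckpointedRenameMap:
    """Toy rename map with snapshots taken at selected instructions."""

    def __init__(self, num_arch_regs):
        # Architected register -> physical register currently holding it.
        self.mapping = {r: r for r in range(num_arch_regs)}
        self.next_phys = num_arch_regs
        self.checkpoints = []  # (instruction id, saved mapping), oldest first

    def rename_dest(self, arch_reg):
        """Give the destination register a fresh physical register."""
        phys = self.next_phys
        self.next_phys += 1
        self.mapping[arch_reg] = phys
        return phys

    def take_checkpoint(self, instr_id):
        """Snapshot the state as it is just before instr_id executes."""
        self.checkpoints.append((instr_id, dict(self.mapping)))

    def restore(self, faulting_instr_id):
        """Roll back to the youngest checkpoint at or before the fault."""
        while self.checkpoints and self.checkpoints[-1][0] > faulting_instr_id:
            self.checkpoints.pop()  # younger checkpoints are now invalid
        if not self.checkpoints:
            raise ValueError("no checkpoint covers this instruction")
        instr_id, saved = self.checkpoints[-1]
        self.mapping = dict(saved)
        return instr_id  # execution restarts from this instruction


if __name__ == "__main__":
    m = CheckpointedRenameMap(num_arch_regs=4)
    m.take_checkpoint(0)
    m.rename_dest(1)
    m.take_checkpoint(10)
    m.rename_dest(1)
    print("restart from instruction", m.restore(15))
```

Unlike a ROB, this cannot stop exactly at the faulting instruction: it returns to the nearest earlier checkpoint and the instructions in between are executed again.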

Design Decisions

• How many in-flight checkpoints should be maintained by the processor?
  – A large number of checkpoints reduces the penalty of the recovery process
  – A large number of checkpoints increases the implementation cost
• What kind of instructions should be checkpointed?
  – A checkpoint can be taken at any instruction
  – Some instructions are better candidates (e.g., some current processors take checkpoints at branch instructions in order to minimize the branch

Multicheckpointing

Selective Checkpointing

• Replace the ROB → pseudo-ROB
• The processor removes instructions that reach the pseudo-ROB’s head at a fixed rate
• Processor state is recoverable for any instruction in the pseudo-ROB
• A checkpoint is taken when an incomplete instruction leaves the pseudo-ROB
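
The rule in the last bullet can be sketched as a small FIFO. This is an illustrative model, not the design from the slides: the class and method names are hypothetical, and the "fixed rate" of removal is simplified to removing the head whenever a new instruction arrives at a full queue.

```python
from collections import deque


class PseudoROB:
    """Toy pseudo-ROB: a FIFO that never waits for its head to complete."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()  # instruction ids in program order
        self.completed = set()  # ids that have finished executing
        self.checkpoints = []   # ids at which a checkpoint was taken

    def dispatch(self, instr_id):
        """Insert a new instruction, pushing out the head if the queue is full."""
        if len(self.entries) == self.capacity:
            self.remove_head()
        self.entries.append(instr_id)

    def complete(self, instr_id):
        self.completed.add(instr_id)

    def remove_head(self):
        """Remove the oldest instruction; checkpoint if it has not completed."""
        instr_id = self.entries.popleft()
        if instr_id not in self.completed:
            self.checkpoints.append(instr_id)
        return instr_id


if __name__ == "__main__":
    rob = PseudoROB(capacity=2)
    rob.dispatch(0)          # e.g. a load that misses and stays incomplete
    rob.dispatch(1)
    rob.complete(1)
    rob.dispatch(2)          # pushes out 0 -> checkpoint taken
    rob.dispatch(3)          # pushes out 1 -> already complete, no checkpoint
    print("checkpoints at:", rob.checkpoints)
```

Checkpoints are therefore spent only on instructions that are still outstanding when they leave the window, which tends to be the long-latency ones.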

Instruction Queue Management

Bi-level Issue Queue

• The processor detects instructions that will occupy an issue queue entry for a long time
• It removes these instructions from the primary issue queue
• It offloads them to a slow-lane instruction queue → larger, slower, less complex
• The same principle is applied to the load-store queue
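
One way to picture the detection step is to track which registers depend, directly or transitively, on a load that missed in the cache. The sketch below assumes that detection rule and uses hypothetical names (`Instr`, `BiLevelIssueQueue`); moving instructions back to the fast queue once the miss returns is left out.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str
    srcs: tuple = ()
    is_miss: bool = False  # True for a load that misses in the cache


class BiLevelIssueQueue:
    """Toy two-level issue queue: fast primary queue plus a slow lane."""

    def __init__(self):
        self.fast = []                   # primary issue queue
        self.slow = []                   # slow-lane instruction queue
        self.long_latency_regs = set()   # registers waiting on a miss

    def dispatch(self, instr):
        depends_on_miss = any(s in self.long_latency_regs for s in instr.srcs)
        if instr.is_miss or depends_on_miss:
            # The result will be late, so consumers of it will be late too.
            self.long_latency_regs.add(instr.dest)
        else:
            # The register is overwritten by a value that arrives on time.
            self.long_latency_regs.discard(instr.dest)
        (self.slow if depends_on_miss else self.fast).append(instr)


if __name__ == "__main__":
    q = BiLevelIssueQueue()
    q.dispatch(Instr("load", "r1", ("r0",), is_miss=True))
    q.dispatch(Instr("add", "r2", ("r1", "r3")))  # waits on the miss
    q.dispatch(Instr("mul", "r4", ("r5", "r6")))  # independent
    q.dispatch(Instr("sub", "r7", ("r2",)))       # transitively dependent
    print("fast:", [i.name for i in q.fast])
    print("slow:", [i.name for i in q.slow])
```

The fast queue can stay small because it only holds instructions that are likely to issue soon.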

Physical Register File

Ephemeral Registers

• A conventional superscalar processor assigns physical registers to architected registers when an instruction enters the issue queue
• An instruction reserves a physical register for its entire flight time
• A physical register is not written with a value until much later → until then, its primary function is tracking data dependencies
• Use virtual registers → late register allocation
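
Late allocation can be sketched as two separate steps: a virtual tag handed out at rename time, and a physical register claimed only when the value is produced. The class and method names below are hypothetical, and the sketch ignores what a real design must do when no physical register is free at writeback.

```python
class LateAllocRegisterFile:
    """Toy register file that allocates storage at writeback, not at rename."""

    def __init__(self, num_phys):
        self.free_phys = list(range(num_phys))  # unused physical registers
        self.next_tag = 0                       # virtual tags are just a counter
        self.tag_to_phys = {}                   # tags that currently hold a value

    def rename(self):
        """Hand out a virtual tag; it tracks dependencies but uses no storage."""
        tag = self.next_tag
        self.next_tag += 1
        return tag

    def writeback(self, tag):
        """Claim a physical register only now that there is a value to store."""
        if not self.free_phys:
            raise RuntimeError("no free physical register")
        phys = self.free_phys.pop()
        self.tag_to_phys[tag] = phys
        return phys

    def release(self, tag):
        """Return the physical register once the value is no longer needed."""
        self.free_phys.append(self.tag_to_phys.pop(tag))


if __name__ == "__main__":
    rf = LateAllocRegisterFile(num_phys=2)
    tags = [rf.rename() for _ in range(5)]  # five in-flight instructions
    print("free registers after rename:", len(rf.free_phys))
    rf.writeback(tags[0])
    print("free registers after one writeback:", len(rf.free_phys))
```

Five instructions are in flight here with only two physical registers, which would not be possible if each instruction reserved a register for its entire flight time.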

Performance Evaluation
