Explicitly Parallel Instruction Computing - Computer Design and Organisation - Lecture Slides, Slides of Computer Science

Aligarh Muslim University Computer Science

These are the Lecture Slides of Computer Design and Organisation which includes Brick and Mortar, Silicon Manufacturing, Cost of Production, Mortar Chips, Fixed Set of Functions, Standard Interface, Benefits of Brick and Mortar, Chip Design etc. Key important points are: Explicitly Parallel Instruction Computing, Sequence of Steps, Possibilities of Optimization, Static Scheduling Techniques, Partial Predication, Predication Benefits, Itanium Overview

Typology: Slides

2012/2013

Uploaded on 03/22/2013

dhirendra 🇮🇳


The EPIC-VLIW Approach

• Explicitly Parallel Instruction Computing (EPIC) is a “philosophy”
• Very Long Instruction Word (VLIW) is an implementation of EPIC
• Concept derives from horizontal microprogramming, namely:
– A sequence of steps (micro-operations) that interprets the ISA
– If only one micro-op per cycle: vertical microprogramming
– If several units (at the extreme, all of them), say incr PC, add, f-p, register file read, register file write, etc., can be activated in the same cycle: horizontal microprogramming

The EPIC “philosophy”

• Compiler generates packets, or bundles, of instructions that can execute together
– Instructions executed in order (static scheduling) and assumed to have a fixed latency
• Architecture should provide features that assist the compiler in exploiting ILP
– Branch prediction, load speculation (see later), and associated recoveries
• Difficulties occur with unpredictable latencies:
– Branch prediction → Use of predication in addition to static and dynamic branch prediction
– Pointer-based computations → Use cache hints, speculative loads
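
The bundling step described above can be sketched as a greedy, in-order packer. This is only an illustration under stated assumptions: the `Instr` record, the bundle width of 3, and the conservative rule that any register dependence closes the current bundle are all hypothetical, and a real EPIC compiler would also weigh functional-unit types and latencies.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical three-address instruction: dest = op(srcs)."""
    name: str
    dest: str
    srcs: tuple


def pack_bundles(instrs, width=3):
    """Greedy in-order packing (assumed policy, not a real compiler's).

    A new bundle is started when the current one is full or when the
    next instruction has a register dependence on an instruction that
    is already in the current bundle.
    """
    bundles, current = [], []
    for ins in instrs:
        written = {i.dest for i in current}
        read = {s for i in current for s in i.srcs}
        raw = any(s in written for s in ins.srcs)  # read-after-write
        waw = ins.dest in written                  # write-after-write
        war = ins.dest in read                     # write-after-read (conservative)
        if current and (len(current) == width or raw or waw or war):
            bundles.append(current)
            current = []
        current.append(ins)
    if current:
        bundles.append(current)
    return bundles


program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("sub", "r4", ("r5", "r6")),
    Instr("mul", "r7", ("r1", "r4")),  # reads r1 and r4, so it cannot join add/sub
    Instr("ld", "r8", ("r9",)),
]
bundles = pack_bundles(program)
print([[i.name for i in b] for b in bundles])
```

Because the instructions inside one bundle are independent by construction, the hardware can issue them together without checking dependences itself, which is the point of static scheduling.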

Other Static Scheduling Techniques

• Eliminate branches via predication (next slides)
• Loop unrolling
• Software pipelining (see in a few slides)
• Use of global scheduling
– Trace scheduling technique: focus on the critical path
• Software prefetching
– We’ll talk about prefetching at length later

Predication Basic Idea

• Associate a Boolean condition (predicate) with the issue, execution, or commit of an instruction
– The stage in which to test the predicate is an implementation choice
• If the predicate is true, the result of the instruction is kept
• If the predicate is false, the instruction is nullified
• Distinction between
– Partial predication: only a few opcodes can be predicated
– Full predication: every instruction is predicated
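
As a rough illustration of the idea above, the sketch below if-converts `if (a > b) x = a; else x = b;` into two predicated instructions and runs them on a toy interpreter. The tuple encoding `(predicate, dest, fn)` and the function names are hypothetical, and the interpreter assumes the predicate is tested at commit time, which is only one of the implementation choices the slide mentions.

```python
def run_predicated(program, regs, preds):
    """Toy interpreter for predicated instructions.

    Each instruction is (predicate_name, dest, fn). Every instruction
    is executed, but its result is committed only if its predicate is
    true; otherwise the instruction is nullified.
    """
    for pred, dest, fn in program:
        result = fn(regs)        # computed regardless of the predicate
        if preds[pred]:
            regs[dest] = result  # predicate true: result is kept
    return regs


def max_if_converted(a, b):
    """Branch-free version of: if (a > b) x = a; else x = b;"""
    regs = {"a": a, "b": b, "x": 0}
    # One compare defines a predicate and its complement.
    preds = {"p1": a > b, "p2": not (a > b)}
    program = [
        ("p1", "x", lambda r: r["a"]),  # (p1) x = a
        ("p2", "x", lambda r: r["b"]),  # (p2) x = b
    ]
    return run_predicated(program, regs, preds)["x"]


print(max_if_converted(3, 7), max_if_converted(9, 2))
```

Both assignments are fetched and executed on every call, which previews the costs listed on the next slide: the branch is gone, but fetch bandwidth and functional units are spent on the nullified instruction.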

Predication Costs

• Increased fetch utilization
• Increased register consumption
• If predication is tested at commit time, increased functional-unit utilization
• With code movement, increased complexity of exception handling
– For example, insert extra instructions for exception checking
• If every instruction is predicated, larger instructions
– Impacts I-cache

Flavors of Predication Implementation

• Has its roots in vector machines like CRAY-
– Creation of vector masks to control vector operations on an element-per-element basis
• Often (partial) predication limited to conditional moves as, e.g., in the Alpha, MIPS 10000, IBM PowerPC, SPARC and Intel P microarchitecture
• Full predication: every instruction predicated, as in Intel Itanium (IA-64 ISA)

Other Forms of Partial Predication

• Select dest, src1, src2, cond
– Corresponds to the C-like expression dest = (cond) ? src1 : src2
– Note the destination register is always assigned a value
– Used in the Multiflow (first commercial VLIW machine)
• Nullify
– Any register-register instruction can nullify the next instruction, thus making it conditional
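
The difference between Select and a conditional move can be made concrete with a small sketch. The register file is modelled as a plain dictionary and the function names `select` and `cmov` are hypothetical stand-ins for the instructions, not any real ISA's encoding.

```python
def select(regs, dest, src1, src2, cond):
    """Select: dest = cond ? src1 : src2 (dest is always written)."""
    regs[dest] = regs[src1] if regs[cond] else regs[src2]


def cmov(regs, dest, src, cond):
    """Conditional move: dest is written only when cond is true."""
    if regs[cond]:
        regs[dest] = regs[src]


# Same starting state for both; the condition register holds 0 (false).
regs_select = {"r1": 10, "r2": 20, "r3": 99, "c": 0}
regs_cmov = dict(regs_select)

select(regs_select, "r3", "r1", "r2", "c")  # r3 takes the value of r2
cmov(regs_cmov, "r3", "r1", "c")            # r3 keeps its old value

print(regs_select["r3"], regs_cmov["r3"])
```

Since Select always writes its destination, the old value of the destination is never an input to the instruction, whereas a conditional move implicitly depends on it.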

Full Predication

• Define predicates with instructions of the form:

Pred_<cmp> Pout1<type>, Pout2<type>, src1, src2 (Pin)

where
– Pout1 and Pout2 are assigned values according to the comparison between src1 and src2 and the cmp “opcode”
– The predicate types are most often U (unconditional) and its complement Ū, and OR and its complement
– The predicate define instruction can itself be predicated with the value of Pin
  There are definite rules for that, e.g., if Pin = 0, U and Ū are set to 0 independently of the result of the comparison, and the OR predicates are not modified.
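
A minimal sketch of a predicate define instruction follows. The Pin = 0 behaviour comes from the rule stated above. The Pin = 1 behaviour of the OR types (set to 1 when the, possibly complemented, comparison holds, otherwise left unchanged) is an assumption based on the usual convention for OR-type predicates, and the names `pred_define`, `U_bar` and `OR_bar` are hypothetical.

```python
import operator

# Hypothetical set of comparison "opcodes".
CMP = {"eq": operator.eq, "ne": operator.ne, "lt": operator.lt, "ge": operator.ge}


def pred_define(preds, cmp, out1, type1, out2, type2, src1, src2, p_in=True):
    """Model of: Pred_<cmp> Pout1<type>, Pout2<type>, src1, src2 (Pin).

    Types: "U", "U_bar" (complement of U), "OR", "OR_bar".
    """
    result = CMP[cmp](src1, src2)
    for out, ptype in ((out1, type1), (out2, type2)):
        value = (not result) if ptype.endswith("_bar") else result
        if ptype in ("U", "U_bar"):
            # Always written; forced to 0 when Pin = 0.
            preds[out] = bool(p_in and value)
        elif ptype in ("OR", "OR_bar"):
            # Assumed rule: may only set the predicate, never clear it.
            if p_in and value:
                preds[out] = True
        else:
            raise ValueError(f"unknown predicate type: {ptype}")


preds = {"p1": False, "p2": True, "p3": False}
pred_define(preds, "lt", "p1", "U", "p2", "U_bar", 3, 7)              # 3 < 7
pred_define(preds, "eq", "p3", "OR", "p2", "OR_bar", 5, 5)            # 5 == 5
pred_define(preds, "eq", "p1", "U", "p3", "OR", 5, 5, p_in=False)     # Pin = 0
print(preds)
```

Under this reading, several OR-type defines targeting the same predicate accumulate a disjunction of conditions, provided the predicate was cleared beforehand.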

IA-64: Explicitly Parallel Architecture

• IA-64 template specifies
– The type of operation for each instruction, e.g. MFI, MMI, MII, MLI, MIB, MMF, MFB, MMB, MBB, BBB
– Intra-bundle relationship, e.g. M / MI or MI / I (/ is a “stop”, meaning no parallelism)
– Inter-bundle relationship
• Most common combinations covered by templates
– Headroom for additional templates
• Simplifies hardware requirements
• Scales compatibly to future generations

Bundle layout (128 bits):

Instruction 2 (41 bits) | Instruction 1 (41 bits) | Instruction 0 (41 bits) | Template (5 bits)

Legend: M = Memory, F = Floating-point, I = Integer, L = Long Immediate, B = Branch

Example: Memory (M), Memory (M), Integer (I) uses the MMI template
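
The bundle arithmetic (3 × 41 + 5 = 128 bits) can be checked with a short pack/unpack sketch. It assumes the template sits in the least-significant 5 bits with instruction 0 directly above it, as the ordering in the layout suggests. The template value and slot contents used in the example are arbitrary placeholders, not real encodings.

```python
SLOT_BITS = 41
TEMPLATE_BITS = 5
BUNDLE_BITS = 3 * SLOT_BITS + TEMPLATE_BITS  # 128


def pack_bundle(template, slot0, slot1, slot2):
    """Pack a template and three instruction slots into one integer."""
    fields = ((template, TEMPLATE_BITS), (slot0, SLOT_BITS),
              (slot1, SLOT_BITS), (slot2, SLOT_BITS))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
    return (template
            | (slot0 << TEMPLATE_BITS)
            | (slot1 << (TEMPLATE_BITS + SLOT_BITS))
            | (slot2 << (TEMPLATE_BITS + 2 * SLOT_BITS)))


def unpack_bundle(bundle):
    """Return (template, (slot0, slot1, slot2)) from a packed bundle."""
    slot_mask = (1 << SLOT_BITS) - 1
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple((bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
                  for i in range(3))
    return template, slots


bundle = pack_bundle(0x08, 1, 2, 3)  # placeholder field values
print(BUNDLE_BITS, unpack_bundle(bundle))
```

With only 5 template bits there are at most 32 combinations of slot types and stops, which is why only the most common combinations are covered and why some codes can be left as headroom.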

Itanium Overview

Itanium implementation

• Can execute 2 bundles (6 instructions) per cycle
• 10 stage pipeline
• 4 integer units (2 of them can handle load-store), 2 f-p units and 3 branch units
• Issue in order, execute in order but can complete out of order. Uses a (restricted) register scoreboard technique to resolve dependencies.

Itanium implementation

• Predication reduces number of branches and number of mispredicts
• Nonetheless: sophisticated branch predictor
– Two-level branch predictor of the SAs variety
– Some provision for multiway branches: several basic blocks can terminate in the same bundle
– 4 registers for highly predictable target addresses (end of loops), hence no bubble on taken branch
– Return address stack
– Hints from the compiler
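
For readers unfamiliar with two-level prediction, here is a generic sketch: a first level of branch-history registers selects a 2-bit saturating counter in a second-level pattern table. It is a simplified illustration, not the Itanium's actual organization; the class name, the table sizes, and the indexing by `pc % sets` are all assumptions.

```python
class TwoLevelPredictor:
    """Generic two-level adaptive predictor (simplified, assumed sizes)."""

    def __init__(self, history_bits=4, sets=16):
        self.history_bits = history_bits
        self.sets = sets
        # Level 1: one history shift register per set of branches.
        self.histories = [0] * sets
        # Level 2: per-set table of 2-bit counters, initially weakly not-taken.
        self.counters = [[1] * (1 << history_bits) for _ in range(sets)]

    def predict(self, pc):
        s = pc % self.sets
        return self.counters[s][self.histories[s]] >= 2  # True means taken

    def update(self, pc, taken):
        s = pc % self.sets
        h = self.histories[s]
        c = self.counters[s][h]
        self.counters[s][h] = min(3, c + 1) if taken else max(0, c - 1)
        mask = (1 << self.history_bits) - 1
        self.histories[s] = ((h << 1) | int(taken)) & mask


def mispredictions(predictor, pc, outcomes):
    """Count wrong predictions for one branch over a list of outcomes."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


predictor = TwoLevelPredictor()
mispredictions(predictor, 0x40, [True, False] * 4)          # warm-up
steady = mispredictions(predictor, 0x40, [True, False] * 10)
print(steady)
```

An alternating branch defeats a single 2-bit counter, but once the history register has seen the pattern, each history value maps to a counter that predicts the next outcome correctly.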

Traditional Register Models

• Procedure A calls procedure B
• Procedures must share register space
• Performance penalty due to register save / restore

Traditional Register Stacks

• Eliminate the need for save / restore by reserving fixed blocks of registers
• However, fixed blocks waste resources

[Figure: two panels, “Traditional Register Models” and “Traditional Register Stacks”, showing how procedures A to D map onto registers and memory.]

I think that the “traditional register stack” model they refer to is the “register windows” model used in SPARC.
