EPIC Architecture: A Compiler-Controlled Processor Design - Prof. Scott Mahlke, Assignments of Electrical and Electronics Engineering

University of Michigan (UM) - Ann Arbor Electrical and Electronics Engineering

Prof. Scott Mahlke

An overview of EPIC architecture, a compiler-controlled processor design, from EECS 583 at the University of Michigan. The EPIC philosophy lets the compiler create a complete plan of run-time execution (POE), specifying at what time and on what resource each operation runs. The lecture covers defining features such as MultiOp and exposed latency, as well as other architectural features such as register structure, branch architecture, and data/control speculation. The goals are to create more efficient POEs, expose the microarchitecture, and play the statistics.

Typology: Assignments

Pre 2010

Uploaded on 09/02/2009


EECS 583 – Lecture 3

EPIC Architectures

University of Michigan

January 14, 2002


EPIC philosophy

- Compiler creates a complete plan of run-time execution (POE)
  - At what time and using what resource
  - POE communicated to hardware via the instruction set
  - Processor obediently follows POE
  - No dynamic scheduling or out-of-order execution (these second-guess the compiler's plan)
- Compiler allowed to play the statistics
  - Many types of info only available at run-time (branch directions, locations accessed via pointers)
  - Traditionally compilers behave conservatively → handle the worst-case possibility
  - Allow the compiler to gamble when it believes the odds are in its favor
    - Profiling
- Expose microarchitecture to the compiler
  - Memory system, branch execution

Defining feature II - Exposed latency

- Superscalar
  - Sequence of atomic operations
  - Sequential order defines semantics (UAL, unit assumed latency)
  - Each conceptually finishes before the next one starts
- EPIC – non-atomic operations
  - Semantics determined by relative ordering of reads/writes
  - Assumed latency (NUAL, non-unit assumed latency, if > 1)
  - Contract between the compiler and hardware
  - Instruction issuance provides common notion of time

    MultiOp1: r1 = r2 + r3
    MultiOp2: r4 = r1 * r5
    MultiOp3: r6 = r1 / r
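The contract behind an assumed latency can be sketched with a toy model in Python. This is only an illustration under stated assumptions, not any real EPIC instruction set: the `Op` record, the `run` function, the latencies, and the register values are all hypothetical. Each operation reads its sources when it issues and writes its destination `latency` cycles later, so an operation issued before the write lands still sees the old value.

```python
import operator
from collections import namedtuple

# Hypothetical op record: destination, function, sources, assumed latency (>= 1).
Op = namedtuple("Op", "dest fn srcs latency")

def run(multiops, regs):
    """Issue one MultiOp per cycle; sources are read at issue time and the
    destination is written `latency` cycles after issue."""
    regs = dict(regs)
    pending = []  # (cycle when the write lands, dest, value)
    cycle = 0
    while cycle < len(multiops) or pending:
        # Writes whose latency has elapsed become visible before this cycle's reads.
        for when, dest, value in pending:
            if when == cycle:
                regs[dest] = value
        pending = [p for p in pending if p[0] > cycle]
        if cycle < len(multiops):
            for op in multiops[cycle]:
                value = op.fn(*(regs[s] for s in op.srcs))
                pending.append((cycle + op.latency, op.dest, value))
        cycle += 1
    return regs

start = {"r1": 100, "r2": 1, "r3": 2, "r4": 0, "r5": 10, "r6": 0}
program = [
    [Op("r1", operator.add, ("r2", "r3"), 2)],  # cycle 0: r1 written at cycle 2
    [Op("r4", operator.mul, ("r1", "r5"), 1)],  # cycle 1: still reads the old r1
    [Op("r6", operator.mul, ("r1", "r5"), 1)],  # cycle 2: reads the new r1
]
result = run(program, start)
```

With a 2-cycle add, the multiply issued at cycle 1 uses the old `r1`, while the one issued at cycle 2 uses the new value. Under UAL semantics both would have seen the new value, which is why the assumed latency has to be part of the compiler/hardware contract.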

Other architectural features of EPIC

- Add features into the architecture to support the EPIC philosophy
  - Create more efficient POEs
  - Expose the microarchitecture
  - Play the statistics
- Register structure
- Branch architecture
- Data/Control speculation
- Memory hierarchy
- Predicated execution (largest impact on the compiler)

Rotating registers

- Overlap loop iterations
  - How do you prevent register overwrite in later iterations?
  - Compiler-controlled dynamic register renaming
- Rotating registers
  - Each iteration writes to r
  - But this gets mapped to a different physical register
  - Block of consecutive regs allocated for each reg in loop, corresponding to number of iterations it is needed

Figure: two overlapped loop iterations offset by II; iteration n has RRB = 7, iteration n + 1 has RRB = 6.

    actual reg = (reg + RRB) % NumRegs
    At end of each iteration, RRB--
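The renaming rule above can be tried out with a small Python sketch. The class name `RotatingFile`, the file size of 8, and the use of register number 1 are assumptions made for illustration; wrapping RRB with a modulo when it decrements is also an assumption, chosen so the base stays in range.

```python
NUM_REGS = 8  # assumed size of the rotating register file

class RotatingFile:
    """Toy rotating register file: actual reg = (reg + RRB) % NumRegs."""

    def __init__(self, num_regs=NUM_REGS, rrb=7):
        self.num_regs = num_regs
        self.rrb = rrb
        self.phys = [None] * num_regs

    def physical(self, reg):
        # Map the register number named in the code to a physical register.
        return (reg + self.rrb) % self.num_regs

    def write(self, reg, value):
        self.phys[self.physical(reg)] = value

    def read(self, reg):
        return self.phys[self.physical(reg)]

    def end_iteration(self):
        # RRB-- at the end of each iteration (wrap-around is an assumption).
        self.rrb = (self.rrb - 1) % self.num_regs

rf = RotatingFile()
rf.write(1, "iter n")       # RRB = 7: register 1 maps to physical 0
first = rf.physical(1)
rf.end_iteration()          # RRB = 6
rf.write(1, "iter n+1")     # register 1 now maps to physical 7
second = rf.physical(1)
previous = rf.read(2)       # the value from iteration n is now visible as register 2
```

Both iterations name the same register in the code, yet they land in different physical registers, and the earlier value stays reachable under the next register number. That is what lets overlapped iterations avoid overwriting each other.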

Branch architecture

- Branch actions
  - Branch condition computed
  - Target address formed
  - Instructions fetched from taken, fall-through or both paths
  - Branch itself executes
  - After the branch, target of the branch is decoded/executed
- Superscalar processors use hardware to hide the latency of all the actions
  - Icache prefetching
  - Branch prediction – Guess outcome of branch
  - Dynamic scheduling – overlap other instructions with branch
  - Reorder buffer – Squash when wrong

Speculation

- Allow the compiler to play the statistics
  - Reordering operations to find enough parallelism
  - Branch outcome
    - Control speculation
  - Lack of memory dependence in pointer code
    - Data speculation
  - Profile or clever analysis provides "the statistics"
- General plan of action
  - Compiler reorders aggressively
  - Hardware support to catch times when it's wrong
  - Execution repaired, continue
    - Repair is expensive
    - So have to be right most of the time or performance will suffer
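Why the compiler has to be right most of the time can be seen with a little arithmetic. The sketch below is a simplified cost model, not something taken from the lecture: the function names and the cycle counts are hypothetical, and it assumes a fixed saving when the guess is right and a fixed repair cost when it is wrong.

```python
def expected_gain(p_correct, cycles_saved, repair_cycles):
    """Average cycles gained per speculated operation under the simple model."""
    return p_correct * cycles_saved - (1.0 - p_correct) * repair_cycles

def break_even(cycles_saved, repair_cycles):
    """Probability of being right at which speculation neither helps nor hurts."""
    return repair_cycles / (cycles_saved + repair_cycles)

# Hypothetical numbers: a correct guess saves 2 cycles, a repair costs 18.
threshold = break_even(2, 18)
gain_at_95 = expected_gain(0.95, 2, 18)
gain_at_80 = expected_gain(0.80, 2, 18)
```

With these assumed numbers the break-even point is 90% accuracy: at 95% the speculation pays off on average, at 80% it loses cycles. The more expensive the repair, the closer to certain the profile data has to be.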


Control speculation

- Compile-time movement of operations above branches
  - Guess the operation result is needed
  - If wrong, wasted execution
- Potential problems
  - Too much wasted execution
    - Speculate likely operations
  - Spurious exceptions
    - Useless op causes problem
    - NAT/poison/exception bit
    - check NAT operations
  - Rename or don't do it
  - Memory corrupted
    - Don't speculate stores

Figure: control-flow example. The branch blt r1, r2, L1 has "taken" and "fallthru" edges; the operations shown are:

    r6 = r7 + r8
    r9 = r6 << 3
    r4 = r9 + 7
    r3 = load(r4)
    r5 = r3 + 1
    store(r4, r5)
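The NAT/poison idea can be sketched in Python as follows. Everything here is a hypothetical model: the names `spec_load`, `spec_add`, `check`, and `kernel` are invented for illustration, a missing dictionary key stands in for a faulting address, and the code assumes the load, add, and store belong to the fall-through path of the branch. The load and add are hoisted above the branch, the exception is deferred through a NAT marker, and the store is left in place.

```python
class Fault(Exception):
    """Raised only when a poisoned value is actually needed."""

NAT = object()  # sentinel standing in for the NAT/poison/exception bit

def spec_load(memory, addr):
    """Speculative load: a bad address yields NAT instead of an exception."""
    return memory.get(addr, NAT)

def spec_add(x, y):
    """Speculative ops propagate NAT rather than faulting."""
    return NAT if x is NAT or y is NAT else x + y

def check(value):
    """Check-NAT op, left where the original non-speculative op was."""
    if value is NAT:
        raise Fault("deferred exception from a speculative op")
    return value

def kernel(memory, r1, r2, r4):
    # Hoisted above the branch: executed whether or not the result is needed.
    r3 = spec_load(memory, r4)
    r5 = spec_add(r3, 1)
    if r1 < r2:                  # blt r1, r2, L1 (taken: results are wasted)
        return "taken"
    memory[r4] = check(r5)       # store is not speculated; check first
    return "fallthru"

mem = {8: 41}
good = kernel(mem, 5, 1, 8)      # fall-through path, valid address
wasted = kernel(mem, 0, 1, 99)   # taken path, bad address is never reported
```

On the taken path the bad address causes no exception because the result is never checked. The fault surfaces only if the fall-through path actually needs the value, which avoids spurious exceptions while keeping the real ones.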


Management of the memory hierarchy

- Common problems
  - Kick out good locality data with bad locality data
  - Capacity/conflict misses – prefetch the data
  - Non-deterministic latency – What should be assumed?
- Expose cache hierarchy to the compiler
  - No longer a black box
  - Placement made explicit
  - Assumed latency explicit

Figure: memory hierarchy with CPU, L1-I, L1-D, L2-U, and Main Memory.


Source/target cache specifiers

- Source specifier – Compiler tells hardware where within the cache hierarchy the data is expected to be found
  - Assumed latency
- Target specifier – Compiler tells hardware the highest level in which the data should be placed
  - Reduce pollution of lower levels
- Prefetching – Speculative load to some dummy register
- Icache managed by PBRs
- Traditional processors use C1/C

Figure: example opcodes L_B_C3_C and S_H_C, annotated "Source cache specifier – where it's coming from → latency" and "Target cache specifier – where to place the data".


Predicated execution example

    a = b + c
    if (a > 0)
        e = f + g
    else
        e = f / g
    h = i - j

Traditional branching code

    BB1   add a, b, c
    BB1   bgt a, 0, L1
    BB3   div e, f, g
    BB3   jump L2
    BB2   L1: add e, f, g
    BB    L2: sub h, i, j

Predicated code

    BB1   add a, b, c      if T
    BB1   p2 = a > 0       if T
    BB1   p3 = a <= 0      if T
    BB3   div e, f, g      if p3
    BB2   add e, f, g      if p2
    BB    sub h, i, j      if T

Figure: control-flow graph with blocks BB1, BB2, BB3, BB; arrows show the conversion to predicated code.
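The two code sequences above can be compared with a small Python model. The helper `op` and the dictionary of registers are hypothetical scaffolding; the only behaviour assumed is that an operation whose guard predicate is false is nullified, meaning it neither executes nor writes its destination.

```python
def branching(b, c, f, g, i, j):
    """The original if-then-else, with real control flow."""
    a = b + c
    if a > 0:
        e = f + g
    else:
        e = f / g
    h = i - j
    return a, e, h

def predicated(b, c, f, g, i, j):
    """Straight-line version: every op carries a guard predicate."""
    r = {"T": True}  # T is the always-true predicate

    def op(dest, guard, fn):
        # A false guard nullifies the op: nothing is computed or written.
        if r[guard]:
            r[dest] = fn()

    op("a",  "T",  lambda: b + c)
    op("p2", "T",  lambda: r["a"] > 0)
    op("p3", "T",  lambda: r["a"] <= 0)
    op("e",  "p3", lambda: f / g)   # BB3
    op("e",  "p2", lambda: f + g)   # BB2
    op("h",  "T",  lambda: i - j)
    return r["a"], r["e"], r["h"]

then_case = predicated(1, 2, 6, 3, 9, 4)
else_case = predicated(-5, 2, 6, 3, 9, 4)
```

Both versions compute the same results, but the predicated one contains no branches: exactly one of the two writes to `e` takes effect. Note that with `g = 0` and `a > 0` the guarded divide is nullified, so it cannot raise a divide-by-zero.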


What about nested if-then-else's?

    a = b + c
    if (a > 0)
        if (a > 25)
            e = f + g
        else
            e = f * g
    else
        e = f / g
    h = i - j

Traditional branching code

    BB1   add a, b, c
    BB1   bgt a, 0, L1
    BB3   div e, f, g
    BB3   jump L2
    BB2   L1: bgt a, 25, L3
    BB6   mpy e, f, g
    BB6   jump L2
    BB5   L3: add e, f, g
    BB    L2: sub h, i, j
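One possible way to predicate the nested case is sketched below in Python. The predicated sequence is not shown in this part of the slides, so the predicate names `p5` and `p6`, the choice to guard the inner compares with `p2`, and clearing all predicates to false up front are assumptions made for illustration.

```python
def nested_branching(b, c, f, g, i, j):
    a = b + c
    if a > 0:
        if a > 25:
            e = f + g
        else:
            e = f * g
    else:
        e = f / g
    h = i - j
    return a, e, h

def nested_predicated(b, c, f, g, i, j):
    # Assumption: predicates start out false, so a nullified compare
    # leaves its destination predicate false.
    r = {"T": True, "p2": False, "p3": False, "p5": False, "p6": False}

    def op(dest, guard, fn):
        if r[guard]:          # false guard: the op is nullified
            r[dest] = fn()

    op("a",  "T",  lambda: b + c)
    op("p2", "T",  lambda: r["a"] > 0)
    op("p3", "T",  lambda: r["a"] <= 0)
    op("e",  "p3", lambda: f / g)         # BB3
    op("p5", "p2", lambda: r["a"] > 25)   # BB2: inner compare, guarded by p2
    op("p6", "p2", lambda: r["a"] <= 25)
    op("e",  "p6", lambda: f * g)         # BB6
    op("e",  "p5", lambda: f + g)         # BB5
    op("h",  "T",  lambda: i - j)
    return r["a"], r["e"], r["h"]

inner_then = nested_predicated(28, 2, 6, 3, 9, 4)   # a = 30
inner_else = nested_predicated(8, 2, 6, 3, 9, 4)    # a = 10
outer_else = nested_predicated(-3, 2, 6, 3, 9, 4)   # a = -1
```

The inner predicates are only computed when the outer condition holds, so at most one of the three writes to `e` takes effect on any run.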


Benefits/Costs of predicated execution

- Benefits
  - Remove branches (both conditional and unconditional)
  - Remove branch mispredictions
  - Overlap execution of if-then-else statements
    - Branches tend to sequentialize operations
    - Predicates can be computed/used in parallel
- Costs
  - Useless instructions executed
  - Code size (extra operand, can't fit into 32 bits)
  - Possibly longer schedule lengths
- The real story
  - Must be applied selectively or you get worse performance than not using it at all


Benefits/Costs of predicated execution (2)

Benefits:
- No branches, no mispredicts
- Can freely reorder independent operations in the predicated block
- Overlap BB2 with BB5 and BB6

Costs (execute all paths):
- Worst-case schedule length
- Worst-case resources required