Code Generation 5 - Advanced Compilers - Fall 2002 | EECS 583, Study notes of Electrical and Electronics Engineering

University of Michigan (UM) - Ann Arbor Electrical and Electronics Engineering

Prof. Scott Mahlke

Material Type: Notes; Professor: Mahlke; Class: Advanced Compilers; Subject: Electrical Engineering And Computer Science; University: University of Michigan - Ann Arbor; Term: Winter 2002;

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009


EECS 583 – Lecture 16

Code Generation V

University of Michigan

March 11, 2002


Class problem (2) from last time

Latencies: ld = 2, st = 1, add = 1, cmpp = 1, br = 1
Resources: 1 ALU, 1 MEM, 1 BR

    1: r1[-1] = load(r2[0])
    2: r3[-1] = r1[1] - r1[2]
    3: store (r3[-1], r2[0])
    4: r2[-1] = r2[0] + 4
    5: p1[-1] = cmpp (r2[-1] < 100)
    remap r1, r2, r3
    6: brct p1[-1] Loop

Calculate RecMII, ResMII, and MII

Finding the longest path (MinDist)

Floyd-Warshall method to find longest path. Returns true as soon as a positive cycle is found, false otherwise.

    bool floyd(Matrix& a, Matrix& b)
    {
        for (int m = 0; m < dim; m++) {
            for (int i = 0; i < dim; i++) {
                if (a[i][m] > minus_infinity) {            // test for empty a(i,m) edge
                    for (int j = 0; j < dim; j++) {
                        if (a[m][j] > minus_infinity) {    // test for empty a(m,j) edge
                            int delay = a[i][m] + a[m][j];
                            if (delay > a[i][j]) {
                                a[i][j] = delay;
                                b[i][j] = m;               // record the intermediate node
                                if ((i == j) && (delay > 0))   // watch for positive cycle
                                    return(true);
                            }
                        }
                    }
                }
            }
        }
        return(false);
    }
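One common use of this longest-path routine is to test whether a candidate II satisfies every recurrence. The sketch below assumes each dependence edge is weighted as Delay - II * Distance (the EffDelay used elsewhere in these notes) and searches upward for the smallest II with no positive cycle. The `DepEdge` struct, `has_positive_cycle`, and `rec_mii` are illustrative names, not part of the lecture code.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical dependence edge; these field names are not from the lecture.
struct DepEdge {
    int src;       // producer op index
    int dst;       // consumer op index
    int delay;     // latency from src to dst, in cycles
    int distance;  // iteration distance of the dependence
};

using Matrix = std::vector<std::vector<int>>;

// "No edge" marker, kept far from INT_MIN so that adding two entries cannot overflow.
constexpr int kMinusInfinity = std::numeric_limits<int>::min() / 4;

// Longest-path Floyd-Warshall over edge weights delay - ii * distance.
// Returns true if some cycle has positive total weight at this II.
bool has_positive_cycle(int num_ops, const std::vector<DepEdge>& edges, int ii) {
    Matrix dist(num_ops, std::vector<int>(num_ops, kMinusInfinity));
    for (const DepEdge& e : edges) {
        dist[e.src][e.dst] = std::max(dist[e.src][e.dst], e.delay - ii * e.distance);
    }
    for (int m = 0; m < num_ops; m++) {
        for (int i = 0; i < num_ops; i++) {
            if (dist[i][m] == kMinusInfinity) continue;      // empty (i,m) edge
            for (int j = 0; j < num_ops; j++) {
                if (dist[m][j] == kMinusInfinity) continue;  // empty (m,j) edge
                int delay = dist[i][m] + dist[m][j];
                if (delay > dist[i][j]) {
                    dist[i][j] = delay;
                    if (i == j && delay > 0) return true;    // positive cycle
                }
            }
        }
    }
    return false;
}

// Smallest II in [1, max_ii] with no positive cycle, or -1 if none is found.
int rec_mii(int num_ops, const std::vector<DepEdge>& edges, int max_ii) {
    for (int ii = 1; ii <= max_ii; ii++) {
        if (!has_positive_cycle(num_ops, edges, ii)) return ii;
    }
    return -1;
}
```

A two-op recurrence with total delay 4 and total distance 1 is rejected at II = 3 and accepted at II = 4, so `rec_mii` returns 4 for it.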

The scheduling window

With cyclic scheduling, not all the predecessors may be scheduled, so a more flexible earliest schedule time is:

$$E(Y) = \max_{X \in pred(Y)} \begin{cases} 0, & \text{if } X \text{ is not scheduled} \\ \max(0,\ SchedTime(X) + EffDelay(X,Y)), & \text{otherwise} \end{cases}$$

where EffDelay(X,Y) = Delay(X,Y) - II * Distance(X,Y)

Every II cycles a new loop iteration is initiated, so the pattern repeats every II cycles. You therefore only have to look in a window of size II: if the operation cannot be scheduled there, it cannot be scheduled at all. The latest schedule time is:

$$L(Y) = E(Y) + II - 1$$
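The window computation can be sketched in a few lines. This is an illustration under assumptions, not the course implementation: `PredEdge`, `kUnscheduled`, `earliest_time`, and `latest_time` are hypothetical names, and an unscheduled predecessor is represented by a schedule time of -1.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical incoming dependence edge of the op being scheduled.
struct PredEdge {
    int pred;      // index of the predecessor op X
    int delay;     // Delay(X, Y)
    int distance;  // Distance(X, Y), in iterations
};

// Assumed marker for "X is not scheduled yet".
constexpr int kUnscheduled = -1;

// E(Y): max over predecessors; unscheduled predecessors contribute 0.
int earliest_time(const std::vector<PredEdge>& preds,
                  const std::vector<int>& sched_time, int ii) {
    int earliest = 0;
    for (const PredEdge& p : preds) {
        if (sched_time[p.pred] == kUnscheduled) continue;
        int eff_delay = p.delay - ii * p.distance;  // EffDelay(X, Y)
        earliest = std::max(earliest, std::max(0, sched_time[p.pred] + eff_delay));
    }
    return earliest;
}

// L(Y) = E(Y) + II - 1: the window is exactly II cycles wide.
int latest_time(int earliest, int ii) {
    return earliest + ii - 1;
}
```

For example, with II = 2, a predecessor scheduled at time 0 with a delay of 2 and a distance of 0 gives E(Y) = 2 and L(Y) = 3.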

Separate code for prolog and epilog

Loop body with 4 ops: A, B, C, D

    Prolog (fill the pipe):     A0
                                A1  B0
                                A2  B1  C0
    Kernel:                     A   B   C   D
    Epilog (drain the pipe):    Bn  Cn-1  Dn-2
                                Cn  Dn-1
                                Dn

Generate special code before the loop (preheader) to fill the pipe and special code after the loop to drain the pipe. Peel off S - 1 iterations for the prolog, where S is the number of stages, and complete the last S - 1 iterations in the epilog.

Removing prolog/epilog

Disable using predicated execution.

[Figure: a schedule with II = 3, divided into prolog, kernel, and epilog regions]

Execute the loop kernel on every iteration, but for the prolog and epilog selectively disable the appropriate operations to fill/drain the pipeline.

Modulo scheduling architectural support

- Loop requiring N iterations
    - Will take N + (S - 1) kernel iterations, where S is the number of stages
- 2 special registers created
    - LC: loop counter (holds N)
    - ESC: epilog stage counter (holds S)
- Software pipeline branch operations
    - Initialize LC = N, ESC = S in loop preheader
    - All rotating predicates are cleared
    - BRF.B.B.F
        - While LC > 0, decrement LC and RRB, P[0] = 1, branch to top of loop
            - This occurs for prolog and kernel
        - If LC = 0, then while ESC > 0, decrement ESC and RRB, write a 0 into P[0], and branch to the top of the loop
            - This occurs for the epilog


Execution history with LC/ESC

    LC = 3, ESC = 3    /* Remember 0 relative */
    Clear all rotating predicates
    P[0] = 1

    A if P[0];
    B if P[1];
    C if P[2];
    D if P[3];
    P[0] = BRF.B.B.F;

    LC  ESC  P[0]  P[1]  P[2]  P[3]   Ops executed
    3   3    1     0     0     0      A  -  -  -
    2   3    1     1     0     0      A  B  -  -
    1   3    1     1     1     0      A  B  C  -
    0   3    1     1     1     1      A  B  C  D
    0   2    0     1     1     1      -  B  C  D
    0   1    0     0     1     1      -  -  C  D
    0   0    0     0     0     1      -  -  -  D

4 iterations, 4 stages, II = 1. Note that 4 + 4 - 1 = 7 iterations of the kernel are executed.
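The LC/ESC mechanism is small enough to simulate directly. The sketch below is an illustration, not a model of any specific hardware: `run_kernel` is a hypothetical name, each stage holds one op named A, B, C, ..., rotating the predicate file stands in for decrementing RRB, and ESC is assumed to be decremented on every epilog pass so that the loop terminates.

```cpp
#include <string>
#include <vector>

// Simulates the kernel-only loop and returns, for each pass through the
// kernel, which stage ops executed ('-' marks a disabled op).
// lc and esc are the 0-relative initial values of LC and ESC.
std::vector<std::string> run_kernel(int lc, int esc, int num_stages) {
    std::vector<bool> p(num_stages, false);  // rotating predicates, all cleared
    p[0] = true;                             // preheader sets P[0] = 1
    std::vector<std::string> history;
    while (true) {
        // Execute the kernel: the op of stage s is guarded by P[s].
        std::string row;
        for (int s = 0; s < num_stages; s++) {
            row += p[s] ? static_cast<char>('A' + s) : '-';
        }
        history.push_back(row);

        // Loop-back branch: decide the new P[0] or fall out of the loop.
        bool next_p0;
        if (lc > 0) {            // prolog and kernel phase
            lc--;
            next_p0 = true;
        } else if (esc > 0) {    // epilog phase (ESC decrement is an assumption)
            esc--;
            next_p0 = false;
        } else {
            break;               // pipeline fully drained
        }
        // Rotation: old P[i] becomes P[i+1], then P[0] is written.
        p.pop_back();
        p.insert(p.begin(), next_p0);
    }
    return history;
}
```

Calling `run_kernel(3, 3, 4)` yields seven rows, from `A---` up to `ABCD` and back down to `---D`, which is the N + (S - 1) count for 4 iterations and 4 stages.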


Modulo scheduling – iterative scheduler

    iterative_schedule(II, budget)
        compute op priorities
        while (there are unscheduled ops and budget > 0) do
            op = unscheduled op with the highest priority
            min = early time for op (E(Y))
            max = min + II - 1
            t = find_slot(op, min, max)
            schedule op at time t
                /* Backtracking phase – undo previous scheduling decisions */
                Unschedule all previously scheduled ops that conflict with op
            budget--


Modulo scheduling – find slot

    find_slot(op, min, max)
        /* Successively try each time in the range */
        for (t = min to max) do
            if (op has no resource conflicts in MRT at t)
                return t
        /* Op cannot be scheduled in its specified range */
        /* So schedule this op and displace all conflicting ops */
        if (op has never been scheduled or min > previous scheduled time of op)
            return min
        else
            return MIN(1 + prev scheduled time of op, max)
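The find_slot pseudocode maps onto a modulo reservation table (MRT) fairly directly. The following is a minimal sketch under simplifying assumptions: each op uses exactly one resource for one cycle, times are non-negative, and `ModuloReservationTable`, `kNeverScheduled`, and the `resource` and `prev_time` parameters are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical MRT: one row per cycle modulo II, one column per resource.
// A cell holds the id of the op using it, or -1 when free.
struct ModuloReservationTable {
    int ii;
    std::vector<std::vector<int>> slots;

    ModuloReservationTable(int ii_, int num_resources)
        : ii(ii_), slots(ii_, std::vector<int>(num_resources, -1)) {}

    bool is_free(int t, int resource) const { return slots[t % ii][resource] == -1; }
    void reserve(int t, int resource, int op) { slots[t % ii][resource] = op; }
};

// Assumed marker for "this op has never been scheduled".
constexpr int kNeverScheduled = -1;

// Returns the time at which to place the op. If every slot in
// [min_time, max_time] conflicts, a slot is forced and the caller is
// expected to unschedule whatever op occupies it.
int find_slot(const ModuloReservationTable& mrt, int resource,
              int min_time, int max_time, int prev_time) {
    // Successively try each time in the range.
    for (int t = min_time; t <= max_time; t++) {
        if (mrt.is_free(t, resource)) return t;
    }
    // No free slot: force a placement, moving past the previous choice
    // so that repeated displacement makes forward progress.
    if (prev_time == kNeverScheduled || min_time > prev_time) return min_time;
    return std::min(prev_time + 1, max_time);
}
```

Because the table is indexed by `t % ii`, a request at time 5 with II = 2 checks row 1, so ops from different iterations that collide in the rolled schedule are detected as conflicts.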


Example – Step 2

resources: 4 issue, 2 alu, 1 mem, 1 br
latencies: add = 1, mpy = 3, ld = 2, st = 1, br = 1

Step 2: DSA convert

Before:

        LC = 99
    Loop:
        1: r3 = load(r1)
        2: r4 = r3 * 26
        3: store (r2, r4)
        4: r1 = r1 + 4
        5: r2 = r2 + 4
        7: brlc Loop

After:

        LC = 99
    Loop:
        1: r3[-1] = load(r1[0])
        2: r4[-1] = r3[-1] * 26
        3: store (r2[0], r4[-1])
        4: r1[-1] = r1[0] + 4
        5: r2[-1] = r2[0] + 4
        remap r1, r2, r3, r4
        7: brlc Loop


Example – Step 3

Step 3: Draw dependence graph, calculate MII

resources: 4 issue, 2 alu, 1 mem, 1 br
latencies: add = 1, mpy = 3, ld = 2, st = 1, br = 1

        LC = 99
    Loop:
        1: r3[-1] = load(r1[0])
        2: r4[-1] = r3[-1] * 26
        3: store (r2[0], r4[-1])
        4: r1[-1] = r1[0] + 4
        5: r2[-1] = r2[0] + 4
        remap r1, r2, r3, r4
        7: brlc Loop

RecMII = 1
ResMII = 2
MII = 2
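The ResMII and MII values above can be checked with a short calculation. This sketch assumes each op occupies one unit of its resource class for one cycle and that the multiply issues on an ALU (the resource list gives no separate multiplier). `ResourceUse`, `res_mii`, and `min_ii` are illustrative names.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-resource summary: ops per iteration that need the
// resource, and how many units of that resource the machine has.
struct ResourceUse {
    int num_ops;
    int num_units;
};

// ResMII = max over resources of ceil(num_ops / num_units).
int res_mii(const std::vector<ResourceUse>& uses) {
    int result = 1;
    for (const ResourceUse& u : uses) {
        result = std::max(result, (u.num_ops + u.num_units - 1) / u.num_units);
    }
    return result;
}

// MII = MAX(RecMII, ResMII).
int min_ii(int rec_mii_value, int res_mii_value) {
    return std::max(rec_mii_value, res_mii_value);
}
```

For the example loop the counts are 6 ops on 4 issue slots, 3 ALU ops (mpy and two adds) on 2 ALUs, 2 memory ops on 1 memory unit, and 1 branch on 1 branch unit. Both the ALUs and the memory unit give ceil = 2, so ResMII = 2 and MII = MAX(1, 2) = 2.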


Example – Step 5

Step 5: Schedule brlc at time II - 1

resources: 4 issue, 2 alu, 1 mem, 1 br
latencies: add = 1, mpy = 3, ld = 2, st = 1, br = 1

        LC = 99
    Loop:
        1: r3[-1] = load(r1[0])
        2: r4[-1] = r3[-1] * 26
        3: store (r2[0], r4[-1])
        4: r1[-1] = r1[0] + 4
        5: r2[-1] = r2[0] + 4
        remap r1, r2, r3, r4
        7: brlc Loop

[Figure: unrolled schedule and rolled schedule]

MRT (rows are cycle mod II, X marks a reserved slot):

         alu0  alu1  mem  br
    0
    1                     X


Example – Step 6

Step 6: Schedule the highest priority op
Op1: E = 0, L = 1
Place at 0 (0 % 2)

        LC = 99
    Loop:
        1: r3[-1] = load(r1[0])
        2: r4[-1] = r3[-1] * 26
        3: store (r2[0], r4[-1])
        4: r1[-1] = r1[0] + 4
        5: r2[-1] = r2[0] + 4
        remap r1, r2, r3, r4
        7: brlc Loop

[Figure: unrolled schedule and rolled schedule]

MRT (rows are cycle mod II, X marks a reserved slot):

         alu0  alu1  mem  br
    0                X
    1                     X
