Modulo Scheduling II: Height-Based Priority and Loop Scheduling, Study notes of Computer Science

These notes cover the second lecture on modulo scheduling, a technique for scheduling the instructions of a loop on a processor. They cover list scheduling, the priority function, and loop prolog and epilog, and explain how to calculate the scheduling window and the height of nodes in a dependence graph.

Typology: Study notes

Pre 2010

Uploaded on 08/05/2009

koofers-user-kx2

CS 6241 – Class 16
Modulo Scheduling II
Georgia Tech.
February 28, 2008

Exam Information
  • When/Where
      » Thursday, March 6, in class
      » 1:35pm – 2:55pm (80 minutes)
  • Format
      » Open book, open notes
          - But, don’t try to learn how to modulo schedule during the test!
      » Bring a pencil or 2
      » No laptops
  • Material
      » Everything from lectures/homeworks is fair game up to and including modulo scheduling, but focus on the major topics
      » No Trimaran specifics will be asked


Topics
  • Control flow analysis/optimization
      » Dom/pdom/control dependence analysis
      » Basic blocks, traces, superblocks, if-conversion, hyperblocks
      » Profile-guided code layout
  • Dataflow analysis and optimization
      » Liveness, reaching defs
      » Classic/ILP optimizations and transformations
  • Scheduling + register allocation
      » Dependence graphs, Estart, Lstart, priority
      » Acyclic scheduling, control speculation
      » Modulo scheduling
      » Register allocation


Modulo Scheduling Process
  • Use list scheduling, but we need a few twists
      » II is predetermined: it starts at MII, then is incremented
      » Cyclic dependences complicate matters
          - Estart/Priority/etc.
          - Consumer scheduled before producer is considered
          - There is a window where something can be scheduled!
      » Guarantee the repeating pattern
  • 2 constraints enforced on the schedule
      » Each iteration begins exactly II cycles after the previous one
      » Each time an operation is scheduled in 1 iteration, it is tentatively scheduled in subsequent iterations at intervals of II
          - MRT (modulo reservation table) used for this
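The second constraint can be illustrated with a minimal sketch of a modulo reservation table. The class name, method names, and the per-resource capacity dictionary are assumptions made for illustration; the slides only say that an MRT is used. The key idea is that reserving a resource at time t also reserves it at t + II, t + 2*II, and so on, because every time maps to row t mod II.

```python
class ModuloReservationTable:
    """Hypothetical MRT sketch: one row per cycle of the II-cycle pattern."""

    def __init__(self, ii, capacity):
        self.ii = ii
        self.capacity = capacity  # e.g. {"alu": 2, "mem": 1}
        # rows[slot][resource] lists the ops that hold that resource in that slot
        self.rows = [{res: [] for res in capacity} for _ in range(ii)]

    def conflicts(self, time, resource):
        """True if no unit of `resource` is free at `time` (taken modulo II)."""
        return len(self.rows[time % self.ii][resource]) >= self.capacity[resource]

    def reserve(self, op, time, resource):
        """Record that `op` uses `resource` at `time` and every II cycles after."""
        if self.conflicts(time, resource):
            raise ValueError("resource conflict")
        self.rows[time % self.ii][resource].append(op)


mrt = ModuloReservationTable(ii=2, capacity={"alu": 2, "mem": 1})
mrt.reserve("load", 0, "mem")
```

With II = 2, the load reserved at time 0 blocks the memory unit at times 2, 4, 6, and so on, while odd times remain free.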


Calculating Height

1. Insert pseudo edges from all nodes to the branch with latency = 0, distance = 0 (dotted edges)
2. Compute II. For this example assume II = 2
3. HeightR(4) =
4. HeightR(3) =
5. HeightR(2) =
6. HeightR(1) =

[Figure: dependence graph for the height example; edges are labeled latency,distance and pseudo edges are dotted]
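A height calculation of this kind can be sketched as follows. The recurrence used here is an assumption, since the slide leaves the values blank: the height of a node is taken as the maximum over its successors of the successor's height plus Delay - II*Distance, with the branch at height 0. The four-node graph is hypothetical and is not the graph from the slide, whose figure did not survive.

```python
def height_r(nodes, edges, ii):
    """Height-based priority by repeated relaxation (longest path to the branch).

    edges: list of (src, dst, delay, distance). Every node is assumed to have
    a pseudo edge (delay 0, distance 0) to the branch, so heights start at 0.
    """
    height = {n: 0 for n in nodes}
    for _ in range(len(nodes) + 1):
        changed = False
        for src, dst, delay, distance in edges:
            candidate = height[dst] + delay - ii * distance
            if candidate > height[src]:
                height[src] = candidate
                changed = True
        if not changed:
            return height
    # A cycle with positive effective delay means II is too small.
    raise ValueError("heights do not converge: II is below RecMII")


# Hypothetical graph: node 4 is the branch.
NODES = [1, 2, 3, 4]
EDGES = [
    (1, 2, 2, 0),  # flow dependence, delay 2
    (2, 3, 3, 0),  # flow dependence, delay 3
    (3, 4, 1, 0),  # delay 1 into the branch
    (3, 2, 1, 2),  # loop-carried dependence, distance 2
    (1, 4, 0, 0), (2, 4, 0, 0), (3, 4, 0, 0),  # pseudo edges
]
heights = height_r(NODES, EDGES, ii=2)
```

Because the graph may contain cycles, the sketch relaxes edges until nothing changes instead of walking the graph once in reverse topological order.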


The Scheduling Window

With cyclic scheduling, not all the predecessors may be scheduled, so a more flexible earliest schedule time is:

$$E(Y) = \max_{X \in \mathrm{pred}(Y)} \begin{cases} 0, & \text{if } X \text{ is not scheduled} \\ \max\bigl(0,\ \mathrm{SchedTime}(X) + \mathrm{EffDelay}(X,Y)\bigr), & \text{otherwise} \end{cases}$$

where $\mathrm{EffDelay}(X,Y) = \mathrm{Delay}(X,Y) - II \cdot \mathrm{Distance}(X,Y)$.

Every II cycles a new loop iteration will be initialized, thus every II cycles the pattern will repeat. Thus, you only have to look in a window of size II; if the operation cannot be scheduled there, then it cannot be scheduled.

Latest schedule time:

$$L(Y) = E(Y) + II - 1$$
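The window computation can be written out as a short sketch. The function name and the data layout (a list of predecessor edges and a dictionary of already scheduled times) are assumptions for illustration; the arithmetic follows the E(Y), EffDelay, and L(Y) definitions above.

```python
def scheduling_window(pred_edges, sched_time, ii):
    """Return (E, L) for an operation Y.

    pred_edges: list of (x, delay, distance), one per predecessor X of Y.
    sched_time: {op: time} for the operations scheduled so far.
    """
    earliest = 0
    for x, delay, distance in pred_edges:
        if x in sched_time:  # unscheduled predecessors contribute 0
            eff_delay = delay - ii * distance
            earliest = max(earliest, sched_time[x] + eff_delay)
    latest = earliest + ii - 1
    return earliest, latest


# Hypothetical operation with three predecessors; X3 is not scheduled yet.
preds = [("X1", 2, 0), ("X2", 1, 1), ("X3", 4, 0)]
window = scheduling_window(preds, {"X1": 3, "X2": 5}, ii=2)
```

Here X1 contributes 3 + 2 = 5 and X2 contributes 5 + 1 - 2 = 4, so the window is cycles 5 through 6.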

Separate Code for Prolog and Epilog

Loop body with 4 ops: A B C D

Prolog (fill the pipe):
    A0
    A1  B0
    A2  B1  C0
Kernel:
    A   B   C   D
Epilog (drain the pipe):
    Bn  Cn-1  Dn-2
    Cn  Dn-1
    Dn

Generate special code before the loop (preheader) to fill the pipe and special code after the loop to drain the pipe. Peel off S-1 iterations for the prolog and complete S-1 iterations in the epilog, where S is the number of stages.
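The fill and drain pattern in the diagram can be reproduced with a small sketch. It assumes one operation per stage and II = 1, as in the four-op picture; the function name and the iteration count of 5 are hypothetical.

```python
def pipeline_rows(ops, n_iters):
    """List the operations issued in each cycle of a software pipeline.

    ops: one operation name per stage (assumes II = 1, one op per stage).
    Each entry is the op name followed by the iteration it belongs to.
    """
    stages = len(ops)
    rows = []
    for cycle in range(n_iters + stages - 1):
        row = [
            f"{op}{cycle - stage}"
            for stage, op in enumerate(ops)
            if 0 <= cycle - stage < n_iters
        ]
        rows.append(row)
    return rows


rows = pipeline_rows("ABCD", n_iters=5)
prolog = rows[:3]    # partial rows that fill the pipe
kernel = rows[3:-3]  # full rows
epilog = rows[-3:]   # partial rows that drain the pipe
```

For four stages the first three and last three rows are partial, which is the code that the prolog and epilog have to cover.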


Removing Prolog/Epilog

[Figure: prolog, kernel (II = 3), and epilog, with the prolog and epilog operations disabled using predicated execution]

Execute the loop kernel on every iteration, but for the prolog and epilog selectively disable the appropriate operations to fill/drain the pipeline.


Modulo Scheduling Architectural Support
  • Loop requiring N iterations
      » Will take N + (S – 1) kernel iterations, where S is the number of stages
  • 2 special registers created
      » LC: loop counter (holds N)
      » ESC: epilog stage counter (holds S)
  • Software pipeline branch operations
      » Initialize LC = N, ESC = S in loop preheader
      » All rotating predicates are cleared
      » BRF.B.B.F
          - While LC > 0, decrement LC and RRB (the rotating register base), set P[0] = 1, and branch to the top of the loop
              This occurs for the prolog and kernel
          - If LC = 0, then while ESC > 0, decrement ESC and RRB, write a 0 into P[0], and branch to the top of the loop
              This occurs for the epilog


Execution History With LC/ESC

    LC = 3, ESC = 3   /* Remember 0 relative!! */
    Clear all rotating predicates
    P[0] = 1
    A if P[0]; B if P[1]; C if P[2]; D if P[3];
    P[0] = BRF.B.B.F;

Operations executed on each pass through the kernel:

    P[0]  P[1]  P[2]  P[3]
    A
    A     B
    A     B     C
    A     B     C     D
    -     B     C     D
    -     -     C     D
    -     -     -     D

4 iterations, 4 stages, II = 1. Note 4 + 4 – 1 iterations of the kernel are executed.
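The history above can be replayed with a small simulation. This is a sketch under stated assumptions: the rotation of the predicate file (decrementing RRB) is modeled as shifting a Python list by one position, ESC is assumed to be decremented on each epilog pass, and the function name is hypothetical.

```python
def run_kernel(ops, lc, esc):
    """Simulate a predicated kernel controlled by LC and ESC.

    Returns one (lc, esc, executed_ops) tuple per pass through the kernel,
    with the counter values as they are when that pass begins.
    """
    preds = [0] * len(ops)
    preds[0] = 1  # rotating predicates cleared, then P[0] = 1
    history = []
    while True:
        executed = [op if p else "-" for op, p in zip(ops, preds)]
        history.append((lc, esc, executed))
        if lc > 0:      # prolog and kernel: start a new iteration
            lc -= 1
            new_p0 = 1
        elif esc > 0:   # epilog: drain without starting an iteration
            esc -= 1
            new_p0 = 0
        else:           # fall out of the loop
            break
        # Rotation: the old P[i] becomes P[i+1], and the branch writes P[0].
        preds = [new_p0] + preds[:-1]
    return history


history = run_kernel("ABCD", lc=3, esc=3)
for lc, esc, executed in history:
    print(lc, esc, " ".join(executed))
```

With LC = 3 and ESC = 3 (both 0 relative) the kernel runs 7 times, which agrees with the 4 + 4 - 1 count in the slide.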


Modulo Scheduling – Iterative Scheduler
  • iterative_schedule(II, budget)
      » compute op priorities
      » while (there are unscheduled ops and budget > 0) do
          - op = unscheduled op with the highest priority
          - min = early time for op (E(Y))
          - max = min + II – 1
          - t = find_slot(op, min, max)
          - schedule op at time t
              /* Backtracking phase – undo previous scheduling decisions */
              Unschedule all previously scheduled ops that conflict with op
          - budget--


Modulo Scheduling – Find_slot
  • find_slot(op, min, max)
      » /* Successively try each time in the range */
      » for (t = min to max) do
          - if (op has no resource conflicts in MRT at t)
              return t
      » /* Op cannot be scheduled in its specified range */
      » /* So schedule this op and displace all conflicting ops */
      » if (op has never been scheduled or min > previous scheduled time of op)
          - return min
      » else
          - return MIN(1 + prev scheduled time of op, max)
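The two routines, iterative_schedule and find_slot, can be combined into a runnable sketch. Several things here are assumptions made for illustration and are not taken from the slides: each op uses exactly one resource, the MRT is a dictionary keyed by (time mod II, resource), displaced ops are chosen from the end of a slot's list, and the edge list and priorities at the bottom are a simplified guess at a dependence graph for the example loop.

```python
def find_slot(op, lo, hi, ii, mrt, res_of, capacity, prev_time):
    """Try each time in [lo, hi]; if all conflict, force a slot."""
    for t in range(lo, hi + 1):
        if len(mrt.get((t % ii, res_of[op]), [])) < capacity[res_of[op]]:
            return t
    if op not in prev_time or lo > prev_time[op]:
        return lo
    return min(1 + prev_time[op], hi)


def iterative_schedule(ops, edges, res_of, capacity, priority, ii, budget):
    """edges: (pred, succ, delay, distance). Returns {op: time} or None."""
    sched, prev_time, mrt = {}, {}, {}

    def unschedule(victim):
        mrt[(sched[victim] % ii, res_of[victim])].remove(victim)
        del sched[victim]

    while len(sched) < len(ops) and budget > 0:
        op = max((o for o in ops if o not in sched), key=lambda o: priority[o])
        lo = 0  # earliest time E(op), from scheduled predecessors only
        for x, y, delay, dist in edges:
            if y == op and x in sched:
                lo = max(lo, sched[x] + delay - ii * dist)
        t = find_slot(op, lo, lo + ii - 1, ii, mrt, res_of, capacity, prev_time)
        # Backtracking: displace ops holding the resource in this slot ...
        slot = mrt.setdefault((t % ii, res_of[op]), [])
        while len(slot) >= capacity[res_of[op]]:
            unschedule(slot[-1])
        # ... and scheduled successors whose dependence is now violated.
        for x, y, delay, dist in edges:
            if x == op and y in sched and sched[y] < t + delay - ii * dist:
                unschedule(y)
        slot.append(op)
        sched[op] = prev_time[op] = t
        budget -= 1
    return sched if len(sched) == len(ops) else None


# Assumed data loosely modeled on the example loop (ops 1-5 and branch 7).
OPS = [1, 2, 3, 4, 5, 7]
RES_OF = {1: "mem", 2: "alu", 3: "mem", 4: "alu", 5: "alu", 7: "br"}
CAPACITY = {"alu": 2, "mem": 1, "br": 1}
EDGES = [(1, 2, 2, 0), (2, 3, 3, 0), (4, 1, 1, 1), (5, 3, 1, 1),
         (4, 4, 1, 1), (5, 5, 1, 1)] + [(o, 7, 0, 0) for o in (1, 2, 3, 4, 5)]
PRIORITY = {1: 5, 4: 4, 2: 3, 3: 0, 5: 0, 7: 0}  # assumed height-based values
schedule = iterative_schedule(OPS, EDGES, RES_OF, CAPACITY, PRIORITY, ii=2, budget=20)
```

If the budget runs out before every op is placed, the function returns None, which corresponds to giving up at this II and retrying with a larger one.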


Example – Step 2

resources: 4 issue, 2 alu, 1 mem, 1 br
latencies: add = 1, mpy = 3, ld = 2, st = 1, br = 1

Step 2: DSA convert

Before:

    LC = 99
    Loop:
    1: r3 = load(r1)
    2: r4 = r3 * 26
    3: store (r2, r4)
    4: r1 = r1 + 4
    5: r2 = r2 + 4
    7: brlc Loop

After DSA conversion:

    LC = 99
    Loop:
    1: r3[-1] = load(r1[0])
    2: r4[-1] = r3[-1] * 26
    3: store (r2[0], r4[-1])
    4: r1[-1] = r1[0] + 4
    5: r2[-1] = r2[0] + 4
       remap r1, r2, r3, r4
    7: brlc Loop


Example – Step 3

Step 3: Draw dependence graph, calculate MII

resources: 4 issue, 2 alu, 1 mem, 1 br
latencies: add = 1, mpy = 3, ld = 2, st = 1, br = 1

    LC = 99
    Loop:
    1: r3[-1] = load(r1[0])
    2: r4[-1] = r3[-1] * 26
    3: store (r2[0], r4[-1])
    4: r1[-1] = r1[0] + 4
    5: r2[-1] = r2[0] + 4
       remap r1, r2, r3, r4
    7: brlc Loop

RecMII = 1
ResMII = 2
MII = 2

[Figure: dependence graph for the loop; edges are labeled latency,distance]
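The MII values can be checked with a short calculation. The formulas are the usual ones and are stated here as assumptions, since the slide gives only the results: ResMII is the largest ceiling of uses divided by available units over all resources, RecMII is the largest ceiling of delay divided by distance over all dependence cycles, and MII is the larger of the two. The resource mapping and the list of cycles (the two address increments, ops 4 and 5, each depending on itself across one iteration) are also assumptions about the example.

```python
from math import ceil


def res_mii(op_resources, capacity):
    """Resource-constrained MII: the busiest resource sets the bound."""
    uses = {}
    for resources in op_resources.values():
        for res in resources:
            uses[res] = uses.get(res, 0) + 1
    return max(ceil(count / capacity[res]) for res, count in uses.items())


def rec_mii(cycles):
    """Recurrence-constrained MII from (total_delay, total_distance) per cycle."""
    return max(ceil(delay / distance) for delay, distance in cycles)


CAPACITY = {"issue": 4, "alu": 2, "mem": 1, "br": 1}
OP_RESOURCES = {  # every op takes an issue slot plus one functional unit
    1: ["issue", "mem"],  # load
    2: ["issue", "alu"],  # multiply
    3: ["issue", "mem"],  # store
    4: ["issue", "alu"],  # add
    5: ["issue", "alu"],  # add
    7: ["issue", "br"],   # brlc
}
CYCLES = [(1, 1), (1, 1)]  # assumed self-recurrences of ops 4 and 5

mii = max(res_mii(OP_RESOURCES, CAPACITY), rec_mii(CYCLES))
```

Under these assumptions the memory unit (2 uses, 1 unit), the ALUs (3 uses, 2 units), and the issue width (6 ops, 4 slots) each give a bound of 2, which agrees with ResMII = 2 and MII = 2 on the slide.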