Modulo Scheduling II - Lecture Slides | EECS 583, Study notes of Electrical and Electronics Engineering

Material Type: Notes; Professor: Mahlke; Class: Advanced Compilers; Subject: Electrical Engineering And Computer Science; University: University of Michigan - Ann Arbor; Term: Winter 2004;

Uploaded on 09/02/2009

EECS 583 – Lecture 18
Modulo Scheduling II
University of Michigan
March 17, 2004
Guest speaker today: Manjunath Kudlur

Recap: Modulo Scheduling Process
• Use list scheduling, but we need a few twists
  » II is predetermined – starts at MII, then is incremented
  » Cyclic dependences complicate matters
    - Estart/Priority/etc.
    - Consumer scheduled before producer is considered
      · There is a window where something can be scheduled!
  » Guarantee the repeating pattern
• 2 constraints enforced on the schedule
  » Each iteration begins exactly II cycles after the previous one
  » Each time an operation is scheduled in 1 iteration, it is tentatively scheduled in subsequent iterations at intervals of II
    - MRT used for this


The Scheduling Window

With cyclic scheduling, not all the predecessors may be scheduled, so a more flexible earliest schedule time is:

$$E(Y) = \max_{X \in \mathrm{pred}(Y)} \begin{cases} 0, & \text{if } X \text{ is not scheduled} \\ \max\bigl(0,\ \mathrm{SchedTime}(X) + \mathrm{EffDelay}(X,Y)\bigr), & \text{otherwise} \end{cases}$$

where $\mathrm{EffDelay}(X,Y) = \mathrm{Delay}(X,Y) - \mathit{II} \cdot \mathrm{Distance}(X,Y)$

Every II cycles a new loop iteration will be initialized, thus every II cycles the pattern will repeat. Thus, you only have to look in a window of size II; if the operation cannot be scheduled there, then it cannot be scheduled.

Latest schedule time(Y): $L(Y) = E(Y) + \mathit{II} - 1$
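
As a rough illustration of the window computation, the following Python sketch evaluates E(Y) and L(Y) from the two formulas. The function name `schedule_window`, the tuple layout of `preds`, and the sample numbers are assumptions made for this sketch, not part of the lecture.

```python
def schedule_window(preds, sched_time, ii):
    """Return (E, L) for an operation Y.

    preds      -- list of (X, delay, distance) tuples, one per predecessor X of Y
    sched_time -- dict mapping already-scheduled ops to their schedule time
    ii         -- the initiation interval currently being tried
    """
    earliest = 0
    for x, delay, distance in preds:
        if x not in sched_time:
            continue  # an unscheduled predecessor contributes 0
        eff_delay = delay - ii * distance
        earliest = max(earliest, sched_time[x] + eff_delay)
    latest = earliest + ii - 1
    return earliest, latest


# Hypothetical example: Y has a scheduled producer A (delay 2, same iteration),
# a scheduled producer B from the previous iteration (delay 3, distance 1),
# and a predecessor C that has not been scheduled yet.
window = schedule_window(
    preds=[("A", 2, 0), ("B", 3, 1), ("C", 1, 0)],
    sched_time={"A": 0, "B": 4},
    ii=2,
)
print(window)  # prints (5, 6)
```

Here B dominates: 4 + (3 - 2*1) = 5, so the window is the II = 2 cycles from 5 to 6.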


Loop Prolog and Epilog

[Figure: software-pipelined loop with II = 3, with its Prolog, Kernel, and Epilog regions marked]

Only the kernel involves executing the full width of operations.
Prolog and epilog execute a subset (ramp-up and ramp-down).


Removing Prolog/Epilog

[Figure: II = 3 schedule with Prolog, Kernel, and Epilog regions; annotation: disable using predicated execution]

Execute the loop kernel on every iteration, but for the prolog and epilog selectively disable the appropriate operations to fill/drain the pipeline.


Kernel-only Code Using Rotating Predicates

Unrolled schedule:
    A0
    A1  B0
    A2  B1    C0
    A   B     C     D
        Bn    Cn-1  Dn-2
              Cn    Dn-1
                    Dn

Kernel:
    A if P[0]
    B if P[1]
    C if P[2]
    D if P[3]

P referred to as the staging predicate

    P[0]  P[1]  P[2]  P[3]
    A     -     -     -
    A     B     -     -
    A     B     C     -
    A     B     C     D
    …
    -     B     C     D
    -     -     C     D
    -     -     -     D
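
The fill/drain pattern in the table can be reproduced with a short Python sketch. It assumes one operation per stage and II = 1, so that stage s in kernel pass t works on iteration t - s; the function name `kernel_only_trace` is hypothetical.

```python
def kernel_only_trace(stage_ops, iterations):
    """List, for each kernel pass, which stage operations are enabled.

    stage_ops  -- one operation name per pipeline stage, e.g. ["A", "B", "C", "D"]
    iterations -- number of source-loop iterations to run
    """
    stages = len(stage_ops)
    trace = []
    for t in range(iterations + stages - 1):
        # Stage s works on iteration t - s; its staging predicate P[s] is
        # true only while that iteration number is valid.
        row = [op if 0 <= t - s < iterations else "-"
               for s, op in enumerate(stage_ops)]
        trace.append(" ".join(row))
    return trace


for line in kernel_only_trace(["A", "B", "C", "D"], iterations=4):
    print(line)
```

With 4 iterations and 4 stages this prints seven rows, from `A - - -` through `A B C D` down to `- - - D`.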

Execution History With LC/ESC

    LC = 3, ESC = 3 /* Remember 0 relative!! */
    Clear all rotating predicates
    P[0] = 1
    A if P[0]; B if P[1]; C if P[2]; D if P[3]; P[0] = BRF.B.B.F;

[Table: columns LC, ESC, P[0], P[1], P[2], P[3]; the rows below list the operations enabled in each kernel pass]

    P[0]  P[1]  P[2]  P[3]
    A
    A     B
    A     B     C
    A     B     C     D
    -     B     C     D
    -     -     C     D
    -     -     -     D

4 iterations, 4 stages, II = 1. Note that 4 + 4 – 1 = 7 iterations of the kernel are executed.
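
To see how LC and ESC drive the staging predicates, here is a small Python simulation. The branch semantics it uses are an assumption made for this sketch (a simplified reading of a BRF-style loop-back branch), and `run_kernel` is a hypothetical name.

```python
def run_kernel(stage_ops, lc, esc):
    """Simulate a kernel-only loop closed by a BRF-style branch.

    Assumed semantics (a simplification for this sketch): at the end of each
    kernel pass the predicates rotate by one position; if LC > 0 the branch
    decrements LC and writes 1 into P[0]; otherwise, if ESC > 0, it
    decrements ESC and writes 0 into P[0]; otherwise the loop exits.
    """
    preds = [0] * len(stage_ops)   # clear all rotating predicates
    preds[0] = 1                   # P[0] = 1 starts the first iteration
    history = []
    while True:
        executed = [op if p else "-" for op, p in zip(stage_ops, preds)]
        # Record LC and ESC as they stand before the branch executes.
        history.append((lc, esc, " ".join(executed)))
        if lc > 0:
            lc, new_p0 = lc - 1, 1
        elif esc > 0:
            esc, new_p0 = esc - 1, 0
        else:
            break                  # fall through: the loop is done
        # Rotation: old P[i] becomes P[i+1]; the branch result becomes P[0].
        preds = [new_p0] + preds[:-1]
    return history


for lc, esc, ops in run_kernel(["A", "B", "C", "D"], lc=3, esc=3):
    print(f"LC={lc} ESC={esc}  {ops}")
```

Under these assumptions, LC = 3 and ESC = 3 give seven kernel passes, which is why both counters are set 0-relative for 4 iterations and 4 stages.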


Modulo Scheduling - Driver
• compute MII
• II = MII
• budget = BUDGET_RATIO * number of ops
• while (schedule is not found) do
  » iterative_schedule(II, budget)
  » if no schedule was found, II++
• BUDGET_RATIO is a measure of the amount of backtracking that can be performed before giving up and trying a higher II
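
The driver loop can be sketched in Python as follows. The callback signature for `iterative_schedule`, the default `budget_ratio`, and the `max_ii` safety cap are all assumptions introduced for this sketch; the slide specifies none of them.

```python
def modulo_schedule(num_ops, mii, iterative_schedule, budget_ratio=3, max_ii=64):
    """Try successive II values, starting at MII, until a schedule is found.

    iterative_schedule(ii, budget) is a caller-supplied (hypothetical)
    function returning a schedule, or None when the budget runs out.
    """
    budget = budget_ratio * num_ops
    for ii in range(mii, max_ii + 1):
        schedule = iterative_schedule(ii, budget)
        if schedule is not None:
            return ii, schedule
    raise RuntimeError("no schedule found up to max_ii")


# Stand-in scheduler for demonstration only: it pretends that scheduling
# succeeds once II reaches 4.
def fake_scheduler(ii, budget):
    return {"ii": ii, "budget": budget} if ii >= 4 else None


result = modulo_schedule(num_ops=6, mii=2, iterative_schedule=fake_scheduler)
print(result)  # prints (4, {'ii': 4, 'budget': 18})
```

Returning as soon as a schedule is found keeps II at the value that actually succeeded rather than one past it.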


Modulo Scheduling – Find_slot
• find_slot(op, min, max)
  » /* Successively try each time in the range */
  » for (t = min to max) do
    - if (op has no resource conflicts in MRT at t)
      · return t
  » /* Op cannot be scheduled in its specified range */
  » /* So schedule this op and displace all conflicting ops */
  » if (op has never been scheduled or min > previous scheduled time of op)
    - return min
  » else
    - return MIN(1 + prev scheduled time of op, max)
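
A minimal Python rendering of find_slot is given below. It assumes each op needs exactly one resource for one cycle and models the MRT as a dictionary keyed by (time % II, resource). Those modeling choices, and the parameter names, are assumptions of this sketch.

```python
def find_slot(op, min_time, max_time, mrt, ii, resource_of, prev_time):
    """Pick a schedule time for op in [min_time, max_time].

    mrt         -- dict mapping (time % ii, resource) to the op occupying it
    resource_of -- dict mapping each op to the single resource it needs
    prev_time   -- dict mapping ops to the time they were last scheduled at
    """
    res = resource_of[op]
    # Successively try each time in the range.
    for t in range(min_time, max_time + 1):
        if (t % ii, res) not in mrt:
            return t
    # No conflict-free slot: pick a time anyway. The caller is expected to
    # displace whichever op currently holds that MRT entry.
    if op not in prev_time or min_time > prev_time[op]:
        return min_time
    return min(1 + prev_time[op], max_time)


resource_of = {"st": "mem"}
half_full = {(0, "mem"): "ld"}
full = {(0, "mem"): "ld", (1, "mem"): "ld2"}

print(find_slot("st", 3, 4, half_full, 2, resource_of, {}))    # prints 3
print(find_slot("st", 3, 4, full, 2, resource_of, {}))         # prints 3
print(find_slot("st", 3, 4, full, 2, resource_of, {"st": 3}))  # prints 4
```

The last call shows the forward-progress rule: an op that was already tried at time 3 is pushed to a later time instead of being placed in the same slot again.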


Modulo Scheduling Example

resources: 4 issue, 2 alu, 1 mem, 1 br
latencies: add=1, mpy=3, ld = 2, st = 1, br = 1

    for (j=0; j<100; j++)
        b[j] = a[j] * 26

Loop with an explicit compare and branch:

    Loop:
    1: r3 = load(r1)
    2: r4 = r3 * 26
    3: store (r2, r4)
    4: r1 = r1 + 4
    5: r2 = r2 + 4
    6: p1 = cmpp (r1 < r9)
    7: brct p1 Loop

Step 1: Convert the loop into a form that uses LC

    LC = 99
    Loop:
    1: r3 = load(r1)
    2: r4 = r3 * 26
    3: store (r2, r4)
    4: r1 = r1 + 4
    5: r2 = r2 + 4
    7: brlc Loop


Example – Step 3

Step 3: Draw dependence graph
Calculate MII

resources: 4 issue, 2 alu, 1 mem, 1 br
latencies: add=1, mpy=3, ld = 2, st = 1, br = 1

    LC = 99
    Loop:
    1: r3[-1] = load(r1[0])
    2: r4[-1] = r3[-1] * 26
    3: store (r2[0], r4[-1])
    4: r1[-1] = r1[0] + 4
    5: r2[-1] = r2[0] + 4
       remap r1, r2, r3, r4
    7: brlc Loop

RecMII = 1
ResMII = 2
MII = 2
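
The MII values can be checked with a short Python calculation. The resource assignment below is an assumption of this sketch: every op takes one issue slot plus one functional unit, and the multiply is assumed to run on an ALU. The recurrence list only models the two address increments (ops 4 and 5) as self-cycles with delay 1 and distance 1.

```python
import math


def res_mii(op_resources, capacity):
    """Resource-constrained MII: the busiest resource bounds II from below."""
    usage = {}
    for resources in op_resources.values():
        for r in resources:
            usage[r] = usage.get(r, 0) + 1
    return max(math.ceil(n / capacity[r]) for r, n in usage.items())


def rec_mii(cycles):
    """Recurrence-constrained MII from (total_delay, total_distance) per cycle."""
    return max(math.ceil(delay / distance) for delay, distance in cycles)


capacity = {"issue": 4, "alu": 2, "mem": 1, "br": 1}
op_resources = {              # assumed mapping of ops to resources
    1: ["issue", "mem"],      # load
    2: ["issue", "alu"],      # multiply (assumed to use an ALU)
    3: ["issue", "mem"],      # store
    4: ["issue", "alu"],      # r1 increment
    5: ["issue", "alu"],      # r2 increment
    7: ["issue", "br"],       # brlc
}
cycles = [(1, 1), (1, 1)]     # self-recurrences of ops 4 and 5

res = res_mii(op_resources, capacity)
rec = rec_mii(cycles)
mii = max(res, rec)
print(res, rec, mii)  # prints 2 1 2
```

The two memory operations sharing a single mem unit already force ResMII = 2; the ALU and issue counts give the same bound.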


Example – Step 4

Step 4: Calculate priorities (MAX height to pseudo stop node)

[Figure: dependence graph over ops 1, 2, 3, 4, 5, and 7; edges are labeled with delay,distance pairs (the labels 2,0, 3,0, 1,1, and 0,0 appear)]

1: H = 5
2: H = 3
3: H = 0
4: H = 0
5: H = 0
7: H = 0

Generally you need to calculate the minDist from each node to the branch node, accounting for cycles. Here there are no critical cycles, so the height is essentially the acyclic height.
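
The listed heights can be reproduced with a longest-path computation over the intra-iteration edges. The edge list below is a hypothetical reconstruction, since the figure did not survive extraction: it keeps only distance-0 edges and assumes 1→2 with delay 2, 2→3 with delay 3, and zero-delay edges from ops 3, 4, and 5 to the branch. Loop-carried edges are deliberately ignored, in line with the remark that the height here is essentially the acyclic height.

```python
def acyclic_heights(nodes, edges):
    """Longest delay-weighted path from each node to the end of the graph.

    edges -- list of (src, dst, delay) for intra-iteration (distance 0)
             dependences only, so the graph is acyclic.
    """
    succs = {n: [] for n in nodes}
    for src, dst, delay in edges:
        succs[src].append((dst, delay))

    memo = {}

    def height(n):
        if n not in memo:
            # A node with no successors has height 0.
            memo[n] = max((height(d) + delay for d, delay in succs[n]),
                          default=0)
        return memo[n]

    return {n: height(n) for n in nodes}


nodes = [1, 2, 3, 4, 5, 7]
edges = [            # assumed reconstruction of the distance-0 edges
    (1, 2, 2),       # load -> multiply, ld latency 2
    (2, 3, 3),       # multiply -> store, mpy latency 3
    (3, 7, 0),
    (4, 7, 0),
    (5, 7, 0),
]
heights = acyclic_heights(nodes, edges)
print(heights)  # prints {1: 5, 2: 3, 3: 0, 4: 0, 5: 0, 7: 0}
```

A full implementation would also follow loop-carried edges, weighting each by delay - II * distance, which matters once a recurrence becomes critical.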


Example – Step 6

Step 6: Schedule the highest priority op
Op1: E = 0, L = 1
Place at time 0 (0 % 2)

    LC = 99
    Loop:
    1: r3[-1] = load(r1[0])
    2: r4[-1] = r3[-1] * 26
    3: store (r2[0], r4[-1])
    4: r1[-1] = r1[0] + 4
    5: r2[-1] = r2[0] + 4
       remap r1, r2, r3, r4
    7: brlc Loop

[Figure: Unrolled Schedule and Rolled Schedule diagrams, plus the MRT with columns for the alu, mem, and br resources; an X marks the slot reserved in row 0]

Example – Step 7

Step 7: Schedule the highest priority op
Op2: E = 2, L = 3
Place at time 2 (2 % 2)

    LC = 99
    Loop:
    1: r3[-1] = load(r1[0])
    2: r4[-1] = r3[-1] * 26
    3: store (r2[0], r4[-1])
    4: r1[-1] = r1[0] + 4
    5: r2[-1] = r2[0] + 4
       remap r1, r2, r3, r4
    7: brlc Loop

[Figure: Unrolled Schedule and Rolled Schedule diagrams, plus the MRT with columns for the alu, mem, and br resources; two X marks now appear in row 0]
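
The two placements so far can be mimicked with a dictionary-based MRT in Python. The `place` helper and the choice of `alu0` for the multiply are assumptions of this sketch; the point is only that an op scheduled at time t reserves row t % II.

```python
def place(mrt, op, resource, time, ii):
    """Reserve `resource` for `op` in the MRT row time % ii."""
    row = time % ii
    if (row, resource) in mrt:
        holder = mrt[(row, resource)]
        raise ValueError(f"{resource} in row {row} is already held by op {holder}")
    mrt[(row, resource)] = op
    return row


II = 2
mrt = {}
row1 = place(mrt, op=1, resource="mem", time=0, ii=II)   # load at time 0
row2 = place(mrt, op=2, resource="alu0", time=2, ii=II)  # multiply at time 2
print(row1, row2)  # prints 0 0
print(mrt)         # prints {(0, 'mem'): 1, (0, 'alu0'): 2}
```

Both ops land in row 0 even though they issue two cycles apart, because times 0 and 2 are congruent modulo II = 2.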