






















Material Type: Notes; Professor: Mahlke; Class: Advanced Compilers; Subject: Electrical Engineering And Computer Science; University: University of Michigan - Ann Arbor; Term: Winter 2004;























E(Y), the earliest schedule time of Y, is:

    E(Y) = MAX over all X in pred(Y) of:
               0                                       if X is not scheduled
               MAX(0, SchedTime(X) + EffDelay(X,Y))    otherwise

    where EffDelay(X,Y) = Delay(X,Y) - II * Distance(X,Y)

A new loop iteration is initiated every II cycles, so the schedule pattern repeats every II cycles. You therefore only have to look in a window of size II starting at E(Y): if the operation cannot be scheduled there, it cannot be scheduled, because any later slot repeats the same pattern.

    Latest schedule time(Y) = L(Y) = E(Y) + II - 1
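The window computation above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the representation of predecessors as `(op, delay, distance)` tuples are assumptions made for the example, not part of the notes.

```python
def eff_delay(delay, distance, ii):
    """EffDelay(X,Y) = Delay(X,Y) - II * Distance(X,Y)."""
    return delay - ii * distance


def earliest_time(preds, sched_time, ii):
    """E(Y): preds is a list of (x, delay, distance) edges into Y;
    sched_time maps already-scheduled operations to their schedule time."""
    e = 0
    for x, delay, distance in preds:
        if x in sched_time:  # unscheduled predecessors contribute 0
            e = max(e, sched_time[x] + eff_delay(delay, distance, ii))
    return e


def schedule_window(preds, sched_time, ii):
    """Return (E(Y), L(Y)) with L(Y) = E(Y) + II - 1."""
    e = earliest_time(preds, sched_time, ii)
    return e, e + ii - 1
```

For instance, with II = 2 and a single predecessor scheduled at time 0 over an edge with delay 2 and distance 0, `schedule_window([(1, 2, 0)], {1: 0}, 2)` gives the window `(2, 3)`.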
[Figure: loop execution divided into Prolog, Kernel, Epilog]

Only the kernel executes the full width of operations. The prolog and epilog execute a subset (ramp-up and ramp-down).
[Figure: Prolog, Kernel, Epilog, all generated from the kernel code; the epilog drains the pipeline as ... Bn Cn-1 Dn-2, then Cn Dn-1, then Dn]

Execute the loop kernel on every iteration, but for the prolog and epilog selectively disable the appropriate operations to fill and drain the pipeline.

P is referred to as the staging predicate:

    LC = 3, ESC = 3    /* Remember 0 relative!! */
    Clear all rotating predicates
    P[0] = 1
    A if P[0]; B if P[1]; C if P[2]; D if P[3]; P[0] = BRF.B.B.F;

[Table: values of LC, ESC, P[0], P[1], ... on each kernel iteration]

4 iterations, 4 stages, II = 1. Note that 4 + 4 - 1 = 7 iterations of the kernel are executed.
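The fill and drain behaviour of kernel-only code can be checked with a small simulation. The sketch below is hypothetical: it assumes the loop-back branch decrements LC while LC > 0 (writing 1 into P[0]), then decrements ESC while ESC > 0 (writing 0 into P[0]), and exits once both are 0, with the predicates rotating by one position on every taken branch. Those branch semantics are an assumption chosen to match the counts stated in the notes; the function name is invented for the example.

```python
def run_kernel_only(lc, esc, stages, ops="ABCD"):
    """Simulate kernel-only execution with rotating staging predicates.

    Returns one string per kernel execution listing the enabled operations.
    """
    p = [0] * stages   # clear all rotating predicates
    p[0] = 1           # P[0] = 1 enables the first stage
    trace = []
    while True:
        trace.append("".join(op for op, on in zip(ops, p) if on))
        if lc > 0:         # more iterations to start: keep filling
            lc -= 1
            new_p0 = 1
        elif esc > 0:      # no new iterations: drain the pipeline
            esc -= 1
            new_p0 = 0
        else:              # LC and ESC exhausted: fall out of the loop
            break
        p = [new_p0] + p[:-1]   # rotate predicates by one stage
    return trace


trace = run_kernel_only(lc=3, esc=3, stages=4)
print(trace)  # ['A', 'AB', 'ABC', 'ABCD', 'BCD', 'CD', 'D']
```

The trace has 7 entries, matching 4 + 4 - 1 kernel executions, and operation A runs exactly 4 times, once per loop iteration.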
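Step 1: Convert the loop into a form that uses LC

    for (j=0; j<100; j++)
        b[j] = a[j] * 26

Original loop, with an explicit compare and conditional branch:

    Loop:
        1: r3 = load(r1)
        2: r4 = r3 * 26
        3: store (r2, r4)
        4: r1 = r1 + 4
        5: r2 = r2 + 4
        6: p1 = cmpp (r1 < r9)
        7: brct p1 Loop

Converted loop, where the loop counter LC (0 relative, so LC = 99 for 100 iterations) replaces operation 6:

    LC = 99
    Loop:
        1: r3 = load(r1)
        2: r4 = r3 * 26
        3: store (r2, r4)
        4: r1 = r1 + 4
        5: r2 = r2 + 4
        7: brlc Loop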
Step 3: Draw dependence graph, calculate MII

    resources: 4 issue, 2 alu, 1 mem, 1 br
    latencies: add = 1, mpy = 3, ld = 2, st = 1, br = 1

    LC = 99
    Loop:
        1: r3[-1] = load(r1[0])
        2: r4[-1] = r3[-1] * 26
        3: store (r2[0], r4[-1])
        4: r1[-1] = r1[0] + 4
        5: r2[-1] = r2[0] + 4
        remap r1, r2, r3, r4
        7: brlc Loop

    RecMII = 1
    ResMII = 2
    MII = 2
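The MII values can be reproduced with a short calculation. The sketch below is illustrative and rests on assumptions not spelled out in the notes: the multiply is taken to execute on an alu (the resource list names only alu, mem and br), and the only recurrences considered are the two address increments (operations 4 and 5), each a self-cycle with delay 1 and distance 1. The function names are invented for the example.

```python
from collections import Counter
from math import ceil


def res_mii(op_resource, units, issue_width):
    """Resource-constrained MII: the busiest resource class bounds II,
    as does the total issue width."""
    uses = Counter(op_resource.values())
    per_resource = max(ceil(n / units[r]) for r, n in uses.items())
    issue_bound = ceil(len(op_resource) / issue_width)
    return max(per_resource, issue_bound)


def rec_mii(cycles):
    """Recurrence-constrained MII: max over cycles of ceil(delay / distance)."""
    return max((ceil(delay / distance) for delay, distance in cycles), default=0)


# Assumed mapping of operations 1-5 and 7 onto resource classes.
OPS = {1: "mem", 2: "alu", 3: "mem", 4: "alu", 5: "alu", 7: "br"}
UNITS = {"alu": 2, "mem": 1, "br": 1}
CYCLES = [(1, 1), (1, 1)]  # assumed (delay, distance) of the two self-recurrences

RES = res_mii(OPS, UNITS, issue_width=4)
REC = rec_mii(CYCLES)
MII = max(RES, REC)
print(RES, REC, MII)  # 2 1 2
```

Under these assumptions the memory unit (two operations on one unit) and the alus (three operations on two units) both give a bound of 2, which is why the resource bound dominates here.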
Step 4: Calculate priorities (MAX height to pseudo stop node)

[Figure: dependence graph over operations 1, 2, 3, 4, 5 and 7; edges are labelled delay,distance, with labels 0,0, 1,1, 2,0 and 3,0]
    1: H = 5
    2: H = 3
    3: H = 0
    4: H = 0
    5: H = 0
    7: H = 0

Generally you need to calculate the MinDist from each node to the branch node, accounting for cycles. Here there are no critical cycles, so the height is essentially the acyclic height.
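The acyclic heights listed above can be reproduced with a simple longest-path computation. This is a minimal sketch under stated assumptions: only the distance-0 flow edges 1 -> 2 (delay 2) and 2 -> 3 (delay 3) are included, every operation reaches the pseudo stop node with delay 0, and loop-carried edges are ignored. It does not implement the general MinDist calculation that accounts for cycles; the edge list and function name are assumptions for the example.

```python
from functools import lru_cache


def acyclic_heights(edges, ops):
    """Height of each op = longest delay path to the pseudo stop node,
    using only intra-iteration (distance 0) edges."""
    succs = {op: [] for op in ops}
    for src, dst, delay in edges:
        succs[src].append((dst, delay))

    @lru_cache(maxsize=None)
    def height(op):
        # An op with no successors reaches the stop node with delay 0.
        return max((height(dst) + delay for dst, delay in succs[op]), default=0)

    return {op: height(op) for op in ops}


OPS = [1, 2, 3, 4, 5, 7]
EDGES = [(1, 2, 2), (2, 3, 3)]  # assumed (src, dst, delay) edges, distance 0

heights = acyclic_heights(EDGES, OPS)
priority_order = sorted(OPS, key=lambda op: -heights[op])
print(heights)         # {1: 5, 2: 3, 3: 0, 4: 0, 5: 0, 7: 0}
print(priority_order)  # [1, 2, 3, 4, 5, 7]
```

Sorting by decreasing height gives a scheduling order in which the load and the multiply on the longest path are placed first.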
Unrolled Schedule / Rolled Schedule

    LC = 99
    Loop:
        1: r3[-1] = load(r1[0])
        2: r4[-1] = r3[-1] * 26
        3: store (r2[0], r4[-1])
        4: r1[-1] = r1[0] + 4
        5: r2[-1] = r2[0] + 4
        remap r1, r2, r3, r4
        7: brlc Loop

[Figure: unrolled and rolled schedule tables with resource columns br, mem, alu, alu; one slot marked X, row 0]
Unrolled Schedule / Rolled Schedule

[Figure: the same loop and schedule tables with resource columns br, mem, alu, alu; two slots marked X]