




















Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Modulo scheduling ii, a technique used to parallelize loops with cyclic dependencies in computer systems. The modulo scheduling process, priority function, scheduling window, loop prolog and epilog, and architectural support. It also includes examples and steps to calculate ii, priorities, and execute the code. Useful for students studying computer architecture, parallel computing, or digital design.
Typology: Study notes
1 / 28
This page cannot be seen from the preview
Don't miss anything!





















is: 0, if X is not scheduled E(Y) =^ MAX
MAX (0, SchedTime(X) + EffDelay(X,Y)),
otherwise
for all X = pred(Y) where EffDelay(X,Y) = Delay(X,Y) – II*Distance(X,Y) Every II cycles a new loop iteration will be initialized, thus every II cycles the pattern will repeat. Thus, you only have to look in a window of size II, if the operation cannot be scheduled there, then it cannot be scheduled.^ Latest schedule time(Y) = L(Y) = E(Y) + II – 1
Prolog -fill thepipe D^ KernelBn Cn-1 Dn-2Epilog -Cn Dn-1drain theDnpipe
A B C D Loop body with 4 ops Generate special code before the loop (preheader) to fill the pipe and special code after the loop to drain the pipe. Peel off II-1 iterations for the prolog. Complete II-1 iterations in epilog
Prolog
Kernel^ Epilog
Execute loop kernel on every iteration, but for prolog and epilog selectively disable the appropriate operations to fill/drain the pipeline
X^ This occurs for prolog and kernel y If LC = 0, then while ESC > 0, decrement RRB and write a 0 intoP[0], and branch to the top of the loop X^ This occurs for the epilog
LC = 3, ESC = 3 /* Remember 0 relative!! */Clear all rotating predicatesP[0] = 1B if P[1];^ C if P[2]; D if P[3]; P[0] = BRF.B.B.F; LC^ ESC^
4 iterations, 4 stages, II = 1, Note 4 + 4 –1 iterations of kernel executed
X^ /* Backtracking phase – undo previous scheduling decisions */ X^ Unschedule all previously scheduled ops that conflict with op y budget--
Step 2: DSA convert LC = 99^
1: r3 = load(r1)2: r4 = r3 * 263: store (r2, r4)4: r1 = r1 + 45: r2 = r2 + 47: brlc Loop
1: r3[-1] = load(r1[0])2: r4[-1] = r3[-1] * 263: store (r2[0], r4[-1])4: r1[-1] = r1[0] + 45: r2[-1] = r2[0] + 4remap r1, r2, r3, r47: brlc Loop
Loop:^
Loop:
Step3: Draw dependence graph Calculate MII
resources: 4 issue, 2 alu, 1 mem, 1 br latencies: add=1, mpy=3, ld = 2, st = 1, br = 1
LC = 99 1: r3[-1] = load(r1[0]) Loop: 2: r4[-1] = r3[-1] * 263: store (r2[0], r4[-1])4: r1[-1] = r1[0] + 45: r2[-1] = r2[0] + 4remap r1, r2, r3, r47: brlc Loop
0,
RecMII = 1 RESMII = 2 MII = 2
Schedule brlc at time II - 1
resources: 4 issue, 2 alu, 1 mem, 1 br latencies: add=1, mpy=3, ld = 2, st = 1, br = 1
Unrolled Schedule Rolled Schedule LC = 99 1: r3[-1] = load(r1[0]) Loop:2: r4[-1] = r3[-1] * 263: store (r2[0], r4[-1])4: r1[-1] = r1[0] + 45: r2[-1] = r2[0] + 4remap r1, r2, r3, r47: brlc Loop
brmemalu alu0 0
Unrolled Schedule Rolled Schedule LC = 99^
1: r3[-1] = load(r1[0]) Loop:2: r4[-1] = r3[-1] * 263: store (r2[0], r4[-1])4: r1[-1] = r1[0] + 45: r2[-1] = r2[0] + 4remap r1, r2, r3, r47: brlc Loop
brmemalu alu0^ X 0