Static Scheduling in ECE 463/521: Local Scheduling, Loop Unrolling, Software Pipelining (Study notes of Electrical and Electronics Engineering)

Notes on the static scheduling techniques covered in ECE 463/521 at NC State University: local scheduling, loop unrolling, software pipelining, trace scheduling, and predication. The notes cover the benefits and drawbacks of each technique, with examples and code snippets.

Typology: Study notes

Pre 2010

Uploaded on 03/10/2009

koofers-user-lrs-1

ECE 463/521, Profs. Conte, Rotenberg and Gehringer, Dept. of ECE, NC State University, Lec. 27

Static Scheduling Techniques
• Local scheduling (within a basic block)
• Loop unrolling
• Software pipelining (modulo scheduling)
• Trace scheduling
• Predication

Loop Unrolling (1 of 2)
• Unroll the loop (e.g., four times)

Original loop (5 cycles/iteration):
Loop: L.D    F0, 0(R1)
      ADD.D  F4, F0, F2
      S.D    F4, 0(R1)
      ADDI   R1, R1, #8
      BNE    R1, xxx, Loop

Unrolled four times:
Loop: L.D    F0, 0(R1)
      ADD.D  F4, F0, F2
      S.D    F4, 0(R1)
      L.D    F6, 8(R1)
      ADD.D  F8, F6, F2
      S.D    F8, 8(R1)
      L.D    F10, 16(R1)
      ADD.D  F12, F10, F2
      S.D    F12, 16(R1)
      L.D    F14, 24(R1)
      ADD.D  F16, F14, F2
      S.D    F16, 24(R1)
      ADDI   R1, R1, #32
      BNE    R1, xxx, Loop

More registers needed!

Benefits:
• Less dynamic instruction overhead (ADDI/BNE)
• Fewer branches ⇒ larger basic block: operations can be rescheduled more easily

Unrolled and scheduled (14 cycles/4 iterations, i.e., 3.5 cycles/iteration):
Loop: L.D    F0, 0(R1)
      L.D    F6, 8(R1)
      L.D    F10, 16(R1)
      L.D    F14, 24(R1)
      ADD.D  F4, F0, F2
      ADD.D  F8, F6, F2
      ADD.D  F12, F10, F2
      ADD.D  F16, F14, F2
      S.D    F4, 0(R1)
      S.D    F8, 8(R1)
      S.D    F12, 16(R1)
      S.D    F16, 24(R1)
      ADDI   R1, R1, #32
      BNE    R1, xxx, Loop
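The loop above adds the scalar held in F2 to each double in an array (R1 advances 8 bytes per element). Assuming that reading, a C-level sketch of the original loop and of the four-times-unrolled, scheduled body might look like this. The function names are hypothetical, and a real compiler performs the transformation on machine instructions and registers, not on C source.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: one element per trip, so one increment and one
   branch per element (the ADDI/BNE overhead in the listing). */
void add_scalar(double *x, size_t n, double s) {
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] + s;
}

/* Unrolled four times and scheduled like the listing: all loads first,
   then all adds, then all stores. The separate temporaries play the role
   of the extra registers (F0/F6/F10/F14 and F4/F8/F12/F16).
   Precondition: n is a multiple of 4 (leftover iterations not handled). */
void add_scalar_unrolled4(double *x, size_t n, double s) {
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        double t0 = x[i], t1 = x[i + 1], t2 = x[i + 2], t3 = x[i + 3];
        double r0 = t0 + s, r1 = t1 + s, r2 = t2 + s, r3 = t3 + s;
        x[i] = r0;
        x[i + 1] = r1;
        x[i + 2] = r2;
        x[i + 3] = r3;
    }
}
```

Both functions produce the same array; the unrolled one executes a quarter as many increments and branches. The assert records that this version on its own cannot handle a trip count that is not a multiple of 4, which is the problem strip-mining addresses.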
Loop Unrolling (2 of 2)
• Positives/negatives
  • (+) Larger block for scheduling
  • (+) Reduces branch frequency
  • (+) Reduces dynamic instruction count (loop overhead)
  • (–) Expands code size
  • (–) Have to handle excess iterations ("strip-mining")

Strip-mining

• Suppose that our loop requires 10 iterations, and we unroll it only four times.
• If we go through our unrolled loop 3 times, how many iterations (of the original loop) would we be performing? (12, two more than required.)
• If we go through our unrolled loop twice, how many iterations (of the original loop) would we be performing? (8, two fewer than required.)
• So, we need to add some prelude code that performs the 10 mod 4 = 2 leftover iterations, e.g.,

Prel: L.D    F0, 0(R1)
      ADD.D  F4, F0, F2
      S.D    F4, 0(R1)
      L.D    F6, 8(R1)
      ADD.D  F8, F6, F2
      S.D    F8, 8(R1)
      ADDI   R1, R1, #16
      BNE    R1, xxx, Prel
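A C sketch of the same idea for the x[i] = x[i] + s loop, assuming the trip count n is only known at run time: a prelude performs the n mod 4 leftover iterations (2 when n = 10), after which the four-times-unrolled body runs (n - n mod 4)/4 times (twice when n = 10). The function name is hypothetical, and unlike the slide's prelude, which is written out for exactly two leftover iterations, this prelude is an ordinary rolled loop so that it works for any remainder.

```c
#include <assert.h>
#include <stddef.h>

/* Strip-mined x[i] = x[i] + s for any n.
   Returns how many times the unrolled body executed. */
size_t add_scalar_stripmined(double *x, size_t n, double s) {
    assert(x != NULL || n == 0);
    size_t rem = n % 4;           /* leftover iterations: 2 when n = 10 */
    size_t i = 0;

    /* Prelude: handle the leftovers one at a time. */
    for (; i < rem; i++)
        x[i] = x[i] + s;

    /* Unrolled body: n - rem is a multiple of 4, so no partial trip. */
    size_t trips = 0;
    for (; i < n; i += 4) {
        x[i] += s;
        x[i + 1] += s;
        x[i + 2] += s;
        x[i + 3] += s;
        trips++;
    }
    return trips;
}
```

For n = 10 the prelude covers elements 0 and 1 and the unrolled body runs twice, covering elements 2 to 9. Running the leftovers after the unrolled loop instead (an epilogue) works equally well; the slide places them first.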


Software Pipelining (1 of 5)

• Symbolic loop unrolling
  • The instructions in a loop are taken from different iterations in the original loop.

[Figure: iterations 0 through 4 of the original loop, with one software-pipelined iteration]

This slide is borrowed from Per Stenström, Chalmers University, Gothenburg, Sweden, Computer Architecture, Lecture 6.
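Slides 2 to 4 of this sequence are not part of the preview, so the following is only a sketch under an assumed three-stage split of the earlier loop body into load (L), add (A) and store (S). The C source mimics the shape a modulo scheduler would produce on registers: a prologue fills the pipeline, each kernel trip performs S for iteration i, A for iteration i+1 and L for iteration i+2, and an epilogue drains the pipeline. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Software-pipelined x[i] = x[i] + s with stages L, A, S.
   This prologue/epilogue shape requires n >= 2. */
void add_scalar_swp(double *x, size_t n, double s) {
    assert(n >= 2);

    /* Prologue (pipeline fill) */
    double loaded = x[0];          /* L(0) */
    double sum = loaded + s;       /* A(0) */
    loaded = x[1];                 /* L(1) */

    /* Kernel: each trip mixes three original iterations. */
    size_t i = 0;
    for (; i + 2 < n; i++) {
        x[i] = sum;                /* S(i)   */
        sum = loaded + s;          /* A(i+1) */
        loaded = x[i + 2];         /* L(i+2) */
    }

    /* Epilogue (pipeline drain); here i == n - 2. */
    x[i] = sum;                    /* S(n-2) */
    sum = loaded + s;              /* A(n-1) */
    x[i + 1] = sum;                /* S(n-1) */
}
```

Within one kernel trip, no value computed is consumed later in the same trip: the sum produced by A is stored in the next trip, and the value loaded by L is added in the next trip. That separation lets load and add latencies overlap without replicating the loop body, at the cost of the prologue and epilogue code.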


Software Pipelining (5 of 5)

• Positives/negatives
  • (+) No dependences among the instructions of the loop body, since they come from different original iterations
  • (+) Same latency-hiding effect as loop unrolling, but without replicating iterations (smaller code)
  • (–) Still needs extra code for the prologue/epilogue (pipeline fill/drain)
  • (–) Does not reduce branch frequency


Trace Scheduling (1 of 3)

Creates a sequence of instructions that are likely to be executed: a trace.

Two steps:
• Trace selection: find a likely sequence of basic blocks (a trace) across statically predicted branches (e.g., if-then-else).
• Trace compaction: schedule the trace to be as efficient as possible while preserving correctness in case the prediction is wrong.

This slide and the next one have been borrowed from Per Stenström, Chalmers University, Gothenburg, Sweden, Computer Architecture, Lecture 6.
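The slides do not say how the profile drives trace selection. One simple heuristic, shown here only as a sketch, is to start at a block and greedily follow the most frequently taken outgoing edge until the path ends or returns to a block already in the trace. The Block structure, the MAX_BLOCKS limit and the select_trace function are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUCC 2      /* two-way branches, as in if-then-else */
#define MAX_BLOCKS 64   /* hypothetical size limit for this sketch */
#define NO_BLOCK (-1)

/* One basic block of a profiled control-flow graph. */
typedef struct {
    int succ[MAX_SUCC];    /* successor block ids, NO_BLOCK if unused */
    long count[MAX_SUCC];  /* profiled number of times each edge was taken */
} Block;

/* Greedy trace selection starting at `start`. Writes block ids into
   trace[] and returns the trace length. Stops at a block with no
   successor, at a block already in the trace (e.g., a loop back edge),
   or when max_len blocks have been collected. */
size_t select_trace(const Block *cfg, size_t nblocks, int start,
                    int *trace, size_t max_len) {
    assert(nblocks <= MAX_BLOCKS);
    assert(start >= 0 && (size_t)start < nblocks);
    unsigned char in_trace[MAX_BLOCKS] = {0};
    size_t len = 0;
    int b = start;
    while (b != NO_BLOCK && len < max_len && !in_trace[b]) {
        trace[len++] = b;
        in_trace[b] = 1;
        /* Follow the most frequently taken outgoing edge. */
        int best = NO_BLOCK;
        long best_count = -1;
        for (int k = 0; k < MAX_SUCC; k++) {
            int s = cfg[b].succ[k];
            if (s != NO_BLOCK && cfg[b].count[k] > best_count) {
                best = s;
                best_count = cfg[b].count[k];
            }
        }
        b = best;
    }
    return len;
}
```

For the if-then-else on the next slide, with made-up edge counts of 90 for the T edge and 10 for the F edge, the selected trace is the test block, then the B[i] := block, then the join block containing C[i] :=.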


Trace Scheduling (2 of 3)

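[Figure: flow graph in which A[i] := B[i] + C[i] is followed by the test A[i] = 0?; the T branch leads to B[i] := and the F branch to X; both paths rejoin at C[i] :=]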

• The leftmost sequence is chosen as the most likely trace.
• The assignment to B is control-dependent on the if statement.
• Trace compaction has to respect data dependencies.
• The rightmost (less likely) trace has to be augmented with fixup code.


Trace Scheduling (3 of 3)

• Select the most common path: a trace
  • Use profiling to select a trace
  • Allows global scheduling, i.e., scheduling across branches
  • This is speculation, because the schedule assumes a certain path through the region
  • If the trace is wrong (other paths are taken), execute repair code
  • Accurate static branch prediction is key to success
  • Yields more instruction-level parallelism

Original code:
    b[i] = 〈old value〉
    a[i] = b[i] + c[i]
    if (a[i] == 0) then
        b[i] = 〈new value〉   // common case
    else
        X
    c[i] =

Trace to be scheduled:
    b[i] = 〈old value〉
    a[i] = b[i] + c[i]
    b[i] = 〈new value〉
    c[i] =
    if (a[i] != 0) goto A
B:

Repair code:
A:  restore old b[i]
    X
    maybe recalculate c[i]
    goto B
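To check that the repair code really restores the original semantics, the fragments can be made concrete in C. The slide leaves 〈old value〉, 〈new value〉, X and the right-hand side of c[i] unspecified, so this sketch fills them in with hypothetical stand-ins: the constants OLD_B and NEW_B, an increment of a counter x_count for X, and c[i] = 2 * b[i] so that c[i] actually depends on the speculatively written b[i].

```c
#include <assert.h>

/* Hypothetical stand-ins for the parts the slide leaves unspecified. */
#define OLD_B 3
#define NEW_B 7

int x_count = 0;                      /* side effect of the off-trace X */

int compute_c(int b) { return 2 * b; } /* stand-in for "c[i] = ..." */

/* Original code: the branch separates the two paths. */
void original(int *a, int *b, int *c, int i) {
    b[i] = OLD_B;
    a[i] = b[i] + c[i];
    if (a[i] == 0)
        b[i] = NEW_B;                 /* common case */
    else
        x_count++;                    /* X */
    c[i] = compute_c(b[i]);
}

/* Trace-scheduled code: the assignments to b[i] and c[i] are moved
   above the branch, and the uncommon path jumps to repair code. */
void trace_scheduled(int *a, int *b, int *c, int i) {
    b[i] = OLD_B;
    a[i] = b[i] + c[i];
    b[i] = NEW_B;                     /* speculative: assumes the trace */
    c[i] = compute_c(b[i]);           /* computed assuming the trace */
    if (a[i] != 0) goto A;
B:
    return;
A:                                    /* repair code, off the trace */
    b[i] = OLD_B;                     /* restore old b[i] */
    x_count++;                        /* X */
    c[i] = compute_c(b[i]);           /* recalculate c[i] */
    goto B;
}

/* Runs both versions on the same input and asserts they agree,
   including how many times X executed. */
void check_agree(int c_init) {
    int a1[1], b1[1], c1[1] = {c_init};
    int a2[1], b2[1], c2[1] = {c_init};
    x_count = 0;
    original(a1, b1, c1, 0);
    int x_original = x_count;
    x_count = 0;
    trace_scheduled(a2, b2, c2, 0);
    assert(a1[0] == a2[0] && b1[0] == b2[0] && c1[0] == c2[0]);
    assert(x_original == x_count);
}
```

With c[i] = -3 on entry, a[i] is 0 and the trace runs straight through; with c[i] = 1, a[i] is 4, control leaves the trace, and the repair code restores b[i], performs X and recalculates c[i], so check_agree succeeds in both cases.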
