Loop Parallelization: Eliminating Dependences and Data Dependence Tests - Prof. Rudolf Eig, Study notes of Electrical and Electronics Engineering

These notes cover loop parallelization techniques, focusing on eliminating dependences and performing data dependence tests. They cover concepts such as iteration space graphs, distance vectors, and loop-carried dependences. The document also mentions tests for detecting dependences, such as the GCD test, the Banerjee test, the Omega test, and the range test.

Uploaded on 07/30/2009

1
Section 7:
Loop Parallelization Techniques

• Data-Dependence Analysis
• Dependence-Removing Techniques
• Parallelizing Transformations
• Performance-enhancing Techniques

Loop Parallelization ECE 495S, Fall 2008


2

Some motivating examples

    Do i = 1, n
      a(i) = b(i)      S1
      c(i) = a(i-1)    S2
    End do

Is it legal to

  • Run the i loop in parallel?
  • Put S2 first in the loop?

    Do i = 1, n
      a(i) = b(i)
    End do
    Do i = 1, n
      c(i) = a(i-1)
    End do

Is it legal to

  • Fuse the two i loops?

In general, it is desirable to determine whether two references access the same memory location, and the order in which they execute, so that we can determine whether the references might execute in a different order after some transformation.

4

Can this loop be run in parallel?

    Do i = 1, n
      a(i) = b(i)      S1
      c(i) = a(i-1)    S2
    End do

Accesses made by each iteration, in time order:

    i = 1:  b(1)  a(1)  a(0)  c(1)
    i = 2:  b(2)  a(2)  a(1)  c(2)
    i = 3:  b(3)  a(3)  a(2)  c(3)
    i = 4:  b(4)  a(4)  a(3)  c(4)
    i = 5:  b(5)  a(5)  a(4)  c(5)
    i = 6:  b(6)  a(6)  a(5)  c(6)

Assume one iteration per processor. If for some reason some iterations execute out of lock-step, bad things can happen. In this case, the read of a(2) in i = 3 will get an invalid value if it happens before the write of a(2) in i = 2!
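The hazard can be reproduced with a small simulation. The Python sketch below is an illustration only and is not part of the original notes: the function name `run_loop`, the sample data, and the use of `None` to mark a not-yet-written element are all assumptions. It executes the loop body once per iteration in a given iteration order; ascending order models serial execution, and any other order models processors that drift out of lock-step.

```python
def run_loop(b, order):
    """Execute a(i) = b(i); c(i) = a(i-1) for the iterations listed in `order`.

    Arrays are indexed 0..n so that a[0] exists. Elements that have not
    been written yet hold None (an assumption made for visibility).
    """
    n = len(b) - 1
    a = [None] * (n + 1)
    c = [None] * (n + 1)
    a[0] = 0  # a(0) is defined before the loop
    for i in order:
        a[i] = b[i]        # S1
        c[i] = a[i - 1]    # S2
    return c


b = [0, 10, 20, 30, 40]               # b[0] is unused padding
serial = run_loop(b, [1, 2, 3, 4])    # iterations in program order
skewed = run_loop(b, [1, 3, 2, 4])    # iteration 3 runs before iteration 2

print(serial)  # [None, 0, 10, 20, 30]
print(skewed)  # [None, 0, 10, None, 30]
```

In the skewed run, c(3) is computed from a(2) before iteration 2 has written it, which is exactly the invalid read described above.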

5

Can we change the order of the statements?

Original loop:

    Do i = 1, n
      a(i) = b(i)      S1
      c(i) = a(i-1)    S2
    End do

Reordered loop:

    Do i = 1, n
      c(i) = a(i-1)    S2
      a(i) = b(i)      S1
    End do

Access order before statement reordering:

    i = 1:  b(1)  a(1)  a(0)  c(1)
    i = 2:  b(2)  a(2)  a(1)  c(2)
    i = 3:  b(3)  a(3)  a(2)  c(3)
    i = 4:  b(4)  a(4)  a(3)  c(4)

Access order after statement reordering:

    i = 1:  a(0)  c(1)  b(1)  a(1)
    i = 2:  a(1)  c(2)  b(2)  a(2)
    i = 3:  a(2)  c(3)  b(3)  a(3)
    i = 4:  a(3)  c(4)  b(4)  a(4)

No problem with a serial execution.
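A quick way to convince yourself that the reordering is harmless in a serial execution is to run both versions on the same input. The following Python sketch is an assumption-laden illustration (the names `original_order` and `reordered`, and the sample values, are not from the notes). Because a(i-1) is always written in the previous iteration, swapping S1 and S2 inside one iteration does not change what S2 reads.

```python
def original_order(b, a0):
    """S1 then S2 in each iteration."""
    n = len(b) - 1
    a = [a0] + [None] * n
    c = [None] * (n + 1)
    for i in range(1, n + 1):
        a[i] = b[i]        # S1
        c[i] = a[i - 1]    # S2
    return a, c


def reordered(b, a0):
    """S2 then S1 in each iteration."""
    n = len(b) - 1
    a = [a0] + [None] * n
    c = [None] * (n + 1)
    for i in range(1, n + 1):
        c[i] = a[i - 1]    # S2 now comes first
        a[i] = b[i]        # S1
    return a, c


b = [0, 10, 20, 30, 40]    # b[0] is unused padding
same = original_order(b, a0=7) == reordered(b, a0=7)
print(same)  # True
```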

7

Eliminating anti-dependence

    … = a(2)
    a(2) = …

Anti-dependence: a write to a location cannot occur before a previous read of that location has finished.

Let the program be:

    a(2) = …
    … = a(2)
    a(2) = …
    … = a(2)

Create additional storage to eliminate the anti-dependence. The new program is:

    a(2) = …
    … = a(2)
    aa(2) = …
    … = aa(2)

No more anti-dependence!

8

Getting rid of output dependences

    a(2) = …
    a(2) = …

Output dependence: a write to a location must wait for a previous write to that location to finish.

Let the program be:

    a(2) = …
    … = a(2)
    a(2) = …
    … = a(2)

Again, by creating new storage we can eliminate the output dependence. The new program is:

    a(2) = …
    … = a(2)
    aa(2) = …
    … = aa(2)
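The renaming idea used for both anti- and output dependences can be sketched in Python. This is a hypothetical illustration: the right-hand sides (`x + 1`, `y + 1`, `* 2`) and the names `before`, `after`, `r1`, `r2` are assumptions standing in for the elided "…" in the notes. After renaming, the second write no longer touches the storage used by the first write and the first read, so it may be moved ahead of them without changing the results.

```python
def before(x, y):
    """One storage cell a2 is reused for two unrelated values."""
    a2 = x + 1     # W1: a(2) = ...
    r1 = a2 * 2    # R1: ... = a(2)
    a2 = y + 1     # W2: a(2) = ...  (anti-dependence on R1, output dependence on W1)
    r2 = a2 * 2    # R2: ... = a(2)
    return r1, r2


def after(x, y):
    """The second value lives in new storage aa2, so W2 can be hoisted."""
    aa2 = y + 1    # W2 executed first: legal only because of the renaming
    a2 = x + 1     # W1
    r1 = a2 * 2    # R1
    r2 = aa2 * 2   # R2 now reads the renamed storage
    return r1, r2


print(before(3, 5))  # (8, 12)
print(after(3, 5))   # (8, 12)
```

Only the flow (true) dependences W1 to R1 and W2 to R2 remain, and renaming cannot remove those.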

10

An example of when it is messy to create new storage

    Do i = 1, n
      a(3i-1) = …
      a(2i) = …
      … = a(i)
    End do

  • a(3i-1) writes locations 2, 5, 8, 11, 14, 17, 20, 23
  • a(2i) writes locations 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22
  • a(i) reads from outside of the loop when i = 1, 3, 7, 9, 13, 15, 19, 21
  • a(i) reads from a(3i-1) when i = 5, 11, 17, 23
  • a(i) reads from a(2i) when i = 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22
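The three read cases listed above can be checked by simulating the loop and remembering, for every location, which statement wrote it last. The Python sketch below is an illustration under assumptions: the function name `reaching_writes`, the string labels, and the choice n = 23 (enough to cover the reads listed) are not from the notes.

```python
def reaching_writes(n):
    """Map each read a(i), i = 1..n, to the statement whose value it sees."""
    last_writer = {}   # location -> label of the statement that wrote it last
    source = {}        # i -> label seen by the read a(i), or "outside"
    for i in range(1, n + 1):
        last_writer[3 * i - 1] = "a(3i-1)"          # first statement
        last_writer[2 * i] = "a(2i)"                # second statement
        source[i] = last_writer.get(i, "outside")   # third statement reads a(i)
    return source


source = reaching_writes(23)
for label in ("outside", "a(3i-1)", "a(2i)"):
    print(label, [i for i, s in source.items() if s == label])
```

Locations such as 2, 8, 14, and 20 are written by both statements, and the write through a(2i) is always the later one, which is why every even i reads from a(2i). This per-location bookkeeping is what makes renaming messy here.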

11

Can we fuse the loop?

Unfused loops:

    Do i = 1, n
      a(i) = b(i)      S1
    End do
    Do i = 1, n
      c(i) = a(i-1)    S2
    End do

Fused loop:

    Do i = 1, n
      a(i) = b(i)      S1
      c(i) = a(i-1)    S2
    End do

In the original execution of the unfused loops:

  1. The read of a(i-1) gets the value assigned to a(i) by S1
  2. The values assigned to a(i) and c(i) are never overwritten
  3. The value of b(i) comes from outside the loop
  4. Fusing is OK, because a(i-1) still gets the value assigned in the previous iteration
  5. There is no output dependence on a(i) or c(i); they are not overwritten
  6. There is no flow (true) dependence on b(i), so its value comes from outside of the loop nest
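The fusion argument can be tested by running both forms and comparing results. The Python sketch below is illustrative only; the names `unfused` and `fused`, the `shift` parameter, and the sample data are assumptions. With `shift = -1` it models the loops from the notes. With `shift = +1` it models a hypothetical variant, c(i) = a(i+1), which is not in the notes and is included only as a contrast where fusion changes the result.

```python
def unfused(b, a_init, shift):
    n = len(b) - 1
    a = list(a_init)
    c = [None] * (n + 1)
    for i in range(1, n + 1):
        a[i] = b[i]            # S1
    for i in range(1, n + 1):
        c[i] = a[i + shift]    # S2
    return c


def fused(b, a_init, shift):
    n = len(b) - 1
    a = list(a_init)
    c = [None] * (n + 1)
    for i in range(1, n + 1):
        a[i] = b[i]            # S1
        c[i] = a[i + shift]    # S2
    return c


b = [0, 10, 20, 30, 40]    # b[0] is unused padding
a_init = [-1] * 6          # old contents of a(0..5) before the loops

print(unfused(b, a_init, -1) == fused(b, a_init, -1))  # True
print(unfused(b, a_init, +1) == fused(b, a_init, +1))  # False
```

In the `shift = +1` variant the fused loop reads a(i+1) before S1 has written it, so it sees the old value; the loops in the notes do not have this problem.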

13

Dependence sources and sinks

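• The sink of a dependence is the statement at the head of the dependence arrow

• The source is the statement at the tail of the dependence arrow

    for (i=1; i < n; i++) {
      a[i] = …
      … = a[i-1]
    }

Unrolled iterations (each write a[k] = … is the source of a dependence whose sink is the read … = a[k] in the next iteration):

    a[1] = …
    … = a[0]
    a[2] = …
    … = a[1]
    a[3] = …
    … = a[2]
    a[4] = …
    … = a[3]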

14

Data Dependence Tests: Concepts

Terms for data dependences between statements of loop iterations.

  • Distance (vector): indicates how many iterations apart the source and sink of a dependence are.
  • Direction (vector): is basically the sign of the distance. There are different notations: (<,=,>) or (+1,0,-1), meaning a dependence from an earlier to a later iteration, within the same iteration, or from a later to an earlier iteration, respectively.
  • Loop-carried (or cross-iteration) dependence and non-loop-carried (or loop-independent) dependence: indicates whether a dependence exists across iterations or within one iteration.
      - For detecting parallel loops, only cross-iteration dependences matter.
      - Equal (=) dependences are relevant for optimizations such as statement reordering and loop distribution.
  • Iteration space graphs: the un-abstracted form of a dependence graph, with one node per statement instance.

16

Data Dependence Tests: Distance Vectors

Distance (vector): indicates how many iterations apart the source and sink of a dependence are.

    do i1 = 1, n
      do i2 = 1, n
        a(i1,i2) = a(i1-2,i2+3)
      end do
    end do

[Figure: iteration space graph with axes i1 = 1..6 and i2 = 1..5, showing the dependence from I to I']

    I = (1,4) → I' = (3,1)
    I' - I = (2,-3)
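The distance vector for this loop nest can also be found by brute force on a small iteration space. The Python sketch below is an illustration under assumptions (the function name `distance_vectors` and the choice n = 6 are not from the notes). It pairs every iteration that writes an element with every iteration that reads the same element and records sink minus source.

```python
from itertools import product


def distance_vectors(n):
    """Enumerate dependence distances of a(i1,i2) = a(i1-2,i2+3) for 1 <= i1, i2 <= n."""
    iterations = list(product(range(1, n + 1), repeat=2))
    distances = set()
    for src in iterations:        # src = (i1, i2) writes a(i1, i2)
        for snk in iterations:    # snk = (i1', i2') reads a(i1'-2, i2'+3)
            if (snk[0] - 2, snk[1] + 3) == src:
                distances.add((snk[0] - src[0], snk[1] - src[1]))
    return distances


print(distance_vectors(6))  # {(2, -3)}
```

Every dependence in this loop nest has the same distance (2,-3). Since the first component is positive, the writing iteration always executes before the reading one, so these are flow dependences carried by the i1 loop.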

17

Data Dependence Tests: Direction Vectors

Direction (vector): is basically the sign of the distance. There are different notations: (<,=,>) or (+1,0,-1), meaning a dependence from an earlier to a later iteration, within the same iteration, or from a later to an earlier iteration, respectively.

    do i1 = 1, n
      do i2 = 1, n
        a(i1,i2) = a(i1-2,i2+3)
      end do
    end do

[Figure: iteration space graph with axes i1 = 1..6 and i2 = 1..5, showing the dependence from I to I']

    I = (1,4) → I' = (3,1)
    I' - I = d = (2,-3)
    Direction = (<,>), or (sign(2), sign(-3)) = (+1,-1) in some works
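Converting a distance vector into either direction notation is mechanical. The Python sketch below is an illustration; the function names `direction` and `signs` are assumptions and not part of the notes.

```python
def direction(distance):
    """Map a distance vector to a direction vector in (<, =, >) notation."""
    def symbol(d):
        if d > 0:
            return "<"   # in this loop, the source iteration is earlier than the sink
        if d < 0:
            return ">"   # in this loop, the source iteration is later than the sink
        return "="       # same iteration of this loop
    return tuple(symbol(d) for d in distance)


def signs(distance):
    """Map a distance vector to the (+1, 0, -1) notation."""
    return tuple((d > 0) - (d < 0) for d in distance)


print(direction((2, -3)))  # ('<', '>')
print(signs((2, -3)))      # (1, -1)
```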

19

Data Dependence Tests: Loop Carried

  • Loop-carried (or cross-iteration) dependence and non-loop-carried (or loop-independent) dependence: indicates whether a dependence exists across iterations or within one iteration.

    do i1 = 1, n
      dopar i2 = 1, n
        a(i1,i2) = …
      end dopar
      dopar i'2 = 1, n
        … = a(i1,i'2-1)
      end dopar
    end do

This is legal since loop splitting enforces the loop-carried dependences: all writes in the first dopar loop finish before the reads in the second one begin.

[Figure: iteration space graph with i1 = 1..5 on one axis and i2 = 1..5, i'2 = 1..5 on the other]
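The effect of the split can be simulated for one fixed value of i1. The Python sketch below rests on several assumptions that are not in the notes: the unsplit inner loop is taken to be a single i2 loop containing both statements, the written value `10 * i2` stands in for the elided right-hand side, the names `serial` and `split_parallel` are made up, and a shuffled iteration order is used to model a dopar loop whose iterations may run in any order.

```python
import random


def serial(n):
    """Reference: both statements in one inner loop, executed in order."""
    a = [0] * (n + 1)          # a[0] models a(i1,0), defined before the loop
    out = [None] * (n + 1)
    for i2 in range(1, n + 1):
        a[i2] = 10 * i2        # a(i1,i2) = ...
        out[i2] = a[i2 - 1]    # ... = a(i1,i2-1)
    return out


def split_parallel(n, rng):
    """Two dopar loops; each runs its iterations in an arbitrary order."""
    a = [0] * (n + 1)
    out = [None] * (n + 1)
    order = list(range(1, n + 1))
    rng.shuffle(order)         # first dopar loop: all writes
    for i2 in order:
        a[i2] = 10 * i2
    rng.shuffle(order)         # second dopar loop: all reads
    for i2 in order:
        out[i2] = a[i2 - 1]
    return out


results = [split_parallel(5, random.Random(seed)) for seed in range(10)]
print(all(r == serial(5) for r in results))  # True
```

Whatever order each dopar loop uses, every read sees the value produced by the write of the previous i2 iteration, because the first loop has completed before the second starts.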

20

A quick aside

A loop

    do i = 4, n, 3      (lower bound 4, upper bound n, stride 3)
      a(i)
    end do

can always be normalized to the loop

    do i = 0, (n-1)/3 - 1, 1
      a(3i+4)
    end do

This makes discussing the data-dependence problem easier, since we only need to worry about loops of the form do i = 0, n, 1.

More precisely,

    do i = lower, upper, stride { a(i) }

becomes

    do i' = 0, (upper - lower + stride)/stride - 1, 1 { a(i'*stride + lower) }

where the division is integer division.
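The normalization formula can be checked against a direct enumeration of the original loop. The Python sketch below is an illustration under assumptions: the function name `normalized_indices` is made up, the stride is assumed to be positive, and the division is taken to be floor division.

```python
def normalized_indices(lower, upper, stride):
    """Array indices touched by the normalized loop do i' = 0, count - 1, 1.

    Assumes stride >= 1 and an inclusive upper bound, as in the Fortran-style loop.
    """
    count = (upper - lower + stride) // stride    # number of iterations
    return [i * stride + lower for i in range(count)]


print(normalized_indices(4, 20, 3))  # [4, 7, 10, 13, 16, 19]
```

For `lower = 4`, `stride = 3` the trip count is `(n - 1) // 3`, which matches the upper bound `(n-1)/3 - 1` of the normalized loop above. When `upper < lower` the count is zero or negative and the loop body never runs.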