
































These slides cover loop transformations that improve locality, including loop interchange, blocking, unrolling, and fusion. They also cover the theory behind affine transforms and their application to array accesses.
Typology: Slides

A[1,1] A[2,1] … A[1,2] A[2,2] … A[1,3] …
for i = 1, 100
  for j = 1, 200
    A[i, j] = A[i, j] + 3
  end_for
end_for
for j = 1, 200
  for i = 1, 100
    A[i, j] = A[i, j] + 3
  end_for
end_for
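To see why the interchanged nest has better locality, here is a minimal Python sketch. It assumes column-major storage with 100 rows, matching the layout A[1,1] A[2,1] … shown with the loops. The helper names `address` and `inner_stride` are hypothetical; they only compute how far apart, in elements, two consecutive innermost accesses are.

```python
ROWS, COLS = 100, 200  # loop bounds from the slide: i = 1..100, j = 1..200

def address(i, j, rows=ROWS):
    """Element offset of A[i, j] in column-major order (1-based indices)."""
    return (i - 1) + (j - 1) * rows

def inner_stride(order):
    """Distance in elements between the first two innermost-loop accesses.

    order == "ij": i outer, j inner (original loop nest).
    order == "ji": j outer, i inner (interchanged loop nest).
    """
    if order == "ij":
        return address(1, 2) - address(1, 1)
    if order == "ji":
        return address(2, 1) - address(1, 1)
    raise ValueError(order)

print(inner_stride("ij"))  # 100: each access jumps a whole column ahead
print(inner_stride("ji"))  # 1: consecutive accesses are adjacent in memory
```

With a stride of 1, every element of a fetched cache line is used before the line is evicted, which is the locality gain the interchange is after.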
[Figure: axes i and j]
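Distance vector: (1, -1) = (4, 2) - (3, 3).
Direction vector: (+, -), taken from the signs of the distance vector.
Loop interchange is not legal if there exists a dependence with direction vector (+, -): swapping the loops would turn it into (-, +), so the dependent iteration would run before the iteration it depends on.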
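The legality rule can be sketched in a few lines of Python. This is an illustration under the usual assumption that a transformed distance vector must stay lexicographically positive; the function names are hypothetical and not taken from the slides.

```python
def distance_vector(sink, source):
    """Distance vector = sink iteration minus source iteration."""
    return tuple(s - t for s, t in zip(sink, source))

def direction_vector(distance):
    """Map each distance component to '+', '-', or '0' by its sign."""
    return tuple('+' if d > 0 else '-' if d < 0 else '0' for d in distance)

def lex_positive(distance):
    """True if the first nonzero component is positive (or all are zero)."""
    for d in distance:
        if d != 0:
            return d > 0
    return True

def interchange_is_legal(distance):
    """Interchanging a 2-deep nest swaps the two components of the vector."""
    return lex_positive((distance[1], distance[0]))

d = distance_vector((4, 2), (3, 3))
print(d, direction_vector(d), interchange_is_legal(d))
# (1, -1) ('+', '-') False
```

A dependence such as (1, 1) or (0, 1) stays lexicographically positive after the swap, so interchange would be allowed for those.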
for i = 1, 1000
  A[i] = A[i-1] + 3
end_for

for i = 1, 1000
  C[i] = B[i] + 5
end_for

for i = 1, 1000
  A[i] = A[i-1] + 3
  C[i] = B[i] + 5
end_for
The 2nd loop is parallel: C[i] = B[i] + 5 carries no dependence between iterations, whereas A[i] = A[i-1] + 3 does.
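The two forms compute the same values, whether the slide is read as fusion or as its inverse, distribution. Below is a minimal Python check. It assumes that index 0 of `a` holds the boundary value A[0] and that B has at least n+1 elements; the function names are hypothetical.

```python
def separate_loops(a, b, n):
    """Two loops. a[0] is the boundary value A[0]."""
    a = list(a)
    c = [0] * (n + 1)
    for i in range(1, n + 1):
        a[i] = a[i - 1] + 3   # reads the value written one iteration earlier
    for i in range(1, n + 1):
        c[i] = b[i] + 5       # iterations are independent of one another
    return a, c

def fused_loop(a, b, n):
    """One loop containing both statements."""
    a = list(a)
    c = [0] * (n + 1)
    for i in range(1, n + 1):
        a[i] = a[i - 1] + 3
        c[i] = b[i] + 5
    return a, c
```

Because the two statements touch disjoint arrays, either form is legal. The separate form keeps the independent statement in a loop of its own, where its iterations could run in any order.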
for j = 1, 2m
  for i = 1, 2n
    A[i, j] = A[i-1, j] + A[i-1, j-1]
  end_for
end_for

for j = 1, 2m, 2
  for i = 1, 2n, 2
    A[i, j]     = A[i-1, j]   + A[i-1, j-1]
    A[i, j+1]   = A[i-1, j+1] + A[i-1, j]
    A[i+1, j]   = A[i, j]     + A[i, j-1]
    A[i+1, j+1] = A[i, j+1]   + A[i, j]
  end_for
end_for
Better reuse of A[i, j]: it is written by the first statement and read again by the third and fourth.
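The following Python sketch checks that the unrolled nest produces the same array as the original one. It assumes row 0 and column 0 hold boundary values that are read but never written; `random_grid` is a hypothetical helper used only to build test input.

```python
import random

def original(a, m, n):
    """j outer, i inner; a[i][j] with i in 0..2n and j in 0..2m."""
    a = [row[:] for row in a]
    for j in range(1, 2 * m + 1):
        for i in range(1, 2 * n + 1):
            a[i][j] = a[i - 1][j] + a[i - 1][j - 1]
    return a

def unroll_and_jam(a, m, n):
    """Both loops step by 2; the four bodies are fused into one."""
    a = [row[:] for row in a]
    for j in range(1, 2 * m + 1, 2):
        for i in range(1, 2 * n + 1, 2):
            a[i][j] = a[i - 1][j] + a[i - 1][j - 1]
            a[i][j + 1] = a[i - 1][j + 1] + a[i - 1][j]
            a[i + 1][j] = a[i][j] + a[i][j - 1]
            a[i + 1][j + 1] = a[i][j + 1] + a[i][j]
    return a

def random_grid(m, n, seed=0):
    """(2n+1) x (2m+1) grid of small integers, including the boundaries."""
    rng = random.Random(seed)
    return [[rng.randint(0, 9) for _ in range(2 * m + 1)]
            for _ in range(2 * n + 1)]
```

The results match because every value a statement reads has already been produced, either in an earlier block or earlier in the same unrolled body.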
for i = 2, N+1
  … = A[i-1] + 1
  A[i] = …
end_for

t1 = A[1]
for i = 2, N+1
  … = t1 + 1
  t1 = …
  A[i] = t1
end_for
Eliminate loads and stores for array references
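A concrete instance may help. The slide leaves parts of the loop body elided, so this Python sketch makes two loud assumptions: the elided statements are filled in with the simplest case, A[i] = A[i-1] + 1, and the upper bound is N+1. Neither is confirmed by the source.

```python
def with_array_refs(a, n):
    """Every iteration loads A[i-1] from the array."""
    a = list(a)
    for i in range(2, n + 2):      # i = 2 .. N+1 (assumed bound)
        a[i] = a[i - 1] + 1        # assumed body: one load, one store
    return a

def with_scalar(a, n):
    """A[i-1] is carried in the scalar t1, so the loop has no array loads."""
    a = list(a)
    t1 = a[1]                      # single load before the loop
    for i in range(2, n + 2):
        t1 = t1 + 1
        a[i] = t1                  # store only
    return a
```

In compiled code the scalar would typically live in a register, which is where the saving in memory traffic comes from.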
for j = 1, 2*M
  for i = 1, N
    A[i, j] = A[i-1, j] + A[i-1, j-1]
  end_for
end_for

for j = 1, 2*M, 2
  for i = 1, N
    A[i, j]   = A[i-1, j]   + A[i-1, j-1]
    A[i, j+1] = A[i-1, j+1] + A[i-1, j]
  end_for
end_for
Exposes more opportunities for scalar replacement: A[i-1, j] is now read by both statements in the body.
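The effect on memory traffic can be counted directly. In this Python sketch, `Counted` is a hypothetical wrapper that tallies element loads; it stands in for real memory accesses and is not part of the slides. Only the value shared within one iteration is replaced by a scalar here.

```python
class Counted:
    """2-D array wrapper that counts element loads (hypothetical helper)."""
    def __init__(self, rows, cols):
        self.data = [[1] * cols for _ in range(rows)]
        self.loads = 0

    def load(self, i, j):
        self.loads += 1
        return self.data[i][j]

    def store(self, i, j, value):
        self.data[i][j] = value

def plain(m, n):
    """Original nest: two loads per statement instance."""
    a = Counted(n + 1, 2 * m + 1)
    for j in range(1, 2 * m + 1):
        for i in range(1, n + 1):
            a.store(i, j, a.load(i - 1, j) + a.load(i - 1, j - 1))
    return a

def jammed_with_scalars(m, n):
    """j unrolled by 2 and jammed; the shared value is loaded once."""
    a = Counted(n + 1, 2 * m + 1)
    for j in range(1, 2 * m + 1, 2):
        for i in range(1, n + 1):
            left = a.load(i - 1, j - 1)
            center = a.load(i - 1, j)     # used by both statements below
            right = a.load(i - 1, j + 1)
            a.store(i, j, center + left)
            a.store(i, j + 1, right + center)
    return a
```

For m = 3 and n = 5 the plain nest performs 60 loads and the jammed version 45, with identical array contents.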
for v = 1, 1000, 20
  for u = 1, 1000, 20
    for j = v, v+19
      for i = u, u+19
        A[i, j] = A[i, j] + B[j, i]
      end_for
    end_for
  end_for
end_for
Access to small blocks of the arrays has good cache locality.
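A runnable version of the blocked nest, as a sketch in Python. It assumes 0-based square arrays and an untiled loop order of j outer, i inner, which the slide does not show. The `min` bounds are an addition so that sizes that are not a multiple of the tile still work.

```python
def untiled(a, b, n):
    """Reference nest: walks all of A and all of B on every pass."""
    a = [row[:] for row in a]
    for j in range(n):
        for i in range(n):
            a[i][j] += b[j][i]
    return a

def tiled(a, b, n, tile=20):
    """Same computation, visiting one tile x tile block at a time."""
    a = [row[:] for row in a]
    for v in range(0, n, tile):
        for u in range(0, n, tile):
            for j in range(v, min(v + tile, n)):
                for i in range(u, min(u + tile, n)):
                    a[i][j] += b[j][i]
    return a
```

Each element is updated exactly once and independently of the others, so the blocked order is legal; only the order of memory accesses changes.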
for i = 1, 10
  a[i] = b[i];
  *p = ...
end_for

for i = 1, 10, 2
  a[i] = b[i];
  *p = ...
  a[i+1] = b[i+1];
  *p = ...
end_for
Benefits: larger scheduling regions, fewer dynamic branches.
Cost: increased code size.
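A minimal Python sketch of unrolling by 2. The store through `*p` is left out because its right-hand side is elided on the slide. The cleanup step for odd trip counts is an addition; the slide's trip count of 10 does not need it.

```python
def rolled(b):
    """One copy per iteration, one loop test per element."""
    a = [0] * len(b)
    for i in range(len(b)):
        a[i] = b[i]
    return a

def unrolled_by_2(b):
    """Two copies per iteration, so about half as many loop tests."""
    n = len(b)
    a = [0] * n
    i = 0
    while i + 1 < n:        # main loop: handles elements in pairs
        a[i] = b[i]
        a[i + 1] = b[i + 1]
        i += 2
    if i < n:               # cleanup iteration when n is odd
        a[i] = b[i]
    return a
```

The larger body is what gives a scheduler more instructions to reorder, and the duplicated statements are where the code size growth comes from.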
float Z[100];
for i = 0, 9
  Z[i+10] = Z[i];
end_for
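One way to read this loop is as a dependence question: can any iteration read an element that another iteration writes? The Python sketch below answers it by brute force over the iteration range. Both access functions are affine in i. The name `dependent_pairs` and the `offset` parameter are hypothetical, and this reading of the slide is an assumption.

```python
def dependent_pairs(lo=0, hi=9, offset=10):
    """All (writer, reader) iteration pairs touching the same element of Z.

    Iteration iw writes Z[iw + offset]; iteration ir reads Z[ir].
    """
    return [(iw, ir)
            for iw in range(lo, hi + 1)
            for ir in range(lo, hi + 1)
            if iw + offset == ir]

print(dependent_pairs())               # []
print(len(dependent_pairs(offset=1)))  # 9
```

With an offset of 10 the writes cover Z[10..19] and the reads cover Z[0..9], so no pair exists and the iterations are independent. With an offset of 1, nine pairs appear, (0, 1) through (8, 9), and the loop would carry a dependence.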