Locality Optimizations and Loop Transformations in Affine Transform Theory, Slides of Compilers

These slides cover optimizations for improving locality through loop transformations, including loop interchange, blocking, unrolling, and fusion. They also cover the theory behind affine transforms and its application to array accesses.

Typology: Slides

2012/2013

Uploaded on 04/29/2013

aalok

Loop Transformations and Locality

Agenda

  • Introduction
  • Loop Transformations
  • Affine Transform Theory

Cache Locality

A[1,1] A[2,1] … A[1,2] A[2,2] … A[1,3] …

for i = 1, 100
  for j = 1, 200
    A[i, j] = A[i, j] + 3
  end_for
end_for

  • Suppose array A has column-major layout.
    • The loop nest has poor spatial cache locality: the inner loop varies j, so consecutive accesses are a full column apart in memory.

Loop Interchange

A[1,1] A[2,1] … A[1,2] A[2,2] … A[1,3] …

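Before interchange:

for i = 1, 100
  for j = 1, 200
    A[i, j] = A[i, j] + 3
  end_for
end_for

After interchange:

for j = 1, 200
  for i = 1, 100
    A[i, j] = A[i, j] + 3
  end_for
end_for

  • Suppose array A has column-major layout.
  • The new loop nest has better spatial cache locality: the inner loop now varies i, so consecutive accesses touch adjacent memory locations.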
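
To make the locality argument concrete, here is a minimal Python sketch (not part of the slides) that lists the memory offsets touched by each loop order for a column-major array. The helper names `column_major_offset` and `offsets_visited` are hypothetical, and offsets are counted in elements rather than bytes.

```python
def column_major_offset(i, j, rows):
    """Element offset of A[i, j] (1-based indices) in a column-major array."""
    return (i - 1) + (j - 1) * rows


def offsets_visited(rows, cols, i_outer):
    """Return the offsets touched by the loop nest, in execution order.

    i_outer=True  models the original nest (i outer, j inner).
    i_outer=False models the interchanged nest (j outer, i inner).
    """
    if i_outer:
        return [column_major_offset(i, j, rows)
                for i in range(1, rows + 1)
                for j in range(1, cols + 1)]
    return [column_major_offset(i, j, rows)
            for j in range(1, cols + 1)
            for i in range(1, rows + 1)]


original = offsets_visited(100, 200, i_outer=True)
interchanged = offsets_visited(100, 200, i_outer=False)

# Stride between the first two accesses in each order.
print(original[1] - original[0])          # 100: one whole column apart
print(interchanged[1] - interchanged[0])  # 1: adjacent elements
```

Both orders touch the same set of elements. Only the interchanged order walks memory sequentially, which is what lets each cache line be fully used before it is evicted.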

Dependence Vectors

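[Figure: iteration space with axes i and j]

  • Distance vector: (1, -1) = (4, 2) - (3, 3).
  • Direction vector: (+, -), taken from the signs of the distance vector.
  • Loop interchange is not legal if there exists a dependence with direction (+, -): swapping the loops would turn it into (-, +), so the dependent iteration would run before its source.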
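
The legality rule can be sketched as a small check, assuming a 2-deep loop nest and distance vectors written as (outer, inner). The function names are hypothetical. The test used is the standard one: after permuting the components, every distance vector must remain lexicographically non-negative.

```python
def is_lexicographically_positive(vector):
    """True if the first nonzero component is positive (all zeros also passes)."""
    for component in vector:
        if component > 0:
            return True
        if component < 0:
            return False
    return True  # all zeros: dependence within a single iteration


def interchange_is_legal(distance_vectors):
    """Check that swapping the two loops keeps every dependence pointing
    forward in the new execution order."""
    return all(is_lexicographically_positive((d_inner, d_outer))
               for d_outer, d_inner in distance_vectors)


print(interchange_is_legal([(1, -1)]))         # False: becomes (-1, 1)
print(interchange_is_legal([(1, 0), (0, 1)]))  # True
```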

Loop Distribution

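Original loop:

for i = 1, 1000
  A[i] = A[i-1] + 3
  C[i] = B[i] + 5
end_for

After distribution:

for i = 1, 1000
  A[i] = A[i-1] + 3
end_for

for i = 1, 1000
  C[i] = B[i] + 5
end_for

  • The 2nd loop is parallel: its iterations are independent, whereas the 1st loop carries a dependence from A[i-1] to A[i].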
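
As a quick sanity check, the following Python sketch runs the single loop and the distributed pair on the same data and compares the results. The function names and the initial array contents are assumptions made for illustration.

```python
def single_loop(A, B, C, n):
    for i in range(1, n + 1):
        A[i] = A[i - 1] + 3
        C[i] = B[i] + 5


def distributed_loops(A, B, C, n):
    for i in range(1, n + 1):   # sequential: each A[i] needs A[i-1]
        A[i] = A[i - 1] + 3
    for i in range(1, n + 1):   # iterations are independent of each other
        C[i] = B[i] + 5


n = 1000
B = list(range(n + 1))          # arbitrary input data
A1, C1 = [0] * (n + 1), [0] * (n + 1)
A2, C2 = [0] * (n + 1), [0] * (n + 1)
single_loop(A1, B, C1, n)
distributed_loops(A2, B, C2, n)
print(A1 == A2 and C1 == C2)    # True
```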

Register Blocking

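Original loop nest:

for j = 1, 2m
  for i = 1, 2n
    A[i, j] = A[i-1, j] + A[i-1, j-1]
  end_for
end_for

After register blocking:

for j = 1, 2m, 2
  for i = 1, 2n, 2
    A[i, j]     = A[i-1, j]   + A[i-1, j-1]
    A[i, j+1]   = A[i-1, j+1] + A[i-1, j]
    A[i+1, j]   = A[i, j]     + A[i, j-1]
    A[i+1, j+1] = A[i, j+1]   + A[i, j]
  end_for
end_for

  • Better reuse: values such as A[i, j] and A[i-1, j] are each used by more than one statement in the blocked body, so they can be kept in registers.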
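
The sketch below, in Python and with hypothetical function names, checks on a small array that the 2x2 blocked nest computes the same values as the original nest. The array contents and the sizes m = 3, n = 4 are arbitrary choices for the test.

```python
def original_nest(A, m, n):
    for j in range(1, 2 * m + 1):
        for i in range(1, 2 * n + 1):
            A[i][j] = A[i - 1][j] + A[i - 1][j - 1]


def register_blocked_nest(A, m, n):
    for j in range(1, 2 * m + 1, 2):
        for i in range(1, 2 * n + 1, 2):
            A[i][j] = A[i - 1][j] + A[i - 1][j - 1]
            A[i][j + 1] = A[i - 1][j + 1] + A[i - 1][j]
            A[i + 1][j] = A[i][j] + A[i][j - 1]
            A[i + 1][j + 1] = A[i][j + 1] + A[i][j]


def make_array(m, n):
    """(2n+1) x (2m+1) array with arbitrary but deterministic contents."""
    return [[(3 * i + 5 * j) % 7 for j in range(2 * m + 1)]
            for i in range(2 * n + 1)]


m, n = 3, 4
A_ref, A_blk = make_array(m, n), make_array(m, n)
original_nest(A_ref, m, n)
register_blocked_nest(A_blk, m, n)
print(A_ref == A_blk)  # True
```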

Scalar Replacement

Original loop:

for i = 2, N+
  … = A[i-1] + 1
  A[i] = …
end_for

After scalar replacement:

t1 = A[1]
for i = 2, N+
  … = t1 + 1
  t1 = …
  A[i] = t1
end_for

  • Eliminate loads and stores for array references.
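
Because the slide leaves parts of the statements elided, the following Python sketch assumes a concrete recurrence, A[i] = A[i-1] + 1, purely for illustration. `CountingList` is a hypothetical helper that counts element reads as a stand-in for memory loads. It is not a model of what a compiler actually emits.

```python
class CountingList(list):
    """A list that counts element reads, as a stand-in for memory loads."""

    def __init__(self, *args):
        super().__init__(*args)
        self.loads = 0

    def __getitem__(self, index):
        self.loads += 1
        return super().__getitem__(index)


def with_array_loads(A, last):
    for i in range(2, last + 1):
        A[i] = A[i - 1] + 1      # one load per iteration


def with_scalar_replacement(A, last):
    t1 = A[1]                    # single load before the loop
    for i in range(2, last + 1):
        t1 = t1 + 1              # value stays in a scalar
        A[i] = t1                # store only, no load


last = 10
A_plain = CountingList([0] * (last + 1))
A_scalar = CountingList([0] * (last + 1))
with_array_loads(A_plain, last)
with_scalar_replacement(A_scalar, last)
print(A_plain.loads, A_scalar.loads)     # 9 1
print(list(A_plain) == list(A_scalar))   # True
```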

Unroll-and-Jam

Original loop nest:

for j = 1, 2*M
  for i = 1, N
    A[i, j] = A[i-1, j] + A[i-1, j-1]
  end_for
end_for

After unroll-and-jam:

for j = 1, 2*M, 2
  for i = 1, N
    A[i, j]   = A[i-1, j]   + A[i-1, j-1]
    A[i, j+1] = A[i-1, j+1] + A[i-1, j]
  end_for
end_for

  • Expose more opportunity for scalar replacement.
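
One way to see the exposed opportunity: after unroll-and-jam, A[i-1, j] appears in both statements of the body, so it can be loaded once into a scalar. The Python sketch below uses hypothetical names, with the scalar called `shared`, and checks that this version matches the original nest on a small test array.

```python
def original_nest(A, M, N):
    for j in range(1, 2 * M + 1):
        for i in range(1, N + 1):
            A[i][j] = A[i - 1][j] + A[i - 1][j - 1]


def unroll_and_jam_with_scalar(A, M, N):
    for j in range(1, 2 * M + 1, 2):
        for i in range(1, N + 1):
            shared = A[i - 1][j]   # loaded once, used by both statements
            A[i][j] = shared + A[i - 1][j - 1]
            A[i][j + 1] = A[i - 1][j + 1] + shared


def make_array(M, N):
    """(N+1) x (2M+1) array with arbitrary but deterministic contents."""
    return [[(2 * i + 3 * j) % 5 for j in range(2 * M + 1)]
            for i in range(N + 1)]


M, N = 3, 5
A_ref, A_uaj = make_array(M, N), make_array(M, N)
original_nest(A_ref, M, N)
unroll_and_jam_with_scalar(A_uaj, M, N)
print(A_ref == A_uaj)  # True
```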

Loop Blocking

for v = 1, 1000, 20
  for u = 1, 1000, 20
    for j = v, v+19
      for i = u, u+19
        A[i, j] = A[i, j] + B[j, i]
      end_for
    end_for
  end_for
end_for

  • Access to small blocks of the arrays has good cache locality.
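
A minimal Python sketch of the same blocking pattern follows, using 0-based indices and a small size so it runs quickly. The function names, the sizes n = 10 and block = 4, and the `min()` guard for a final partial block are assumptions for illustration. The slide itself uses 1000 and 20, which divide evenly.

```python
def unblocked(A, B, n):
    for j in range(n):
        for i in range(n):
            A[i][j] = A[i][j] + B[j][i]


def blocked(A, B, n, block):
    for v in range(0, n, block):
        for u in range(0, n, block):
            # min() handles a final partial block when block does not divide n
            for j in range(v, min(v + block, n)):
                for i in range(u, min(u + block, n)):
                    A[i][j] = A[i][j] + B[j][i]


def make(n, scale):
    return [[scale * (r * n + c) for c in range(n)] for r in range(n)]


n, block = 10, 4
A_ref, A_blk, B = make(n, 1), make(n, 1), make(n, 2)
unblocked(A_ref, B, n)
blocked(A_blk, B, n, block)
print(A_ref == A_blk)  # True
```

Each (i, j) pair is visited exactly once in both versions. Blocking only changes the order so that the parts of A and B in use at any moment stay small.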

Loop Unrolling for ILP

Original loop:

for i = 1, 10
  a[i] = b[i];
  *p = ...
end_for

After unrolling by 2:

for i = 1, 10, 2
  a[i] = b[i];
  *p = ...
  a[i+1] = b[i+1];
  *p = ...
end_for

  • Benefits: larger scheduling regions and fewer dynamic branches.
  • Cost: increased code size.

Objective

  • Unify a large class of program transformations.
  • Example:

float Z[100];
for i = 0, 9
  Z[i+10] = Z[i];
end_for

Iteration Space

  • A d-deep loop nest has d index variables, and is modeled by a d-dimensional space. The space of iterations is bounded by the lower and upper bounds of the loop indices.
  • Iteration space i = 0,1, …

for i = 0, 9
  Z[i+10] = Z[i];
end_for
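
As an illustration that goes slightly beyond the visible part of the slides, the Python sketch below enumerates this one-dimensional iteration space and evaluates the two array accesses as affine functions of i. The names `write_index` and `read_index` are hypothetical. Since no element is both written and read, the iterations can run in any order, which the reversed loop demonstrates.

```python
def write_index(i):
    return i + 10      # affine access for Z[i+10]


def read_index(i):
    return i           # affine access for Z[i]


iteration_space = range(0, 10)          # i = 0, ..., 9
writes = {write_index(i) for i in iteration_space}
reads = {read_index(i) for i in iteration_space}

# No element is both written and read, so no iteration depends on another.
print(sorted(writes & reads))           # []

Z = [float(k) for k in range(100)]
for i in reversed(iteration_space):     # any order gives the same result
    Z[write_index(i)] = Z[read_index(i)]
print(Z[10:20] == Z[0:10])              # True
```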