
















Material Type: Notes; Class: Computer Architecture; Subject: Electrical and Computer Engr; University: University of Illinois - Urbana-Champaign; Term: Fall 2002;

















© W. W. Hwu and S. J. Patel, 2002. ECE 412, University of Illinois
Attached co-processor to improve scientific application performance
  – TI ASC, CDC STAR 100, IBM 3838, FPS-
Supercomputers designed to run scientific applications
  – CRAY-1, Cyber 205, CRAY-XMP, CRAY-2, CRAY-YMP, Fujitsu VP 100/200, Hitachi S810/820, NEC SX/
Minicomputers designed to give better price-performance than supercomputers
  – CONVEX C-1, Alliant FX-
Instruction set extension to improve performance
  – IBM 3090, VAX 6000, X86 MMX, 3DNow, Alpha
A vector unit typically consists of:
  – a vector instruction processor
  – a collection of vector registers (e.g. eight 64-entry registers in CRAY-1)
implicit in MMX
add, FP multiply,
Vector code generated for a register-to-register vector architecture:
  … ← &N
  … ← B
  … ← C
  … ← v0 + v…
  … ← v…
An outer loop may be required if N is greater than the maximum vector length allowed; details are discussed later.
If N is sufficiently large, each vector instruction takes about N cycles to execute.
  – With an aggressive design that uses chaining, all the vector instructions can overlap and finish in about N cycles.
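The outer loop mentioned above can be sketched as follows. This is only an illustration, not the slides' own code: it assumes the loop being vectorized is an elementwise add of B and C, that the maximum vector length is 64 (the CRAY-1 register size quoted earlier), and the names `MVL`, `vector_add`, and `strip_mined_add` are hypothetical.

```python
MVL = 64  # assumed maximum vector length (64-entry registers, as in CRAY-1)

def vector_add(b, c):
    """Stand-in for one vector add instruction on at most MVL elements."""
    assert len(b) == len(c) <= MVL
    return [x + y for x, y in zip(b, c)]

def strip_mined_add(b, c):
    """Compute a[i] = b[i] + c[i] for any N, one strip of <= MVL elements at a time."""
    n = len(b)
    a = [0] * n
    for start in range(0, n, MVL):      # outer loop over strips
        vl = min(MVL, n - start)        # vector length used for this strip
        a[start:start + vl] = vector_add(b[start:start + vl],
                                         c[start:start + vl])
    return a
```

With N = 150, the outer loop runs three times with vector lengths 64, 64, and 22.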
  S1: D(I) = A(I-1) * D(I)
  S2: A(I) = B(I) + C(I)
The execution of S1 and S2 in different iterations:
  S1: D(1) = A(0) * D(1)
  S2: A(1) = B(1) + C(1)
  S1: D(2) = A(1) * D(2)
  S2: A(2) = B(2) + C(2)
There is a flow dependence from S2 of iteration i to S1 of iteration i+1.
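One way to see this dependence is to record, for each element of A, which statement instance last wrote it, and to check that record whenever S1 reads A(I-1). The sketch below does only that bookkeeping; the function name `trace_flow_dependences` is made up for illustration.

```python
def trace_flow_dependences(n):
    """Return (writer, reader) pairs for the loop S1; S2 over I = 1..n."""
    writer = {}   # index of A -> (statement, iteration) that last wrote it
    deps = []
    for i in range(1, n + 1):
        # S1: D(I) = A(I-1) * D(I) reads A(I-1)
        if i - 1 in writer:
            deps.append((writer[i - 1], ("S1", i)))
        # S2: A(I) = B(I) + C(I) writes A(I)
        writer[i] = ("S2", i)
    return deps
```

For n = 3 the result pairs S2 of iteration 1 with S1 of iteration 2, and S2 of iteration 2 with S1 of iteration 3. A(0) is written before the loop, so S1 of iteration 1 has no producer inside the loop.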
Basic transformation for vectorization
  – transform a multi-statement loop into a sequence of single-statement loops.
Example
  – DO I = 1, N
      S1
      …
      SN
    END DO
• Becomes:
  – DO I = 1, N
      S1
    END DO
  – …
  – DO I = 1, N
      SN
    END DO
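A minimal sketch of this transformation, assuming each statement is modeled as a Python function of the loop index. The statements `s1` and `s2` and the array names P, Q, X, Y are hypothetical and chosen so that no statement depends on another.

```python
def run_original(statements, n, state):
    """One loop whose body runs every statement in order."""
    for i in range(1, n + 1):
        for stmt in statements:
            stmt(state, i)
    return state

def run_distributed(statements, n, state):
    """One single-statement loop per statement, in the original order."""
    for stmt in statements:
        for i in range(1, n + 1):
            stmt(state, i)
    return state

# Hypothetical statements with no dependence between them.
def s1(state, i):
    state["X"][i] = state["P"][i] + 1

def s2(state, i):
    state["Y"][i] = state["Q"][i] * 2

def fresh_state(n):
    """Arrays indexed 1..n; index 0 is unused padding."""
    return {"P": list(range(n + 1)), "Q": list(range(n + 1)),
            "X": [0] * (n + 1), "Y": [0] * (n + 1)}
```

Because `s1` and `s2` touch disjoint arrays, `run_original` and `run_distributed` leave the state identical. That equality is not guaranteed once one statement reads what another writes.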
Not all multi-statement loops can be distributed.
  – DO I = 1, N
      S1: C(I) = A(I-1) + ...
      S2: A(I) = ...
    END DO
The execution of iterations looks like:
  – C(1) = A(0) + ...
  – A(1) = ...
  – C(2) = A(1) + ...
  – A(2) = ...
  – S2 in iteration i delivers its result to S1 in iteration i+1.
Loop distribution generates single-statement loops:
  – DO I = 1, N
      S1: C(I) = A(I-1) + ...
    END DO
  – DO I = 1, N
      S2: A(I) = ...
    END DO
All iterations of S1 are executed before those of S2. The result of S2 in iteration i cannot be delivered to S1 in iteration i+1. Therefore, the execution is invalid after loop distribution.
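The failure can be reproduced with a small sketch. The right-hand sides are elided in the slides, so the ones used here (`A(I-1) + 1` and `10 * I`) and the all-zero initial arrays are assumptions made only so the code can run.

```python
def original_loop(n):
    a = [0] * (n + 1)           # A(0..N), assumed to start at zero
    c = [0] * (n + 1)
    for i in range(1, n + 1):
        c[i] = a[i - 1] + 1     # S1 (assumed right-hand side)
        a[i] = 10 * i           # S2 (assumed right-hand side)
    return a, c

def distributed_loop(n):
    a = [0] * (n + 1)
    c = [0] * (n + 1)
    for i in range(1, n + 1):
        c[i] = a[i - 1] + 1     # every S1 runs first and reads stale A values
    for i in range(1, n + 1):
        a[i] = 10 * i           # S2 results arrive too late for S1
    return a, c
```

For n = 3 the original loop yields C = [0, 1, 11, 21], while the distributed version yields C = [0, 1, 1, 1], because S1 never sees the values S2 produces.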
Statement reordering: if S2 does not depend on S1 in the same iteration, one can reorder the syntactic ordering of S1 and S2.
• Before
  – DO I = 1, N
      S1: C(I) = A(I-1) + ...
      S2: A(I) = ...
    END DO
• After
  – DO I = 1, N
      S2: A(I) = ...
      S1: C(I) = A(I-1) + ...
    END DO
Now with statement reordering and loop distribution, the reordered loop becomes:
  – DO I = 1, N
      S2: A(I) = ...
    END DO
  – DO I = 1, N
      S1: C(I) = A(I-1) + ...
    END DO
Note that all results of S2 are now generated before the execution of S1. The execution results remain valid after loop distribution.
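A sketch of why the reordered form is safe, with each single-statement loop written as one whole-array operation to mimic a vector statement. As before, the right-hand sides (`A(I-1) + 1` and `10 * I`) and the zero-initialized arrays are assumptions, since the slides elide them.

```python
def original_loop(n):
    a = [0] * (n + 1)           # A(0..N), assumed to start at zero
    c = [0] * (n + 1)
    for i in range(1, n + 1):
        c[i] = a[i - 1] + 1     # S1 (assumed right-hand side)
        a[i] = 10 * i           # S2 (assumed right-hand side)
    return a, c

def reordered_distributed(n):
    a = [0] * (n + 1)
    c = [0] * (n + 1)
    a[1:] = [10 * i for i in range(1, n + 1)]   # S2 for all I, as one array operation
    c[1:] = [x + 1 for x in a[:n]]              # S1 for all I, reading A(0..N-1)
    return a, c
```

Both functions return the same arrays, since every A(I-1) that S1 needs has already been produced by the S2 loop.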
Loop interchange: reverse the roles of the inner and outer loops.
In the example, the inner loop has a cyclic loop-carried dependence but the outer loop does not.
  – DO I = 1, N
With the cyclic dependence, the inner loop cannot be converted to a vector statement.
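The slide's own example is cut off here, so the sketch below uses a hypothetical loop nest with the same property: the statement A(I,J) = A(I,J-1) + B(I,J) depends on itself across iterations of the inner J loop but not across iterations of the outer I loop. After interchange, I becomes the inner loop and each J step can be carried out as one operation over all I.

```python
def original_nest(a, b):
    """Outer loop over i, inner loop over j; the inner loop carries the dependence."""
    a = [row[:] for row in a]           # work on a copy
    n, m = len(a), len(a[0])
    for i in range(n):
        for j in range(1, m):
            a[i][j] = a[i][j - 1] + b[i][j]
    return a

def interchanged_nest(a, b):
    """Outer loop over j, inner loop over i written as a whole-column operation."""
    a = [row[:] for row in a]           # work on a copy
    n, m = len(a), len(a[0])
    for j in range(1, m):
        # All i are independent for a fixed j, so this is one vector-style step.
        column = [a[i][j - 1] + b[i][j] for i in range(n)]
        for i in range(n):
            a[i][j] = column[i]
    return a
```

Both versions compute the same array; only the interchanged one exposes an inner loop free of loop-carried dependences.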