SIMD - Vector Processing - Lecture Slides | ECE 511, Study notes of Computer Architecture and Organization

University of Illinois - Urbana-Champaign Computer Architecture and Organization

Material Type: Notes; Class: Computer Architecture; Subject: Electrical and Computer Engr; University: University of Illinois - Urbana-Champaign; Term: Fall 2002;

Typology: Study notes

Pre 2010

Uploaded on 03/16/2009

koofers-user-fom 🇺🇸


Lecture 18

SIMD: Vector Processing


General-purpose to Specific Application Domains

• General-purpose computing presents tough problems in architecture.
• One pathway to better architectures is to know the application domain.
• Example: Scientific applications

SIMD at work

Types of Vector Processing

• Attached co-processor to improve scientific application performance
– TI ASC, CDC STAR 100, IBM 3838, FPS-
• Supercomputers designed to run scientific applications
– CRAY-1, Cyber 205, CRAY-XMP, CRAY-2, CRAY-YMP, Fujitsu VP 100/200, Hitachi S810/820, NEC SX/
• Minicomputers designed to give better price performance than supercomputers
– CONVEX C-1, Alliant FX-
• Instruction set extension to improve performance
– IBM 3090, VAX 6000, X86 MMX, 3DNow, Alpha

Typical Vector Architecture

• A vector unit typically consists of:
– a vector instruction processor
– a collection of vector registers (e.g., 8 64-entry registers in CRAY-1)
– a vector length register (e.g., 6 bits in CRAY-1), implicit in MMX
– a mask register (e.g., 64 bits in CRAY-1)
– a set of pipelined function units (e.g., load/store, FP add, FP multiply, FP reciprocal, integer add, logic, shift in CRAY-1)
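
As a rough illustration of how the vector length and mask registers steer an elementwise operation, here is a minimal Python sketch. The names `MAX_VL` and `masked_vector_add` are hypothetical, and the model ignores pipelining entirely; it is not CRAY-1 code.

```python
MAX_VL = 64  # assumed register length, matching the 64-entry CRAY-1 example


def masked_vector_add(v0, v1, dest, vl, mask):
    """Return a copy of dest where element i becomes v0[i] + v1[i]
    for every i < vl whose bit in `mask` is set (hypothetical model)."""
    if not 0 <= vl <= MAX_VL:
        raise ValueError("vector length out of range")
    out = list(dest)
    for i in range(vl):
        # Bit i of the mask register enables or disables element i.
        if (mask >> i) & 1:
            out[i] = v0[i] + v1[i]
    return out


result = masked_vector_add([1, 2, 3, 4], [10, 20, 30, 40], [0, 0, 0, 0],
                           vl=3, mask=0b0101)
```

Elements at or beyond `vl`, and elements whose mask bit is clear, keep their old value in the destination.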

Vector Code

• Vector code generated for a register-to-register vector architecture:
    [code listing only partially preserved; surviving fragments: "← & N", "← v0 + v", "←"]
• An outer loop may be required if N is greater than the max length allowed; details discussed later.
• If N is sufficiently big, each vector instruction would take about N cycles to execute.
– With an aggressive design (chaining), all the vector instructions can overlap and finish in about N cycles.
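
The outer loop mentioned above is commonly called strip mining. The sketch below is one possible form of it, under the assumption of a 64-element maximum vector length; `MAX_VL`, `vector_add`, and `strip_mined_add` are hypothetical names, and `vector_add` merely stands in for a single vector instruction.

```python
MAX_VL = 64  # assumed maximum vector length


def vector_add(a, b):
    """Stand-in for one vector add over at most MAX_VL elements."""
    assert len(a) <= MAX_VL and len(a) == len(b)
    return [x + y for x, y in zip(a, b)]


def strip_mined_add(a, b):
    """Add two arrays of any length N by issuing ceil(N / MAX_VL) vector adds."""
    n = len(a)
    c = []
    for start in range(0, n, MAX_VL):
        vl = min(MAX_VL, n - start)  # value loaded into the vector length register
        c.extend(vector_add(a[start:start + vl], b[start:start + vl]))
    return c


a = list(range(150))
b = [2 * x for x in a]
c = strip_mined_add(a, b)
```

With N = 150 the outer loop runs three times, with vector lengths 64, 64, and 22.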

Example

• The loop:
    DO I = 1, N
      S1: D(I) = A(I-1) * D(I)
      S2: A(I) = B(I) + C(I)
    END DO
• The execution of S1 and S2 in different iterations:
    S1: D(1) = A(0) * D(1)
    S2: A(1) = B(1) + C(1)
    S1: D(2) = A(1) * D(2)
    S2: A(2) = B(2) + C(2)
• There is a flow dependence from S2 of iteration i to S1 of iteration i+1.
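
The dependence can be made visible by recording, for each element of A, which iteration wrote it. This is a minimal sketch; `trace_flow_dependences` is a hypothetical helper that only tracks the accesses to A in the example loop.

```python
def trace_flow_dependences(n):
    """Walk the example loop for I = 1..n and return (writer, reader)
    iteration pairs where S1 reads an element of A written earlier by S2."""
    last_writer = {}  # index into A -> iteration whose S2 wrote it
    deps = []
    for i in range(1, n + 1):
        # S1: D(I) = A(I-1) * D(I) reads A(I-1)
        if i - 1 in last_writer:
            deps.append((last_writer[i - 1], i))
        # S2: A(I) = B(I) + C(I) writes A(I)
        last_writer[i] = i
    return deps


deps = trace_flow_dependences(4)
```

Every pair has the form (i, i+1): the value produced by S2 in one iteration is consumed by S1 in the next.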

Loop Distribution

• Basic transformation for vectorization
– transform a multi-statement loop into a sequence of single-statement loops.
• Example
    DO I = 1, N
      S1
      …
      SN
    END DO
• Becomes:
    DO I = 1, N
      S1
    END DO
    …
    DO I = 1, N
      SN
    END DO
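
A small sketch of the transformation on two statements that do not depend on each other. The statement bodies (`a[i] + 1`, `b[i] * 2`) and the function names are made up for illustration.

```python
def fused(a, b):
    """Original multi-statement loop."""
    n = len(a)
    x = [0] * n
    y = [0] * n
    for i in range(n):
        x[i] = a[i] + 1  # S1 (hypothetical body)
        y[i] = b[i] * 2  # S2 (hypothetical body)
    return x, y


def distributed(a, b):
    """Same computation after loop distribution: one loop per statement."""
    n = len(a)
    x = [a[i] + 1 for i in range(n)]  # loop containing only S1
    y = [b[i] * 2 for i in range(n)]  # loop containing only S2
    return x, y


a, b = [1, 2, 3], [4, 5, 6]
```

Because neither statement reads what the other writes, both versions give the same result, and each single-statement loop maps directly to a vector operation.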

Problem

• Not all multi-statement loops can be distributed.
    DO I = 1, N
      S1: C(I) = A(I-1) + ...
      S2: A(I) = ...
    END DO
• The execution of iterations looks like:
– C(1) = A(0) + ...
– A(1) = ...
– C(2) = A(1) + ...
– A(2) = ...
– S2 in iteration i delivers its result to S1 in iteration i+1.

Problem (Cont.)

• Loop distribution generates single-statement loops:
    DO I = 1, N
      S1: C(I) = A(I-1) + ...
    END DO
    DO I = 1, N
      S2: A(I) = ...
    END DO
• All iterations of S1 are executed before those of S2.
• The result of S2 in iteration i cannot be delivered to S1 in iteration i+1. Therefore, the execution is invalid after loop distribution.
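
To see the failure concretely, the sketch below fills in the elided right-hand sides with made-up expressions (`A(I-1) + 1` for S1 and `B(I)` for S2); these bodies and the function names are assumptions, not part of the slides. Index 0 of each list plays the role of element 0.

```python
def original(a, b, n):
    """Original loop: S1 then S2 in each iteration."""
    a = list(a)
    c = [None] * (n + 1)
    for i in range(1, n + 1):
        c[i] = a[i - 1] + 1  # S1 (hypothetical body)
        a[i] = b[i]          # S2 (hypothetical body)
    return a, c


def naive_distribution(a, b, n):
    """Distributed without reordering: every S1 runs before any S2."""
    a = list(a)
    c = [None] * (n + 1)
    for i in range(1, n + 1):
        c[i] = a[i - 1] + 1  # reads A before S2 has updated it
    for i in range(1, n + 1):
        a[i] = b[i]
    return a, c


a0 = [0, 0, 0, 0]
b = [0, 10, 20, 30]
_, c_ok = original(a0, b, 3)
_, c_bad = naive_distribution(a0, b, 3)
```

Here `c_ok` is `[None, 1, 11, 21]` while `c_bad` is `[None, 1, 1, 1]`: from the second iteration on, S1 sees the old contents of A.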

Backward Dependence (Cont.)

• Statement reordering: If S2 does not depend on S1 in the same iteration, one can reorder the syntactic ordering of S1 and S2.
• Before
    DO I = 1, N
      S1: C(I) = A(I-1) + ...
      S2: A(I) = ...
    END DO
• After
    DO I = 1, N
      S2: A(I) = ...
      S1: C(I) = A(I-1) + ...
    END DO

Backward Dependence (Cont.)

• Now with statement reordering and loop distribution, the reordered loop becomes:
    DO I = 1, N
      S2: A(I) = ...
    END DO
    DO I = 1, N
      S1: C(I) = A(I-1) + ...
    END DO
• Note that all results of S2 are now generated before the execution of S1. The execution result remains valid after loop distribution.
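
A sketch checking that reordering followed by distribution preserves the result. As before, the statement bodies (`A(I-1) + 1`, `B(I)`) and the function names are assumptions chosen only to make the loops runnable; each distributed loop is written as a slice operation to suggest a single vector statement.

```python
def original(a, b, n):
    """Original loop: S1 then S2 in each iteration."""
    a = list(a)
    c = [None] * (n + 1)
    for i in range(1, n + 1):
        c[i] = a[i - 1] + 1  # S1 (hypothetical body)
        a[i] = b[i]          # S2 (hypothetical body)
    return a, c


def reordered_then_distributed(a, b, n):
    """S2 loop first, then S1 loop, each as one vector-style statement."""
    a = list(a)
    c = [None] * (n + 1)
    a[1:n + 1] = b[1:n + 1]                  # S2 for I = 1..N
    c[1:n + 1] = [x + 1 for x in a[0:n]]     # S1 for I = 1..N, reads A(0..N-1)
    return a, c


a0 = [5, 0, 0, 0]
b = [0, 10, 20, 30]
expected = original(a0, b, 3)
actual = reordered_then_distributed(a0, b, 3)
```

Both versions produce `A = [5, 10, 20, 30]` and `C = [None, 6, 11, 21]`.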

Problem

• Cyclic Dependence: A loop cannot be distributed if there is a cyclic loop-carried dependence.
• Question: Can we increase the success rate of vectorization in the presence of cyclic loop-carried dependence?
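
One conventional way to phrase the condition is as a cycle in a statement-level dependence graph. The sketch below is an illustration under that assumption, not something shown in the slides; `has_cycle` and the two example graphs are hypothetical. An edge X -> Y means statement Y depends on statement X.

```python
def has_cycle(graph):
    """Depth-first search for a cycle in a directed graph given as
    {node: [successors]}; every node must appear as a key."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for succ in graph[node]:
            if color[succ] == GRAY:  # back edge: cycle found
                return True
            if color[succ] == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# Backward dependence only (S2 -> S1): distributable after reordering.
backward_only = {"S1": [], "S2": ["S1"]}
# Hypothetical loop where S1 feeds S2 and S2 feeds S1 in the next iteration.
cyclic = {"S1": ["S2"], "S2": ["S1"]}
```

`has_cycle(backward_only)` is false and `has_cycle(cyclic)` is true; in the second case no ordering of the two single-statement loops can satisfy both dependences.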

Common Solution

• Loop interchange: Reverse the roles of the inner and outer loops.
• In the example, the inner loop has a cyclic loop-carried dependence but the outer loop does not.
    DO I = 1, N
      DO J = 1, N
        S: A(I, J+1) = A(I,J) * B(I, J)
      END DO
    END DO
• With the cyclic dependence, the inner loop cannot be converted to a vector statement.
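
The preview ends before the interchanged loop is shown, so the following is only a sketch of what interchange would look like for this example, with hypothetical function names. After swapping the loops, J is the outer loop and the inner loop over I carries no dependence, so it can be treated as one column-wise vector statement.

```python
def original_order(a, b, n):
    """I outer, J inner: the inner loop carries the dependence on A."""
    a = [row[:] for row in a]
    for i in range(n):
        for j in range(n):
            a[i][j + 1] = a[i][j] * b[i][j]
    return a


def interchanged(a, b, n):
    """J outer, I inner: each inner loop is independent across I."""
    a = [row[:] for row in a]
    for j in range(n):
        # Vector statement over I: column j+1 = column j * column j of B.
        new_col = [a[i][j] * b[i][j] for i in range(n)]
        for i in range(n):
            a[i][j + 1] = new_col[i]
    return a


n = 3
a = [[1, 0, 0, 0], [2, 0, 0, 0], [3, 0, 0, 0]]  # n rows, n + 1 columns
b = [[2, 2, 2], [3, 3, 3], [1, 2, 3]]
res_orig = original_order(a, b, n)
res_swap = interchanged(a, b, n)
```

Both orders give the same array; only the interchanged form exposes a dependence-free inner loop.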
