Data and Object Parallel Models - Lecture Slides

University of California - San Diego, Spring 2005


Uploaded on 03/28/2010



CSE 160 Chien, Spring 2005 Lecture #16, Slide 1

Data and Object Parallel Models

Last Time » Pipelined » Systolic » Workflow
Today » Data Parallel Programming » Object Parallel Programming
Reminders/Announcements » HW#3 returned today » HW#4 is out today and is due Thursday, June 2, in lecture.

Data Parallel Programming

Observation: In many cases, large collections of data (vectors, matrices, etc.) can be operated on in parallel.
Idea: provide a clear mechanism for users to specify such parallelism
=> extend the sequential programming model with “data parallel” operations
=> these operations explicitly specify the parallel structure, so the compiler does not need to prove independence before executing them in parallel
=> idea similar to vector/MMX machine instructions


Fortran 90 -- Data Parallel Extension

Arrays can be treated as data parallel collections
Elementwise operations on ensembles
We have told the compiler what is parallel, but how much flexibility have we granted it, especially between operations?
How useful is this? Need more power...

real A(100), B(100), C(100)

A = B * C   ! computes elementwise product
A = A + C   ! and sum

Data Parallel vs. Sequential Program

Computationally equivalent sequential programs
Parallelism didn’t affect the semantics because there are no hazards between the read and write sets (almost)
Because the parallelism is specified, the compiler doesn’t have to prove independence in this case; but does that help much?
Parallel execution model is equivalent to copy-in, copy-out (no dependences)

real A(100), B(100), C(100)

do i = 1, 100
  A(i) = B(i) * C(i)
enddo
do i = 1, 100
  A(i) = A(i) + C(i)
enddo
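The copy-in, copy-out claim can be checked with a small Python sketch. The statements A = B * C and A = A + C have no overlapping hazards, so any order works; the shift A(2:n) = A(1:n-1) used below is a hypothetical example (not on the slide) chosen because its read and write sets overlap, which is the "almost" case. The function names are illustrative only.

```python
# Sketch: a data-parallel array assignment reads every right-hand-side
# element before writing any left-hand-side element (copy-in, copy-out).
# A naive in-place sequential loop can differ when read and write sets overlap.

def shift_data_parallel(a):
    """Models the Fortran 90 statement A(2:n) = A(1:n-1)."""
    copy_in = list(a)               # copy-in: snapshot the right-hand side
    result = list(a)
    for i in range(1, len(a)):      # iteration order no longer matters
        result[i] = copy_in[i - 1]
    return result                   # copy-out

def shift_naive_loop(a):
    """Models: do i = 2, n / A(i) = A(i-1) / enddo, executed in place."""
    result = list(a)
    for i in range(1, len(result)):
        result[i] = result[i - 1]   # reads the value written one iteration earlier
    return result

print(shift_data_parallel([1, 2, 3, 4]))  # [1, 1, 2, 3]
print(shift_naive_loop([1, 2, 3, 4]))     # [1, 1, 1, 1]
```

A compiler translating the array statement into a sequential loop must therefore either use a temporary (as above) or reverse the loop direction when it detects this kind of overlap.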


Implementing Data Parallel Languages

How do we get performance? » Exploit parallelism within an operation (vector parallelism) » Exploit concurrency across data parallel operations (pipelining) -- still requires compiler support
Two approaches » Deeply pipelined machines (we already have these for uniprocessors) » Parallel machines that distribute parts of each parallel operation across processors and operate on them in parallel » => Parallel vector processors, or parallel microprocessors with some type of interconnect

Exploiting Data Parallelism

Sequential issue and completion of Data Parallel operations
Problem: data movement and coordination don’t scale well (how much do you need? how fast?) » Important in SIMD machines, REALLY IMPORTANT in message-passing machines
More aggressive: pipeline the operations (the equivalent of vector chaining); for sequential processors, you’d like to do strip mining and loop fusion (for registers and cache)
=> Analyze across DP operations, and control data placement
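Strip mining and loop fusion can be sketched in Python for the earlier pair A = B * C; A = A + C. This is only an illustration of the transformations, under the assumption that the two statements are fused by the compiler; the strip length of 4 is an arbitrary hypothetical value, and Python itself gains no register or cache benefit.

```python
def unfused(b, c):
    # Two separate sweeps: A is written in full, then re-read in the second sweep.
    a = [bi * ci for bi, ci in zip(b, c)]
    return [ai + ci for ai, ci in zip(a, c)]

def fused(b, c):
    # Loop fusion: one sweep; each intermediate a[i] can stay in a register.
    return [bi * ci + ci for bi, ci in zip(b, c)]

def strip_mined_fused(b, c, strip=4):
    # Strip mining: run the fused loop in chunks of `strip` elements so each
    # chunk of B, C, and A can stay in cache (or fill one vector register).
    n = len(b)
    a = [0.0] * n
    for start in range(0, n, strip):
        for i in range(start, min(start + strip, n)):
            a[i] = b[i] * c[i] + c[i]
    return a

b = [float(i) for i in range(10)]
c = [2.0] * 10
print(unfused(b, c) == fused(b, c) == strip_mined_fused(b, c))  # True
```

All three versions perform the same floating-point operations in the same order per element, so they produce identical results; only the traversal and data-reuse pattern changes.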


Compiler Analysis for Overlap

Classical Dependence Testing
Similar to what’s required for parallelization
=> Data parallelism doesn’t solve the entire problem, but does provide some control
=> Simple programming interface enables compiler analysis, but also restricts expressible parallelism...

leftcols(1:200:1,1:100:1) = A(1:200:1,1:200:2)    ! can these be
rightcols(1:200:1,1:100:1) = A(1:200:1,2:200:2)   ! overlapped?

do i = 1, 200                                      ! can these be parallel/overlapped?
  do j = 1, 100
    leftcols(i,j) = A(i,2*j-1)
    rightcols(i,j) = A(i,2*j)
  enddo
enddo
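As written, both statements only read A and write different arrays, so they are independent. If one of them wrote A instead, the compiler would have to show that the sections A(:,1:200:2) and A(:,2:200:2) (odd and even columns, subscripts 2j-1 and 2j) never touch the same element. The GCD test is one classical way to do this; the Python sketch below is an illustration, and the function name is hypothetical.

```python
from math import gcd

def gcd_test_may_overlap(a, b, c, d):
    """Classical GCD dependence test for subscripts a*i + b and c*j + d.

    a*i + b == c*j + d has an integer solution only if gcd(a, c) divides
    (d - b). If it does not, the two references never touch the same element.
    Loop bounds are ignored, so True only means an overlap is *possible*.
    """
    g = gcd(a, c)
    if g == 0:                 # both coefficients zero: constant subscripts
        return b == d
    return (d - b) % g == 0

# Column subscripts 2*j - 1 (odd columns) and 2*j (even columns):
print(gcd_test_may_overlap(2, -1, 2, 0))  # False: the sections are disjoint
# Subscripts j and j + 1, e.g. A(j) vs A(j+1):
print(gcd_test_may_overlap(1, 0, 1, 1))   # True: overlap cannot be ruled out
```

The test is cheap but conservative; real compilers combine it with bounds-aware tests when it cannot rule out a dependence.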

Importance of Controlling Data Placement: Locality

What is the flop/communication balance in most machines?
» Microprocessors
- DEC Alpha 21064 ~ 4 Flops/64-bit word
- DEC Alpha 21164 ~ 12 Flops/64-bit word
- AMD Opteron ~ 18 Flops/64-bit word
» Larger Scale Parallel Machines (100+ nodes)
- IBM SP2 ~ 200/(40/8) = 40
- T3D ~ 150/(200/8) = 6
- FWGrid ~ 1800/(120/8) = 120
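The parallel-machine ratios can be reproduced with a few lines of Python. The slide does not state units; this sketch assumes the numerator is per-node peak MFLOP/s and the parenthesized term is per-node bandwidth in MB/s divided by 8 bytes per 64-bit word.

```python
def flops_per_word(peak_mflops, bandwidth_mb_per_s, bytes_per_word=8):
    """Flops a node can execute in the time needed to move one 64-bit word."""
    mwords_per_s = bandwidth_mb_per_s / bytes_per_word  # million words per second
    return peak_mflops / mwords_per_s

# Values as given on the slide (units assumed, see above).
machines = {"IBM SP2": (200, 40), "T3D": (150, 200), "FWGrid": (1800, 120)}
for name, (peak, bw) in machines.items():
    print(name, flops_per_word(peak, bw))
# IBM SP2 40.0
# T3D 6.0
# FWGrid 120.0
```

A ratio of 40 or 120 means a node must perform that many floating-point operations per word it communicates to avoid being communication bound, which is why data placement matters.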


Decomposition Statement

DECOMPOSITION D(N,N)

Data Decomposition - Alignment

Controls how arrays are aligned with respect to one another
Enables reducing data movement when operating across arrays
Array operations between aligned arrays are usually more efficient than array operations between arrays that are not known to be aligned


Alignment Example

REAL A(N,N)

DECOMPOSITION D(N,N)

ALIGN A(I,J) with D(J-2,I+3)
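The alignment above is an index map from A to D. A minimal Python sketch of that map (the function name is hypothetical) makes the transposition visible: A's first subscript selects D's second dimension, so a row of A lands in a column of D.

```python
def align_a_to_d(i, j):
    """ALIGN A(I,J) with D(J-2,I+3): element A(i,j) is placed with D(j-2, i+3)."""
    return (j - 2, i + 3)

print(align_a_to_d(5, 7))                          # (5, 8)
# Row 1 of A (i = 1, varying j) maps into column 4 of D:
print([align_a_to_d(1, j) for j in range(3, 6)])   # [(1, 4), (2, 4), (3, 4)]
```

Once D is distributed, every array aligned with it inherits the same placement through this map, which is what lets operations between aligned arrays avoid data movement.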

Data Decomposition - Distribution

The 2nd level of parallelism is distribution, or machine mapping: how arrays are mapped onto the physical parallelism of the machine
Choice and performance of distribution is affected by the topology, communication mechanisms, size of local memory, and number of processors on the underlying machine
Specified by assigning an independent attribute to each dimension.
Predefined attributes include BLOCK, CYCLIC, and BLOCK_CYCLIC; ":" marks a dimension that is not distributed
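The three predefined attributes can be compared by computing which processor owns each element of one distributed dimension. This Python sketch assumes 0-based element and processor numbers and BLOCK chunks of ceil(n/p) elements; the function names are hypothetical.

```python
def owner_block(i, n, p):
    """BLOCK: contiguous chunks of ceil(n/p) elements."""
    size = -(-n // p)            # ceiling division
    return i // size

def owner_cyclic(i, p):
    """CYCLIC: elements dealt out round-robin, one at a time."""
    return i % p

def owner_block_cyclic(i, k, p):
    """BLOCK_CYCLIC(k): blocks of k elements dealt out round-robin."""
    return (i // k) % p

n = 8
print([owner_block(i, n, 4) for i in range(n)])          # [0, 0, 1, 1, 2, 2, 3, 3]
print([owner_cyclic(i, 4) for i in range(n)])            # [0, 1, 2, 3, 0, 1, 2, 3]
print([owner_block_cyclic(i, 2, 2) for i in range(n)])   # [0, 0, 1, 1, 0, 0, 1, 1]
```

BLOCK keeps neighbors together (good for nearest-neighbor stencils), CYCLIC balances work when cost varies along the dimension, and BLOCK_CYCLIC trades between the two.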


Partition Analysis

Original program

REAL A(100)
do i = 1, 100
  A(i) = 0.
enddo

SPMD node program

REAL A(25)
do i = 1, 25
  A(i) = 0.
enddo

• Converting global to local indices
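Converting the global loop to the node loop amounts to intersecting the global iteration range with the block each processor owns and then shifting to local indices. The Python sketch below assumes A(100) is BLOCK-distributed over 4 processors (consistent with the 25-element node array, though the slide omits the distribution statement); processor numbers are 0-based and the function name is hypothetical.

```python
def local_loop_bounds(lo, hi, my_p, block):
    """Intersect the global loop range [lo, hi] (1-based, inclusive) with the
    block owned by processor my_p and return 1-based local bounds, or None
    if this processor executes no iterations."""
    first = my_p * block + 1           # first global index owned by my_p
    last = first + block - 1           # last global index owned by my_p
    g_lo, g_hi = max(lo, first), min(hi, last)
    if g_lo > g_hi:
        return None
    return (g_lo - first + 1, g_hi - first + 1)   # global -> local

# do i = 1, 100 on 4 processors: every node runs local i = 1, 25.
for p in range(4):
    print(p, local_loop_bounds(1, 100, p, 25))
# A loop over 2..99 would leave the first and last nodes with reduced bounds:
print(local_loop_bounds(2, 99, 0, 25), local_loop_bounds(2, 99, 3, 25))  # (2, 25) (1, 24)
```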

Jacobi Relaxation Code

REAL A(100,100), B(100,100)
DECOMPOSITION D(100,100)
ALIGN A, B with D
DISTRIBUTE D(:,BLOCK)
do k = 1, time
  do j = 2, 99
    do i = 2, 99
S1    A(i,j) = (B(i,j-1)+B(i-1,j)+B(i+1,j)+B(i,j+1))/4
    enddo
  enddo
  do j = 2, 99
    do i = 2, 99
S2    B(i,j) = A(i,j)
    enddo
  enddo
enddo
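A sequential Python sketch of the same relaxation helps check what the generated parallel code must compute. It assumes the loops cover interior points only (2 through n-1) and that S1 is the four-point average (division by 4); it uses 0-based nested lists, and the function names are hypothetical.

```python
def jacobi_sweep(a, b):
    """One k iteration on n x n nested lists (0-based).

    S1: a[i][j] = average of b's four neighbors, interior points only.
    S2: b[i][j] = a[i][j], copying the interior back for the next iteration.
    """
    n = len(b)
    for j in range(1, n - 1):
        for i in range(1, n - 1):
            a[i][j] = (b[i][j - 1] + b[i - 1][j] + b[i + 1][j] + b[i][j + 1]) / 4
    for j in range(1, n - 1):
        for i in range(1, n - 1):
            b[i][j] = a[i][j]

def jacobi(b, time):
    """Runs `time` sweeps in place on b (boundary values stay fixed) and returns b."""
    a = [row[:] for row in b]      # A starts as a copy so its boundary matches B
    for _ in range(time):
        jacobi_sweep(a, b)
    return b

grid = [[0.0] * 4 for _ in range(4)]
grid[0][1] = 8.0                   # one nonzero boundary point
jacobi(grid, 1)
print(grid[1][1], grid[2][1])      # 2.0 0.0
```

Because S1 reads only B and writes only A, every (i,j) iteration of one sweep is independent; the only ordering constraint is between sweeps, which is what the parallel version exploits.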


Jacobi Relaxation Processor Layout

Compiling for a four-processor machine.
Both arrays A and B are aligned identically with decomposition D, so they have the same distribution as D.
Because the first dimension of D is local and the second dimension is block-distributed, the local index set for both A and B on each processor (in local indices) is [1:100,1:25].

Jacobi Relaxation cont.


Generated Jacobi cont.

  do j = lb1, ub
    do i = 2, 99
      B(i,j) = A(i,j)
    enddo
  enddo
enddo

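The only true cross-processor dependences are carried by the k loop, so messages can be vectorized: each boundary column is sent as one message per k iteration rather than element by element inside the j and i loops.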
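For the D(:,BLOCK) layout on four processors, the columns each processor must receive before each k iteration can be listed directly, since S1 reads B(i,j-1) and B(i,j+1). This Python sketch uses 0-based processor numbers and 1-based global column numbers; the function name and the 25-column block default are assumptions matching the layout described above.

```python
def halo_messages(my_p, nprocs, block=25):
    """(neighbor, global column) pairs that processor my_p must receive
    before each k iteration of the Jacobi loop under D(:,BLOCK).
    Each entry is one vectorized message carrying a whole column."""
    first = my_p * block + 1           # first owned global column
    last = first + block - 1           # last owned global column
    msgs = []
    if my_p > 0:
        msgs.append((my_p - 1, first - 1))   # needed for B(i,j-1) at j = first
    if my_p < nprocs - 1:
        msgs.append((my_p + 1, last + 1))    # needed for B(i,j+1) at j = last
    return msgs

for p in range(4):
    print(p, halo_messages(p, 4))
# 0 [(1, 26)]
# 1 [(0, 25), (2, 51)]
# 2 [(1, 50), (3, 76)]
# 3 [(2, 75)]
```

Interior processors exchange two column messages per iteration and edge processors one, instead of one message per element referenced across a block boundary.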

Controlling Data Layout, HPF Style

Arrays are the major aggregate data structure
DISTRIBUTEd over an abstract processor array
ALIGNed with each other (syntactic convenience)
BLOCK, CYCLIC distributions
Basic control => what are the limitations?

Arrays → ALIGNed → DISTRIBUTEd → Implemented
ALIGNed » ALIGN A(I) with B(I) » ALIGN A(I) with B(2I) » ALIGN C(*,:) with B(:) (columns with elements)
DISTRIBUTEd » DISTRIBUTE A(block) » DISTRIBUTE C(block,cyclic) » DISTRIBUTE B(cyclic)
Implemented » various
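Alignment and distribution compose: an aligned array is never distributed on its own, it simply follows its target. The Python sketch below combines two of the listed examples, ALIGN A(I) with B(2I) and DISTRIBUTE B(cyclic), on an assumed 4 processors (0-based); the function names are hypothetical. It also shows one limitation of this style of control.

```python
def owner_of_b_cyclic(k, nprocs):
    """DISTRIBUTE B(CYCLIC): B(k) (1-based) lives on processor (k-1) mod nprocs."""
    return (k - 1) % nprocs

def owner_of_a(i, nprocs):
    """ALIGN A(I) with B(2I): A(i) lives wherever B(2i) lives."""
    return owner_of_b_cyclic(2 * i, nprocs)

owners = [owner_of_a(i, 4) for i in range(1, 9)]
print(owners)                # [1, 3, 1, 3, 1, 3, 1, 3]
print(sorted(set(owners)))   # [1, 3]: processors 0 and 2 hold no element of A
```

Under this combination half of the processors receive no part of A, so any work on A alone is confined to the other half.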


Other Data Parallel Languages

Model applies in numerous language settings » Data Parallel C; C*, MPC
Object Parallelism » pC++ (parallel over object arrays) » ICC++, RWC++ » Parallel Java dialects
Essential elements » Aligned parallelism, single-threaded semantics » Explicit data placement (?)
Perspective: which of the application types would you want to write in a data parallel language? » FORALL and INDEPENDENT add some flexibility

Object Parallelism

Challenge in Data Parallelism is getting Aligned Parallelism » Irregularity due to Boundary Conditions » Irregularity due to Problem Structure » Ex: Jacobi, Finite Element Types, Different Bucket Sizes, Web Pages with different # of outlinks
Object Parallel Languages » Paralation Lisp, HPC++, Parallel Java Dialects » Arrays of Objects + Subtyping => Polymorphism - Can express irregularity - Can Implement it Efficiently on MIMD machines
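The "arrays of objects + subtyping" idea can be sketched in Python: one operation is applied across a collection, and each object's subtype decides how much work its update does (a fixed boundary value versus interior points with differing numbers of neighbors). This is an illustration of the programming model only, with hypothetical class names; Python threads do not give CPU speedups here, so it says nothing about efficient MIMD execution.

```python
from concurrent.futures import ThreadPoolExecutor

class Element:
    """Base type: every object in the parallel collection responds to update()."""
    def update(self):
        raise NotImplementedError

class BoundaryElement(Element):
    """Boundary condition: the value is fixed, so update does no arithmetic."""
    def __init__(self, value):
        self.value = value
    def update(self):
        return self.value

class InteriorElement(Element):
    """Interior point with any number of neighbors (e.g. an unstructured mesh)."""
    def __init__(self, neighbor_values):
        self.neighbor_values = neighbor_values
    def update(self):
        return sum(self.neighbor_values) / len(self.neighbor_values)

def parallel_update(elements, workers=4):
    """Apply update() to every object concurrently; pool.map preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda e: e.update(), elements))

mesh = [BoundaryElement(0.0),
        InteriorElement([0.0, 4.0]),             # 2 neighbors
        InteriorElement([4.0, 2.0, 6.0, 0.0]),   # 4 neighbors
        BoundaryElement(4.0)]
print(parallel_update(mesh))   # [0.0, 2.0, 3.0, 4.0]
```

Dynamic dispatch absorbs the irregularity that a purely data parallel array statement would have to express with masks or special cases.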
