Parallel Program Models: Lecture Slides, CSE 160, University of California, San Diego, Spring 2005

CSE 160 Chien, Spring 2005 Lecture #15, Slide 1

Parallel Program Models

  • Last Time
    » Embarrassingly Parallel
    » Master-Worker
  • Today
    » Pipelined
    » Systolic
    » Workflow
  • Reminders/Announcements
    » None

Common Parallel Programming Paradigms

  • Embarrassingly parallel programs
  • Master/Worker programs
  • Synchronous: Pipelined Computations
  • Synchronous: Systolic Computations
  • Workflow

Pipelined Computations

  • A pipelined program is divided into a series of tasks that must be completed one after the other

  • Each task executed by a separate pipeline stage
  • Data streamed from stage to stage to form computation

[Figure: data stream f, e, d, c, b, a entering a linear pipeline of stages P1 to P5]
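The stage-to-stage streaming described above can be sketched in Python, with one thread standing in for each processor and a FIFO queue carrying data between neighbouring stages. This is a minimal illustration under those assumptions: the names `run_pipeline` and `_stage` are hypothetical, and CPython threads show the structure of a pipeline rather than a real speedup. Each stage applies its own function to every item, in arrival order.

```python
import queue
import threading
from typing import Callable, Iterable, List

_END = object()  # sentinel telling each stage that the input stream is finished


def _stage(fn: Callable, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """One pipeline stage: apply fn to each item and stream the result onward."""
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)  # forward the end marker so the next stage stops too
            return
        outbox.put(fn(item))


def run_pipeline(stages: List[Callable], items: Iterable) -> list:
    """Stream items through the stages; stage k reads queue k and writes queue k+1."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=_stage, args=(fn, queues[k], queues[k + 1]))
               for k, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for item in items:          # feed the data stream into the first stage
        queues[0].put(item)
    queues[0].put(_END)
    results = []
    while True:                 # drain the last queue until the end marker arrives
        item = queues[-1].get()
        if item is _END:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline([lambda x: x + 1, lambda x: x * 2, str], [1, 2, 3])` streams three items through three stages and returns `['4', '6', '8']`. Because every queue is FIFO and each stage runs in a single thread, items leave the pipeline in the order they entered.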

Pipelined Computations

  • Computation consists of data streaming through pipeline stages
  • Execution Time = Time to fill pipeline (P-1)
    + Time to run in steady state (N-P+1)
    + Time to empty pipeline (P-1)
    = N+P-1

[Figure: space-time diagram of data items a, b, c, d, e, f flowing through pipeline stages P1 to P5 over time]

P = # of processors, N = # of data items (assume P < N)
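The execution-time breakdown can be checked with a small step-by-step simulation. This sketch assumes every stage takes exactly one time step per item and that items advance in lockstep; `pipeline_steps` is a hypothetical helper that returns the total number of steps and the number of steady-state steps (steps in which all P stages are busy).

```python
from typing import Tuple


def pipeline_steps(n_items: int, n_stages: int) -> Tuple[int, int]:
    """Simulate a lockstep pipeline where every stage takes one time step per item.

    Returns (total_steps, steady_state_steps); a steady-state step is one in
    which every stage is busy.
    """
    stages = [None] * n_stages   # index of the item each stage is working on
    next_item = done = total = steady = 0
    while done < n_items:
        total += 1
        # every item advances one stage; the next input (if any) enters stage 0
        incoming = next_item if next_item < n_items else None
        stages = [incoming] + stages[:-1]
        if incoming is not None:
            next_item += 1
        if all(s is not None for s in stages):
            steady += 1
        if stages[-1] is not None:  # the item in the last stage completes this step
            done += 1
    return total, steady
```

For P = 5 and N = 6, `pipeline_steps(6, 5)` returns `(10, 2)`: 4 steps to fill, 2 in steady state and 4 to empty, which is (P-1) + (N-P+1) + (P-1) = N+P-1.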

Programming Issues

  • The algorithm takes N+P-1 steps to run, where N is the number of data items and P is the number of processors
    » Can also consider just the odd numbers, or handle some initial part separately
  • In the given implementation, the processors must collectively store every prime that will appear in the sequence, one prime per processor
    » Not a scalable approach
    » Can fix this by having each processor do the job of multiple primes, i.e., mapping several logical "processors" in the pipeline onto each physical processor
    » What is the impact of this on performance?

[Figure: prime sieve pipeline with one processor per prime: P2, P3, P5, P7, P11, P13, …]
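A sketch of the pipelined prime sieve, assuming the usual design in which each stage keeps the first number that reaches it as its prime and forwards only the numbers that prime does not divide. Threads stand in for processors, and all names here (`sieve_stage`, `pipelined_primes`, `capacity`) are hypothetical. The `capacity` parameter models the fix suggested above: with `capacity=1` every stage hosts one prime, as in the P2, P3, P5, ... figure, while larger values map several logical stages onto one physical stage.

```python
import queue
import threading
from typing import List, Tuple

DONE = None  # end-of-stream marker sent after the last number


def sieve_stage(inbox: queue.Queue, capacity: int, primes: List[int],
                threads: List[threading.Thread], lock: threading.Lock) -> None:
    """A physical stage hosting up to `capacity` logical stages (one prime each)."""
    mine: List[int] = []      # primes owned by this stage
    outbox = None             # queue to the next stage, created lazily
    while True:
        n = inbox.get()
        if n is DONE:
            if outbox is not None:
                outbox.put(DONE)   # propagate end of stream down the pipeline
            return
        if any(n % p == 0 for p in mine):
            continue               # a multiple of one of our primes: drop it
        if len(mine) < capacity:
            mine.append(n)         # n passed every earlier stage, so it is prime
            with lock:
                primes.append(n)
            continue
        if outbox is None:         # this stage is full: start the next one
            outbox = queue.Queue()
            child = threading.Thread(target=sieve_stage,
                                     args=(outbox, capacity, primes, threads, lock))
            with lock:
                threads.append(child)
            child.start()
        outbox.put(n)              # not eliminated here: pass it downstream


def pipelined_primes(limit: int, capacity: int = 1) -> Tuple[List[int], int]:
    """Return (primes <= limit, number of physical stages used)."""
    primes: List[int] = []
    threads: List[threading.Thread] = []
    lock = threading.Lock()
    first: queue.Queue = queue.Queue()
    threads.append(threading.Thread(target=sieve_stage,
                                    args=(first, capacity, primes, threads, lock)))
    threads[0].start()
    for n in range(2, limit + 1):   # stream the candidates 2..limit into the first stage
        first.put(n)
    first.put(DONE)
    i = 0
    while True:                      # join the stages in pipeline order
        with lock:
            if i == len(threads):
                break
            t = threads[i]
        t.join()   # once stage i has exited, its successor (if any) is already registered
        i += 1
    return primes, len(threads)
```

With `capacity=1`, `pipelined_primes(30)` uses one stage per prime (10 stages); with `capacity=3` the same ten primes come out of 4 stages. In both cases the first stage still tests every candidate, so the early stages carry most of the work.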

More Programming Issues

  • In a pipelined algorithm, data flows through the processors in lockstep, so the work should be balanced across stages to avoid a bottleneck at any processor
  • Processors have been developed to support this kind of parallel pipelined computation in hardware
    » Two commercial products: Warp (1D array) and iWarp (components for 2D array)
  • => Generalized view: Systolic Arrays
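The bottleneck effect can be made concrete with a worked formula. Assuming a lockstep pipeline in which every beat must last as long as the slowest stage (an assumption; real hardware may buffer between stages), N items need N + P - 1 beats of length max(t_i), where t_i is the time of stage i. The helpers `lockstep_time` and `utilization` are hypothetical.

```python
from typing import Sequence


def lockstep_time(stage_times: Sequence[float], n_items: int) -> float:
    """Time for n_items to pass through a lockstep pipeline.

    Every beat is as long as the slowest stage, and the pipeline runs
    N + P - 1 beats, where P is the number of stages.
    """
    p = len(stage_times)
    return (n_items + p - 1) * max(stage_times)


def utilization(stage_times: Sequence[float], n_items: int) -> float:
    """Fraction of total processor-time spent on useful work rather than waiting."""
    useful = n_items * sum(stage_times)
    return useful / (len(stage_times) * lockstep_time(stage_times, n_items))
```

Stage times `[2, 2, 2]` and `[1, 4, 1]` both do 6 units of work per item, yet for 100 items the balanced pipeline finishes in 204 units and the unbalanced one in 408, with utilization dropping from about 98% to about 49%.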

Systolic Arrays

  • Systolic: a rhythmically recurrent contraction; especially: the contraction of the heart by which the blood is forced onward and the circulation kept up
  • Warp and iWarp were examples of systolic arrays
    » Data moved through pipelined computational units in a regular and rhythmic fashion
  • Systolic arrays were meant to be special-purpose processors or coprocessors and were very fine-grained
    » Processors, usually called cells, implement a limited and very simple computation
    » Communication is very fast; granularity is meant to be very fine (a small number of computational operations per communication)
    » Very fast clock rates due to the regular, synchronous structure

Example: Systolic Matrix Multiplication

  • Problem: multiply two n×n matrices A = {a_ij} and B = {b_ij}. The product matrix will be R = {r_ij}, where r_ij = Σ_k a_ik b_kj.
  • The systolic solution uses a 2D array of n×n cells, with 2 input streams and 2 output streams

Data Flow for Systolic MM

[Figure: beat-by-beat snapshots (Beats 1 to 8) of the 4×4 array of cells r1,1 to r4,4. At Beat 1, a11 and b11 enter cell r1,1; at Beat 2, a12 and b21 reach r1,1 while a21 enters r2,1 and b12 enters r1,2. On each beat the skewed rows of A move one cell to the right and the skewed columns of B move one cell down; the last inputs, a44 and b44, enter the array at Beat 7. The small number beside each cell counts the products accumulated there so far.]
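The beat-by-beat data flow can be reproduced with a small simulation. It assumes an output-stationary design consistent with the beat diagrams: cell (i, j) accumulates r_ij, row i of A enters from the left delayed by i beats, column j of B enters from the top delayed by j beats, and on every beat each cell multiplies the pair it holds, adds the product to r_ij, and then passes its a value right and its b value down. Indices are 0-based, so a11 is `A[0][0]`; `systolic_matmul` is a hypothetical name.

```python
from typing import List, Optional

Matrix = List[List[float]]


def systolic_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Beat-by-beat simulation of an n x n output-stationary systolic array."""
    n = len(A)
    R = [[0] * n for _ in range(n)]
    a_reg: List[List[Optional[float]]] = [[None] * n for _ in range(n)]
    b_reg: List[List[Optional[float]]] = [[None] * n for _ in range(n)]
    for beat in range(3 * n - 2):   # the last pair meets in cell (n-1, n-1)
        new_a = [[None] * n for _ in range(n)]
        new_b = [[None] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:           # row i of A enters from the left, delayed i beats
                    k = beat - i
                    new_a[i][j] = A[i][k] if 0 <= k < n else None
                else:                # otherwise take the a value the left neighbour held
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:           # column j of B enters from the top, delayed j beats
                    k = beat - j
                    new_b[i][j] = B[k][j] if 0 <= k < n else None
                else:                # otherwise take the b value the upper neighbour held
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        for i in range(n):           # every cell multiplies and accumulates in the same beat
            for j in range(n):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    R[i][j] += a_reg[i][j] * b_reg[i][j]
    return R
```

On beat t (0-based), cell (i, j) holds a_i,k and b_k,j with k = t - i - j, so the matching operands always meet. The last pair reaches the bottom-right cell on beat 3n - 2 counting from 1 (beat 10 for the 4×4 example), which is why the loop runs 3n - 2 times.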

Workflow

  • Directed Acyclic Graph of Tasks
  • Each Computes Independently
  • Edges indicate dependences

    » Control or data

  • Parallelism Arises from multiple Tasks being enabled
  • Asynchronous Structure
  • Coarse-grained Parallel
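The workflow model can be sketched as a small DAG executor. This is an illustrative sketch, not the API of any particular workflow system: `run_workflow` (hypothetical) submits a task to a thread pool as soon as all of its predecessors have finished, so parallelism comes only from several tasks being enabled at once. Edges here carry control dependences only; passing data along an edge would require a task to read its predecessors' results.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Dict, Iterable, List, Tuple


def run_workflow(tasks: Dict[str, Callable[[], object]],
                 deps: Dict[str, Iterable[str]],
                 max_workers: int = 4) -> Tuple[Dict[str, object], List[str]]:
    """Run a DAG of tasks; a task starts once every task it depends on has finished."""
    waiting = {t: set(deps.get(t, ())) for t in tasks}   # unfinished predecessors per task
    children: Dict[str, List[str]] = {t: [] for t in tasks}
    for t, ds in waiting.items():
        for d in ds:
            if d not in tasks:
                raise KeyError(f"{t} depends on unknown task {d}")
            children[d].append(t)

    results: Dict[str, object] = {}
    finish_order: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}   # future -> task name

        def submit_enabled() -> None:
            for t in [t for t, ds in waiting.items() if not ds]:
                del waiting[t]
                running[pool.submit(tasks[t])] = t

        submit_enabled()
        while running:
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                t = running.pop(fut)
                results[t] = fut.result()
                finish_order.append(t)
                for c in children[t]:       # this edge is now satisfied
                    waiting[c].discard(t)
            submit_enabled()                # launch whatever just became enabled
    if waiting:
        raise ValueError("dependency cycle among: " + ", ".join(sorted(waiting)))
    return results, finish_order
```

For example, with tasks a, b, c and d, where b and c depend on a and d depends on both, a finishes first, b and c are enabled together and may run concurrently, and d runs last.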

Where do Workflows Arise?

  • Business Processes
    » Dependent Jobs
    » Run Payroll
      • Runs vacation programs
      • Runs sick leave programs
      • Runs social security tax programs
      • Runs income tax payment and withholding programs
      • Runs parking and tuition payroll deduction
      • Computes paychecks
      • Transfers funds to back the checks
      • …
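The payroll job above can be written down as a dependency graph and split into levels of tasks that may run at the same time, using a layered form of Kahn's topological sort. The dependency structure below is a hypothetical reading of the slide: it assumes the five benefit and deduction programs are independent of each other, that computing paychecks needs all of them, and that the funds transfer needs the paychecks. The task names and the helper `parallel_levels` are illustrative.

```python
from typing import Dict, Iterable, List


def parallel_levels(deps: Dict[str, Iterable[str]]) -> List[List[str]]:
    """Group tasks into levels; all of a task's dependencies lie in earlier levels,
    so the tasks within one level may run concurrently. Every task must be a key."""
    remaining = {t: set(ds) for t, ds in deps.items()}
    levels: List[List[str]] = []
    while remaining:
        ready = sorted(t for t, ds in remaining.items() if not ds)
        if not ready:                       # nothing enabled, yet work remains
            raise ValueError("unsatisfiable dependencies among: " + ", ".join(sorted(remaining)))
        levels.append(ready)
        for t in ready:
            del remaining[t]
        for ds in remaining.values():       # finished tasks no longer block anyone
            ds.difference_update(ready)
    return levels


# Hypothetical dependencies for the payroll job on the slide.
PAYROLL = {
    "vacation": [],
    "sick_leave": [],
    "social_security_tax": [],
    "income_tax_withholding": [],
    "parking_tuition_deduction": [],
    "compute_paychecks": ["vacation", "sick_leave", "social_security_tax",
                          "income_tax_withholding", "parking_tuition_deduction"],
    "transfer_funds": ["compute_paychecks"],
}
```

Under these assumptions `parallel_levels(PAYROLL)` yields three levels: the five programs in parallel, then `compute_paychecks`, then `transfer_funds`.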

Scientific Workflow Applications

  • GriPhyN Experiments
    » Laser Interferometer Gravitational Wave Observatory (Caltech/UWM)
    » ATLAS (U of Chicago)
    » SDSS (Fermilab)
    » CMS, many high-energy physics applications
  • National Virtual Observatory and NASA
    » Montage
  • Atmospheric Modeling
    » MEAD/LEAD: Hurricane Track Prediction
  • Neuroscience
    » Tomography for Telescience (SDSC, NIH-funded)
  • … and many more …

Non-GriPhyN applications using Pegasus

Galaxy Morphology (National Virtual Observatory)

  • Investigates the dynamical state of galaxy clusters
  • Explores galaxy evolution in the context of large-scale structure.
  • Uses galaxy morphologies as a probe of the star formation and stellar distribution history of the galaxies inside the clusters.
  • Data-intensive computations involving hundreds of galaxies in a cluster

The X-ray emission is shown in blue, and the optical emission is in red. The colored dots are located at the positions of the galaxies within the cluster; the dot color represents the value of the asymmetry index. Blue dots represent the most asymmetric galaxies and are scattered throughout the image, while orange dots, the most symmetric and indicative of elliptical galaxies, are concentrated more toward the center.

Southern California Earthquake Center

The SCEC/IT project, funded by the NSF, is developing a new framework for physics-based simulations for seismic hazard analysis, building on several information technology areas, including knowledge representation and reasoning, knowledge acquisition, grid computing, and digital libraries.

People involved: Vipin Gupta, Phil Maechling (USC)

Montage

  • Montage (NASA and NVO)
    » Deliver science-grade custom mosaics on demand
    » Produce mosaics from a wide range of data sources (possibly in different spectra)
    » User-specified parameters of projection, coordinates, size, rotation and spatial sampling
  • Bruce Berriman, John Good, Anastasia Laity, Caltech/IPAC
  • Joseph C. Jacob, Daniel S. Katz, JPL
  • Doing large runs: 6- and 10-degree DAGs (for the M16 cluster).
  • The 6 degree runs had about 13, compute jobs and the 10 degree run had about 40,000 compute jobs

Mosaic created by Pegasus-based Montage from a run of the M101 galaxy images on the TeraGrid.

Montage Workflow

[Figure: Montage workflow. Input images 1, 2 and 3 are reprojected by mProject1, mProject2 and mProject3; mDiff 1-2 and mDiff 2-3 produce difference images D12 and D23; mFitplane D12 and mFitplane D23 fit planes (ax + by + c = 0, dx + ey + f = 0) to them; mBgModel derives background corrections a_i x + b_i y + c_i = 0 for i = 1, 2, 3; mBackground 1, 2 and 3 apply the corrections; and mAdd combines the corrected images into the final mosaic. Node types: data stage-in nodes, Montage compute nodes, data stage-out nodes, inter-pool transfer nodes.]

[Figure: A small Montage workflow, 1202 nodes]
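The Montage DAG can be generated programmatically. This is a sketch under stated assumptions: `montage_dag` and `critical_path` are hypothetical helpers, node names follow the module names in the workflow figure, one mDiff/mFitplane pair is created per overlapping image pair, and each mBackground node is assumed to depend on its reprojected image and on mBgModel.

```python
from typing import Dict, List, Tuple


def montage_dag(n_images: int, overlaps: List[Tuple[int, int]]) -> Dict[str, List[str]]:
    """Map each task name to the tasks it depends on (1-based image numbers)."""
    deps: Dict[str, List[str]] = {}
    for i in range(1, n_images + 1):
        deps[f"mProject{i}"] = []                                  # reproject raw image i
    fits = []
    for i, j in overlaps:
        deps[f"mDiff{i}_{j}"] = [f"mProject{i}", f"mProject{j}"]   # difference of the overlap
        deps[f"mFitplane{i}_{j}"] = [f"mDiff{i}_{j}"]              # fit a plane to it
        fits.append(f"mFitplane{i}_{j}")
    deps["mBgModel"] = fits                          # background model needs every fit
    backgrounds = []
    for i in range(1, n_images + 1):
        deps[f"mBackground{i}"] = [f"mProject{i}", "mBgModel"]
        backgrounds.append(f"mBackground{i}")
    deps["mAdd"] = backgrounds                       # co-add into the final mosaic
    return deps


def critical_path(deps: Dict[str, List[str]]) -> int:
    """Length, in tasks, of the longest dependency chain in the DAG."""
    memo: Dict[str, int] = {}

    def depth(task: str) -> int:
        if task not in memo:
            memo[task] = 1 + max((depth(d) for d in deps[task]), default=0)
        return memo[task]

    return max(depth(t) for t in deps)
```

For the three-image example with overlaps (1, 2) and (2, 3), `montage_dag(3, [(1, 2), (2, 3)])` has 12 compute nodes but a critical path of only 6, so up to three tasks can run side by side at the mProject and mBackground levels.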