CSE 160 Chien, Spring 2005 Lecture #15, Slide 1
Parallel Program Models
» Embarrassingly Parallel
» Master-Worker
» Pipelined
» Systolic
» Workflow
» None
Common Parallel Programming Paradigms
- Embarrassingly parallel programs
- Master/Worker programs
- Synchronous: Pipelined Computations
- Synchronous: Systolic Computations
- Workflow
Pipelined Computations
- Pipelined program divided into a series of tasks that have to be
completed one after the other.
- Each task executed by a separate pipeline stage
- Data streamed from stage to stage to form computation
[Figure: data items f, e, d, c, b, a stream through the pipeline stages P1, P2, P3, P4, ...]
Pipelined Computations
- Computation consists of data streaming through pipeline stages
- Execution Time = Time to fill pipeline (P-1)
+ Time to run in steady state (N-P+1)
+ Time to empty pipeline (P-1)
= N+P-1 time steps
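[Figure: space-time diagram. Data items f, e, d, c, b, a enter stages P1, P2, P3, P4, ...; along the time axis each stage processes a, b, c, d, e, f in order, starting one time step after the stage before it.]
P = # of processors, N = # of data items (assume P < N)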
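As a rough check of the timing breakdown, the sketch below steps a pipeline one time unit at a time and counts the fill, steady-state, and drain steps. It assumes every stage takes exactly one time step per item and that P <= N; the function names are illustrative and do not come from the lecture.

```python
def busy_stages(t, n_items, n_stages):
    """Number of stages doing useful work at time step t.

    Assumption: stage s (0-based) works on item t - s at step t,
    provided that item exists.
    """
    return sum(1 for s in range(n_stages) if 0 <= t - s < n_items)


def phase_lengths(n_items, n_stages):
    """Return (fill, steady, drain) step counts for one pass of the pipeline."""
    fill = steady = drain = 0
    t = 0
    while busy_stages(t, n_items, n_stages) > 0:
        if busy_stages(t, n_items, n_stages) == n_stages:
            steady += 1          # every stage is busy
        elif steady == 0:
            fill += 1            # pipeline is still filling
        else:
            drain += 1           # pipeline is emptying
        t += 1
    return fill, steady, drain


if __name__ == "__main__":
    fill, steady, drain = phase_lengths(n_items=6, n_stages=5)
    print(fill, steady, drain, fill + steady + drain)
```

With N = 6 items and P = 5 stages the three phases come out as P-1, N-P+1, and P-1 steps, which sum to N+P-1.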
Programming Issues
- Algorithm will take N+P-1 steps to run, where N is the number of data items and P is the number of processors.
» Can also consider just the odd numbers, or do some initial part separately
- In the given implementation, there must be enough processors to store every prime that will appear in the sequence
» Not a scalable approach
» Can fix this by having each processor do the job of multiple primes, i.e. mapping several logical "processors" in the pipeline to each physical processor
» What is the impact of this on performance?
[Figure: pipeline of processors labeled by the prime each holds: P2, P3, P5, P7, P11, P13, ...]
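One way to explore the performance question is to count how many divisibility tests land on each physical processor. The sketch below assumes the algorithm under discussion is the usual pipelined prime sieve, in which each logical stage holds one prime and passes along only the numbers that prime does not divide. The round-robin mapping of logical stages to physical processors and the names `pipelined_sieve` and `n_physical` are hypothetical choices made for illustration.

```python
def pipelined_sieve(limit, n_physical):
    """Simulate a prime-sieve pipeline sequentially.

    Assumptions (not from the slides): logical stage i holds the i-th prime
    found, and logical stage i is mapped round-robin to physical processor
    i % n_physical.

    Returns (primes, work), where work[p] is the number of divisibility
    tests performed by physical processor p.
    """
    primes = []                  # primes[i] is the prime held by logical stage i
    work = [0] * n_physical
    for x in range(2, limit + 1):
        survived = True
        for stage, p in enumerate(primes):
            work[stage % n_physical] += 1
            if x % p == 0:       # stage filters x out of the stream
                survived = False
                break
        if survived:             # x passed every stage: it becomes a new stage
            primes.append(x)
    return primes, work


if __name__ == "__main__":
    primes, work = pipelined_sieve(limit=100, n_physical=4)
    print(primes)
    print(work)                  # per-processor load under round-robin mapping
```

The total number of tests is independent of the mapping; what changes is how evenly the tests are spread, since the early stages (small primes) see far more numbers than the late ones.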
More Programming Issues
- In a pipelined algorithm, data flows through the processors in lockstep; try to balance the work so that no processor becomes a bottleneck
- Processors have been developed to support this kind of parallel pipelined computation in hardware
» Two commercial products: Warp (1D array) and iWarp (components for 2D array)
- => Generalized view: Systolic Arrays
Systolic Arrays
- Systolic: a rhythmically recurrent contraction; especially: the contraction of the heart by which the blood is forced onward and the circulation kept up
- Warp and iWarp were examples of systolic arrays
» Data moved through pipelined computational units in a regular and rhythmic fashion
- Systolic arrays were meant to be special-purpose processors or co-processors and were very fine-grained
» Processors, usually called cells, implement a limited and very simple computation
» Communication is very fast; granularity is meant to be very fine (a small number of computational operations per communication)
» Very fast clock rates due to regular, synchronous structure
Example: Systolic Matrix Multiplication
- Problem: multiply two n x n matrices A = {a_ij} and B = {b_ij}. The product matrix is R = {r_ij}.
- Systolic solution uses a 2D array with n x n cells, 2 input streams and 2 output streams
Data Flow for Systolic MM
[Figure sequence: successive time steps of a 4 x 4 systolic array. Elements a_ij and b_ij stream into the array, and each r_ij accumulates in place. The number attached to each r_ij counts how many product terms it has accumulated so far (1 through 4). The first step computes the first term of r_11 from a_11 and b_11; the second step adds a_12 and b_21 into r_11 and starts r_21 and r_12. Finished entries appear without a count, beginning with r_11 and ending with r_44.]
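The data flow can be sketched as a cycle-by-cycle simulation. This is a minimal sketch under stated assumptions, not the lecture's implementation: cell (i, j) keeps r_ij in place, rows of A enter from the left with row i delayed by i steps, columns of B enter from the top with column j delayed by j steps, and on every step each cell multiplies the two values it holds and passes them on. The name `systolic_matmul` is illustrative.

```python
def systolic_matmul(A, B):
    """Multiply two n x n matrices by simulating an n x n systolic array."""
    n = len(A)
    R = [[0] * n for _ in range(n)]      # r_ij accumulates inside cell (i, j)
    a_reg = [[0] * n for _ in range(n)]  # a value currently held by each cell
    b_reg = [[0] * n for _ in range(n)]  # b value currently held by each cell

    for t in range(3 * n - 2):           # last term reaches cell (n-1, n-1) at t = 3n-3
        # Shift a values one cell to the right and b values one cell down.
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Feed skewed inputs at the left and top edges (zeros outside the data).
        for i in range(n):
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0
        for j in range(n):
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0
        # Every cell performs one multiply-accumulate in lockstep.
        for i in range(n):
            for j in range(n):
                R[i][j] += a_reg[i][j] * b_reg[i][j]
    return R


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(systolic_matmul(A, B))
```

Under these assumptions, at step t cell (i, j) sees a_ik and b_kj with k = t - i - j, so each cell receives exactly the n pairs it needs and the whole product takes 3n-2 steps.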
Workflow
- Directed Acyclic Graph of Tasks
- Each Computes Independently
- Edges indicate dependences
» Control or data
- Parallelism Arises from multiple Tasks being enabled
- Asynchronous Structure
- Coarse-grained Parallel
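The properties above can be illustrated with a small executor that starts a task as soon as all of its predecessors have finished. This is a minimal sketch using Python's standard `concurrent.futures` module; the function `run_workflow`, its arguments, and the four-task example are hypothetical and only meant to show how parallelism arises from several tasks being enabled at once.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_workflow(tasks, deps, max_workers=4):
    """Run a DAG of tasks asynchronously.

    tasks: name -> callable taking a dict {predecessor name: its result}
    deps:  name -> list of predecessor names (edges of the DAG)
    """
    results = {}
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    running = {}                                   # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # A task is enabled once none of its predecessors is outstanding.
            ready = [n for n, d in remaining.items() if not d]
            for n in ready:
                del remaining[n]
                inputs = {p: results[p] for p in deps.get(n, ())}
                running[pool.submit(tasks[n], inputs)] = n
            if not running:
                raise ValueError("no task is enabled: the graph has a cycle")
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                n = running.pop(fut)
                results[n] = fut.result()
                for d in remaining.values():
                    d.discard(n)                   # n no longer blocks anyone
    return results


if __name__ == "__main__":
    # Diamond-shaped example: b and c can run in parallel once a is done.
    tasks = {
        "a": lambda inp: 1,
        "b": lambda inp: inp["a"] + 1,
        "c": lambda inp: inp["a"] * 10,
        "d": lambda inp: inp["b"] + inp["c"],
    }
    deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
    print(run_workflow(tasks, deps)["d"])
```

There is no global clock here, in contrast to the pipelined and systolic models: each task runs whenever its inputs become available.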
Where do Workflows Arise?
» Dependent Jobs
» Run Payroll
- Runs vacation programs
- Runs sick leave programs
- Runs social security tax programs
- Runs income tax payment and withholding programs
- Runs parking and tuition payroll deduction
- Computes Paychecks
- Transfers funds to back the checks
- …
Scientific Workflow Applications
- GriPhyN Experiments
» Laser Interferometer Gravitational Wave Observatory (Caltech/UWM)
» ATLAS (U of Chicago)
» SDSS (Fermilab)
» CMS, many High Energy Physics applications
- National Virtual Observatory and NASA
» Montage
- Atmospheric Modeling
» MEAD/LEAD: Hurricane Track Prediction
- Neuroscience
» Tomography for Telescience (SDSC, NIH-funded)
- … and many more …
Non-GriPhyN applications using Pegasus
- Galaxy Morphology (National Virtual Observatory)
- Investigates the dynamical state of galaxy clusters
- Explores galaxy evolution inside the context of large-scale structure.
- Uses galaxy morphologies as a probe of the star formation and stellar distribution history of the galaxies inside the clusters.
- Data-intensive computations involving hundreds of galaxies in a cluster
[Figure caption: The X-ray emission is shown in blue, and the optical emission is in red. The colored dots are located at the positions of the galaxies within the cluster; the dot color represents the value of the asymmetry index. Blue dots represent the most asymmetric galaxies and are scattered throughout the image, while orange dots, which mark the most symmetric galaxies and are indicative of ellipticals, are concentrated more toward the center.]
Southern California Earthquake Center
The SCEC/IT project, funded by the NSF, is developing a new framework for physics-based simulations for seismic hazard analysis, building on several information technology areas, including knowledge representation and reasoning, knowledge acquisition, grid computing, and digital libraries.
People involved: Vipin Gupta, Phil Maechling (USC)
Montage
- Montage (NASA and NVO)
» Deliver science-grade custom mosaics on demand
» Produce mosaics from a wide range of data sources (possibly in different spectra)
» User-specified parameters of projection, coordinates, size, rotation and spatial sampling
- Bruce Berriman, John Good, Anastasia Laity, Caltech/IPAC
- Joseph C. Jacob, Daniel S. Katz, JPL
- Large runs: 6 and 10 degree DAGs (for the M16 cluster).
- The 6 degree runs had about 13, compute jobs and the 10 degree run had about 40,000 compute jobs
[Figure caption: Mosaic created by Pegasus-based Montage from a run of the M101 galaxy images on the TeraGrid.]
Montage Workflow
[Figure: Montage workflow for three input images 1, 2, 3. Each image goes through mProject 1, mProject 2, mProject 3. mDiff 1 2 and mDiff 2 3 produce difference images D12 and D23. mFitplane D12 and mFitplane D23 fit planes ax + by + c = 0 and dx + ey + f = 0. mBgModel derives per-image corrections a1 x + b1 y + c1 = 0, a2 x + b2 y + c2 = 0, a3 x + b3 y + c3 = 0. mBackground 1, mBackground 2, mBackground 3 apply the corrections, and mAdd produces the Final Mosaic.]
[Figure: A small Montage workflow, 1202 nodes. Legend: data stage-in nodes, Montage compute nodes, data stage-out nodes, inter-pool transfer nodes.]
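To see how much parallelism the three-image workflow exposes, the task names in the figure (mProject, mDiff, mFitplane, mBgModel, mBackground, mAdd) can be written down as a dependency graph and grouped into batches of simultaneously enabled tasks. The edge list below is an assumption read off the figure, in particular that each mBackground depends on both its mProject output and mBgModel. The grouping uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# task -> set of tasks it depends on (edges assumed from the figure)
MONTAGE = {
    "mProject1": set(),
    "mProject2": set(),
    "mProject3": set(),
    "mDiff12": {"mProject1", "mProject2"},
    "mDiff23": {"mProject2", "mProject3"},
    "mFitplane12": {"mDiff12"},
    "mFitplane23": {"mDiff23"},
    "mBgModel": {"mFitplane12", "mFitplane23"},
    "mBackground1": {"mProject1", "mBgModel"},
    "mBackground2": {"mProject2", "mBgModel"},
    "mBackground3": {"mProject3", "mBgModel"},
    "mAdd": {"mBackground1", "mBackground2", "mBackground3"},
}


def enabled_batches(graph):
    """Group tasks into batches; all tasks in a batch can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()                      # raises CycleError if not a DAG
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)               # pretend the whole batch finished
    return batches


if __name__ == "__main__":
    for step, batch in enumerate(enabled_batches(MONTAGE), start=1):
        print(step, batch)
```

Under these assumed edges the 12 tasks fall into 6 batches, so the available parallelism is modest for three images and grows with the number of input images.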