
CMSC 714

Lecture 5

MPI vs. OpenMP and Titanium

Alan Sussman

CMSC 714, Fall05 - Alan Sussman & Jeffrey K. Hollingsworth

Notes

• First programming assignment coming soon
  – Slight change from original plan: you’ll write one program, first using either OpenMP or MPI, then the other
• Lorin Hochstein will talk at the end of class today on his study of all of you writing parallel programs
• First, questions on OpenMP and UPC
  – Directives vs. language extensions


OpenMP + MPI

• Some applications can take advantage of both message passing and threads
  – The question is how to obtain the best overall performance without too much programming difficulty
  – Choices are all MPI, all OpenMP, or both
    • For both, a common option is an outer loop parallelized with message passing and an inner loop with directives to generate threads

• Applications studied:
  – Hydrology – CGWAVE
  – Computational chemistry – GAMESS
  – Linear algebra – matrix multiplication and QR factorization
  – Seismic processing – SPECseis95
  – Computational fluid dynamics – TLNS3D
  – Computational physics – CRETIN
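
The "outer loop with message passing, inner loop with directives" option can be sketched as follows. This is a minimal illustration, not code from any of the applications listed. The helper names `block_range` and `local_sum` are hypothetical, and `rank` and `nprocs` are passed as plain integers so the sketch compiles without an MPI installation. In a real hybrid code they would come from `MPI_Comm_rank` and `MPI_Comm_size`, and the per-process results would be combined with a call such as `MPI_Reduce`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: half-open range [*lo, *hi) of outer-loop iterations
 * owned by process `rank` out of `nprocs`, using a block distribution.
 * The first (n_outer % nprocs) processes get one extra iteration. */
void block_range(int n_outer, int rank, int nprocs, int *lo, int *hi)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs && n_outer >= 0);
    int base = n_outer / nprocs;
    int extra = n_outer % nprocs;
    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}

/* Work done by ONE process on an n_outer x n_inner row-major array:
 * the outer loop covers only this process's block (message-passing level),
 * and the inner loop is shared among threads by an OpenMP directive. */
double local_sum(const double *a, int n_outer, int n_inner,
                 int rank, int nprocs)
{
    int lo, hi;
    block_range(n_outer, rank, nprocs, &lo, &hi);

    double sum = 0.0;
    for (int i = lo; i < hi; i++) {
        double row = 0.0;
        /* Ignored (loop runs serially) if compiled without OpenMP support. */
        #pragma omp parallel for reduction(+:row)
        for (int j = 0; j < n_inner; j++)
            row += a[(size_t)i * n_inner + j];
        sum += row;
    }
    return sum;
}
```

Each process would call `local_sum` with its own rank, so the sum over all ranks covers every outer iteration exactly once.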


Types of parallelism in the codes

• For message passing parallelism (MPI)
  – Parametric – coarse-grained outer loop, essentially task parallel
  – Structured domains – domain decomposition with local operations – structured and unstructured grids
  – Direct solvers – linear algebra, lots of communication and load balancing required – message passing works well for large systems of equations
• Shared memory parallelism (OpenMP)
  – Statically scheduled parallel loops – one large, or several smaller loops, non-nested parallel
  – Parallel regions – merge loops into one parallel region to reduce overhead of directives
  – Dynamic load balanced – when static scheduling leads to load imbalance from irregular task sizes
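
The three OpenMP styles above can be illustrated with a short sketch. The function names and the chunk size of 4 are assumptions chosen for the example, not taken from the studied codes. The first function merges two statically scheduled loops into one parallel region, so threads are started once rather than twice. The second uses `schedule(dynamic, 4)` for a loop whose iterations have irregular cost.

```c
#include <assert.h>

/* Two statically scheduled loops merged into ONE parallel region.
 * The implicit barrier at the end of the first "omp for" guarantees
 * that x is fully updated before any thread reads it in the second loop. */
void scale_then_shift(double *x, double *y, int n, double s, double t)
{
    assert(n >= 0);
    #pragma omp parallel
    {
        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++)
            x[i] *= s;

        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++)
            y[i] = x[i] + t;
    }
}

/* Irregular task sizes: iteration i does i+1 units of work, so a static
 * split would give the thread holding the last block the most work.
 * schedule(dynamic, 4) hands out chunks of 4 iterations on demand. */
long triangular_work(int n)
{
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; i++) {
        long local = 0;
        for (int j = 0; j <= i; j++)
            local += j;
        total += local;
    }
    return total;
}
```

Dynamic scheduling adds some runtime overhead per chunk, which is why the static form is usually preferred when iteration costs are uniform.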


CGWAVE

• Finite elements: MPI parameter space evaluation at the outer loop, OpenMP sparse linear equation solver in the inner loops
• Speedup from 2 levels of parallelism makes it possible to model larger bodies of water in a reasonable amount of time
• Master-worker strategy for dynamic load balancing in the MPI part/component
• The solver for each component solves a large sparse linear system, parallelized with OpenMP
• On SGI Origin 2000 (distributed shared memory machine), use the first touch rule to migrate data for each component to the processor that uses it
• Performance results show that the best performance is obtained using both MPI and OpenMP, with a combination of MPI workers and OpenMP threads that depends on the problem/grid size
  – And for load balancing, far fewer MPI workers than components
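
The first touch rule mentioned above is commonly exploited by initializing data in parallel with the same loop schedule that the compute loops use later. The sketch below assumes an operating system that places a memory page near the thread that first writes it. Whether that holds is system dependent, and the function names are hypothetical rather than taken from CGWAVE.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a vector and zero it in parallel. Under a first-touch placement
 * policy (an assumption about the OS), each page ends up in memory close to
 * the thread that performs this first write. Returns NULL on failure. */
double *alloc_first_touch(int n)
{
    assert(n > 0);
    double *v = malloc((size_t)n * sizeof *v);
    if (v == NULL)
        return NULL;
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        v[i] = 0.0;
    return v;
}

/* Compute loop with the SAME static schedule, so each thread mostly
 * works on the elements it touched first in alloc_first_touch. */
void axpy(double *y, const double *x, double alpha, int n)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```

Initializing the array serially instead would place every page near one processor, and all other threads would then pay remote-memory costs in the compute loop.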


Notes

• First programming assignment is on web page
• Not here on Tuesday
• Need volunteers to present papers

GAMESS

• Computational chemistry – molecular dynamics – MPI across cluster, OpenMP within each node
• Built on top of Global Arrays package, for distributed array operations
• Linear algebra solvers mainly use OpenMP for dynamic scheduling and load balancing
• MPI versions of parts of the code are complex, but can provide higher performance for large problems
• Performance results on a “medium” sized problem from SPEC (Standard Performance Evaluation Corp.) are for a small system (four 8-processor Alpha nodes) connected by Memory Channel

Linear algebra

• Hybrid parallelism with MPI for scalability and OpenMP for load balancing, for matrix multiplication (MM) and QR factorization
• On IBM SP system with multiple 4-processor nodes
• Studies tradeoffs of hybrid approach for linear algebra algorithms vs. only using MPI (running 4 MPI processes per node)
• Use OpenMP for load balancing and decreasing communication costs within a node
• Also helps to hide communication latency behind other operations, which is important for overall performance
• QR factorization results on “medium” sized matrices show that adaptive load balancing is better than dynamic loop scheduling within a node

SPECseis95

• For gas and oil exploration
• Original message passing version (in PVM) is SPMD; the OpenMP version starts serial, then starts an SPMD parallel section

• Code scales equally well for PVM and OpenMP on SGI Power Challenge (a shared memory machine)
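
An OpenMP program that "starts serial then starts an SPMD parallel section" might look like the sketch below. It is an illustration of the style only, not SPECseis95 code, and `spmd_sum` is a hypothetical name. Inside the parallel region each thread derives its own block of the data from its thread id, much as a message-passing process would from its rank. The `_OPENMP` guard lets the code fall back to a single thread when compiled without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum an array in SPMD style: one parallel region, with each thread
 * choosing its own block [lo, hi) from its thread id. */
double spmd_sum(const double *a, int n)
{
    assert(n >= 0);
    double total = 0.0;              /* serial part: runs on one thread */

    #pragma omp parallel
    {
        int tid = 0, nthreads = 1;   /* fallback without OpenMP */
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        /* Block bounds computed from the thread id, like a rank. */
        int lo = (int)((long long)n * tid / nthreads);
        int hi = (int)((long long)n * (tid + 1) / nthreads);

        double part = 0.0;
        for (int i = lo; i < hi; i++)
            part += a[i];

        /* Combine partial results one thread at a time. */
        #pragma omp critical
        total += part;
    }
    return total;
}
```

Compared with a plain `parallel for`, this style keeps the data decomposition explicit, which can make a port from an existing SPMD message-passing code more direct.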

CRETIN

• Physics application with multiple levels of message passing and thread parallelism
• Ported onto both a distributed memory system (1464 4-processor nodes) and a DSM (large SGI Origin 2000)
• Complex structure, with 2 parts discussed
• No performance results