















































An overview of the Message Passing Interface (MPI) for parallel computing, including its salient features, data types, and communication functions. It covers point-to-point communication, collective communication, user-defined datatypes, and virtual topologies, with examples and explanations of various MPI concepts.
















































MPI Tutorial
High Performance Computing
– Introduction
– Hardware models
– Software models
– Parallelization strategies
Message Passing Interface - MPI
– Point-to-point communication
– Collective communication
– Communicators
– Datatypes
– Topologies
– Inter-communicators
– Profiling
Computing that stretches the ability of whatever class of machine (or conglomerate of machines) in terms of
– floating-point operations
– number of instructions per second
– memory
– input/output
– algorithms and data structures
– system software
– software development environment
What is considered high performance changes over time, as systems get faster, more capable, and cheaper.
Need to solve a problem faster
Need to solve a previously intractable problem
Need to solve a “larger” problem
– More refined
– Higher dimensionality
– More complex geometry or physics
– Larger problem domain
Growing complexity of industrial systems
Growing demand for high utilization of resources and low pollution
Greater demand for high economic agility and cost-effectiveness
HPC software is more expensive to develop, maintain, and use
HPC software for your problem is not available or not validated
Legacy solutions are still valuable even if old: they solve the problem, albeit slowly
A “solution” in a field, whether slow or fast or possibly even correct, produces the “accepted answer”
The coordination of resources is easy to manage
The demanding aspect of the application is the human-intensive part, not the computational part, once the problem is set up
A metric that states that sequential bottlenecks pose fundamental limitations on speedup, stated as follows: if S is the fraction of an algorithm that is serial and 1-S the fraction that can be parallelized, then the speedup that can be achieved using P processors is 1/(S + (1-S)/P), which has a limiting value of 1/S for an infinite number of processors.
What actually limits speedup below the upper limit governed by the sequential fraction
– the sequential fraction may be a function of P
– communication in the system has finite, often significant cost
– load imbalance
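The speedup bound 1/(S + (1-S)/P) stated above (commonly known as Amdahl's law) can be explored numerically. The following is a minimal sketch in C; the function names amdahl_speedup and print_amdahl_table are hypothetical helpers chosen for this illustration, not part of the tutorial.

```c
#include <assert.h>
#include <stdio.h>

/* Upper bound on speedup for serial fraction s (0 < s <= 1)
   on p processors: 1 / (s + (1 - s) / p). */
double amdahl_speedup(double s, int p)
{
    assert(s > 0.0 && s <= 1.0 && p >= 1);
    return 1.0 / (s + (1.0 - s) / (double)p);
}

/* Print the bound for a few processor counts, then the limit 1/s. */
void print_amdahl_table(double s)
{
    int p;
    for (p = 1; p <= 1024; p *= 4)
        printf("P = %4d  speedup <= %.2f\n", p, amdahl_speedup(s, p));
    printf("limit as P grows: %.2f\n", 1.0 / s);
}
```

Calling print_amdahl_table(0.1) from a main function shows the bound climbing toward, but never reaching, 10: a program that is 10% serial cannot run more than ten times faster however many processors are added.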
Sustained Performance
– the actual performance in flops that an application/benchmark achieves
Turn-around time (Time-to-Solution)
– a measure of the latency of an entire application once started
– a summation of the CPU, System, and I/O time of an application
Capability
– A mode of operation in which a single hard problem is solved
Capacity, Throughput
– A mode of operation in which the number of problems solved per time is optimized
sockets, IPC, shmem)
variables)
nodes
nodes (differences in computer hardware, network, operating system)
Flynn's taxonomy of parallel hardware
– SISD - Single Instruction Single Data
– SIMD - Single Instruction Multiple Data
– MISD - Multiple Instruction Single Data
– MIMD - Multiple Instruction Multiple Data
Shared memory (MIMD)
– e.g., SGI Power Challenge, Sun HPC 10000
Distributed memory (MIMD)
– e.g., Intel Paragon, IBM SP, Network of Workstations
Distributed shared memory (MIMD)
– e.g., HP/Convex Exemplar, SGI Origin 2000
– memory is physically distributed but logically shared
Vector SMP (SIMD/MIMD)
– e.g., Cray C90, NEC SX-4, NEC Earth Simulator
Also known as Uniform Memory Access machines (UMAs) or tightly coupled systems
[Figure: several processors connected through a shared bus to a common memory]
[Figure: processors, each with its own memory, connected by a network]
the same memory location
[Figure: Sequential Program; Parallel Compiler; Serial Compiler; Preprocessor; Add function calls; Explicit message-passing]
C/C++/Fortran with threads and runtime support
– compiler switches and directives (OpenMP)
– good for local SMP programming
High Performance Fortran (HPF)
– good for data parallel models with regular data relationships
– your tests show efficacy of HPF on algorithms
– specific array syntax is closely relevant to algorithms
C/C++/Fortran plus MPI
– for irregular or dynamic data relationships
– each node programmed sequentially
– no parallel compiler looks over whole code (separate compilation)
A collection of related algorithms that solve the same problem
Each member of the collection is fastest for a subset of the problem domain
The problem domain is described by
– concurrency
– problem size
– memory requirements
– emphasis on space or speed
Relative speed changes when the poly-algorithm is ported
Discovering which to use in a given situation a priori is the current research challenge
MIMD machines with data-dependent workloads lead in many situations to unbalanced loads, even for “regular algorithms”
Static load balancing
– choice of data distributions
Dynamic load balancing
– reorganization of data distributions
– retasking of processing units when they finish early
Task Migration
– moving both code and data when appropriate across a system
– seek to use unused cycles
– seek to escape from busy machines
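To make "choice of data distributions" concrete, the sketch below compares two static distributions of n items over p processes: contiguous blocks and round-robin (cyclic) dealing. The function names and the workload model (item i costs i + 1 units) are assumptions made for this illustration only.

```c
#include <assert.h>

/* Owner of item i when n items are split into p contiguous blocks
   (the first n % p processes get one extra item). */
int block_owner(int i, int n, int p)
{
    int base = n / p, extra = n % p;
    int cutoff = extra * (base + 1);   /* items held by the larger blocks */
    assert(p > 0 && i >= 0 && i < n);
    if (i < cutoff)
        return i / (base + 1);
    return extra + (i - cutoff) / base;
}

/* Owner of item i when items are dealt out round-robin. */
int cyclic_owner(int i, int p)
{
    assert(p > 0 && i >= 0);
    return i % p;
}

/* Total work assigned to process `rank` when item i costs i + 1 units
   (an assumed, deliberately uneven workload). A nonzero use_cyclic
   selects the cyclic distribution, zero selects the block one. */
long work_on(int rank, int n, int p, int use_cyclic)
{
    long total = 0;
    int i;
    for (i = 0; i < n; i++) {
        int owner = use_cyclic ? cyclic_owner(i, p) : block_owner(i, n, p);
        if (owner == rank)
            total += i + 1;
    }
    return total;
}
```

With 8 items on 2 processes, the block distribution assigns 10 and 26 units of work, while the cyclic distribution assigns 16 and 20. The same data and the same algorithm end up with a noticeably different load balance purely through the choice of distribution.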
sequential system time divided by a parallel system time.
compared to the ideal (Speedup / number of processors)
concurrency increases, in order to skirt Amdahl’s law, and measures constant problem size per processor, or else constant memory use
be disjoint
data
blocked
block-scattered
messages
A message-passing library specification
– Message-passing model
– Not a compiler specification
– Not a specific product
For parallel computers, clusters, and heterogeneous networks
Designed to aid the development of portable parallel software libraries
Designed to provide access to advanced parallel hardware for
– End users
– Library writers
– Tool developers
platforms
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);      /* start up MPI */
    printf("Hello World!\n");
    MPI_Finalize();              /* shut down MPI */
    return 0;
}

      program main
      include 'mpif.h'
      integer ierr
      call MPI_INIT(ierr)
      print *, 'Hello world!'
      call MPI_FINALIZE(ierr)
      end
MPI_INIT ( )
Initializes the MPI environment. This function must be called, and it must be the first MPI function called in a program (exception: [...])
Syntax
    int MPI_Init ( int *argc, char ***argv )

    MPI_INIT ( IERROR )
    INTEGER IERROR

MPI_FINALIZE ( )
Cleans up all MPI state. Once this routine has been called, no MPI routine (even [...]) may be called
Syntax
    int MPI_Finalize ( );

    MPI_FINALIZE ( IERROR )
    INTEGER IERROR
MPI_INIT: The C version accepts the argc and argv variables that are provided as arguments to main ( )
Error codes: Almost all MPI Fortran subroutines have an integer return code as their last argument. Almost all C functions return an integer error code
Types: Opaque objects are given type names in C. Opaque objects are usually of type INTEGER in Fortran (exception: binary-valued variables are of type LOGICAL)
Inter-language interoperability is not guaranteed
Communication in MPI takes place with respect to communicators
[...] is one such predefined communicator (something of type [...]) and contains group and context information
[...] and [...] return information based on the communicator passed in as the first argument
Processes may belong to many different communicators
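As an illustration of querying a communicator, here is a minimal sketch in C. It assumes the predefined communicator MPI_COMM_WORLD and the standard calls MPI_Comm_rank and MPI_Comm_size; the choice of these particular names is made for this example, since the text above does not name them legibly.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);

    /* Both queries take the communicator as their first argument. */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id, 0..size-1 */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes in the communicator */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```

Every process runs the same program and prints its own rank, so the lines may appear in any order. How the program is launched (for example through mpirun, mpiexec, or a batch system) depends on the installation.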
Login
– ssh everest00.cis.uab.edu (you must be logged into a CIS machine; otherwise you have to log in to moat.cis.uab.edu first)
Compile
– mpicc -o program program.c
Submit
– qsub myscript.sge
Monitor
– qstat -u
See User Guide for more details
– http://www.cis.uab.edu/cs541/homework/instructions.pdf
Basic message passing process
Questions
– To whom is data sent?
– Where is the data?
– What type of data is sent?
– How much data is sent?
– How does the receiver identify it?
[Figure: Process 0 sends data A; Process 1 receives it as B]
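The questions above correspond closely to the arguments of a blocking send and receive. The sketch below is one possible illustration in C using the standard MPI_Send and MPI_Recv calls; the buffer names a and b, the tag value 99, and the message length are arbitrary choices made for this example.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    int tag = 99;                          /* arbitrary label used to match the message */
    double a[4] = {1.0, 2.0, 3.0, 4.0};    /* data on the sending side */
    double b[4] = {0.0, 0.0, 0.0, 0.0};    /* space on the receiving side */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        /* where: a; how much: 4; what type: MPI_DOUBLE; to whom: rank 1 */
        MPI_Send(a, 4, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* the receiver identifies the message by source, tag, and communicator */
        MPI_Recv(b, 4, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status);
        printf("process 1 received %.1f %.1f %.1f %.1f\n",
               b[0], b[1], b[2], b[3]);
    }

    MPI_Finalize();
    return 0;
}
```

Processes other than 0 and 1 take neither branch and simply finalize, so the program also works when started with more than two processes.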
Specified in MPI by starting address, count, and datatype, where datatype is as follows:
– Elementary (all C and Fortran datatypes)
– Contiguous array of datatypes
– Strided blocks of datatypes
– Indexed array of blocks of datatypes
– General structure
Datatypes are constructed recursively
Specifying application-oriented layout of data allows maximal use of special hardware
Elimination of length in favor of count is clearer
– Traditional: send 20 bytes
– MPI: send 5 integers
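As a sketch of the "strided blocks" case, the program below describes one column of a row-major 5 x 4 integer matrix with MPI_Type_vector and sends it as a single element of that derived type; the receiver takes it as 5 contiguous integers, so both sides speak in counts and datatypes rather than bytes. The matrix size, the column index, and the variable names are assumptions made for this illustration.

```c
#include <stdio.h>
#include <mpi.h>

#define ROWS 5
#define COLS 4

int main(int argc, char **argv)
{
    int rank, size, i, j;
    int m[ROWS][COLS];
    int col[ROWS];
    MPI_Datatype column;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    /* ROWS blocks of 1 int each; successive blocks start COLS ints apart. */
    MPI_Type_vector(ROWS, 1, COLS, MPI_INT, &column);
    MPI_Type_commit(&column);

    if (rank == 0) {
        for (i = 0; i < ROWS; i++)
            for (j = 0; j < COLS; j++)
                m[i][j] = 10 * i + j;
        /* one element of type `column`, starting at m[0][2]: column 2 */
        MPI_Send(&m[0][2], 1, column, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* the same 5 integers arrive, stored contiguously */
        MPI_Recv(col, ROWS, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        for (i = 0; i < ROWS; i++)
            printf("col[%d] = %d\n", i, col[i]);
    }

    MPI_Type_free(&column);
    MPI_Finalize();
    return 0;
}
```

The send and receive use different datatypes but carry the same sequence of five MPI_INT values, which is what MPI requires for the two to match.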
MPI datatype          C datatype
MPI_CHAR              signed char
MPI_SHORT             signed short int
MPI_INT               signed int
MPI_LONG              signed long int
MPI_UNSIGNED_CHAR     unsigned char
MPI_UNSIGNED_SHORT    unsigned short int
MPI_UNSIGNED_LONG     unsigned long int
MPI_UNSIGNED          unsigned int
MPI_FLOAT             float
MPI_DOUBLE            double
MPI_LONG_DOUBLE       long double
MPI_BYTE
MPI_PACKED