Assignment 1: Message Passing Interface | CSCI 4320, Assignments of Computer Science

Material Type: Assignment; Professor: Carothers; Class: PARALLEL PROGRAMMING; Subject: Computer Science; University: Rensselaer Polytechnic Institute; Term: Spring 2009;

CSCI-4320/6340 Assignment 1: Message Passing Interface
Christopher D. Carothers
Department of Computer Science
Rensselaer Polytechnic Institute
110 8th Street
Troy, New York U.S.A. 12180-3590
January 23, 2009
DUE DATE: 12 p.m. (noon), Friday, February 6th

1 Description

For this assignment you will create a timing test program for MPI. In particular, you will write a series of functions, each of which determines the MAX average, MIN average and AVERAGE average execution time in microseconds of a particular MPI routine across all the MPI tasks, where each MPI task has exercised that MPI call in a loop L times. The following routines (or pairs of routines) are the ones whose performance you must measure for this assignment.

  • MPI_Send and MPI_Recv: Here, you will pair up the MPI tasks according to the following pattern: Rank 0 pairs with Rank N − 1, Rank 1 pairs with Rank N − 2, Rank 2 pairs with Rank N − 3, and so on until Rank N/2 − 1 pairs with Rank N/2. In a loop, each task in a pair will first send a message of size M, where M is an input parameter, and then receive the message that the other task just sent. The loop will iterate L times (where L is also an input parameter). You will time the whole loop, compute the local average time, and then determine across all processors the MAX average, MIN average and AVERAGE average cycle times. I will give you the cycle timer code for an x86 system. The MAX, MIN and AVERAGE can be computed with MPI_Allreduce.
  • MPI_Isend and MPI_Irecv: Do the same as you did for the previous MPI_Send and MPI_Recv test.
  • MPI_Scan: Compute the prefix sum over the ranks of all MPI tasks. Do this L times in a loop as before, and then use MPI_Allreduce to compute the MAX average, MIN average and AVERAGE average cycle times across all MPI tasks.
  • MPI_Gather: Here, each task will send its rank number, which is then gathered (received) by the target MPI task 0 (ZERO) into an array of ranks. Do this L times and compute the statistics as before.
  • MPI_Scatter: Here, the source MPI task 0 (ZERO) will redistribute the previously gathered rank values to the tasks that sent them. As before, do this L times and compute the statistics.
  • MPI_Barrier: Here, have each task enter the barrier L times in the loop, then compute the usual statistics.
  • MPI_Allreduce: Here, generate a random floating-point value (double) using drand48(). Use MPI_Allreduce to determine the MIN random value among all MPI tasks. Do this L times and compute the statistics.

2 Timing Code

The following is a routine that you would include as “rdtsc.h”. RDTSC stands for “read time-stamp counter”; it is an x86 assembly language instruction (PowerPC offers a comparable time-base register) that returns a 64-bit number giving the number of cycles the machine has executed since it was last booted. The C code for this is:

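#ifndef RDTSC_H_DEFINED
#define RDTSC_H_DEFINED

#if defined(__i386__)

/* 32-bit x86: bytes 0x0f 0x31 encode RDTSC; "=A" reads the result from EDX:EAX. */
static inline unsigned long long rdtsc(void)
{
    unsigned long long int x;
    __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
    return x;
}

#elif defined(__x86_64__)

/* The body of this branch is missing from this copy of the handout.
   What follows is a commonly used form, given as an assumption; it may
   differ from the rdtsc.h posted on the class website. */
static inline unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long)lo) | (((unsigned long long)hi) << 32);
}

#endif

#endif /* RDTSC_H_DEFINED */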

Use the rdtsc() function above to do things like:

unsigned long long start_time = 0;
unsigned long long finish_time = 0;
unsigned long long total_time = 0;
int i;

start_time = rdtsc();

for( i = 0; i < MAX_WHATEVER; i++ )
{
    /* DO TEST */
}

finish_time = rdtsc();

total_time = finish_time - start_time;

Note, I’ll place a copy of this in rdtsc.h on the class website for you to download.

3 HAND-IN INSTRUCTIONS

Using the CS cluster, you will need to run your tests over the following configurations. Note that the CPU speed of the CS cluster Opteron processors is 2.0 GHz, i.e. 2000 cycles per microsecond, so to translate cycle counts into microseconds you divide the number of cycles by 2000.0. However, don’t convert to microseconds until the performance test is complete, as you would otherwise perturb your results with the overhead of the floating-point division operations.

  • Number of Processors: 2, 4, 8 processors. If the Blue Gene accounts are available, then run your test up to 128 processors.
  • L Loop Iterations: 1024, 8192, 65536
  • M message sizes: 4, 16, 64, 256, 1024, 4096 bytes, for the MPI_Send/MPI_Recv and MPI_Isend/MPI_Irecv routines only.

Tabulate this data in a table using LaTeX or MS Word. Attach a printed version of your code, place a copy in your account on my office machine, and let me know where I can find it.

Note, you’ll need to modify your path to get MPI to work correctly, using the following bash command line:

export PATH=/cs/chrisc/MPI/mpich-1.2.7p1/bin:$PATH