



Material Type: Assignment; Professor: Carothers; Class: PARALLEL PROGRAMMING; Subject: Computer Science; University: Rensselaer Polytechnic Institute; Term: Spring 2009;




For this assignment you will create a timing test program for MPI. In particular, you will write a series of functions, each of which determines the MAX average, MIN average and AVERAGE average execution time, in microseconds, of a particular MPI routine across all the MPI tasks, where each MPI task has exercised that MPI call in a loop L times. The following routines (or pairs of routines) are the ones whose performance you must measure for this assignment.
2 Timing Code
The following is a routine that you would include as “rdtsc.h”. RDTSC stands for “read time-stamp counter”; it is an x86 assembly language instruction (PowerPC offers a comparable time-base register) that returns a 64-bit number, the count of cycles the machine has executed since it was last booted. The C code for this is:
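#ifndef RDTSC_H_DEFINED
#define RDTSC_H_DEFINED

#if defined(__i386__)

static inline unsigned long long rdtsc(void)
{
  unsigned long long int x;
  asm volatile (".byte 0x0f, 0x31" : "=A" (x));
  return x;
}

#elif defined(__x86_64__)

/* The body of this branch is missing from this copy. What follows is the
   commonly used 64-bit form, given as an assumption, not the original text. */
static inline unsigned long long rdtsc(void)
{
  unsigned hi, lo;
  asm volatile ("rdtsc" : "=a" (lo), "=d" (hi));
  return ((unsigned long long)lo) | (((unsigned long long)hi) << 32);
}

#endif
#endif /* RDTSC_H_DEFINED */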
Use the rdtsc function above to do things like:
unsigned long long start_time = 0;
unsigned long long finish_time = 0;
unsigned long long total_time = 0;
int i;
start_time = rdtsc();
for( i = 0; i < MAX_WHATEVER; i++ ) { /* DO TEST */ }
finish_time = rdtsc();
total_time = finish_time - start_time;
Note, I’ll place a copy of this in rdtsc.h on the class website for you to download.
3 HAND-IN INSTRUCTIONS
Using the CS cluster, you will need to run your tests over the following configurations. Note, the CPU speed of the CS cluster Opteron processors is 2.0 GHz. To translate cycle counts into microseconds, divide the number of cycles by 2000.0 (2.0 GHz is 2000 cycles per microsecond). However, don’t convert to microseconds until the performance test is complete, as the overhead of the floating-point division operations would perturb your results.
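As a minimal sketch of deferring the conversion, the helper below takes the raw cycle total from a finished loop and only then does the floating-point work. The names avg_usec_per_call and CYCLES_PER_USEC are illustrative, not part of the assignment, and the constant assumes the 2.0 GHz clock stated above.

```c
#include <assert.h>

/* Assumption: 2.0 GHz clock, i.e. 2000 cycles per microsecond. */
#define CYCLES_PER_USEC 2000.0

/* Hypothetical helper: average time of one call in microseconds,
   given the cycle count for the whole loop and the number of
   iterations. Call it only after the timed loop has finished. */
static double avg_usec_per_call(unsigned long long total_cycles,
                                unsigned long long iterations)
{
    assert(iterations > 0);  /* guard against division by zero */
    return ((double)total_cycles / (double)iterations) / CYCLES_PER_USEC;
}
```

For instance, a loop of 1000 calls that took 2,000,000 cycles in total averages 2000 cycles per call, which this helper reports as 1.0 microsecond.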
Tabulate this data using LaTeX or MS Word. Attach a printed version of your code, place a copy in your account on my office machine, and let me know where I can find it.
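The three figures to tabulate for each routine (MAX, MIN and AVERAGE of the per-task averages) could be derived along the following lines. This is only a sketch: it assumes the per-task averages have already been collected into an array on one task (for example with MPI_Gather), and the names time_summary and summarize are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the three figures reported per routine. */
struct time_summary {
    double max_avg;  /* largest per-task average  */
    double min_avg;  /* smallest per-task average */
    double avg_avg;  /* mean of per-task averages */
};

/* per_task_avg[i] holds the average measured by task i
   (assumed to have been gathered on the calling task beforehand). */
static struct time_summary summarize(const double *per_task_avg,
                                     size_t ntasks)
{
    struct time_summary s;
    double sum = 0.0;
    size_t i;

    assert(per_task_avg != NULL && ntasks > 0);

    s.max_avg = per_task_avg[0];
    s.min_avg = per_task_avg[0];
    for (i = 0; i < ntasks; i++) {
        if (per_task_avg[i] > s.max_avg) s.max_avg = per_task_avg[i];
        if (per_task_avg[i] < s.min_avg) s.min_avg = per_task_avg[i];
        sum += per_task_avg[i];
    }
    s.avg_avg = sum / (double)ntasks;
    return s;
}
```

An alternative that avoids gathering is three MPI_Reduce calls with the MPI_MAX, MPI_MIN and MPI_SUM operations, dividing the sum by the number of tasks on the root.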
Note, you’ll need to modify your path to get MPI to work correctly, using the following bash command line:
export PATH=/cs/chrisc/MPI/mpich-1.2.7p1/bin:$PATH