



Material Type: Assignment; Professor: Carothers; Class: PARALLEL PROGRAMMING; Subject: Computer Science; University: Rensselaer Polytechnic Institute; Term: Spring 2009;
Typology: Assignments




For this assignment you will create a Pthread program that sends messages between processors using shared memory on the Intel Medium Memory SMP (ims01) with up to 64 processors, and compare the performance of your implementation to an equivalent MPI program that executes on the Blue Gene/L supercomputer located in the CCNI for the same processor counts. The specifics are as follows:
is 163840, then you can delete that message from the system. Otherwise, randomly pick another processor/thread as a destination and schedule the message for that processor/thread. When all messages have been removed from the system – that is, all threads have no more work
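The forwarding rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the required design: it assumes the 163840 threshold refers to a per-message hop count, and the names `message_t`, `inbox_t`, `pick_destination`, `handle_message` and the small `next_rand` generator are all hypothetical. Each thread owns a mutex-protected inbox, and a message is either retired or pushed onto a randomly chosen other thread's inbox.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_HOPS 163840ULL            /* assumed meaning of the 163840 threshold */

typedef struct message {
    unsigned long long hops;          /* hypothetical field: forwards so far */
    struct message *next;
} message_t;

typedef struct {
    pthread_mutex_t lock;             /* guards head */
    message_t *head;                  /* LIFO list; order does not matter here */
} inbox_t;

void inbox_init(inbox_t *q) {
    pthread_mutex_init(&q->lock, NULL);
    q->head = NULL;
}

void inbox_push(inbox_t *q, message_t *m) {
    pthread_mutex_lock(&q->lock);
    m->next = q->head;
    q->head = m;
    pthread_mutex_unlock(&q->lock);
}

message_t *inbox_pop(inbox_t *q) {
    pthread_mutex_lock(&q->lock);
    message_t *m = q->head;
    if (m) q->head = m->next;
    pthread_mutex_unlock(&q->lock);
    return m;                         /* NULL when the inbox is empty */
}

/* Tiny per-thread generator so threads do not share rand() state. */
static unsigned int next_rand(unsigned int *state) {
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Pick a destination in [0, nthreads) that is never self. */
int pick_destination(int self, int nthreads, unsigned int *seed) {
    assert(nthreads >= 2);
    int d = (int)(next_rand(seed) % (unsigned int)(nthreads - 1));
    return d >= self ? d + 1 : d;     /* skip over self */
}

/* Handle one malloc'ed message: returns 1 if retired, 0 if forwarded. */
int handle_message(message_t *m, int self, int nthreads,
                   inbox_t *inboxes, unsigned int *seed) {
    if (m->hops == MAX_HOPS) {
        free(m);                      /* message leaves the system */
        return 1;
    }
    m->hops++;
    inbox_push(&inboxes[pick_destination(self, nthreads, seed)], m);
    return 0;
}
```

A worker thread would loop on `inbox_pop` for its own inbox and pass each message to `handle_message`. Detecting that every thread is out of work needs additional shared state (for example a count of retired messages), which this sketch leaves out.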
2 Pthread Implementation Hints
For this assignment, I suggest you architect it in the following way.
// typedef unsigned long long int unsigned long long;
static inline unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    /* RDTSC places the low 32 bits in EAX and the high 32 bits in EDX. */
    asm volatile ("rdtsc" : "=a"(lo), "=d"(hi));
    return ( (unsigned long long)lo ) | ( ((unsigned long long)hi) << 32 );
}
#elif defined(powerpc)
// typedef unsigned long long int unsigned long long;
static inline unsigned long long rdtsc(void)
{
    unsigned long long int result = 0;
    unsigned long int upper, lower, tmp;
    /* Read the upper half, the lower half, then the upper half again;
       retry if the upper half changed between the two reads. */
    asm volatile(
        "0:            \n"
        "\tmftbu %0    \n"
        "\tmftb  %1    \n"
        "\tmftbu %2    \n"
        "\tcmpw  %2,%0 \n"
        "\tbne   0b    \n"
        : "=r"(upper), "=r"(lower), "=r"(tmp)
    );
    result = upper;
    result = result << 32;
    result = result | lower;
    return(result);
}
#else
#error "No tick counter is available!"
#endif
#endif
Use the rdtsc function above to time a region of code. Because it takes no arguments and returns the current tick count, assign its result, like this:
unsigned long long start_time = 0;
unsigned long long finish_time = 0;
unsigned long long total_time = 0;
start_time = rdtsc();
for( i = 0; i < MAX_WHATEVER; i++ ) { DO TEST }
finish_time = rdtsc();
total_time = finish_time - start_time;
Note, I’ll place a copy of this in rdtsc.h on the Class website for you to download.
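Since the counter reports ticks rather than seconds, a small conversion step is useful before plotting execution times. The sketch below is one possible way to do it; `elapsed_ticks` and `ticks_to_seconds` are hypothetical helper names, and the tick rate is an assumption that the caller must supply for each platform (it differs between the Intel SMP and the Blue Gene/L).

```c
#include <assert.h>

/* Ticks between two counter readings. Unsigned subtraction stays correct
   even if the 64-bit counter wrapped once between the readings. */
unsigned long long elapsed_ticks(unsigned long long start,
                                 unsigned long long finish) {
    return finish - start;
}

/* Convert a tick count to seconds. ticks_per_second is platform-specific
   and is an assumption supplied by the caller. */
double ticks_to_seconds(unsigned long long ticks, double ticks_per_second) {
    assert(ticks_per_second > 0.0);
    return (double)ticks / ticks_per_second;
}
```

With these helpers, a run would record `ticks_to_seconds(elapsed_ticks(start_time, finish_time), rate)` where `rate` is the tick frequency measured or documented for the machine in question.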
5 Experiments and Write-up
You will conduct a series of experiments on both platforms where you are to collect execution times and graph your results across all runs. In particular, there are 7 processor configurations and 4 message sizes for a total of 28 experiments. You should graph each group of processor counts for a fixed message size on both the Blue Gene and the Intel SMP system. That is, you’ll have two plots per graph, where the y-axis is the execution time and the x-axis is the processor count. In total you’ll have four graphs, one for each message size. Along with the graphs, please include a write-up that describes your Pthread solution and explains why you observed the particular performance phenomena shown in your graphs.
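One way to keep the runs organized is to enumerate the experiment grid up front, grouped by message size so that each group corresponds to one graph. The sketch below is an illustration only; `run_t` and `enumerate_runs` are hypothetical names, and the actual processor counts and message sizes must come from the assignment specifics.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int procs;          /* processor/thread count for this run */
    size_t msg_size;    /* message size for this run */
} run_t;

/* Fill out[] with every (message size, processor count) pair, grouped by
   message size. out must have room for np * ns entries.
   Returns the number of runs written. */
size_t enumerate_runs(const int *procs, size_t np,
                      const size_t *sizes, size_t ns, run_t *out) {
    assert(procs != NULL && sizes != NULL && out != NULL);
    size_t n = 0;
    for (size_t s = 0; s < ns; s++) {
        for (size_t p = 0; p < np; p++) {
            out[n].procs = procs[p];
            out[n].msg_size = sizes[s];
            n++;
        }
    }
    return n;
}
```

Called with 7 processor counts and 4 message sizes, this yields the 28 runs per platform, with each consecutive block of 7 entries supplying the x-axis points for one graph.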
6 Hand-In Instructions
Place your code in your area51 account. Please bring a hard copy of your performance report with graphs to class.