Solved Homework 3 | High Performance Computing Systems | CS 1645, Assignments of Computer Science

University of Pittsburgh (Pitt) - Medical Center-Health System Computer Science

Material Type: Assignment; Class: INTRO HIGH PERF COMPTNG SYSTMS; Subject: Computer Science; University: University of Pittsburgh; Term: Fall 2007;

Typology: Assignments

Pre 2010

Uploaded on 09/17/2009

koofers-user-oau 🇺🇸


CS 1645/2045 – INTRODUCTION TO HIGH PERFORMANCE COMPUTING SYSTEMS

HOMEWORK 3 - SOLUTIONS

October 2, 2007

Socrates Dimitriadis (TA)

Problem 1: fork/join method

Creating new threads is time consuming, and this cost should be treated as a trade-off in the design. The current implementation creates and joins threads on every iteration of the algorithm, so the whole computation becomes one large fork-and-join loop. Performance degradation is therefore expected. In addition, partitioning the algorithm across more threads degrades performance further because of thread-handling overheads, which become more noticeable as the problem size gets smaller.
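The fork-and-join structure described above can be sketched in a few lines. This is a minimal illustration, not the assignment's Fork_join.c: the names fork_join_sum, fj_worker, struct fj_task and FJ_MAX_THREADS are hypothetical, and a strided sum stands in for the Laplace update. The point is only that pthread_create and pthread_join sit inside the iteration loop, so their cost is paid on every iteration.

```c
#include <pthread.h>
#include <stddef.h>

#define FJ_MAX_THREADS 32   /* hypothetical cap, mirrors MAX_THREADS */

/* Hypothetical per-thread work description (not from the original listing). */
struct fj_task {
    int id;           /* thread index, 0 .. nthreads-1 */
    int nthreads;     /* number of threads in this round */
    const int *data;  /* shared read-only input */
    int len;          /* number of input elements */
    long partial;     /* result written by the thread */
};

/* Each thread sums a strided share of the input. */
static void *fj_worker(void *arg)
{
    struct fj_task *t = arg;
    long s = 0;
    for (int i = t->id; i < t->len; i += t->nthreads)
        s += t->data[i];
    t->partial = s;
    return NULL;
}

/* Runs `iters` rounds; every round creates nthreads threads and joins them.
 * Returns the sum computed in the last round, or -1 on bad arguments,
 * zero rounds, or a failed pthread_create. */
long fork_join_sum(const int *data, int len, int nthreads, int iters)
{
    pthread_t tid[FJ_MAX_THREADS];
    struct fj_task task[FJ_MAX_THREADS];
    long total = -1;

    if (nthreads < 1 || nthreads > FJ_MAX_THREADS)
        return -1;

    for (int k = 0; k < iters; k++) {
        int created = 0, failed = 0;

        /* fork: thread creation happens inside the iteration loop */
        for (int i = 0; i < nthreads; i++) {
            task[i] = (struct fj_task){ i, nthreads, data, len, 0 };
            if (pthread_create(&tid[i], NULL, fj_worker, &task[i]) != 0) {
                failed = 1;
                break;
            }
            created++;
        }

        /* join: wait for every thread that was actually started */
        total = 0;
        for (int i = 0; i < created; i++) {
            pthread_join(tid[i], NULL);
            total += task[i].partial;
        }
        if (failed)
            return -1;
    }
    return total;
}
```

Calling fork_join_sum(d, 10, 4, 3) on an array holding 1 to 10 creates and joins twelve threads in total to produce the same sum three times, which shows how the creation cost scales with the iteration count and not with the useful work.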

[Plot: Speed-up = f(#threads). x-axis: number of threads (1, 2, 4, 8); y-axis: speed-up (axis ticks up to 1.2); one curve each for N=32, N=64 and N=128.]

The next plot presents the speed-up of the threaded implementation relative to the standard sequential implementation.

[Plot: Speed-up = f(#threads). x-axis: number of threads (seq, 1, 2, 4, 8); y-axis: speed-up (axis ticks up to 1.2); one curve each for N=32, N=64 and N=128.]
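The plots report speed-up without stating the formula. A conventional reading, assumed here and not taken from the assignment, is that the first plot is relative to the one-thread run and the second is relative to the sequential program. The symbols T_seq, T(p), S_rel and S_abs are introduced only for this note, with T(p) the run time on p threads.

```latex
\[
S_{\text{rel}}(p) = \frac{T(1)}{T(p)}, \qquad
S_{\text{abs}}(p) = \frac{T_{\text{seq}}}{T(p)}
\]
```

Under either definition, a value below 1 means the threaded run is slower than its baseline.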


Fork_join.c

#include <pthread.h>
#include <stdio.h>      /* printf, scanf */
#include <stdlib.h>
#include <sys/time.h>
#include <math.h>

#define MAX_THREADS 32
#define n 128

void *laplace(void *);

struct arg_to_thread { int id; };
float x[n+2][n+2];
float xn[n+2][n+2];
int conv[MAX_THREADS];
int conv_total;
int num_threads;

int main(void)
{
    int i, j, k;
    pthread_t p_threads[MAX_THREADS];
    pthread_attr_t attr;
    double time_start, time_end;
    struct timeval tv;
    struct timezone tz;
    struct arg_to_thread my_arg[MAX_THREADS];

    printf("Enter number of threads: ");
    scanf("%d", &num_threads);

    /* Boundary values of the grid */
    for (k = 1; k <= n; k++) {
        x[0][k] = 0;
        x[k][0] = 0;
        x[n+1][k] = k;
        x[k][n+1] = k;
    }

    /* Interior points start at zero */
    for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
            x[i][j] = 0;

    /* Start the timer */
    gettimeofday(&tv, &tz);
    time_start = (double)tv.tv_sec + (double)tv.tv_usec / 1000000.0;

    /* Iterations with forking/joining */
    for (k = 1; k < 15000; k++) {
        pthread_attr_init(&attr);
        pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

        conv_total = 0;

        /* Create threads, compute for k */
        for (i = 0; i < num_threads; i++) {
            my_arg[i].id = i;
            pthread_create(&p_threads[i], &attr, laplace, (void *) &my_arg[i]);
            /* ... */

Problem 2

Here we create the 2-8 threads once, up front, and each thread performs all the necessary iterations of the algorithm. Synchronization must of course be handled carefully.

For small problems, increasing the number of threads can degrade performance, since thread-switching and synchronization overheads outweigh the benefit of partitioning the computation. As the problem size grows, computation dominates the run time, so using more threads/CPUs may improve the overall performance of the algorithm.
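The computation partitioning mentioned above can be made concrete with a small helper. This is a sketch under assumptions: interior rows are numbered 1..n, threads are numbered from 0, and each thread owns one contiguous block of rows. The name row_range is hypothetical. Compared with a plain n/num_threads split, it spreads the remainder over the first threads, so no row is skipped when num_threads does not divide n.

```c
/* Hypothetical helper: inclusive range [*first, *last] of the interior rows
 * 1..n owned by thread `lid` (0-based) out of `num_threads` threads. */
void row_range(int n, int num_threads, int lid, int *first, int *last)
{
    int base  = n / num_threads;   /* rows every thread gets */
    int extra = n % num_threads;   /* leftover rows, one each for the first threads */

    /* rows owned by threads 0 .. lid-1 */
    int start = lid * base + (lid < extra ? lid : extra);
    int count = base + (lid < extra ? 1 : 0);

    *first = start + 1;            /* interior rows are 1-based */
    *last  = start + count;        /* empty range (last < first) if count == 0 */
}
```

For n = 128 and 4 threads this gives the blocks 1..32, 33..64, 65..96 and 97..128; for n = 10 and 4 threads it gives 1..3, 4..6, 7..8 and 9..10.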

[Plot: Speed-up = f(#threads). x-axis: number of threads; y-axis: speed-up; one curve per problem size N.]

The next plot presents the speed-up of the threaded implementation relative to the standard sequential implementation.

[Plot: Speed-up = f(#threads). x-axis: number of threads (seq, 1, 2, 4, 8); y-axis: speed-up; one curve per problem size N.]

barrier.c

#include <pthread.h>
#include <stdlib.h>
#include <sys/time.h>
#include <math.h>

#define MAX_THREADS 32
#define n 128

typedef struct {
    pthread_mutex_t x_lock;
    pthread_cond_t barrier;
    int count;
} my_barrier_t;

void *laplace(void *);
void init_my_barrier(my_barrier_t *);
void my_barrier(my_barrier_t *, int);

struct arg_to_thread { int id; };
float x[n+2][n+2];
float xn[n+2][n+2];

int num_threads;
int conv = 0;
my_barrier_t br;
pthread_mutex_t conv_lock;

void init_my_barrier(my_barrier_t *b)
{
    b->count = 0;
    pthread_mutex_init(&(b->x_lock), NULL);
    pthread_cond_init(&(b->barrier), NULL);
}

void my_barrier(my_barrier_t *b, int num_threads)
{
    pthread_mutex_lock(&(b->x_lock));
    b->count++;
    if (b->count == num_threads) {
        /* Last thread to arrive: reset the counter and release the others */
        b->count = 0;
        pthread_cond_broadcast(&(b->barrier));
    } else
        /* Note: the wait is not re-checked in a loop, so a spurious
           wakeup would let a thread leave the barrier early */
        pthread_cond_wait(&(b->barrier), &(b->x_lock));
    pthread_mutex_unlock(&(b->x_lock));
}

void *laplace(void *s)
{
    struct arg_to_thread *local_arg;
    int k, i, j, lid, local_conv;
    float error;

    local_arg = s;
    lid = (*local_arg).id;

    /* Each thread performs iterations until all the points have converged */
    for (k = 1; k < 15000; k++) {
        local_conv = 0;
        for (i = lid*(n/num_threads)+1; i <= (lid+1)*(n/num_threads); i++)
            for (j = 1; j <= n; j++) {
                xn[i][j] = 0.25 * (x[i-1][j] + x[i+1][j] + x[i][j-1] + x[i][j+1]);
                if (xn[i][j] <= x[i][j])
                    error = x[i][j] - xn[i][j];
                else
                    error = xn[i][j] - x[i][j];
                if (error <= 0.001)
                    local_conv = local_conv + 1;
            }

        /* Access to a global variable */
        pthread_mutex_lock(&conv_lock);
        conv = conv + local_conv;
        pthread_mutex_unlock(&conv_lock);

        /* barrier - wait for all the threads to reach this point */
        my_barrier(&br, num_threads);

        if (conv == n*n)
            break;      /* break all threads if all points have converged */
        else {
            if (lid == 0) conv = 0;     /* this is ok since there is another barrier later */
        }

        /* Update the values in matrix x */
        for (i = lid*(n/num_threads)+1; i <= (lid+1)*(n/num_threads); i++)
            for (j = 1; j <= n; j++)
                x[i][j] = xn[i][j];

        /* barrier - wait for all the threads to reach this point */
        my_barrier(&br, num_threads);
    }
    pthread_exit(0);
}
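POSIX allows pthread_cond_wait to return spuriously, so a reusable barrier is usually written with a predicate that waiters re-check in a loop. The following is a hedged sketch of one common way to do that, using a generation counter. It is a variant for comparison, not the listing's barrier: the names gen_barrier_t, gen_barrier_init and gen_barrier_wait are hypothetical.

```c
#include <pthread.h>

/* Hypothetical barrier variant with a generation counter. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int count;            /* threads that have arrived in the current round */
    unsigned generation;  /* incremented each time the barrier opens */
} gen_barrier_t;

void gen_barrier_init(gen_barrier_t *b)
{
    b->count = 0;
    b->generation = 0;
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->cond, NULL);
}

void gen_barrier_wait(gen_barrier_t *b, int num_threads)
{
    pthread_mutex_lock(&b->lock);
    unsigned my_gen = b->generation;   /* round this thread arrived in */

    if (++b->count == num_threads) {
        /* Last arrival: open the barrier and start a new round */
        b->count = 0;
        b->generation++;
        pthread_cond_broadcast(&b->cond);
    } else {
        /* Re-check the predicate after every wakeup, spurious or not */
        while (my_gen == b->generation)
            pthread_cond_wait(&b->cond, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}
```

A waiter leaves only once the generation has advanced, so a spurious wakeup sends it back to sleep, and a thread that re-enters the barrier for the next round waits on the new generation value.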
