CS 441/632/732 Parallel Computing - HW3: Matrix Multiplication with 2-D Data Distribution

Fall 2005 CS 441/632/732 Parallel Computing

Homework-3

10/11/2005

Individual work only. 150 points. Due Oct 25, 2005.

1. Implement matrix-matrix multiplication with 2-D data distribution using (a) Cannon’s
algorithm (section 11.2.4, page 348) and (b) Fox’s algorithm (handout given in class).
Measure the time taken for the matrix sizes 5000x5000 and 10000x10000 for the following
process grid layouts: 1x1, 2x2, and 4x4. Analyze the performance of Cannon’s algorithm vs.
Fox’s algorithm, and plot the speedup for the different process grid layouts and the two
matrix sizes. To reduce the overall execution time of the program, allocate and initialize only
the required parts of the matrices in each process. Do not include the time spent in allocation
and initialization when measuring the time taken for matrix-matrix multiplication. Use the BLAS
routine dgemm to perform the local matrix-matrix multiplication, and use an input file to
provide the matrix size and process grid layout (the name of the input file must be specified as
a command line argument). Implementation of Cannon’s algorithm is optional for
undergraduate students.
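To make the block movement in the two algorithms concrete, the following is a minimal serial sketch in Python, not an MPI implementation and not the required solution. It simulates a q x q process grid as a dictionary of blocks, replaces dgemm with a plain triple loop, and assumes q divides the matrix dimension. All function names (`split`, `join`, `cannon`, `fox`, and so on) are hypothetical and chosen only for illustration. The Fox variant assumes the common broadcast-multiply-roll formulation, which may differ in detail from the class handout.

```python
def matmul(a, b):
    """Plain triple-loop product of two dense matrices (stand-in for dgemm)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def add_into(c, d):
    """Accumulate block d into block c element-wise, in place."""
    for row_c, row_d in zip(c, d):
        for j, value in enumerate(row_d):
            row_c[j] += value


def split(m, q):
    """Cut an n x n matrix into {(i, j): block}; assumes q divides n."""
    nb = len(m) // q
    return {(i, j): [row[j * nb:(j + 1) * nb] for row in m[i * nb:(i + 1) * nb]]
            for i in range(q) for j in range(q)}


def join(blocks, q):
    """Reassemble a dictionary of blocks into one matrix."""
    out = []
    for i in range(q):
        for r in range(len(blocks[(i, 0)])):
            row = []
            for j in range(q):
                row.extend(blocks[(i, j)][r])
            out.append(row)
    return out


def zero_blocks(blocks):
    """Zero-filled blocks with the same shapes as the given (square) blocks."""
    return {key: [[0] * len(b[0]) for _ in b] for key, b in blocks.items()}


def cannon(a, b, q):
    """Cannon's algorithm: skew A and B once, then multiply and shift q times."""
    A, B = split(a, q), split(b, q)
    C = zero_blocks(A)
    # Initial alignment: row i of A shifts left by i, column j of B shifts up by j.
    A = {(i, j): A[(i, (j + i) % q)] for i in range(q) for j in range(q)}
    B = {(i, j): B[((i + j) % q, j)] for i in range(q) for j in range(q)}
    for _ in range(q):
        for key in C:
            add_into(C[key], matmul(A[key], B[key]))
        # Shift A left by one block and B up by one block (with wraparound).
        A = {(i, j): A[(i, (j + 1) % q)] for i in range(q) for j in range(q)}
        B = {(i, j): B[((i + 1) % q, j)] for i in range(q) for j in range(q)}
    return join(C, q)


def fox(a, b, q):
    """Fox's algorithm: broadcast one A block per row, multiply, roll B up."""
    A, B = split(a, q), split(b, q)
    C = zero_blocks(A)
    for step in range(q):
        for i in range(q):
            k = (i + step) % q  # column of the A block broadcast along row i
            for j in range(q):
                add_into(C[(i, j)], matmul(A[(i, k)], B[(i, j)]))
        # Roll B up by one block so that row i next holds row i + 1.
        B = {(i, j): B[((i + 1) % q, j)] for i in range(q) for j in range(q)}
    return join(C, q)
```

Calling `cannon(a, b, 2)` or `fox(a, b, 2)` on two small 4x4 lists of lists should give the same result as `matmul(a, b)`. In an MPI version, the dictionary re-indexing would correspond to shifts between neighboring processes, and the reuse of `A[(i, k)]` across a row would correspond to a row broadcast.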

Note that the problem sizes and process grid layouts were chosen such that each process
performs multiplication of square matrices. Also, while measuring speedup, the problem size
was fixed and the number of processes was varied. Now measure the time taken by Cannon’s
algorithm and Fox’s algorithm when the matrix size is increased as the number of
processes is increased. Assign each process matrices of size 1000x1000 and use the
following process grid layouts: 1x1, 2x2, 3x3, and 4x4 (the corresponding global problem
sizes will be 1000x1000, 2000x2000, 3000x3000, and 4000x4000). Plot the scaled speedup
for both algorithms. Use the example programs provided with the MPI tutorial as a
starting point and use MPI point-to-point and collective communication primitives for
message passing. Use the following tables to include the timing measurements:

Matrix size             5000x5000                        10000x10000
# of processes    1x1 = 1   2x2 = 4   4x4 = 16    1x1 = 1   2x2 = 4   4x4 = 16
Cannon
Fox

# of processes    1x1 = 1     2x2 = 4     3x3 = 9     4x4 = 16
Matrix size       1000x1000   2000x2000   3000x3000   4000x4000
Cannon
Fox
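Once the tables are filled in, the timings can be converted into the quantities to be plotted. The sketch below is one possible way to do this and rests on stated assumptions. Fixed-size speedup is taken as the 1x1 time divided by the time on p processes for the same matrix size. For scaled speedup, the serial time of the larger problem is not measured but estimated as q^3 times the 1x1 time for a 1000x1000 matrix, on the assumption that dense matrix multiplication does work proportional to n^3. The definition used in class may differ, and the function names are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Fixed-size speedup: 1x1 time over p-process time, same matrix size."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """Speedup divided by the number of processes p."""
    return speedup(t_serial, t_parallel) / p


def scaled_speedup(t_base, t_parallel, q):
    """Scaled speedup on a q x q grid with a fixed block size per process.

    Assumption: the serial time for the q-times-larger matrix is estimated
    as q**3 * t_base, because dense matmul work grows as n**3.
    """
    return (q ** 3) * t_base / t_parallel
```

Under this estimate, the ideal scaled speedup on a q x q grid is q^2, since each process performs q local block multiplications of the base size.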

General Comments:

You must implement and test these programs on the CIS cluster (Everest) and use MPI for
communication. Instructions for using the CIS cluster and submitting jobs to SGE can be found
at: http://www.cis.uab.edu/ccl/resources/everest/EverestGridNodeUserGuide.php. When
submitting to the queue, you must request as many processors as processes; for example, for the
process grid layout 3x3, the total # of processors requested = 9.
