



An assignment to implement a parallel algorithm in XMTC that multiplies a sparse matrix by a dense vector. It covers the data structures, the environment setup, and the instructions for the serial and parallel implementations. Students are required to write the XMTC programs, run them on the given data sets, and collect the clock cycles for each run.




Course: ENEE759K/CMSC
Title: Matrix-vector multiplication (matvec)
Date Assigned: February 10th, 2009
Date Due: February 24th, 2009
Contact: Fuat Keceli – keceli (at) umd (dot) edu
Your assignment is to implement a parallel algorithm in XMTC to multiply a sparse matrix by a dense vector (^1). Your parallel algorithm should be as fast as possible. Use the data structures described in the next section. Your implementation should satisfy the following:
In this assignment the sparse matrix is represented using the following three data structures:
Consider the 6x7 sparse matrix in Figure 1. The data structures described above, as they apply to this matrix, are shown in Figure 2.
(^1) A related assignment was first given at the University of California, Santa Barbara at the end of a graduate course based on the parallel programming language MPI. The current assignment was developed to allow comparative study of program development time for XMTC vs MPI. See http://www.cs.umd.edu/~basili/publications/proceedings/P119.pdf
Figure 1: Example 6x7 sparse matrix
Figure 2: Implementation
The header files and the binary files can be downloaded from ~george/xmtdata. To get the data files, log in to your account on the class server and copy the matvec.tgz file from that directory using the following commands:
$ cp ~george/xmtdata/matvec.tgz ~
$ tar xzvf matvec.tgz
This will create the directory matvec with the following folders: data, src, and doc. The data files are in the data directory. Put your .c files in src and your .txt files in doc.
3 Questions
(a) Describe the serial algorithm of matvec in the file algorithm.s.txt.
(b) Provide a brief work and time complexity analysis of this algorithm. Append this analysis to the file algorithm.s.txt.
(c) Write the XMTC serial program that executes this algorithm. Use the file matvec.s.c that is given to you in src. Write your code in the place indicated in the file. Please do not modify the marked region; you will use that region to check the correctness of your program.
(d) Run this program using the 4 sets of data given in the Input section.
(e) Collect the number of clock cycles for each run and fill out the table in doc/table.txt using this information (see the Output section).
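As a rough illustration of the kind of serial loop question (c) asks for, the following plain C sketch multiplies a sparse matrix by a dense vector. It assumes a compressed sparse row (CSR) layout, which may differ from the three data structures used in this assignment. The names matvec_serial, row_ptr, col_idx and values are hypothetical and do not come from matvec.s.c or the provided header files.

```c
#include <stddef.h>

/*
 * Hypothetical CSR layout (an assumption, not the assignment's own structures):
 *   values[k]  : the k-th non-zero element, stored row by row
 *   col_idx[k] : the column index of values[k]
 *   row_ptr[i] : index into values[] where row i starts; row_ptr[m] equals nnz
 *
 * Computes y = A * x for an m-row sparse matrix A.
 */
void matvec_serial(size_t m, const size_t *row_ptr, const size_t *col_idx,
                   const int *values, const int *x, int *y)
{
    for (size_t i = 0; i < m; i++) {
        int sum = 0;
        /* Visit only the non-zero elements of row i. */
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            sum += values[k] * x[col_idx[k]];
        }
        y[i] = sum; /* rows with no non-zero elements yield 0 */
    }
}
```

Under this assumed layout the loop touches each row once and each non-zero element once, so the serial work grows with m + nnz. Each y[i] depends only on row i, which is one natural starting point when thinking about the parallel version.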
Data Set   Description                          Header File             Binary File               Max. non-zero elements / row
Small      m = 50, n = 100, nnz = 110           data/small/matvec.h     data/small/matvec.xbo     5
Medium     m = 400, n = 100, nnz = 826          data/medium/matvec.h    data/medium/matvec.xbo    9
Large      m = 10000, n = 100, nnz = 19872      data/large/matvec.h     data/large/matvec.xbo     10
X-Large    m = 30000, n = 100, nnz = 60130      data/xlarge/matvec.h    data/xlarge/matvec.xbo    10
Table 2: Header files
4 Testing the program
You can test the correctness of your programs with the result data given in data sets as follows:
xmtcc matvec.p.c -include ../data/small/matvec.h ../data/small/matvec.xbo -D PRINT_RESULT -quiet -o matvec.p
xmtfpga matvec.p.b -o myFileSmall.txt
diff -b myFileSmall.txt ../data/small/resultFileSmall.txt
If the diff command does not give any output, your program produced the same result values and is correct. Don’t forget the -b option. IMPORTANT: The FPGA currently has limited reserved space for standard output. This causes problems when using printf statements to test the output of larger programs, such as matvec for the X-Large test case. In particular, if you run your reference solution with that data set and compare the outputs, you will notice that the result is truncated and the diff against the provided result fails. Use the standard output only to test the first three data sets (small, medium and large), and not the largest one (xlarge).
Fill in the following table. A text file named table.txt has already been created for you in doc. Fill out the table in this text file, using white space to separate the fields. This text file will be parsed automatically by a script, so it is important to adhere to the format. Remove any printf statements from your code before taking these measurements. Printf statements increase the clock count, so measurements taken with them may not reflect the actual time and work done.
Input size    Small    Medium    Large    X-large
matvec.s.c
matvec.p.c
Table 3: Clock cycles will be written to table.txt
Note that part of your grade depends on the performance of your parallel implementation. Therefore you should try to obtain the fastest running parallel program. As a guideline, the following are the cycle counts for our reference serial and parallel implementations on the FPGA computer. Serial implementation: 3653342 clock cycles, parallel implementation: 214319 clock cycles, parallel speedup: ∼17x.
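The speedup quoted for the reference implementations is the ratio of serial to parallel clock cycles. A small C helper, given here only as a sketch with the hypothetical name speedup, can be used to compute the same ratio from your own measurements:

```c
/*
 * Speedup = serial clock cycles / parallel clock cycles.
 * The function name and signature are illustrative assumptions.
 */
double speedup(unsigned long serial_cycles, unsigned long parallel_cycles)
{
    return (double)serial_cycles / (double)parallel_cycles;
}
```

Calling speedup(3653342UL, 214319UL) with the reference cycle counts gives a value slightly above 17, consistent with the ∼17x figure.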
You are required to submit using the make utility (make submit). Use the make submitcheck command to make sure that you have the correct files in the correct locations (the src and doc directories). Run the following commands to submit the assignment:
$ make submitcheck
$ make submit