Matrix-Vector Multiplication Assignment for ENEE759K/CMSC751 Course, Assignments of Electrical and Electronics Engineering

An assignment to implement a parallel algorithm in XMTC that multiplies a sparse matrix by a dense vector. It covers the data structures, the environment setup, and instructions for the serial and parallel implementations. Students write the XMTC programs, run them on the given data sets, and record the clock cycles for each run.

Uploaded on 07/30/2009


HW1: Matrix-Vector Multiplication

Course: ENEE759K/CMSC751
Title: Matrix-vector multiplication (matvec)
Date Assigned: February 10th, 2009
Date Due: February 24th, 2009
Contact: Fuat Keceli – keceli (at) umd (dot) edu

1 Assignment

Your assignment is to implement a parallel algorithm in XMTC to multiply a sparse matrix by a dense vector (^1). Your parallel algorithm should be as fast as possible. Use the data structures described in the next section. Your implementation must satisfy the following:

  • Each row will be handled by a single thread.
  • No thread will handle more than one row.

2 Data structures

In this assignment the sparse matrix is represented using the following three data structures:

  • rowptr array: For each row i of the sparse matrix, rowptr[i] contains the index of the first non-zero element in this row. This index can be used in the col_ind and values arrays (see below). If row i does not contain a non-zero element (i.e. it is all zeros), then rowptr[i] == rowptr[i + 1]. This array has m + 1 elements, where m is the number of rows in the matrix. The last element of this array points just past the end of the col_ind and values arrays to indicate that there are no more non-zero elements.
  • col_ind array: This array contains the column indices of the non-zero elements. If the matrix element at row i, column j is non-zero, then for some k such that 0 ≤ k < rowptr[i + 1] − rowptr[i], col_ind[rowptr[i] + k] == j.
  • values array: This array contains the values of the non-zero elements. It is indexed in the same way as the col_ind array. If the matrix element at row i, column j has the non-zero value v, then for some k such that 0 ≤ k < rowptr[i + 1] − rowptr[i], values[rowptr[i] + k] == v.
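The three arrays can be made concrete with a small example. The following is a minimal plain-C sketch, not taken from the handout: the 3x4 matrix, the constants M, N and NNZ, the int element type, and the helper csr_get are all assumptions chosen for illustration (this is not the matrix of Figure 1).

```c
#include <assert.h>

/* Hypothetical 3x4 example (NOT the matrix of Figure 1):
 *   row 0:  5 0 0 2
 *   row 1:  0 0 0 0   (empty row, so rowptr[1] == rowptr[2])
 *   row 2:  0 3 4 0
 * The int element type is an assumption; the handout does not state it.
 */
enum { M = 3, N = 4, NNZ = 4 };

static const int rowptr[M + 1] = {0, 2, 2, 4}; /* last entry == NNZ */
static const int col_ind[NNZ]  = {0, 3, 1, 2};
static const int values[NNZ]   = {5, 2, 3, 4};

/* Return the matrix element at row i, column j, or 0 if it is not stored. */
static int csr_get(int i, int j)
{
    assert(i >= 0 && i < M && j >= 0 && j < N);
    /* Row i occupies positions rowptr[i] .. rowptr[i + 1] - 1. */
    for (int k = rowptr[i]; k < rowptr[i + 1]; k++) {
        if (col_ind[k] == j)
            return values[k];
    }
    return 0;
}
```

For instance, csr_get(2, 1) scans positions 2 and 3 and finds col_ind[2] == 1, so it returns values[2]. For the empty row 1 the loop body never runs.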

Consider the 6x7 sparse matrix in Figure 1. The data structures described above that correspond to this matrix are shown in Figure 2.

(^1) A related assignment was first given at the University of California, Santa Barbara at the end of a graduate course based on the parallel programming language MPI. The current assignment was developed to allow a comparative study of program development time for XMTC vs MPI. See http://www.cs.umd.edu/~basili/publications/proceedings/P119.pdf

Figure 1: Example 6x7 sparse matrix

Figure 2: Implementation

2.1 Setting up the environment

The header files and the binary files can be downloaded from ~george/xmtdata. To get the data files, log in to your account on the class server and copy the matvec.tgz file from that directory using the following commands:

$ cp ~george/xmtdata/matvec.tgz ~
$ tar xzvf matvec.tgz

This will create the directory matvec with the following folders: data, src, and doc. The data files are in the data directory. Put your C files in src and your txt files in doc.

3 Questions

  1. Serial implementation:

    (a) Describe the serial algorithm of matvec in the file algorithm.s.txt.
    (b) Provide a brief work and time complexity analysis of this algorithm. Append this analysis to the file algorithm.s.txt.
    (c) Write the XMTC serial program that executes this algorithm. Use matvec.s.c, which is given to you in src. Write your code in the place indicated in the file. Please do not modify the marked region; you will use that region to check the correctness of your program.
    (d) Run this program using the 4 sets of data given in the Input section.
    (e) Collect the number of clock cycles for each run and fill out the table in doc/table.txt using this information (see Output section).
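As a starting point for thinking about the serial algorithm, the row-by-row traversal of the three arrays can be sketched in plain C. This is an illustrative sketch only, not the reference solution: the function name matvec_serial, its signature, and the int element type are assumptions, and the provided matvec.s.c and matvec.h may declare the arrays and the result vector differently.

```c
#include <assert.h>

/* Compute y = A * x, where A has m rows and is stored as
 * rowptr / col_ind / values (see Section 2).
 * Hypothetical signature; the int element type is an assumption.
 */
static void matvec_serial(int m, const int *rowptr, const int *col_ind,
                          const int *values, const int *x, int *y)
{
    assert(m >= 0);
    for (int i = 0; i < m; i++) {
        int sum = 0;
        /* Visit only the stored elements of row i. */
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += values[k] * x[col_ind[k]];
        y[i] = sum; /* an empty row yields 0 */
    }
}
```

The outer loop runs once per row and the inner loop body runs once per stored element in total, so the operation count grows with m plus the number of non-zero elements. Each y[i] depends only on row i, which is what allows one row to be assigned to one thread in the parallel version.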

Data Set   Description                         Header File             Binary File               Max. non-zero elements / row
Small      m = 50, n = 100, nnz = 110          data/small/matvec.h     data/small/matvec.xbo     5
Medium     m = 400, n = 100, nnz = 826         data/medium/matvec.h    data/medium/matvec.xbo    9
Large      m = 10000, n = 100, nnz = 19872     data/large/matvec.h     data/large/matvec.xbo     10
X-Large    m = 30000, n = 100, nnz = 60130     data/xlarge/matvec.h    data/xlarge/matvec.xbo    10

Table 2: Header files
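The "Max. non-zero elements / row" column can be derived from rowptr alone, since row i stores rowptr[i + 1] − rowptr[i] elements. The sketch below shows one way to compute it in plain C; the function name max_row_nnz is hypothetical and not part of the provided files. When each row is handled by a single thread, this value indicates how long the inner loop of the busiest thread can be.

```c
#include <assert.h>

/* Largest number of stored elements in any single row of an m-row
 * matrix described by rowptr (which has m + 1 entries).
 * Hypothetical helper, not part of the assignment files.
 */
static int max_row_nnz(int m, const int *rowptr)
{
    assert(m >= 0);
    int best = 0;
    for (int i = 0; i < m; i++) {
        int len = rowptr[i + 1] - rowptr[i]; /* elements stored in row i */
        if (len > best)
            best = len;
    }
    return best;
}
```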

4 Testing the program

You can test the correctness of your programs with the result data given in data sets as follows:

xmtcc matvec.p.c -include ../data/small/matvec.h ../data/small/matvec.xbo \
      -D PRINT_RESULT -quiet -o matvec.p
xmtfpga matvec.p.b -o myFileSmall.txt
diff -b myFileSmall.txt ../data/small/resultFileSmall.txt

If the diff command does not give any output, your result matches the provided one and your program is correct. Don't forget the -b option.

IMPORTANT: The FPGA currently has limited reserved space for standard output. This causes problems when using printf statements to test the output of larger programs, such as matvec for the X-Large test case. In particular, if you run your reference solution with that data set and compare the outputs, you will notice that the result is truncated and the diff against the provided result will fail. Use the standard output only to test the first three data sets (small, medium and large), and not the largest one (xlarge).

4.1 Output

Fill out the following table. A text file named table.txt has already been created for you in doc. Fill out the table in this text file, using white space to indent the fields. This text file will be parsed automatically by a script, so it is important to adhere to the format. Remove any printf statements from your code before taking these measurements. Printf statements increase the clock count, so measurements taken with printf statements may not reflect the actual time and work done.

Input size    Small    Medium    Large    X-large
matvec.s.c
matvec.p.c

Table 3: Clock cycles will be written to table.txt

Note that part of your grade depends on the performance of your parallel implementation, so you should try to obtain the fastest-running parallel program. As a guideline, the cycle counts for our reference serial and parallel implementations on the FPGA computer are as follows. Serial implementation: 3653342 clock cycles; parallel implementation: 214319 clock cycles; parallel speedup: ~17x.
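The quoted speedup is the ratio of the serial cycle count to the parallel cycle count. A small plain-C helper, with the hypothetical name speedup, shows the arithmetic and can be reused to compare your own measurements against the reference figures.

```c
#include <assert.h>

/* Speedup = serial clock cycles / parallel clock cycles.
 * Hypothetical helper for checking the figures in the text.
 */
static double speedup(long serial_cycles, long parallel_cycles)
{
    assert(parallel_cycles > 0);
    return (double)serial_cycles / (double)parallel_cycles;
}
```

With the reference counts, speedup(3653342L, 214319L) is a little over 17, which matches the ~17x stated above.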

4.2 Submission

You must use the make utility (make submit) to submit the assignment. Use the make submitcheck command to make sure that you have the correct files in the correct locations (the src and doc directories). Run the following commands to submit the assignment:

$ make submitcheck
$ make submit