Matrix-Vector Multiplication Assignment for ENEE759K/CMSC751 Course, Assignments of Electrical and Electronics Engineering

University of Maryland Electrical and Electronics Engineering

An assignment to implement, in XMTC, a parallel algorithm that multiplies a sparse matrix with a dense vector. It describes the data structures, how to set up the environment, and the serial and parallel implementation tasks. Students write the XMTC programs, run them on the given data sets, and record the clock cycles for each run.


HW1: Matrix-Vector Multiplication

Course: ENEE759K/CMSC751

Title: Matrix-vector multiplication (matvec)

Date Assigned: February 10th, 2009

Date Due: February 24th, 2009

Contact: Fuat Keceli – keceli (at) umd (dot) edu

1 Assignment

Your assignment is to implement a parallel algorithm in XMTC to multiply a sparse matrix with a dense vector.¹ Your parallel algorithm should be as fast as possible. Use the data structures described in the next section. Your implementation should satisfy the following:

- Each row will be handled by a single thread.

- No thread will handle more than one row.
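The row-per-thread requirement can be illustrated with a short plain-C sketch (not XMTC) that uses the arrays defined in Section 2 below. The names x and y for the input and result vectors, the int element type, and both function names are hypothetical; in the assignment the actual declarations presumably come from the provided matvec.h files. The comment only indicates roughly how the loop would map onto an XMTC spawn block, assuming the spawn(low, high) construct with $ as the thread ID.

```c
#include <assert.h>

/* Work of the single thread that owns row i: y[i] = sum_j A[i][j] * x[j].
 * x, y, the int element type and the function names are assumptions. */
void matvec_row(int i, const int *rowptr, const int *col_ind,
                const int *values, const int *x, int *y) {
    assert(rowptr[i] <= rowptr[i + 1]);  /* row bounds must not decrease */
    int sum = 0;                         /* private to this thread */
    for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
        sum += values[k] * x[col_ind[k]];
    y[i] = sum;  /* only this thread writes y[i], so no synchronization is needed */
}

/* Plain-C stand-in for the parallel section: one "thread" per row, and no
 * thread handles more than one row. In XMTC the loop would become roughly
 *     spawn(0, m - 1) { ...body of matvec_row with i replaced by $... }
 * Here the m row bodies simply run one after another. */
void matvec_one_thread_per_row(int m, const int *rowptr, const int *col_ind,
                               const int *values, const int *x, int *y) {
    for (int i = 0; i < m; i++)
        matvec_row(i, rowptr, col_ind, values, x, y);
}
```

Because every thread writes a distinct y[i], threads never conflict; the trade-off is that each thread's running time depends on how many non-zero elements its row contains.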

2 Data structures

In this assignment the sparse matrix is represented using the following three data structures:

- rowptr array: For each row i of the sparse matrix, rowptr[i] contains the index of the first non-zero element in this row. This index can be used in the col_ind and values arrays (see below). If row i does not contain a non-zero element (i.e., it is all zeros), then rowptr[i] == rowptr[i+1]. This array has m+1 elements, where m is the number of rows in the matrix. The last element of this array, rowptr[m], points just past the end of the col_ind and values arrays to indicate that there are no more non-zero elements.

- col_ind array: This array contains the column indices of the non-zero elements. If the matrix element at row i, column j is a non-zero element, then for some k such that 0 ≤ k < rowptr[i+1] − rowptr[i], col_ind[rowptr[i] + k] == j.

- values array: This array contains the values of the non-zero elements and is indexed in the same way as the col_ind array. If the matrix element at row i, column j has the non-zero value v, then for the same k as in col_ind (0 ≤ k < rowptr[i+1] − rowptr[i]), values[rowptr[i] + k] == v.
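The definitions above imply a few invariants that are easy to check mechanically, which can help when debugging input handling. The sketch below assumes 0-based indexing (so rowptr[0] == 0) and takes n (number of columns) and nnz (number of stored non-zero elements) as explicit parameters; csr_is_valid is a hypothetical helper, not part of the provided files.

```c
#include <assert.h>

/* Returns 1 if rowptr and col_ind are consistent with the definitions above
 * for an m x n matrix holding nnz stored non-zero elements, and 0 otherwise.
 * Assumes 0-based indexing, so the first row starts at index 0. */
int csr_is_valid(int m, int n, int nnz, const int *rowptr, const int *col_ind) {
    assert(m >= 0 && n >= 0 && nnz >= 0);
    /* rowptr has m + 1 entries; the last one points just past the end. */
    if (rowptr[0] != 0 || rowptr[m] != nnz)
        return 0;
    for (int i = 0; i < m; i++) {
        /* Row i occupies indices rowptr[i] .. rowptr[i+1]-1. Equal bounds mean
         * an all-zero row; decreasing or out-of-range bounds are invalid. */
        if (rowptr[i] > rowptr[i + 1] || rowptr[i + 1] > nnz)
            return 0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            if (col_ind[k] < 0 || col_ind[k] >= n)  /* column index in range */
                return 0;
    }
    return 1;
}
```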

Consider the 6x7 sparse matrix in Figure 1. The data structures described above for this matrix are shown in Figure 2.

¹ A related assignment was first given at the University of California, Santa Barbara, at the end of a graduate course based on MPI. The current assignment was developed to allow a comparative study of program development time for XMTC vs. MPI. See http://www.cs.umd.edu/~basili/publications/proceedings/P119.pdf


Figure 1: Example 6x7 sparse matrix

Figure 2: Implementation
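Since the figures are not reproduced here, the following sketch builds the three arrays from a small dense matrix instead, which shows how rowptr, col_ind, and values relate. The row-major input layout, the int element type, and the function name dense_to_csr are assumptions for illustration; the example matrix is hypothetical and is not the 6x7 matrix of Figure 1.

```c
#include <assert.h>

/* Converts a dense m x n matrix (row-major, hypothetical input format) into
 * the three arrays of Section 2. rowptr must hold m + 1 entries; col_ind and
 * values must hold at least as many entries as there are non-zeros.
 * Returns nnz, the number of non-zero elements found. */
int dense_to_csr(int m, int n, const int *dense,
                 int *rowptr, int *col_ind, int *values) {
    assert(m >= 0 && n >= 0);
    int nnz = 0;
    for (int i = 0; i < m; i++) {
        rowptr[i] = nnz;               /* index of row i's first non-zero */
        for (int j = 0; j < n; j++) {
            int v = dense[i * n + j];
            if (v != 0) {
                col_ind[nnz] = j;      /* column of this non-zero */
                values[nnz] = v;       /* its value, stored at the same index */
                nnz++;
            }
        }
    }
    rowptr[m] = nnz;                   /* points just past the last non-zero */
    return nnz;
}
```

For the hypothetical 3x4 matrix with rows {5, 0, 0, 7}, {0, 0, 0, 0}, and {0, 8, 0, 0}, it returns nnz = 3 and produces rowptr = {0, 2, 2, 3}, col_ind = {0, 3, 1}, and values = {5, 7, 8}; the all-zero middle row shows up as rowptr[1] == rowptr[2].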

2.1 Setting up the environment

The header files and the binary files can be downloaded from ~george/xmtdata. To get the data files, log in to your account on the class server and copy the matvec.tgz file from that directory using the following commands:

$ cp ~george/xmtdata/matvec.tgz ~
$ tar xzvf matvec.tgz

This creates the directory matvec with the following subdirectories: data, src, and doc. The data files are in the data directory. Put your .c files in src and your .txt files in doc.

3 Questions

Serial implementation:

(a) Describe the serial algorithm of matvec in the file algorithm.s.txt.
(b) Provide a brief work and time complexity analysis of this algorithm. Append this analysis to the file algorithm.s.txt.
(c) Write the XMTC serial program that executes this algorithm. Use the file matvec.s.c that is given to you in src, and write your code at the place indicated in the file. Please do not modify the marked region; that region is used to check the correctness of your program.
(d) Run this program using the 4 data sets given in the Input section.
(e) Collect the number of clock cycles for each run and fill out the table in doc/table.txt using this information (see the Output section).
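For questions (a) and (b), one common way to express the serial algorithm is a loop over the rows with an inner loop over that row's non-zero elements. The plain-C sketch below (function and vector names hypothetical, int elements assumed) also counts the inner-loop steps, which makes the work bound visible: the outer loop runs m times and the inner loop runs nnz times in total, so both the work and the serial time are O(m + nnz).

```c
#include <assert.h>

/* Serial matvec over the arrays of Section 2: y = A x.
 * Returns the number of multiply-add steps performed, which equals nnz.
 * Names x, y, matvec_serial and the int element type are assumptions. */
long matvec_serial(int m, const int *rowptr, const int *col_ind,
                   const int *values, const int *x, int *y) {
    long steps = 0;
    for (int i = 0; i < m; i++) {                 /* m outer iterations */
        int sum = 0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++) {
            sum += values[k] * x[col_ind[k]];     /* one step per non-zero */
            steps++;
        }
        y[i] = sum;                               /* all-zero rows give 0 */
    }
    /* Every stored non-zero is visited exactly once. */
    assert(steps == (long)rowptr[m] - rowptr[0]);
    return steps;
}
```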

Description | Data Set                        | Header File          | Binary File            | Max. non-zero elements / row
Small       | m = 50, n = 100, nnz = 110      | data/small/matvec.h  | data/small/matvec.xbo  | 5
Medium      | m = 400, n = 100, nnz = 826     | data/medium/matvec.h | data/medium/matvec.xbo | 9
Large       | m = 10000, n = 100, nnz = 19872 | data/large/matvec.h  | data/large/matvec.xbo  | 10
X-Large     | m = 30000, n = 100, nnz = 60130 | data/xlarge/matvec.h | data/xlarge/matvec.xbo | 10

Table 2: Header files

4 Testing the program

You can test the correctness of your programs against the result files provided with the data sets, as follows:

xmtcc matvec.p.c -include ../data/small/matvec.h ../data/small/matvec.xbo -D PRINT_RESULT -quiet -o matvec.p
xmtfpga matvec.p.b -o myFileSmall.txt
diff -b myFileSmall.txt ../data/small/resultFileSmall.txt

If the diff command produces no output, your results match the provided results and your program is correct. Don't forget the -b option, which makes diff ignore differences in the amount of white space.

IMPORTANT: The FPGA currently has limited space reserved for standard output. This causes problems when using printf statements to test the output of larger programs, such as matvec on the X-Large data set. In particular, if you run your solution on that data set and compare the outputs, you will notice that the output is truncated, and diff against the provided result will fail. Use standard output to test only the first three data sets (Small, Medium, and Large), not the largest one (X-Large).

4.1 Output

Fill out the following table. A text file named table.txt has already been created for you in doc; fill out the table in that file, using white space to separate the fields. The file will be parsed automatically by a script, so it is important to adhere to the format. Remove any printf statements from your code while taking these measurements: printf statements increase the clock count, so measurements taken with them may not reflect the actual time and work of your program.

Input size | Small | Medium | Large | X-large
matvec.s.c |       |        |       |
matvec.p.c |       |        |       |

Table 3: Clock cycles (to be written to doc/table.txt)
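One way to follow the advice about printf is to compile the output code only when PRINT_RESULT is defined, which is what the -D PRINT_RESULT flag in the test command above does. print_result below is a hypothetical helper with an assumed one-value-per-line format; the provided matvec.s.c may already handle output in its marked region, which should not be modified.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical output helper: prints y only when compiled with
 * -D PRINT_RESULT, so timing builds contain no printf calls.
 * The "%d\n" format is an assumption. Returns the number of values printed. */
int print_result(int m, const int *y) {
    assert(m >= 0);
    int printed = 0;
#ifdef PRINT_RESULT
    for (int i = 0; i < m; i++) {
        printf("%d\n", y[i]);
        printed++;
    }
#else
    (void)y;  /* unused in timing builds */
#endif
    return printed;
}
```

Builds compiled without -D PRINT_RESULT then contain no output code, so their cycle counts are not inflated by printf.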

Note that part of your grade is based on the performance of your parallel implementation, so you should try to obtain the fastest possible parallel program. As a guideline, the cycle counts for our reference implementations on the FPGA computer are 3653342 clock cycles for the serial implementation and 214319 clock cycles for the parallel implementation, a parallel speedup of about 17x.
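The quoted speedup is the ratio of the two reference cycle counts:

```latex
S = \frac{T_{\text{serial}}}{T_{\text{parallel}}}
  = \frac{3653342}{214319} \approx 17.05
```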

4.2 Submission

Submitting with the make utility (make submit) is required. Use the make submitcheck command to make sure that you have the correct files in the correct locations (the src and doc directories). Run the following commands to submit the assignment:

$ make submitcheck
$ make submit
