Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Parallel Numerical Libraries-Computer Science-Project Report, Study Guides, Projects, Research of Computer Science

Acharya Nagarjuna University Computer Science

This is project report related to Computer science degree. Project was supervised by Dr. Niharika Raj at Acharya Nagarjuna University. Its main points are: Cluster, Computer, Scope, Performance, Parallel, Comparsion, Computing, Matlab, Programming, Fortran, Linux

Typology: Study Guides, Projects, Research

2011/2012

Uploaded on 07/16/2012

sanjeev 🇮🇳

5

(1)

19 documents

1 / 79

This page cannot be seen from the preview

Don't miss anything!

1

Chapter 1. Introduction

The performance of a system is an important characteristic but it is difficult to

characterize with a single number. The best way of capturing performance is to

identify what the system must do to fulfill its mission and then identify the values that

allow one to characterize performance for that mission. If the analysis of the workload

shows that one or more standard benchmarks is a close approximation to the

workload, then one may use the relative performance of different systems on the

relevant benchmark(s) to provide a good predictor of performance on the company‟s

workload. If no benchmarks fit well, then one can compare performance only by

running tests on the real workload.

1.1 Scope

Parallel computers can run some types of programs far faster than traditional single

processor computers. That is why there is an emerging trend in the use of parallel

computing.

Most of the people do not want to run all the available programs related to their

problem in order to find the best performing program. This project will help them to

see some performance comparison of different parallel programs so that they might be

able to start their initial task easily.

When writing programs for parallel computer architectures, programmers

explicitly specify how to decompose the computing work between all available nodes.

After this project, performance of certain libraries will be available and by analyzing

the decomposition of tasks of these well performing libraries, programmers can have

an idea about the decomposition rules of the program that can give better

performance.

1.2 Applications

Parallelism finds applications in very diverse application domains for different

motivating reasons. The use of multi-processor machines or parallel computing has

made a great impact in a variety of areas, especially in computational simulations for

scientific and engineering applications. These range from improved application

performance to cost considerations. Some of the particular fields are,

docsity.com

Discover Study Guides, Projects, Research of Computer Science Acharya Nagarjuna University

Partial preview of the text

Download Parallel Numerical Libraries-Computer Science-Project Report and more Study Guides, Projects, Research Computer Science in PDF only on Docsity!

Chapter 1. Introduction

The performance of a system is an important characteristic but it is difficult to characterize with a single number. The best way of capturing performance is to identify what the system must do to fulfill its mission and then identify the values that allow one to characterize performance for that mission. If the analysis of the workload shows that one or more standard benchmarks is a close approximation to the workload, then one may use the relative performance of different systems on the relevant benchmark(s) to provide a good predictor of performance on the company‟s workload. If no benchmarks fit well, then one can compare performance only by running tests on the real workload.

1.1 Scope

Parallel computers can run some types of programs far faster than traditional single processor computers. That is why there is an emerging trend in the use of parallel computing. Most of the people do not want to run all the available programs related to their problem in order to find the best performing program. This project will help them to see some performance comparison of different parallel programs so that they might be able to start their initial task easily. When writing programs for parallel computer architectures, programmers explicitly specify how to decompose the computing work between all available nodes. After this project, performance of certain libraries will be available and by analyzing the decomposition of tasks of these well performing libraries, programmers can have an idea about the decomposition rules of the program that can give better performance.

1.2 Applications

Parallelism finds applications in very diverse application domains for different motivating reasons. The use of multi-processor machines or parallel computing has made a great impact in a variety of areas, especially in computational simulations for scientific and engineering applications. These range from improved application performance to cost considerations. Some of the particular fields are,

 Earth Quakes  Economics  Fission  Fusion  Material Science  Medical Imaging

1.3 Thesis Layout

In the first chapter, Introduction , we have discussed the scope and applications of this project. Second chapter, Parallel Numerical Libraries , gives detail description of the parallel numerical libraries selected for analysis purposes. Namely these libraries include PETSc, ScaLAPACK and MatlabMPI. Chapter 3, Underlying Architectures , discusses the hardware using which we implemented the examples of parallel numerical libraries. Chapter 4, Performance Metrics , discusses the performance metrics used to evaluate the performance of parallel libraries. These performance metrics include parallel execution time, speed-up and efficiency. Finally, the results after analyzing the libraries are discussed in chapter 5.

The components below the line, labeled “LOCAL” are called on a single processor. Whereas the components above the line, labeled “GLOBAL” are making use of parallel routines.

2.1.2 Naming Convention of ScaLAPACK Routines

All routines have the name of the form PXYYZZZ , where the seventh character is left blank for some routines. The first letter „P‟ indicates that it is the parallel version of LAPACK routine. The second letter „X‟ shows the data type that can be real(S), double (D), complex(C) or double complex (Z). The next two letters “YY” indicate the type of the matrix. Matrix type can be tri- diagonal (DT), symmetric (SY), general (GE) etc. The last three letters “ZZZ” indicate the computation performed.

2.2 PETSc

The PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of data structures and routines that provide the building blocks for the implementation of large scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear, nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, and C++[ 2 ]. PETSc provides many of the mechanisms needed within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. Some of the PETSc modules deal with  Vectors  Matrices (generally sparse)  Distributed arrays (useful for parallelizing regular grid-based problems) PETSc has been used for modeling in the areas such as Acoustics, Combustion, Concrete, Aerodynamics, Bone Fractures, Brain Surgery, Cancer Surgery, Cancer Cure, Carbon Sequestration, Cardiology, Air Pollution, Corrosion, Data Mining, Earth Quakes, and Surface Water Flow [ 3 ].

2.2.1 External Libraries

PETSc makes use of a variety of external libraries. During installation process, user can select the external libraries of his own choice. These external libraries include BLAS, LAPACK, LINPACK, MINPACK, SPARSPAK etc. [ 2 ].

2.2.2 PETSc Objects

PETSc library has its own objects. These objects are particularly optimized for vector and matrix operations. Moreover these objects can be specified for sparse vectors and matrices [ 2 ]. Some PETSc objects include Vec, Mat, etc. For viewing the vectors and matrices, PETSc has PETSc Viewer objects.

2.3 MatlabMPI

Matlab is the leading programming language for executing numerical computations. It is extensively used for algorithm development, simulation, data reduction, and testing and system evaluation by many engineers and scientists [ 4 ]. Many of these computations could be implemented in less time on a parallel computer. Many attempts were made to provide efficient ways of executing its programs. But none of these has received widespread acceptance. Message Passing Interface (MPI) is the de- facto standard for implementing programs on multiple processors, in the world of parallel computing. MPI provides C and FORTRAN language routines for communication in a parallel program. MPI is widely used in the world's most demanding applications (weather modeling, weapons simulation, aircraft design, etc.). MatlabMPI is set of Matlab scripts that allow us to execute any Matlab program on a parallel computer. Thus, MatlabMPI will run on any combination of computers that Matlab supports. Most of the routines provide by MatlabMPI are almost the same as MPI [ 5 ].

Table 3-2 shows the hardware specifications of Grid Lab‟s cluster. It has one 8-core system, five 4-core systems and two 2-core systems. So, it has a total of 32 processors. It is using Scientific LINUX [ 6 ] operating system. The system with 8- cores acts as a master node in the cluster.

Chapter 4. Performance Metrics

In order to analyze the performance of a system, we need some performance evaluation metrics. By using these metrics, we can perform comparison of various types of systems. In parallel computing, the criteria for evaluating performance can include speedup, efficiency and scalability.

4.1 Parallel Execution Time

The parallel execution time is the measure of time from the moment a parallel computation starts to the moment the last processing element finishes execution[ 7 ]. Serial time is denoted by TS while parallel execution time is denoted by TP.

4.2 Speedup

Speedup tells us that how much a parallel algorithm is faster than a corresponding sequential algorithm. It is defined as

Where Sp is the speed up, TS is the execution time for sequential algorithm, TP is the execution time for parallel algorithm and p is the number of processors. There are three cases for speedup linear, sub-linear and super-linear as shown in figure 4-1.

Figure 4-1 Speedup vs. Number of Processors[ 8 ]

Chapter 5. Results and Discussions

In this section, all the results for this project are discussed. Results are displayed in the form of graphs. A conclusion is adopted after discussion on these results. This conclusion could assist user in choosing a particular parallel numerical library that suits him the most.

5.1 PETSc’s Performance Analysis

Following are the examples implemented using PETSc library.  Matrix Operations o Matrix-Vector Multiplication o Matrix-Matrix Multiplication  System of Linear Equations o Jacobi o SOR In all these examples, the dimensions of the matrix and vector are n x n and n x 1 respectively. Vector, b = [ 1,1,...,1]T^ and the n x n matrix A has 3 on the main diagonal, -1 on the super and sub diagonal, and 0.5 on the (i,n+1-i) position of all i=1,…,n except for i=n/2 and n/2+1. All of these examples were executed on Virtu and Cluster. The specifications of these underlying architectures are discussed earlier in this report. The performance of PETSc is best at Virtu. These examples were also executed on only the master node of cluster at grid lab. During this time all other computers of cluster were switched off. These examples were executed with different dimensions of matrices. Dimensions were varied from 50 x 50 to 10,000 x 10,000. Three different underlying architectures are used for these examples. In the figures 5-1 to 5-44, the following Graph Legends are used, Virtu: Multicore machine with 8 processors. For more information see section-3. Grid: A cluster consisting of 8 computers and a total of 32 physical processors. For more information see section-3.2. Grid 8n: Master node of the cluster described in section-3.2. It is multicore machine with 8 physical processors.

5.1.1 Matrix-Vector Multiplication (Parallel Execution Time)

Figures 5-1 to 5-6 show the graphs of parallel execution time for matrix vector multiplication. X-axis indicates the time in seconds and Y-axis indicates the number of processors. It is clear from the below timing graphs that Virtu gives the better performance. Although the cluster at Grid Lab has a total of 32 processors but there come the bottle neck of communication through the LAN. With matrix of dimensions 50 x 50, the parallel execution time is much more than the serial time. This is due to the fact of communication overhead. After that as we increase the dimensions of matrix, the parallel time go on decreasing as compared to the sequential time. Great decrease in parallel execution time can be seen when the dimensions of matrix are 1000 x 1000 and above. Moreover the parallel time keeps on decreasing until the number of processors is less than 8. It is due to the fact that Virtu has 8 processors. Above 8 processors the performance becomes worse.

Figure 5-1 Parallel Execution Time Comparison for Matrix-Vector Multiplication operation of Dimension 50 x 50

(^00 20 40 60 80 100 120 )

0.25 Parallel Time Comparison Size 50

Number of Processors

Time (seconds)

Virtu Grid Grid 8n

Figure 5-4 Parallel Execution Time Comparison for Matrix-Vector Multiplication with Dimension 1000 x 1000

Figure 5-5 Parallel Execution Time Comparison for Matrix-Vector Multiplication with Dimension 5000 x 5000

Figure 5-6 Parallel Execution Time Comparison for Matrix-Vector Multiplication with Dimension 10,000 x 10,

5.1.2 Matrix-Vector Multiplication (Speed Up & Efficiency)

Figures 5-7 to 5-12 show the graphs of speed up for matrix vector multiplication and Figures 5-13 to 5-18 show the graphs of efficiency for matrix vector multiplication. Speed up tells us that how much the parallel algorithm is better than the serial algorithm. Theoretically, if value of speed up is less than one then we say that it is useless to parallelize the algorithm. For problems of matrix dimension 50 x 50 and 100 x 100, we can see that the speed up is less than one. And for problems of matrix dimension 500 x 500 and above, we can see that the speed up is greater than one. Hence we can conclude that it is useless to use the PETSc‟s for problem size less than 500 x 500. Efficiency tells us that how much time or resources are wasted in communication overhead, synchronization etc. Ideally, efficiency should be equal to one. From the graphs of efficiency, we can see that efficiency approaches to one when our problem size is very large i.e. 5000 and above. Moreover, we can notice that efficiency approaches to one when number of processes ranges from 4 to 8.

(^00 20 40 60 80 100 120 )

Parallel Time Comparison Size 10,

Number of Processors

Time (seconds)

Virtu Grid Grid 8n

Figure 5-9 Speed Up for Matrix Vector Multiplication of Problem Size 500 x 500

Figure 5-10 Speed Up for Matrix Vector Multiplication of Problem Size 1000 x 1000

(^00 20 40 60 80 100 120 )

1

1.5 Speed Up for Problem Size 500 x 500

Number of Processors

Speed UP

Virtu Grid Grid 8n

(^00 20 40 60 80 100 120 )

1

2

3 Speed Up for Problem Size 1000 x 1000

Number of Processors

Speed UP

Virtu Grid Grid 8n

Figure 5-11 Speed Up for Matrix Vector Multiplication of Problem Size 5000 x 5000

Figure 5-12 Speed Up for Matrix Vector Multiplication of Problem Size 10,000 x 10,

(^00 20 40 60 80 100 120 )

1

2

3

4

5

6

7

8 Speed Up for Problem Size 5000 x 5000

Number of Processors

Speed UP

Virtu Grid Grid 8n

(^00 20 40 60 80 100 120 )

2

4

6

8

10

12

14 Speed Up for Problem Size 10,000 x 10,

Number of Processors

Speed UP

Virtu Grid Grid 8n

Figure 5-15 Efficiency for Matrix Vector Multiplication of Problem Size 500 x 500

Figure 5-16 Efficiency for Matrix Vector Multiplication of Problem Size 1000 x 1000

(^00 20 40 60 80 100 120 )

0.7 Problem Size 500 x 500

Number of Processors

Efficiency

Virtu Grid Grid 8n

(^00 20 40 60 80 100 120 )

0.9 Problem Size 1000 x 1000

Number of Processors

Efficiency

Virtu Grid Grid 8n

Figure 5-17 Efficiency for Matrix Vector Multiplication of Problem Size 5000 x 5000

Figure 5-18 Efficiency for Matrix Vector Multiplication of Problem Size 10,000 x 10,

(^00 20 40 60 80 100 120 )

1 Problem Size 5000 x 5000

Number of Processors

Efficiency

Virtu Grid Grid 8n

(^00 20 40 60 80 100 120 )

1 Problem Size 10,000 x 10,

Number of Processors

Efficiency

Virtu Grid Grid 8n

Parallel Numerical Libraries-Computer Science-Project Report, Study Guides, Projects, Research of Computer Science

Related documents

Partial preview of the text

Download Parallel Numerical Libraries-Computer Science-Project Report and more Study Guides, Projects, Research Computer Science in PDF only on Docsity!

Chapter 1. Introduction

1.1 Scope

1.2 Applications

1.3 Thesis Layout

2.1.2 Naming Convention of ScaLAPACK Routines

2.2 PETSc

2.2.1 External Libraries

2.2.2 PETSc Objects

2.3 MatlabMPI

Chapter 4. Performance Metrics

4.1 Parallel Execution Time

4.2 Speedup

Chapter 5. Results and Discussions

5.1 PETSc’s Performance Analysis

5.1.1 Matrix-Vector Multiplication (Parallel Execution Time)

5.1.2 Matrix-Vector Multiplication (Speed Up & Efficiency)