
Calculating Program - High Performance Computing - Lecture Slides, Slides of Computer Science

Biju Patnaik University of Technology Computer Science

Some concepts of High Performance Computing are Addressing Modes, Program Execution, Basic Computer Organization, Control Hazard Solutions, Least Recently Used, and Memory Hierarchy Progression. The main points of this lecture are: Calculating Program, Master Process, Number of Intervals, Slave Process, Decomposition, Assignment, Orchestration, Mapping, and Mapping of Processes to Processors.

Typology: Slides

2012/2013

Uploaded on 04/28/2013

dewaan 🇮🇳


High Performance Computing

Lecture 41


Example: MPI Program for Calculating Pi

int n, myid, numprocs;   /* number of intervals, this process's rank, number of processes */

/* Each process initializes MPI and determines the communicator
   size and its own rank */
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);

/* The master process (P0) takes input from the user */
if (myid == 0) {
    printf("Enter the number of intervals: ");
    fflush(stdout);
    scanf("%d", &n);
}

/* The master process broadcasts the value of n; every process,
   including the master, must make this call */
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
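The preview stops after the broadcast. A common way to finish such a program (assumed here, not taken from the remaining slides) is for each process to apply the midpoint rule to every numprocs-th interval of [0, 1] for the integrand 4/(1+x^2), whose integral is pi, and then combine the partial sums at P0. The sketch below does that arithmetic in plain C so it runs without MPI; `local_pi` and `reduce_pi` are hypothetical names, and `reduce_pi` only imitates what a sum reduction to rank 0 would compute.

```c
#include <assert.h>

/* Integrand: the integral of 4 / (1 + x^2) over [0, 1] equals pi. */
static double f(double x) { return 4.0 / (1.0 + x * x); }

/* Partial result of process `myid` out of `numprocs`: the midpoint rule
   with n intervals of width h = 1/n, where intervals are dealt out
   cyclically (interval i goes to process (i - 1) % numprocs). */
double local_pi(int n, int myid, int numprocs) {
    assert(n > 0 && numprocs > 0 && myid >= 0 && myid < numprocs);
    double h = 1.0 / (double)n;
    double sum = 0.0;
    for (int i = myid + 1; i <= n; i += numprocs) {
        double x = h * ((double)i - 0.5); /* midpoint of interval i */
        sum += f(x);
    }
    return h * sum;
}

/* Sequential stand-in for a sum reduction to rank 0: the root would
   receive and add up the partial result of every rank. */
double reduce_pi(int n, int numprocs) {
    double pi = 0.0;
    for (int rank = 0; rank < numprocs; rank++)
        pi += local_pi(n, rank, numprocs);
    return pi;
}
```

In the MPI version each rank would compute `mypi = local_pi(n, myid, numprocs)` and call `MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD)`, leaving the total in `pi` on P0 only.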

Parallelizing a Program

Given a sequential program or algorithm, how do we go about producing a parallel version?

Four steps in program parallelization

1. Decomposition

Identifying parallel tasks that expose as much potential parallel activity as possible

2. Assignment

Grouping the tasks into processes so that the load is as evenly balanced as possible

3. Orchestration

Reducing synchronization and communication costs

4. Mapping

Mapping of processes to processors
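As an illustration of the assignment step only, here is one simple static scheme, assumed rather than taken from the slides: split ntasks equally costly tasks into contiguous blocks, one per process, whose sizes differ by at most one. The function name `block_range` is hypothetical.

```c
#include <assert.h>

/* Half-open range [*lo, *hi) of task indices owned by process `rank`
   when `ntasks` tasks are split into `nprocs` contiguous blocks. The first
   ntasks % nprocs processes get one extra task, so block sizes differ by
   at most one. */
void block_range(int ntasks, int nprocs, int rank, int *lo, int *hi) {
    assert(ntasks >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    int base = ntasks / nprocs;
    int extra = ntasks % nprocs;
    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}
```

For 10 tasks on 4 processes this gives the ranges [0, 3), [3, 6), [6, 8) and [8, 10). Block assignment balances load only when tasks cost about the same; when costs vary, a cyclic or dynamic assignment is often preferred.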

Example 1: Barrier Implementation

- What is a barrier?
  - A process synchronization primitive
  - If n cooperating processes all include a call to the barrier primitive …
    - Each entering process gets blocked on the barrier call until all n processes have reached the barrier call
  - Thus, the n processes are synchronized on departure from the barrier call
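To see these semantics in runnable form, the sketch below uses the POSIX threads barrier (`pthread_barrier_init`, `pthread_barrier_wait`), with threads standing in for the n cooperating processes. This is a shared-memory analogue chosen for illustration, not one of the message-passing implementations in the slides; `barrier_demo` and `barrier_worker` are hypothetical names, and `pthread_barrier_t` is an optional POSIX feature (available on Linux, for example).

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS 64

static pthread_barrier_t barrier;
static int nthreads;
static int arrived[MAX_THREADS]; /* arrived[i] == 1 once thread i reaches the barrier */

/* Each thread records its arrival, waits at the barrier, then counts the
   arrivals it can see. No thread leaves pthread_barrier_wait until all
   nthreads have entered it, so every thread should see all of them. */
static void *barrier_worker(void *arg) {
    int id = (int)(intptr_t)arg;
    arrived[id] = 1;
    pthread_barrier_wait(&barrier);
    int seen = 0;
    for (int i = 0; i < nthreads; i++)
        seen += arrived[i];
    return (void *)(intptr_t)seen;
}

/* Runs n threads through one barrier and returns how many of them saw
   all n arrivals after leaving it. */
int barrier_demo(int n) {
    pthread_t threads[MAX_THREADS];
    assert(n > 0 && n <= MAX_THREADS);
    nthreads = n;
    for (int i = 0; i < n; i++)
        arrived[i] = 0;
    pthread_barrier_init(&barrier, NULL, (unsigned)n);
    for (int i = 0; i < n; i++)
        pthread_create(&threads[i], NULL, barrier_worker, (void *)(intptr_t)i);
    int ok = 0;
    for (int i = 0; i < n; i++) {
        void *seen;
        pthread_join(threads[i], &seen);
        if ((int)(intptr_t)seen == n)
            ok++;
    }
    pthread_barrier_destroy(&barrier);
    return ok;
}
```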

Linear Barrier Pseudocode

[Figure: processes P0 to P7]

When a process reaches the barrier call, it sends a message to the master process.

Linear Barrier Pseudocode

[Figure: processes P0 to P7]

When the master process has received n messages, it sends a message to each of the participating processes to go ahead.
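A minimal sketch of the linear barrier, assuming the master is P0 and is itself one of the n participants, so it waits for the other n - 1 arrival messages before sending n - 1 go-ahead messages. Threads stand in for processes and POSIX semaphores stand in for messages; `linear_barrier`, `linear_worker` and `linear_barrier_demo` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>

#define MAX_PROCS 64

/* Semaphores stand in for messages (an assumption of this sketch):
   posting to_master means "I have arrived"; posting go[i] means "go ahead, i". */
static sem_t to_master;
static sem_t go[MAX_PROCS];
static int nprocs;
static int reached[MAX_PROCS]; /* reached[i] == 1 once process i calls the barrier */

/* Linear barrier with rank 0 as the master (assumed). */
static void linear_barrier(int rank) {
    if (rank == 0) {
        for (int i = 1; i < nprocs; i++)
            sem_wait(&to_master);  /* receive n - 1 arrival messages */
        for (int i = 1; i < nprocs; i++)
            sem_post(&go[i]);      /* send a go-ahead to each other process */
    } else {
        sem_post(&to_master);      /* tell the master we have arrived */
        sem_wait(&go[rank]);       /* block until the master says go */
    }
}

static void *linear_worker(void *arg) {
    int rank = (int)(intptr_t)arg;
    reached[rank] = 1;
    linear_barrier(rank);
    int seen = 0;
    for (int i = 0; i < nprocs; i++)
        seen += reached[i];
    return (void *)(intptr_t)seen;
}

/* Runs n processes (modelled as threads) through the barrier once and
   returns how many of them saw all n arrivals after leaving it. */
int linear_barrier_demo(int n) {
    pthread_t threads[MAX_PROCS];
    assert(n > 0 && n <= MAX_PROCS);
    nprocs = n;
    sem_init(&to_master, 0, 0);
    for (int i = 0; i < n; i++) {
        sem_init(&go[i], 0, 0);
        reached[i] = 0;
    }
    for (int i = 0; i < n; i++)
        pthread_create(&threads[i], NULL, linear_worker, (void *)(intptr_t)i);
    int ok = 0;
    for (int i = 0; i < n; i++) {
        void *seen;
        pthread_join(threads[i], &seen);
        if ((int)(intptr_t)seen == n)
            ok++;
    }
    for (int i = 0; i < n; i++)
        sem_destroy(&go[i]);
    sem_destroy(&to_master);
    return ok;
}
```

The master's receives and sends happen one after another, so its time in the barrier grows linearly with n; this is what the tree and butterfly alternatives try to reduce.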

Tree Barrier

Alternatively …

[Figure: processes P0 to P7 arranged as a tree]

The master does 3 receives and then 3 sends.
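The exact tree in the figure is not recoverable from the text. The sketch below assumes a binomial-tree pairing: at stage s (counting from 0), a process whose rank r satisfies r mod 2^(s+1) = 2^s reports to r - 2^s. With 8 processes, P0 then receives from P1, P2 and P4, which matches the master doing 3 receives and then 3 sends. All function names are hypothetical, and a 32-bit mask of "known arrivals" stands in for the actual messages.

```c
#include <assert.h>
#include <stdint.h>

/* Arrival phase, stage s = 0, 1, 2, ...: returns the rank that `rank`
   sends its "arrived" message to, or -1 if it does not send at stage s. */
int tree_send_to(int rank, int s) {
    int step = 1 << s;
    return (rank % (2 * step) == step) ? rank - step : -1;
}

/* Returns the rank that `rank` receives an "arrived" message from at
   stage s, or -1 if it receives nothing at that stage. */
int tree_recv_from(int rank, int s, int n) {
    int step = 1 << s;
    return (rank % (2 * step) == 0 && rank + step < n) ? rank + step : -1;
}

/* Number of receives done by the master (rank 0) in the arrival phase.
   In the release phase it sends the same number of go-ahead messages,
   and each child forwards the go-ahead down its own subtree. */
int master_receives(int n) {
    int count = 0;
    for (int s = 0; (1 << s) < n; s++)
        if (tree_recv_from(0, s, n) != -1)
            count++;
    return count;
}

/* Simulates the arrival phase for n <= 32 processes: known[r] has one bit
   per process whose arrival rank r has learned of. Returns the master's
   set at the end; it should contain every process. */
uint32_t tree_arrival_mask(int n) {
    uint32_t known[32];
    assert(n >= 1 && n <= 32);
    for (int r = 0; r < n; r++)
        known[r] = (uint32_t)1 << r;
    for (int s = 0; (1 << s) < n; s++)
        for (int r = 0; r < n; r++) {
            int from = tree_recv_from(r, s, n);
            if (from != -1)
                known[r] |= known[from]; /* child reports its whole subtree */
        }
    return known[0];
}
```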

Butterfly Barrier

Alternatively …

[Figure: processes P0 to P7]

Each process does 3 send-receives:
Stage 1: P0-P1, P2-P3, P4-P5, P6-P7
Stage 2: P0-P2, P1-P3, P4-P6, P5-P7
Stage 3: P0-P4, P1-P5, P2-P6, P3-P7
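The stage pairings of the butterfly barrier follow one rule: at stage s (counting from 0), process r exchanges with process r XOR 2^s. The sketch below checks that after log2 n stages every process knows that all n processes have arrived, so each can leave the barrier without a master. It assumes n is a power of two and at most 32, so a set of processes fits in a `uint32_t` bitmask; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Partner of `rank` at stage s (s = 0, 1, 2, ...): flip bit s of the rank. */
int butterfly_partner(int rank, int s) { return rank ^ (1 << s); }

/* Simulates the butterfly barrier for n processes (n a power of two, at
   most 32). known[r] has one bit per process that rank r knows has arrived.
   Returns the number of stages run and sets *all_known to 1 if every
   process knows of all n arrivals at the end. */
int butterfly_simulate(int n, int *all_known) {
    uint32_t known[32], next[32];
    assert(n >= 1 && n <= 32 && (n & (n - 1)) == 0);
    for (int r = 0; r < n; r++)
        known[r] = (uint32_t)1 << r; /* initially each knows only itself */
    int stages = 0;
    for (int s = 0; (1 << s) < n; s++, stages++) {
        /* each process does one send-receive with its partner and merges */
        for (int r = 0; r < n; r++)
            next[r] = known[r] | known[butterfly_partner(r, s)];
        for (int r = 0; r < n; r++)
            known[r] = next[r];
    }
    uint32_t full = (n == 32) ? UINT32_MAX : (((uint32_t)1 << n) - 1);
    *all_known = 1;
    for (int r = 0; r < n; r++)
        if (known[r] != full)
            *all_known = 0;
    return stages;
}
```

With n = 8, stage s = 1 pairs P5 with P7 and stage s = 2 pairs P2 with P6, matching stages 2 and 3 of the listing.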

Some Decomposition Options

1. A parallel task for each element update
   - Maximum parallelism: n
   - Synchronization required: wait for left & top values
   - High synchronization cost
   [Figure: Option 1, one task per element]
2. A parallel task for each anti-diagonal
   [Figure: Option 2, anti-diagonals]

[Figure: Option 3, blocks of rows]
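The slides do not say which computation is being decomposed, only that each element update waits for its left and top neighbours. The sketch below assumes a hypothetical update rule `update(top, left) = top + left + 1` on a grid whose row 0 and column 0 are fixed boundary values and whose interior is n x n (`INTERIOR` here). It compares a row-major sweep with an anti-diagonal (wavefront) sweep and reports the longest anti-diagonal, which is the maximum number of updates that could proceed in parallel.

```c
#include <assert.h>

#define INTERIOR 7            /* n: interior elements per side (assumed) */
#define N (INTERIOR + 1)      /* row 0 and column 0 hold fixed boundary values */

/* Hypothetical update rule: an element depends on its top and left neighbours. */
static long update(long top, long left) { return top + left + 1; }

/* Reference: sequential row-major sweep. */
void sweep_row_major(long a[N][N]) {
    for (int i = 1; i < N; i++)
        for (int j = 1; j < N; j++)
            a[i][j] = update(a[i - 1][j], a[i][j - 1]);
}

/* Wavefront sweep: update the interior one anti-diagonal (i + j == d) at a
   time. An element's top and left neighbours lie on diagonal d - 1, so the
   updates within one diagonal are independent and could run in parallel.
   Returns the length of the longest anti-diagonal. */
int sweep_anti_diagonal(long a[N][N]) {
    int done[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            done[i][j] = (i == 0 || j == 0); /* boundary is ready from the start */
    int widest = 0;
    for (int d = 2; d <= 2 * INTERIOR; d++) {
        int count = 0;
        for (int i = 1; i < N; i++) {
            int j = d - i;
            if (j < 1 || j >= N)
                continue;
            assert(done[i - 1][j] && done[i][j - 1]); /* inputs come from earlier diagonals */
            a[i][j] = update(a[i - 1][j], a[i][j - 1]);
            done[i][j] = 1;
            count++;
        }
        if (count > widest)
            widest = count;
    }
    return widest;
}
```

Both sweeps produce the same grid, and the longest anti-diagonal has n elements, consistent with "Maximum parallelism: n" for the one-task-per-element option.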

High Performance Computing

Program execution: Compilation, Object files, Function call and return, Address space, Data & its representation (4)
Computer organization: Memory, Registers, Instruction set architecture, Instruction processing (6)
Virtual memory: Address translation, Paging (4)
Operating system: Processes, System calls, Process management (6)
Pipelined processors: Structural, data and control hazards, impact on programming (4)
Cache memory: Organization, impact on programming (5)
Program profiling (2)
File systems: Disk management, Name management, Protection (4)
Parallel programming: Inter-process communication, Synchronization, Mutual exclusion, Parallel architecture, Programming with message passing using MPI (5)
