Calculating Program - High Performance Computing - Lecture Slides, Slides of Computer Science

Some concepts of High Performance Computing are Addressing Modes, Program Execution, Basic Computer Organization, Control Hazard Solutions, Least Recently Used, Memory Hierarchy Progression. Main points of this lecture are: Calculating Program, Master Process, Number of Intervals, Slave Process, Decomposition, Assignment, Orchestration, Mapping, Mapping of Processes to Processors

Typology: Slides

2012/2013

Uploaded on 04/28/2013

dewaan

High Performance Computing

Lecture 41

Example: MPI Pi Calculating Program

// Each process initializes, determines the communicator size and its own rank
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);

// The master process (P0) takes input from the user
if (myid == 0) {
    printf("Enter the number of intervals: ");
    scanf("%d", &n);
}

// The master process broadcasts the value of n to all processes
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
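The slide that finishes this program is not part of the preview. As a rough sketch only, assuming the usual midpoint-rule approach of integrating 4/(1+x²) over [0, 1], the work each process does after the broadcast might look like the following. The names `partial_pi` and `combine_partials` are hypothetical, and the loop in `combine_partials` is a stand-in for an `MPI_Reduce` with `MPI_SUM`, so that the sketch runs without an MPI installation.

```c
#include <assert.h>

/* Hypothetical helper: the share of the midpoint-rule sum handled by one
 * process. Rank myid takes intervals myid, myid + numprocs, ... (cyclic). */
double partial_pi(int n, int myid, int numprocs)
{
    assert(n > 0 && numprocs > 0 && myid >= 0 && myid < numprocs);
    double h = 1.0 / (double)n;            /* width of one interval */
    double sum = 0.0;
    for (int i = myid; i < n; i += numprocs) {
        double x = h * ((double)i + 0.5);  /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}

/* Stand-in for MPI_Reduce(..., MPI_SUM, 0, MPI_COMM_WORLD): adds up the
 * partial results of all ranks, as the master process would receive them. */
double combine_partials(int n, int numprocs)
{
    double pi = 0.0;
    for (int rank = 0; rank < numprocs; rank++)
        pi += partial_pi(n, rank, numprocs);
    return pi;
}
```

In a real MPI program each rank would call `partial_pi(n, myid, numprocs)` once and pass its result to the reduction; the cyclic split is one possible assignment of intervals, not necessarily the one used in the lecture.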

Parallelizing a Program

Given a sequential program/algorithm, how do we go about producing a parallel version?

Four steps in program parallelization

1. Decomposition: identifying parallel tasks with a large extent of possible parallel activity
2. Assignment: grouping the tasks into processes with the best load balancing
3. Orchestration: reducing synchronization and communication costs
4. Mapping: mapping of processes to processors

Example 1: Barrier Implementation

• What is a barrier?
  • A process synchronization primitive
  • If n cooperating processes all include a call to the barrier primitive …
  • Each entering process gets blocked on the barrier call until all the n processes have reached the barrier call
  • Thus, the n processes are synchronized on departure from the barrier call

Linear Barrier Pseudocode

[Figure: processes P0 to P7]

When a process reaches the barrier call, it sends a message to the master process.

When the master process has received n messages, it sends a message to each of the participating processes to go ahead.
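The two steps of the linear barrier can be sketched as a small single-threaded simulation. This is an illustration under assumptions, not the lecture's pseudocode: the `LinearBarrier` struct, `lb_init` and `lb_arrive` are hypothetical names, the master is modelled as a separate coordinator that receives n messages, and counters stand in for real message passing.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 64

/* Hypothetical bookkeeping for a simulated linear barrier. */
typedef struct {
    int  n;                    /* number of participating processes */
    int  arrived;              /* arrival messages the master has received */
    int  messages;             /* total messages sent so far */
    bool released[MAX_PROCS];  /* true once a process was told to go ahead */
} LinearBarrier;

void lb_init(LinearBarrier *b, int n)
{
    assert(n > 0 && n <= MAX_PROCS);
    b->n = n;
    b->arrived = 0;
    b->messages = 0;
    for (int i = 0; i < n; i++)
        b->released[i] = false;
}

/* Process `rank` reaches the barrier call and notifies the master. */
void lb_arrive(LinearBarrier *b, int rank)
{
    assert(rank >= 0 && rank < b->n);
    b->arrived++;
    b->messages++;                        /* arrival message to the master */
    if (b->arrived == b->n) {             /* master has received n messages */
        for (int i = 0; i < b->n; i++) {  /* master sends n go-ahead messages */
            b->released[i] = true;
            b->messages++;
        }
    }
}
```

With 8 processes, no process is released until the eighth call to `lb_arrive`, and the master handles 8 receives followed by 8 sends one after another, which is why this scheme is called linear.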

Alternatively …

Tree Barrier

[Figure: processes P0 to P7 arranged as a tree]

Master does 3 receives and then 3 sends
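The figure for the tree barrier is lost in this preview, so the exact tree shape is not known. A minimal sketch, assuming a binomial tree over n = 2^k processes rooted at P0 (which is consistent with the master doing 3 receives for 8 processes), could compute each process's parent and number of children as follows. The function names are hypothetical.

```c
#include <assert.h>

/* Number of children of `rank` in a binomial tree over n = 2^k processes.
 * In the arrival phase a process receives once from each child; in the
 * release phase it sends once to each child. */
int tree_num_children(int rank, int n)
{
    assert(n > 0 && (n & (n - 1)) == 0);  /* n must be a power of two */
    assert(rank >= 0 && rank < n);
    int count = 0;
    for (int step = 1; step < n; step <<= 1) {
        if (rank & step)
            break;     /* at this step rank reports to its own parent */
        count++;       /* rank receives from child rank + step */
    }
    return count;
}

/* Parent of `rank`: clear its lowest set bit. The root (rank 0) has no
 * parent, and the function returns 0 for it. */
int tree_parent(int rank)
{
    assert(rank >= 0);
    return rank & (rank - 1);
}
```

Under this assumption P0 has children P1, P2 and P4, so it does log2(8) = 3 receives and then 3 sends, instead of the n receives and n sends of the linear barrier.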

Alternatively …

Butterfly Barrier

[Figure: processes P0 to P7]

Each process does 3 send-receives
Stage 1: P0-P1, P2-P3, P4-P5, P6-P7
Stage 2: P0-P2, P1-P3, P4-P6, P5-P7
Stage 3: P0-P4, P1-P5, P2-P6, P3-P7
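The stage pairings above follow a simple rule: in stage s, a process exchanges messages with the process whose rank differs from its own only in bit s-1. A small sketch of that rule, with hypothetical function names and assuming the number of processes is a power of two:

```c
#include <assert.h>

/* Partner of `rank` in the given 1-based stage of a butterfly barrier:
 * flip bit (stage - 1) of the rank. */
int butterfly_partner(int rank, int stage)
{
    assert(rank >= 0 && stage >= 1);
    return rank ^ (1 << (stage - 1));
}

/* Number of stages for n = 2^k processes, that is log2(n). */
int butterfly_stages(int n)
{
    assert(n > 0 && (n & (n - 1)) == 0);  /* n must be a power of two */
    int stages = 0;
    while ((1 << stages) < n)
        stages++;
    return stages;
}
```

For 8 processes `butterfly_stages(8)` is 3, matching the 3 send-receives per process, and there is no master: every process does the same amount of communication.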

Some Decomposition Options

1. A parallel task for each element update

Option 1

Some Decomposition Options..

1. A parallel task for each element update

   • Maximum parallelism: n²
   • Synchronization required: wait for left & top values
   • High synchronization cost

2. A parallel task for each anti-diagonal

Option 2 Anti-diagonals
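The slide stating the underlying problem is not in this preview. As a sketch only, assume an n × n grid in which each element update needs the already updated left and top neighbours, as the synchronization note suggests. Elements on the same anti-diagonal (i + j = d) then do not depend on each other, so each anti-diagonal can be one parallel task. The update rule and all names below are hypothetical and chosen only to make the dependency visible.

```c
#include <assert.h>

#define N 4  /* illustrative grid size */

/* Hypothetical update rule: new value = left + top + 1, with neighbours
 * outside the grid treated as 0. */
static int update(int g[N][N], int i, int j)
{
    int left = (j > 0) ? g[i][j - 1] : 0;
    int top  = (i > 0) ? g[i - 1][j] : 0;
    return left + top + 1;
}

/* Sequential reference: plain row-major sweep. */
void sweep_rows(int g[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            g[i][j] = update(g, i, j);
}

/* Anti-diagonal sweep: diagonals run in order, but the iterations of the
 * inner loop are independent and could run in parallel. Returns the length
 * of the longest anti-diagonal, the peak parallelism of this option. */
int sweep_antidiagonals(int g[N][N])
{
    int max_width = 0;
    for (int d = 0; d <= 2 * (N - 1); d++) {
        int lo = (d < N) ? 0 : d - N + 1;  /* first row on diagonal d */
        int hi = (d < N) ? d : N - 1;      /* last row on diagonal d */
        for (int i = lo; i <= hi; i++)
            g[i][d - i] = update(g, i, d - i);
        if (hi - lo + 1 > max_width)
            max_width = hi - lo + 1;
    }
    return max_width;
}
```

Both sweeps produce the same grid, and synchronization is needed only between consecutive anti-diagonals rather than for every element.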

Option 3 Blocks of rows
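For the blocks-of-rows option, one common way to split n rows into contiguous blocks with balanced sizes is sketched below. This is an assumption about how the blocks could be formed, not the lecture's method, and `block_of_rows` is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: rows [*first, *last) assigned to process `rank` when
 * n rows are split into numprocs contiguous blocks. The first n % numprocs
 * processes get one extra row, so block sizes differ by at most one. */
void block_of_rows(int n, int numprocs, int rank, int *first, int *last)
{
    assert(n >= 0 && numprocs > 0 && rank >= 0 && rank < numprocs);
    int base  = n / numprocs;  /* rows every process gets */
    int extra = n % numprocs;  /* leftover rows to spread over low ranks */
    *first = rank * base + (rank < extra ? rank : extra);
    *last  = *first + base + (rank < extra ? 1 : 0);
}
```

For example, 10 rows over 4 processes gives the blocks [0, 3), [3, 6), [6, 8) and [8, 10). Compared with the per-element option this reduces the number of tasks, and with it the amount of synchronization, at the price of less parallelism.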

High Performance Computing

  1. Program execution: Compilation, Object files, Function call and return, Address space, Data & its representation (4)
  2. Computer organization: Memory, Registers, Instruction set architecture, Instruction processing (6)
  3. Virtual memory: Address translation, Paging (4)
  4. Operating system: Processes, System calls, Process management (6)
  5. Pipelined processors: Structural, data and control hazards, impact on programming (4)
  6. Cache memory: Organization, impact on programming (5)
  7. Program profiling (2)
  8. File systems: Disk management, Name management, Protection (4)
  9. Parallel programming: Inter-process communication, Synchronization, Mutual exclusion, Parallel architecture, Programming with message passing using MPI (5)