Basic Communication Operations-Parallel Processing-Lecture Slides, Slides of Parallel Computing and Programming

Prof. Bhairav Gupta delivered this lecture at Ankit Institute of Technology and Science for Parallel Processing course. It includes: SPMD, Program, Elements, SMP, System, Parallel, Machines, Communication, Costs, Programming, Model, Semantics

Typology: Slides

2011/2012

Uploaded on 07/23/2012

Quiz #04
Marks: 10    Time: 10 minutes
Write an SPMD OpenMP-based program to compute the sum of 1024 elements of an array using an 8-core SMP system.

Basic Communication Operations

Message Passing Costs in Parallel Computers

• The total time to transfer a message over a network comprises the following:
  – Startup time ($t_s$): Time spent preparing (setting up) a message (header, trailer, error-correction information, etc.) and interfacing with the network.
  – Per-hop time ($t_h$): Time for header processing at each node switch per hop.
  – Per-word transfer time ($t_w$): Time to transmit and buffer one word of the message between two communicating nodes.

Store-and-Forward Routing

• A message traversing multiple hops is completely received at an intermediate hop before being forwarded to the next hop.

• The total communication cost for a message of size $m$ words to traverse $l$ communication links is

$$t_{comm} = t_s + (m\, t_w + t_h)\, l$$

• In most platforms, $t_h$ is small and the above expression can be approximated by

$$t_{comm} = t_s + m\, l\, t_w$$
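As a rough illustration of the store-and-forward cost, the sketch below evaluates both the full expression and its approximation. It assumes the usual model in which every link costs one header-processing time plus the transfer of the whole message, i.e. $t_s + (m\, t_w + t_h)\, l$. The function names and the parameter values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def store_and_forward_time(m, l, t_s, t_h, t_w):
    """Full model: startup once, then (whole message + header) on every link."""
    return t_s + (m * t_w + t_h) * l


def store_and_forward_approx(m, l, t_s, t_w):
    """Approximation that drops the per-hop term t_h."""
    return t_s + m * l * t_w


# Hypothetical parameter values (arbitrary time units), for illustration only.
t_s, t_h, t_w = 50.0, 0.5, 1.0
m, l = 1000, 4  # message size in words, number of links

full = store_and_forward_time(m, l, t_s, t_h, t_w)
approx = store_and_forward_approx(m, l, t_s, t_w)
print(f"full = {full}, approx = {approx}, difference = {full - approx}")
```

With these values the two results differ only by $t_h\, l$, which is small next to the $m\, l\, t_w$ term.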

Packet Routing

• Store-and-forward makes poor use of communication resources.
• Packet routing breaks messages into packets and pipelines them through the network.
• Since packets may take different paths, each packet must carry routing information, error checking, sequencing, and other related header information.
• The total communication time for packet routing is approximated by:

$$t_{comm} = t_s + t_h\, l + t_w\, m$$

• The factor $t_h$ accounts for overheads in packet headers.
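To see why pipelining helps, the following sketch evaluates a packet-routing cost for a growing number of links. It assumes the common textbook approximation $t_s + t_h\, l + t_w\, m$, in which the message-size term is paid once rather than once per link. The function name and parameter values are hypothetical.

```python
def packet_routing_time(m, l, t_s, t_h, t_w):
    """Assumed approximation: startup + per-hop cost on l links + one pipelined transfer."""
    return t_s + t_h * l + t_w * m


# Hypothetical parameter values (arbitrary time units), for illustration only.
t_s, t_h, t_w = 50.0, 0.5, 1.0
m = 1000  # message size in words

for l in (1, 2, 4, 8):
    print(l, packet_routing_time(m, l, t_s, t_h, t_w))
```

Under this assumption, adding a link adds only $t_h$ to the total, whereas store-and-forward would add a full message transfer per link.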

Cut-Through Routing

• Takes the concept of packet routing to an extreme by further dividing messages into basic units called flits.
• Since flits are typically small, the header information must be minimized.
• This is done by forcing all flits to take the same path, in sequence.
• A tracer message first programs all intermediate routers. All flits then take the same route.
• Error checks are performed on the entire message, as opposed to flits.
• No sequence numbers are needed.

Communication Patterns in Different Topologies

• One-to-All Broadcast and All-to-One Reduction
• All-to-All Broadcast and Reduction, and All-Reduce Operations
• Scatter and Gather (one-to-all and all-to-one personalized communication)

Group Communication Operations:

• Group communication operations are built using point-to-point messaging primitives.
• Recall from our discussion of architectures that communicating a message of size $m$ over an uncongested network takes time $t_s + t_w m$.
• We use this as the basis for our analyses. Where necessary, we take congestion into account explicitly by scaling the $t_w$ term.
• We assume that the network is bidirectional and that communication is single-ported.

One-to-All Broadcast and All-to-One Reduction on Rings

• The simplest way is to send $p-1$ messages from the source to the other $p-1$ processors. This is not very efficient.
• Use recursive doubling: the source sends a message to a selected processor. We now have two independent problems defined over halves of the machine.
• Reduction can be performed in an identical fashion by inverting the process.
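The recursive-doubling idea can be sketched as a small simulation that lists who sends to whom in each step. This is a minimal sketch, assuming that $p$ is a power of two and that the "selected processor" in each step is the one halfway across the sender's current partition. The function names are hypothetical.

```python
def recursive_doubling_broadcast(p, source=0):
    """Return the broadcast schedule: one list of (sender, receiver) pairs per step."""
    assert p > 0 and p & (p - 1) == 0, "this sketch assumes p is a power of two"
    have = [source]          # nodes that already hold the message
    steps = []
    distance = p // 2        # halves in every step: p/2, p/4, ..., 1
    while distance >= 1:
        transfers = [(s, (s + distance) % p) for s in have]
        have += [r for _, r in transfers]
        steps.append(transfers)
        distance //= 2
    return steps


def reduction_schedule(p, root=0):
    """Reduction inverts the broadcast: reverse the steps and swap the directions."""
    return [[(r, s) for s, r in step]
            for step in reversed(recursive_doubling_broadcast(p, root))]


for t, step in enumerate(recursive_doubling_broadcast(8), start=1):
    print(f"step {t}: {step}")

# All-to-one sum reduction of one value per node, using the inverted schedule.
values = list(range(8))
for step in reduction_schedule(8):
    for src, dst in step:
        values[dst] += values[src]
print("sum at node 0:", values[0])
```

The number of holders doubles in every step, so the broadcast finishes in $\log_2 p$ steps instead of the $p-1$ sequential sends of the simple approach.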

One-to-All Broadcast on a Ring: Recursive doubling

One-to-all broadcast on an eight-node ring. Node 0 is the source of the broadcast. Each message transfer step is shown by a numbered, dotted arrow from the source of the message to its destination. The number on an arrow indicates the time step during which the message is transferred.

Broadcast and Reduction on a Mesh:

One-to-all broadcast on a 16-node mesh.

Broadcast and Reduction on a Hypercube

One-to-all broadcast on a three-dimensional hypercube. The binary representations of node labels are shown in parentheses.
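The figure itself is not reproduced here, so the following is only a sketch of one common way to broadcast on a hypercube. It assumes that the message is sent across the highest dimension first and that, in each step, every node holding the message forwards it to the neighbor whose label differs in the current bit. The function name is hypothetical.

```python
def hypercube_broadcast(d, source=0):
    """Return one list of (sender, receiver) pairs per step on a d-dimensional hypercube."""
    have = {source}
    steps = []
    for i in reversed(range(d)):                 # highest dimension first (assumed order)
        transfers = [(n, n ^ (1 << i)) for n in sorted(have)]
        have |= {r for _, r in transfers}        # receivers now hold the message too
        steps.append(transfers)
    return steps


d = 3
for t, step in enumerate(hypercube_broadcast(d), start=1):
    labelled = [f"{s} ({s:0{d}b}) -> {r} ({r:0{d}b})" for s, r in step]
    print(f"step {t}: " + ", ".join(labelled))
```

Every transfer flips exactly one bit of the sender's label, so each message travels over a single hypercube link, and all $2^d$ nodes are reached after $d$ steps.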