Scope of Parallelism Continued-Parallel Processing-Lecture Slides, Slides of Parallel Computing and Programming

Prof. Bhairav Gupta delivered this lecture at Ankit Institute of Technology and Science for Parallel Processing course. It includes: Parallelism, Processor, Memory, System, Performance, , Latency, Prefetch, Bandwidth, Datapath, Bottlenecks

Typology: Slides

2011/2012

Uploaded on 07/23/2012

paramita
paramita 🇮🇳

4.6

(16)

120 documents

1 / 16

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1.2
Scope of Parallelism
Conventional architectures coarsely
comprise of a processor, memory
system, and the datapath.
Each of these components present
significant performance bottlenecks.
Parallelism addresses each of these
components in significant ways.
It is important to understand each of
these performance bottlenecks.
docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff

Partial preview of the text

Download Scope of Parallelism Continued-Parallel Processing-Lecture Slides and more Slides Parallel Computing and Programming in PDF only on Docsity!

Scope of Parallelism

  • Conventional architectures coarsely

comprise of a processor, memorysystem, and the datapath.

  • Each of these components present

significant performance bottlenecks.

  • Parallelism addresses each of these

components in significant ways.

  • It is important to understand each of

these performance bottlenecks.

docsity.com

Limitations of

Memory System Performance

•^

Memory system, and not processor speed, is oftenthe bottleneck for many applications.

-^

Memory system performance is largely captured bytwo parameters, latency and bandwidth.

-^

Latency is the time from the issue of a memoryrequest to the time the data is available at theprocessor.

-^

Bandwidth is the rate at which data can be pumpedto the processor by the memory system.

-^

Prefetch & Caches help to alleviate the problems

docsity.com

Control Structure of Parallel

Programs

• Parallelism can be expressed at various

levels of granularity – e.g instructionlevel to process-level (fine vs coarse).

• Between these extremes exist a range

of models, along with correspondingarchitectural support.

docsity.com

Flynn’s Classification for Parallel

Architecture

•^

Instruction Stream & Data Streams basedclassification (SISD, MISD, SIMD, MIMD)

-^

Processing units in parallel computers either operateunder the centralized control of a single control unitor work independently.

-^

If there is a single control unit that dispatches thesame instruction to various processors (that work ondifferent data), the model is referred to as singleinstruction stream, multiple data stream (SIMD).

-^

If each processor has its own control unit, eachprocessor can execute different instructions ondifferent data items. This model is called multipleinstruction stream, multiple data stream (MIMD).

docsity.com

SIMD Processors

•^

Some of the earliest parallel computers such as theIlliac IV, MPP, DAP, CM-2, and MasPar MP-1 belongedto this class of machines.

-^

Variants of this concept have found use in co-processingunits such as the MMX units in Intel processors and DSPchips such as the Sharc.

-^

SIMD relies on the regular structure of computations(such as those in image processing).

-^

It is often necessary to selectively turn off operations oncertain data items. For this reason, most SIMDprogramming paradigms allow for an ``activity mask'',which determines if a processor should participate in acomputation or not.

docsity.com

Conditional Execution in SIMD Processors

Idle

Idle

(a) Step 2(b)

Idle

Step 1 Initial values

Idle

B C

0

A B C^

0

A B C

0

A B

A

0

else

C

Processor 0

Processor 1

Processor 2

5 0

4 2

1 1

0 0

A B C^

0

A B C

A B C^

0

A B C

5

0

C = A;C = A/B; if (B == 0)

Processor 3

Processor 0

Processor 1

Processor 2

Processor 3

5 0

4 2

1 1

0 0

Processor 0

Processor 1

Processor 2

Processor 3

5 0

4 2

1 1

0 0 0

A B C

A B C

A B C

A B C^

5

1

2

Executing a conditional statement on an SIMD computer with four processors:(a) the conditional statement; (b) the execution of the statement in two steps.

docsity.com

SIMD-MIMD Comparison

•^

Synchronous (SIMD) vs Asynchronous (MIMD) controlflow

-^

Cost: SIMD computers require less hardware than MIMDcomputers (single control unit).

-^

However, since SIMD processors are specially designed,they tend to be expensive and have long design cycles.

-^

Flexibility: SIMD perform very well for specialized /regular applications but Not for all applications. So MIMDare more flexible & general purpose.

-^

In contrast, platforms supporting the SPMD paradigmcan be built from inexpensive off-the-shelf componentswith relatively little effort in a short amount of time.

docsity.com

Shared memory vs Message passing

platforms

•^

There are two primary forms of data exchange betweenparallel tasks - accessing a shared data space andexchanging messages.

-^

Platforms that provide a shared memory for data sharingare called multiprocessors. Shared memory can beCentral or DSM.

-^

Platforms that exchange messaging for sharing data arecalled message passing platforms or multicomputers.

-^

Shared memory platforms have low comm overhead,can support lower grain levels, while message passingmore suited for coarse grain levels

docsity.com

Simplistic view of a small shared memory

multiprocessor (SMP)

Examples:

•^

Dual Pentiums

-^

Quad Pentiums

Processors

Shared memory

Bus

docsity.com

Quad Pentium SharedMemory MultiprocessorProcessorL2 Cache

Bus interface

L1 cache

Processor

L2 Cache Bus interface

L1 cache

Processor

L2 Cache Bus interface

L1 cache

Processor

L2 Cache Bus interface

L1 cache

Memory controller

Memory

I/O interface

I/O bus

Processor/memorybus

Shared memory

docsity.com

Shared-Address-Space

vs.

Shared Memory Machines

-^

It is important to note the difference betweenthe terms shared address space and sharedmemory.

-^

We refer to the former as a programmingabstraction and to the latter as a physicalmachine attribute.

-^

It is possible to provide a shared addressspace using a physically distributed memory.

-^

Symmetric Multiprocessor (SMP=> UMA) &Distributed Shared Memory (DSM => NUMA)

docsity.com