Scope of Parallelism Continued-Parallel Processing-Lecture Slides, Slides of Parallel Computing and Programming

Ankit Institute of Technology and Science Parallel Computing and Programming

Prof. Bhairav Gupta delivered this lecture at Ankit Institute of Technology and Science for the Parallel Processing course. It covers: parallelism, processor, memory system, performance, latency, prefetch, bandwidth, datapath, and bottlenecks.

Typology: Slides

2011/2012

Uploaded on 07/23/2012

paramita 🇮🇳


Scope of Parallelism

• Conventional architectures coarsely consist of a processor, a memory system, and the datapath.
• Each of these components presents significant performance bottlenecks.
• Parallelism addresses each of these components in significant ways.
• It is important to understand each of these performance bottlenecks.


Limitations of Memory System Performance

• The memory system, and not processor speed, is often the bottleneck for many applications.
• Memory system performance is largely captured by two parameters: latency and bandwidth.
• Latency is the time from the issue of a memory request to the time the data is available at the processor.
• Bandwidth is the rate at which data can be pumped to the processor by the memory system.
• Prefetching and caches help alleviate these problems: caches reduce the effective latency by exploiting data reuse, and prefetching hides latency by requesting data before it is needed.
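A rough first-order model makes the latency/bandwidth distinction concrete. Fetching n bytes costs roughly one latency plus n divided by the bandwidth, and a cache with hit ratio h lowers the average access time to h·t_cache + (1 − h)·t_memory. The sketch below is a simplified model under those assumptions (one request at a time, no overlap, no prefetching). The function names and all numbers in the usage lines are hypothetical, chosen only for illustration.

```python
def transfer_time_ns(nbytes: int, latency_ns: float, bandwidth_bytes_per_ns: float) -> float:
    """Simplified cost of one memory request: fixed latency plus streaming time.

    Assumption: a single, non-overlapped request; 1 GB/s is treated as 1 byte/ns.
    """
    return latency_ns + nbytes / bandwidth_bytes_per_ns


def effective_access_time_ns(hit_ratio: float, cache_ns: float, memory_ns: float) -> float:
    """Average access time when a fraction `hit_ratio` of accesses hit in the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be in [0, 1]")
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * memory_ns


if __name__ == "__main__":
    # Hypothetical memory: 100 ns latency, 16 bytes/ns (about 16 GB/s) bandwidth.
    small = transfer_time_ns(8, 100.0, 16.0)       # latency dominates
    large = transfer_time_ns(64_000, 100.0, 16.0)  # bandwidth dominates
    print(f"8 B: {small:.1f} ns, 64,000 B: {large:.1f} ns")
    # Hypothetical cache: 1 ns hit time, 90% hit ratio.
    print(f"effective access: {effective_access_time_ns(0.9, 1.0, 100.0):.1f} ns")
```

With these made-up numbers an 8-byte request takes 100.5 ns, almost all of it latency, while a 64,000-byte transfer takes 4100 ns, almost all of it bandwidth-limited. Latency and bandwidth therefore have to be attacked separately.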


Control Structure of Parallel Programs

• Parallelism can be expressed at various levels of granularity, e.g. from instruction level to process level (fine vs. coarse grain).
• Between these extremes exists a range of models, along with corresponding architectural support.
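One way to see granularity in practice is to vary how much work each parallel task receives. The minimal sketch below uses Python's standard `concurrent.futures` thread pool as a stand-in for processors and splits a summation into tasks of a chosen grain size. A grain of 1 gives many tiny (fine-grained) tasks, and a grain of n/workers gives a few large (coarse-grained) ones. The function names and the `grain` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(n: int, grain: int) -> list[range]:
    """Split indices 0..n-1 into consecutive ranges of at most `grain` items."""
    if grain < 1:
        raise ValueError("grain must be >= 1")
    return [range(start, min(start + grain, n)) for start in range(0, n, grain)]


def parallel_sum(data: list[int], grain: int, workers: int = 4) -> tuple[int, int]:
    """Sum `data` with one task per chunk; returns (total, number_of_tasks)."""
    chunks = decompose(len(data), grain)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task sums one chunk; more tasks means more scheduling overhead.
        partials = pool.map(lambda r: sum(data[r.start:r.stop]), chunks)
        total = sum(partials)
    return total, len(chunks)


if __name__ == "__main__":
    data = list(range(1_000))
    print(parallel_sum(data, grain=1))    # fine-grained: 1000 tasks
    print(parallel_sum(data, grain=250))  # coarse-grained: 4 tasks
```

Python threads do not give real speedup for this CPU-bound loop because of the interpreter lock. The example only shows how the grain size controls the number of tasks and the work per task, and that is what "fine vs. coarse" refers to.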


Flynn’s Classification for Parallel Architecture

• Classification based on instruction streams and data streams: SISD, MISD, SIMD, and MIMD (S = single, M = multiple, I = instruction stream, D = data stream).
• Processing units in parallel computers either operate under the centralized control of a single control unit or work independently.
• If there is a single control unit that dispatches the same instruction to various processors (that work on different data), the model is referred to as single instruction stream, multiple data stream (SIMD).
• If each processor has its own control unit, each processor can execute different instructions on different data items. This model is called multiple instruction stream, multiple data stream (MIMD).
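Flynn's scheme comes down to two questions: is there one instruction stream or several, and is there one data stream or several? The minimal Python sketch below performs that lookup. The function name `flynn_class` is hypothetical, and treating stream counts as simply "one" or "more than one" is a teaching simplification.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Classify an architecture by Flynn's taxonomy.

    Counts are treated as "one" vs. "more than one"; this is a teaching
    simplification, not a precise hardware model.
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be positive")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


if __name__ == "__main__":
    print(flynn_class(1, 1))    # SISD: conventional uniprocessor
    print(flynn_class(1, 64))   # SIMD: one control unit, many data items
    print(flynn_class(16, 16))  # MIMD: each processor has its own control unit
```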


SIMD Processors

• Some of the earliest parallel computers, such as the Illiac IV, MPP, DAP, CM-2, and MasPar MP-1, belonged to this class of machines.
• Variants of this concept have found use in co-processing units such as the MMX units in Intel processors and in DSP chips such as the Sharc.
• SIMD relies on the regular structure of computations (such as those in image processing).
• It is often necessary to selectively turn off operations on certain data items. For this reason, most SIMD programming paradigms allow for an "activity mask", which determines whether a processor should participate in a computation.
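The effect of an activity mask can be sketched in Python by modelling each list position as one processing element (lane). Every lane receives the same instruction, but only lanes whose mask bit is set write a result, and masked-off lanes keep their old value. This is a hypothetical model for illustration: `simd_step` is not part of any real SIMD API. The sketch also skips evaluation on inactive lanes, whereas some hardware computes and then discards those results.

```python
from typing import Callable


def simd_step(op: Callable[[int, int], int], a: list[int], b: list[int],
              dest: list[int], mask: list[bool]) -> list[int]:
    """Broadcast one operation to all lanes; only active lanes update `dest`."""
    if not (len(a) == len(b) == len(dest) == len(mask)):
        raise ValueError("all operands must have one entry per lane")
    # Every lane sees the same instruction (op); the mask decides who commits.
    return [op(x, y) if active else old
            for x, y, old, active in zip(a, b, dest, mask)]


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [10, 20, 30, 40]
    c = [0, 0, 0, 0]
    mask = [True, False, True, False]  # lanes 1 and 3 are switched off
    print(simd_step(lambda x, y: x + y, a, b, c, mask))  # [11, 0, 33, 0]
```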


Conditional Execution in SIMD Processors

(a) The conditional statement: if (B == 0) C = A; else C = A/B;

(b) Execution in two steps on four processors, with initial values (A, B) of (5, 0), (4, 2), (1, 1), and (0, 0) on Processors 0 to 3:
• Step 1: Processors 0 and 3 (where B == 0) execute C = A; Processors 1 and 2 are idle.
• Step 2: Processors 1 and 2 execute C = A/B; Processors 0 and 3 are idle.
Resulting C values on Processors 0 to 3: 5, 2, 1, 0.

Figure: Executing a conditional statement on an SIMD computer with four processors: (a) the conditional statement; (b) the execution of the statement in two steps.
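The two-step execution in the figure can be replayed in a short simulation. Each step broadcasts one branch to all four processors, and the activity mask (B == 0 in step 1, its complement in step 2) decides which processors are active. The sketch assumes integer operands and uses floor division `//` in place of C's `/`; the two agree for the non-negative values in the figure. `run_conditional` is a hypothetical name.

```python
def run_conditional(a: list[int], b: list[int]) -> tuple[list[int], list[int]]:
    """Execute `if (B == 0) C = A; else C = A/B;` SIMD-style in two masked steps.

    Returns the final C values and the number of active processors per step.
    """
    c = [0] * len(a)
    active_per_step = []
    # Step 1: "then" branch; processors with B != 0 sit idle.
    then_mask = [bj == 0 for bj in b]
    # Step 2: "else" branch; processors that took the "then" branch sit idle.
    else_mask = [not m for m in then_mask]
    for mask, branch in ((then_mask, lambda x, y: x), (else_mask, lambda x, y: x // y)):
        c = [branch(x, y) if m else old for x, y, old, m in zip(a, b, c, mask)]
        active_per_step.append(sum(mask))
    return c, active_per_step


if __name__ == "__main__":
    A = [5, 4, 1, 0]   # initial values on Processors 0-3, as in the figure
    B = [0, 2, 1, 0]
    C, active = run_conditional(A, B)
    print(C)       # [5, 2, 1, 0]
    print(active)  # [2, 2]: each step keeps only half the processors busy
```

Both branches are issued to every processor, so the total time is the sum of the two branch times no matter how the data splits. Here only 4 of the 8 processor-steps do useful work.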


SIMD-MIMD Comparison

• Synchronous (SIMD) vs. asynchronous (MIMD) control flow.
• Cost: SIMD computers require less hardware than MIMD computers because they have a single control unit.
• However, since SIMD processors are specially designed, they tend to be expensive and have long design cycles.
• Flexibility: SIMD computers perform very well for specialized, regular applications but not for all applications, so MIMD computers are more flexible and general purpose.
• In contrast to SIMD machines, platforms supporting the SPMD (single program, multiple data) paradigm, a widely used form of MIMD programming, can be built from inexpensive off-the-shelf components with relatively little effort in a short amount of time.
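In the SPMD style every processor runs the same program and uses its own rank to decide what to work on and, if needed, to take a different path. The sketch below imitates this with Python threads standing in for processors. The names `spmd_program`, `rank`, and `size` are illustrative and only loosely modelled on MPI conventions; this is not a real MPI binding.

```python
import threading


def spmd_program(rank: int, size: int, data: list[int],
                 partials: list[int], barrier: threading.Barrier,
                 result: dict) -> None:
    """Same code on every 'processor'; behaviour depends only on `rank`."""
    # Cyclic distribution: this rank handles elements rank, rank+size, ...
    partials[rank] = sum(x * x for x in data[rank::size])
    barrier.wait()  # wait until every rank has written its partial sum
    if rank == 0:
        # Only rank 0 takes this branch: combine the partial results.
        result["sum_of_squares"] = sum(partials)


def run(data: list[int], size: int = 4) -> int:
    partials = [0] * size
    barrier = threading.Barrier(size)
    result: dict = {}
    threads = [threading.Thread(target=spmd_program,
                                args=(r, size, data, partials, barrier, result))
               for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["sum_of_squares"]


if __name__ == "__main__":
    print(run(list(range(10))))  # 285
```

All threads run one function, yet rank 0 follows a different path after the barrier. That branching inside one program is what distinguishes SPMD from the lockstep execution of SIMD.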


Shared memory vs. Message passing platforms

• There are two primary forms of data exchange between parallel tasks: accessing a shared data space and exchanging messages.
• Platforms that provide a shared memory for data sharing are called multiprocessors. Shared memory can be centralized or distributed (distributed shared memory, DSM).
• Platforms that exchange messages for sharing data are called message passing platforms or multicomputers.
• Shared memory platforms have low communication overhead and can support finer-grained parallelism, while message passing is more suited to coarse-grained parallelism.
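The two forms of data exchange can be contrasted in a small Python sketch, with threads standing in for parallel tasks. In the shared-memory version every task updates one shared total under a lock. In the message-passing version tasks never touch shared state and instead send their partial results as messages through a queue to a collector. Both are illustrative only, and the function names are hypothetical. Real message-passing platforms exchange messages between separate address spaces, typically through libraries such as MPI.

```python
import queue
import threading


def shared_memory_sum(chunks: list[list[int]]) -> int:
    total = [0]                # shared data space visible to every task
    lock = threading.Lock()    # protects concurrent updates to `total`

    def task(chunk: list[int]) -> None:
        s = sum(chunk)
        with lock:
            total[0] += s

    threads = [threading.Thread(target=task, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def message_passing_sum(chunks: list[list[int]]) -> int:
    mailbox: queue.Queue = queue.Queue()  # the only channel between tasks

    def task(chunk: list[int]) -> None:
        mailbox.put(sum(chunk))  # send the partial result as a message

    threads = [threading.Thread(target=task, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    # Collector: receive exactly one message per task.
    total = sum(mailbox.get() for _ in chunks)
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    chunks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    print(shared_memory_sum(chunks), message_passing_sum(chunks))  # 45 45
```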


Simplistic view of a small shared memory multiprocessor (SMP)

• Examples: dual Pentiums, quad Pentiums.

Figure: several processors connected by a bus to a shared memory.


Quad Pentium Shared Memory Multiprocessor

Figure: four processors, each with an L1 cache, an L2 cache, and a bus interface, attached to a common processor/memory bus; a memory controller connects the bus to the shared memory, and an I/O interface connects it to the I/O bus.


Shared-Address-Space vs. Shared Memory Machines

• It is important to note the difference between the terms shared address space and shared memory.
• We refer to the former as a programming abstraction and to the latter as a physical machine attribute.
• It is possible to provide a shared address space using a physically distributed memory.
• Symmetric multiprocessors (SMPs) provide uniform memory access (UMA), whereas distributed shared memory (DSM) machines provide non-uniform memory access (NUMA).
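The point that a shared address space can sit on physically distributed memory can be illustrated with a toy distributed array. The program uses one global index space, while the data actually lives in per-node blocks, and each access is translated into a (node, local offset) pair. Counting remote accesses hints at why such DSM machines are NUMA. This is a hypothetical sketch: the class and method names are invented for illustration, and it is not a real DSM implementation.

```python
class DistributedArray:
    """Global indices over block-distributed per-node storage (toy DSM model)."""

    def __init__(self, n: int, nodes: int, home: int = 0) -> None:
        self.block = -(-n // nodes)  # ceiling division: elements per node
        self.n = n
        self.home = home             # node on which the "program" runs
        self.memories = [[0] * min(self.block, max(n - k * self.block, 0))
                         for k in range(nodes)]
        self.remote_accesses = 0

    def locate(self, i: int) -> tuple[int, int]:
        """Translate a global index into (node, local offset)."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        return divmod(i, self.block)

    def _touch(self, node: int) -> None:
        if node != self.home:
            self.remote_accesses += 1  # would cost an interconnect round trip

    def __getitem__(self, i: int) -> int:
        node, off = self.locate(i)
        self._touch(node)
        return self.memories[node][off]

    def __setitem__(self, i: int, value: int) -> None:
        node, off = self.locate(i)
        self._touch(node)
        self.memories[node][off] = value


if __name__ == "__main__":
    arr = DistributedArray(n=8, nodes=4)  # 2 elements per node
    for i in range(8):
        arr[i] = i * 10                   # same syntax for local and remote
    print(arr[5], arr.locate(5), arr.remote_accesses)  # 50 (2, 1) 7
```

The program sees one address space, but 7 of its 9 accesses here would leave node 0. A real DSM system does this translation in hardware or in the runtime and usually caches remote data. In a UMA SMP, by contrast, every address has the same access cost.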
