









Prof. Bhairav Gupta delivered this lecture at Ankit Institute of Technology and Science for the Parallel Processing course. It includes: Parallelism, Processor, Memory, System, Performance, Latency, Prefetch, Bandwidth, Datapath, Bottlenecks
Typology: Slides










docsity.com
Limitations of Memory System Performance
- The memory system, not processor speed, is often the bottleneck for many applications.
- Memory system performance is largely captured by two parameters: latency and bandwidth.
- Latency is the time from the issue of a memory request until the data is available at the processor.
- Bandwidth is the rate at which the memory system can deliver data to the processor.
- Prefetching and caches help to alleviate these problems.
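As a rough illustration of how latency and bandwidth combine, the sketch below uses a common first-order cost model: each request pays the latency once, then the payload streams at the bandwidth. The model, the function name `transfer_time`, and the numbers are assumptions chosen for illustration; they do not come from the slides.

```python
def transfer_time(latency_s: float, bandwidth_bytes_per_s: float, n_bytes: int) -> float:
    """First-order cost model: one latency per request plus streaming time."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


# Hypothetical memory system: 100 ns latency, 10 GB/s bandwidth.
LATENCY = 100e-9
BANDWIDTH = 10e9

# Fetching 64 bytes one 8-byte word at a time pays the latency eight times.
word_by_word = 8 * transfer_time(LATENCY, BANDWIDTH, 8)

# Fetching the same 64 bytes as a single block pays the latency once.
one_block = transfer_time(LATENCY, BANDWIDTH, 64)

print(f"word by word: {word_by_word * 1e9:.1f} ns")
print(f"one block:    {one_block * 1e9:.1f} ns")
```

Under these assumed numbers the block transfer is far cheaper, which is one way to see why fetching whole cache lines and prefetching ahead of use can hide latency.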
Flynn’s Classification for Parallel Architecture
- Classification based on instruction streams and data streams (SISD, SIMD, MISD, MIMD).
- Processing units in parallel computers either operate under the centralized control of a single control unit or work independently.
- If a single control unit dispatches the same instruction to the various processors (which work on different data), the model is referred to as single instruction stream, multiple data stream (SIMD).
- If each processor has its own control unit, each processor can execute different instructions on different data items. This model is called multiple instruction stream, multiple data stream (MIMD).
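The difference between the two models can be sketched in a few lines. This is a sequential simulation only, not real parallel execution; the function names `simd_step` and `mimd_step` and the sample instructions are hypothetical.

```python
from typing import Callable, List


def simd_step(instruction: Callable[[int], int], data: List[int]) -> List[int]:
    """One control unit: the same instruction is applied to every data item."""
    return [instruction(x) for x in data]


def mimd_step(programs: List[Callable[[int], int]], data: List[int]) -> List[int]:
    """One control unit per processor: each runs its own instruction."""
    return [program(x) for program, x in zip(programs, data)]


data = [1, 2, 3, 4]

# SIMD: every processor doubles its own element.
simd_result = simd_step(lambda x: 2 * x, data)

# MIMD: each processor executes a different instruction on its own element.
mimd_result = mimd_step(
    [lambda x: x + 10, lambda x: x * x, lambda x: -x, lambda x: x // 2],
    data,
)

print(simd_result)  # [2, 4, 6, 8]
print(mimd_result)  # [11, 4, -3, 2]
```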
SIMD Processors
- Some of the earliest parallel computers, such as the Illiac IV, MPP, DAP, CM-2, and MasPar MP-1, belonged to this class of machines.
- Variants of this concept have found use in co-processing units such as the MMX units in Intel processors and DSP chips such as the SHARC.
- SIMD relies on the regular structure of computations (such as those in image processing).
- It is often necessary to selectively turn off operations on certain data items. For this reason, most SIMD programming paradigms allow for an "activity mask", which determines whether a processor should participate in a computation or not.
Conditional Execution in SIMD Processors
[Figure: Executing a conditional statement on an SIMD computer with four processors: (a) the conditional statement; (b) the execution of the statement in two steps.]
(a) if (B == 0) C = A; else C = A/B;
(b) Initial values: Processor 0: A = 5, B = 0; Processor 1: A = 4, B = 2; Processor 2: A = 1, B = 1; Processor 3: A = 0, B = 0. C = 0 on every processor.
Step 1 (C = A): Processors 0 and 3 execute; Processors 1 and 2 are idle.
Step 2 (C = A/B): Processors 1 and 2 execute; Processors 0 and 3 are idle.
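The two-step execution in the figure can be simulated as follows. This is a minimal sequential sketch, assuming integer division for `A/B`; the function name `simd_conditional` is hypothetical, and the loops stand in for processors that would run in lockstep on real SIMD hardware.

```python
def simd_conditional(a, b):
    """Simulate `if (B == 0) C = A; else C = A/B;` in two masked steps."""
    n = len(a)
    c = [0] * n

    # The activity mask marks the processors for which the condition holds.
    mask = [b[i] == 0 for i in range(n)]

    # Step 1: processors with B == 0 execute C = A; the rest are idle.
    for i in range(n):
        if mask[i]:
            c[i] = a[i]

    # Step 2: the mask is inverted; the remaining processors execute C = A/B.
    for i in range(n):
        if not mask[i]:
            c[i] = a[i] // b[i]  # integer division is an assumption

    return c


# Initial values from the figure for Processors 0 to 3.
A = [5, 4, 1, 0]
B = [0, 2, 1, 0]
print(simd_conditional(A, B))  # [5, 2, 1, 0]
```

Each step leaves half of the processors idle here, which is why heavily branching code tends to use SIMD hardware poorly.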
Synchronous (SIMD) vs Asynchronous (MIMD) control flow
- Cost: SIMD computers require less hardware than MIMD computers because they have a single control unit.
- However, since SIMD processors are specially designed, they tend to be expensive and have long design cycles.
- Flexibility: SIMD machines perform very well for specialized, regular applications but not for all applications, so MIMD machines are more flexible and general purpose.
- In contrast to specially designed SIMD hardware, platforms supporting the single program, multiple data (SPMD) paradigm can be built from inexpensive off-the-shelf components with relatively little effort in a short amount of time.
Shared memory vs Message passing platforms
- There are two primary forms of data exchange between parallel tasks: accessing a shared data space and exchanging messages.
- Platforms that provide a shared memory for data sharing are called multiprocessors. The shared memory can be centralized or distributed (DSM).
- Platforms that exchange messages to share data are called message passing platforms or multicomputers.
- Shared memory platforms have low communication overhead and can support finer-grained parallelism, while message passing platforms are better suited to coarse-grained parallelism.
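The two forms of data exchange can be contrasted with a small sketch. It only emulates the two programming styles with Python threads inside one process (so the "message passing" workers still physically share memory); the function names and the sample data are hypothetical.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Workers add into one shared variable, guarded by a lock."""
    total = {"value": 0}
    lock = threading.Lock()

    def worker(chunk):
        partial = sum(chunk)
        with lock:  # shared data space: access must be synchronized
            total["value"] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(chunks):
    """Workers touch no common variable; each sends its partial sum as a message."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # explicit send

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)  # explicit receives


chunks = [[1, 2, 3], [4, 5], [6]]
print(shared_memory_sum(chunks))    # 21
print(message_passing_sum(chunks))  # 21
```

In the first version the cost of sharing is hidden in ordinary loads and stores plus a lock; in the second, every exchange is an explicit send and receive, which is why that style favors fewer, larger messages.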
Simplistic view of a small shared memory multiprocessor (SMP)
Examples:
- Dual Pentiums
- Quad Pentiums
[Figure: processors connected to a shared memory by a bus.]
[Figure: several processors, each with an L1 cache, an L2 cache, and a bus interface, attached to a processor/memory bus. The same bus connects to a memory controller in front of the shared memory and to an I/O interface leading to the I/O bus.]
Shared-Address-Space vs. Shared Memory Machines