Understanding Parallel Hardware: Types and Characteristics, Lecture notes of Mathematics

An overview of parallel hardware, focusing on Flynn's Taxonomy, SIMD, MISD, and MIMD systems. It covers the concepts of data parallelism, vector processors, and graphics processing units (GPUs), discussing their advantages and disadvantages. The document also touches upon shared and distributed memory systems and interconnection networks.

Typology: Lecture notes

2020/2021

Uploaded on 05/22/2021

alan-alan-22

Parallel Programming
Parallel Hardware
Slides adapted from the lecture notes by Peter Pacheco

PARALLEL HARDWARE

Parallel hardware here means hardware whose parallelism a programmer can write code to exploit.

SIMD

  • Parallelism achieved by dividing data among the processors.
  • Applies the same instruction to multiple data items.
  • Called data parallelism.

SIMD example

    for (i = 0; i < n; i++)
        x[i] += y[i];

[Figure: one control unit driving n ALUs (ALU 1, ALU 2, ..., ALU n); n data items x[1], x[2], ..., x[n], n ALUs.]

SIMD drawbacks

  • All ALUs are required to execute the same instruction, or remain idle.
  • In traditional designs, they must also operate synchronously.
  • The ALUs have no instruction storage.
  • Efficient for large data-parallel problems, but not for other, more complex types of parallel problems.
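The idle-ALU drawback shows up as soon as the loop body contains a branch. The following is a minimal sketch in sequential C that only emulates how a SIMD unit might handle `if (y[i] > 0.0) x[i] += y[i];`. The lane count `SIMD_LANES` and the function name `simd_masked_add` are assumptions made for illustration and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define SIMD_LANES 4  /* hypothetical number of ALUs, not from the slides */

/* Emulates "if (y[i] > 0.0) x[i] += y[i];" on a SIMD unit.
 * Returns the number of lane slots that sat idle during the add. */
size_t simd_masked_add(double *x, const double *y, size_t n)
{
    size_t idle = 0;
    assert(x != NULL && y != NULL);

    for (size_t base = 0; base < n; base += SIMD_LANES) {
        int mask[SIMD_LANES];

        /* Step 1: every lane evaluates the same comparison. */
        for (size_t lane = 0; lane < SIMD_LANES; lane++) {
            size_t i = base + lane;
            mask[lane] = (i < n) && (y[i] > 0.0);
        }

        /* Step 2: every lane is offered the same add instruction;
         * lanes whose test failed (or that ran past n) stay idle. */
        for (size_t lane = 0; lane < SIMD_LANES; lane++) {
            if (mask[lane])
                x[base + lane] += y[base + lane];
            else
                idle++;
        }
    }
    return idle;
}
```

Calling `simd_masked_add(x, y, n)` on data where roughly half of `y` is non-positive reports about half of the lane slots as idle, which is the cost the bullets above describe.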

Vector processors (1)

  • Operate on arrays or vectors of data, while conventional CPUs operate on individual data elements or scalars.
  • Vector registers
    • Capable of storing a vector of operands and operating simultaneously on their contents.
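As a rough model of vector registers, the sketch below processes an array in register-sized chunks: load a chunk, apply one operation to the whole register, store it back. The register length `VLEN`, the type `vreg_t`, and the helpers `vload`, `vadd`, and `vstore` are hypothetical names for this illustration. On real hardware each helper would correspond to a single vector instruction, not a loop.

```c
#include <assert.h>
#include <stddef.h>

#define VLEN 8  /* hypothetical vector register length, in elements */

typedef struct { double lane[VLEN]; } vreg_t;  /* models one vector register */

/* Load up to VLEN operands into a register, zero-filling unused lanes. */
static vreg_t vload(const double *p, size_t count)
{
    vreg_t v;
    for (size_t k = 0; k < VLEN; k++)
        v.lane[k] = (k < count) ? p[k] : 0.0;
    return v;
}

/* One "instruction" that adds all lanes of two registers. */
static vreg_t vadd(vreg_t a, vreg_t b)
{
    vreg_t r;
    for (size_t k = 0; k < VLEN; k++)
        r.lane[k] = a.lane[k] + b.lane[k];
    return r;
}

/* Store the first `count` lanes of a register back to memory. */
static void vstore(double *p, vreg_t v, size_t count)
{
    for (size_t k = 0; k < count; k++)
        p[k] = v.lane[k];
}

/* x[i] += y[i] for all i, one register-sized chunk at a time. */
void vector_add(double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    for (size_t base = 0; base < n; base += VLEN) {
        size_t count = (n - base < VLEN) ? n - base : VLEN;
        vreg_t vx = vload(x + base, count);
        vreg_t vy = vload(y + base, count);
        vstore(x + base, vadd(vx, vy), count);
    }
}
```

With `n = 10` and `VLEN = 8`, `vector_add` takes two passes: one full register and one partial register holding the last two elements.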

Vector processors (3)

  • Interleaved memory
    • Multiple “banks” of memory, which can be accessed more or less independently.
    • Distribute the elements of a vector across multiple banks to reduce or eliminate the delay in loading/storing successive elements.
  • Strided memory access and hardware scatter/gather
    • In strided access, the program accesses elements of a vector located at fixed intervals.
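How the stride interacts with interleaving can be sketched with a simple bank model. The code assumes low-order interleaving, where element `i` lives in bank `i % NUM_BANKS`. That mapping, the bank count, and the names `bank_of` and `count_bank_hits` are assumptions for this illustration, not details given in the slides.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BANKS 4  /* hypothetical number of memory banks */

/* Assumed mapping (low-order interleaving): element i is in bank i % NUM_BANKS. */
static size_t bank_of(size_t index)
{
    return index % NUM_BANKS;
}

/* Counts how many of `count` accesses, starting at element 0 and
 * stepping by `stride` elements, land in each bank. */
void count_bank_hits(size_t stride, size_t count, size_t hits[NUM_BANKS])
{
    assert(hits != NULL);
    for (size_t b = 0; b < NUM_BANKS; b++)
        hits[b] = 0;
    for (size_t k = 0; k < count; k++)
        hits[bank_of(k * stride)]++;
}
```

Under this model, 8 accesses with stride 1 spread evenly over the 4 banks, so successive loads can overlap. The same 8 accesses with stride 4 all land in bank 0, so each one has to wait for the previous access to finish.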

Vector processors - Pros

  • Fast.
  • Easy to use.
  • Vectorizing compilers are good at identifying code that can be vectorized.
  • Compilers can also provide information about code that cannot be vectorized.
    • Helps the programmer re-evaluate code.
  • High memory bandwidth.
  • Uses every item in a cache line.

Graphics Processing Units (GPU)

  • Real-time graphics application programming interfaces (APIs) use points, lines, and triangles to internally represent the surface of an object.

GPUs

  • A graphics processing pipeline converts the internal representation into an array of pixels that can be sent to a computer screen.
  • Several stages of this pipeline (called shader functions) are programmable.
    • Typically just a few lines of C code.
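To give a feel for the size and shape of a shader function, here is a sketch in plain C. It is not code for any real graphics API. The pixel type `color_t`, the function `shade`, and the driver `run_stage` are hypothetical, and the grayscale conversion is just a stand-in for whatever a real stage would compute. The point is that the function is short and handles each pixel independently of all others, which is what lets the hardware run many instances at once.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float r, g, b; } color_t;  /* hypothetical pixel type */

/* Shader-style function: a few lines, applied to one pixel at a time,
 * with no dependence on any other pixel. */
static color_t shade(color_t in)
{
    /* Standard Rec. 601 luma weights for a grayscale conversion. */
    float gray = 0.299f * in.r + 0.587f * in.g + 0.114f * in.b;
    color_t out = { gray, gray, gray };
    return out;
}

/* Sequential stand-in for the hardware, which would apply shade()
 * to many pixels in parallel. */
void run_stage(color_t *pixels, size_t n)
{
    assert(pixels != NULL);
    for (size_t i = 0; i < n; i++)
        pixels[i] = shade(pixels[i]);
}
```

Because no iteration of the loop in `run_stage` reads another iteration's result, the iterations could be executed in any order or all at once.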

MIMD

  • Supports multiple simultaneous instruction streams operating on multiple data streams.
  • Typically consists of a collection of fully independent processing units or cores, each of which has its own control unit and its own ALU.

Shared Memory System (1)

  • A collection of autonomous processors is connected to a memory system via an interconnection network.
  • Each processor can access each memory location.
  • The processors usually communicate implicitly by accessing shared data structures.
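Implicit communication through shared data can be sketched with POSIX threads. In the example below, each thread sums its own slice of an array and then adds its partial result to one shared total. No thread sends a message to another; they cooperate only by reading and writing the same memory. The thread count `NUM_THREADS` and the names `range_t`, `partial_sum`, and `shared_sum` are assumptions for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4  /* hypothetical number of cores */

typedef struct { size_t first, last; } range_t;  /* half-open slice [first, last) */

/* Shared data: visible to every thread. */
static const double *g_data;
static double g_total;
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

static void *partial_sum(void *arg)
{
    const range_t *r = arg;
    double local = 0.0;

    for (size_t i = r->first; i < r->last; i++)
        local += g_data[i];

    /* The shared total is the only point of communication;
     * the mutex keeps concurrent updates from being lost. */
    pthread_mutex_lock(&g_lock);
    g_total += local;
    pthread_mutex_unlock(&g_lock);
    return NULL;
}

double shared_sum(const double *data, size_t n)
{
    pthread_t tid[NUM_THREADS];
    range_t range[NUM_THREADS];

    assert(data != NULL);
    g_data = data;
    g_total = 0.0;

    for (size_t t = 0; t < NUM_THREADS; t++) {
        range[t].first = t * n / NUM_THREADS;
        range[t].last = (t + 1) * n / NUM_THREADS;
        int rc = pthread_create(&tid[t], NULL, partial_sum, &range[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (size_t t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);

    return g_total;
}
```

On most systems this needs to be compiled with thread support enabled (for example `-pthread` with GCC or Clang). The code uses global state for brevity, so `shared_sum` should not be called from two threads at the same time.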

Shared Memory System

UMA multicore system

The time to access any memory location is the same for all the cores.