Parallel Programming
Parallel Hardware Slides adapted from the lecture notes by Peter Pacheco
PARALLEL HARDWARE
Parallel hardware here means hardware whose parallelism a programmer can write code to exploit.
SIMD
- Parallelism achieved by dividing data among the processors.
- Applies the same instruction to multiple data items.
- Called data parallelism.
SIMD example
for (i = 0; i < n; i++) x[i] += y[i];
[Figure: a control unit and n ALUs (ALU 1, ALU 2, ..., ALU n) operating on x[1], x[2], ..., x[n]; n data items, n ALUs.]
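The figure pairs each data item with its own ALU. As a rough sketch of the more common case where there are fewer ALUs than data items, the loop above can be processed in rounds. The lane count `NUM_ALUS` and the function name `simd_add` are assumptions made for illustration, and the inner loop only models serially what the ALUs would do at the same time.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ALUS 4  /* assumed lane count, chosen only for illustration */

/* Adds y to x element-wise, handling NUM_ALUS items per "instruction".
   Returns the number of instructions the control unit would issue. */
size_t simd_add(double *x, const double *y, size_t n)
{
    size_t steps = 0;
    assert(x != NULL && y != NULL);
    for (size_t base = 0; base < n; base += NUM_ALUS) {
        /* This inner loop stands in for the ALUs acting together. */
        for (size_t lane = 0; lane < NUM_ALUS; lane++) {
            size_t i = base + lane;
            if (i < n)          /* lanes past the end of the data stay idle */
                x[i] += y[i];
        }
        steps++;
    }
    return steps;
}
```

With 6 data items and 4 lanes, this model issues 2 instructions, and 2 of the 4 lanes are idle during the second one.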
SIMD drawbacks
- All ALUs are required to execute the same instruction, or remain idle.
- In classical designs, the ALUs must also operate synchronously.
- The ALUs have no instruction storage.
- Efficient for large data-parallel problems, but not for other, more complex kinds of parallel problems.
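The first drawback shows up as soon as the loop body contains a branch. The sketch below models a SIMD unit executing `if (y[i] > 0.0) x[i] += y[i];` and counts the lanes that have to sit idle. The function name `masked_add` is hypothetical, and each loop iteration stands in for one lane.

```c
#include <assert.h>
#include <stddef.h>

/* Models a SIMD unit running:  if (y[i] > 0.0) x[i] += y[i];
   Every lane receives the add instruction. A lane whose test fails
   cannot do other work, so it waits. Returns the number of idle lanes. */
size_t masked_add(double *x, const double *y, size_t n)
{
    size_t idle = 0;
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {   /* each i models one lane */
        if (y[i] > 0.0)
            x[i] += y[i];              /* active lane */
        else
            idle++;                    /* lane idles for this instruction */
    }
    return idle;
}
```

The more often the condition fails, the larger the share of the hardware that does no useful work.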
Vector processors (1)
- Operate on arrays or vectors of data, while conventional CPUs operate on individual data elements or scalars.
- Vector registers
- Capable of storing a vector of operands and operating simultaneously on their contents.
Vector processors (3)
- Interleaved memory
- Multiple “banks” of memory, which can be accessed more or less independently.
- Distributing the elements of a vector across multiple banks reduces or eliminates the delay in loading/storing successive elements.
- Strided memory access and hardware scatter/gather
- The program accesses elements of a vector located at fixed intervals.
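A minimal sketch of both ideas follows. It assumes simple low-order interleaving, where element i lives in bank i mod `NUM_BANKS`. Real memory systems may map addresses differently, and `NUM_BANKS`, `bank_of`, and `strided_sum` are names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BANKS 4  /* assumed number of memory banks */

/* Bank that holds element i under the assumed low-order interleaving. */
size_t bank_of(size_t i)
{
    return i % NUM_BANKS;
}

/* Strided access: sums elements 0, stride, 2*stride, ... of v. */
double strided_sum(const double *v, size_t n, size_t stride)
{
    double sum = 0.0;
    assert(v != NULL && stride > 0);
    for (size_t i = 0; i < n; i += stride)
        sum += v[i];
    return sum;
}
```

Under this model, successive elements (stride 1) fall in different banks, so their accesses can overlap. A stride equal to `NUM_BANKS` sends every access to the same bank, which removes that benefit.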
Vector processors - Pros
- Fast.
- Easy to use.
- Vectorizing compilers are good at identifying code that can be vectorized.
- Compilers can also provide information about code that cannot be vectorized.
- This helps the programmer re-evaluate the code.
- High memory bandwidth.
- Uses every item in a cache line.
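To make the compiler bullets concrete, the sketch below contrasts a loop that a vectorizing compiler can typically handle with one it will generally reject as written. Both function names are hypothetical, and whether a given compiler vectorizes the first loop depends on its flags and on whether it can rule out overlap between `x` and `y`.

```c
#include <assert.h>
#include <stddef.h>

/* Iterations are independent of one another, so several elements
   can typically be processed by a single vector instruction. */
void scale_add(double *x, const double *y, double a, size_t n)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        x[i] += a * y[i];
}

/* Each iteration reads the value the previous iteration wrote
   (a loop-carried dependence), so the iterations cannot simply
   be executed side by side. */
void running_sum(double *x, size_t n)
{
    assert(x != NULL);
    for (size_t i = 1; i < n; i++)
        x[i] += x[i - 1];
}
```

A compiler report on the second loop is the kind of feedback that prompts the programmer to restructure the computation.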
Graphics Processing Units (GPU)
- Real-time graphics application programming interfaces (APIs) use points, lines, and triangles to internally represent the surface of an object.
GPUs
- A graphics processing pipeline converts the internal representation into an array of pixels that can be sent to a computer screen.
- Several stages of this pipeline (called shader functions) are programmable.
- Typically just a few lines of C code.
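As a loose illustration of how small such a function can be, here is a shader-style routine in plain C. It is not written against any real graphics API. The `color` type and the names `dim` and `shade_all` are invented for this sketch, and the loop only models serially what a GPU would apply to many pixels at once.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float r, g, b; } color;  /* hypothetical pixel type */

/* Hypothetical shader-style function: one input color in, one color out. */
color dim(color in, float factor)
{
    color out = { in.r * factor, in.g * factor, in.b * factor };
    return out;
}

/* Applies the function to every pixel. On a GPU these calls would be
   independent and could run in parallel. */
void shade_all(color *pixels, size_t n, float factor)
{
    assert(pixels != NULL);
    for (size_t i = 0; i < n; i++)
        pixels[i] = dim(pixels[i], factor);
}
```

Because each pixel is computed from its own input alone, the work maps naturally onto data-parallel hardware.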
MIMD
- Supports multiple simultaneous instruction streams operating on multiple data streams.
- MIMD systems typically consist of a collection of fully independent processing units or cores, each of which has its own control unit and its own ALU.
Shared Memory System (1)
- A collection of autonomous processors is connected to a memory system via an interconnection network.
- Each processor can access each memory location.
- The processors usually communicate implicitly by accessing shared data structures.
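The sketch below illustrates implicit communication using POSIX threads, which is one possible choice of threading API and not something the slides prescribe. Each thread sums half of a shared array and leaves its result in a shared slot, and no explicit message is sent. The array sizes and the names `worker` and `shared_sum` are assumptions for illustration. Some toolchains need the `-pthread` flag to build this.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N 8
#define NUM_THREADS 2

static double data[N];               /* shared: every thread can read it */
static double partial[NUM_THREADS];  /* shared: each thread writes its own slot */

/* Sums one contiguous chunk of the shared array. */
static void *worker(void *arg)
{
    size_t id = *(const size_t *)arg;
    size_t chunk = N / NUM_THREADS;
    double sum = 0.0;
    for (size_t i = id * chunk; i < (id + 1) * chunk; i++)
        sum += data[i];
    partial[id] = sum;   /* the result is simply left in shared memory */
    return NULL;
}

/* Starts the threads, waits for them, and combines their results. */
double shared_sum(void)
{
    pthread_t threads[NUM_THREADS];
    size_t ids[NUM_THREADS];
    double total = 0.0;

    for (size_t t = 0; t < NUM_THREADS; t++) {
        ids[t] = t;
        int rc = pthread_create(&threads[t], NULL, worker, &ids[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (size_t t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);  /* after the join, partial[t] is safe to read */
        total += partial[t];
    }
    return total;
}
```

Each thread writes only its own slot of `partial`, so no lock is needed here. If several threads updated the same location, the updates would have to be synchronized.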
Shared Memory System
UMA multicore system
Time to access all the memory locations is the same for all the cores.