Download Parallel Processing Architectures - Intro to Computer Architecture - Lecture Slides and more Slides Computer Architecture and Organization in PDF only on Docsity!
Parallel Processing Architectures
Parallel Computers
- Definition: “A parallel computer is a collection of
processing elements that cooperate and communicate to solve large problems fast.”
- Questions about parallel computers:
- How large a collection?
- How powerful are processing elements?
- How do they cooperate and communicate?
- How are data transmitted?
- What type of interconnection?
- What are HW and SW primitives for programmer?
- Does it translate into performance?
What level Parallelism?
- Bit level parallelism: 1970 to ~
- 4 bits, 8 bit, 16 bit, 32 bit microprocessors
- Instruction level parallelism (ILP): ~1985 through today - Pipelining - Superscalar - VLIW - Out-of-Order execution - Limits to benefits of ILP?
- Process Level or Thread level parallelism; mainstream for general purpose computing? - Servers are parallel - High-end Desktop dual processor PC
Why Multiprocessors?
- Microprocessors as the fastest CPUs
- Collecting several much easier than redesigning 1
- Complexity of current microprocessors
- Do we have enough ideas to sustain 2X/1.5yr?
- Can we deliver such complexity on schedule?
- Limit to ILP due to data dependency
- Slow (but steady) improvement in parallel software (scientific apps, databases, OS)
- Emergence of embedded and server markets driving microprocessors in addition to desktops
- Embedded functional parallelism
- Network processors exploiting packet-level parallelism
- SMP Servers and cluster of workstations for multiple users – Less demand for parallel computing
Amdahl’s Law and Parallel Computers
- A portion is sequential => limits parallel speedup
- Ex. What fraction sequetial to get 80X speedup
from 100 processors? Assume either 1 processor or 100 fully used
80 = 1 / [(FracX/100 + (1-FracX)]
0.8FracX + 80(1-FracX) = 80 - 79.2*FracX = 1
FracX = (80-1)/79.2 = 0.
Classification of Parallel Processors
- SIMD – EX: Illiac IV and Maspar
- MIMD - True Multiprocessors
- Message Passing Multiprocessor: Interprocessor communication through explicit “send” and “receive” operation of messages over the network EX: IBM SP2, NCUBE, and Clusters
- Shared Memory Multiprocessor: Interprocessor communication by load and store operations to shared memory locations. EX: SMP Servers, SGI Origin, HP V-Class, Cray T3E
Shared Address/Memory Model
- Communicate via Load and Store
- Oldest and most popular model
- Based on timesharing: processes on multiple processors vs. sharing single processor
- Single virtual and physical address space
- Multiple processes can overlap (share), but ALL threads share a process address space
- Writes to shared address space by one thread are visible to reads of other threads - Usual model: share code, private stack, some shared heap, some private heap
Message Passing Model
- Whole computers (CPU, memory, I/O devices)
communicate as explicit I/O operations
- Send specifies local buffer + receiving process on
remote computer
- Receive specifies sending process on remote
computer + local buffer to place data
- Usually send includes process tag and receive has rule on tag: match 1, match any
- Synch: when send completes, when buffer free, when request accepted, receive wait for send
- Send+receive => memory-memory copy, where each
each supplies local address, AND does pair-wise synchronization!
Message Passing Model
- Send+receive => memory-memory copy, synchronization on OS even on 1 processor
- History of message passing:
- Network topology important because could only send to immediate neighbor
- Typically synchronous, blocking send & receive
- Later DMA with non-blocking sends, DMA for receive into buffer until processor does receive, and then data is transferred to local memory
- Later SW libraries to allow arbitrary communication
- Example: IBM SP-2, RS6000 workstations in racks
- Network Interface Card has Intel 960
- 8X8 Crossbar switch as communication building block
- 40 MBytes/sec per link
Advantages shared-memory
communication model
- Compatibility with SMP hardware
- Ease of programming when communication patterns are complex or vary dynamically during execution
- Ability to develop apps using familiar SMP model, attention only on performance critical accesses
- Lower communication overhead, better use of BW for small items, due to implicit communication and memory mapping to implement protection in hardware, rather than through I/O system
- HW-controlled caching to reduce remote comm. by caching of all data, both shared and private.
Advantages message-passing
communication model
- The hardware can be simpler
- Communication explicit => simpler to understand; in shared memory it can be hard to know when communicating and when not, and how costly it is
- Explicit communication focuses attention on costly aspect of parallel computation, sometimes leading to improved structure in multiprocessor program
- Synchronization is naturally associated with sending messages, reducing the possibility for errors introduced by incorrect synchronization
- Easier to use sender-initiated communication, which may have some advantages in performance