Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Prepare for your exams

Study with the several resources on Docsity

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

For each uploaded document

Answer questions

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Parallel Processing Architectures - Intro to Computer Architecture - Lecture Slides, Slides of Computer Architecture and Organization

During the course work of the Intro to Computer Architecture, we study the main concept regarding the:Parallel Processing Architectures, Parallel Computers, Type of Interconnection, Bit Level Parallelism, Instruction Level Parallelism, Amdahl’s Law, Flynn’s Classification, Classification of Computer Systems, Shared Address

Typology: Slides

2012/2013

Uploaded on 05/06/2013

anurati 🇮🇳

4.2

(24)

121 documents

1 / 18

This page cannot be seen from the preview

Don't miss anything!

bg1

Parallel Processing Architectures

Docsity.com

pf3

pf4

pf5

pf8

pf9

pfa

pfd

pfe

pff

pf12

Related documents

Parallel Computing: Understanding Parallel Processors and Architectures

Parallel Computing: Flynn's Taxonomy and Parallel Architectures

Neural Networks on Parallel Architectures: Learning and Parallelism

Parallel Processing and Multiprocessors: A Deep Dive into Architectures and Applications

Parallel Processing and Multiprocessor Architectures: An Overview - Prof. Josep Torrellas

Introduction to Parallel Programming: Architectures and Techniques

Parallel Processing Fundamentals: Architectures, Programming Models, and Performance

Amdahl’s Law - Parallel Processing Architectures - Lecture Slides

Parallel Processing: Architectures, Techniques, and Challenges

Low-Diameter Architectures - Parallel Processing - Lecture Slides

Feng's Computer Architectures: Parallelism & Utilization Rate

Parallel Numerical Algorithms: Motivation, Architectures, Networks, and Communication

Partial preview of the text

Download Parallel Processing Architectures - Intro to Computer Architecture - Lecture Slides and more Slides Computer Architecture and Organization in PDF only on Docsity!

Parallel Processing Architectures

Parallel Computers

Definition: “A parallel computer is a collection of

processing elements that cooperate and communicate to solve large problems fast.”

Questions about parallel computers:
- How large a collection?
- How powerful are processing elements?
- How do they cooperate and communicate?
- How are data transmitted?
- What type of interconnection?
- What are HW and SW primitives for programmer?
- Does it translate into performance?

What level Parallelism?

Bit level parallelism: 1970 to ~
- 4 bits, 8 bit, 16 bit, 32 bit microprocessors
Instruction level parallelism (ILP): ~1985 through today - Pipelining - Superscalar - VLIW - Out-of-Order execution - Limits to benefits of ILP?
Process Level or Thread level parallelism; mainstream for general purpose computing? - Servers are parallel - High-end Desktop dual processor PC

Why Multiprocessors?

Microprocessors as the fastest CPUs
- Collecting several much easier than redesigning 1
Complexity of current microprocessors
- Do we have enough ideas to sustain 2X/1.5yr?
- Can we deliver such complexity on schedule?
- Limit to ILP due to data dependency
Slow (but steady) improvement in parallel software (scientific apps, databases, OS)
Emergence of embedded and server markets driving microprocessors in addition to desktops
- Embedded functional parallelism
- Network processors exploiting packet-level parallelism
- SMP Servers and cluster of workstations for multiple users – Less demand for parallel computing

Amdahl’s Law and Parallel Computers

A portion is sequential => limits parallel speedup
- Speedup <= 1/ (1-FracX)
Ex. What fraction sequetial to get 80X speedup

from 100 processors? Assume either 1 processor or 100 fully used

80 = 1 / [(FracX/100 + (1-FracX)]

0.8FracX + 80(1-FracX) = 80 - 79.2*FracX = 1

FracX = (80-1)/79.2 = 0.

Only 0.25% sequential!

Classification of Parallel Processors

SIMD – EX: Illiac IV and Maspar
MIMD - True Multiprocessors
1. Message Passing Multiprocessor: Interprocessor communication through explicit “send” and “receive” operation of messages over the network EX: IBM SP2, NCUBE, and Clusters
2. Shared Memory Multiprocessor: Interprocessor communication by load and store operations to shared memory locations. EX: SMP Servers, SGI Origin, HP V-Class, Cray T3E

Shared Address/Memory Model

Communicate via Load and Store
- Oldest and most popular model
Based on timesharing: processes on multiple processors vs. sharing single processor
Single virtual and physical address space
- Multiple processes can overlap (share), but ALL threads share a process address space
Writes to shared address space by one thread are visible to reads of other threads - Usual model: share code, private stack, some shared heap, some private heap

Message Passing Model

Whole computers (CPU, memory, I/O devices)

communicate as explicit I/O operations

Send specifies local buffer + receiving process on

remote computer

Receive specifies sending process on remote

computer + local buffer to place data

Usually send includes process tag and receive has rule on tag: match 1, match any
Synch: when send completes, when buffer free, when request accepted, receive wait for send
Send+receive => memory-memory copy, where each

each supplies local address, AND does pair-wise synchronization!

Message Passing Model

Send+receive => memory-memory copy, synchronization on OS even on 1 processor
History of message passing:
- Network topology important because could only send to immediate neighbor
- Typically synchronous, blocking send & receive
- Later DMA with non-blocking sends, DMA for receive into buffer until processor does receive, and then data is transferred to local memory
- Later SW libraries to allow arbitrary communication
Example: IBM SP-2, RS6000 workstations in racks
- Network Interface Card has Intel 960
- 8X8 Crossbar switch as communication building block
- 40 MBytes/sec per link

Advantages shared-memory

communication model

Compatibility with SMP hardware
Ease of programming when communication patterns are complex or vary dynamically during execution
Ability to develop apps using familiar SMP model, attention only on performance critical accesses
Lower communication overhead, better use of BW for small items, due to implicit communication and memory mapping to implement protection in hardware, rather than through I/O system
HW-controlled caching to reduce remote comm. by caching of all data, both shared and private.

Advantages message-passing

communication model

The hardware can be simpler
Communication explicit => simpler to understand; in shared memory it can be hard to know when communicating and when not, and how costly it is
Explicit communication focuses attention on costly aspect of parallel computation, sometimes leading to improved structure in multiprocessor program
Synchronization is naturally associated with sending messages, reducing the possibility for errors introduced by incorrect synchronization
Easier to use sender-initiated communication, which may have some advantages in performance