Parallel Processing and Multiprocessor Architectures: An Overview - Prof. Josep Torrellas, Study notes of Computer Architecture and Organization

University of Illinois - Urbana-Champaign Computer Architecture and Organization

Prof. Josep Torrellas

An overview of the progress towards multiprocessors, Flynn's classification of parallel architectures, and various types of MIMD machines, including message passing machines and distributed shared memory systems. It also covers performance metrics for communications and different locking mechanisms.

Typology: Study notes

Pre 2010

Uploaded on 03/16/2009


Copyright Josep Torrellas 1999, 2001, 2002

Chapter 6

Instructor: Josep Torrellas

CS433


Progress Towards Multiprocessors

+ Rate of speed growth in uniprocessors is saturating
+ Modern multiple-issue processors are becoming very complex → multicores
+ Steady progress in parallel software: the major obstacle to parallel processing


• Multiple I streams, single D stream (MISD): no commercial machine
• Multiple I streams, multiple D streams (MIMD)
  - each processor fetches its own instructions and operates on its own data
  - usually off-the-shelf μprocessors
  - architecture of choice for general-purpose mps
  - flexible: can be used in single-user mode or multiprogrammed
  - use of off-the-shelf μprocessors

See figure 6.1 and 6.


- Also reduces the memory latency
- Of course, interprocessor communication is more costly and complex
- Often each node is a cluster (bus-based multiprocessor)
- 2 types, depending on the method used for interprocessor communication:
  1. Distributed shared memory (DSM) or scalable shared memory
  2. Message passing machines or multicomputers


DSMs:
• memories addressed as one shared address space: processor P1 writes address X, processor P2 reads address X
• Shared memory means that the same address in 2 processors refers to the same memory location; not that memory is centralized
• also called NUMA (Non-Uniform Memory Access)
• processors communicate implicitly via loads and stores

Multicomputers:
• each processor has its own address space, disjoint from those of other processors; it cannot be addressed by other processors
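The contrast between the two communication styles can be sketched in a few lines. This is only an analogy, assuming Python threads stand in for processors: a shared dictionary plays the role of the single address space, and a `queue.Queue` plays the role of an explicit send/receive channel. The function names and the value 42 are hypothetical.

```python
import queue
import threading


def shared_memory_style() -> int:
    """P1 stores to address X; P2 loads the same address from the shared space."""
    memory = {"X": 0}            # one address space visible to both workers
    stored = threading.Event()   # used only for ordering; data travels via memory
    seen = []

    def p1() -> None:
        memory["X"] = 42         # store to X
        stored.set()

    def p2() -> None:
        stored.wait()
        seen.append(memory["X"])  # load from X: communication is implicit

    workers = [threading.Thread(target=p2), threading.Thread(target=p1)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return seen[0]


def message_passing_style() -> int:
    """Each worker keeps private data; the value moves only by send/receive."""
    channel = queue.Queue()
    seen = []

    def p1() -> None:
        private_x = 42            # lives only in P1's own address space
        channel.put(private_x)    # explicit send

    def p2() -> None:
        seen.append(channel.get())  # explicit receive, blocks until arrival

    workers = [threading.Thread(target=p2), threading.Thread(target=p1)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return seen[0]


if __name__ == "__main__":
    print("shared memory  :", shared_memory_style())
    print("message passing:", message_passing_style())
```

In the first function the receiver never names the sender; it simply reads an address. In the second, nothing is visible to the receiver until the sender explicitly transmits it.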


• processors are notified of the arrival of a msg: polling → interrupt
• standard message passing libraries: message passing interface (MPI)

Performance Metrics for Communications
1. Communication bandwidth
2. Communication latency = sender ovhd + transfer + recv ovhd
3. Communication latency hiding


Shared memory communication (DSM)
+ Compatibility w/ well-understood mechanisms in centralized mps
+ ease of programming / compiler design for pgms w/ irregular communication patterns
+ lower overhead of communication
+ better use of bandwidth when using small communications
+ reduced remote communication by using automatic caching of data


Amdahl's law:

$$\text{Speedup} = \frac{1}{(1 - f_{enh}) + \dfrac{f_{enh}}{Sp_{enh}}} = \frac{1}{(1 - f_{parallel}) + \dfrac{f_{parallel}}{100}}$$

2) Large latency of remote accesses (50-10,000 clock cycles)
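To get a feel for how demanding Amdahl's law is on 100 processors, the following minimal sketch evaluates the speedup for a few parallel fractions and inverts the formula for a target speedup. The function names, the sample fractions, and the target of 80 are assumptions chosen for illustration; they are not taken from the slides.

```python
def amdahl_speedup(f_parallel: float, n_processors: int) -> float:
    """Speedup when a fraction f_parallel of the work runs n_processors times faster."""
    return 1.0 / ((1.0 - f_parallel) + f_parallel / n_processors)


def required_parallel_fraction(target_speedup: float, n_processors: int) -> float:
    """Invert Amdahl's law: parallel fraction needed to reach target_speedup."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / n_processors)


if __name__ == "__main__":
    for f in (0.90, 0.99, 0.9975):  # illustrative fractions
        s = amdahl_speedup(f, 100)
        print(f"f_parallel = {f:.4f} -> speedup on 100 processors = {s:.1f}")
    needed = required_parallel_fraction(80, 100)  # hypothetical target of 80
    print(f"fraction needed for a speedup of 80: {needed:.4f}")
```

Even with 99% of the work parallel, the speedup on 100 processors is only about 50, which is why limited parallelism is treated as a first-order obstacle.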

Machine            Round trip time
Cray T3D           1 μsec
Convex Exemplar    2 μsec
KSR-               2- μsec
CM-                10 μsec
Intel Paragon      10- μsec
IBM SP-            30- μsec

Example: a machine with a 10 ns cycle time has a round-trip latency of 2 μsec. 0.5% of requests are remote. Local accesses all hit in the cache (CPI = 1).

What is the new CPI?

CPI = 1 + 0.5% × (2000 ns / 10 ns) = 1 + 0.005 × 200 = 2
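The same arithmetic can be wrapped in a small helper to see how the CPI reacts to other round-trip latencies. This is a sketch under the example's own model (every remote request stalls for the full round trip); the function name and the extra latencies in the loop are illustrative assumptions.

```python
def effective_cpi(base_cpi: float, remote_fraction: float,
                  round_trip_ns: float, cycle_ns: float) -> float:
    """Base CPI plus the average stall cycles contributed by remote requests."""
    remote_penalty_cycles = round_trip_ns / cycle_ns
    return base_cpi + remote_fraction * remote_penalty_cycles


if __name__ == "__main__":
    # The example from the text: 10 ns cycle, 2 usec round trip, 0.5% remote.
    print(effective_cpi(1.0, 0.005, 2000.0, 10.0))
    # Illustrative sweep over other round-trip times (in ns).
    for rt in (1000.0, 10000.0):
        print(rt, effective_cpi(1.0, 0.005, rt, 10.0))
```

A remote fraction of only 0.5% is enough to double the CPI at a 2 μsec round trip, so the penalty grows linearly with both the latency and the remote fraction.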

The Cache Coherence Problem

• Caches are critical to modern high-speed processors
• Multiple copies of a block can easily get inconsistent
  - processor writes, I/O writes, ...

[Figure: processors (P) with caches above memory. Labels: cached copies A = 5 and A = 7; memory A = 5.]
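The inconsistency can be reproduced with a minimal sketch, assuming two private write-back caches with no coherence mechanism at all. The class and function names are hypothetical; the values 5 and 7 follow the figure labels.

```python
class IncoherentCache:
    """A private write-back cache that never hears about other caches' writes."""

    def __init__(self, memory: dict) -> None:
        self.memory = memory
        self.lines = {}

    def read(self, addr: str) -> int:
        if addr not in self.lines:          # miss: fetch the block from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr: str, value: int) -> None:
        self.lines[addr] = value            # write-back: memory is not updated


def coherence_problem_demo() -> tuple:
    memory = {"A": 5}
    cache1 = IncoherentCache(memory)
    cache2 = IncoherentCache(memory)
    cache1.read("A")                        # both caches load A = 5
    cache2.read("A")
    cache2.write("A", 7)                    # only cache2's copy changes
    return cache1.read("A"), cache2.read("A"), memory["A"]


if __name__ == "__main__":
    print(coherence_problem_demo())         # three views of the same address
```

After the write, one cache holds 7 while the other cache and memory still hold 5, so three readers of the same address can disagree.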

Snoopy Cache Coherence Schemes

• A distributed cache coherence scheme based on the notion of a snoop that watches all activity on a global bus, or is informed about such activity by some global broadcast mechanism.
• Most commonly used method in commercial multiprocessors

[Figure: state transition diagram with states Invalid, Shared, and Dirty; transitions labeled P-read, P-write, Bus-read, Bus Write Miss, and Bus invalidate.]
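A three-state snoopy protocol of this kind can be sketched as follows. The exact arcs of the original diagram are not recoverable from this text, so the transitions below follow the common textbook write-back invalidate protocol and should be read as an assumption; the class names `Bus` and `SnoopyCache` are hypothetical.

```python
from enum import Enum


class State(Enum):
    INVALID = "Invalid"
    SHARED = "Shared"
    DIRTY = "Dirty"


class Bus:
    """Global bus: every transaction is seen (snooped) by all other caches."""

    def __init__(self, memory: dict) -> None:
        self.memory = memory
        self.caches = []
        self.log = []

    def broadcast(self, kind: str, addr: str, source) -> None:
        self.log.append(kind)
        for cache in self.caches:
            if cache is not source:
                cache.snoop(kind, addr)


class SnoopyCache:
    def __init__(self, bus: Bus) -> None:
        self.bus = bus
        self.state = {}
        self.data = {}
        bus.caches.append(self)

    def _state(self, addr: str) -> State:
        return self.state.get(addr, State.INVALID)

    def read(self, addr: str) -> int:                 # P-read
        if self._state(addr) is State.INVALID:
            self.bus.broadcast("bus-read", addr, self)  # a dirty owner writes back
            self.data[addr] = self.bus.memory[addr]
            self.state[addr] = State.SHARED
        return self.data[addr]

    def write(self, addr: str, value: int) -> None:   # P-write
        current = self._state(addr)
        if current is State.INVALID:
            self.bus.broadcast("bus-write-miss", addr, self)
        elif current is State.SHARED:
            self.bus.broadcast("bus-invalidate", addr, self)
        # In state Dirty the write is local: no bus transaction at all.
        self.data[addr] = value
        self.state[addr] = State.DIRTY

    def snoop(self, kind: str, addr: str) -> None:
        current = self._state(addr)
        if current is State.DIRTY:
            self.bus.memory[addr] = self.data[addr]   # supply the latest value
        if kind == "bus-read" and current is State.DIRTY:
            self.state[addr] = State.SHARED
        elif kind in ("bus-write-miss", "bus-invalidate") and current is not State.INVALID:
            self.state[addr] = State.INVALID


if __name__ == "__main__":
    bus = Bus({"A": 5})
    c1, c2 = SnoopyCache(bus), SnoopyCache(bus)
    c1.read("A")
    c2.read("A")
    c2.write("A", 7)            # invalidates c1's copy
    print(c1.read("A"))         # c1 misses and obtains the new value
    print(bus.log)
```

With snooping in place, the reader that previously saw a stale 5 now misses on its invalidated copy and fetches the value written by the other processor.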

Write-Back/Ownership Schemes

• When a single cache has ownership of a block, processor writes do not result in bus writes, thus conserving bandwidth.
• Most bus-based multiprocessors nowadays use such schemes.
• Many variants of ownership-based protocols exist:
  - Goodman's write-once scheme
  - Berkeley ownership scheme
  - Firefly update protocol
  - …
• We will discuss a few of these
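The bandwidth argument for ownership can be made concrete with a rough count of bus transactions caused by writes. This is a simplified sketch, assuming one bus transaction per write under write-through and one transaction per change of owner under an ownership scheme; the function name and the traces are hypothetical.

```python
def count_bus_writes(trace: list, policy: str) -> int:
    """Count bus transactions caused by a trace of (processor, address) writes."""
    owner = {}          # address -> processor currently owning the block
    transactions = 0
    for processor, addr in trace:
        if policy == "write-through":
            transactions += 1                 # every write goes to the bus
        elif policy == "ownership":
            if owner.get(addr) != processor:
                transactions += 1             # acquire ownership once
                owner[addr] = processor
            # writes by the current owner stay inside its cache
        else:
            raise ValueError(f"unknown policy: {policy}")
    return transactions


if __name__ == "__main__":
    private = [("P1", "A")] * 10               # one processor writes repeatedly
    ping_pong = [("P1", "A"), ("P2", "A")] * 5  # two processors alternate
    for name, trace in (("private", private), ("ping-pong", ping_pong)):
        wt = count_bus_writes(trace, "write-through")
        own = count_bus_writes(trace, "ownership")
        print(f"{name}: write-through = {wt}, ownership = {own}")
```

Ownership saves almost all bus writes when one processor writes a block repeatedly, and saves nothing when two processors alternate writes to the same block.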

Invalidation vs. Update Strategies

1. Invalidation: On a write, all other caches with a copy are invalidated
2. Update: On a write, all other caches with a copy are updated

• Invalidation is bad when:
  - single producer and many consumers of data.
• Update is bad when:
  - multiple writes by one PE before data is read by another PE.
  - Junk data accumulates in large caches (e.g. process migration).
• Overall, invalidation schemes are more popular as the default
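The two bad cases can be compared with a rough bus-transaction count on a single block. This is a simplified model, assuming one transaction per miss, per invalidation, and per update broadcast; the function name, the PE labels, and the trace sizes are illustrative assumptions.

```python
def bus_transactions(trace: list, protocol: str) -> int:
    """Count bus transactions for a trace of (pe, op) on one block; op is 'r' or 'w'."""
    if protocol not in ("invalidate", "update"):
        raise ValueError(f"unknown protocol: {protocol}")
    holders = set()     # PEs whose caches currently hold a valid copy
    count = 0
    for pe, op in trace:
        if op == "r":
            if pe not in holders:
                count += 1                    # read miss
                holders.add(pe)
        elif protocol == "invalidate":
            if holders != {pe}:
                count += 1                    # write miss or invalidation
            holders = {pe}                    # writer keeps the only copy
        else:  # update protocol
            if pe not in holders or holders - {pe}:
                count += 1                    # write miss or update broadcast
            holders.add(pe)                   # other copies stay valid
    return count


if __name__ == "__main__":
    # Single producer, many consumers: P0 writes, then P1..P3 read; 10 rounds.
    producer_consumer = ([("P0", "w")] + [(f"P{i}", "r") for i in (1, 2, 3)]) * 10
    # Multiple writes by one PE before another PE reads; 5 rounds.
    write_burst = ([("P0", "w")] * 10 + [("P1", "r")]) * 5
    for name, trace in (("producer/consumer", producer_consumer),
                        ("write burst", write_burst)):
        inv = bus_transactions(trace, "invalidate")
        upd = bus_transactions(trace, "update")
        print(f"{name}: invalidate = {inv}, update = {upd}")
```

Under this model the producer/consumer trace favors update, because consumers keep valid copies, while the write-burst trace favors invalidation, because only the first write of each burst reaches the bus.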
