Properties of Parallel Computers, Lecture notes of Computer Networks

Massachusetts Institute of Technology (MIT)Computer Networks

These lecture notes cover the main properties of parallel computers and the challenges they pose to programmers. They emphasize the lack of a standard architecture and the need for original thinking about numerical analysis and data management. They survey different types of parallel computers, including multicore, symmetric multiprocessors, large scale parallel machines, and clusters. They explain the shared memory model and how requests can back up in the interconnection network. They also discuss multithreading and the facts of instruction execution, and conclude with the MESI protocol and its four states.

Typology: Lecture notes

2021/2022

Uploaded on 05/11/2023

tomcrawford 🇺🇸

Part II: Architecture

Goal: Understand the main properties of parallel computers

The parallel approach to computing … does require that some original thinking be done about numerical analysis and data management in order to secure efficient use. In an environment which has represented the absence of the need to think as the highest virtue, this is a decided disadvantage.
-- Dan Slotnick, 1967

What’s The Deal With Hardware?

• Facts Concerning Hardware
   • Parallel computers differ dramatically from each other -- there is no standard architecture
      • No single programming target!
   • Parallelism introduces costs not present in von Neumann (vN) machines -- communication; influence of external events
   • Many parallel architectures have failed
   • Details of a parallel computer should be of no greater concern to programmers than details of a vN machine

The “no single target” problem is the key problem to solve

Our Plan

• Think about the problem abstractly
• Introduce instances of basic parallel (||) designs
   • Multicore
   • Symmetric Multiprocessors (SMPs)
   • Large scale parallel machines
   • Clusters
   • Blue Gene/L
• Formulate a model of computation
• Assess the model of computation

Shared Memory

• Global memory shared among parallel processors is the natural generalization of the sequential memory model
• Programmers implicitly assume sequential consistency (SC) when they think about parallelism
• Recall Lamport’s definition of SC:
   "...the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program."
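Lamport's definition can be checked mechanically on a tiny program. The sketch below is illustrative only and is not from the lecture: the two-processor program, the variable names x and y, the register names r1 and r2, and the helper sc_outcomes are all assumptions. It enumerates every interleaving that keeps each processor's operations in program order and collects the register values that result.

```python
from itertools import combinations

def sc_outcomes(prog_a, prog_b):
    """Collect (r1, r2) for every sequentially consistent interleaving."""
    n, m = len(prog_a), len(prog_b)
    outcomes = set()
    # Choose which slots of the merged sequence belong to processor A.
    # Within each processor, operations stay in program order.
    for a_slots in combinations(range(n + m), n):
        a_slots = set(a_slots)
        mem = {"x": 0, "y": 0}   # shared memory, initially zero
        regs = {}                # private registers
        ia = ib = 0
        for t in range(n + m):
            if t in a_slots:
                kind, first, second = prog_a[ia]
                ia += 1
            else:
                kind, first, second = prog_b[ib]
                ib += 1
            if kind == "store":
                mem[first] = second      # store value into variable
            else:
                regs[second] = mem[first]  # load variable into register
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes

# Hypothetical test program: each processor sets its flag, then reads the other's.
P1 = [("store", "x", 1), ("load", "y", "r1")]
P2 = [("store", "y", 1), ("load", "x", "r2")]
outcomes = sc_outcomes(P1, P2)
print(sorted(outcomes))
```

Of the six interleavings, none yields r1 = 0 and r2 = 0: under SC at least one processor must observe the other's store. Hardware that reorders a store after a later load can produce that outcome, which is why SC is an assumption rather than a given.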

Reduce Contention

• Replace bus with network, an early design
• Network delays cause memory latency to be higher for a single reference than with the bus, but simultaneous use should help when many references are in the air (MT)

[Figure: eight memory modules (M) and eight processors (P) on opposite sides of an Interconnection Network, the "Dance Hall" layout]

An Implementation

• Ω-Network is one possible interconnect
• Processor 2 references memory 6 (110)

[Figure: Ω-network of 2×2 switches in three stages, switch outputs labeled 0 and 1, inputs and outputs numbered 000 to 111; labels: Processor ID, Hi Memory Bits]
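The slide's example (processor 2 referencing memory 6) can be traced with destination-tag routing. The sketch below assumes the textbook Ω-network wiring, a perfect shuffle in front of each stage of 2×2 switches, which may differ in detail from the figure; the function name omega_route is hypothetical.

```python
def omega_route(src, dst, bits=3):
    """Return the wire label reached after each stage of an Omega network.

    Assumes the textbook layout: a perfect shuffle before every stage,
    and each 2x2 switch picks its output from one destination bit,
    most significant bit first.
    """
    mask = (1 << bits) - 1
    pos = src
    path = []
    for stage in range(bits):
        # Perfect shuffle: rotate the wire label left by one bit.
        pos = ((pos << 1) | (pos >> (bits - 1))) & mask
        # Switch setting: leave on output 0 or 1 according to the dest bit.
        dst_bit = (dst >> (bits - 1 - stage)) & 1
        pos = (pos & ~1) | dst_bit
        path.append(pos)
    return path

path = omega_route(2, 6)
print([format(p, "03b") for p in path])  # ['101', '011', '110']
```

The request leaves the three stages on wires 101, 011 and 110, so it arrives at memory 6 after consuming the destination bits 1, 1, 0 in order. No switch needs global knowledge; each one looks at a single bit.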

Backing Up In Network

• Even if processors work on different data, the requests can back up in the network
• Everyone references data in memory 6

[Figure: the same Ω-network with every processor's request routed toward memory 6]
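A rough way to see the backup is to count how many requests want each wire at each stage. The sketch below again assumes the textbook shuffle-exchange wiring and destination-tag routing; route and stage_loads are hypothetical names, and counting requests per wire is only a proxy for real queuing delay.

```python
from collections import Counter

def route(src, dst, bits=3):
    """Wire label after each stage (shuffle, then pick output by dest bit)."""
    mask = (1 << bits) - 1
    pos, path = src, []
    for stage in range(bits):
        pos = ((pos << 1) | (pos >> (bits - 1))) & mask
        pos = (pos & ~1) | ((dst >> (bits - 1 - stage)) & 1)
        path.append(pos)
    return path

def stage_loads(requests, bits=3):
    """Count requests competing for each wire, one Counter per stage."""
    loads = [Counter() for _ in range(bits)]
    for src, dst in requests:
        for stage, wire in enumerate(route(src, dst, bits)):
            loads[stage][wire] += 1
    return loads

def worst(loads):
    return [max(c.values()) for c in loads]

# Everyone references memory 6: the load doubles at every stage.
print(worst(stage_loads([(p, 6) for p in range(8)])))   # [2, 4, 8]
# Each processor references its own memory: no wire is shared.
print(worst(stage_loads([(p, p) for p in range(8)])))   # [1, 1, 1]
# Two requests to different memories can still collide inside the network.
print(worst(stage_loads([(0, 0), (4, 1)])))             # [2, 2, 1]
```

The last case is the subtle one: processors 0 and 4 target different memories, yet both requests need wire 000 in the first two stages, so one must wait.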

One-At-A-Time Use

• The critical problem is that only one processor at a time can use/change data
   • Cache read-only data (& programs) only
   • Check-in/Check-out model most appropriate
• Conclusion: Processors stall a lot …
• Solution: Multi-threading
   • When stalled, change to another waiting activity
   • Must make transition quickly, keeping context
   • Need ample supply of waiting activities
   • Available at different granularities
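The need for an "ample supply of waiting activities" can be illustrated with an idealized model. The sketch below is an assumption-laden toy, not the lecture's analysis: every thread computes for run cycles and then stalls for stall cycles on a memory reference, switching is free, and the numbers 10 and 90 are made up.

```python
def utilization(num_threads, run=10, stall=90, horizon=100_000):
    """Fraction of cycles the processor does useful work (toy model)."""
    ready_at = [0] * num_threads   # cycle at which each thread can run again
    t = 0                          # current cycle
    busy = 0                       # cycles spent computing
    while t < horizon:
        # Pick the thread that becomes ready soonest.
        i = min(range(num_threads), key=ready_at.__getitem__)
        if ready_at[i] > t:
            t = ready_at[i]        # all threads stalled: processor idles
            continue
        busy += run                # run until the next memory reference
        t += run
        ready_at[i] = t + stall    # thread now waits on memory
    return busy / t

for n in (1, 5, 10, 20):
    print(n, round(utilization(n), 2))
```

With these made-up numbers one thread keeps the processor about 10% busy, five threads about 50%, and ten or more threads keep it fully busy. No single thread finishes sooner; the gain is entirely in reduced idle time, and it depends on the switch being fast enough to treat as free.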

Fine Grain Multithreading: Tera

Figure from: [email protected]

Coarse Grain Multithreading: Alewife

Simultaneous Multi-threading: SMT

Multi-threading Grain Size

• The point when the activity switches can be
   • Instruction level, at memory reference: Tera MTA
   • Basic block level, with L1 cache miss: Alewife
   • …
   • At process level, with page fault: Time sharing
• Another variation (3-address code level) is to execute many threads (P*log P) in batches, called Bulk Synchronous Programming

No individual activity improved, but less wait time

Diversity Among Small Systems

Intel CoreDuo

• 2 32-bit Pentiums
• Private 32K L1s
• Shared 2M-4M L2
• MESI cc-protocol
• Shared bus control and memory bus

[Figure: two processors, each with private L1-I and L1-D caches, sharing an L2 Cache, Memory Bus Controller, and Front Side Bus]

MESI Protocol

• Standard protocol for cache-coherent shared memory
   • Mechanism for multiple caches to give single memory image
   • We will not study it
   • 4 states can be amazingly rich

Thanks: Slater & Tibrewala of CMU

MESI, Intuitively

• Upon loading, a line is marked E, subsequent reads are OK; write marks M
• Seeing another load, mark as S
• A write to an S, sends I to all, marks as M
• Another’s read to an M line, writes it back, marks it S
• Read/write to an I misses
• Related scheme: MOESI (used by AMD)

States: Modified, Exclusive, Shared, Invalid
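The intuitive rules above can be written as a small state machine for a single cache line. This is a simplified sketch, not the full protocol: the class name MesiLine is hypothetical, bus transactions are collapsed into direct state updates, and a write to an Invalid line is assumed to fetch the line and invalidate other copies in one step.

```python
class MesiLine:
    """One cache line tracked across several private caches (toy model)."""

    def __init__(self, num_caches):
        self.state = ["I"] * num_caches   # every cache starts Invalid
        self.writebacks = 0               # dirty copies flushed to memory

    def _others(self, i):
        return [j for j in range(len(self.state)) if j != i]

    def read(self, i):
        if self.state[i] != "I":
            return "hit"                  # M, E and S can be read locally
        shared = False
        for j in self._others(i):
            if self.state[j] == "M":
                self.writebacks += 1      # owner writes the line back
            if self.state[j] != "I":
                self.state[j] = "S"       # seeing another load: mark S
                shared = True
        self.state[i] = "S" if shared else "E"
        return "miss"

    def write(self, i):
        before = self.state[i]
        for j in self._others(i):
            if self.state[j] == "M":
                self.writebacks += 1      # flush the dirty copy first
            self.state[j] = "I"           # send Invalidate to all others
        self.state[i] = "M"
        return "miss" if before == "I" else "hit"

line = MesiLine(2)
for op, cache in [("read", 0), ("write", 0), ("read", 1), ("write", 1), ("read", 0)]:
    result = getattr(line, op)(cache)
    print(op, cache, result, line.state)
```

The trace walks through E, M, S and I in turn: cache 0 loads the line exclusively and modifies it silently, cache 1's read forces a write-back and leaves both copies Shared, and cache 1's write invalidates cache 0 so that its next read misses.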

Comparing Core Duo/Dual Core

[Figure: Intel Core Duo (two processors with private L1-I and L1-D caches, shared L2 Cache, Memory Bus Controller, Front Side Bus) beside AMD Dual Core (two processors with private L1-I, L1-D and L2 caches, System Request Interface, Cross-Bar Interconnect, Mem Ctlr, HT)]

Comparing Core Duo/Dual Core

[Figure: Intel Core Duo beside two AMD Dual Core chips, each AMD chip with its own System Request Interface, Cross-Bar Interconnect, Mem Ctlr, and HT]

Symmetric Multiprocessor on a Bus

• The bus is a point that serializes references
• A serializing point is a shared-memory enabler

[Figure: four processors, each with L1-I, L1-D, an L2 Cache, and Cache Control, attached to a single Bus shared with four Memory modules]

Sun Fire E25K

Co-Processor Architectures

• A powerful parallel design is to add 1 or more subordinate processors to a standard design
   • Floating point instructions once implemented this way
   • Graphics Processing Units - deep pipelining
   • Cell Processor - multiple SIMD units
   • Attached FPGA chip(s) - compile to a circuit
• These architectures will be discussed later

Clusters

• Interconnecting with InfiniBand
   • Switch-based technology
   • Host channel adapters (HCA)
   • Peripheral Component Interconnect (PCI)

Thanks: IBM’s Clustering systems using InfiniBand Hardware

Clusters

• Cheap to build using commodity technologies
• Effective when interconnect is “switched”
• Easy to extend, usually in increments of 1
• Processors often have disks “nearby”
• No shared memory
• Latencies are usually large
• Programming uses message passing
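The message-passing style can be sketched without a cluster. In the example below, threads inside one process stand in for cluster nodes, and queues stand in for the interconnect; this is a stand-in chosen for illustration, and a real cluster would typically use a message-passing library instead. The names node and cluster_sum are hypothetical. The discipline that matters is that each node touches only its own data and communicates solely by sending and receiving.

```python
import queue
import threading

def node(rank, data, inboxes, results):
    """One simulated cluster node: private data, explicit messages only."""
    partial = sum(data)                      # compute on local data
    if rank != 0:
        inboxes[0].put((rank, partial))      # send partial sum to node 0
        return
    total = partial
    for _ in range(len(inboxes) - 1):
        _sender, value = inboxes[0].get()    # blocking receive
        total += value
    results.put(total)

def cluster_sum(chunks):
    """Sum all chunks, one chunk per simulated node."""
    inboxes = [queue.Queue() for _ in chunks]
    results = queue.Queue()
    threads = [
        threading.Thread(target=node, args=(rank, chunk, inboxes, results))
        for rank, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.get()

# Deal the numbers 0..99 out to four nodes.
chunks = [list(range(i, 100, 4)) for i in range(4)]
print(cluster_sum(chunks))  # 4950
```

Only three small messages cross the simulated network, which is the point: with large latencies, a program should ship a few partial results rather than many individual values.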

Networks

[Figure: four interconnect topologies: Torus (Mesh), Hypercube, Fat Tree, Omega Network]

Summarizing Architectures

• Two main classes
   • Complete connection: CMPs, SMPs, X-bar
      • Preserve single memory image
      • Complete connection limits scaling to …
      • Available to everyone
   • Sparse connection: Clusters, Supercomputers, Networked computers used for parallelism (Grid)
      • Separate memory images
      • Can grow “arbitrarily” large
      • Available to everyone with air conditioning
• Differences are significant; world views diverge

Break

• During the break, consider which aspects of the architectures we’ve seen should be highlighted and which should be abstracted away

The Parallel Programming Problem

• Some computations can be platform specific
• Most should be platform independent
• Parallel Software Development Problem: How do we neutralize the machine differences given that
   • Some knowledge of execution behavior is needed to write programs that perform
   • Programs must port across platforms effortlessly, meaning, by at most recompilation

Options for Solving the PPP

• Leave the problem to the compiler …
