University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell

CS 352H: Computer Systems Architecture

Topic 14: Multicores, Multiprocessors, and Clusters

Introduction

Goal: connecting multiple computers to get higher performance

Multiprocessors

Scalability, availability, power efficiency

Job-level (process-level) parallelism

High throughput for independent jobs

Parallel processing program

Single program run on multiple processors

Multicore microprocessors

Chips with multiple processors (cores)

What We’ve Already Covered

§2.11: Parallelism and Instructions

Synchronization

§3.6: Parallelism and Computer Arithmetic

Associativity

§4.10: Parallelism and Advanced Instruction-Level Parallelism

§5.8: Parallelism and Memory Hierarchies

Cache Coherence

§6.9: Parallelism and I/O

Redundant Arrays of Inexpensive Disks

Parallel Programming

Parallel software is the problem

Need to get significant performance improvement

Otherwise, just use a faster uniprocessor, since it’s easier!

Difficulties

Partitioning

Coordination

Communications overhead

Scaling Example

Workload: sum of 10 scalars, and 10 × 10 matrix sum

Speed up from 10 to 100 processors

Single processor: Time = (10 + 100) × t_add

10 processors

Time = 10 × t_add + 100/10 × t_add = 20 × t_add

Speedup = 110/20 = 5.5 (55% of potential)

100 processors

Time = 10 × t_add + 100/100 × t_add = 11 × t_add

Speedup = 110/11 = 10 (10% of potential)

Assumes load can be balanced across processors

Scaling Example (cont)

What if matrix size is 100 × 100?

Single processor: Time = (10 + 10000) × t_add

10 processors

Time = 10 × t_add + 10000/10 × t_add = 1010 × t_add

Speedup = 10010/1010 = 9.9 (99% of potential)

100 processors

Time = 10 × t_add + 10000/100 × t_add = 110 × t_add

Speedup = 10010/110 = 91 (91% of potential)

Assuming load balanced
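
The arithmetic in the two scaling examples can be checked with a short script. This is a minimal sketch, assuming (as the slides do) that the 10 scalar additions stay sequential and the matrix additions divide evenly across processors; the function names `parallel_time` and `speedup` are illustrative and do not come from the slides.

```python
def parallel_time(scalar_adds, matrix_adds, processors):
    """Execution time in units of t_add.

    Assumption from the example: the scalar additions run sequentially,
    while the matrix additions are perfectly balanced across processors.
    """
    return scalar_adds + matrix_adds / processors


def speedup(scalar_adds, matrix_adds, processors):
    """Speedup relative to one processor doing every addition itself."""
    single_processor_time = scalar_adds + matrix_adds
    return single_processor_time / parallel_time(scalar_adds, matrix_adds, processors)


if __name__ == "__main__":
    for n in (10, 100):              # matrix is n x n
        for p in (10, 100):          # processor count
            s = speedup(10, n * n, p)
            print(f"{n} x {n} matrix, {p} processors: "
                  f"speedup {s:.1f} ({100 * s / p:.0f}% of potential)")
```

Holding the sequential part at 10 additions while the matrix grows is what pushes the 100-processor case from 10% to 91% of its potential speedup.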

Shared Memory

SMP: shared memory multiprocessor

Hardware provides single physical address space for all processors

Synchronize shared variables using locks

Memory access time: UMA (uniform) vs. NUMA (nonuniform)
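
As an illustration of synchronizing a shared variable with a lock, the sketch below uses Python threads as stand-ins for processors that share one address space. The function name `locked_total` and the counter workload are assumptions made for the example, not part of the slides.

```python
import threading


def locked_total(num_threads, increments_per_thread):
    """Have several threads increment one shared counter under a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments_per_thread):
            # The lock makes the read-modify-write of the shared
            # variable atomic with respect to the other threads.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Without the lock, two threads could read the same old value of `total` and one of the increments would be lost.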

Example: Sum Reduction

Sum 100,000 numbers on a 100-processor UMA

Each processor has ID: 0 ≤ Pn ≤ 99

Partition 1000 numbers per processor

Initial summation on each processor

  sum[Pn] = 0;
  for (i = 1000*Pn; i < 1000*(Pn+1); i = i + 1)
    sum[Pn] = sum[Pn] + A[i];

Now need to add these partial sums

Reduction: divide and conquer

Half the processors add pairs, then quarter, …

Need to synchronize between reduction steps
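
The partial-sum and reduction steps can be simulated with threads and a barrier. This is a sketch under stated assumptions: Python threads stand in for the processors, `threading.Barrier` plays the role of the synchronization between reduction steps, and the handling of an odd number of remaining partial sums (processor 0 absorbs the leftover element) is one common way to write the loop, not something shown in this preview of the slides. The name `shared_memory_sum` is hypothetical.

```python
import threading


def shared_memory_sum(values, num_procs):
    """Sum `values` with per-processor partial sums and a tree reduction."""
    chunk, remainder = divmod(len(values), num_procs)
    if remainder:
        raise ValueError("len(values) must be a multiple of num_procs")

    partial = [0] * num_procs               # shared array, one slot per processor
    barrier = threading.Barrier(num_procs)  # synchronizes reduction steps

    def worker(pn):
        # Initial summation over this processor's own partition.
        partial[pn] = sum(values[chunk * pn:chunk * (pn + 1)])

        half = num_procs
        while half > 1:
            barrier.wait()                   # all writes of the previous step are done
            if half % 2 != 0 and pn == 0:
                partial[0] += partial[half - 1]  # odd count: fold in the leftover
            half //= 2
            if pn < half:
                partial[pn] += partial[pn + half]  # lower half adds its pair

    threads = [threading.Thread(target=worker, args=(pn,))
               for pn in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return partial[0]
```

Calling `shared_memory_sum(list(range(100000)), 100)` mirrors the slide's setup of 100,000 numbers on 100 processors. Each step only reads slots that no processor writes during that step, so one barrier per step is enough.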

Message Passing

Each processor has private physical address space

Hardware sends/receives messages between processors

Loosely Coupled Clusters

Network of independent computers

Each has private memory and OS

Connected using I/O system

E.g., Ethernet/switch, Internet

Suitable for applications with independent tasks

Web servers, databases, simulations, …

High availability, scalable, affordable

Problems

Administration cost (prefer virtual machines)

Low interconnect bandwidth, cf. processor/memory bandwidth on an SMP

Sum Reduction (Again)

Given send() and receive() operations

  limit = 100; half = 100; /* 100 processors */
  repeat
    half = (half+1)/2; /* send vs. receive dividing line */
    if (Pn >= half && Pn < limit) send(Pn - half, sum);
    if (Pn < (limit/2)) sum = sum + receive();
    limit = half; /* upper limit of senders */
  until (half == 1); /* exit with final sum */

Send/receive also provide synchronization

Assumes send/receive take similar time to addition
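
The send/receive reduction can be tried out in a small simulation. This is a sketch, assuming each processor is modeled as a thread with a private `queue.Queue` mailbox; `send` and `receive` here are stand-ins for the hardware message operations, and the name `message_passing_sum` is hypothetical.

```python
import queue
import threading


def message_passing_sum(values, num_procs):
    """Sum `values` using only send/receive between simulated processors."""
    chunk, remainder = divmod(len(values), num_procs)
    if remainder:
        raise ValueError("len(values) must be a multiple of num_procs")

    mailboxes = [queue.Queue() for _ in range(num_procs)]  # one inbox per processor
    result = []

    def send(dest, value):
        mailboxes[dest].put(value)

    def worker(pn):
        def receive():
            return mailboxes[pn].get()      # blocks until a message arrives

        total = sum(values[chunk * pn:chunk * (pn + 1)])  # private partial sum
        limit = half = num_procs
        while True:
            half = (half + 1) // 2          # send vs. receive dividing line
            if half <= pn < limit:
                send(pn - half, total)      # upper part sends to the lower part
            if pn < limit // 2:
                total += receive()          # lower part accumulates
            limit = half                    # upper limit of senders
            if half == 1:
                break
        if pn == 0:
            result.append(total)            # processor 0 holds the final sum

    threads = [threading.Thread(target=worker, args=(pn,))
               for pn in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

No barrier is needed because the blocking `receive` does the synchronizing. In this simulation a message from a later round can reach a mailbox before an earlier one, which is harmless because addition is commutative and each processor performs exactly as many receives as it has incoming messages.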

Grid Computing

Separate computers interconnected by long-haul networks

E.g., Internet connections

Work units farmed out, results sent back

Can make use of idle time on PCs

E.g., SETI@home, World Community Grid
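
The farm-out pattern can be sketched on a single machine. In this illustration a thread pool stands in for the remote volunteer PCs, and `process_work_unit` is a hypothetical placeholder computation (sum of squares); neither reflects how SETI@home or World Community Grid is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def process_work_unit(unit):
    """Hypothetical stand-in for the work done on one volunteer machine."""
    return sum(x * x for x in unit)


def farm_out(data, unit_size, num_workers):
    """Split data into independent work units, farm them out, combine results."""
    units = [data[i:i + unit_size] for i in range(0, len(data), unit_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each unit is processed independently; results come back to the coordinator.
        results = list(pool.map(process_work_unit, units))
    return sum(results)
```

The units share no state, which is what lets this style of computation tolerate slow, long-haul links between the coordinator and the workers.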

Simultaneous Multithreading

In multiple-issue dynamically scheduled processor

Schedule instructions from multiple threads

Instructions from independent threads execute when function units are available

Within threads, dependencies handled by scheduling and register renaming

Example: Intel Pentium-4 HT

Two threads: duplicated registers, shared function units and caches
