Multiprocessor - Advance Computers Architectures - Lecture Slides, Slides of Computer Architecture and Organization

Main points of this lecture are: Multiprocessor, Uniprocessor Performance, Data-Intensive Applications, Leveraging Design, Flynn’s Taxonomy, Centralized Memory Multiprocessor, Distributed Memory, Centralized Memory, Symmetric Multiprocessors

Typology: Slides

2012/2013

Uploaded on 04/23/2013

atasi

CIS 600 Advanced Computer
Architecture
Lecture 8 Multiprocessor
Introduction

Uniprocessor Performance (SPECint)

[Figure: performance relative to the VAX-11/780 (log scale, 1 to 10000) plotted against year, 1978 to 2006, with trend segments labeled 25%/year, 52%/year, and ??%/year, and a "3X" annotation]

  • VAX: 25%/year 1978 to 1986
  • RISC + x86: 52%/year 1986 to 2002
  • RISC + x86: ??%/year 2002 to present

From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006

Other Factors ⇒ Multiprocessors

  • Growth in data-intensive applications
    • Databases, file servers, …
  • Growing interest in servers, server perf.
  • Increasing desktop perf. less important
    • Outside of graphics
  • Improved understanding in how to use multiprocessors effectively
    • Especially servers, where there is significant natural TLP
  • Advantage of leveraging design investment by replication
    • Rather than unique design

Flynn’s Taxonomy

  • Flynn classified by data and control streams in 1966
  • SIMD ⇒ Data Level Parallelism
  • MIMD ⇒ Thread Level Parallelism
  • MIMD popular because
    • Flexible: can run N programs or 1 multithreaded program
    • Cost-effective: same microprocessor (MPU) in desktop & MIMD

  • Single Instruction Single Data (SISD): uniprocessor
  • Single Instruction Multiple Data (SIMD): single PC; Vector, CM-2
  • Multiple Instruction Single Data (MISD): (????)
  • Multiple Instruction Multiple Data (MIMD): clusters, SMP servers

M.J. Flynn, "Very High-Speed Computing Systems", Proc. of the IEEE, V 54, 1901-1909, Dec. 1966.

Centralized vs. Distributed Memory

[Figure: two block diagrams labeled "Centralized Memory" and "Distributed Memory", each showing processors P1 … Pn with caches ($), memory modules (Mem), and an interconnection network; a "Scale" label accompanies the pair]

Centralized Memory Multiprocessor

  • Also called symmetric multiprocessors (SMPs) because single main memory has a symmetric relationship to all processors
  • Large caches ⇒ single memory can satisfy memory demands of small number of processors
  • Can scale to a few dozen processors by using a switch and by using many memory banks
  • Although scaling beyond that is technically conceivable, it becomes less attractive as the number of processors sharing centralized memory increases

2 Models for Communication and Memory Architecture

  1. Communication occurs by explicitly passing messages among the processors: message-passing multiprocessors
  2. Communication occurs through a shared address space (via loads and stores): shared memory multiprocessors, either
    • UMA (Uniform Memory Access time) for shared address, centralized memory MP
    • NUMA (Non Uniform Memory Access time multiprocessor) for shared address, distributed memory MP
  • In past, confusion whether "sharing" means sharing physical memory (Symmetric MP) or sharing address space
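The two communication models can be contrasted with a small sketch. It is only an analogy: Python threads always share one address space, so the message-passing version emulates "no sharing" by letting workers communicate solely through a queue. The function names and the use of `queue.Queue` as the "network" are assumptions made for illustration, not part of the lecture.

```python
import queue
import threading


def sum_message_passing(chunks):
    """Workers share no variables; each sends its partial sum as an explicit message."""
    mailbox = queue.Queue()  # stands in for the interconnection network

    def worker(chunk):
        mailbox.put(sum(chunk))  # explicit send

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)  # explicit receives


def sum_shared_memory(chunks):
    """Workers communicate through ordinary loads and stores to a shared location."""
    total = {"value": 0}      # the shared location
    lock = threading.Lock()   # synchronization is still the programmer's job

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            total["value"] += partial  # load, add, store on shared data

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


chunks = [[1, 2, 3], [4, 5], [6]]
print(sum_message_passing(chunks), sum_shared_memory(chunks))  # 21 21
```

In the first version all communication is visible as send and receive operations; in the second it is implicit in memory accesses, which is why the hardware has to define what a load returns.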

Challenges of Parallel Processing

  • First challenge is % of program inherently sequential
  • Suppose 80X speedup from 100 processors. What fraction of original program can be sequential?
    a. 10%
    b. 5%
    c. 1%
    d. <1%
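The question can be checked with Amdahl's Law. The sketch below assumes the standard form, speedup = 1 / (s + (1 - s)/N), where s is the sequential fraction and N the processor count; the function names are illustrative.

```python
def speedup(sequential_fraction: float, processors: int) -> float:
    """Amdahl's Law: only the parallel part of the program scales with N."""
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / processors)


def max_sequential_fraction(target_speedup: float, processors: int) -> float:
    """Solve Amdahl's Law for s.

    1/S = s + (1 - s)/N   =>   s = (N/S - 1) / (N - 1)
    """
    return (processors / target_speedup - 1.0) / (processors - 1.0)


s = max_sequential_fraction(target_speedup=80, processors=100)
print(f"sequential fraction = {s:.4%}")                       # about 0.25%
print(f"speedup with 1% sequential = {speedup(0.01, 100):.1f}")  # about 50
```

Under these assumptions only about 0.25% of the original program can be sequential, which corresponds to answer d; even a 1% sequential fraction caps the speedup at roughly 50X.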

Challenges of Parallel Processing

  1. Insufficient application parallelism ⇒ addressed primarily via new algorithms that have better parallel performance
  2. Long remote latency impact ⇒ addressed both by the architect and by the programmer
  • For example, reduce frequency of remote accesses either by
    • Caching shared data (HW)
    • Restructuring the data layout to make more accesses local (SW)
  • Today’s lecture on HW to help latency via caches

Symmetric Shared-Memory Architectures

  • From multiple boards on a shared bus to multiple processors inside a single chip
  • Caches hold both
    • Private data are used by a single processor
    • Shared data are used by multiple processors
  • Caching shared data ⇒ reduces latency to shared data, memory bandwidth for shared data, and interconnect bandwidth ⇒ cache coherence problem
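The coherence problem that caching shared data creates can be shown with a toy model. The `WriteBackCache` class below is hypothetical: a private write-back cache with no coherence protocol at all, with main memory modeled as a dictionary. It is a sketch of the failure, not of any real hardware.

```python
class WriteBackCache:
    """A private cache with no coherence protocol (hypothetical toy model)."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory: address -> value
        self.lines = {}       # this processor's private cached copies

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch the block from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]     # hit: return the private copy

    def write(self, addr, value):
        self.lines[addr] = value    # write-back: memory is not updated yet


memory = {"X": 1}
cache1, cache2 = WriteBackCache(memory), WriteBackCache(memory)

cache1.read("X")                  # P1 caches X = 1
cache2.read("X")                  # P2 caches X = 1
cache1.write("X", 0)              # P1 updates only its own copy
stale_read_p2 = cache2.read("X")  # P2 hits in its cache and still sees 1

print(cache1.read("X"), stale_read_p2, memory["X"])  # 0 1 1
```

After P1's write, three different views of X coexist: P1 sees 0 while P2 and memory still hold 1.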

Example

  • Intuition not guaranteed by coherence
  • Expect memory to respect order between accesses to different locations issued by a given process
    • to preserve orders among accesses to same location by different processes

/* Assume initial value of A and flag is 0 */

P1                  P2
A = 1;              while (flag == 0);  /* spin idly */
flag = 1;           print A;

[Figure: Conceptual Picture, showing processors P1 … Pn connected to a single memory (Mem)]
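Why the intuition in this example needs more than coherence can be sketched with a toy simulation. It assumes a hypothetical machine in which P1's two writes may become visible to P2 in either order; each location on its own still behaves coherently. The function name and the model are illustrative only.

```python
def value_printed_by_p2(deliver_in_program_order: bool) -> int:
    """Return what P2 prints for A in a toy model of the example."""
    p2_view = {"A": 0, "flag": 0}      # what P2 currently observes
    pending = [("A", 1), ("flag", 1)]  # P1's writes, in program order

    if not deliver_in_program_order:
        pending.reverse()              # the write to flag overtakes the write to A

    for location, value in pending:
        p2_view[location] = value      # one write becomes visible to P2
        if p2_view["flag"] == 1:       # P2's spin loop exits at this point
            return p2_view["A"]        # P2 executes "print A"
    raise AssertionError("flag was never set")


print(value_printed_by_p2(deliver_in_program_order=True))   # 1
print(value_printed_by_p2(deliver_in_program_order=False))  # 0
```

When the writes to the two different locations are seen out of program order, P2 leaves the loop and prints the old value 0.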

Intuitive Memory Model

[Figure: a processor P above a hierarchy of cache levels (L), Memory, and Disk; the diagram marks copies of address 100 at three levels]

  • Reading an address should return the last value written to that address
    • Easy in uniprocessors, except for I/O
  • Too vague and simplistic; 2 issues
  1. Coherence defines values returned by a read
  2. Consistency determines when a written value will be returned by a read

Write Consistency

  • For now assume
  1. A write does not complete (and allow the next write to occur) until all processors have seen the effect of that write
  2. The processor does not change the order of any write with respect to any other memory access

⇒ If a processor writes location A followed by location B, any processor that sees the new value of B must also see the new value of A

Basic Schemes for Enforcing Coherence

  • Program on multiple processors will normally have copies of the same data in several caches
  • Unlike I/O, where it's rare
  • Rather than trying to avoid sharing in SW, SMPs use a HW protocol to maintain coherent caches
  • Migration and replication are key to performance of shared data
  • Migration: data can be moved to a local cache and used there in a transparent fashion
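One way a hardware protocol can keep replicated and migrated copies coherent is to invalidate other copies on a write. The sketch below is an assumption for illustration: this excerpt does not say which protocol the lecture goes on to use, and the `Bus` and `SnoopingCache` classes are hypothetical. Writes go straight through to memory only to keep the example short.

```python
class Bus:
    """Shared memory plus the caches attached to it (toy model)."""

    def __init__(self, memory):
        self.memory = memory  # address -> value
        self.caches = []


class SnoopingCache:
    """Private cache that invalidates other copies on a write (illustrative)."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:
            # A miss brings the data into this local cache; several caches
            # may hold a copy of the same address at once.
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        for other in self.bus.caches:
            if other is not self:
                other.lines.pop(addr, None)  # invalidate every other copy
        self.lines[addr] = value
        self.bus.memory[addr] = value        # write-through, for brevity


bus = Bus({"X": 1})
cache1, cache2 = SnoopingCache(bus), SnoopingCache(bus)

cache1.read("X")
cache2.read("X")                       # both caches now hold X = 1
cache1.write("X", 0)                   # cache2's copy is invalidated
copy_invalidated = "X" not in cache2.lines
print(copy_invalidated, cache2.read("X"))  # True 0
```

The second read by `cache2` misses and fetches the new value, so no processor can keep using the old copy after the write.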