Parallel Processing in Computer Architecture: Symmetric Multiprocessors & Clusters, Lecture notes of Computer science

Computer science

An in-depth analysis of parallel processing, focusing on symmetric multiprocessors and clusters. It covers classifications of parallel processing, types of parallel processor systems, and their characteristics. The document delves into the organization of tightly coupled multiprocessors, time-shared bus, multiport memory, and central control unit. It also discusses failure management, load balancing, and parallelizing computation in cluster operating systems.

Typology: Lecture notes

2023/2024

Uploaded on 03/31/2024

naruto-20 🇮🇳

CSCI 4717 – Computer Architecture Parallel Processing – Page 1 of 63

CSCI 4717/5717

Computer Architecture

Topic: Symmetric Multiprocessors & Clusters

Reading: Stallings, Sections 18.1 through 18.4

Classifications of Parallel Processing

M. Flynn classified types of parallel processing in 1972 ("Some Computer Organizations and Their Effectiveness," IEEE Transactions on Computers). Types of Parallel Processor Systems (Figure 18.2):

Single instruction, single data stream
Single instruction, multiple data stream
Multiple instruction, single data stream
Multiple instruction, multiple data stream

Classifications of Parallel Processing (continued)

Single Instruction, Single Data Stream (SISD) – A single processor executes a single instruction stream operating on data stored in a single memory (uniprocessor)
Single Instruction, Multiple Data Stream (SIMD) – A single instruction stream controls multiple processing elements in lockstep; each processing element has its own data memory (vector/array processing)

Classifications of Parallel Processing (continued)

Multiple Instruction, Single Data Stream (MISD) – Multiple processors execute different sequences of instructions on a single data set. Not commercially implemented.
Multiple Instruction, Multiple Data Stream (MIMD) – A set of processors simultaneously executes different instruction sequences on different data sets.

Classifications of Parallel Processing (continued)

Multiple Instruction, Multiple Data Stream

Processors are general purpose
Each processor should be able to complete a process by itself
Communication methods
- Through shared memory ("Tightly Coupled")
  - Symmetric multiprocessor (SMP) – memory access time is approximately the same for all processors
  - Nonuniform Memory Access (NUMA) – memory access times may differ depending on which region of memory a processor accesses
- Cluster – Either through fixed connections or a network ("Loosely Coupled")

Symmetric Multiprocessors (SMP)

A standalone computer with the following traits:

Two or more similar processors of comparable capacity
Processors share same memory and I/O
Processors are connected by a bus or other internal connection
Memory access time is approximately the same for each processor

Symmetric Multiprocessors (continued)

All processors share access to I/O through either:
- the same channels
- different channels that provide paths to the same devices
All processors can perform the same functions (hence symmetric)
The system is controlled by an integrated operating system that provides interaction between processors
Interaction occurs at the job, task, file, and data element levels

Integrated Operating System

O/S for SMP is NOT like that of clusters/loosely coupled systems, where communication is usually at the file level
There can be a high degree of interaction between processes
O/S schedules processes or threads across all processors

SMP Advantages

Advantages are realized only if the O/S can provide parallelism

Performance, but only if some work can be done in parallel
Availability/reliability – Since all processors can perform the same functions, failure of a single processor does not halt the system
Incremental growth – User can enhance performance by adding processors
Scaling – Vendors can offer a range of products based on the number of processors
Transparent to user – User only sees improvement in performance
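The qualifier "only if some work can be done in parallel" can be quantified with Amdahl's law, which these notes do not cover: if a fraction f of a job parallelizes perfectly across N processors and the rest stays serial, the speedup is 1 / ((1 - f) + f / N). The sketch below is a minimal illustration that assumes ideal parallel execution with no scheduling or synchronization overhead; the function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# With 90% of the work parallel, each added processor helps less than the last.
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With f = 0.9, four processors give roughly 3.1 times the uniprocessor speed rather than 4 times, and no number of processors can exceed 1 / (1 - f) = 10 times.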

Organization of Tightly Coupled Multiprocessor

Individual processors are self-contained, i.e., they have their own control unit, ALU, registers, one or more levels of cache, and private main memory
Access to shared memory and I/O devices through some interconnection network
Processors communicate through a common data area in shared memory
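A minimal sketch of processors communicating through a common data area, assuming Python threads stand in for processors and a dictionary stands in for the shared region of main memory (the function name and data layout are hypothetical). The lock models the need to serialize updates to shared data. CPython threads do not execute Python bytecode truly in parallel, so this illustrates the communication pattern, not a speedup.

```python
import threading


def run_shared_memory_demo(num_workers: int, increments: int) -> dict:
    """Worker threads ("processors") communicate only through a shared dictionary."""
    shared = {"counter": 0, "done": []}   # the common data area
    lock = threading.Lock()               # serializes access to the shared area

    def worker(worker_id: int) -> None:
        for _ in range(increments):
            with lock:                    # read-modify-write must not interleave
                shared["counter"] += 1
        with lock:
            shared["done"].append(worker_id)   # tell the others this worker finished

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                          # wait for every "processor" to finish
    return shared


result = run_shared_memory_demo(num_workers=4, increments=1000)
print(result["counter"])                  # 4000
print(sorted(result["done"]))             # [0, 1, 2, 3]
```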

Organization of Tightly Coupled Multiprocessor (continued)

Memory is often organized to provide simultaneous access to separate blocks of memory
Interconnection organizations
- Time-shared or common bus
- Central controller (arbitrator)
- Multiport memory

Multiport Memory

Multiport Memory (continued)

Advantages
- Removes the bus access bottleneck
- Portions of memory can be dedicated to a single processor
  - Better security
  - Better recovery from faults
Disadvantages
- Complex memory logic
- More PCB wiring
- A write-through policy should be used for caches

Central Control Unit

Functions

Funnels separate data streams between independent modules
Can buffer requests
Performs arbitration and timing
Passes status and control information
Performs cache update alerting, i.e., notifies caches when shared data is updated
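The notes do not say which arbitration policy the central control unit uses. Round-robin is one common choice and is assumed here; the class below is a hypothetical sketch that grants the shared path to one requesting module per cycle, starting its search just after the previous winner so that no module is starved.

```python
from typing import Optional, Set


class RoundRobinArbiter:
    """Grants one requesting module per cycle; priority rotates past the last winner."""

    def __init__(self, num_modules: int) -> None:
        self.num_modules = num_modules
        self.last_grant = num_modules - 1     # so module 0 is considered first

    def grant(self, requests: Set[int]) -> Optional[int]:
        """Return the module granted access this cycle, or None if nobody asked."""
        for offset in range(1, self.num_modules + 1):
            candidate = (self.last_grant + offset) % self.num_modules
            if candidate in requests:
                self.last_grant = candidate
                return candidate
        return None


arbiter = RoundRobinArbiter(num_modules=4)
# Modules 0, 2, and 3 request continuously; each is served in turn.
print([arbiter.grant({0, 2, 3}) for _ in range(4)])   # [0, 2, 3, 0]
```

A fixed-priority arbiter would be simpler, but a low-priority module could then wait indefinitely while higher-priority modules keep requesting.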

Central Control Unit (continued)

Uses the same control, addressing, and data interfaces as a typical processor; therefore, the interfaces to the modules remain the same
Disadvantages
- Very complex control unit
- Control unit is possible bottleneck

SMP Operating System

To the user, it appears as if there is a single O/S, i.e., a single-processor multiprogramming system
User should be able to create multithreaded processes without needing to know whether one processor or more will be used

SMP Operating System Design Issues

Simultaneous concurrent processes
- O/S routines should be reentrant
- O/S tables and other management structures must be expanded to handle multiple processes and processors
Scheduling
- Scheduling now decides not just the order of processes but also which processor gets each one
- Any processor should be capable of performing scheduling
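Why O/S routines must be reentrant can be shown with a small sketch. Two processors may execute the same routine at the same time; here, Python generators and their `yield` points are a hypothetical stand-in for that interleaving (on real hardware it is nondeterministic). The first routine keeps its working buffer in shared, module-level storage; the second keeps it in local variables, so each call has its own copy. All names are hypothetical.

```python
from typing import Dict, Generator, List

# Non-reentrant: the working buffer is shared by every caller.
_scratch: List[str] = []


def unsafe_format(fields: List[str]) -> Generator[None, None, str]:
    _scratch.clear()
    for f in fields:
        _scratch.append(f.upper())
        yield                      # another processor may run the routine here
    return ",".join(_scratch)


# Reentrant: the working buffer is local to each call.
def safe_format(fields: List[str]) -> Generator[None, None, str]:
    scratch: List[str] = []
    for f in fields:
        scratch.append(f.upper())
        yield                      # same interleaving point as above
    return ",".join(scratch)


def run_interleaved(a: Generator[None, None, str],
                    b: Generator[None, None, str]) -> Dict[str, str]:
    """Alternate one step of each call, as two processors might."""
    active = {"a": a, "b": b}
    results: Dict[str, str] = {}
    while active:
        for name in list(active):
            try:
                next(active[name])
            except StopIteration as stop:   # routine finished; keep its result
                results[name] = stop.value
                del active[name]
    return results


print(run_interleaved(unsafe_format(["x", "y"]), unsafe_format(["p", "q"])))
# {'a': 'P,Y,Q', 'b': 'P,Y,Q'}
print(run_interleaved(safe_format(["x", "y"]), safe_format(["p", "q"])))
# {'a': 'X,Y', 'b': 'P,Q'}
```

The non-reentrant version hands a mixture of both callers' data to both callers, even though each call is correct when run alone.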

SMP Operating System Design Issues (continued)

Synchronization – scheduling of resources must now account not just for processes but also for processors
Memory management
- Shared page replacement strategy
- Must understand and take advantage of memory hardware
Reliability and fault tolerance – Must be able to handle the loss of a processor without taking down other processors.

Cache Coherence

One or two levels of cache are typically associated with each processor; this is essential for performance
Problem
- Multiple copies of same data in different caches
- Can result in an inconsistent view of memory

Write Policy Review

Write-back policy
- Writes go only to the cache
- Main memory is updated only when the cache block is replaced
- Can lead to inconsistency
Write-through policy
- All writes are made to both the cache and main memory
- Inconsistencies can occur unless all other caches monitor memory traffic
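A toy model of the two write policies, assuming one shared main-memory dictionary and two private caches (the class, function names, and addresses are hypothetical). It reproduces both problems listed above: with write-back, main memory stays stale until the modified line is replaced; with write-through, memory is current, but another cache that already holds the line keeps serving the old value because nothing tells it to update or invalidate its copy.

```python
from typing import Dict, Set, Tuple


class ToyCache:
    """A private cache in front of a shared main-memory dictionary."""

    def __init__(self, memory: Dict[int, int], write_through: bool) -> None:
        self.memory = memory
        self.write_through = write_through
        self.lines: Dict[int, int] = {}   # address -> cached value
        self.dirty: Set[int] = set()      # modified lines not yet written back

    def read(self, addr: int) -> int:
        if addr not in self.lines:        # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr: int, value: int) -> None:
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value     # every write also goes to main memory
        else:
            self.dirty.add(addr)          # write-back: memory is updated later

    def evict(self, addr: int) -> None:
        if addr in self.dirty:            # write-back happens only on replacement
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


def two_cpu_demo(write_through: bool) -> Tuple[int, int]:
    """CPU B caches an address, then CPU A writes it. Returns (memory, B's view)."""
    memory = {0x10: 1}
    cpu_a = ToyCache(memory, write_through)
    cpu_b = ToyCache(memory, write_through)
    cpu_b.read(0x10)                      # B now holds the old value 1
    cpu_a.write(0x10, 2)
    return memory[0x10], cpu_b.read(0x10)


print(two_cpu_demo(write_through=False))  # (1, 1): memory and B are both stale
print(two_cpu_demo(write_through=True))   # (2, 1): memory is current, B is stale
```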

Software Solutions

The compiler and operating system deal with the problem
Overhead is transferred to compile time
Design complexity is transferred from hardware to software
Software tends to make conservative decisions leading to inefficient cache utilization

Software Solutions (continued)

Mark shared variables as non-cacheable
- Too conservative
Add instructions to enable/disable caching for variables; the compiler can then analyze the code to determine safe periods for caching shared variables

Hardware Solution

A.K.A. cache coherence protocols
Dynamic recognition of potential problems at run time
Because problems are handled only when they actually occur, the cache is used more efficiently
Transparent to programmer and compiler
Methods
- Directory protocols
- Snoopy protocols

Snoopy Protocols – Implementations

The performance of the two implementations, write invalidate and write update, depends on the number of caches and the pattern of reads and writes
Some systems use adaptive protocols that employ both methods
Write invalidate is the most common – used in Pentium 4 and PowerPC systems

MESI Protocol

Each cache line has two status bits associated with it
- encoding four states
Modified – the line in this cache has been modified and is valid only in this cache
Exclusive – the line in this cache is the same as that in memory (unmodified) and is not present in any other cache
Shared – the line in this cache is the same as that in memory (unmodified) and may also be present in another cache
Invalid – the line in this cache does not contain valid data
Writing through from the L1 cache to the L2 cache makes L1 writes visible to the MESI protocol
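The four states and the main transitions between them can be sketched as a small state machine. This is a simplified model assumed for illustration: it tracks a single line in a single cache, receives "does another cache hold this line?" as a parameter instead of observing bus signals, and mentions the accompanying data movement (write-backs and line fills) only in comments. Function names are hypothetical.

```python
from enum import Enum


class MESI(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def local_read(state: MESI, other_caches_have_line: bool) -> MESI:
    """This processor reads the line."""
    if state is MESI.INVALID:
        # Read miss: fetch the line (a cache holding it Modified writes it back first).
        return MESI.SHARED if other_caches_have_line else MESI.EXCLUSIVE
    return state                          # read hits leave M, E, and S unchanged


def local_write(state: MESI) -> MESI:
    """This processor writes the line."""
    # From S or I, other copies are invalidated over the bus first;
    # from E, no bus traffic is needed because no other cache holds the line.
    return MESI.MODIFIED


def snooped_read(state: MESI) -> MESI:
    """Another processor reads a line that this cache holds."""
    if state in (MESI.MODIFIED, MESI.EXCLUSIVE):
        return MESI.SHARED                # a Modified line is written back first
    return state


def snooped_write(state: MESI) -> MESI:
    """Another processor announces it will modify the line."""
    return MESI.INVALID                   # a Modified line is written back first


state = MESI.INVALID
state = local_read(state, other_caches_have_line=False)   # EXCLUSIVE
state = local_write(state)                                 # MODIFIED
state = snooped_read(state)                                # SHARED
state = snooped_write(state)                               # INVALID
print(state)                                               # MESI.INVALID
```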

MESI Protocol (continued)

MESI – State Transition Diagram

Clusters

Defined
- a group of interconnected, whole computers
- working together as a unified computing resource
- that can create the illusion of being one machine
Alternative to Symmetric Multiprocessing (SMP) for
- High performance
- High availability
- Server applications
Each computer is called a node

Cluster Computer Architecture

Figure 18.11 from Stallings, Computer Organization & Architecture

Cluster Benefits

Absolute scalability – Almost limitless in terms of adding independent multiprocessing machines
Incremental scalability – Can start out small and grow as the user acquires new machines

Cluster Benefits (continued)

High availability
- Loss of one node causes only a small decrease in performance
- Software (middleware) handles fault tolerance automatically
Superior price/performance
- By using inexpensive building blocks, a cluster achieves better performance at a lower price than a single large computer
- Expanding the design doesn't require PCB redesign

Cluster Configurations

High-speed message link options/configurations
- Dedicated LAN, with at least one node connected to remote clients
- Shared LAN with other non-cluster machines
The simplest way to classify clusters is by whether the computers share disk(s)
- No shared disk – each machine has a local disk
- Shared disk in addition to local disk – should use disk mirroring or RAID

Cluster Configurations – Standby Server with No Shared Disk

Cluster Configurations – Shared Disk

Cluster Configurations (continued)

Secondary server – cluster functional classification
- Passive Standby
  - Second computer takes over in the event of a failure of the first
  - First computer sends a "heartbeat"
  - When the heartbeat stops, the secondary takes over
  - Data or disks must be shared so that the secondary can also take over database work
- Active Standby – Second computer participates in processing
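The passive-standby heartbeat logic can be sketched as a timeout check. This is a minimal model assuming the caller passes in the current time explicitly and a hypothetical timeout of a few seconds; real failover software must also stop a primary that is still running but unreachable from writing to the shared disk, which is outside this sketch.

```python
class PassiveStandby:
    """Secondary server that takes over when the primary's heartbeat stops."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout        # hypothetical: seconds of silence tolerated
        self.last_heartbeat = 0.0     # time the last heartbeat was received
        self.active = False           # becomes True once the secondary takes over

    def on_heartbeat(self, now: float) -> None:
        """Called whenever a heartbeat message arrives from the primary."""
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Called periodically; returns True if the secondary is in charge."""
        if not self.active and now - self.last_heartbeat > self.timeout:
            self.active = True        # primary presumed failed: take over
        return self.active


standby = PassiveStandby(timeout=3.0)
for t in (1.0, 2.0, 3.0):             # primary is alive and sending heartbeats
    standby.on_heartbeat(t)
print(standby.check(now=5.0))         # False: 2 s of silence is within the timeout
print(standby.check(now=7.5))         # True: heartbeat stopped, secondary takes over
```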

Cluster O/S Design Issues – Parallelizing Computation

A single application executing in parallel on a number of machines in the cluster
Three general approaches to the problem:
- Parallelizing compiler
- Parallelizing application
- Parametric computing

Parallelizing Compiler

Determines at compile time which parts can be executed in parallel
These parts are split off to different computers
Performance depends on the compiler

Parallelizing Application

Application is written to run in parallel
Uses message passing to move data between nodes
Hard to program
Performance depends on the programmer
Potential for best end result
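A minimal sketch of the message-passing style, assuming Python threads stand in for cluster nodes and `queue.Queue` objects stand in for the network (a real cluster application would more likely use a message-passing library such as MPI). The coordinator scatters one chunk of the data to each node and gathers one partial result back; the nodes share nothing except the messages. All function names are hypothetical.

```python
import queue
import threading
from typing import List


def worker_node(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A 'node': receive one chunk, compute a partial result, send it back."""
    node_id, chunk = inbox.get()
    outbox.put((node_id, sum(x * x for x in chunk)))


def parallel_sum_of_squares(data: List[int], num_nodes: int) -> int:
    inboxes = [queue.Queue() for _ in range(num_nodes)]
    results: queue.Queue = queue.Queue()
    nodes = [threading.Thread(target=worker_node, args=(inboxes[i], results))
             for i in range(num_nodes)]
    for node in nodes:
        node.start()
    for i in range(num_nodes):                   # scatter: send each node its chunk
        inboxes[i].put((i, data[i::num_nodes]))
    total = sum(results.get()[1] for _ in range(num_nodes))   # gather the replies
    for node in nodes:
        node.join()
    return total


print(parallel_sum_of_squares(list(range(10)), num_nodes=3))  # 285
```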

Parametric computing

Applies when a problem consists of repeated execution of an algorithm on different sets of data
Example: a simulation run under different scenarios
Depends on tools to organize, manage, and execute the runs
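A minimal sketch of parametric computing: the same algorithm runs once per scenario, and because the runs never exchange data, they can be farmed out with no message passing at all. Here a `ThreadPoolExecutor` from Python's standard library stands in for the cluster's job-management tools, and `simulate` is a hypothetical model; on a real cluster each run would be dispatched to a separate node.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def simulate(growth_rate: float, years: int = 10, start: float = 100.0) -> float:
    """Hypothetical model: value after compounding `growth_rate` for `years`."""
    value = start
    for _ in range(years):
        value *= 1.0 + growth_rate
    return value


def run_scenarios(rates: List[float], workers: int = 4) -> Dict[float, float]:
    """Run the same algorithm once per scenario; the runs never communicate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(simulate, rates))   # results come back in input order
    return dict(zip(rates, results))


for rate, final in run_scenarios([0.0, 0.02, 0.05]).items():
    print(f"rate={rate:.2f} -> final value {final:.1f}")
```

For CPU-bound runs on one machine, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface and sidesteps CPython's global interpreter lock.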

Cluster Middleware

Software installed on each node to enable cluster operation:

Provides high availability through load balancing and failover control
Creates a unified image for the user
- Single point of entry – User logs onto cluster rather than a node
- Single file hierarchy – User sees a single file structure
- Single control point – single node acts as the interface to the user
- Single virtual network visible to cluster nodes
- Single memory space – programs are allowed to share variables across distributed memory
- Single job management system – cluster assigns the jobs, not the user
- Single user interface
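With a single job management system and middleware-driven load balancing, the cluster, not the user, decides where each job runs. The notes do not specify a placement policy; the sketch below assumes one simple heuristic (largest job first, each placed on the currently least-loaded node, often called longest-processing-time-first) and assumes each job comes with a known cost estimate. The function, job, and node names are hypothetical.

```python
import heapq
from typing import Dict, List


def assign_jobs(job_costs: Dict[str, int], nodes: List[str]) -> Dict[str, List[str]]:
    """Greedy placement: largest job first, each onto the least-loaded node."""
    heap = [(0, node) for node in nodes]          # (current load, node name)
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {node: [] for node in nodes}
    for job, cost in sorted(job_costs.items(), key=lambda item: item[1], reverse=True):
        load, node = heapq.heappop(heap)          # least-loaded node right now
        placement[node].append(job)
        heapq.heappush(heap, (load + cost, node))
    return placement


print(assign_jobs({"render": 8, "index": 5, "backup": 4, "report": 3},
                  ["node1", "node2"]))
# {'node1': ['render', 'report'], 'node2': ['index', 'backup']}
```

The final loads here are 11 and 9 cost units. The heuristic is cheap and usually close to balanced, but it is not guaranteed to find the optimal placement.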

Cluster Middleware (continued)

Enhancement of availability
- Single I/O space – I/O is accessible by any of the nodes regardless of the I/O device's location
- Single process space
  - Processes are treated as if they are all operating on a single machine
  - This means that the process identification scheme should be uniform and independent of host node
- Checkpointing – for recovery from a failure, each process should periodically save its state and intermediate results so that it can roll back to the last saved state
- Process migration – to allow for load balancing
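Checkpointing can be sketched as follows, assuming a single long-running computation whose state is just a loop counter and a running total, saved to a JSON file; the function name, file format, and simulated failure are hypothetical. After a crash, the restarted process reloads the last checkpoint and repeats only the work done since then.

```python
import json
import os
import tempfile
from typing import Optional


def run_with_checkpoints(n_steps: int, path: str, every: int = 10,
                         fail_at: Optional[int] = None) -> int:
    """Sum 0..n_steps-1, saving progress every `every` steps so a restart resumes."""
    step, total = 0, 0
    if os.path.exists(path):                      # recovery: reload the last saved state
        with open(path) as f:
            saved = json.load(f)
        step, total = saved["step"], saved["total"]
    while step < n_steps:
        if step == fail_at:
            raise RuntimeError(f"simulated node failure at step {step}")
        total += step
        step += 1
        if step % every == 0:                     # periodic checkpoint
            tmp_path = path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump({"step": step, "total": total}, f)
            os.replace(tmp_path, path)            # swap in the new checkpoint in one step
    return total


checkpoint = os.path.join(tempfile.mkdtemp(), "job.ckpt")
try:
    run_with_checkpoints(100, checkpoint, fail_at=57)    # last checkpoint is at step 50
except RuntimeError as err:
    print(err)                                           # simulated node failure at step 57
print(run_with_checkpoints(100, checkpoint))             # resumes at step 50; prints 4950
```

Writing to a temporary file and then calling `os.replace` means a crash during a save leaves the previous checkpoint intact rather than a half-written one (the rename is atomic on POSIX systems).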

Cluster v. SMP

Positive points for both

Both provide multiprocessor support for high-demand applications.
Both are available commercially – SMP has been around longer

SMP benefits

Easier to manage and configure since it is a single machine
Closer to single processor systems for which nearly all applications are written
Scheduling is the main difference between SMP and a single-processor system
Less physical space
Lower power consumption
Well-established

Cluster benefits

Superior incremental & absolute scalability
Superior availability through redundancy of all components, not just processors
Simpler to build from existing computers than an SMP, which is designed at the PCB level
With time, clusters are likely to dominate
