Understanding Parallel Memory Architecture & Programming Paradigms for Parallel Computing, Study notes of Earth Sciences

These notes give an overview of parallel computing, including parallel memory architecture, programming paradigms, and parallelization strategies. They cover the shared memory, message passing, and data parallel paradigms, as well as single instruction multiple data (SIMD) and multiple instruction multiple data (MIMD) systems. They also discuss the advantages and disadvantages of shared memory processors and distributed memory systems, along with thread implementations and message passing.

Uploaded on 08/31/2009

Introduction To Parallel Computing
Mohamed Iskandarani and Ashwanth Srinivasan
November 12, 2008

Outline

Overview

Concepts

Parallel Memory Architecture

Parallel Programming Paradigms

  • Shared memory paradigm
  • Message passing paradigm
  • Data parallel paradigm

Parallelization Strategies

Why Use Parallel Computing

  1. Overcome limits to serial computing
     1.1 Limits to increasing transistor density
     1.2 Limits to data transmission speed
     1.3 Prohibitive cost of supercomputers (niche market)
  2. Commodity (cheap) components to achieve high performance
  3. Faster turn-around time
  4. Solve larger problems

Serial Von Neumann Architecture

[Figure: memory and CPU connected in a fetch, execute, write-back cycle]

  • Memory stores program instructions and data
  • CPU fetches instructions/data from memory
  • CPU executes instructions sequentially
  • Results are written back to memory

Single Instruction Single Data (SISD)

  • A serial (non-parallel) computer
  • The CPU acts on a single instruction stream in each cycle
  • Only one data item is used as input in each cycle
  • Deterministic execution path
  • Example: most single-CPU laptops/workstations
  • Example instruction sequence, in time order: load A, load B, C = A + B, store C, A = 2*B, store A

Single Instruction Multiple Data (SIMD)

  • A type of parallel computer
  • Single Instruction: All processors execute the same instruction at any clock cycle
  • Multiple Data: Each processor unit acts on different data elements
  • Typically has a high-speed, high-bandwidth internal network
  • A large number of small-capacity instruction units
  • Synchronous and deterministic execution
  • Best suited for problems with high regularity, e.g. image processing, graphics
  • Examples:
    • Vector processors: Cray C90, NEC SX2, IBM
    • Processor arrays: Connection Machine CM-2, Maspar MP-
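
As a rough illustration of the single-instruction, multiple-data idea, the following Python sketch models a processor array in software. It is only a conceptual model: plain Python evaluates the elements one after another rather than on real SIMD hardware, and the names `simd_apply`, `a`, `b`, and `c` are hypothetical, chosen for this example.

```python
import operator

def simd_apply(instruction, lanes_a, lanes_b):
    """Apply one instruction to every pair of data elements.

    Conceptual model only: each list position stands for one processing
    unit, and every unit performs the same operation on its own data.
    """
    if len(lanes_a) != len(lanes_b):
        raise ValueError("both operands need one element per processing unit")
    return [instruction(x, y) for x, y in zip(lanes_a, lanes_b)]

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
c = simd_apply(operator.add, a, b)  # the same "add" on four different data pairs
print(c)
```

The point of the model is that the instruction is chosen once, outside the data loop, which is why this style suits highly regular problems such as image processing.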

Multiple Instruction Single Data (MISD)

  • Uncommon type of parallel computer

Multiple Instruction Multiple Data (MIMD)

  • Most common type of parallel computer
  • Multiple Instruction: Each processor may be executing a different instruction stream
  • Multiple Data: Each processor may be working on a different data stream
  • Execution can be synchronous or asynchronous
  • Execution is not necessarily deterministic
  • Example: most current supercomputers, clusters, IBM Blue Gene

Shared Memory Processors

[Figure: several processors (P) connected to one shared memory]

  • All processors access all memory as a single global address space
  • Processors operate independently but share memory resources

Shared Memory Processors

General characteristics

  • Advantages
    • Global address space simplifies programming
    • Allows incremental parallelization
    • Data sharing between CPUs is fast and uniform
  • Disadvantages
    • Lack of scalability between memory and CPUs
    • Adding CPUs increases memory traffic geometrically on the shared memory-CPU paths
    • The programmer is responsible for synchronizing memory accesses
    • Soaring expense of the internal network

Distributed Memory

[Figure: CPUs, each with its own local memory, connected through a network]

  • Each processor has its own private memory
  • No global address space
  • Processors communicate through network access

Distributed Memory

  • Advantages
    • Memory size scales with the number of CPUs
    • Fast local memory access with no network interference
    • Cost effective (commodity components)
  • Disadvantages
    • The programmer is responsible for communication details
    • Difficult to map existing data structures, based on global memory, to this memory organization
    • Non-uniform memory access time, which depends on network latency, bandwidth, and congestion
    • All-or-nothing parallelization
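
The distributed-memory organization described above can be sketched in Python. This is a simulation under stated assumptions, not real distributed hardware: threads stand in for separate nodes, a `queue.Queue` stands in for the network, and the names `node`, `distributed_sum`, and `chunks` are hypothetical. Each simulated node touches only its own local data, and results can be combined only by sending explicit messages.

```python
import queue
import threading

def node(rank, local_data, outbox):
    """One simulated node: works on its private data, then sends a message."""
    partial = sum(local_data)      # local memory access only
    outbox.put((rank, partial))    # explicit communication over the "network"

def distributed_sum(chunks):
    """Sum data that is split across simulated nodes, one chunk per node."""
    network = queue.Queue()        # stand-in for the interconnect
    workers = [
        threading.Thread(target=node, args=(rank, chunk, network))
        for rank, chunk in enumerate(chunks)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # The collecting side receives exactly one (rank, partial) message per node.
    partials = dict(network.get() for _ in chunks)
    return sum(partials.values())

chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
total = distributed_sum(chunks)
print(total)
```

Note how the data had to be split into `chunks` before any work could start. That up-front decomposition is the programmer's responsibility in this model, and it is one reason parallelization on distributed memory tends to be all or nothing.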

Parallel Programming Paradigms

  • Several programming paradigms are common
    • Shared Memory (OpenMP, threads)
    • Message Passing
    • Hybrid
    • Data parallel (HPF)
  • A programming paradigm abstracts the hardware and memory architecture
  • Paradigms are NOT specific to a particular type of machine
  • Any of these models can (in principle) be implemented on any underlying hardware
  • Shared memory model on distributed hardware: Kendall Square Research
  • The SGI Origin is a shared memory machine that also supported message passing effectively
  • Performance depends on the choice of programming model and on understanding the details of data traffic

Shared Memory Model

  • Parallel tasks share a common global address space
  • Reads and writes can occur asynchronously
  • Locks and semaphores control access to shared data
    • Avoid reading stale data from shared memory
    • Avoid multiple CPUs writing to the same shared memory address
  • The compiler translates variables into memory addresses, which are global
  • The user specifies private and shared variables
  • Incremental parallelization is possible
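
The role of locks in the shared memory model can be illustrated with a minimal Python sketch using the standard `threading` module. The names `SharedCounter`, `worker`, and `run` are hypothetical, and the example assumes a simple shared counter as the protected data. The counter is the shared variable, while `step` is private to each thread.

```python
import threading

class SharedCounter:
    """Data visible to every thread, plus the lock that guards it."""
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

def worker(shared, n_increments):
    for _ in range(n_increments):
        step = 1                   # private variable: one copy per thread
        with shared.lock:          # one thread at a time updates the shared value
            shared.value += step

def run(n_threads=4, n_increments=10_000):
    """Start the threads, wait for all of them, and return the final count."""
    shared = SharedCounter()
    threads = [
        threading.Thread(target=worker, args=(shared, n_increments))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.value

result = run()
print(result)
```

Without the `with shared.lock:` line, two threads could read the same old value and both write back the same new one, losing an update. That is the kind of conflicting write the lock is there to prevent.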