Parallel and Distributed Systems: Parallel Performance, Shared Memory and Threads, Lecture notes of Cryptography and System Security

An introduction to parallel software, sources of parallelism, programming models, major abstractions, processes and threads, communication, synchronization, shared memory, API description, and implementation at ABI, ISA levels. It also covers functional parallelism, automatic extraction, data parallelism, coordinating work, expressing parallelism, MP interfaces, instruction set architecture, programming model elements, threads and processes, shared memory communication, code locking, data locking, point-to-point synchronization, and rendezvous.

Typology: Lecture notes

2022/2023

Available from 06/07/2023

cynthia-std

BCT 2307 - Parallel and Distributed Systems
MOD 3: PARALLEL PERFORMANCE, SHARED MEMORY AND THREADS

Lecture Objectives

  • Introduction to Parallel Software
    • Sources of parallelism
    • Expressing parallelism
  • Programming Models
  • Major Abstractions
    • Processes & threads
    • Communication
    • Synchronization
  • Shared Memory
    • API description
    • Implementation at ABI, ISA levels
    • ISA support

Finding Parallelism

1. Functional parallelism

  • Car: {engine, brakes, entertain, nav, …}
  • Game: {physics, logic, UI, render, …}
  • Signal processing: {transform, filter, scaling, …}

2. Automatic extraction

  • Decompose serial programs

3. Data parallelism

  • Vector, matrix, db table, pixels, …

4. Request parallelism

  • Web, shared database, telephony, …

Mikko Lipasti-University of Wisconsin

1. Functional Parallelism

  • Car: {engine, brakes, entertain, nav, …}
  • Game: {physics, logic, UI, render, …}
  • Signal processing: {transform, filter, scaling, …}
  • Relatively easy to identify and utilize
  • Provides small-scale parallelism
    • Roughly 3x-10x
  • Balancing stages/functions is difficult
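
The signal-processing example above can be sketched as a three-stage pipeline in which each function runs in its own thread and hands results to the next stage through a queue. This is a minimal illustration, not code from the lecture: the stage functions, the `SENTINEL` convention, and the names `stage` and `run_pipeline` are all assumptions. In standard CPython builds the global interpreter lock limits CPU-bound speedup from threads, so the sketch shows the structure rather than a performance gain.

```python
import queue
import threading

SENTINEL = None  # assumed end-of-stream marker; real data must never be None


def stage(func, inbox, outbox):
    """Run one pipeline stage: apply func to each item until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # forward the shutdown signal downstream
            break
        outbox.put(func(item))


def run_pipeline(samples):
    # Hypothetical stand-ins for the transform, filter, and scaling functions.
    funcs = [lambda x: x * x, lambda x: x - 1, lambda x: x * 0.5]
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for s in samples:
        queues[0].put(s)
    queues[0].put(SENTINEL)

    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


pipeline_output = run_pipeline([1, 2, 3, 4])
print(pipeline_output)
```

With three stages the parallelism is at most 3x, and throughput is bounded by the slowest stage, which is why balancing stages matters.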

3. Data Parallelism

  • Vector, matrix, db table, pixels, web pages, …
  • Large data => significant parallelism
  • Many ways to express parallelism
    • Vector/SIMD
    • Threads, processes, shared memory
    • Message-passing
  • Challenges:
    • Balancing & coordinating work
    • Communication vs. computation at scale
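
One way to picture data parallelism and the balancing challenge is to split an array of pixels into nearly equal chunks and apply the same operation to each chunk in a worker thread. The sketch below assumes a hypothetical per-pixel kernel `brighten` and helper names `split` and `parallel_brighten`; none of these come from the lecture.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Divide data into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def brighten(chunk):
    """Hypothetical per-element kernel: add 10 to each pixel, clamped at 255."""
    return [min(p + 10, 255) for p in chunk]


def parallel_brighten(pixels, workers=4):
    chunks = split(pixels, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = pool.map(brighten, chunks)  # results come back in chunk order
    return [p for chunk in processed for p in chunk]


pixels = list(range(0, 260, 26))  # ten sample pixel values
brightened = parallel_brighten(pixels)
print(brightened)
```

Equal-sized chunks balance the work only when every element costs about the same; uneven per-element cost would call for smaller chunks or dynamic scheduling.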

4. Request Parallelism

  • Multiple users => significant parallelism
  • Challenges
    • Synchronization, communication, balancing work

[Figure: web browsing users, web server(s), database server(s)]

Coordinating Work

  • Synchronization
    • Some data somewhere is shared
    • Coordinate/order updates and reads
    • Otherwise: chaos
  • Traditionally: locks and mutual exclusion
    • Hard to get right, even harder to tune for performance
  • Research to reality: Transactional Memory
    • Programmer: declare potential conflict
    • Hardware and/or software: speculate & check
    • Commit or roll back and retry
    • IBM, Intel, and others now support it in hardware
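
The traditional lock-based approach can be sketched with a shared counter that several threads update. This is a minimal illustration under assumed names (`Counter`, `hammer`); it shows mutual exclusion only, not transactional memory.

```python
import threading


class Counter:
    """Shared counter whose read-modify-write update is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:       # only one thread at a time may run this block
            self.value += 1    # read, add, write back: unsafe without the lock


def hammer(counter, n):
    """Increment the shared counter n times."""
    for _ in range(n):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 40000: every one of the 4 x 10,000 updates is counted
```

Holding one lock for every update is simple to reason about but serializes all threads, which hints at why tuning locks for performance is harder than getting them correct.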

Expressing Parallelism

  • SIMD – popularized by the Cray-1 vector supercomputer
    • MMX, SSE/SSE2/SSE3/SSE4, AVX at small scale
  • SPMD or SIMT – GPGPU model (later)
    • All processors execute the same program on disjoint data
    • Loose synchronization vs. the rigid lockstep of SIMD
  • MIMD – most general (this lecture)
    • Each processor executes its own program
  • Expressed through standard interfaces
    • API, ABI, ISA
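
The SPMD idea, same program on disjoint data with only loose synchronization, can be sketched with threads that each select their own slice by rank and meet at a single barrier. The names `spmd_worker` and `run_spmd` and the strided partitioning are assumptions made for illustration.

```python
import threading


def spmd_worker(rank, nprocs, data, partial, barrier, result):
    """Every worker runs this same function; only `rank` differs."""
    # Phase 1: each rank sums its own disjoint, strided slice of the data.
    partial[rank] = sum(data[rank::nprocs])
    # Loose synchronization: workers run freely and meet only at this point.
    barrier.wait()
    # Phase 2: after the barrier, all partial sums have been written.
    if rank == 0:
        result.append(sum(partial))


def run_spmd(data, nprocs=4):
    partial = [0] * nprocs
    result = []
    barrier = threading.Barrier(nprocs)
    threads = [
        threading.Thread(
            target=spmd_worker,
            args=(r, nprocs, data, partial, barrier, result),
        )
        for r in range(nprocs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


total = run_spmd(list(range(100)))
print(total)  # 4950
```

Unlike SIMD lockstep, the workers here may progress at different speeds between barriers.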

Programming Models

  • High-level paradigm for expressing an algorithm
    • Examples:
      • Functional
      • Sequential, procedural
      • Shared memory
      • Message passing
  • Embodied in high-level languages that support concurrent execution
    • Incorporated into HLL constructs
    • Incorporated as libraries added to an existing sequential language
  • Top-level features (for conventional models: shared memory, message passing):
    • Multiple threads are conceptually visible to the programmer
    • Communication/synchronization are visible to the programmer

(c) 2007 Jim Smith

Application Programming Interface (API)

  • Interface where the HLL programmer works
  • High-level language plus libraries
    • Individual libraries are sometimes referred to as an “API”
  • User-level runtime software is often part of the API implementation
    • Executes procedures
    • Manages user-level state
  • Examples:
    • C and pthreads
    • FORTRAN and MPI

Instruction Set Architecture (ISA)

  • Interface between hardware and software
    • What the hardware implements
  • Architected state
    • Registers
    • Memory architecture
  • All instructions
    • May include parallel (SIMD) operations
    • Both non-privileged and privileged
  • Exceptions (traps, interrupts)

Programming Model Elements

  • For both Shared Memory and Message Passing
  • Processes and threads
    • Process: a shared address space and one or more threads of control
    • Thread: a program sequencer plus private state (registers and a stack); threads of one process share its address space
    • Task: less formal term for part of an overall job
    • All of these are created, terminated, scheduled, etc.
  • Communication
    • Passing of data
  • Synchronization
    • Communicating control information
    • Ensures reliable, deterministic communication
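
The distinction between communication (passing data) and synchronization (passing control information) can be sketched with one producer and one consumer thread. The names `mailbox`, `ready`, `producer`, and `consumer` are illustrative assumptions.

```python
import threading

mailbox = {}               # shared data: carries the communication
ready = threading.Event()  # control information: "the data is now valid"
received = []


def producer():
    mailbox["payload"] = [1, 2, 3]  # communication: pass the data
    ready.set()                     # synchronization: announce that it is there


def consumer():
    ready.wait()                    # block until the producer signals
    received.append(sum(mailbox["payload"]))


# Start the consumer first on purpose: the event makes the outcome
# independent of which thread happens to run first.
c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
c.join()
p.join()
print(received)  # [6]
```

Without the event, the consumer could read `mailbox` before the payload is written, so the result would depend on scheduling.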

Shared Memory

  • Flat shared memory or object heap
    • Synchronization via memory variables enables reliable sharing
  • Single process
  • Multiple threads per process
    • Private memory per thread
  • Typically built on a shared memory hardware system

[Figure: Thread 1, Thread 2, ..., Thread N, each with private variables; one thread writes and another reads VAR in the shared variables]
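
The picture of private variables per thread plus a shared variable `VAR` can be approximated as follows. This sketch uses `threading.local` as a stand-in for per-thread private memory, which is an assumption about how to model it in Python and not the mechanism the slide describes.

```python
import threading

shared = {"VAR": 0}          # shared variables: visible to every thread
lock = threading.Lock()      # synchronization via a memory variable
private = threading.local()  # each thread sees its own attributes on this object
observed = []


def worker(tid):
    private.scratch = tid * 100  # private: other threads cannot see this value
    with lock:
        shared["VAR"] += 1       # shared: every thread updates the same VAR
        observed.append(private.scratch)


threads = [threading.Thread(target=worker, args=(tid,)) for tid in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never set `scratch`, so it has no such attribute.
main_sees_scratch = hasattr(private, "scratch")
print(shared["VAR"], sorted(observed), main_sees_scratch)
```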

Threads and Processes

  • Creation
    • Generic: fork
      • (Unix forks a process, not a thread)
    • pthread_create(….*thread_function….)
      • Creates a new thread in the current address space
  • Termination
    • pthread_exit
      • Or the thread terminates when thread_function returns
    • pthread_kill
      • Sends a signal to another thread in the same process; pthread_cancel is the POSIX call for asking another thread to terminate
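
The thread life cycle can be sketched with Python's `threading` module as a rough analogue of the pthreads calls; this is not the pthreads API itself. Creating and starting a `Thread` plays the role of `pthread_create`, and returning from the target function ends the thread. The `threading` module has no call that forcibly stops another thread, so the second part uses a cooperative stop flag, which is an assumption chosen for illustration.

```python
import threading

results = {}


def thread_function(name, n):
    """Start routine: the thread ends when this function returns."""
    results[name] = sum(range(n))


# Creation: the new thread runs in the current address space and can
# therefore write directly into `results`.
worker = threading.Thread(target=thread_function, args=("worker", 5))
alive_before_start = worker.is_alive()  # False: created but not yet started
worker.start()
worker.join()                           # wait for termination
alive_after_join = worker.is_alive()    # False: thread_function has returned

# Cooperative termination: one thread asks another to stop.
stop_requested = threading.Event()


def polite_loop():
    stop_requested.wait()               # run until asked to stop
    results["loop"] = "stopped"


looper = threading.Thread(target=polite_loop)
looper.start()
stop_requested.set()                    # request termination
looper.join()
print(results)
```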