PARALLEL ALGORITHMS CONTEXT, Lecture notes of Parallel Computing and Programming


Typology: Lecture notes

2022/2023

Available from 06/07/2023

cynthia-std
Parallel and Distributed Systems
MOD 4: PARALLEL ALGORITHMS
CSE524 Parallel Algorithms

Part I: Introduction

Goal: Set the parameters for studying parallelism

Why Study Parallelism?

  • After all, for most of our daily computer uses, sequential processing is plenty fast
    • It is a fundamental departure from the “normal” computer model, therefore it is inherently cool
  • The extra power from parallel computers is enabling in science, engineering, business, …
  • Multicore chips present a new opportunity
  • Deep intellectual challenges for CS -- models, programming languages, algorithms, HW, …

Size vs Power

  • Power5 (server)
    • 389 mm^2
    • 120 W @ 1900 MHz
  • Intel Core2 sc (laptop)
    • 130 mm^2
    • 15 W @ 1000 MHz
  • ARM Cortex A8 (automobiles)
    • 5 mm^2
    • 0.8 W @ 800 MHz
  • Tensilica DP (cell phones / printers)
    • 0.8 mm^2
    • 0.09 W @ 600 MHz
  • Tensilica Xtensa (Cisco router)
    • 0.32 mm^2 for 3!
    • 0.05 W @ 600 MHz

[Figure: chips labeled Power 5, Intel Core, ARM, Tensilica DP, Xtensa x 3]

Each processor operates with 0.3-0.1 efficiency of the largest chip: more threads, lower power

Topic Overview

  • Goal: To give a good idea of parallel computation
    • Concepts -- looking at problems with “parallel eyes”
    • Algorithms -- different resources; different goals
    • Languages -- reduce control flow; increase independence; new abstractions
    • Hardware -- the challenge is communication, not instruction execution
    • Programming -- describe the computation without saying it sequentially
    • Practical wisdom about using parallelism

Parallel vs Distributed Computing

  • Comparisons are often matters of degree

    Characteristic   Parallel    Distributed
    Overall Goal     Speed       Convenience
    Interactions     Frequent    Infrequent
    Granularity      Fine        Coarse
    Reliable         Assumed     Not Assumed

Parallel vs Concurrent

  • In the OS and DB communities, execution of multiple threads is logically simultaneous
  • In the Arch and HPC communities, execution of multiple threads is physically simultaneous
  • The issues are often the same, say with respect to races
  • Parallelism can achieve states that are impossible with concurrent execution because two events happen at once

Parallel Summation

  • To sum a sequence in parallel
    • add pairs of values producing 1st level results,
    • add pairs of 1st level results producing 2nd level results,
    • sum pairs of 2nd level results …
  • That is,

(…((A[0]+A[1]) + (A[2]+A[3])) + ... + (A[n-2]+A[n-1]))…)
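The pairwise scheme above can be sketched in C as follows. This is a minimal sequential illustration, not code from the lecture: the function name `tree_sum` and the choice to overwrite the input array are assumptions made for the example. The point it shows is that all additions within one level are independent of each other, so a parallel machine could perform each level in a single step.

```c
#include <stddef.h>

/* Hypothetical helper (not from the notes): sums a[0..n-1] by combining
   pairs level by level. Overwrites a[]; the total ends up in a[0]. */
long tree_sum(long *a, size_t n)
{
    if (n == 0)
        return 0;

    /* stride is the distance between the two partial sums being combined:
       1 at the first level, 2 at the second, 4 at the third, ... */
    for (size_t stride = 1; stride < n; stride *= 2) {
        /* The iterations of this inner loop touch disjoint elements, so
           they could all run at the same time on a parallel machine. */
        for (size_t i = 0; i + stride < n; i += 2 * stride) {
            a[i] += a[i + stride];
        }
    }
    return a[0];
}
```

Called on the eight values 6 4 16 10 16 14 2 8, the outer loop runs three times (one per level) and the function returns 76. The sketch also works when n is not a power of two; an unpaired element simply waits for a later level.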

Express the Two Formulations

  • Graphic representation makes difference clear
    • Same number of operations; different order

[Figure: the two summation orders applied to the values 6 4 16 10 16 14 2 8]

What’s the Problem?

  • It’s not likely a compiler will produce parallel code from a C specification any time soon…
  • Fact: For most computations, a “best” sequential solution (practically, not theoretically) and a “best” parallel solution are usually fundamentally different …
    • Different solution paradigms imply computations are not “simply” related
    • Compiler transformations generally preserve the solution paradigm

Therefore... the programmer must discover the || solution

A Related Computation

  • Consider computing the prefix sums
  • Semantics ...
    • A[0] is unchanged
    • A[1] = A[1] + A[0]
    • A[2] = A[2] + (A[1] + A[0])
    • ...
    • A[n-1] = A[n-1] + (A[n-2] + ( ... (A[1] + A[0]) ... ))

for (i=1; i<n; i++) {
    A[i] += A[i-1];   /* A[i] now holds A[0] + ... + A[i] */
}

A[i] is the sum of the first i + 1 elements

What advantage can ||ism give?

Parallel Prefix Algorithm

[Figure: binary tree over the input 6 4 16 10 16 14 2 8, producing the prefix sums 6 10 26 36 52 66 68 76]

  • Compute sum going up
  • Figure prefixes going down
  • Invariant: Parent data is sum of elements to left of subtree
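The two passes can be sketched in C as below. This is a sequential simulation under stated assumptions, not the lecture's own code: the name `tree_prefix_sum`, the heap-style array layout (node k has children 2k and 2k+1, leaves at positions n to 2n-1), and the restriction to n being a power of two are all choices made for the example. The array `left[]` carries the invariant from the figure: for every node it holds the sum of all elements to the left of that node's subtree.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch: replaces a[0..n-1] with its inclusive prefix sums
   using an up sweep and a down sweep over an implicit binary tree.
   Assumes n is a power of two. Returns 0 on success, -1 otherwise. */
int tree_prefix_sum(long *a, size_t n)
{
    if (n == 0 || (n & (n - 1)) != 0)
        return -1;                       /* n must be a power of two */

    long *sum  = malloc(2 * n * sizeof *sum);   /* sum[k]: total of subtree k */
    long *left = malloc(2 * n * sizeof *left);  /* left[k]: total left of subtree k */
    if (sum == NULL || left == NULL) {
        free(sum);
        free(left);
        return -1;
    }

    /* Leaves occupy positions n .. 2n-1. */
    for (size_t i = 0; i < n; i++)
        sum[n + i] = a[i];

    /* Up sweep: each parent gets the sum of its two children.
       Nodes on the same level are independent of one another. */
    for (size_t k = n - 1; k >= 1; k--)
        sum[k] = sum[2 * k] + sum[2 * k + 1];

    /* Down sweep: the root has nothing to its left. A left child inherits
       its parent's value; a right child adds its sibling's subtree sum. */
    left[1] = 0;
    for (size_t k = 1; k < n; k++) {
        left[2 * k]     = left[k];
        left[2 * k + 1] = left[k] + sum[2 * k];
    }

    /* Each prefix is "everything to the left" plus the element itself. */
    for (size_t i = 0; i < n; i++)
        a[i] = left[n + i] + sum[n + i];

    free(sum);
    free(left);
    return 0;
}
```

On the input 6 4 16 10 16 14 2 8 the up sweep produces 10, 26, 30, 10 at the first level, 36 and 40 at the second, and 76 at the root; the down sweep then yields 6 10 26 36 52 66 68 76. Each sweep visits log2 n levels, and the work inside a level could be done simultaneously, which is where a parallel version would gain over the sequential loop.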

Fundamental Tool of || Programming

  • Original research on the parallel prefix algorithm was published by R. E. Ladner and M. J. Fischer, "Parallel Prefix Computation," Journal of the ACM 27(4):831-838, 1980
  • The Ladner-Fischer algorithm requires 2 log n time, twice as much as a simple tournament global sum, not linear time

Applies to a wide class of operations
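As a hedged illustration of that generality, the sketch below runs a prefix computation with an arbitrary associative operator passed in as a function pointer. Everything named here (`binop`, `op_add`, `op_max`, `scan`) is hypothetical and introduced only for the example. For brevity it uses a simple step-doubling scheme rather than the tree algorithm: in round d, every element combines with the one d positions to its left, and d doubles each round.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical operator type: any associative binary operation on long. */
typedef long (*binop)(long, long);

static long op_add(long x, long y) { return x + y; }
static long op_max(long x, long y) { return x > y ? x : y; }

/* Replaces a[0..n-1] with its inclusive scan under op.
   Returns 0 on success, -1 on allocation failure. */
int scan(long *a, size_t n, binop op)
{
    if (n == 0)
        return 0;

    long *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (size_t d = 1; d < n; d *= 2) {
        /* Within one round every element is computed from the previous
           round's values only, so all n updates are independent. */
        for (size_t i = 0; i < n; i++)
            tmp[i] = (i >= d) ? op(a[i - d], a[i]) : a[i];
        memcpy(a, tmp, n * sizeof *a);
    }

    free(tmp);
    return 0;
}
```

With `op_add` this reproduces the prefix sums; with `op_max` on 3 1 4 1 5 it gives the running maximum 3 3 4 4 5. The operator only needs to be associative, and the left operand is always the earlier segment, so the order of elements is preserved. Note that step doubling performs on the order of n log n operator applications in total, more than the tree formulation, so it is a convenience for the example rather than a recommendation.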