
Cloud Computing – How did we get here?, Lecture notes of Computer Architecture and Organization

Slides and notes from a lecture on Cloud Computing given by Wes J. Lloyd at the School of Engineering and Technology, University of Washington - Tacoma. The lecture covers topics such as data, thread-level, and task-level parallelism, parallel architectures, SIMD architectures, vector processing, multimedia extensions, graphics processing units, speed-up, Amdahl's Law, scaled speedup, properties of distributed systems, and modularity. The lecture also introduces Cloud Computing concepts, technology, and architecture, along with feedback from students and additional resources on MapReduce.

Typology: Lecture notes

2021/2022

Uploaded on 05/11/2023


TCSS 562: Software Engineering for Cloud Computing School of Engineering and Technology, UW-Tacoma

Cloud Computing –

How did we get here?

Wes J. Lloyd

School of Engineering and Technology

University of Washington - Tacoma

TCSS 462/562:

(SOFTWARE ENGINEERING

FOR) CLOUD COMPUTING

 Questions from 10/6
 Cloud Computing – How did we get here? (Marinescu Ch. 2 - 1st edition, Ch. 4 - 2nd edition)
 Data, thread-level, task-level parallelism & Parallel architectures
 Class Activity 1 – Implicit vs Explicit Parallelism
 SIMD architectures, vector processing, multimedia extensions
 Graphics processing units
 Speed-up, Amdahl's Law, Scaled Speedup
 Properties of distributed systems
 Modularity
 Introduction to Cloud Computing – loosely based on book #1: Cloud Computing Concepts, Technology & Architecture

October 11, 2022 | TCSS462/562: (Software Engineering for) Cloud Computing [Fall 2022] | School of Engineering and Technology, University of Washington - Tacoma

OBJECTIVES – 10/11

 Please classify your perspective on material covered in today’s class (47 respondents):
 1 - mostly review, 5 - equal new/review, 10 - mostly new
 Average – 6.89 (previous: 6.16)
 Please rate the pace of today’s class:
 1 - slow, 5 - just right, 10 - fast
 Average – 5.62 (previous: 5.35)
 Response rates:
 TCSS 462: 25/32 – 78.1%
 TCSS 562: 22/26 – 84.6%

MATERIAL / PACE

 I'm not quite clear on how Bit - Level and Instruction-Level

Parallelism, being implicit, happens "automatically".

 With bit-level parallelism, arithmetic operations that

require multiple instructions to perform on CPUs having

lower word size can be accomplished with a single

instruction on today’s 64-bit CPUs

 Word "size" refers to the amount of data a CPU's internal

data registers can hold and process at one time. Modern

desktop computers have 64-bit words. Computers

embedded in appliances and consumer products have

word sizes of 8, 16 or 32 bits

FEEDBACK FROM 10/6
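The word-size point above can be made concrete. Below is a hypothetical Python model (the function name `add64_on_8bit` is illustrative, not from the lecture) of how a CPU with 8-bit words must chain byte-wide adds with carries to do what a 64-bit CPU accomplishes in one instruction:

```python
def add64_on_8bit(a: int, b: int) -> int:
    """Add two 64-bit values one byte at a time, propagating the carry,
    the way narrow-word hardware must."""
    result, carry = 0, 0
    for i in range(8):                      # eight byte-wide "instructions"
        byte_a = (a >> (8 * i)) & 0xFF
        byte_b = (b >> (8 * i)) & 0xFF
        s = byte_a + byte_b + carry
        carry = s >> 8                      # carry into the next byte
        result |= (s & 0xFF) << (8 * i)
    return result & 0xFFFFFFFFFFFFFFFF      # wrap around like 64-bit hardware

# Eight chained operations here vs. one ADD instruction on a 64-bit CPU
assert add64_on_8bit(0x0123456789ABCDEF, 0x1111111111111111) == 0x123456789ABCDF00
```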

 I understand multithreading to some degree, where multiple instructions can happen on a single or multiple cores, but multithreading requires specific code to 'break down' the program. How does parallel computing 'break down' tasks implicitly?
 With instruction-level parallelism, CPU features like pipelining, speculative execution, and out-of-order execution help CPUs accomplish more than one operation per clock cycle. They appear to magically do things in parallel: developers write only sequential code, yet effectively gain a speed-up
 Out-of-order execution (OoOE) allows instructions in high-performance CPUs to begin execution as soon as their operands are ready. Although instructions are issued in-order, they can proceed out-of-order with respect to each other.

FEEDBACK - 2

 Speculative execution is an optimization technique where a CPU performs some task that may not be needed. Work is done before it is known whether it is actually needed, to avoid the delay that would be incurred by doing the work only after it is known to be needed.
 ▪ This is one way for CPUs to be productive during otherwise “idle” times
 Modern pipelined microprocessors use speculative execution to reduce the cost of conditional branch instructions by predicting a program’s execution path based on the history of branch executions. To improve performance and CPU utilization, instructions can be scheduled before it is known whether they will need to be executed, ahead of a branch…

FEEDBACK - 3

 Am seeking some clarification for what MAP-REDUCE is besides a framework that uses lots of data processed in parallel. Are cloud computing services built using this infrastructure, and then it decides how the work is broken up for servers with different system hardware (heterogeneous, homogeneous, etc.)?
 MapReduce is a programming model for writing applications to process vast amounts of data (multi-terabyte data-sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner
 We also consider, for data parallelism, data processing tasks that can be sped up using a divide-and-conquer approach
 MapReduce provides a programming model and architecture for repeatedly applying the divide-and-conquer pattern

FEEDBACK - 4

 MapReduce consists of two sequential tasks: Map and Reduce. MAP filters and sorts data while converting it into key-value pairs. REDUCE takes this input and reduces its size by performing some kind of summary operation over the dataset
 MapReduce drastically speeds up big data tasks by breaking down large datasets and processing them in parallel
 The MapReduce paradigm was first proposed in 2004 by Google and later incorporated into the open-source Apache Hadoop framework for distributed processing over large datasets using files
 Apache Spark supports MapReduce over large datasets in RAM
 Amazon Elastic Map Reduce (EMR) provides cloud-provider-managed services for Apache Hadoop and Spark

MAP-REDUCE
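As a rough, single-machine sketch of the Map and Reduce phases described above (function names are illustrative; real frameworks such as Hadoop distribute these phases over a cluster and handle failures):

```python
from collections import defaultdict

def map_phase(doc: str):
    # MAP: filter/convert the input into (key, value) pairs
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # group all values by key before the reduce step
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # REDUCE: summarize each key's values (here, a sum = word count)
    return {key: sum(values) for key, values in groups.items()}

docs = ["the cloud", "the data the cloud"]
pairs = [p for d in docs for p in map_phase(d)]
counts = reduce_phase(shuffle(pairs))
# counts == {"the": 3, "cloud": 2, "data": 1}
```

In a cluster setting each map and reduce task runs on a different node; the shuffle step becomes a network-wide regrouping of intermediate pairs.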

 Original Google paper on MapReduce:
 https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf
 Apache Spark: https://spark.apache.org/
 Apache Hadoop: https://hadoop.apache.org/
 Amazon Elastic Map Reduce: https://aws.amazon.com/emr/

MAP-REDUCE - ADDITIONAL RESOURCES

 When you speak through the mic in class there's a bit of a

delay and it can be somewhat distracting at times. Would

you be able to change anything about that to minimize the

delay?

 Is this happening on Zoom? Or in the classroom?

 In the classroom I’m able to use the Zoom audio as

output and am able to speak with less microphone

feedback because of the delay (as long as the volume is

not too high)

 I can choose not to use the Zoom audio, but then it may be

hard to hear questions asked verbally over Zoom

 This is a work in progress…

FEEDBACK - 3

 If you did not provide your AWS account number on the AWS CLOUD CREDITS SURVEY to request AWS cloud credits and you would like credits this quarter, please contact the professor
 56 of 58 survey completions logged as of early Oct 11th

AWS CLOUD CREDITS SURVEY

 Introduction to Linux & the Command Line
 https://faculty.washington.edu/wlloyd/courses/tcss562/tutorials/TCSS462_562_f2022_tutorial_1.pdf
 Tutorial Sections:

  1. The Command Line
  2. Basic Navigation
  3. More About Files
  4. Manual Pages
  5. File Manipulation
  6. VI – Text Editor
  7. Wildcards
  8. Permissions
  9. Filters
  10. Grep and regular expressions
  11. Piping and Redirection
  12. Process Management

TUTORIAL 1

 Introduction to Bash Scripting
 https://faculty.washington.edu/wlloyd/courses/tcss562/tutorials/TCSS462_562_f2022_tutorial_2.pdf
 Review tutorial sections:
 Create a BASH webservice client

  1. What is a BASH script?
  2. Variables
  3. Input
  4. Arithmetic
  5. If Statements
  6. Loops
  7. Functions
  8. User Interface
 Call service to obtain IP address & lat/long of computer
 Call service to obtain weather forecast for lat/long

TUTORIAL 2

 Form groups of ~3 - in class or with Zoom breakout rooms
 Each group will complete a MS Word DOCX worksheet
 Be sure to add names at the top of the document as they appear in Canvas
 Activity can be completed in class or after class
 The activity can also be completed individually
 When completed, one person should submit a PDF of the Google Doc to Canvas
 Instructor will score all group members based on the uploaded PDF file
 To get started:
▪ Log into your UW Google Account (https://drive.google.com) using your UW NET ID
▪ Follow the link: https://tinyurl.com/tcss462-562-a

ACTIVITY 1

 Solutions to be discussed…

 Applies to:
 Advantages:
 Disadvantages:

IMPLICIT PARALLELISM

 Applies to:
 Advantages:
 Disadvantages:

EXPLICIT PARALLELISM


 7. For bit-level parallelism, should a developer be

concerned with the available number of virtual CPU

processing cores when choosing a cloud-based virtual

machine if wanting to obtain the best possible speed-up?

(Yes / No)

 8. For instruction-level parallelism, should a developer be

concerned with the physical CPU’s architecture used to

host a cloud-based virtual machine if wanting to obtain

the best possible speed-up? (Yes / No)

PARALLELISM QUESTIONS

 9. For thread level parallelism (TLP) where a programmer

has spent considerable effort to parallelize their code and

algorithms, what consequences result when this code is

deployed on a virtual machine with too few virtual CPU

processing cores?

 What happens when this code is deployed on a virtual

machine with too many virtual CPU processing cores?

PARALLELISM QUESTIONS - 2

 Michael Flynn’s proposed taxonomy of computer architectures based on concurrent instructions and number of data streams (1966)

 SISD (Single Instruction Single Data)

 SIMD (Single Instruction, Multiple Data)

 MIMD (Multiple Instructions, Multiple Data)

 LESS COMMON: MISD (Multiple Instructions, Single Data)

 Pipeline architectures: functional units perform different

operations on the same data

 For fault tolerance, may want to execute the same instructions redundantly to detect and mask errors – for task replication

MICHAEL FLYNN’S COMPUTER ARCHITECTURE TAXONOMY


 SISD (Single Instruction Single Data)

Scalar architecture with one processor/core.

▪ Individual cores of modern multicore processors are

“SISD”

 SIMD (Single Instruction, Multiple Data)

Supports vector processing

▪ When SIMD instructions are issued, operations on

individual vector components are carried out concurrently

▪ Two 64-element vectors can be added in parallel

▪ Vector processing instructions added to modern CPUs

▪ Example: Intel MMX (multimedia) instructions

FLYNN’S TAXONOMY

 Exploit data-parallelism: vector operations enable speedups
 Vector architectures provide vector registers that can store entire matrices in a CPU register
 SIMD CPU extensions (e.g. MMX) add support for vector operations on traditional CPUs
 Vector operations reduce the total number of instructions for large vector operations
 Provides higher potential speedup vs. MIMD architecture
 Developers can think sequentially; need not worry about parallelism

(SIMD): VECTOR PROCESSING ADVANTAGES
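The scalar-vs-vector contrast can be illustrated in plain Python. This is purely conceptual: real SIMD hardware performs all the lane-wise adds below in a single instruction, while scalar code issues one add per element:

```python
# Two 64-element vectors, as in the slide's example
a = list(range(64))
b = list(range(64, 128))

# Scalar (SISD) style: one add operation per element, 64 in total
scalar_sum = []
for x, y in zip(a, b):
    scalar_sum.append(x + y)

# Vector (SIMD) style: one logical operation applied across all 64 lanes
vector_sum = [x + y for x, y in zip(a, b)]

assert scalar_sum == vector_sum  # same result; SIMD needs far fewer instructions
```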

 MIMD (Multiple Instructions, Multiple Data) - system with several processors and/or cores that function asynchronously and independently
 At any time, different processors/cores may execute different instructions on different data
 Multi-core CPUs are MIMD
 Processors share memory via interconnection networks
▪ Hypercube, 2D torus, 3D torus, omega network, other topologies
 MIMD systems have different methods of sharing memory
▪ Uniform Memory Access (UMA)
▪ Cache Only Memory Access (COMA)
▪ Non-Uniform Memory Access (NUMA)

FLYNN’S TAXONOMY - 2

 Arithmetic intensity: Ratio of work (W) to

memory traffic r/w (Q)

Example: # of floating point ops per byte of data read

 Characterizes application scalability with SIMD support

SIMD can perform many fast matrix operations in parallel

 High arithmetic Intensity:

Programs with dense matrix operations scale up nicely

(many calcs vs memory RW, supports lots of parallelism)

 Low arithmetic intensity:

Programs with sparse matrix operations do not scale well

with problem size

(memory RW becomes bottleneck, not enough ops!)

ARITHMETIC INTENSITY
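A back-of-envelope calculation (a sketch assuming 8-byte doubles and that each matrix moves between memory and the CPU exactly once; real cache behavior changes the byte count) shows why dense matrix multiply has high arithmetic intensity that grows with problem size:

```python
def dense_matmul_intensity(n: int) -> float:
    """Flops per byte for a dense n x n matrix multiply."""
    flops = 2 * n**3            # ~n^3 multiply-add pairs
    bytes_moved = 3 * n**2 * 8  # read A, read B, write C (8-byte doubles)
    return flops / bytes_moved

print(dense_matmul_intensity(12))    # 1.0 flop/byte
print(dense_matmul_intensity(1200))  # 100.0 -- intensity grows linearly with n
```

Sparse operations, by contrast, do only a couple of flops per value read, so their intensity stays low no matter how large the problem grows.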

 When a program reaches a given arithmetic intensity, performance of code running on the CPU hits a “roof”
 CPU performance bottleneck changes from: memory bandwidth (left) → floating point performance (right)

ROOFLINE MODEL

Key take-aways:

When a program has low

Arithmetic Intensity, memory

bandwidth limits performance..

With high Arithmetic intensity,

the system has peak parallel

performance…

→ performance is limited by??



 GPU provides multiple SIMD processors

 Typically 7 to 15 SIMD processors, each with

32,768 total registers, divided into 16 lanes

(2048 registers each)

 GPU programming model:

single instruction, multiple thread

 Programmed using CUDA – a C-like programming

language by NVIDIA for GPUs

 CUDA threads – single thread associated with each

data element (e.g. vector or matrix)

 Thousands of threads run concurrently

GRAPHICAL PROCESSING UNITS (GPUS)

 Parallel hardware and software systems allow:

▪ Solve problems demanding resources not available on a

single system

▪ Reduce time required to obtain a solution

 The speed-up (S) measures effectiveness of parallelization: S(N) = T(1) / T(N)

T(1) → execution time of total sequential computation

T(N) → execution time for performing N parallel

computations in parallel

PARALLEL COMPUTING

 Consider embarrassingly parallel image processing
 Eight images (multiple data)
 Apply image transformation (greyscale) in parallel
 8-core CPU, 16 hyperthreads
 Sequential processing: perform transformations one at a time using a single program thread
▪ 8 images, 3 seconds each: T(1) = 24 seconds
 Parallel processing
▪ 8 images, 3 seconds each: T(N) = 3 seconds
 Speedup: S(N) = 24 / 3 = 8x speedup
 Called “perfect scaling”
 Must consider data transfer and computation setup time

SPEED-UP EXAMPLE
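The definition S(N) = T(1) / T(N) applied to the 8-image example:

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """S(N) = T(1) / T(N) from the slides."""
    return t_sequential / t_parallel

t1 = 8 * 3   # sequential: 8 images, 3 seconds each = 24 s
tn = 3       # parallel: all 8 images at once on an 8-core CPU = 3 s
assert speedup(t1, tn) == 8.0   # "perfect scaling"
```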


 Amdahl’s law is used to estimate the speed - up of a job

using parallel computing

1. Divide job into two parts

2. Part A that will still be sequential

3. Part B that will be sped-up with parallel computing

 Portion of computation which cannot be parallelized will

determine (i.e. limit) the overall speedup

 Amdahl’s law assumes jobs are of a fixed size

 Also, Amdahl’s assumes no overhead for distributing the

work, and a perfectly even work distribution

AMDAHL’S LAW

 S = theoretical speedup of the whole task
 f = fraction of work that is parallel (e.g. 25% or 0.25)
 N = proposed speedup of the parallel part (e.g. 5 times speedup)
 S = 1 / ((1 - f) + f/N)
 % improvement of task execution = 100 * (1 – (1 / S))
 Using Amdahl’s law, what is the maximum possible speed-up?

AMDAHL’S LAW

 Program with two independent parts:
▪ Part A is 75% of the execution time
▪ Part B is 25% of the execution time
 Part B is made 5 times faster with parallel computing
 Estimate the percent improvement of task execution
 Original Part A is 3 seconds, Part B is 1 second
 N=5 (speedup of part B)
 f=.25 (only 25% of the whole job (A+B) will be sped-up)
 S = 1 / ((1-f) + f/N)
 S = 1 / ((.75) + .25/5)
 S = 1.25
 % improvement = 100 * (1 – 1/1.25) = 20%

AMDAHL’S LAW EXAMPLE (from Wikipedia)
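The worked example above can be reproduced in a few lines (parameter names follow the slide's f and N):

```python
def amdahl_speedup(f: float, n: float) -> float:
    """Amdahl's law: f = parallelizable fraction, n = speedup of that part."""
    return 1 / ((1 - f) + f / n)

s = amdahl_speedup(f=0.25, n=5)            # Part B (25% of the job) made 5x faster
improvement = 100 * (1 - 1 / s)
print(round(s, 2), round(improvement, 1))  # 1.25 20.0
```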

 Calculates the scaled speed-up using “N” processors

S(N) = N + (1 - N) α

N: Number of processors

α: fraction of program run time which can’t be parallelized

(e.g. must run sequentially)

Can be used to estimate runtime of parallel portion of program

GUSTAFSON'S LAW


 Calculates the scaled speed-up using “N” processors

S(N) = N + (1 - N) α

N: Number of processors

α: fraction of program run time which can’t be parallelized

(e.g. must run sequentially)

Can be used to estimate runtime of parallel portion of program

 Where α = σ / (σ + π)

 Where σ = sequential time, π = parallel time

 Our Amdahl’s example: σ = 3s, π = 1s, α = .75

GUSTAFSON'S LAW

 Calculates the scaled speed-up using “N” processors

S(N) = N + (1 - N) α

N: Number of processors

α: fraction of program run time which can’t be parallelized

(e.g. must run sequentially)

 Example:

Consider a program that is embarrassingly parallel,

but 75% cannot be parallelized. α=.75

QUESTION: If deploying the job on a 2 - core CPU, what

scaled speedup is possible assuming the use of two

processes that run in parallel?

GUSTAFSON'S LAW

 QUESTION: What is the maximum theoretical speed-up on a 2-core CPU?

S(N) = N + (1 - N) α

N=2, α=.75

S(N) = 2 + (1 - 2)(.75)

S(N) = ?

 What is the maximum theoretical speed-up on a 16-core CPU?

S(N) = N + (1 - N) α

N=16, α=.75

S(N) = 16 + (1 - 16)(.75)

S(N) = ?

GUSTAFSON’S EXAMPLE

 QUESTION: What is the maximum theoretical speed-up on a 2-core CPU?

S(N) = N + (1 - N) α

N=2, α=.75

S(N) = 2 + (1 - 2)(.75)

S(N) = 1.25

 What is the maximum theoretical speed-up on a 16-core CPU?

S(N) = N + (1 - N) α

N=16, α=.75

S(N) = 16 + (1 - 16)(.75)

S(N) = 4.75

GUSTAFSON’S EXAMPLE

For 2 CPUs, speed-up is 1.25x
For 16 CPUs, speed-up is 4.75x
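Both answers can be checked with the slide's formula:

```python
def gustafson_speedup(n: int, alpha: float) -> float:
    """Scaled speedup S(N) = N + (1 - N) * alpha, alpha = serial fraction."""
    return n + (1 - n) * alpha

print(gustafson_speedup(2, 0.75))   # 1.25
print(gustafson_speedup(16, 0.75))  # 4.75
```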

 The number of transistors on a chip doubles approximately every 1.5 years
 CPUs now have billions of transistors
 Power dissipation issues at faster clock rates lead to heat removal challenges
▪ Transition from: increasing clock rates → to adding CPU cores
 Symmetric core processor – multi-core CPU, all cores have the same computational resources and speed
 Asymmetric core processor – on a multi-core CPU, some cores have more resources and speed
 Dynamic core processor – processing resources and speed can be dynamically configured among cores
 Observation: asymmetric processors offer a higher speedup

MOORE’S LAW

 Collection of autonomous computers, connected through a network, with distribution software called “middleware” that enables coordination of activities and sharing of resources
 Key characteristics:
 Users perceive system as a single, integrated computing facility
 Compute nodes are autonomous
 Scheduling, resource management, and security implemented by every node
 Multiple points of control and failure
 Nodes may not be accessible at all times
 System can be scaled by adding additional nodes
 Availability at low levels of HW/software/network reliability

DISTRIBUTED SYSTEMS

 Key non-functional attributes
▪ Known as “ilities” in software engineering
 Availability – 24/7 access?
 Reliability – fault tolerance
 Accessibility – reachable?
 Usability – user friendly
 Understandability – can be understood
 Scalability – responds to variable demand
 Extensibility – can be easily modified, extended
 Maintainability – can be easily fixed
 Consistency – data is replicated correctly in timely manner

DISTRIBUTED SYSTEMS - 2

 Access transparency: local and remote objects accessed using identical operations
 Location transparency: objects accessed w/o knowledge of their location
 Concurrency transparency: several processes run concurrently using shared objects w/o interference among them
 Replication transparency: multiple instances of objects are used to increase reliability
▪ users are unaware if and how the system is replicated
 Failure transparency: concealment of faults
 Migration transparency: objects are moved w/o affecting operations performed on them
 Performance transparency: system can be reconfigured based on load and quality of service requirements
 Scaling transparency: system and applications can scale w/o change in system structure and w/o affecting applications

TRANSPARENCY PROPERTIES OF DISTRIBUTED SYSTEMS

 Soft modularity: TRADITIONAL
 Divide a program into modules (classes) that call each other and communicate with shared memory
 A procedure calling convention is used (or method invocation)
 Enforced modularity: CLOUD COMPUTING
 Program is divided into modules that communicate only through message passing
 The ubiquitous client-server paradigm
 Clients and servers are independent decoupled modules
 System is more robust if servers are stateless
 May be scaled and deployed separately
 May also FAIL separately!

TYPES OF MODULARITY
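A toy contrast using only the Python standard library (names are illustrative): soft modularity is a direct procedure call sharing the caller's memory, while enforced modularity exchanges only messages, so the "server" could move to another process or machine without changing the client:

```python
import queue
import threading

def soft_square(x):
    # Soft modularity: a plain procedure call within shared memory
    return x * x

def server(requests, replies):
    # Enforced modularity: the server sees only messages, never caller state
    while True:
        msg = requests.get()
        if msg is None:          # shutdown message
            break
        replies.put(msg * msg)

requests, replies = queue.Queue(), queue.Queue()
worker = threading.Thread(target=server, args=(requests, replies))
worker.start()

requests.put(7)                       # client sends a request message...
print(soft_square(7), replies.get())  # ...and receives a reply: 49 49
requests.put(None)
worker.join()
```

Swapping the in-process queues for network sockets would turn this into the client-server pattern described above, with the failure and scaling properties the slide lists.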

 Multi-core CPU technology and hyper-threading

 What is a

▪ Heterogeneous system? ▪ Homogeneous system? ▪ Autonomous or self-organizing system?

 Fine grained vs. coarse grained parallelism

 Parallel message passing code is easier to debug than

shared memory (e.g. p-threads)

 Know your application’s max/avg Thread Level

Parallelism ( TLP )

 Data-level parallelism: Map-Reduce, (SIMD) Single

Instruction Multiple Data, Vector processing & GPUs

CLOUD COMPUTING – HOW DID WE GET HERE? SUMMARY OF KEY POINTS

 Bit-level parallelism
 Instruction-level parallelism (CPU pipelining)
 Flynn’s taxonomy: computer system architecture classification
▪ SISD – Single Instruction, Single Data (modern core of a CPU)
▪ SIMD – Single Instruction, Multiple Data (data parallelism)
▪ MIMD – Multiple Instruction, Multiple Data
▪ MISD is RARE; application for fault tolerance…
 Arithmetic intensity: ratio of calculations vs memory RW
 Roofline model: memory bottleneck with low arithmetic intensity
 GPUs: ideal for programs with high arithmetic intensity
▪ SIMD and vector processing supported by many large registers

CLOUD COMPUTING – HOW DID WE GET HERE? SUMMARY OF KEY POINTS - 2

 Speed-up (S)

S(N) = T(1) / T(N)

 Amdahl’s law:

S = 1/α

α = percent of program that must be sequential

 Scaled speedup with N processes:

S(N) = N – α(N-1)

 Moore’s Law

 Symmetric core, Asymmetric core, Dynamic core CPU

 Distributed Systems – Non-functional quality attributes

 Distributed Systems – Types of Transparency

 Types of modularity - Soft, Enforced

CLOUD COMPUTING – HOW DID WE GET HERE? SUMMARY OF KEY POINTS - 3