Analysis of Fork-Join Parallel Programs, Slides of Programming Languages

Aligarh Muslim University Programming Languages

In all programming languages, only the syntax differs, not the logic. This course discusses core concepts and techniques common to many programming languages. Key points for this lecture are: Analysis of Fork-Join Parallel Programs, Sophomoric, Parallelism and Concurrency, Amdahl's Law, Asymptotic Analysis for Fork-Join Parallelism, Arrays & Balanced Trees, Reductions, Associative Operator, Data Parallelism, MapReduce on Clusters

Typology: Slides

2012/2013

Uploaded on 09/29/2013

dhanvant 🇮🇳


A Sophomoric Introduction to Shared-Memory

Parallelism and Concurrency

Analysis of Fork-Join Parallel Programs


Outline

Done:

How to use fork and join to write a parallel algorithm
Why using divide-and-conquer with lots of small tasks is best
- Combines results in parallel
Some Java and ForkJoin Framework specifics
- More pragmatics (e.g., installation) in separate notes

Now:

More examples of simple parallel programs
Arrays & balanced trees support parallelism better than linked lists
Asymptotic analysis for fork-join parallelism
Amdahl’s Law

Examples

Maximum or minimum element
Is there an element satisfying some property (e.g., is there a 17)?
Left-most element satisfying some property (e.g., first 17)
- What should the recursive tasks return?
- How should we merge the results?
Corners of a rectangle containing all points (a “bounding box”)
Counts, for example, number of strings that start with a vowel
- This is just summing with a different base case
- Many problems are!
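One possible answer to the two questions about the left-most element, as a minimal sketch rather than the lecture's own solution: each recursive task returns the index of the left-most match in its range, or -1 if there is none, and the merge prefers the left half's answer. The class name `LeftmostIndex`, the cutoff value, and the use of `ForkJoinPool.commonPool()` are assumptions made for this illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class: index of the left-most element equal to target, or -1.
class LeftmostIndex extends RecursiveTask<Integer> {
    static final int SEQUENTIAL_CUTOFF = 1000; // assumed value; tune by experiment
    final int[] arr;
    final int lo, hi, target;

    LeftmostIndex(int[] arr, int lo, int hi, int target) {
        this.arr = arr; this.lo = lo; this.hi = hi; this.target = target;
    }

    @Override
    protected Integer compute() {
        if (hi - lo < SEQUENTIAL_CUTOFF) {
            // Base case: scan left to right and stop at the first match.
            for (int i = lo; i < hi; i++)
                if (arr[i] == target) return i;
            return -1; // no match in this range
        }
        int mid = lo + (hi - lo) / 2;
        LeftmostIndex left = new LeftmostIndex(arr, lo, mid, target);
        LeftmostIndex right = new LeftmostIndex(arr, mid, hi, target);
        left.fork();                     // left half runs in another task
        int rightAns = right.compute();  // right half runs in this thread
        int leftAns = left.join();
        // Merge: a match in the left half always precedes a match in the right half.
        return leftAns != -1 ? leftAns : rightAns;
    }

    static int find(int[] arr, int target) {
        return ForkJoinPool.commonPool().invoke(new LeftmostIndex(arr, 0, arr.length, target));
    }
}
```

The merge step is associative, which is what lets the halves be combined in any grouping. Note that this sketch still searches the right half even when the left half already has a match.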

Reductions

Computations of this form are called reductions (or reduces?)
Produce single answer from collection via an associative operator
- Examples: max, count, leftmost, rightmost, sum, product, …
- Non-examples: median, subtraction, exponentiation
(Recursive) results don’t have to be single numbers or strings. They can be arrays or objects with multiple fields.
- Example: Histogram of test results is a variant of sum
But some things are inherently sequential
- How we process arr[i] may depend entirely on the result of processing arr[i-1]
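The histogram case can be sketched as a reduction whose result is an array rather than a single number. This is an illustration under stated assumptions, not code from the lecture: scores are taken to be integers in 0..100, buckets are ten points wide, and the names `ScoreHistogram` and `of` are hypothetical.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: histogram of scores in 0..100, bucketed by tens.
class ScoreHistogram extends RecursiveTask<int[]> {
    static final int SEQUENTIAL_CUTOFF = 1000; // assumed value
    static final int BUCKETS = 11;             // 0-9, 10-19, ..., 90-99, 100
    final int[] scores;
    final int lo, hi;

    ScoreHistogram(int[] scores, int lo, int hi) {
        this.scores = scores; this.lo = lo; this.hi = hi;
    }

    @Override
    protected int[] compute() {
        if (hi - lo < SEQUENTIAL_CUTOFF) {
            // Base case: count this range into a fresh array.
            int[] counts = new int[BUCKETS];
            for (int i = lo; i < hi; i++) counts[scores[i] / 10]++;
            return counts;
        }
        int mid = lo + (hi - lo) / 2;
        ScoreHistogram left = new ScoreHistogram(scores, lo, mid);
        ScoreHistogram right = new ScoreHistogram(scores, mid, hi);
        left.fork();
        int[] r = right.compute();
        int[] l = left.join();
        // Combine: element-wise sum, which is associative just like scalar sum.
        for (int b = 0; b < BUCKETS; b++) l[b] += r[b];
        return l;
    }

    static int[] of(int[] scores) {
        return ForkJoinPool.commonPool().invoke(new ScoreHistogram(scores, 0, scores.length));
    }
}
```

Each task owns the array it returns, so no two tasks write to the same counts array at the same time.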

Maps in ForkJoin Framework

Even though there is no result-combining, it still helps with load balancing to create many small tasks
- Maybe not for vector-add but for more compute-intensive maps
- The forking is O(log n) whereas theoretically other approaches to vector-add are O(1)

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class VecAdd extends RecursiveAction {
  int lo; int hi; int[] res; int[] arr1; int[] arr2;
  VecAdd(int l, int h, int[] r, int[] a1, int[] a2) { … }
  protected void compute() {
    if (hi - lo < SEQUENTIAL_CUTOFF) {
      // Small range: add the elements sequentially
      for (int i = lo; i < hi; i++)
        res[i] = arr1[i] + arr2[i];
    } else {
      // Large range: fork the left half, compute the right half in this thread
      int mid = (hi + lo) / 2;
      VecAdd left = new VecAdd(lo, mid, res, arr1, arr2);
      VecAdd right = new VecAdd(mid, hi, res, arr1, arr2);
      left.fork();
      right.compute();
      left.join();
    }
  }
}

static final ForkJoinPool fjPool = new ForkJoinPool();

int[] add(int[] arr1, int[] arr2) {
  assert (arr1.length == arr2.length);
  int[] ans = new int[arr1.length];
  fjPool.invoke(new VecAdd(0, arr1.length, ans, arr1, arr2));
  return ans;
}

Maps and reductions

Maps and reductions: the “workhorses” of parallel programming

By far the two most important and common patterns
- Two more-advanced patterns in next lecture
Learn to recognize when an algorithm can be written in terms of maps and reductions
Use maps and reductions to describe (parallel) algorithms
Programming them becomes “trivial” with a little practice
- Exactly like sequential for-loops seem second-nature

Trees

Maps and reductions work just fine on balanced trees
- Divide-and-conquer each child rather than array subranges
- Correct for unbalanced trees, but won’t get much speed-up
Example: minimum element in an unsorted but balanced binary tree in O(log n) time given enough processors
How to do the sequential cut-off?
- Store number-of-descendants at each node (easy to maintain)
- Or could approximate it with, e.g., AVL-tree height
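The tree example and the first cut-off option might look like the following sketch. The node type `SizedNode`, its cached `size` field, the helper `balancedFrom`, and the cutoff value are all hypothetical names chosen for this illustration; the lecture does not prescribe them.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical node type: each node stores the number of nodes in its subtree.
class SizedNode {
    final int value;
    final SizedNode left, right;
    final int size;

    SizedNode(int value, SizedNode left, SizedNode right) {
        this.value = value; this.left = left; this.right = right;
        this.size = 1 + (left == null ? 0 : left.size) + (right == null ? 0 : right.size);
    }
}

class TreeMin extends RecursiveTask<Integer> {
    static final int SEQUENTIAL_CUTOFF = 1000; // assumed value
    final SizedNode node;

    TreeMin(SizedNode node) { this.node = node; }

    // Plain recursive minimum, used below the cutoff. An empty tree yields MAX_VALUE.
    static int sequentialMin(SizedNode n) {
        if (n == null) return Integer.MAX_VALUE;
        return Math.min(n.value, Math.min(sequentialMin(n.left), sequentialMin(n.right)));
    }

    @Override
    protected Integer compute() {
        if (node == null) return Integer.MAX_VALUE;
        // Cut-off decision uses the stored descendant count.
        if (node.size < SEQUENTIAL_CUTOFF) return sequentialMin(node);
        TreeMin l = new TreeMin(node.left);
        TreeMin r = new TreeMin(node.right);
        l.fork();
        int rightMin = r.compute();
        int leftMin = l.join();
        return Math.min(node.value, Math.min(leftMin, rightMin));
    }

    static int of(SizedNode root) {
        return ForkJoinPool.commonPool().invoke(new TreeMin(root));
    }

    // Helper for experiments: builds a balanced tree over vals[lo..hi).
    static SizedNode balancedFrom(int[] vals, int lo, int hi) {
        if (lo >= hi) return null;
        int mid = lo + (hi - lo) / 2;
        return new SizedNode(vals[mid], balancedFrom(vals, lo, mid), balancedFrom(vals, mid + 1, hi));
    }
}
```

The code returns the correct minimum for an unbalanced tree too, but the span then grows with the tree height, so little speed-up should be expected.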

Linked lists

Can you parallelize maps or reduces over linked lists?
- Example: Increment all elements of a linked list
- Example: Sum all elements of a linked list
- Parallelism still beneficial for expensive per-element operations

[Figure: a linked list with nodes b, c, d, e, f, with front and back pointers]

Once again, data structures matter!
For parallelism, balanced trees are generally better than lists because we can get to all the data exponentially faster: O(log n) vs. O(n)
- Trees have the same flexibility as lists compared to arrays

Work and Span

Let T_P be the running time if there are P processors available

Two key measures of run-time:

Work: How long it would take 1 processor = T_1
- Just “sequentialize” the recursive forking
Span: How long it would take with infinitely many processors = T_∞
- The longest dependence-chain
- Example: O(log n) for summing an array
  - Notice having > n/2 processors is no additional help
- Also called “critical path length” or “computational depth”
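As a worked check of the array-sum example, the two measures can be written as recurrences. This sketch assumes a constant sequential cut-off and an O(1) combine step, so each task does constant work besides its two recursive calls:

```latex
\begin{align*}
T_1(n)      &= 2\,T_1(n/2) + O(1) = O(n) \\
T_\infty(n) &= T_\infty(n/2) + O(1) = O(\log n)
\end{align*}
```

The work recurrence counts both halves because one processor must do them one after the other. The span recurrence counts only one half because, with unlimited processors, the two halves run at the same time.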

The DAG

A program execution using fork and join can be seen as a DAG
- Nodes: Pieces of work
- Edges: Source must finish before destination starts
  - A fork “ends a node” and makes two outgoing edges
    - New thread
    - Continuation of current thread
  - A join “ends a node” and makes a node with two incoming edges
    - Node just ended
    - Last node of thread joined on

More interesting DAGs?

The DAGs are not always this simple
Example:
- Suppose combining two results might be expensive enough that we want to parallelize each one
- Then each node in the inverted tree on the previous slide would itself expand into another set of nodes for that parallel computation

Connecting to performance

Recall: T_P = running time if there are P processors available
Work = T_1 = sum of run-time of all nodes in the DAG
- That lonely processor does everything
- Any topological sort is a legal execution
- O(n) for simple maps and reductions
Span = T_∞ = sum of run-time of all nodes on the most-expensive path in the DAG
- Note: costs are on the nodes not the edges
- Our infinite army can do everything that is ready to be done, but still has to wait for earlier results
- O(log n) for simple maps and reductions
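To make the two definitions concrete, here is a small sketch that computes work and span for an explicit DAG. The representation is an assumption for illustration only: `cost[i]` is the run-time of node i, `preds.get(i)` lists the nodes that must finish before node i starts, and nodes are assumed to be numbered in a topological order. The class name `DagCost` is hypothetical.

```java
import java.util.List;

// Hypothetical helper: work and span of a DAG with costs on the nodes.
class DagCost {
    // Work: one processor runs every node, so the costs simply add up.
    static long work(long[] cost) {
        long sum = 0;
        for (long c : cost) sum += c;
        return sum;
    }

    // Span: cost of the most expensive path.
    // Assumes every predecessor of node i has an index smaller than i.
    static long span(long[] cost, List<List<Integer>> preds) {
        long[] finish = new long[cost.length]; // earliest finish time with unlimited processors
        long longest = 0;
        for (int i = 0; i < cost.length; i++) {
            long start = 0;
            for (int p : preds.get(i)) start = Math.max(start, finish[p]); // wait for earlier results
            finish[i] = start + cost[i];
            longest = Math.max(longest, finish[i]);
        }
        return longest;
    }
}
```

The span loop is itself a pass over one topological order, which is why the numbering assumption matters.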

Optimal T_P: Thanks ForkJoin library!

So we know T_1 and T_∞ but we want T_P (e.g., P = 4)
Ignoring memory-hierarchy issues (caching), T_P can’t beat
- T_1 / P why not?
- T_∞ why not?
So an asymptotically optimal execution would be:

T_P = O((T_1 / P) + T_∞)

First term dominates for small P, second for large P
The ForkJoin Framework gives an expected-time guarantee of asymptotically optimal!
Expected time because it flips coins when scheduling
How? For an advanced course (few need to know)
Guarantee requires a few assumptions about your code…
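A quick way to see which term dominates is to plug numbers into the bound with the constant factors dropped. This is only an order-of-magnitude sketch; the class and method names are hypothetical, and a real run-time also depends on the constants that the O-notation hides.

```java
// Hypothetical helper: evaluates (T_1 / P) + T_inf with constant factors ignored.
class SpeedupEstimate {
    static double estimateTp(double work, double span, int processors) {
        return work / processors + span;
    }
}
```

For example, with work 1,000,000 and span 20 (roughly n and log n for summing a million elements), `estimateTp(1_000_000, 20, 4)` gives 250,020, where the first term dominates. `estimateTp(1_000_000, 20, 1_000_000)` gives 21, where the span dominates.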

Division of responsibility

Our job as ForkJoin Framework users:
- Pick a good algorithm, write a program
- When run, program creates a DAG of things to do
- Make all the nodes a small-ish and approximately equal amount of work
The framework-writer’s job:
- Assign work to available processors to avoid idling
  - Let framework-user ignore all scheduling issues
- Keep constant factors low
- Give the expected-time optimal guarantee assuming framework-user did his/her job
