02 programming with openmp 3 0 rev1.1.3, Thesis of Accelerator Physics

Typology: Thesis

2015/2016

Uploaded on 05/04/2016

Avneendra.Kanva
Programming with OpenMP*
Intel Software College
Copyright © 2008, Intel Corporation. All rights reserved.

Objectives

Upon completion of this module you will be able to use OpenMP to:

  • implement data parallelism
  • implement task parallelism


What Is OpenMP?

Portable, shared-memory threading API

– Fortran, C, and C++

– Multi-vendor support for both Linux and Windows

Standardizes task & loop-level parallelism

Supports coarse-grained parallelism

Combines serial and parallel code in single source

Standardizes ~20 years of compiler-directed threading experience

http://www.openmp.org

Current spec is OpenMP 3.0: 318 pages (combined C/C++ and Fortran)


Programming Model

Fork-Join Parallelism:

  • Master thread spawns a team of threads as needed
  • Parallelism is added incrementally: that is, the sequential program evolves into a parallel program

[Figure: fork-join diagram with the labels "Master Thread" and "Parallel Regions"]
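A minimal sketch of the fork-join model described above, assuming a C compiler with OpenMP support. The function name count_team_members is hypothetical and not part of the slides; the atomic directive is used here only to keep the shared counter consistent.

```c
/* Hypothetical helper (not from the slides): runs one parallel region
   and returns how many threads took part in it. When OpenMP is not
   enabled the pragmas are ignored and the result is 1. */
int count_team_members(void)
{
    int members = 0;

    /* Fork: the master thread creates a team of threads here. */
    #pragma omp parallel
    {
        /* Every thread in the team, master included, adds itself once.
           The atomic update protects the shared counter. */
        #pragma omp atomic
        members++;
    }
    /* Join: the team is finished; only the master thread continues. */

    return members;
}
```

Calling count_team_members() from main and printing the result shows the team size chosen by the runtime. With GCC the OpenMP pragmas are enabled by the -fopenmp option.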


Agenda

  • What is OpenMP?
  • Parallel regions
  • Worksharing
  • Data environment
  • Synchronization
  • Curriculum Placement
  • Optional Advanced topics


Parallel Region & Structured Blocks (C/C++)

Most OpenMP constructs apply to structured blocks

  • Structured block: a block with one point of entry at the top and one point of exit at the bottom
  • The only "branches" allowed are STOP statements in Fortran and exit() in C/C++

A structured block:

    #pragma omp parallel
    {
        int id = omp_get_thread_num();
    more:
        res[id] = do_big_job(id);
        if (conv(res[id])) goto more;   /* branch stays inside the block */
    }
    printf("All done\n");

Not a structured block:

    if (go_now()) goto more;            /* jumps into the block */
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
    more:
        res[id] = do_big_job(id);
        if (conv(res[id])) goto done;   /* jumps out of the block */
        goto more;
    }
    done:
        if (!really_done()) goto more;  /* jumps into the block */


Agenda

  • What is OpenMP?
  • Parallel regions
  • Worksharing – Parallel For
  • Data environment
  • Synchronization
  • Curriculum Placement
  • Optional Advanced topics


Worksharing

Worksharing is the general term used in OpenMP to describe distribution of work across threads.

Three examples of worksharing in OpenMP are:

  • omp for construct
  • omp sections construct
  • omp task construct

Automatically divides work among threads
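The effect of the omp for construct can be sketched as follows. This is an illustration only: the function mark_iterations and the hits array are hypothetical names, not part of the slides. The point is that the team divides the iterations among its members, so each iteration runs exactly once.

```c
/* Hypothetical example (not from the slides): increments hits[i] once
   for every time iteration i is executed. */
void mark_iterations(int *hits, int n)
{
    int i;

    #pragma omp parallel
    {
        /* Without the "omp for" directive every thread in the team
           would run all n iterations. With it, the iterations are
           divided among the threads, so each index is visited once.
           The loop variable i is made private automatically. */
        #pragma omp for
        for (i = 0; i < n; i++) {
            hits[i] += 1;
        }
    }
}
```

After a call on a zero-filled array, every element holds 1 regardless of how many threads the runtime started.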


Combining constructs

These two code segments are equivalent

    #pragma omp parallel
    {
        #pragma omp for
        for (i = 0; i < MAX; i++) {
            res[i] = huge();
        }
    }

    #pragma omp parallel for
    for (i = 0; i < MAX; i++) {
        res[i] = huge();
    }


The Private Clause

Reproduces the variable for each task

  • Variables are un-initialized; C++ object is default constructed
  • Any value external to the parallel region is undefined

    /* a and b are assumed to be arrays declared elsewhere */
    void work(float *c, int N) {
        float x, y;
        int i;
        #pragma omp parallel for private(x,y)
        for (i = 0; i < N; i++) {
            x = a[i];
            y = b[i];
            c[i] = x + y;
        }
    }
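As a point of comparison, the same loop can be sketched without a private clause by declaring the temporaries inside the loop body, since variables declared inside the parallel region are private to each thread. The function name add_vectors and the choice to pass a and b as parameters are assumptions made for this example; the slide does not show where a and b come from.

```c
/* Hypothetical variant (not from the slides): a and b are passed in
   as parameters so the example is self-contained. */
void add_vectors(const float *a, const float *b, float *c, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* x and y are declared inside the loop body, so every thread
           has its own copies and no private clause is needed. */
        float x = a[i];
        float y = b[i];
        c[i] = x + y;
    }
}
```

The private clause remains necessary when the temporaries must be declared outside the loop, as in the work function above.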


The schedule clause

The schedule clause affects how loop iterations are mapped onto threads

schedule(static [,chunk])

  • Blocks of iterations of size “chunk” are assigned to threads
  • Round-robin distribution
  • Low overhead, may cause load imbalance

schedule(dynamic [,chunk])

  • Threads grab “chunk” iterations
  • When done with iterations, thread requests next set
  • Higher threading overhead, can reduce load imbalance

schedule(guided [,chunk])

  • Dynamic schedule starting with large block
  • Size of the blocks shrinks; no smaller than “chunk”
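The three schedule kinds can be tried side by side with a sketch like the one below. The names cost and total_work, the chunk size of 4, and the workload itself are all hypothetical. The reduction(+:total) clause is an addition not covered on this slide; it is used so that the shared sum is combined safely.

```c
/* Hypothetical workload (not from the slides): the cost of iteration i
   grows with i, so equal-sized blocks carry unequal amounts of work. */
static long cost(int i)
{
    long acc = 0;
    for (int k = 0; k <= i; k++) {
        acc += k % 7;
    }
    return acc;
}

/* kind: 0 = static, 1 = dynamic, any other value = guided.
   The schedule changes which thread runs which iterations, not the
   result, so all three branches return the same total. */
long total_work(int n, int kind)
{
    long total = 0;
    int i;

    if (kind == 0) {
        #pragma omp parallel for schedule(static, 4) reduction(+:total)
        for (i = 0; i < n; i++) total += cost(i);
    } else if (kind == 1) {
        #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
        for (i = 0; i < n; i++) total += cost(i);
    } else {
        #pragma omp parallel for schedule(guided, 4) reduction(+:total)
        for (i = 0; i < n; i++) total += cost(i);
    }
    return total;
}
```

Timing the three calls on a large n is one way to observe the trade-off listed above: static has the least scheduling overhead, while dynamic and guided can balance the uneven iterations better.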


Schedule Clause Example

    #pragma omp parallel for schedule(static, 8)
    for (int i = start; i <= end; i += 2) {
        if (TestForPrime(i)) gPrimesFound++;
    }

Iterations are divided into chunks of 8

  • If start = 3, then first chunk is i = {3,5,7,9,11,13,15,17}
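The chunk arithmetic above can be checked with a small serial sketch. The helper names chunk_values and owner_thread are hypothetical, and the code assumes a positive step. It only reproduces the bookkeeping of schedule(static, chunk); it does not call the OpenMP runtime.

```c
/* Hypothetical helper (not from the slides): writes the loop-variable
   values that belong to chunk number chunk_index into out and returns
   how many values were written. Assumes step > 0. */
int chunk_values(int start, int end, int step, int chunk,
                 int chunk_index, int *out)
{
    int count = 0;

    /* Iteration number k (0-based) corresponds to i = start + k * step. */
    for (int k = chunk_index * chunk; k < (chunk_index + 1) * chunk; k++) {
        int i = start + k * step;
        if (i > end) {
            break;              /* the last chunk may be shorter */
        }
        out[count++] = i;
    }
    return count;
}

/* With a static schedule, chunks are handed out round-robin in order
   of thread number. */
int owner_thread(int chunk_index, int num_threads)
{
    return chunk_index % num_threads;
}
```

With start = 3, step = 2, and chunk = 8, chunk 0 contains 3 through 17, matching the slide, and chunk 1 begins at 19.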


Agenda

  • What is OpenMP?
  • Parallel regions
  • Worksharing – Parallel Sections
  • Data environment
  • Synchronization
  • Curriculum Placement
  • Optional Advanced topics

Task Decomposition

    a = alice();
    b = bob();
    s = boss(a, b);
    c = cy();
    printf("%6.2f\n", bigboss(s, c));

alice, bob, and cy can be computed in parallel

[Figure: dependency graph in which alice and bob feed boss, and boss and cy feed bigboss]
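One way to express this decomposition is with the omp sections construct, sketched below. The bodies of alice, bob, cy, boss, and bigboss are placeholders invented for this example, as is the wrapper name run_decomposed; only the call structure comes from the slide.

```c
/* Placeholder implementations (hypothetical, not from the slides). */
static double alice(void) { return 1.0; }
static double bob(void)   { return 2.0; }
static double cy(void)    { return 3.0; }
static double boss(double a, double b)    { return a + b; }
static double bigboss(double s, double c) { return s * c; }

double run_decomposed(void)
{
    /* a, b, and c are declared outside the region, so they are shared
       and the results remain visible after the sections finish. */
    double a = 0.0, b = 0.0, c = 0.0, s;

    /* The three independent calls may run on different threads. */
    #pragma omp parallel sections
    {
        #pragma omp section
        a = alice();

        #pragma omp section
        b = bob();

        #pragma omp section
        c = cy();
    }

    /* boss and bigboss depend on the earlier results, so they run
       serially after the implicit barrier at the end of the sections. */
    s = boss(a, b);
    return bigboss(s, c);
}
```

Each section writes a different variable, so no synchronization beyond the implicit barrier is needed in this sketch.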