Profiling - High Performance Computing - Lecture Slides, Slides of Computer Science

Some concepts of High Performance Computing are Addressing Modes, Program Execution, Basic Computer Organization, Control Hazard Solutions, Least Recently Used, Memory Hierarchy Progression. Main points of this lecture are: Profiling, Profiler, Optimization, Breakdown of Execution Time, Instrumentation, Gathering and Output of Profile Data, Later Processing

Typology: Slides

2012/2013

Uploaded on 04/28/2013

dewaan
High Performance Computing
Lecture 33

Profiling

• Profiler: a tool that helps you identify the "important" parts of your program, so you know where to concentrate your optimization efforts
• Profile: a breakdown of execution time across the different parts of the program
• Can be done by adding statements to your program (instrumentation), so that during execution data is gathered, output, and possibly processed later
• Automation: a profiling tool adds those instructions to your program for you
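As an illustration of manual instrumentation, here is a minimal sketch in C. It is not taken from the lecture: the names region_profile, do_work, do_work_profiled and report_profile are hypothetical, and the workload is an arbitrary stand-in. The wrapper gathers a call count and CPU time during execution, and a separate routine outputs the data afterwards.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical per-region profile record: call count and accumulated CPU time. */
typedef struct {
    const char *name;
    unsigned long calls;
    clock_t ticks;
} region_profile;

region_profile work_profile = { "do_work", 0, 0 };

/* The function being profiled (an arbitrary stand-in workload). */
double do_work(int n)
{
    double sum = 0.0;
    for (int i = 1; i <= n; i++)
        sum += 1.0 / i;
    return sum;
}

/* Instrumented wrapper: the added statements gather data during execution. */
double do_work_profiled(int n)
{
    clock_t start = clock();
    double result = do_work(n);
    work_profile.ticks += clock() - start;
    work_profile.calls++;
    return result;
}

/* Outputs the gathered data; a program would typically call this before exiting. */
void report_profile(FILE *out)
{
    fprintf(out, "%s: %lu calls, %.3f s CPU\n", work_profile.name,
            work_profile.calls,
            (double)work_profile.ticks / CLOCKS_PER_SEC);
}
```

A caller would replace calls to do_work() with do_work_profiled() and call report_profile(stderr) at the end of the run. A profiling tool automates exactly this kind of insertion.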

Why Function Level Profiling?

• How useful can it be to identify and optimize a few functions of a program?
• Example: LINPACK Benchmark
  • LINPACK: a linear algebra package
  • The benchmark solves a large system of linear equations by Gaussian elimination using LINPACK routines
  • Benchmark programs are used to compare the performance of computer systems
  • It spends most (~70%) of its run time in SAXPY

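For reference, SAXPY is the single-precision "a times x plus y" vector update. The sketch below is a simplified C rendering and not the LINPACK source: the original routine is written in Fortran and takes stride arguments, which are assumed to be 1 here. It shows how small the function is in which the benchmark spends most of its time.

```c
#include <stddef.h>

/* SAXPY: single-precision y = a*x + y over n elements (unit stride assumed). */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

In Gaussian elimination, subtracting a multiple of the pivot row from another row is one such update, which is why a function-level profile points so clearly at this one loop.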

Prof: UNIX Function Level Profiling

• Usage
    % cc -p program.c     / generates instrumented a.out
    % a.out               / execution; instrumentation generates data, written to mon.out
    % prof                / processing of profile data
• Output gives a function-by-function breakdown of execution time
• Useful in identifying which functions to concentrate optimization efforts on

What profil( ) does

• One of the parameters in the call to profil() is a buffer
  • It is used as an array of counters initialized to 0
  • The array elements are associated with contiguous regions of the program text
• During program execution
  • The PC value is sampled
    • once every clock tick (typical default: 10 msec)
    • triggered by the hardware timer interrupt
  • The corresponding buffer element is incremented

What profil( ) does

[Figure: the program text (main(), func1() ... func8()) shown beside the profil( ) buffer, which is indexed from 0 to bufsize-1. Every 10 msec the PC value (e.g. 0x000FA4D) is scaled to a value between 0 and bufsize-1, and that buffer counter is incremented.]
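The sampling scheme described above can be sketched in C as follows. This is a user-level simulation, not the kernel's implementation and not a call to profil() itself: the pc_histogram type and both function names are hypothetical, and the linear scaling shown is a simplification of the offset and scale parameters the real call takes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the sampled text region and its counter buffer. */
typedef struct {
    uintptr_t text_start;   /* lowest address of the profiled program text */
    uintptr_t text_end;     /* one past the highest profiled address */
    unsigned short *buf;    /* counters, assumed zero-initialized by the caller */
    size_t bufsize;         /* number of counters */
} pc_histogram;

/* Scale a PC value to an index between 0 and bufsize-1.
 * Returns bufsize if the PC lies outside the profiled region. */
size_t pc_to_bucket(const pc_histogram *h, uintptr_t pc)
{
    if (pc < h->text_start || pc >= h->text_end)
        return h->bufsize;
    unsigned long long span = h->text_end - h->text_start;
    unsigned long long off = pc - h->text_start;
    return (size_t)(off * h->bufsize / span);
}

/* What would run on each clock tick: increment the counter covering the PC. */
void record_sample(pc_histogram *h, uintptr_t pc)
{
    size_t i = pc_to_bucket(h, pc);
    if (i < h->bufsize)
        h->buf[i]++;
}
```

Each counter covers a contiguous slice of the program text, so a post-processing tool such as prof can map counter indices back to function names using the symbol table.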

Using prof

• From how it works, we understand that
  • The granularity is at best 10 msec
  • The generated profile could differ across multiple runs of a program on the same input data
    • Remember that there could be other programs running on the same system
    • This can affect the behaviour of the profiling run in terms of page faults, cache misses, etc.
  • The profile could even be completely wrong
    • e.g., there could be a particular function that just happens to be running each time the timer interrupt occurs

Prof giving bad time estimates

[Figure: timeline of executions of A( ) and B( ), with a sample taken every 10 msec.]

• The prof profile will show 100% of the execution time being spent in function A()
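The effect can be reproduced with a small simulation. The sketch below is an assumption-laden model, not something from the slides: it supposes a program that alternates between A() and B() with a fixed period, and all names and durations are hypothetical. When the program's period matches the sampling interval, every sample lands in the same function regardless of how little time that function actually uses.

```c
/* Result of a simulated profiling run: samples attributed to each function. */
typedef struct {
    unsigned long in_a;
    unsigned long in_b;
} sample_counts;

/*
 * Simulate a program that repeatedly runs A() for a_us microseconds and then
 * B() for b_us microseconds, sampled once every tick_us microseconds.
 * The first sample is taken at time phase_us.
 * Assumes tick_us > 0 and a_us + b_us > 0.
 */
sample_counts simulate_sampling(unsigned long a_us, unsigned long b_us,
                                unsigned long tick_us, unsigned long phase_us,
                                unsigned long total_us)
{
    sample_counts c = { 0, 0 };
    unsigned long period = a_us + b_us;
    for (unsigned long t = phase_us; t < total_us; t += tick_us) {
        if (t % period < a_us)
            c.in_a++;   /* the sample found the PC inside A() */
        else
            c.in_b++;   /* the sample found the PC inside B() */
    }
    return c;
}
```

For example, with a_us = 1000, b_us = 9000 and tick_us = 10000, A() uses only a tenth of the run time, yet every sample is attributed to it. Changing the sampling interval so that it no longer matches the program's period spreads the samples across both functions.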

Pixie: Basic Block Level Profiling

• A different style of profiling
• Usage
    % cc program.c        / a.out
    % pixie a.out         / instrumented a.out.pixie
    % a.out.pixie         / profile output file
    % prof                / report on profile data
• Output is based on basic block level execution counts
• Useful for all kinds of things

What is a Basic Block?

• A section of program that does not cross any conditional branches, loop boundaries or other transfers of control
• A sequence of instructions with a single entry point, single exit point, and no internal branches
• A sequence of program statements that contains no labels and no branches
• A basic block can only be executed completely and in sequence

Identifying Basic Blocks of a Program

• Basic blocks are defined by control transfer instructions and their targets

[Figure: program text]

• Problem case: JR R

  • Target address known only when program runs
  • Target address can be different each time instruction is executed
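One common way to find basic blocks is to mark "leaders": the first instruction, every statically known branch or jump target, and every instruction that follows a control transfer. The sketch below applies that rule to a toy instruction array. The instr type, the opcode names and find_leaders are hypothetical, branch delay slots are ignored, and nothing here is claimed to be how Pixie itself does it. Register-indirect jumps are the problem case: their targets cannot be marked because they are not known before the program runs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction: just enough detail to find block boundaries. */
typedef enum { OP_OTHER, OP_BRANCH, OP_JUMP, OP_JUMP_REG } opcode;

typedef struct {
    opcode op;
    size_t target;   /* index of the target instruction; ignored for
                        OP_OTHER and OP_JUMP_REG */
} instr;

/*
 * Mark basic block leaders in code[0..n-1] and return how many were found,
 * which is the number of basic blocks identified.
 * Targets of OP_JUMP_REG are unknown statically, so they are not marked.
 */
size_t find_leaders(const instr *code, size_t n, bool *is_leader)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        is_leader[i] = false;
    if (n > 0)
        is_leader[0] = true;                     /* first instruction */
    for (size_t i = 0; i < n; i++) {
        if (code[i].op == OP_OTHER)
            continue;
        if (code[i].op != OP_JUMP_REG && code[i].target < n)
            is_leader[code[i].target] = true;    /* known transfer target */
        if (i + 1 < n)
            is_leader[i + 1] = true;             /* instruction after a transfer */
    }
    for (size_t i = 0; i < n; i++)
        if (is_leader[i])
            count++;
    return count;
}
```

Each block then runs from a leader up to, but not including, the next leader. If a register-indirect jump lands in the middle of one of these blocks at run time, the static partition is wrong, which is why a tool has to fall back on heuristics for such instructions.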

Pixie: How it works

1. Identification of basic blocks

  • Q: How can basic blocks be identified?
  • Pixie uses heuristics where necessary

2. Instrumentation

  • Increment a counter for the basic block
  • On program entry: initialization of data structures; on program exit: writing the profile output file
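The instrumentation step can be imitated at source level. The sketch below is only an analogy, since Pixie rewrites the executable rather than the source, and the names block_count, count_positive and dump_block_counts are hypothetical. The block boundaries are approximate: a compiler would split the loop test and increment into further blocks.

```c
#include <stdio.h>

/* Hypothetical counters, one per (approximate) basic block of count_positive(). */
enum { NUM_BLOCKS = 4 };
unsigned long block_count[NUM_BLOCKS];

/* Counts the positive entries of v; each block increments its own counter. */
int count_positive(const int *v, int n)
{
    int positives = 0;
    block_count[0]++;                 /* block 0: function entry */
    for (int i = 0; i < n; i++) {
        block_count[1]++;             /* block 1: loop body up to the test */
        if (v[i] > 0) {
            block_count[2]++;         /* block 2: taken side of the if */
            positives++;
        }
    }
    block_count[3]++;                 /* block 3: function exit */
    return positives;
}

/* Stand-in for writing the profile output file at program exit. */
void dump_block_counts(FILE *out)
{
    for (int b = 0; b < NUM_BLOCKS; b++)
        fprintf(out, "block %d: %lu\n", b, block_count[b]);
}
```

Unlike a sampled profile, these counts are exact and repeatable for a given input: the counter for a block records precisely how many times it ran.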

How intrusive are these mechanisms?

• Pixie
  • The instrumented executable program can be much larger than the original program

Pixie instrumentation

• In each basic block, instructions must be added to increment an execution counter for that basic block
  • The counters cannot be maintained in registers
    • There can be a lot of basic blocks in a program
  • At least three MIPS 1 instructions would be needed
      LW   R1, counter
      ADDI R1, R1, 1
      SW   counter, R