High Performance Computing
Lecture 33
Profiling
Profiler: A tool that helps you identify the
'important' parts of your program on which to
concentrate your optimization efforts
Profile: a breakdown (of execution time) across
the different parts of the program
Profiling can be done by adding statements to your
program (instrumentation), so that during
execution data is gathered and output, and
possibly processed later
Automation: a profiling tool adds those
statements to your program for you
Why Function Level Profiling?
How useful can it be to identify and optimize a
few functions of a program?
Example: LINPACK Benchmark
LINPACK: A Linear Algebra package
The benchmark solves a large system of linear
equations by Gaussian elimination using LINPACK
routines
Benchmark programs are used to compare the performance of computer systems
It spends most (~70%) of its run time in SAXPY
Prof: UNIX Function Level Profiling
Usage
% cc -p program.c    / generates instrumented a.out
% a.out              / execution; instrumentation
                     / generates data and mon.out
% prof               / processing of profile data
Output gives a function by function breakdown
of execution time
Useful in identifying which functions to
concentrate optimization efforts on
What profil( ) does
One of the parameters in call to profil() is a
buffer
It is used as an array of counters initialized to 0
The array elements are associated with
contiguous regions of the program text
During program execution
PC value is sampled
once every clock tick (typical default: 10 msec)
triggered by the hardware timer interrupt
Corresponding buffer element is incremented
What profil( ) does
[Figure: program text, containing main() and func1() through func8(), with address 0x000FA4D marked, alongside the profil( ) buffer with entries 0 to bufsize-1]
Every 10 msec, the PC value is scaled to a value between 0 and bufsize-1
And that buffer counter is incremented
Using prof
From how it works, we understand that
The granularity is at best 10 msec
The generated profile could differ for multiple runs
of a program running on the same input data
Remember that there could be other programs running on the same system. This can affect the behaviour of the profiling run in terms of page faults, cache misses, etc.
The profile could even be completely wrong
e.g., there could be a particular function that just happens to be running each time the timer interrupt occurs
Prof giving bad time estimates
[Figure: timeline of A( ) and B( ) executions against timer ticks 10 msec apart]
The prof profile will show 100% of the execution time being spent in function A()
Pixie: Basic Block Level Profiling
A different style of profiling
Usage
% cc program.c       / generates a.out
% pixie a.out        / generates instrumented a.out.pixie
% a.out.pixie        / execution; generates profile output file
% prof               / report on profile data
Output is based on basic block level
execution counts
Useful for many purposes, e.g., finding the most frequently executed blocks, or code that is never executed
What is a Basic Block?
A section of program that does not cross any
conditional branches, loop boundaries or other
transfers of control
A sequence of instructions with a single entry
point, single exit point, and no internal branches
A sequence of program statements that contains
no labels and no branches
A basic block can only be executed completely
and in sequence
Identifying Basic Blocks of a Program
Basic blocks are defined by control transfer
instructions and their targets
[Figure: program text]
Problem case: JR R (jump to the address held in a register)
- Target address known only when program runs
- Target address can be different each time instruction is executed
Pixie: How it works
1. Identification of basic blocks
- Q: How can basic blocks be identified?
- Pixie uses heuristics where necessary
2. Instrumentation
- Increment a counter for the basic block
- On program entry and exit: initialization of data
structures; writing profile output file
How intrusive are these mechanisms?
Pixie
The instrumented executable program can be
much larger than the original program
Pixie instrumentation
In each basic block, instructions must be
added to increment an execution counter for
that basic block
The counters cannot be maintained in registers
There can be a lot of basic blocks in a program
At least three MIPS 1 instructions would be
needed
LW R1, counter
ADDI R1, R1, 1
SW counter, R