Performance Programming and Static Analysis for Security: Lecture 28 by Sean Peisert

Slides for Lecture 28, "Performance Programming and Static Analysis for Security", from ECS 142 (Spring 2009), given by Sean Peisert and partially taken from slides given at Lawrence Livermore National Labs by Larry Carter and Sean Peisert. The lecture covers the Power3's capabilities and limits, harnessing its power, the FLOP to MemOp ratio, pipeline latency, Fortran vs C, and security vulnerabilities such as buffer overflows and TOCTTOU.


Performance Programming & Static Analysis for Security

Lecture 28

(partially taken from slides given at Lawrence Livermore National Labs by Larry Carter and Sean Peisert)

Dr. Sean Peisert – ECS 142 – Spring 2009

Status

  • Project 3 back sometime in the middle of this week.
  • Project 4 due this Friday, 11:55pm

Power3’s power … and limits

  • Eight pipelined functional units
    • 2 floating point
    • 2 load/store
    • 2 single-cycle integer
    • 1 multi-cycle integer
    • 1 branch
  • Powerful operations
    • Fused multiply-add (FMA)
    • Load (or Store) update
  • Launch 4 ops per cycle
  • Can’t launch 2 stores/cycle
  • FMA pipe 3-4 cycles long
  • Memory hierarchy (Tues)

Can its power be harnessed?

    for (j=0; j<n; j+=4){
      p00 += a[j+0]*a[j+2];
      m00 -= a[j+0]*a[j+2];
      p01 += a[j+1]*a[j+3];
      m01 -= a[j+1]*a[j+3];
      p10 += a[j+0]*a[j+3];
      m10 -= a[j+0]*a[j+3];
      p11 += a[j+1]*a[j+2];
      m11 -= a[j+1]*a[j+2];
    }

8 FMAs, 4 Loads
Runs at 4.6 cycles/iteration (= 772 MFLOP/S)

    CL.6:
      FMA   fp31=fp31,fp2,fp0,fcr
      LFL   fp1=()double(gr3,16)
      FNMS  fp30=fp30,fp2,fp0,fcr
      LFDU  fp3,gr3=()double(gr3,32)
      FMA   fp24=fp24,fp0,fp1,fcr
      FNMS  fp25=fp25,fp0,fp1,fcr
      LFL   fp0=()double(gr3,24)
      FMA   fp27=fp27,fp2,fp3,fcr
      FNMS  fp26=fp26,fp2,fp3,fcr
      LFL   fp2=()double(gr3,8)
      FMA   fp29=fp29,fp1,fp3,fcr
      FNMS  fp28=fp28,fp1,fp3,fcr
      BCT   ctr=CL.6,
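As a rough check on the 772 MFLOP/S figure: each FMA counts as two floating-point operations, so one iteration performs 16 FLOPs. The clock rate is not stated on the slide, so the value below is inferred from the two quoted numbers rather than taken from the source.

```latex
\[
\frac{8\ \text{FMA} \times 2\ \text{FLOP/FMA}}{4.6\ \text{cycles}}
  \approx 3.48\ \text{FLOP/cycle},
\qquad
\frac{772\ \text{MFLOP/s}}{3.48\ \text{FLOP/cycle}}
  \approx 222\ \text{MHz (implied clock)}
\]
```

Assuming each of the two floating-point units can start one FMA per cycle, the peak is 4 FLOP/cycle (8 FMAs in 4 cycles), so 4.6 cycles/iteration is roughly 87% of peak.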

FLOP to MemOp ratio

  • Most scientific programs have at most one FMA per MemOp
    • Matrix-vector product: (_K+1_) loads, _K_ FMAs
    • FFT butterfly: 8 MemOps, 10 floats (but 5 or 6 FMA)
    • DAXPY: 2 Loads, 1 Store, 1 FMA
    • DDOT: 2 Loads, 1 FMA
  • A few have more (use ESSL!)
    • Matrix multiply (well-tuned): 2 FMA per load
    • Radix-8 FFT
  • Performance is limited by Memory Operations!
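To make the DAXPY and DDOT counts concrete, here is a minimal sketch of both kernels in C. The function names and signatures are simplified for illustration (the BLAS routines of the same names also take stride arguments); the per-element counts in the comments are the ones quoted in the list above.

```c
#include <stddef.h>

/* DAXPY: y[i] = alpha * x[i] + y[i].
 * Per element: 2 loads (x[i], y[i]), 1 store (y[i]), 1 FMA. */
void daxpy(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = alpha * x[i] + y[i];
    }
}

/* DDOT: returns the sum over i of x[i] * y[i].
 * Per element: 2 loads (x[i], y[i]), 1 FMA, no store. */
double ddot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

In both loops every FMA needs at least two memory operations, so the two load/store units saturate before the two floating-point units do.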

The effect of pipeline latency

    for (i=0; i<size; i++) {
      sum = a[i] + sum;
    }

3.86 cycles/addition
Next add can’t start until previous is finished (3 to 4 cycles later)

    for (i=0; i<size; i+=4) {
      sum0 += a[i];
      sum1 += a[i+1];
      sum2 += a[i+2];
      sum3 += a[i+3];
    }
    sum = sum0+sum1+sum2+sum3;

1.1 cycles/addition
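The two loops can be packaged as functions for comparison. This is a sketch, not the lecture's code: the names `sum_serial` and `sum_unrolled4` are made up here, and the cleanup loop for sizes that are not a multiple of 4 is an addition (the slide's version assumes `size` is divisible by 4).

```c
#include <stddef.h>

/* One accumulator: each add depends on the result of the previous add,
 * so the loop runs at the latency of the floating-point pipeline. */
double sum_serial(const double *a, size_t size)
{
    double sum = 0.0;
    for (size_t i = 0; i < size; i++) {
        sum = a[i] + sum;
    }
    return sum;
}

/* Four independent accumulators: the four adds in one iteration do not
 * depend on each other, so they can overlap in the pipeline. */
double sum_unrolled4(const double *a, size_t size)
{
    double sum0 = 0.0, sum1 = 0.0, sum2 = 0.0, sum3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= size; i += 4) {
        sum0 += a[i];
        sum1 += a[i + 1];
        sum2 += a[i + 2];
        sum3 += a[i + 3];
    }
    double sum = sum0 + sum1 + sum2 + sum3;
    for (; i < size; i++) {   /* leftover 0 to 3 elements */
        sum += a[i];
    }
    return sum;
}
```

Because floating-point addition is not associative, the two functions can differ in the last bits for general inputs. That reordering is why a compiler will not usually make this transformation on its own without a relaxed floating-point option.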

What’s so great about Fortran??

    DO I = 1, N
      A(I) = B(I)
    ENDDO

    CL.8:
      L4A   gr0=b(gr5,4)
      L4A   gr6=b(gr5,8)
      L4A   gr7=b(gr5,12)
      L4AU  gr8,gr5=b(gr5,16)
      ST4A  a(gr4,8)=gr
      ST4A  a(gr4,4)=gr
      ST4A  a(gr4,12)=gr
      ST4U  gr4,a(gr4,16)=gr
      BCT   ctr=CL.8,

    for (i=0; i<N; i++) {
      b[i] = a[i];
    }

    CL.6:
      ST4U  gr4,()int(gr4,4)=gr
      L4AU  gr24,gr3=()int(gr3,4)
      BCT   ctr=CL.6,

Fortran vs C - what’s going on??

  • C prevents compiler from unrolling code
    • A feature, not a bug!
    • User may want b[0] and a[1] to be same location
      • tricky way to set a[n] = … = a[1] = a[0]
    • Most C compilers don’t try to prove non-aliasing
      • a and b were malloc-ed in this example
  • Fortran doesn’t allow arrays to be aliased
    • Unless explicit, e.g. via EQUIVALENCE
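The aliasing case can be reproduced in a few lines. This is a sketch with made-up function names. The second function uses the C99 `restrict` qualifier, which the slides do not mention; it is shown only as one way a C programmer can give the compiler the no-overlap guarantee that Fortran arrays have by default.

```c
#include <stddef.h>

/* Plain C: dst and src may overlap, so the compiler must preserve the
 * load/store order of the loop as written. */
void copy_may_alias(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

/* C99 restrict: the caller promises that dst and src do not overlap,
 * so the compiler is free to batch several loads before the stores.
 * Calling this with overlapping arrays is undefined behavior. */
void copy_no_alias(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```

For an array `a` of `n + 1` elements, `copy_may_alias(a + 1, a, n)` is the "tricky" case: `dst[0]` is `a[1]`, so each store feeds the next load and every element ends up equal to `a[0]`. Doing four loads before four stores, as in the Fortran code, would give a different result.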

Decreasing MemOp to FLOP Ratio

    for (i=1; i<N; i++)
      for (j=1; j<N; j++)
        b[i][j] = 0.25 * (a[i-1][j] + a[i+1][j]
                        + a[i][j-1] + a[i][j+1]);

3 loads, 4 floats, 1 store

    for (i=1; i<N-2; i+=3) {
      for (j=1; j<N; j++) {
        b[i+0][j] = ... ;
        b[i+1][j] = ... ;
        b[i+2][j] = ... ;
      }
    }

5 loads, 12 floats, 3 stores
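A fully written-out sketch of the two loop nests follows, assuming the intended computation is the usual 4-point stencil over `a[i-1][j]`, `a[i+1][j]`, `a[i][j-1]` and `a[i][j+1]`. The grid size `N`, the `(N+1) x (N+1)` array shape, the function names and the cleanup loop for leftover rows are all assumptions made so the example is self-contained.

```c
#include <stddef.h>

#define N 8   /* hypothetical grid size; arrays are (N+1) x (N+1) */

/* One output row per pass of the outer loop. */
void stencil_plain(double b[N + 1][N + 1], double a[N + 1][N + 1])
{
    for (size_t i = 1; i < N; i++)
        for (size_t j = 1; j < N; j++)
            b[i][j] = 0.25 * (a[i - 1][j] + a[i + 1][j]
                            + a[i][j - 1] + a[i][j + 1]);
}

/* Three output rows per pass: rows i, i+1 and i+2 share many inputs
 * (for example a[i+1][j] is used by both row i and row i+2). */
void stencil_unrolled3(double b[N + 1][N + 1], double a[N + 1][N + 1])
{
    size_t i = 1;
    for (; i + 2 < N; i += 3) {
        for (size_t j = 1; j < N; j++) {
            b[i + 0][j] = 0.25 * (a[i - 1][j] + a[i + 1][j]
                                + a[i][j - 1] + a[i][j + 1]);
            b[i + 1][j] = 0.25 * (a[i][j] + a[i + 2][j]
                                + a[i + 1][j - 1] + a[i + 1][j + 1]);
            b[i + 2][j] = 0.25 * (a[i + 1][j] + a[i + 3][j]
                                + a[i + 2][j - 1] + a[i + 2][j + 1]);
        }
    }
    for (; i < N; i++)        /* leftover rows when N-1 is not a multiple of 3 */
        for (size_t j = 1; j < N; j++)
            b[i][j] = 0.25 * (a[i - 1][j] + a[i + 1][j]
                            + a[i][j - 1] + a[i][j + 1]);
}
```

The quoted load counts assume values already fetched in earlier `j` iterations stay in registers. In the unrolled body the only new values per `j` would then be `a[i-1][j]`, `a[i+3][j]`, `a[i][j+1]`, `a[i+1][j+1]` and `a[i+2][j+1]`, which matches 5 loads for 12 floating-point operations.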

Compilers: Topics Not Covered

  • Lambda Calculus
    • Foundation for Programming Languages
  • Instruction Scheduling
  • Compiling Exceptions

Vulnerable Program?

    #include <stdio.h>
    #include <string.h>

    int main(int argc, char *argv[]) {
      char buffer[500];
      strcpy(buffer, argv[1]);   /* no length check on argv[1] */
      printf("Safe program?");
      return 0;
    }
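The problem in this program is that `strcpy` copies `argv[1]` into the 500-byte `buffer` without checking its length, and `argv[1]` is not checked for being present at all. Here is a minimal sketch of a bounds-checked copy; `copy_checked` is a hypothetical helper, not a standard library function.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, including the terminating NUL.
 * Returns 0 on success, -1 if src is NULL or does not fit. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0) {
        return -1;
    }
    size_t len = strlen(src);
    if (len >= dst_size) {
        return -1;              /* would overflow: refuse instead of truncating */
    }
    memcpy(dst, src, len + 1);  /* len + 1 also copies the NUL terminator */
    return 0;
}
```

In the program above, the `strcpy` call would become something like `if (argc < 2 || copy_checked(buffer, sizeof buffer, argv[1]) != 0) return 1;`, so an overlong argument is rejected rather than written past the end of `buffer`.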

Shellcode

    \x31\xc0\x50\x68\x2\x2\x73\x68\x68\x2\x62\x 69\x6\x89\xe3\x50\x53\x50\x54\x53\xb0\x3\x 0\xcd\x

Buffer Overflows

Impact of Buffer Overflows

  • Buffer overflows (including those on the stack, the heap, and also integer overflows) have dominated exploitable programming flaws for years. (http://www.sans.org/top20/)
  • E.g., worms: Blaster, Morris, Slammer, Witty, etc.
  • It would be great if programmers just wrote better code and did their own bounds checking. But in practice, even good coders make mistakes.
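Integer overflows, mentioned in the first bullet, often turn into buffer overflows through an undersized allocation: if `count * elem_size` wraps around, `malloc` returns a buffer much smaller than the code that fills it expects. Here is a minimal sketch of the check a programmer would have to remember to write; `alloc_array_checked` is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates room for count elements of elem_size bytes each.
 * Returns NULL if count * elem_size would overflow size_t,
 * or if malloc itself fails. */
void *alloc_array_checked(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return NULL;            /* product does not fit in size_t */
    }
    return malloc(count * elem_size);
}
```

The check is one line, but it has to appear at every allocation site whose size comes from untrusted input, which is where even careful programmers slip.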