Dynamic Optimization Motivation - Lecture Slides | ECE 512, Study Guides, Projects, Research of Computer Architecture and Organization

Material Type: Project; Class: Computer Microarchitecture; Subject: Electrical and Computer Engr; University: University of Illinois - Urbana-Champaign; Term: Spring 2005;

A Survey of Dynamic
Optimization Techniques
ECE 512
John King
Jim Simon
Brian Watson
April 25, 2005


Dynamic Optimization Motivation

Control flow limits compile-time optimization

Compiler must produce correct code for all paths

Dynamic binding, indirect jumps/function calls

Dynamic linking prevents some optimizations

Branch biases may change during execution

ISVs are hesitant to recompile for performance

Utilizes excess CPU time and transistor budget

Dynamic Optimization Advantages

Constructed using runtime execution information

Dynamic control flow information is known

Traces uncover additional optimization potential that cannot be seen statically

Optimized to execute “hot” control flows only

3% of static instructions represent 90% of dynamic execution

May need to recover from incorrect paths
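The 3% / 90% observation can be checked against any execution profile. The following is a minimal sketch, assuming the profile is available as a mapping from static instruction to execution count. The function name `hot_fraction` and the sample profile are made up for illustration and do not come from the slides.

```python
def hot_fraction(exec_counts, coverage=0.90):
    """Fraction of static instructions needed to cover `coverage` of dynamic execution."""
    total = sum(exec_counts.values())
    if total == 0:
        return 0.0
    running = 0   # dynamic executions covered so far
    needed = 0    # static instructions selected so far
    # Take the most frequently executed instructions first.
    for count in sorted(exec_counts.values(), reverse=True):
        if running >= coverage * total:
            break
        running += count
        needed += 1
    return needed / len(exec_counts)


# Hypothetical profile: 97 instructions run once, 3 loop instructions run 300 times each.
profile = {f"i{n}": 1 for n in range(97)}
profile.update({"loop_a": 300, "loop_b": 300, "loop_c": 300})
print(hot_fraction(profile))  # 0.03
```

In this made-up profile, 3 of 100 static instructions account for 900 of 997 dynamic executions, so optimizing only those three covers just over 90% of the work.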

Dynamo

Software-based

No hardware, compiler, or OS support needed; completely transparent

OS and hardware dependent

Overhead of discovering and optimizing traces

Single-entry, multiple-exit traces

Persistent code storage

Dynamo Optimizations

Loop unrolling and function inlining

Indirect branch inlining

Code sinking (to exit stubs)

Copy and constant propagation

Loop-invariant code motion

Strength reduction

Runtime-disambiguated load removal

Branch linking to other traces
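Two of the listed optimizations, constant propagation and strength reduction, are easy to show on a straight-line trace. The sketch below assumes a toy three-address form `(dest, op, a, b)` with only `const`, `add`, and `mul`. The representation and the function name `optimize` are illustrative assumptions, not Dynamo's actual intermediate form.

```python
def optimize(trace):
    """Constant propagation/folding and strength reduction on a straight-line trace."""
    known = {}   # register name -> constant value known at this point
    out = []
    for dest, op, a, b in trace:
        # Constant propagation: substitute operands whose value is already known.
        a = known.get(a, a)
        b = known.get(b, b)
        if op == "const":
            known[dest] = a
            out.append((dest, "const", a, None))
        elif isinstance(a, int) and isinstance(b, int):
            # Both operands are constants: fold the operation at optimization time.
            value = a + b if op == "add" else a * b
            known[dest] = value
            out.append((dest, "const", value, None))
        else:
            known.pop(dest, None)   # dest is overwritten with a non-constant
            if op == "mul" and isinstance(b, int) and b > 0 and (b & (b - 1)) == 0:
                # Strength reduction: multiply by 2**k becomes a left shift by k.
                out.append((dest, "shl", a, b.bit_length() - 1))
            else:
                out.append((dest, op, a, b))
    return out


trace = [
    ("r1", "const", 8, None),
    ("r2", "mul", "r0", "r1"),   # r0 is unknown at optimization time
    ("r3", "add", "r1", "r1"),
    ("r4", "add", "r2", "r3"),
]
for instr in optimize(trace):
    print(instr)
# ('r1', 'const', 8, None)
# ('r2', 'shl', 'r0', 3)
# ('r3', 'const', 16, None)
# ('r4', 'add', 'r2', 16)
```

Because a trace is straight-line code with a single entry, one forward pass is enough here; no control-flow merge points need to be considered.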

Dynamo Example

Branches and function calls are inlined

Exit stubs with fix-up code if control takes a different path

[Figure: control-flow graph of blocks A through H, including a call and a return, next to the trace A C D E F H built from it; off-trace exits trap to Dynamo]
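The behavior of a single-entry, multiple-exit trace can be sketched as follows: execution stays on the trace while each branch goes the way the trace expects, and leaves through an exit stub otherwise. The control-flow graph, the `successor` function, and the `x` condition are hypothetical; only the trace A C D E F H is taken from the example.

```python
def successor(block, state):
    """Hypothetical CFG: A branches on x; every other block has one successor."""
    if block == "A":
        return "C" if state["x"] >= 0 else "B"
    return {"B": "D", "C": "D", "D": "E", "E": "F", "F": "H", "H": "A"}[block]


def run_trace(trace, successor, state):
    """Run blocks along `trace`; leave through an exit stub on the first off-trace branch."""
    executed = []
    for i, block in enumerate(trace):
        executed.append(block)
        actual = successor(block, state)
        if i + 1 == len(trace):
            # Fell off the end of the trace: hand the next block back to the runtime.
            return executed, ("end", actual)
        if actual != trace[i + 1]:
            # Control took a different path: the exit stub returns the off-trace target.
            return executed, ("exit_stub", actual)


trace = ["A", "C", "D", "E", "F", "H"]
print(run_trace(trace, successor, {"x": 1}))
# (['A', 'C', 'D', 'E', 'F', 'H'], ('end', 'A'))
print(run_trace(trace, successor, {"x": -1}))
# (['A'], ('exit_stub', 'B'))
```

In a real system the exit stub would also run any fix-up code (for example, instructions sunk off the trace) before transferring control; that step is omitted here.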

Hardware vs. Software

Hardware Advantages

Dedicated hardware designed for dynamic optimization

Cache or memory structures may avoid conflicts with processor TLB or cache

Uses little to no CPU time

Transparent runtime profiling

Easier to recover correct architectural state

Software Advantages

Possible to implement with little or no hardware support

rePLay Framework

Single-entry, single-exit traces called frames

Created via branch promotion

Hotspot detection via branch bias table

Separate frame cache stores optimized frames

Software-programmable optimization engine

Off the critical execution path
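Branch promotion driven by a bias table can be sketched as a per-branch counter of consecutive same-direction outcomes. This is a minimal illustration under stated assumptions: the class name, the table layout, and the promotion threshold are invented here, and the slides do not give rePLay's actual parameters.

```python
class BranchBiasTable:
    """Toy bias table: promote a branch after `threshold` consecutive same-direction outcomes."""

    def __init__(self, threshold=32):
        self.threshold = threshold   # assumed value, not taken from the slides
        self.entries = {}            # branch pc -> [last direction, consecutive count]

    def update(self, pc, taken):
        """Record one branch outcome; return True if the branch is now promoted."""
        entry = self.entries.get(pc)
        if entry is None or entry[0] != taken:
            # First sighting or a change of direction: restart the run.
            self.entries[pc] = [taken, 1]
        else:
            entry[1] = min(entry[1] + 1, self.threshold)   # saturating count
        return self.is_promoted(pc)

    def is_promoted(self, pc):
        entry = self.entries.get(pc)
        return entry is not None and entry[1] >= self.threshold


table = BranchBiasTable(threshold=4)
print([table.update(0x40, True) for _ in range(4)])   # [False, False, False, True]
print(table.update(0x40, False))                      # False
```

A promoted branch is one the frame constructor can treat as having a fixed direction, which is what lets several basic blocks be merged into a single-exit frame.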

rePLay Optimizations

Dead Code Removal

Constant Propagation / Store Forwarding

Function Inlining

Reassociation

Fetch Scheduling

Limited Strength Reduction

Dependency Elimination

rePLay Example

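[Figure: control-flow graph of blocks A through E with taken (T) and not-taken (NT) branch edges, alongside the frame built from it, in which a promoted branch is replaced by an ASSERT that it is taken]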
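One way to read the ASSERT in the example: the promoted branch no longer steers control inside the frame, it only checks that the expected direction still holds. The sketch below assumes a frame commits its results only if every assertion passes and is discarded otherwise. That all-or-nothing behavior, the `run_frame` name, and the register names are assumptions made for illustration.

```python
def run_frame(frame, regs):
    """Run `frame` on a copy of `regs`; commit only if every assertion holds."""
    working = dict(regs)   # speculative copy; `regs` is untouched until commit
    for kind, fn in frame:
        if kind == "op":
            fn(working)
        elif kind == "assert" and not fn(working):
            # Assertion fired: discard speculative state, keep the original registers.
            return False, regs
    return True, working


# Hypothetical loop body with its loop branch promoted to "assert i < n".
frame = [
    ("op", lambda r: r.update(i=r["i"] + 1)),
    ("assert", lambda r: r["i"] < r["n"]),
    ("op", lambda r: r.update(total=r["total"] + r["i"])),
]

print(run_frame(frame, {"i": 0, "n": 2, "total": 0}))
# (True, {'i': 1, 'n': 2, 'total': 1})
print(run_frame(frame, {"i": 1, "n": 2, "total": 1}))
# (False, {'i': 1, 'n': 2, 'total': 1})
```

When the assertion fires, execution would resume from the original, unoptimized code at the start of the frame.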

ROAR: Runtime Optimization ARchitecture

Unobtrusive Hardware

Hotspot Optimization

Larger Code Fragments

Single-Entry Multiple-Exit Traces

Simpler Optimizations

Persistent Code Storage

Off the Critical Execution Path

ROAR Architecture

Branch Behavior Buffer

Hotspot Detection Counter

Trace Generation Unit

Memory-based code cache

Linked through BTB

[Figure: block diagram of the pipeline (Fetch, Decode, ..., Execute, In-order Retire) with the I-Cache, Branch Predictor, BTB, Memory, Hot Spot Detector, and Trace Generation Unit]
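The slides name the Branch Behavior Buffer and the Hotspot Detection Counter without giving their details. One plausible arrangement, sketched here purely as an assumption, is that the buffer counts executions per branch, frequently executed branches become candidates, and a saturating counter moves toward zero while candidate branches dominate the retired stream. All thresholds and step sizes below are invented.

```python
class HotSpotDetector:
    """Toy hot spot detector; every parameter value here is an assumption."""

    def __init__(self, candidate_threshold=16, counter_max=64, dec=2, inc=1):
        self.candidate_threshold = candidate_threshold
        self.counter_max = counter_max
        self.dec = dec
        self.inc = inc
        self.exec_counts = {}        # stands in for the Branch Behavior Buffer
        self.counter = counter_max   # stands in for the Hotspot Detection Counter

    def retire_branch(self, pc):
        """Record one retired branch; return True when a hot spot is detected."""
        count = self.exec_counts.get(pc, 0) + 1
        self.exec_counts[pc] = count
        if count >= self.candidate_threshold:
            # Candidate branch: move the saturating counter toward zero.
            self.counter = max(0, self.counter - self.dec)
        else:
            # Non-candidate branch: move it back toward its maximum.
            self.counter = min(self.counter_max, self.counter + self.inc)
        return self.counter == 0


detector = HotSpotDetector(candidate_threshold=4, counter_max=8, dec=2, inc=1)
detected_at = None
for n in range(1, 101):
    pc = 0x10 if n % 2 else 0x20   # a tight loop alternating between two branches
    if detector.retire_branch(pc):
        detected_at = n
        break
print(detected_at)  # 10
```

With these made-up settings, the two loop branches become candidates on their fourth execution, and the counter reaches zero at the tenth retired branch.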

ROAR Example

[Figure: original code blocks (Call A, Call B, C, Call F, D, E, ret) and their code-cache (CC) counterparts (CC-call A, CC-callB, CC-call F, CC-D, CC-E, CC-ret, CC-jmp)]

ROAR Performance

Cost

0.005% of execution time spent writing code cache

Small (about 10KB) profiling structures in hardware

Benefits

45% fewer control-flow transfers

22% more Fetched Instructions Per Cycle (FIPC)

Up to 39% more FIPC when combined with trace cache and larger structures