Intermediate Representation in Compiler Design: High-Level to Low-Level Translation

These lecture slides introduce the concept of intermediate representation (IR) in compiler design, explain its importance, and describe the process of translating high-level IR to low-level IR. They cover several IR forms, such as three-address code and stack machine code, and give examples of translating expressions, array accesses, structure accesses, and control structures. They also touch on the translation of nested expressions and statements.


Uploaded on 09/17/2009

koofers-user-bhj
Intermediate Representation I
High-Level to Low-Level IR
Translation
EECS 483 – Lecture 17
University of Michigan
Monday, November 6, 2006

Where We Are...

Source code (character stream)
  → Lexical Analysis (regular expressions)
  → token stream
  → Syntax Analysis (grammars)
  → abstract syntax tree
  → Semantic Analysis (static semantics)
  → abstract syntax tree + symbol tables, types
  → Intermediate Code Gen
  → intermediate code


What Makes a Good IR?

Captures high-level language constructs
  » Easy to translate from AST
  » Supports high-level optimizations

Captures low-level machine features
  » Easy to translate to assembly
  » Supports machine-dependent optimizations

Narrow interface: small number of node types (instructions)
  » Easy to optimize
  » Easy to retarget


Multiple IRs

Most compilers use 2 IRs:
  » High-level IR (HIR): language independent, but closer to the language
  » Low-level IR (LIR): machine independent, but closer to the machine
  » A significant part of the compiler is both language and machine independent!

[Figure: C, C++, Fortran → AST → HIR → LIR → Pentium, Java bytecode, Itanium, TI C5x, ARM, with "optimize" steps marked along the pipeline]


Low-Level IR

A set of instructions that emulates an abstract machine (typically RISC)

Has low-level constructs
  » Unstructured jumps, registers, memory locations

Types of instructions
  » Arithmetic/logic (a = b OP c), unary operations, data movement (move, load, store), function call/return, branches


Alternatives for LIR

3 general alternatives
  » Three-address code or quadruples
      • a = b OP c
      • Advantage: Makes compiler analysis/optimization easier
  » Tree representation
      • Was popular for CISC architectures
      • Advantage: Easier to generate machine code
  » Stack machine
      • Like Java bytecode
      • Advantage: Easier to generate from AST


IR Instructions

Assignment instructions
  » a = b OP c (binary op)
      • arithmetic: ADD, SUB, MUL, DIV, MOD
      • logic: AND, OR, XOR
      • comparisons: EQ, NEQ, LT, GT, LEQ, GEQ
  » a = OP b (unary op)
      • arithmetic MINUS, logical NEG
  » a = b : copy instruction
  » a = [b] : load instruction
  » [a] = b : store instruction
  » a = addr b : symbolic address

Flow of control
  » label L : label instruction
  » jump L : unconditional jump
  » cjump a L : conditional jump

Function call
  » call f(a1, ..., an)
  » a = call f(a1, ..., an)

IR describes the instruction set of an abstract machine
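
To make the "abstract machine" reading concrete, here is a minimal Python interpreter for a small subset of these instructions (binary ops, copy, label, jump, cjump). The tuple encoding of instructions, the choice of 0/1 for comparison results, and the reading of cjump as "jump when the operand is nonzero" are all assumptions of this sketch; the slides only name the instructions.

```python
import operator

# Subset of the binary operators listed above; comparisons yield 0 or 1.
BINOPS = {
    "ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul,
    "LT": lambda x, y: int(x < y), "EQ": lambda x, y: int(x == y),
}

def run(program, env):
    """Interpret a list of IR tuples, updating and returning env."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}

    def val(x):
        # Operands are variable names (str) or integer constants.
        return env[x] if isinstance(x, str) else x

    pc = 0
    while pc < len(program):
        ins = program[pc]
        kind = ins[0]
        pc += 1
        if kind == "binop":                  # dst = a OP b
            _, dst, op, a, b = ins
            env[dst] = BINOPS[op](val(a), val(b))
        elif kind == "copy":                 # dst = src
            _, dst, src = ins
            env[dst] = val(src)
        elif kind == "jump":                 # jump L
            pc = labels[ins[1]]
        elif kind == "cjump":                # cjump a L (assumed: jump if a != 0)
            if val(ins[1]):
                pc = labels[ins[2]]
        elif kind != "label":                # labels are no-ops at run time
            raise ValueError(f"unknown instruction: {ins!r}")
    return env

# s = 1 + 2 + ... + n, written with unstructured jumps only.
program = [
    ("copy", "s", 0),
    ("copy", "i", 1),
    ("label", "Ltest"),
    ("binop", "t", "LT", "n", "i"),   # t = (n < i)
    ("cjump", "t", "Lend"),
    ("binop", "s", "ADD", "s", "i"),
    ("binop", "i", "ADD", "i", 1),
    ("jump", "Ltest"),
    ("label", "Lend"),
]
result = run(program, {"n": 4})
print(result["s"])
```

Loads, stores, addr, and calls are left out to keep the sketch short; they would need a memory model and a call stack.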


IR Operands

The operands in 3-address code can be:
  » Program variables
  » Constants or literals
  » Temporary variables

Temporary variables = new locations
  » Used to store intermediate values
  » Needed because 3-address code is not as expressive as high-level languages
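
As an illustration of why temporaries are needed, the following Python sketch flattens a nested expression into three-address instructions, allocating a fresh temporary for every intermediate value. The tuple AST encoding and the names flatten and make_temp_source are hypothetical, chosen for this example only.

```python
import itertools

def make_temp_source():
    """Return a function that yields a fresh temporary name on each call."""
    counter = itertools.count(1)
    return lambda: f"t{next(counter)}"

def flatten(node, new_temp, code):
    """Append three-address instructions for node to code.

    Returns the operand (variable, constant, or temporary) holding the value.
    """
    if isinstance(node, (str, int)):
        return node                      # already a valid operand
    op, left, right = node
    a = flatten(left, new_temp, code)    # recursively translate subexpressions
    b = flatten(right, new_temp, code)
    t = new_temp()                       # new location for the intermediate value
    code.append(f"{t} = {a} {op} {b}")
    return t

# (a * b) + (c - 1)
code = []
result = flatten(("ADD", ("MUL", "a", "b"), ("SUB", "c", 1)), make_temp_source(), code)
print("\n".join(code))
```

The single high-level expression becomes three instructions, each with at most one operator, and the temporaries t1 and t2 carry the values that the nesting expressed implicitly.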


Translating High IR to Low IR

May have nested language constructs
  » E.g., while nested within an if statement

Need an algorithmic way to translate
  » Strategy for each high IR construct
  » High IR construct → sequence of low IR instructions

Solution
  » Start from the high IR (AST-like) representation
  » Define translation for each node in high IR
  » Recursively translate nodes


Notation

Use the following notation:
  » [[e]] = the low IR representation of high IR construct e

[[e]] is a sequence of low IR instructions

If e is an expression (or statement expression), it represents a value
  » Denoted as: t = [[e]]
  » Low IR representation of e whose result value is stored in t

For variable v: t = [[v]] is the copy instruction
  » t = v


Translating Array Accesses

Array access: t = [[ v[e] ]]
  » (type of v is array [T] and S = size of T)

t1 = addr v
t2 = [[e]]
t3 = t2 * S
t4 = t1 + t3
t = [t4]        /* i.e., load */

[Figure: AST node "array" with children v and e]
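
The array-access scheme can be sketched in Python as a function that emits the five instructions for a concrete array and element size. For brevity the index is restricted to a simple variable or constant, so t2 = [[e]] reduces to a copy; the function names and the string form of the instructions are assumptions of this sketch.

```python
def element_address(base, index, elem_size):
    """Address of v[index] when v starts at base and each element is elem_size bytes."""
    return base + index * elem_size

def translate_array_access(v, index, elem_size, target="t"):
    """Return low IR for target = [[ v[index] ]] (index is a variable or constant)."""
    return [
        f"t1 = addr {v}",            # base address of the array
        f"t2 = {index}",             # t2 = [[e]]; a copy, since e is a simple operand
        f"t3 = t2 * {elem_size}",    # scale the index by the element size S
        "t4 = t1 + t3",              # address of the element
        f"{target} = [t4]",          # load
    ]

array_code = translate_array_access("a", "i", 4)
print("\n".join(array_code))
print(element_address(1000, 3, 4))   # a at address 1000, 4-byte elements, index 3
```

With 4-byte elements and a base address of 1000, element 3 sits at 1000 + 3 * 4 = 1012, which is the value t4 would hold at run time.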


Translating Structure Accesses

Structure access: t = [[ v.f ]]
  » (v is of type T, S = offset of f in T)

t1 = addr v
t2 = t1 + S
t = [t2]        /* i.e., load */

[Figure: AST node "struct" with children v and f]
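
The offset S is a compile-time constant fixed by the structure's layout. As a rough illustration, the Python sketch below uses the standard ctypes module to obtain field offsets for a hypothetical Point structure and plugs them into the three-instruction scheme; the structure, the function name, and the string form of the instructions are assumptions made for this example.

```python
import ctypes

class Point(ctypes.Structure):
    # Hypothetical struct { int32 x; int32 y; } used only for this example.
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]

def translate_field_access(v, struct_type, field, target="t"):
    """Return low IR for target = [[ v.field ]], with S taken from the struct layout."""
    offset = getattr(struct_type, field).offset   # S = offset of the field in T
    return [
        f"t1 = addr {v}",            # base address of the structure
        f"t2 = t1 + {offset}",       # address of the field
        f"{target} = [t2]",          # load
    ]

field_code = translate_field_access("p", Point, "y")
print("\n".join(field_code))
```

Unlike the array case, no multiplication is needed: the offset is known once the type is known, so the address computation is a single addition.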


Class Problem

Short-circuit AND: t = [[e1 SC-AND e2]]
  » e.g., && operator in C/C++

Semantics:
  1. Evaluate e1
  2. If e1 is true, then evaluate e2
  3. Else done
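
One possible answer to this problem, sketched in Python under stated assumptions: cjump is taken to jump when its operand is true, both subexpressions leave their result in t, and the label names are invented here. The slides do not give a solution in this excerpt, so this is an illustration and not the lecture's official answer.

```python
def translate_sc_and(e1_code, e2_code, n):
    """Return low IR for t = [[e1 SC-AND e2]].

    e1_code and e2_code are instruction lists that leave their result in t;
    n is a number used to make the two labels unique.
    """
    l_rhs, l_end = f"Lrhs{n}", f"Lend{n}"
    return (
        e1_code                      # t = [[e1]]
        + [f"cjump t {l_rhs}",       # e1 true: go on to evaluate e2
           f"jump {l_end}",          # e1 false: t already holds the false result
           f"label {l_rhs}"]
        + e2_code                    # t = [[e2]]
        + [f"label {l_end}"]
    )

sc_code = translate_sc_and(["t = a"], ["t = b"], 1)
print("\n".join(sc_code))
```

When e1 is false the code for e2 is skipped entirely, which is what distinguishes the short-circuit operator from a plain AND instruction applied to two already-computed values.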

Translating Statements

Statement sequence: [[s1; s2; ...; sN]]

IR instructions of a statement sequence = concatenation of IR instructions of statements

[[ s1 ]]
[[ s2 ]]
...
[[ sN ]]

[Figure: AST node "seq" with children s1, s2, ..., sN]
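
The concatenation rule translates directly into code. In the Python sketch below, statements are hypothetical ("assign", dst, src) tuples translated to copy instructions, and a nested ("seq", ...) node is handled by recursion; both encodings are assumptions chosen to keep the example small.

```python
def translate_stmt(stmt):
    """Translate one statement to a list of low IR instructions."""
    kind = stmt[0]
    if kind == "assign":                 # ("assign", dst, src) becomes a copy
        _, dst, src = stmt
        return [f"{dst} = {src}"]
    if kind == "seq":                    # ("seq", s1, ..., sN)
        return translate_seq(stmt[1:])
    raise ValueError(f"unknown statement: {stmt!r}")

def translate_seq(statements):
    """[[s1; ...; sN]] = [[s1]] followed by [[s2]] ... followed by [[sN]]."""
    code = []
    for s in statements:
        code.extend(translate_stmt(s))   # concatenate, preserving order
    return code

seq_code = translate_seq([
    ("assign", "x", "1"),
    ("seq", ("assign", "y", "x"), ("assign", "z", "y")),
])
print("\n".join(seq_code))
```

Because a sequence is itself a statement, the nested seq node is flattened into the same instruction list without any special handling.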