Compiler Design: Intermediate Representations - Types, Abstraction Levels, and Examples, Papers of Computer Science

An overview of intermediate representations (IRs) in compiler design. IRs are produced by the front end, transformed by the middle end, and turned into native code by the back end. It covers the importance of IR design, the properties of IRs, and the major categories of IRs: structural, linear, and hybrid. Examples of IRs include abstract syntax trees (ASTs), directed acyclic graphs (DAGs), three address code, quadruples, and triples. The level of abstraction of an IR influences the profitability and feasibility of optimizations.

Typology: Papers

Pre 2010

Uploaded on 07/29/2009

koofers-user-1ew



Intermediate Representations

CS430 2

Intermediate Representations (EaC Chapter 5)

  • Front end - produces an intermediate representation (IR)
  • Middle end - transforms the IR into an equivalent IR that runs more efficiently
  • Back end - transforms the IR into native code
  • IR encodes the compiler’s knowledge of the program
  • Middle end usually consists of several passes

Figure: Source Code → Front End → IR → Middle End → IR → Back End → Target Code
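A middle end built from several passes can be sketched as a list of functions, each mapping an IR to an equivalent IR. This is a minimal sketch, assuming a hypothetical straight-line three address IR of `(dest, op, arg1, arg2)` tuples with `"="` marking a copy; `fold_constants` and `run_middle_end` are illustrative names, not part of the course material.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

Operand = Union[int, str]                            # integer literal or variable name
Instr = Tuple[str, str, Operand, Optional[Operand]]  # (dest, op, arg1, arg2)
IR = List[Instr]
Pass = Callable[[IR], IR]

ARITH = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def fold_constants(ir: IR) -> IR:
    """One middle-end pass: evaluate operations whose operands are known constants."""
    known: Dict[str, int] = {}  # names currently holding a known constant
    out: IR = []
    for dest, op, a, b in ir:
        # Substitute operands whose values are already known.
        if isinstance(a, str):
            a = known.get(a, a)
        if isinstance(b, str):
            b = known.get(b, b)
        if op == "=" and isinstance(a, int):
            known[dest] = a
            out.append((dest, "=", a, None))
        elif op in ARITH and isinstance(a, int) and isinstance(b, int):
            known[dest] = ARITH[op](a, b)
            out.append((dest, "=", known[dest], None))
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            out.append((dest, op, a, b))
    return out


def run_middle_end(ir: IR, passes: List[Pass]) -> IR:
    """Apply each pass in order; every pass maps an IR to an equivalent IR."""
    for p in passes:
        ir = p(ir)
    return ir


ir = [("t1", "=", 4, None), ("t2", "*", "t1", 2), ("t3", "-", "x", "t2")]
print(run_middle_end(ir, [fold_constants]))
# [('t1', '=', 4, None), ('t2', '=', 8, None), ('t3', '-', 'x', 8)]
```

Because every pass has the same type, a pass can be added, removed, or repeated by editing the list alone.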


Intermediate Representations

  • Decisions in IR design affect the speed and efficiency of the compiler
  • Some important IR properties

→ Ease of generation → Ease of manipulation → Procedure size → Freedom of expression → Level of abstraction

  • The importance of different properties varies between compilers → Selecting an appropriate IR for a compiler is critical


Types of Intermediate Representations

Three major categories

  • Structural → Graphically oriented → Heavily used in source-to-source translators → Tend to be large → Examples: trees, DAGs
  • Linear → Pseudo-code for an abstract machine → Level of abstraction varies → Simple, compact data structures → Easier to rearrange → Examples: 3 address code, stack machine code
  • Hybrid → Combination of graphs and linear code → Example: control-flow graph
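A control-flow graph, the hybrid example named above, can be built from linear code by splitting it into basic blocks and linking the blocks with edges. A minimal sketch, assuming a hypothetical `Instr` record in which branch targets and fall-through are stated explicitly, and treating every labeled instruction as a possible branch target:

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Instr:
    text: str
    label: Optional[str] = None      # label defined at this instruction
    targets: Tuple[str, ...] = ()    # labels this instruction may branch to
    falls_through: bool = True       # False for jumps and two-way branches


def build_cfg(code: List[Instr]) -> Tuple[List[List[Instr]], Set[Tuple[int, int]]]:
    """Split non-empty linear code into basic blocks and connect them with edges."""
    # Leaders: the first instruction, every labeled instruction,
    # and every instruction that follows a branch or jump.
    leaders = {0}
    for i, ins in enumerate(code):
        if ins.label is not None:
            leaders.add(i)
        if (ins.targets or not ins.falls_through) and i + 1 < len(code):
            leaders.add(i + 1)
    starts = sorted(leaders)
    blocks = [code[s:e] for s, e in zip(starts, starts[1:] + [len(code)])]

    # Every labeled instruction is a leader, so each label starts exactly one block.
    block_of = {b[0].label: n for n, b in enumerate(blocks) if b[0].label is not None}
    edges: Set[Tuple[int, int]] = set()
    for n, block in enumerate(blocks):
        last = block[-1]
        for t in last.targets:
            edges.add((n, block_of[t]))
        if last.falls_through and n + 1 < len(blocks):
            edges.add((n, n + 1))
    return blocks, edges


code = [
    Instr("i ← 0"),
    Instr("cbr i < n → L1, L2", label="L0", targets=("L1", "L2"), falls_through=False),
    Instr("i ← i + 1", label="L1"),
    Instr("jump → L0", targets=("L0",), falls_through=False),
    Instr("return i", label="L2"),
]
blocks, edges = build_cfg(code)
print([[ins.text for ins in b] for b in blocks])
print(sorted(edges))  # [(0, 1), (1, 2), (1, 3), (2, 1)]
```

The blocks stay linear code while the edges form the graph, which is what makes the result a hybrid IR.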


Level of Abstraction

  • The level of detail exposed in an IR influences the profitability and feasibility of different optimizations.
  • Two different representations of an array reference:

High level AST (good for memory disambiguation): a single subscript node with children A, i, j

Low level linear code (good for address calculation):

loadI 1 => r1
sub rj, r1 => r2
loadI 10 => r3
mult r2, r3 => r4
sub ri, r1 => r5
add r4, r5 => r6
loadI @A => r7
add r7, r6 => r8
load r8 => rAij
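The low level sequence for the array reference computes @A + (j - 1) * 10 + (i - 1), which suggests 1-based subscripts, 10 elements per column (column-major order), and unit-size elements; these are inferences from the code, not statements on the slide. The sketch below, using hypothetical helpers `lower_subscript` and `run_iloc`, emits the same nine instructions from the high level subscript form and runs them on a toy machine to check the address:

```python
from typing import Dict, List

ARITH = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mult": lambda a, b: a * b}


def lower_subscript(array: str, i: str, j: str, rows: int = 10) -> List[str]:
    """Expand subscript(array, i, j) into ILOC, assuming 1-based subscripts,
    column-major layout with `rows` elements per column, and unit-size elements."""
    return [
        "loadI 1 => r1",
        f"sub r{j}, r1 => r2",        # j - 1: whole columns to skip
        f"loadI {rows} => r3",
        "mult r2, r3 => r4",          # (j - 1) * rows
        f"sub r{i}, r1 => r5",        # i - 1: offset within the column
        "add r4, r5 => r6",
        f"loadI @{array} => r7",      # base address of the array
        "add r7, r6 => r8",           # address of the element
        f"load r8 => r{array}{i}{j}",
    ]


def run_iloc(code: List[str], regs: Dict[str, int], memory: Dict[int, int],
             symbols: Dict[str, int]) -> Dict[str, int]:
    """Execute the small ILOC subset used above on a toy register machine."""
    for instr in code:
        op, rest = instr.split(" ", 1)
        sources, dest = (part.strip() for part in rest.split("=>"))
        args = [a.strip() for a in sources.split(",")]
        if op == "loadI":
            value = args[0]
            regs[dest] = symbols[value[1:]] if value.startswith("@") else int(value)
        elif op == "load":
            regs[dest] = memory[regs[args[0]]]
        else:
            regs[dest] = ARITH[op](regs[args[0]], regs[args[1]])
    return regs


code = lower_subscript("A", "i", "j")
regs = run_iloc(code, {"ri": 3, "rj": 2}, memory={112: 42}, symbols={"A": 100})
print(regs["r8"], regs["rAij"])  # 112 42
```

The one subscript node carries the names A, i, and j that a dependence test needs, while the expanded form exposes each multiply and add to optimizations such as common subexpression elimination.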


Level of Abstraction

  • Structural IRs are usually considered high-level
  • Linear IRs are usually considered low-level
  • Not necessarily true:

Figure: Low level AST, a load node over the explicit address arithmetic (+, *, and - nodes with leaves @A, 10, the subscripts, and 1)

High level linear code: loadArray A,i,j


Abstract Syntax Tree

An abstract syntax tree is the procedure's parse tree with the nodes for most non-terminal symbols removed

x - 2 * y

Figure: AST for x - 2 * y; the root - has children x and *, and * has children 2 and y

  • Can use linearized form of the tree → Easier to manipulate than pointers → x 2 y * - in postfix form → - x * 2 y in prefix form
  • S-expressions are (essentially) ASTs
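The postfix form, the prefix form, and the S-expression view all come from walking the same tree in a different order. A minimal sketch, assuming a hypothetical tuple encoding in which a leaf is a string and an interior node is `(operator, left, right)`:

```python
from typing import Tuple, Union

# A leaf is a name or constant; an interior node is (operator, left, right).
Node = Union[str, Tuple[str, "Node", "Node"]]


def postfix(n: Node) -> str:
    """Post-order walk: both operands, then the operator."""
    if isinstance(n, str):
        return n
    op, left, right = n
    return f"{postfix(left)} {postfix(right)} {op}"


def prefix(n: Node) -> str:
    """Pre-order walk: the operator, then both operands."""
    if isinstance(n, str):
        return n
    op, left, right = n
    return f"{op} {prefix(left)} {prefix(right)}"


def sexpr(n: Node) -> str:
    """Fully parenthesized prefix form, i.e. an S-expression."""
    if isinstance(n, str):
        return n
    op, left, right = n
    return f"({op} {sexpr(left)} {sexpr(right)})"


ast = ("-", "x", ("*", "2", "y"))  # x - 2 * y
print(postfix(ast))  # x 2 y * -
print(prefix(ast))   # - x * 2 y
print(sexpr(ast))    # (- x (* 2 y))
```

Because every operator here takes exactly two operands, the postfix and prefix strings can be decoded back into the tree without parentheses.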


Directed Acyclic Graph

A directed acyclic graph (DAG) is an AST with a unique node for each value

  • Makes sharing explicit
  • Encodes redundancy

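z ← x - 2 * y
w ← x / 2

Figure: one DAG for both statements; the subtraction (result z) and the division (result w) share the nodes for x and 2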

Same expression twice means that the compiler might arrange to evaluate it just once!
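One common way to give each value a unique node is hash-consing: before creating a node, look up its (label, children) key and reuse the existing node if there is one. A minimal sketch, assuming a hypothetical tuple encoding of the input trees and assuming x and y are not reassigned between the two statements (otherwise equal names would not denote equal values):

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple[str, "Tree", "Tree"]]  # leaf name or (operator, left, right)


class DagBuilder:
    """Builds a DAG in which structurally identical subtrees share one node."""

    def __init__(self) -> None:
        self.nodes: List[tuple] = []         # node id -> (label, child ids...)
        self.index: Dict[tuple, int] = {}    # (label, child ids...) -> node id

    def node(self, label: str, *children: int) -> int:
        key = (label, *children)
        if key not in self.index:            # create each distinct node only once
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def build(self, tree: Tree) -> int:
        if isinstance(tree, str):
            return self.node(tree)
        op, left, right = tree
        return self.node(op, self.build(left), self.build(right))


dag = DagBuilder()
z = dag.build(("-", "x", ("*", "2", "y")))   # z ← x - 2 * y
w = dag.build(("/", "x", "2"))               # w ← x / 2
print(len(dag.nodes))    # 6: x, 2, y, *, -, / (two separate ASTs would need 8)
print(dag.nodes[w])      # ('/', 0, 1): reuses the nodes for x and 2
```

Building x - 2 * y a second time returns the id of the existing node, which is how the compiler can notice that the expression needs to be evaluated only once.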


Stack Machine Code

Originally used for stack-based computers, now Java

  • Example: x - 2 * y becomes

push x
push 2
push y
multiply
subtract

Advantages

  • Compact form
  • Introduced names are implicit, not explicit
  • Simple to generate and execute code

Useful where code is transmitted over slow communication links (the net)

Implicit names take up no space, where explicit ones do!
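Stack code for an expression is a post-order walk of its tree, and executing it needs nothing but an operand stack. A minimal sketch, assuming a hypothetical tuple tree encoding and the slide's mnemonics `push`, `multiply`, and `subtract` (plus an assumed `add`):

```python
import operator
from typing import Dict, List, Optional, Tuple, Union

Tree = Union[str, Tuple[str, "Tree", "Tree"]]   # leaf name or (operator, left, right)
MNEMONIC = {"+": "add", "-": "subtract", "*": "multiply"}
APPLY = {"add": operator.add, "subtract": operator.sub, "multiply": operator.mul}


def gen_stack(tree: Tree, out: Optional[List[str]] = None) -> List[str]:
    """Emit stack code: operands first, then the operator (post-order)."""
    out = [] if out is None else out
    if isinstance(tree, str):
        out.append(f"push {tree}")
    else:
        op, left, right = tree
        gen_stack(left, out)
        gen_stack(right, out)
        out.append(MNEMONIC[op])  # operands are implicit: the top two stack slots
    return out


def run_stack(code: List[str], env: Dict[str, int]) -> int:
    """Execute stack code; names are looked up in env, digit strings are literals."""
    stack: List[int] = []
    for instr in code:
        if instr.startswith("push "):
            operand = instr[len("push "):]
            stack.append(int(operand) if operand.isdigit() else env[operand])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[instr](left, right))
    return stack.pop()


code = gen_stack(("-", "x", ("*", "2", "y")))
print(code)                                # ['push x', 'push 2', 'push y', 'multiply', 'subtract']
print(run_stack(code, {"x": 10, "y": 3}))  # 4
```

`multiply` and `subtract` carry no operand fields at all; their operands and results live in unnamed stack slots.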


Three Address Code

Several different representations of three address code

  • In general, three address code has statements of the form: x ← y op z, with 1 operator (op) and, at most, 3 names (x, y, z)

Example: z ← x - 2 * y becomes

t ← 2 * y
z ← x - t

Advantages:

  • Resembles many machines
  • Introduces a new set of names
  • Compact form
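Generating three address code from the tree is also a post-order walk, but each interior node now receives an explicit name. A minimal sketch, assuming a hypothetical tuple tree encoding and compiler temporaries named t1, t2, and so on:

```python
import itertools
from typing import List, Optional, Tuple, Union

Tree = Union[str, Tuple[str, "Tree", "Tree"]]   # leaf name or (operator, left, right)


def to_three_address(tree: Tree, target: str) -> List[str]:
    """Return statements of the form x ← y op z that compute tree into target."""
    code: List[str] = []
    temps = itertools.count(1)

    def gen(node: Tree, dest: Optional[str] = None) -> str:
        if isinstance(node, str):
            if dest is None:
                return node             # a name or constant needs no statement
            code.append(f"{dest} ← {node}")
            return dest
        op, left, right = node
        y = gen(left)
        z = gen(right)
        x = dest if dest is not None else f"t{next(temps)}"   # fresh temporary
        code.append(f"{x} ← {y} {op} {z}")
        return x

    gen(tree, target)
    return code


for stmt in to_three_address(("-", "x", ("*", "2", "y")), "z"):
    print(stmt)
# t1 ← 2 * y
# z ← x - t1
```

Passing the target name down to the root avoids a trailing copy such as z ← t2.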


Three Address Code: Quadruples

Naïve representation of three address code

  • Table of k * 4 small integers
  • Simple record structure
  • Easy to reorder
  • Explicit names

RISC assembly code:

load r1, y
loadI r2, 2
mult r3, r2, r1
load r4, x
sub r5, r4, r3

Quadruples (op, result, arg1, arg2):

load 1 Y
loadi 2 2
mult 3 2 1
load 4 X
sub 5 4 3

The original FORTRAN compiler used “quads”
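A quadruple is a fixed-size record, so the table maps directly onto a list of records. A minimal sketch, assuming the field order (op, result, arg1, arg2) read off the table and a hypothetical `run_quads` interpreter; because every result is named explicitly, the two independent loads can be swapped without editing any other quad:

```python
from typing import Dict, List, NamedTuple, Optional, Union


class Quad(NamedTuple):
    op: str
    result: int                  # virtual register that receives the value
    arg1: Union[int, str]        # register number, immediate, or memory name
    arg2: Optional[int] = None   # second register operand, if any


def run_quads(quads: List[Quad], memory: Dict[str, int]) -> Dict[int, int]:
    """Execute the quads and return the final register contents."""
    regs: Dict[int, int] = {}
    for q in quads:
        if q.op == "load":
            regs[q.result] = memory[q.arg1]        # arg1 names a memory location
        elif q.op == "loadi":
            regs[q.result] = q.arg1                # arg1 is an immediate value
        elif q.op == "mult":
            regs[q.result] = regs[q.arg1] * regs[q.arg2]
        elif q.op == "sub":
            regs[q.result] = regs[q.arg1] - regs[q.arg2]
        else:
            raise ValueError(f"unknown op {q.op}")
    return regs


quads = [Quad("load", 1, "Y"), Quad("loadi", 2, 2), Quad("mult", 3, 2, 1),
         Quad("load", 4, "X"), Quad("sub", 5, 4, 3)]
memory = {"X": 10, "Y": 3}
print(run_quads(quads, memory)[5])          # 4, i.e. x - 2 * y

# Move "load 4 X" to the front: no other quad needs to change.
reordered = [quads[3]] + quads[:3] + quads[4:]
print(run_quads(reordered, memory)[5])      # 4
```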


Three Address Code: Triples

  • Index used as implicit name
  • 25% less space consumed than quads
  • Much harder to reorder

(1) load y
(2) loadI 2
(3) mult (1) (2)
(4) load x
(5) sub (4) (3)

Implicit names take no space!
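In a triple, the instruction's position is its name. That is where the space saving comes from, and also why reordering is hard: moving one triple changes the index that later triples use to refer to it. A minimal sketch, assuming hypothetical `Ref`, `Triple`, `run_triples`, and `move` definitions:

```python
from typing import Dict, List, NamedTuple, Optional, Union


class Ref(NamedTuple):
    pos: int                     # 1-based position of the triple whose value is used


Operand = Union[Ref, int, str]   # reference, immediate, or memory name


class Triple(NamedTuple):
    op: str
    arg1: Operand
    arg2: Optional[Operand] = None


def run_triples(triples: List[Triple], memory: Dict[str, int]) -> int:
    """Execute the triples; each value is stored under its triple's position."""
    values: Dict[int, int] = {}

    def val(a: Operand) -> int:
        return values[a.pos] if isinstance(a, Ref) else a

    for n, t in enumerate(triples, start=1):
        if t.op == "load":
            values[n] = memory[t.arg1]
        elif t.op == "loadI":
            values[n] = t.arg1
        elif t.op == "mult":
            values[n] = val(t.arg1) * val(t.arg2)
        elif t.op == "sub":
            values[n] = val(t.arg1) - val(t.arg2)
        else:
            raise ValueError(f"unknown op {t.op}")
    return values[len(triples)]


def move(triples: List[Triple], src: int, dst: int) -> List[Triple]:
    """Move triple src to position dst (1-based) and renumber every reference."""
    order = list(range(1, len(triples) + 1))   # order[k - 1] = old position now at k
    order.insert(dst - 1, order.pop(src - 1))
    new_pos = {old: new for new, old in enumerate(order, start=1)}

    def fix(a: Optional[Operand]) -> Optional[Operand]:
        return Ref(new_pos[a.pos]) if isinstance(a, Ref) else a

    return [Triple(triples[old - 1].op, fix(triples[old - 1].arg1), fix(triples[old - 1].arg2))
            for old in order]


triples = [Triple("load", "y"), Triple("loadI", 2), Triple("mult", Ref(1), Ref(2)),
           Triple("load", "x"), Triple("sub", Ref(4), Ref(3))]
memory = {"x": 10, "y": 3}
moved = move(triples, 4, 1)                # hoist "load x" to the front
print(moved[3], moved[4])                  # the mult and sub now refer to new positions
print(run_triples(triples, memory), run_triples(moved, memory))   # 4 4
```

Hoisting one load rewrote the operands of two other triples, which is exactly the bookkeeping the quadruple version avoided.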


The Rest of the Story…

Representing the code is only part of an IR

There are other necessary components

  • Symbol table (already discussed)
  • Constant table → Representation, type → Storage class, offset
  • Storage map → Overall storage layout → Overlap information → Virtual register assignments
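The constant table can be sketched as a map from each distinct constant to its representation, type, storage class, and offset. This is a minimal sketch under stated assumptions: the type names, the 4- and 8-byte sizes, size-based alignment, and the single `"static"` storage class are hypothetical choices, not taken from the slides:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

SIZES = {"int": 4, "float": 8}   # assumed sizes in bytes; real targets differ


@dataclass
class ConstEntry:
    value: Union[int, float]     # representation of the constant
    type_name: str               # "int" or "float"
    storage_class: str           # always "static" in this sketch (an assumption)
    offset: int                  # byte offset within the constant area


class ConstantTable:
    def __init__(self) -> None:
        self.entries: List[ConstEntry] = []
        self._index: Dict[Tuple[str, Union[int, float]], int] = {}
        self._next = 0           # first free byte in the constant area

    def intern(self, value: Union[int, float]) -> int:
        """Return the entry number for value, adding it on first use."""
        type_name = "float" if isinstance(value, float) else "int"
        key = (type_name, value)  # include the type so 1 and 1.0 stay distinct
        if key not in self._index:
            size = SIZES[type_name]
            offset = -(-self._next // size) * size   # round up to a multiple of size
            self.entries.append(ConstEntry(value, type_name, "static", offset))
            self._index[key] = len(self.entries) - 1
            self._next = offset + size
        return self._index[key]


table = ConstantTable()
print(table.intern(10), table.intern(2.5), table.intern(10))   # 0 1 0
print([e.offset for e in table.entries])                       # [0, 8]
```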


Virtual Machines

  • Can interpret IR using virtual machine
  • Examples → P-code for Pascal → PostScript for display devices → Java byte code for everywhere
  • Result → easy & portable → much slower
  • Just-in-time compilation (JIT) → begin interpreting IR → find performance critical section(s) → compile section(s) to native code → ...or just compile entire program → compilation time becomes execution time
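The interpret-then-compile strategy of a JIT can be sketched with stack code like that on the stack machine slide. This is a toy under stated assumptions: `JitFunction`, the call-count threshold of 2, and `compile_stack_code` are hypothetical, and "native code" is stood in for by a Python function generated with `eval`, so only the control structure is illustrated, not the speedup:

```python
import operator
from typing import Callable, Dict, List, Optional

APPLY = {"add": operator.add, "subtract": operator.sub, "multiply": operator.mul}
SYMBOL = {"add": "+", "subtract": "-", "multiply": "*"}


def interpret(code: List[str], env: Dict[str, int]) -> int:
    """Slow path: execute stack code one instruction at a time."""
    stack: List[int] = []
    for instr in code:
        if instr.startswith("push "):
            operand = instr[len("push "):]
            stack.append(int(operand) if operand.isdigit() else env[operand])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[instr](left, right))
    return stack.pop()


def compile_stack_code(code: List[str]) -> Callable[[Dict[str, int]], int]:
    """Translate stack code into one Python expression (stand-in for native code).
    Only use on trusted IR, since the result is passed to eval."""
    stack: List[str] = []
    for instr in code:
        if instr.startswith("push "):
            operand = instr[len("push "):]
            stack.append(operand if operand.isdigit() else f"env[{operand!r}]")
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {SYMBOL[instr]} {right})")
    return eval(f"lambda env: {stack.pop()}")


class JitFunction:
    """Interpret until a call-count threshold is reached, then use compiled code."""

    def __init__(self, code: List[str], threshold: int = 2) -> None:
        self.code = code
        self.threshold = threshold
        self.calls = 0
        self.compiled: Optional[Callable[[Dict[str, int]], int]] = None

    def __call__(self, env: Dict[str, int]) -> int:
        if self.compiled is not None:
            return self.compiled(env)      # fast path after compilation
        self.calls += 1
        if self.calls >= self.threshold:   # this code is "hot": compile it
            self.compiled = compile_stack_code(self.code)
        return interpret(self.code, env)


f = JitFunction(["push x", "push 2", "push y", "multiply", "subtract"])
results = [f({"x": 10, "y": 3}) for _ in range(3)]
print(results, f.compiled is not None)    # [4, 4, 4] True
```

Setting the threshold to 1 compiles on the first call, which corresponds to the "just compile entire program" option, where compilation time becomes execution time.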


Java Virtual Machine (JVM)

The JVM consists of four parts:

  • Memory → Stack (for function call frames) → Heap (for dynamically allocated memory) → Constant pool (shared constant data) → Code segment (instructions of class files)
  • Registers → Stack pointer (SP), local stack pointer (LSP), program counter (PC)
  • Condition codes → Store the result of the last conditional instruction
  • Execution unit
    1. Reads the current JVM instruction
    2. Changes the state of the virtual machine
    3. Increments the PC (or modifies it on a call or branch)
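The execution unit's three steps form a fetch-execute loop. A minimal sketch of that loop for a toy stack machine, not real JVM bytecode: the instruction set (`push`, `load`, `store`, `add`, `sub`, `testz`, `jumpcc`, `jump`, `halt`) is hypothetical, with `testz` setting a condition code that `jumpcc` reads. The sample program sums n + (n - 1) + ... + 1:

```python
from typing import Dict, List


def run(program: List[tuple], variables: Dict[str, int]) -> Dict[str, int]:
    """Fetch-execute loop for a toy stack machine with a PC and one condition code."""
    stack: List[int] = []
    pc = 0              # program counter
    cc = False          # condition code: result of the last test
    while True:
        op, *args = program[pc]        # 1. read the current instruction
        next_pc = pc + 1               # 3. by default, advance to the next instruction
        # 2. change the state of the machine
        if op == "push":
            stack.append(args[0])
        elif op == "load":
            stack.append(variables[args[0]])
        elif op == "store":
            variables[args[0]] = stack.pop()
        elif op in ("add", "sub"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "add" else left - right)
        elif op == "testz":
            cc = stack.pop() == 0
        elif op == "jumpcc":
            if cc:
                next_pc = args[0]      # a taken branch modifies the PC
        elif op == "jump":
            next_pc = args[0]
        elif op == "halt":
            return variables
        else:
            raise ValueError(f"unknown opcode {op}")
        pc = next_pc


sum_to_n = [
    ("push", 0), ("store", "sum"),                                   # 0-1: sum ← 0
    ("load", "n"), ("testz",),                                       # 2-3: cc ← (n == 0)
    ("jumpcc", 14),                                                  # 4: leave the loop
    ("load", "sum"), ("load", "n"), ("add",), ("store", "sum"),      # 5-8: sum ← sum + n
    ("load", "n"), ("push", 1), ("sub",), ("store", "n"),            # 9-12: n ← n - 1
    ("jump", 2),                                                     # 13: back to the test
    ("halt",),                                                       # 14
]
print(run(sum_to_n, {"n": 4})["sum"])   # 10
```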


Java Byte Codes


Java Byte Code Interpreter