Compiler Design, Lecture Slides - Computer Science, Slides of Computers and Information technologies

Processor Design compiler design importance of processors requirements compiler writer's platform c as code generator LLVM as code generator go as code generator memory management stacks and heaps garbage collection in C tracing reference counting sandboxing types of compiler ideal compiling AOT compiler requirements consistency thread thread stacks dynamic loading processes and thread web programs

Typology: Slides

2010/2011

Uploaded on 09/06/2011

Compiler Writer's Block

Ian Holyer

Processor Design v. Compiler Writing

Hardware and software people need to talk to each other

The history of hardware & compiler design includes a lot of cooperation of this kind

This is a part-time compiler writer's perspective, giving a wish list for the design of the hardware/software boundary for the future

Let's stick with modern, high level, general purpose, object oriented languages

Most of them are interpreted or semi-interpreted, a sure sign of a mismatch that has grown up

Essentials of Platforms

What is the most important feature of an operating system, from the compiler writing point of view?

Answer: the ability to ignore it completely!

Strategy 1: separate library code for every platform

Reject!

Strategy 2: find existing platform independent libraries

Accept!

Requirements

Compiler writing requires platform independent or at least portable approaches to all these issues

code generation for all common processors

automatic memory management

sandbox (type safety and security checking)

threads (pre-emptive)

dynamic loading of code

standard libraries (e.g. graphics/networking)

Each ought to be covered in a platform independent way by a suitable systems programming language

The C language is old/bad in every one of these areas

C as a Code Generator

C is the obvious systems programming language, providing reasonable platform independence for generating code (it is a "portable assembly language")

But C has the following missing features (according to the C-- project) if you want to generate really good code

multiple results in registers

tail call optimisation

jumps to dynamically determined addresses

efficient exception handling

We won't investigate further, because C also fails the other requirements. By the way, C-- is too immature to use yet (it doesn't cover enough processors)
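To make the tail call item in the list above concrete, here is a minimal sketch of a "trampoline", one common workaround when a compiler generates C and cannot rely on the C compiler to turn tail calls into jumps. It is an illustration only, not something taken from the slides or from C--; the names `step`, `state`, `sum_loop` and `run_sum` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical state shared between steps of the generated code. */
struct state { unsigned long long n, acc; };

/* A step wraps a function pointer so that a step function can
   return "the next function to call" instead of calling it. */
struct step { struct step (*next)(struct state *); };

static struct step sum_loop(struct state *s) {
    if (s->n == 0)
        return (struct step){ NULL };      /* finished: no next step */
    s->acc += s->n;
    s->n -= 1;
    return (struct step){ sum_loop };      /* the "tail call" to itself */
}

/* The trampoline: a plain loop, so the C stack does not grow
   however many "tail calls" are made. */
static unsigned long long run_sum(unsigned long long n) {
    struct state s = { n, 0 };
    struct step k = { sum_loop };
    while (k.next != NULL)
        k = k.next(&s);
    return s.acc;
}
```

Calling `run_sum(1000000)` takes a million steps in constant stack space, at the price of an indirect call and a return per step. That overhead is the kind of cost the missing feature would avoid.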

LLVM as Code Generator

The most suitable code generator, below the level of languages like C, is probably LLVM

It is portable, meaning you might have to tweak some parameters and rebuild for each processor

It provides a very low level ('assembly' level) language, plus code generation on a variety of platforms

This potentially gives total control over calling conventions, stack layout, and other such issues

Unfortunately, it fails all other requirements - there is nowhere to get platform code for LLVM, other than to move upwards to C or similar

Memory Management

The important part of automatic memory management is garbage collection; the ideal aims should be:

it should be accurate, guaranteed to find all garbage

it should move objects, to get rid of gaps

it should not cause long pauses

it should have a guaranteed small percentage overhead

Without the first two requirements, it is impossible to give any space guarantees for real time programs

Without the last two, it is impossible to give any time guarantees for real time programs

Stacks and Heaps

The run time system of a language is traditionally divided into a stack and a heap, perhaps expanding towards each other from opposite ends of memory

The stack holds local variables during function calls, including arguments, dumped register contents, and return addresses

The heap holds objects which have been allocated dynamically with "malloc" or "new" or whatever

The stack is a bit like the stack data structure in libraries, but the heap is not like the heap data structure
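The distinction above can be seen in a few lines of C. This is only a small sketch; the `point` type and the function names are made up for illustration.

```c
#include <stdlib.h>

struct point { int x, y; };

/* Heap: the object is allocated dynamically and outlives the call
   that created it; the caller is responsible for freeing it. */
static struct point *make_point(int x, int y) {
    struct point *p = malloc(sizeof *p);
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}

/* Stack: the argument 'p' and the local 'total' belong to this call
   and disappear when the function returns. */
static int sum_coords(const struct point *p) {
    int total = p->x + p->y;
    return total;
}
```

A caller would write `struct point *p = make_point(3, 4);`, use `sum_coords(p)`, and finally call `free(p)`. With automatic memory management, that last step is the part the garbage collector takes over.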

The C Stack Problem

With memory management, you need to scan the stack to find pointers to heap objects, and mark them

[Diagram: a stack word holding a pointer to a heap object, which gets marked]

The problem is that you can't know which stack words are pointers

The other problem with the C stack is that there is only one of it, making portable threads almost impossible

Solution 1: 'Conservative' GC

If a bit pattern (e.g. integer) in the stack looks like a heap pointer, treat it as a heap pointer

[Diagram: a stack word holding an integer that looks like a pointer to a heap object, which gets marked]

Minor problem: you might mark a dead object

Major problem: you can't move the object, because then you would mangle the integer

Because you can't move objects, you get fragmentation (i.e. unusable holes in the heap)
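The conservative test itself is simple. The sketch below simulates it on an array of words standing in for the stack, with a toy heap of fixed-size objects. The layout (`HEAP_OBJECTS`, `OBJECT_SIZE`, the `marked` table) is an assumption made for illustration; a real conservative collector is considerably more careful.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_OBJECTS 4
#define OBJECT_SIZE  16

/* A toy heap of fixed-size objects, with one mark bit per object. */
static unsigned char heap[HEAP_OBJECTS * OBJECT_SIZE];
static bool marked[HEAP_OBJECTS];

/* Scan 'count' words (standing in for the C stack). Any word whose
   value falls inside the heap is treated as a pointer, whether it
   really is one or just an integer that happens to look like one. */
static void conservative_mark(const uintptr_t *words, size_t count) {
    uintptr_t lo = (uintptr_t)heap;
    uintptr_t hi = lo + sizeof heap;
    for (size_t i = 0; i < count; i++) {
        uintptr_t w = words[i];
        if (w >= lo && w < hi)
            marked[(w - lo) / OBJECT_SIZE] = true;
    }
}
```

Because the collector cannot tell a real pointer from a lookalike integer, it also cannot rewrite the word after moving the object. That is the major problem described above.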

Solution 3: Shadow Stack, Step 1

Shadow Stack: (Henderson: "Accurate garbage collection in uncooperative environments")

Add a structure round the local pointer variables

Before:

    f()
    {
        obj *p1, *p2;
        int m, n;
        ...
    }

After:

    struct frame
    { obj *p1, *p2; };

    f()
    {
        struct frame pointers;
        int m, n;
        ...
    }

Note this is zero cost, so far

Solution 3, Step 2

Add a pointer to the previous frame & pointer arg slots

Before:

    f(obj *p)
    {
        ...
    }

After:

    struct frame
    { struct frame *prev;
      obj *p; ... };

    f(obj *p)
    {
        struct frame pointers;
        pointers.p = p;
        ...
    }

The only cost is extra space, and copying the argument pointers into the frame structure

Solution 3, Step 4

Step 4 is to understand how it works

It keeps track of where the pointers are in the C stack

The global variable stops C from caching pointers in registers across function calls

The C compiler says "the global variable means the called function could change pointers inside the structure so, if I have a pointer in a register, I must dump it in the structure before the call and restore it from the structure after the call"

This 'solves' the memory management problem, except that you still have to implement a garbage collector
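Putting the steps together, a shadow stack might look roughly like the sketch below. It assumes that the global variable mentioned above is a pointer to the most recent frame, as in Henderson's paper. The names `stack_top`, `FRAME_SLOTS`, `mark_roots`, `caller` and `callee` are hypothetical, and for brevity every frame has the same fixed number of pointer slots.

```c
#include <stddef.h>

typedef struct obj { int marked; int value; } obj;

#define FRAME_SLOTS 2

/* One shadow-stack frame: a link to the caller's frame plus the
   pointer variables (locals and arguments) of the function. */
struct frame {
    struct frame *prev;
    obj *slot[FRAME_SLOTS];
};

/* Assumed global: the most recently pushed frame. */
static struct frame *stack_top = NULL;

/* What a collector would do to find roots: walk the chain of frames
   and mark every object referenced from a pointer slot. */
static void mark_roots(void) {
    for (struct frame *f = stack_top; f != NULL; f = f->prev)
        for (int i = 0; i < FRAME_SLOTS; i++)
            if (f->slot[i] != NULL)
                f->slot[i]->marked = 1;
}

static void callee(obj *p) {
    struct frame pointers = { stack_top, { p, NULL } };
    stack_top = &pointers;        /* push this frame */
    mark_roots();                 /* stands in for a collection here */
    stack_top = pointers.prev;    /* pop on return */
}

static void caller(obj *a, obj *b) {
    struct frame pointers = { stack_top, { a, b } };
    stack_top = &pointers;
    callee(pointers.slot[0]);
    stack_top = pointers.prev;
}
```

Every pointer that is live during the simulated collection is reachable from `stack_top`, so the roots are found accurately, with no guessing about which stack words are pointers.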

Solution 3, Step 5

Step 5 is to read the paper by Baker et al., "Accurate Garbage Collection in Uncooperative Environments Revisited"

This does the same sort of thing lazily (i.e. delayed until another function is actually called)

That reduces overheads and improves C compiler optimisations