


































Processor design and compiler design: importance of processors, requirements, the compiler writer's platform, C as code generator, LLVM as code generator, Go as code generator, memory management, stacks and heaps, garbage collection in C, tracing, reference counting, sandboxing, types of compiler, ideal compiling, AOT compiler requirements, consistency, threads, thread stacks, dynamic loading, processes and threads, web programs



































Ian Holyer
● Hardware and software people need to talk to each other
● The history of hardware & compiler design includes a lot of cooperation of this kind
● This is a part-time compiler writer's perspective, giving a wish list for the design of the hardware/software boundary for the future
● Let's stick with modern, high level, general purpose, object oriented languages
● Most of them are interpreted or semi-interpreted, a sure sign of a mismatch that has grown up
● What is the most important feature of an operating system, from the compiler writing point of view?
● Answer: the ability to ignore it completely!
● Strategy 1: separate library code for every platform
● Reject!
● Strategy 2: find existing platform independent libraries
● Accept!
● Compiler writing requires platform independent or at least portable approaches to all these issues
  ● code generation for all common processors
  ● automatic memory management
  ● sandbox (type safety and security checking)
  ● threads (pre-emptive)
  ● dynamic loading of code
  ● standard libraries (e.g. graphics/networking)
● Each ought to be covered in a platform independent way by a suitable systems programming language
● The C language is old/bad in every one of these areas
● C is the obvious systems programming language, providing reasonable platform independence for generating code (it is a "portable assembly language")
● But C has the following missing features (according to the C-- project) if you want to generate really good code
  ● multiple results in registers
  ● tail call optimisation
  ● jumps to dynamically determined addresses
  ● efficient exception handling
● We won't investigate further, because C also fails the other requirements. By the way, C-- is too immature to use yet (it doesn't cover enough processors)
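To make the tail call point concrete, here is a minimal sketch of one common workaround when generating portable C: a trampoline. Each step records which function should run next instead of calling it, and a driver loop runs the steps, so the C stack stays at constant depth however long the chain of tail calls is. The names `state`, `even_step`, `odd_step` and `is_even` are hypothetical and chosen only for this illustration; the cost is an indirect call per step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical example: mutually recursive even/odd, written so that each
   tail call becomes "record the next step" instead of a real C call. */
struct state;
typedef void (*step_fn)(struct state *);

struct state {
    step_fn next;        /* next step to run, or NULL when finished */
    unsigned long n;     /* remaining argument */
    bool result;         /* valid once next == NULL */
};

static void even_step(struct state *s);
static void odd_step(struct state *s);

static void even_step(struct state *s)
{
    assert(s != NULL);
    if (s->n == 0) { s->result = true; s->next = NULL; return; }
    s->n -= 1;
    s->next = odd_step;          /* stands for the tail call odd(n - 1) */
}

static void odd_step(struct state *s)
{
    assert(s != NULL);
    if (s->n == 0) { s->result = false; s->next = NULL; return; }
    s->n -= 1;
    s->next = even_step;         /* stands for the tail call even(n - 1) */
}

/* The trampoline: a loop that keeps the C stack at constant depth. */
bool is_even(unsigned long n)
{
    struct state s = { even_step, n, false };
    while (s.next != NULL)
        s.next(&s);
    return s.result;
}
```

Calling `is_even(1000000)` runs a million steps without deepening the C stack, which a direct mutually recursive version cannot promise in standard C.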
● The best suitable code generator, below the level of languages like C, is probably LLVM
● It is portable, meaning you might have to tweak some parameters and rebuild for each processor
● It provides a very low level ('assembly' level) language, plus code generation on a variety of platforms
● This potentially gives total control over calling conventions, stack layout, and other such issues
● Unfortunately, it fails all other requirements - there is nowhere to get platform code for LLVM, other than to move upwards to C or similar
● The important part of automatic memory management is garbage collection. The ideal aims should be:
  ● it should be accurate, guaranteed to find all garbage
  ● it should move objects, to get rid of gaps
  ● it should not cause long pauses
  ● it should have a guaranteed small percentage overhead
● Without the first two requirements, it is impossible to give any space guarantees for real time programs
● Without the last two, it is impossible to give any time guarantees for real time programs
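A toy illustration of why moving objects matters for space guarantees. This is a sketch under simplifying assumptions, not a real allocator: the heap is eight equal cells, and `toy_heap`, `largest_hole`, `total_free` and `compact` are hypothetical names. With every other cell free, half the heap is available but no request for two adjacent cells can be met until the live cells are moved together.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CELLS 8

/* Toy heap: cell[i] is 0 when free, otherwise the id of the object in it. */
typedef struct { int cell[HEAP_CELLS]; } toy_heap;

/* Size of the largest run of adjacent free cells. */
size_t largest_hole(const toy_heap *h)
{
    assert(h != NULL);
    size_t best = 0, run = 0;
    for (size_t i = 0; i < HEAP_CELLS; i++) {
        run = (h->cell[i] == 0) ? run + 1 : 0;
        if (run > best) best = run;
    }
    return best;
}

/* Total number of free cells, adjacent or not. */
size_t total_free(const toy_heap *h)
{
    assert(h != NULL);
    size_t n = 0;
    for (size_t i = 0; i < HEAP_CELLS; i++)
        if (h->cell[i] == 0) n++;
    return n;
}

/* Slide all live cells to the front; only legal if objects may be moved. */
void compact(toy_heap *h)
{
    assert(h != NULL);
    size_t to = 0;
    for (size_t from = 0; from < HEAP_CELLS; from++)
        if (h->cell[from] != 0)
            h->cell[to++] = h->cell[from];
    while (to < HEAP_CELLS)
        h->cell[to++] = 0;
}
```

Starting from the layout `{1, 0, 2, 0, 3, 0, 4, 0}`, `total_free` reports 4 but `largest_hole` reports 1; after `compact` the largest hole is 4.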
● The run time system of a language is traditionally divided into a stack and a heap, perhaps expanding towards each other from opposite ends of memory
● The stack holds local variables during function calls, including arguments, dumped register contents, and return addresses
● The heap holds objects which have been allocated dynamically with "malloc" or "new" or whatever
● The stack is a bit like the stack data structure in libraries, but the heap is not like the heap data structure
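The stack/heap split described above can be seen in a few lines of C. This is a minimal sketch; `point`, `make_point` and `sum_coords` are hypothetical names used only here.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct point { int x, y; } point;

/* Heap: the object outlives the call that created it; the caller frees it. */
point *make_point(int x, int y)
{
    point *p = malloc(sizeof *p);   /* the object itself lives in the heap */
    if (p != NULL) { p->x = x; p->y = y; }
    return p;                       /* p, the pointer, was a stack local */
}

/* Stack: 'local' and the argument 'p' exist only during this call. */
int sum_coords(const point *p)
{
    assert(p != NULL);
    int local = p->x + p->y;
    return local;
}
```

A caller would write `point *p = make_point(3, 4);`, use `sum_coords(p)`, and finally call `free(p)`. That last manual step is the one automatic memory management is meant to remove.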
● With memory management, you need to scan the stack to find pointers to heap objects, and mark them
● The problem is that you can't know which stack words are pointers
● The other problem with the C stack is that there is only one of it, making portable threads almost impossible
[Figure: a pointer in the stack refers to a heap object, which gets marked]
● If a bit pattern (e.g. integer) in the stack looks like a heap pointer, treat it as a heap pointer
● Minor problem: you might mark a dead object
● Major problem: you can't move the object, because then you would mangle the integer
● Because you can't move objects, you get fragmentation (i.e. unusable holes in the heap)
[Figure: an integer in the stack looks like a pointer to a heap object, which gets marked]
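The "looks like a heap pointer" test can be sketched as follows. This is an illustration under stated assumptions, not a real collector: the heap is a small static array, the "stack" is an array of words passed in by the caller, and `toy_objs`, `as_heap_object` and `conservative_mark` are hypothetical names. It also assumes a flat address space in which converting a pointer to `uintptr_t` gives a comparable number, which holds on common platforms but is implementation-defined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_OBJECTS 4

typedef struct obj { bool marked; int payload; } obj;

obj toy_objs[HEAP_OBJECTS];   /* hypothetical toy heap, zero-initialised */

/* Conservative test: does this word have the bit pattern of the address
   of one of the heap objects? The word's real type is unknown here. */
static obj *as_heap_object(uintptr_t word)
{
    uintptr_t lo = (uintptr_t)&toy_objs[0];
    uintptr_t hi = (uintptr_t)&toy_objs[HEAP_OBJECTS];
    if (word < lo || word >= hi) return NULL;
    if ((word - lo) % sizeof(obj) != 0) return NULL;
    return &toy_objs[(word - lo) / sizeof(obj)];
}

/* Scan n stack words; mark every object some word appears to point at.
   Returns the number of objects newly marked. */
size_t conservative_mark(const uintptr_t *words, size_t n)
{
    assert(words != NULL || n == 0);
    size_t marked = 0;
    for (size_t i = 0; i < n; i++) {
        obj *o = as_heap_object(words[i]);
        if (o != NULL && !o->marked) { o->marked = true; marked++; }
    }
    return marked;
}
```

If one of the scanned words is really an integer whose value happens to equal the address of `toy_objs[3]`, that object is marked anyway. The collector cannot then relocate it, because updating the word would change the integer.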
● Shadow Stack: (Henderson: "Accurate garbage collection in uncooperative environments")
● Add a structure round the local pointer variables

    f()
    {
        obj *p1, *p2;
        int m, n;
        ...
    }

becomes

    struct frame
    { obj *p1, *p2; };
    f()
    {
        struct frame pointers;   /* p1 and p2 are now pointers.p1 and pointers.p2 */
        int m, n;
        ...
    }

● Note this is zero cost, so far
● Add a pointer to the previous frame, and slots for pointer arguments

    f(obj *p)
    {
        ...
    }

becomes

    struct frame
    { struct frame *prev;
      obj *p; ... };
    f(obj *p)
    {
        struct frame pointers;
        pointers.p = p;          /* copy the pointer argument into its slot */
        ...
    }
● The only cost is extra space, and copying the argument pointers into the frame structure
● Step 4 is to understand how it works
● It keeps track of where the pointers are in the C stack
● The global variable stops C from caching pointers in registers across function calls
● The C compiler says "the global variable means the called function could change pointers inside the structure so, if I have a pointer in a register, I must dump it in the structure before the call and restore it from the structure after the call"
● This 'solves' the memory management problem, except that you still have to implement a garbage collector
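Putting the pieces together, a shadow stack in the style described above might look like the following sketch. It is an illustration, not Henderson's code: the global `top`, the `count` field, the fixed two-slot layout and the functions `f`, `g` and `mark_roots` are assumptions made to keep the example short. `mark_roots` stands in for the root-scanning phase of a collector, which would otherwise still have to be written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct obj { bool marked; } obj;

/* One shadow-stack frame per active function call. */
struct frame {
    struct frame *prev;   /* frame of the calling function */
    size_t count;         /* how many of the slots are in use (assumed field) */
    obj *slot[2];         /* pointer locals and pointer arguments */
};

struct frame *top = NULL; /* the global variable: newest frame */

/* Walk the linked frames and mark every object they refer to. */
size_t mark_roots(void)
{
    size_t found = 0;
    for (struct frame *fr = top; fr != NULL; fr = fr->prev)
        for (size_t i = 0; i < fr->count; i++)
            if (fr->slot[i] != NULL && !fr->slot[i]->marked) {
                fr->slot[i]->marked = true;
                found++;
            }
    return found;
}

/* Callee with one pointer argument. */
size_t g(obj *p)
{
    struct frame pointers = { top, 1, { p, NULL } };
    top = &pointers;                 /* push this frame */
    size_t n = mark_roots();         /* pretend a collection happens here */
    top = pointers.prev;             /* pop */
    return n;
}

/* Caller with two pointer arguments. */
size_t f(obj *a, obj *b)
{
    struct frame pointers = { top, 2, { a, b } };
    top = &pointers;
    size_t n = g(pointers.slot[0]);  /* pointers are read from the frame */
    assert(top == &pointers);        /* the callee popped its own frame */
    top = pointers.prev;
    return n;
}
```

Because `top` is global and points at `pointers`, the C compiler has to assume that `g` may read or change the slots, so it keeps them up to date in memory across the call. A collector triggered inside `g` can therefore find both of `f`'s pointers exactly, without guessing.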
● Step 5 is to read the paper by Baker et al., "Accurate Garbage Collection in Uncooperative Environments Revisited"
● This does the same sort of thing lazily (i.e. delayed until another function is actually called)
● That reduces overheads and improves C compiler optimisations