CS143 Handout 31 Autumn 2007 November 14th, 2007

Final Code Generation

Handout written by Maggie Johnson and revised by Julie Zelenski and Jerry Cain.

The last phase of the compiler to run is the final code generator. Given an intermediate representation of the source program, it produces as output an equivalent program in the target's machine language. This step can range from trivial to complex, depending on how high- or low-level the intermediate representation is and what information it contains about the target machine and runtime environment. How aggressively the final result is optimized also makes a big difference.

Unlike all our previous tasks, this one is very machine-specific, since each architecture has its own set of instructions and peculiarities that must be respected. The ABI or application binary interface specifies the rules for executable programs on an architecture (instructions, register usage, calling conventions, instruction scheduling, memory organization, executable format, and so on) and these details direct the code generation. The final code generator handles locations of variables and temporaries and generates all the code to maintain the runtime environment, set up and return from function calls, manage the stack and so on.

MIPS R2000/R3000 assembly

The target language for Decaf is MIPS R2000/R3000 assembly. We chose this because it allows us to use SPIM, an excellent simulator for the MIPS processor. The SPIM simulator reads MIPS assembly instructions from a file and executes them on a virtual MIPS machine. It was written by James Larus of the University of Wisconsin, and graciously distributed for non-commercial use free of charge. (By the way, SPIM is "MIPS" backwards. Funny.) There are links to some excellent SPIM documentation and resources on our class web site that provide more detail on using this tool. You will definitely want to check those out during your last programming project.

First, let’s start by looking at the MIPS R2000/R3000 machine. The processor chip contains a main CPU and a couple of co-processors, one for floating point operations and another for memory management. The word size is 32 bits, and the processor has 32 word-sized registers on-board, some of which have designated special uses while others are general-purpose. It also provides for up to 128K of high-speed cache, half each for instructions and data (although this is not simulated in SPIM). All processor instructions are encoded in a single 32-bit word format. All data operations are register to register; the only memory references are pure load/store operations.

Here are the 32 MIPS registers identified by their symbolic name and the purpose each is used for:

    zero     holds constant 0 (used surprisingly often, so a dedicated register for it)
    at       reserved for assembler
    v0-v1    used to return results of functions
    a0-a3    usually used to pass first 4 args to function call (although Decaf places all arguments on the stack)
    t0-t7    general purpose (caller-saved)
    s0-s7    general purpose (callee-saved)
    t8-t9    general purpose (caller-saved)
    k0-k1    reserved for OS
    gp       global pointer to static data segment
    sp       stack pointer
    fp       frame pointer
    ra       return address

The floating point co-processor unit has another set of 32 registers specifically for floating-point. Since we will only be dealing with integer-derived types, we won’t use those.

The MIPS instructions are a fairly ordinary RISC instruction set. Here are a few excerpts to show the flavor:

    li $t0, 12            Load constant 12 into $t0
    lw $t0, -4($fp)       Load word at address $fp-4 into $t0
    sw $t0, 8($sp)        Store word from $t0 to address $sp+8
    add $t0, $t1, $t2     Add operands in $t1 and $t2, store result in $t0
    addiu $t0, $t1, 4     Add $t1 to constant 4, store result in $t0
    or $t0, $t1, $t2      Or $t1 and $t2, store result in $t0
    seq $t0, $t1, $t2     Set $t0 to 1 if $t1 == $t2, 0 otherwise
    b label               Unconditional branch to label
    beqz $t1, label       Branch to label if $t1 equals 0
    jal label             Jump to label (save return address in $ra)
    jalr $t0              Jump to address in $t0 (save return address in $ra)
    jr $ra                Return from function, resume at address in $ra

The machine provides only one memory addressing mode: c(rx), which accesses the location at offset c (positive or negative) from the address held in register rx. Load and store instructions operate only on aligned data (a quantity is aligned if its memory address is a multiple of its size in bytes). You’ll notice that all our generated assembly programs have .align 2 in the preamble. The operand is a power of two, so this asks the assembler to align the next value (which is the global "main") on a 4-byte word boundary.
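To make the addressing and alignment rules concrete, here is a minimal C++ sketch. The function names IsAligned, EffectiveAddress and AlignUp are hypothetical helpers invented for this illustration and are not part of SPIM or of the Decaf code base.

```cpp
#include <cstdint>

// True if `address` is a multiple of `sizeInBytes`, which is the
// definition of "aligned" used in the text above.
constexpr bool IsAligned(std::uint32_t address, std::uint32_t sizeInBytes) {
    return sizeInBytes != 0 && address % sizeInBytes == 0;
}

// Effective address for the c(rx) addressing mode: the contents of the
// base register plus a signed constant offset (32-bit wraparound arithmetic).
constexpr std::uint32_t EffectiveAddress(std::uint32_t base, std::int32_t offset) {
    return base + static_cast<std::uint32_t>(offset);
}

// Rounds `address` up to the next multiple of 2^n, the effect that an
// ".align n" directive has on the location of the next item.
constexpr std::uint32_t AlignUp(std::uint32_t address, unsigned n) {
    return (address + ((1u << n) - 1)) & ~((1u << n) - 1);
}
```

For instance, a lw from -4($fp) is legal only if $fp-4 is a multiple of 4, which holds whenever $fp itself is word-aligned.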
Address Space Layout

On a MIPS machine, a program’s address space is composed of various segments. At the bottom is the text segment that holds the program instructions. Above the text

The Static Data Segment

All of the locations in the data segment are configured at compile-time as well. When the compiler is putting together the program, each global variable is assigned a unique fixed offset in the data segment (+0, +4, +8, etc.). String constants and shared vtables are placed in the top half of the data segment (this area could be marked read-only in a protected memory system), and each is marked with a label that allows us to refer to it uniquely, such as when setting the vptr for a newly allocated object. The aggregate size of all globals/statics is usually what determines the space reserved for the entire segment. In our simulator, the data segment size is set using a flag when starting the simulator. It defaults to 64K, which is plenty for all the programs we run, so we don't change it.

At runtime, the $gp register is reserved for holding the base address of the global data segment. It is initialized to the proper value by the loader when setting up the program for execution. A global variable can then be accessed at its offset from $gp. For example, in order to assign the constant value 55 to the global variable at offset 4, the MIPS sequence would look like this:

    li $t0, 55        Load constant 55 into $t0
    sw $t0, 4($gp)    Store value in $t0 at location +4 from $gp

The Heap (or Dynamic Data Segment)

The heap exists only at runtime and is not configured or laid out by the compiler. At runtime, the heap manager is responsible for increasing the size of the heap as needed (usually through some lower-level OS call such as sbrk or vm_allocate) and parceling out space within this area on demand. Heap managers come in all sorts of varieties: doubly indirect handles or singly indirect pointers, fast but poor-fit allocators, slower but better-fit versions, those with automatic storage reclamation (garbage collection) and those without, and so on. Some features may be mandated by the language specification; other details may be left up to the choice of the implementer. The heap manager implementation is typically written in a high-level language, although some time-critical portions may be hand-coded in assembly.

The Stack Segment

The stack is another entity that only exists at runtime. The OS loader is responsible for allocating space for the stack segment (often configured to some large constant size, say 8MB on Solaris) and calls a bit of glue code that sets up the outermost frame, which then makes a call to the main routine where execution begins.

The calling convention used in SPIM is modeled on the Unix gcc convention rather than the full complexity of MIPS (which is more sophisticated but faster). A stack frame consists of the memory between the $fp pointer, which points to the base of the current stack frame, and the $sp pointer, which points to the next free word on the stack. As is typical of Unix systems, the stack grows down, so $fp is above $sp. The parameters are pushed right to left by the caller and the locals are pushed left to right by the callee. The return value is returned in register $v0. Here is a diagram of the layout of the parameters and locals within the MIPS runtime stack:

[Figure: layout of parameters and locals within the MIPS runtime stack]

The calling conventions are the part of the ABI that dictates who does what, using which memory and registers, in order to ensure a smooth transition from caller to callee and back. The caller sets up for the call via these steps:

  1. Make space on stack for and save any caller-saved registers
  2. Pass arguments by pushing them on the stack, one by one, right to left
  3. Execute a jump to the function (saves the address of the next instruction in $ra)

The callee takes over and does the following:

  4. Make space on stack for and save values of $fp and $ra
  5. Configure frame pointer by setting $fp to base of frame
  6. Allocate space for stack frame (total space required for all local and temporary variables)
  7. Execute function body; code can access params at positive offsets from $fp, locals/temps at negative offsets from $fp

When ready to exit, the callee does the following:

  8. Assign the return value (if any) to $v0
  9. Pop stack frame off the stack (locals/temps/saved regs)
  10. Restore the value of $fp and $ra
  11. Jump to the address saved in $ra
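Step 7 depends on every parameter, local, and temporary having a known offset from $fp. The C++ sketch below shows one way such offsets could be assigned. It is an illustration only: FrameLayout and LayOutFrame are hypothetical names, and the starting offsets (+4 for the first parameter, -8 for the first local, because the words at $fp+0 and $fp-4 hold the saved $fp and $ra) are read off the example assembly later in this handout rather than taken from the project code.

```cpp
#include <map>
#include <string>
#include <vector>

// Offsets from $fp for every variable in one function, plus the number
// of bytes of locals/temps (the operand of the TAC BeginFunc instruction).
struct FrameLayout {
    std::map<std::string, int> offsetFromFp;
    int frameSize = 0;
};

// Assigns parameters to +4, +8, ... and locals/temps to -8, -12, ...
// Assumption: every variable occupies exactly one 4-byte word.
FrameLayout LayOutFrame(const std::vector<std::string>& params,
                        const std::vector<std::string>& localsAndTemps) {
    const int kWordSize = 4;
    FrameLayout layout;

    int offset = kWordSize;                  // first param sits just above the saved fp
    for (const std::string& p : params) {
        layout.offsetFromFp[p] = offset;
        offset += kWordSize;
    }

    offset = -2 * kWordSize;                 // skip saved fp ($fp+0) and saved ra ($fp-4)
    for (const std::string& v : localsAndTemps) {
        layout.offsetFromFp[v] = offset;
        offset -= kWordSize;
    }

    layout.frameSize = kWordSize * static_cast<int>(localsAndTemps.size());
    return layout;
}
```

Applied to a function with parameters a and b, one local c and one temporary, this yields offsets +4, +8, -8, -12 and a frame size of 8, which agrees with the Binky function in Example 2.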

The goal is to hold as many values in registers as possible. Values in registers are only written out to memory when we run out of registers or when we cannot guarantee that the value saved in a register is reliable. This can happen because we are about to call a function, conditionally branch, enter via a label, etc., and we can’t be certain our values will be preserved or reliable afterwards.

Spilling a register means writing the current value of a register out to memory, so that we can reuse that register to hold a different value. When all of our registers are in use and we need to bring in a new value, we will have to spill one of the current registers to make space. Choosing the "best" register to spill can be a complicated enterprise.

Here’s a fairly naïve algorithm for register usage. On the left we have the TAC code that refers to the variables using symbolic names; on the right is the operation on machine registers that the code generator must translate it into:

    a = b + c        add Rs, Ri, Rj

  1. Select a register for Ri to hold b. Using the information in the descriptors, we choose in order of preference: a register already holding b, an empty register, an unmodified register, a modified register (we have to spill it). Load the current b into the chosen register and update the descriptors.
  2. We load Rj with c in a similar manner, but we must not reuse Ri unless b and c are the same.
  3. Choose a register for a in the same way but not reusing Ri or Rj, and spilling if we need to.
  4. Generate instruction: op Rs, Ri, Rj

Decaf: Final Code Generation

You can think of final code generation as just yet another translation task, taking input in one format and producing equivalent output in another. As we recognize a Decaf language construct, we construct the appropriate sequence of Tac instruction objects for it, and eventually hand the list over to the Mips class to translate to its MIPS equivalent. Our Mips class encapsulates the machine details such as instruction formats and the available registers and how they are used. It also has direct knowledge of how to manage the stack and frame pointers and return address. It tackles the job of assigning variables to registers and spilling as necessary (albeit rather inefficiently). It assumes that the offset for each global, parameter, local, and temp variable has been properly configured (this is one of your jobs) and uses that information to access the correct memory location for load and store operations when moving the value of a variable to and from a register.
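The preference order in the register-selection steps listed earlier (already holding the variable, then empty, then unmodified, then modified) can be sketched in a few lines of C++. This is a simplified illustration under stated assumptions, not the project's Mips class: RegisterDescriptor, Choice and ChooseRegister are hypothetical names, and an empty string stands for "register is free".

```cpp
#include <set>
#include <string>
#include <vector>

// One descriptor per machine register.
struct RegisterDescriptor {
    std::string name;       // e.g. "$t0"
    std::string variable;   // variable currently held; "" means the register is free
    bool isDirty;           // true if the register was modified since it was loaded
};

struct Choice {
    int index;       // chosen register, or -1 if every register is excluded
    bool mustSpill;  // true if the old contents must be written to memory first
};

// Picks a register for `var` (assumed non-empty), never taking a register
// listed in `inUse` (the operands already chosen for this instruction)
// unless that register already holds `var`.
Choice ChooseRegister(const std::vector<RegisterDescriptor>& regs,
                      const std::string& var,
                      const std::set<int>& inUse = {}) {
    int empty = -1, clean = -1, dirty = -1;
    for (int i = 0; i < static_cast<int>(regs.size()); ++i) {
        if (regs[i].variable == var) return {i, false};   // 1st choice: already holds var
        if (inUse.count(i)) continue;
        if (regs[i].variable.empty()) { if (empty < 0) empty = i; }
        else if (!regs[i].isDirty)    { if (clean < 0) clean = i; }
        else if (dirty < 0)           { dirty = i; }
    }
    if (empty >= 0) return {empty, false};                // 2nd choice: empty register
    if (clean >= 0) return {clean, false};                // 3rd choice: unmodified, no store needed
    return {dirty, dirty >= 0};                           // last resort: modified, must spill
}
```

An unmodified register is preferred over a modified one because its value is still valid in memory, so it can be overwritten without emitting a store.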

Here is the method from our mips.cc that assigns a given variable to a register. It embodies the algorithm given above: we first try to find the variable already assigned in our descriptors, next look for an empty one, and only if necessary, choose a register to spill and replace:

    Mips::Register Mips::GetRegister(Location *var, Reason reason)
    {
        Register reg;

        if (!FindRegisterWithContents(var, reg)) {        // if var not in reg
            // look for an empty one
            if (!FindRegisterWithContents(NULL, reg)) {
                reg = SelectRegisterToSpill();            // none empty, must spill
                SpillRegister(reg);
            }
            regs[reg].decl = var;                         // update descriptor
            if (reason == ForRead) {                      // load cur value
                const char *base = var->GetSegment() == fpRelative ? regs[fp].name : regs[gp].name;
                Emit("lw %s, %d(%s)", regs[reg].name, var->GetOffset(), base);
                regs[reg].isDirty = false;
            }
        }
        if (reason == ForWrite)
            regs[reg].isDirty = true;
        return reg;
    }

What does it take to translate each Tac instruction object? Most are quite straightforward. Given that the TAC is fairly low-level, there is not a lot of translation needed for, say, a TAC add instruction. We get the operands into registers and emit a MIPS add on those registers. For some of the more complex instructions, there is a bit more going on behind the scenes, but nothing too magical.

As a simple example, here's the method that converts a TAC LoadConstant instruction into MIPS:

    void Mips::EmitLoadConstant(Location *dst, int val)
    {
        Register reg = GetRegisterForWrite(dst);
        Emit("li %s, %d", regs[reg].name, val);
    }

As a slightly more complicated example, converting a TAC BeginFunc instruction produces a sequence of MIPS instructions to set up the stack and frame pointer, save $fp and $ra, and make space for the new stack frame:

    void Mips::EmitBeginFunction(int frameSz)
    {
        Emit("subu $sp, $sp, 8\t# decrement sp to make space to save ra, fp");
        Emit("sw $fp, 8($sp)\t# save fp");
        Emit("sw $ra, 4($sp)\t# save ra");
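The generated assembly in this handout's examples shows the full instruction sequences that BeginFunc and EndFunc expand to. The self-contained C++ sketch below reproduces those sequences as lists of strings. It is reconstructed from the example output and is not the mips.cc source: BeginFunctionLines and EndFunctionLines are hypothetical names, and skipping the last prologue instruction for an empty frame is an assumption based on the BeginFunc 0 listing in the class example.

```cpp
#include <string>
#include <vector>

// Prologue for a function whose locals/temps occupy `frameSize` bytes.
std::vector<std::string> BeginFunctionLines(int frameSize) {
    std::vector<std::string> lines = {
        "subu $sp, $sp, 8",    // make space to save ra, fp
        "sw $fp, 8($sp)",      // save caller's fp
        "sw $ra, 4($sp)",      // save return address
        "addiu $fp, $sp, 8",   // set up new fp
    };
    if (frameSize > 0) {
        // make space for locals/temps; omitted when the frame is empty
        lines.push_back("subu $sp, $sp, " + std::to_string(frameSize));
    }
    return lines;
}

// Epilogue used both for an explicit Return and for falling off the
// end of the function body.
std::vector<std::string> EndFunctionLines() {
    return {
        "move $sp, $fp",       // pop callee frame off stack
        "lw $ra, -4($fp)",     // restore saved ra
        "lw $fp, 0($fp)",      // restore saved fp
        "jr $ra",              // return to caller
    };
}
```

Note that the epilogue must reload $ra before it overwrites $fp, since both loads use the callee's $fp as their base address.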

      # BeginFunc 4
        subu $sp, $sp, 8      # decrement sp to make space to save ra, fp
        sw $fp, 8($sp)        # save fp
        sw $ra, 4($sp)        # save ra
        addiu $fp, $sp, 8     # set up new fp
        subu $sp, $sp, 4      # decrement sp to make space for locals/temps
      # _tmp0 = "hello world"
        .data                 # create string constant marked with label
        _string1: .asciiz "hello world"
        .text
        la $t0, _string1      # load label
      # PushParam _tmp0
        subu $sp, $sp, 4      # decrement sp to make space for param
        sw $t0, 4($sp)        # copy param value to stack
      # LCall _PrintString
      # (save modified registers before flow of control change)
        sw $t0, -8($fp)       # spill _tmp0 from $t0 to $fp-8
        jal _PrintString      # jump to function
      # PopParams 4
        add $sp, $sp, 4       # pop params off stack
      # EndFunc
      # (below handles reaching end of fn body with no explicit return)
        move $sp, $fp         # pop callee frame off stack
        lw $ra, -4($fp)       # restore saved ra
        lw $fp, 0($fp)        # restore saved fp
        jr $ra                # return from function

Example 2: A program with a global variable and a function call:

    int num;

    int Binky(int a, int b)
    {
        int c;
        c = a + b;
        return c;
    }

    void main()
    {
        num = Binky(4, 10);
        Print(num);
    }

TAC instructions:

    _Binky:
        BeginFunc 8 ;
        _tmp0 = a + b ;
        c = _tmp0 ;
        Return c ;
        EndFunc ;
    main:
        BeginFunc 12 ;
        _tmp1 = 4 ;
        _tmp2 = 10 ;
        PushParam _tmp2 ;
        PushParam _tmp1 ;
        _tmp3 = LCall _Binky ;
        PopParams 8 ;
        num = _tmp3 ;
        PushParam num ;
        LCall _PrintInt ;
        PopParams 4 ;
        EndFunc ;

MIPS assembly:

      # standard Decaf preamble
        .text
        .align 2
        .globl main
    _Binky:
      # BeginFunc 8
        subu $sp, $sp, 8      # decrement sp to make space to save ra, fp
        sw $fp, 8($sp)        # save fp
        sw $ra, 4($sp)        # save ra
        addiu $fp, $sp, 8     # set up new fp
        subu $sp, $sp, 8      # decrement sp to make space for locals/temps
      # _tmp0 = a + b
        lw $t0, 4($fp)        # load a from $fp+4 into $t0
        lw $t1, 8($fp)        # load b from $fp+8 into $t1
        add $t2, $t0, $t1
      # c = _tmp0
        move $t3, $t2         # copy value
      # Return c
        move $v0, $t3         # assign return value into $v0
        move $sp, $fp         # pop callee frame off stack
        lw $ra, -4($fp)       # restore saved ra
        lw $fp, 0($fp)        # restore saved fp
        jr $ra                # return from function
      # EndFunc
      # (below handles reaching end of fn body with no explicit return)
        move $sp, $fp         # pop callee frame off stack
        lw $ra, -4($fp)       # restore saved ra
        lw $fp, 0($fp)        # restore saved fp
        jr $ra                # return from function
    main:
      # BeginFunc 12
        subu $sp, $sp, 8      # decrement sp to make space to save ra, fp
        sw $fp, 8($sp)        # save fp
        sw $ra, 4($sp)        # save ra
        addiu $fp, $sp, 8     # set up new fp
        subu $sp, $sp, 12     # decrement sp to make space for locals/temps
      # _tmp1 = 4
        li $t0, 4             # load constant value 4 into $t0
      # _tmp2 = 10
        li $t1, 10            # load constant value 10 into $t1
      # PushParam _tmp2
        subu $sp, $sp, 4      # decrement sp to make space for param
        sw $t1, 4($sp)        # copy param value to stack
      # PushParam _tmp1
        subu $sp, $sp, 4      # decrement sp to make space for param
        sw $t0, 4($sp)        # copy param value to stack
      # _tmp3 = LCall _Binky

        sw $t0, -12($fp)      # spill _tmp0 from $t0 to $fp-12
        sw $t1, -16($fp)      # spill _tmp1 from $t1 to $fp-16
        sw $t2, -20($fp)      # spill _tmp2 from $t2 to $fp-20
        beqz $t2, _L0         # branch if _tmp2 is zero
      # _tmp3 = 1
        li $t0, 1             # load constant value 1 into $t0
      # a = _tmp3
        move $t1, $t0         # copy value
      # (save modified registers before flow of control change)
        sw $t0, -24($fp)      # spill _tmp3 from $t0 to $fp-24
        sw $t1, -8($fp)       # spill a from $t1 to $fp-8
    _L0:
      # EndFunc
      # (below handles reaching end of fn body with no explicit return)
        move $sp, $fp         # pop callee frame off stack
        lw $ra, -4($fp)       # restore saved ra
        lw $fp, 0($fp)        # restore saved fp
        jr $ra                # return from function

Example 4: A program with a class and showing dynamic dispatch:

    void main()
    {
        Cow betsy;
        betsy = New(Cow);
        betsy.InitCow(-5, 22);
        betsy.Moo();
    }

    class Cow {
        int height;
        int weight;

        void InitCow(int h, int w)
        {
            height = h;
            weight = w;
        }

        void Moo()
        {
            Print("Moo!\n");
        }
    }

TAC instructions:

    main:
        BeginFunc 48 ;
        _tmp0 = 12 ;
        PushParam _tmp0 ;
        _tmp1 = LCall _Alloc ;
        PopParams 4 ;
        _tmp2 = Cow ;
        *(_tmp1) = _tmp2 ;
        betsy = _tmp1 ;
        _tmp3 = 5 ;
        _tmp4 = 0 ;
        _tmp5 = _tmp4 - _tmp3 ;
        _tmp6 = 22 ;
        _tmp7 = *(betsy) ;
        _tmp8 = *(_tmp7) ;
        PushParam _tmp6 ;
        PushParam _tmp5 ;
        PushParam betsy ;
        ACall _tmp8 ;
        PopParams 12 ;
        _tmp9 = *(betsy) ;
        _tmp10 = *(_tmp9 + 4) ;
        PushParam betsy ;
        ACall _tmp10 ;
        PopParams 4 ;
        EndFunc ;
    _Cow.InitCow:
        BeginFunc 0 ;
        *(this + 4) = h ;
        *(this + 8) = w ;
        EndFunc ;
    _Cow.Moo:
        BeginFunc 4 ;
        _tmp11 = "Moo!\n" ;
        PushParam _tmp11 ;
        LCall _PrintString ;
        PopParams 4 ;
        EndFunc ;
    VTable Cow =
        _Cow.InitCow,
        _Cow.Moo,
    ;

Its MIPS assembly:

      # standard Decaf preamble
        .text
        .align 2
        .globl main
    main:
      # BeginFunc 48
        subu $sp, $sp, 8      # decrement sp to make space to save ra, fp
        sw $fp, 8($sp)        # save fp
        sw $ra, 4($sp)        # save ra
        addiu $fp, $sp, 8     # set up new fp
        subu $sp, $sp, 48     # decrement sp to make space for locals/temps
      # _tmp0 = 12
        li $t0, 12            # load constant value 12 into $t0
      # PushParam _tmp0
        subu $sp, $sp, 4      # decrement sp to make space for param
        sw $t0, 4($sp)        # copy param value to stack
      # _tmp1 = LCall _Alloc
      # (save modified registers before flow of control change)
        sw $t0, -12($fp)      # spill _tmp0 from $t0 to $fp-12
        jal _Alloc            # jump to function
        move $t0, $v0         # copy function return value from $v0
      # PopParams 4
        add $sp, $sp, 4       # pop params off stack
      # _tmp2 = Cow

        lw $ra, -4($fp)       # restore saved ra
        lw $fp, 0($fp)        # restore saved fp
        jr $ra                # return from function
    _Cow.InitCow:
      # BeginFunc 0
        subu $sp, $sp, 8      # decrement sp to make space to save ra, fp
        sw $fp, 8($sp)        # save fp
        sw $ra, 4($sp)        # save ra
        addiu $fp, $sp, 8     # set up new fp
      # *(this + 4) = h
        lw $t0, 8($fp)        # load h from $fp+8 into $t0
        lw $t1, 4($fp)        # load this from $fp+4 into $t1
        sw $t0, 4($t1)        # store with offset
      # *(this + 8) = w
        lw $t2, 12($fp)       # load w from $fp+12 into $t2
        sw $t2, 8($t1)        # store with offset
      # EndFunc
      # (below handles reaching end of fn body with no explicit return)
        move $sp, $fp         # pop callee frame off stack
        lw $ra, -4($fp)       # restore saved ra
        lw $fp, 0($fp)        # restore saved fp
        jr $ra                # return from function
    _Cow.Moo:
      # BeginFunc 4
        subu $sp, $sp, 8      # decrement sp to make space to save ra, fp
        sw $fp, 8($sp)        # save fp
        sw $ra, 4($sp)        # save ra
        addiu $fp, $sp, 8     # set up new fp
        subu $sp, $sp, 4      # decrement sp to make space for locals/temps
      # _tmp11 = "Moo!\n"
        .data                 # create string constant marked with label
        _string1: .asciiz "Moo!\n"
        .text
        la $t0, _string1      # load label
      # PushParam _tmp11
        subu $sp, $sp, 4      # decrement sp to make space for param
        sw $t0, 4($sp)        # copy param value to stack
      # LCall _PrintString
      # (save modified registers before flow of control change)
        sw $t0, -8($fp)       # spill _tmp11 from $t0 to $fp-8
        jal _PrintString      # jump to function
      # PopParams 4
        add $sp, $sp, 4       # pop params off stack
      # EndFunc
      # (below handles reaching end of fn body with no explicit return)
        move $sp, $fp         # pop callee frame off stack
        lw $ra, -4($fp)       # restore saved ra
        lw $fp, 0($fp)        # restore saved fp
        jr $ra                # return from function

      # VTable for class Cow
        .data
        .align 2
    Cow:                      # label for class Cow vtable
        .word _Cow.InitCow
        .word _Cow.Moo
        .text
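The dispatch sequence in Example 4 (load the vptr from offset 0 of the object, load a code address from a fixed slot of the vtable, then call through that address with the receiver as the first argument) can be imitated directly in C++. The sketch below is an analogy written for this purpose, not compiler output: Slot, kCowVTable, CallInitCow, CallMoo and gOutput are hypothetical names, and gOutput merely stands in for the Decaf Print built-in.

```cpp
#include <string>

struct Cow;
using Slot = void (*)();          // an untyped code address, like a .word entry

struct Cow {
    const Slot* vptr;             // offset 0: pointer to the class vtable
    int height;                   // offset 4
    int weight;                   // offset 8
};

std::string gOutput;              // collects what Print would have written

void Cow_InitCow(Cow* self, int h, int w) { self->height = h; self->weight = w; }
void Cow_Moo(Cow*) { gOutput += "Moo!\n"; }

// One shared table per class; slot order matches the method declaration order.
const Slot kCowVTable[] = {
    reinterpret_cast<Slot>(&Cow_InitCow),   // slot 0
    reinterpret_cast<Slot>(&Cow_Moo),       // slot 1
};

// Equivalent of: _tmp7 = *(betsy); _tmp8 = *(_tmp7); ACall _tmp8
void CallInitCow(Cow* obj, int h, int w) {
    auto fn = reinterpret_cast<void (*)(Cow*, int, int)>(obj->vptr[0]);
    fn(obj, h, w);
}

// Equivalent of: _tmp9 = *(betsy); _tmp10 = *(_tmp9 + 4); ACall _tmp10
void CallMoo(Cow* obj) {
    auto fn = reinterpret_cast<void (*)(Cow*)>(obj->vptr[1]);
    fn(obj);
}
```

A caller would build an object with `Cow betsy{kCowVTable, 0, 0};` (the counterpart of storing the Cow label into the first word of the newly allocated object) and then invoke CallInitCow(&betsy, -5, 22) and CallMoo(&betsy). The call sites never name Cow_InitCow or Cow_Moo, so a subclass with its own table could substitute different code in the same slots.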
