Compiler Design: A Systems Perspective - Lecture Notes, Lecture notes of Compiler Design

These lecture notes provide an overview of the goals and structure of a compiler design course, with a focus on low-level and systems aspects. The course covers topics such as lexical analysis, grammars and parsing, type-checking, intermediate representations, static analysis, common optimizations, instruction selection, register allocation, code generation, and runtime organization. The course aims to develop software engineering and system building skills, and students will build several complete end-to-end compilers for successively more complex languages. The notes also discuss the importance of correctness in compiler design and the testing methods used to verify it.


Lecture Notes on

Compiler Design: Overview

15-411: Compiler Design

Frank Pfenning

Lecture 1

August 24, 2009

1 Introduction

This course is a thorough introduction to compiler design, focusing on more low-level and systems aspects rather than high-level questions such as polymorphic type inference or separate compilation. You will be building several complete end-to-end compilers for successively more complex languages, culminating in a mildly optimizing compiler for a safe variant of the C programming language to x86-64 assembly language. For the last project you will have the opportunity to optimize more aggressively, to implement a garbage collector, or retarget the compiler to an abstract machine.

In this overview we review the goals for this class and give a general description of the structure of a compiler. Additional material can be found in the optional textbook [App98, Chapter 1].

2 Goals

After this course you should know how a compiler works in some depth. In particular, you should understand the structure of a compiler, and how the source and target languages influence various choices in its design. It will give you a new appreciation for programming language features and the implementation challenges they pose, as well as for the actual hardware architecture and the runtime system in which your generated code executes. Understanding the details of typical compilation models will also make you a more discerning programmer.


You will also understand some specific components of compiler technology, such as lexical analysis, grammars and parsing, type-checking, intermediate representations, static analysis, common optimizations, instruction selection, register allocation, code generation, and runtime organization. The knowledge gained should be broad enough that if you are confronted with the task of contributing to the implementation of a real compiler in the field, you should be able to do so confidently and quickly.

For many of you, this will be the first time you have to write, maintain, and evolve a complex piece of software. You will have to program for correctness, while keeping an eye on efficiency, both for the compiler itself and for the code it generates. Because you will have to rewrite the compiler from lab to lab, and also because you will be collaborating with a partner, you will have to pay close attention to issues of modularity and interfaces. Developing these software engineering and system building skills is an important goal of this class, although we will rarely talk about them explicitly.

3 Compiler Requirements

As we will be implementing several compilers, it is important to understand which requirements compilers should satisfy. For each requirement, we discuss to what extent it is relevant to this course.

Correctness. Correctness is absolutely paramount. A buggy compiler is next to useless in practice. Since we cannot formally prove the correctness of your compilers, we use extensive testing. This testing is end-to-end, verifying the correctness of the generated code on sample inputs. We also verify that your compiler rejects programs as expected when the input is not well-formed (lexically, syntactically, or with respect to the static semantics), and that the generated code raises an exception as expected if the language specification prescribes this. We go so far as to test that your generated code fails to terminate (with a time-out) when the source program should diverge.

Emphasis on correctness means that we very carefully define the semantics of the source language. The semantics of the target language is given by the GNU assembler on the lab machines together with the semantics of the actual machine. In contrast to C, we try to make sure that as little as possible about the source language remains undefined. This is not just for testability, but also good language design practice since an unambigu-

L1.4 Compiler Design: Overview

x86-64 and ARM code), sometimes at very different levels of abstraction (for example, x86-64 assembly or LLVM intermediate code).

In this course we will deemphasize retargetability, although if you structure your compiler following the general outline presented in the next section, it should not be too difficult to retrofit another code generator. One of the options for the last lab in this course is to retarget your compiler to produce code for the low-level virtual machine (LLVM). Using the LLVM tools, you will then be able to produce efficient binaries for a variety of concrete machine architectures.
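To make the remark about retrofitting another code generator concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the course's design: the three-address tuple format, the two toy targets (a two-address pseudo-assembly over unlimited temporaries and an abstract stack machine), and all function names are hypothetical. Neither target is real x86-64 or LLVM syntax. The point is only that both back ends consume the same intermediate representation, so adding a target does not touch earlier phases.

```python
# Hypothetical three-address IR: (dest, op, left, right).
# Operands are integer constants or names of temporaries.
SAMPLE_IR = [("t0", "*", 2, 3), ("t1", "+", 1, "t0")]

MNEMONIC = {"+": "add", "*": "mul"}

def emit_register_style(ir, result):
    """Back end 1: two-address pseudo-assembly over unlimited temps."""
    out = []
    for dest, op, left, right in ir:
        out.append(f"mov {dest}, {left}")
        out.append(f"{MNEMONIC[op]} {dest}, {right}")
    out.append(f"ret {result}")
    return out

def emit_stack_style(ir, result):
    """Back end 2: code for an abstract stack machine."""
    def push(operand):
        # Constants are pushed directly; temporaries are loaded by name.
        return f"push {operand}" if isinstance(operand, int) else f"load {operand}"
    out = []
    for dest, op, left, right in ir:
        out += [push(left), push(right), MNEMONIC[op], f"store {dest}"]
    out += [push(result), "ret"]
    return out

# Adding a target means adding one entry here; the IR is unchanged.
BACK_ENDS = {"register": emit_register_style, "stack": emit_stack_style}

def generate(ir, result, target):
    return BACK_ENDS[target](ir, result)

def run_stack(program):
    """Tiny interpreter for the stack target, used to check the output."""
    stack, temps = [], {}
    for line in program:
        opcode, _, arg = line.partition(" ")
        if opcode == "push":
            stack.append(int(arg))
        elif opcode == "load":
            stack.append(temps[arg])
        elif opcode == "store":
            temps[arg] = stack.pop()
        elif opcode in ("add", "mul"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if opcode == "add" else left * right)
        elif opcode == "ret":
            return stack.pop()
```

For example, `generate(SAMPLE_IR, "t1", "stack")` yields stack code that `run_stack` evaluates to 7, while `generate(SAMPLE_IR, "t1", "register")` yields the register-style listing for the same computation.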

4 The Structure of a Compiler

Certain general common structures have arisen over decades of development of compilers. Many of these are based on experience and sound engineering principles rather than any formal theory, although some parts, such as parsers, are very well understood from the theoretical side. The overall structure of a typical compiler is shown in Figure 1.

In this course, we will begin by giving you the front and middle ends of a simple compiler for a very small language, and you have to write the back end, that is, perform instruction selection and register allocation. Consequently, Lectures 2 and 3 will be concerned with instruction selection and register allocation, respectively, so that you can write your own.

We then turn to the front end and follow through the phases of a compiler in order to complete the picture, while incrementally complicating the language features you have to compile. Roughly, we will proceed as follows, subject to adjustment throughout the course:

  1. A simple expression language
  2. Loops and conditionals
  3. Functions
  4. Structs and arrays
  5. Memory safety and basic optimizations

The last lab is somewhat open-ended and allows you to implement either further optimizations, a garbage collector, or a new back end which uses the low-level virtual machine (LLVM)^1.

(^1) See http://llvm.org


Figure 1: Structure of a typical compiler^2
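
As a rough illustration of the front end, middle end, and back end described in this section, the following Python sketch compiles a toy expression language with integer constants, `+`, `*`, and parentheses. Everything in it is an assumption made for illustration: the grammar, the tuple-based syntax tree, the three-address intermediate form, and the pseudo-assembly mnemonics (`mov`, `add`, `mul`, `ret`) are hypothetical and are not the course's lab language or real x86-64 syntax. Register allocation is omitted, so the output still uses an unlimited supply of temporaries.

```python
def lex(src):
    """Front end, step 1: turn characters into (kind, value) tokens."""
    tokens, i = [], 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif "0" <= c <= "9":
            j = i
            while j < len(src) and "0" <= src[j] <= "9":
                j += 1
            tokens.append(("num", int(src[i:j])))
            i = j
        elif c in "+*()":
            tokens.append((c, None))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {c!r}")
    return tokens

def parse(tokens):
    """Front end, step 2: recursive descent; '*' binds tighter than '+'."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        if peek() == "num":
            return ("num", advance()[1])
        if peek() == "(":
            advance()
            e = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return e
        raise SyntaxError("expected number or '('")

    def term():
        e = atom()
        while peek() == "*":
            advance()
            e = ("*", e, atom())
        return e

    def expr():
        e = term()
        while peek() == "+":
            advance()
            e = ("+", e, term())
        return e

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

def to_ir(tree):
    """Middle end: flatten the tree into (dest, op, left, right) tuples."""
    code = []

    def walk(node):
        if node[0] == "num":
            return node[1]  # constants are used directly as operands
        left, right = walk(node[1]), walk(node[2])
        dest = f"t{len(code)}"  # fresh temporary per instruction
        code.append((dest, node[0], left, right))
        return dest

    return code, walk(tree)

def select(code, result):
    """Back end: naive instruction selection into two-address pseudo-assembly."""
    out = []
    for dest, op, left, right in code:
        out.append(f"mov {dest}, {left}")
        out.append(f"{'add' if op == '+' else 'mul'} {dest}, {right}")
    out.append(f"ret {result}")
    return out

def compile_expr(src):
    code, result = to_ir(parse(lex(src)))
    return select(code, result)
```

Calling `compile_expr("1 + 2 * 3")` runs all phases in order. Each phase only depends on the data structure produced by the previous one, which is what makes it possible to hand out the front and middle ends and have the back end written separately.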

References

[App98] Andrew W. Appel. Modern Compiler Implementation in ML. Cambridge University Press, Cambridge, England, 1998.

[MHJM09] Michael Matz, Jan Hubička, Andreas Jaeger, and Mark Mitchell. System V application binary interface, AMD64 architecture processor supplement. Available at http://www.x86-64.org/documentation/abi.pdf, May 2009. Draft 0.99.

(^2) Thanks to David Koes for this diagram.