

























































































































































































Compiler construction brings together techniques from disparate parts of Computer Science. The compiler deals with many big-picture issues. At its simplest, a compiler is just a computer program that takes as input one potentially executable program and produces as output another, related, potentially executable program. As part of this translation, the compiler must perform syntax analysis to determine if the input program is valid. To map that input program onto the finite resources of a target computer, the compiler must manipulate several distinct name spaces, allocate several different kinds of resources, and synchronize the behavior of different run-time components. For the output program to have reasonable performance, the compiler must manage hardware latencies in functional units, predict the flow of execution and the demand for memory, and reason about the independence and dependence of different machine-level operations in the program.

Open up a compiler and you are likely to find greedy heuristic searches that explore large solution spaces, finite automata that recognize words in the input, fixed-point algorithms that help reason about program behavior, simple theorem provers and algebraic simplifiers that try to predict the values of expressions, pattern-matchers for both strings and trees that match abstract computations to machine-level operations, solvers for Diophantine equations and Presburger arithmetic used to analyze array subscripts, and techniques such as hash tables, graph algorithms, and sparse set implementations used in myriad applications.

The lore of compiler construction includes both amazing success stories about the application of theory to practice and humbling stories about the limits of what we can do. On the success side, modern scanners are built by applying the theory of regular languages to automatic construction of recognizers.
LR parsers use the same techniques to perform the handle recognition that drives a shift-reduce parser. Data-flow analysis (and its cousins) applies lattice theory to the analysis of programs in ways that are both useful and clever. Some of the problems that a compiler faces are truly hard; many clever approximations and heuristics have been developed to attack these problems.

On the other side, we have discovered that some of the problems that compilers must solve are quite hard. For example, the back end of a compiler for a modern superscalar machine must approximate the solution to two or more interacting NP-complete problems (instruction scheduling, register allocation, and, perhaps, instruction and data placement). These NP-complete problems, however, look easy next to problems such as algebraic reassociation of expressions. This problem admits a huge number of solutions; to make matters worse, the desired solution is somehow a function of the other transformations being applied in the compiler. While the compiler attempts to solve these problems (or approximate their solution), we constrain it to run in a reasonable amount of time and to consume a modest amount of space. Thus, a good compiler for a modern superscalar machine is an artful blend of theory, of practical knowledge, of engineering, and of experience.
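To make the size of the reassociation search space concrete, the following Python sketch (an illustration, not taken from the text) counts the ways to regroup a five-operand product such as w × 2 × x × y × z. It assumes only that multiplication is associative and commutative and ignores the cost of each shape; the function names are hypothetical.

```python
from itertools import combinations


def assoc_trees(operands):
    """Every full parenthesization of a product whose operand order is fixed
    (associativity only). For n operands there are Catalan(n - 1) of them."""
    if len(operands) == 1:
        return [operands[0]]
    trees = []
    for split in range(1, len(operands)):
        for left in assoc_trees(operands[:split]):
            for right in assoc_trees(operands[split:]):
                trees.append(f"({left} × {right})")
    return trees


def comm_assoc_trees(operands):
    """Every distinct product tree when operands may also be reordered
    (associativity plus commutativity). Mirror images are counted once
    because the first operand is always placed in the left subtree."""
    if len(operands) == 1:
        return [operands[0]]
    first, rest = operands[0], operands[1:]
    trees = []
    # The left subtree holds `first` plus k of the remaining operands;
    # the right subtree receives the others and is never empty.
    for k in range(len(rest)):
        for chosen in combinations(range(len(rest)), k):
            left_ops = [first] + [rest[i] for i in chosen]
            right_ops = [rest[i] for i in range(len(rest)) if i not in chosen]
            for left in comm_assoc_trees(left_ops):
                for right in comm_assoc_trees(right_ops):
                    trees.append(f"({left} × {right})")
    return trees


operands = ["w", "2", "x", "y", "z"]
print(len(assoc_trees(operands)))       # 14
print(len(comm_assoc_trees(operands)))  # 105
```

Even this tiny expression has 105 distinct shapes once operands may be reordered; for n distinct operands the count is (2n − 3)!!, before any interaction with other transformations is considered.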
This text attempts to convey both the art and the science of compiler construction. We have tried to cover a broad enough selection of material to show the reader that real tradeoffs exist, and that the impact of those choices can be both subtle and far-reaching. We have limited the material to a manageable amount by omitting techniques that have become less interesting due to changes in the marketplace, in the technology of languages and compilers, or in the availability of tools. We have replaced this material with a selection of subjects that have direct and obvious importance today, such as instruction scheduling, global register allocation, implementation of object-oriented languages, and some introductory material on analysis and transformation of programs.
The book is intended for use in a first course on the design and implementation of compilers. Our goal is to lay out the set of problems that face compiler writers and to explore some of the solutions that can be used to solve these problems. The book is not encyclopedic; a reader searching for a treatise on Earley’s algorithm or left-corner parsing may need to look elsewhere. Instead, the book presents a pragmatic selection of practical techniques that you might use to build a modern compiler.

Compiler construction is an exercise in engineering design. The compiler writer must choose a path through a decision space that is filled with diverse alternatives, each with distinct costs, advantages, and complexity. Each decision has an impact on the resulting compiler. The quality of the end product depends on informed decisions at each step of the way. Thus, there is no single right answer for these problems. Even within “well understood” and “solved” problems, nuances in design and implementation have an impact on both the behavior of the compiler and the quality of the code that it produces.

Many considerations play into each decision. As an example, the choice of an intermediate representation (IR) for the compiler has a profound impact on the rest of the compiler, from space and time requirements through the ease with which different algorithms can be applied. The decision, however, is given short shrift in most books (and papers). Chapter 6 examines the space of IRs and some of the issues that should be considered in selecting an IR. We raise the issue again at many points in the book, both directly in the text and indirectly in the questions at the end of each chapter.
Relate: Compiler construction is a complex, multifaceted discipline. The solutions chosen for one problem affect other parts of the compiler because they shape the input to subsequent phases and the information available in those phases. Current textbooks fail to clearly convey these relationships. To make students aware of these relationships, we expose some of them directly and explicitly in the context of practical problems that arise in commonly used languages. We present several alternative solutions to most of the problems that we address, and we discuss the differences between the solutions and their overall impact on compilation. We try to select examples that are small enough to be grasped easily, but large enough to expose the student to the full complexity of each problem. We reuse some of these examples in several chapters to provide continuity and to highlight the fact that several different approaches can be used to solve them. Finally, to tie the package together, we provide a couple of questions at the end of each chapter. Rather than providing homework-style questions that have algorithmic answers, we ask exam-style questions that try to engage the student in a process of comparing possible approaches, understanding the tradeoffs between them, and using material from several chapters to address the issue at hand. The questions are intended as a tool to make the reader think, rather than acting as a set of possible exercises for a weekly homework assignment. (We believe that, in practice, few compiler construction courses assign weekly homework. Instead, these courses tend to assign laboratory exercises that provide the student with hands-on experience in language implementation.)
Engineer: Legendary compilers, such as the Bliss-11 compiler or the Fortran-H compiler, have done several things well, rather than doing everything in moderation. We want to show the design issues that arise at each stage and how different solutions affect the resulting compiler and the code that it generates. For example, a generation of students studied compilation from books that assume stack allocation of activation records. Several popular languages include features that make stack allocation less attractive; a modern textbook should present the tradeoffs between keeping activation records on the stack, keeping them in the heap, and statically allocating them (when possible). When the most widely used compiler-construction books were written, most computers supported byte-oriented load and store operations. Several of them had hardware support for moving strings of characters from one memory location to another (the move character long instruction, MVCL). This simplified the treatment of character strings, allowing them to be treated as vectors of bytes (sometimes, with an implicit loop around the operation). Thus, compiler books scarcely mentioned support for strings. Some RISC machines have weakened support for sub-word quantities; the compiler must worry about alignment, and it may need to mask a character into a word using boolean operations. The advent of register-to-register load-store machines eliminated instructions like MVCL; today’s RISC machine expects the compiler to optimize such operations and work together with the operating system to perform them efficiently.
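Masking a character into a word can be illustrated with a small Python sketch of the boolean operations a compiler might emit when the target offers only word-sized loads and stores. It assumes a 32-bit word and numbers bytes from the least significant end (index 0); the real byte numbering depends on the target’s endianness, and the function names are hypothetical.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1  # 0xFFFFFFFF for the assumed 32-bit word


def insert_byte(word: int, byte: int, index: int) -> int:
    """Return `word` with byte position `index` replaced by `byte`.

    Mirrors a load-word / AND / OR / store-word sequence used when the
    hardware cannot store a single byte directly."""
    if not 0 <= index < WORD_BITS // 8:
        raise ValueError("byte index out of range for a 32-bit word")
    shift = 8 * index
    hole = 0xFF << shift                        # ones where the new byte goes
    cleared = word & (~hole & WORD_MASK)        # AND clears the old byte
    return cleared | ((byte & 0xFF) << shift)   # OR drops the new byte in


def extract_byte(word: int, index: int) -> int:
    """Shift the wanted byte down, then AND away the other bits."""
    return (word >> (8 * index)) & 0xFF


w = 0x11223344
print(hex(insert_byte(w, 0xAB, 1)))  # 0x1122ab44
print(hex(extract_byte(w, 1)))       # 0x33
```

On a machine with byte-addressed stores, each of these functions would collapse into a single instruction; without them, the compiler must also ensure that the word address it loads from is properly aligned.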
Trademark Notices
In the text, we have used the registered trademarks of several companies.
IBM is a trademark of International Business Machines, Incorporated.
Intel and IA-64 are trademarks of Intel Corporation.
370 is a trademark of International Business Machines, Incorporated.
MC68000 is a trademark of Motorola, Incorporated.
PostScript is a registered trademark of Adobe Systems.
PowerPC is a trademark of International Business Machines, Incorporated.
PDP-11 is a registered trademark of Digital Equipment Corporation, now a part of Compaq Computer.
Unix is a registered trademark of The Open Group.
VAX is a registered trademark of Digital Equipment Corporation, now a part of Compaq Computer.
Java is a trademark of Sun Microsystems, Incorporated.
We particularly thank the following people who provided us with direct and useful feedback on the form, content, and exposition of this book: Preston Briggs, Timothy Harvey, L. Taylor Simpson, Dan Wallach.
The role of computers in daily life is growing each year. Modern microprocessors are found in cars, microwave ovens, dishwashers, mobile telephones, GPS navigation systems, video games and personal computers. Each of these devices must be programmed to perform its job. Those programs are written in some “programming” language (a formal language with mathematical properties and well-defined meanings) rather than a natural language with evolved properties and many ambiguities. Programming languages are designed for expressiveness, conciseness, and clarity. A program written in a programming language must be translated before it can execute directly on a computer; this translation is accomplished by a software system called a compiler. This book describes the mechanisms used to perform this translation and the issues that arise in the design and construction of such a translator.

A compiler is just a computer program that takes as input an executable program and produces as output an equivalent executable program.
[Figure: source program → compiler → target program]
In a traditional compiler, the input language is a programming language and the output language is either assembly code or machine code for some computer system. However, many other systems qualify as compilers. For example, a typesetting program that produces PostScript can be considered a compiler. It takes as input a specification for how the document should look on the printed page and it produces as output a PostScript file. PostScript is simply a language for describing images. Since the typesetting program takes an executable specification and produces another executable specification, it is a compiler.

The code that turns PostScript into pixels is typically an interpreter, not a compiler. An interpreter takes as input an executable specification and produces as output the results of executing the specification.
[Figure: source program → interpreter → results]
Interpreters and compilers have much in common. From an implementation perspective, interpreters and compilers perform many of the same tasks. For example, both must analyze the source code for errors in either syntax or meaning. However, interpreting the code to produce a result is quite different from emitting a translated program that can be executed to produce the results. This book focuses on the problems that arise in building compilers. However, an implementor of interpreters may find much of the material relevant.

The remainder of this chapter presents a high-level overview of the translation process. It addresses both the problems of translation (what issues must be decided along the way) and the structure of a modern compiler (where in the process each decision should occur). Section 1.2 lays out two fundamental principles that every compiler must follow, as well as several other properties that might be desirable in a compiler. Section 1.3 examines the tasks that are involved in translating code from a programming language to code for a target machine. Section 1.4 describes how compilers are typically organized to carry out the tasks of translation.
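The contrast between interpreting and compiling can be sketched in Python for the product w × 2 × x × y × z that appears later in this chapter. The interpreter walks the expression and returns a value; the compiler only emits code for a hypothetical stack machine (the instruction names push, load, and mul are invented for illustration), and a separate run function stands in for the target machine that executes that code later.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional, Union


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, Mul]


def interpret(e: Expr, env: dict) -> int:
    """Interpreter: evaluate the expression directly and return its result."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    return interpret(e.left, env) * interpret(e.right, env)


def compile_expr(e: Expr, code: Optional[list] = None) -> list:
    """Compiler: translate the expression into stack-machine instructions.
    Nothing is evaluated here; variable values are not even available."""
    if code is None:
        code = []
    if isinstance(e, Num):
        code.append(("push", e.value))
    elif isinstance(e, Var):
        code.append(("load", e.name))
    else:
        compile_expr(e.left, code)
        compile_expr(e.right, code)
        code.append(("mul",))
    return code


def run(code: list, env: dict) -> int:
    """Stand-in for the target machine: execute previously compiled code."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        elif instr[0] == "load":
            stack.append(env[instr[1]])
        elif instr[0] == "mul":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction {instr!r}")
    return stack.pop()


# w × 2 × x × y × z, grouped left to right
expr = reduce(Mul, [Var("w"), Num(2), Var("x"), Var("y"), Var("z")])
env = {"w": 1, "x": 3, "y": 4, "z": 5}
program = compile_expr(expr)
print(interpret(expr, env))  # 120
print(run(program, env))     # 120
```

Both paths share the same front-end representation (the expression tree), which reflects the overlap described above; they differ in whether a result or a translated program comes out.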
Compilers are engineered objects: software systems built with distinct goals in mind. In building a compiler, the compiler writer makes myriad design decisions. Each decision has an impact on the resulting compiler. While many issues in compiler design are amenable to several different solutions, there are two principles that should not be compromised. The first principle that a well-designed compiler must observe is inviolable.
The compiler must preserve the meaning of the program being compiled
The code produced by the compiler must faithfully implement the “meaning” of the source-code program being compiled. If the compiler can take liberties with meaning, then it can always generate the same code, independent of input. For example, the compiler could simply emit a nop or a return instruction.
earlier. If the compile time can be kept small and the benefits are large, this strategy can produce noticeable improvements.
In each of these settings, the constraints on time and space differ, as do the expectations with regard to code quality.

The priorities and constraints of a specific project may dictate specific solutions to many design decisions or radically narrow the set of feasible choices. Some of the issues that may arise are:
Before reading the rest of this book, you should write down a prioritized list of the qualities that you want in a compiler. You might apply the ancient standard from software engineering—evaluate features as if you were paying for them with your own money! Examining your list will tell you a great deal about how you would make the various tradeoffs in building your own compiler.
To gain a better understanding of the tasks that arise in compilation, consider what must be done to generate executable code for the following expression:
w ← w × 2 × x × y × z.
Let’s follow the expression through compilation to see what facts must be discovered and what questions must be answered.
1.3.1 Understanding the Input Program
The first step in compiling our expression is to determine whether or not
w ← w × 2 × x × y × z
is a legal sentence in the programming language. While it might be amusing to feed random words to an English-to-Italian translation system, the results are unlikely to have meaning. A compiler must determine whether or not its input constitutes a well-constructed sentence in the source language. If the input is well-formed, the compiler can continue with translation, optimization, and code generation. If it is not, the compiler should report back to the user with a clear error message that isolates, to the extent possible, the problem with the sentence.
Syntax: In a compiler, this task is called syntax analysis. To perform syntax analysis efficiently, the compiler needs:
Here, the symbol → reads “derives” and means that an instance of the right-hand side can be abstracted to the left-hand side. By inspection, we can discover the following derivation for our example sentence.
Rule   Prototype Sentence
—      sentence
1      subject verb object period
2      noun verb object period
5      noun verb modifier noun period
6      noun verb adjective noun period
At this point, the prototype sentence generated by the derivation matches the abstract representation of our input sentence. Because they match, at this level of abstraction, we can conclude that the input sentence is a member of the language described by the grammar. This process of discovering a valid derivation for some stream of tokens is called parsing.

If the input is not a valid sentence, the compiler must report the error back to the user. Some compilers have gone beyond diagnosing errors; they have attempted to correct errors. When an error-correcting compiler encounters an invalid program, it tries to discover a “nearby” program that is well-formed. The classic game to play with an error-correcting compiler is to feed it a program written in some language it does not understand. If the compiler is thorough, it will faithfully convert the input into a syntactically correct program and produce executable code for it. Of course, the results of such an automatic (and unintended) transliteration are almost certainly meaningless.
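The derivation in the table can be reproduced mechanically by a small recursive-descent recognizer. Only rules 1, 2, 5, and 6 are visible in this excerpt, so this Python sketch implements just those four, read directly off the table (1: sentence → subject verb object period; 2: subject → noun; 5: object → modifier noun; 6: modifier → adjective). The lexicon that maps words to parts of speech is hypothetical. The parser records each rule as it applies it, so its output is the derivation itself; it raises SyntaxError when no derivation exists.

```python
# Hypothetical lexicon: the parser sees only these categories, never the words.
LEXICON = {
    "compilers": "noun",
    "tomatoes": "noun",
    "are": "verb",
    "engineered": "adjective",
    "objects": "noun",
    ".": "period",
}


def categorize(words):
    """Replace each word by its part of speech (a stand-in for a scanner)."""
    return [LEXICON[w.lower()] for w in words]


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.derivation = []  # rule numbers, in the order they are applied

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, category):
        if self.peek() != category:
            raise SyntaxError(
                f"expected {category} at token {self.pos}, found {self.peek()}")
        self.pos += 1

    def sentence(self):  # rule 1: sentence → subject verb object period
        self.derivation.append(1)
        self.subject()
        self.expect("verb")
        self.obj()
        self.expect("period")

    def subject(self):   # rule 2: subject → noun
        self.derivation.append(2)
        self.expect("noun")

    def obj(self):       # rule 5: object → modifier noun
        self.derivation.append(5)
        self.modifier()
        self.expect("noun")

    def modifier(self):  # rule 6: modifier → adjective
        self.derivation.append(6)
        self.expect("adjective")


def parse(tokens):
    """Return the rules of a derivation for `tokens`, or raise SyntaxError."""
    p = Parser(tokens)
    p.sentence()
    if p.peek() is not None:
        raise SyntaxError(f"unexpected {p.peek()} after the end of the sentence")
    return p.derivation


print(parse(categorize("Compilers are engineered objects .".split())))  # [1, 2, 5, 6]
print(parse(categorize("Tomatoes are engineered objects .".split())))   # [1, 2, 5, 6]
```

Both sentences produce the same derivation because the recognizer consults only parts of speech, which is the limitation the following paragraph takes up.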
Meaning: A critical observation is that syntactic correctness depended entirely on the parts of speech, not the words themselves. The grammatical rules are oblivious to the difference between the noun “compiler” and the noun “tomatoes”. Thus, the sentence “Tomatoes are engineered objects.” is grammatically indistinguishable from “Compilers are engineered objects.”, even though they have significantly different meanings. Understanding the difference between these two sentences requires contextual knowledge about both compilers and vegetables.

Before translation can proceed, the compiler must determine that the program has a well-defined meaning. Syntax analysis can determine that the sentences are well-formed, at the level of checking parts of speech against grammatical rules. Correctness and meaning, however, go deeper than that. For example, the compiler must ensure that names are used in a fashion consistent with their declarations; this requires looking at the words themselves, not just at their syntactic categories. This analysis of meaning is often called either semantic analysis or context-sensitive analysis. We prefer the latter term, because it emphasizes the notion that the correctness of some part of the input, at the level of meaning, depends on the context that both precedes it and follows it.

A well-formed computer program specifies some computation that is to be performed when the program executes. There are many ways in which the expression
w ← w × 2 × x × y × z
might be ill-formed, beyond the obvious, syntactic ones. For example, one or more of the names might not be defined. The variable x might not have a value when the expression executes. The variables y and z might be of different types that cannot be multiplied together. Before the compiler can translate the expression, it must also ensure that the program has a well-defined meaning, in the sense that it follows some set of additional, extra-grammatical rules.
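The checks just listed can be sketched in Python against a hypothetical symbol table that records, for each name, its declared type and whether it has a value at this point in the program. Unlike the parser, these checks look at the names themselves. As a simplifying assumption, the type check only asks whether each operand has a type in an assumed set of multipliable types (int and float here) rather than checking each pair of operands; a real language definition supplies its own rules.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    type_name: str   # declared type, e.g. "int", "float", "string" (illustrative)
    has_value: bool  # True if the name is assigned before this expression runs


NUMERIC = {"int", "float"}  # assumed: only these types may be multiplied


def check_operands(names, symbols):
    """Return error messages for the names used as factors in a product."""
    errors = []
    for name in names:
        sym = symbols.get(name)
        if sym is None:
            errors.append(f"{name} is not declared")
            continue
        if not sym.has_value:
            errors.append(f"{name} may not have a value")
        if sym.type_name not in NUMERIC:
            errors.append(
                f"{name} has type {sym.type_name}, which cannot be multiplied")
    return errors


symbols = {
    "w": Symbol("int", True),
    "x": Symbol("int", False),
    "y": Symbol("float", True),
    "z": Symbol("string", True),
}
# The names in w × 2 × x × y × z; the literal 2 needs no lookup.
for message in check_operands(["w", "x", "y", "z"], symbols):
    print(message)
```

With these hypothetical declarations, the expression parses correctly but fails two context-sensitive checks: x may not have a value, and z has a type that cannot take part in a product.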
Compiler Organization: The compiler’s front end performs the analysis to check for syntax and meaning. For the restricted grammars used in programming languages, the process of constructing a valid derivation is easily automated. For efficiency’s sake, the compiler usually divides this task into lexical analysis, or scanning, and syntax analysis, or parsing. The equivalent skill for “natural” languages is sometimes taught in elementary school. Many English grammar books teach a technique called “diagramming” a sentence: drawing a pictorial representation of the sentence’s grammatical structure. The compiler accomplishes this by applying results from the study of formal languages [1]; the problems are tractable because the grammatical structure of programming languages is usually more regular and more constrained than that of a natural language like English or Japanese.

Inferring meaning is more difficult. For example, are w, x, y, and z declared as variables and have they all been assigned values previously? Answering these questions requires deeper knowledge of both the surrounding context and the source language’s definition. A compiler needs an efficient mechanism for determining if its inputs have a legal meaning. The techniques that have been used to accomplish this task range from high-level, rule-based systems through ad hoc code that checks specific conditions.

Chapters 2 through 5 describe the algorithms and techniques that a compiler’s front end uses to analyze the input program and determine whether it is well-formed, and to construct a representation of the code in some internal form. Chapter 6 and Appendix B explore the issues that arise in designing and implementing the internal structures used throughout the compiler. The front end builds many of these structures.
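The division of labor between scanning and parsing can be sketched for the example statement. In this Python sketch, a regular-expression scanner groups characters into classified words, and a small parser checks the statement shape NAME ← operand (× operand)*, where an operand is a name or a number. The token names and that statement shape are assumptions made for illustration, not a grammar taken from the text.

```python
import re

# Token classes for the example statement (assumed for illustration).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"←"),
    ("TIMES", r"×"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Scanner: turn characters into (category, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_assignment(tokens):
    """Parser: check NAME ← operand (× operand)* and return (target, operands)."""
    def operand(i):
        if i < len(tokens) and tokens[i][0] in ("NAME", "NUMBER"):
            return tokens[i][1], i + 1
        raise SyntaxError(f"expected a name or number at token {i}")

    if len(tokens) < 2 or tokens[0][0] != "NAME" or tokens[1][0] != "ASSIGN":
        raise SyntaxError("expected NAME ← at the start of the statement")
    first, i = operand(2)
    operands = [first]
    while i < len(tokens):
        if tokens[i][0] != "TIMES":
            raise SyntaxError(f"expected × at token {i}")
        value, i = operand(i + 1)
        operands.append(value)
    return tokens[0][1], operands


print(parse_assignment(scan("w ← w × 2 × x × y × z")))
# ('w', ['w', '2', 'x', 'y', 'z'])
```

The scanner works only with characters and the parser works only with token categories; neither asks whether w, x, y, and z are declared or have values, which is left to the context-sensitive checks described above.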
1.3.2 Creating and Maintaining the Runtime Environment
Our continuing example concisely illustrates how programming languages provide their users with abstractions that simplify programming. The language defines a set of facilities for expressing computations; the programmer writes code that fits a model of computation implicit in the language definition. (Implementations of QuickSort in Scheme, Java, and Fortran would, undoubtedly, look quite different.) These abstractions insulate the programmer from low-level details of the computer systems they use. One key role of a compiler is to put in place mechanisms that efficiently create and maintain these illusions. For example, assembly code is a convenient fiction that allows human beings to read and write short mnemonic strings rather than numerical codes for operations;