A PORTABLE COMPILER FOR THE LANGUAGE C

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

PREPARED FOR

NATIONAL SCIENCE FOUNDATION

ADVANCED RESEARCH PROJECTS AGENCY

MAY 1975

AD-A010 218

DISTRIBUTED BY:

National Technical Information Service

U. S. DEPARTMENT OF COMMERCE

MAC TR-

A PORTABLE COMPILER FOR THE LANGUAGE C

Alan Snyder

May 1975

Reproduced by NATIONAL TECHNICAL INFORMATION SERVICE, U.S. Department of Commerce, Springfield, VA 22151

Work reported herein was supported in part by the Bell Telephone Laboratories, Inc., the National Science Foundation Research Grant GJ-34671, IBM funds for research in Computer Science, and by the Advanced Research Projects Agency of the Department of Defense under ARPA order no. 2095, ARPA Contract Number N00014-70-A-0362-0006, and ONR Task No. NR-049-189.

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

PROJECT MAC

CAMBRIDGE, MASSACHUSETTS 02139

DISTRIBUTION STATEMENT A: Approved for public release; distribution unlimited.

A PORTABLE COMPILER FOR THE LANGUAGE C

by

Alan Snyder

ABSTRACT

This paper describes the implementation of a compiler for the programming language C. The compiler has been designed to be capable of producing assembly-language code for most register-oriented machines with only minor recoding. Most of the machine-dependent information used in code generation is contained in a set of tables which are constructed automatically from a machine description provided by the implementer. In the machine description, the implementer models the target machine by defining a machine-dependent abstract machine for which the code generator produces intermediate code. The abstract machine is abstract in that it is a C machine: its registers and memory are defined in terms of primitive C data types and its instructions perform basic C operations. The abstract machine is machine-dependent in that there is a close correspondence between the registers of the abstract machine and those of the target machine, and between the behavior of the abstract machine instructions and the corresponding target machine instructions or instruction sequences. The implementer defines the translation from an abstract machine program to a target machine program by providing in the machine description a set of simple macro definitions for the abstract machine instructions. In addition, macro definitions may be provided in the form of C routines where additional processing capability is needed.

This report is based on a thesis submitted to the Department of Electrical Engineering at the Massachusetts Institute of Technology on May 10, 1974 in partial fulfillment of the requirements for the Degrees of Bachelor of Science and Master of Science.

REFERENCES

FIGURE 1 The GCOS Control Cards

APPENDIX I The Machine Description

  1. Definition Statements
     1.1 The TYPENAMES Statement
     1.2 The REGNAMES Statement
     1.3 The MEMNAMES Statement
     1.4 The SIZE Statement
     1.5 The ALIGN Statement
     1.6 The CLASS Statement
     1.7 The CONFLICT Statement
     1.8 The SAVEAREASIZE Statement
     1.9 The POINTER Statement
     1.10 The OFFSETRANGE Statement
     1.11 The RETURNREG Statement
     1.12 The TYPE Statement
  2. The OPLOC Section
  3. The Macro Section

APPENDIX II The Intermediate Language: AMOPs

APPENDIX III The Intermediate Language: Keyword Macros

APPENDIX IV The HIS-6000 Machine Description

APPENDIX V The HIS-6000 C Routine Macro Definitions

APPENDIX VI Overall Description of the Compiler

  1. The Lexical Analysis Phase
  2. The Syntax Analysis Phase
  3. The Code Generation Phase
  4. The Macro Expansion Phase
  5. The Error Message Editor
  6. Invoking the Compiler Phases

1. Introduction

This paper describes the implementation of a compiler for the programming language C [1,2], an implementation language developed at Bell Laboratories and a descendant of the language BCPL [3]. The compiler has been designed to be capable of producing assembly-language code for most register-oriented machines with only minor recoding. Versions of the compiler exist for the Honeywell HIS- and Digital Equipment Corporation PDP-10 computers.

C is a procedure-oriented language. It has four primitive data types (integers, characters, and single- and double-precision floating-point), four data type constructors (pointers, arrays, functions, and records), and a small but convenient set of control structures which encourage goto-less programming. An important characteristic of C is the minimal run-time support needed. Although C supports recursive procedures, C does not have built-in functions, I/O statements, block structure, string operations, dynamic arrays, dynamic storage allocation, or run-time type checking. The only run-time data structure is the stack of procedure activation records. Of course, to run any useful programs, an interface to the operating system is required, and a standard set of I/O routines has been defined in order to encourage portability. But the implementation of these routines is optional and separate from the task of implementing a C compiler which produces code for a given machine.

The compiler described in this paper was designed to be portable, that is, to be capable of generating code for many target machines with a minimum of recoding. When considering portability, three classes of machines can be defined:

  1. Machines which can support C programs reasonably efficiently: This class of machines depends only upon one's interpretation of the term "reasonably efficiently." Clearly, all real machines can run C programs, limited only by some size constraint related to the availability of memory. However, the following capabilities are desirable: (1) the ability to access the current procedure activation record and the current argument list in a reentrant manner (this will require one or two base/index registers depending upon the calling sequence), (2) the ability to reference via a pointer variable (this will require another base/index register or an indirection facility), (3) character addressing, (4) integer arithmetic, and (5) floating-point arithmetic. Not all of the above capabilities need be present in the target machine; however, the more that are missing, the more interpretive becomes the execution of a C program. For example, the HIS-6000 is word-addressed; thus references to character variables are interpreted by a small run-time subroutine.

  2. Machines for which the compiler can produce reasonably efficient code: This class of machines is clearly a subset of the first class; the size of the subset is again determined by one's definition of reasonable. The better the correspondence between the target machine and the machine model implicit in the compiler, the better will be the object code produced. On the other hand, if the correspondence is poor, the compiler may be able to produce only threaded code or instructions to be interpreted by software.

  3. Machines which can support the compiler itself: Because the compiler is written in C, one may think that this class of machines is identical to the second class of machines; however, there are added restrictions which must be made in order to run the compiler on a given machine: the word size of the machine must be sufficient to hold all values used by the compiler; any implementation restriction on the size of procedures or data areas (as would be likely on the IBM S/360 because of addressing deficiencies) must not be such as to prohibit the proper execution of the compiler (this includes the ability of the compiler to compile itself). In addition, there are operating system and configuration restrictions: the memory size available to a program must be sufficient to hold the phases of the compiler; file space for the source of the compiler must be available and affordable; the I/O routines used by the compiler must be implemented. This class of machines is not a subset of the second class of machines since the compiler does not use all of the features of the language, notably floating-point.
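The remark in the first class about character references on a word-addressed machine can be illustrated with a short sketch. This is not the HIS-6000 run-time subroutine; it only shows how a byte address might be split into a word address and a byte position when the hardware addresses whole words. The figure of four bytes per word is taken from the report's later description of the HIS-6000; the 9-bit byte width, the leftmost-byte-first layout, and the names `load_char` and `pack_word` are assumptions made for illustration.

```python
BYTES_PER_WORD = 4   # the report gives four bytes per word for the HIS-6000
BITS_PER_BYTE = 9    # assumption: a 36-bit word split into four 9-bit bytes

def load_char(memory, byte_address):
    """Fetch one character code from a word-addressed memory (a list of ints)."""
    word_index, position = divmod(byte_address, BYTES_PER_WORD)
    word = memory[word_index]
    # Assumed layout: byte 0 occupies the most significant bits of the word.
    shift = (BYTES_PER_WORD - 1 - position) * BITS_PER_BYTE
    return (word >> shift) & ((1 << BITS_PER_BYTE) - 1)

def pack_word(codes):
    """Pack up to four character codes into one word, leftmost byte first."""
    word = 0
    for i in range(BYTES_PER_WORD):
        code = codes[i] if i < len(codes) else 0   # pad short words with zero
        word = (word << BITS_PER_BYTE) | code
    return word

memory = [pack_word([ord(c) for c in "ABCD"]),
          pack_word([ord(c) for c in "EF"])]
print(chr(load_char(memory, 2)))   # third byte of the first word
print(chr(load_char(memory, 5)))   # second byte of the second word
```

The extra division and shift on every character access is the kind of interpretive overhead the text refers to.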

This paper concentrates on the second class of machines, those for which the compiler can produce

1.2 Background

A compiler can be considered to consist of two logical phases, analysis and generation. The analysis phase performs lexical and syntactic analysis of the source program, producing as output some convenient internal representation of the program, along with a set of tables containing lexical information and other information derived from the declarative statements of the program. The generation phase then transforms the internal representation into an object language program, using the information contained in the tables produced by the analysis phase. One can confine the machine (object language) dependencies of a compiler to the generation phase by a suitable choice of internal representation, i.e. one which is machine-independent. On the other hand, it is not practical to also confine the source language dependencies of a compiler to the analysis phase since this would make the internal representation a universal language. Thus the generation phase of a compiler is both source-language-dependent and machine-dependent.

Most portable compilers require that the generation phase be completely rewritten for each target machine [7,8]. This effort may represent only about one-fifth of the effort needed to rewrite the entire compiler [8]. In the case of the BCPL compiler [9], for example, moving the compiler may require only three to four weeks under ideal conditions (but otherwise may require up to five months). However, it would be desirable if the amount of recoding necessary to generate code for a new machine could be reduced.

One approach is that advocated by Poole and Waite for writing portable programs [10,11]. They advocate that before writing a program to solve a particular problem, one define an abstract machine for which the program is then written. With this approach, in order to move the program to a new machine, one need only implement the abstract machine on the target machine, typically via a macro processor. The desired qualities of the abstract machine are that it contain operations and data objects convenient for expressing the problem solution, that it be sufficiently close to the target machines of interest so that acceptable code can easily be generated, and that the tools for implementing the abstract machine be easily obtainable on the target machines.

This technique can be applied to portable compilers by considering the problem to be the implementation of an arbitrary source language program. The operations and data objects convenient for expressing the problem solution are then those which are basic to the source language. With this technique, a compiler would be broken into two parts: a machine-independent translator from the source language to the abstract machine language and a machine-dependent translator from the abstract machine language to the target machine language. The translator from the abstract machine language to the target machine language should be smaller and simpler than the conventional generation phase would be; typically, it consists of a set of macro definitions which map each abstract machine instruction into the corresponding target machine instruction or instruction sequence. Moving the compiler to a new machine simply requires rewriting the macro definitions.

The major difficulty with the abstract machine approach to portable software is in determining the appropriate abstract machine. If the abstract machine is of a high level (i.e., very problem-oriented), then the program will be very portable but the implementation of the abstract machine will be difficult. On the other hand, if the abstract machine is of a low level (i.e., more machine-oriented), then, unless it corresponds closely to the target machine, either the code produced will be inefficient or the implementation will be complicated by optimization code.

The solution to this difficulty proposed by Poole and Waite is to define a hierarchy of abstract machines, ranging from a high-level problem-oriented abstract machine to a low-level, machine-oriented, and easy-to-implement abstract machine. In this solution, the higher-level abstract machines are implemented in terms of the lower-level abstract machines, and only the lowest-level abstract machine need be implemented on a target machine in order to transfer the program; once it is transferred, higher-level abstract machines may be implemented directly in terms of the target machine in order to improve efficiency. While this technique may be useful for transferring particular programs, it is unlikely that it will be acceptable in practical terms as a compilation technique because of the need for additional translation steps. An experiment by Brown [12] indicates that one may implement and then optimize a low-level abstract machine in about the same time as it takes to implement a higher-level abstract machine and that the resulting implementations are similarly efficient. Thus an alternative solution is to use a low-level abstract machine, but allow the implementer to optimize as desired; this solution is more likely to be acceptable as a compilation technique. A third solution will be advocated in this paper.

The technique of rewriting the generation phase requires that a non-trivial translator from the internal representation to the target machine language be written for each new target machine. Similarly, the abstract machine approach requires that a translator from the abstract machine language to the target machine language be written for each new target machine; if reasonably efficient code is desired and the abstract machine does not correspond very closely to the target machine, then this translator will also be non-trivial.

A more desirable goal for a portable compiler is that it have a generation phase which can be modified to produce code for a new target machine by a process which is largely automatic. Implicit in this goal is the requirement that the modification process obtain its knowledge about a target machine from a (non-procedural) description of the machine. An early effort in this direction was the SLANG system [13], which attacked the problem of describing a machine-dependent process (code generation) in a machine-independent way. In the SLANG system, source language constructs are translated into a set of basic operations called EMILs; the EMILs are translated into absolute machine code using macro definitions and instruction format definitions. The approach is similar to the abstract machine approach in that the EMILs can be considered to be the instructions of an abstract machine; the difference is that the code generation algorithm uses information contained in a machine description in order to tailor the EMIL program to the target machine. The EMILs differ from the instructions of a Poole and Waite abstract machine in that they are machine-oriented, rather than problem (source-language) oriented. In addition, the code generator does not seem to know about registers other than index registers, which implies that one will not be able to achieve the desired close correspondence between the abstract machine and most register-oriented machines. Nevertheless, the method of describing the instructions of a machine by providing simple instruction sequences which interpret the abstract machine instructions seems to be a good compromise between the desire to minimize coding and the difficulty of mathematically defining a machine and utilizing such a definition in generating code.

More recently, Miller [14] has explored the problem of constructing a code generator from a machine description. Miller proposes that a generation phase be constructed in two steps. In the first step, the language designer specifies the language-dependent part of the generation phase by writing a set of procedural machine-independent macro definitions for the operations of the internal representation produced by the analysis phase. These macro definitions define the operations of the internal representation, such as addition, in terms of machine-independent (i.e., language-oriented) primitives, such as integer addition, which are created by the language designer. In the second step, the implementer provides a description of the target machine which is used by an automatic code generation system named DMACS (Descriptive Macro System) in order to fill out the macro definitions of the first step and thereby produce a code generator for the target machine. As was the case with the SLANG system, the DMACS machine description defines the primitive operations by giving target machine code sequences which interpret them. In addition, however, the permitted locations of the operands (in terms of their being in memory or in particular registers) are specified as are the corresponding result locations. Thus the primitives can be made to correspond very closely to the instructions of the target machine so that the code sequences in the machine description are simpler and the resulting object code is more efficient.

Both the SLANG system and DMACS are intended to be general in that they are not designed for a specific source language. However, true generality is difficult to obtain and the systems do reflect preconceived notions about source languages. It is believed that, since there are much more significant variations among languages than among machines, a practical implementation of a compiler for any interesting language requires that the system be designed specifically for that language. This idea was recognized to some extent in DMACS where the primitives are created by the language designer as

A code generation algorithm, if it is to be machine-independent, requires a model of a machine with which to work. This model may express such notions as memory, registers, addressing, operations, and hardware data types. In the machine description, the implementer defines his target machine in terms of this model and also specifies the form of the object language. The class of machines for which the code generator can produce acceptable code directly corresponds to the generality of the machine model.

The machine model used by the C compiler is a C machine: a machine whose registers and memory are described in terms of the primitive C data types and whose operations are primitive C operations. The implementer models the target machine in terms of a C machine, producing an abstract machine. The abstract machine may be very similar to or very different from the target machine, depending upon how closely the target machine fits the machine model. The code generation algorithm, using its machine model, produces code for the abstract machine. The "assembly" language of the abstract machine is called the intermediate language; an intermediate language program, which is in the form of a series of macro calls, is translated into the target machine assembly language using a set of macro definitions, provided by the implementer in the machine description. Assembly language was chosen over machine language for the output of the compiler because it is far easier to describe and produce in a machine-independent manner than machine code or object modules.

The abstract C machine plays the same role in the C compiler as would a Poole and Waite abstract machine. The difference is that instead of there being one fixed abstract machine, there is a class of abstract machines, corresponding to the variability in the machine model. This variability allows the implementer to define a particular abstract machine which more closely resembles his target machine. The result is that the translation from the abstract machine language to the target machine language becomes simpler, and more efficient code is produced.

The process of modeling the target machine is described in chapter two. A detailed discussion of the code generation algorithm is presented in chapter three. Conclusions are presented in chapter four.

2. Modeling the Target Machine

The code generator's model of a machine is an abstract C machine, a machine whose instructions perform the primitive operations of the C language. The data types of the abstract machine are the primitive C data types (characters, integers, and single- and double-precision floating point), supplemented by one or more pointer classes which are distinguished by their ability to resolve addresses. The basic addressable unit of the abstract machine memory is the byte, which holds a single character value (characters are the smallest C data type). Values of the other abstract machine data types occupy an integral number of bytes, possibly aligned in larger units of memory. The abstract machine has a set of registers which may be used to hold the operands of the abstract machine instructions. Each abstract machine register is capable of holding values of some subset of the abstract machine data types. The instructions of the abstract machine are three-address instructions. Each address may specify an abstract machine register or a location in memory; the mechanisms for referencing a memory location correspond to the primitive addressing modes in C.
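As an illustration of the model just described, the following sketch represents a three-address abstract machine instruction whose addresses are either registers or memory locations. It is a minimal sketch for exposition only, not the compiler's internal representation: the class names `Reg`, `Mem`, and `Instruction`, the `show` helper, and the printed notation are all hypothetical. The operator names `+i` and `+d` are the type-qualified addition operators introduced later in the report.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Reg:
    name: str            # an abstract machine register

@dataclass(frozen=True)
class Mem:
    symbol: str          # a named memory location
    offset: int = 0      # byte offset, since the byte is the addressable unit

Address = Union[Reg, Mem]

@dataclass(frozen=True)
class Instruction:
    op: str              # a type-qualified operator
    first: Address       # first operand
    second: Address      # second operand
    result: Address      # where the result is placed

def show(instr):
    """Render an instruction in a made-up textual notation."""
    def fmt(addr):
        if isinstance(addr, Reg):
            return addr.name
        return addr.symbol if addr.offset == 0 else f"{addr.symbol}+{addr.offset}"
    return f"{instr.op} {fmt(instr.first)},{fmt(instr.second)},{fmt(instr.result)}"

add = Instruction("+i", Reg("a"), Mem("x"), Reg("a"))
print(show(add))
```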

In the machine description, the implementer describes the target machine in terms of this machine model by defining a particular abstract machine for which the code generator produces intermediate code. The implementer specifies the sizes and alignments of the primitive C data types and defines pointer classes as convenient. The implementer defines the abstract machine registers, which generally correspond to those registers of the target machine which are to be used in the evaluation of expressions. The implementer also specifies the registers which may hold values of each of the abstract machine data types. In addition, the implementer may specify that any two abstract machine registers conflict in the target machine, meaning that only one may hold a value at any one time. The implementer defines the abstract machine instructions in terms of their operand/result locations and possible side-effects on other registers. In addition, the implementer provides a set of macro definitions which implement the abstract machine instructions on the target machine.

2.1 The Intermediate Language

The intermediate language is the assembly language of the abstract machine. Using the information contained in the tables constructed from the machine description, the code generator produces a translation of the source program in the intermediate language. An intermediate language program consists of a sequence of macro calls, each of which is expanded into one or more object language statements using the macro definitions provided in the machine description. There are two types of macros in the intermediate language: The first type are macros which represent the three-address abstract machine instructions. The second type are keyword macros which correspond to either assembly-language pseudo-operations or instructions implementing the primitive C control structures.

2.1.1 Abstract Machine Instructions

The abstract machine instructions are three-address instructions which perform the evaluation of C expressions. The operators of the abstract machine instructions are called abstract machine operators (AMOPs); the addresses are called references (REFs).

2.1.1.1 AMOPs

AMOPs are basic C operations which are qualified by the specific abstract machine data types of their operands. For example, in the HIS-6000 implementation there are four AMOPs corresponding to the C operator '+':

+i     integer addition
+d     double-precision floating-point addition
+p0    addition of an integer to a pointer to a byte-aligned object
+p1    addition of an integer to a pointer to a word-aligned object
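The way a single C operator fans out into type-qualified AMOPs can be sketched as a small selection function. This is an illustrative sketch, not the compiler's algorithm: the function name `select_add_amop`, the spelling of types as strings, and the treatment of `int + pointer` by symmetry are assumptions, and any conversion of mixed operand types is assumed to have happened before selection.

```python
POINTER_CLASSES = {"p0", "p1"}   # byte pointers and word pointers

def select_add_amop(left_type, right_type):
    """Pick a type-qualified addition operator for the C operator '+'."""
    if left_type in POINTER_CLASSES and right_type == "int":
        return "+" + left_type
    if left_type == "int" and right_type in POINTER_CLASSES:
        return "+" + right_type      # assumption: int + pointer handled by symmetry
    if left_type == right_type == "int":
        return "+i"
    if left_type == right_type == "double":
        return "+d"
    # Assumption: mixed arithmetic types are converted before this point.
    raise ValueError(f"operands need conversion first: {left_type}, {right_type}")

print(select_add_amop("p1", "int"))
```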

macro      function

HEAD       produce header statements, if needed
ENTRY      define an entry point
EXTRN      define an external reference
INT        define an integer constant
CHAR       define a character constant
FLOAT      define a floating-point constant
NFLOAT     define a negative floating-point constant
DOUBLE     define a double-precision float constant
NDOUBLE    define a negative double-precision constant
ADCONn     define a class n pointer constant
STRCON     define a pointer referencing a string constant
EQU        define a symbol
ZERO       define an area of storage initialized to zero
STATIC     define a static variable
STRING     define the string constants
ALIGN      force an alignment of the location counter
LN         define a line-number symbol
LABCON     define a label constant
LABDEF     define an internal label
IDN        translate an internal identifier number into the corresponding assembler symbol
END        produce an end statement, if needed

PROLOG     produce the prolog code of a C function
EPILOG     produce the epilog code of a C function
CALL       produce a function call
RETURN     produce code for a return statement
GOTO       produce a jump to a label expression
LSWITCH    produce a switch jump (list version)
TSWITCH    produce a switch jump (table version)

The actual macro names which appear in an intermediate language program are abbreviations of the names listed above.

2.2 The Machine Description

The machine description is a "program" written in a special-purpose language from which the machine-dependent tables of the generation phase are constructed. The machine description has two functions: (1) it defines the particular abstract machine for which the code generator produces intermediate code, and (2) it specifies the translation from an intermediate language program to the corresponding object language program.

The abstract machine is defined in two sections of the machine description. First, a set of definition statements defines the registers and memory of the abstract machine. Second, in the OPLOC section, the AMOPs are defined in terms of their operand/result locations. The translation from the intermediate language to the object language is specified by a set of macro definitions in the macro section of the machine description. More information on the writing of a machine description may be found in Appendix I; the machine description used in the HIS-6000 implementation is listed in Appendix IV.

2.2.1 Defining the Abstract Machine

In the machine description, the implementer first defines the registers of the abstract machine. For example, the statement

regnames (x0,x1,x2,x3,x4,a,q,f);

defines the eight abstract machine registers used in the HIS-6000 implementation. The registers X0 through X4 correspond to the first five of eight HIS-6000 index registers, the A and Q correspond to the accumulators, and the F register is a fictitious floating-point accumulator which corresponds to the combined A, Q, and E (exponent) registers on the HIS-6000. The fact that the F register conflicts in the target machine with the A and Q registers is specified by the statement

conflict (a,f),(q,f);

The remaining HIS-6000 index registers are not represented in the abstract machine since it was not desired that they be used by the code generator in the evaluation of expressions; two of those registers hold "environment pointers," the other is used as a scratch register by some of the macro definitions. There is nothing that requires that the abstract machine registers be implemented as actual machine registers on the target machine; they may also be implemented as fixed memory locations.
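The effect of the regnames and conflict statements can be pictured as a symmetric conflict table consulted whenever a register is requested. The following is a sketch under assumptions, not the table format built from the machine description: the names `build_conflict_table` and `is_free` and the use of Python sets are hypothetical.

```python
REGNAMES = ["x0", "x1", "x2", "x3", "x4", "a", "q", "f"]   # from the regnames statement
CONFLICTS = [("a", "f"), ("q", "f")]                        # from the conflict statement

def build_conflict_table(regnames, conflicts):
    """Map each register to the set of registers it conflicts with."""
    table = {r: set() for r in regnames}
    for r1, r2 in conflicts:
        table[r1].add(r2)    # a conflict is symmetric,
        table[r2].add(r1)    # so record it in both directions
    return table

def is_free(register, busy, table):
    """A register is usable only if it and all its conflicting registers are idle."""
    return register not in busy and not (table[register] & busy)

table = build_conflict_table(REGNAMES, CONFLICTS)
busy = {"a"}                        # suppose A currently holds a value
print(is_free("f", busy, table))    # F overlaps A in the target machine
print(is_free("q", busy, table))    # Q does not conflict with A
```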

For convenience, the abstract machine registers can be gathered into classes; for example, in the HIS-6000 implementation, the statement

class x(x0,x1,x2,x3,x4), r(a,q);

defines the class of index registers X and the class of general registers R.

The implementer also defines the classes of abstract machine pointers. Pointer classes are necessary on machines which are not byte-addressed since pointers to byte-aligned objects will be handled differently than pointers to word-aligned objects. In the HIS-6000 machine description, the statement

pointer p0(1), p1(4);

defines the class P0 of byte pointers and the class P1 of word pointers. The "4" indicates that the value of a P1 pointer is always a multiple of four bytes. The fact that there are four bytes per word on the HIS-6000 is specified in the statement

size 1(char), 4(int,float), 8(double);

A similar statement is used to specify the alignment restrictions.
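To show how the size, alignment, and pointer statements could work together, the sketch below chooses a pointer class from an object's alignment and lays out a sequence of fields in bytes. It is illustrative only. The sizes and pointer granularities come from the statements above, but the alignment values are assumed (the alignment statement itself is not reproduced here), and the names `pointer_class_for` and `layout` are hypothetical.

```python
SIZE = {"char": 1, "int": 4, "float": 4, "double": 8}   # from the size statement
POINTER_CLASS = {"p0": 1, "p1": 4}                       # from the pointer statement
# Assumption: illustrative alignment values, not taken from the report.
ALIGN = {"char": 1, "int": 4, "float": 4, "double": 4}

def pointer_class_for(ctype):
    """Choose the coarsest pointer class whose granularity divides the alignment."""
    candidates = [(gran, name) for name, gran in POINTER_CLASS.items()
                  if ALIGN[ctype] % gran == 0]
    return max(candidates)[1]

def layout(fields):
    """Assign byte offsets to (name, type) fields, honoring alignment."""
    offset, offsets = 0, {}
    for name, ctype in fields:
        align = ALIGN[ctype]
        offset = (offset + align - 1) // align * align   # round up to the alignment
        offsets[name] = offset
        offset += SIZE[ctype]
    return offsets, offset   # field offsets and the first unused byte

print(pointer_class_for("char"), pointer_class_for("int"))
print(layout([("c", "char"), ("n", "int"), ("d", "double")]))
```

Under these assumed alignments, every `int` offset is a multiple of four, which is what allows a P1 pointer to reference it.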

The statement

type int(r), char(r), float(f), double(f), p0(r), p1(x);

defines the registers which can hold values of each of the abstract machine data types. For example, in the HIS-6000 implementation, word pointers are held in the index registers X while byte pointers are held in the general registers R.
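The class and type statements together amount to a lookup from a data type to its candidate registers. The sketch below is a simplified illustration, not the code generator's allocation strategy: the names `registers_for` and `pick_register` are hypothetical, the first-idle-register policy is an assumption, and register conflicts are ignored for brevity.

```python
CLASSES = {"x": ["x0", "x1", "x2", "x3", "x4"], "r": ["a", "q"]}   # class statement
TYPE_REGS = {"int": "r", "char": "r", "float": "f",                # type statement
             "double": "f", "p0": "r", "p1": "x"}

def registers_for(ctype):
    """Expand a type's entry into the list of registers that may hold it."""
    spec = TYPE_REGS[ctype]
    # A class name expands to its members; a plain register stands for itself.
    return CLASSES.get(spec, [spec])

def pick_register(ctype, busy):
    """Return the first idle register able to hold the type, or None."""
    for reg in registers_for(ctype):
        if reg not in busy:
            return reg
    return None   # the caller would have to free a register first

print(registers_for("p1"))
print(pick_register("int", {"a"}))
```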

The definition of the abstract machine is completed in the OPLOC section of the machine description where the implementer specifies the behavior of the abstract machine operations in terms of their operand/result locations. For example, the location definition

+d: f,M,f;

specifies that the AMOP '+d' (double-precision floating-point addition) can take its first operand in the F register and its second operand in any memory location and, under these circumstances, the result is placed in the F register. The construct on the right in the location definition is called an OPLOC; it consists of three location expressions, one for the first operand, second operand, and result (reading from


representation. For example, the macro definition for '+i' (integer addition) in the HIS-6000 implementation is

+i: " AD#R #S"

If the first operand location (which is also the result location) is the A register and the second operand is an external variable "X", then the code produced by this macro definition is

ADA X

which adds the contents of "X" to the A register. A macro definition can also contain character strings whose inclusion in the expansion of a macro call is conditional upon the locations of the operands and/or result. An example is the HIS-6000 macro definition for '<<' (left shift)

<<: (,intlit,): " #FLS %o(#'S)"
    (,-intlit,): " LXL5 #S
 #FLS 0,5"

which produces different code sequences depending upon whether or not the second operand (the number of bit-positions to shift) is an integer constant. A macro definition may include references to the arguments of the macro call using the character sequences #0, #1, ..., #9; a macro definition may also include embedded macro calls, such as the "%o(#'S)" in the last example, which returns the value of the integer constant.

A macro definition may also be specified in the form of a C routine. C routine macro definitions are used when processing is needed which is beyond the capabilities of the simple macro scheme so far described. C routine macro definitions may define global variables, perform arithmetic and logical operations, and select code sequences on conditions other than operand locations. In the present implementation, however, C routine macro definitions are unable to interact with the code generation algorithm. In the HIS-6000 implementation, C routine macro definitions are used to translate REFs into GMAP symbols, to translate the source language representations of identifiers and floating-point constants into GMAP, to define character string constants, and to buffer characters while defining storage for variables (GMAP does not have a byte location counter, as is assumed in the intermediate language). The C routine macro definitions used in the HIS-6000 implementation are listed in Appendix V.
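The argument substitution performed by a simple macro definition can be sketched in C. This is an illustration only: it assumes that '#' is the character which introduces an operand reference, that the letters F, S, and R stand for the first operand, second operand, and result, and the names `macro_args` and `expand` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Operand texts available to a macro body. */
struct macro_args {
    const char *first;   /* substituted for #F */
    const char *second;  /* substituted for #S */
    const char *result;  /* substituted for #R */
};

/* Expand `body` into `out` (capacity `cap`), substituting #F, #S, #R.
   Returns 0 on success, -1 if the expansion does not fit. */
static int expand(const char *body, const struct macro_args *a,
                  char *out, size_t cap)
{
    size_t n = 0;

    assert(body != NULL && a != NULL && out != NULL);
    if (cap == 0)
        return -1;
    while (*body) {
        const char *sub = NULL;
        if (body[0] == '#') {
            switch (body[1]) {
            case 'F': sub = a->first;  break;
            case 'S': sub = a->second; break;
            case 'R': sub = a->result; break;
            }
        }
        if (sub) {
            size_t len = strlen(sub);
            if (n + len >= cap)
                return -1;
            memcpy(out + n, sub, len);
            n += len;
            body += 2;          /* skip the marker and its letter */
        } else {
            if (n + 1 >= cap)
                return -1;
            out[n++] = *body++; /* ordinary character, copied as is */
        }
    }
    out[n] = '\0';
    return 0;
}
```

Expanding the body " AD#R #S" with the A register as result and "X" as second operand yields " ADA X", matching the integer addition example above.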

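The character buffering needed when the assembler has no byte location counter can be sketched as follows. This is not the routine from Appendix V; it assumes 36-bit words holding four 9-bit bytes, and the names `wordbuf`, `put_byte`, and `flush_word` are hypothetical. Completed words are collected in an array as a stand-in for emitting assembler statements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BYTES_PER_WORD 4    /* from the size statement */
#define BITS_PER_BYTE  9    /* assumption: 36-bit words, 9-bit bytes */
#define MAX_WORDS      16

struct wordbuf {
    uint64_t word;            /* bytes collected so far for this word */
    int      nbytes;          /* number of bytes held in `word` */
    uint64_t out[MAX_WORDS];  /* completed words */
    size_t   nout;
};

/* Emit the pending word, padding the unused low-order bytes with zeros. */
static void flush_word(struct wordbuf *b)
{
    if (b->nbytes == 0)
        return;
    assert(b->nout < MAX_WORDS);
    b->word <<= BITS_PER_BYTE * (BYTES_PER_WORD - b->nbytes);
    b->out[b->nout++] = b->word;
    b->word = 0;
    b->nbytes = 0;
}

/* Add one character; a word is emitted as soon as it is full. */
static void put_byte(struct wordbuf *b, unsigned c)
{
    b->word = (b->word << BITS_PER_BYTE) | (c & 0x1FFu);
    if (++b->nbytes == BYTES_PER_WORD)
        flush_word(b);
}
```

A caller would use `put_byte` for each character of a constant and call `flush_word` before defining any word-aligned object, so that a partially filled word is never left pending.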

3. Generating Code for an Abstract Machine

The most interesting part of the compiler is the code generator since, unlike most code generators which produce code for a fixed target language, the code generator of the C compiler is designed to produce code for a class of abstract machines.

3.1 Functions of the Code Generator

The code generation process consists of three fairly distinct functions. First, there is the generation of intermediate language statements to define and initialize static data areas and constants. Second, there is the translation of source language control structures into labels and branches. Third, there is the translation of source language expressions into sequences of abstract machine operations.

The C compiler is designed to produce assembly language code for conventional machines; thus, the intermediate language statements for defining and initializing static data areas directly correspond to assembly language statements which define symbols, define constants, and align the location counter. The only complication is that the code generator must use the size and alignment information from the machine description in order to specify the sizes and alignments of data areas. More information and redundancy could be added to the intermediate language in order to accommodate a larger class of target languages; see [16] for examples. Another possible improvement would be to emit segment-specifying instructions so that the output could be segregated into different segments according to whether it is code, pure data, impure data, or uninitialized data.

The process of translating source language control structures into labels and branches is rather straightforward. The only complications come when emitting conditional branches which test the value of an expression; these problems are covered in the next section.
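As an illustration of this translation, the following sketch lowers a while loop into labels and branches. The operation names, the `emitter` structure, and the use of integer ids for the condition and body trees are all hypothetical; the conditional branch is shown as a single operation, which sets aside the complications mentioned above.

```c
#include <assert.h>

enum op { OP_LABEL, OP_JUMP, OP_JUMP_IF_FALSE, OP_BODY };

struct insn {
    enum op op;
    int label;  /* label defined or branched to; -1 if unused */
    int tree;   /* id of a condition or body tree; -1 if unused */
};

#define MAX_CODE 64

struct emitter {
    struct insn code[MAX_CODE];
    int n;           /* instructions emitted so far */
    int next_label;  /* next unused label number */
};

static void emit(struct emitter *e, enum op op, int label, int tree)
{
    assert(e->n < MAX_CODE);
    e->code[e->n].op = op;
    e->code[e->n].label = label;
    e->code[e->n].tree = tree;
    e->n++;
}

/* while (cond) body   becomes
 *   top:   if cond is false, jump to done
 *          body
 *          jump to top
 *   done:
 */
static void gen_while(struct emitter *e, int cond, int body)
{
    int top = e->next_label++;
    int done = e->next_label++;

    emit(e, OP_LABEL, top, -1);
    emit(e, OP_JUMP_IF_FALSE, done, cond);
    emit(e, OP_BODY, -1, body);
    emit(e, OP_JUMP, top, -1);
    emit(e, OP_LABEL, done, -1);
}
```

Other control structures follow the same pattern: each allocates the labels it needs and emits branches to them around the code for its component statements.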

3.2 Generating Code for Expressions

The generation of code for expressions is the most difficult part of the problem. The code generator must generate a correct sequence of abstract machine instructions to carry out the indicated operations. The operand and result locations it specifies in the abstract machine instructions must conform to the location definitions provided in the machine description. Moreover, the code generator must keep track of the locations of all intermediate results and correctly administer the abstract machine registers and temporary locations.
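The conformance requirement can be made concrete with a small sketch that checks operand locations against the location definitions of an operation. The encoding of locations as bit sets and the names `oploc` and `match_oploc` are assumptions for illustration; the single table entry corresponds to the definition "+d: f,M,f;" from the machine description.

```c
#include <assert.h>
#include <stddef.h>

/* Kinds of location, one bit each, so that a set of acceptable
   locations can be written as a bitwise OR. */
enum { LOC_F = 1, LOC_R = 2, LOC_X = 4, LOC_M = 8 };

/* One OPLOC: acceptable locations for the first operand and the
   second operand, and the location of the result. */
struct oploc {
    unsigned op1, op2, result;
};

/* +d: f,M,f; */
static const struct oploc add_double[] = {
    { LOC_F, LOC_M, LOC_F },
};

/* Return the first OPLOC in `tab` that admits operands located at
   loc1 and loc2, or NULL if none does. */
static const struct oploc *match_oploc(const struct oploc *tab, size_t n,
                                       unsigned loc1, unsigned loc2)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if ((tab[i].op1 & loc1) && (tab[i].op2 & loc2))
            return &tab[i];
    return NULL;
}
```

When no OPLOC matches, as for two operands that are both in memory, a code generator would first have to move an operand into an acceptable location, for instance by loading the first operand into the F register.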

The generation of code for expressions is performed in two steps, semantic interpretation and code generation.

3.2.1 Semantic Interpretation

The code generator receives expressions in the form of syntax trees whose interior nodes are source language operators and whose leaf nodes are identifiers and constants. Thus, an expression can be considered to consist of a "top-level" operator along with zero or more operand expressions. The first step in the processing of an expression consists of translating a tree in this form to a more descriptive form whose interior nodes are AMOPs. This translation involves checking the data types of operands, inserting conversion operators where necessary, and choosing the appropriate AMOPs to express the semantics of the source language operators. The selection of an AMOP to replace a source language operator is based primarily on the data types of the operands. For example, on this basis, an addition operator may be translated into either integer addition, double-precision floating-point addition, or one of a number of pointer addition AMOPs. However, it is useful to be able to choose AMOPs also on the basis of what is provided in the machine description. The basic idea is that of defaults. If the semantics of a particular AMOP can be expressed in terms of a composition of more basic AMOPs, then the AMOP can be left undefined in the machine description; the code generator can use the equivalent composition of AMOPs instead. The advantage of having optional AMOPs is that the implementer need define one of
