









The Object Oriented Pre-Compiler
Programming Smalltalk 80 Methods in C Language
Dr. Brad J. Cox 1
ITT Programming Technology Center 1000 Oronoque Lane Stratford, Connecticut 06497
The ITT Programming Environment is being developed as part of a company-wide effort to accomplish an order of magnitude increase in programming productivity during the 1980's. It is a programming environment in which all participants in all phases of the programming life-spiral employ both conventional solitary tools to enhance their personal productivity, and a new class of facility, coordination tools 4, to address the coordination problems which develop when large numbers of individuals need to cooperate on a common task. Coordination involves movement of abstract data types (program releases, status reports, schedules, contracts, plans) among a set of concurrent organizational roles. This is difficult to model on conventional operating systems because of at least two separate but related problems:
(1) Concurrency: Conventional operating systems hide concurrency from the user, while coordination means that the user must deal with concurrency in a controlled manner. Objects, as defined by OOPC and Smalltalk, are sequential concepts which provide no new leverage on concurrency.
(2) Computing Barriers: A prime purpose of a conventional operating system is to erect a barrier between its users, so that each operates in a private address space protected from interference (and contributions) from the others. To move data across this barrier, it must be purged of address-dependencies (converted into a stream of bytes) and moved to some inter-process communication medium such as a shared file.

1 Now at Schlumberger-Doll Research, Old Quarry Road, PO Box 307, Ridgefield, Connecticut 06877, (203) 431-5000.
2 Smalltalk 80 is a trademark of Xerox Corporation. See Byte Magazine, August 1981, for full information.
3 UNIX is a trademark of Bell Laboratories. See The Bell System Technical Journal, Vol 57, 1978 for full information.
4 This name was invented by Dr. Anatol Holt, whose work in Operations Science provides the theory behind much of this effort. See Net Models of Organizational Systems - In Theory and Practice, Ansatze Zur Organisations Theorie, Rechnergestuetzter Informations Systeme, P. Oldenbourg, Muenchen. 1979.
SIGPLAN Notices, V18, #1, January 1983
The Object-Oriented Precompiler
Our interest in object languages began when we realized that the concept could provide a general technique for moving arbitrary data types transparently across computing barriers (memory to disk, networks, human interface). For example, transparent motion across the memory/disk barrier would mean the disk could be used as a shared repository of custom data types which the user deals with uniformly as objects. This is in contrast to conventional operating systems, where the programmer is required to produce ad-hoc code which maps his data types onto two disk data types that are predefined by the operating system (flat files and directories).
OOPC was our first implementation of this thought, which shares with Smalltalk 80 the limitation that objects are named uniquely in a personal address space, so that object sharing is possible only in the laborious sense described here. A more ambitious design will also be mentioned which should allow objects to reside primarily on a shared disk and move automatically into memory when needed there. If this proves adequately efficient and general, it will be used to build an experimental operating system whose disk will automatically handle any data type its users define.
3. SodaMachine Simulation
OOPC syntax is primarily that of C language, and is less interesting than its semantics. So I'll describe it by way of a simple example; a simulation of a soda vending machine. We'll view this machine first from "outside" as seen by its users, and later from the "inside" as seen by its implementor.
External View
We speak of OOPC programmers writing classes, not programs. Compiling a class creates two new objects which define (1) the behavior of a factory for producing instances of that class and (2) the behavior of each instance 5. Only these two objects exist initially, and only the factory object is addressable, as the compiler publishes its ID (memory address) in the C global variable SodaMachine. The other will become accessible once the factory produces some instances. Figure 1 shows the OOPC source file that defines the SodaMachine class.

The SodaMachine factory responds to a comb:model: message by creating a new instance of SodaMachine (according to specifications we provide as arguments) and replying its ID. For example, the following code commands the SodaMachine factory to create a SodaMachine with 4 can storage racks and the door's combination lock set to 23, and saves the ID of the new machine in the variable mySodaMachine.

    mySodaMachine = {| SodaMachine, "comb:model:", 23, 4 |};

The {| ... |} notation indicates messaging, which is implemented with a C subroutine which locates the object in memory based on its ID, determines its class, and searches the class to locate the method 6 which handles that message. Messages consist of at least two arguments; the receiver, which provides the ID of the receiving object, and the selector, which specifies the action expected of it. OOPC implements the object ID as the address of the object in memory, and the selector as a string of characters. By analogy with Smalltalk, selectors are usually chosen as the string of keywords that describe the arguments.

The ID returned by the factory represents a full-fledged new object, which can itself respond to messages. For example, SodaMachines are delivered unlocked, so we can immediately load some cans into two of its four racks. The cans are obtained here from a Can factory (not shown):

    {| mySodaMachine, "load:this:", 0, {| Can, "new" |} |};
    {| mySodaMachine, "load:this:", 1, {| Can, "new" |} |};

We test the machine by locking its door and retrieving the two cans with coins manufactured by a Coin factory.

    {| mySodaMachine, "lock" |};
    {| mySodaMachine, "pay:", {| Coin, "new" |} |};
    can0 = {| mySodaMachine, "button:", 0 |};
    {| mySodaMachine, "pay:", {| Coin, "new" |} |};
    can1 = {| mySodaMachine, "button:", 1 |};

OOPC does not provide automatic garbage collection, so the programmer is responsible for releasing unused objects. They are managed on a heap by the standard UNIX malloc() function.

5 Careful of the ambiguity that lurks here, as the word class is commonly used in connection with three entirely different things: (1) the file which defines both the factory and instance behaviors, (2) the object which contains the factory methods, which the Xerox group calls the metaclass, and (3) the object which contains the instance methods. I've introduced the new word factory to relieve the confusion somewhat, but I continue to use class to mean both (1) and (3).
6 Method is the Smalltalk word meaning subroutine. I sometimes use behavior as a synonym.
An additional set of variables is not declared explicitly, as they are created at run-time: indexed instance variables. Methods for accessing them are inherited from Object; they are allocated when the object is created via new: and are accessed by At: to read the ith indexed variable, and At:Put: to write it 7. Any object may have any number of indexed instance variables; they immediately follow the regular variables in memory. Indexed instance variables will be used here to hold the IDs of Stack objects which implement the can storage racks.
Although the SodaMachine simulation does not do this, it is also possible to append additional factory variables, which are accessible to methods of the factory and its instances and which provide a particularly well-controlled form of global variable. They obey the same inheritance rules as instance variables. For example, the SodaMachine factory variables are all inherited from Object:
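The memory layout just described (named variables first, indexed variables immediately after) can be sketched in present-day C. This is only an illustration under stated assumptions: the names Obj, obj_new, obj_at and obj_at_put are hypothetical stand-ins for new:, At: and At:Put:, and the header fields are not OOPC's actual object header.

```c
#include <assert.h>
#include <stdlib.h>

typedef long WRD;              /* one machine word; an assumption for this sketch */

/* Hypothetical object: a small header followed by all variable slots. */
typedef struct {
    WRD nfixed;                /* number of named (regular) instance variables */
    WRD nindexed;              /* number of indexed instance variables */
    WRD slots[];               /* named variables first, indexed ones right after */
} Obj;

/* Rough analogue of "new:": room for nfixed named slots plus nindexed indexed slots. */
Obj *obj_new(WRD nfixed, WRD nindexed)
{
    assert(nfixed >= 0 && nindexed >= 0);
    Obj *o = calloc(1, sizeof(Obj) + (size_t)(nfixed + nindexed) * sizeof(WRD));
    if (o != NULL) {
        o->nfixed = nfixed;
        o->nindexed = nindexed;
    }
    return o;
}

/* Rough analogue of "At:": read the i-th indexed variable. */
WRD obj_at(const Obj *o, WRD i)
{
    assert(i >= 0 && i < o->nindexed);
    return o->slots[o->nfixed + i];
}

/* Rough analogue of "At:Put:": write the i-th indexed variable. */
void obj_at_put(Obj *o, WRD i, WRD value)
{
    assert(i >= 0 && i < o->nindexed);
    o->slots[o->nfixed + i] = value;
}
```

A machine with four named variables and two racks would be created with obj_new(4, 2); the rack IDs then live in slots[4] and slots[5], directly behind the named variables.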
SodaMachine Factory Variables
OBJsize        Size of this object in words.
OBJclass       The class of this object.
OBJfields      The number of instance variables in each SodaMachine instance.
OBJsuper       The SodaMachine superclass, Object.
OBJname        A C string containing the name of the class, "SodaMachine".
OBJmethoddict  The address of the method lookup table. This table is currently not held as an object, but as a C structure array.
OBJtypes       A C string that encodes the types of each instance variable. Used to assist in object save/restore.
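To make the role of the type string concrete, here is a hedged sketch of how a save routine might walk it. The generated code for SodaMachine carries the string "wwoo" for its four instance variables (two WRD, two OBJ); reading 'w' as "word" and 'o' as "object ID" is my assumption, and the function name write_fields and its output format are invented for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef long WRD;   /* one machine word; an assumption for this sketch */

/*
 * Format each slot according to its type code.
 * 'w' slots are printed by value; any other code is treated as an object ID
 * and printed as a deferred reference number instead of a raw address.
 * Returns the number of deferred object references.
 */
int write_fields(const char *types, const WRD *slots, char *out, size_t outsz)
{
    int deferred = 0;
    size_t used = 0;

    assert(outsz > 0);
    out[0] = '\0';
    for (size_t i = 0; types[i] != '\0'; i++) {
        int n;
        if (types[i] == 'w') {
            n = snprintf(out + used, outsz - used, "w=%ld;", (long)slots[i]);
        } else {
            n = snprintf(out + used, outsz - used, "o=#%d;", deferred);
            deferred++;
        }
        if (n < 0 || (size_t)n >= outsz - used)
            break;              /* output buffer exhausted: stop early */
        used += (size_t)n;
    }
    return deferred;
}
```

The point of the sketch is only that address-dependent slots can be recognized from the type string and replaced by address-independent references while plain words are copied through.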
Factory Methods
The methods prefixed by <+> are executed in response to messages sent directly to the ID in SodaMachine, and describe the factory which produces the instances; or more generally, the business of the class as a whole.
7 A set of C macros is also provided to allow speedy access to indexed instance variables.
For example, the SodaMachine factory recognizes a comb:model: message to mean that it is to produce a machine with a specified combination in its lock, and a specified number of racks for dispensing drinks. The <WRD> prefix on the arguments warns OOPC that they are both C integers and not the default type <OBJ> (object ID). The statement

    self = {| self, "new:", anInt |};

reassigns self to the reply of a new: anInt message. Self originally holds the ID of the SodaMachine factory, and is being changed here to point to a newly created instance. Reassigning self in this manner is a common idiom, especially in factory methods which create an instance and then initialize its variables.

Now note

    combination = aCombination;

and the fact that combination is an instance variable. This statement modifies the combination variable in the instance whose ID is now in self. OOPC always assumes that self contains the ID of a valid instance, and addresses instance variables by generating C structure member expressions. A similar technique is used to reference factory variables and method arguments.

As in the real world, factories seldom build instances out of raw materials but usually subcontract the work to other classes; another way to reuse code. For example, the SodaMachine factory uses stacks as subcomponents to handle the coinbox and can racks, so it requests a Stack factory to provide them. The number of can racks may differ for each SodaMachine, so indexed instance variables are used to record their IDs.

    locked = FALSE;
    coinbox = {| Stack, "new" |};
    for (i = 0; i < anInt; i++)
        {| self, "At:Put:", i, {| Stack, "new" |} |};
Instance Methods
Instance methods, introduced in the figure by <-> markers, behave exactly like factory methods, except the ID describes an instance rather than a factory. These methods should be fairly obvious, as they parallel the kinds of behaviors we expect of real drink machines. There are methods for locking and unlocking the machine, inserting coins, and selecting drinks, and code that enforces a protocol on their legal use; for example, to get the machine to deliver a drink in response to pressing the drink selection button one must first have inserted a coin into the coinslot.
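The protocol enforcement mentioned above can be sketched in plain C without any messaging. This is a simplified illustration, not the OOPC source: the Machine structure, the single rack, and the rule that a second coin is refused while one is already in the slot are all assumptions made for the sketch. Only the coin-before-button rule and loading while unlocked come from the description.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CANS 8

/* Hypothetical, single-rack stand-in for a SodaMachine instance. */
typedef struct {
    int locked;            /* nonzero once the door is locked */
    int coin_in_slot;      /* nonzero when a coin is waiting in the coinslot */
    int coins_in_box;      /* coins already collected */
    int cans[MAX_CANS];    /* can IDs, used as a stack */
    int ncans;
} Machine;

void machine_init(Machine *m)
{
    m->locked = 0;         /* machines are delivered unlocked */
    m->coin_in_slot = 0;
    m->coins_in_box = 0;
    m->ncans = 0;
}

/* Loading is only legal while the door is unlocked. Returns 1 on success. */
int machine_load(Machine *m, int can)
{
    if (m->locked || m->ncans == MAX_CANS)
        return 0;
    m->cans[m->ncans++] = can;
    return 1;
}

void machine_lock(Machine *m)
{
    m->locked = 1;
}

/* Assumption: a second coin is refused while one is still in the slot. */
int machine_pay(Machine *m)
{
    if (m->coin_in_slot)
        return 0;
    m->coin_in_slot = 1;
    return 1;
}

/* Delivers a can ID only if a coin was inserted first; otherwise returns 0. */
int machine_button(Machine *m)
{
    if (!m->coin_in_slot || m->ncans == 0)
        return 0;
    m->coin_in_slot = 0;
    m->coins_in_box++;
    return m->cans[--m->ncans];
}
```

Pressing the button without paying simply returns 0, which mirrors the FALSE reply in the generated button: method shown in Figure 3.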
The SodaMachine is a simple example, but it does show some of the power of programming with objects. The ideas shown here are equally useful in building programming abstractions like Stacks, Queues, and Lists, and then reusing them in ever larger entities like Documents, Programs, Releases, Spoolers, Schedules, etc. Programming becomes a matter of defining new abstract data types, whose internals can be completely hidden from their user, who only needs to know an object's behavioral repertoire to use it successfully, and not such implementation details as its internal data structures.

Each object is like a tiny machine (a virtual machine) which contains its own states (its variables) and behaviors (its methods), and one tends to design applications such that each object models a corresponding object in the real world. Programs designed in this way are often easier to understand because of the close parallel with physical objects.

However OOPC/Smalltalk objects differ from physical ones in several fundamental ways which become critical in building coordination systems. Contrary to many people's expectations, objects do not operate concurrently, as the restriction is imposed that the sender sleeps while the receiver is awake. Objects bring no new leverage to the problem of concurrent computation, not because messaging couldn't be redefined to allow a sender to continue after a message, but because the object concept brings no new ideas to help the programmer control the resulting concurrency. So OOPC or Smalltalk are no different with regard to concurrency than C or assembly language. They're all sequential languages, which may of course be used to write non-sequential programs like UNIX or the Smalltalk Environment.

The treatment of coins in the SodaMachine example should also warn that the object concept is not in itself sufficient for developing large coordination systems for multiple role players whose interests may be in conflict. The coins are in no sense transferred to the machine, and the user could easily "attach a string" to the coin in a local variable, and have his way with it (spending it twice, for example). This problem arises out of the efficiency concerns that caused us to use ID's (pointers) to refer to objects, rather than the objects themselves.
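The "string attached to the coin" problem is ordinary pointer aliasing, and a few lines of C are enough to show it. The Coin and Till types and the till_pay function below are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int cents; } Coin;

/* Hypothetical receiver of payments; it only remembers a pointer to the coin. */
typedef struct {
    int credit;
    Coin *last;     /* the ID (pointer) of the most recent coin */
} Till;

/* "Paying" copies the coin's ID, not the coin: the payer still holds it too. */
void till_pay(Till *t, Coin *c)
{
    assert(t != NULL && c != NULL);
    t->last = c;
    t->credit += c->cents;
}
```

A caller that keeps its own pointer to one Coin can pass it to till_pay on two different tills, and both will record the credit and hold the same ID. Nothing in the pointer-based object model prevents this.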
Messaging
Messaging differs from conventional subroutine calls only in that the actual subroutine address is chosen at run time rather than compile/link time 8. The subroutine to perform this action is located at runtime by searching for the selector in a method dictionary stored in the receiver's class, or one of its superclasses. OOPC turns the {| ... |} notations directly into calls on a C subroutine, msg(), which performs the lookup and calls the method, supplying the receiver as a special argument named self where it will be used as the origin for addressing instance variables.

OOPC currently manages selectors as C strings, so the innermost loop of the message lookup involves a costly string compare. It is straightforward to eliminate this inefficiency, but I've not done it yet because the speed has been adequate for our needs and I value debugging ease more at this stage. Messaging currently costs 100-150 microseconds on our VAX/780, including several overhead functions (assertion tests, debugging options, etc), compared to VAX subroutine times of 30-50 microseconds (the variance in these numbers arises in the UNIX profiling tools).
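The lookup described above can be sketched as follows. This is not the msg() routine itself, whose source is not shown here; it is a minimal sketch that assumes a NULL-terminated dictionary of selector/function pairs per class and a fixed method signature. All names (Slot, Class, lookup, msg_send, Base, Counter) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Object Object;
typedef long (*Method)(Object *self, long arg);

/* One dictionary entry: a selector string and the subroutine that handles it. */
typedef struct { const char *selector; Method method; } Slot;

typedef struct Class {
    const struct Class *super;   /* NULL at the root of the hierarchy */
    const Slot *dict;            /* terminated by a NULL selector */
} Class;

struct Object { const Class *cls; long value; };

/* Search the receiver's class, then each superclass, comparing selector strings. */
Method lookup(const Class *cls, const char *selector)
{
    for (; cls != NULL; cls = cls->super)
        for (const Slot *s = cls->dict; s->selector != NULL; s++)
            if (strcmp(s->selector, selector) == 0)
                return s->method;
    return NULL;
}

/* Dispatch: find the method at run time and call it with the receiver as self. */
long msg_send(Object *receiver, const char *selector, long arg)
{
    Method m = lookup(receiver->cls, selector);
    assert(m != NULL);           /* unrecognized message */
    return m(receiver, arg);
}

/* Two tiny demonstration classes: Counter inherits "value" from Base. */
static long get_value(Object *self, long arg) { (void)arg; return self->value; }
static long add(Object *self, long arg) { self->value += arg; return self->value; }

static const Slot base_dict[] = { { "value", get_value }, { NULL, NULL } };
static const Slot counter_dict[] = { { "add:", add }, { NULL, NULL } };

const Class Base = { NULL, base_dict };
const Class Counter = { &Base, counter_dict };
```

The strcmp in the inner loop is the costly string compare referred to above. Replacing selector strings with unique addresses or small integers would reduce it to a pointer or integer comparison, at some cost in debugging convenience.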
It is possible to guarantee that all objects will recognize a basic repertoire of commands, often by simply inheriting this repertoire from primitive classes like Object. For example, every OOPC object knows how to respond to a write: method, since a default write: method is defined in the Object class. This method uses information accumulated by the precompiler to determine the type of each instance variable, and issues instructions to an instance of Filer (supplied as an argument) which prints each data type in a standard format. Variables containing object ID's are handled by building a table of objects referenced but not written, and writing the objects referenced in this table when the filer is closed. From the point of view of the user, the whole process looks like this:
8 Bjarne Stroustrup (Classes: An Abstract Data Type Facility for the C Language, SIGPLAN Notices, January 1982) built a similar precompiler implementing Simula objects which incur no run time costs. I've not used this approach because it requires that the class of each target object be known at compile time.
    #include "obj.h"
    static STR _file = "Soda.c";
    #line 1 "Soda.c"
    /* Soda machine simulation: an example in OOPC programming */
    #define TRUE 1
    #define FALSE 0
    typedef struct { /* Object */
        WRD NSTsize; CLS NSTclass;
        /* SodaMachine */
        WRD locked; WRD combination; OBJ coinslot; OBJ coinbox;
    } _NST;
    typedef struct { /* Object */
        WRD OBJsize; CLS OBJclass; WRD OBJfields; CLS OBJsuper;
        STR OBJname; OBJ OBJmethoddict; STR OBJtypes;
        /* SodaMachine */
    } _CLS;
    extern _MTA SodaMachine; extern _CLS _SodaMachine;
    extern _MTA *SodaMachine, Object, Object, _Object;
    #line 10 "Soda.c"
    static OBJ comb_m0(self, _a) register OBJ self; /* comb:model: */
    REG struct { WRD aCombination; WRD anInt; } *_a;
    {
    #line 12 "Soda.c"
        extern OBJ Stack; WRD i;
        self = msg(_file, 0, self, "new:", _a->anInt);
        self->combination = _a->aCombination;
        self->locked = FALSE;
        self->coinbox = msg(_file, 0, Stack, "new");
        for (i = 0; i < _a->anInt; i++)
            msg(_file, 0, self, "At:Put:", i, msg(_file, 0, Stack, "new"));
        return (self);
    }

    Deletions, to save space

    static OBJ button0(self, _a) register OBJ self; /* button: */
    REG struct { WRD anInt; } *_a;
    {
    #line 48 "Soda.c"
        if (self->coinslot) {
            msg(_file, 0, self->coinbox, "push:", self->coinslot);
            self->coinslot = 0;
            return (msg(_file, 0, msg(_file, 0, self, "At:", _a->anInt), "pop"));
        } else return (FALSE);
    }
    static SLT _clsDict[] = { "comb:model:", comb_m0, 0 };
    static SLT _nstDict[] = {
        "combination:", combin0, "unlock:", unlock0, "lock", lock0,
        "load:this:", load_t0, "pay:", pay_0, "button:", button0, 0 };
    _MTA SodaMachine = {
        sizeof(_MTA)/sizeof(WRD), &Object, sizeof(_CLS)/sizeof(WRD),
        &Object, "_SodaMachine", &_clsDict, "" };
    _CLS _SodaMachine = {
        sizeof(_CLS)/sizeof(WRD), &SodaMachine, sizeof(_NST)/sizeof(WRD),
        &Object, "SodaMachine", &_nstDict, "wwoo" };
    _MTA *SodaMachine = &SodaMachine;
Figure 3: Soda.c: OOPC-generated C code
One goal of the ITT Programming Environment is to encourage extensive reuse of code, and OOPC is likely to find a place in production shops because of this ability alone. Inheritance, and the ability to reuse sub-objects, increase one's ability to deal with implementation details such that new projects can build upon earlier work for significant time/cost savings. For example, in building a toy application (a simulation of a bucket brigade as an example of a simple concurrent organization), I produced 104 lines of throw-away code and 117 lines of code that could be saved and reused in future concurrent simulations, a reusability ratio of 52%! I seldom manage above 10% with C code.
Objects also affect the way we do software design, by offering a new alternative to the traditional functional vs. data design decomposition choices: decomposition into virtual machines which describe both data and its behavior in a single integrated package. Programs designed in this way are often more understandable, because they closely parallel the real-world entities being modeled. However the technique is currently limited by the lack of graphical design notations comparable to those for conventional decomposition (Nassi-Schneiderman Diagrams, Dataflow Diagrams, Jackson Diagrams, etc) 11.

11 See Structured Analysis and System Specification by Tom De Marco (Yourdon), or Principles of Program Design by M. A. Jackson (Academic Press).

For example, compilers are traditionally designed around an expression tree of typed nodes, whose structure is known throughout the compiler. An object-oriented design would instead have each grammar production build computational objects (instances of classes written to handle each computational entity: variables, operators, statements, etc), so that every level of the tree would become an object. Compilation would then involve requesting the top level object to do things: "generate code" (in a compiler), "evaluate" (in an interpreter), "print" (compiler debugging aid), "save" (in a multi-phase compiler), etc. The expression tree still exists inside these objects, but the computations on it are arranged in a more manageable fashion, as methods attached to every node object.

But its greatest value was to demonstrate the possibility of operating systems where objects live primarily on a shared disk and are moved automatically into memory when needed there. OOPC is capable of this only in the limited sense described earlier under "Object Save/Restore"; a laborious and costly operation because every address-dependent object ID must be located and translated to an address-independent form whenever objects move across the memory/disk barrier. We've begun implementing a more ambitious version that uses address-independent ID's throughout. These ID's uniquely identify objects globally across a single UNIX system, and are similar in that respect to inode numbers which UNIX uses to identify files. Since objects are only accessed via messaging, the message routine is modified to dynamically load non-resident objects into memory as they're needed there. Object saving is much simpler, since the objects can now be moved with simple I/O operations, but implementation is complicated by the fact that "the UNIX way of doing things" is infringed and time is lost "reinventing the wheel". For example, dynamic loading of a class implies that new methods must be linked into the run-time image dynamically, and many UNIX tools weren't designed for this 12.

A User class has recently been developed which provides a simple Smalltalk interpreter which will be used to replace the UNIX file-oriented command interpreter, manipulating objects instead of files. Other programs can be salvaged intact by encapsulating them as methods of classes whose indexed instance variables hold the file contents; for example, the C compiler could be encapsulated as an instance method in class CSourceFile.

The Smalltalk 80 programming environment has impressive advantages over OOPC because the object metaphor is adhered to at all levels of the system. However we now believe that our interest in coordination systems might be better served by foregoing the Smalltalk personal computer environment and its focus on programming as a solitary activity:
(1) Operations on primitive data types are handled in C rather than messaging, so OOPC programs can be almost as efficient as C programs, particularly since the programmer is free to recode costly operations directly in C. By foregoing some of the consistency Smalltalk offers, we gain machine power that can be spent on new notions like transparent motion of objects across computing barriers.
(2) OOPC is quite close to C, so it is readily accepted by programmers used to conventional languages. It is portable to any operating system that has a C compiler, and allows the programmer to mix conventional programming techniques with object techniques. Much of our work involves a fluent mixture of C, OOPC, LEX, YACC and other UNIX components.
(3) It may be possible to develop similar precompilers for languages other than C, although this could prove challenging in languages that don't provide free use of pointers (Fortran) and insist on strong type-checking (Pascal).

12 Jim Kleckner at U.C. Berkeley has supplied code that should make this possible.

The OOPC system is remarkably small and easy to modify, so it is convenient for experimenting with new concepts. The precompiler consists of 3608 source lines, counting comments, header files, and Makefile, and links to a size of 50232 bytes, much of which is reusable code from the runtime library (7590 source lines).
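The demand-loading scheme described earlier for address-independent ID's (the message routine bringing non-resident objects into memory when they are first needed) can be sketched roughly as below. This is a guess at the general shape only, not the design under implementation: the fixed-size resident table, the GlobalID type, and the loader callback standing in for disk I/O are all assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TABLE_SIZE 16

typedef unsigned long GlobalID;              /* address-independent object ID */
typedef struct { GlobalID id; long payload; } Obj;
typedef Obj *(*Loader)(GlobalID id);         /* stands in for reading the object from disk */

static Obj *resident[TABLE_SIZE];            /* NULL means "not in memory yet" */
static int load_attempts;                    /* how often the loader was invoked */

/* Translate an ID to a memory address, loading the object on first use. */
Obj *resolve(GlobalID id, Loader load)
{
    assert(id < TABLE_SIZE);
    if (resident[id] == NULL) {
        resident[id] = load(id);
        load_attempts++;
    }
    return resident[id];
}

/* Hypothetical loader used for demonstration: fabricates the object instead of reading it. */
Obj *fake_disk_load(GlobalID id)
{
    Obj *o = malloc(sizeof *o);
    if (o != NULL) {
        o->id = id;
        o->payload = (long)id * 10;
    }
    return o;
}
```

In such a scheme every message send would call something like resolve on the receiver's ID before looking up the method, so the first message to an object pays for the load and later messages find it resident.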
7. Acknowledgements
I'd like to thank Adele Goldberg and Peter Deutch of Xerox PARC for stimulating discussions comparing this approach with that of Smalltalk, and Stoney Ballard for invaluable technical assistance. And a special word of thanks to all those whose support and encouragement made this work possible, particularly Tom Love, Jack Grimes, and Anatol Holt.
Figure 4: OOPC Data Structures (panels: Factories, Classes, ObjectIDs)