Hints for Computer System Design (^1)
Butler W. Lampson
Computer Science Laboratory Xerox Palo Alto Research Center Palo Alto, CA 94304
Studying the design and implementation of a number of computer systems has led to some general hints for system design. They are described here and illustrated by many examples, ranging from hardware such as the Alto and the Dorado to application programs such as Bravo and Star.
Designing a computer system is very different from designing an algorithm:
The external interface (that is, the requirement) is less precisely defined, more complex, and more subject to change. The system has much more internal structure, and hence many internal interfaces. The measure of success is much less clear.
The designer usually finds himself floundering in a sea of possibilities, unclear about how one choice will limit his freedom to make other choices, or affect the size and performance of the entire system. There probably isn’t a ‘best’ way to build the system, or even any major part of it; much more important is to avoid choosing a terrible way, and to have clear division of responsibilities among the parts.
I have designed and built a number of computer systems, some that worked and some that didn’t. I have also used and studied many other systems, both successful and unsuccessful. From this experience come some general hints for designing successful systems. I claim no originality for them; most are part of the folk wisdom of experienced designers. Nonetheless, even the expert often forgets, and after the second system [6] comes the fourth one.
Disclaimer: These hints are none of the following: novel (with a few exceptions), foolproof recipes, laws of system design or operation, precisely formulated, consistent, always appropriate, approved by all the leading experts, or guaranteed to work.
(^1) This paper was originally presented at the 9th ACM Symposium on Operating Systems Principles and appeared in Operating Systems Review 15, 5, Oct. 1983, pp. 33-48. The present version is slightly revised.
They are just hints. Some are quite general and vague; others are specific techniques which are more widely applicable than many people know. Both the hints and the illustrative examples are necessarily oversimplified. Many are controversial.
I have tried to avoid exhortations to modularity, methodologies for top-down, bottom-up, or iterative design, techniques for data abstraction, and other schemes that have already been widely disseminated. Sometimes I have pointed out pitfalls in the reckless application of popular methods for system design.
The hints are illustrated by a number of examples, mostly drawn from systems I have worked on. They range from hardware such as the Ethernet local area network and the Alto and Dorado personal computers, through operating systems such as the SDS 940 and the Alto operating system and programming systems such as Lisp and Mesa, to application programs such as the Bravo editor and the Star office system and network servers such as the Dover printer and the Grapevine mail system. I have tried to avoid the most obvious examples in favor of others which show unexpected uses for some well-known methods. There are references for nearly all the specific examples but for only a few of the ideas; many of these are part of the folklore, and it would take a lot of work to track down their multiple sources.
And these few precepts in thy memory Look thou character.
It seemed appropriate to decorate a guide to the doubtful process of system design with quotations from Hamlet. Unless otherwise indicated, they are taken from Polonius’ advice to Laertes (I iii 58-82). Some quotations are from other sources, as noted. Each one is intended to apply to the text which follows it.
Each hint is summarized by a slogan that when properly interpreted reveals the essence of the hint. Figure 1 organizes the slogans along two axes:
Why it helps in making a good system: with functionality (does it work?), speed (is it fast enough?), or fault-tolerance (does it keep working?).
Where in the system design it helps: in ensuring completeness, in choosing interfaces, or in devising implementations.
Fat lines connect repetitions of the same slogan, and thin lines connect related slogans.
The body of the paper is in three sections, according to the why headings: functionality (section 2), speed (section 3), and fault-tolerance (section 4).
The most important hints, and the vaguest, have to do with obtaining the right functionality from a system, that is, with getting it to do the things you want it to do. Most of these hints depend on the notion of an interface that separates an implementation of some abstraction from the clients who use the abstraction. The interface between two programs consists of the set of assumptions that each programmer needs to make about the other program in order to demonstrate the correctness of his program (paraphrased from [5]). Defining interfaces is the most important part of system design. Usually it is also the most difficult, since the interface design must satisfy three conflicting requirements: an interface should be simple, it should be complete, and it should admit a sufficiently small and fast implementation.
We are faced with an insurmountable opportunity. (W. Kelley)
When an interface undertakes to do too much its implementation will probably be large, slow and complicated. An interface is a contract to deliver a certain amount of service; clients of the interface depend on the contract, which is usually documented in the interface specification. They also depend on incurring a reasonable cost (in time or other scarce resources) for using the interface; the definition of ‘reasonable’ is usually not documented anywhere. If there are six levels of abstraction, and each costs 50% more than is ‘reasonable’, the service delivered at the top will miss by more than a factor of 10.
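The factor of 10 follows from compounding: each level is built on the one below, so the excess costs multiply rather than add. With six levels, each 50% over the reasonable cost:

```latex
(1 + 0.5)^{6} = 1.5^{6} = \frac{729}{64} \approx 11.4 > 10
```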
KISS: Keep It Simple, Stupid. (Anonymous)
If in doubt, leave it out. (Anonymous)
Exterminate features. (C. Thacker)
On the other hand,
Everything should be made as simple as possible, but no simpler. (A. Einstein)
Thus, service must have a fairly predictable cost, and the interface must not promise more than the implementer knows how to deliver. Especially, it should not promise features needed by only a few clients, unless the implementer knows how to provide them without penalizing others. A better implementer, or one who comes along ten years later when the problem is better understood, might be able to deliver, but unless the one you have can do so, it is wise to reduce your aspirations.
For example, PL/1 got into serious trouble by attempting to provide consistent meanings for a large number of generic operations across a wide variety of data types. Early implementations tended to handle all the cases inefficiently, but even with the optimizing compilers of 15 years later, it is hard for the programmer to tell what will be fast and what will be slow [31]. A language like Pascal or C is much easier to use, because every construct has a roughly constant cost that is independent of context or arguments, and in fact most constructs have about the same cost.
Of course, these observations apply most strongly to interfaces that clients use heavily, such as virtual memory, files, display handling, or arithmetic. It is all right to sacrifice some performance for functionality in a seldom used interface such as password checking, interpreting user commands, or printing 72 point characters. (What this really means is that though the cost must still be predictable, it can be many times the minimum achievable cost.) And such cautious rules don’t apply to research whose object is learning how to make better implementations. But since research may well fail, others mustn’t depend on its success.
Algol 60 was not only an improvement on its predecessors, but also on nearly all its successors. (C. Hoare)
Examples of offering too much are legion. The Alto operating system [29] has an ordinary read/write-n-bytes interface to files, and was extended for Interlisp-D [7] with an ordinary paging system that stores each virtual page on a dedicated disk page. Both have small implementations (about 900 lines of code for files, 500 for paging) and are fast (a page fault takes one disk access and has a constant computing cost that is a small fraction of the disk access time, and the client can fairly easily run the disk at full speed). The Pilot system [42], which succeeded the Alto OS, follows Multics and several other systems in allowing virtual pages to be mapped to file pages, thus subsuming file input/output within the virtual memory system. The implementation is much larger (about 11,000 lines of code) and slower (it often incurs two disk accesses to handle a page fault and cannot run the disk at full speed). The extra functionality is bought at a high price.
This is not to say that a good implementation of this interface is impossible, merely that it is hard. This system was designed and coded by several highly competent and experienced people. Part of the problem is avoiding circularity: the file system would like to use the virtual memory, but virtual memory depends on files. Quite general ways are known to solve this problem [22], but they are tricky and easily lead to greater cost and complexity in the normal case.
And, in this upshot, purposes mistook Fall’n on th’ inventors’ heads. (V ii 387)
Another example illustrates how easily generality can lead to unexpected complexity. The Tenex system [2] has the following innocent-looking combination of features:
It reports a reference to an unassigned virtual page by a trap to the user program.
A system call is viewed as a machine instruction for an extended machine, and any reference it makes to an unassigned virtual page is thus similarly reported to the user program.
Large arguments to system calls, including strings, are passed by reference.
There is a system call CONNECT to obtain access to another directory; one of its arguments is a string containing the password for the directory. If the password is wrong, the call fails after a three-second delay, to prevent guessing passwords at high speed.
CONNECT is implemented by a loop of the form
    for i := 0 to Length(directoryPassword) - 1 do
        if directoryPassword[i] ≠ passwordArgument[i] then
            Wait three seconds; return BadPassword
        end if
    end loop;
    connect to directory; return Success
The following trick finds a password of length n in 64n tries on the average, rather than 128^n/2 (Tenex uses 7-bit characters in strings). Arrange the passwordArgument so that its first character is the last character of a page and the next page is unassigned, and try each possible character as the first. If CONNECT reports BadPassword, the guess was wrong; if the system reports a reference to an unassigned page, it was correct. Now arrange the passwordArgument so that its second character is the last character of the page, and proceed in the obvious way.
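The trick can be simulated in a few lines. This is a minimal model, not Tenex: `Memory`, `PageFault`, `connect` and `crack` are hypothetical names, the page size is arbitrary, and the three-second delay is left out because it does not change the number of tries. What matters is that `connect` reads the password argument from user memory one character at a time, and that a reference to an unassigned page is reported to the caller instead of being handled by the system.

```python
import string

PAGE_SIZE = 512  # hypothetical page size, in characters


class PageFault(Exception):
    """A reference to an unassigned page, reported to the user program."""


class Memory:
    """User memory in which only the pages listed in `assigned` exist."""

    def __init__(self, assigned):
        self.assigned = set(assigned)
        self.cells = {}

    def write(self, addr, ch):
        self.cells[addr] = ch

    def read(self, addr):
        if addr // PAGE_SIZE not in self.assigned:
            raise PageFault(addr)
        return self.cells.get(addr, "\0")


def connect(mem, arg_addr, directory_password):
    """CONNECT as in the text: compare one character at a time, reading the
    argument from user memory by reference (the three-second delay is omitted)."""
    for i, expected in enumerate(directory_password):
        if mem.read(arg_addr + i) != expected:
            return "BadPassword"
    return "Success"


def crack(connect_call, alphabet):
    """Recover the password position by position; returns (password, tries)."""
    known, tries = "", 0
    while len(known) < PAGE_SIZE:
        k = len(known)
        for c in alphabet:
            mem = Memory(assigned={0})        # page 1 is left unassigned
            arg_addr = PAGE_SIZE - 1 - k      # guess[k] is the last character of page 0
            for j, ch in enumerate(known + c):
                mem.write(arg_addr + j, ch)
            tries += 1
            try:
                if connect_call(mem, arg_addr) == "Success":
                    return known + c, tries   # the final character was right
            except PageFault:                 # CONNECT read past position k: c matched
                known += c
                break
        else:
            raise ValueError(f"no character matched at position {k}")
    raise ValueError("password does not fit in one page")


SECRET = "Polonius"
password, tries = crack(lambda mem, addr: connect(mem, addr, SECRET), string.ascii_letters)
print(password, tries)
```

Each position costs at most one try per character of the alphabet, so the work grows linearly with the password length instead of exponentially.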
This obscure and amusing bug went unnoticed by the designers because the interface provided by a Tenex system call is quite complex: it includes the possibility of a reported reference to an unassigned page. Or looked at another way, the interface provided by an ordinary memory reference instruction in system code is quite complex: it includes the possibility that an improper reference will be reported to the client without any chance for the system code to get control first.
2.2 Corollaries
The rule about simplicity and generalization has many interesting corollaries.
Costly thy habit as thy purse can buy, But not express’d in fancy; rich, not gaudy.
Had I but time (as this fell sergeant, death, Is strict in his arrest) O, I could tell you — But let it be. (V ii 339)
For example, many studies (such as [23, 51, 52]) have shown that programs spend most of their time doing very simple things: loads, stores, tests for equality, adding one. Machines like the 801 [41] or the RISC [39] with instructions that do these simple operations quickly can run programs faster (for the same amount of hardware) than machines like the VAX with more general and powerful instructions that take longer in the simple cases. It is easy to lose a factor of two in the running time of a program, with the same amount of hardware in the implementation. Machines with still more grandiose ideas about what the client needs do even worse [18].
To find the places where time is being spent in a large system, it is necessary to have measurement tools that will pinpoint the time-consuming code. Few systems are well enough understood to be properly tuned without such tools; it is normal for 80% of the time to be spent in 20% of the code, but a priori analysis or intuition usually can’t find the 20% with any certainty. The performance tuning of Interlisp-D sped it up by a factor of 10 using one set of effective tools [7].
For example, the Alto disk hardware [53] can transfer a full cylinder at disk speed. The basic file system [29] can transfer successive file pages to client memory at full disk speed, with time for the client to do some computing on each sector; thus with a few sectors of buffering the entire disk can be scanned at disk speed. This facility has been used to write a variety of applications, ranging from a scavenger that reconstructs a broken file system, to programs that search files for substrings that match a pattern. The stream level of the file system can read or write n bytes to or from client memory; any portions of the n bytes that occupy full disk sectors are transferred at full disk speed. Loaders, compilers, editors and many other programs depend for their performance on this ability to read large files quickly. At this level the client gives up the facility to see the pages as they arrive; this is the only price paid for the higher level of abstraction.
But this theme has many variations. A more interesting example is the Spy system monitoring facility in the 940 system at Berkeley [10], which allows an untrusted user program to plant patches in the code of the supervisor. A patch is coded in machine language, but the operation that installs it checks that it does no wild branches, contains no loops, is not too long, and stores only into a designated region of memory dedicated to collecting statistics. Using the Spy, the student of the system can fine-tune his measurements without any fear of breaking the system, or even perturbing its operation much.
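A checker in the spirit of the Spy can be sketched for a toy instruction set. The opcodes, the length limit and the address range below are assumptions for illustration; the real Spy checked 940 machine code. Allowing only forward branches that stay inside the patch rules out both wild branches and loops, so an accepted patch executes at most as many instructions as it contains.

```python
from typing import List, Tuple

Instr = Tuple[str, int]  # (opcode, operand) in a hypothetical toy instruction set

MAX_PATCH_LEN = 32                    # "is not too long" (assumed limit)
STATS_REGION = range(0x7000, 0x7100)  # region dedicated to statistics (assumed)
SAFE_OPS = {"LOAD", "ADDI", "NOP"}    # instructions that never write memory


def check_patch(patch: List[Instr]) -> None:
    """Raise ValueError unless the patch obeys Spy-style restrictions."""
    if len(patch) > MAX_PATCH_LEN:
        raise ValueError("patch too long")
    for pc, (op, arg) in enumerate(patch):
        if op in ("BR", "BRZ"):
            # Only forward branches inside the patch (or to its end, the exit):
            # this rules out wild branches and, with them, all loops.
            if not pc < arg <= len(patch):
                raise ValueError(f"bad branch at {pc}")
        elif op == "STORE":
            if arg not in STATS_REGION:
                raise ValueError(f"store outside the statistics region at {pc}")
        elif op not in SAFE_OPS:
            raise ValueError(f"disallowed opcode {op!r} at {pc}")


# Count an event: load a counter, add one, skip ahead if zero, store it back.
check_patch([("LOAD", 0x7000), ("ADDI", 1), ("BRZ", 4), ("STORE", 0x7000)])
```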
Another unusual example that illustrates the power of this method is the FRETURN mechanism in the Cal time-sharing system for the CDC 6400 [30]. From any supervisor call C it is possible to make another one CF that executes exactly like C in the normal case, but sends control to a designated failure handler if C gives an error return. The CF operation can do more (for example, it can extend files on a fast, limited-capacity storage device to larger files on a slower device), but it runs as fast as C in the (hopefully) normal case.
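The shape of CF can be sketched as a wrapper, with Python exceptions standing in for error returns. `make_cf`, `SupervisorError` and the two-device example are hypothetical; the point is only that the normal path is the original call with nothing added, and the handler runs only when C fails.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class SupervisorError(Exception):
    """Stands in for an error return from a supervisor call."""


def make_cf(call: Callable[..., T], handler: Callable[..., T]) -> Callable[..., T]:
    """Build CF from C: it behaves exactly like `call` when that succeeds;
    on an error return, control goes to `handler` with the error and the arguments."""
    def cf(*args):
        try:
            return call(*args)          # normal case: no extra work
        except SupervisorError as err:
            return handler(err, *args)  # failure case: the handler takes over
    return cf


FAST_CAPACITY = 4
fast, slow = {}, {}


def write_fast(key, value):
    """Hypothetical C: write to a fast device of limited capacity."""
    if key not in fast and len(fast) >= FAST_CAPACITY:
        raise SupervisorError("fast device full")
    fast[key] = value
    return "fast"


def spill_to_slow(err, key, value):
    """Hypothetical failure handler: extend onto the slower, larger device."""
    slow[key] = value
    return "slow"


write = make_cf(write_fast, spill_to_slow)
results = [write(f"page{i}", i) for i in range(6)]
print(results)  # ['fast', 'fast', 'fast', 'fast', 'slow', 'slow']
```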
It may be better to have a specialized language, however, if it is more amenable to static analysis for optimization. This is a major criterion in the design of database query languages, for example.
The success of monitors [20, 25] as a synchronization device is partly due to the fact that the locking and signaling mechanisms do very little, leaving all the real work to the client programs in the monitor procedures. This simplifies the monitor implementation and keeps it fast; if the client needs buffer allocation, resource accounting or other frills, it provides these functions itself or calls other library facilities, and pays for what it needs. The fact that monitors give no control over the scheduling of processes waiting on monitor locks or condition variables, often cited as a drawback, is actually an advantage, since it leaves the client free to provide the scheduling it needs (using a separate condition variable for each class of process), without having to pay for or fight with some built-in mechanism that is unlikely to do the right thing.
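As an illustration, here is a bounded buffer written as a monitor with Python's `threading` module (a sketch, not code from any of the cited systems). The monitor supplies only the lock and the wait/notify operations; the client decides to use a separate condition variable for each class of waiting process, so a `put` wakes only a consumer and a `get` wakes only a producer.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Monitor-style bounded buffer: one lock guards all state, and the client
    chooses one condition variable per class of waiting process."""

    def __init__(self, capacity: int) -> None:
        self._lock = threading.Lock()
        self._not_full = threading.Condition(self._lock)   # producers wait here
        self._not_empty = threading.Condition(self._lock)  # consumers wait here
        self._items: deque = deque()
        self._capacity = capacity

    def put(self, item) -> None:
        with self._lock:
            while len(self._items) >= self._capacity:
                self._not_full.wait()
            self._items.append(item)
            self._not_empty.notify()  # wakes a consumer, never another producer

    def get(self):
        with self._lock:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()   # wakes a producer, never another consumer
            return item


buf = BoundedBuffer(capacity=2)
out = []
consumer = threading.Thread(target=lambda: out.extend(buf.get() for _ in range(5)))
consumer.start()
for i in range(5):
    buf.put(i)
consumer.join()
print(out)  # [0, 1, 2, 3, 4]
```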
The Unix system [44] encourages the building of small programs that take one or more character streams as input, produce one or more streams as output, and do one operation. When this style is imitated properly, each program has a simple interface and does one thing well, leaving the client to combine a set of such programs with its own code and achieve precisely the effect desired.
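The same style can be imitated inside a single program with Python generators, each doing one operation on a stream of lines. This is an analogy rather than Unix itself; the three filters are hypothetical stand-ins for `tr`, `uniq` and `grep -F`.

```python
from typing import Iterable, Iterator, Optional


def lowercase(lines: Iterable[str]) -> Iterator[str]:
    """Like `tr A-Z a-z`: one operation on a stream of lines."""
    return (line.lower() for line in lines)


def uniq(lines: Iterable[str]) -> Iterator[str]:
    """Like uniq(1): drop adjacent duplicate lines."""
    previous: Optional[str] = None
    for line in lines:
        if line != previous:
            yield line
        previous = line


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Like `grep -F pattern`: keep lines containing a fixed string."""
    return (line for line in lines if pattern in line)


names = ["Alto", "Alto", "Dorado", "alto", "Star"]
result = list(grep("o", uniq(lowercase(names))))
print(result)  # ['alto', 'dorado', 'alto']
```

The composition corresponds roughly to the shell pipeline `tr A-Z a-z | uniq | grep -F o`; no filter knows about the others, and the client chooses how to combine them.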
2.4 Making implementations work
Perfection must be reached by degrees; she requires the slow hand of time. (Voltaire)
Even when an implementation is successful, it pays to revisit old decisions as the system evolves; in particular, optimizations for particular properties of the load or the environment (memory size, for example) often come to be far from optimal.
Give thy thoughts no tongue, Nor any unproportion’d thought his act.
An efficient program is an exercise in logical brinkmanship. (E. Dijkstra)
There is another danger in keeping secrets. One way to improve performance is to increase the number of assumptions that one part of a system makes about another; the additional assumptions often allow less work to be done, sometimes a lot less. For instance, if a set of size n is known to be sorted, a membership test takes time log n rather than n. This technique is very important in the design of algorithms and the tuning of small modules. In a large system the ability to improve each part separately is usually more important. Striking the right balance remains an art.
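A small sketch of the trade-off: the second function is faster only because it assumes its input is sorted, and when that assumption is false it gives a wrong answer without any warning.

```python
from bisect import bisect_left
from typing import Sequence


def member_unsorted(items: Sequence[int], x: int) -> bool:
    """No assumption about order: may have to examine all n elements."""
    return any(item == x for item in items)


def member_sorted(items: Sequence[int], x: int) -> bool:
    """Assumes `items` is sorted ascending: binary search, about log2(n) probes."""
    i = bisect_left(items, x)
    return i < len(items) and items[i] == x


data = [2, 3, 5, 7, 11, 13]
print(member_sorted(data, 7), member_sorted(data, 8))  # True False
# If the assumption is false, the faster test silently gives a wrong answer:
print(member_unsorted([5, 1, 3], 1), member_sorted([5, 1, 3], 1))  # True False
```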
O throw away the worser part of it, And live the purer with the other half. (III iv 157)
A good example is in the Alto’s Scavenger program, which scans the disk and rebuilds the index and directory structures of the file system from the file identifier and page number recorded on each disk sector [29]. A recent rewrite of this program has a phase in which it builds a data structure in main storage, with one entry for each contiguous run of disk pages that is also a contiguous set of pages in a file. Normally files are allocated more or less contiguously and this structure is not too large. If the disk is badly fragmented, however, the structure will not fit in storage. When this happens, the Scavenger discards the information for half the files and continues with the other half. After the index for these files is rebuilt, the process is repeated for the other files. If necessary the work is further subdivided; the method fails only if a single file’s index won’t fit.
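The subdivision strategy can be sketched recursively. For brevity this version assumes the number of runs per file is known in advance, whereas the program described above discovers that the structure does not fit while it scans the disk; the file names, run counts and `capacity` are hypothetical.

```python
from typing import Dict, List


class DoesNotFit(Exception):
    """A single file's run table exceeds the available storage."""


def rebuild_in_batches(runs_per_file: Dict[str, int], capacity: int) -> List[List[str]]:
    """Split the files into batches whose run tables fit in `capacity` entries,
    halving any batch that does not fit. Returns the batches in processing order."""
    batches: List[List[str]] = []

    def process(files: List[str]) -> None:
        if not files:
            return
        if sum(runs_per_file[f] for f in files) <= capacity:
            batches.append(files)  # here the real program would rebuild these indexes
            return
        if len(files) == 1:
            raise DoesNotFit(f"the index for {files[0]} alone does not fit")
        half = len(files) // 2
        process(files[:half])      # keep the information for one half now ...
        process(files[half:])      # ... and repeat the work for the other half

    process(sorted(runs_per_file))
    return batches


runs = {"a": 3, "b": 5, "c": 2, "d": 6}
print(rebuild_in_batches(runs, capacity=8))  # [['a', 'b'], ['c', 'd']]
```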
Another interesting example arises in the Dover raster printer [26, 53], which scan-converts lists of characters and rectangles into a large m × n array of bits, in which ones correspond to spots of ink on the paper and zeros to spots without ink. In this printer m = 3300 and n = 4200, so the array contains fourteen million bits and is too large to store in memory. The printer consumes bits faster than the available disks can deliver them, so the array cannot be stored on disk. Instead, the entire array is divided into slices of 16 × 4200 bits called bands, and the printer electronics contains two one-band buffers. The characters and rectangles are sorted into buckets, one for each band; a bucket receives the objects that start in the corresponding band. Scan conversion proceeds by filling one band buffer from its bucket, and then playing it out to the printer and zeroing it while filling the other buffer from the next bucket. Objects that spill over the edge of one band are added to the next bucket; this is the trick that allows the problem to be subdivided.
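The band and bucket scheme can be sketched with rectangles only. The page size, band height and rectangle list below are small hypothetical values chosen for checking, the printer's two alternating band buffers are reduced to one buffer that is filled and then played out, and all objects are assumed to start on the page. The output is collected into a list only so that the result can be inspected.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in spots; y increases down the page


def render_in_bands(rects: List[Rect], width: int, height: int, band: int = 16) -> List[List[int]]:
    """Scan-convert rectangles one band at a time. Each band is filled from
    its bucket and then played out; parts of objects that spill over the
    bottom edge of a band are added to the next bucket."""
    nbands = (height + band - 1) // band
    buckets: List[List[Rect]] = [[] for _ in range(nbands)]
    for r in rects:
        buckets[r[1] // band].append(r)           # bucket by the band where it starts
    played_out: List[List[int]] = []
    for b in range(nbands):
        top = b * band
        rows = min(band, height - top)
        buf = [[0] * width for _ in range(rows)]  # one band buffer, zeroed
        for x, y, w, h in buckets[b]:
            for yy in range(y, min(y + h, top + rows)):
                for xx in range(x, min(x + w, width)):
                    buf[yy - top][xx] = 1
            if y + h > top + rows and b + 1 < nbands:
                # Spill-over: the remainder starts at the top of the next band.
                buckets[b + 1].append((x, top + rows, w, y + h - (top + rows)))
        played_out.extend(buf)                    # stands in for sending the band to the printer
    return played_out


page = render_in_bands([(1, 10, 2, 12)], width=4, height=40)
print([r for r, row in enumerate(page) if any(row)])  # the rows that received ink
```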
Sometimes it is convenient to artificially limit the resource, by quantizing it in fixed-size units; this simplifies bookkeeping and prevents one kind of fragmentation. The classical example is the use of fixed-size pages for virtual memory, rather than variable-size segments. In spite of the apparent advantages of keeping logically related information together, and transferring it between main storage and backing storage as a unit, paging systems have worked out better. The reasons for this are complex and have not been systematically studied.
And makes us rather bear those ills we have Than fly to others that we know not of. (III i 81)
The user interface for the Star office system [47] has a small set of operations (type text, move, copy, delete, show properties) that apply to nearly all the objects in the system: text, graphics, file folders and file drawers, record files, printers, in and out baskets, etc. The exact meaning of an operation varies with the class of object, within the limits of what the user might find natural. For instance, copying a document to an out basket causes it to be sent as a message; moving the endpoint of a line causes the line to follow like a rubber band. Certainly the implementations are quite different in many cases. But the generic operations do not simply make the system easier to
unexpectedly become permanent. An attractive solution is to have an ‘overflow count’ table, which is a hash table keyed on the address of an object. When the count reaches its limit it is reduced by half, the overflow count is increased by one, and an overflow flag is set in the object. When the count reaches zero, the process is reversed if the overflow flag is set. Thus even with as few as four bits there is room to count up to seven, and the overflow table is touched only in the rare case that the count swings by more than four.
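A sketch of the overflow-count scheme follows. How the four bits are laid out is an assumption here: three bits of count (0 to 7) plus the overflow flag. Under that assumption one unit in the overflow table stands for four references, so the true count is the stored count plus four times the overflow entry. The dictionaries keyed on an integer address are stand-ins for the fields in each object and for the hash table.

```python
from typing import Dict

LIMIT = 8          # assumed: a 3-bit count field holds 0..7, so reaching 8 overflows
HALF = LIMIT // 2  # one overflow unit therefore stands for 4 references


class RefCounts:
    def __init__(self) -> None:
        self.count: Dict[int, int] = {}     # small count field in each object
        self.flag: Dict[int, bool] = {}     # overflow flag in each object
        self.overflow: Dict[int, int] = {}  # the 'overflow count' hash table

    def incr(self, addr: int) -> None:
        c = self.count.get(addr, 0) + 1
        if c == LIMIT:                      # at the limit: halve and carry into the table
            c = HALF
            self.overflow[addr] = self.overflow.get(addr, 0) + 1
            self.flag[addr] = True
        self.count[addr] = c

    def decr(self, addr: int) -> None:
        c = self.count[addr] - 1
        if c == 0 and self.flag.get(addr, False):  # reverse the process
            c = HALF
            self.overflow[addr] -= 1
            if self.overflow[addr] == 0:
                del self.overflow[addr]
                self.flag[addr] = False
        self.count[addr] = c

    def true_count(self, addr: int) -> int:
        return self.count.get(addr, 0) + HALF * self.overflow.get(addr, 0)


rc = RefCounts()
for _ in range(20):
    rc.incr(0x100)
print(rc.count[0x100], rc.overflow[0x100], rc.true_count(0x100))  # 4 4 20
```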
There are many cases when resources are dynamically allocated and freed (for example, real memory in a paging system), and sometimes additional resources are needed temporarily to free an item (some table might have to be swapped in to find out where to write out a page). Normally there is a cushion (clean pages that can be freed with no work), but in the worst case the cushion may disappear (all pages are dirty). The trick here is to keep a little something in reserve under a mattress, bringing it out only in a crisis. It is necessary to bound the resources needed to free one item; this determines the size of the reserve under the mattress, which must be regarded as a fixed cost of the resource multiplexing. When the crisis arrives, only one item should be freed at a time, so that the entire reserve is devoted to that job; this may slow things down a lot but it ensures that progress will be made.
Sometimes radically different strategies are appropriate in the normal and worst cases. The Bravo editor [24] uses a ‘piece table’ to represent the document being edited. This is an array of pieces, pointers to strings of characters stored in a file; each piece contains the file address of the first character in the string and its length. The strings are never modified during normal editing. Instead, when some characters are deleted, for example, the piece containing the deleted characters is split into two pieces, one pointing to the first undeleted string and the other to the second. Characters inserted from the keyboard are appended to the file, and the piece containing the insertion point is split into three pieces: one for the preceding characters, a second for the inserted characters, and a third for the following characters. After hours of editing there are hundreds of pieces and things start to bog down. It is then time for a cleanup, which writes a new file containing all the characters of the document in order. Now the piece table can be replaced by a single piece pointing to the new file, and editing can continue. Cleanup is a specialized kind of garbage collection. It can be done in background so that the user doesn’t have to stop editing (though Bravo doesn’t do this).
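A minimal piece table in the spirit of Bravo's can be sketched as follows. A Python string that is only ever appended to stands in for the file, and the class and method names are hypothetical. As in the description, edits only split pieces and append new text; `cleanup` writes the document out in order and replaces the table with a single piece.

```python
from typing import List, Tuple


class PieceTable:
    """Document = list of (start, length) pieces into an append-only buffer."""

    def __init__(self, text: str) -> None:
        self.file = text  # stands in for the file
        self.pieces: List[Tuple[int, int]] = [(0, len(text))] if text else []

    def _split(self, pos: int) -> int:
        """Make sure a piece boundary falls at document position `pos` and
        return the index of the first piece at or after it."""
        offset = 0
        for i, (start, length) in enumerate(self.pieces):
            if pos == offset:
                return i
            if pos < offset + length:  # pos is inside this piece
                k = pos - offset
                self.pieces[i:i + 1] = [(start, k), (start + k, length - k)]
                return i + 1
            offset += length
        return len(self.pieces)        # pos is the end of the document

    def insert(self, pos: int, text: str) -> None:
        if not text:
            return
        start = len(self.file)
        self.file += text              # typed characters are appended to the file
        self.pieces.insert(self._split(pos), (start, len(text)))

    def delete(self, pos: int, count: int) -> None:
        i = self._split(pos)
        j = self._split(pos + count)
        del self.pieces[i:j]           # the old text itself is untouched

    def text(self) -> str:
        return "".join(self.file[s:s + n] for s, n in self.pieces)

    def cleanup(self) -> None:
        """Write the document in order to a new 'file'; one piece then describes it."""
        self.file = self.text()
        self.pieces = [(0, len(self.file))] if self.file else []


doc = PieceTable("To be or not to be")
doc.delete(5, 7)  # remove " or not"
doc.insert(5, ",")
print(doc.text(), len(doc.pieces))  # To be, to be 3
doc.cleanup()
print(doc.text(), len(doc.pieces))  # To be, to be 1
```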
This section describes hints for making systems faster, forgoing any further discussion of why this is important. Bentley’s excellent book [55] says more about some of these ideas and gives many others.
Neither a borrower, nor a lender be; For loan oft loses both itself and friend, And borrowing dulleth edge of husbandry.
For example, it is always faster to access information in the registers of a processor than to get it from memory, even if the machine has a high-performance cache. Registers have gotten a bad name because it can be tricky to allocate them intelligently, and because saving and restoring them across procedure calls may negate their speed advantages. But when programs are written in the approved modern style with lots of small procedures, 16 registers are nearly always enough for all the local variables and temporaries, so that allocation is not a problem. With n sets of registers arranged in a stack, saving is needed only when there are n successive calls without a return [14, 39].
Input/output channels, floating-point coprocessors, and similar specialized computing devices are other applications of this principle. When extra hardware is expensive these services are provided by multiplexing a single processor, but when it is cheap, static allocation of computing power for various purposes is worthwhile.
The Interlisp virtual memory system mentioned earlier [7] needs to keep track of the disk address corresponding to each virtual address. This information could itself be held in the virtual memory (as it is in several systems, including Pilot [42]), but the need to avoid circularity makes this rather complicated. Instead, real memory is dedicated to this purpose. Unless the disk is ridiculously fragmented the space thus consumed is less than the space for the code to prevent circularity.
The remarks about registers above depend on the fact that the compiler can easily decide how to allocate them, simply by putting the local variables and temporaries there. Most machines lack multiple sets of registers or lack a way of stacking them efficiently. Good allocation is then much more difficult, requiring an elaborate inter-procedural analysis that may not succeed, and in any case must be redone each time the program changes. So a little bit of dynamic analysis (stacking the registers) goes a long way. Of course the static analysis can still pay off in a large procedure if the compiler is clever.
A program can read data much faster when it reads the data sequentially. This makes it easy to predict what data will be needed next and read it ahead into a buffer. Often the data can be allocated sequentially on a disk, which allows it to be transferred at least an order of magnitude faster. These performance gains depend on the fact that the programmer has arranged the data so that it is accessed according to some predictable pattern, that is, so that static analysis is possible. Many attempts have been made to analyze programs after the fact and optimize the disk transfers, but as far as I know this has never worked. The dynamic analysis done by demand paging is always at least as good.
Some kinds of static analysis exploit the fact that some invariant is maintained. A system that depends on such facts may be less robust in the face of hardware failures or bugs in software that falsify the invariant.
hoc way is extremely error-prone. The best organizing principle is to recompute the entire state after each change but cache all the expensive results of this computation. A change must invalidate at least the cache entries that it renders invalid; if these are too hard to identify precisely, it may invalidate more entries at the price of more computing to reestablish them. The secret of success is to organize the cache so that small changes invalidate only a few entries.
For example, the Bravo editor [24] has a function DisplayLine ( document, firstChar ) that returns the bitmap for the line of text in the displayed document that has document [ firstChar ] as its first character. It also returns lastChar and lastCharUsed, the numbers of the last character displayed on the line and the last character examined in computing the bitmap (these are usually not the same, since it is necessary to look past the end of the line in order to choose the line break). This function computes line breaks, does justification, uses font tables to map characters into their raster pictures, etc. There is a cache with an entry for each line currently displayed on the screen, and sometimes a few lines just above or below. An edit that changes characters i through j invalidates any cache entry for which [ firstChar .. lastCharUsed ] intersects [ i .. j ]. The display is recomputed by
    loop
        (bitMap, lastChar, ) := DisplayLine(document, firstChar);
        Paint(bitMap);
        firstChar := lastChar + 1
    end loop
The call of DisplayLine is short-circuited by using the cache entry for [ document, firstChar ] if it exists. At the end any cache entry that has not been used is discarded; these entries are not invalid, but they are no longer interesting because the line breaks have changed so that a line no longer begins at these points.
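The cache and its invalidation rule can be sketched with a toy `display_line` that returns text instead of a bitmap. To keep the cache keys stable, the sketch assumes edits replace characters without changing the length of the document; a real editor would also have to renumber entries after an insertion or deletion. `WIDTH`, `LineCache` and its methods are hypothetical.

```python
from typing import Dict, List, Tuple

WIDTH = 10  # hypothetical line width in characters

Entry = Tuple[str, int, int]  # (rendered line, lastChar, lastCharUsed)


def display_line(doc: str, first: int) -> Entry:
    """Toy DisplayLine: break after the last space that fits. It must examine
    the character just past the margin, so lastCharUsed can exceed lastChar.
    A space exactly at the margin is allowed to hang past it."""
    if len(doc) - first <= WIDTH:                    # the rest of the document fits
        return doc[first:], len(doc) - 1, len(doc) - 1
    lookahead = first + WIDTH                        # first character past the margin
    brk = doc.rfind(" ", first, lookahead + 1)
    last = brk if brk >= first else lookahead - 1    # no space: break inside the word
    return doc[first:last + 1], last, lookahead


class LineCache:
    def __init__(self) -> None:
        self.entries: Dict[int, Entry] = {}          # keyed on firstChar
        self.computed = 0                            # real display_line calls

    def invalidate(self, i: int, j: int) -> None:
        """Characters i..j changed: drop entries whose [firstChar..lastCharUsed] intersects them."""
        self.entries = {f: e for f, e in self.entries.items() if e[2] < i or f > j}

    def redisplay(self, doc: str) -> List[str]:
        lines: List[str] = []
        used: Dict[int, Entry] = {}
        first = 0
        while first < len(doc):
            entry = self.entries.get(first)
            if entry is None:                        # cache miss: compute the line
                entry = display_line(doc, first)
                self.computed += 1
            used[first] = entry
            lines.append(entry[0])                   # Paint(bitMap)
            first = entry[1] + 1
        self.entries = used                          # unused entries are discarded
        return lines


text = "the quick brown fox jumps over the lazy dog"
cache = LineCache()
cache.redisplay(text)
i = text.index("lazy")
edited = text[:i] + "idle" + text[i + 4:]            # same-length replacement
cache.invalidate(i, i + 3)
print(cache.redisplay(edited), cache.computed)
```

Only the line containing the edit is recomputed; the lines before it and the final line, whose range does not reach back to the edit, come from the cache.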
The same idea can be applied in a very different setting. Bravo allows a document to be structured into paragraphs, each with specified left and right margins, inter-line leading, etc. In ordinary page layout all the information about the paragraph that is needed to do the layout can be represented very compactly:
the number of lines;
the height of each line (normally all lines are the same height);
any keep properties;
the pre and post leading.
In the usual case this can be encoded in three or four bytes. A 30-page chapter has perhaps 300 paragraphs, so about 1k bytes are required for all this data; this is less information than is required to specify the characters on a page. Since the layout computation is comparable to the line layout computation for a page, it should be possible to do the pagination for this chapter in less time than is required to render one page. Layout can be done independently for each chapter.
What makes this idea work is a cache of [ paragraph, ParagraphShape ( paragraph )] entries. If the paragraph is edited, the cache entry is invalid and must be recomputed. This can be done at the time of the edit (reasonable if the paragraph is on the screen, as is usually the case, but not so good for a global substitute), in background, or only when repagination is requested.
For the apparel oft proclaims the man.
For example, in the Alto [29] and Pilot [42] operating systems each file has a unique identifier, and each disk page has a ‘label’ field whose contents can be checked before reading or writing the data without slowing down the data transfer. The label contains the identifier of the file that contains the page and the number of that page in the file. Page zero of each file is called the ‘leader page’ and contains, among other things, the directory in which the file resides and its string name in that directory. This is the truth on which the file systems are based, and they take great pains to keep it correct.
With only this information, however, there is no way to find the identifier of a file from its name in a directory, or to find the disk address of page i, except to search the entire disk, a method that works but is unacceptably slow. Each system therefore maintains hints to speed up these operations. Both systems represent a directory by a file that contains triples [string name, file identifier, address of first page]. Each file has a data structure that maps a page number into the disk address of the page. The Alto uses a link in each label that points to the next label; this makes it fast to get from page n to page n + 1. Pilot uses a B-tree that implements the map directly, taking advantage of the common case in which consecutive file pages occupy consecutive disk pages. Information obtained from any of these hints is checked when it is used, by checking the label or reading the file name from the leader page. If it proves to be wrong, all of it can be reconstructed by scanning the disk. Similarly, the bit table that keeps track of free disk pages is a hint; the truth is represented by a special value in the label of a free page, which is checked when the page is allocated and before the label is overwritten with a file identifier and page number.
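The pattern of checking a hint against the truth before using it can be sketched as follows. The layout is a simplification: labels are a list indexed by disk address, a label of `None` marks a free page, and the label check that the Alto performs during the transfer becomes an explicit comparison. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Label = Tuple[int, int]  # (file identifier, page number): the truth


class Disk:
    def __init__(self, nsectors: int) -> None:
        self.labels: List[Optional[Label]] = [None] * nsectors  # None marks a free page
        self.data: List[bytes] = [b""] * nsectors


class FileSystem:
    def __init__(self, disk: Disk) -> None:
        self.disk = disk
        self.hint: Dict[Label, int] = {}  # (file, page) -> disk address; may be wrong
        self.scans = 0                    # number of full-disk scans performed

    def _rebuild_hints(self) -> None:
        """Reconstruct every hint from the truth by scanning all labels (slow)."""
        self.scans += 1
        self.hint = {lab: addr for addr, lab in enumerate(self.disk.labels) if lab is not None}

    def read_page(self, file_id: int, page: int) -> bytes:
        key = (file_id, page)
        addr = self.hint.get(key)
        if addr is None or self.disk.labels[addr] != key:  # check the hint before trusting it
            self._rebuild_hints()
            addr = self.hint[key]                          # KeyError: no such page exists
        return self.disk.data[addr]


disk = Disk(8)
disk.labels[3], disk.data[3] = (7, 0), b"leader"
disk.labels[5], disk.data[5] = (7, 1), b"hello"
fs = FileSystem(disk)
fs.hint = {(7, 0): 3, (7, 1): 2}  # the second hint is stale
print(fs.read_page(7, 0), fs.read_page(7, 1), fs.scans)  # b'leader' b'hello' 1
```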
Another example of hints is the store and forward routing first used in the Arpanet [32]. Each node in the network keeps a table that gives the best route to each other node. This table is updated by periodic broadcasts in which each node announces to all the other nodes its opinion about the quality of its links to its neighbors. Because these broadcast messages are not synchronized and are not guaranteed to be delivered, the nodes may not have a consistent view at any instant. The truth in this case is that each node knows its own identity and hence knows when it receives a packet destined for itself. For the rest, the routing does the best it can; when things aren’t changing too fast it is nearly optimal.
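A node's routing table can be sketched as a hint computed from whatever link reports it has most recently heard. The sketch below runs Dijkstra's shortest-path algorithm over those reports. That choice is an assumption for illustration and is not a claim about the Arpanet's actual routing code. The report format and the `"drop"` result for an unknown destination are also hypothetical. The only thing the node knows for certain is its own identity.

```python
import heapq
from typing import Dict, Optional

Reports = Dict[str, Dict[str, float]]   # node -> {neighbor: link cost}, as last heard

def next_hops(me: str, reports: Reports) -> Dict[str, str]:
    """Best-known first hop from `me` to each reachable node (a hint, possibly stale)."""
    dist: Dict[str, float] = {me: 0.0}
    first: Dict[str, Optional[str]] = {me: None}
    heap = [(0.0, me)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for nbr, cost in reports.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # The first hop is the neighbor itself, or is inherited along the path.
                first[nbr] = nbr if node == me else first[node]
                heapq.heappush(heap, (nd, nbr))
    return {n: hop for n, hop in first.items() if hop is not None}

def handle(me: str, dest: str, reports: Reports) -> str:
    if dest == me:                 # the truth: a node always recognizes itself
        return "deliver"
    return next_hops(me, reports).get(dest, "drop")
```

If the reports are out of date, a packet may take a poor route, but it is still delivered correctly when it reaches its destination.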
A more curious example is the Ethernet [33], in which lack of a carrier signal on the cable is used as a hint that a packet can be sent. If two senders take the hint simultaneously, there is a collision that both can detect; both stop, delay for a randomly chosen interval, and then try again. If n successive collisions occur, this is taken as a hint that the number of senders is $2^n$, and each sender sets the mean of its random delay interval to $2^n$ times its initial value. This ‘exponential backoff’ ensures that the net does not become overloaded.
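The rule as stated above can be sketched in Python. `backoff_delay` follows the text directly: after n collisions, the delay is drawn uniformly so that its mean is $2^n$ times the initial value. The uniform distribution is an assumption. For comparison, `ethernet_slots` is a sketch of the truncated binary exponential backoff of the IEEE 802.3 standard, which waits a random whole number of slot times in $[0, 2^k - 1]$ with $k = \min(n, 10)$.

```python
import random

def backoff_delay(collisions: int, initial: float, rng: random.Random) -> float:
    """Delay after `collisions` collisions, with mean initial * 2**collisions."""
    mean = initial * (2 ** collisions)
    return rng.uniform(0.0, 2.0 * mean)   # uniform on [0, 2*mean] has mean `mean`

def ethernet_slots(collisions: int, rng: random.Random) -> int:
    """IEEE 802.3-style truncated backoff, measured in slot times."""
    k = min(collisions, 10)
    return rng.randint(0, 2 ** k - 1)     # randint includes both endpoints
```

Each further collision doubles the expected wait, so the offered load drops quickly when many stations are contending.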
replacement in background. Electronic mail can be delivered and retrieved by background processes, since delivery within an hour or two is usually acceptable. Many banking systems consolidate the data on accounts at night and have it ready the next morning. These four examples have successively less need for synchronization between foreground and background tasks. As the amount of synchronization increases more care is needed to avoid subtle errors; an extreme example is the on-the-fly garbage collection algorithm given in [13]. But in most cases a simple producer-consumer relationship between two otherwise independent processes is possible.
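A simple producer-consumer relationship of this kind can be sketched in Python with a thread-safe queue. The mail-delivery framing and the function names are hypothetical. The point is that the queue is the only synchronization between the two tasks. The foreground enqueues and returns immediately, and the background thread drains the queue at its own pace.

```python
import queue
import threading
from typing import List, Optional

def background_mailer(outbox: "queue.Queue[Optional[str]]", delivered: List[str]) -> None:
    """Background consumer: delivers messages whenever it gets around to them."""
    while True:
        msg = outbox.get()
        if msg is None:          # sentinel: the foreground has finished producing
            break
        delivered.append(msg)    # stand-in for the slow delivery step

def run(messages: List[str]) -> List[str]:
    outbox: "queue.Queue[Optional[str]]" = queue.Queue()
    delivered: List[str] = []
    worker = threading.Thread(target=background_mailer, args=(outbox, delivered))
    worker.start()
    for m in messages:           # foreground: hand off the work and move on
        outbox.put(m)
    outbox.put(None)
    worker.join()                # only needed here so the caller can inspect results
    return delivered
```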
Be wary then; best safety lies in fear. (I iii 43)
The sad truth about optimization was brought home by the first paging systems. In those days memory was very expensive, and people had visions of squeezing the most out of every byte by clever optimization of the swapping: putting related procedures on the same page, predicting the next pages to be referenced from previous references, running jobs together that share data or code, etc. No one ever learned how to do this. Instead, memory got cheaper, and systems spent it to provide enough cushion for simple demand paging to work. We learned that the only important thing is to avoid thrashing, or too much demand for the available memory. A system that thrashes spends all its time waiting for the disk.
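The cliff that makes thrashing so damaging can be shown with a small simulation of demand paging. The sketch assumes LRU replacement, which the text does not specify, and a cyclic reference pattern, which is the worst case for LRU. With enough frames for the working set, only the first touch of each page faults. With one frame too few, every reference faults and the program spends its time waiting for the disk.

```python
from collections import OrderedDict
from typing import Hashable, Iterable

def lru_faults(refs: Iterable[Hashable], frames: int) -> int:
    """Count page faults under demand paging with LRU replacement."""
    resident: "OrderedDict[Hashable, None]" = OrderedDict()
    faults = 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)        # mark as most recently used
        else:
            faults += 1                       # demand fetch from disk
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict the least recently used page
            resident[page] = None
    return faults
```

For `refs = list(range(5)) * 10`, five frames give 5 faults and four frames give 50.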
The only systems in which cleverness has worked are those with very well-known loads. For instance, the 360/50 APL system [4] has the same size workspace for each user and common system code for all of them. It makes all the system code resident, allocates a contiguous piece of disk for each user, and overlaps a swap-out and a swap-in with each unit of computation. This works fine.
The nicest thing about the Alto is that it doesn’t run faster at night. (J. Morris)
A similar lesson was learned about processor time. With interactive use the response time to a demand for computing is important, since a person is waiting for it. Many attempts were made to tune the processor scheduling as a function of priority of the computation, working set size, memory loading, past history, likelihood of an I/O request, etc. These efforts failed. Only the crudest parameters produce intelligible effects: interactive vs. non-interactive computation or high, foreground and background priority levels. The most successful schemes give a fixed share of the cycles to each job and don’t allocate more than 100%; unused cycles are wasted or, with luck, consumed by a background job. The natural extension of this strategy is the personal computer, in which each user has at least one processor to himself.
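A fixed-share scheme can be sketched in a few lines of Python. The class, the per-period accounting, and the `"background"` bucket are hypothetical illustrations, not a description of any particular system. Admission refuses any job that would push the total past 100%. In each period, a job gets at most its share, and whatever is left over goes to background work.

```python
from typing import Dict

class FixedShareScheduler:
    def __init__(self) -> None:
        self.shares: Dict[str, float] = {}    # job -> fraction of the cycles

    def admit(self, job: str, share: float) -> bool:
        # Refuse the job rather than allocate more than 100% of the cycles.
        if share <= 0 or sum(self.shares.values()) + share > 1.0 + 1e-9:
            return False
        self.shares[job] = share
        return True

    def allocate(self, cycles: int, demand: Dict[str, int]) -> Dict[str, int]:
        """Split one period's cycles; any unused cycles go to a background job."""
        grant = {job: min(demand.get(job, 0), int(share * cycles))
                 for job, share in self.shares.items()}
        grant["background"] = cycles - sum(grant.values())
        return grant
```

Because each job's allocation depends only on its own share and demand, the effect of admitting a job is easy to predict. This is the "intelligible effect" that the elaborate tuning schemes failed to deliver.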
Give every man thy ear, but few thy voice; Take each man’s censure, but reserve thy judgment.
Bob Morris suggested that a shared interactive system should have a large red button on each terminal. The user pushes the button if he is dissatisfied with the service, and the system must either improve the service or throw the user off; it makes an equitable choice over a sufficiently long period. The idea is to keep people from wasting their time in front of terminals that are not delivering a useful amount of service.
The original specification for the Arpanet [32] was that a packet accepted by the net is guaranteed to be delivered unless the recipient machine is down or a network node fails while it is holding the packet. This turned out to be a bad idea. This rule makes it very hard to avoid deadlock in the worst case, and attempts to obey it lead to many complications and inefficiencies even in the normal case. Furthermore, the client does not benefit, since it still has to deal with packets lost by host or network failure (see section 4 on end-to-end). Eventually the rule was abandoned. The Pup internet [3], faced with a much more variable set of transport facilities, has always ruthlessly discarded packets at the first sign of congestion.
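The discard-on-congestion policy can be sketched in Python as a bounded output queue that drops arriving packets when it is full, usually called drop-tail. The fixed capacity and the class name are assumptions for illustration, and the text does not say exactly how Pup gateways detected congestion. The queue never blocks the sender and never promises delivery. As the text notes, the end hosts must recover lost packets in any case.

```python
from collections import deque
from typing import Any, Deque, Optional

class DropTailQueue:
    """Gateway output queue that discards packets instead of blocking senders."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buf: Deque[Any] = deque()
        self.dropped = 0

    def offer(self, packet: Any) -> bool:
        if len(self.buf) >= self.capacity:   # first sign of congestion
            self.dropped += 1
            return False                     # no retry, no backpressure, no guarantee
        self.buf.append(packet)
        return True

    def send_one(self) -> Optional[Any]:
        return self.buf.popleft() if self.buf else None
```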
The unavoidable price of reliability is simplicity. (C. Hoare)
Making a system reliable is not really hard, if you know how to go about it. But retrofitting reliability to an existing design is very difficult.
This above all: to thine own self be true, And it must follow, as the night the day, Thou canst not then be false to any man.
For example, consider the operation of transferring a file from a file system on a disk attached to machine A, to another file system on another disk attached to machine B. To be confident that the right bits are really on B’s disk, you must read the file from B’s disk, compute a checksum of