Dynamic Storage Allocation:
A Survey and Critical Review* **

Paul R. Wilson, Mark S. Johnstone, Michael Neely, and David Boles***

Department of Computer Sciences
University of Texas at Austin
Austin, Texas, 78751, USA
(wilson|markj|neely@cs.utexas.edu)

Abstract. Dynamic memory allocation has been a fundamental part of most computer systems since roughly 1960, and memory allocation is widely considered to be either a solved problem or an insoluble one. In this survey, we describe a variety of memory allocator designs and point out issues relevant to their design and evaluation. We then chronologically survey most of the literature on allocators between 1961 and 1995. (Scores of papers are discussed, in varying detail, and over 150 references are given.)

We argue that allocator designs have been unduly restricted by an emphasis on mechanism, rather than policy, while the latter is more important; higher-level strategic issues are still more important, but have not been given much attention.

Most theoretical analyses and empirical allocator evaluations to date have relied on very strong assumptions of randomness and independence, but real program behavior exhibits important regularities that must be exploited if allocators are to perform well in practice.

* A slightly different version of this paper appears in Proc. 1995 Int'l. Workshop on Memory Management, Kinross, Scotland, UK, September 27-29, 1995, Springer Verlag LNCS. This version differs in several very minor respects, mainly in formatting, correction of several typographical and editing errors, clarification of a few sentences, and addition of a few footnotes and citations.

** This work was supported by the National Science Foundation under grant CCR-9410026, and by a gift from Novell, Inc.

*** Convex Computer Corporation, Dallas, Texas, USA. (dboles@zeppelin.convex.com)

1 Introduction

In this survey, we will discuss the design and evaluation of conventional dynamic memory allocators. By "conventional," we mean allocators used for general purpose "heap" storage, where a program can request a block of memory to store a program object, and free that block at any time. A heap, in this sense, is a pool of memory available for the allocation and deallocation of arbitrary-sized blocks of memory in arbitrary order.^4 An allocated block is typically used to store a program "object," which is some kind of structured data item such as a Pascal record, a C struct, or a C++ object, but not necessarily an object in the sense of object-oriented programming.^5

(^4) This sense of "heap" is not to be confused with a quite different sense of "heap," meaning a partially ordered tree structure.

(^5) While this is the typical situation, it is not the only one. The "objects" stored by the allocator need not correspond directly to language-level objects. An example of this is a growable array, represented by a fixed size part that holds a pointer to a variable-sized part. The routine that grows an object might allocate a new, larger variable-sized part, copy the contents of the old variable-sized part into it, and deallocate the old part. We assume that the allocator knows nothing of this, and would view each of these parts as separate and independent objects, even if normal programmers would see a "single" object.

Throughout this paper, we will assume that while a block is in use by a program, its contents (a data object) cannot be relocated to compact memory (as is done, for example, in copying garbage collectors [Wil95]). This is the usual situation in most implementations of conventional programming systems (such as C, Pascal, Ada, etc.), where the memory manager cannot find and update pointers to program objects when they are moved.^6 The allocator does not examine the data stored in a block, or modify or act on it in any way. The data areas within blocks that are used to hold objects are contiguous and nonoverlapping ranges of (real or virtual) memory. We generally assume that only entire blocks are allocated or freed, and that the allocator is entirely unaware of the type of or values of data stored in a block; it only knows the size requested.

(^6) It is also true of many garbage-collected systems. In some, insufficient information is available from the compiler and/or programmer to allow safe relocation; this is especially likely in systems where code written in different languages is combined in an application [BW88]. In others, real-time and/or concurrent systems, it is difficult for the garbage collector to relocate data without incurring undue overhead and/or disruptiveness [Wil95].

Scope of this survey. In most of this survey, we will concentrate on issues of overall memory usage, rather than time costs. We believe that detailed measures of time costs are usually a red herring, because they obscure issues of strategy and policy; we believe that most good strategies can yield good policies that are amenable to efficient implementation. (We believe that it's easier to make a very fast allocator than a very memory-efficient one, using fairly straightforward techniques (Section 3.12). Beyond a certain point, however, the effectiveness of speed optimizations will depend on many of the same subtle issues that determine memory usage.)

We will also discuss locality of reference only briefly. Locality of reference is increasingly important, as the difference between CPU speed and main memory (or disk) speeds has grown dramatically, with no sign of stopping. Locality is very poorly understood, however; aside from making a few important general comments, we leave most issues of locality to future research.

Except where locality issues are explicitly noted, we assume that the cost of a unit of memory is fixed and uniform. We do not address possible interactions with unusual memory hierarchy schemes such as compressed caching, which may complicate locality issues and interact in other important ways with allocator design [WLM91, Wil91, Dou93].

We will not discuss specialized allocators for particular applications where the data representations and allocator designs are intertwined.^7

(^7) Examples include specialized allocators for chained-block message-buffers (e.g., [Wol65]), "cdr-coded" list-processing systems [BC79], specialized storage for overlapping strings with shared structure, and allocators used to manage disk storage in file systems.

Allocators for these kinds of systems share many properties with the "conventional" allocators we discuss, but introduce many complicating design choices. In particular, they often allow logically contiguous items to be stored non-contiguously, e.g., in pieces of one or a few fixed sizes, and may allow sharing of parts or (other) forms of data compression. We assume that if any fragmenting or compression of higher-level "objects" happens, it is done above the level of abstraction of the allocator interface, and the allocator is entirely unaware of the relationships between the "objects" (e.g., fragments of higher-level objects) that it manages.

Similarly, parallel allocators are not discussed, due to the complexity of the subject.

Structure of the paper. This survey is intended to serve two purposes: as a general reference for techniques in memory allocators, and as a review of the literature in the field, including methodological considerations. Much of the literature review has been separated into a chronological review, in Section 4. This section may be skipped or skimmed if methodology and history are not of interest to the reader, especially on a first reading. However, some potentially significant points are covered only there, or only made sufficiently clear and concrete there, so the serious student of dynamic storage allocation should find it worthwhile. (It may even be of interest to those interested in the history and philosophy of computer science, as documentation of the development of a scientific paradigm.^8)

(^8) We use "paradigm" in roughly the sense of Kuhn [Kuh70], as a "pattern or model" for research. The paradigms we discuss are not as broad in scope as the ones usually discussed by Kuhn, but on our reading, his ideas are intended to apply at a variety of scales. We are not necessarily in agreement with all of Kuhn's ideas, or with some of the extreme and anti-scientific purposes they have been put to by some others.

The remainder of the current section gives our motivations and goals for the paper, and then frames the central problem of memory allocation (fragmentation) and the general techniques for dealing with it.

Section 2 discusses deeper issues in fragmentation, and methodological issues (some of which may be skipped) in studying it.

Section 3 presents a fairly traditional taxonomy of [...]

5 Summary and Conclusions
5.1 Models and Theories
5.2 Strategies and Policies
5.3 Mechanisms
5.4 Experiments
5.5 Data
5.6 Challenges and Opportunities

1.1 Motivation

This paper is motivated by our perception that there is considerable confusion about the nature of memory allocators, and about the problem of memory allocation in general. Worse, this confusion is often unrecognized, and allocators are widely thought to be fairly well understood. In fact, we know little more about allocators than was known twenty years ago, which is not as much as might be expected. The literature on the subject is rather inconsistent and scattered, and considerable work appears to be done using approaches that are quite limited. We will try to sketch a unifying conceptual framework for understanding what is and is not known, and suggest promising approaches for new research.

This problem with the allocator literature has considerable practical importance. Aside from the human effort involved in allocator studies per se, there are effects in the real world, both on computer system costs, and on the effort required to create real software.

We think it is likely that the widespread use of poor allocators incurs a loss of main and cache memory (and CPU cycles) upwards of a billion U.S. dollars worldwide; a significant fraction of the world's memory and processor output may be squandered, at huge cost.^9

(^9) This is an unreliable estimate based on admittedly casual last-minute computations, approximately as follows: there are on the order of 100 million PC's in the world. If we assume that they have an average of 10 megabytes of memory at $30 per megabyte, there is 30 billion dollars worth of RAM at stake. (With the expected popularity of Windows 95, this seems like it will soon become a fairly conservative estimate, if it isn't already.) If just one fifth (6 billion dollars worth) is used for heap-allocated data, and one fifth of that is unnecessarily wasted, the cost is over a billion dollars.

Perhaps even worse is the effect on programming style due to the widespread use of allocators that are simply bad ones, either because better allocators are known but not widely known or understood, or because allocation research has failed to address the proper issues. Many programmers avoid heap allocation in many situations, because of perceived space or time costs.^10

(^10) It is our impression that UNIX programmers' usage of heap allocation went up significantly when Chris Kingsley's allocator was distributed with BSD 4.2 UNIX, simply because it was much faster than the allocators they'd been accustomed to. Unfortunately, that allocator is somewhat wasteful of space.

It seems significant to us that many articles in non-refereed publications (and a number in refereed publications outside the major journals of operating systems and programming languages) are motivated by extreme concerns about the speed or memory costs of general heap allocation. (One such paper [GM85] is discussed in Section 4.1.) Often, ad hoc solutions are used for applications that should not be problematic at all, because at least some well-designed general allocators should do quite well for the workload in question.

We suspect that in some cases, the perceptions are wrong, and that the costs of modern heap allocation are simply overestimated. In many cases, however, it appears that poorly-designed or poorly-implemented allocators have led to a widespread and quite understandable belief that general heap allocation is necessarily expensive. Too many poor allocators have been supplied with widely-distributed operating systems and compilers, and too few practitioners are aware of the alternatives.

This appears to be changing, to some degree. Many operating systems now supply fairly good allocators, and there is an increasing trend toward marketing libraries that include general allocators which are at least claimed to be good, as a replacement for default allocators. It seems likely that there is simply a lag between the improvement in allocator technology and its widespread adoption, and another lag before programming style adapts. The combined lag is quite long, however, and we have seen several magazine articles in the last year on how to avoid using a general allocator. Postings praising ad hoc allocation schemes are very common in the Usenet newsgroups oriented toward real-world programming.

The slow adoption of better technology and the lag in changes in perceptions may not be the only problems, however. We have our doubts about how well allocators are really known to work, based on a fairly thorough review of the literature. We wonder whether some part of the perception is due to occasional programs that interact pathologically with common allocator designs, in ways that have never been observed by researchers.

This does not seem unlikely, because most experiments have used non-representative workloads, which are extremely unlikely to generate the same problematic request patterns as real programs. Sound studies using realistic workloads are too rare. The total number of real, nontrivial programs that have been used for good experiments is very small, apparently less than 20. A significant number of real programs could exhibit problematic behavior patterns that are simply not represented in studies to date.

Long-running processes such as operating systems, interactive programming environments, and networked servers may pose special problems that have not been addressed. Most experiments to date have studied programs that execute for a few minutes (at most) on common workstations. Little is known about what happens when programs run for hours, days, weeks or months. It may well be that some seemingly good allocators do not work well in the long run, with their memory efficiency slowly degrading until they perform quite badly. We don't know, and we're fairly sure that nobody knows. Given that long-running processes are often the most important ones, and are increasingly important with the spread of client/server computing, this is a potentially large problem.

The worst case performance of any general allocator amounts to complete failure due to memory exhaustion or virtual memory thrashing (Section 1.2). This means that any real allocator may have lurking "bugs" and fail unexpectedly for seemingly reasonable inputs.

Such problems may be hidden, because most programmers who encounter severe problems may simply code around them using ad hoc storage management techniques or, as is still painfully common, by statically allocating "enough" memory for variable-sized structures. These ad hoc approaches to storage management lead to "brittle" software with hidden limitations (e.g., due to the use of fixed-size arrays). The impact on software clarity, flexibility, maintainability, and reliability is quite important, but difficult to estimate. It should not be underestimated, however, because these hidden costs can incur major penalties in productivity and, to put it plainly, human costs in sheer frustration, anxiety, and general suffering.

A much larger and broader set of test applications and experiments is needed before we have any assurance that any allocator works reliably, in a crucial performance sense, much less works well. Given this caveat, however, it appears that some allocators are clearly better than others in most cases, and this paper will attempt to explain the differences.

1.2 What an Allocator Must Do

An allocator must keep track of which parts of memory are in use, and which parts are free. The goal of allocator design is usually to minimize wasted space without undue time cost, or vice versa. The ideal allocator would spend negligible time managing memory, and waste negligible space.

A conventional allocator cannot control the number or size of live blocks; these are entirely up to the program requesting and releasing the space managed by the allocator. A conventional allocator also cannot compact memory, moving blocks around to make them contiguous and free contiguous memory. It must respond immediately to a request for space, and once it has decided which block of memory to allocate, it cannot change that decision; that block of memory must be regarded as inviolable until the application^11 program chooses to free it. It can only deal with memory that is free, and only choose where in free memory to allocate the next requested block. (Allocators record the locations and sizes of free blocks of memory in some kind of hidden data structure, which may be a linear list, a totally or partially ordered tree, a bitmap, or some hybrid data structure.)

An allocator is therefore an online algorithm, which must respond to requests in strict sequence, immediately, and its decisions are irrevocable.

The problem the allocator must address is that the application program may free blocks in any order, creating "holes" amid live objects. If these holes are too numerous and small, they cannot be used to satisfy future requests for larger blocks. This problem is known as fragmentation, and it is a potentially disastrous one. For the general case that we have outlined (where the application program may allocate arbitrary-sized objects at arbitrary times and free them at any later time), there is no reliable algorithm for ensuring efficient memory usage, and none [...]

(^11) We use the term "application" rather generally; the "application" for which an allocator manages storage may be a system program such as a file server, or even an operating system kernel.
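
The way holes defeat later requests can be made concrete with a small simulation. The following Python sketch is only an illustration under assumed numbers: the 100-unit heap, the ten 10-unit blocks, and the helper name `free_holes` are all hypothetical and do not come from the literature surveyed. After the application frees every other block, half of the heap is free, yet no single hole exceeds 10 units, so a request for 20 units cannot be satisfied without moving live blocks.

```python
def free_holes(heap_size, live):
    """Return (start, size) for each maximal free range.

    `live` maps the start address of each live block to its size.
    """
    holes, cursor = [], 0
    for start in sorted(live):
        if start > cursor:                      # gap before this live block
            holes.append((cursor, start - cursor))
        cursor = start + live[start]
    if cursor < heap_size:                      # gap after the last live block
        holes.append((cursor, heap_size - cursor))
    return holes


HEAP_SIZE = 100                                  # hypothetical heap size
live = {start: 10 for start in range(0, HEAP_SIZE, 10)}   # ten live 10-unit blocks
for start in range(0, HEAP_SIZE, 20):            # the application frees every other block
    del live[start]

holes = free_holes(HEAP_SIZE, live)
total_free = sum(size for _, size in holes)
largest_hole = max(size for _, size in holes)
print(total_free, largest_hole)                  # prints: 50 10
```

Because the allocator cannot relocate the five surviving blocks, the 50 free units are usable only for requests of 10 units or fewer.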

- a policy is an implementable decision procedure for placing blocks in memory, and
- a mechanism is a set of algorithms and data structures that implement the policy, often over-simply called "an algorithm."^13

An ideal strategy is \put blo cks where they won't cause fragmentation later"; unfortunately that's im- p ossible to guarantee, so real strategies attempt to heuristically approximate that ideal, based on as- sumed regularities of application programs' b ehavior. For example, one strategy is \avoid letting small long- lived ob jects prevent you from reclaiming a larger con- tiguous free area." This is part of the strategy underly- ing the common \b est t" family of p olicies. Another part of the strategy is \if you have to split a blo ck and p otentially waste what's left over, minimize the size of the wasted part." The corresp onding (b est t) p olicy is more concrete|it says \always use the smallest blo ck that is at least large enough to satisfy the request." The placement p olicy determines exactly where in memory requested blo cks will b e allo cated. For the b est t p olicies, the general rule is \allo cate ob jects in the smallest free blo ck that's at least big enough to

(^13) This set of distinction s is doubtless indirectly in uenced

by work in very di erent areas, notably Marr's work in natural and arti cial visual systems [Mar82 ] and Mc- Clamro ck's work in the philosophy of science and cog- nition [McC91, McC95]. The distinctions are imp or- tant for understanding a wide variety of complex sys- tems, however. Similar distinctions are made in many elds, includin g empirical computer science, though of- ten without making them quite clear. In \systems" work, mechanism and p olicy are often distingui shed , but strategy and p olicy are usually not distingui shed explicitly. This makes sense in some con- texts, where the p olicy can safely b e assumed to im- plement a well-understo o d strategy, or where the choice of strategy is left up to someone else (e.g., designers of higher-level co de not under discussion). In empirical evaluations of very p o orly understo o d strategies, however, the distinction b etween strategy and p olicy is often crucial. (For example, errors in the implementation of a strategy are often misinterpreted as evidence that the exp ected regularities don't actu- ally exist, when in fact they do, and a slightly di erent strategy would work much b etter.) Mistakes are p ossible at each level; equally imp ortant, mistakes are p ossible between levels, in the attempt to \cash out" (implement) the higher-level strategy as a p olicy, or a p olicy as a mechanism.

hold them." That's not a complete p olicy, however, b ecause there may b e several equally go o d ts; the complete p olicy must sp ecify which of those should b e chosen, for example, the one whose address is lowest. The chosen p olicy is implemented by a sp eci c mechanism, chosen to implement that p olicy e- ciently in terms of time and space overheads. For b est t, a linear list or ordered tree structure might b e used to record the addresses and sizes of free blo cks, and a tree search or list search would b e used to nd the one dictated by the p olicy. These levels of the allo cator design pro cess inter- act. A strategy may not yield an obvious complete p olicy, and the seemingly slight di erences b etween similar p olicies may actually implement interestingly di erent strategies. (This results from our p o or un- derstanding of the interactions b etween application b ehavior and allo cator strategies.) The chosen p olicy may not b e obviously implementable at reasonable cost in space, time, or programmer e ort; in that case some approximation may b e used instead. The strategy and p olicy are often very p o orly- de ned, as well, and the p olicy and mechanism are arrived at by a combination of educated guessing, trial and error, and (often dubious) exp erimental validation.^14 (^14) In case the imp ortant distinctions b etween strategy, p ol- icy, and mechanism are not clear, a metaphorical exam- ple may help. Consider a software company that has a strategy for improving pro ductivity: reward the most pro ductive programmers. It may institute a policy of rewarding programmers who pro duce the largest num- b ers of lines of program co de. To implement this p olicy, it may use the mechanisms of instructing the managers to count lines of co de, and providing scripts that count lines of co de according to some particular algorithm. This example illustrates the p ossible failures at each level, and in the mapping from one level to another. 
The strategy may simply b e wrong, if programmers aren't particularly motivated by money. The p olicy may not implement the intended strategy, if lines of co de are an inappropriate metric of pro ductivity, or if the p olicy has unintended \strategic" e ects, e.g., due to programmer resentment. The mechanism may also fail to implement the sp ec- i ed p olicy, if the rules for line-counting aren't enforced by managers, or if the supplied scripts don't correctly implement the intended counting function. This distinction b etween strategy and p olicy is over- simpli ed, b ecause there may b e multiple levels of strat- egy that shade o into increasingly concrete p olicies. At di erent levels of abstraction, something might b e

Splitting and coalescing. Two general techniques for supp orting a range of (implementations of ) place- ment p olicies are splitting and coalescing of free blo cks. (These mechanisms are imp ortant subsidiary parts of the larger mechanism that is the allo cator implementation.) The allo cator may split large blo cks into smaller blo cks arbitrarily, and use any suciently-large sub- blo ck to satisfy the request. The remainders from this splitting can b e recorded as smaller free blo cks in their own right and used to satisfy future requests. The allo cator may also coalesce (merge) adjacent free blo cks to yield larger free blo cks. After a blo ck is freed, the allo cator may check to see whether the neighb oring blo cks are free as well, and merge them into a single, larger blo ck. This is often desirable, b e- cause one large blo ck is more likely to b e useful than two small ones|large or small requests can b e satis- ed from large blo cks. Completely general splitting and coalescing can b e supp orted at fairly mo dest cost in space and/or time, using simple mechanisms that we'll describ e later. This allows the allo cator designer the maximum free- dom in cho osing a strategy, p olicy, and mechanism for the allo cator, b ecause the allo cator can have a com- plete and accurate record of which ranges of memory are available at all times. The cost may not b e negligible, however, esp e- cially if splitting and coalescing work too wel l|in

viewed as a strategy or p olicy. The key p oint is that there are at least three quali- tatively di erent kinds of levels of abstraction involved [McC91]; at the upp er levels, there are is the general de- sign goal of exploiting exp ected regularities, and a set of strategies for doing so; there may b e subsidiary strate- gies, for example to resolve con icts b etween strategies in the b est p ossible way. At at a somewhat lower level there is a general p olicy of where to place ob jects, and b elow that is a more detailed p olicy that exactly determines placement. Below that there is an actual mechanism that is in- tended to implement the p olicy (and presumably ef- fect the strategy), using whatever algorithms and data structures are deemed appropriate. Mechanisms are of- ten layered, as well, in the usual manner of structured programming [Dij69 ]. Problems at (and b etween) these levels are the b est understo o d|a computation may b e improp erly sp eci ed, or may not meet its sp eci cation. (Analogous problems o ccur at the upp er levels o ccur as well|if exp ected regularities don't actually o ccur, or if they do o ccur but the strategy do es't actually exploit them, and so on.)

that case, freed blocks will usually be coalesced with neighbors to form large blocks of free memory, and later allocations will have to split smaller chunks off of those blocks to obtain the desired sizes. It often turns out that most of this effort is wasted, because the sizes requested later are largely the same as the sizes freed earlier, and the old small blocks could have been reused without coalescing and splitting. Because of this, many modern allocators use deferred coalescing: they avoid coalescing and splitting most of the time, but use it intermittently, to combat fragmentation.
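As a rough illustration of splitting, coalescing, and deferred coalescing, here is a minimal Python sketch. It is not taken from the survey: the `FreeList` class, its first-fit search, and the `coalesce` flag are hypothetical choices made only to keep the example small.

```python
class FreeList:
    """Toy address-ordered free list over a flat address range (hypothetical)."""

    def __init__(self, size):
        # Each entry is (start, length); the list is kept sorted by start.
        self.free = [(0, size)]

    def allocate(self, n):
        """First fit: use the first large-enough block, splitting off the rest."""
        for i, (start, length) in enumerate(self.free):
            if length >= n:
                if length == n:
                    del self.free[i]
                else:
                    # Split: the remainder stays on the list as a smaller block.
                    self.free[i] = (start + n, length - n)
                return start
        return None  # no single free block is large enough

    def release(self, start, n, coalesce=True):
        """Free a block; with coalesce=False, merging is deferred."""
        self.free.append((start, n))
        self.free.sort()
        if coalesce:
            self.coalesce()

    def coalesce(self):
        """Merge every run of adjacent free blocks into one larger block."""
        merged = []
        for start, length in self.free:
            if merged and merged[-1][0] + merged[-1][1] == start:
                prev_start, prev_len = merged[-1]
                merged[-1] = (prev_start, prev_len + length)
            else:
                merged.append((start, length))
        self.free = merged
```

With coalescing deferred, two freed 30-unit neighbors stay separate and can be reused directly by later 30-unit requests. A 50-unit request fails until `coalesce()` is called, which is the point at which an allocator using deferred coalescing would merge blocks to combat fragmentation.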

2 A Closer Look at Fragmentation, and How to Study It

In this section, we will discuss the traditional conception of fragmentation, and the usual techniques used for studying it. We will then explain why the usual understanding is not strong enough to support scientific design and evaluation of allocators. We then propose a new (though nearly obvious) conception of fragmentation and its causes, and describe more suitable techniques used to study it. (Most of the experiments using sound techniques have been performed in the last few years, but a few notable exceptions were done much earlier, e.g., [MPS71] and [LH82], discussed in Section 4.)

2.1 Internal and External Fragmentation

Traditionally, fragmentation is classed as external or internal [Ran69], and is combatted by splitting and coalescing free blocks.

External fragmentation arises when free blocks of memory are available for allocation, but can't be used to hold objects of the sizes actually requested by a program. In sophisticated allocators, that's usually because the free blocks are too small, and the program requests larger objects. In some simple allocators, external fragmentation can occur because the allocator is unwilling or unable to split large blocks into smaller ones.

Internal fragmentation arises when a large-enough free block is allocated to hold an object, but there is a poor fit because the block is larger than needed. In some allocators, the remainder is simply wasted, causing internal fragmentation. (It's called internal because the wasted memory is inside an allocated block,

evolutionary theory is extremely difficult (and some would say impossible) because too many low-level (or higher-level) details matter,^17 and there may be intrinsic unpredictabilities in the systems described [Den95].^18

We are not saying that the development of a good theory of memory allocation is as hard as developing a predictive evolutionary theory; far from it. The problem of memory allocation seems far simpler, and we are optimistic that a useful predictive theory can be developed.^19

Our point is simply that the paradigm of simple statistical mechanics must be evaluated relative to other alternatives, which we find more plausible in this domain. There are major interactions between workloads and allocator policies, which are usually ignored. No matter how large the system, and no matter how asymptotic the analyses, ignoring these effects seems likely to yield major errors; e.g., analyses will simply yield the wrong asymptotes.

A useful probabilistic theory of memory allocation may be possible, but if so, it will be based on a quite different set of statistics from those used so far: statistics which capture effects of systematicities, rather than assuming such systematicities can be ignored. As in biology, the theory must be tested against reality, and refined to capture systematicities that had previously gone unnoticed.

Random simulations. The traditional technique for evaluating allocators is to construct several traces (recorded sequences of allocation and deallocation requests) thought to resemble "typical" workloads, and use those traces to drive a variety of actual allocators.

ciently understood.

(^17) For example, the different evolutionary strategies implied by the varying replication techniques and mutation rates of RNA-based vs. DNA-based viruses, or the impact of environmental change on host/parasite interactions [Gar94].

(^18) For example, a single chance mutation that results in an adaptive characteristic in one individual may have a major impact on the subsequent evolution of a species and its entire ecosystem [Dar59].

(^19) We are also not suggesting that evolutionary theory provides a good paradigm for allocator research; it is just an example of a good scientific paradigm that is very different from the ones typically seen in memory allocation research. It demonstrates the important and necessary interplay between high-level theories and detailed empirical work.

Since an allocator normally responds only to the request sequence, this can produce very accurate simulations of what the allocator would do if the workload were real, that is, if a real program generated that request sequence.

Typically, however, the request sequences are not real traces of the behavior of actual programs. They are "synthetic" traces that are generated automatically by a small subprogram; the subprogram is designed to resemble real programs in certain statistical ways. In particular, object size distributions are thought to be important, because they affect the fragmentation of memory into blocks of varying sizes. Object lifetime distributions are also often thought to be important (but not always), because they affect whether blocks of memory are occupied or free.

Given a set of object size and lifetime distributions, the small "driver" subprogram generates a sequence of requests that obeys those distributions. This driver is simply a loop that repeatedly generates requests, using a pseudo-random number generator; at any point in the simulation, the next data object is chosen by "randomly" picking a size and lifetime, with a bias that (probabilistically) preserves the desired distributions.

The driver also maintains a table of objects that have been allocated but not yet freed, ordered by their scheduled death (deallocation) time. (That is, the step at which they were allocated, plus their randomly-chosen lifetime.) At each step of the simulation, the driver deallocates any objects whose death times indicate that they have expired. One convenient measure of simulated "time" is the volume of objects allocated so far, i.e., the sum of the sizes of objects that have been allocated up to that step of the simulation.^20

An important feature of these simulations is that they tend to reach a "steady state."
After running for a certain amount of time, the volume of live (simulated) objects reaches a level that is determined by the size and lifetime distributions, and after that objects are allocated and deallocated in approximately equal numbers. The memory usage tends to vary very little, wandering probabilistically (in a random walk) around this "most likely" level. Measurements are typically made by sampling memory usage at points after the steady state has presumably been reached, or by averaging over a period of "steady-state" variation. These measurements "at equilibrium" are assumed to be important.

There are three common variations of this simulation technique. One is to use a simple mathematical function to determine the size and lifetime distributions, such as uniform or (negative) exponential. Exponential distributions are often used because it has been observed that programs are typically more likely to allocate small objects than large ones,^21 and are more likely to allocate short-lived objects than long-lived ones.^22 (The size distributions are generally truncated at some plausible minimum and maximum object size, and discretized, rounding them to the nearest integer.)

The second variation is to pick distributions intuitively, i.e., out of a hat, but in ways thought to resemble real program behavior. One motivation for this is to model the fact that many programs allocate objects of some sizes in large numbers and others in small numbers or not at all; we refer to these distributions as "spiky."^23

The third variation is to use statistics gathered from real programs, to make the distributions more realistic. In almost all cases, size and lifetime distributions

(^20) In many early simulations, the simulator modeled real time, rather than just discrete steps of allocation and deallocation. Allocation times were chosen based on randomly chosen "arrival" times, generated using an "inter-arrival distribution," and their deaths scheduled in continuous time, rather than discrete time based on the number and/or sizes of objects allocated so far. We will generally ignore this distinction in this paper, because we think other issues are more important. As will become clear, in the methodology we favor, this distinction is not important because the actual sequences of actions are sufficient to guarantee exact simulation, and the actual sequence of events is recorded rather than being (approximately) emulated.

(^21) Historically, uniform size distributions were the most common in early experiments; exponential distributions then became increasingly common, as new data became available showing that real systems generally used many more small objects than large ones. Other distributions have also been used, notably Poisson and hyper-exponential. Still, relatively recent papers have used uniform size distributions, sometimes as the only distribution.

(^22) As with size distributions, there has been a shift over time toward non-uniform lifetime distributions, often exponential. This shift occurred later, probably because real data on size information was easier to obtain, and lifetime data appeared later.

(^23) In general, this modeling has not been very precise. Sometimes the sizes chosen out of a hat are allocated in uniform proportions, rather than in skewed proportions reflecting the fact that (on average) programs allocate many more small objects than large ones.

are assumed to be independent; the fact that different sizes of objects may have different lifetime distributions is generally assumed to be unimportant.

In general, there has been something of a trend toward the use of more realistic distributions,^24 but this trend is not dominant. Even now, researchers often use simple and smooth mathematical functions to generate traces for allocator evaluation.^25 The use of smooth distributions is questionable, because it bears directly on issues of fragmentation: if objects of only a few sizes are allocated, the free (and uncoalescable) blocks are likely to be of those sizes, making it possible to find a perfect fit. If the object sizes are smoothly distributed, the requested sizes will almost always be slightly different, increasing the chances of fragmentation.
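The synthetic-trace driver described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the exponential means, and the size bounds are hypothetical, and sizes and lifetimes are drawn independently, which is exactly the simplification the text questions.

```python
import heapq
import random


def exp_size(rng, mean=32.0, lo=1, hi=1024):
    """Truncated, discretized exponential size (parameters are assumptions)."""
    return min(hi, max(lo, round(rng.expovariate(1.0 / mean))))


def exp_lifetime(rng, mean=2000.0):
    """Exponential lifetime, measured in units of allocated volume."""
    return rng.expovariate(1.0 / mean)


def synthetic_trace(steps, pick_size, pick_lifetime, seed=0):
    """Yield ('alloc', id, size) and ('free', id, size) events."""
    rng = random.Random(seed)
    clock = 0   # simulated time: volume of objects allocated so far
    live = []   # min-heap of (death_time, id, size): the table of live objects
    for obj_id in range(steps):
        # Deallocate every object whose scheduled death time has passed.
        while live and live[0][0] <= clock:
            _, dead_id, dead_size = heapq.heappop(live)
            yield ("free", dead_id, dead_size)
        size = pick_size(rng)
        lifetime = pick_lifetime(rng)
        yield ("alloc", obj_id, size)
        heapq.heappush(live, (clock + lifetime, obj_id, size))
        clock += size


def live_volume_profile(trace):
    """Volume of live data after each event of a trace."""
    live, profile = 0, []
    for kind, _, size in trace:
        live += size if kind == "alloc" else -size
        profile.append(live)
    return profile
```

Plotting `live_volume_profile(synthetic_trace(20000, exp_size, exp_lifetime))` should show the initial climb followed by the random walk around a roughly constant level that the text calls the steady state. Feeding the same events to an allocator model would turn this into a random-trace experiment.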

Probabilistic analyses. Since Knuth's derivation of the "fifty percent rule" [Knu73] (discussed later, in Section 4), there have been many attempts to reason probabilistically about the interactions between program behavior and allocator policy, and assess the overall cost in terms of fragmentation (usually) and/or CPU time.

These analyses have generally made the same assumptions as random-trace simulation experiments (e.g., random object allocation order, independence of size and lifetimes, steady-state behavior) and often stronger assumptions as well.

These simplifying assumptions have generally been made in order to make the mathematics tractable. In particular, assumptions of randomness and independence make it possible to apply well-developed theory

(^24) The trend toward more realistic distributions can be explained historically and pragmatically. In the early days of computing, the distributions of interest were usually the distribution of segment sizes in an operating system's workload. Without access to the inside of an operating system, this data was difficult to obtain. (Most researchers would not have been allowed to modify the implementation of the operating system running on a very valuable and heavily-timeshared computer.) Later, the emphasis of study shifted away from segment sizes in segmented operating systems, and toward data object sizes in the virtual memories of individual processes running in paged virtual memories.

(^25) We are unclear on why this should be, except that a particular theoretical and experimental paradigm [Kuh70] had simply become thoroughly entrenched in the early 1970's. (It's also somewhat easier than dealing with real data.)

Markov processes to approximate program and allocator behavior, and have derived conclusions based on the well-understood properties of Markov models.

In a first-order Markov model, the probabilities of state transitions are known and fixed. In the case of fragmentation studies, this corresponds to assuming that a program allocates objects at random, with fixed probabilities of allocating different sizes.

The space of possible states of memory is viewed as a graph, with a node for each configuration of allocated and free blocks. There is a start state, representing an empty memory, and a transition probability for each possible allocation size. For a given placement policy, there will be a known transition from a given state for any possible allocation or deallocation request. The state reached by each possible allocation is another configuration of memory.

For any given request distribution, there is a network of possible states reachable from the start state, via successions of more or less probable transitions. In general, for any memory above a very, very small size, and for arbitrary distributions of sizes and lifetimes, this network is inconceivably large. As described so far, it is therefore useless for any practical analyses.

To make the problem more tractable, certain assumptions are often made. One of these is that lifetimes are exponentially distributed as well as random, and have the convenient half-life property described above, i.e., they die completely at random as well as being born at random.

This assumption can be used to ensure that both the states and the transitions between states have definite probabilities in the long run. That is, if one were to run a random-trace simulation for a long enough period of time, all reachable states would be reached, and all of them would be reached many times, and the number of times they were reached would reflect the probabilities of their being reached again in the future, if the simulation were continued indefinitely. If we put a counter on each of the states to keep track of the number of times each state was reached, the ratio between these counts would eventually stabilize, plus or minus small short-term variations. The relative weights of the counters would "converge" to a stable solution.

Such a network of states is called an ergodic Markov model, and it has very convenient mathematical properties. In some cases, it's possible to avoid running a simulation at all, and analytically derive what the network's probabilities would converge to.
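A toy example may make the counter argument concrete. The three-state chain below is hypothetical (it does not model any real allocator). The sketch compares long-run visit frequencies from a simulation against the distribution obtained by repeatedly applying the transition matrix, which stands in here for the analytic solution.

```python
import random

# Hypothetical 3-state ergodic chain; row i holds the transition
# probabilities out of state i, and each row sums to 1.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]


def visit_frequencies(P, steps, seed=0):
    """Simulate the chain and return the fraction of steps spent in each state."""
    rng = random.Random(seed)
    counts = [0] * len(P)
    state = 0
    for _ in range(steps):
        state = rng.choices(range(len(P)), weights=P[state])[0]
        counts[state] += 1
    return [c / steps for c in counts]


def stationary(P, iterations=200):
    """Approximate the stationary distribution by repeated multiplication."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi
```

For this chain the stationary distribution is (0.25, 0.5, 0.25), and the simulated counters settle close to it. The convenience comes entirely from the fixed transition probabilities, which is the property that real programs are argued not to have.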

Unfortunately, this is a very inappropriate model for real program and allocator behavior. An ergodic Markov model is a kind of (probabilistic) finite automaton, and as such the patterns it generates are very, very simple, though randomized and hence unpredictable. They're almost unpatterned, in fact, and hence very predictable in a certain probabilistic sense.

Such an automaton is extremely unlikely to generate many patterns that seem likely to be important in real programs, such as the creation of the objects in a linked list in one order, and their later destruction in exactly the same order, or exactly the reverse order.^28

There are much more powerful kinds of machines (which have more complex state, like a real program) which are capable of generating more realistic patterns. Unfortunately, the only machines that we are sure generate the "right kinds" of patterns are actual real programs.

We do not understand what regularities exist in real programs well enough to model them formally and perform probabilistic analyses that are directly applicable to real program behavior. The models we have are grossly inaccurate in respects that are quite relevant to problems of memory allocation.

There are problems for which Markov models are useful, and a smaller number of problems where assumptions of ergodicity are appropriate. These problems involve processes that are literally random, or can be shown to be effectively random in the necessary ways. The general heap allocation problem is not in either category. (If this is not clear, the next section should make it much clearer.)

Ergodic Markov models are also sometimes used for problems where the basic assumptions are known to be false in some cases, but they should only be used in this way if they can be validated, i.e., shown by extensive testing to produce the right answers most of the time, despite the oversimplifications they're based on.
For some problems it "just turns out" that the differences between real systems and the mathematical models are not usually significant. For the general problem of memory allocation, this turns out to be false as well; recent results clearly invalidate the use

(^28) Technically, a Markov model will eventually generate such patterns, but the probability of generating a particular pattern within a finite period of time is vanishingly small if the pattern is large and not very strongly reflected in the arc weights. That is, many quite probable kinds of patterns are extremely improbable in a simple Markov model.

of simple Markov models [ZG94, WJNB95].^29

2.3 What Fragmentation Really Is, and Why the Traditional Approach is Unsound

A single death is a tragedy. A million deaths is a statistic.
-- Joseph Stalin

We suggested above that the shape of a size distribution (and its smoothness) might be important in determining the fragmentation caused by a workload. However, even if the distributions are completely realistic, there is reason to suspect that randomized synthetic traces are likely to be grossly unrealistic.

As we said earlier, the allocator should embody a strategy designed to exploit regularities in program behavior; otherwise it cannot be expected to do particularly well. The use of randomized allocation order eliminates some regularities in workloads, and introduces others, and there is every reason to think that the differences in regularities will affect the performance of different strategies differently. To make this concrete, we must understand fragmentation and its causes.

The technical distinction between internal and external fragmentation is useful, but in attempting to

(^29) It might seem that the problem here is the use of first-order Markov models, whose states (nodes in the reachability graph) correspond directly to states of memory. Perhaps "higher-order" Markov models would work, where nodes in the graph represent sequences of concrete state transitions. We think this is false as well.

The important kinds of patterns produced by real programs are generally not simple very-short-term sequences of a few events, but large-scale patterns involving many events. To capture these, a Markov model would have to be of such high order that analyses would be completely infeasible. It would essentially have to be pre-programmed to generate specific literal sequences of events. This not only begs the essential question of what real programs do, but seems certain not to concisely capture the right regularities.

Markov models are simply not powerful enough (i.e., not abstract enough in the right ways) to help with this problem. They should not be used for this purpose, or any similarly poorly understood purpose, where complex patterns may be very important. (At least, not without extensive validation.) The fact that the regularities are complex and unknown is not a good reason to assume that they're effectively random [ZG94, WJNB95] (Section 4.2).

design experiments measuring fragmentation, it is worthwhile to stop for a moment and consider what fragmentation really is, and how it arises.

Fragmentation is the inability to reuse memory that is free. This can be due to policy choices by the allocator, which may choose not to reuse memory that in principle could be reused. More importantly for our purposes, the allocator may not have a choice at the moment an allocation request must be serviced: there may be free areas that are too small to service the request and whose neighbors are not free, making it impossible to coalesce adjacent free areas into a sufficiently large contiguous block.^30

Note that for this latter (and more fundamental) kind of fragmentation, the problem is a function both of the program's request stream and the allocator's choices of where to allocate the requested objects. In satisfying a request, the allocator usually has considerable leeway; it may place the requested object in any sufficiently large free area. On the other hand, the allocator has no control over the ordering of requests for different-sized pieces of memory, or when objects are freed.

We have not made the notion of fragmentation particularly clear or quantifiable here, and this is no accident. An allocator's inability to reuse memory depends not only on the number and sizes of holes, but on the future behavior of the program, and the future responses of the allocator itself. (That is, it is a complex matter of interactions between patterned workloads and strategies.)

For example, suppose there are 100 free blocks of size 10, and 200 free blocks of size 20. Is memory highly fragmented? It depends. If future requests are all for size 10, most allocators will do just fine, using the size 10 blocks, and splitting the size 20 blocks as necessary. But if the future requests are for blocks of size 30, that's a problem.
Also, if the future requests are for 100 blocks of size 10 and 200 blocks of size 20, whether it's a problem may depend on the order in which the requests arrive and the allocator's moment-

(^30) Beck [Bec82] makes the only clear statement of this principle which we have found in our exhausting review of the literature. As we will explain later (in our chronological review, Section 4.1), Beck also made some important inferences from this principle, but his theoretical model and his empirical methodology were weakened by working within the dominant paradigm. His paper is seldom cited, and its important ideas have generally gone unnoticed.
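The example of 100 free blocks of size 10 and 200 free blocks of size 20 can be made concrete with a small sketch. Everything here is an assumption made for illustration: the holes are taken to be isolated (so no coalescing is possible), they are listed in address order with the size-20 holes first, and the two policies are simple first fit and best fit.

```python
def failed_requests(holes, requests, policy="first"):
    """Count requests that cannot be served from isolated free holes.

    holes    -- hole sizes in address order; no two holes are adjacent,
                so they can be split but never coalesced
    requests -- requested sizes, in arrival order
    policy   -- "first" (first fit) or "best" (smallest sufficient hole)
    """
    free = list(holes)
    failures = 0
    for n in requests:
        candidates = [i for i, size in enumerate(free) if size >= n]
        if not candidates:
            failures += 1
            continue
        if policy == "best":
            i = min(candidates, key=lambda k: free[k])
        else:
            i = candidates[0]
        if free[i] == n:
            del free[i]
        else:
            free[i] -= n  # split: the remainder stays behind as a smaller hole
    return failures


holes = [20] * 200 + [10] * 100          # hypothetical address order
small_first = [10] * 100 + [20] * 200
large_first = [20] * 200 + [10] * 100
```

Under these assumptions, 500 requests of size 10 all succeed and every request of size 30 fails, although 5000 units are free in both cases. For the mixed workload, first fit with `small_first` fails 50 requests, because the early size-10 requests split size-20 holes; the same policy with `large_first`, or best fit with either order, fails none. The same set of holes is thus harmless or harmful depending on the request order and on the allocator's choices.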

tend to be relatively stable.

This has major implications for external fragmentation. External fragmentation means that there are free blocks of memory of some sizes, but those are the wrong sizes to satisfy current needs. This happens when objects of one size are freed, and then objects of another size are allocated, that is, when there is an unfortunate change in the relative proportions of objects of one size and objects of a larger size. (For allocators that never split blocks, this can happen with requests for smaller sizes as well.) For synthetic random traces, this is less likely to occur: they don't systematically free objects of one size and then allocate objects of another. Instead, they tend to allocate and free objects of different sizes in relatively stable proportions. This minimizes the need to coalesce adjacent free areas to avoid fragmentation; on average, a free memory block of a given size will be reused relatively soon. This may bias experimental results by hiding an allocator's inability to deal well with external fragmentation, and favor allocators that deal well with internal fragmentation at a cost in external fragmentation.

Notice that while random deaths cause fragmentation, the aggregate behavior of random walks may reduce the extent of the problem. For some allocators, this balance of unrealistically bad and unrealistically good properties may average out to something like realism, but for others it may not. Even if (by sheer luck) random traces turn out to yield realistic fragmentation "on average," over many allocators, they are inadequate for comparing different allocators, which is usually the primary goal of such studies.

2.4 Some Real Program Behaviors

...and suddenly the memory returns.
-- Marcel Proust, Swann's Way

Real programs do not generally behave randomly; they are designed to solve actual problems, and the methods chosen to solve those problems have a strong effect on their patterns of memory usage. To begin to understand the allocator's task, it is necessary to have a general understanding of program behavior. This understanding is almost absent in the literature on memory allocators, apparently because many researchers consider the infinite variation of possible program behaviors to be too daunting.

There are strong regularities in many real programs, however, because similar techniques are applied (in different combinations) to solve many problems. Several common patterns have been observed.

Ramps, peaks, and plateaus. In terms of overall memory usage over time, three patterns have been observed in a variety of programs in a variety of contexts. Not all programs exhibit all of these patterns, but most seem to exhibit one or two of them, or all three, to some degree. Any generalizations based on these patterns must therefore be qualitative and qualified. (This implies that to understand the quantitative importance of these patterns, a small set of programs is not sufficient.)

- Ramps. Many programs accumulate certain data structures monotonically over time. This may be because they keep a log of events, or because the problem-solving strategy requires building a large representation, after which a solution can be found quickly.
- Peaks. Many programs use memory in bursty patterns, building up relatively large data structures which are used for the duration of a particular phase, and then discarding most or all of those data structures. Note that the "surviving" data structures are likely to be of different types, because they represent the results of a phase, as opposed to intermediate values which may be represented differently. (A peak is like a ramp, but of shorter duration.)
- Plateaus. Many programs build up data structures quickly, and then use those data structures for long periods (often nearly the whole running time of the program).

These patterns are well-known, from anecdotal experience by many people (e.g., [Ros67, Han90]), from research on garbage collection (e.g., [Whi80, WM89, UJ88, Hay91, Hay93, BZ95, Wil95]),^32 and from a recent study of C and C++ programs [WJNB95].

(^32) It may be thought that garbage collected systems are sufficiently different from those using conventional storage management that these results are not relevant. It appears, however, that these patterns are common in both kinds of systems, because similar problem-solving strategies are used by programmers in both kinds of systems. (For any particular problem, different qualitative program behaviors may result, but the general categories seem to be common in conventional programs as well. See [WJNB95].)

(Other patterns of overall memory usage also occur, but appear less common. As we describe in Section 4, backward ramp functions have been observed [GM85]. Combined forward and backward ramp behavior has also been observed, with one data structure shrinking as another grows [Abr67].)

Notice that in the case of ramps and ramp-shaped peaks, looking at the statistical distributions of object lifetimes may be very misleading. A statistical distribution suggests a random decay process of some sort, but it may actually reflect sudden deaths of groups of objects that are born at different times. In terms of fragmentation, the difference between these two models is major. For a statistical decay process, the allocator is faced with isolated deaths, which are likely to cause fragmentation. For a phased process where many objects often die at the same time, the allocator is presented with an opportunity to get back a significant amount of memory all at once.

In real programs, these patterns may be composed in different ways at different scales of space and time. A ramp may be viewed as a kind of peak that grows over the entire duration of program execution. (The distinction between a ramp and a peak is not precise, but we tend to use "ramp" to refer to something that grows slowly over the whole execution of a program, and drops off suddenly at the end, and "peak" to refer to faster-growing volumes of objects that are discarded before the end of execution. A peak may also be flat on top, making it a kind of tall, skinny plateau.)

While the overall long-term pattern is often a ramp or plateau, it often has smaller features (peaks or plateaus) added to it. This crude model of program behavior is thus recursive. (We note that it is not generally fractal^33: features at one scale may bear no resemblance to features at another scale.
Attempting to characterize the behavior of a program by a simple number such as fractal dimension is not appropriate, because program behavior is not that simple.^34)
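The contrast drawn above between isolated deaths and phased deaths can be sketched with a toy memory of unit-sized objects. The model is an assumption made for illustration: objects are placed contiguously in allocation order, so objects born in the same phase are neighbors, and the function names are hypothetical.

```python
import random


def largest_free_run(live):
    """Length of the longest run of free (False) slots."""
    best = run = 0
    for occupied in live:
        run = 0 if occupied else run + 1
        best = max(best, run)
    return best


def free_at_random(n, k, seed=0):
    """Statistical decay: k of n objects die, chosen independently at random."""
    rng = random.Random(seed)
    live = [True] * n
    for i in rng.sample(range(n), k):
        live[i] = False
    return live


def free_a_phase(n, k, start=0):
    """Phased behavior: k objects that were allocated together die together."""
    live = [True] * n
    for i in range(start, start + k):
        live[i] = False
    return live
```

Both functions free the same number of objects, so a lifetime histogram alone could not tell them apart. With `n = 1000` and `k = 500`, the phased case leaves one contiguous free area of 500 slots, while the random case leaves only short runs scattered among the survivors.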

(^33) We are using the term "fractal" rather loosely, as is common in this area. Typically, "fractal" models of program behavior are not infinitely recursive, and are actually graftals or other finite fractal-like recursive entities.

(^34) We believe that this applies to studies of locality of reference as well. Attempts to characterize memory referencing behavior as fractal-like (e.g., [VMH+83, Thi89]) are ill-conceived or severely limited, if only because memory allocation behavior is not generally fractal, and memory-referencing behavior depends on memory allocation policy. (We suspect that it's ill-conceived for understanding program behavior at the level of references to objects, as well as at the level of references to memory.) If the fractal concept is used in a strong sense, we believe it is simply wrong. If it is taken in a weak sense, we believe it conveys little useful information that couldn't be better summarized by simple statistical curve-fitting; using a fractal conceptual framework tends to obscure more issues than it clarifies. Average program behavior may resemble a fractal, because similar features can occur at different scales in different programs; however, an individual program's behavior is not fractal-like in general, any more than it is a simple Markov process. Both kinds of models fail to capture the "irregularly regular" and scale-dependent kinds of patterns that are most important.

Ramps, peaks, and plateaus have very different implications for fragmentation.

An overall ramp or plateau profile has a very convenient property, in that if short-term fragmentation can be avoided, long term fragmentation is not a problem either. Since the data making up a plateau are stable, and those making up a ramp accumulate monotonically, inability to reuse freed memory is not an issue: nothing is freed until the end of program execution. Short-term fragmentation can be a cumulative problem, however, leaving many small holes in the mass of long-lived objects.

Peaks and tall, skinny plateaus can pose a challenge in terms of fragmentation, since many objects are allocated and freed, and many other objects are likely to be allocated and freed later. If an earlier phase leaves scattered survivors, it may cause problems for later phases that must use the spaces in between.

More generally, phase behavior is the major cause of fragmentation, if a program's needs for blocks of particular sizes change over time in an awkward way. If many small objects are freed at the end of a phase, but scattered objects survive, a later phase may run into trouble. On the other hand, if the survivors happen to have been placed together, large contiguous areas will come free.

Fragmentation at peaks is important. Not all periods of program execution are equal. The most important periods are usually those when the most memory is used. Fragmentation is less important at times of lower overall memory usage than it is when memory usage is "at its peak," either during a short-lived peak or near the end of a ramp of gradually increas-

appropriate, and which good strategies can be combined successfully. This is not to say that experiments with many variations on many designs aren't useful (we're in the midst of such experiments ourselves), but that the goal should be to identify fundamental interactions rather than just "hacking" on things until they work well for a few test applications.
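As a rough illustration of the earlier point about scattered survivors, the following sketch uses a toy model that is not taken from the survey: the heap is an array of equal-sized slots, one phase fills every slot with a small object, and all but a few survivors are then freed. The function names and the slot counts are hypothetical. The two layouts leave the same amount of free memory, but very different largest contiguous free areas.

```python
def largest_free_run(heap):
    """Length of the longest run of free slots (False means free)."""
    best = run = 0
    for used in heap:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


def heap_after_phase(n_slots, survivors):
    """Heap in which every slot held a one-slot object and all were
    freed at the end of the phase except those listed in `survivors`."""
    keep = set(survivors)
    return [i in keep for i in range(n_slots)]


N = 1000
# Assumption: 100 survivors in both cases, only their placement differs.
scattered = heap_after_phase(N, range(0, N, 10))  # every 10th object survives
clustered = heap_after_phase(N, range(0, 100))    # survivors placed together

free_scattered = scattered.count(False)
free_clustered = clustered.count(False)
print(free_scattered, largest_free_run(scattered))
print(free_clustered, largest_free_run(clustered))
```

In this model a later phase that needs blocks of, say, 50 slots cannot be served at all from the scattered layout, although 90% of the heap is free; the clustered layout can serve it from a single free area.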

Profiles of some real programs. To make our discussion of memory usage patterns more concrete, we will present profiles of memory use for some real programs. Each figure plots the overall amount of live data for a run of the program, and also the amounts of data allocated to objects of the five most popular sizes. ("Popularity" here means most volume allocated, i.e., sum of sizes, rather than object counts.) These are profiles of program behavior, independent of any particular allocator.
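A profile of this kind can be computed from an allocation trace alone. The sketch below is a minimal illustration, not the trace processor used for the figures; the trace format (tuples of `("alloc", obj_id, size)` and `("free", obj_id)`) and the function name `profile` are assumptions made for the example. Time is measured in bytes allocated so far, so the clock advances only on allocations, and popularity is ranked by total bytes allocated per size.

```python
from collections import defaultdict


def profile(trace, top_n=5):
    """Return (popular_sizes, samples) for an allocation trace.

    Each sample is (allocation_time_bytes, total_live_bytes,
    {size: live_bytes for each popular size}).
    """
    trace = list(trace)

    # Popularity is volume allocated (sum of sizes), not object count.
    volume = defaultdict(int)
    for event in trace:
        if event[0] == "alloc":
            volume[event[2]] += event[2]
    popular = sorted(volume, key=lambda s: (-volume[s], s))[:top_n]

    live = {}                        # obj_id -> size
    live_by_size = defaultdict(int)  # size -> live bytes of that size
    total_live = 0
    clock = 0                        # bytes allocated so far
    samples = []
    for event in trace:
        if event[0] == "alloc":
            _, obj, size = event
            live[obj] = size
            live_by_size[size] += size
            total_live += size
            clock += size            # time advances only on allocation
        else:
            size = live.pop(event[1])
            live_by_size[size] -= size
            total_live -= size
        samples.append(
            (clock, total_live, {s: live_by_size[s] for s in popular})
        )
    return popular, samples


# Hypothetical five-event trace, for illustration only.
example_trace = [
    ("alloc", "a", 100),
    ("alloc", "b", 16),
    ("free", "a"),
    ("alloc", "c", 16),
    ("alloc", "d", 100),
]
popular, samples = profile(example_trace)
peak_live = max(total for _, total, _ in samples)
```

Because the result depends only on the sequence of requests, it describes the program rather than any allocator; fragmentation would have to be measured separately, by comparing the live-data curve against the memory a particular allocator actually uses.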

GCC. Figure 1 shows memory usage for GCC, the GNU C compiler, compiling the largest file of its own source code (combine.c). (A high optimization switch was used, encouraging the compiler to perform extensive inlining, analyses, and optimization.) We used a trace processor to remove "obstack" allocation from the trace, creating a trace with the equivalent allocations and frees of individual objects; obstacks are heavily used in this program.^37 The use of obstacks may affect programming style and memory usage patterns; however, we suspect that the memory usage patterns would be similar without obstacks, and that obstacks are simply used to exploit them.^38 This is a heavily phased program, with several strong and similar peaks. These are two-horned peaks, where one (large) size is allocated and deallocated, and a much smaller size is allocated and deallocated, out of phase.^39 (This is an unusual feature, in our

(^37) See the discussion of [Han90] (Section 4.1) for a description of obstacks.

(^38) We've seen similarly strong peaks in a profile of a compiler of our own, which relies on garbage collection rather than obstacks.

(^39) Interestingly, the first of the horns usually consists of a size that is specific to that peak: different peaks use different-sized large objects, but the out-of-phase partner horn consists of the same small size each time. The differences in sizes used by the first horn explain why only three of these horns show up in the plot, and why they show up for the largest peaks: for the other peaks' large sizes, the total memory used does not make it into the top five.

limited experience.) Notice that this program exhibits very different usage profiles for different-sized objects. The use of one size is nearly steady, another is strongly peaked, and others are peaked, but different.

Grobner. Figure 2 shows memory usage for the Grobner program,^40 which decomposes complex expressions into linear combinations of polynomials (Grobner bases).^41 As we understand it, this is done by a process of expression rewriting, rather like term rewriting or rewrite-based theorem proving techniques. Overall memory usage tends upward in a general ramp shape, but with minor short-term variations, especially small plateaus. While the profiles for usage of different-sized objects are roughly similar, their ramps start at different points during execution and have different slopes and irregularities, so the proportions of different-sized objects vary somewhat.^42

Hypercube. Figure 3 shows memory usage for a hypercube message-passing simulator, written by Don Lindsay while at CMU. It exhibits a large and simple plateau. This program allocates a single very large object near the beginning of execution, which lives for almost the entire run; it represents the nodes in a hypercube and their interconnections.^43 A very large number of other objects are created, but they are small and very short-lived; they represent messages

(^40) This program (and the hypercube simulator described below) was also used by Detlefs in [Det92] for evaluation of a garbage collector. Based on several kinds of profiles, we now think that Detlefs' choice of test programs may have led to an overestimation of the costs of his garbage collector for C++. Neither of these programs is very friendly to a simple GC, especially one without compiler or OS support.

(^41) The function of this program is rather analogous to that of a Fourier transform, but the basis functions are polynomials rather than sines and cosines, and the mechanism used is quite different.

(^42) Many of the small irregularities in overall usage come from sizes that don't make it into the top five; small but highly variable numbers of these objects are used.

(^43) In these plots, "time" advances at the end of each allocation. This accounts for the horizontal segments visible after the allocations of large objects: no other objects are allocated or deallocated between the beginning and end of the allocation of an individual object, and allocation time advances by the size of the object.

[Figure 1 plot: "cc1 -O2 -pipe -c combine.c, memory in use by object sizes (Top 5)". Horizontal axis: Allocation Time in Megabytes (0 to 18). Vertical axis: KBytes in Use (0 to 2500). Curves: all objects; 178600 byte objects; 16 byte objects; 132184 byte objects; 20 byte objects; 69720 byte objects.]

Fig. 1. Profile of memory usage in the GNU C compiler.

sent between nodes randomly.^44 This program quickly reaches a steady state, but the steady state is quite different from the one reached by most randomized allocator simulations: a very few sizes are represented, and lifetimes are both extremely skewed and strongly correlated with sizes.
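One way to check a trace for this kind of correlation is to group object lifetimes by size. The sketch below is only an illustration under stated assumptions: the trace format (`("alloc", obj_id, size)` and `("free", obj_id)` tuples) and the name `lifetimes_by_size` are hypothetical, lifetimes are measured in bytes allocated while the object was live, and objects never freed are counted as living to the end of the trace.

```python
from collections import defaultdict


def lifetimes_by_size(trace):
    """Map each object size to the list of lifetimes of objects of
    that size, measured in bytes allocated during the object's life."""
    clock = 0     # bytes allocated so far
    born = {}     # obj_id -> (size, clock value just after allocation)
    result = defaultdict(list)
    for event in trace:
        if event[0] == "alloc":
            _, obj, size = event
            clock += size
            born[obj] = (size, clock)
        else:
            size, birth = born.pop(event[1])
            result[size].append(clock - birth)
    # Objects still live at the end are treated as dying there.
    for size, birth in born.values():
        result[size].append(clock - birth)
    return dict(result)


# Hypothetical trace: one large long-lived object, then small messages.
example_trace = [
    ("alloc", "nodes", 1000),
    ("alloc", "m1", 8),
    ("alloc", "m2", 8),
    ("free", "m1"),
    ("alloc", "m3", 8),
    ("free", "m2"),
    ("free", "m3"),
]
by_size = lifetimes_by_size(example_trace)
```

In a randomized simulation that draws sizes and lifetimes independently, the per-size lifetime lists would look alike; in a trace like the hypercube simulator's, the large size would have one very long lifetime and the small sizes many very short ones.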

Perl. Figure 4 shows memory usage for a script (program) written in the Perl scripting language. This program processes a file of string data. (We're not sure exactly what it is doing with the strings, to be honest; we do not really understand this program.) This program reaches a steady state, with heavily skewed usage of different sizes in relatively fixed proportions.

(^44) These objects account for the slight increase and irregularity in the overall lifetime curve at around 2MB, after the large, long-lived objects have been allocated.

(Since Perl is a fairly general and featureful programming language, its memory usage may vary tremendously depending on the program being executed.)

LRUsim. Figure 5 shows memory usage for a locality profiler written by Doug van Wieren. This program processes a memory reference trace, keeping track of how recently each block of memory has been touched and accumulating a histogram of hits to blocks at different recencies (LRU queue positions). At the end of a run, a PostScript grayscale plot of the time-varying locality characteristics is generated. The recency queue is represented as a large modified AVL tree, which dominates memory usage; only a single object size really matters much. At the parameter setting used for this run, no blocks are ever discarded, and the tree grows monotonically; essentially no heap-allocated objects are ever freed, so memory usage is a