Generational Garbage Collection and Augmenting Data Structures in CS61B, Study notes of Data Structures and Algorithms

Two topics from the cs61b course at uc berkeley: generational garbage collection and augmenting data structures. Generational garbage collection separates objects into young and old generations, with the young generation being divided into eden and survivor spaces. References from old objects to young objects are kept in a special table to enable the minor collection to find them. Augmenting data structures involves adding 'extra' abilities to existing data structures, such as combining a hash table with a 2-3-4 tree to determine next smaller and larger keys in o(1) time, or recording size and height information in splay trees to maintain this information during rotations.

Typology: Study notes

2010/2011

Uploaded on 10/29/2011

jokerxxx
jokerxxx 🇺🇸

4.3

(36)

330 documents

1 / 2

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
12/03/10
19:16:57 1
41
CS61B: Lecture 41
Friday, December 3, 2010
Generational Garbage Collection
-------------------------------
Studies of memory allocation have shown that most objects allocated by most
programs have short lifetimes, while a few go on to survive through many
garbage collections. This observation has inspired generational garbage
collectors, which separate old from new objects.
A generational collector has two or more generations, which are like the
separate spaces used by copying collectors, except that the generations can be
of different sizes, and can change size during a program’s lifetime.
Sun’s 1.3 JVM divides objects into an old generation and a young generation.
Because old objects tend to last longer, the old generation doesn’t need to be
garbage collected nearly as often. Hence, the old generation uses a compacting
mark-and-sweep collector, because speed is not critical, but memory efficiency
might be. Because old objects are long-lived, and because mark and sweep only
uses one memory space, the old generation tends to remain compact.
The young generation is itself divided into three areas. The largest area is
called "Eden", and it is the space where all objects are born, and most die.
Eden is large enough that most objects in it will become garbage long before it
gets full. When Eden fills up, it is garbage collected and the surviving
objects are copied into one of two _survivor_spaces_. The survivor spaces are
just the two spaces of a copying garbage collector.
If an unexpectedly large number of objects survive Eden, the survivor spaces
can expand if necessary to make room for additional objects.
Objects move back and forth between the two survivor spaces until they age
enough to be _tenured_ - moved to the old generation. Young objects benefit
from the speed of the copying collector while they’re still wild and prone to
die young.
Thus, the Sun JVM takes advantage of the best features of both the
mark-and-sweep and copying garbage collection methods.
There are two types of garbage collection: minor collections, which happen
frequently but only affect the young generation - thereby saving lots of time -
and major collections, which happen much less often but cover all the objects
in memory.
This introduces a problem. Suppose a young object is live only because an old
object references it. How does the minor collection find this out, if it
doesn’t search the old generation?
References from old objects to young objects tend to be rare, because old
objects are set in their ways and don’t change much. Since references from old
objects to young are so rare, the JVM keeps a special table of them, which it
updates whenever such a reference is created. The table of references is added
to the roots of the young generation’s copying collector.
-------------------------------------------------------------------------
| |
| old generation |
| |
| |
-------------------------------------------------------------------------
| |
| young generation |
| |
| ------------------------------------------------------------------- |
| | survivor space | |
| | | |
| ------------------------------------------------------------------- |
| | survivor space | |
| | | |
| ------------------------------------------------------------------- |
| _____ ____ |
| /----\ /----/ \/\/\ /---/ \____ |
| _/ \ -\ / \___--__/ \ |
| / \___/ \__--/ | |
| | / |
| | Eden \ |
| \ | |
| \ _ ^ / |
| -\ /\_ _/--\ /\ /\ / \--\ /--\ / \__ / |
| --/ \__/ \___/ \/\_/ \__/ \/\/ --/ \_/ |
-------------------------------------------------------------------------
pf2

Partial preview of the text

Download Generational Garbage Collection and Augmenting Data Structures in CS61B and more Study notes Data Structures and Algorithms in PDF only on Docsity!

12/03/1019:16:^

CS61B:^ Lecture 41Friday, December 3, 2010Generational Garbage Collection-------------------------------Studies of memory allocation have shown that most objects allocated by mostprograms have short lifetimes, while a few go on to survive through manygarbage collections. This observation has inspired generational garbagecollectors, which separate old from new objects.A generational collector has two or more generations, which are like theseparate spaces used by copying collectors, except that the generations can beof different sizes, and can change size during a program’s lifetime.Sun’s 1.3 JVM divides objects into an old generation and a young generation.Because old objects tend to last longer, the old generation doesn’t need to begarbage collected nearly as often.^ Hence, the old generation uses a compactingmark-and-sweep collector, because speed is not critical, but memory efficiencymight be. Because old objects are long-lived, and because mark and sweep onlyuses one memory space, the old generation tends to remain compact.The young generation is itself divided into three areas.^ The largest area iscalled "Eden", and it is the space where all objects are born, and most die.Eden is large enough that most objects in it will become garbage long before itgets full. When Eden fills up, it is garbage collected and the survivingobjects are copied into one of two survivor_spaces.^ The survivor spaces arejust the two spaces of a copying garbage collector.If an unexpectedly large number of objects survive Eden, the survivor spacescan expand if necessary to make room for additional objects.Objects move back and forth between the two survivor spaces until they ageenough to be tenured - moved to the old generation.^ Young objects benefitfrom the speed of the copying collector while they’re still wild and prone todie young.Thus, the Sun JVM takes advantage of the best features of both themark-and-sweep and copying garbage collection methods.There are two types of garbage collection:^ minor collections, which happenfrequently but only affect the young generation - thereby saving lots of time -and major collections, which happen much less often but cover all the objectsin memory.This introduces a problem. Suppose a young object is live only because an oldobject references it. How does the minor collection find this out, if itdoesn’t search the old generation?References from old objects to young objects tend to be rare, because oldobjects are set in their ways and don’t change much.^ Since references from oldobjects to young are so rare, the JVM keeps a special table of them, which itupdates whenever such a reference is created.^ The table of references is addedto the roots of the young generation’s copying collector.

-------------------------------------------------------------------------|^

| old generation^

|^

|^

-------------------------------------------------------------------------|^

| young generation^

|^

|^ -------------------------------------------------------------------

|^ | survivor space^

|^ |

|^ |^

|^ |

|^ -------------------------------------------------------------------

|^ | survivor space^

|^ |

|^ |^

|^ |

|^ -------------------------------------------------------------------

|^ _____

____^ |

|^ /----^ /----/^

//^ /---/^ ____^ |

|^ _/^ ^ -^ /^

_--/^ ^ |

|^ /^ __/^ _--/^

|^ |

|^ |^

/^ |

|^ |^ Eden

^ |

|^ ^

|^ |

|^ ^

_^ ^^ /^ |

|^ -^ /_^ _/--^ /^ /\

/^ --^ /--^ /^ __^ /^ |

|^ --/^ _/^ __/^ /_/^

__/^ //^ --/^ _/^ |

12/03/1019:16:^

AUGMENTING^ DATA^ STRUCTURES==========================Once^ you^ know^ how^ to^ design one of the data structures taught in this class,it’s^ sometimes^ easy^ to^ augment it to have "extra" abilities.You’ve^ already^ augmented data structures in Project 3.

For example, the set E of^ edges^ is^ stored^ as^ both a hash table and an adjacency list.

The hash table allows^ you^ to^ test^ set^ membership in O(1) time, unlike the adjacency list.

The adjacency^ list^ tells^ you the edges adjoining a vertex in O(degree) time, unlikethe^ hash^ table.2-3-4^ Trees^ with^ Fast^ Neighbors-------------------------------Suppose^ you^ have^ a^ 2-3-4 tree with no duplicate keys.

Given a key k, you want to^ be^ able^ to^ determine^ whether k is in the tree, and what the next smaller andlarger^ keys^ are,^ in^ O(1) time.^ The insert() and delete() operations must stilltake^ O(log^ n)^ time.^ Can you do it?It’s^ easy^ if^ you^ combine the 2-3-4 tree with a hash table.

The hash table maps each^ key^ to^ an^ record^ that stores the next smaller and next larger keys in thetree.---------------- ---------------| | | ----- ----- || Hash table | | | 4 | | 9 | | 5 ----+///////+----->| ----- ----- |---------------- | prev^ next |--------------- The^ trick^ is^ that^ when^ you insert a key into the tree, you can determine bytree^ search^ in^ O(log^ n)^ time what the next smaller and larger keys are.

Then, you^ update^ all^ three^ keys’ records in the hash table in O(1) time.Similarly,^ when^ you^ delete an key from the tree, you must delete it from thehash^ table^ too,^ and^ update the records for the two neighboring keys.

This too takes^ O(1)^ time.

Splay Trees with Node Information---------------------------------Sometimes it’s useful for a^ binary^ search

tree^ to^ record^ extra^ information^ in each node, like the size and^ height^ of^ each^ subtree^ at^ each^ node. In splay trees, this is easy^ to^ maintain.

Splaying^ is^ just^ a^ sequence^ of^ tree rotations.^ Each rotation changes^ the

sizes^ of^ only^ two^ subtrees,^ and^ we^

can easily compute their new sizes^ after the^ rotation.^ Let^ size(Y)^ be^ the^ number of nodes in the subtree rooted^ at^ node

Y.^ After^ a^ right^ rotation^ (for instance) you can recompute^ the^ information

as^ follows: size(Y) = 1 + size(B) + size(C)^

Y^ X

size(X) = 1 + size(A) + size(Y)^

/^ ^ /^ \X ^^ ^^ Y

height(Y) = 1 + max{height(B),^ height(C)}

/^ ^ /C^ /A^ /^ \

height(X) = 1 + max{height(A),^ height(Y)}

^^ ^^ ------------>^ ^^

^

(Note:^ to make this work,^ we^ must^ say

/A/B^ rotate^ right^ /B/C
that the height of an empty^ tree^ is^

Be forwarned that a rotation^ does^ not

just^ change^ the^ heights^ of^ X^ and^ Y--it also can change the heights^ of^ all^ their

ancestors.^ But^ X^ gets^ splayed^ all^ the way to the root, so all the^ ancestors’

heights^ are^ fixed^ on^ the^ way^ up. Likewise, inserting or removing^ an^ item

changes^ the^ subtree^ sizes^ of^ all^ the ancestors of the affected item,^ and^ possibly^ their^ heights^ as^ well.^ But a newly inserted item gets splayed^ to^ the^ top;^ and^ a^ deleted^ node’s^ parent is splayed to the top.^ So again,^ all^ the

sizes^ and^ heights^ will^ get^ fixed^ during the rotations.^ Let’s watch^ the^ size fields^ as^ we^ insert^ a^ new^ node^ X^ into

a splay tree.^ (The following^ numbers^ are^ sizes,^ not^ keys.) Note that the very first rotation^ is at^ the^ grandparent^ of^ node^ X^ (zig-zig). 10 10 10 10

[11]

/^ ^ /^ ^ /^ \

/^ ^ /^ \

1 [9]^1

/ ^ / ^ /^ ^

/^ ^ /^ /\

/ ^ / ^ /^ ^

/^ ^ ^ /^ \

4 1 =zig=>^5 1 =zig=>^ [5]^

1 =zig-zag=>^1 4 1 =zig=>^3

/ ^ / ^ /^

/^ /^ \

1 2 3 [1]^4

/ ^ / ^ /^

/^ \

1 [X]^1 1

1 1 / \ 1 1

How can we use this information?^ We can^ answer^ the^ query^ "How^3 find(4) many keys are there between^ x^ and^ y?"

in^ O(log^ n)^ amortized^ /^
time if the splay tree has^ no^ duplicate

keys^ and^ we^ label^ every^2 subtree with its size.^ Our^ strategy is^ to^ set^ c^ =^ n,^ then^ /^

\

deduct from c the number of^ keys^ outside

the^ range^ [x,^ y].^1 8 /^
find(x);^ // After the splaying,^ the

keys^ in^ the^ root’s^ left^6

// subtree are all less than^ x,^ so^ subtract

their^ number^ from^ c. c = c - size(root’s left^ subtree);if (root key < x)^ // Only^ possible^

if^ x^ is^ not^ in^ the^ tree--^6 find(7)c--; // otherwise x was^ splayed^ to^ the^ root.^ /^ ^3 find(y);^ // After the splaying,^ the

keys^ in^ the^ root’s^ /^ ^ // right subtree all exceed^ y.^2 5 c = c - size(root’s right^ subtree);^

if (root key > y) c--;^

Now, c is the number of keys^ in^ [x,^ y].