Scapegoat Trees: An Amortized Analysis of Sorted Set Operations, Study notes of Data Structures and Algorithms

An overview of Scapegoat Trees, a data structure for maintaining sorted sets with logarithmic worst-case time for search and logarithmic amortized time for insertion and deletion. The notes cover the principles of Scapegoat Trees, their advantages, and a comparison with other data structures such as 2-4 trees and Red-Black Trees.

Typology: Study notes

2020/2021

Uploaded on 01/02/2022

guna_shekar


Outline

  • Scapegoat Trees (O(log n) amortized time)
  • 2-4 Trees (O(log n) worst-case time)
  • Red-Black Trees (O(log n) worst-case time)

Scapegoat trees

  • Deterministic data structure
  • Lazy data structure
    • Only does work when search paths get too long
  • Search in O(log n) worst-case time
  • Insert/delete in O(log n) amortized time
    • Starting with an empty scapegoat tree, a sequence of m insertions and deletions takes O(m log n) time

Scapegoat philosophy

  • Rebuilding the tree costs O(n) time.
  • We cannot rebuild too often if we want to keep O(log n) amortized time per operation.

• How do we know when we need to rebuild the tree?

  • Scapegoat trees keep two counters:
    1. n: the number of items in the tree (size)
    2. q: an overestimate of n
  • We maintain the following two invariants:
    1. q/2 ≤ n ≤ q
    2. No node has depth greater than log_{3/2} q
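
The two invariants can be written as small executable checks. This is a minimal sketch in Python; the function names `depth_limit` and `counters_ok` are illustrative assumptions and do not come from the notes.

```python
import math

def depth_limit(q: int) -> float:
    """Largest depth allowed by invariant 2: log base 3/2 of q."""
    return math.log(q, 1.5)

def counters_ok(n: int, q: int) -> bool:
    """Invariant 1: q/2 <= n <= q."""
    return q / 2 <= n <= q

# With q = 11 the limit is a little under 6, so a node at depth 6 is too deep.
print(round(depth_limit(11), 2))   # 5.91
print(counters_ok(10, 10))         # True
print(counters_ok(4, 10))          # False: n has dropped below q/2
```
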

Search and Delete

  • How can we perform a search in a Scapegoat tree?
    • Run the standard search algorithm for binary search trees.
  • How can we delete a value x from a Scapegoat tree?
    1. Run the standard deletion algorithm for binary search trees.
    2. Decrement n.
    3. If n < q/2, then rebuild the entire tree and set q = n.
  • How can we insert a value x into a Scapegoat tree?
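
The deletion steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `ScapegoatSketch`, the helpers `flatten`, `build_balanced` and `bst_delete`, and the `rebuilds` counter are hypothetical names introduced here, and the full rebuild is done by flattening the tree to a sorted list and rebuilding it balanced.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def flatten(node, out):
    """Append the keys of the subtree rooted at node to out, in sorted order."""
    if node is not None:
        flatten(node.left, out)
        out.append(node.key)
        flatten(node.right, out)
    return out

def build_balanced(keys, lo=0, hi=None):
    """Build a balanced BST from the sorted slice keys[lo:hi]."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(keys[mid])
    node.left = build_balanced(keys, lo, mid)
    node.right = build_balanced(keys, mid + 1, hi)
    return node

def bst_delete(node, x):
    """Standard BST deletion; returns (new subtree root, whether x was found)."""
    if node is None:
        return None, False
    if x < node.key:
        node.left, found = bst_delete(node.left, x)
        return node, found
    if x > node.key:
        node.right, found = bst_delete(node.right, x)
        return node, found
    if node.left is None:
        return node.right, True
    if node.right is None:
        return node.left, True
    succ = node.right                  # two children: copy up the successor
    while succ.left is not None:
        succ = succ.left
    node.key = succ.key
    node.right, _ = bst_delete(node.right, succ.key)
    return node, True

class ScapegoatSketch:
    def __init__(self, keys=()):
        keys = sorted(keys)
        self.root = build_balanced(keys)
        self.n = self.q = len(keys)
        self.rebuilds = 0              # hypothetical counter, for illustration only

    def delete(self, x):
        self.root, found = bst_delete(self.root, x)    # step 1
        if found:
            self.n -= 1                                # step 2
            if self.n < self.q / 2:                    # step 3
                self.root = build_balanced(flatten(self.root, []))
                self.q = self.n
                self.rebuilds += 1
        return found
```

Starting from ten keys, the first five deletions leave n = 5 and q = 10, so no rebuild happens; the sixth deletion drops n below q/2 and triggers the full rebuild.
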

(Figure: a scapegoat tree storing the keys 0 to 9, with n = q = 10; the value to be inserted is u = 3.5.)

Inserting into a Scapegoat tree

(easy case)

  1. Create a node u and insert it in the normal way.
  2. Increment n and q.
  3. Check the depth of u: depth(u) = 4 ≤ log_{3/2} q ≈ 5.91 (with n = q = 11 after the insertion), so no rebuilding is needed.

Inserting into a Scapegoat tree

(bad case)

  • Inserting u = 3.5 gives n = q = 11 and d(u) = 6 > log_{3/2} q ≈ 5.91, so the depth invariant is violated.
  • Walk up from u toward the root, looking for a node w with size(w) > (2/3) size(w.parent):
    • 1 ≤ (2/3)·2 ≈ 1.33: no scapegoat yet
    • 3 ≤ (2/3)·6 = 4: no scapegoat yet
    • 6 > (2/3)·7 ≈ 4.67: w.parent is the scapegoat
  • The subtree rooted at the scapegoat is then rebuilt.

(Figure: the example tree with keys 0 to 9 and the new node u = 3.5, marking w at each step.)
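
Both insertion cases can be combined into one routine. The following Python sketch is an illustration under stated assumptions, not the notes' own code: the names `ScapegoatSketch`, `_rebuild_at_scapegoat`, `flatten`, `build_balanced` and `height` are hypothetical, the search path is kept in a list instead of parent pointers, and sizes are recomputed by traversal.

```python
import math

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def size(node):
    return 0 if node is None else 1 + size(node.left) + size(node.right)

def height(node):
    """Depth of the deepest node below node; -1 for an empty subtree."""
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

def flatten(node, out):
    if node is not None:
        flatten(node.left, out)
        out.append(node.key)
        flatten(node.right, out)
    return out

def build_balanced(keys, lo=0, hi=None):
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(keys[mid])
    node.left = build_balanced(keys, lo, mid)
    node.right = build_balanced(keys, mid + 1, hi)
    return node

class ScapegoatSketch:
    def __init__(self):
        self.root, self.n, self.q = None, 0, 0

    def insert(self, x):
        # Step 1: ordinary BST insertion, remembering the search path.
        path, node = [], self.root
        while node is not None:
            if x == node.key:
                return False
            path.append(node)
            node = node.left if x < node.key else node.right
        u = Node(x)
        if not path:
            self.root = u
        elif x < path[-1].key:
            path[-1].left = u
        else:
            path[-1].right = u
        # Step 2: update both counters.
        self.n += 1
        self.q += 1
        # Step 3: depth(u) equals len(path); rebuild only in the bad case.
        if len(path) > math.log(self.q, 1.5):
            self._rebuild_at_scapegoat(path, u)
        return True

    def _rebuild_at_scapegoat(self, path, u):
        w, i = u, len(path) - 1
        # Walk up until size(w) > (2/3) * size(w.parent), in integer arithmetic.
        while i >= 0 and 3 * size(w) <= 2 * size(path[i]):
            w, i = path[i], i - 1
        assert i >= 0, "the lemma guarantees a scapegoat exists"
        scapegoat = path[i]
        rebuilt = build_balanced(flatten(scapegoat, []))
        if i == 0:
            self.root = rebuilt
        elif path[i - 1].left is scapegoat:
            path[i - 1].left = rebuilt
        else:
            path[i - 1].right = rebuilt
```

Inserting keys in sorted order, which would turn a plain binary search tree into a path, is a convenient way to exercise the bad case: after every insertion `height(t.root)` should stay at or below `math.log(t.q, 1.5)`.
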

Why is there always a scapegoat?

  • Lemma: if d > log_{3/2} q, where d is the depth of the newly inserted node u, then there exists a scapegoat node.
  • Proof by contradiction
    • Assume (for contradiction) that we don't find a scapegoat node.
    • Then size(w) ≤ (2/3) size(w.parent) for all nodes w on the path to u
    • The size of a node at depth i is at most n(2/3)^i
    • But d > log_{3/2} q ≥ log_{3/2} n, so size(u) ≤ n(2/3)^d < n(2/3)^{log_{3/2} n} = n/n = 1
    • Contradiction! (Since size(u) = 1.) So there must be a scapegoat node.
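
The chain of inequalities in the proof can be written out step by step. The notation is an assumption introduced here and not in the notes: w_0, w_1, ..., w_d denotes the path from the root w_0 to the new node u = w_d.

```latex
\begin{align*}
\operatorname{size}(w_0) &= n, \\
\operatorname{size}(w_i) &\le \tfrac{2}{3}\,\operatorname{size}(w_{i-1})
  \quad\Longrightarrow\quad
  \operatorname{size}(w_i) \le n\left(\tfrac{2}{3}\right)^{i}
  \quad\text{(by induction on } i\text{)}, \\
\operatorname{size}(u) = \operatorname{size}(w_d)
  &\le n\left(\tfrac{2}{3}\right)^{d}
  < n\left(\tfrac{2}{3}\right)^{\log_{3/2} n}
  = n \cdot \frac{1}{n} = 1 .
\end{align*}
```

The strict inequality uses d > log_{3/2} n together with 2/3 < 1, and the last equality uses (3/2)^{log_{3/2} n} = n.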

Summary

  • So far, we know
    • Insert and delete maintain the invariants:
      • the depth of any node is at most log_{3/2} q
      • q ≤ 2n
    • So the depth of any node is at most log_{3/2} 2n ≤ 2 + log_{3/2} n
    • So, we can search in a scapegoat tree in O(log n) time
    • Some issues still to resolve
      • How do we keep track of size(w) for each node w?
      • How much time is spent rebuilding nodes during deletion and insertion?
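
The depth bound in the summary follows from a one-line calculation, shown here as a worked step (the numeric value of log_{3/2} 2 is rounded). Since q is at most 2n:

```latex
\[
\log_{3/2} q \;\le\; \log_{3/2}(2n)
  \;=\; \log_{3/2} 2 + \log_{3/2} n
  \;\approx\; 1.71 + \log_{3/2} n
  \;\le\; 2 + \log_{3/2} n .
\]
```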

(Not) keeping track of the size

(Figure: the example tree with keys 0 to 9.)

  • We only need size(w) while looking for a scapegoat
    • Knowing size(w), we can compute size(w.parent) by traversing the subtree rooted at sibling(w)
      • But we do O(size(v)) work when we rebuild the scapegoat v anyway, so this doesn't add anything to the asymptotic cost of rebuilding
  • So, in O(size(v)) time, we know all sizes up to the scapegoat node v
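
The idea of computing sizes on the way up, rather than storing them, might look like this in Python. It is a sketch under assumptions: nodes carry a `parent` pointer, and the names `find_scapegoat` and `add_child` are hypothetical helpers introduced for the example.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key, self.parent = key, parent
        self.left = self.right = None

def size(node):
    """Count nodes by traversal; takes O(size(node)) time."""
    return 0 if node is None else 1 + size(node.left) + size(node.right)

def add_child(parent, key):
    """Attach a new leaf under parent on the side chosen by key order."""
    child = Node(key, parent)
    if key < parent.key:
        parent.left = child
    else:
        parent.right = child
    return child

def find_scapegoat(u):
    """Walk up from the new leaf u; return (scapegoat, its size) or (None, n)."""
    w, w_size = u, 1                      # a freshly inserted node is a leaf
    while w.parent is not None:
        p = w.parent
        sibling = p.right if p.left is w else p.left
        # size(w) is already known, so only the sibling subtree is traversed.
        p_size = w_size + 1 + size(sibling)
        if 3 * w_size > 2 * p_size:       # size(w) > (2/3) size(w.parent)
            return p, p_size
        w, w_size = p, p_size
    return None, w_size

# Usage: a right-leaning path 0 -> 1 -> 2 -> 3 -> 4.
root = Node(0)
node = root
for key in range(1, 5):
    node = add_child(node, key)
scapegoat, s = find_scapegoat(node)
print(scapegoat.key, s)   # 1 4
```

In the usage example the walk sees the size pairs (1, 2), (2, 3) and (3, 4); only the last satisfies 3 > (2/3)·4, so the node with key 1 is reported as the scapegoat.
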

Analysis of deletion

  • When deleting, if n < q/2, then we rebuild the whole tree
  • This takes O(n) time
  • If n < q/2 then we have done at least q - n > n/2 deletions since the last full rebuild
  • So the amortized (average) cost of rebuilding (due to deletions) is O(1) per deletion
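
The amortized argument can be checked numerically by simulating only the two counters. In this Python sketch the function name `rebuild_work_for_deletions` is hypothetical, and a rebuild of a tree with n items is charged exactly n units of work as a stand-in for O(n) time.

```python
def rebuild_work_for_deletions(start):
    """Delete every item from a tree of `start` items, tracking only n and q.

    Returns (total rebuild work, number of deletions).
    """
    n = q = start
    work = deletions = 0
    while n > 0:
        n -= 1
        deletions += 1
        if n < q / 2:
            work += n        # cost model: rebuilding n items costs n units
            q = n
    return work, deletions

work, deletions = rebuild_work_for_deletions(1000)
print(work <= deletions)     # True: O(1) rebuild work per deletion on average
```

Each rebuild at size n is preceded by more than n deletions since q was last reset, which is why the total rebuild work never exceeds the number of deletions in this model.
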

Review: Maintaining Sorted Sets

  • We have seen the following data structures for implementing a SortedSet
    • Skiplists: find(x)/add(x)/remove(x) in O(log n) expected time per operation
    • Treaps: find(x)/add(x)/remove(x) in O(log n) expected time per operation
    • Scapegoat trees: find(x) in O(log n) worst-case time per operation, add(x)/remove(x) in O(log n) amortized time per operation
  • No data structures course would be complete without covering
    • 2-4 trees: find(x)/add(x)/remove(x) in O(log n) worst-case time per operation
    • Red-black trees: find(x)/add(x)/remove(x) in O(log n) worst-case time per operation
