Amortized Analysis, Lecture Notes - Computer Science, Study notes of Data Structures and Algorithms

Amortized Analysis, Lecture Notes - Computer Science - Prof Jonathan Shewchuk.pdf, Computer Science, University of California, USA, Prof. Jonathan Shewchuk, Computer Science, Amortized Analysis, Averaging Method, Accounting Method

Typology: Study notes

2010/2011

Uploaded on 10/29/2011

jokerxxx
jokerxxx 🇺🇸

4.3

(36)

330 documents

1 / 2

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
11/24/10
18:33:13 1
38
CS61B: Lecture 38
Wednesday, November 24, 2010
AMORTIZED ANALYSIS
==================
We’ve seen several data structures for which I claimed that the average time
for certain operations is always better than the worst-case time: hash tables,
tree-based disjoint sets, and splay trees.
The mathematics that proves these claims is called _amortized_analysis_.
Amortized analysis is a way of proving that even if an operation is
occasionally expensive, its cost is made up for by earlier, cheaper operations.
The Averaging Method
--------------------
Most hash table operations takes O(1) time, but sometimes an operation forces
a hash table to resize itself, at great expense. What is the average time to
insert an item into a hash table with resizing? Assume that the chains never
grow longer than O(1), so any insert operation that doesn’t resize the table
takes O(1) time--more precisely, suppose it takes at most one second.
Let n be the number of items in the hash table, and N the number of buckets.
Suppose it takes one second for the insert operation to insert the new item,
increment n, and then check if n = N. If so, it doubles the size of the table
from N to 2N, taking 2N additional seconds. This resizing scheme ensures that
the load factor n/N is always less than one.
Suppose every newly constructed hash table is empty and has just one bucket--
that is, initially n = 0 and N = 1. After i insert operations, n = i. The
number of buckets N must be a power of two, and we never allow it to be less
than or equal to n; so N is the smallest power of two > n, which is <= 2n.
The total time in seconds for _all_ the table resizing operations is
2 + 4 + 8 + ... + N/4 + N/2 + N = 2N - 2.
So the cost of i insert operations is at most i + 2N - 2 seconds. Because
N <= 2n = 2i, the i insert operations take <= 5i - 2 seconds. Therefore, the
_average_ running time of an insertion operation is (5i - 2)/i = 5 - 2/i
seconds, which is in O(1) time.
We say that the _amortized_running_time_ of insertion is in O(1), even though
the worst-case running time is in Theta(n).
For almost any application, the amortized running time is more important than
the worst-case running time, because the amortized running time determines the
total running time of the application. The main exceptions are some
applications that require fast interaction (like video games), for which one
really slow operation might cause a noticeable glitch in response time.
The Accounting Method
---------------------
Consider hash tables that resize in both directions: not only do they expand
as the number of items increases, but they also shrink as the number of items
decreases. You can’t analyze them with the averaging method, because you don’t
know what sequence of insert and remove operations an application might
perform.
Let’s try a more sophisticated method. In the _accounting_method_, we "charge"
each operation a certain amount of time. Usually we overcharge. When we
charge more time than the operation actually takes, we can save the excess time
in a bank to spend on later operations.
Before we start, let’s stop using seconds as our unit of running time. We
don’t actually know how many seconds any computation takes, because it varies
from computer to computer. However, everything a computer does can be broken
down into a sequence of constant-time computations. Let a _dollar_ be a unit
of time that’s long enough to execute the slowest constant-time computation
that comes up in the algorithm we’re analyzing. A dollar is a real unit of
time, but it’s different for different computers.
Each hash table operation has
- an _amortized_cost_, which is the number of dollars that we "charge" to do
that operation, and
- an _actual_cost_, which is the actual number of constant-time computations
the operation performs.
The amortized cost is usually a fixed function of n (e.g. $5 for insertion into
a hash table, or $2 log n for insertion into a splay tree), but the actual cost
may vary wildly from operation to operation. For example, insertion into a
hash table takes a long, long time when the table is resized.
When an operation’s amortized cost exceeds its actual cost, the extra dollars
are saved in the bank to be spent on later operations. When an operation’s
actual cost exceeds its amortized cost, dollars are withdrawn from the bank to
pay for an unusually expensive operation.
If the bank balance goes into surplus, it means that the actual total running
time is even faster than the total amortized costs imply.
THE BANK BALANCE MUST NEVER FALL BELOW ZERO. If it does, you are spending more
total dollars than your budget claims, and you have failed to prove anything
about the amortized running time of the algorithm.
Think of amortized costs as an allowance. If your dad gives you $500 a month
allowance, and you only spend $100 of it each month, you can save up the
difference and eventually buy a car. The car may cost $30,000, but if you
saved that money and don’t go into debt, your _average_ spending obviously
wasn’t more than $500 a month.
pf2

Partial preview of the text

Download Amortized Analysis, Lecture Notes - Computer Science and more Study notes Data Structures and Algorithms in PDF only on Docsity!

11/24/1018:33:^

CS61B:^ Lecture 38Wednesday, November 24, 2010 AMORTIZED^ ANALYSIS==================We’ve^ seen^ several

data^ structures for which I claimed that the average time for^ certain^ operations

is always better than the worst-case time:

hash tables,

tree-based^ disjoint

sets, and splay trees. The^ mathematics

that^ proves these claims is called amortized_analysis. Amortized^ analysis

is^ a^ way of proving that even if an operation is occasionally^

expensive,^ its cost is made up for by earlier, cheaper operations. The^ Averaging

Method --------------------Most^ hash^ table

operations takes O(1) time, but sometimes an operation forces a^ hash^ table^

to^ resize^ itself, at great expense.

What is the average time to

insert^ an^ item

into^ a^ hash table with resizing?

Assume that the chains never

grow^ longer^ than

O(1),^ so any insert operation that doesn’t resize the table takes^ O(1)^ time--more

precisely, suppose it takes at most one second. Let^ n^ be^ the^

number^ of^ items in the hash table, and N the number of buckets. Suppose^ it^ takes

one^ second for the insert operation to insert the new item, increment^ n,^

and^ then^ check if n = N.

If so, it doubles the size of the table

from^ N^ to^ 2N,

taking^ 2N^ additional seconds.

This resizing scheme ensures that

the^ load^ factor

n/N^ is^ always less than one. Suppose^ every

newly^ constructed hash table is empty and has just one bucket-- that^ is,^ initially

n^ =^ 0 and N = 1.

After i insert operations, n = i.

The

number^ of^ buckets

N^ must be a power of two, and we never allow it to be less than^ or^ equal

to^ n;^ so^ N is the smallest power of two > n, which is <= 2n. The^ total^ time

in^ seconds for all the table resizing operations is 2 + 4 + 8 +^ ...^ +^ N/4 + N/2 + N = 2N - 2. So^ the^ cost^ of

i^ insert^ operations is at most i + 2N - 2 seconds.

Because

N^ <=^ 2n^ =^ 2i,

the^ i^ insert operations take <= 5i - 2 seconds.

Therefore, the

average^ running

time^ of an insertion operation is (5i - 2)/i = 5 - 2/i seconds,^ which

is^ in^ O(1) time. We^ say^ that^ the

amortized_running_time of insertion is in O(1), even though the^ worst-case

running^ time is in Theta(n). For^ almost^ any

application, the amortized running time is more important than the^ worst-case

running^ time, because the amortized running time determines the total^ running

time^ of^ the application.

The main exceptions are some

applications^

that^ require fast interaction (like video games), for which one really^ slow^ operation

might cause a noticeable glitch in response time.

The Accounting Method---------------------Consider hash tables that resize

in^ both^ directions:

not^ only^ do

they^ expand

as the number of items increases,

but^ they^ also

shrink^ as^ the

number^ of^ items

decreases.^ You can’t analyze

them^ with^ the

averaging^ method,

because^ you

don’t

know what sequence of insert

and^ remove^ operations

an^ application

might

perform.Let’s try a more sophisticated

method.^ In^

the^ accounting_method,

we^ "charge"

each operation a certain amount

of^ time.^ Usually

we^ overcharge.

When^ we

charge more time than the operation

actually^ takes,

we^ can^ save

the^ excess^ time

in a bank to spend on later

operations. Before we start, let’s stop

using^ seconds

as^ our^ unit

of^ running^ time.

We

don’t actually know how many

seconds^ any^

computation^

takes,^ because

it^ varies

from computer to computer.

However,^ everything

a^ computer^

does^ can^ be^ broken

down into a sequence of constant-time

computations.

Let^ a^ dollar

be^ a^ unit

of time that’s long enough

to^ execute^

the^ slowest^ constant-time

computation

that comes up in the algorithm

we’re^ analyzing.

A^ dollar^ is

a^ real^ unit^

of

time, but it’s different for

different^ computers. Each hash table operation has- an amortized_cost, which

is^ the^ number

of^ dollars^

that^ we^ "charge"

to^ do

that operation, and- an actual_cost, which is

the^ actual^ number

of^ constant-time

computations

the operation performs.The amortized cost is usually

a^ fixed^ function

of^ n^ (e.g.^

$5^ for^ insertion

into

a hash table, or $2 log n for

insertion^ into

a^ splay^ tree),

but^ the^ actual

cost

may vary wildly from operation

to^ operation.

For^ example,

insertion^ into

a

hash table takes a long, long

time^ when^ the

table^ is^ resized.

When an operation’s amortized

cost^ exceeds

its^ actual^

cost,^ the^ extra

dollars

are saved in the bank to be

spent^ on^ later

operations.

When^ an^ operation’s

actual cost exceeds its amortized

cost,^ dollars

are^ withdrawn

from^ the^ bank

to

pay for an unusually expensive

operation. If the bank balance goes into

surplus,^ it^

means^ that^ the

actual^ total

running

time is even faster than the

total^ amortized

costs^ imply.

THE BANK BALANCE MUST NEVER

FALL^ BELOW^

ZERO.^ If^ it^

does,^ you^ are

spending^ more

total dollars than your budget

claims,^ and^

you^ have^ failed

to^ prove^ anything

about the amortized running

time^ of^ the

algorithm.

Think of amortized costs as

an^ allowance.

If^ your^ dad

gives^ you^ $

a^ month

allowance, and you only spend

$100^ of^ it^ each

month,^ you^ can

save^ up^ the

difference and eventually buy

a^ car.^ The^

car^ may^ cost

$30,000,^ but

if^ you

saved that money and don’t

go^ into^ debt,

your^ average

spending^ obviously

wasn’t more than $500 a month.

11/24/1018:33:^

Amortized^ Analysis

of^ Hash Tables ---------------------------------Suppose^ every

operation^ (insert, find, remove) takes one dollar of actual running^ time^

unless^ the^ hash table is resized.

We resize the table in two

circumstances.-^ An^ insert^ operation

doubles the table size if n = N AFTER the new item is inserted^ and

n^ is^ incremented, taking 2N additional dollars of time for resizing^ to^

2N^ buckets.^

Thus, the load factor is always less than one.

-^ The^ remove^

operation^ halves the table size if n = N/4 AFTER the item isdeleted and n^ is^ decremented, taking N additional dollars of time forresizing to N/2^ buckets.

Thus, the load factor is always greater than 0. (except^ when

n^ =^ 0,^ i.e. the table is empty). Either^ way,^ a

hash^ table that has just been resized has n = N/2. A^ newly^ constructed

hash table has n = 0 items and N = 1 buckets. By^ trial^ and^

error,^ I^ came up with the following amortized costs.insert: 5 dollarsremove: 5 dollarsfind: 1 dollar Is^ this^ accounting

valid, or will we go broke? The^ crucial^ insight

is^ that at any time, we can look at a hash table and know a lower^ bound^ for

how^ many dollars are in the bank from the values of n and N. We^ know^ that^

the^ last^ time the hash table was resized, the number of items n was^ exactly^ N/2.

So^ if^ n != N/2, there have been subsequent insert/remove operations,^ and

these^ have put money in the bank. We^ charge^ an^

amortized^ $5 for an insert or remove operation.

Every insert or

remove^ operation

that^ doesn’t resize the table costs one actual dollar and puts the^ remaining

$4^ in^ the^ bank.

For every step n takes away from N/2, we accumulate^ another

$4.^ So there must be at least 4|n - N/2| dollars saved (or^ 4n^ dollars

for^ a^ never-resized one-bucket hash table). IMPORTANT:^ Note

that^ 4|n - N/2| is a function of the data structure, and does NOT^ depend^ on

the^ history of hash table operations performed.

In general, the

accounting^ method

only^ works if you can tell how much money is in the bank (or, more^ commonly,

a^ minimum bound on that bank balance) just by looking at the current^ state

of^ the^ data structure--without knowing how the data structure reached^ that^

state. An^ insert^ operation

only resizes the table if the number of items n reaches N. According^ to^

the^ formula above, there are at least 2N dollars in the bank. Resizing^ the^

hash^ table^ from N to 2N buckets costs 2N dollars, so we can afford it.A^ remove^ operation

only^ resizes the table if the number of items n drops to N/4.^ According

to^ the^ formula above, there are at least N dollars in the bank. Resizing^ the^

hash^ table^ from N to N/2 buckets costs N dollars, so we can afford it.The^ bank^ balance

never^ drops below zero, so my amortized costs above are valid. Therefore,^ the

amortized cost of all three operations is in O(1). Observe^ that^

if^ we^ alternate between inserting and deleting the same item over and^ over,^ the

hash^ table is never resized, so we save up a lot of money in the bank.^ This^ isn’t

a^ problem; it just means the algorithm is faster (spends fewer^ dollars)

than^ my^ amortized costs indicate.

Why Does Amortized Analysis

Work? ---------------------------------Why does this metaphor about

putting^ money

in^ the^ bank

tell^ us^ anything

about

the actual running time of

an^ algorithm? Suppose our accountant keeps

a^ ledger^ with

two^ columns:

the^ total^ amortized

cost of all operations so far,

and^ the^ total

actual^ cost^

of^ all^ operations

so

far.^ Our bank balance is the

sum^ of^ all^ the

amortized^ costs

in^ the^ left

column, minus the sum of all

the^ actual^ costs

in^ the^ right

column.^ If^

the^ bank

balance never drops below zero,

the^ total^ actual

cost^ is^ less

than^ or^ equal

to

the total amortized cost.Total amortized cost

|^ Total^ actual

cost

------------------------------------------$^

|^ $

$^

|^ $

$^

|^ $3. |^

.^ |

.^ |

$^

|^ $

$^

|^ $2,

$^

|^ $

------------------------------------------$12,^

>=^ $10,

Therefore, the total running

time^ of^ all^

the^ actual^ operations

never^ takes

longer than the total amortized

cost^ of^ all

the^ operations.

Amortized analysis (as presented

here)^ only^

tells^ us^ an^ upper

bound^ (big-Oh)

on

the actual running time, and

not^ a^ lower^

bound^ (big-Omega).

It^ might^ happen

that we accumulate a big bank

balance^ and^

never^ spend^

it,^ and^ the^ total

actual

running time might be much

less^ than^ the

amortized^ cost.

For^ example,

splay

tree operations take amortized

O(log^ n)^ time,

where^ n^ is^ the

number^ of^ items

in

the tree, but if your only

operation^ is

to^ find^ the^

same^ item^ n^

times^ in^ a^ row,

the actual average running

time^ is^ in^

O(1).

If you want to see the amortized

analysis^ of

splay^ trees,

Goodrich^ and

Tamassia

have it.^ If you take CS 170,

you’ll^ see^ an

amortized^ analysis

of^ disjoint

sets.^ I am saddened to report

that^ both^ analyses

are^ too^ complicated

to

provide much intuition about

their^ running

times.^ (Especially

the^ inverse

Ackermann function, which is

ridiculously

nonintuitive,

though^ cool

nonetheless.)