Greedy Algorithms: Storing Files on Tape and Huffman Coding

Algorithms Non-Lecture A: Greedy Algorithms

The point is, ladies and gentlemen, greed is good. Greed works, greed is right.

Greed clarifies, cuts through, and captures the essence of the evolutionary

spirit. Greed in all its forms, greed for life, money, love, knowledge has marked

the upward surge in mankind. And greed—mark my words—will save not only

Teldar Paper but the other malfunctioning corporation called the USA.

— Michael Douglas as Gordon Gekko, Wall Street (1987)

There is always an easy solution to every human problem—

neat, plausible, and wrong.

— H. L. Mencken, New York Evening Mail (November 16, 1917)

A Greedy Algorithms

A.1 Storing Files on Tape

Suppose we have a set of n files that we want to store on a tape. In the future, users will want to read those files from the tape. Reading a file from tape isn’t like reading from disk; first we have to fast-forward past all the files stored before it, and that takes a significant amount of time. Let L[1 .. n] be an array listing the length of each file; specifically, file i has length L[i]. If the files are stored in order from 1 to n, then the cost of accessing the kth file is

\[ \mathrm{cost}(k) = \sum_{i=1}^{k} L[i]. \]

The cost reflects the fact that before we read file k we must first scan past all the earlier files on the tape. If we assume for the moment that each file is equally likely to be accessed, then the expected cost of searching for a random file is

\[ \mathrm{E}[\mathrm{cost}] = \sum_{k=1}^{n} \frac{\mathrm{cost}(k)}{n} = \sum_{k=1}^{n} \sum_{i=1}^{k} \frac{L[i]}{n}. \]
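The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample lengths are assumptions, not part of the notes, and the list is taken to be non-empty and already in tape order.

```python
from fractions import Fraction
from itertools import accumulate


def access_costs(lengths):
    """Return cost(k) for k = 1..n: the total length of files 1..k on the tape."""
    return list(accumulate(lengths))


def expected_cost(lengths):
    """Average of cost(k) over all n files, assuming each file is equally likely.

    Assumes at least one file; an empty list would divide by zero.
    """
    costs = access_costs(lengths)
    return Fraction(sum(costs), len(costs))


lengths = [3, 1, 2]            # hypothetical file lengths, in tape order
print(access_costs(lengths))   # [3, 4, 6]
print(expected_cost(lengths))  # 13/3
```

Using exact fractions rather than floats keeps the averages comparable without rounding error.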

If we change the order of the files on the tape, we change the cost of accessing the files; some files become more expensive to read, but others become cheaper. Different file orders are likely to result in different expected costs. Specifically, let π(i) denote the index of the file stored at position i on the tape. Then the expected cost of the permutation π is

\[ \mathrm{E}[\mathrm{cost}(\pi)] = \sum_{k=1}^{n} \sum_{i=1}^{k} \frac{L[\pi(i)]}{n}. \]
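As a small worked example, assume three hypothetical files with lengths L[1] = 3, L[2] = 1, and L[3] = 2 (values chosen only for illustration). Storing them in index order, and then in the order π = (2, 3, 1), gives:

```latex
\[
\begin{aligned}
\pi = (1,2,3):\quad \mathrm{E}[\mathrm{cost}(\pi)] &= \frac{3 + (3+1) + (3+1+2)}{3} = \frac{13}{3},\\
\pi = (2,3,1):\quad \mathrm{E}[\mathrm{cost}(\pi)] &= \frac{1 + (1+2) + (1+2+3)}{3} = \frac{10}{3}.
\end{aligned}
\]
```

The same three files cost less to access on average in the second order.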

Which order should we use if we want the expected cost to be as small as possible? The answer

is intuitively clear; we should store the files in order from shortest to longest. So let’s prove this.

Lemma 1. E[cost(π)] is minimized when L[π(i)] ≤ L[π(i + 1)] for all i.

Proof: Suppose L[π(i)] > L[π(i + 1)] for some i. To simplify notation, let a = π(i) and b = π(i + 1). If we swap files a and b, then the cost of accessing a increases by L[b], and the cost of accessing b decreases by L[a]; the cost of accessing every other file is unchanged. Overall, the swap changes the expected cost by (L[b] − L[a])/n. But this change is an improvement, because L[b] < L[a]. Thus, if the files are out of order, we can improve the expected cost by swapping some mis-ordered adjacent pair.
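The lemma and the exchange step can be checked numerically on a small instance. The sketch below is a sanity check under stated assumptions, not a proof: the file lengths are hypothetical, indices are 0-based (unlike the 1-based notation above), and the brute-force search over all permutations is only feasible for very small n.

```python
from fractions import Fraction
from itertools import accumulate, permutations


def expected_cost(order, lengths):
    """Expected access cost when the files are stored in the given order.

    `order` lists file indices (0-based here) by tape position.
    """
    prefix = accumulate(lengths[i] for i in order)
    return Fraction(sum(prefix), len(order))


lengths = [5, 2, 7, 3]  # hypothetical file lengths
n = len(lengths)

# Greedy order: shortest file first.
greedy = sorted(range(n), key=lambda i: lengths[i])

# Brute force: no permutation beats the sorted order.
best = min(expected_cost(p, lengths) for p in permutations(range(n)))
assert expected_cost(greedy, lengths) == best

# Exchange step: swapping an out-of-order adjacent pair (a = 0, b = 1)
# changes the expected cost by exactly (L[b] - L[a]) / n, which is negative.
order = [0, 1, 2, 3]    # file 0 (length 5) precedes file 1 (length 2)
swapped = [1, 0, 2, 3]
delta = expected_cost(swapped, lengths) - expected_cost(order, lengths)
assert delta == Fraction(lengths[1] - lengths[0], n)

print(greedy, best, delta)  # [1, 3, 0, 2] 17/2 -3/4
```

Sorting takes O(n log n) time, whereas the brute-force comparison examines all n! orders and is included only to confirm the lemma on this example.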


A Greedy Algorithms

A.1 Storing Files on Tape

A.2 Scheduling Classes

A.3 General Structure

A.4 Huffman codes