Scheduling and Huffman Encoding Techniques - Prof. J. W. Demmel, Exams of Computer Science

Information on two important techniques used in computer science: scheduling and Huffman encoding. The scheduling section explains how to find the critical path in a task graph and how to determine the maximum number of tasks that can be executed in parallel. The Huffman encoding section discusses how to compress files using this method and how to determine the optimal encoding for each symbol. Both sections include examples and algorithms.

Typology: Exams

2010/2011

Uploaded on 06/19/2011

CS 170 Second Midterm 5 April 2007
NAME (1 pt):
TA (1 pt):
Name of Neighbor to your left (1 pt):
Name of Neighbor to your right (1 pt):
Instructions: This is a closed book, closed calculator, closed computer, closed network, open
brain exam, but you are permitted a 1 page, double-sided set of notes, large enough to read
without a magnifying glass.
You get one point each for filling in the 4 lines at the top of this page. Each other question
is worth the amount shown.
Write all your answers on this exam. If you need scratch paper, ask for it, write your name
on each sheet, and attach it when you turn it in (we have a stapler).


(1) Scheduling (30 points).

  • You’re an engineer planning the construction of a large bridge. There are constraints between the tasks involved. For instance, a portion of road cannot be attached to the suspensions until these are in place, or until the road itself is built. Let each task be represented as a node in a graph, where a directed edge joins A to B if task A must be completed before B begins. Give a condition on this graph for you to be able to order the tasks so as to satisfy all the constraints.
    Answer. The graph must be a DAG.
    Give a method to check this condition.
    Answer. Run Depth First Search. The graph is a DAG if and only if no back-edge is found.
    Give a method to find an ordering of the tasks that satisfies the constraints if one exists.
    Answer. Run DFS and order the nodes by decreasing post number.
  • In fact in a real schedule, tasks can happen simultaneously, unless a constraint forces us to finish one before beginning the other. To represent task duration, let the length of edge A→B be the duration of A. (In other words, every outgoing edge from A must have the same length.) Let S (“Start”) and F (“Finish”) be special tasks of length 0, that must happen respectively before and after all other tasks. For instance they can represent the contract signature and the inauguration. An important concept in scheduling is the critical path, that is a sequence of tasks $S, A_1, \dots, A_k, F$ such that $A_i \to A_{i+1}$ and the length of the path is equal to that of the shortest possible schedule. We call this length the construction time.
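The DAG check and the ordering from the first part can be sketched together in one DFS. This is a minimal illustration, not the exam's official solution: the function name `topological_order` and the dictionary representation of the graph are assumptions made for the example.

```python
def topological_order(graph):
    """Return a valid task order, or None if the graph contains a cycle.

    graph: dict mapping each task to the list of tasks that must come after it
    (hypothetical representation chosen for this sketch).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    # Collect every node, including those that only appear as successors.
    nodes = list(dict.fromkeys(
        [*graph, *(v for succs in graph.values() for v in succs)]
    ))
    color = {u: WHITE for u in nodes}
    postorder = []  # nodes in increasing post number

    def visit(u):
        color[u] = GRAY
        for v in graph.get(u, []):
            if color[v] == GRAY:
                return False  # back edge: v is an ancestor of u, so there is a cycle
            if color[v] == WHITE and not visit(v):
                return False
        color[u] = BLACK
        postorder.append(u)
        return True

    for u in nodes:
        if color[u] == WHITE and not visit(u):
            return None
    return postorder[::-1]  # decreasing post number
```

For example, `topological_order({"A": ["B", "C"], "B": ["D"], "C": ["D"]})` returns an order in which A comes first and D last, while a graph with edges A→B and B→A yields `None`. The recursion is fine for small graphs; a very deep graph would need an explicit stack.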

[Figure 1: task graph on the nodes S, A, B, C, D, E, F.]

Figure 1: A list of tasks (nodes), constraints (edges) and task durations (edge lengths).

In Fig. 1, find the critical path and the construction time.
Answer. The critical path is the longest path from S to F. Here it is S → A → C → E → F, of length 9.
Give a method to find the critical path automatically on a given graph (hint: this method uses the property of this graph that you found in the first part of this problem).


Answer. The best method is to adapt the “dags-shortest-path” method to find the longest path instead. Find a linearization order as in the third question. Iterate through each node u in linearized order, calling update() on each edge out of u, except that update is now

    update(u,v): dist(v) = max(dist(v), dist(u) + l(u,v))

The array dist() is initialized to −∞ with dist(S) = 0, or simply to 0 everywhere, since S precedes every task and no duration is negative. We can call this method “dags-longest-path”. Recording, for each node v, the predecessor u that last increased dist(v) lets us read off the critical path by walking back from F to S.
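A possible sketch of “dags-longest-path” follows. Two things are assumptions of this example rather than part of the exam answer: the linearization is obtained from the standard-library `graphlib.TopologicalSorter` (Python 3.9+) instead of a hand-written DFS, and the names `dags_longest_path`, `critical_path`, and the edge-dictionary input format are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def dags_longest_path(edges, source):
    """Longest-path lengths from source in a DAG.

    edges: dict mapping (u, v) -> l(u, v), the duration of task u.
    Returns (dist, prev); prev[v] is the predecessor of v on a longest path.
    """
    preds, succs = {}, {}
    for (u, v) in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
        succs.setdefault(u, []).append(v)

    # TopologicalSorter expects node -> predecessors and yields predecessors first.
    order = list(TopologicalSorter(preds).static_order())

    dist = {u: float("-inf") for u in order}
    prev = {u: None for u in order}
    dist[source] = 0
    for u in order:
        if dist[u] == float("-inf"):
            continue  # u is not reachable from source
        for v in succs.get(u, []):
            # update(u, v): dist(v) = max(dist(v), dist(u) + l(u, v))
            if dist[u] + edges[(u, v)] > dist[v]:
                dist[v] = dist[u] + edges[(u, v)]
                prev[v] = u
    return dist, prev


def critical_path(prev, target):
    """Walk the predecessor links back from target to the source."""
    path = []
    while target is not None:
        path.append(target)
        target = prev[target]
    return path[::-1]
```

On a small made-up graph (not the one in the figure) with edges S→A and S→B of length 0, A→C of length 3, B→C of length 2 and C→F of length 4, the construction time `dist["F"]` is 7 and the critical path is S, A, C, F.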

  • What we really want is not just the construction time, but an entire schedule, specifying for each task when to start it. How can you use the intermediate results of your algorithm to output a start time for each task (each node)? Show that this schedule is indeed valid, i.e. it does not violate any constraint.
    Answer. The array element dist(u) is the length of the longest path from S to u. We can use it as the start time of u. To prove that this gives a valid schedule, consider a constraint u → v. The start time of u is dist(u), its end time is dist(u) + l(u,v) because l(u,v) is the duration of u, and the update equation guarantees dist(v) ≥ dist(u) + l(u,v). So v starts no earlier than u ends.
  • If each task requires a team of workers, show how to compute the number of teams we need to hire, i.e. the maximum number of tasks that will be executed in parallel. For example, if all tasks take unit time and A → B, A → C, B → D, C → D, then the answer is 2 teams, because B and C can be done in parallel.
    Answer. We now have a start time and an end time for each task, where the start time is the length of the longest path to the vertex and end time = start time + task duration (the common length of its outgoing edges). For each task, form the two records (start time, +1) and (end time, −1), and sort all the records by their first entry, yielding the list $(t_1, s_1), (t_2, s_2), \dots, (t_n, s_n)$ where $t_1 \le t_2 \le \dots \le t_n$ and each $s_i$ is +1 or −1. If there are ties (multiple equal $t_i$), put all the records with $s_i = -1$ before those with $s_i = +1$, so that a team finishing at time t is free for a task starting at time t. Now do

    parallel_tasks = 0; max_parallel_tasks = 0
    for i = 1 to n
        parallel_tasks = parallel_tasks + s_i
            // increases by 1 when a new task starts
            // decreases by 1 when one ends
        max_parallel_tasks = max(max_parallel_tasks, parallel_tasks)
    end
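The sweep above translates almost line for line into Python. This is only a sketch: the function name `max_parallel_tasks` and the list-of-pairs input are assumptions for the example, and the start and end times are taken as given (for instance from the longest-path computation).

```python
def max_parallel_tasks(intervals):
    """Maximum number of tasks running at the same time.

    intervals: list of (start_time, end_time) pairs, one per task.
    """
    records = []
    for start, end in intervals:
        records.append((start, +1))  # a task starts
        records.append((end, -1))    # a task ends
    # Tuples sort by time first; on ties, -1 sorts before +1,
    # so tasks that end at time t are released before new ones start at t.
    records.sort()

    parallel = best = 0
    for _, s in records:
        parallel += s
        best = max(best, parallel)
    return best
```

For the unit-time example A → B, A → C, B → D, C → D, the schedule is A on [0, 1], B and C on [1, 2], and D on [2, 3], so `max_parallel_tasks([(0, 1), (1, 2), (1, 2), (2, 3)])` returns 2, matching the two teams stated in the answer.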

(2) (20 points) Let G be a file containing symbols $b_1, \dots, b_m$, where $b_i$ appears $c_i$ times. Suppose $c_1 = 1$, $c_2 = 2$ and $c_i = c_{i-1} + c_{i-2}$ for $i > 2$ (a Fibonacci sequence). We want to use Huffman encoding to compress G. Determine an optimal encoding of each symbol $b_i$.

Answer. The algorithm for Huffman encoding creates a priority queue of nodes (representing sets of symbols) ordered by increasing frequency of appearance (the sum of the frequencies of all symbols in the set). Initially the priority queue contains $(b_1, c_1), \dots, (b_m, c_m)$. We claim that at step $i$ of the algorithm the two lowest-frequency sets removed from the priority queue are $(b_{i+1}, c_{i+1})$ and $(\{b_1, \dots, b_i\}, c_1 + \dots + c_i)$, where $c_1 + \dots + c_i = c_{i+2} - 2$ (which is at least $c_{i+1}$ once $i \ge 2$).

We prove this by induction. The base case is $i = 1$, where the two smallest entries are $(b_1, 1)$ and $(b_2, 2)$, so the claim holds. Now suppose it is true for $i$; we must show it is true for $i + 1$. At step $i$ we remove these two sets and merge them to form $\{b_1, \dots, b_{i+1}\}$ with frequency $c_{i+1} + c_{i+2} - 2 = c_{i+3} - 2$, where we have used the definition of the Fibonacci sequence. The two other lowest-frequency entries in the queue are $(b_{i+2}, c_{i+2})$ and $(b_{i+3}, c_{i+3})$. Since $c_{i+2} < c_{i+3}$ and $c_{i+3} - 2 < c_{i+3}$, the two lowest-frequency items on the queue at step $i + 1$ are $(b_{i+2}, c_{i+2})$ and $(\{b_1, \dots, b_{i+1}\}, c_{i+3} - 2)$, as claimed.

The tree created by the Huffman encoding algorithm is therefore a chain, with $b_1$ and $b_2$ being leaves at the bottom (level $m$), and $b_k$ being the only leaf at level $m - k + 2$ for $k > 2$ (the root being at level 1). So one possible optimal encoding is for $b_1$ to get the codeword $0 \cdots 0$ ($m - 1$ zeros), and for $b_k$ to get the codeword $0 \cdots 01$ ($m - k$ zeros followed by a 1) for $1 < k \le m$.
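The chain shape can be checked numerically with a small Huffman simulation. The sketch below is not part of the exam answer; it tracks only codeword lengths, and the helper names `fibonacci_counts` and `huffman_code_lengths` are made up for the example. It uses the standard-library `heapq` module as the priority queue.

```python
import heapq
from itertools import count


def fibonacci_counts(m):
    """Return [c_1, ..., c_m] with c_1 = 1, c_2 = 2, c_i = c_{i-1} + c_{i-2}."""
    c = [1, 2]
    while len(c) < m:
        c.append(c[-1] + c[-2])
    return c[:m]


def huffman_code_lengths(freqs):
    """Return the codeword length of each symbol in a Huffman code for freqs."""
    tiebreak = count()  # keeps heap entries comparable when frequencies are equal
    heap = [(f, next(tiebreak), [i]) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    lengths = [0] * len(freqs)
    while len(heap) > 1:
        f1, _, set1 = heapq.heappop(heap)  # lowest-frequency set
        f2, _, set2 = heapq.heappop(heap)  # second lowest
        for i in set1 + set2:
            lengths[i] += 1  # each merged symbol moves one level deeper
        heapq.heappush(heap, (f1 + f2, next(tiebreak), set1 + set2))
    return lengths
```

With `m = 6` the counts are 1, 2, 3, 5, 8, 13 and the computed lengths are `[5, 5, 4, 3, 2, 1]`: $b_1$ and $b_2$ get $m - 1$ bits and $b_k$ gets $m - k + 1$ bits, in agreement with the codewords given above.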

(3) (20 points) Let $p(n)$ be the number of ways you can write the positive integer $n$ as a sum of positive integers. For example, 3 can be written as 3, 2 + 1 and 1 + 1 + 1, so $p(3) = 3$. (Note that 2 + 1 = 1 + 2 is only counted once, i.e. the order of summands doesn’t matter.) Give a dynamic programming algorithm for computing $p(n)$. Hint: Start with a dynamic programming algorithm for the slightly different function $p(n, k)$ = the number of ways you can write $n$ as a sum of positive integers less than or equal to $k$. You should include an update formula for $p(n, k)$ (with justification), a program for filling in the values of $p(n, k)$, a bound on the running time (using $O()$), and how to compute $p(n)$ from the function $p(n, k)$.

Answer. The base case for $p(n, k)$ is $p(n, 1) = 1$ (since $n = 1 + 1 + \dots + 1$ can only be written one way). We will also need $p(0, 0) = 1$ for convenience. The update formula, for $2 \le k \le n$, is

$$p(n, k) = p(n, k - 1) + p(n - k, \min(k, n - k)),$$

since $n$ can either be written without using $k$ ($p(n, k - 1)$ ways) or using $k$ at least once ($p(n - k, \min(k, n - k))$ ways, obtained by removing one copy of $k$). The reason for having $\min(k, n - k)$ instead of just $k$ is that $n - k$ cannot be written using numbers any larger than $n - k$. We can now compute $p(n, k)$ using the following program (where $N$ is the largest value of $n$ that we are interested in).

    p(0, 0) = 1
    for n = 1 to N
        p(n, 1) = 1
    end for
    for n = 2 to N
        for k = 2 to n
            p(n, k) = p(n, k − 1) + p(n − k, min(k, n − k))
        end for
    end for

The cost of this algorithm is $O(N^2)$. Finally, $p(n) = p(n, n)$.
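As a sanity check, the table-filling program can be written in Python as follows. This is a sketch under the same recurrence; the names `partition_table` and `partitions` and the list-of-lists table are choices made for the example.

```python
def partition_table(N):
    """Return a table p where p[n][k] is the number of ways to write n
    as a sum of positive integers that are all <= k (for 1 <= k <= n <= N)."""
    p = [[0] * (N + 1) for _ in range(N + 1)]
    p[0][0] = 1  # convention: the empty sum
    for n in range(1, N + 1):
        p[n][1] = 1  # only 1 + 1 + ... + 1
    for n in range(2, N + 1):
        for k in range(2, n + 1):
            # either no part equals k, or remove one part equal to k
            p[n][k] = p[n][k - 1] + p[n - k][min(k, n - k)]
    return p


def partitions(n):
    """p(n) = p(n, n)."""
    return partition_table(n)[n][n]
```

The two nested loops do constant work per entry, which is where the $O(N^2)$ bound comes from. The sketch reproduces the example in the question, `partitions(3) == 3`, and also gives `partitions(4) == 5` and `partitions(5) == 7`.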