Knapsack Problems: Fractional and 0/1 Knapsack Algorithms, Cheat Sheet of Design and Analysis of Algorithms

Detailed Lecture Notes with Description

Typology: Cheat Sheet

2017/2018

Uploaded on 01/26/2022

vasundhara-jhobta
15CS204J - ALGORITHM DESIGN AND ANALYSIS, UNIT-III
Mrs. Selva Mary. G
UNIT III

GREEDY AND DYNAMIC PROGRAMMING

Introduction - Greedy: Huffman Coding - Knapsack Problem - Minimum Spanning Tree (Kruskal's Algorithm). Dynamic Programming: 0/1 Knapsack Problem - Travelling Salesman Problem - Multistage Graph - Forward path and backward path.

INTRODUCTION

Greedy Method

A greedy algorithm follows the problem-solving heuristic of making the locally optimal choice at each stage in the hope of finding a global optimum. The greedy strategy works well on optimization problems with the following characteristics:

  1. Greedy-choice property: A global optimum can be arrived at by selecting a local optimum.
  2. Optimal substructure: An optimal solution to the problem contains optimal solutions to its subproblems.

The second property may make greedy algorithms look like dynamic programming. However, the two techniques are quite different.

Applications

On many problems a greedy algorithm fails to find the globally optimal solution, because it does not operate exhaustively on all the data. It can commit to certain choices too early, which prevents it from finding the best overall solution later. On problems that have both properties listed above, however, the greedy choice does lead to an optimal solution. Examples of such greedy algorithms are

  • Kruskal's algorithm for finding minimum spanning trees
  • Prim's algorithm for finding minimum spanning trees
  • Huffman's coding algorithm for finding optimum Huffman trees
  • Greedy routing in networks. Using greedy routing, a message is forwarded to the neighboring node which is "closest" to the destination. The notion of a node's location (and hence "closeness") may be determined by its physical location, as in geographic routing used by ad hoc networks. Location may also be an entirely artificial construct, as in small world routing and distributed hash tables.

HUFFMAN ENCODING

Huffman codes are a very effective and widely used technique for compressing data. The Huffman encoding problem is to find the minimum-length bit string that can be used to encode a string of symbols. Using a table of the frequency of occurrence of each character, Huffman's greedy algorithm represents each character as a binary string in an optimal way. It uses a simple heap-based priority queue. In the resulting tree, each leaf is labeled with a character and its frequency of occurrence, and each internal node is labeled with the sum of the weights of the leaves in its subtree.

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters, where the length of each code is based on the frequency of the corresponding character. The most frequent character gets the shortest code and the least frequent character gets the longest code.

The variable-length codes assigned to input characters are prefix codes: the codes (bit sequences) are assigned in such a way that the code assigned to one character is never a prefix of the code assigned to any other character. This is how Huffman coding makes sure that there is no ambiguity when decoding the generated bit stream.

There are two major parts in Huffman coding:

  1. Build a Huffman tree from the input characters.
  2. Traverse the Huffman tree and assign codes to characters.

Steps to build the Huffman tree

The input is an array of unique characters along with their frequencies of occurrence, and the output is the Huffman tree.

  1. Create a leaf node for each unique character and build a min heap of all leaf nodes. (The min heap is used as a priority queue. The value of the frequency field is used to compare two nodes in the min heap. Initially, the least frequent character is at the root.)
  2. Extract the two nodes with the minimum frequency from the min heap.
  3. Create a new internal node with frequency equal to the sum of the two nodes' frequencies. Make the first extracted node its left child and the other extracted node its right child. Add this node to the min heap.
  4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root node and the tree is complete.
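
The tree-building steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `huffman_codes`, the use of nested tuples for internal nodes, and the assumption that symbols are plain strings are all choices made here for the example.

```python
import heapq
from itertools import count


def huffman_codes(freq):
    """Return {symbol: bit string} for a dict of symbol frequencies.

    Assumption for this sketch: symbols are strings; internal nodes are
    represented as (left, right) tuples.
    """
    if not freq:
        return {}
    tie = count()  # tie-breaker so the heap never has to compare node payloads
    # Step 1: one leaf per symbol, all placed in a min heap keyed by frequency.
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    # Steps 2-4: repeatedly merge the two least frequent nodes.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):
        # Left edges contribute "0", right edges contribute "1".
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


if __name__ == "__main__":
    sample = {"A": 45, "B": 13, "C": 12, "D": 16, "E": 9, "F": 5}
    for sym, code in sorted(huffman_codes(sample).items()):
        print(sym, code)
```

The exact bit patterns depend on how ties and left/right placement are resolved, so different correct implementations may print different codes with the same code lengths.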

Fixed-length code: a fixed-length code needs 3 bits to represent six (6) characters.

  Character           A       B       C       D       E       F
  Frequency           45,000  13,000  12,000  16,000  9,000   5,000
  Fixed-length code   000     001     010     011     100     101

  • The total number of characters is 45,000 + 13,000 + 12,000 + 16,000 + 9,000 + 5,000 = 100,000.
  • Each character is assigned a 3-bit codeword => 3 * 100,000 = 300,000 bits.

Variable-length code:

  Character                  A     B     C     D     E     F
  Frequency (in thousands)   45    13    12    16    9     5
  Fixed-length code          000   001   010   011   100   101
  Variable-length code       0     101   100   111   1101  1100

  • The variable-length code requires (45 × 1 + 13 × 3 + 12 × 3 + 16 × 3 + 9 × 4 + 5 × 4) × 1,000 = 224,000 bits, a saving of about 25% over the fixed-length code.

Practice Problems

  1. Find the Huffman code for the following message “COLLEGE OF ENGINEERING”
  2. Consider the following set of frequencies A=2, B=5, C=7, D=8, E=7, F=22, G=4, H=17. Find Huffman code for the same.
  3. Consider the following set of frequencies A=8, B=15, C=37, D=18, E=47, F=22, G=20, H=17. Find Huffman code for the same.

KNAPSACK PROBLEMS - Fractional knapsack

There are n items in a store. For i = 1, 2, ..., n, item i has weight wi > 0 and worth vi > 0. A thief can carry a maximum weight of W pounds in a knapsack. In this version of the problem the items can be broken into smaller pieces, so the thief may decide to carry only a fraction xi of item i, where 0 ≤ xi ≤ 1. Item i contributes xi·wi to the total weight in the knapsack and xi·vi to the value of the load.

Applications

The problem often arises in resource allocation where there are financial constraints, and it is studied in fields such as combinatorics, computer science, complexity theory, cryptography and applied mathematics.

Algorithm

  Greedy-fractional-knapsack (w, v, W)
    FOR i = 1 to n do
      x[i] = 0
    weight = 0
    WHILE weight < W and unconsidered items remain do
      i = best remaining item (highest v[i] / w[i])
      IF weight + w[i] ≤ W then
        x[i] = 1
        weight = weight + w[i]
      ELSE
        x[i] = (W - weight) / w[i]
        weight = W
    RETURN x
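
A minimal Python sketch of the greedy fractional knapsack follows. It assumes that the "best remaining item" is the one with the highest value/weight ratio; the function name `fractional_knapsack` and the sample data are illustrative choices, not part of the original notes.

```python
def fractional_knapsack(weights, values, capacity):
    """Return (fractions, total_value) for the fractional knapsack.

    fractions[i] is the fraction x_i of item i that is taken (0 <= x_i <= 1).
    """
    # Consider items from the highest value/weight ratio to the lowest.
    order = sorted(range(len(weights)),
                   key=lambda i: values[i] / weights[i],
                   reverse=True)
    fractions = [0.0] * len(weights)
    remaining = capacity
    total_value = 0.0
    for i in order:
        if remaining <= 0:
            break
        take = min(weights[i], remaining)  # whole item, or whatever still fits
        fractions[i] = take / weights[i]
        total_value += values[i] * fractions[i]
        remaining -= take
    return fractions, total_value


if __name__ == "__main__":
    # Sample data assumed for illustration.
    x, best = fractional_knapsack([10, 20, 30, 40, 50], [20, 30, 66, 40, 60], 100)
    print(x, best)
```

Sorting once up front replaces the repeated "find the best remaining item" step, so the whole procedure runs in O(n log n) time.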

Kruskal’s Algorithm

Kruskal's algorithm finds a minimum-cost spanning tree using the greedy approach. The algorithm treats the graph as a forest and every node in it as an individual tree. A tree is connected to another if and only if the connecting edge has the least cost among all available options and does not violate the MST properties.

Algorithm

  1. Sort all the edges in non-decreasing order of their weight.
  2. Pick the smallest edge. Check if it forms a cycle with the spanning tree formed so far. If cycle is not formed, include this edge. Else, discard it.
  3. Repeat step 2 until there are (V-1) edges in the spanning tree.

To understand Kruskal's algorithm, consider the following example.

Step 1 - Remove all loops and parallel edges

Remove all loops and parallel edges from the given graph. In the case of parallel edges, keep the one with the least cost and remove all others.

Step 2 - Arrange all edges in increasing order of weight

The next step is to create a set of edges with their weights and arrange them in ascending order of weight (cost).

Step 3 - Add the edge which has the least weight

Now we start adding edges to the graph, beginning with the one that has the least weight. At all times we keep checking that the spanning tree properties remain intact. If adding an edge would violate the spanning tree property, we do not include that edge in the graph.

The least cost is 2, and the edges involved are B,D and D,T, so we add them. Adding them does not violate the spanning tree properties, so we continue to the next edge selection.

The next cost is 3, and the associated edges are A,C and C,D, so we add them.

The next cost in the table is 4, and we observe that adding it will create a circuit in the graph.
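
As a rough sketch of the three algorithm steps (sort the edges, skip any edge that would form a cycle, stop at V-1 edges), the following Python code uses a simple union-find structure for the cycle check. The function name `kruskal`, the vertex numbering 0..V-1 and the small sample graph are assumptions made for this example; the sample is not the graph from the walkthrough above.

```python
def kruskal(num_vertices, edges):
    """Return (mst_edges, total_weight).

    edges is an iterable of (weight, u, v) tuples with vertices 0..num_vertices-1.
    """
    parent = list(range(num_vertices))  # each vertex starts as its own tree

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for weight, u, v in sorted(edges):      # step 1: non-decreasing weight
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # step 2: no cycle, so keep the edge
            parent[root_u] = root_v
            mst.append((u, v, weight))
            total += weight
            if len(mst) == num_vertices - 1:  # step 3: V-1 edges collected
                break
    return mst, total


if __name__ == "__main__":
    # Hypothetical 4-vertex graph, used only to exercise the function.
    sample_edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
    print(kruskal(4, sample_edges))
```

Two vertices are in the same tree exactly when `find` returns the same root for both, which is how the sketch detects that an edge would close a cycle.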

DYNAMIC PROGRAMMING:

Dynamic programming is used when the solution can be described recursively in terms of solutions to subproblems (optimal substructure). The algorithm finds solutions to subproblems and stores them in memory for later use. This is more efficient than “brute-force” methods, which solve the same subproblems over and over again.

  • Optimal substructure: The optimal solution to the problem consists of optimal solutions to subproblems.
  • Overlapping subproblems: There are few subproblems in total, but many recurring instances of each.
  • Bottom-up approach: Solve bottom-up, building a table of solved subproblems that are used to solve larger ones.

Greedy vs Dynamic Programming

Greedy method | Dynamic programming
Makes an optimal choice (without knowing solutions to subproblems) and then solves the remaining subproblems. | Solves subproblems first, then uses those solutions to make an optimal choice.
Solutions are top down. | Solutions are bottom up.
The best choice does not depend on solutions to subproblems. | The choice at each step depends on solutions to subproblems.
Makes the best choice at the current time, then works on subproblems. The best choice does depend on choices so far. | Many subproblems are repeated in solving larger problems. This repetition results in great savings when the computation is bottom up.
Optimal substructure: the solution to the problem contains within it optimal solutions to subproblems. | Optimal substructure: the solution to the problem contains within it optimal solutions to subproblems.
Fractional knapsack: at each step, choose the item with the highest value/weight ratio. | 0-1 knapsack: to determine whether to include item i for a given size, the best solution at that size must be considered both with and without item i.

0/1 KNAPSACK PROBLEM

The most common variant is the 0-1 knapsack problem, which restricts the number xi of copies of each kind of item to zero or one. Given a set of n items numbered from 1 up to n, each with a weight wi and a value vi, along with a maximum weight capacity W,

  maximize $\sum_{i=1}^{n} v_i x_i$ subject to $\sum_{i=1}^{n} w_i x_i \le W$ and $x_i \in \{0, 1\}$.

Here xi represents the number of instances of item i to include in the knapsack. Informally, the problem is to maximize the sum of the values of the items in the knapsack so that the sum of the weights is less than or equal to the knapsack's capacity.

Optimal substructure: To consider all subsets of items, there are two cases for every item: (1) the item is included in the optimal subset, or (2) it is not included in the optimal subset. Therefore, the maximum value that can be obtained from n items is the maximum of the following two values:

  (1) The maximum value obtained by n-1 items and capacity W (excluding the nth item).
  (2) The value of the nth item plus the maximum value obtained by n-1 items and capacity W minus the weight of the nth item (including the nth item).

If the weight of the nth item is greater than W, then the nth item cannot be included and case 1 is the only possibility.

Pseudo Code

  // Input:
  // Values (stored in array v)
  // Weights (stored in array w)
  // Number of distinct items (n)
  // Knapsack capacity (W)
  for j from 0 to W do:
    m[0, j] := 0
  for i from 1 to n do:
    for j from 0 to W do:
      if w[i] <= j then:
        m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i])
      else:
        m[i, j] := m[i-1, j]
  // m[n, W] holds the maximum achievable value
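
The table-filling recurrence for the 0/1 knapsack can be sketched in Python as below. The function name `knapsack_01`, the 0-based item indices, the trace-back of chosen items and the sample data are assumptions added for illustration; integer weights and capacity are assumed.

```python
def knapsack_01(weights, values, capacity):
    """Return (best_value, chosen_item_indices) for the 0/1 knapsack.

    Assumes non-negative integer weights and an integer capacity.
    """
    n = len(weights)
    # m[i][j] = best value using the first i items with capacity j.
    m = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for j in range(capacity + 1):
            m[i][j] = m[i - 1][j]                  # case 1: item i excluded
            if w <= j:                             # case 2: item i included
                m[i][j] = max(m[i][j], m[i - 1][j - w] + v)

    # Walk the table backwards to recover one optimal set of items.
    chosen, j = [], capacity
    for i in range(n, 0, -1):
        if m[i][j] != m[i - 1][j]:
            chosen.append(i - 1)
            j -= weights[i - 1]
    return m[n][capacity], sorted(chosen)


if __name__ == "__main__":
    # Sample data assumed for illustration.
    print(knapsack_01([1, 3, 4, 5], [1, 4, 5, 7], 7))
```

The table has (n+1) × (W+1) entries and each is filled in constant time, so the running time is O(nW).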

Knapsack example (greedy selection by value/weight ratio)

Input: 5 objects, C = 100

  Object   1    2    3    4    5
  W        10   20   30   40   50
  V        20   30   66   40   60

Solution: Given total number of items = 5, sack capacity = 100.

  1. Step 1: Find the value/weight ratio, Ratio = Vi / Wi.

     Object   1    2     3     4    5
     Ratio    2    1.5   2.2   1    1.2

  2. Step 2: Sort the items according to the ratio and select items in order of highest ratio.

     Object   3     1    2     5     4
     W        30    10   20    50    40
     Ratio    2.2   2    1.5   1.2   1

     i) First item is selected (object 3). Sack weight = 30 < 100. Sack value = 2.2 * 30 = 66.
     ii) Second item is selected (object 1). Sack weight = 30 + 10 = 40 < 100. Sack value = 66 + (2 * 10) = 86.
     iii) Third item is selected (object 2). Sack weight = 40 + 20 = 60 < 100. Sack value = 86 + (1.5 * 20) = 116.
     iv) Fourth item is selected (object 5). Sack weight = 60 + 50 = 110 > 100, hence the fourth item is selected partially. Sack weight = 60 + (100 - 60) = 100 <= 100. Sack value = 116 + (1.2 * 40) = 116 + 48 = 164.

Now the sack is full, so we stop. Total selected weight = 100 and total value = 2.2 * 30 + 2 * 10 + 1.5 * 20 + 1.2 * 40 = 164.

Because the last item is taken only partially, this selection is valid only when items may be split, as in the fractional knapsack.

TRAVELLING SALESMAN PROBLEM

Travelling Salesman Problem (TSP): Given a set of cities and the distance between every pair of cities, the problem is to find the shortest possible route that visits every city exactly once and returns to the starting point.

A Hamiltonian path in an undirected graph is a path that visits each vertex exactly once. A Hamiltonian cycle (or Hamiltonian circuit) is a Hamiltonian path such that there is an edge (in the graph) from the last vertex to the first vertex of the path.

Note the difference between the Hamiltonian cycle problem and TSP. The Hamiltonian cycle problem is to determine whether there exists a tour that visits every city exactly once. Here we know that a Hamiltonian tour exists (because the graph is complete), and in fact many such tours exist; the problem is to find a minimum-weight Hamiltonian cycle.

For example, consider the graph shown in the figure. A TSP tour in the graph is 1 - 2 - 4 - 3 - 1. The cost of the tour is 10 + 25 + 30 + 15, which is 80.

The problem is a famous NP-hard problem. No polynomial-time solution is known for it. The following are different solutions for the travelling salesman problem.

Naive Solution:

  1. Consider city 1 as the starting and ending point.
  2. Generate all (n-1)! permutations of the remaining cities.
  3. Calculate the cost of every permutation and keep track of the minimum-cost permutation.
  4. Return the permutation with minimum cost.

Time Complexity: Θ(n!)
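
The naive solution can be sketched in Python as follows. Cities are numbered from 0 here rather than 1, and the 4-city distance matrix is an assumption chosen to be consistent with the tour cost 10 + 25 + 30 + 15 = 80 quoted above, since the original figure is not available; the function name `tsp_brute_force` is also illustrative.

```python
from itertools import permutations


def tsp_brute_force(dist):
    """Try every tour that starts and ends at city 0; return (cost, tour)."""
    n = len(dist)
    best_cost, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):       # all (n-1)! orderings
        tour = (0,) + perm + (0,)
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour


if __name__ == "__main__":
    # Assumed symmetric distance matrix (not taken from the original figure).
    dist = [
        [0, 10, 15, 20],
        [10, 0, 35, 25],
        [15, 35, 0, 30],
        [20, 25, 30, 0],
    ]
    print(tsp_brute_force(dist))
```

With 0-based numbering, the tour (0, 1, 3, 2, 0) corresponds to 1 - 2 - 4 - 3 - 1 in the numbering used in the text.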

Dynamic Programming:

Let the given set of vertices be {1, 2, 3, 4, ..., n}. Let us consider 1 as the starting and ending point of the output. For every other vertex i (other than 1), we find the minimum-cost path with 1 as the starting point, i as the ending point and all vertices appearing exactly once. Let the cost of this path be cost(i); the cost of the corresponding cycle is then cost(i) + dist(i, 1), where dist(i, 1) is the distance from i to 1. Finally, we return the minimum of all [cost(i) + dist(i, 1)] values.

This looks simple so far. Now the question is how to get cost(i). To calculate cost(i) using dynamic programming, we need a recursive relation in terms of subproblems. Let C(S, i) be the cost of the minimum-cost path visiting each vertex in set S exactly once, starting at 1 and ending at i.

We start with all subsets of size 2 and calculate C(S, i) for each such subset S, then we calculate C(S, i) for all subsets S of size 3, and so on. Note that 1 must be present in every subset.

  If the size of S is 2, then S must be {1, i} and
    C(S, i) = dist(1, i)
  Else, if the size of S is greater than 2,
    C(S, i) = min { C(S - {i}, j) + dist(j, i) }, where j belongs to S, j != i and j != 1.

For a set S of size n, computing C(S, i) means considering the n-2 possible values of j, each of which uses the subset S - {i} of size n-1, which does not contain i.

Using the above recurrence relation, we can write a dynamic programming based solution. There are at most O(n * 2^n) subproblems, and each one takes linear time to solve. The total running time is therefore O(n^2 * 2^n). The time complexity is much less than O(n!), but still exponential. The space required is also exponential. So this approach is also infeasible for even a slightly higher number of vertices.

Example

Distance matrix (figure not reproduced here):

  g(2, Ø) = c21 = 1
  g(3, Ø) = c31 = 15
  g(4, Ø) = c41 = 6
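
A compact Python sketch of the C(S, i) recurrence is given below. It numbers cities from 0 instead of 1 and stores each subset S as a bitmask; the function name `held_karp` and the distance matrix are assumptions made for this example (the matrix is chosen to match the tour cost of 80 mentioned earlier, not the matrix of the worked example).

```python
from itertools import combinations


def held_karp(dist):
    """Return the cost of a cheapest tour that starts and ends at city 0."""
    n = len(dist)
    if n == 1:
        return 0
    # C[(mask, i)]: cheapest path that starts at 0, visits exactly the
    # cities whose bits are set in mask, and ends at city i.
    C = {}
    for i in range(1, n):
        C[(1 | (1 << i), i)] = dist[0][i]          # subsets of size 2: {0, i}

    for size in range(3, n + 1):                   # grow subsets one city at a time
        for subset in combinations(range(1, n), size - 1):
            mask = 1                               # city 0 is in every subset
            for city in subset:
                mask |= 1 << city
            for i in subset:
                prev = mask & ~(1 << i)            # S - {i}
                C[(mask, i)] = min(C[(prev, j)] + dist[j][i]
                                   for j in subset if j != i)

    full = (1 << n) - 1
    # Close the cycle by returning from the last city to city 0.
    return min(C[(full, i)] + dist[i][0] for i in range(1, n))


if __name__ == "__main__":
    # Assumed distance matrix, used only for illustration.
    dist = [
        [0, 10, 15, 20],
        [10, 0, 35, 25],
        [15, 35, 0, 30],
        [20, 25, 30, 0],
    ]
    print(held_karp(dist))
```

The dictionary holds one entry per (subset, end city) pair, which is the O(n * 2^n) subproblem count stated in the text.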

MULTISTAGE GRAPH

Dynamic programming is an algorithm design method that can be used when the solution to a problem may be viewed as the result of a sequence of decisions.

Finding a minimum-cost path in a multistage graph using dynamic programming:

  a. Forward Approach
  b. Backward Approach

a. Forward Approach

Traverse the multistage graph using the following considerations:

  · Identify the source and destination nodes.
  · Find all possible paths from the source to the destination and, for each, the sum of the weights of its edges.
  · The path giving the least total weight is the minimum-cost path.

Consider a multistage graph with the following source and destination nodes.

  Source node -> S
  Destination node -> D

The possible ways to connect S and D:

  d(S,D) = min { 1 + d(A,D) ; 2 + d(F,D) ; 5 + d(C,D) }    (1)

  d(A,D) = min { 4 + d(B,D) ; 9 + d(G,D) }
         = min { 4 + 18 ; 9 + 13 }    (substituting weights)
         = min { 22 ; 22 }
  d(A,D) = 22    (2)

  d(F,D) = min { 9 + d(B,D) ; 5 + d(G,D) ; 16 + d(E,D) }
         = min { 9 + 18 ; 5 + 13 ; 16 + 2 }    (substituting weights)
         = min { 27 ; 18 ; 18 }
  d(F,D) = 18    (3)

  d(C,D) = min { 2 + d(E,D) }
         = min { 2 + 2 }    (substituting weights)
         = min { 4 }
  d(C,D) = 4    (4)

Substituting (2), (3) and (4) in (1) gives

  d(S,D) = min { 1 + d(A,D) ; 2 + d(F,D) ; 5 + d(C,D) }
  d(S,D) = min { 1 + 22 ; 2 + 18 ; 5 + 4 }
  d(S,D) = min { 23 ; 20 ; 9 }
  d(S,D) = 9

Hence, according to the forward approach, the minimum-cost path from S to D is S -> C -> E -> D.

b. Backward Approach

The backward approach is just the reverse of the forward approach: here the source node and the next node are considered at every stage. Considering the same multistage graph:

  1 -> 2: source node S to next nodes A, F and C
  d(S,A) = 1
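
The forward-approach recurrence, d(v, D) = min over successors u of { weight(v, u) + d(u, D) }, can be sketched in Python with memoisation. The edge weights out of S, A, F and C are taken from equations (1) to (4) of the forward approach; the single edges from B, G and E to D are an assumption that collapses the remaining stages into the values d(B,D) = 18, d(G,D) = 13 and d(E,D) = 2 used there, because the original figure is not available. The names `GRAPH` and `min_cost_path` are illustrative.

```python
from functools import lru_cache

GRAPH = {
    "S": {"A": 1, "F": 2, "C": 5},
    "A": {"B": 4, "G": 9},
    "F": {"B": 9, "G": 5, "E": 16},
    "C": {"E": 2},
    # Assumption: the rest of the graph is collapsed into one edge per node,
    # weighted with the d(., D) values quoted in the worked example.
    "B": {"D": 18},
    "G": {"D": 13},
    "E": {"D": 2},
    "D": {},
}


def min_cost_path(graph, source, target):
    """Return (cost, path) of a cheapest path from source to target."""

    @lru_cache(maxsize=None)
    def d(node):
        if node == target:
            return 0, (target,)
        best = (float("inf"), ())
        for nxt, weight in graph[node].items():
            cost, path = d(nxt)            # solved once, then reused from the cache
            if weight + cost < best[0]:
                best = (weight + cost, (node,) + path)
        return best

    return d(source)


if __name__ == "__main__":
    print(min_cost_path(GRAPH, "S", "D"))
```

Because each d(node) is cached, every node is evaluated once even though B, G and E are reachable through several predecessors.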