
## Lossless Compression

Docsity.com

Compression

• **Compression**: the process of coding that effectively reduces the total number of bits needed to represent certain information.


Compression

• There are two main categories
– **Lossless**
– **Lossy**

• Compression ratio: *B0*/*B1*, where *B0* is the number of bits before compression and *B1* is the number of bits after compression


Information Theory

• We define the entropy η of an information source with alphabet *S* = {*s1, s2, …, sn*} as

η = Σi *pi* log2(1/*pi*)

• *pi* – the probability that *si* occurs in the source; log2(1/*pi*) is the amount of information contained in *si*
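The definition above can be computed directly. Below is a minimal Python sketch (the function name `entropy` is mine, not from the slides):

```python
import math

def entropy(probs):
    """Entropy in bits/symbol: sum of p_i * log2(1/p_i), skipping zero-probability symbols."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# A uniform source over 256 symbols attains the maximum entropy of 8 bits.
uniform = [1 / 256] * 256
print(entropy(uniform))  # 8.0

# Any skewed distribution has lower entropy.
print(entropy([0.5, 0.25, 0.25]))  # 1.5
```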


Information Theory

• Figure (a), a uniform distribution over 256 symbols, has the maximum entropy: 256 × (1/256 × log2 256) = 8 bits

• Any other distribution has lower entropy


Entropy and Code Length

• The entropy η gives a lower bound on the average number of bits needed to code a symbol in the alphabet
– η ≤ *l*, where *l* is the average bit length of the codewords produced by the encoder, assuming a memoryless source


Run-Length Coding

• Run-length coding is a very widely used and simple compression technique which does not assume a memoryless source
– We replace runs of symbols (possibly of length one) with pairs of (*run-length, symbol*)
– For images, the maximum run-length is the size of a row
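A minimal sketch of the (*run-length, symbol*) scheme described above (function names are mine):

```python
def rle_encode(symbols):
    """Replace each run (possibly of length one) with a (run_length, symbol) pair."""
    runs = []
    for s in symbols:
        if runs and runs[-1][1] == s:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, s])       # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand each (run_length, symbol) pair back into a run of symbols."""
    return [s for length, s in runs for _ in range(length)]

data = list("WWWWBBBW")
encoded = rle_encode(data)
print(encoded)                        # [(4, 'W'), (3, 'B'), (1, 'W')]
assert rle_decode(encoded) == data
```

Note that a run of length one costs *more* than the raw symbol, so run-length coding only pays off when long runs are common, as in binary images.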


Variable Length Coding

• A number of compression techniques are based on the entropy ideas seen previously.

• These are known as **entropy coding** or **variable length coding**
– The number of bits used to code symbols in the alphabet is variable
– Two famous entropy coding techniques are **Huffman coding** and **arithmetic coding**


Huffman Coding

• Huffman coding constructs a binary tree starting with the probabilities of each symbol in the alphabet
– The tree is built in a bottom-up manner
– The tree is then used to find the codeword for each symbol
– An algorithm for finding the Huffman code for a given alphabet with associated probabilities is given in the following slide


Huffman Coding Algorithm

1. Initialization: put all symbols on a list sorted according to their frequency counts.

2. Repeat until the list has only one symbol left:

a. From the list pick the two symbols with the lowest frequency counts. Form a Huffman subtree that has these two symbols as child nodes and create a parent node.


Huffman Coding Algorithm

b. Assign the sum of the children's frequency counts to the parent and insert it into the list such that the order is maintained.

c. Delete the children from the list.

3. Assign a codeword for each leaf based on the path from the root.
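The steps above can be sketched in Python, with a min-heap standing in for the sorted list (the function name `huffman_code` is mine):

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build Huffman codewords from a {symbol: frequency} map via steps 1-3 above.
    Leaves are plain symbols; internal nodes are (left, right) tuples."""
    tiebreak = count()  # unique counter keeps heapq from comparing tree nodes
    # Step 1: put all symbols on a (heap-ordered) list keyed by frequency count.
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    # Step 2: repeatedly merge the two lowest-frequency entries under a parent
    # whose count is the sum of its children's counts.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    # Step 3: assign codewords from the root-to-leaf paths (0 = left, 1 = right).
    codes = {}
    def walk(node, path):
        if isinstance(node, tuple):
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:
            codes[node] = path or "0"   # single-symbol alphabet still needs a bit
    walk(heap[0][2], "")
    return codes

codes = huffman_code({"A": 5, "B": 2, "C": 1, "D": 1})
# More frequent symbols receive shorter codewords:
assert len(codes["A"]) < len(codes["B"]) < len(codes["C"]) == len(codes["D"])
```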


Properties of Huffman Codes

• No Huffman codeword is a prefix of any other Huffman codeword, so decoding is unambiguous

• The Huffman coding technique is optimal (but we must know the probabilities of each symbol for this to be true)

• Symbols that occur more frequently have shorter Huffman codes


Huffman Coding

• Variants:
– In **extended Huffman coding** we group the symbols into blocks of *k* symbols, giving an extended alphabet of *n^k* symbols
• This leads to somewhat better compression
– In **adaptive Huffman coding** we don't assume that we know the exact probabilities
• Start with an estimate and update the tree as we encode/decode

• **Arithmetic coding** is a newer (and more complicated) alternative which usually performs better


Dictionary-based Coding

• LZW uses fixed-length codewords to represent variable-length strings of symbols/characters that commonly occur together, e.g., words in English text.

• The LZW encoder and decoder build up the same dictionary dynamically while receiving the data.

• LZW places longer and longer repeated entries into a dictionary, and then emits the code for an element, rather than the string itself, if the element has already been placed in the dictionary.


LZW Compression Algorithm


LZW Compression Example

• We will compress the string – "ABABBABCABABBA"

• Initially the dictionary is the following


LZW Example

| Code | String |
|------|--------|
| 1    | A      |
| 2    | B      |
| 3    | C      |
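Running the encoder on the example string with this initial dictionary can be sketched as follows (the function name `lzw_compress` is mine; new entries are numbered consecutively after the initial codes):

```python
def lzw_compress(text, dictionary):
    """LZW encoder: the dictionary maps strings to codes and grows as input is read."""
    dictionary = dict(dictionary)       # don't mutate the caller's initial dictionary
    s, out = "", []
    for c in text:
        if s + c in dictionary:
            s += c                      # extend the current longest match
        else:
            out.append(dictionary[s])   # emit the code for the longest known string
            dictionary[s + c] = len(dictionary) + 1   # add the new, longer entry
            s = c
    out.append(dictionary[s])           # flush the final match
    return out

print(lzw_compress("ABABBABCABABBA", {"A": 1, "B": 2, "C": 3}))
# [1, 2, 4, 5, 2, 3, 4, 6, 1]
```

Along the way the encoder adds AB=4, BA=5, ABB=6, BAB=7, BC=8, CA=9, ABA=10 and ABBA=11 to the dictionary.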



LZW Decompression


LZW Decompression Example
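The decoder mirrors the encoder, rebuilding the same dictionary one step behind it. A minimal sketch (the function name `lzw_decompress` is mine), including the standard corner case where a code arrives before the decoder has added it:

```python
def lzw_decompress(codes, dictionary):
    """LZW decoder: rebuilds the encoder's dictionary while emitting the output."""
    inv = {v: k for k, v in dictionary.items()}   # code -> string
    s = inv[codes[0]]
    out = [s]
    for code in codes[1:]:
        if code in inv:
            entry = inv[code]
        else:
            entry = s + s[0]            # special case: code was just created by the encoder
        out.append(entry)
        inv[len(inv) + 1] = s + entry[0]   # mirror the encoder's new dictionary entry
        s = entry
    return "".join(out)

decoded = lzw_decompress([1, 2, 4, 5, 2, 3, 4, 6, 1], {"A": 1, "B": 2, "C": 3})
print(decoded)  # ABABBABCABABBA
```

This recovers exactly the example string, so encoder and decoder stay in sync without the dictionary ever being transmitted.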


Quadtrees

• Quadtrees are both an indexing structure for and a compression scheme for binary images
– A quadtree is a tree where each non-leaf node has four children
– Each node is labelled either B (black), W (white) or G (gray)
– Leaf nodes can only be B or W


Quadtrees

• Algorithm for construction of a quadtree for an N × N binary image:
– 1. If the binary image contains only black pixels, label the root node B and quit.
– 2. Else if the binary image contains only white pixels, label the root node W and quit.
– 3. Otherwise create four child nodes corresponding to the four N/2 × N/2 quadrants of the binary image.
– 4. For each of the quadrants, recursively repeat steps 1 to 3. (In the worst case, recursion ends when each sub-quadrant is a single pixel.)
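The steps above translate directly into a recursive function. This sketch assumes pixel value 1 means black and 0 means white, and the function name `quadtree` is mine:

```python
def quadtree(img):
    """Build a quadtree for a square binary image given as nested lists of 0/1.
    Returns 'B', 'W', or a ('G', nw, ne, sw, se) tuple, following steps 1-4 above."""
    flat = [p for row in img for p in row]
    if all(p == 1 for p in flat):
        return "B"                      # step 1: all black
    if all(p == 0 for p in flat):
        return "W"                      # step 2: all white
    n = len(img) // 2                   # step 3: split into four N/2 x N/2 quadrants
    quads = [[row[:n] for row in img[:n]], [row[n:] for row in img[:n]],
             [row[:n] for row in img[n:]], [row[n:] for row in img[n:]]]
    return ("G",) + tuple(quadtree(q) for q in quads)   # step 4: recurse

img = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1]]
print(quadtree(img))  # ('G', 'B', 'W', 'W', ('G', 'W', 'W', 'W', 'B'))
```

Uniform regions collapse to a single leaf however large they are, which is where the compression comes from.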


Quadtree Example

(Figure: an example binary image, shown as a grid of 0/1 pixel values, and its quadtree decomposition.)
