






This lecture was delivered by Dr. Ameet Shashank at B R Ambedkar National Institute of Technology. It relates to the Data Representation and Algorithm Design course. Its main points are: Data, Compression, Encoding, Decoding, Message, Encode, Decode, Communication, Ratio.
Typology: Slides







[Figure annotation: the encoded message hopefully uses fewer bits.]

Ancient Ideas

- Braille.
- Morse code.
- Natural languages.
- Mathematical notation.
- Decimal number system.

Natural Encoding

Natural encoding. (19 × 51) + 6 = 975 bits.
The extra 6 bits are needed to encode the number of characters per line.

000000000000000000000000000011111111111111000000000
000000000000000000000000001111111111111111110000000
000000000000000000000001111111111111111111111110000
000000000000000000000011111111111111111111111111000
000000000000000000001111111111111111111111111111110
000000000000000000011111110000000000000000001111111
000000000000000000011111000000000000000000000011111
000000000000000000011100000000000000000000000000111
000000000000000000011100000000000000000000000000111
000000000000000000011100000000000000000000000000111
000000000000000000011100000000000000000000000000111
000000000000000000001111000000000000000000000001110
000000000000000000000011100000000000000000000111000
011111111111111111111111111111111111111111111111111
011111111111111111111111111111111111111111111111111
011111111111111111111111111111111111111111111111111
011111111111111111111111111111111111111111111111111
011111111111111111111111111111111111111111111111111
011000000000000000000000000000000000000000000000011

Run-Length Encoding

Natural encoding. (19 × 51) + 6 = 975 bits.
Run-length encoding. (63 × 6) + 6 = 384 bits (63 six-bit run lengths).

Run lengths, one bitmap row per line (each row starts with a run of 0s):

28 14 9
26 18 7
23 24 4
22 26 3
20 30 1
19 7 18 7
19 5 22 5
19 3 26 3
19 3 26 3
19 3 26 3
19 3 26 3
20 4 23 3 1
22 3 20 3 3
1 50
1 50
1 50
1 50
1 50
1 2 46 2

Run-length encoding (RLE).
- Exploit long runs of repeated characters.
- Binary alphabet: runs alternate between 0 and 1; output counts.
- "File inflation" possible if runs are short.

Applications.
- JPEG.
- ITU-T T4 fax machines (black and white graphics).

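The binary run-length scheme described above can be sketched in Java as follows. This is a minimal illustration, not code from the lecture: the class name `RunLengthDemo` and its methods are hypothetical, and it assumes every row starts with a (possibly empty) run of 0s. It also does not split runs longer than 63, which a real 6-bit encoder would have to do.

```java
import java.util.ArrayList;
import java.util.List;

class RunLengthDemo {
    // Encode a string of '0'/'1' characters as alternating run lengths.
    // The first count is always the number of leading 0s (possibly 0).
    static List<Integer> encode(String bits) {
        List<Integer> runs = new ArrayList<>();
        char expected = '0';
        int count = 0;
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c != '0' && c != '1') {
                throw new IllegalArgumentException("not a bit: " + c);
            }
            if (c == expected) {
                count++;
            } else {
                runs.add(count);   // close the current run
                expected = c;      // runs alternate between 0 and 1
                count = 1;
            }
        }
        runs.add(count);
        return runs;
    }

    // Rebuild the bit string from alternating run lengths.
    static String decode(List<Integer> runs) {
        StringBuilder sb = new StringBuilder();
        char c = '0';
        for (int run : runs) {
            sb.append(repeat(c, run));
            c = (c == '0') ? '1' : '0';
        }
        return sb.toString();
    }

    // Helper: a string made of n copies of c.
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) sb.append(c);
        return sb.toString();
    }

    public static void main(String[] args) {
        // First row of the bitmap: 28 zeros, 14 ones, 9 zeros.
        String row = repeat('0', 28) + repeat('1', 14) + repeat('0', 9);
        System.out.println(encode(row));
    }
}
```

A row that begins with 1 simply gets a leading count of 0, which is how the alternation between 0-runs and 1-runs stays unambiguous.
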
ITU-T T4 Group 3 Fax

Group 3 fax. Transmit image comprised of up to 1728 pels per line, typically mostly white. (A pel is a picture element: black or white.)
RLE. Compute run-lengths of white and black pels.
Prefix-free code. Encode run-lengths using the following prefix-free code.

run     white        black
0       00110101     0000110111
1       000111       010
2       0111         11
3       1000         10
…       …            …
63      00110100     000001100111
64      11011        0000001111
128     10010        000011001000
192     010111       000011001001
…       …            …
1728    010011011    0000001100101

Prefix-Free Code: Encoding and Decoding

How to represent? Use a binary trie.
- Symbols are stored in leaves.
- Encoding is path to leaf.

Encoding.
- Method 1: start at leaf; follow path up to the root, and print bits in reverse order.
- Method 2: create a symbol table (ST) of symbol-encoding pairs.

Decoding.
- Start at root of tree.
- Go left if bit is 0; go right if 1.
- If leaf node, print symbol and return to root.

[Figure: binary trie with leaves a, d, !, c, r, b]

symbol  codeword
a       0
b       111
c       1011
d       100
r       110
!       1010

How to Transmit the Trie

How to transmit the trie?
- Send preorder traversal of trie.

Prefix-Free Decoding Implementation

    // StdIn is assumed to be a standard-input helper class supplied with
    // the course; it is not part of the Java standard library.
    public class HuffmanDecoder {
        private Node root = new Node();

        private class Node {
            char ch;
            Node left, right;

            // Build tree from preorder traversal: '*' marks an internal node.
            Node() {
                ch = StdIn.readChar();
                if (ch == '*') {
                    left = new Node();
                    right = new Node();
                }
            }

            // Internal nodes always have two children; leaves have none.
            boolean isInternal() {
                return left != null;
            }
        }

        public void decode() {
            int N = StdIn.readInt();          // number of symbols to decode
            for (int i = 0; i < N; i++) {
                Node x = root;
                while (x.isInternal()) {
                    // Use bits in real applications instead of chars.
                    char bit = StdIn.readChar();
                    if (bit == '0') x = x.left;
                    else if (bit == '1') x = x.right;
                }
                System.out.print(x.ch);
            }
        }
    }

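The trie ideas above (preorder transmission, a symbol table for encoding, root-to-leaf walks for decoding) can be tried out without any course library. The following is a self-contained sketch, not the lecture's code: the class `PrefixFreeDemo` is hypothetical, it reads the trie from a string instead of standard input, and it assumes '*' marks internal nodes (so '*' cannot itself be a symbol) and that the trie has at least two leaves. The preorder string `*a**d*!c*rb` is derived here from the codeword table a 0, b 111, c 1011, d 100, r 110, ! 1010.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixFreeDemo {
    static final class Node {
        final char ch;
        final Node left, right;
        Node(char ch, Node left, Node right) {
            this.ch = ch;
            this.left = left;
            this.right = right;
        }
        boolean isLeaf() { return left == null; }
    }

    private final String preorder;
    private int pos = 0;                       // read position in preorder
    final Node root;
    final Map<Character, String> codes = new HashMap<>();

    PrefixFreeDemo(String preorder) {
        this.preorder = preorder;
        this.root = readNode();
        buildCodes(root, "");
    }

    // Rebuild the trie from its preorder traversal ('*' = internal node).
    private Node readNode() {
        char c = preorder.charAt(pos++);
        if (c == '*') {
            Node left = readNode();
            Node right = readNode();
            return new Node(c, left, right);
        }
        return new Node(c, null, null);
    }

    // Method 2 from the slide: symbol table of symbol-encoding pairs.
    private void buildCodes(Node x, String path) {
        if (x.isLeaf()) {
            codes.put(x.ch, path);
            return;
        }
        buildCodes(x.left, path + '0');
        buildCodes(x.right, path + '1');
    }

    String encode(String message) {
        StringBuilder sb = new StringBuilder();
        for (char c : message.toCharArray()) {
            String code = codes.get(c);
            if (code == null) throw new IllegalArgumentException("no code for " + c);
            sb.append(code);
        }
        return sb.toString();
    }

    // Walk from the root: left on '0', right on '1', emit symbol at a leaf.
    String decode(String bits) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < bits.length()) {
            Node x = root;
            while (!x.isLeaf()) {
                x = (bits.charAt(i++) == '0') ? x.left : x.right;
            }
            sb.append(x.ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PrefixFreeDemo trie = new PrefixFreeDemo("*a**d*!c*rb");
        String bits = trie.encode("abracadabra!");
        System.out.println(bits + " -> " + trie.decode(bits));
    }
}
```

Because no codeword is a prefix of another, the decoder never needs separators between codewords: reaching a leaf is the only end-of-symbol signal.
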
[Figure: David Huffman]

What Data Can Be Compressed?

Theorem. Impossible to losslessly compress all files.
Pf.
- Consider all 1,000 bit messages.
- 2^1000 possible messages.
- Only 2^999 + 2^998 + … + 1 can be encoded with ≤ 999 bits.
- Only 1 in 2^499 can be encoded with ≤ 500 bits!

A Difficult File To Compress

One million pseudo-random characters (a to p). Excerpt:

gcglmklklamcnieffonbhjjoeflmmkggjdnccojiciicdnlfhmkcgplchjcfecncianbicdkmjmagmnoolbnkiehgklbpgnoabdcajnbfnbgejciocoen…

A Difficult File To Compress

[Table: resulting file sizes (bytes); values not recoverable from this extract]

The generating program is 231 bytes, but its output is hard to compress (assume random seed is fixed).

Information Theory

Intrinsic difficulty of compression.
- Short program generates large data file.
- Optimal compression algorithm has to discover program!
- Undecidable problem.

Q. How do we know if our algorithm is doing well?
A. Want lower bound on # bits required by any compression scheme.

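To make the "short program generates large data file" point concrete, here is a sketch of a generator for such a file. It is not the 231-byte program from the lecture, which is not reproduced in this extract: the class name `RandomTextDemo`, the method `generate`, and the seed value 42 are all illustrative assumptions.

```java
import java.util.Random;

class RandomTextDemo {
    // Produce n pseudo-random characters drawn uniformly from 'a'..'p'.
    // A fixed seed makes the output reproducible, so the whole file is
    // fully described by this short program plus the two arguments.
    static String generate(int n, long seed) {
        Random random = new Random(seed);
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append((char) ('a' + random.nextInt(16)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = generate(1_000_000, 42L);   // seed 42 is arbitrary
        System.out.println(text.substring(0, 60) + "...");
        System.out.println(text.length() + " characters");
    }
}
```

With 16 equally likely symbols, each character carries 4 bits of information, so a compressor that only models symbol statistics cannot do much better than about 500,000 bytes for one million characters, even though the program that produced them is tiny.
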
Language Model

Q. How do compression algorithms work?
A. Exploit statistical biases of input messages.
- White patches occur in typical images.
- Word Princeton occurs more frequently than Yale.

Compression is all about probability.
- Formulate probabilistic model to predict symbols.

Entropy and Compression

Theorem. [Shannon, 1948] If data source is an order 0 Markov model, any compression scheme must use ≥ H(S) bits per symbol on average.
- Cornerstone result of information theory.
- Ex: to transmit results of fair die, need ≥ 2.58 bits per roll.

Theorem. [Huffman, 1952] If data source is an order 0 Markov model, Huffman code uses ≤ H(S) + 1 bits per symbol on average.

Q. Is there any hope of doing better than Huffman coding?
A. Yes. Huffman wastes up to 1 bit per symbol.

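The fair-die figure can be checked numerically. The slide defining H(S) is missing from this extract, so the sketch below assumes the usual Shannon entropy of an order 0 model, H(S) = -Σ p(s) log2 p(s); the class `EntropyDemo` is hypothetical.

```java
class EntropyDemo {
    // Shannon entropy in bits per symbol of a probability distribution.
    // Zero-probability symbols contribute nothing to the sum.
    static double entropy(double[] p) {
        double h = 0.0;
        for (double pi : p) {
            if (pi > 0) {
                h -= pi * (Math.log(pi) / Math.log(2));   // log base 2
            }
        }
        return h;
    }

    public static void main(String[] args) {
        double[] fairDie = new double[6];
        java.util.Arrays.fill(fairDie, 1.0 / 6.0);
        System.out.println("fair die: " + entropy(fairDie) + " bits per roll");

        double[] fairCoin = {0.5, 0.5};
        System.out.println("fair coin: " + entropy(fairCoin) + " bits per flip");
    }
}
```

For the fair die this gives log2(6), roughly 2.58 bits per roll, matching the example. A Huffman code for six equally likely outcomes must use whole-bit codewords of length 2 or 3, which is where the "up to 1 bit per symbol" of waste comes from.
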
LZW Algorithm

Lempel-Ziv-Welch. [variant of LZ78]
- Create ST and associate an integer with each useful string.
- When input matches string in ST, output associated integer.

Encoding.
- Find longest string s in ST that is a prefix of remaining part of string to compress.
- Output integer associated with s.
- Add s + x to dictionary, where x is next char in string to compress.

Ex. Dictionary: a, aa, ab, aba, abb, abaa, abaab, abaaa.
- String to be compressed: abaababbb…
- s = abaab, x = a.
- Output integer associated with s; insert abaaba into ST.

LZW Example

[Figure: LZW worked example; not recoverable from this extract]

LZW Implementation

Implementation.
- Use trie to create symbol table on-the-fly.
- Note that prefix of every word is also in ST.

Encode.
- Lookup string suffix in trie.
- Output ST index at bottom.
- Add new node to bottom of trie.

Decode.
- Lookup index in array.
- Output string.
- Insert string + next letter.

[Figure: trie for the example dictionary, including the newly inserted string abaaba]

LZW Encoder: Java Implementation

    public class LZWEncoder {
        public static void main(String[] args) {
            String text = StdIn.readAll();
            StringST
    [listing truncated in the source]

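Since the encoder listing above is cut off, the following is an independent sketch of the encoding steps described earlier, not a reconstruction of the lecture's code. It substitutes a `java.util.HashMap` for the course's trie-based symbol table, returns the codes as a list rather than printing them, and assumes the input uses only characters with code points 0 to 255. The class name `LZWEncoderSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LZWEncoderSketch {
    static List<Integer> encode(String text) {
        // Start with one dictionary entry per single character (0..255).
        Map<String, Integer> st = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            st.put(String.valueOf((char) i), i);
        }
        int nextCode = 256;

        List<Integer> out = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            if (text.charAt(pos) > 255) {
                throw new IllegalArgumentException("character out of range");
            }
            // Find the longest s in st that is a prefix of text[pos..].
            // Every prefix of a dictionary string is also in the
            // dictionary, so extending one character at a time is enough.
            int end = pos + 1;
            while (end < text.length()
                    && st.containsKey(text.substring(pos, end + 1))) {
                end++;
            }
            String s = text.substring(pos, end);
            out.add(st.get(s));                    // output integer for s

            // Add s + x, where x is the next character, if there is one.
            if (end < text.length()) {
                st.put(s + text.charAt(end), nextCode++);
            }
            pos = end;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(encode("abababab"));
    }
}
```

For the input `abababab` this emits code 258 immediately after creating it, which is the situation a decoder has to treat as a special case. A hash map keeps the sketch short; a trie avoids rebuilding substrings on every lookup.
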
LZW Decoder: Java Implementation

    // ST and StdIn are assumed to be helper classes supplied with the
    // course; they are not part of the Java standard library.
    public class LZWDecoder {
        public static void main(String[] args) {
            ST<Integer, String> st = new ST<Integer, String>();
            int i;
            for (i = 0; i < 256; i++) {
                String s = Character.toString((char) i);
                st.put(i, s);
            }

            // In real applications, integers will be encoded in binary.
            int code = StdIn.readInt();
            String prev = st.get(code);
            System.out.print(prev);
            while (!StdIn.isEmpty()) {
                code = StdIn.readInt();
                String s = st.get(code);
                // Special case, e.g., for "ababababab".
                if (i == code) s = prev + prev.charAt(0);
                System.out.print(s);
                st.put(i++, prev + s.charAt(0));
                prev = s;
            }
        }
    }

LZW Implementation Details

What to do when ST gets too large?
- Throw away and start over. [GIF]
- Throw away when not effective. [Unix compress]

LZW in the Real World

Lempel-Ziv and friends.
- LZ77.
- LZ78.
- LZW.
- Deflate = LZ77 variant + Huffman.

PNG: LZ77.
Winzip, gzip, jar: deflate.
Unix compress: LZW.
Pkzip: LZW + Shannon-Fano.
GIF, TIFF, V.42bis modem: LZW.
Google: zlib, which is based on deflate.

Slide annotations:
- never expands a file
- LZ77 not patented, hence widely used in open source
- LZW patent #4,558,302 expired in US on June 20, 2003; some versions copyrighted

Summary

Lossless compression.
- Simple approaches. [RLE]
- Represent fixed length symbols with variable length codes. [Huffman]
- Represent variable length symbols with fixed length codes. [LZW]

Lossy compression. [not covered in this course]
- JPEG, MPEG, MP3.
- FFT, wavelets, fractals, SVD, …

Limits on compression. Shannon entropy.

Theoretical limits closely match what we can achieve in practice!