03: Huffman Coding
CSCI 6990.002: Data Compression
University of New Orleans, Department of Computer Science
Vassil Roussev
Shannon-Fano Coding
- The first code based on Shannon's theory
- Suboptimal (it took a graduate student to fix it!)
- Algorithm:
  1. Start with empty codes
  2. Compute frequency statistics for all symbols
  3. Order the symbols in the set by frequency
  4. Split the set so as to minimize the difference between the two halves' total frequencies
  5. Add '0' to the codes in the first set and '1' to the rest
  6. Recursively assign the rest of the code bits for the two subsets, until the sets cannot be split further
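The steps above can be sketched as a short recursive procedure (an illustrative sketch, not the lecture's own code). The frequencies are the slides' example, a..f with counts 9, 8, 6, 5, 4, 2; ties in the split point are broken toward the later split, which happens to reproduce the slides' codes.

```python
# Illustrative Shannon-Fano sketch; frequencies from the lecture's example.

def shannon_fano(symbols):
    """symbols: list of (symbol, frequency) sorted by descending frequency."""
    if len(symbols) <= 1:
        return {s: "" for s, _ in symbols}
    total = sum(f for _, f in symbols)
    # Find the split minimizing the difference between the halves' totals.
    running, best_i, best_diff = 0, 1, float("inf")
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(2 * running - total)          # |left total - right total|
        if diff <= best_diff:                    # '<=': prefer the later split
            best_i, best_diff = i, diff
    codes = {}
    for s, c in shannon_fano(symbols[:best_i]).items():
        codes[s] = "0" + c                       # first set gets '0'
    for s, c in shannon_fano(symbols[best_i:]).items():
        codes[s] = "1" + c                       # second set gets '1'
    return codes

freqs = [("a", 9), ("b", 8), ("c", 6), ("d", 5), ("e", 4), ("f", 2)]
codes = shannon_fano(freqs)   # a=00, b=01, c=100, d=101, e=110, f=111
```

A different tie-break (e.g. preferring the earlier split) gives a different but equally valid Shannon-Fano code with the same average length here.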
Shannon-Fano Coding (2)-(10): worked example
[Figures: recursive splitting of six symbols, reconstructed below]

Symbols, ordered by frequency: a:9, b:8, c:6, d:5, e:4, f:2 (total 34)

1. First split: {a, b} (17) vs. {c, d, e, f} (17); a, b get prefix '0', the rest get prefix '1'
2. Split {a, b}: a = 00, b = 01
3. Split {c, d, e, f}: {c, d} (11) get prefix '10'; {e, f} (6) get prefix '11'
4. Split {c, d}: c = 100, d = 101
5. Split {e, f}: e = 110, f = 111

Final codes: a = 00, b = 01, c = 100, d = 101, e = 110, f = 111
Optimum Prefix Codes
Key observations on optimal codes:
1. Symbols that occur more frequently will have shorter codewords
2. The two least frequent symbols will have codewords of the same length

Proofs:
1. Assume the opposite: swapping the two codewords would shorten the average length, so the code is clearly sub-optimal
2. Assume the opposite:
   - Let X, Y be the least frequent symbols and |code(X)| = k, |code(Y)| = k+1
   - By unique decodability (UD), code(X) cannot be a prefix of code(Y); also, all other codes are shorter
   - ⇒ Dropping the last bit of code(Y) would generate a new, shorter, uniquely decodable code
   - This contradicts the optimality assumption
Huffman Coding
- David Huffman (1951)
  - Grad student of Robert M. Fano (MIT)
  - Invented the code for a term paper(!)
- Explained by example:

Letter  Code  Probability
a             0.2
b             0.4
c             0.2
d             0.1
e             0.1
Huffman Coding by Example
Init: Create a set out of each letter: {a}:0.2, {b}:0.4, {c}:0.2, {d}:0.1, {e}:0.1
Each iteration then repeats four steps:
1. Sort sets according to probability (lowest first)
2. Insert prefix '1' into the codes of the top (least probable) set's letters
3. Insert prefix '0' into the codes of the second set's letters
4. Merge the top two sets

Iteration 1: sort -> {d}:0.1, {e}:0.1, {a}:0.2, {c}:0.2, {b}:0.4
  prefix '1' -> d = 1; prefix '0' -> e = 0; merge -> {de}:0.2
Iteration 2: sort -> {de}:0.2, {a}:0.2, {c}:0.2, {b}:0.4
  prefix '1' -> d = 11, e = 10; prefix '0' -> a = 0; merge -> {dea}:0.4
Iteration 3: sort -> {c}:0.2, {dea}:0.4, {b}:0.4
  prefix '1' -> c = 1; prefix '0' -> a = 00, d = 011, e = 010; merge -> {cdea}:0.6
Iteration 4: sort -> {b}:0.4, {cdea}:0.6
  prefix '1' -> b = 1; prefix '0' -> a = 000, c = 01, d = 0011, e = 0010
  merge -> {abcde}:1.0 -> done

Final codes: a = 000, b = 1, c = 01, d = 0011, e = 0010
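The four-step merge procedure above transcribes directly into Python (an illustrative sketch, not the lecture's own code). Python's stable sort happens to break the probability ties the same way the slides do, so the codes match exactly; other tie-breaks yield different but equally optimal codes.

```python
# Set-merging Huffman procedure, following the example's four steps.

def huffman(probs):
    """probs: dict symbol -> probability; returns dict symbol -> codeword."""
    codes = {s: "" for s in probs}
    sets = [([s], p) for s, p in probs.items()]      # Init: one set per letter
    while len(sets) > 1:
        sets.sort(key=lambda sp: sp[1])              # 1. sort, lowest first
        (top, p1), (second, p2) = sets[0], sets[1]
        for s in top:                                # 2. prefix '1' on top set
            codes[s] = "1" + codes[s]
        for s in second:                             # 3. prefix '0' on second
            codes[s] = "0" + codes[s]
        sets = [(top + second, p1 + p2)] + sets[2:]  # 4. merge the top two
    return codes

probs = {"a": 0.2, "b": 0.4, "c": 0.2, "d": 0.1, "e": 0.1}
codes = huffman(probs)   # a=000, b=1, c=01, d=0011, e=0010
```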
Example Summary
- Average code length:
  l = 0.4x1 + 0.2x2 + 0.2x3 + 0.1x4 + 0.1x4 = 2.2 bits/symbol
- Entropy:
  H = -Σs=a..e P(s) log2 P(s) = 2.122 bits/symbol
- Redundancy:
  l - H = 0.078 bits/symbol
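These summary numbers are easy to verify (probabilities and codes are the example's; the check itself is mine):

```python
# Verify average length, entropy, and redundancy for the example.
from math import log2

probs = {"a": 0.2, "b": 0.4, "c": 0.2, "d": 0.1, "e": 0.1}
codes = {"a": "000", "b": "1", "c": "01", "d": "0011", "e": "0010"}

avg_len = sum(probs[s] * len(codes[s]) for s in probs)   # 2.2 bits/symbol
entropy = -sum(p * log2(p) for p in probs.values())      # ~2.122 bits/symbol
redundancy = avg_len - entropy                           # ~0.078 bits/symbol
```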
Huffman Tree
[Figure: the code viewed as a binary tree; left edges labeled 0, right edges labeled 1]
- Leaves: a (0.2), b (0.4), c (0.2), d (0.1), e (0.1)
- Codes read off the root-to-leaf paths: a = 000, b = 1, c = 01, d = 0011, e = 0010
Building a Huffman Tree
[Figures: the tree built bottom-up, one merge per step]
1. Merge the two least probable leaves: d (0.1) + e (0.1) -> internal node 0.2; d = 1, e = 0
2. Merge 0.2 + a (0.2) -> 0.4; d = 11, e = 10, a = 0
3. Merge 0.4 + c (0.2) -> 0.6; a = 00, d = 011, e = 010, c = 1
4. Merge 0.6 + b (0.4) -> root 1.0; final codes: a = 000, b = 1, c = 01, d = 0011, e = 0010
An Alternative Huffman Tree
[Figures: the same probabilities, with the 0.2 tie broken differently]
1. Merge d (0.1) + e (0.1) -> 0.2; d = 1, e = 0
2. Merge a (0.2) + c (0.2) -> 0.4; a = 0, c = 1
3. Merge {de} (0.2) + {ac} (0.4) -> 0.6; a = 00, c = 01, d = 11, e = 10
4. Merge {acde} (0.6) + b (0.4) -> root 1.0
Final codes: a = 000, b = 1, c = 001, d = 011, e = 010

Average code length:
l = 0.4x1 + (0.2 + 0.2 + 0.1 + 0.1)x3 = 2.2 bits/symbol
Yet Another Tree
[Figure: a third optimal tree for the same source]
Codes: a = 00, b = 11, c = 01, d = 101, e = 100

Average code length:
l = 0.4x2 + (0.2 + 0.2)x2 + (0.1 + 0.1)x3 = 2.2 bits/symbol
Min Variance Huffman Trees
- Huffman codes are not unique
  - All versions yield the same average length
- Which one should we choose?
  - The one with the minimum variance in codeword lengths, i.e., the minimum-height tree
- Why?
  - It ensures the least amount of variability in the encoded stream
- How to achieve it?
  - During sorting, break ties by placing smaller sets higher
  - Alternatively, place newly merged sets as low as possible
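The three trees in this example all average 2.2 bits/symbol but differ in length variance, which is easy to check directly (probabilities and codes are the slides'; the computation is mine):

```python
# Compare codeword-length variance across three optimal trees for the
# same source; averages are identical, variances are not.
probs = {"a": 0.2, "b": 0.4, "c": 0.2, "d": 0.1, "e": 0.1}
trees = {
    "original":     {"a": "000", "b": "1",  "c": "01", "d": "0011", "e": "0010"},
    "alternative":  {"a": "000", "b": "1",  "c": "001", "d": "011", "e": "010"},
    "min-variance": {"a": "00",  "b": "11", "c": "01",  "d": "101", "e": "100"},
}

def stats(codes):
    avg = sum(probs[s] * len(c) for s, c in codes.items())
    var = sum(probs[s] * (len(c) - avg) ** 2 for s, c in codes.items())
    return avg, var

avg_o, var_o = stats(trees["original"])      # 2.2, 1.36
avg_a, var_a = stats(trees["alternative"])   # 2.2, 0.96
avg_m, var_m = stats(trees["min-variance"])  # 2.2, 0.16
```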
Extended Huffman Codes
- Consider the source:
  A = {a, b, c}, P(a) = 0.8, P(b) = 0.02, P(c) = 0.18
  H = 0.816 bits/symbol
- Huffman code: a = 0, b = 11, c = 10
  l = 1.2 bits/symbol
  Redundancy = 0.384 bits/symbol (47% of the entropy!)
- Q: Could we do better?
Extended Huffman Codes (2)
- Idea: encode sequences of two letters rather than single letters

Letter  Probability  Code
aa      0.6400       0
ab      0.0160       10101
ac      0.1440       11
ba      0.0160       101000
bb      0.0004       10100101
bc      0.0036       1010011
ca      0.1440       100
cb      0.0036       10100100
cc      0.0324       1011

l = 1.7228/2 = 0.8614 bits/symbol; redundancy = l - H ≈ 0.046 bits/symbol
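The table can be checked mechanically (the table's codes and source probabilities are the slides'; the verification is mine): the pair probabilities follow from independence, the listed codewords form a prefix code, and the average drops close to the entropy.

```python
# Verify the two-letter extension: bits per original symbol, prefix-freeness,
# and residual redundancy.
from itertools import combinations
from math import log2

p = {"a": 0.8, "b": 0.02, "c": 0.18}
pair_codes = {
    "aa": "0",      "ab": "10101",    "ac": "11",
    "ba": "101000", "bb": "10100101", "bc": "1010011",
    "ca": "100",    "cb": "10100100", "cc": "1011",
}
pair_probs = {s: p[s[0]] * p[s[1]] for s in pair_codes}   # i.i.d. pairs

avg_per_pair = sum(pair_probs[s] * len(c) for s, c in pair_codes.items())
avg_per_symbol = avg_per_pair / 2                         # ~0.8614

# A prefix code: no codeword may be a prefix of another.
assert all(not c1.startswith(c2) and not c2.startswith(c1)
           for c1, c2 in combinations(pair_codes.values(), 2))

H = -sum(q * log2(q) for q in p.values())                 # ~0.8157 bits/symbol
redundancy = avg_per_symbol - H                           # ~0.046 bits/symbol
```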
Extended Huffman Codes (3)
- The idea can be extended further
  - In theory, considering longer sequences keeps improving the coding
  - In reality, the exponential growth of the alphabet makes this impractical
- Consider all possible n^m sequences (we did 3^2)
  - E.g., for length-3 ASCII sequences: 256^3 = 2^24 = 16M
  - Most sequences would have zero frequency
  - ⇒ Other methods are needed
Adaptive Huffman Coding
- Problem:
  - Huffman requires probability estimates
  - This could turn it into a two-pass procedure:
    1. Collect statistics, generate codewords
    2. Perform actual encoding
  - Not practical in many situations, e.g., compressing network transmissions
- Theoretical solution:
  - Start with equal probabilities
  - Based on the statistics of the first k symbols (k = 1, 2, ...), regenerate the codewords and encode the (k+1)-st symbol
  - Too expensive in practice
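The "theoretical solution" above can be sketched as follows (my illustration, not the slides' code): rebuild a Huffman code from the counts seen so far before every symbol. It works, and the encoder and decoder stay in sync because both perform the identical rebuild; the per-symbol rebuild is exactly the cost that makes it impractical.

```python
# Naive adaptive scheme: regenerate the code before each symbol.
import heapq
from collections import Counter

def huffman_codes(counts):
    """Heap-based Huffman over {symbol: count}; a sequence number breaks
    ties so the heap never has to compare symbol lists."""
    heap = [(f, i, [s]) for i, (s, f) in enumerate(counts.items())]
    heapq.heapify(heap)
    nxt = len(heap)
    codes = {s: "" for s in counts}
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1:
            codes[s] = "0" + codes[s]
        for s in s2:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (f1 + f2, nxt, s1 + s2))
        nxt += 1
    return codes

def adaptive_encode(message, alphabet):
    counts = Counter({s: 1 for s in alphabet})   # start from equal counts
    out = []
    for sym in message:
        out.append(huffman_codes(counts)[sym])   # regenerate, then encode
        counts[sym] += 1                         # update statistics
    return "".join(out)

def adaptive_decode(bits, alphabet, n_symbols):
    counts = Counter({s: 1 for s in alphabet})   # mirror the encoder exactly
    out, i = [], 0
    for _ in range(n_symbols):
        inv = {c: s for s, c in huffman_codes(counts).items()}
        j = i + 1
        while bits[i:j] not in inv:              # prefix code: unique match
            j += 1
        sym = inv[bits[i:j]]
        out.append(sym)
        counts[sym] += 1
        i = j
    return "".join(out)

bits = adaptive_encode("abracadabra", "abcdr")
```

Real adaptive Huffman (next slides) avoids the full rebuild by incrementally updating the tree instead.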
Adaptive Huffman Coding (2)
- Basic idea:
  - Alphabet A = {a1, ..., an}
  - Pick fixed default binary codes for all symbols
  - Start with an empty Huffman tree
  - Repeat until done:
    - Read symbol s from the source
    - If s is NYT (Not Yet Transmitted):
      - Send NYT, default(s)
      - Update the tree (and keep it Huffman)
    - Else:
      - Send the codeword for s
      - Update the tree
- Notes:
  - Codewords change as a function of symbol frequencies
  - Encoder and decoder follow the same procedure, so they stay in sync
Adaptive Huffman Tree
- The tree has at most 2n - 1 nodes
- Node attributes:
  - symbol, left, right, parent, sibling, weight
  - If x_k is a leaf, then weight(x_k) = frequency of symbol(x_k)
  - Else weight(x_k) = weight(left(x_k)) + weight(right(x_k))
  - id, assigned as follows:
    - If weight(x_1) ≤ weight(x_2) ≤ ... ≤ weight(x_2n-1), then id(x_1) ≤ id(x_2) ≤ ... ≤ id(x_2n-1)
    - Also, parent(x_2k-1) = parent(x_2k), for 1 ≤ k ≤ n
    - This is the sibling property
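The node attributes above suggest a layout like the following (field and function names are my assumptions, not the lecture's), including a direct check of the sibling property:

```python
# Minimal node layout for the adaptive tree plus a sibling-property check.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    id: int
    weight: int
    symbol: Optional[str] = None            # set only for leaves
    parent: Optional["Node"] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def has_sibling_property(nodes):
    """nodes: all nodes except the root, sorted by increasing id.
    Weights must be nondecreasing, and nodes 2k-1, 2k must share a parent."""
    weights_ok = all(a.weight <= b.weight for a, b in zip(nodes, nodes[1:]))
    pairs_ok = all(nodes[i].parent is nodes[i + 1].parent
                   for i in range(0, len(nodes) - 1, 2))
    return weights_ok and pairs_ok

# Tiny two-leaf tree: ids ordered by weight, both leaves share the root.
root = Node(id=3, weight=2)
leaf_a = Node(id=1, weight=1, symbol="a", parent=root)
leaf_b = Node(id=2, weight=1, symbol="b", parent=root)
root.left, root.right = leaf_a, leaf_b
ok = has_sibling_property([leaf_a, leaf_b])   # True
```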
Updating the Tree
- Assign id(root) = 2n-1, weight(NYT) = 0
- Start with an NYT node
- Whenever a new symbol is seen, a new node is formed by splitting the NYT
- Maintaining the sibling property: whenever node x is updated, repeat:
  - If weight(x)