3.11. Greedy Algorithm

3.11.1. Huffman Coding

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters, lengths of the assigned codes are based on the frequencies of corresponding characters. The most frequent character gets the smallest code and the least frequent character gets the largest code. The variable-length codes assigned to input characters are Prefix Codes, means the codes (bit sequences) are assigned in such a way that the code assigned to one character is not the prefix of code assigned to any other character. This is how Huffman Coding makes sure that there is no ambiguity when decoding the generated bitstream.

Give a character to frequency map, print out the Huffman coding of each character.

input: {{'a',4}, {'b',5}, {'c':7}, {'d':1}}

              root+(17)
              /    \
           +(10)   c(7)
          /    \
      +(5)       b(5)
     /    \
  a(4)   d(1)


0 -> left child, 1 -> right child, we can make a rule that left child is greater than right child, or vice versa.

map:
  a: 000
  b: 01
  c: 1
  d: 001
encoding: input: abc  , output: 000011
decoding: input: 00011, output: abc

There are two key points of huffman-coding: 1. Priority queue. 2. Greedy algorithm to decode.

Code Block 3.11.1 Huffman coding can be implemented with a minheap
 1namespace ns_huffman_coding {
 2struct hnode { // A Huffman tree node
 3  char data;
 4  unsigned freq;
 5  hnode *left, *right;
 6  hnode(char data, unsigned freq) {
 7    left = right = NULL,this->data = data,this->freq = freq;
 8  }
 9};
10struct compare {
11bool operator()(hnode *l, hnode *r) {return l->freq > r->freq;}
12};
13
14// Time complexity: O(nlogn) where n is the number of unique characters.
15hnode* HuffmanCodes(map<char,int>& counter) {
16  priority_queue<hnode *, vector<hnode *>, compare> pq;
17  for (auto& [c, n]: counter)
18    pq.push(new hnode(c, n));
19  while (pq.size() > 1) {
20    auto right = pq.top(); pq.pop();
21    auto left = pq.top(); pq.pop();
22    auto top = new hnode('\0', left->freq + right->freq);
23    top->left = left, top->right = right;
24    pq.push(top);
25  }
26  return pq.top();
27}
28
29void dfs_get_path(hnode *root, string str, map<char,string> &m) {
30  if (!root) return;
31  if (root->data != '\0'){
32    m[root->data] = str;
33    return;
34  }
35  dfs_get_path(root->left, str + "0", m);
36  dfs_get_path(root->right, str + "1", m);
37}
38map<char,string> get_encoding(hnode* root){
39  map<char,string> m;
40  dfs_get_path(root, "", m);
41  return m;
42}
43
44string encode(string o, map<char,string>& m){
45  string res;
46  for(char c: o) res+=m[c];
47  return res;
48}
49
50string decode(string d, map<char,string>& m){
51  map<string, char> decode_map;
52  for(auto& [k,v]: m) decode_map[v]=k;
53  string res;
54  while(!d.empty()){
55    for(int i=1; i<=d.size(); i++){
56      string x = d.substr(0, i);
57      if (decode_map.count(x)){
58        res += decode_map[x];
59        d = d.substr(i);
60        break;
61      }
62    }
63  }
64  return res;
65}
66}

3.11.2. Maximum Length of Pair Chain