2.4.2. Index Heap

Index heap is also called Indexed Priority Queue. In an Indexed Priority Queue, data is stored just like standard priority queue and along with this, the value of a data can be updated using its key. It is called “indexed” because a hash map can be used to store the index in container using the key of key-value pair input as the key of hash map. This comes handy while implementing Dijkstra’s Algorithm using min-heap. It can also be used in any other program where it is required to put key-value pairs in priority queue and along with it you need to update value of keys with push or pop function .

2.4.2.1. TopK Traded Stocks

You work in an electronic exchange. Throughout the day, you receive ticks (trading data) which consists of ticker symbol and its traded volume of stocks. Eg: {name: UBER, volume: 20}

What data structure will you maintain if you have to tell top k ticker traded by volume at runtime throughout the day. What’s the most efficient solution that you can think of?


This is an index heap question. To avoid using more than one map, we create a new class to store stock information.

1struct stock {
2  string ticker;  // ticker is unique for each stock
3  int volume;
4};

Then we implement this max heap binary tree:

 1class index_heap {
 2  vector<stock> stocks;  // heap core data, which is basically a graph array
 3  unordered_map<string, int> tk2idx;  // ticker symbol to base array index
 4
 5  void _swap(int x, int y) {
 6    swap(tk2idx[stocks[x].ticker], tk2idx[stocks[y].ticker]);
 7    swap(stocks[x], stocks[y]);
 8  }
 9  void _sift_up(int idx) {  // TC: O(LogN)
10    if (0 < idx and idx < stocks.size()) {
11      int pidx = PAR(idx);
12      if (pidx >= 0 and stocks[pidx].volume < stocks[idx].volume)
13        _swap(idx, pidx), _sift_up(pidx);
14    }
15  }
16  void _sift_down(int idx) {  // TC: O(LogN)
17    if (idx >= stocks.size()) return;
18    int lchild = LEF(idx), rchild = RIG(idx);
19    if (stocks.size() - 1 >= rchild) {
20      if (stocks[lchild].volume > stocks[rchild].volume) {
21        _swap(idx, lchild);
22        _sift_down(lchild);
23      } else {
24        _swap(idx, rchild);
25        _sift_down(rchild);
26      }
27    } else if (stocks.size() - 1 >= lchild) {
28      _swap(idx, lchild);
29      _sift_down(lchild);
30    } else
31      return;
32  }
33
34 public:
35  void push(const stock &s) {  // TC: O(LogN)
36    stocks.push_back(s);
37    tk2idx[s.ticker] = (int)stocks.size() - 1;
38    _sift_up((int)stocks.size() - 1);
39  }
40  void update(const string &t, int new_vol) {  // TC: O(LogN)
41    int vol = stocks[tk2idx[t]].volume;
42    if (new_vol == vol) return;
43    if (new_vol > vol)
44      increase_volume(t, new_vol - vol);
45    else
46      increase_volume(t, vol - new_vol);
47  }
48  void increase_volume(const string &t, int vol) {  // TC: O(LogN)
49    stocks[tk2idx[t]].volume += vol;
50    _sift_up(tk2idx[t]);
51  }
52  void decrease_volume(const string &t, int vol) {  // TC: O(LogN)
53    stocks[tk2idx[t]].volume -= vol;
54    _sift_down(tk2idx[t]);
55  }
56  vector<stock> topK(int k) {  // TC: O(KLogK)
57    vector<stock> r;
58    if (stocks.empty()) return r;
59    auto comp = [&](const stock &s1, const stock &s2) {
60      return stocks[tk2idx[s1.ticker]].volume <
61             stocks[tk2idx[s2.ticker]].volume;
62    };
63    priority_queue<stock, vector<stock>, decltype(comp)> tmpQ(comp);  // 💣
64    tmpQ.push(stocks[0]);
65    while (k-- and !tmpQ.empty()) {
66      stock tmp = tmpQ.top();
67      tmpQ.pop();
68      r.push_back(tmp);
69      int l_child = LEF(tk2idx[tmp.ticker]), r_child = RIG(tk2idx[tmp.ticker]);
70      if (l_child < stocks.size()) tmpQ.push(stocks[l_child]);
71      if (r_child < stocks.size()) tmpQ.push(stocks[r_child]);
72    }
73    return r;
74  }
75};
76}  // namespace ns_indexheap

In this problem, we cannot pop the top, so we have to use another extra priority queue to get the topK at runtime. The key idea is to pop top element in the new heap, and add both of its children into the new heap. tk2idx here is very important to get the base array index from the ticker symbol.

👽 A website logs all the ip address which visits its webpage(streaming data). How to display top 10 ip address at real time?