3.13.8. Questions

3.13.8.1. Longest Consecutive Sequence

Given an unsorted array of integers, find the length of the longest consecutive elements sequence.

For example, given [100, 4, 200, 1, 3, 2], the longest consecutive elements sequence is [1, 2, 3, 4], it will return its length: 4.

The algorithm should run in O(N) complexity.

  • Algo 1: union find

 1struct longest_consecutive {
 2  vector<int> bo, sz;
 3  void makeset(int len) {
 4    bo.resize(len), sz.resize(len);
 5    fill(bo.begin(), bo.end(), -1);
 6    fill(sz.begin(), sz.end(), 1);
 7  }
 8  int findbo(int x) {  // find boss, x is index
 9    return bo[x] == -1 ? x : (bo[x] = findbo(bo[x]));
10  }
11  void merge(int x, int y) {  // team merge by index
12    x = findbo(x), y = findbo(y);
13    if (x == y) return;
14    if (sz[x] > sz[y]) {      // make y always big boss
15      swap(x, y);
16    }                           
17    bo[x] = y;                  // assign big as small's boss
18    sz[y] += sz[x], sz[x] = 0;  // winner takes all, loser gets nothing
19  }
20  int run(vector<int>& nums) {
21    if (nums.empty()) return 0;
22    int L = nums.size(), i = 0;
23    makeset(L);
24    unordered_map<int, int> um;
25    for (int v : nums) {
26      if (um.count(v) == 0) {  // for corner case
27        um[v] = i;
28        if (um.count(v + 1)) merge(i, um[v + 1]);
29        if (um.count(v - 1)) merge(i, um[v - 1]);
30      }
31      i++;
32    }
33    return *max_element(sz.begin(), sz.end());
34  }
  • Algo 2: hashtable

Since the O(n) algorithm is required, sorting is obviously not possible, so it is natural to use a hash table. Store all the numbers in the sequence in an unordered_set. For any number A[i] in the sequence, we can immediately know A through the set Whether [i] 1 and A[i]-1 are also in the sequence. If so, continue to find A[i] 2 and A[i]-2, and so on, until the entire continuous sequence is found. In order to avoid scanning When A[i]-1 is reached, the sequence is searched again, and the searched number is removed from the set at the same time as each search. Until the set is empty, all consecutive sequence searches end. Complexity: Since each Numbers are inserted into the set only once, and removed once, so the algorithm is O(n).

 1struct longest_consecutive_v2 {
 2  int run(vector<int>& num) {
 3    if (num.empty()) return 0;
 4    unordered_set<int> ht(num.begin(), num.end());
 5    int max_len = 1;
 6    for (int i = 0; i < num.size(); i++) {
 7      if (ht.empty()) break;
 8      int cur_len = 0, cur_num = num[i];
 9      while (ht.count(cur_num)) {  // search in right direction
10        ht.erase(cur_num);
11        cur_len++, cur_num++;
12      }
13      cur_num = num[i] - 1;
14      while (ht.count(cur_num)) {  // search in left direction
15        ht.erase(cur_num);
16        cur_len++, cur_num--;
17      }
18      max_len = max(max_len, cur_len);
19    }
20    return max_len;
21  }
22};

3.13.8.2. Number of Connected Components in an Undirected Graph

Given n nodes labeled from 0 to n - 1 and a list of undirected edges (each edge is a pair of nodes), write a function to find the number of connected components in an undirected graph.

Example 1:

0          3
|          |
1 --- 2    4

Given n = 5 and edges = [[0, 1], [1, 2], [3, 4]], return 2.

Example 2:

0           4
|           |
1 --- 2 --- 3
Given n = 5 and edges = [[0, 1], [1, 2], [2, 3], [3, 4]], return 1.

3.13.8.3. Groups of Strings

You are given a 0-indexed array of strings words. Each string consists of lowercase English letters only. No letter occurs more than once in any string of words.

Two strings s1 and s2 are said to be connected if the set of letters of s2 can be obtained from the set of letters of s1 by any one of the following operations:

- Adding exactly one letter to the set of the letters of s1.
- Deleting exactly one letter from the set of the letters of s1.
- Replacing exactly one letter from the set of the letters of s1 with any letter, including itself.

The array words can be divided into one or more non-intersecting groups. A string belongs to a group if any one of the following is true:

- It is connected to at least one other string of the group.
- It is the only string present in the group.

Note that the strings in words should be grouped in such a manner that a string belonging to a group cannot be connected to a string present in any other group. It can be proved that such an arrangement is always unique.

Return an array ans of size 2 where:
- ans[0] is the maximum number of groups words can be divided into, and
- ans[1] is the size of the largest group.

Example 1:

Input: words = ["a","b","ab","cde"]; Output: [2,3]
Explanation:
- words[0] can be used to obtain words[1] (by replacing 'a' with 'b'), and words[2] (by adding 'b'). So words[0] is connected to words[1] and words[2].
- words[1] can be used to obtain words[0] (by replacing 'b' with 'a'), and words[2] (by adding 'a'). So words[1] is connected to words[0] and words[2].
- words[2] can be used to obtain words[0] (by deleting 'b'), and words[1] (by deleting 'a'). So words[2] is connected to words[0] and words[1].
- words[3] is not connected to any string in words.
Thus, words can be divided into 2 groups ["a","b","ab"] and ["cde"]. The size of the largest group is 3.
Example 2:

Input: words = ["a","ab","abc"]; Output: [1,3]
Explanation:
- words[0] is connected to words[1].
- words[1] is connected to words[0] and words[2].
- words[2] is connected to words[1].

Since all strings are connected to each other, they should be grouped together.
Thus, the size of the largest group is 3.

This question is to cluster all 1-edit-distance words in an array, and return the cluster number and the size of the largest cluster.

Code Block 3.13.3 Group of Strings
 1#include <gtest/gtest.h>
 2#include <sein.hpp>
 3
 4namespace ns_group_strings {
 5class SolutionUF {
 6  vector<int> bo;
 7  void union_(int i, int j) {
 8    bo[i]=bo[j]=bo[find(i)]=find(j);
 9  }
10  int find(int i) {
11    while(i!=bo[i]) i=bo[i];
12    return i;
13  }
14  pair<int,int> findMaxGroup() {
15    unordered_map<int,int> mp;
16    int gcnt=0, cnt=0;
17    for (int i=0; i<bo.size(); i++) {
18      gcnt = max(gcnt,++mp[find(i)]);
19      cnt+=(bo[i]==i);
20    }
21    return {cnt, gcnt};
22  }
23public:
24  pair<int,int> groupStrings(vector<string>& words) {
25    int n = words.size(), group=n;
26    bo.resize(n);
27    iota(bo.begin(), bo.end(), 0);
28    unordered_map<int, int> mp; // bit-encoding to index
29    for (int i=0; i<n; i++) {
30      int bw = 0;
31      for (auto& c: words[i]) bw |= 1<<(c-'a');
32      for (int k=0; k<26; k++) {
33        int w = bw | (1<<k);
34        if (mp.count(w) && find(mp[w])!=find(i)) union_(i, mp[w]);
35        mp[w] = i;
36      }
37    }
38    return findMaxGroup();
39  }
40};
41
42class SolutionDFS {
43  int _dfs(int x, unordered_map<int, int>& nums) {
44    int cur = 0;
45    if (auto it = nums.find(x); it != end(nums) && it->second) {
46      cur += it->second;
47      it->second = 0;
48      for (size_t m = 1; m < 1<<26; m <<= 1) {
49        cur += _dfs(x^m, nums);
50        for (size_t m2 = 1; m2 < 1<<26; m2 <<= 1)
51          if (x&m && !(x&m2))
52            cur += _dfs(x^m^m2, nums);
53      }
54    }
55    return cur;
56  }
57public:
58  pair<int,int> groupStrings(vector<string>& words) {
59    unordered_map<int, int> nums;
60    for (const auto& w : words) {
61      int x = accumulate(begin(w), end(w), 0,
62                         [](int s, char c) { return s|(1<<(c-'a')); });
63      nums[x]++;
64    }
65    int ngroups = 0, largest = 0;
66    for (auto [x, count] : nums) {
67      if (count) {
68        ngroups++;
69        largest = max(largest, _dfs(x, nums));
70      }
71    }
72    return {ngroups, largest};
73  }
74};
75}  // namespace ns_group_strings
76
77using namespace ns_group_strings;
78TEST(_dp_group_strings, a) {
79  vector<string> vs={"a","b","ab","cde"};
80  cout << (SolutionDFS().groupStrings(vs).first==2) << endl;
81  cout << (SolutionUF().groupStrings(vs).second==3) << endl;
82}

3.13.8.4. Percentile Query Tree

[Google L5]Coding round interviewer showed up and asked me to do the question without saying any extra word:

Given a stream of prices from transactions: 79.20, 20.05, 96.82, ...
Implement 2 methods:
1) insert(price)
2) query(percentile) - e.g.: query(0.2) should give a price that 20% of prices is lower than it, 80% of prices should be higher.

Code Block 3.13.4 An order statistic tree
 1struct node {
 2  double val = 0;
 3  int card = 1;  // 💣 cardinality
 4  node *l = nullptr, *r = nullptr;
 5  explicit node(double n) : val(n) {}
 6};
 7
 8node* add(node* x, double n) {
 9  if (x == nullptr) return new node(n);
10  if (n > x->val)
11    x->r = add(x->r, n);
12  else
13    x->l = add(x->l, n);
14  x->card++;
15  return x;
16}
17
18node* order(node* nd, int rank) {
19  if (rank <= 0) return nullptr;
20  if (nd->l and nd->l->card >= rank) return order(nd->l, rank);
21  if (nd->l and nd->l->card + 1 == rank) return nd;
22  if (nd->r) return order(nd->r, rank - (nd->l ? nd->l->card : 0) - 1);
23  return nd;
24}
25
26struct order_statistics_tree {
27  node* root = nullptr;
28  explicit order_statistics_tree(node* r) : root(r) {}
29  node* insert(double numeric) { return add(root, numeric); }
30  double query(double pr) {
31    int o = root->card * pr;
32    node* r = order(root, o);
33    return r->val;
34  }
35};

Refer to Section 3.13.5, Section 2.6.1.4

3.13.8.5. Maze Generation

Generate a maze in a matrix.

The number in the cell is encoded by the edge in a binary number format: 0b[left][bottom][right][up]
    0
  +---+
3 | 1 | 1
  +---+
    2

So the number has a range of [0-15], from 0b0000 to 0b1111.

0001     0010     0011    0100     0101     0110      0111    1000     1001     1010        1111
+---+    +   +    +---+   +   +    +---+    +   +    +---+    +   +    +---+    +   +       +---+
  1        2 |      3 |     4        5        6 |      7 |    | 8      | 9      | 10|  ...  | 15|
+   +    +   +    +   +   +---+    +---+    +---+    +---+    +   +    +   +    +   +       +---+

The following are two example of maze generation:

+---+---+---+---+---+---+               +---+---+---+---+---+---+                  +---+---+---+---+---+---+
  7 | 15| 15| 15| 15| 15|      start -->  5   1   5   5   5   3 |         start -->  1   5   3 | 11| 9   3 |
+---+---+---+---+---+---+               +---+   +---+---+---+   +                  +   +---+   +   +   +   +
| 15| 15| 15| 15| 15| 15|               | 11| 12  5   5   3 | 10|                  | 8   3 | 12  4   2 | 10|
+---+---+---+---+---+---+    =>         +   +---+---+---+   +   +        or        +   +   +---+---+   +   +
| 15| 15| 15| 15| 15| 15|               | 8   5   1   5   6 | 14|                  | 10| 14| 11| 13  2 | 14|
+---+---+---+---+---+---+               +   +---+   +---+---+---+                  +   +---+   +---+   +---+
| 15| 15| 15| 15| 15| 13                | 12  7 | 12  5   5   5   -->end           | 12  7 | 12  5   4   5  -->end
+---+---+---+---+---+---+               +---+---+---+---+---+---+                  +---+---+---+---+---+---+

Desired Properties
  • None of the boundary is deleted (except at 'start' and 'end').
  • Every cell is reachable from every other cell.
  • There are no cycles – no cell can reach itself by a path unless it retraces some part of the path.

A sample API is:

vector<vector<int>> maze_generation(int row, int column) {...}

There are many ways to generate a maze. Something in common is we need randomized function since it is a maze generation.

Code Block 3.13.5 Maze Generation by Union Find
 1#include <gtest/gtest.h>
 2#include <sein.hpp>
 3
 4namespace ns_maze {
 5static vector<pair<int, int>> DIRS = {{-1, 0}, {1,  0}, {0,  -1}, {0,  1}};
 6struct uf {
 7  vector<int> bo;
 8  uf(int sz) {
 9    bo.resize(sz);
10    fill(bo.begin(), bo.end(), -1);
11  }
12  int _find(int x) {
13    if (bo[x] == -1) return x;
14    return bo[x] = _find(bo[x]);
15  }
16  void _union(int x, int y) {
17    int bx = _find(x), by = _find(y);
18    if (bx == by) return;
19    bo[bx] = by;  // 💣
20  }
21  bool connected(int x, int y) { return _find(x) == _find(y); }
22  int group_size() {
23    return count_if(bo.begin(), bo.end(), [](int x) { return x == -1; });
24  }
25};
26
27void track(vector<vector<int>> &g, int i, int j, int di, int dj) {
28  if (di == 0 and dj == 1) g[i][j] &= 0b1101, g[i][j + 1] &= 0b0111; // right
29  else if (di == 0 and dj == -1) g[i][j] &= 0b0111, g[i][j - 1] &= 0b1101; // left
30  else if (di == 1 and dj == 0) g[i][j] &= 0b1011, g[i + 1][j] &= 0b1110; // down
31  else if (di == -1 and dj == 0) g[i][j] &= 0b1110, g[i - 1][j] &= 0b1011; // up
32}
33
34vector<vector<int>> maze_generation_UF(int m, int n) {
35  vector<vector<int>> g(m, vector<int>(n, 0b1111));
36  g[0][0] = 7, g[m - 1][n - 1] = 13;
37  uf ufo(m * n);
38  srand(0xdeadbeef);
39
40  while (ufo.group_size() > 1) {  // 💣
41    for (int i = 0; i < m; i++) {
42      for (int j = 0; j < n; j++) {
43        auto [di, dj] = DIRS[rand() % 4];  // 💣
44        int x = i * n + j, y = (i + di) * n + j + dj;
45        if (i + di < 0 or i + di >= m or j + dj < 0 or j + dj >= n) continue;
46        if (!ufo.connected(x, y)) {
47          ufo._union(x, y);  // connect two cells
48          track(g, i, j, di, dj);
49        }
50      }
51    }
52  }
53  return g;
54}
55
56struct MazeGenDFS {
57  int R, C;
58  vector<bool> visited;  // 💣
59  void dfs(vector<vector<int>> &g, int i, int j) {
60    visited[i * C + j] = true;  // 💣
61    if (i == R - 1 and j == C - 1) return;  // 💣
62    vector<pair<int, int>> dirs = DIRS;
63    random_shuffle(dirs.begin(), dirs.end());  // 💣
64    for (auto [di, dj]: dirs) {
65      int ni = i + di, nj = j + dj;
66      if (ni >= 0 and ni < R and nj >= 0 and nj < C and !visited[ni * C + nj]) {
67        track(g, i, j, di, dj);
68        dfs(g, ni, nj);
69      }
70    }
71  }
72  vector<vector<int>> maze_generation_DFS(int m, int n) {
73    vector<vector<int>> g(m, vector<int>(n, 0b1111));
74    visited.resize(m * n);
75    g[0][0] = 7, g[m - 1][n - 1] = 13; // negative means visited
76    srand(0xdeadbeef);
77    R = m, C = n;
78    dfs(g, 0, 0);  // 💣
79    return g;
80  }
81};
82}
83using namespace ns_maze;
84TEST(_maze, a) {
85  vector<vector<int>> expected = {{1,  5,  7, 11}, {10, 11, 9, 2}, {8,  0,  6, 10}, {14, 12, 7, 12}};
86  auto vvi = maze_generation_UF(4, 4);
87  EXPECT_EQ(vvi, expected);
88  vvi = MazeGenDFS().maze_generation_DFS(4, 4);
89  expected = { { 3, 13, 5, 3 }, { 12, 5, 5, 2 }, { 9, 5, 5, 6 }, { 12, 5, 5, 5 } };
90  EXPECT_EQ(vvi, expected);
91}

3.13.8.6. Multi-language Translator

Design a multi-language translator where there are two APIs:

add(String inputLanguage, String inputWord, String outputLanguage, String outputWord)
get(String inputLanguage, String inputWord, String outputLanguage)

For example:
    add("English", "hello", "Spanish", "hola")
    add("Spanish", "hola", "French", "Bon jour")
    add("French", "Bon jour", "Chinese", "nihao")
Then:
    get("English", "hello", "French") => "Bon jour"
    get("Spanish", "hola", "Chinese") => "nihao"

An intuitive way to solve this problem is graph traversal like BFS, DFS, but union find can be used to solve it in a more efficient way.

Code Block 3.13.6 Multilang Translator using Union Find
 1#include <gtest/gtest.h>
 2#include <sein.hpp>
 3
 4namespace ns_uf_multilang_translator{
 5struct union_find {
 6  vector<int> bo, sz;
 7  union_find(int z=0) {
 8    bo.resize(z), sz.resize(z);
 9    iota(bo.begin(), bo.end(), 0); // 💣
10    fill(sz.begin(), sz.end(), 1);
11  }
12  int find(int i) { // find final big boss' index
13    int j=i;
14    while (bo[i] != i){
15      bo[i]=bo[bo[i]], i = bo[i]; // path halving
16    }
17    return bo[j]=i; // path compression
18  }
19  bool union_(int x, int y) {
20    int bx = find(x), by = find(y);
21    if (bx == by) return false;
22    if (sz[bx] < sz[by])  // merge by rank, winner takes all
23      bo[bx] = by, sz[by] += sz[bx], sz[bx] = 0;
24    else
25      bo[by] = bx, sz[bx] += sz[by], sz[by] = 0;
26    return true;
27  }
28  void add(int n){
29    while(n--) bo.push_back(bo.size()), sz.push_back(1);
30  }
31};
32struct Solution{
33  unordered_map<string,map<string, int>> m1; // lang => word => UF index
34  unordered_map<int, map<string, string>> m2;  // final boss => lang => word
35  union_find ufo;
36
37  void add(string l1, string w1, string l2, string w2){
38    m1[l1], m1[l2];
39    if (!m1[l1].count(w1)) m1[l1][w1] = ufo.bo.size(), ufo.add(1);
40    if (!m1[l2].count(w2)) m1[l2][w2] = ufo.bo.size(), ufo.add(1);
41    ufo.union_(m1[l1][w1], m1[l2][w2]);  // 💣
42    auto& m2_ = m2[ufo.find(m1[l1][w1])];
43    m2_[l1]=w1, m2_[l2]=w2;
44  }
45
46  string get(string l1, string w1, string l2){
47    int idx = m1[l1][w1];
48    return m2[ufo.find(idx)][l2];
49  }
50};
51}
52using namespace ns_uf_multilang_translator;
53TEST(_uf_multilang_translator, a) {
54  Solution sln;
55  sln.add("english","hello","spanish","hola");
56  sln.add("spanish","hola", "french","bon jour");
57  sln.add("french","bon jour", "chinese","nihao");
58  sln.add("english","good", "chinese","hao");
59  sln.add("chinese","hao", "polish","dobry");
60  sln.add("polish","dobry", "hindi","अच्छा");
61  EXPECT_EQ(sln.get("spanish","hola", "chinese"), "nihao");
62  EXPECT_EQ(sln.get("english","hello", "chinese"), "nihao");
63  EXPECT_EQ(sln.get("english","good", "hindi"), "अच्छा");
64  EXPECT_EQ(sln.get("english","good", "polish"), "dobry");
65}