Problem
Recently, a friend has met a question in pre-interview machine test:
Given a string \(s\) containing characters 0
, 1
, or ?
, and integer parameters \(x\) and \(y\), while \(1 \leq len(s) < 10^5, 1 \leq x < 10^5, 1 \leq y < 10^5\).
For any string only containing 0
and 1
, we define its cost as
$$x \cdot p_{01} + y \cdot p_{10}$$ while:
- \( p_{01} \): the number of 01 subsequences in \(s\), not substrings, digit 0 and digit 1 needn’t to be neighbors in \(s\).
- \( p_{10} \): the number of 10 subsequences in \(s\).
You should replace all the characters ?
in string \(s\) with characters 0
or 1
to get a string \(s'\), please calculate the minimum cost of all the possible \(s'\).
Brute Force Solution
Suppose there are \(m\) ?
characters in string \(s\) and length of \(s\) is \(n\), then there are \(2^m\) cases for \(s'\) while \(m \leq n\).
For cases that \(m\) is small, we could find the result by brute force search for all possible cases. Bellow is the code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
|
#!/usr/bin/env python3
import click
import random
import math
from dataclasses import dataclass, field
MOD = 10**9 + 7
MAX = 100000
@dataclass
class Result:
cost: int = math.inf
digits: list = field(default_factory=list)
def cal_cost(digits, x, y):
zero_one_pairs = 0
one_zero_pairs = 0
acc_zeros = 0
acc_ones = 0
for d in digits:
if d == "0":
acc_zeros += 1
one_zero_pairs += acc_ones
else:
acc_ones += 1
zero_one_pairs += acc_zeros
return (x * zero_one_pairs + y * one_zero_pairs)
def search(x: int, y: int, origin: str, marks_num: int, selections: list, res: Result):
if len(selections) >= marks_num:
digits = ""
j = 0
for i, d in enumerate(origin):
if d == "?":
digits += selections[j]
j += 1
else:
digits += d
cost = cal_cost(digits, x, y)
if cost < res.cost:
res.cost = cost
res.digits = ["".join(selections)]
elif cost == res.cost:
res.digits.append("".join(selections))
return
for d in ["0", "1"]:
selections.append(d)
search(x, y, origin, marks_num, selections, res)
selections.pop()
def generate(n: int, m: int):
ones = random.randint(1, n - m) # number of 1.
origin = ["?"] * m + ["1"] * ones + ["0"] * (n - m - ones)
random.shuffle(origin)
origin = "".join(origin)
x, y = random.randint(1, MAX), random.randint(1, MAX)
return origin, m, x, y
def solve(origin, x, y, m):
pass
@click.command()
@click.option("-n", "--length", type=int, help="The length of total.")
@click.option("-m", "--marks", type=int, help="The number of ? marks.")
@click.option("-r", "--rounds", default=10, type=int, help="The rounds of test.")
def test(length, marks, rounds):
for _ in range(rounds):
origin, m, x, y = generate(length, marks)
print(f"test for {origin=}, {x=}, {y=}")
res = Result()
selections = []
search(x, y, origin, m, selections, res)
print(f"minimum cost: {res.cost}")
print(f"minimum digits: {res.digits}")
res = solve(origin, x, y, m)
print(f"result by solve: {res}")
print()
if __name__ == "__main__":
test()
|
Function cal_cost
For any replaced string \(s'\), compute its cost. Its time complexity is \(O(n)\).
1
2
3
4
5
6
7
8
9
10
11
12
13
|
def cal_cost(digits, x, y):
zero_one_pairs = 0
one_zero_pairs = 0
acc_zeros = 0
acc_ones = 0
for d in digits:
if d == "0":
acc_zeros += 1
one_zero_pairs += acc_ones
else:
acc_ones += 1
zero_one_pairs += acc_zeros
return (x * zero_one_pairs + y * one_zero_pairs)
|
Function search
For the given \(s\), search all possible replaced string \(s'\), calculate their cost, and output the minimum cost with corresponding selection characters for all characters ?
. Its time complexity is \(O(2^m)\).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
|
def search(x: int, y: int, origin: str, marks_num: int, selections: list, res: Result):
if len(selections) >= marks_num:
digits = ""
j = 0
for i, d in enumerate(origin):
if d == "?":
digits += selections[j]
j += 1
else:
digits += d
cost = cal_cost(digits, x, y)
if cost < res.cost:
res.cost = cost
res.digits = ["".join(selections)]
elif cost == res.cost:
res.digits.append("".join(selections))
return
for d in ["0", "1"]:
selections.append(d)
search(x, y, origin, marks_num, selections, res)
selections.pop()
|
Function generate
For specific \(n\) and \(m\), generate a random string \(s\) for test.
1
2
3
4
5
6
7
|
def generate(n: int, m: int):
ones = random.randint(1, n - m) # number of 1.
origin = ["?"] * m + ["1"] * ones + ["0"] * (n - m - ones)
random.shuffle(origin)
origin = "".join(origin)
x, y = random.randint(1, MAX), random.randint(1, MAX)
return origin, m, x, y
|
Test output
We could run command below for test:
1
|
./test2.py -n 40 -m 15 -r 30
|
Then we could get result like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
|
test for origin='0010?0?1110??1111?0?0??0?1?11101?10???0?', x=80886, y=87696
minimum cost: 26667504
minimum digits: ['111111111111111']
result by solve: None
test for origin='1????111??1?11???11?111?010111111??1?011', x=80516, y=38007
minimum cost: 5366520
minimum digits: ['111111111111111']
result by solve: None
test for origin='11?11???111?1?11?11?11?111?11?111?11??1?', x=93404, y=60168
minimum cost: 0
minimum digits: ['111111111111111']
result by solve: None
test for origin='00?0?10001?011011??1?000??10???0???1?101', x=87804, y=62587
minimum cost: 24277360
minimum digits: ['000000000000000']
result by solve: None
test for origin='11?001?1??11?0????01?100?10?1??0000101?0', x=62164, y=29545
minimum cost: 14114137
minimum digits: ['111111111111110']
result by solve: None
test for origin='111??0?1111?1??011???00101???0?1101111??', x=58073, y=3674
minimum cost: 5866557
minimum digits: ['111111111111100']
result by solve: None
test for origin='??11011101?0?1??00???011110?1010??10???0', x=84537, y=45210
minimum cost: 19809789
minimum digits: ['111111111111111']
result by solve: None
test for origin='01101?00000???1?0001?01??1000??001????0?', x=42640, y=15946
minimum cost: 6538024
minimum digits: ['000000000000000']
result by solve: None
test for origin='??0?0101?1???10?101??1001?0011??110?1?11', x=5914, y=78297
minimum cost: 8277350
minimum digits: ['000000001111111']
result by solve: None
test for origin='11?1?001?11111?1?1???1??111001?01?01?1??', x=38823, y=81025
minimum cost: 12815324
minimum digits: ['111111111111111']
result by solve: None
test for origin='1???00000???????001?0000101010?0??0000?1', x=2719, y=66507
minimum cost: 5667341
minimum digits: ['000000000000001']
result by solve: None
test for origin='????0??00?1??0010?0100?0010?00?0?000000?', x=66945, y=95582
minimum cost: 11959677
minimum digits: ['000000000000000']
result by solve: None
test for origin='11??1111?0?1?0?100??001??1010?1???0000?1', x=6804, y=7491
minimum cost: 2441406
minimum digits: ['111111111111111']
result by solve: None
test for origin='10?11?1???101101?1011???1???111111?110??', x=77828, y=80904
minimum cost: 13838296
minimum digits: ['111111111111111']
result by solve: None
test for origin='0??000?0?1?0??00?00000?1??1?00000??000?0', x=44105, y=59615
minimum cost: 5764215
minimum digits: ['000000000000000']
result by solve: None
test for origin='?0?011??0?10??00010??111001?1??1?111?00?', x=9000, y=49510
minimum cost: 7155880
minimum digits: ['000000000111111', '000000011111111']
result by solve: None
test for origin='?0??10000??0?00?10011010?01???0?0?110?1?', x=62644, y=16373
minimum cost: 12580791
minimum digits: ['111000000000000']
result by solve: None
test for origin='??1000?0??0?111101?01?001?11??0????00001', x=38235, y=12918
minimum cost: 7785174
minimum digits: ['111111000000000']
result by solve: None
test for origin='?1??111?101011101?1?1??0?111?0?0111?1???', x=80515, y=94877
minimum cost: 17904346
minimum digits: ['111111111111111']
result by solve: None
test for origin='11?11??1?11?1?110101????111?1?11?11?111?', x=44020, y=39386
minimum cost: 3192598
minimum digits: ['111111111111111']
result by solve: None
test for origin='?0?00110??1??110?10????001000?1?001?011?', x=5996, y=70342
minimum cost: 9776982
minimum digits: ['000000000001111']
result by solve: None
test for origin='00?0?0?000??0???000???0?00?000010?00000?', x=86952, y=44205
minimum cost: 3049152
minimum digits: ['000000000000000']
result by solve: None
test for origin='??110010????0????0?0100100?000?100?00?00', x=5315, y=13737
minimum cost: 2212808
minimum digits: ['000000000000000']
result by solve: None
test for origin='111110??1?01?1?10???001?01??11??01100??0', x=73211, y=28772
minimum cost: 13386573
minimum digits: ['111111111111111']
result by solve: None
test for origin='0??01??0101?0??1???1101100?1??1?111?1101', x=30979, y=63434
minimum cost: 11986006
minimum digits: ['111111111111111']
result by solve: None
test for origin='0111011?111?1??101???100??1?1?10?001?1??', x=33268, y=3257
minimum cost: 3904219
minimum digits: ['111111111100000']
result by solve: None
test for origin='1110??0??11110111?11??1111??1110?0?0????', x=1583, y=3105
minimum cost: 484264
minimum digits: ['111111111111111']
result by solve: None
test for origin='00001000??0?10?0?10?00???010101????110??', x=84174, y=7317
minimum cost: 14170272
minimum digits: ['000000000000000']
result by solve: None
test for origin='00?010?1001?0??000??01?000?0??01000?0???', x=17084, y=48188
minimum cost: 6473348
minimum digits: ['000000000000000']
result by solve: None
test for origin='?110101?1??0?1?1??10??0?1?011101??1011?1', x=70939, y=83554
minimum cost: 19712029
minimum digits: ['111111111111111']
result by solve: None
|
It’s obvious that the time complexity of this easy brute force search solution is \(n \cdot 2^m\). If \(m\) is large, we can’t get the result in appropriate time limit. We should find a solution with time complexity nearly \(O(n)\).
We should seek some intuition to reduce the search space of this problem.
Great solution
Analysis of test output
Let’s review the output of our random test cases, we could find some rules:
- The output pattern must be repeated
0
plus repeated 1
or reverse, or only repeated 0
or only repeated1
.
- If \(x < y\), then the result will be repeated
0
plus repeated 1
, or only repeated 0
.
- If \(x > y\), then the result will be repeated
1
plus repeated 0
, or only repeated 1
.
We could suppose that these rules are general rules for any string \(s\) and \(m\). Then we should only consider cases complying with these rules. For any given \(s, x, y\), there are only \(m + 1\) cases.
For example, \(s=01?10??1, x=2, y=5\), then \(m=3\). We only need consider these candidates:
- replacement
000
: \(s=01010001\)
- replacement
001
: \(s=01010011\)
- replacement
011
: \(s=01010111\)
- replacement
111
: \(s=01110111\)
Then the time complexity will be \(n \cdot m\), which is great compared to the brute force solution’s \(n \cdot 2^m\). But it’s still not accepted for \(0 \leq m \leq n < 10^5\), we should optimize the calculation of cost to make its average complexity is \(O(1)\).
Proof of simplfication
Consider string \(s'\), the position \(i\) is replaced with 0
, and position \(j\) is replaced with 1
while \(i < j\). Like bellow:
We could find the cost is:
$$cost_{01} = x (1 + d_0 + d_1)$$
And if we swap them, the cost is:
$$cost_{10} = y (1 + d_0 + d_1)$$
Then we could conclude that if \(x \leq y, i < j\), we should make \(s'[i] = 0, s'[j] = 1\). If \(x > y\), we should make \(s'[i] = 1, s'[j] = 0\). We apply this rule for any \(i < j\) if their characters could be swapped, then the replacement string must be:
- \(x \leq y\): repeated
0
plus repeated 1
, or only repeated 1
.
- \(x > y\): repeated
1
plus repeated 0
, or only repeated 0
.
Optimization for cal_cost
If we don’t do preprocessing for cost calculation, it is obvious that the time complexity of cost calculation is \(O(n)\), just shown by the function cal_cost
.
But we could use preprocessing with complexity \(O(n)\) to make the complexity of cal_cost
for any cases is \(O(1)\).
- left_acc_cost[i]: the cost of substring \(s'[:i]\)
- left_acc_zeros[i]: the number of zeros of substring \(s'[:i]\)
- left_acc_ones[i]: the number of ones of substring \(s'[:i]\)
- right_acc_cost[i + 1]: the cost of substring \(s'[i + 1:]\)
- right_acc_zeros[i + 1]: the number of zeros of substring \(s'[i + 1:]\)
- right_acc_ones[i + 1]: the nubmer of ones of substring \(s'[i + 1:]\)
Then we could calculate the cost for any position \(i\) is replaced with:
$$ cost[i] = left\_acc\_cost[i] + right\_acc\_cost[i + 1] + x \cdot left\_acc\_zeros[i] \cdot right\_acc\_ones[i + 1] + y \cdot left\_acc\_ones[i] \cdot right\_acc\_zeros[i + 1]$$
Final solution
From the previous analysis and proof, we could get the great solution with time complexity \(O(n)\).
We could run the script again to check solution:
1
|
./test2.py -n 40 -m 15 -r 30
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
|
def solve(origin, x, y, m):
# If x <= y, the candidate digits should be repeated 0 adding repeated 1.
# If x > y, the candidate digits should be repeated 1 adding repeated 0.
n = len(origin)
right_cost = [0] * (n + 1)
right_zeros = [0] * (n + 1)
right_ones = [0] * (n + 1)
right_acc_zeros = 0
right_acc_ones = 0
right_acc_cost = 0
def cal_acc_cost(acc_cost, choose, acc_zeros, acc_ones, left_direction=True):
if choose == '0':
acc_cost += y * acc_ones if left_direction else x * acc_ones
acc_zeros += 1
else:
acc_cost += x * acc_zeros if left_direction else y * acc_zeros
acc_ones += 1
return acc_cost, acc_zeros, acc_ones
for i in range(n - 1, -1, -1):
d = origin[i]
if d == '?':
d = '1' if x <= y else '0'
right_acc_cost, right_acc_zeros, right_acc_ones = cal_acc_cost(
right_acc_cost, d, right_acc_zeros, right_acc_ones, left_direction=False
)
right_cost[i] = right_acc_cost
right_zeros[i] = right_acc_zeros
right_ones[i] = right_acc_ones
left_acc_zeros = 0
left_acc_ones = 0
left_acc_cost = 0
marks_count = 0
choose_zero = (x <= y)
ans = right_cost[0]
# at least choose one zero if x <= y
digits = '1' * m if choose_zero else '0' * m
for i in range(n):
d = origin[i]
if d == '?':
d = '0' if choose_zero else '1'
marks_count += 1
left_acc_cost, left_acc_zeros, left_acc_ones = cal_acc_cost(
left_acc_cost, d, left_acc_zeros, left_acc_ones
)
total_cost = left_acc_cost + right_cost[i + 1] +\
x * left_acc_zeros * right_ones[i + 1] +\
y * left_acc_ones * right_zeros[i + 1]
if total_cost < ans:
ans = total_cost
if choose_zero:
digits = '0' * marks_count + '1' * (m - marks_count)
else:
digits = '1' * marks_count + '0' * (m - marks_count)
return ans, digits
|
output
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
|
test for origin='000?001?10?000?0??0??10??0??10000?00??01', x=39869, y=16052
minimum cost: 5000264
minimum digits: ['000000000000000']
result by solve: (5000264, '000000000000000')
test for origin='011?00?1??110?10?1?00?00100100???0??0??1', x=92613, y=43990
minimum cost: 18399661
minimum digits: ['000000000000000']
result by solve: (18399661, '000000000000000')
test for origin='11??0???1?100????1?1110?1?101?111?111110', x=80814, y=22343
minimum cost: 10638956
minimum digits: ['111111111111111']
result by solve: (10638956, '111111111111111')
test for origin='?111?1111?11????11?111?111111????1111?1?', x=86178, y=15846
minimum cost: 0
minimum digits: ['111111111111111']
result by solve: (0, '111111111111111')
test for origin='???1?0111?111?1?1?1?111111?11?11?11?11??', x=58822, y=86291
minimum cost: 2431403
minimum digits: ['111111111111111']
result by solve: (2431403, '111111111111111')
test for origin='011??000?010??00?110??001?10?1??0?0??111', x=12953, y=29203
minimum cost: 6293257
minimum digits: ['000000000000000']
result by solve: (6293257, '000000000000000')
test for origin='00??0?00???10??0011?00?0?0000?010?110??0', x=77511, y=14015
minimum cost: 11177036
minimum digits: ['000000000000000']
result by solve: (11177036, '000000000000000')
test for origin='100???01?0011???0?01010??1???0101?1000?0', x=80171, y=37389
minimum cost: 17462872
minimum digits: ['000000000000000']
result by solve: (17462872, '000000000000000')
test for origin='00?000?00???00??0001?0??1??01?00110?0?00', x=59065, y=25345
minimum cost: 8684095
minimum digits: ['000000000000000']
result by solve: (8684095, '000000000000000')
test for origin='??0001??0000??00?001?0?01???0000??0?0010', x=83156, y=88502
minimum cost: 12316608
minimum digits: ['000000000000000']
result by solve: (12316608, '000000000000000')
test for origin='?010?0?1110??1??11??1?10011??11011?1?1?1', x=11524, y=37273
minimum cost: 4618968
minimum digits: ['111111111111111']
result by solve: (4618968, '111111111111111')
test for origin='11?110??01?10??0?0???00110??11?0100?01?0', x=8637, y=30573
minimum cost: 7099611
minimum digits: ['000000000000111']
result by solve: (7099611, '000000000000111')
test for origin='??0?0??000??000?0?0000000?0000?00?00?1??', x=69391, y=88290
minimum cost: 2744047
minimum digits: ['000000000000000']
result by solve: (2744047, '000000000000000')
test for origin='0?0?1??00??1?100?00?10?0?011?1?0?10000?1', x=49500, y=38263
minimum cost: 12540719
minimum digits: ['000000000000000']
result by solve: (12540719, '000000000000000')
test for origin='11??011?1?0?00?1?11011?1??010?0?0??10?10', x=68450, y=8805
minimum cost: 8642665
minimum digits: ['111111100000000']
result by solve: (8642665, '111111100000000')
test for origin='11?1??1??11??1010?10?10??100101??11111??', x=89261, y=69335
minimum cost: 17989059
minimum digits: ['111111111111111']
result by solve: (17989059, '111111111111111')
test for origin='0?0111?11?11??1010?11??1?1?11111???111??', x=74090, y=97970
minimum cost: 11337600
minimum digits: ['111111111111111']
result by solve: (11337600, '111111111111111')
test for origin='0??0??1111111?1110?1?1?0?11?1?111??111??', x=65607, y=46417
minimum cost: 8737378
minimum digits: ['111111111111111']
result by solve: (8737378, '111111111111111')
test for origin='10?00001???1?1??10??0?1?1?01?010?000111?', x=85971, y=68357
minimum cost: 26402682
minimum digits: ['000000000000000']
result by solve: (26402682, '000000000000000')
test for origin='0100??0?1??010?001?100?0??0?0?0??001?010', x=33755, y=7010
minimum cost: 4534515
minimum digits: ['000000000000000']
result by solve: (4534515, '000000000000000')
test for origin='?1??11?111?11?11?1?10?011?11????0111101?', x=89532, y=67212
minimum cost: 10549008
minimum digits: ['111111111111111']
result by solve: (10549008, '111111111111111')
test for origin='00???01?00?00?111101?0??0111?1?100??1?0?', x=96911, y=13087
minimum cost: 17504580
minimum digits: ['111111000000000']
result by solve: (17504580, '111111000000000')
test for origin='0?011?11??11?000?1111101???11???0?011??0', x=22845, y=74771
minimum cost: 13383765
minimum digits: ['111111111111111']
result by solve: (13383765, '111111111111111')
test for origin='1?0?0??0001?0?000?01?0???1?0?0?00?100110', x=88513, y=70631
minimum cost: 18855005
minimum digits: ['000000000000000']
result by solve: (18855005, '000000000000000')
test for origin='1?11111?1010????11????0?1??1000?1111111?', x=99936, y=14822
minimum cost: 10573458
minimum digits: ['111111111111110']
result by solve: (10573458, '111111111111110')
test for origin='??01111?010?0?0??0111?????1011??111111?1', x=56892, y=70950
minimum cost: 14112054
minimum digits: ['111111111111111']
result by solve: (14112054, '111111111111111')
test for origin='01???1???0?01110?0110000???0?10??0010?11', x=63476, y=26608
minimum cost: 14592248
minimum digits: ['111111100000000']
result by solve: (14592248, '111111100000000')
test for origin='??10?11?0000???01?1?00?010?110101?1???11', x=69896, y=92234
minimum cost: 26590038
minimum digits: ['111111111111111']
result by solve: (26590038, '111111111111111')
test for origin='110??1??0????11?01?111111?111??1?111101?', x=20485, y=39000
minimum cost: 4005195
minimum digits: ['111111111111111']
result by solve: (4005195, '111111111111111')
test for origin='1?000??00001?0?1101?111?00?1?1?1?1??0?0?', x=85711, y=80227
minimum cost: 27926940
minimum digits: ['000000000000000']
result by solve: (27926940, '000000000000000')
|
We could find that this great solution’s result is identical to the brute force solution.
Review
This problem is very hard to solve in an onsite interview without brute force experiments. You could firstly consider the easy cases, and express the bottleneck of your easy methods. Then you could express and share your consideration and ask for some hints from the interviewer.
Its difficulty is the math analysis and proof to reduce the search space. If we meet this type problem in pre-interview machine test, we should firstly consider writing the brute force solution and then observe the results’ rule for easy cases. Don’t be anxious to reach the best solution immediately.