## Topcoder SRM 551 Div-II-500 tutorial

I didn’t participate in the SRM, but solved this problem later. What follows is strictly a personal account of my attempt to this. I tried to be elaborate to make this accessible to newcomers. Hoping people facing difficulty in solving this problem may find this post helpful. My blog revives with this post!

The problem

Excluding the unnecessary details, the problem statement reduces to:

Given a string chocolates[] and an integer maxSwapsand you are allowed to swap any two adjacent elements of the string. Return length of the longest substring of identical characters achievable by no more than maxSwaps swaps.

First thoughts

Observing the sample cases, I devised a trivial one: AAABCAA. In this case BC is sandwiched between two segments of A’s. With 4 swaps, we can move BC to the right, and get the string AAAAABC. Also, 4 is the optimal number of swaps required to get a longer substring of identical characters from the given string.

Up to this point observation confirmed that, joining segments of identical characters is the only way to produce longer segments. But, how to join them with minimum possible swaps? For trivial cases like AAABCAA, solution is quite obvious—moving all sandwiched characters to either left or right whose length is shorter, a greedy approach. But, what about cases with more than two segments of identical characters, like AAABCABBACAA? What kind of complication do they introduce? Still no idea. Once I am with an idea how to do this, the rest is simple—for all possible $O(n^2)$ substrings of chocolates[] flanked by segments of identical characters, find the minimum number of swaps (swap) needed to take all characters not identical to flanks’ out, if swap $\leq$ minSwaps take the maximum of the lengths among all such substrings  of identical characters obtained.

More analysis

Then, I cast closer look at the more general case: AAABCABBACAA. Taking B,C,B,B, and C out of this produces substring AAAAAAA of leghth 7. I found the following steps to get this by visual inspection:

(1)  AAABCABBACAA

(2)  BAAACABBAAAC

(3)  BCAAAABAAABC

(4)  BCAAAAAAABBC

This case is flanked by AAA and AA from left and right respectively. In step 1, B can be moved either to the left with 3 swaps or with 8 swaps to the right. Moreover, whenever we move B to the left it takes 3 swaps and to right it can reduce to 4 swaps(as there are 4 A’s in the right). So, it will be optimal to move B to the left in this step. Similar reasoning applies for moving C to the left in the same step. Other steps are evident from this step except whenever we encounter an A we add it to left or right flank which meets it first. So, it looks like successively moving characters between flanks out and keeping A’s if we encounter one may be the solution—indication of a greedy strategy.

First solution

In order to facilitate further discussion let’s define some notation.

Denote length of the leftmost segment of identical characters as lc, similarly rightmost segment’s as rc. Also, denote index of the character adjacent to the leftmost segment as left, similarly rightmost’s adjacent as right (all illustrated in the figure above).

Each character between left and right inclusive can move either left or right. The crucial point is: characters at indices left and right must be moved before any other characters between them. Here’s why: take an index ind such that, leftind < right (shown in the figure above). It’d take exactly lc + ind − left swaps if we moved ind to the left before moving characters at indices left through ind − 1, but there’s a chance we can move it with less than lc + ind − left moves if we have moved them beforehand. Similar fact can be proved for moving ind to right. These observations resolved into this greedy solution: if we keep moving left and right out iteratively with their shortest distances out, hence shrinking difference between left and right to 0, we end up with the desired string.

Shortest distance out for left is min(lcright − left + rc), symmetrically for right it is min(rcright − left + lc). In each iteration, we keep moving left and right out according to the rules described above until left > right.

As I said before, trying all possible beginning and ending positions for legitimate strings provides a $O(n^3)$ solution—that is, time is not an issue at all as $n \leq 50$. As this approach seemed to be promising, I rushed into coding. Following C++ code implements this idea:

class ColorfulChocolates
{
public:
int maximumSpread( string chocolates, int maxSwaps )
{
int ln = chocolates.size(), mx = 0;

for(int i = 0; i < ln; i++)
{
for(int j = i; j < ln; j++)
{
if(chocolates[i] == chocolates[j])
{
int left = i, right = j;
int lc = 0, rc = 0, swap = 0;

while(left <= right)
{
while(left <= right && chocolates[left] == chocolates[i])
{
lc++;
left++;
}

while(left <= right && chocolates[right] == chocolates[i])
{
rc++;
right--;
}

if(left <= right && chocolates[left] != chocolates[i])
{
int m = right - left + rc;

if(lc < m)
{
left++;
swap += lc;
}
else
{
left++;
swap += m;
}
}

if(left <= right && chocolates[left] != chocolates[i])
{
int m = right - left + lc;

if(rc < m)
{
right--;
swap += rc;
}
else
{
right--;
swap += m;
}
}
}

if(swap <= maxSwaps)
{
mx = max(mx, lc + rc);
}
}
}
}

return mx;
}
};


Mistake!

Unfortunately, this solution fails on case 4. As is my way in these situations, I thought my approach to the problem was wrong; the real approach may be totally different. But, a div II 500 is unlikely to be something too different. With this hope, I decided to stick to the current idea. As case 4 is rather long, fixing it by inspection won’t be wise. Rather, I chose to find faults in my reasoning.

Correct solution

I found a flaw after careful examination. Actually, my proof of the greedy choice was incomplete—careful readers might have already noticed. As the code shows, in each iteration I am moving left and right together. I didn’t prove explicitly whether this is always optimal. In fact, it is not. At any iteration, if lc > rc and we keep moving right, there’s a chance right − left + rc becomes less than lc. So, it is better not moving left right now. For right situation is symmetric. My solution didn’t address this issue.

I found a simple case illustrating this flaw: AAABCDBA and maxSwaps = 4. The output should be 4 (AAAABCDB), but mine returns 3. What happens in this case is: at the very first step shortest distance out for left B is 3, but if CDB were moved to the right beforehand with 3 swaps(AAABACDB), then this B could be moved to the right with just 1 swap. Consequently, we could get AAAABCDB for 4 swaps.

Changing the highlighted block(lines 31 to 61) in the previous code to the following fixed the error and got the system test passed! I didn’t need to fix case 4 explicitly.

if(left <= right && chocolates[left] != chocolates[i] && lc < rc)
{
left++;
swap += lc;
}
else if(left <= right && chocolates[right] != chocolates[i])
{
right--;
swap += rc;
}