How to solve the valid parentheses problem in Java?

In this post, we are going to solve the problem of “valid parentheses” or “matching parentheses”. We will begin this parenthetical journey by looking at multiple examples in order to get a thorough understanding of the problem along with the solution. This will be followed by a technical analysis where all corner cases will be considered. We will then move on to write the code in Java where I will be explaining all the important parts of the code. As a final step, we will refactor the code which will make it easier to read, understand and maintain the code.

What is this problem about ?

In the matching parentheses problem, we are given an input string consisting of opening and closing brackets. The opening brackets can be any one of the following – ‘(‘ , ‘{‘ or ‘[‘ . You must have figured out the closing brackets, it could be any one of – ‘)’ , ‘}’ or ‘]’ . Given an input string with a combination of opening and closing brackets, we need to find out if the string has matching parentheses. There are 2 conditions for the input string to be valid –

  1. Every opening bracket must have a closing bracket of the same type.
  2. The opening and closing order must match.
Valid and invalid examples of matching parentheses

Barring the last example, the valid and invalid examples are pretty easy to spot. For the last example, the matching parentheses are shown with the same color to make it easier to understand. For the invalid one, notice the arrows,the order is not matching. Note that the spaces between the parentheses have been added only for readability.

Technical analysis

If the input string is small, it is pretty easy to match a parenthesis with your eyes irrespective of whether you look at the string from the left or the right. The moment, the strings get bigger, it becomes difficult to match them. Well, what will help you is if you start looking from the inside towards the outside. Once you start looking at the string from inside out, you are able to spot the matching ones, and then just take it virtually off from the string. Does that help you ? I think it does. I want you to remember that, it becomes easier to start from the inside, then move towards the outside and then simply knock off the matching ones so that you have fewer ones to match!

Thinking from a technical perspective

We have an input string and the output must indicate if the input is either valid or invalid. We can start writing the code with a method which takes a String as an input and returns a boolean.

public boolean isValid(String s) {
...
}

That was easy, let’s move on to the actual logic now. Every opening bracket must be matched by a closing bracket of the same type and the order must be maintained.

If the input is {}, we might think of the following – let’s begin at the first character and if the next character is a closing bracket, let’s check the previous index for an opening bracket. Well, that might work for {}, there are only 2 characters in the string. But for longer strings like {{{[]}}}, this logic could get pretty messy since there would be too many forward and backward movements in the string. Do you remember the inside out visualisation ? Well, there is no center point to start from the inside and even if we could start approximately somewhere close to the middle, we need to chop the matching brackets off. Manipulating the same input string over and over again is not a good idea either, it would not be efficient. In that case, should we copy the original string into some temporary storage ?

The input string itself is composed of an array of characters. We need some kind of a storage where we iterate through the input string and store every opening bracket until we find a closing bracket, a closing bracket of any type. Let’s consider the simplest data structure, an array. Let’s create a new array and use that to store each character until we encounter a closing bracket. If the input string is {{{[]}}}, the array could be filled up as shown below-

Using an array to find matching brackets

When we encounter the first closing bracket, ‘]’ , what should we do ? We need to check for a matching opening bracket , ‘[‘. It should be the latest opening bracket that was added into the array since the order has to be maintained, notice value at array[3]. This is also giving us an inside out visualization.

Should we add the closing bracket into the array as well ? Well, we don’t need to .Think about it, once we find a closing bracket in the input string, we simply match it with the opening bracket stored in the array, but with the one at the latest index. It looks like this might work, but what happens next ? We need to remove the latest item from the array once we find a matching element and adjust the index. After this step, in the input string , we might have another opening bracket(s) which needs to stored. There is going to be a lot of addition, deletion of elements, adjustment of the index in the array. Rather than implementing the array the way we just described, can we do better ? Just flip the array to the left by 90 degrees. What do you get ?

Does this data structure look familiar ?

Does it look familiar ? As we iterate through the string, we are adding elements to the array. When we find a closing bracket, we want to access the topmost element in the array. Once a match is found, we want to get rid of that element.The elements also seem to be ‘stacked’ over one another. Hello,Mr.Stack !

Using the Deque interface and not the Stack class

Well, we are clearly wanting to achieve the functionality of a stack.There is a Stack class in Java and a good number of blog posts solve this problem using the Stack class. We will not be using the Stack class as it is not recommended by the creators of the Java language itself. The preferred data structure to simulate a stack like behaviour is Deque. Deque is an interface, we will use the ArrayDeque class which implements the Deque interface. Can’t we implement our own stack ? Yes, we can, but for this problem, let’s make use of the ArrayDeque class which gives us that same functionality.

Using ArrayDeque

Since we want to add elements which is the same as pushing elements, and want to remove the topmost element which is equivalent to popping an element, the Deque interface has a corresponding push and a pop method. The ArrayDeque class implements these two methods. Internally, ArrayDeque is backed by an array, a resizable array.

When do we decide if the string is valid or invalid ?

The simplest condition is when we don’t find the same type of matching bracket. If the latest opening bracket does not match the closing bracket in the string, we return false. This takes care of the order of the brackets. An example of this is : { [ } }. The inner brackets don’t match. The stack will have the topmost element as ‘[‘ but we encounter the closing bracket as ‘}’and they don’t match. When there is a mismatch between the type of brackets, we return false.

Let’s take another example now, the input string is: “{ { } ” . The first two characters are added to the stack. When we reach the 3rd character, it is a closing bracket, we pop from the stack and find a matching bracket. But we have reached the end of the string and we still have an element left on the stack, the first opening bracket, ‘{‘. What does it mean if there are elements still left on the stack when we have already iterated through the entire string ? It means that we have not been able to find a matching bracket(s). We need to handle this scenario when we still have elements on the stack but we have iterated through the entire string.This can be done using the isEmpty method on the ArrayDeque class. If there are still some element(s) left on the stack after we have iterated through the complete string, we return false.

What about the input string “{ } ]” ? The first bracket would be pushed on the stack. When we encounter the first closing bracket, we pop from the stack and check if they match. They do, we move on to the last character, ‘]’. It’s a closing bracket, we need to pop from the stack. But, wait a minute, the stack is empty! This means the closing bracket does not have a corresponding opening bracket. We found another condition, if the stack is empty when we encounter a closing bracket, we return false.

If we have not returned false from the three conditions which we have just seen, the input string is valid. What about an empty input ? Well, let’s consider an empty string or a string with just spaces in it to be a valid input.

Summary of the technical analysis
  1. We will be using the ArrayDeque class as a stack to push opening brackets and then pop the topmost element when we encounter a closing bracket in the input string.
  2. If we pop the topmost bracket from the stack and it does not match the corresponding closing bracket in the input string, we return false. This is a mismatch in the type of bracket.
  3. If the current input character is a closing bracket, we need to pop from the stack but if there are no elements on the stack, the stack is empty, we return false. There is a closing bracket but no corresponding opening bracket.
  4. If there are still some element(s) left on the stack even after complete iteration of the string, we return false. This indicates opening brackets without any closing ones.
  5. If the input string is an empty string, we return true.
Time for some actual code

Let’s translate the summary above into code.

Complete code

Let’s take a look at the first cut of our code.

public class ValidParentheses {

    public boolean isValid(String s) {

        if (null == s || s.trim().equals("")) {
            return true;
        }

        if (s.length() % 2 != 0) {
            return false;
        }

        int length = s.length();

        Deque<Character> stack = new ArrayDeque<>(length);

        for (int i = 0; i < length; i++) {
            if (s.charAt(i) == '(' || s.charAt(i) == '[' || s.charAt(i) == '{') {
                stack.push(s.charAt(i));
            } else {
                if (stack.isEmpty()) {
                    return false;
                }
                switch (s.charAt(i)) {

                    case ')':
                        if (stack.pop() != '(') {
                            return false;
                        }
                        break;
                    case ']':
                        if (stack.pop() != '[') {
                            return false;
                        }
                        break;

                    case '}':
                        if (stack.pop() != '{') {
                            return false;
                        }
                        break;
                }
            }
        }

        return stack.isEmpty();
    }
}
  1. Lines 5-7 check for an empty string and return true if the input is an empty string.
  2. Lines 9- 11 check if the string is of odd length, if it is , we return false since it means we have an extra bracket in the string without a match. This early check can save us space and time for the code to execute.
  3. Line 15 creates a deque(as a stack) with appropriate length. In the worst case, we will have all open brackets which will be pushed into the stack.
  4. Lines 18-20 check if the character under consideration is an opening bracket, and if it is one, it is pushed into the stack.
  5. If the character under consideration is not an opening bracket, it must be a closing bracket. Lines 20 -23 check if the stack is empty, it should not be empty since there should be an opening bracket on the stack.
  6. Line 24- 42 check if the brackets are matching. Matching is done between the appropriate pairs, ‘(‘ and ‘)’ , ‘[‘ and ‘]’ and finally between ‘{‘ and ‘}’.
  7. Finally, line 46 checks if the stack is empty. At this point, the stack must be empty since we have iterated through the entire string. If it is not, we have extra brackets which do not have any match.

We have discussed all the scenarios earlier, you can refer to the summary section of technical analysis. The code above works but it is a little bit raw, there is definitely room for refactoring the code. Some of the code can be extracted into private functions and we can also modify the code to use the new ‘switch’ construct.

Complete refactored code
package ds;

import java.util.ArrayDeque;
import java.util.Deque;

public class ValidParentheses {

    public boolean isValid(String s) {

        if (isEmptyString(s)) {
            return true;
        }

        if (isOddLength(s)) {
            return false;
        }

        int length = s.length();

        Deque<Character> stack = new ArrayDeque<>(length);

        for (int i = 0; i < length; i++) {

            if (isOpenBracket(s.charAt(i))) {
                stack.addFirst(s.charAt(i));
            } else if (stack.isEmpty()) {
                return false;
            } else {
                boolean isMatchingClosingBrace = isValidClosingBracket(stack, s.charAt(i));
                if (!isMatchingClosingBrace) {
                    return false;
                }
            }
        }

        return stack.isEmpty();
    }

    private boolean isValidClosingBracket(Deque<Character> stack, char c) {
        return switch (c) {
            case ')' -> (stack.removeFirst() != '(') ? Boolean.FALSE : Boolean.TRUE;
            case ']' -> (stack.removeFirst() != '[') ? Boolean.FALSE : Boolean.TRUE;
            case '}' -> (stack.removeFirst() != '{') ? Boolean.FALSE : Boolean.TRUE;
            default -> throw new IllegalStateException("Unexpected value: " + c);
        };
    }


    private static boolean isOpenBracket(char c) {
        return (c == '(' || c == '[' || c == '{');
    }


    private static boolean isEmptyString(String input) {
        return (null == input || input.trim().equals(""));
    }

    private static boolean isOddLength(String input) {
        return input.length() % 2 != 0;
    }
}

In case you want to read more about the new switch construct which was introduced in Java 14, I have a post which introduces you to the same.

Conclusion

This problem of parentheses matching must be solved in many computing applications. One such application is in compiler construction, where the compiler needs to determine how the different kinds of parentheses match up with each other in the source code.

The overall code is simple. It was quite important to identify and understand the usage of a stack to solve this problem. Once we realized that the stack is the appropriate data structure, it was important to use the ArrayDeque class to implement the functionality of the stack. Towards the end, we refactored the code which is something you should always strive for. Start refactoring once you have the basic working code. Clean code plays a very vital role in the long run !

References

https://cis300.cs.ksu.edu/stacks-queues/paren/

How to find the longest common subsequence in Java?

In this post we are going to solve the problem of finding the longest common subsequence. Driven by examples, we will begin by thoroughly understanding the problem and then look at an efficient technique to get to the solution. With this foundation, we will start writing the code in Java.

I have written a detailed blog on finding the longest common substring in Java. That post gives you a step by step explanation about why we chose a 2 dimensional array to represent the two strings, the technique applied to arrive at the solution and a complete walkthrough of the code. We will be using a very similar approach to solve the longest common subsequence problem but there are a few important differences between the two problems. Hence, in this post, we will concentrate mainly on the differences but I will also highlight some of the common points with respect to the solution between these two problems.

What is the meaning of longest common subsequence?

We know that a string is composed of characters. In the longest common substring problem, we need to find the common characters between the two strings but the characters have to be contiguous. In longest common subsequence problem, it is a subsequence, it does not have to be contiguous, that is the main difference! Let’s look at some examples.

Difference between longest common subsequence and longest common substring

As shown above, in the first example, when the input strings are “instagram” and “instantgrammar”, the first string, “instagram” occurs as a subsequence in the second string and it is also the longest among others. In the 2nd example, the longest common subsequence between “facebook” and “brook” is book. Finally, in the 3rd example, it is the complete string,tiktok, appearing as the longest common subsequence in ticktock.

Logical approach – Applying dynamic programming

We applied dynamic programming to solve the longest common substring problem. We will be using the same technique for this problem as well. The key to dynamic programming is –

  1. Breaking down the main problem into subproblems.
  2. Solving a subproblem by using the solution from the previous subproblem and then storing the solution to the current subproblem.

If we have two strings and reference the characters at a given index using ‘i’ and ‘j’, we need to find an answer to the question- “What is the longest common subsequence at i , j ? “. In the longest common substring, the answer is- The longest common substring at [ i , j ] is: 1 plus the longest common substring at [ i- 1, j- 1 ] , provided the characters at ‘i’ and ‘j’ are equal. Can we say the same about the longest common subsequence ? Let’s find out.

Finding out the longest common subsequence at a given index
Longest common subsequence at [i ,j] is not equal to [i -1 , j -1] when charAt[ i ] != charAt[ j ]

The characters under consideration at i and j as shown above, are ‘t’ and ‘k’ respectively, they are are unequal. Is the length of the longest common subsequence at [ i , j ] equal to the longest common subsequence at [ i – 1, j – 1 ], the previous subproblem ? If this is true, it would give us 2. But, it is not true ! If you look at the input strings, it is 3, the subsequence, “tik”.

Is the longest common subsequence at [ i , j ] , [ i-1 , j ] ?

Here, we see that the right answer is [ i -1 , j ]. At this point, “tik” is the longest common subsequence having a length of 3. But, is that check sufficient ? Let’s take a look at another example.

Longest common subsequence at [i , j] is not at [ i-1 , j -1] when charAt( i ) != charAt( j )

In this example, the characters at [ i , j ] are ‘a’ and ‘b’ respectively, they are not equal. As you can see, the longest common subsequence is not at [ i -1, j -1 ], this will give us 0. Will [ i -1 , j ] give us the right answer ?

Is the longest common subsequence at [ i- 1 , j ] ?

Well, in this example, it is not [ i -1 , j ] either. This will give us 0 but the right answer is 1, it is the character ‘a’. How do we get it ?

Longest common subsequence is at [i , j – 1 ]

The answer is the longest common subsequence at [ i , j-1 ] as shown above. In one case, when the characters at [i , j] were unequal, the longest common subsequence at [ i , j ] was at [ i -1 , j ] and for another one it was [ i , j – 1 ]. How do we decide which one to take ?

In football, the Golden Shoe award is given to the leading goalscorer in league matches from the top division of every European national league. Similarly, to find the longest common subsequence, we just take the maximum of [ i – 1 , j ] and [ i , j – 1 ]. To conclude, if the characters at [ i , j ] are not equal, the longest common subsequence at [ i , j ] is the maximum of { [ i , j -1 ] , [ i -1 , j] }.

Summary of the logical approach
  1. If the characters at [ i , j ] are equal, the answer is 1 + the value at [ i – 1 , j -1 ]. This is the same as longest common substring.
  2. If the characters at [ i , j ] are unequal, the answer is to take the maximum value from: [ i , j -1 ] and [ i -1 , j ].

Let’s take a took at a snapshot of our two dimensional array if we were to apply these 2 steps.

Partial results for longest common subsequence

The input strings are tiktok and ticktock. We compared each character in a row against a column in the longest common substring, we will follow the same procedure in longest common subsequence. When the first characters, ‘t’ in these strings are compared, they are equal. Hence, the value is one added to diagonal, which gives 1. Similarly, the comparison between ‘i’ in tiktok and “ticktock”, gives us 2 , one added to diagonal value.

When we compare the character ‘i’ with ‘c’, they are unequal. Here i = 2 and j = 3. So we need to consider : [ i – 1 ] [ j ] = value at [1] [3] = 1 , and [ i ] [ j – 1 ] = value at [2] [2] = 2. The maximum between 1 and 2 is 2. This is indicated with green color above.We apply the same process for the rest of the table. Note that [ i -1] [ j ] indicates the previous row and the same column, and [ i ] [ j -1 ] is same row but previous column.

Result shows complete table with common subsequence with a maximum value of 6

The table above shows you the result at the end of the procedure. As we can see, the length is 6, it is the length of the string, “tiktok”, which is the common subsequence between the two strings.

Breaking down the code
Logic for the comparisons

The code for creating and initializing the 2-d table remains the same as the code for finding the longest common substring. The major change involved is in the step when the characters under comparison are different.

for (int i = 1; i <= m; i++) {
    for (int j = 1; j <= n; j++) {
        if (x.charAt(i - 1) == y.charAt(j - 1)) {
            lcsTable[i][j] = 1 + lcsTable[i - 1][j - 1];
            if (maxLength < lcsTable[i][j]) {
                maxLength = lcsTable[i][j];
                maxRow = i;
                maxColumn = j;
            }
        } else {
               lcsTable[i][j] = Math.max(lcsTable[i - 1][j], lcsTable[i][j - 1]);
        }
    }
}

If the characters at an index [i , j ] are not equal, the code on line 11 takes care of finding the maximum of 2 values. If you need an in depth explanation of of the other lines in the code, please refer to my earlier post on finding the longest common substring. The code above will fill up the table with appropriate values.

Retrieving the longest common subsequence

Now that we have our table ready, it’s time to get the actual common subsequence with the longest length. The maxLength, maxRow and maxColumn values are maintained for the same purpose. The values in maxRow and maxColumn reflect the cell where the maximum length was found.

To get the actual subsequence,we travel backwards with the same logic we travelled forward. If the characters at a particular index [ i , j ] or [row, column] are equal, we take that character and move backwards to the diagonal. If they are unequal, we decide the maximum between [ i – 1 , j ] and [ i , j – 1 ]. If [ i -1 ] [j] is greater, we go to the previous row else to the previous column. This is simply a reduction in index value of either i or j.

Tracing backward to get the actual longest common subsequence

To help you to trace backwards, start at [ 6, 8] which shows a maximum length of 6. Notice the row and column number along with the character. If they are all in same color, it’s a match and we go to the diagonal. Now, we are at cell [ 5, 7 ]. The previous row is [4 , 7] and previous column is [5 , 6 ], see the length shown in circles ( full and dotted). We take the maximum value(full circle) and decrement either the row number or column number by 1.(In his case it is the column number). This is just the exact opposite of how we filled the value at cell, [ 5 , 7 ]. The code for this track back is-

        StringBuilder longestSubsequence = new StringBuilder();
        while (maxRow >= 1 && maxColumn >= 1) {
            if (x.charAt(maxRow - 1) == y.charAt(maxColumn - 1)) {
                longestSubsequence.append(x.charAt(maxRow - 1));
                maxRow--;
                maxColumn--;
            } else {
                if (lcsTable[maxRow - 1][maxColumn] >= lcsTable[maxRow][maxColumn - 1]) {
                    maxRow--;
                } else {
                    maxColumn--;
                }
            }
        }
Complete code
package ds;

public class LongestCommonSubsequence {

    private static String findLongestCommonSubsequence(String x, String y) {
        int m = x.length();
        int n = y.length();
        int[][] lcsTable = new int[m + 1][n + 1];

        int maxLength = 0;
        int maxRow = 0;
        int maxColumn = 0;

        // init first row with 0
        for (int i = 0; i < m; i++) {
            lcsTable[i][0] = 0;
        }

        // init first col with 0
        for (int j = 0; j < n; j++) {
            lcsTable[0][j] = 0;
        }

        // starting from 1 as row 0 and col 0 filled with 0. <= since it has go up to
        // string length.
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    lcsTable[i][j] = 1 + lcsTable[i - 1][j - 1];
                    if (maxLength < lcsTable[i][j]) {
                        maxLength = lcsTable[i][j];
                        maxRow = i;
                        maxColumn = j;
                    }
                } else {
                    lcsTable[i][j] = Math.max(lcsTable[i - 1][j], lcsTable[i][j - 1]);
                }
            }
        }


        StringBuilder longestSubsequence = new StringBuilder();
        //Remember that row 0 and column 0 indicate absence of one of the strings.
        while (maxRow >= 1 && maxColumn >= 1) {
            if (x.charAt(maxRow - 1) == y.charAt(maxColumn - 1)) {
                longestSubsequence.append(x.charAt(maxRow - 1));
                maxRow--;
                maxColumn--;
            } else {
                if (lcsTable[maxRow - 1][maxColumn] >= lcsTable[maxRow][maxColumn - 1]) {
                    maxRow--;
                } else {
                    maxColumn--;
                }
            }
        }

        return longestSubsequence.reverse().toString();
    }


    public static void main(String[] args) {
        String a = "tiktok";
        String b = "ticktock";
        String longestCommonSubsequence = findLongestCommonSubsequence(a, b);

        System.out.println("The longest common subsequence is " + longestCommonSubsequence);

    }

}

Since we are tracking backwards now, notice that on line 58, we reverse the string.

Conclusion

Using the technique of dynamic programming, we were able to find a solution to this problem. A 2 dimensional array was sufficient to store the length found at every position in the two strings. By addressing a subproblem and storing the results in a table, we were able to determine the result for the next subproblem. The time complexity of finding the length of the longest common subsequence is O(m * n) where ‘m’ and ‘n’ are the lengths of the two strings.

The code for the longest common subsequence and longest common substring have a lot in common but I decided to keep them in separate blogs. I did not want to put everything into one post, I personally think keeping it separate will help you understand both better.Understanding longest common substring first followed by subsequence is a good idea.

How to find the longest palindromic substring in Java?

In this blog we are going to solve the problem of finding the longest palindromic substring. We will begin our journey by understanding the theory behind it and then, guided by examples, we will a take a detailed look at the approach to the solution.Finally,we will write the code in Java. Every corner condition in the code will be explained with appropriate examples so that you get a complete understanding of the code.

Understanding the meaning of longest palindromic substring

The input that is given to us is a string and we need to find the longest palindromic substring. Let’s break this down a little bit.

A substring is part of a string, a continuous sequence of characters in a string. If the input string is “abracadabra”, every single character is a substring. Strings like “ab”,”br”, “da”,”ra”, “abr”,”dabra” etc are all substrings. What is not a substring is “dbra” since it is not a continuous sequence of characters(‘a’ is missing) in abracadabra.

A palindrome is a word, phrase or a sequence of characters that reads the same backward as forward. Some examples of palindromes are kayak, rotor, or the names Hannah and Ava. You read it in the forward or backward direction and you get the same sequence of characters. Having understood the meaning of a substring and a palindrome, let’s understand the meaning of a palindromic substring.

In abracadabra, the highlighted string, “aca”, is a substring. It is also a palindrome and hence a palindromic substring. Another example is the substring,abba, in the string, dabba. By the way, dabba is an Indian-style tiffin box. Having understood the meaning of palindromic substring,we are left with finding the longest among them. Let’s jump on to an example right away.

Multiple palindromic substrings

As you can see in the above string, 3 palindromes have been highlighted. The first one, shown in red color is “ABA”, it has a length of 3. Then we have the palindromic substring, “ABCDEDCBA”, which is of length 9. Finally, we have the 3rd palindromic substring, “1234567887654321” which has a length of 16, which is the largest palindromic substring. With this understanding, let’s go ahead and analyse the approach to solve this problem.

Logical and technical analysis of the solution

A palindrome reads the same forwards as backwards. Let’s consider the strings rotor, dabba, bob and otto. Other than the fact that they are palindromes, what do you see?

Middle point in any palindrome

A very important observation is the fact that the center of a palindrome is either :

  1. A single character in the middle and the rest of characters are a mirror image on either side of that character.
  2. Two similar characters in the middle and then the rest of the characters are mirror images.

Given a string, we can look at it and then decide if it is case one or case two. But from the perspective of writing code, we will need to handle these 2 cases. I urge you to read and understand these 2 cases again since it is a very important factor to understand the solution from a theoretical and technical point of view.

This can occur anywhere in the string, it could be the first few characters in the string, it could be the last few characters, in the middle or elsewhere.

A quick overview of indexes in a string

The index in a string starts from 0. If the total length of the string is ‘n’, the index ends at n-1. I know you already knew this but a quick refresher always helps.

String index starts from 0 and ends at n-1 for a string of length ‘n’

In every image and example so far, have you observed a dotted box around the center of the palindrome? It’s either a single character or 2 similar characters. I want you to imagine the case with 2 similar characters in the middle as a single one. Just remember this, we will revisit this soon. Now, let’s look at the procedure to get to the solution considering the 2 cases about the center of the character.

Logical steps to get to the answer

We know that around the center of a string, which is a palindrome, the characters on the left and right must be a mirror image. If they are not, it is not a palindrome.

General procedure

We assume that each character in a string is a central point of a potential palindrome. With that as our reference, we compare the character on the left side with the right side. If they are equal, we continue checking by moving further towards the left and right. If they are unequal, we move to next character which will again be considered as a potential central reference.We need to keep on repeating this procedure until we reach the end of the string. At each step, if we find a potential palindrome, we need to maintain the length so that at the end of it , we can find out the one with maximum length. This will be the overall procedure.

The case of two characters in the middle

Do you remember the 2 scenarios about the character(s) at the center ? The first case has a single character in the middle. The 2nd case has 2 characters in the middle and here, the characters will be next to each other. How do we decide which one to take ?

For every character under consideration, we check for a palindrome considering both cases but take the maximum length of the two. Remember, we can visually see and determine it, but both cases needs to be dealt with in the code. These 2 cases need to be part of the general procedure described earlier.

A visual representation of the procedure
Procedure to find the palindromic substring up to ‘b’

Notice that for each index, we are checking the two cases and then moving to the next character. The execution continues below for i = 2.

Procedure continues with i =2 and centre as ‘bb’

Some of the important things to notice about this procedure are the following:

  1. We iterate through the string only once.
  2. When we move the towards the left and the right after considering a center point, we need to ensure that we don’t move towards the left, beyond the first character and the same holds true for the movement towards the right.
  3. Towards the end of the procedure, we are able to find out the maximum length of a palindromic substring, but we need to also find out what that substring is. In the above case, it is “abba”. To do this, we will need to maintain the starting and ending indices of a substring whose length is maximum at any given point in the procedure.
Technical analysis to arrive at the complete code

We need to fix or assume each character as the center point and start exploring towards the left hand side and right hand side to check if it is a palindrome. I know I am kind of reminding you again, but consider the 2 cases – single character as center and 2 characters at the center. We need to find the maximum among the two.

Iterating ‘n’ times where ‘n’ is the length of the string

Let’s name the method which checks for the matching characters on the left and right as “exploreFromCenter”. This method will keep checking if the characters on the left and right of that middle point are the same and return the length of the palindrome.

for (int i = 0; i < s.length(); i++) {
    int lengthSingleCharacter = exploreFromCenter(s, i, i);
    int lengthTwoCharacters = exploreFromCenter(s, i, i + 1);
    int length = Math.max(lengthSingleCharacter, lengthTwoCharacters);
}

A ‘for’ loop is required to consider each character as the middle point. We call the method, exploreFromCenter, which returns the length of the palindrome. It is called twice for the 2 cases. The first time, it is called with ‘i’ and the second time with ‘i+1’. Note that we are not changing the value of ‘i’ in this step.

When i = 0 , ‘d ‘ is the center and then ‘da’
Code for exploring around the central character
private static int exploreFromCenter(String s, int left, int right) {

        if (s == null || left > right)
            return 0;

        while (left >= 0 && right < s.length() && s.charAt(left) == s.charAt(right)) {
            left--;
            right++;
        }

        return right - left - 1;
}
  1. Other than the input string, the input to this method is the index of the single character or indices of 2 characters, current index and the next one.
  2. Since we are moving towards the left and right by one, we decrement left index by one and increment right index by one. But, we cannot move the left and right index endlessly, can we ? We move only upto the first character, hence the condition left>=0. We can move the right pointer upto the last character and hence the condition, right < s.length(). Note, it is < than and not <= since we want to move upto the last character, strings begin from index 0.
  3. The last line of code, right – left – 1 , which returns the length of palindromic substring needs a bit of explanation. I hope the image below will give you a good idea about the same.
Explanation for the ‘right-left – 1’ in the code on the last line
Code for maintaining the largest length of the palindromic substring

We started writing the code with the ‘for’ loop which handled the 2 cases and found the maximum length among those 2 cases. But this happens for every character in the string. We also need to maintain the maximum length among all the substrings that we have found so far. How can we do that when the method, ‘exploreFromCenter’, returns only the length ?

We need to maintain the start and end index of each palindromic substring we have found so far. This start and end index will point to the first and last character of the palindromic string. Each time we find a bigger palindrome, we update/reset the start and end indices.The final positions of the start and end indices should give us a reference to the actual substring. Let’s expand our initial code further to incorporate this logic-

private static String findLongestPalindromicSubString(String s) {

        if (s == null || s.trim().length() == 0) return "";

        int start = 0;
        int end = 0;

        for (int i = 0; i < s.length(); i++) {
            int lengthSingleCharacter = exploreFromCenter(s, i, i);
            int lengthTwoCharacters = exploreFromCenter(s, i, i + 1);

            int length = Math.max(lengthSingleCharacter, lengthTwoCharacters);
          

            if (length > end - start + 1) {
                start = i - ((length - 1) / 2); //length -1 for even length strings
                end = i + (length / 2);
            }
        }
        return s.substring(start, end + 1);
}

We have initialized the start and end indices to zero on lines 5 and 6. Later, we reset and maintain the indices to the largest palindrome. Remember, once the middle character is fixed, the left and the right index equally move by 1 until matching characters are found. This should give you an insight into lines 16 and 17.

Why does the start index subtract 1 from the length but end index does not ?

Since we have moved equally towards the left and the right and we have a reference to the center, we think that the total length divided by 2 will give us the actual result. But it does not, let’s go ahead and understand the reason behind this.

Retrieve rotor with i =2 and length of the palindromic substring as 5

When the input is “rotor” and i = 2, the 1st case where the single character is in the middle returns a palindromic length of 5. We need to find the start and end indices based on this. We do that with the help of the fact that i = 2 and we have moved equally towards the left and right. How do we get to the character ‘r’ on the left side in ‘rotor’ based on i = 2 and length = 5? We need to subtract something from ‘i’ and with the same logic, add something to ‘i ‘ to reach ‘r’ on the right side. Using this logic, if we were to do the following, we think we have the right result.

start = i - (length / 2);
end = i + (length / 2);

This seems to work. Since the length is 5, length /2 gives us 2. The value of i is also 2. So, 2 -(5/ 2) is equal to 0, this is where we want our start index pointer. The end index is calculated as 2 + (5/2) which gives us 4. Note that the length of the palindromic substring is 5 and it has a single character in the middle.

But let’s consider another example now, the one which we have been looking at for a long time now, dabba, which is making me a little hungry now ! But hold on, we are almost there, let’s understand this and then go and get our dabba.

Getting the substring “abba” from the length and current index of middle character.

The value of the start index is calculated as i – (length/2) which gives us 2 – ( 4/2) = 0. This will take the start index to 0. This is an incorrect result. The end index is calculated as i + (length/2), which is 2 + (4/2) = 4, this is where we want our end index to point to. To fix the logic for the start index, we need to subtract one from the length and then divide by 2.

start = i - ((length - 1) / 2);
end = i + (length / 2);

With this fix, start index will be 2-((4-1)/2) = 2 -(3/2) = (2 -1) = 1. This needs to be done for palindromes which are of even length and the fact that there are really two characters virtually considered as one in the middle.This needs to be adjusted in the total length. When i =2, we executed a call to the function,exploreFromCenter, but we sent the value of i and i+1.

The last line which makes a call to the substring function has end +1 as the 2nd parameter. This is done because the function,substring, returns all characters before that index, it does not include the character at the index specified as the second parameter.

Complete code
package ds;

public class LongestPalindromicSubstring {

    private static String findLongestPalindromicSubString(String s) {

        if (s == null || s.trim().length() == 0) return "";

        int start = 0;
        int end = 0;

        for (int i = 0; i < s.length(); i++) {
            int lengthSingleCharacter = exploreFromCenter(s, i, i); // Single character in the middle.
            int lengthTwoCharacters = exploreFromCenter(s, i, i + 1); //Two characters in the middle .

            int length = Math.max(lengthSingleCharacter, lengthTwoCharacters);
         

            if (length > end - start + 1) {
                start = i - ((length - 1) / 2);
                end = i + (length / 2);
            }
        }
        return s.substring(start, end + 1);
    }

    private static int exploreFromCenter(String s, int left, int right) {

        if (s == null || left > right)
            return 0;

        while (left >= 0 && right < s.length() && s.charAt(left) == s.charAt(right)) {
            left--;
            right++;
        }

        return right - left - 1;
    }

    public static void main(String[] args) {
        String answer = findLongestPalindromicSubString("HYTBCABADEFGHABCDEDCBAGHTFYW1234567887654321ZWETYGDE");
        System.out.println(answer);
    }

}
Conclusion

We were able to solve this problem with an overall time complexity of O(n2). We used a single ‘for ‘ loop but for each character we had to compare, move indices around to find the maximum length of the palindromic substring. Overall, the code is not difficult but there are some corner cases, adjustments which we had to handle in the code and these cases can be difficult to understand without going through multiple examples.

I hope you have not only got an answer to the ‘how’ parts of the solution but also the ‘why’ parts of it. Just remember the two cases about the single character or two characters in the middle and you will definitely be able to come up with the code.

Can we improve the time complexity ? Yes, we can ! This can be done using Manacher’s algorithm. But right now, I am definitely going to grab my ‘dabba’ , what about you ?

How to find the longest common substring in Java?

In this blog we are going to solve the problem of finding the longest common substring between two strings. Guided with examples, we will begin by getting a thorough understanding of the problem and the solution. We will then look at the approach theoretically and finally we will move on to write the code in Java. Like always, I will be explaining each line of the code so that you get a good understanding of not just the ‘how’ sections of the code but also the ‘why’ sections of the code.

What is the longest common substring ?

We are given two strings as input and we need to find the longest common substring between them. What does longest common substring really mean ? Let’s break the problem down, let’s understand the meaning of each word in ‘longest common substring’.

A substring is a part of another string. A common substring is a part of the string that occurs in both the strings. Longest common substring is not just a part of string that occurs in the two strings but also has the biggest length. Let’s look at some examples to understand this better.

Examples of longest common substring
Examples of longest common substring
  1. In the first example, the input is “instagram” and “instantgrammar”. There are many common substrings like “i”, “in”, “sta”, “ram”, “gra” and a few more but the longest is “insta” or “gram” which has a length of 4.
  2. The input in the second example is “facebook” and “brook”. It might seem like “book” is present in both strings but note that it does not occur as a continuous string and hence not considered as a valid outcome. The right solution is “ook” .
  3. Finally, when the input is “tiktok” and “ticktock”, there is a “ti” , “to” and “kt” common to them which have a length of two. There are substrings with the single matching characters as well. But the longest common substring is “kto” which occurs right in the middle of the two strings.

Now that we have understood the problem, the solution, and the meaning of the words, longest common substring, let’s go ahead and understand the technique that we will use to solve this problem.

Technique applied to arrive at the solution
Brute force ?

If we are given 2 strings as input, we need to find the longest common substring. When we think of a substring, the immediate solution that comes to mind is comparing the strings. What might seem like a simple approach to arrive at the solution is probably the following-

  1. We start by comparing first character from the first string with every character in the second string.
  2. If they are equal at some point, we will then have to compare the next character from the first string and look at the next character in the 2nd string and check for equality.
  3. At the end of step 1, if we don’t find a matching character in the second string, we will move to the next character in the first string, start comparing it with every character in the second string and repeat the same procedure.
  4. Between all these comparisons, if are able to find a common substring, we will also have to maintain the length of each one. This will enable us to find the length of the longest common substring.

This approach might seem simple but this brute force solution seems to be systematically listing all the possible candidates or combinations for the solution and keeps checking if it satisfies the condition of the solution. Let’s take a look at a better solution.

Using a table or a matrix to store results

What is clear is that we need to compare the characters in each string and find a matching substring that has the biggest length.

At any given point or index in the strings, if the characters are equal, what will be the length of the common substring at that index ? If we were to represent current characters under comparison with indices i and j, and the characters are equal, the length of the longest common substring will be the length calculated upto the index, plus one.

Length of longest common substring at an index

When the characters are equal as shown using red arrow on top half of the image,what is the length of the substring at that point ? It will be the length of the common substring before that, plus one. So for this particular string, it will be one, it is the first common string since we have found nothing common at the previous index.

Similarly, in the bottom half the image, the length of the common string will become one plus the previous value. In this case, it will be 2. So we could generalize this a little bit and say that- At any given index, i , j of two strings, if the two characters at i and j are equal, the length of the common substring will be length at ( i – 1, j – 1 ) + 1 .

What we have done by deducing this is actually solved a subproblem. We are able to find the length at a particular index, which is a part of the main problem. What we are also doing is making use of the results from the previous index which is the previous subproblem. This technique is called dynamic programming.

During every comparison , we need to store the length at the appropriate indices. We have 2 indices, i and j. Let’s try and represent the 2 strings using a 2-dimensional matrix as shown below.

2 dimensional matrix to store the lengths

If we are given 2 strings, x and y as “facebook” and “brook”, respectively, we could represent it as shown above. This is a row,column format where a particular [row, column] combination represents a character each from the two inputs strings. The extra row and column represents a case when the input string is empty. The 0 in that row and column will represent that the length of common substring is zero when one of the the strings is empty.

How to make use of this representation?

We start comparing the characters in the two strings. If they are not equal,we enter a 0. What if they are equal ? Do you remember the generalized case ? If they are equal, at any index, [ i , j ] , the length will be [i – 1, j – 1] + 1. What is the cell [i -1 , j – 1] ? If ‘i’ represents the row and ‘j’ the column, it means the value at the previous row and previous column, it is the diagonal. Let’s take a look-

Obtaining the value at [i , j] using diagonal element

If we are at i = 2 and j = 1, [2,1] , it represents the 3rd character in first string and 1st character in 2nd string. If they are equal, we need to get the length of [i -1 , j -1 ]. So, reducing the row and column value by 1 in [ 2 , 1 ] gives us [ 1, 0] which is the diagonal element. We add one to the value at diagonal element and store it at that cell. This will indicate that we have found another continuous matching character and the length is 1 more than the length of the previous substring.

With this representation in place, we need to fill this table up with values corresponding to the comparison. The logic during this comparison will be:

  1. If the characters are equal, add 1 to the value at the diagonal and store the value at cell under consideration.
  2. If they are unequal, enter 0 at that cell.

With this 2 step procedure, the strings “facebook” and “brook” can be represented as follows-

We start by comparing the character ‘F’ in “FACEBOOK” with every character in the string “BROOK”. If there is a match, we add one to the value at diagonal and enter this value at corresponding cell. If they are unequal, we simply enter 0 in that cell.

As you can see, we have a few substrings. To track some of them, simply look at the cell with a value greater than or equal to one. Some of the substrings are “B”,”O”. A substring of length 2 indicates the substring “OO” after we match the 2nd ‘O’.

Finally, we can see a substring of length 3. We got this value because of a matching “OO” before the matching “K”. The value at this cell is obtained by adding 1 to the value at diagonal element which has value 2. We have reached the end of the strings.

Tracking the longest common substring

We made use of a two dimensional table and filled each cell with the length at a particular index, [ i , j ], of the 2 strings. When we reached the end, we were able to calculate the length of the longest common substring. That is definitely what we need, but in addition to that, we also need to know what the substring is. Let’s revisit our table with some additional information.

Table shows row and column values for longest common substring

The table above shows row and column values as well. Can we make use of the row and column values to track the string “OOK” ? We have found the maximum length, 3, which is shown at the bottom right corner. What row and column does it belong to ? It belongs to [8 , 5]. Mapping this row number 8 to the 8th character in the string,”FACEBOOK”, which character do you see at that position ? The letter at that position is K. This can be accessed as character number 8 of the string, “FACEBOOK. We got our first character! What do we do next ?

We travel the diagonal which is reducing the row number by a value of 1, this gives us 7. Let’s access character number 7 in the string “FACEBOOK”. It is the character ‘O’, we append it to the previous letter, ‘K’. This gives us KO.We got our 2nd character. Taking the next diagonal element gives us row number 6. Accessing character number 6 in the string “FACEBOOK” give us ‘O’. So far, we have “KOO”, 3 characters. We should stop now since the maximum length is 3 and we extracted 3 characters from the table and created a substring out of it. We simply reverse the substring, “KOO” to get “OOK”. We have got our desired result !

An important thing to remember is that in order to get the actual substring, we start the journey along the diagonal from the point that we have found the maximum length. This could be at the end of the string,in the middle or at the beginning. So it’s important to keep track of the maximum length. Let’s look at another example to solidify our understanding.

Largest common substring table for strings “tiktok” and “ticktock”

If the 2 strings are “tiktok” and “ticktock”, the largest common substring is “kto” as shown above. We start tracking the string from the maximum length,which is 3. We access the character at that particular index and take the path of the diagonal three times since the maximum length is 3. That gives us the longest common substring. Having understood the theory with these examples, I think it is time to summarize what we have done so far and move on to the code.

Summary of the technique
  1. We used a table, a 2 dimensional table to represent the two strings. Each cell in this table helped us in representing the length of the common substring at a given point. This gave us a nice { row , column } pair.
  2. We compared each character in the first string with all the characters in the 2nd and filled up each cell in the table with appropriate length-
    • If they were equal, value was calculated as 1 + [value at diagonal]. Cell at diagonal is [ i – 1 , j – 1 ].
    • If unequal, value was 0.
  3. We kept track of the maximum length as we progressively filled up the table.
  4. Once we reached the end of table or the string, we picked this maximum value and started travelling across the diagonal since that gave us the longest common substring.

I think we have gone through a lot of theory, we need to get technical now, let’s translate this summary into code.

Step by step analysis of the code

We need a table or matrix to represent the length at each point [i , j] of the strings. How do we represent a matrix like this ? We need to also store the length up to a certain index, [ i , j ]. The strings are of fixed length. We can do this with an array, a 2-d array. Let’s call that array, lcsTable. An array has a size.What should be the size of the 2-d array ? One thing to note is that we have 1 additional row and column filled with zeros to indicate the absence of one of the strings. The number of rows can be the size of the first string and number of columns is the size of the 2nd string. Let’s represent the length of the first and second string with ‘m’ and ‘n’ respectively. Now, our 2-d array can be initialized as follows-

private static void lcs(String x, String y) {
        int m = x.length();
        int n = y.length();

        //to compensate for additional row and column with 0
        int[][] lcsTable = new int[m + 1][n + 1];
}

The first row and the first column needs to be filled up with zeros. So we require two ‘for’ loops which accesses lcsTable[i][0] and lcsTable[0][j] and assigns them to 0. The first loop keeps column constant and changes row number and 2nd one keeps row constant and changes column number. The code for this will be-

Fill the first column with zeros
 // init first column with 0
 for (int i = 0; i < m; i++) {
    lcsTable[i][0] = 0;
 }

   
Fill first row with zeros
// init first row with 0
for (int j = 0; j < n; j++) {
    lcsTable[0][j] = 0;
}

Logic for actual comparisons
for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    lcsTable[i][j] = 1 + lcsTable[i - 1][j - 1];
                    if (maxLength < lcsTable[i][j]) {
                        maxLength = lcsTable[i][j];
                        maxRow = i;
                        maxColumn = j;
                    }
                } else {
                    lcsTable[i][j] = 0;
                }
            }
}
  1. At lines 1 and 2, we have a nested for loop. The index ‘i’ iterates through the characters in the first string, x, and ‘j’ iterates through string ‘y’. So each character in the string ‘x’ is compared with the characters in string ‘y’. We have started from index 1 since we have row 0 and column 0 to represent a condition where one input string is empty/missing.
  2. Line 3 checks if the characters are equal. An ‘i-1’ and ‘j-1’ is required since first character in a string begins at index 0 but loop starts from index 1. If they are equal , we access the diagonal cell which is [ i-1, j -1] and add 1 to it.
  3. The variable, maxLength, is required as we need to know the maximum length which can be anywhere in the table and not necessarily at the end.
  4. We have also introduced 2 new variables, ‘maxRow’ and ‘maxColumn’. This will point to the actual cell where the maximum length of substring was found. We will then use these variables to access the substring as shown in the previous examples.
Shows intermediate stage, comparisons begin at [1,1]
Code for retrieving the longest common substring

Once the table is filled and we have completed all iterations, the maximum length is in place. We need to find the actual string. We can use the variables, ‘maxRow’ and ‘maxLength’ for this. When our strings were “tiktok” and “ticktock”, we can see that the value of ‘maxRow’ is 5 and ‘maxLength’ is 3.

Mapping the results of lcs using row values, maxRow is 5 and maxLength is 3

As you can see above , value at cell [ 5 , 6 ] is 3. The row value in [ 5 ,6 ] is 5. If you refer to the string, ‘X’ which is “tiktok” and access 5th character, we get ‘o’. You could consider using the variable,’maxColumn’. In that case the string we must use is “ticktock”. We extract the characters three times since the maximum length that has been found is 3. The code for this will be:

StringBuilder longestCommonSubstring = new StringBuilder(maxLength);
while (maxLength > 0) {
    longestCommonSubstring.append(x.charAt(maxRow - 1));
    maxRow--;
    maxLength--;
}

Finally, don’t forget to reverse the string ! Note that we access the character in the input string by reducing 1 from maxRow. Remember that we have 1 additional row and column and strings begin at index 0.

Putting all the snippets of code together
package ds;

public class LongestCommonSubstring {

    public static String findLCS(String x, String y) {
        int m = x.length();
        int n = y.length();

        //to compensate for additional row and column with 0
        int[][] lcsTable = new int[m + 1][n + 1];

        // to find the maximum length
        int maxLength = 0;
        int maxRow = 0;
        int maxColumn = 0;

        // init first row with 0
        for (int i = 0; i < m; i++) {
            lcsTable[i][0] = 0;
        }

        // init first col with 0
        for (int j = 0; j < n; j++) {
            lcsTable[0][j] = 0;
        }

        // starting from 1 as row 0 and col 0 filled with 0. <= since it has go up to string length.
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    lcsTable[i][j] = 1 + lcsTable[i - 1][j - 1];
                    if (maxLength < lcsTable[i][j]) {
                        maxLength = lcsTable[i][j];
                        maxRow = i;
                        maxColumn = j;
                    }
                } else {
                    lcsTable[i][j] = 0;
                }
            }
        }
        return fetchLCS(x, maxLength, maxRow, maxColumn);

    }

    private static String fetchLCS(String x, int maxLength, int maxRow, int maxColumn) {
        System.out.println("The length of longest common substring is: " + maxLength);

        System.out.println("Max Row is: " + maxRow);
        System.out.println("Max Column is: " + maxColumn);

        StringBuilder longestCommonSubstring = new StringBuilder(maxLength);

        while (maxLength > 0) {
            longestCommonSubstring.append(x.charAt(maxRow - 1));
            maxRow--;
            maxLength--;
        }

        return longestCommonSubstring.reverse().toString();
    }

    public static void main(String[] args) {

        String x = "tiktok";
        String y = "ticktock";

        String longestCommonSubstring = findLCS(x, y);

        System.out.println("The longest common substring is " + longestCommonSubstring);
    }

}
Conclusion

We solved the problem of finding the longest common substring by solving the subproblem at a given index where we made use of the earlier results stored in a table. This is the essence of dynamic programming and it is a very effective technique to solve these types of problems. This technique is much better and it is a very good solution as compared to the brute force technique which we saw at the beginning. The time complexity using the dynamic technique approach is O(m*n), m and n are the lengths of the 2 strings.

Can we improve this any further ? Yes, we can! Using a suffix tree we can improve this solution even further but I think the solution using suffix trees deserves to be in another blog.

Solution to the maximum subarray problem in linear time : Complete code in Java

In this blog we are going to discuss the problem of finding the maximum contiguous sum of elements in an array. Now, that was a mouthful ! Don’t worry, we will break this problem up step by step so that you will get a thorough understanding of the problem and the solution. This problem is also referred to as the Maximum subarray problem.

Our first approach to the solution will be the usage of brute force technique and then we will move on to solve the problem using Kadane’s algorithm. As we write the code in Java, we will keep improving the code so that we get a complete understanding of each line of code required to solve this problem.

What is the maximum contiguous sum in a subarray ?

We have an array of integers as input. The array can contain both positive and negative values. Let’s break the 4 important words in ‘Maximum contiguous sum in subarray‘ :

  1. Subarray means part of an array.
  2. Sum is the addition of the elements of an array.
  3. Contiguous means a sequence, no breaks. Hence, contiguous sum means the addition should be between a sequence of values one after the other without a break.
  4. Maximum contiguous sum is finding the maximum value after adding such a sequence of values.

Let’s look at a few examples to understand this better-

Examples of contiguous and non contiguous subarray
Example of contiguous and non-contiguous elements in array

Barring the last one, all the other examples show contiguous elements in an array. It can be only a single element anywhere in the array. It can be 2,3…n elements as long as there is no break in between. Since these are all part of the array, we refer to them as subarray. In the last example, the green blocks form a subarray but they are not contiguous and hence cannot be part of the solution. Having understood the concept of a contiguous subarray, we need to find the maximum sum contiguous subarray.

Examples of maximum sum contiguous subarray
Examples of maximum sum contiguous subarray, answers are on the right

We have understood the concept of a contiguous subarray, now let’s understand the concept of maximum sum in a contiguous subarray. First, take a look at examples 1,2,3 and 4, they are quite straight forward. Starting from the leftmost element, add the elements one by one without breaking the sequence and try to get a maximum value as shown on the right.

Let’s take a look at example number 5. At first glance, it might seem that the answer is 12. But it’s not ! Let’s start from the left, initially, the sum is 0. Adding 10,the first element in the array, gives us 10. Add the 2nd element, -5 to it, gives us 5. Now we add 12 to it, this gives us a total of 17 which is bigger than 12.

Example 6 is also straight forward. Here, it might seem that 10 + 20 = 30 which is bigger than 20, but note that 10 and 20 are not contiguous. Let’s move to the last example, number 7.

Let us approach example number 7 step wise –

  1. We start in a similar way.Initially, the sum is 0, adding 30, the first element in the array to the sum, gives us 30. The maximum value so far is 30.
  2. Add -40 to it and we get -10. We don’t change the maximum value as it is greater than -10.
  3. Add 20 to it, we get a 10 and we again don’t change the maximum value.
  4. If we were to add 40 to it, we get a 50. At this point, we have a new maximum value of 50. But wait a minute…
  5. If you consider only the 2 last elements which form a subarray and add their values, we get 60, which is bigger than 50. So, 60 is the right answer.
  6. Let’s revisit step 2 and 3. The result upto step 2 is -10. Adding the next element, 20, gives us 10. But when we add that current element,20,the outcome of the addition results in lower value than the current element,20. Let’s rephrase it, the current element has higher value as compared to adding it to previous sum. This means, we can reset the continuity or discard the previous sum. Remember this, it is an important condition to solve this problem.

I hope that makes sense. If the current element by itself has a bigger value than adding it to the previous sum, it means we can discard the previous sum since we are on a quest to find contiguous maximum sum. If we did not do this, the previous sums will act as a deterrent to our overall sum. With this, we also restart the continuity from this point onwards.

Using brute force to solve the problem

Let’s consider the following example

30-402040
Sample input to find maximum contiguous subarray using brute force

We need to find the maximum sum contiguous subarray. A subarray as we know can be just a single element or a sequence of ‘n’ elements. Let’s take a simple approach to solve this problem.

  1. We put a marker at 30, initialize the sum to 0 and –
    • Add 30 to the sum , resulting in 30.
    • Add -40 to current sum, resulting in a sum of -10.
    • Add 20 to current sum, resulting in a sum of 10.
    • Add 40 to resulting sum, resulting in a sum of 50.
    • At every step above, compare the sums to maintain a maximum value among them. At the end, what we have done is explored every possible contiguous subarray from the first element, 30 and found maximum value among them.
  2. Add a marker at -40, re-initialize the sum to 0, maintain the current maximum value from step 1 and –
    • Add -40 to the current sum, resulting in -40.
    • Add 20 to the current sum, resulting in -20.
    • Add 40 to the current sum, resulting in 20.
    • At each step, compare the sum with the current maximum value to find maximum among them. All possible contiguous subarrays starting from the second element, 40 have been explored.
  3. Add a marker at 20, initialize the sum to 0 and –
    • Add 20 to current sum, resulting in 20.
    • Add 40 to current sum, resulting in 60.
    • At each step, compare each sum with maximum and find maximum among them. All the contiguous subarrays starting from 20 have been explored.
  4. Add a marker at 40, initialize the sum to 0 and:
    • Add 40 to current sum, resulting in 40,
    • In this step, compare the sum,40, with maximum obtained in steps above and set appropriate value.
    • This is the end of the array,we should have found maximum value by now.

Why are we putting markers ? We are putting these markers to have a kind of starting point for the subarray. A subarray can start at every element in the array. For an array of size 4, we put 4 markers. So, I hope you can imagine a ‘for’ loop from 0 – n representing each marker.

What do we do for each marker ? From the element being pinned (Sorry, too much usage of Zoom) as a marker, we first initialize sum to 0. Then we start adding the numbers after that pinned element upto the last element. So, another for loop inside the already existing loop, we get a nested for loop. Inside the nested for loop, we find the sum. So, sum = sum + element. Remember, we need to find the maximum at each step now. This can be done using another variable which will be compared against sum.

Brute force approach

As you can see above, the marker starts at the leftmost element and keeps moving ahead one by one. For each position of the marker, we iterate from the marker upto the end of the array and keep finding the sum. At every iteration ,we evaluate if we have found a new maximum value.

Code using brute force approach
public class MaxSumSubArray {

    public static void main(String[] args) {
        
        int[] arr = {30, -40, 20, 40};

        int sum = findMaxSum(arr);
        System.out.println("The maximum contiguous sum in the sub array is : " + sum);
    }

    private static int findMaxSum(int[] arr) {

        int currentSum;
        int maxSum = 0;

        for (int marker = 0; marker < arr.length; marker++) {
            //reinitialize sum when marker moves ahead.
            currentSum = 0;

            for (int j = marker; j < arr.length; j++) {
                currentSum = currentSum + arr[j];
                if (currentSum > maxSum) {
                    maxSum = currentSum;
                }
            }
        }
        return maxSum;
    }
}

Notice that on line 20, j is initialized to marker which moves ahead by one each time. We don’t want to find subarrays every time from the first element.

Does the code work for all inputs ?

The input that we passed has a mix of positive and negative numbers. What if we pass the input as a set of all negative numbers ? If we were to pass the input as {-30, -40 , -20 , -40} , we get the output as 0. We need to fix our code to handle this scenario. Take a look at the following code snippet –

private static int findMaxSum(int[] arr) {

    ...
    int maxSum = 0;

    for (int marker = 0; marker < arr.length; marker++) {
        ...
        for (int j = marker; j < arr.length; j++) {
            currentSum = currentSum + arr[j];
            if (currentSum > maxSum) {
                maxSum = currentSum;
            }
        }
    }
    return maxSum;
}

As you can see, the maximum sum is initialized to 0. Since all the values in the array are negative values, the current sum is always a negative value, it is never greater than the maximum sum, which is 0. Hence the ‘if’ condition is never satisfied and the maximum sum is returned as 0.

The fix for this is changing a single line of code –

int maxSum = Integer.MIN_VALUE;

With this change, an input consisting of only negative values should work as expected.

Time complexity of the brute force approach

In the code above, there are 2 nested ‘for’ loops. For every position of the marker, we are iterating through the rest of the array. The time complexity of this code is hence O(N2). If you are completely new to the Big O notation, I would recommend this excellent article by Rob Bell.

The solution works and it is a decent way to get started. Once we have a working solution, we should start thinking along the lines of improving the code and the performance of the code. But remember, don’t get obsessed about improving the performance of every piece of code you write !

We can definitely do better in solving this problem by improving the time complexity to O(n). Let’s explore Kadane’s algorithm.

How to pronounce Kadane ?

Before we get started, I would like to sidestep a little bit with respect to the pronunciation of Kadane. We have a cricketer who goes by the name Ajinkya Rahane. His surname, Rahane , is pronounced as Ra-ha-‘nee’ with an emphasis on the ‘ne’. Kadane is not pronounced like that. It’s more like pronouncing the ‘cane’ in sugarcane or the way you pronounce ‘Dane’ when you refer to someone from Denmark. Peter Schmeichel is the most capped Danish footballer of all time, and the first Dane to reach 125 caps. There is no drag on the ‘ne’ in Dane. Similarly, don’t drag on the ‘ne’ in Kadane. By the way, his full name is Joseph Born Kadane, you can read more about him here. Hey Arsenal fans, speaking of Danes, you can’t forget the great Nicklas Bendtner, can you ? Let’s get back to Kadane’s algorithm now.

Kadane’s algorithm

You can get a good overview of the algorithm here. What is important to note is that, at any given index, the algorithm computes the subarray with the largest sum upto that index. Let’s get into an example.

30-402040
Example for Kadane’s algorithm

This is the same example which we considered when we used the brute force approach. When we went through the process step by step using brute force, do you remember a condition which was going to play a crucial role in solving this problem ?

If the current element by itself has a bigger value than adding it to the previous sum, it means we can discard the previous sum since we are in a quest to find maximum contiguous sum. If we did not do this, the previous sums will act as a deterrent to our overall sum.

Reset the sum since we found a better point for finding max sum continuity

The sum obtained up to the 2nd element is -10. Now, the next element is 20. If we add the previous sum of – 10 to 20, we get 10. However the current element, 20, is bigger than the sum. Why should we carry a baggage of lesser value as compared to the current element which is much bigger ? So, we reset the continuity and now the sum and the maximum value should be 20 from where we move on to the next element.

So in short, we need to take the maximum between the current element and the sum obtained by adding the previous sum to current element. If the current element is bigger, we reset the sum and the subarray. What if it is not ? What if the sum is bigger ? To get our answer , let’s consider the element 40, which is the next element in the array.

What does it mean if the current element is smaller than the sum obtained by adding current element to previous sum ? It means, the sum, the bigger value, can help us further as it has enhanced our overall value as compared to current element. It also means that the previous sum does not act as a deterrent and we continue with our sequence of elements.

Continue with the sum and the sequence since addition is an enabler and not a deterrent

So, in short, for every element in the array, we need to –

  1. Find the maximum value between the current element and the addition of the current element to the previous sum.
  2. If the current value is bigger, we need to reset. It will also give us a new value to the sum so that we can to continue our exploration further in the array.
  3. If the maximum between the two is the value of the addition of the current element and the previous sum, this value becomes the new sum.
  4. Once steps 1, 2 and 3 are done, remember, we need to maintain the max value like we did in the brute force approach.
Code for Kadane’s algorithm
public class MaxSumSubArray {

    public static void main(String[] args) {

        int[] arr = {30, -40, 20, 40};

        int sum = findMaxSum(arr);
        System.out.println("The maximum sum in the sub array is : " + sum);
    }

    private static int findMaxSum(int[] arr) {
        int sum = 0;
        int max = 0;

        for (int i = 0; i < arr.length; i++) {
            sum = Math.max(sum + arr[i], arr[i]);
            max = Math.max(sum, max);
        }
        return max;
    }
}

In the ‘for’ loop, we have simply translated the theory that we just understood into code. We find the maximum between the current element ( arr[i] ) at hand and the addition of the previous sum and the current element (sum + arr[i]). The one that has maximum value becomes the new sum.

sum = Math.max(sum + arr[i], arr[i]);

Once we have the new sum, we check if it is the new maximum and if it is, we assign it to the variable, max.

            max = Math.max(sum, max);

Did you notice that there is a single ‘for’ loop ? There are no nested loops ! The time complexity of this code using Kadane’s algorithm is O(n) which is better than the brute force approach. We only iterate through the elements of the array once. Does the code work for all inputs ? What if we supplied an array of all negative values ? Unfortunately, it does not.

Like the brute force approach, we have initialized the variable, max, to 0. One option is to assign it to Integer.MIN_VALUE, like we did in the brute force approach. There is another way to do it as well, the first element of the array can be assigned to the variables sum and max. We can then start our iteration from the 2nd element onwards.

private static int findMaxSum(int[] arr) {
        int sum = arr[0];
        int max = arr[0];

        for (int i = 1; i < arr.length; i++) {
            ...
        }
        return max;
}
Conclusion

We have seen 2 approaches to solve the problem of finding the maximum contiguous subarray. The first approach was simple but it had a time complexity of O(N2). On the other hand, Kadane’s algorithm has a time complexity of O(n), linear time, which is better than brute force. But, unless you know the algorithm before hand, it is difficult to come up with this solution at runtime, like, in an interview. But now I hope that after reading this blog, you will remember Kadane’s algorithm which is quite an effective technique to solve this problem.

If you still can’t remember the algorithm, it’s alright! I am sure that you will at least remember that the pronunciation is not ‘Kadanee..’ !