Dynamic Programming – Longest Common Substring

Objective: Given two string sequences write an algorithm to find, find the length of longest substring present in both of them.

This problem has been asked in Amazon and Microsoft interviews. Approach to solve this problem will be slightly different than the approach in “Longest Common Subsequence

What is Longest Common Substring: A longest substring is a sequence that appears in the same order and necessarily contiguous in both the strings.

Example:

String A = "tutorialhorizon";

String B = "dynamictutorialProgramming";

Output: Length of Longest Common Substring: 8 ("tutorial").

Approach:

Naive Approach:

Check all the substrings from first string with second string anxd keep track of the maximum.

Time Complexity: O(n2*m), O(n2) for the substring and O(m) for check all the substrings with second string.

Better Solution: Dynamic Programming

Earlier we have seen how to find “Longest Common Subsequence” in two given strings. Approach in this problem will be quite similar to that.

we will solve this problem in bottom-up manner. Create a matrix of size of m*n and store the solutions of substrings to use them later.

Base Cases: If any of the string is null then LCS will be 0.

Check if ith character in one string A is equal to jth character in string B

Case 1: both characters are same

LCS[i][j] = 1 + LCS[i-1][j-1] (add 1 to the result and remove the last character from both the strings and check the result for the smaller string.)

Case 2: both characters are not same.

LCS[i][j] = 0

At the end, traverse the matrix and find the maximum element in it, This will the length of Longest Common Substring.

See the code for better explanation.

Code:



Output:

LCS length : 8
Longest Common Substring Matrix

Longest Common Substring Matrix

__________________________________________________
Top Companies Interview Questions..-

Google Microsoft Amazon Facebook more..

If you find anything incorrect or you feel that there is any better approach to solve the above problem, please write comment.
__________________________________________________

  • Kamal Chaya

    Why is it 0 if characters are unequal?

    • Zt Zeng

      Because what that matrix store is not lcs, but lcs end at that index. You can refer to my blog post for detail, if you don’t understand it well. http://tonyz93.blogspot.com/2016/09/longest-common-substring.html

      • Ankita

        Hey thanks for sharing your code/blog.
        Can you please explain time-complexity of the brute force.
        It is O(n * m * min(n, m)) .what I dont get is how min(n,m) came ?
        Thanks

        • Zt Zeng

          I updated my blog to make it clear.:)

          Just wonder, why don’t you post this comment under my blog?

  • Joshua Greenfield

    There is a DP solution which uses only O(n) space.
    # python 2.6
    def find_longest_fast_nspace(w1, w2):
    m, n = len(w1), len(w2)
    table = []
    max_len = 0
    i, j = 0, 0
    res = “”

    for j in xrange(n+1):
    table.append(0)

    old = table
    for i in xrange(m+1):
    for j in xrange(n+1):
    if i == 0 or j == 0:
    table[j] = 0
    elif w1[i-1] == w2[j-1]:
    table[j] = old[j-1] + 1

    temp = table[j]
    if temp > max_len:
    max_len = temp
    res = w1[i-max_len:i]

    else:
    table[j] = 0

    old = list(table)

    print “res = “, res
    return res

    # Time O(n^2)
    # Space O(n)

    • bevardis

      But the time is O(n^2). There’s a way to do it using generalized suffix tree, which, if I’m not mistaken, requires O(n + m) space and the time is O(n + m)

      • Joshua Greenfield

        Ya there is suffix tree solution as well.

%d bloggers like this: