Dynamic Programming — Longest Common Substring

Objec­tive: Given two string sequences write an algo­rithm to find, find the length of longest sub­string present in both of them.

This prob­lem has been asked in Ama­zon and Microsoft inter­views. Approach to solve this prob­lem will be slightly dif­fer­ent than the approach in “Longest Com­mon Sub­se­quence

What is Longest Com­mon Sub­string: A longest sub­string is a sequence that appears in the same order and nec­es­sar­ily con­tigu­ous in both the strings.

Exam­ple:

String A = "tutorialhorizon";

String B = "dynamictutorialProgramming";

Output: Length of Longest Common Substring: 8 ("tutorial").

Approach:

Naive Approach:

Check all the sub­strings from first string with sec­ond string anxd keep track of the maximum.

Time Com­plex­ity: O(n2*m), O(n2) for the sub­string and O(m) for check all the sub­strings with sec­ond string.

Bet­ter Solu­tion: Dynamic Pro­gram­ming-

Ear­lier we have seen how to find “Longest Com­mon Sub­se­quence” in two given strings. Approach in this prob­lem will be quite sim­i­lar to that.

we will solve this prob­lem in bottom-up man­ner. Cre­ate a matrix of size of m*n and store the solu­tions of sub­strings to use them later.

Base Cases: If any of the string is null then LCS will be 0.

Check if ith character in one string A is equal to jth character in string B

Case 1: both characters are same

LCS[i][j] = 1 + LCS[i-1][j-1] (add 1 to the result and remove the last character from both the strings and check the result for the smaller string.)

Case 2: both characters are not same.

LCS[i][j] = 0

At the end, traverse the matrix and find the maximum element in it, This will the length of Longest Common Substring.

See the code for bet­ter explanation.

Code:

Out­put:

LCS length : 8
Longest Common Substring Matrix

Longest Com­mon Sub­string Matrix

You may also like...

  • Kamal Chaya

    Why is it 0 if char­ac­ters are unequal?

    • Zt Zeng

      Because what that matrix store is not lcs, but lcs end at that index. You can refer to my blog post for detail, if you don’t under­stand it well. http://tonyz93.blogspot.com/2016/09/longest-common-substring.html

      • Ankita

        Hey thanks for shar­ing your code/blog.
        Can you please explain time-complexity of the brute force.
        It is O(n * m * min(n, m)) .what I dont get is how min(n,m) came ?
        Thanks

        • Zt Zeng

          I updated my blog to make it clear.:)

          Just won­der, why don’t you post this com­ment under my blog?

  • Joshua Green­field

    There is a DP solu­tion which uses only O(n) space.
    # python 2.6
    def find_longest_fast_nspace(w1, w2):
    m, n = len(w1), len(w2)
    table = []
    max_len = 0
    i, j = 0, 0
    res = “”

    for j in xrange(n+1):
    table.append(0)

    old = table
    for i in xrange(m+1):
    for j in xrange(n+1):
    if i == 0 or j == 0:
    table[j] = 0
    elif w1[i-1] == w2[j-1]:
    table[j] = old[j-1] + 1

    temp = table[j]
    if temp > max_len:
    max_len = temp
    res = w1[i-max_len:i]

    else:
    table[j] = 0

    old = list(table)

    print “res = “, res
    return res

    # Time O(n^2)
    # Space O(n)

    • bevardis

      But the time is O(n^2). There’s a way to do it using gen­er­al­ized suf­fix tree, which, if I’m not mis­taken, requires O(n + m) space and the time is O(n + m)

      • Joshua Green­field

        Ya there is suf­fix tree solu­tion as well.