Dynamic Programming — Longest Common Substring

Objec­tive: Given two string sequences write an algo­rithm to find, find the length of longest sub­string present in both of them.

This prob­lem has been asked in Ama­zon and Microsoft inter­views. Approach to solve this prob­lem will be slightly dif­fer­ent than the approach in “Longest Com­mon Sub­se­quence

What is Longest Com­mon Sub­string: A longest sub­string is a sequence that appears in the same order and nec­es­sar­ily con­tigu­ous in both the strings.

Exam­ple:

```String A = "tutorialhorizon";

String B = "dynamictutorialProgramming";

Output: Length of Longest Common Substring: 8 ("tutorial").
```

Approach:

Naive Approach:

Check all the sub­strings from first string with sec­ond string anxd keep track of the maximum.

Time Com­plex­ity: O(n2*m), O(n2) for the sub­string and O(m) for check all the sub­strings with sec­ond string.

Bet­ter Solu­tion: Dynamic Pro­gram­ming-

Ear­lier we have seen how to find “Longest Com­mon Sub­se­quence” in two given strings. Approach in this prob­lem will be quite sim­i­lar to that.

we will solve this prob­lem in bottom-up man­ner. Cre­ate a matrix of size of m*n and store the solu­tions of sub­strings to use them later.

```Base Cases: If any of the string is null then LCS will be 0.

Check if ith character in one string A is equal to jth character in string B

Case 1: both characters are same

LCS[i][j] = 1 + LCS[i-1][j-1] (add 1 to the result and remove the last character from both the strings and check the result for the smaller string.)

Case 2: both characters are not same.

LCS[i][j] = 0

At the end, traverse the matrix and find the maximum element in it, This will the length of Longest Common Substring.
```

See the code for bet­ter explanation.

Code:

Out­put:

```LCS length : 8
```

Longest Com­mon Sub­string Matrix

You may also like...

• Kamal Chaya

Why is it 0 if char­ac­ters are unequal?

• Zt Zeng

Because what that matrix store is not lcs, but lcs end at that index. You can refer to my blog post for detail, if you don’t under­stand it well. http://tonyz93.blogspot.com/2016/09/longest-common-substring.html

• Ankita

Hey thanks for shar­ing your code/blog.
Can you please explain time-complexity of the brute force.
It is O(n * m * min(n, m)) .what I dont get is how min(n,m) came ?
Thanks

• Zt Zeng

I updated my blog to make it clear.:)

Just won­der, why don’t you post this com­ment under my blog?

• Joshua Green­field

There is a DP solu­tion which uses only O(n) space.
# python 2.6
def find_longest_fast_nspace(w1, w2):
m, n = len(w1), len(w2)
table = []
max_len = 0
i, j = 0, 0
res = “”

for j in xrange(n+1):
table.append(0)

old = table
for i in xrange(m+1):
for j in xrange(n+1):
if i == 0 or j == 0:
table[j] = 0
elif w1[i-1] == w2[j-1]:
table[j] = old[j-1] + 1

temp = table[j]
if temp > max_len:
max_len = temp
res = w1[i-max_len:i]

else:
table[j] = 0

old = list(table)

print “res = “, res
return res

# Time O(n^2)
# Space O(n)

• bevardis

But the time is O(n^2). There’s a way to do it using gen­er­al­ized suf­fix tree, which, if I’m not mis­taken, requires O(n + m) space and the time is O(n + m)

• Joshua Green­field

Ya there is suf­fix tree solu­tion as well.