Dynamic Programming – Edit Distance Problem

Objective: Given two strings, s1 and s2 and edit operations (given below). Write an algorithm to find minimum number operations required to convert string s1 into s2.

Allowed Operations:

  • Insertion – Insert a new character.
  • Deletion – Delete a character.
  • Replace – Replace one character by another.

Example:

String s1 = "horizon"
String  s2 = "horzon"
Output: 1  {remove 'i' from string s1}

String s1 = "horizon"
String  s2 = "horizontal"
Output: 3  {insert 't', 'a', 'l' characters in string s1}

Approach:

Start comparing one character at a time in both strings. Here we are comparing string from right to left (backwards).

Now for every string we there are two options:

  • If last characters in both the strings are same then just ignore the character and solve the rest of the string recursively.
  • Else if last characters in both the strings are not same then we will try all the possible operations ( insert, replace, delete) and get the solution for rest of the string recursively for each possibility and pick the minimum out of them.

Let’s say given strings are s1 and s2 with lengths m and n respectively.

  • case 1: last characters are same , ignore the last character.
    • Recursively solve for m-1, n-1
  • case 2: last characters are not same then try all the possible operations recursively.
    1. Insert a character into s1 (same as last character in string s2 so that last character in both the strings are same): now s1 length will be m+1, s2 length : n, ignore the last character and Recursively solve for m, n-1.
    2. Remove the last character from string s1. now s1 length will be m-1, s2 length : n, Recursively solve for m-1, n.
    3. Replace last character into s1 (same as last character in string s2 so that last character in both the strings are same): s1 length will be m, s2 length : n, ignore the last character and Recursively solve for m-1, n-1.

Choose the minimum of ( a, b, c).

First we will see the recursive solution then we will improve the solution by reducing its complexity using dynamic programming.

Recursion:

Let’s analyze the above solution.

So in worst case we need to perform the operation on every character of the string, since we have 3 operations, Time Complexity will be O(3^n).

Let’s see if there are overlapping sub-problems.

Example:

String s1 : “CAT” String s2: “DOG”Edit Distance

As we can see that there are many sub problems which are solved repeatedly so we have over lapping sub problems here. we can solve it using dynamic pro­gram­ming in bottom-up manner. We will solve the problem and store it into an array and use the solution as needed this way we will ensure that each sub problem will be solved only once.

Dynamic Programming:

Output:

Minimum Edit Distance -(DP): 3

NOTE:

In computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other. Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question. In bioinformatics, it can be used to quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T. Source wiki: https://en.wikipedia.org/wiki/Edit_distance

 

Reference:

https://web.stanford.edu/class/cs124/lec/med.pdf

http://www.geeksforgeeks.org/dynamic-programming-set-5-edit-distance/

__________________________________________________
Top Companies Interview Questions..-

Google Microsoft Amazon Facebook more..

If you find anything incorrect or you feel that there is any better approach to solve the above problem, please write comment.
__________________________________________________

You may also like...

  • caiyun

    why is checking the subproblem’s last characters are the same with s1.charAt(i-1)==s2.charAt(j-1) ? shouldn’t it check s1.charAt(i)==s2.charAt(j) ?

%d bloggers like this: