This post is completed by 3 users

  • 2
Add to List
Hard

204. Dynamic Programming - Edit Distance Problem

Objective: Given two strings, s1 and s2, and edit operations (given below). Write an algorithm to find the minimum number of operations required to convert string s1 into s2.

Allowed Operations:

  • Insertion - Insert a new character.
  • Deletion - Delete a character.
  • Replace - Replace one character with another.

Example:

String s1 = "horizon"
String  s2 = "horzon"
Output: 1  {remove 'i' from string s1}

String s1 = "horizon"
String  s2 = "horizontal"
Output: 3  {insert 't', 'a', 'l' characters in string s1}

Approach:

Start comparing one character at a time in both strings. Here we are comparing strings from right to left (backward).

Now for every string there are two options:

  • If the last characters in both the strings are the same then just ignore the character and solve the rest of the string recursively.
  • Else if the last characters in both the strings are not the same then we will try all the possible operations ( insert, replace, delete) and get the solution for the rest of the string recursively for each possibility and pick the minimum out of them.

Let's say the given strings are s1 and s2 with lengths m and n respectively.

  • case 1: last characters are the same, ignore the last character.
    • Recursively solve for m-1, n-1
  • case 2: last characters are not the same then try all the possible operations recursively.
    1. Insert a character into s1 (same as last character in string s2 so that last character in both the strings are same): now s1 length will be m+1, s2 length : n, ignore the last character and Recursively solve for m, n-1.
    2. Remove the last character from string s1. now s1 length will be m-1, s2 length: n, Recursively solve for m-1, n.
    3. Replace last character into s1 (same as last character in string s2 so that last character in both the strings are same): s1 length will be m, s2 length : n, ignore the last character and Recursively solve for m-1, n-1.

Choose the minimum of ( a, b, c).

First we will see the recursive solution then we will improve the solution by reducing its complexity using dynamic programming.

So in worst case we need to perform the operation on every character of the string, since we have 3 operations, Time Complexity will be O(3^n).

Let's see if there are overlapping sub-problems.

Example:

Edit Distance

String s1 : "CAT" String s2: "DOG"

As we can see that there are many subproblems that are solved repeatedly so we have overlapping sub-problems here. we can solve it using dynamic pro­gram­ming in bottom-up manner. We will solve the problem and store it into an array and use the solution as needed this way we will ensure that each subproblem will be solved only once.

Output:

Minimum Edit Distance -(DP): 3

NOTE:

In computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other. Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question. In bioinformatics, it can be used to quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T. Source wiki: https://en.wikipedia.org/wiki/Edit_distance

Reference:

https://web.stanford.edu/class/cs124/lec/med.pdf

http://www.geeksforgeeks.org/dynamic-programming-set-5-edit-distance/