**Objective: **Given two strings, s1 and s2 and edit operations (given below). Write an algorithm to find minimum number operations required to convert string s1 into s2.

Allowed Operations:

- Insertion – Insert a new character.
- Deletion – Delete a character.
- Replace – Replace one character by another.

**Example:**

String s1 = "horizon" String s2 = "horzon" Output: 1 {remove 'i' from string s1} String s1 = "horizon" String s2 = "horizontal" Output: 3 {insert 't', 'a', 'l' characters in string s1}

**Approach:**

Start comparing one character at a time in both strings. Here we are comparing string from right to left (backwards).

Now for every string we there are two options:

- If last characters in both the strings are same then just ignore the character and solve the rest of the string recursively.
- Else if last characters in both the strings are not same then we will try all the possible operations ( insert, replace, delete) and get the solution for rest of the string recursively for each possibility and pick the minimum out of them.

Let’s say given strings are s1 and s2 with lengths m and n respectively.

- case 1: last characters are same , ignore the last character.
**Recursively solve for m-1, n-1**

- case 2: last characters are not same then try all the possible operations recursively.
- Insert a character into s1 (same as last character in string s2 so that last character in both the strings are same): now s1 length will be m+1, s2 length : n, ignore the last character and
**Recursively solve for m, n-1.** - Remove the last character from string s1. now s1 length will be m-1, s2 length : n,
**Recursively solve for m-1, n.** - Replace last character into s1 (same as last character in string s2 so that last character in both the strings are same): s1 length will be m, s2 length : n, ignore the last character and
**Recursively solve for m-1, n-1.**

- Insert a character into s1 (same as last character in string s2 so that last character in both the strings are same): now s1 length will be m+1, s2 length : n, ignore the last character and

*Choose the minimum of ( a, b, c).*

First we will see the recursive solution then we will improve the solution by reducing its complexity using dynamic programming.

**Recursion:**

public class EditDistanceProblem { | |

public int editDistanceRecursion(String s1, String s2, int m, int n){ | |

//If any of the string if empty then number of operations | |

//needed would be equal to the length of other string | |

//(Either all characters will be removed or inserted) | |

if(m==0) | |

return n; //all elements will be inserted. | |

if(n==0) | |

return m; // all elements will be removed. | |

//If last characters are matching, ignore the last character | |

// and make a recursive call with last character removed. | |

if(s1.charAt(m–1)==s2.charAt(n–1)) | |

return editDistanceRecursion(s1, s2, m–1, n–1); | |

//If nothing above worked then we need to try all 3 operations | |

//and choose the minimum among them | |

return 1 + Math.min(editDistanceRecursion(s1, s2, m, n–1), //Insert | |

Math.min(editDistanceRecursion(s1, s2, m–1, n), // Remove | |

editDistanceRecursion(s1, s2, m–1, n–1))); //Replace | |

} | |

public static void main(String[] args) { | |

String s1 = "horizon"; | |

String s2 = "horizontal"; | |

EditDistanceProblem ed = new EditDistanceProblem(); | |

System.out.println("Minimum Edit Distance -(Recursion): " + ed.editDistanceRecursion(s1, s2, s1.length(), s2.length())); | |

} | |

} |

Let’s analyze the above solution.

So in worst case we need to perform the operation on every character of the string, since we have 3 operations, Time Complexity will be **O(3^n).**

Let’s see if there are overlapping sub-problems.

**Example: **

String s1 : “CAT” String s2: “DOG”

As we can see that there are many sub problems which are solved repeatedly so we have over lapping sub problems here. we can solve it using dynamic programming in bottom-up manner. We will solve the problem and store it into an array and use the solution as needed this way we will ensure that each sub problem will be solved only once.

**Dynamic Programming:**

public class EditDistanceProblem { | |

public int editDistanceDP(String s1, String s2) { | |

int [][] solution = new int[s1.length()+1][s2.length()+1]; | |

//base case | |

//If any of the string if empty then number of operations | |

//needed would be equal to the length of other string | |

//(Either all characters will be removed or inserted) | |

for (int i = 0; i <=s2.length(); i++) {//all elements will be inserted | |

solution[0][i] =i; | |

} | |

for (int i = 0; i <=s1.length(); i++) {//all elements will be removed | |

solution[i][0] =i; | |

} | |

//solving it bottom-up manner | |

int m = s1.length(); | |

int n = s2.length(); | |

for (int i = 1; i <=m ; i++) { | |

for (int j = 1; j <=n ; j++) { | |

//If last characters are matching, ignore the last character | |

// then the solution will be same as without the last character. | |

if(s1.charAt(i–1)==s2.charAt(j–1)) | |

solution[i][j] = solution[i–1][j–1]; | |

else | |

solution[i][j] = 1 + Math.min(solution[i][j–1], //Insert | |

Math.min(solution[i–1][j], // Remove | |

solution[i–1][j–1])); //Replace | |

} | |

} | |

return solution[s1.length()][s2.length()]; | |

} | |

public static void main(String[] args) { | |

String s1 = "horizon"; | |

String s2 = "horizontal"; | |

EditDistanceProblem ed = new EditDistanceProblem(); | |

System.out.println("Minimum Edit Distance -(DP): " + ed.editDistanceDP(s1, s2)); | |

} | |

} |

**Output:**

Minimum Edit Distance -(DP): 3

**NOTE:**

In computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other. Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question. In bioinformatics, it can be used to quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T. Source wiki: https://en.wikipedia.org/wiki/Edit_distance

**Reference: **

https://web.stanford.edu/class/cs124/lec/med.pdf

http://www.geeksforgeeks.org/dynamic-programming-set-5-edit-distance/