Show Buttons
Share On Facebook
Share On Twitter
Share On Google Plus
Share On Linkdin
Share On Pinterest
Share On Reddit
Share On Stumbleupon
Contact us
Hide Buttons

Dynamic Programming — Edit Distance Problem

Objec­tive: Given two strings, s1 and s2 and edit oper­a­tions (given below). Write an algo­rithm to find min­i­mum num­ber oper­a­tions required to con­vert string s1 into s2.

Allowed Oper­a­tions:

  • Inser­tion — Insert a new character.
  • Dele­tion — Delete a character.
  • Replace — Replace one char­ac­ter by another.

Exam­ple:

String s1 = "horizon"
String  s2 = "horzon"
Output: 1  {remove 'i' from string s1}

String s1 = "horizon"
String  s2 = "horizontal"
Output: 3  {insert 't', 'a', 'l' characters in string s1}

Approach:

Start com­par­ing one char­ac­ter at a time in both strings. Here we are com­par­ing string from right to left (backwards).

Now for every string we there are two options:

  • If last char­ac­ters in both the strings are same then just ignore the char­ac­ter and solve the rest of the string recursively.
  • Else if last char­ac­ters in both the strings are not same then we will try all the pos­si­ble oper­a­tions ( insert, replace, delete) and get the solu­tion for rest of the string recur­sively for each pos­si­bil­ity and pick the min­i­mum out of them.

Let’s say given strings are s1 and s2 with lengths m and n respectively.

  • case 1: last char­ac­ters are same , ignore the last character.
    • Recur­sively solve for m-1, n-1
  • case 2: last char­ac­ters are not same then try all the pos­si­ble oper­a­tions recursively.
    1. Insert a char­ac­ter into s1 (same as last char­ac­ter in string s2 so that last char­ac­ter in both the strings are same): now s1 length will be m+1, s2 length : n, ignore the last char­ac­ter and Recur­sively solve for m, n-1.
    2. Remove the last char­ac­ter from string s1. now s1 length will be m-1, s2 length : n, Recur­sively solve for m-1, n.
    3. Replace last char­ac­ter into s1 (same as last char­ac­ter in string s2 so that last char­ac­ter in both the strings are same): s1 length will be m, s2 length : n, ignore the last char­ac­ter and Recur­sively solve for m-1, n-1.

Choose the min­i­mum of ( a, b, c).

First we will see the recur­sive solu­tion then we will improve the solu­tion by reduc­ing its com­plex­ity using dynamic programming.

Recur­sion:

Let’s ana­lyze the above solution.

So in worst case we need to per­form the oper­a­tion on every char­ac­ter of the string, since we have 3 oper­a­tions, Time Com­plex­ity will be O(3^n).

Let’s see if there are over­lap­ping sub-problems.

Exam­ple:

String s1 : “CAT” String s2: “DOGEdit Distance

As we can see that there are many sub prob­lems which are solved repeat­edly so we have over lap­ping sub prob­lems here. we can solve it using dynamic pro­gram­ming in bottom-up man­ner. We will solve the prob­lem and store it into an array and use the solu­tion as needed this way we will ensure that each sub prob­lem will be solved only once.

Dynamic Pro­gram­ming:

Out­put:

Minimum Edit Distance -(DP): 3

NOTE:

In com­puter sci­ence, edit dis­tance is a way of quan­ti­fy­ing how dis­sim­i­lar two strings (e.g., words) are to one another by count­ing the min­i­mum num­ber of oper­a­tions required to trans­form one string into the other. Edit dis­tances find appli­ca­tions in nat­ural lan­guage pro­cess­ing, where auto­matic spelling cor­rec­tion can deter­mine can­di­date cor­rec­tions for a mis­spelled word by select­ing words from a dic­tio­nary that have a low dis­tance to the word in ques­tion. In bioin­for­mat­ics, it can be used to quan­tify the sim­i­lar­ity of DNA sequences, which can be viewed as strings of the let­ters A, C, G and T. Source wiki: https://en.wikipedia.org/wiki/Edit_distance

 

Ref­er­ence:

https://web.stanford.edu/class/cs124/lec/med.pdf

http://www.geeksforgeeks.org/dynamic-programming-set-5-edit-distance/

You may also like...

  • caiyun

    why is check­ing the subproblem’s last char­ac­ters are the same with s1.charAt(i-1)==s2.charAt(j-1) ? shouldn’t it check s1.charAt(i)==s2.charAt(j) ?