Implementation of Ideas

In my previous blog I described the ideas that were supposed to be implemented. I spent the last two weeks doing exactly this.

I initially spent a long time implementing the most efficient version of hungarian algorithm, however, in my quest to search for the most efficient implementation, I came across Scipy’s implementation of it. Here is the link to it.

Another idea to speed up the algorithm is to do some pre-processing before computing the edit distance. Once, this preprocessing is done, the edit distance between any a string A and any substring of string B can be found in O(1) time. The preprocessing itself takes O(|A||B| + |A| + |B|) time. This will bring down the time complexity from O(|A||B||B|) to O(|A||B| + |A| + |B|) time which is a good improvement. More details about this algorithm are given in this paper.

Future Plan:

The goal is to write a spider in scrapy to gather the data from top 500 websites. I will be spending my next week doing exactly this.

Link to the implementation code –

Leave a Reply

Your email address will not be published. Required fields are marked *