Abstract
In the minimum common string partition problem, one is given two strings S and T with the same character statistics and seeks the smallest partition of S into substrings such that T can be partitioned into the same multiset of substrings. The problem is fundamental to several variants of edit distance with block operations, e.g., signed reversal distance with duplicates and edit distance with moves. The minimum common string partition problem is known to be NP-complete, and the best known approximation algorithm has an approximation factor of O(log n log* n). Since the problem is of utmost practical importance, one seeks a heuristic that (1) usually achieves a low approximation factor and (2) runs fast. A simple greedy algorithm is known, which iteratively chooses non-overlapping longest common substrings of the input strings. This algorithm has been well studied from an approximation point of view, and it has been shown to have a bad worst-case approximation factor. However, all the bad approximation factors presented so far stem from complicated recursive constructions, and in practice the greedy algorithm seems to achieve small approximation factors. Yet the best current implementation of greedy runs in quadratic time. We propose a novel method to implement greedy in linear time.
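To make the greedy rule concrete, the following is a minimal, deliberately naive sketch of it in Python. It is not the linear-time method proposed in the paper, nor the quadratic implementation it improves on: each round runs an O(|S|·|T|) longest-common-substring dynamic program restricted to still-uncovered positions, so the whole loop is cubic in the worst case. The function name `greedy_mcsp`, the `(start_in_s, start_in_t, length)` block format, and the tie-breaking rule (the first maximum found when scanning S left to right) are illustrative assumptions, not details from the source.

```python
from typing import List, Tuple


def greedy_mcsp(s: str, t: str) -> List[Tuple[int, int, int]]:
    """Naive greedy heuristic for minimum common string partition (sketch).

    Hypothetical helper: repeatedly fixes a longest common substring that
    avoids every position already covered in s and t. Returns the chosen
    blocks as (start_in_s, start_in_t, length), in the order picked.
    """
    if sorted(s) != sorted(t):
        raise ValueError("s and t must have the same character statistics")
    n, m = len(s), len(t)
    used_s = [False] * n
    used_t = [False] * m
    blocks: List[Tuple[int, int, int]] = []
    covered = 0
    while covered < n:
        # Row-by-row DP: cur[j] is the length of the longest common suffix
        # of s[:i] and t[:j] that uses only uncovered positions.
        best_len, best_i, best_j = 0, 0, 0
        prev = [0] * (m + 1)
        for i in range(1, n + 1):
            cur = [0] * (m + 1)
            if not used_s[i - 1]:
                for j in range(1, m + 1):
                    if not used_t[j - 1] and s[i - 1] == t[j - 1]:
                        cur[j] = prev[j - 1] + 1
                        # Strict '>' keeps the first maximum found (assumed tie-break).
                        if cur[j] > best_len:
                            best_len, best_i, best_j = cur[j], i, j
            prev = cur
        # The uncovered parts of s and t always have equal character
        # multisets, so while anything is uncovered best_len >= 1.
        start_s, start_t = best_i - best_len, best_j - best_len
        for k in range(best_len):
            used_s[start_s + k] = True
            used_t[start_t + k] = True
        blocks.append((start_s, start_t, best_len))
        covered += best_len
    return blocks


if __name__ == "__main__":
    print(greedy_mcsp("abab", "baba"))  # [(0, 1, 3), (3, 0, 1)]
```

On S = "abab" and T = "baba", the sketch first fixes "aba" (S[0:3] = T[1:4]) and then the remaining "b", giving a common partition with two blocks.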
Original language | English |
---|---|
Pages (from-to) | 98-107 |
Number of pages | 10 |
Journal | Theoretical Computer Science |
Volume | 542 |
Issue number | C |
DOIs | |
State | Published - 2014 |
Bibliographical note
Publisher Copyright: © 2014 Published by Elsevier B.V.
Funding
Funders | Funder number |
---|---|
Bloom's Syndrome Foundation | 2010437 |
German-Israeli Foundation for Scientific Research and Development | 1147/2011 |
Keywords
- Approximation algorithm
- Pattern matching
- Strings