Quick greedy computation for minimum common string partition

Isaac Goldstein, Moshe Lewenstein

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

In the minimum common string partition problem, one is given two strings S and T with the same character statistics and seeks the smallest partition of S into substrings such that T can also be partitioned into the same multiset of substrings. The problem is fundamental in several variants of edit distance with block operations, e.g. signed reversal distance with duplicates and edit distance with moves. The minimum common string partition problem is known to be NP-complete, and the best known approximation algorithm has an approximation factor of O(log n log* n). Since the problem is of utmost practical importance, one seeks a heuristic that (1) usually has a low approximation factor and (2) runs fast. A simple greedy algorithm is known, which iteratively chooses non-overlapping longest common substrings of the input strings. This algorithm has been well studied from an approximation point of view, and it has been shown to have a bad worst-case approximation factor. However, all the bad approximation factors presented so far stem from complicated recursive constructions. In practice, the greedy algorithm seems to have small approximation factors. However, the best current implementation of greedy runs in quadratic time. We propose a novel method to implement greedy in linear time.
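The greedy heuristic described in the abstract can be illustrated with a short sketch. This is a deliberately naive version, not the linear-time method the paper proposes: it recomputes a longest common substring of the still-uncovered positions with a dynamic-programming table on every round, so it is roughly cubic in the worst case. The function name `greedy_mcsp`, the tie-breaking rule (first match found in scan order), and the returned tuple layout are assumptions made for this illustration.

```python
def greedy_mcsp(s, t):
    """Naive greedy common string partition (illustrative sketch only).

    Returns a list of (block, start_in_s, start_in_t) tuples in the order
    the blocks were chosen. Ties are broken by scan order (an assumption).
    """
    if sorted(s) != sorted(t):
        raise ValueError("inputs must have the same character counts")

    used_s = [False] * len(s)   # positions of s already covered by a block
    used_t = [False] * len(t)   # positions of t already covered by a block
    blocks = []
    remaining = len(s)

    while remaining > 0:
        best_len, best_i, best_j = 0, 0, 0
        # prev[j + 1] = length of the common run ending at s[i - 1], t[j]
        prev = [0] * (len(t) + 1)
        for i in range(len(s)):
            cur = [0] * (len(t) + 1)
            if not used_s[i]:
                for j in range(len(t)):
                    if not used_t[j] and s[i] == t[j]:
                        cur[j + 1] = prev[j] + 1
                        if cur[j + 1] > best_len:
                            best_len = cur[j + 1]
                            best_i = i - best_len + 1
                            best_j = j - best_len + 1
            prev = cur

        # Mark the chosen block as covered in both strings.
        for k in range(best_len):
            used_s[best_i + k] = True
            used_t[best_j + k] = True
        blocks.append((s[best_i:best_i + best_len], best_i, best_j))
        remaining -= best_len

    return blocks


if __name__ == "__main__":
    print(greedy_mcsp("abcdab", "ababcd"))
```

For S = "abcdab" and T = "ababcd", the sketch first picks the block "abcd" and then "ab", giving a common partition of size 2. Because covered positions reset the run length to zero, a chosen block can never span a previously chosen one.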

Original language: English
Pages (from-to): 98-107
Number of pages: 10
Journal: Theoretical Computer Science
Volume: 542
Issue number: C
State: Published - 2014

Bibliographical note

Publisher Copyright:
© 2014 Published by Elsevier B.V.

Funding

Funder: Funder number
Bloom's Syndrome Foundation: 2010437
German-Israeli Foundation for Scientific Research and Development: 1147/2011

    Keywords

    • Approximation algorithm
    • Pattern matching
    • Strings
