String Factorization via Prefix Free Families

Matan Kraus, Moshe Lewenstein, Alexandru Popa, Ely Porat, Yonathan Sadia

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

A factorization of a string S is a partition of S into substrings u_1, ..., u_k such that S = u_1 u_2 ··· u_k. Such a partition is called equality-free if no two factors are equal: u_i ≠ u_j for all i, j with i ≠ j. The maximum equality-free factorization problem is to find, for a given string S, the largest integer k for which S admits an equality-free factorization with k factors. Equality-free factorizations have lately received attention because of their applications in DNA self-assembly. The best approximation algorithm known for the problem is the natural greedy algorithm, which iteratively chooses, from left to right, the shortest factor that has not appeared before. This algorithm has a √n approximation ratio (SOFSEM 2020), and it is an open problem whether there is a better solution. Our main result is to show that the natural greedy algorithm is a Θ(n^{1/4}) approximation algorithm for the maximum equality-free factorization problem. Thus, we disprove one of the conjectures of Mincu and Popa (SOFSEM 2020), according to which the greedy algorithm is a Θ(√n) approximation. The most challenging part of the proof is to show that the greedy algorithm is an O(n^{1/4}) approximation. We obtain this bound via prefix-free factor families, i.e., sets of non-overlapping factors of the string, none of which is a prefix of another. In the paper we show the relation between prefix-free factor families and the maximum equality-free factorization. Moreover, as a byproduct, we present another approximation algorithm that achieves an approximation ratio of O(n^{1/4}), which we believe is of independent interest and may lead to improved algorithms. We then show that the natural greedy algorithm has an approximation ratio that is Ω(n^{1/4}); together with the upper bound, this shows that the greedy algorithm is a Θ(n^{1/4}) approximation for the maximum equality-free factorization problem.
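
The greedy rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `greedy_equality_free` is hypothetical, and the handling of the tail (merging a leftover suffix into the last factor when every prefix of that suffix has already been used) is an assumption, since the abstract does not say how the end of the string is treated.

```python
from typing import List


def greedy_equality_free(s: str) -> List[str]:
    """Left-to-right greedy: always take the shortest factor not used so far."""
    factors: List[str] = []
    seen = set()
    i, n = 0, len(s)
    while i < n:
        # Extend the candidate factor s[i:j] until it has not been used before.
        j = i + 1
        while j <= n and s[i:j] in seen:
            j += 1
        if j > n:
            # Assumption: every prefix of the remaining suffix is already a
            # factor, so the suffix is appended to the last factor instead.
            last = factors.pop()
            seen.discard(last)
            merged = last + s[i:]
            factors.append(merged)
            seen.add(merged)
            break
        factors.append(s[i:j])
        seen.add(s[i:j])
        i = j
    return factors


if __name__ == "__main__":
    for text in ["abab", "aaaa", "aab"]:
        parts = greedy_equality_free(text)
        print(text, parts, len(parts))
```

For example, on "abab" the sketch picks "a", "b", "ab" (three factors), while on "aaaa" it picks "a" and "aa" and then merges the leftover "a" into the last factor, giving "a", "aaa". The sketch uses plain substring slicing and a hash set, so it is meant to show the rule, not to be efficient.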

Original language: English
Title of host publication: 34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023
Editors: Laurent Bulteau, Zsuzsanna Liptak
Publisher: Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic): 9783959772761
DOIs
State: Published - Jun 2023
Event: 34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023 - Marne-la-Vallee, France
Duration: 26 Jun 2023 to 28 Jun 2023

Publication series

Name: Leibniz International Proceedings in Informatics, LIPIcs
Volume: 259
ISSN (Print): 1868-8969

Conference

Conference: 34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023
Country/Territory: France
City: Marne-la-Vallee
Period: 26/06/23 to 28/06/23

Bibliographical note

Publisher Copyright:
© 2023 Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. All rights reserved.

Funding

This work was supported by a grant of the Ministry of Research, Innovation and Digitization, CNCS - UEFISCDI, project number PN-III-P1-1.1-TE-2021-0253, within PNCDI III. Matan Kraus, Ely Porat and Yonathan Sadia were supported by ISF grants no. 1278/16 and 1926/19, by a BSF grant 2018364, and by an ERC grant MPM under the EU’s Horizon 2020 Research and Innovation Programme (grant no. 683064).

Funders (funder number)
Ministry of Research
Corporation for National and Community Service
European Commission
United States-Israel Binational Science Foundation: 2018364
Israel Science Foundation: 1926/19, 1278/16
Unitatea Executiva pentru Finantarea Invatamantului Superior, a Cercetarii, Dezvoltarii si Inovarii: PN-III-P1-1.1-TE-2021-0253
Horizon 2020: 683064

    Keywords

    • NP-hard problem
    • approximation algorithm
    • string factorization
