Approximate string matching with swap and mismatch

Ohad Lipsky, Benny Porat, Elly Porat, B. Riva Shalom, Asaf Tzur

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

Finding the similarity between two sequences is a major problem in computer science. It is motivated by many issues from computational biology as well as from information retrieval and image processing. These fields take into account possible corruptions of the data caused by genome rearrangements, typing mistakes, and more. Therefore, many applications require not an exact match between the sequences but an approximate one. We consider mismatches and swaps as natural errors, allowed in small numbers. The edit distance problem with swap and mismatch operations was discussed by Amir et al. [3], who solved it in O(n√log m) time. Since then, the problem of string matching with at most k swap and mismatch errors has remained open. In this paper we present an algorithm that finds all locations where the pattern has at most k mismatch and swap errors in time O(n√k log m).
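To make the problem statement concrete, the following is a minimal brute-force sketch, not the algorithm of the paper. It assumes the usual definition in which a swap exchanges two adjacent characters, a mismatch replaces one character, each operation costs 1, and every position takes part in at most one operation. The function names `swap_mismatch_distance` and `find_k_swap_mismatch` are hypothetical and chosen only for illustration. The sketch runs in O(nm) time, far slower than the bound stated in the abstract.

```python
def swap_mismatch_distance(pattern, window):
    """Minimum number of swaps plus mismatches turning pattern into window.

    Assumes both strings have the same length and that each position
    takes part in at most one operation (illustrative definition).
    """
    if len(pattern) != len(window):
        raise ValueError("pattern and window must have equal length")

    # prev2 and prev1 hold the optimal cost for the first i-2 and i-1 characters.
    prev2, prev1 = 0, 0
    for i in range(1, len(pattern) + 1):
        # Option 1: align position i-1 directly, paying 1 if the characters differ.
        cur = prev1 + (pattern[i - 1] != window[i - 1])
        # Option 2: explain positions i-2 and i-1 by one swap of adjacent characters.
        if (i >= 2
                and pattern[i - 2] == window[i - 1]
                and pattern[i - 1] == window[i - 2]):
            cur = min(cur, prev2 + 1)
        prev2, prev1 = prev1, cur
    return prev1


def find_k_swap_mismatch(text, pattern, k):
    """Return every start index where pattern matches text with at most k errors."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if swap_mismatch_distance(pattern, text[i:i + m]) <= k
    ]
```

For example, `find_k_swap_mismatch("xbacdabcd", "abcd", 1)` reports position 1 (where "bacd" differs from the pattern by one swap) and position 5 (an exact occurrence).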

Original language: English
Title of host publication: Algorithms and Computation - 18th International Symposium, ISAAC 2007, Proceedings
Publisher: Springer Verlag
Pages: 869-880
Number of pages: 12
ISBN (Print): 9783540771180
State: Published - 2007
Event: 18th International Symposium on Algorithms and Computation, ISAAC 2007 - Sendai, Japan
Duration: 17 Dec 2007 to 19 Dec 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4835 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Symposium on Algorithms and Computation, ISAAC 2007
Country/Territory: Japan
City: Sendai
Period: 17/12/07 to 19/12/07
