Approximate Hashing for Bioinformatics

Guy Arbitman, Shmuel T. Klein, Pierre Peterlongo, Dana Shapira

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review


The paper extends ideas from data compression by deduplication to the field of bioinformatics. The specific problems on which we show our approach to be useful are the clustering of a large set of DNA strings and the search for approximate matches of long substrings, both based on the design of what we call an approximate hashing function. The outcome of the new procedure is very similar to the clustering and search results obtained by accurate tools, but it is obtained in much less time and with less memory.

Original language: English
Title of host publication: Implementation and Application of Automata - 25th International Conference, CIAA 2021, Proceedings
Editors: Sebastian Maneth
Publisher: Springer Science and Business Media Deutschland GmbH
Number of pages: 12
ISBN (Print): 9783030791209
State: Published - 2021
Event: 25th International Conference on Implementation and Application of Automata, CIAA 2021 - Virtual, Online
Duration: 19 Jul 2021 to 22 Jul 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12803 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 25th International Conference on Implementation and Application of Automata, CIAA 2021
City: Virtual, Online

Bibliographical note

Publisher Copyright:
© 2021, Springer Nature Switzerland AG.


