Searching in compressed dictionaries

S. T. Klein, D. Shapira

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

12 Scopus citations

Abstract

We introduce two new methods to represent a prefix omission method (POM) file so that direct search can be done in these compressed dictionaries. Processing with the Fibonacci variant is typically twice as fast as with the Huffman-based algorithm, and also twice as fast as decoding a Huffman-encoded POM file and searching the uncompressed version. For small files, which are the important application since dictionaries are usually kept in small chunks, the Fibonacci variant is much faster than either decoding and then searching or the POM-Huffman method. Although the compression performance might be slightly inferior to that of the character version of Huffman (but still generally better than the bit version), this might well be a price worth paying for faster processing.
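The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce their file layout or search procedure. It assumes the common reading of prefix omission (each word in a sorted list is stored as the length of the prefix it shares with the previous word, plus the remaining suffix) and the standard Fibonacci code for positive integers. The function names `pom_encode`, `pom_decode`, and `fib_encode` are hypothetical.

```python
def pom_encode(words):
    """Prefix omission: store (shared-prefix length, remaining suffix) per word."""
    pairs = []
    prev = ""
    for w in words:
        k = 0
        limit = min(len(prev), len(w))
        # Count the leading characters shared with the previous word.
        while k < limit and prev[k] == w[k]:
            k += 1
        pairs.append((k, w[k:]))
        prev = w
    return pairs


def pom_decode(pairs):
    """Rebuild the word list by copying k characters from the previous word."""
    words = []
    prev = ""
    for k, suffix in pairs:
        w = prev[:k] + suffix
        words.append(w)
        prev = w
    return words


def fib_encode(n):
    """Fibonacci code of a positive integer as a bit string.

    The Zeckendorf representation is written least-significant term first
    and a terminating 1 is appended, so every codeword ends in "11".
    """
    if n < 1:
        raise ValueError("Fibonacci codes are defined for positive integers")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # drop the first Fibonacci number that exceeds n
    bits = ["0"] * len(fibs)
    # Greedy decomposition from the largest Fibonacci number downwards.
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= n:
            bits[i] = "1"
            n -= fibs[i]
    return "".join(bits) + "1"


if __name__ == "__main__":
    sample = ["compress", "compressed", "compression", "computer"]
    encoded = pom_encode(sample)
    print(encoded)
    print(pom_decode(encoded) == sample)
    print([fib_encode(n) for n in (1, 2, 3, 4, 11)])
```

Because no Fibonacci codeword contains two adjacent 1-bits except at its end, codeword boundaries can be recognized directly in the bit stream. This is one plausible reason such a code suits searching without a full decoding pass, though the abstract itself does not spell out the mechanism.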

Original language: English
Title of host publication: Proceedings - DCC 2002
Subtitle of host publication: Data Compression Conference
Editors: James A. Storer, Martin Cohn
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 142-151
Number of pages: 10
ISBN (Electronic): 0769514774
State: Published - 2002
Event: Data Compression Conference, DCC 2002 - Snowbird, United States
Duration: 2 Apr 2002 to 4 Apr 2002

Publication series

Name: Data Compression Conference Proceedings
Volume: 2002-January
ISSN (Print): 1068-0314

Conference

Conference: Data Compression Conference, DCC 2002
Country/Territory: United States
City: Snowbird
Period: 2/04/02 to 4/04/02

Bibliographical note

Publisher Copyright:
© 2002 IEEE.

Keywords

  • Computer science
  • Decoding
  • Dictionaries
  • Encoding
  • Gallium nitride
  • Information retrieval
  • Large-scale systems
  • Natural languages
  • Pattern matching
  • Production systems
