Abstract
We introduce two new methods to represent a prefix omission method (POM) file so that direct search can be done in these compressed dictionaries. Processing with the Fibonacci variant is typically twice as fast as with the Huffman-based algorithm, and also twice as fast as decoding a Huffman-encoded POM file and searching the uncompressed version. For small files, which are the important application since dictionaries are usually kept in small chunks, the Fibonacci variant is much faster than either decoding and then searching or the POM-Huffman method. Although its compression performance may be slightly inferior to that of the character version of Huffman (but still generally better than the bit version), this may well be a price worth paying for faster processing.
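The abstract does not spell out the encodings, so the following is only a minimal sketch of the two building blocks it names, not the paper's actual file layout or search procedure. It assumes that POM replaces each word of a sorted dictionary by the length of the prefix it shares with the previous word plus the remaining suffix, and that integers are written with the standard Fibonacci code (Zeckendorf representation, least significant bit first, followed by a terminating 1). The function names `pom_encode`, `pom_decode` and `fibonacci_code` are hypothetical.

```python
def pom_encode(words):
    """Prefix omission: store (shared-prefix length, remaining suffix) per word."""
    encoded, prev = [], ""
    for word in words:
        k = 0
        # Count how many leading characters this word shares with the previous one.
        while k < min(len(prev), len(word)) and prev[k] == word[k]:
            k += 1
        encoded.append((k, word[k:]))
        prev = word
    return encoded


def pom_decode(pairs):
    """Rebuild the word list by copying k characters from the previous word."""
    words, prev = [], ""
    for k, suffix in pairs:
        prev = prev[:k] + suffix
        words.append(prev)
    return words


def fibonacci_code(n):
    """Standard Fibonacci codeword for an integer n >= 1, as a bit string."""
    if n < 1:
        raise ValueError("Fibonacci codes are defined for n >= 1")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    # Greedy Zeckendorf decomposition, largest Fibonacci number first.
    bits = []
    for f in reversed(fibs):
        if f <= n:
            bits.append("1")
            n -= f
        else:
            bits.append("0")
    # Drop leading zeros, reverse to least-significant-first, add the terminating 1.
    return "".join(bits).lstrip("0")[::-1] + "1"


if __name__ == "__main__":
    words = ["compress", "compressed", "compression", "computer"]
    pairs = pom_encode(words)
    print(pairs)
    # Assumption: prefix lengths are shifted by one because the code needs n >= 1.
    print([fibonacci_code(k + 1) for k, _ in pairs])
```

Because every Fibonacci codeword ends in `11` and contains no other adjacent pair of 1-bits, codeword boundaries can be recognized without decoding from the start of the file. This property is commonly cited as what makes searching directly in the compressed form practical.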
Original language | English |
---|---|
Title of host publication | Proceedings - DCC 2002 |
Subtitle of host publication | Data Compression Conference |
Editors | James A. Storer, Martin Cohn |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 142-151 |
Number of pages | 10 |
ISBN (Electronic) | 0769514774 |
State | Published - 2002 |
Event | Data Compression Conference, DCC 2002 - Snowbird, United States; Duration: 2 Apr 2002 → 4 Apr 2002 |
Publication series
Name | Data Compression Conference Proceedings |
---|---|
Volume | 2002-January |
ISSN (Print) | 1068-0314 |
Conference
Conference | Data Compression Conference, DCC 2002 |
---|---|
Country/Territory | United States |
City | Snowbird |
Period | 2 Apr 2002 → 4 Apr 2002 |
Bibliographical note
Publisher Copyright: © 2002 IEEE.
Keywords
- Computer science
- Decoding
- Dictionaries
- Encoding
- Gallium nitride
- Information retrieval
- Large-scale systems
- Natural languages
- Pattern matching
- Production systems