Adaptive Single-Pass Compression of Unbounded Integers

Date

2017-12

Authors

Hyatt, Christopher

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Due to webpage creation, data gathering and storage, and the increasing data needed for machine learning, the amount of data in the world is rapidly growing. Organizations such as Google, Wikipedia, and the National Security Agency (NSA) use techniques to convert all of this data into searchable indexes of references. These indexes are growing as quickly as the data used to create them, and are an equal subject for compression. This research explores the ability to dynamically compress data in one pass. This, effectively improves latency and throughput. To achieve dynamic compression in one pass, the combination of Tunstall and two other compression algorithms, Elias-Delta code and a variation of Group VarInt, are used. The two resulting pairs are Variable length nibbles with Tunstall (VLNT) and Delta-Tunstall (δ-T). This is the first known attempt at compressing data using these algorithm pairs. These compression algorithms are applied to a number of different datasets. A synthetic dataset and a Wikipedia dataset are used to test the algorithms’ ability to compress integers. The synthetic dataset is created using several probability distribution functions (PDF), such as geometric and Poisson. Meanwhile, the Wikipedia dataset is acquired from actual inverted indexes for a number of different Wikipedia search terms. VLNT and d-T are also used to compress the members of the Silesia benchmarks dataset. VLNT and δ-T show promise as good platforms for data compression and are recommended for future research focusing on single-pass compression of unbounded datasets.

Description

Keywords

Single-pass compression, Unbounded integers, Compression

Citation

Hyatt, C. R. (2017). <i>Adaptive single-pass compression of unbounded integers</i> (Unpublished thesis). Texas State University, San Marcos, Texas.

Rights

Rights Holder

Rights License

Rights URI