Adaptive Single-Pass Compression of Unbounded Integers

dc.contributor.advisor: Tamir, Dan
dc.contributor.author: Hyatt, Christopher
dc.contributor.committeeMember: Yan, Yan
dc.contributor.committeeMember: Qasem, Apan
dc.date.accessioned: 2022-01-12T20:58:16Z
dc.date.available: 2022-01-12T20:58:16Z
dc.date.issued: 2017-12
dc.description.abstract: Due to webpage creation, data gathering and storage, and the increasing data needed for machine learning, the amount of data in the world is rapidly growing. Organizations such as Google, Wikipedia, and the National Security Agency (NSA) use techniques to convert all of this data into searchable indexes of references. These indexes are growing as quickly as the data used to create them and are equally good candidates for compression. This research explores the ability to dynamically compress data in one pass, which effectively improves latency and throughput. To achieve dynamic compression in one pass, Tunstall coding is combined with two other compression algorithms: Elias-Delta code and a variation of Group VarInt. The two resulting pairs are Variable-Length Nibbles with Tunstall (VLNT) and Delta-Tunstall (δ-T). This is the first known attempt at compressing data using these algorithm pairs. The compression algorithms are applied to several datasets. A synthetic dataset and a Wikipedia dataset are used to test the algorithms’ ability to compress integers. The synthetic dataset is created using several probability distribution functions (PDFs), such as geometric and Poisson. The Wikipedia dataset is acquired from actual inverted indexes for a number of different Wikipedia search terms. VLNT and δ-T are also used to compress the members of the Silesia benchmark dataset. VLNT and δ-T show promise as platforms for data compression and are recommended for future research focusing on single-pass compression of unbounded datasets.
dc.description.department: Computer Science
dc.format: Text
dc.format.extent: 66 pages
dc.format.medium: 1 file (.pdf)
dc.identifier.citation: Hyatt, C. R. (2017). Adaptive single-pass compression of unbounded integers (Unpublished thesis). Texas State University, San Marcos, Texas.
dc.identifier.uri: https://hdl.handle.net/10877/15142
dc.language.iso: en
dc.subject: Single-pass compression
dc.subject: Unbounded integers
dc.subject: Compression
dc.title: Adaptive Single-Pass Compression of Unbounded Integers
dc.type: Thesis
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Texas State University
thesis.degree.level: Masters
thesis.degree.name: Master of Science
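
The abstract names Elias-Delta code as one of the building blocks of δ-T. As a rough illustration only, the sketch below implements the standard Elias delta code for positive integers: a unary-prefixed length of the length, followed by the value's remaining bits. It does not reproduce the thesis's δ-T or VLNT pairing with Tunstall coding, and the function names and the use of bit strings instead of packed bytes are assumptions made for readability.

```python
def elias_delta_encode(n: int) -> str:
    """Return the standard Elias delta code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias delta is defined for integers >= 1")
    n_bits = n.bit_length()         # number of bits in n
    len_bits = n_bits.bit_length()  # number of bits needed to write n_bits
    # (len_bits - 1) zeros, then n_bits in binary, then n without its leading 1.
    return "0" * (len_bits - 1) + format(n_bits, "b") + format(n, "b")[1:]


def elias_delta_decode(bits: str) -> list[int]:
    """Decode a concatenation of Elias delta codewords in a single left-to-right pass."""
    values = []
    pos = 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":     # count the unary prefix
            zeros += 1
            pos += 1
        n_bits = int(bits[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        # Restore the implicit leading 1 of the value.
        values.append(int("1" + bits[pos:pos + n_bits - 1], 2))
        pos += n_bits - 1
    return values


# Hypothetical sample values, not taken from the thesis datasets.
sample = [1, 2, 3, 10, 1000, 2**40]
stream = "".join(elias_delta_encode(v) for v in sample)
decoded = elias_delta_decode(stream)
```

Because each codeword carries its own length, no upper bound on the integers has to be known in advance, which is why codes of this family suit unbounded inputs read in one pass.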

Files

Original bundle

Name:
HYATT-THESIS-2017.pdf
Size:
3.61 MB
Format:
Adobe Portable Document Format
