Reducing Training Cost and Improving Inference Speed Through Neural Network Compression

dc.contributor.advisor: Zong, Ziliang
dc.contributor.author: Blakeney, Cody
dc.contributor.committeeMember: Yan, Yan
dc.contributor.committeeMember: Islam, Tanzima
dc.contributor.committeeMember: Metsis, Vangelis
dc.date.accessioned: 2023-05-04T13:28:00Z
dc.date.available: 2023-05-04T13:28:00Z
dc.date.issued: 2023-05
dc.description.abstract: As AI models have become integral to many software applications used in everyday life, the need for ways to run these computationally intensive applications on mobile and edge devices has grown. To meet this need, the research area of neural network compression has emerged, and techniques such as quantization, pruning, and model distillation have become standard. However, these methods have several drawbacks: many require specialized hardware for inference, reduce robustness to adversarial examples, amplify existing model biases, and demand significant retraining in a time-consuming, iterative process. This dissertation explores several shortcomings of model compression, shows how to address them, and ultimately provides a simple, repeatable recipe for creating high-quality neural network models for inference. It shows that model pruning is not a true compression process; in fact, pruning changes model representations so much that they differ from the original as much as those of a new model trained from a random initialization. It explores the unwanted side effects pruning can cause and how knowledge distillation can be used to mitigate them. It demonstrates how compression with higher fidelity to the original model can be achieved while decomposing compression into a highly efficient, parallelizable process that replaces sections of the model in a block-wise fashion. Finally, it examines how knowledge distillation can be used during training to improve training efficiency, amortize the cost of hyper-parameter searches, and provide state-of-the-art compression results.
dc.description.department: Computer Science
dc.format: Text
dc.format.extent: 118 pages
dc.format.medium: 1 file (.pdf)
dc.identifier.citation: Blakeney, C. (2023). Reducing training cost and improving inference speed through neural network compression (Unpublished dissertation). Texas State University, San Marcos, Texas.
dc.identifier.uri: https://hdl.handle.net/10877/16696
dc.language.iso: en
dc.subject: machine learning
dc.subject: deep learning
dc.subject: neural network
dc.subject: compression
dc.subject: pruning
dc.subject: distillation
dc.subject: knowledge distillation
dc.title: Reducing Training Cost and Improving Inference Speed Through Neural Network Compression
dc.type: Dissertation
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Texas State University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: BLAKENEY-DISSERTATION-2023.pdf
Size: 4.94 MB
Format: Adobe Portable Document Format