Reducing Training Cost and Improving Inference Speed Through Neural Network Compression

dc.contributor.advisor: Zong, Ziliang
dc.contributor.author: Blakeney, Cody
dc.contributor.committeeMember: Yan, Yan
dc.contributor.committeeMember: Islam, Tanzima
dc.contributor.committeeMember: Metsis, Vangelis
dc.date.accessioned: 2023-05-04T13:28:00Z
dc.date.available: 2023-05-04T13:28:00Z
dc.date.issued: 2023-05
dc.description.abstract: As AI models have become integral to many software applications used in everyday life, the need for ways to run these computationally intensive applications on mobile and edge devices has grown. To meet this need, the research area of neural network compression has emerged, and techniques such as quantization, pruning, and model distillation have become standard. However, these methods have several drawbacks: many require specialized hardware for inference, reduce robustness to adversarial examples, amplify existing model biases, and demand significant retraining in a time-consuming, iterative process. This dissertation explores several shortcomings of model compression, shows how to address them, and ultimately provides a simple, repeatable recipe for creating high-quality neural network models for inference. It shows that model pruning is not a true compression process; in fact, pruning changes model representations so much that they differ from the original as much as those of a new model trained from a random initialization. It explores the unwanted side effects pruning can cause and how knowledge distillation can be used to mitigate them. It demonstrates how compression with higher fidelity to the original model can be achieved while decomposing compression into a highly efficient, parallelizable process that replaces sections of the model in a block-wise fashion. Finally, it examines how knowledge distillation can be used during training to improve training efficiency, amortize the cost of hyper-parameter searches, and provide state-of-the-art compression results.
dc.description.department: Computer Science
dc.format: Text
dc.format.extent: 118 pages
dc.format.medium: 1 file (.pdf)
dc.identifier.citation: Blakeney, C. (2023). Reducing training cost and improving inference speed through neural network compression (Unpublished dissertation). Texas State University, San Marcos, Texas.
dc.identifier.uri: https://hdl.handle.net/10877/16696
dc.language.iso: en
dc.subject: machine learning
dc.subject: deep learning
dc.subject: neural network
dc.subject: compression
dc.subject: pruning
dc.subject: distillation
dc.subject: knowledge distillation
dc.title: Reducing Training Cost and Improving Inference Speed Through Neural Network Compression
dc.type: Dissertation
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Texas State University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: BLAKENEY-DISSERTATION-2023.pdf
Size: 4.94 MB
Format: Adobe Portable Document Format