Design and Performance Analysis of a Hardware Accelerator for Deep Neural Networks on a Heterogeneous Platform
Abstract
This thesis describes a new, flexible approach to implementing an energy-efficient DNN accelerator on FPGAs. Our design leverages the Coherent Accelerator Processor Interface (CAPI), which provides attached accelerators with a cache-coherent view of system memory. Computational kernels are accelerated on a CAPI-supported Kintex FPGA board. Our implementation bypasses the need for device-driver code and significantly reduces communication and I/O transfer overhead. To improve the performance of the entire application, we propose a collaborative execution model in which control of the data flow within the accelerator is kept independent, freeing up CPU cores to work on other parts of the application. For further performance gains, we propose a technique that exploits data locality in the cache located in the CAPI Power Service Layer (PSL). Finally, we develop a resource-conscious implementation for more efficient resource utilization and improved scalability. Compared with prior work, our architecture achieves both higher performance and better power efficiency.
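As an illustrative sketch only (not code from the thesis), the host-side interaction with a CAPI-attached accelerator function unit (AFU) through IBM's user-space libcxl library might look like the following. The AFU device path, the MMIO "start" register offset, and the layout of the work element descriptor (WED) are hypothetical; the sketch shows how a plain host pointer can be handed to the AFU, which then reads operands coherently from system memory without a custom device driver, while the CPU remains free for other work.

/* Hypothetical host program driving a CAPI AFU via libcxl. */
#include <libcxl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical WED layout: because the AFU sees coherent system
 * memory, ordinary host pointers can be passed directly. */
struct dnn_wed {
    volatile uint64_t done;   /* completion flag written by the AFU */
    const float *weights;     /* layer weights in host memory       */
    const float *activations; /* input activations in host memory   */
    float *output;            /* result buffer in host memory       */
    uint64_t rows, cols;      /* layer dimensions                   */
};

int main(void)
{
    struct cxl_afu_h *afu = cxl_afu_open_dev("/dev/cxl/afu0.0d");
    if (!afu) { perror("cxl_afu_open_dev"); return 1; }

    /* The WED should be cacheline (128-byte) aligned on POWER8;
     * 128 bytes also covers sizeof(struct dnn_wed). */
    struct dnn_wed *wed = aligned_alloc(128, 128);
    if (!wed) { perror("aligned_alloc"); return 1; }
    /* ... fill in weights/activations/output/rows/cols here ... */
    wed->done = 0;

    /* Attach: the WED pointer is handed to the AFU, which then
     * fetches descriptors and data coherently on its own. */
    if (cxl_afu_attach(afu, (uint64_t)(uintptr_t)wed)) {
        perror("cxl_afu_attach"); return 1;
    }
    cxl_mmio_map(afu, CXL_MMIO_BIG_ENDIAN);
    cxl_mmio_write64(afu, 0x0, 1); /* hypothetical "start" register */

    /* Collaborative execution: the accelerator controls its own
     * data flow, so the CPU could run other parts of the
     * application here; this sketch simply polls for completion. */
    while (!wed->done)
        ; /* e.g., do_other_application_work(); */

    cxl_mmio_unmap(afu);
    cxl_afu_free(afu);
    free(wed);
    return 0;
}

In a real design the busy-wait loop would be replaced by useful application work or an interrupt-driven wait, which is what makes the collaborative model pay off.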