Multi-GPU Parallelization of Irregular Algorithms
All programs possess a certain degree of irregularity in their control flow and memory access patterns. The more irregular a program is, the harder it tends to be to parallelize and port to accelerators such as Graphics Processing Units (GPUs). Additionally, efficient accelerator-based computing devices are rapidly spreading since they provide more performance and better energy efficiency than conventional computers. Multi-accelerator systems are already on the horizon and will likely be commonplace in the near future. Hence, it is important to learn how to efficiently run irregular computations on multi-accelerator platforms. I have rewritten four single-GPU programs, each with different amounts of irregularity, so that they can exploit multiple GPUs simultaneously. By analyzing shared variables and data dependencies within the programs, I was able to create a general approach for parallelizing programs across multiple accelerators. I then compared the performance of these codes against their single-GPU counterparts to determine the performance benefit and how irregularity impacts that benefit. My results show that mostly regular programs and programs that display control flow irregularity tend to obtain a significant performance boost. However, programs that display memory access irregularity tend not to gain any speedup from multiple GPUs.