An Efficient Connected Components Algorithm for Massively-Parallel Devices
MetadataShow full metadata
Massively-parallel devices such as GPUs are best suited for accelerating regular algorithms. Since the memory access patterns and control flow of irregular algorithms are data dependent, such programs are more difficult to parallelize in general and a direct parallelization may not yield good performance, on GPUs in particular. However, by carefully studying the underlying problem, it may be possible to derive new algorithms that are more suitable for massively-parallel accelerators. This thesis involves studying and analyzing such an irregular algorithm, called Connected Components, and proposes an efficient algorithm, called ECL, which is faster than the existing CC algorithms on most tested inputs. Though atomic operations are fast, they can represent a bottleneck as these operations run serially and might hinder performance in the future parallel devices. This thesis also proposes a synchronous and atomic-free algorithm, called ECLaf, whose performance is comparable to the fastest existing CC algorithm.