A Parallel Implementation of A Greedy TSP Algorithm
Abstract
The Traveling Salesman Problem has often been used as an exploration ground for building heuristics to calculate the shortest path of a complete graph that traverse every vertex exactly once.
This has a number of important practical applications. Since it is an NP-hard problem, many heuristics have been proposed to obtain near-optimal solutions in polynomial time.
The heuristic explored in this study is based on Kruskal’s algorithm, where the edges of a graph are sorted in non-decreasing order. The smallest edges that meet the eligibility criteria are included in the path until a tour that includes all vertices has been constructed. Whether an edge meets the criteria depends on which smaller edges have been inserted. This makes the algorithm difficult to parallelize.
I combined previously published with new parallelization techniques and implemented the algorithm in OpenMP and CUDA. The resulting GPU code is very fast, taking just 0.06 seconds to process a graph with 11,849 vertices and 70,193,476 edges. This is approximately 8 times faster than the serial CPU implementation and has a solution quality that is within 18% of the optimal. Compared to the optimal solver CONCORDE, my code is 1,206,302 times faster.