Abstract
The Graphics Processing Units (GPUs) provide highcomputation power at a low cost and is an important computeaccelerator with a massively multithreaded architecture. Inthis paper, we present fast implementations of common graphoperations like breadth-first search, st-connectivity, single-sourceshortest path, all-pairs shortest path, minimum spanning tree,and maximum flow for undirected graphs on the GPU using theCUDA programming model. Our implementations exhibit highperformance, especially on large graphs. We use two data-parallelprogramming methodologies for these algorithms. One is aniterative, mask-based approach that processes valid data elementslike vertices and edges using independent threads for each.The other is a divide-and-conquer approach that reduces theproblem into smaller problems that are handled later using thesame approach. Parallel algorithms for such problems have beenreported in the literature before, especially on supercomputers.The massively multithreaded model of the GPU makes it possibleto adopt the data-parallel approach even to irregular algorithmslike graph algorithms, usingO(V)orO(E)simultaneous threads.The algorithms and the underlying techniques presented in thispaper are likely to be applicable to many irregular algorithms.We show the impact of our implementations on random, scale-free, and real-life graphs of up to millions of vertices on high-end and low-end GPUs. The availability and spread of GPUs todesktops and laptops make them ideal candidates to accelerategraph operations over the CPU-only implementations. Practicalimplementations of basic operations go a long way in realizingtheir potential.