Abstract
                                                                        The Graphics Processing Units (GPUs) provide highcomputation  power  at  a  low  cost  and  is  an  important  computeaccelerator   with   a   massively   multithreaded   architecture.   Inthis  paper,  we  present  fast  implementations  of  common  graphoperations like breadth-first search, st-connectivity, single-sourceshortest  path,  all-pairs  shortest  path,  minimum  spanning  tree,and maximum flow for undirected graphs on the GPU using theCUDA  programming  model.  Our  implementations  exhibit  highperformance, especially on large graphs. We use two data-parallelprogramming  methodologies  for  these  algorithms.  One  is  aniterative, mask-based approach that processes valid data elementslike   vertices   and   edges   using   independent   threads   for   each.The  other  is  a  divide-and-conquer  approach  that  reduces  theproblem  into  smaller  problems  that  are  handled  later  using  thesame approach. Parallel algorithms for such problems have beenreported  in  the  literature  before,  especially  on  supercomputers.The massively multithreaded model of the GPU makes it possibleto adopt the data-parallel approach even to irregular algorithmslike graph algorithms, usingO(V)orO(E)simultaneous threads.The  algorithms  and  the  underlying  techniques  presented  in  thispaper  are  likely  to  be  applicable  to  many  irregular  algorithms.We  show  the  impact  of  our  implementations  on  random,  scale-free,  and  real-life  graphs  of  up  to  millions  of  vertices  on  high-end and low-end GPUs. The availability and spread of GPUs todesktops  and  laptops  make  them  ideal  candidates  to  accelerategraph  operations  over  the  CPU-only  implementations.  Practicalimplementations  of  basic  operations  go  a  long  way  in  realizingtheir potential.