15618 Project
Summary

We will implement a parallel version of Dijkstra's algorithm on both a PIM (processing-in-memory) system and a GPU, and compare the two architectures on this graph workload.

Challenge

Although Dijkstra's algorithm is a well-studied shortest-path graph algorithm, its parallel implementation faces several challenges:

Memory Access Characteristics: Graph search often exhibits irregular memory access patterns, since a vertex's neighbors are distributed arbitrarily in memory, yielding limited temporal and spatial locality.

Divergent Execution: Paths in a graph have varying lengths, so some threads may exit early because their paths involve fewer vertices.

Workload Imbalance: Graph search is inherently imbalanced, because the number of edges incident to each vertex varies with the graph topology.

Communication Overhead: Both the GPU and the DPU have constrained communication bandwidth when the data to transfer is large. Synchronization between threads to update the tentative shortest-path distances also requires communication.

Goals and Deliverables
We plan to implement Dijkstra's algorithm on the two architectures.
The deliverables are the CUDA implementation and the PIM implementation of the algorithm, together with an analysis comparing our parallel implementations across the two architectures. We will present speedup graphs and profiling results, and discuss the findings.
Resources

The resources we need for this project are the GHC/PSC machine clusters, with OpenMP and MPI installed and GPU access enabled, as well as access to the PIM system.

Schedule

Week of March 31:

Develop the CUDA version of Dijkstra's algorithm.

Week of April 7:

Finish developing the CUDA version of Dijkstra's algorithm, and start the PIM version.

Week of April 14:

Finish the PIM version of Dijkstra's algorithm.

Week of April 21:

Collect experimental data and compare performance across the two architectures. Finish writing the final report.

May 5:

Final Project Report Due.