Andrew IDs: svattiku, balasubs
For our final project, we will implement and analyze multiple parallel approaches to the graph coloring problem. We will develop traditional parallelization strategies alongside a novel transactional memory-inspired approach that uses optimistic execution with conflict resolution.
Graph coloring assigns colors to vertices such that no two adjacent vertices share the same color. This problem has many applications, including register allocation in compilers, scheduling problems, and frequency assignment in wireless networks. The minimum graph coloring problem (finding a coloring with the fewest colors) is NP-hard, so approximation algorithms are typically used.
The greedy approach is one of the most common approximation algorithms for graph coloring: it assigns each vertex the lowest color that does not conflict with its already-colored neighbors. While simple, this sequential approach can be slow for large graphs, as its runtime grows with graph size.
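As a concrete baseline, here is a minimal sketch of the sequential greedy algorithm in C++, assuming an adjacency-list representation (the function name and representation are our illustrative choices):

```cpp
#include <cassert>
#include <vector>

// Greedy coloring: for each vertex in order, pick the smallest color not
// already used by any colored neighbor. `adj` is an adjacency list.
std::vector<int> greedy_color(const std::vector<std::vector<int>>& adj) {
    int n = (int)adj.size();
    std::vector<int> color(n, -1);
    std::vector<bool> used(n, false);  // scratch: colors seen on neighbors
    for (int v = 0; v < n; ++v) {
        for (int u : adj[v])
            if (color[u] != -1) used[color[u]] = true;
        int c = 0;
        while (used[c]) ++c;           // smallest free color
        color[v] = c;
        for (int u : adj[v])           // reset scratch for the next vertex
            if (color[u] != -1) used[color[u]] = false;
    }
    return color;
}
```

On a triangle graph this assigns colors 0, 1, 2 in vertex order; each vertex scans only its own neighborhood, which is what makes the per-vertex work small but the overall process inherently sequential.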
We will implement the following traditional approaches using both OpenMP (for shared memory parallelism) and MPI (for distributed memory parallelism) to evaluate their performance characteristics, scalability, and effectiveness across different graph structures:
- Concept: Identifying sets of vertices that can be colored simultaneously without conflicts
- Implementation: Using OpenMP for thread-level parallelism and MPI for distributed identification and coloring of independent sets
- Challenge: Efficiently identifying maximal independent sets in parallel
- Concept: Partitioning the graph into subgraphs and coloring them in parallel
- Implementation: Using OpenMP for shared memory decomposition and MPI for distributed partitioning with special handling for boundary vertices
- Challenge: Managing cross-partition conflicts while maintaining load balance
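One possible (assumed, not final) boundary-handling scheme splits vertices into contiguous blocks and separates interior vertices, which a partition owner can color without coordination, from boundary vertices that need cross-partition handling:

```cpp
#include <cassert>
#include <vector>

// Mark vertices whose neighbors all fall in the same contiguous block;
// these "interior" vertices can be colored locally with no communication,
// while boundary vertices are deferred to a coordination phase.
std::vector<char> mark_interior(const std::vector<std::vector<int>>& adj,
                                int num_parts) {
    int n = (int)adj.size();
    int block = (n + num_parts - 1) / num_parts;  // block size per partition
    std::vector<char> interior(n, 1);
    for (int v = 0; v < n; ++v)
        for (int u : adj[v])
            if (u / block != v / block) {          // neighbor in another block
                interior[v] = 0;
                break;
            }
    return interior;
}
```

The ratio of boundary to interior vertices is exactly what drives the load-balance and conflict-management challenge noted above.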
Concept: Mimicking transactional memory systems by allowing threads to color vertices optimistically in parallel without synchronization, then detecting and resolving conflicts afterward
After initial coloring, a parallel scan over edges identifies pairs of adjacent vertices with the same color, recording conflicts in dedicated data structures. This post-processing approach uses atomic operations to mark conflicts without serializing the detection process, enabling efficient bulk conflict identification.
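The edge scan described above can be sketched as follows; `detect_conflicts`, the edge-list input, and deferring the higher-indexed endpoint are our illustrative assumptions, with `std::atomic` flags standing in for the dedicated conflict structures:

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <utility>
#include <vector>

// Parallel post-processing pass: scan every edge and flag one endpoint of
// each same-colored pair. Atomic stores let threads mark conflicts without
// serializing the scan.
std::vector<int> detect_conflicts(const std::vector<std::pair<int, int>>& edges,
                                  const std::vector<int>& color) {
    std::vector<std::atomic<char>> conflicted(color.size());
    for (auto& f : conflicted) f.store(0);
    #pragma omp parallel for
    for (long i = 0; i < (long)edges.size(); ++i) {
        int u = edges[i].first, v = edges[i].second;
        if (color[u] == color[v])
            conflicted[std::max(u, v)].store(1);  // defer the later vertex
    }
    std::vector<int> to_recolor;                  // bulk conflict list
    for (int v = 0; v < (int)color.size(); ++v)
        if (conflicted[v].load()) to_recolor.push_back(v);
    return to_recolor;
}
```

Only the flagged vertices need recoloring in the next iteration, so the optimistic phase's work is preserved wherever no conflict occurred.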
Several sophisticated conflict-resolution strategies will be explored; these balance resolution quality against computational overhead.
We will explore optimizations such as fine-grained state tracking, which monitors whether vertices are in tentative, committed, or conflicted states throughout the coloring process.
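A hedged sketch of such state tracking: the three state names follow the text, while `try_commit` and the compare-and-swap transition are our assumptions, not the project's final API:

```cpp
#include <atomic>
#include <cassert>

// Per-vertex state machine for the optimistic coloring pipeline.
enum VertexState : int { TENTATIVE = 0, COMMITTED = 1, CONFLICTED = 2 };

// Atomically promote a tentatively colored vertex to COMMITTED; fails if a
// concurrent conflict-detection pass has already demoted it to CONFLICTED.
bool try_commit(std::atomic<int>& state) {
    int expected = TENTATIVE;
    return state.compare_exchange_strong(expected, COMMITTED);
}
```

The CAS makes commit-versus-conflict a single atomic decision per vertex, so no lock protects the state transitions.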
Our approach shares important conceptual foundations with Hardware Lock Elision (HLE), a feature in modern processors that lets threads execute critical sections concurrently without acquiring locks, rolling back and retrying when a conflict is detected. This connection highlights how our software approach mirrors hardware-level optimistic-execution techniques in modern processor architectures.
Graphs often have non-uniform structures, leading to load imbalance when parallelized. We will investigate dynamic load balancing techniques to distribute work evenly among processors, particularly for irregular graphs.
The primary challenge in parallelizing graph coloring is managing dependencies between vertices. Since a vertex's color depends on its neighbors, naive parallelization can lead to race conditions where neighboring vertices receive the same color.
Parallel implementations may use more colors than the sequential greedy algorithm. We will analyze this tradeoff between coloring quality and performance improvement, and explore techniques to minimize the color count while maintaining speedup.
Our approach will start with developing a C/C++ implementation of the sequential greedy graph coloring algorithm as our foundation. Once we have a working sequential version, we'll extend it to create a parallel implementation. We'll conduct our initial development on the GHC machines, but for comprehensive performance evaluation of our final implementations, we plan to utilize the PSC machines with their superior core counts.
C/C++ was selected based on our prior assignment experience with OpenMP and Open MPI, frameworks well-suited to the parallel approaches described in our background section.
| Week | To-Do |
|---|---|
| Week 1 | |
| Week 2 | |
| Week 3 | |
| Week 4 | |
| Week 5 | |
| Final Deliverables | |