Parallel Graph Coloring

Andrew Id: svattiku, balasubs

OpenMP MPI C/C++

Summary

For our final project, we will implement and analyze multiple parallel approaches to the graph coloring problem. We'll develop traditional parallelization strategies alongside a focus on a novel transactional memory-inspired approach that uses optimistic execution with conflict resolution.

Background

Graph coloring assigns colors to nodes such that no adjacent nodes share the same color. This problem has many applications, including register allocation in compilers, scheduling problems, and frequency assignment in wireless networks. The minimum graph coloring problem (finding the coloring with the fewest colors) is NP-Complete, so approximation algorithms are typically used.

The greedy approach is one of the most common approximation algorithms for graph coloring. It assigns the lowest possible color to each vertex while ensuring no conflicts with already-colored neighbors. While simple, this sequential approach can be slow for large graphs, and computation increases as the graph size increases.

Parallelization Approaches

Traditional Parallel Approaches

We will implement the following traditional approaches using both OpenMP (for shared memory parallelism) and MPI (for distributed memory parallelism) to evaluate their performance characteristics, scalability, and effectiveness across different graph structures:

1. Independent Set Coloring

Concept: Identifying sets of vertices that can be colored simultaneously without conflicts

Implementation: Using OpenMP for thread-level parallelism and MPI for distributed identification and coloring of independent sets

Challenge: Efficiently identifying maximal independent sets in parallel

2. Domain Decomposition

Concept: Partitioning the graph into subgraphs and coloring them in parallel

Implementation: Using OpenMP for shared memory decomposition and MPI for distributed partitioning with special handling for boundary vertices

Challenge: Managing cross-partition conflicts while maintaining load balance

Novel Transactional Memory-Inspired Approach

3. Optimistic Parallel Coloring with Conflict Resolution

Concept: Mimicking transactional memory systems by allowing threads to color vertices optimistically in parallel without synchronization, then detecting and resolving conflicts afterward

Implementation
Key Components
Conflict Detection

After initial coloring, a parallel scan over edges identifies pairs of adjacent vertices with the same color, recording conflicts in dedicated data structures. This post-processing approach uses atomic operations to mark conflicts without serializing the detection process, enabling efficient bulk conflict identification.

Resolution Strategies

Several sophisticated approaches will be explored including:

These strategies balance conflict resolution quality against computational overhead.

Performance Optimizations

We will explore optimizations with:

Fine-grained state tracking monitors whether vertices are in tentative, committed, or conflicted states throughout the coloring process.

Relation to Hardware Lock Elision (HLE)

Our approach shares important conceptual foundations with HLE, a feature in modern processors that enables threads to execute critical sections concurrently without locks:

This connection highlights how our software approach mirrors hardware-level optimization techniques in modern processor architectures.

Key Differences Between Approaches

Challenges

Load Balancing

Graphs often have non-uniform structures, leading to load imbalance when parallelized. We will investigate dynamic load balancing techniques to distribute work evenly among processors, particularly for irregular graphs.

Dependency Management

The primary challenge in parallelizing graph coloring is managing dependencies between vertices. Since a vertex's color depends on its neighbors, naive parallelization can lead to race conditions where neighboring vertices receive the same color.

Performance-Quality Tradeoffs

Parallel implementations may use more colors than the sequential greedy algorithm. We will analyze this tradeoff between coloring quality and performance improvement, and explore techniques to minimize the color count while maintaining speedup.

Additional Specific Challenges

Resources

Our approach will start with developing a C/C++ implementation of the sequential greedy graph coloring algorithm as our foundation. Once we have a working sequential version, we'll extend it to create a parallel implementation. We'll conduct our initial development on the GHC machines, but for comprehensive performance evaluation of our final implementations, we plan to utilize the PSC machines with their superior core counts.

Platform Choice

We'll leverage multi-core GHC and PSC machines for our implementation. C/C++ was selected based on our prior assignment experience with OpenMP and OpenMPI-frameworks well-suited to the parallel approaches described in our background section.

Goals and Deliverables

Plan to Achieve (Baseline)

Hope to Achieve (Stretch Goals)

Schedule

Week To-Do
Week 1
  • Implement and optimize sequential greedy graph coloring
  • Set up testing framework and graph datasets
Week 2
  • Develop traditional parallel algorithm implementation
  • Benchmark and analyze performance on shared memory systems
  • Explore transactional memory conflict detection strategies
Week 3
  • Prepare and submit project milestone report (April 15th)
  • Continue transactional memory approach - implement strategies
Week 4
  • Complete transactional memory approach
  • Conduct comparative analysis
Week 5
  • Finalize optimizations
  • Prepare final report and presentation materials
  • Prepare for project poster session
Final Deliverables
  • Final Project Report (April 28th)
  • Project Poster Session (April 29th)