Performance Analysis of Parallel and Distributed Iterative Solvers for Large-Scale Sparse Linear Systems

Razzaq Abd Ali Hussein

Published June 25, 2026

Abstract

Background: The solution of large-scale sparse linear systems remains a fundamental task in numerical analysis, with direct applications in the discretization of partial differential equations (PDEs), computational fluid dynamics, and inverse problems. Classical iterative solvers such as Jacobi, Conjugate Gradient (CG), and GMRES exhibit algorithmic elegance and low memory overhead; however, their practical efficiency is heavily constrained by matrix conditioning and parallel implementation quality. Unpreconditioned variants often converge slowly or stagnate, while naive parallelization can induce excessive communication overhead and numerical non-determinism, compromising reproducibility and scalability.

Methods: We present a rigorous, mathematically grounded performance analysis of shared-memory (OpenMP) and distributed-memory (MPI) implementations of Jacobi, CG, and GMRES. Our study integrates theoretical convergence bounds, expressed via the spectral radius ρ, the condition number κ, and polynomial minimization, with empirical benchmarking on matrices from the SuiteSparse Matrix Collection. We implement correct parallel variants featuring row-wise domain decomposition, explicit ghost-layer handling, and statistically robust validation (one-way ANOVA, Tukey's HSD). Crucially, we contextualize our hand-coded solvers against the state-of-the-art hypre library, which employs algebraic multigrid (AMG) preconditioning, to establish a reproducible theoretical-empirical benchmarking framework.

Results: Our analysis confirms that unpreconditioned solvers require 6–7× more iterations than their AMG-preconditioned counterparts (e.g., 38 vs. 6 iterations for CG on thermal2), directly validating classical error bounds involving the condition number. While our OpenMP and MPI implementations achieve strong-scaling speedups of up to 9.8× and 11.4×, respectively (on 32 cores), hypre outperforms them by 5× due to optimized SpMV kernels, communication-computation overlap, and near-optimal preconditioning. Communication overhead dominates MPI runtime (28% for Krylov methods), aligning with theoretical predictions on global reduction bottlenecks.

Conclusions: This analysis demonstrates the primacy of algorithmic sophistication, particularly preconditioning, over raw parallel efficiency in large-scale linear solvers. The integration of numerical analysis (convergence estimates, conditioning theory) with high-performance computing methodologies yields actionable insights: educational implementations serve pedagogical purposes but are unsuitable for production; reproducible comparisons with libraries like hypre are mandatory for meaningful performance evaluation; and future solvers must co-design numerical robustness with communication-avoiding parallelism. Our open-source code and benchmarking data provide a transparent, reproducible foundation for scalable numerical linear algebra research.
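The quantities named in the abstract (the spectral radius that governs Jacobi and the condition number that enters the classical CG error bound) can be illustrated with a minimal serial sketch. It is not the authors' OpenMP/MPI code and does not use SuiteSparse matrices: the dense 1D Poisson matrix and the helper names poisson_1d, jacobi_spectral_radius, cg_error_factor, and conjugate_gradient are assumptions introduced here for illustration only.

```python
import numpy as np


def poisson_1d(n):
    """Dense tridiag(-1, 2, -1) matrix: a small SPD toy problem (assumed test case)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def jacobi_spectral_radius(A):
    """Spectral radius of the Jacobi iteration matrix G = I - D^{-1} A."""
    d = np.diag(A)
    G = np.eye(A.shape[0]) - A / d[:, None]  # row i of A divided by A[i, i]
    return np.max(np.abs(np.linalg.eigvals(G)))


def cg_error_factor(A):
    """Return (kappa, q), where q = (sqrt(kappa) - 1) / (sqrt(kappa) + 1) is the
    per-iteration factor in the classical CG A-norm error bound for SPD A."""
    eig = np.linalg.eigvalsh(A)  # ascending eigenvalues of a symmetric matrix
    kappa = eig[-1] / eig[0]
    s = np.sqrt(kappa)
    return kappa, (s - 1.0) / (s + 1.0)


def conjugate_gradient(A, b, tol=1e-8, max_iter=10_000):
    """Unpreconditioned CG; stops when ||r|| <= tol * ||b||. Returns (x, iterations)."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * b_norm:
            return x, k
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x, max_iter


if __name__ == "__main__":
    n = 50
    A = poisson_1d(n)
    b = np.ones(n)
    rho = jacobi_spectral_radius(A)
    kappa, q = cg_error_factor(A)
    x, iters = conjugate_gradient(A, b)
    print(f"Jacobi spectral radius: {rho:.6f}")
    print(f"condition number: {kappa:.1f}, CG bound factor: {q:.6f}")
    print(f"CG iterations: {iters}")
```

A spectral radius close to 1 signals slow Jacobi convergence, and a large condition number pushes the CG bound factor toward 1. A good preconditioner improves this by shrinking the effective condition number.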

Issue

Vol 11 No 6 (2026): VOLUME 11 ISSUE 6 YEAR 2026

Section

Articles
