**Dimitar Lukarski
New Frontiers in High Performance Computing Exploiting
Multicore and Coprocessor Technology and
Engineering Mathematics and Computing Lab
(EMCL)
Karlsruhe Institute of Technology (KIT)
Karlsruhe, Germany
**

In this talk we consider two types of solvers: out-of-the-box solvers such as
preconditioned Krylov subspace solvers (e.g. CG, BiCGStab, GMRES), and
problem-aware solvers such as geometric matrix-based multi-grid methods.
Most of these solvers can be written in terms of sparse matrix-vector and
vector-vector operations, which can be performed in parallel.
The focus is on parallel, generic and portable preconditioners which are
suitable for multi-core and many-core devices. We study additive (e.g.
Gauss-Seidel, SOR) and multiplicative (ILU factorization with or without
fill-ins) splitting schemes, as well as approximate inverse preconditioners. The
preconditioners can also be used as
smoothing schemes in the multi-grid methods via a preconditioned defect
correction step. We treat the additive splitting schemes by a multi-coloring
technique to provide the necessary level of parallelism. To control the
fill-in entries of the ILU factorization, we propose a novel method, which we
call the power(q)-pattern method. This algorithm produces a new matrix structure
with diagonal blocks containing only diagonal entries. With these techniques we
can perform the forward and backward substitution of the preconditioning step in
parallel. By formulating the algorithm in block-matrix form, we can execute the
sweeps in parallel using only matrix-vector multiplications. Thus, we
can express the data-parallelism in the sweeps without any specification of the
underlying hardware or programming models.
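
The multi-coloring idea for the additive splitting schemes can be illustrated with a minimal sketch. It is not the implementation presented in the talk: the greedy coloring, the dict-of-rows matrix format, and all function names (`laplacian_1d`, `greedy_coloring`, `multicolor_gauss_seidel_sweep`) are assumptions made for this example, and the matrix is assumed to have a structurally symmetric sparsity pattern. Unknowns of the same color are never coupled, so the diagonal block belonging to each color is purely diagonal and all updates within one color are independent of each other.

```python
def laplacian_1d(n):
    """Illustrative test matrix: rows[i] is a dict {column: value}."""
    rows = []
    for i in range(n):
        row = {i: 2.0}
        if i > 0:
            row[i - 1] = -1.0
        if i < n - 1:
            row[i + 1] = -1.0
        rows.append(row)
    return rows


def greedy_coloring(rows):
    """Assign each unknown the smallest color not used by a coupled unknown.

    Assumes a structurally symmetric pattern (a_ij != 0 implies a_ji != 0).
    """
    color = [-1] * len(rows)
    for i, row in enumerate(rows):
        used = {color[j] for j in row if j != i and color[j] >= 0}
        c = 0
        while c in used:
            c += 1
        color[i] = c
    return color


def multicolor_gauss_seidel_sweep(rows, b, x, color):
    """One Gauss-Seidel sweep, processed color by color (updates x in place)."""
    for c in range(max(color) + 1):
        block = [i for i in range(len(rows)) if color[i] == c]
        # No two unknowns in `block` are coupled, so every update below reads
        # only values from other colors. The loop could run in parallel.
        updates = {}
        for i in block:
            off_diag = sum(a * x[j] for j, a in rows[i].items() if j != i)
            updates[i] = (b[i] - off_diag) / rows[i][i]
        for i, value in updates.items():
            x[i] = value
    return x


if __name__ == "__main__":
    A = laplacian_1d(6)
    colors = greedy_coloring(A)
    x = [0.0] * 6
    for _ in range(200):
        multicolor_gauss_seidel_sweep(A, [1.0] * 6, x, colors)
    print(colors, x)
```

Computing all updates of a color before writing them back makes the independence explicit: within one color the sweep behaves like a Jacobi step on a diagonal block, while across colors it remains a Gauss-Seidel sweep on the reordered system.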

In object-oriented languages, an abstraction separates the object behavior from
its implementation. Based on this abstraction, we have developed a linear
algebra toolbox which supports several platforms such as multi-core CPUs, GPUs
and accelerators. The various backends (sequential, OpenMP, CUDA, OpenCL)
consist of optimized and platform-specific matrix and vector routines. Using
unified interfaces across all platforms, the library allows users to build
linear solvers and preconditioners without any knowledge of the underlying
hardware. With this technique, we can write our solvers and preconditioners in a
single source code for all platforms. Furthermore, we can extend the library by
adding new platforms without modifying the existing solvers and preconditioners.
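
The separation between solver code and platform-specific routines can be sketched as follows. This is an illustration only, not the toolbox's actual interface: the class names (`Backend`, `SequentialBackend`, `ThreadedBackend`), the method names (`spmv`, `dot`, `axpy`), and the dict-of-rows matrix format are all hypothetical. The thread pool merely stands in for a parallel backend and is not meant to be fast. The point is that `conjugate_gradient` is written once against the interface and runs unchanged on either backend.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class Backend(ABC):
    """Hypothetical interface: the only operations a solver is allowed to use."""

    @abstractmethod
    def spmv(self, rows, x):
        """Return the sparse matrix-vector product A x."""

    def dot(self, x, y):
        return sum(a * b for a, b in zip(x, y))

    def axpy(self, alpha, x, y):
        """Return alpha * x + y as a new vector."""
        return [alpha * a + b for a, b in zip(x, y)]


class SequentialBackend(Backend):
    def spmv(self, rows, x):
        return [sum(a * x[j] for j, a in row.items()) for row in rows]


class ThreadedBackend(Backend):
    """Stand-in for a parallel backend: rows are handed to a thread pool."""

    def __init__(self, workers=4):
        self.workers = workers

    def spmv(self, rows, x):
        def row_product(row):
            return sum(a * x[j] for j, a in row.items())

        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(row_product, rows))


def conjugate_gradient(backend, rows, b, tol=1e-12, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite matrix.

    Uses only backend operations, so it contains no platform-specific code.
    """
    x = [0.0] * len(b)
    r = list(b)               # residual for the zero initial guess
    p = list(r)               # first search direction
    rr = backend.dot(r, r)
    for _ in range(max_iter):
        if rr <= tol * tol:
            break
        ap = backend.spmv(rows, p)
        alpha = rr / backend.dot(p, ap)
        x = backend.axpy(alpha, p, x)       # x = x + alpha * p
        r = backend.axpy(-alpha, ap, r)     # r = r - alpha * A p
        rr_new = backend.dot(r, r)
        p = backend.axpy(rr_new / rr, p, r)  # p = r + beta * p
        rr = rr_new
    return x


if __name__ == "__main__":
    A = [{0: 4.0, 1: 1.0}, {0: 1.0, 1: 3.0}]
    for backend in (SequentialBackend(), ThreadedBackend()):
        print(type(backend).__name__, conjugate_gradient(backend, A, [1.0, 2.0]))
```

Adding a further platform in this sketch means adding one more `Backend` subclass; the solver function itself is not touched.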

To show the efficiency of the parallel techniques we consider two scenarios:
preconditioned Krylov subspace methods and matrix-based multi-grid methods. We
demonstrate speedups in two respects: first, the
preconditioners/smoothers reduce the total solution time by decreasing the
number of iterations, and second, the preconditioning/smoothing phase is
efficiently executed in parallel, providing good scalability across several
parallel architectures. We present numerical experiments and performance
analysis on several platforms such as multi-core CPU and GPU devices.
Furthermore, we show the viability and benefit of the proposed preconditioning
schemes and software approach.

References:

Vincent Heuveline, Dimitar Lukarski, Nico Trost, Jan-Philipp Weiss, Parallel
Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using
Multicore CPUs and GPUs, EMCL Preprint Series, preprint, 2011

Vincent Heuveline, Dimitar Lukarski, Jan-Philipp Weiss, Enhanced Parallel
ILU(p)-based Preconditioners for Multi-core CPUs and GPUs -- The
Power(q)-pattern Method, EMCL Preprint Series, preprint, 2011