Autotuning on a multicore CPU/manycore GPU system
Dynamics of a cylinder flow simulation using a vortex-based
discretization and the fast multipole method (FMM). The
computations run concurrently on a multicore CPU and a manycore
GPU. Using this hardware in an optimal way is difficult, and the
approach tested here is based on autotuning for performance. Two
parameters control the balance between work offloaded onto the
GPU and work kept on the CPU. To the left, in blue, is the number
of levels in the FMM. A large number of levels means that less
work is offloaded to the GPU. To the right, in green, is the
variable theta, which controls the multipole acceptance criterion
(the theta criterion). A large value of theta, say, close to 1,
means that a larger number of multipole coefficients is used, but
also that the communication stencils become smaller. The
autotuner continuously tests variations of these parameters and
measures the benefit of accepting each suggested change. By also
measuring how costly each parameter test is, the added cost of
using the autotuner itself can be controlled.
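The accept-if-faster idea with a bounded tuning cost can be
illustrated by a minimal sketch. This is not the algorithm of the
paper cited below; it is a simple trial-and-error loop under stated
assumptions. The cost model toy_step_time, the move set, the
starting values and the budget fraction are all hypothetical
stand-ins for timings that would be measured in a real FMM time
step.

```python
from itertools import cycle


def toy_step_time(levels, theta):
    """Hypothetical stand-in for the measured time of one FMM step.

    Assumed to be a convex bowl with its minimum at levels=6,
    theta=0.5; a real run would time the actual computation.
    """
    return 1.0 + 0.05 * (levels - 6) ** 2 + 4.0 * (theta - 0.5) ** 2


def autotune(measure, levels, theta, steps=200, budget=0.05):
    """Trial-and-error tuning of (levels, theta).

    A suggested change is kept only if the measured step time
    improves. The extra time spent on rejected tests is accumulated,
    and testing pauses while that overhead exceeds the fraction
    `budget` of the total run time.
    """
    # Candidate changes, tried in a fixed round-robin order.
    moves = cycle([(1, 0.0), (-1, 0.0), (0, 0.05), (0, -0.05)])
    baseline = measure(levels, theta)
    total, overhead = baseline, 0.0
    for _ in range(steps):
        if overhead > budget * total:
            # Over budget: run with the current parameters and
            # refresh the baseline timing.
            baseline = measure(levels, theta)
            total += baseline
            continue
        d_levels, d_theta = next(moves)
        cand_levels = max(1, levels + d_levels)
        cand_theta = min(0.95, max(0.05, theta + d_theta))
        t = measure(cand_levels, cand_theta)
        total += t
        if t < baseline:
            # The test paid off: accept the suggested change.
            levels, theta, baseline = cand_levels, cand_theta, t
        else:
            # The test was slower: charge the excess to the tuner.
            overhead += t - baseline
    return levels, theta, total, overhead


levels, theta, total, overhead = autotune(toy_step_time, levels=3, theta=0.8)
print(levels, round(theta, 2), round(overhead / total, 3))
```

Every step in this sketch still advances the simulation, so the
cost of a test is only the extra time relative to the current
baseline. Pausing the tests when that extra time grows too large
is one simple way to keep the tuner's overhead bounded.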
References
The fast multipole method employed in the simulations was
detailed in S. Engblom: On well-separated sets and fast
multipole methods, in Appl. Numer. Math. 61(10):1096--1102,
2011 (doi).
The GPU implementation itself was described in A. Goude and
S. Engblom: Adaptive fast multipole methods on the GPU,
in J. Supercomput. 63(3):897--918, 2013 (doi).
The idea of autotuning and the algorithm used are worked out
in M. Holm, S. Engblom, A. Goude, S. Holmgren: Dynamic
autotuning of adaptive fast multipole methods on hybrid
multicore CPU & GPU systems, in SIAM J. Sci. Comput.
36(4):C376--C399, 2014 (doi).
Stefan Engblom
Last modified: Mon Dec 29 13:53:56 MEST 2015