Performance Analysis of Fully Explicit and Fully Implicit Solvers Within A Spectral Element Shallow-Water Atmosphere Model
A Trilinos-based implicit solver that uses the GPU within the residual evaluation has been implemented within the spectral element Community Atmosphere Model (CAM). The implicit solver provides accurate solutions for a range of problem types and scales to more than 86,400 cores. The Newton-based implicit algorithms have been evaluated at this scale of complexity for atmospheric fluid flow applications across a wide variety of configurations.
The implicit solver can take time steps large enough that no subcycling of tracers and physics is needed. It also performs on par with the explicit method for strongly regionally refined configurations. There is potential for increased efficiency using the GPU; however, more development of the interface between the solver library and the code is needed.
Several methods that use a Newton–Krylov nonlinear solver are evaluated for a range of configurations of the shallow-water dynamical core of the spectral element Community Atmosphere Model to assess their computational performance. These configurations are designed to explore the attributes of each method under different but relevant model usage scenarios, including varied spectral order within an element, static regional refinement, and scaling to the largest problem sizes. The limitations and benefits of using explicit Runge–Kutta versus implicit multistep methods, with different parameters and settings, are discussed in light of the trade-offs with Message Passing Interface (MPI) communication and memory and their inherent efficiency bottlenecks. The recommendation for future work using the implicit solvers is conditional on the scale separation and stiffness of the problem. For the regionally refined configurations, the implicit method has about the same efficiency as the explicit method, without considering efficiency gains from a preconditioner. Initial simulations with OpenACC directives that utilize a Graphics Processing Unit (GPU) when performing function evaluations show local improvements and indicate that overall gains are possible with adjustments to data exchanges.
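The dependence on stiffness noted above can be illustrated with a toy problem. The Python sketch below is not the Trilinos solver used in the model. It is a minimal illustration that assumes SciPy's `newton_krylov` as a stand-in Newton–Krylov solver, a backward Euler step as a stand-in for the implicit multistep methods, and a hypothetical two-component decay system whose rates (`LAMBDA`), step size, and function names are chosen purely for illustration.

```python
import numpy as np
from scipy.optimize import newton_krylov

# Hypothetical slow and fast decay rates; the fast rate makes the system stiff.
LAMBDA = np.array([1.0, 1000.0])


def rhs(u):
    """Right-hand side of du/dt = -LAMBDA * u."""
    return -LAMBDA * u


def rk4_step(u, dt):
    """One explicit classical fourth-order Runge-Kutta step."""
    k1 = rhs(u)
    k2 = rhs(u + 0.5 * dt * k1)
    k3 = rhs(u + 0.5 * dt * k2)
    k4 = rhs(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def backward_euler_step(u, dt):
    """One implicit backward Euler step, solved with a Newton-Krylov method."""

    def residual(v):
        # Nonlinear residual F(v) = v - u - dt * rhs(v); the step solves F(v) = 0.
        return v - u - dt * rhs(v)

    # The previous state serves as the initial guess for Newton's method.
    return newton_krylov(residual, u, f_tol=1e-10)


def integrate(step, u0, dt, nsteps):
    """Advance u0 by nsteps steps of size dt using the given stepper."""
    u = np.array(u0, dtype=float)
    for _ in range(nsteps):
        u = step(u, dt)
    return u


if __name__ == "__main__":
    dt, nsteps = 0.01, 100  # dt * max(LAMBDA) = 10, beyond the RK4 stability limit
    print("implicit:", integrate(backward_euler_step, [1.0, 1.0], dt, nsteps))
    print("explicit:", integrate(rk4_step, [1.0, 1.0], dt, nsteps))
```

With this step size the explicit update amplifies the fast component at every step, while the implicit update damps it and still tracks the slow component. Each implicit step costs several residual evaluations, which is why the balance between the two approaches depends on how stiff the problem is.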