GPU/CPU Performance Results on Exascale Architectures for OMEGA: The Ocean Model for E3SM Global Applications
We are writing a new ocean model, Omega: The Ocean Model for E3SM Global Applications, to take advantage of heterogeneous architectures on the fastest US Department of Energy (DOE) computers. The top computer on the Top500 list, Frontier, is DOE-owned and relies heavily on GPUs; it provides a measured throughput (Rmax) of 1.2 exaflops and a theoretical peak performance of 2 exaflops. To run on Frontier and other DOE computers with heterogeneous CPU/GPU architectures, Omega is written in C++ and uses the Kokkos performance portability library. Omega is designed for variable-resolution unstructured meshes and is the new ocean component of the DOE’s Energy Exascale Earth System Model (E3SM).
Omega is in the early stages of development; we are currently testing Omega-0, a layered shallow water model. Future versions of Omega will solve the three-dimensional primitive equations, add sub-grid-scale parameterizations, and be capable of running high-resolution, eddying simulations in realistic global domains.
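For reference, the single-layer form of the rotating shallow water equations can be written as follows; a layered model such as Omega-0 solves an analogous set in each layer. The notation here is the textbook standard, not taken from the Omega code:

```latex
\begin{align}
\frac{\partial \mathbf{u}}{\partial t}
  + (\mathbf{u}\cdot\nabla)\mathbf{u}
  + f\,\hat{\mathbf{k}}\times\mathbf{u}
  &= -g\,\nabla\eta + \nu\,\nabla^{2}\mathbf{u}, \\
\frac{\partial h}{\partial t}
  + \nabla\cdot\left(h\,\mathbf{u}\right) &= 0,
\end{align}
```

where \(\mathbf{u}\) is the horizontal velocity, \(h\) the layer thickness, \(\eta\) the surface elevation, \(f\) the Coriolis parameter, \(g\) gravity, and \(\nu\) a viscosity.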
Here we share results of Omega-0 verification and performance testing. Verification includes unit tests implemented with CTest as well as convergence tests in Polaris, an in-house Python package with a large suite of test problems. Performance tests compare simulations conducted on CPUs versus GPUs and across different architectures: tests are run on Frontier, which has AMD “Optimized 3rd Gen EPYC” CPUs and AMD MI250X GPUs, and on Perlmutter, which has AMD EPYC 7763 CPUs and NVIDIA A100 GPUs. Performance is compared to MPAS-Ocean, the current ocean component of E3SM. MPAS-Ocean is written in Fortran, parallelized with MPI and OpenMP, and uses OpenACC directives for GPUs.