Scream Therapy: Lessons Learned Tuning GPU Kernels
We present an end-to-end case study of tuning specific GPU kernels for the Simple Cloud-Resolving E3SM Atmosphere Model (SCREAM), from initial performance profiling to inclusion in production simulations. We describe challenges in improving performance, designing maintainable source code, preserving portability, and testing correctness. Based on our experiences, we recommend particular GPU development strategies that may help others surmount these challenges, and we warn against particular strategies that may not.