Exascale Deep Learning for Climate Science
In recent years, our group has developed deep learning capabilities to perform binary classification and localization of extreme weather patterns. We have now developed an advanced architecture that segments pixel-level masks of such patterns (hurricanes and atmospheric rivers). We scale this architecture on Summit at Oak Ridge National Laboratory, the largest GPU system in the world. Training the network on 15,360 Volta GPUs, we obtain a peak performance of 263 PF/s, and the network trains to convergence in 100 minutes. By the time of the AMS meeting, we expect further scaling and code enhancements to bring performance closer to 1 EF/s. We have developed a number of innovations spanning software frameworks, I/O, and machine learning algorithms to achieve this unprecedented level of performance.
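To put the aggregate figures in per-device terms, the short sketch below divides the quoted peak (263 PF/s) and the 1 EF/s goal by the GPU count. It assumes the work is spread evenly across GPUs and, for the target, hypothetically keeps the same 15,360 GPUs. The abstract anticipates further scaling, so the real per-GPU requirement may be lower. The helper name `per_gpu_tflops` is illustrative only.

```python
# Figures quoted in the abstract
PEAK_FLOPS = 263e15   # 263 PF/s aggregate peak performance
NUM_GPUS = 15_360     # Volta GPUs used on Summit
EXAFLOP = 1e18        # 1 EF/s performance goal

def per_gpu_tflops(total_flops: float, num_gpus: int) -> float:
    """Average throughput per GPU in TF/s, assuming an even split of work (hypothetical helper)."""
    return total_flops / num_gpus / 1e12

achieved = per_gpu_tflops(PEAK_FLOPS, NUM_GPUS)
# Assumption: the 1 EF/s goal is evaluated on the same number of GPUs
target = per_gpu_tflops(EXAFLOP, NUM_GPUS)

print(f"achieved: {achieved:.1f} TF/s per GPU")
print(f"1 EF/s goal at the same scale: {target:.1f} TF/s per GPU")
print(f"fraction of 1 EF/s reached: {PEAK_FLOPS / EXAFLOP:.1%}")
```

Under these assumptions, the reported peak corresponds to roughly 17.1 TF/s per GPU. Reaching 1 EF/s on the same machine partition would need about 65.1 TF/s per GPU, which is why both further scaling and code enhancements are mentioned.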