Towards Exascale Deep Learning for Climate Science
We develop a deep learning architecture that produces pixel-level segmentation masks of extreme weather patterns (hurricanes and atmospheric rivers). We scale this architecture on the largest GPU system in the world, the OLCF Summit system. We train the network on 15,360 Volta GPUs and obtain a peak performance of 263 PF/s. The network trains to convergence in 100 minutes. To reach this unprecedented level of performance, we develop a number of innovations spanning software frameworks, I/O, and machine learning algorithms.
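To put the headline figures in perspective, the short sketch below divides the reported peak throughput by the reported GPU count to get the implied average per-GPU rate. The function name and constants are illustrative and not from the original work. The calculation assumes the peak figure is spread evenly across all GPUs, which is a simplification, and a peak rate is not a sustained rate.

```python
# Figures taken from the abstract; everything else here is illustrative.
PEAK_PFLOPS = 263.0   # reported peak aggregate throughput, in PF/s
NUM_GPUS = 15360      # reported number of Volta GPUs


def per_gpu_tflops(peak_pflops: float, num_gpus: int) -> float:
    """Average per-GPU throughput in TF/s implied by an aggregate PF/s figure.

    Assumes an even split across GPUs (a simplification).
    """
    return peak_pflops * 1000.0 / num_gpus  # 1 PF/s = 1000 TF/s


if __name__ == "__main__":
    rate = per_gpu_tflops(PEAK_PFLOPS, NUM_GPUS)
    print(f"Implied average peak rate: {rate:.2f} TF/s per GPU")
```

Under these assumptions the implied rate is roughly 17 TF/s per GPU at peak.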