Machine-Learning Emulation of the 4-km CONUS404 Downscaling
Downscaling large ensembles of GCM simulations to very high resolution is highly desirable but computationally infeasible with traditional dynamical models. However, machine learning models have been able to outperform traditional weather forecasting models at orders-of-magnitude lower computational cost. In this work, we use the NCAR MILES group's new CREDIT framework for ML-based weather and climate models to train a machine learning emulator of convection-permitting WRF simulations that downscale the ERA5 reanalysis from 0.25° to 4-km resolution. A minimal sketch of this coarse-to-fine setup follows.
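The sketch below illustrates the shape of the emulation task only; it is not CREDIT's actual API or our production architecture. The variable counts, patch sizes, and the small CNN are placeholder assumptions: coarse ERA5-like predictors are interpolated to the fine grid, and the network learns to add the fine-scale structure present in the high-resolution target.

```python
# Illustrative sketch of a coarse-to-fine downscaling emulator (assumptions
# throughout: 4 input variables, 2 output variables, a 64x64 4-km patch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownscalingEmulator(nn.Module):
    def __init__(self, in_vars: int = 4, out_vars: int = 2, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_vars, width, 3, padding=1), nn.GELU(),
            nn.Conv2d(width, width, 3, padding=1), nn.GELU(),
            nn.Conv2d(width, out_vars, 3, padding=1),
        )

    def forward(self, coarse: torch.Tensor, fine_shape: tuple) -> torch.Tensor:
        # Upsample the ~0.25-degree fields to the 4-km target grid, then let
        # the CNN predict the fine-scale residual structure.
        x = F.interpolate(coarse, size=fine_shape, mode="bilinear",
                          align_corners=False)
        return self.net(x)

model = DownscalingEmulator()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Synthetic stand-ins for one batch: coarse predictors and fine targets.
coarse_batch = torch.randn(8, 4, 16, 16)
fine_batch = torch.randn(8, 2, 64, 64)

pred = model(coarse_batch, fine_shape=(64, 64))
loss = F.mse_loss(pred, fine_batch)
loss.backward()
optimizer.step()
print(f"one-step MSE loss: {loss.item():.4f}")
```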
Specifically, we are attempting to emulate CONUS404, a long-term (40-year), high-resolution (4-km) regional hydroclimate reanalysis for North America (CONUS plus its hydrologic neighbors). It was produced by the NSF National Center for Atmospheric Research (NCAR), in collaboration with the USGS Water Mission Area, to support scientific analysis of climate and hydrology by researchers and end users. Its high resolution explicitly resolves many precipitation processes, which reduces uncertainty, improves the representation of climate in areas of complex topography, captures important events such as mesoscale convective systems, and better meets stakeholder needs. Producing a similar dynamical downscaling for a GCM simulation spanning 1950–2100 would require more than 1.25 million core-hours on a supercomputer.
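To give a flavor of the data-pairing step at the heart of the training pipeline, the hedged sketch below aligns coarse ERA5 predictors with high-resolution CONUS404 targets at matching timestamps using xarray. The store paths, variable selections, and chunking are hypothetical placeholders, not the actual CONUS404 or ERA5 archive layout.

```python
# Hedged data-pipeline sketch (paths and chunk sizes are assumptions).
import xarray as xr

era5 = xr.open_zarr("s3://hypothetical-bucket/era5.zarr")          # ~0.25° inputs
conus404 = xr.open_zarr("s3://hypothetical-bucket/conus404.zarr")  # 4-km targets

predictors = era5[["t2m", "u10", "v10", "msl"]]
targets = conus404[["PREC_ACC_NC", "T2"]]  # example WRF-style output names

# Keep only timestamps present in both datasets; the spatial dimensions
# differ by name (lat/lon vs. projected y/x), so only time is aligned.
predictors, targets = xr.align(predictors, targets, join="inner")

# Rechunk so each dask chunk holds one training sample's worth of data,
# then persist analysis-ready copies for the training loop.
predictors = predictors.chunk({"time": 1})
targets = targets.chunk({"time": 1})
predictors.to_zarr("predictors_training.zarr", mode="w")
targets.to_zarr("targets_training.zarr", mode="w")
```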
This presentation will discuss the results of that effort and the technical challenges we faced, focusing on the data pipeline and including recommendations and lessons learned. We will also discuss the primary philosophical challenge facing ML models: whether their results are trustworthy and credible (i.e., are they meaningful, or merely persuasive nonsense?), and we will propose explainable AI (XAI) and trustworthy AI (TAI) approaches to answering that question.
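As one concrete example of the kind of XAI probe we have in mind (a sketch of a single technique, not our full evaluation suite), gradient-based saliency asks which input grid cells and variables a trained emulator's prediction at a given location actually depends on; a credible model should attribute its output to nearby, meteorologically relevant predictors. The model and shapes below are placeholder assumptions.

```python
# Gradient-based saliency sketch: attribute one output pixel to the inputs.
import torch
import torch.nn as nn

model = nn.Sequential(  # stand-in for a trained emulator
    nn.Conv2d(4, 16, 3, padding=1), nn.GELU(),
    nn.Conv2d(16, 2, 3, padding=1),
)
model.eval()

inputs = torch.randn(1, 4, 64, 64, requires_grad=True)
outputs = model(inputs)

# Backpropagate from the prediction of channel 0 (e.g., precipitation) at
# one grid point; a plausible saliency map should be spatially localized.
outputs[0, 0, 32, 32].backward()
saliency = inputs.grad.abs().squeeze(0)        # (vars, y, x)
per_variable = saliency.flatten(1).sum(dim=1)  # total attribution per predictor
print(per_variable)
```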