Prediction of Extreme Precipitation Occurrence with Machine Learning: Insights from Multiple Reanalysis Data
Extreme precipitation can impose significant social and economic consequences by causing severe flooding and landslides. Our previous work demonstrated that data-driven deep learning (DL) methods such as convolutional neural networks (CNNs) are more powerful than conventional statistical approaches at identifying the large-scale meteorological patterns (LSMPs) associated with extreme precipitation, and that they predict extreme precipitation occurrence (EPO) more skillfully across several US regions and seasons. This study tackles some challenges remaining from that work, chiefly the tendency of trained CNN models to overpredict EPO, which leads to a biased forecast system. In addition, we 1) explore different sampling strategies for splitting the training and testing datasets and assess how these strategies affect the predictive performance of CNN models; 2) experiment with training CNN models on LSMPs from multiple reanalysis datasets together, which allows us to understand the performance of each reanalysis dataset and the associated uncertainty; and 3) examine the relationships between precipitation intensity and the LSMPs associated with extreme versus non-extreme precipitation events to understand the underlying physical causes. We use California as a case study to demonstrate a proof of concept, but the presented framework can be readily applied to other regions of interest.