Machine Learning of Key Factors Impacting Extreme Precipitation in Various Regions of the Contiguous United States
Amplification of extreme precipitation intensity and frequency can cause severe flooding with significant social and economic consequences. Variations in extreme precipitation intensity, frequency, and return period can be attributed to a large number of physical factors acting across spatial and temporal scales. Here we employ ensemble machine learning (ML) methods, namely random forest (RF), eXtreme Gradient Boosting (XGB), and artificial neural networks (ANN), to explore the key factors contributing to extreme precipitation intensity and frequency in six regions of the contiguous United States. We further establish emulators for return periods. Results show that the ML models for intensity perform better in regions with pronounced seasonality (i.e., North Great Plains, South Great Plains, and West Coast) than in the other three regions (Northeast, Southwest, and Rocky Mountains), while the models for frequency perform well in most regions. Shapley additive explanations (SHAP) are used to help explain the relationships between extreme precipitation characteristics and their contributing factors and to identify the top factors for RF and XGB. We find that latent heat flux, relative humidity, soil moisture, and large-scale subsidence are key factors common to all regions for both intensity and frequency, and that their compound effects are non-negligible. The developed ML models capture the probability and return period of extreme precipitation well in all regions and may be used to support decision making (e.g., infrastructure planning and design).
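As a rough illustration of how a return period relates to an exceedance probability, the sketch below estimates empirical return periods from a series of annual precipitation maxima. It is not the emulator developed in this study: it assumes one maximum per year and uses the standard Weibull plotting position, T = (n + 1) / rank, and the function names are hypothetical.

```python
def empirical_return_periods(annual_maxima):
    """Pair each annual maximum with an empirical return period in years.

    Assumption: one value per year; Weibull plotting position
    T = (n + 1) / rank, where rank 1 is the largest value.
    """
    n = len(annual_maxima)
    ranked = sorted(annual_maxima, reverse=True)  # largest event first
    return [(value, (n + 1) / rank) for rank, value in enumerate(ranked, start=1)]


def exceedance_probability(return_period_years):
    """Annual probability of exceeding the event with the given return period."""
    return 1.0 / return_period_years


if __name__ == "__main__":
    # Hypothetical annual maxima (mm/day), for illustration only.
    for value, period in empirical_return_periods([10.0, 30.0, 20.0]):
        print(value, period, exceedance_probability(period))
```

With only a few years of data the largest estimated return period is limited to n + 1 years, which is one reason longer return periods are usually estimated from a fitted model rather than read directly from the ranked record.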