Machine Learning of Modal Aerosol Microphysics in E3SM: Evaluating Data Sampling Strategies
Microphysical processes are an important part of the aerosol life cycle, but detailed process-level representations with high numerical accuracy can impose a very high computational burden on global atmospheric models, especially at high resolutions. Emulating aerosol microphysics with machine learning (ML) algorithms is a promising way to provide computationally inexpensive yet physically realistic and numerically accurate representations for global models such as the Energy Exascale Earth System Model (E3SM). We take a first step in exploring this approach by emulating the parameterization suite used in E3SM version 2 (v2), using E3SM output for training, validation, and testing of the ML models.
Previous work by Harder et al. (2022, 2023) on emulating the M7 aerosol microphysics suggests that achieving the desired performance and generalizability requires training data with a comprehensive description of the aerosol distribution. They used sub-sampling to obtain training data that cover spatial, diurnal, and seasonal variations. Our exploration has a similar need for sub-sampling: ten years of 3-hourly E3SM output with a horizontal grid spacing of 200 km and 72 vertical layers would yield about 45 billion independent samples. In this work, we aim to reduce the size of the training dataset by a factor of 10,000 or more while still capturing the probability distributions of the variables relevant to the key processes.
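The sample count quoted above can be checked with simple arithmetic. The sketch below is illustrative only: the output length, output frequency, and number of layers come from the text, while the number of grid columns (`n_columns`) is a hypothetical value chosen to be consistent with the quoted total, and leap days are ignored. The actual E3SM grid and calendar may differ.

```python
# Back-of-the-envelope estimate of the number of samples in the full dataset.
years = 10                 # from the text
steps_per_day = 24 // 3    # 3-hourly output, from the text
n_levels = 72              # vertical layers, from the text
n_columns = 21_600         # ASSUMPTION: hypothetical number of grid columns

n_steps = years * 365 * steps_per_day        # leap days ignored (assumption)
n_samples = n_steps * n_columns * n_levels   # one sample per cell per output step

reduction_factor = 10_000                    # minimum reduction, from the text
n_target = n_samples // reduction_factor     # largest sub-sample meeting that goal

print(f"full dataset: {n_samples:.3e} samples")
print(f"sub-sample  : {n_target:.3e} samples or fewer")
```

Under these assumptions the full dataset holds roughly 4.5e10 samples, so a reduction by a factor of 10,000 leaves a few million samples at most.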
This presentation will report progress on three aspects of our work: (1) analysis of the probability distributions of E3SM variables and their spatial and temporal variations; (2) selection of metrics to quantify the agreement between the probability distributions of the sub-sampled and the original datasets; and (3) exploration of different data reduction methods. Results from this work are expected to provide a solid foundation for ML-based emulation.
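As one illustration of aspect (2), the sketch below compares a sub-sample with its parent dataset using the two-sample Kolmogorov-Smirnov statistic, that is, the largest gap between the two empirical cumulative distributions. This is not necessarily one of the metrics selected in this work. The data are synthetic lognormal values standing in for an E3SM variable, the helper `ks_statistic` is a hypothetical name, and the reduction factor of 100 is chosen only to keep the example small.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (0 = identical ECDFs, 1 = disjoint)."""
    a = np.sort(a)
    b = np.sort(b)
    grid = np.concatenate([a, b])  # evaluate both ECDFs at every data point
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-in for one variable of the original dataset.
full = rng.lognormal(mean=0.0, sigma=1.0, size=200_000)
n_keep = full.size // 100  # illustrative reduction factor of 100

# Uniform random sub-sampling: expected to preserve the distribution.
random_subset = rng.choice(full, size=n_keep, replace=False)

# Deliberately poor sub-sampling: keeps only the smallest values.
biased_subset = np.sort(full)[:n_keep]

ks_random = ks_statistic(full, random_subset)
ks_biased = ks_statistic(full, biased_subset)

print(f"KS, random subset: {ks_random:.3f}")
print(f"KS, biased subset: {ks_biased:.3f}")
```

A small statistic for the random subset and a value near 1 for the biased subset show how such a metric separates sub-samples that preserve the parent distribution from those that do not. A single univariate metric does not assess joint distributions or spatial and temporal coverage, which would need separate checks.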
References
Harder et al. (2022), doi:10.1017/eds.2022.22
Harder et al. (2023), IAMA Conference presentation