From Calibration to Parameter Learning: Harnessing the Scaling Effects of Big Data in Geoscientific Modeling
Instead of using machine learning to predict geoscientific variables directly, we harness machine learning to predict physically meaningful parameters for process-based geoscientific models. This gives the models much higher-quality parameters while, unlike purely data-driven models, still allowing the framework to output variables that are not extensively observed. We also found a beneficial scaling curve not demonstrated before: with more training data, our framework delivers higher accuracy, better generalization, and lower computational cost.
We show the framework obtains more physically meaningful and robust parameters because it imposes a stronger global constraint. It can also reduce computational demand by orders of magnitude: a job that would have taken 100 processors two to three days with the traditional approach now takes a single graphics processing unit (GPU) about an hour.
The behaviors and skills of models in many geosciences (e.g., hydrology and ecosystem sciences) strongly depend on spatially varying parameters that require calibration. A well-calibrated model can reasonably propagate information from observations to unobserved variables via model physics, but traditional calibration is highly inefficient and yields non-unique solutions. Here we propose a novel differentiable parameter learning (dPL) framework that efficiently learns a global mapping from inputs (and optionally responses) to parameters. Crucially, dPL exhibits beneficial scaling curves not previously demonstrated in geoscientific modeling: as training data increases, dPL achieves better performance, more physical coherence, and better generalizability (across space and uncalibrated variables), all at orders-of-magnitude lower computational cost. We demonstrate examples learned from soil moisture and streamflow, where dPL drastically outperformed existing evolutionary and regionalization methods, or required only ~12.5% of the training data to achieve similar performance. The generic scheme promotes the integration of deep learning and process-based models without mandating reimplementation.
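To make the core idea concrete, the following is a minimal, self-contained sketch of parameter learning under toy assumptions (not the paper's actual models): a one-weight "network" maps a site attribute to a physical parameter, that parameter drives a trivially differentiable "process model", and the shared network weight is trained by gradient descent against observations pooled across all sites, which is the global constraint that distinguishes dPL from site-by-site calibration. All names and the linear forms are illustrative.

```python
# dPL sketch, illustrative only: theta = g(a; w) feeds a process model q = theta * x.
# The single weight w is learned jointly across ALL sites, not calibrated per site.

# synthetic sites: (attribute a, forcing x); observations generated with true w = 2.0
sites = [(0.5, 3.0), (1.0, 1.5), (2.0, 4.0)]
true_w = 2.0
obs = [(true_w * a) * x for a, x in sites]  # "observed" responses

w = 0.1      # shared network weight, initially wrong
lr = 0.01    # learning rate
for step in range(500):
    grad = 0.0
    for (a, x), q_obs in zip(sites, obs):
        theta = w * a            # parameter learning: network predicts the parameter
        q_sim = theta * x        # differentiable process model produces the output
        err = q_sim - q_obs
        grad += 2 * err * a * x  # dL/dw via the chain rule through BOTH components
    w -= lr * grad / len(sites)  # one gradient step on the pooled, global loss

print(round(w, 3))  # converges toward the true weight 2.0
```

In practice the toy network and process model would be replaced by a deep network and a differentiable (or autodiff-wrapped) geoscientific model, with gradients supplied by an automatic differentiation framework rather than hand-derived as here.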