Probing the limit of hydrologic predictability with the Transformer network
Although the basic Transformer deep learning architecture was not suitable for hydrologic modeling, a modified (recurrence-free) Transformer proved to be a rare architecture competitive with the current top hydrological model, the Long Short-Term Memory (LSTM) network, when evaluated on a 671-basin streamflow dataset. The performance of these state-of-the-art models may be close to the prediction limits of the dataset. Unless new information is brought in, it is highly unlikely that other models will produce noticeable advantages over these two on this dataset for the tests presented here. Errors in the forcings, basin shapes, attributes, and discharge data are likely the remaining factors preventing higher performance.
Rainfall-runoff modeling is essential for flood prediction, water resource management, and environmental protection. Data-driven models based on the LSTM deep learning architecture have consistently surpassed traditional statistical and process-based models for hydrological prediction. In many applications outside hydrology, including speech recognition and computer vision, however, the Transformer architecture has demonstrated superior performance to LSTM. This work was undertaken to determine whether this popular deep learning architecture can further improve hydrological predictions for the benefit of society.
For a number of years since their introduction to hydrology, recurrent neural networks such as the long short-term memory (LSTM) network have proven remarkably difficult to surpass in terms of daily hydrograph metrics on community-shared benchmarks. Outside of hydrology, Transformers have become the model of choice for sequential prediction tasks, making them a natural architecture to investigate for hydrology. Here, we first show that a vanilla (basic) Transformer architecture is not competitive with LSTM on the widely benchmarked CAMELS streamflow dataset, lagging especially prominently on high-flow metrics, perhaps due to its lack of memory mechanisms. However, a recurrence-free variant of the Transformer obtained mixed comparisons with LSTM, producing very slightly higher Kling-Gupta efficiency (KGE) coefficients along with some other metrics. The lack of advantage for the vanilla Transformer is linked to the nature of hydrologic processes. Additionally, similar to LSTM, the Transformer can merge multiple meteorological forcing datasets to improve model performance. The modified Transformer therefore represents a rare architecture competitive with LSTM in rigorous benchmarks. Valuable lessons were learned: (1) the basic Transformer architecture is not suitable for hydrologic modeling; (2) the recurrence-free modification is beneficial, so future work should continue to test such modifications; and (3) the performance of state-of-the-art models may be close to the prediction limits of the dataset. As a non-recurrent model, the Transformer may have advantages at scale when learning from bigger datasets and storing knowledge. This work lays the groundwork for future explorations of pretrained models, serving as a foundational benchmark that underscores their potential benefits for hydrology.
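To make the comparison concrete, the sketch below shows how an encoder-only ("vanilla") Transformer can be applied to a window of daily meteorological forcings to predict streamflow. It is a minimal illustration in PyTorch: the class name, layer sizes, learned positional embedding, and forcing dimensionality are assumptions for exposition and do not reproduce the recurrence-free variant evaluated in this study.

```python
# Minimal sketch of an encoder-only Transformer for rainfall-runoff prediction.
# All names and hyperparameters here are illustrative, not the study's model.
import torch
import torch.nn as nn

class StreamflowTransformer(nn.Module):
    def __init__(self, n_forcings=5, d_model=64, nhead=4, num_layers=2, seq_len=365):
        super().__init__()
        self.input_proj = nn.Linear(n_forcings, d_model)                # embed daily forcings
        self.pos_emb = nn.Parameter(torch.zeros(1, seq_len, d_model))   # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead,
                                           dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)                               # one discharge value per day

    def forward(self, x):                       # x: [batch, time, n_forcings]
        h = self.input_proj(x) + self.pos_emb[:, : x.size(1)]
        h = self.encoder(h)                     # self-attention over the forcing window
        return self.head(h).squeeze(-1)         # [batch, time] simulated streamflow
```

A forward pass on a random batch, for example model(torch.randn(8, 365, 5)), returns one simulated discharge value per day of the 365-day input window.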
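For reference, the Kling-Gupta efficiency used as the headline metric decomposes model fit into linear correlation, a variability ratio, and a bias ratio (Gupta et al., 2009), with a perfect simulation scoring 1. The short function below is an illustrative NumPy implementation of that standard formula, not code taken from the benchmarked models.

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency (Gupta et al., 2009); 1.0 indicates a perfect fit."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = np.corrcoef(sim, obs)[0, 1]           # linear correlation with observations
    alpha = sim.std() / obs.std()             # variability ratio
    beta = sim.mean() / obs.mean()            # bias ratio
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
```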