Machine learning predictions outside our comfort zone

Machine learning (ML) has given us a new tool to better predict hydrological events and their impact on our urban water systems (see for example link, link). ML models can provide more accurate predictions of system behavior (water demand, pipe failure, etc.) with far less information about the system itself and, depending on the case, potentially fewer computational resources than physics-based numerical modelling. In these applications, the algorithm is taught how a natural or physical system behaves by exposing it to a large number of examples, in a process called supervised learning. The success of the application hinges on a training dataset of sufficient quality that encompasses all aspects of the system behavior we want to be able to predict.

And that is a major point of concern. Our changing climate is presenting us with new weather extremes that directly impact our water system time and time again. For example, the heat wave that rolled over western North America (“virtually impossible without human-caused climate change”) beat Canada’s previous temperature record by an astonishing 4.5 degrees Celsius (link). And the torrential rains that hit Belgium, Germany and the Netherlands in July 2021 (“made more likely by climate change”) locally produced a 48-hour precipitation sum that exceeded the highest value registered during the last “comparable” event in Belgium (dating from 1996) by almost 90 mm (or +50%, link).

These examples illustrate that the new extremes may be far beyond the previously established ones. This implies that they are also far outside the range that is represented in training datasets for any predictive ML model that has been prepared for any particular aspect of the local or regional water system.
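One simple safeguard is to check, before trusting a prediction, whether the inputs fall inside the range the model was trained on. The snippet below is a minimal sketch of such a check in plain Python. The feature names (`rain_48h_mm`, `temp_max_c`) and all numbers are invented for illustration and do not come from any real dataset.

```python
from typing import Dict, Mapping, Sequence, Tuple

Envelope = Dict[str, Tuple[float, float]]


def training_envelope(rows: Sequence[Mapping[str, float]]) -> Envelope:
    """Return the (min, max) seen in the training rows for every feature."""
    features = rows[0].keys()
    return {f: (min(r[f] for r in rows), max(r[f] for r in rows)) for f in features}


def out_of_range(envelope: Envelope, sample: Mapping[str, float]) -> Dict[str, float]:
    """Flag features outside the training envelope.

    The value is the distance beyond the nearest bound, expressed as a
    fraction of the training range for that feature.
    """
    flagged = {}
    for feature, (lo, hi) in envelope.items():
        span = (hi - lo) or 1.0  # avoid division by zero for constant features
        x = sample[feature]
        if x > hi:
            flagged[feature] = (x - hi) / span
        elif x < lo:
            flagged[feature] = (lo - x) / span
    return flagged


# Invented training data and an invented "new extreme" event.
train = [
    {"rain_48h_mm": 12.0, "temp_max_c": 18.0},
    {"rain_48h_mm": 95.0, "temp_max_c": 24.0},
    {"rain_48h_mm": 180.0, "temp_max_c": 31.0},
]
envelope = training_envelope(train)
new_event = {"rain_48h_mm": 270.0, "temp_max_c": 22.0}
flags = out_of_range(envelope, new_event)
print(flags)  # only the rainfall feature is flagged
```

Note that this only looks at each input separately. A combination of values that are individually familiar can still be new to the model, so a check like this is a lower bar, not a guarantee.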

Researchers are aware of this, whether explicitly or implicitly, but the major questions are whether this blind spot is adequately dealt with, and how to do so in the first place. Historical time series for weather parameters such as precipitation can be transformed to represent the statistics of climate model predictions and then used as training datasets. However, the Dutch national meteorological institute KNMI also indicates that the observed increase in likelihood and intensity of extreme precipitation events outpaces model predictions. Several methods exist for dealing with imbalanced datasets (see this blog post), but these address events that are extreme only in their low frequency of occurrence, not in their unprecedented magnitude. Work has also been published on using deep convolutional neural networks to predict extreme behavior in a turbulent dynamical system on the basis of a training dataset that does not include these extremes. And in a more general sense, the field of exploratory modelling has seen a range of methods emerge to deal with deep uncertainty in model predictions and decision making (see this paper for an overview).
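To make the idea of transforming a historical series more concrete, here is a minimal sketch of one simple way to do it: scale each historical value by a change factor that depends on its quantile in the historical record. This is my own illustration, not the method of any particular institute. The precipitation values and the change factors are invented; in practice the factors would have to be derived from climate model output.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def scale_by_quantile(
    series: Sequence[float],
    change_factors: Sequence[Tuple[float, float]],
) -> List[float]:
    """Scale each value by the factor belonging to its empirical quantile.

    change_factors is a list of (upper_quantile, factor) pairs in ascending
    order of upper_quantile, the last upper_quantile being 1.0.
    """
    ordered = sorted(series)
    scaled = []
    for x in series:
        # Fraction of historical values that are <= x.
        q = bisect_right(ordered, x) / len(ordered)
        factor = next(f for upper, f in change_factors if q <= upper)
        scaled.append(x * factor)
    return scaled


# Invented daily precipitation sums (mm) and hypothetical change factors:
# no change up to the 80th percentile, +10% up to the 90th, +30% above that.
history = [0.0, 2.0, 5.0, 1.0, 40.0, 3.0, 0.0, 8.0, 12.0, 60.0]
factors = [(0.8, 1.0), (0.9, 1.1), (1.0, 1.3)]
future = scale_by_quantile(history, factors)
print(max(history), max(future))
```

The transformed series does contain larger extremes than the historical one, but only as large as the change factors allow. If the observed trend outpaces the climate model, the training data will still fall short.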

I lack the expertise and experience to say to what degree this is an issue in, for example, flood modelling; I can only guess. However, there are many other topics in the water industry in which the application of machine learning is on the rise and which are likely to be susceptible to the issue raised in this blog. In particular, I would like to mention the prediction of water demand and water availability, and the response of water infrastructure to weather extremes (e.g. pipe failure).

Only if we assume that the relation between the input parameters and the output (prediction) is similar for normal and extreme events can we feel safe in trusting predictions that lie (slightly) beyond the range of the training data. So there may yet be a major role for physics-based models that are capable of looking and predicting beyond the established bounds of system behavior, and/or for hybrid physics-AI models. Also, the application of new methods, like the deep convolutional neural networks mentioned above, to water sector topics merits our full attention and effort.
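A toy example shows why that assumption matters. In the sketch below, a straight line is fitted to data from the "normal" range of an entirely hypothetical system whose response becomes steeper above a threshold. The threshold, the slopes and the input values are all made up for illustration.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def true_response(x: float) -> float:
    """Hypothetical system: linear up to x = 30, much steeper beyond it."""
    return 2.0 * x if x <= 30.0 else 60.0 + 5.0 * (x - 30.0)


# Training data only covers the normal range, 0 to 30.
train_x = [float(x) for x in range(0, 31, 5)]
train_y = [true_response(x) for x in train_x]
slope, intercept = fit_line(train_x, train_y)


def predict(x: float) -> float:
    return slope * x + intercept


error_inside = abs(predict(20.0) - true_response(20.0))
error_outside = abs(predict(45.0) - true_response(45.0))
print(error_inside, error_outside)
```

Inside the training range the fitted model is essentially perfect. At an input of 45 it predicts about 90 where the hypothetical system gives 135, and nothing in the training data could have warned us about that. A physics-based or hybrid model that knows about the threshold would not have this problem.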

Machine learning has already given us so much insight into how systems work, and we can expect much more from this branch of research, which continues to develop at a rapid pace. But we need to stay alert. And to conclude, we (obviously) also need to strengthen our efforts to mitigate climate change, in order to limit the magnitude of the new extremes that are ahead of us.