Is there more than just data?

Hydroinformatics knowledge exchange meeting

The focus in hydroinformatics (HI) tends to narrow quickly to data alone. Data do, after all, power the data-driven models that are widely used in this field. But do we think enough about physical and mathematical models, which have traditionally been built on a deeper understanding of the processes involved? Both types of model have their own advantages and pitfalls, so they can complement each other, and combining them may add value. At the latest meeting of the Hydroinformatics platform on 2 May 2023, participants exchanged knowledge about the benefits and drawbacks of applying established, new and hybrid modelling techniques such as ‘physically informed artificial intelligence’.

The Hydroinformatics knowledge exchange meetings are organised as part of the joint sector research (BTO) of the water utilities. An important goal is to exchange practical experience on the basis of a relevant theme. The latest meeting focused on the question of the added value of physical, mathematical and data-driven models.

Getting to grips with the world using different models

Models allow us to get to grips with the world. To kick off the meeting, Peter van Thienen (KWR) first described the concepts. Traditionally, systems have been modelled with mathematical-physical process equations. These can be scrutinised and the results can be extrapolated to situations that have not occurred before. The data-driven models that have been developed recently have sparked a minor revolution. These are powerful statistical models for the description of complex situations. No process information is needed, but the description is valid only for the data range underlying the model. Which type of model suits a given situation depends in part on how much data is available and how well the system is understood.
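The difference in extrapolation behaviour can be illustrated with a small sketch. Everything in it is an assumption made for illustration and does not come from the meeting: the "physical" model is a simplified head-loss relation h = K * Q**2 with an arbitrary coefficient K, and the "data-driven" model is a straight line fitted by least squares to samples from a narrow flow range.

```python
# Illustrative sketch only: K, the flow range and the linear model are assumptions.
K = 0.5  # hypothetical resistance coefficient


def physical_model(q):
    """Simplified process equation: head loss grows with the square of the flow."""
    return K * q ** 2


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Training data cover flows between 1.0 and 2.0 only.
train_q = [1.0 + 0.1 * i for i in range(11)]
train_h = [physical_model(q) for q in train_q]
slope, intercept = fit_line(train_q, train_h)


def data_driven_model(q):
    """Statistical description that knows nothing about the underlying process."""
    return slope * q + intercept


def relative_error(q):
    """Relative deviation of the fitted line from the process equation."""
    truth = physical_model(q)
    return abs(data_driven_model(q) - truth) / truth


print("inside the data range :", round(relative_error(1.5), 3))
print("outside the data range:", round(relative_error(5.0), 3))
```

Inside the training range the fitted line stays close to the process equation; far outside it, the error grows substantially, which is the limitation described above.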

When can a physical model be used and when is a data-driven model appropriate? Source:

The best of two model forms with hybrid approaches

Data-driven models can be combined with domain expertise in different ways. One way is to encode the physical laws directly, as in physics-informed AI. Other ways include exploiting the structure of the data or mimicking the operation of established mathematical algorithms. Dr. Riccardo Taormina (Delft University of Technology) conducts research on ‘physically informed’, ‘hybrid’ or ‘domain informed’ AI models for the water sector. The combination has a number of advantages.

  • Data efficiency and accuracy: fewer data are needed to train models that perform better.
  • Generalisation: it is possible to proceed from straightforward interpolation to extrapolation.
  • Robustness: models with domain knowledge are less affected by noise in the data.
  • Interpretability: because domain knowledge is built in ex ante, the resulting models are easier to interpret and explain.
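One common way of encoding a physical law directly is to add a penalty for violating it to the training loss. The sketch below is a minimal illustration under stated assumptions, not the method used by the researchers: the physical law is mass conservation at a single junction, and all function names, flow values and the weighting factor are hypothetical.

```python
# Illustrative sketch only: names, numbers and the penalty weight are assumptions.


def data_loss(predicted, observed):
    """Mean squared error between predicted and observed flows."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def mass_balance_residual(inflows, outflows, demand):
    """Zero when the flows into a junction equal the flows out plus the demand."""
    return sum(inflows) - sum(outflows) - demand


def physics_informed_loss(pred_in, pred_out, obs_in, obs_out, demand, weight=1.0):
    """Data misfit plus a weighted penalty for violating mass conservation."""
    data_term = data_loss(pred_in + pred_out, obs_in + obs_out)
    physics_term = mass_balance_residual(pred_in, pred_out, demand) ** 2
    return data_term + weight * physics_term


# Noisy observations at a junction with a known demand of 1.0.
obs_in, obs_out, demand = [10.0], [6.2, 3.0], 1.0

# Two candidate predictions with the same distance to the observations.
consistent = ([10.0], [6.0, 3.0])  # satisfies the mass balance
inconsistent = ([10.0], [6.4, 3.0])  # violates the mass balance

loss_consistent = physics_informed_loss(*consistent, obs_in, obs_out, demand)
loss_inconsistent = physics_informed_loss(*inconsistent, obs_in, obs_out, demand)
print(loss_consistent < loss_inconsistent)
```

With the penalty switched off (weight=0.0) the two candidates are indistinguishable from the data alone; the physics term is what steers the model towards the physically consistent answer, which is one intuition behind the robustness to noise mentioned in the list above.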

Domain-informed AI can be applied, for example, to accelerate EPANET simulations. EPANET is a software application used throughout the world to model water distribution systems. Dr. Taormina’s lab (AIdroLab) is currently developing surrogate models for EPANET based on graph neural networks and deep unrolled neural networks. These techniques are used, respectively, to process graph data from EPANET simulations seamlessly (water networks are, after all, graphs) and to mimic the operation of the global gradient algorithm in the EPANET core. As for which application is more useful (supporting data-driven models with physics models, or vice versa), the answer is that it works both ways and that the field is developing rapidly.

AI in practice

Alex van der Helm of Waternet explained how data-driven models with ‘reinforcement learning’ are being used to reduce nitrous oxide (N2O) emissions at one of the seven aeration tanks at the Amsterdam West wastewater treatment plant of the Amstel, Gooi en Vecht water authority. Since 2016, sensors have been used to monitor, in real time, nitrous oxide in the waste gas from two aeration tanks at the treatment plant. In international terms, this is a unique dataset. The data were used to build a data-driven digital twin on which a data-driven ‘control agent’ was trained. The control agent determines the optimal setpoint on the basis of the conditions, taking into account nitrous oxide emissions, energy consumption and requirements for effluent quality. The control agent was implemented on one of the seven treatment lines. As a result, N2O emissions are significantly lower than in the line used for comparison purposes. This is a promising result that will probably result in a significant reduction in the climate footprint of the Amsterdam West plant.
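The trade-off that such a control agent has to make can be sketched as a reward function. This is a simplified, hypothetical illustration and not the Waternet implementation: the weights, the penalty, the use of ammonium as the effluent-quality indicator and the toy twin are all assumptions, and a real reinforcement learning agent learns a policy over many time steps rather than picking the best setpoint in a single greedy step.

```python
# Illustrative sketch only: weights, limits and the toy twin are assumptions.


def reward(n2o, energy, effluent_nh4, nh4_limit,
           w_n2o=1.0, w_energy=0.1, penalty=100.0):
    """Higher is better: low emissions, low energy use, effluent within its limit."""
    cost = w_n2o * n2o + w_energy * energy
    if effluent_nh4 > nh4_limit:
        cost += penalty  # violating the effluent requirement is heavily penalised
    return -cost


def best_setpoint(candidates, twin, nh4_limit):
    """Greedy choice: ask the digital twin what each setpoint would lead to."""
    return max(candidates, key=lambda sp: reward(*twin(sp), nh4_limit))


def toy_twin(setpoint):
    """Hypothetical stand-in for a digital twin: returns (N2O, energy, NH4)."""
    n2o = 4.0 * (setpoint - 1.5) ** 2  # emissions lowest around 1.5
    energy = 10.0 * setpoint           # more aeration costs more energy
    nh4 = 3.0 - setpoint               # more aeration improves the effluent
    return n2o, energy, nh4


chosen = best_setpoint([0.5, 1.0, 1.5, 2.0, 2.5], toy_twin, nh4_limit=2.0)
print("chosen setpoint:", chosen)
```

The design choice worth noting is that the effluent requirement enters as a hard penalty rather than as another weighted term, so the agent cannot trade water quality for lower emissions or energy use.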

Data-driven models can complement physical models by identifying and modelling (unknown) missing processes. Source:

High expectations

The knowledge-sharing meeting ended in the time-honoured way with a discussion. The benefits and drawbacks of both types of model were discussed first in separate groups and then with all the participants. In addition to endorsing the ongoing usefulness of physical-mathematical models, participants expect a lot from domain-informed, data-driven models in the future because, as a systems technology, these models will transform the field. In the future, AI may itself create the most relevant model on the basis of data and build its own optimisation agents. The ‘ChatGPT’ application is an example of a system of this kind that comes up with its own solutions. Nevertheless, there are restrictions. There must be enough good-quality data that are representative of the system. In addition, data collection may involve high costs that are almost absent in the case of physics models. Furthermore, data-driven models behave like ‘black boxes’: it is difficult to determine exactly what their decisions are based on. Combining data-driven and physical-mathematical models is a useful way to alleviate these concerns to some extent. Developments in this area merit close observation.