Big integrated data

Assessment of added value and the use of external datasets during Hydroinformatics knowledge exchange meeting

The flow of data from publicly accessible databases is expanding rapidly. We have data about water quality, for instance, or about weather conditions and the environment, from institutions like the European Space Agency, the Royal Netherlands Meteorological Institute (KNMI), Rijkswaterstaat and Statistics Netherlands. How can water utilities make the most of these well-known data hubs or repositories, and their less prominent counterparts? And how can those data be used effectively, for example in combination with our own monitoring data? These questions about the applications of ‘big integrated data’ were discussed at the knowledge exchange meeting of the Hydroinformatics theme group on 5 September. One of the conclusions was that data can be used more effectively when the internal processes at water utilities are designed to be more data-driven.

The Hydroinformatics theme group organises regular knowledge exchange meetings in the context of the collective research programme of the water utilities (BTO). An important goal is to exchange practical experiences on the basis of a relevant theme. The latest meeting focused on the added value of using external datasets.

Biggest challenge

The meeting kicked off with an online survey that allowed participants to share their experiences with data integration. The survey made it clear that people mainly associate big integrated data with potential opportunities. The participants thought the biggest challenges were data integration and data quality, the lack of standardisation and the interpretation of external data. Some of them also thought that finding suitable data in the first place is a problem.

Finding and using data

In her presentation during the meeting, KWR researcher Tessa Pronk emphasised the importance of data availability for efficient projects: “Data can be found in many places: public repositories, individual institutions, government and the internet, for instance.” She provided examples as a basis for a discussion of ways to use the data. A Creative Commons licence or user agreement makes the conditions for using data explicit. To work in reproducible ways with external data, consideration must also be given to how long, and under what conditions, the data will remain available. When opening up access to your own data through data publication, a standard like Aquo can facilitate data exchange.
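
As an illustration of working reproducibly with external data, the sketch below records the licence, the retrieval date and a checksum for a downloaded dataset. It is a minimal example under stated assumptions, not a method presented at the meeting: the class name, field names, source description, date and sample content are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExternalDatasetRecord:
    """Hypothetical provenance record for one downloaded external dataset."""
    name: str
    source: str
    licence: str          # conditions of use, e.g. a Creative Commons licence
    retrieved_on: date    # availability may change, so note when it was fetched
    sha256: str           # checksum of the exact bytes that were used


def describe_download(name, source, licence, retrieved_on, content):
    """Build a record whose checksum identifies the downloaded content."""
    checksum = hashlib.sha256(content).hexdigest()
    return ExternalDatasetRecord(name, source, licence, retrieved_on, checksum)


# Illustrative values only; none of these come from a real repository.
record = describe_download(
    name="daily precipitation (example)",
    source="hypothetical public repository",
    licence="CC BY 4.0",
    retrieved_on=date(2023, 9, 5),
    content=b"date,precipitation_mm\n2023-09-01,4.0\n",
)
```

Storing such a record next to the analysis makes it possible to check later whether the same version of the external data is still being used.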

Open data to enrich your own data

A great example of data integration in practice was given by Sjoerd Rijpkema, a geohydrologist at the Groningen Water Utility. He explained how a Tableau dashboard makes it possible to combine your own data from measurements of discharge and water levels with KNMI data such as precipitation-evaporation, and with detailed data in the form of a weather plume, precipitation-discharge models and water-level measurements from the water authorities. These data are stored in a data warehouse so that water quality can later be described in terms of relationships between concentrations, seasons, drainage and precipitation. Machine learning (and more specifically, random forest) is used at the Groningen Water Utility to transform data into knowledge about future developments. In this way, external data contribute to insights into variable water quality; this is a valuable development for the water utilities.
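
The core of this kind of integration is joining your own measurements to external records on a shared key such as the date. The sketch below shows that step in plain Python. It is an assumption-laden illustration, not the Groningen Water Utility's implementation: all values, names and units are hypothetical, and ‘precipitation-evaporation’ is interpreted here as precipitation minus evaporation.

```python
from datetime import date

# Hypothetical own measurements: daily discharge in m3/s (illustrative values).
own_discharge = {
    date(2023, 9, 1): 1.2,
    date(2023, 9, 2): 1.5,
    date(2023, 9, 4): 1.1,
}

# Hypothetical external weather records: (precipitation, evaporation) in mm/day.
external_weather = {
    date(2023, 9, 1): (4.0, 2.5),
    date(2023, 9, 2): (0.0, 3.0),
    date(2023, 9, 3): (1.0, 2.0),
}


def combine_by_date(own, external):
    """Inner join on date, adding precipitation minus evaporation per day."""
    rows = []
    # Only days present in both sources are kept.
    for day in sorted(own.keys() & external.keys()):
        precipitation, evaporation = external[day]
        rows.append({
            "date": day,
            "discharge_m3s": own[day],
            "precip_minus_evap_mm": precipitation - evaporation,
        })
    return rows


combined = combine_by_date(own_discharge, external_weather)
```

Days that are missing from either source drop out of the result, which is one reason why data quality and completeness matter as soon as datasets are combined. A table like `combined` could then serve as input for a model such as a random forest.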

GIS as a connector

KWR researcher Herbert ter Maat explained how GEO and other available GIS data can be used for the water sector. He gave numerous examples of sources of this kind and demonstrated how they can be used. The list included: ArcGIS, EsriNL, StatLine (CBS), Geotop models, groundwater levels, the BRO basic subsurface register, 3D BAG, KNMI data (climatology, climate explorer), international datasets like those available from Copernicus, and Sentinel-2 download and visualisation options. This overview showed the participants where to find a wide range of possible data sources.

Potential not yet fully exploited

The knowledge-sharing meeting ended in the time-honoured way with a discussion. It emerged that the effective use of data in an organisation depends first and foremost on improving data maturity at drinking water utilities. That requires designing internal processes along the lines of a more data-driven organisation. Data quality also has to be improved across the board. To make data easier to locate within a water utility, useful datasets could be described in a data catalogue. The discussion also looked at the difficulty resulting from the differing backgrounds and knowledge levels of employees. To make data integration more feasible in practice, it is important to draw inspiration from use cases and to share experiences. This knowledge-sharing meeting certainly made a useful contribution to the latter.

Response to a question in the online survey during the HI Knowledge Exchange.


Associations with ‘big integrated data’ during the HI knowledge exchange.