Improving the reliability and practicality of an AI-based tool for data validation and correction

Water companies increasingly use sensor data for advanced monitoring and control of key process variables, and they therefore rely more and more on high-quality data. This project raises the technological readiness of a previously developed AI-based tool for data validation and correction to a level that makes it suitable for implementation in the current production environments of water companies. The goal is to improve the reliability and practicality of the data validation tool and to exchange knowledge about its applicability.

In practice, the quality of the data on which drinking water companies rely is not always adequate for sound decisions or automatic control, because of factors such as sensor problems or data loss. Processing and correcting these data manually is labour-intensive, and manually corrected sensor signals cannot be used for advanced real-time control. In addition, existing data management and process monitoring software is not yet adequately equipped with data validation tools. A European project, Fiware4Water, therefore developed a proof of concept for AI-based data validation and correction (DVR).

This DVR tool is specifically tailored to the FIWARE open data model system concept on the one hand, and to the data from a selected wastewater treatment plant owned by Waternet on the other. At present, however, no water company uses FIWARE in its data systems, and many water companies are not (yet) willing to make that switch. The reliability of the current tool could also be improved further, for example in imputing the expected sensor data during a prolonged loss of sensor data. That is why there have been calls within the collective research programme for the water sector (BTO) to raise the technological readiness of this tool, through ongoing development, tests and demonstrations, to a level that is suitable for implementation in the current production environments of the water companies.

Reliability and practicality of data validation tools and knowledge exchange

This project is working on the following areas:

  • Improving the reliability of the AI-based DVR tool
    The reliability of the tool is being improved by retraining the current AI-based models with new datasets. The expectation is that this will allow the dynamics of sensor signals to be predicted more accurately. In addition, the current methods for detecting anomalies in sensor data are being updated. Where necessary, new techniques will be integrated to improve performance.
  • Improving the practicality of the AI-based DVR tool
    The tool currently runs on a development server in Waternet’s IT system, in combination with FIWARE-based components, as envisaged in the Fiware4Water project. For wider use in the existing systems of interested water companies, the tool is being made stand-alone, with its own database and an API wrapper. The various components of the AI-based DVR tool are modularised separately in Docker containers, which allows other water companies to consider implementing the tool (or components of it). The first priority is the modularisation of anomaly detection. Furthermore, automated training is being developed for the AI-based models in the Waternet user application, so that the models can easily be retrained with training datasets containing new operational conditions. By collecting a series of test datasets from the interested water companies, the anomaly detection component of the tool can be benchmarked more rigorously, which makes it possible to assess its performance and transfer potential. Finally, the different types of anomalies that are currently identified during detection are being labelled to allow straightforward dashboard visualisation.
  • Knowledge sharing and assessment of application
    On the basis of experience with the development and application of the AI-based DVR tool in Waternet’s system, specific lessons are being drawn that will allow for the transfer of the tool to other water companies. The results and potential user applications are being shared in a report that will be sent to other companies in the Dutch and Flemish water sector.
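
The benchmarking of the anomaly detection component on test datasets from several water companies, as described under practicality above, could be sketched roughly as follows. This is a minimal illustration only, not the project's evaluation method: the choice of precision and recall as metrics, the function names precision_recall and benchmark, and the dataset names in the usage example are all assumptions made for the sake of the example.

```python
from typing import Dict, List, Tuple


def precision_recall(truth: List[bool], flagged: List[bool]) -> Tuple[float, float]:
    """Compare detector flags with manually labelled anomalies (assumed metrics)."""
    if len(truth) != len(flagged):
        raise ValueError("truth and flagged must have the same length")
    tp = sum(t and f for t, f in zip(truth, flagged))          # correctly flagged
    fp = sum((not t) and f for t, f in zip(truth, flagged))    # false alarms
    fn = sum(t and (not f) for t, f in zip(truth, flagged))    # missed anomalies
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def benchmark(
    datasets: Dict[str, Tuple[List[bool], List[bool]]]
) -> Dict[str, Tuple[float, float]]:
    """Score each (labels, detector output) pair, keyed by a dataset name."""
    return {
        name: precision_recall(truth, flagged)
        for name, (truth, flagged) in datasets.items()
    }


if __name__ == "__main__":
    # Hypothetical datasets; the names and values are invented for illustration.
    scores = benchmark({
        "company_a": ([True, True, False, False], [True, False, True, False]),
        "company_b": ([False, True, False], [False, True, False]),
    })
    for name, (precision, recall) in scores.items():
        print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

Scoring each dataset separately, rather than pooling them, keeps differences between companies visible, which is what an assessment of transfer potential would need.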

Technology ready for use

The aim is to use the DVR tool as a near-real-time screening layer in existing process monitoring and/or control systems. The tool checks data quality and reconciles anomalous values with model predictions. In this way, data can be screened and corrected before being used by data-driven models (such as digital twins). Screening of this kind is valuable for water companies that wish to adopt a more data-driven strategy.
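
A screening layer of this kind can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the DVR tool's actual logic: it assumes that a model prediction is available for every time step, that a value counts as anomalous when it is missing or deviates from the prediction by more than a fixed tolerance, and that anomalous values are simply replaced by the prediction. The names ScreenedValue, screen and tolerance are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScreenedValue:
    raw: Optional[float]   # value reported by the sensor (None if missing)
    corrected: float       # value passed on to downstream models
    is_anomalous: bool     # True if the raw value was replaced


def screen(
    raw: List[Optional[float]],
    predicted: List[float],
    tolerance: float,
) -> List[ScreenedValue]:
    """Flag missing or strongly deviating values and replace them with the
    model prediction (assumed rule: absolute residual above `tolerance`)."""
    if len(raw) != len(predicted):
        raise ValueError("raw and predicted must have the same length")
    results = []
    for measured, expected in zip(raw, predicted):
        anomalous = measured is None or abs(measured - expected) > tolerance
        corrected = expected if anomalous else measured
        results.append(ScreenedValue(measured, corrected, anomalous))
    return results


if __name__ == "__main__":
    # Invented example signal: one missing value and one spike.
    screened = screen([1.0, None, 9.0], [1.1, 1.2, 1.3], tolerance=0.5)
    for value in screened:
        print(value)
```

Keeping the raw value and the anomaly flag alongside the corrected value means downstream users can still see which points were replaced.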

This project builds on and extends the AI-based DVR tool to improve its reliability and practicality and to make the technology ready for use. In addition, the viability of the AI-based DVR tool is being evaluated by other companies in the Dutch and Flemish water sector. The project scope does not include enabling and modularising transfer learning, that is, retraining models with other data by reusing parts of the model structure, and automating this process. Instead, the methods used are being documented thoroughly. Lessons learned about the development and implementation of AI-based tools in the IT architectures of water companies will be shared.