Poor data quality is frequently identified (e.g., in exploratory research into data mining) as one of the obstacles to the successful exploitation of data mining techniques for analyses, monitoring, process management or decision-making support. In 2018, a research project within the Hydroinformatics Platform was dedicated to this subject. The project reported on current practices and opportunities for improvement, and emphasised the importance of bringing together domain and expert knowledge in the field of data validation.
Data-quality improvement needed
Drinking water utilities apply data-based models and techniques in, for example, their analyses, monitoring and process management, or to support decision-making. Poor data quality, however, frequently hampers the successful exploitation of these techniques. Improving data quality is therefore an important first step in their application. This project aims to develop a strategy for carrying out data-quality control within an existing data environment.
Pilot in data validation and safeguarding
A pilot will be developed and run at a water utility. Improving the validation and safeguarding of a selected data stream should result in a better data management system. Such a system is highly desirable for applications such as anomaly detection, predictive maintenance algorithms, and intelligent (early warning) monitoring and/or control systems. With a well-functioning data management system, water utilities will be better placed to use (existing) data to optimise their management at the operational, tactical and strategic levels.
Strategy for data-quality control
A strategy for data-quality control will be realised within the developed pilot by means of:
- a demonstration of the operational application of (more) advanced data validation techniques;
- a phased plan setting out the working method developed, which will allow other water utilities to implement the generic framework;
- knowledge transfer through reports, source code, a publication in a trade journal, and a presentation that can be used by members of the Hydroinformatics Platform.
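
The project description does not specify which validation techniques the pilot applies. Purely as an illustration of the kind of basic automated checks that more advanced data validation usually builds on, the Python sketch below flags missing values, out-of-range values, sudden jumps and frozen readings in a single sensor data stream. The function name, the flag labels and all thresholds are hypothetical and are not taken from the project.

```python
from typing import List, Optional, Sequence


def validate_series(
    values: Sequence[Optional[float]],
    lower: float,
    upper: float,
    max_step: float,
    flatline_run: int,
) -> List[str]:
    """Return one quality flag per sample.

    Flags (hypothetical labels): 'ok', 'missing', 'out_of_range',
    'spike' (jump larger than max_step relative to the previous
    in-range reading) and 'flatline' (flatline_run or more identical
    consecutive readings, which may indicate a frozen sensor).
    """
    flags = ["ok"] * len(values)
    last_valid: Optional[float] = None  # previous in-range reading
    run_start = 0  # index where the current run of identical values began

    for i, v in enumerate(values):
        if v is None:
            flags[i] = "missing"
        elif not lower <= v <= upper:
            flags[i] = "out_of_range"
        else:
            if last_valid is not None and abs(v - last_valid) > max_step:
                flags[i] = "spike"
            last_valid = v

        # Track runs of identical readings.
        if i == 0 or v is None or v != values[i - 1]:
            run_start = i
        elif i - run_start + 1 >= flatline_run:
            for j in range(run_start, i + 1):
                if flags[j] == "ok":  # keep more specific flags
                    flags[j] = "flatline"
    return flags


# Example with made-up readings (e.g. pressure in bar) and made-up limits.
readings = [2.0, 2.1, None, 9.5, 2.2, 2.2, 2.2, 5.0]
print(validate_series(readings, lower=0.0, upper=6.0,
                      max_step=1.5, flatline_run=3))
```

In practice, limits such as these would have to be set per data stream by combining domain and expert knowledge, and the resulting flags would be stored alongside the raw data rather than replacing it.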