Data mining (the search for statistical relationships in databases) is one step in ‘Knowledge Discovery in Databases (KDD)’. Both have already been embraced by the marketing, medical care, ICT and financial sectors, but they have so far been applied only to a limited extent in the water sector. This is despite the fact that they could offer the water sector new ways to improve its asset management. Water companies therefore want a better understanding of the added value that data-driven analytical methods could offer them, and are looking for a knowledge base to help them decide whether to implement KDD in their asset management.
With its extensive survey of asset management knowledge issues, a literature study of data mining, and initial practical experience from two TKI projects, this project represents a first, cautious step towards data-driven operational management for the water companies.
Collecting information on demand, supply and initial practical experiences with data mining
To begin with, a literature study was conducted into the main characteristics, possibilities and limitations of data mining. In addition, the ‘knowledge needs’ of asset managers at water companies were identified in consultation with them. During a workshop at KWR, these knowledge issues were further prioritised in terms of urgency and importance. This produced an inventory of both the demand for and the supply of data mining, which was used to identify the most promising application areas for data mining in water infrastructure asset management. Finally, the results of two TKI projects served as examples of how data mining/KDD can be applied to asset management issues.
Data mining requires streamlined data management
The key lessons from the data mining projects completed so far are: (1) the importance of feature engineering (enriching an existing dataset with parameters derived, for example, from model simulations or calculations); (2) the importance of collaborating with experts in water infrastructure and operational processes; and (3) the constraints imposed by the limited amount of available data.
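To make the idea of feature engineering more concrete, the sketch below enriches raw pipe records with two derived parameters, pipe age and failure rate per kilometre per year. It is a minimal illustration only: the record fields (`pipe_id`, `material`, `install_year`, `length_m`, `failures`), the function name `engineer_features` and the example values are all hypothetical and are not taken from the TKI projects. Features derived from hydraulic model simulations would be added to the records in the same way.

```python
from dataclasses import dataclass


@dataclass
class PipeRecord:
    # Hypothetical fields for one pipe segment in an asset register
    pipe_id: str
    material: str
    install_year: int
    length_m: float
    failures: int  # failures registered during the observation window


def engineer_features(records, reference_year, window_years):
    """Enrich raw asset records with derived parameters (features)."""
    enriched = []
    for r in records:
        # Derived feature 1: age of the pipe at the reference year
        age = reference_year - r.install_year
        length_km = r.length_m / 1000.0
        # Derived feature 2: failures per km per year over the window;
        # segments with zero length get a rate of 0.0 to avoid division by zero
        rate = r.failures / (length_km * window_years) if length_km > 0 else 0.0
        enriched.append({
            "pipe_id": r.pipe_id,
            "material": r.material,
            "age_years": age,
            "failure_rate_per_km_yr": rate,
        })
    return enriched


# Illustrative (invented) records, observed over a 10-year window up to 2020
records = [
    PipeRecord("P1", "AC", 1965, 500.0, 3),
    PipeRecord("P2", "PVC", 1995, 2000.0, 1),
]
features = engineer_features(records, reference_year=2020, window_years=10)
for f in features:
    print(f)
```

The derived columns, not the raw installation year or failure count, are usually what makes patterns visible to a data mining model. Choosing which parameters to derive is where the knowledge of water infrastructure experts comes in.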
The water companies do not yet have big data. Still, as the number of sensors in distribution networks and of smart meters steadily grows, and as databases of measurement data and failure registrations expand, more and more opportunities will arise to use data mining to answer questions relevant to the drinking water sector.
Water companies can use the collected knowledge when deciding whether to implement data mining and data-driven analytical techniques to improve their operational asset management. Given the qualitative and quantitative limitations of the currently available data, data mining ambitions should be modest for now. However, data cleansing and data quality control within the companies could help overcome these limitations.
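As an illustration of the kind of data quality control meant here, the sketch below flags common problems in failure registrations: missing pipe identifiers, missing dates, dates in the future and duplicate registrations. It is a minimal sketch under assumed conditions. The record layout (dicts with keys `pipe_id` and `date`), the function name `quality_issues` and the example data are hypothetical, and a real check would follow each company's own registration system.

```python
from datetime import date


def quality_issues(failures, today):
    """Return (index, issue) tuples for problematic failure registrations.

    Each registration is assumed to be a dict with the hypothetical keys
    'pipe_id' (str) and 'date' (datetime.date or None).
    """
    issues = []
    seen = set()  # (pipe_id, date) pairs already encountered
    for i, rec in enumerate(failures):
        pipe_id = rec.get("pipe_id")
        d = rec.get("date")
        if not pipe_id:
            issues.append((i, "missing pipe_id"))
        if d is None:
            issues.append((i, "missing date"))
        elif d > today:
            issues.append((i, "date in the future"))
        # Only complete records can be checked for duplicates
        if pipe_id and d is not None:
            key = (pipe_id, d)
            if key in seen:
                issues.append((i, "duplicate registration"))
            seen.add(key)
    return issues


# Invented example registrations
failures = [
    {"pipe_id": "P1", "date": date(2019, 3, 1)},
    {"pipe_id": "P1", "date": date(2019, 3, 1)},  # same pipe, same day
    {"pipe_id": "", "date": date(2018, 1, 1)},
    {"pipe_id": "P2", "date": date(2030, 1, 1)},
    {"pipe_id": "P3", "date": None},
]
report = quality_issues(failures, today=date(2020, 1, 1))
for index, issue in report:
    print(index, issue)
```

Checks like these only flag suspect records. Whether a flagged record should be corrected, merged or removed is still a decision for the people who know the underlying registration process.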