Deep Explorations: an exploratory study of deep learning applications in the water sector

This project concerns an exploratory study to assess the value of deep learning (DL) for KWR and the water sector in general. We investigated the different types of deep learning and their strengths and weaknesses. The technique was then applied to two case studies: (1) data mining in customer complaints received by drinking water utilities, and (2) data mining in infrared spectroscopy data for microplastics analysis and polymer classification, an approach that can later be extended to other areas, such as chromatography, UV absorption and pattern recognition in data sets.

What is deep learning?

Deep Learning (DL) is a family of machine learning (ML) algorithms based on artificial neural networks that can capture more complex relations between input and output than other ML techniques. As a result, DL has proven highly effective in applications such as image and speech processing. In these applications, DL techniques outperform traditional artificial neural networks (ANN) with a limited number of hidden layers, as well as other algorithms such as Random Forest and Support Vector Machines (SVM).
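To make the contrast with traditional ANNs concrete, the sketch below shows that a deep feedforward network differs structurally from a shallow one mainly in the number of stacked hidden layers. It is a minimal, untrained illustration in plain Python rather than a deep learning framework; the function names, layer sizes, ReLU activation and He-scaled random weights are illustrative assumptions, not a model used in this project.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]


def relu(x: Vector) -> Vector:
    """Rectified linear unit, applied element-wise."""
    return [max(0.0, v) for v in x]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: y_j = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random (untrained) weights with He scaling, commonly paired with ReLU."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(layer_sizes: List[int], seed: int = 0) -> List[Layer]:
    """layer_sizes = [inputs, hidden_1, ..., hidden_k, outputs]."""
    rng = random.Random(seed)
    return [init_layer(a, b, rng) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]


def forward(network: List[Layer], x: Vector) -> Vector:
    """Pass x through every hidden layer (with ReLU), then the linear output layer."""
    *hidden, output = network
    for weights, bias in hidden:
        x = relu(dense(x, weights, bias))
    return dense(x, *output)


def count_parameters(network: List[Layer]) -> int:
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in network)


# One hidden layer ("traditional" ANN) versus three hidden layers (a small deep network).
shallow = build_network([4, 8, 2])
deep = build_network([4, 16, 16, 16, 2])
print(count_parameters(shallow), count_parameters(deep))  # 58 658
print(forward(deep, [0.1, 0.2, 0.3, 0.4]))
```

Each extra hidden layer lets the network compose the features of the previous one, which is where the capacity for more complex input-output relations comes from. In practice such models are trained, not hand-built, which is why deep networks also need more data.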

What is the value of deep learning for the water sector?

In this project we investigated the potential and value of DL for the water sector. This fits well with a trend at KWR and in the water sector as a whole, where ML approaches are increasingly used to answer (research) questions. To investigate this technology, we drafted an overview of the different types of DL and their strengths and weaknesses. This overview helps to select topics or problems in which DL has a high potential to contribute. Furthermore, two cases were selected based on the properties of DL, namely that it excels when complex relationships exist between input and output and/or when large amounts of data are available.

These cases are:

  • The analysis of text data from customer communications with water utilities, to automatically extract the topic of each message. These topics could be cross-referenced with data on pipe failures and network maintenance, if available. This would allow early detection of problems and of decreased service levels in the water distribution network. At the same time, it would help to improve customised customer contact.
  • The analysis of infrared spectroscopy data in the field of microplastics identification. The output of this use case will consist of various deep and ensemble learning models for accurate classification of the polymer type of microplastics found in the environment, as well as the knowledge and skills to derive these models. This will make it possible to apply advanced data analysis and classification algorithms to complex chemical data. The impact of this research is expected to be substantial, as the experience acquired will be relevant for numerous other research areas within KWR (e.g., other chemical analyses, and forecasting the toxicity and removal efficiency of chemicals based on quantitative structure-activity relationship (QSAR) models).
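As an illustration of the ensemble idea in the second case, the sketch below combines three simple spectrum classifiers by majority vote: a library search by Pearson correlation (a common way of matching an IR spectrum against reference spectra), a nearest neighbour by Euclidean distance, and a nearest-centroid classifier. It assumes that all spectra have already been resampled onto a common wavenumber grid and are stored as lists of absorbance values. The member classifiers and all function names are hypothetical stand-ins, not the deep and ensemble models developed in the project.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Spectrum = Sequence[float]
Labelled = List[Tuple[Spectrum, str]]
Member = Callable[[Labelled, Spectrum], str]


def pearson(a: Spectrum, b: Spectrum) -> float:
    """Pearson correlation between two spectra on the same grid."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def euclidean(a: Spectrum, b: Spectrum) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_by_correlation(train: Labelled, x: Spectrum) -> str:
    """Library search: label of the reference with the highest correlation."""
    return max(train, key=lambda item: pearson(item[0], x))[1]


def nearest_by_distance(train: Labelled, x: Spectrum) -> str:
    """1-nearest neighbour by Euclidean distance."""
    return min(train, key=lambda item: euclidean(item[0], x))[1]


def nearest_centroid(train: Labelled, x: Spectrum) -> str:
    """Label whose mean spectrum is closest to x."""
    groups: Dict[str, List[Spectrum]] = {}
    for spec, label in train:
        groups.setdefault(label, []).append(spec)
    cents = {lab: [sum(col) / len(col) for col in zip(*specs)] for lab, specs in groups.items()}
    return min(cents, key=lambda lab: euclidean(cents[lab], x))


MEMBERS: Tuple[Member, ...] = (nearest_by_correlation, nearest_by_distance, nearest_centroid)


def ensemble_predict(train: Labelled, x: Spectrum, members: Sequence[Member] = MEMBERS) -> str:
    """Hard majority vote; ties go to the label that was voted for first."""
    votes = Counter(member(train, x) for member in members)
    return votes.most_common(1)[0][0]


# Toy 4-point "spectra", purely illustrative.
train = [([1.0, 0.0, 0.0, 0.0], "PE"), ([0.9, 0.1, 0.0, 0.0], "PE"),
         ([0.0, 0.0, 1.0, 0.0], "PP"), ([0.0, 0.1, 0.9, 0.0], "PP")]
print(ensemble_predict(train, [0.95, 0.05, 0.0, 0.0]))  # PE
```

Because each member is just a function `(train, x) -> label`, further classifiers, including a trained neural network, could be added to `members` without changing the voting logic.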

The two application cases showed the value of deep learning and ensemble learning for automating multiple activities that are currently performed manually. We demonstrated that natural language processing powered by deep learning is an effective tool for automating text processing. When applied to a case study involving customer complaints about water-related nuisances collected by a Dutch water utility, the algorithms were able to recognise the emotions and requests of customers from the textual description of the complaint. Meanwhile, we showed that by combining Laser Direct Infrared spectroscopy and ensemble machine learning it is possible to identify polymers in water samples. In this case, a limited number of labelled microplastic samples was used to identify further samples whose types were initially unknown. These findings suggest that, in general, techniques aided by artificial intelligence can greatly support and promote analytical chemistry and water research. This is of interest to researchers who aim to classify polymers (using classification models) or predict their biochemical behaviour (using regression models).
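The text does not specify how the limited set of labelled samples was used to classify the unknown ones. One common approach is self-training (pseudo-labelling), sketched below purely as an assumption: a nearest-centroid classifier based on cosine similarity labels the unknown spectra, only predictions above a similarity threshold are accepted, and the centroids are recomputed with the newly labelled spectra before the next round. The threshold of 0.9, the round limit and all names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Spectrum = List[float]
Labelled = List[Tuple[Spectrum, str]]


def cosine(a: Spectrum, b: Spectrum) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroids(labelled: Labelled) -> Dict[str, Spectrum]:
    """Mean spectrum per polymer label."""
    groups: Dict[str, List[Spectrum]] = {}
    for spec, label in labelled:
        groups.setdefault(label, []).append(spec)
    return {lab: [sum(col) / len(col) for col in zip(*specs)] for lab, specs in groups.items()}


def classify(cents: Dict[str, Spectrum], x: Spectrum) -> Tuple[str, float]:
    """Best-matching label and its cosine similarity."""
    label = max(cents, key=lambda lab: cosine(cents[lab], x))
    return label, cosine(cents[label], x)


def self_train(labelled: Labelled, unlabelled: List[Spectrum],
               min_similarity: float = 0.9, max_rounds: int = 10) -> Tuple[Labelled, List[Spectrum]]:
    """Iteratively pseudo-label confident spectra; return (labelled, still unknown)."""
    labelled = list(labelled)
    pending = list(unlabelled)
    for _ in range(max_rounds):
        cents = centroids(labelled)  # refit on original + pseudo-labelled data
        accepted: Labelled = []
        still_pending: List[Spectrum] = []
        for x in pending:
            label, score = classify(cents, x)
            if score >= min_similarity:
                accepted.append((x, label))
            else:
                still_pending.append(x)
        if not accepted:  # nothing new is confident enough: stop
            break
        labelled.extend(accepted)
        pending = still_pending
    return labelled, pending
```

A strict threshold limits the risk that early mistakes are reinforced in later rounds; spectra that never reach it are returned as still unknown, for example for manual inspection.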

A new tool in the researcher’s toolbox?

The experience gained with these use cases has also given us insight into possible applications of DL within the research of KWR and the water sector. This makes it easier to recognise when DL is a suitable tool for addressing problems across the water sector. By the nature of the technique, these applications cut across disciplines. Building on these outcomes, we aim to make this technique more accessible within the organisation, so that it can become a valuable tool for KWR researchers (a new tool in the researcher's toolbox) to better serve our clients in the water sector.