Text mining for the early detection of relevant water pollutants

Long before substances are included in regulations and monitoring programmes, text sources such as reports, social media, press releases, regulatory agency websites and the scientific literature can indicate that they may be emitted to the water system.

Reading all the potentially relevant information is time-consuming, particularly because the volume of digital information increases year on year and the information is fragmented. ‘Text mining’ (TM) can be used to flag, semi-automatically, substances that may be emitted to water (now or in the future). These substances can then be put under additional scrutiny.
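As an illustration only, the idea of semi-automatic detection can be sketched as a simple co-occurrence search: a sentence is counted when it names a substance from a watch list together with an emission-related cue word. The watch list, the cue words and the function name `flag_substances` are hypothetical and chosen for this sketch; they do not describe the method used in the project, which may rely on quite different techniques.

```python
import re
from collections import Counter

# Hypothetical watch list and emission cue words, for illustration only.
SUBSTANCES = ["pfas", "glyphosate", "melamine"]
EMISSION_CUES = ["discharge", "emission", "effluent", "wastewater", "spill"]


def flag_substances(documents, substances=SUBSTANCES, cues=EMISSION_CUES):
    """Count, per substance, the sentences that mention it alongside an emission cue."""
    hits = Counter()
    for doc in documents:
        # Naive sentence split on whitespace that follows ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            words = set(re.findall(r"[a-z0-9\-]+", sentence.lower()))
            # Skip sentences that contain no emission-related cue word.
            if not any(cue in words for cue in cues):
                continue
            for substance in substances:
                if substance in words:
                    hits[substance] += 1
    return hits


documents = [
    "The plant reported a discharge of melamine into the river. Glyphosate sales rose.",
    "New permit covers wastewater containing PFAS.",
]
flagged = flag_substances(documents)
```

With these two example texts, melamine and PFAS are each flagged once, while glyphosate is not, because its sentence contains no emission cue. Substances flagged in this way would still need expert review, which is what makes the process semi-automatic.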

The aim of the research project is to develop a text mining approach that identifies new, potentially problematic chemicals for Dutch waters from a range of information sources. This information relates to possible new emissions relevant to the Netherlands from, for example, industry, households or agriculture. The approach can be considered part of an early warning system to detect and identify processes (new and established) that will emit substances that may cause water quality problems. In addition, it can be used to extend monitoring programmes.