Pipe condition modelling – more data is better, but no panacea

It is essential to bring our water distribution networks to a high level of performance and keep them there, i.e. hydraulically reliable (meaning good service), structurally sound (meaning low losses) and impervious to changes in water quality (meaning healthy water). Doing this in a resource-efficient manner (a necessity given the large size and replacement costs of the network) requires a clear picture and thorough understanding of our distribution infrastructure. When it comes to monitoring and understanding the condition of the network, a first approach may be to look at direct indicators of condition such as pipe breaks and background leakage. This top-down approach has limitations, though, as there is rarely sufficient data to gain detailed insight into the underlying mechanisms that will ultimately lead to failure (and the better the integrity, the sparser this data will be in the first place).

The alternative is to model the condition of the network bottom-up, based on insight into the underlying physics and on the available information. This too is not trivial, however, as the infrastructure is mostly hidden underground, so that the information required is largely out of reach for humans. One way to obtain this information is to use inspection technology to examine the pipe remotely. There are many inspection tools on the market that serve various niches, but more universally applicable and flexible robotic inspection tools should make all the difference once they reach maturity. Inspections are expensive and typically capture only part of the information required, however, so they should perhaps be seen as a ‘last resort’ as long as more readily available information has not yet been properly considered. Ideally, as many relevant sources of information as possible are combined, including inspection data, asset data and environmental data. This brings up the question of data availability, which is relevant both in the narrow context of this particular application and in a much more general sense.

Data availability

The argument for broad data availability is relatively simple. Nobody operates in a vacuum; all infrastructure is affected by and affects its environment. In order to investigate, understand, and ultimately predict the behavior of the infrastructure (in operation, deterioration and failure), data on its surroundings need to be included. The preferred way to do so is (or should be) to use data from the agencies that are responsible for the particular aspects of the environment or external infrastructure in question. This is only feasible if relevant data are made available by all parties operating in the same spatial domain. It bears mentioning that sharing information (and the load of gathering it) will be especially crucial to our sector in light of the ever-increasing complexity of construction and spatial planning in the urban environment, which requires more and more cooperation, transparency and mutual understanding.

However, data availability continues to be an issue. In the context of drinking water, this has been raised several times in the past years (e.g. here and here), with reference to both internal (within a single organization) and external (between organizations) data availability. In a broader context too, the call to make data findable, accessible, interoperable and reusable (FAIR) and machine-actionable has been getting stronger in recent years. Indeed, as the number of available datasets grows, the need to make them FAIR with the help of digital tools becomes obvious, as humans quickly lose sight of what is out there.


As more data become available, we are inching towards knowing all that is relevant to, in our case, modelling the condition of an asset. However, many relevant factors will remain hidden (e.g. growing cracks in PVC pipes are still prohibitively difficult to find with remote technology) or are lost forever (e.g. past pressure transients in the pipe). Because of this, it will never be possible to model pipe condition deterministically for pipes that are already in the ground. Probabilistic modelling of pipe condition and failure likelihood is something my colleagues have been working on for years (see e.g. this report). It generates a more balanced and nuanced picture of the condition of a pipe than a single number that is most likely not exactly right and may be quite far off. It also implicitly deals with the errors and uncertainties of various sorts that are likely to be present in the underlying data (see this whitepaper for a broader perspective). Probabilistic modelling is the way to go for many real-world applications with incomplete, uncertain and/or partially contradictory data. This matters not only for getting a better result from modelling exercises, but in particular also for better understanding and communicating the value of particular model predictions for the decisions they are meant to support.
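To make the contrast with a single-number estimate concrete, here is a minimal Monte Carlo sketch. It is purely illustrative and not the model referred to above: the linear wall-loss relation, the distributions and all numbers (initial wall thickness, loss rate, critical thickness) are assumptions chosen for the example, and the function names are hypothetical.

```python
import random
import statistics


def simulate_remaining_wall(age_years, n_samples=10_000, seed=0):
    """Sample the remaining wall thickness (mm) of one pipe.

    Assumed toy model: remaining = initial - rate * age, with both the
    initial thickness and the loss rate uncertain.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    samples = []
    for _ in range(n_samples):
        # Assumption: initial wall thickness is roughly known (10 mm +/- 0.5 mm).
        initial_mm = rng.gauss(10.0, 0.5)
        # Assumption: wall-loss rate is lognormal, median about 0.05 mm/year.
        rate_mm_per_year = rng.lognormvariate(-3.0, 0.5)
        remaining = initial_mm - rate_mm_per_year * age_years
        samples.append(max(remaining, 0.0))  # thickness cannot be negative
    return samples


def summarize(samples, critical_mm):
    """Return a few percentiles and the share of samples below a threshold."""
    ordered = sorted(samples)
    n = len(ordered)
    return {
        "median_mm": statistics.median(ordered),
        "p05_mm": ordered[int(0.05 * n)],
        "p95_mm": ordered[int(0.95 * n)],
        "prob_below_critical": sum(s < critical_mm for s in ordered) / n,
    }


if __name__ == "__main__":
    # Hypothetical 60-year-old pipe with an assumed critical thickness of 5 mm.
    result = summarize(simulate_remaining_wall(age_years=60), critical_mm=5.0)
    for key, value in result.items():
        print(f"{key}: {value:.3f}")
```

Instead of one thickness value, the output is a range (5th to 95th percentile) plus the probability of being below the assumed critical thickness, which is the kind of information a replacement decision can actually be weighed against.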

Bringing it all together

Our recently completed Midas project was aimed at considering and including data availability and model uncertainty in the context of pipe condition modelling. It showed how evaluating model uncertainty allows asset owners to determine how certain they can be about a condition estimate that is based only on internal company data and publicly available data about the environment from external sources. In turn, this approach also allows asset owners to objectively evaluate what the information provided by a certain inspection will mean in the bigger picture, in terms of increased knowledge and certainty. In the project, which was a collaboration between Acquaint, Brabant Water, HDM Pipelines, KWR, Spatial Insight, Waterschapsbedrijf Limburg, and Waterschap Zuiderzeeland, we looked at the data needs and availability for pipe condition assessment and prediction. We also investigated the possibility of using surrogate parameters where data are incomplete and, last but not least, developed the propagation of uncertainty in model parameters through the condition models (illustrated in a very nice dashboard). Have a look at the project page, dashboard, and this stakeholder interview if you want to know more!
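The idea that an inspection can be valued by the certainty it adds can be illustrated with a textbook Bayesian update. The sketch below is not the method used in the Midas project; it assumes, for simplicity, that both the desk-based estimate and the inspection error are normally distributed, and all names and numbers are hypothetical.

```python
import math


def update_with_inspection(prior_mean, prior_sd, measured, measurement_sd):
    """Combine a prior estimate with one measurement (normal-normal update).

    Returns the posterior mean and standard deviation. The weights are the
    precisions (1 / variance) of the prior and of the measurement.
    """
    prior_precision = 1.0 / prior_sd ** 2
    measurement_precision = 1.0 / measurement_sd ** 2
    posterior_variance = 1.0 / (prior_precision + measurement_precision)
    posterior_mean = posterior_variance * (
        prior_precision * prior_mean + measurement_precision * measured
    )
    return posterior_mean, math.sqrt(posterior_variance)


if __name__ == "__main__":
    # Assumed estimate from company and public data: 7.0 mm +/- 1.5 mm.
    # Assumed inspection reading: 6.0 mm with a 0.5 mm measurement error.
    mean, sd = update_with_inspection(7.0, 1.5, 6.0, 0.5)
    print(f"posterior: {mean:.2f} mm +/- {sd:.2f} mm")
```

Under these assumptions the standard deviation drops from 1.5 mm to just under 0.5 mm. Comparing that reduction with the cost of the inspection is one simple way to express what the inspection is worth for the decision at hand.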