This is the second installment by John Cook and Edwin A. Rohl, Jr. of Advanced Data Mining International. Their water quality research has major implications for water security and event detection: it improves what operators know about their systems and ultimately benefits public health. I hope you enjoy it!

“Spot” sampling is a common practice in the water industry. The frequency of spot sampling can be based on regulatory requirements, such as weekly and monthly total coliform and quarterly disinfection byproducts reporting; on convenience, such as once per shift; or it can be more arbitrary. It is easy to imagine that problem conditions such as a very low disinfectant concentration or an overdose could be missed by a spot sampling regime that is insufficiently frequent. SCADA systems that poll online instruments at adjustable frequencies can readily reveal problems that persist for tens of minutes or hours. However, a problem that might be apparent for only a few minutes, such as a contaminated slug traveling past sensors in a pipe, could easily go undetected at a 5-minute polling interval. Therefore, to effectively monitor a process, utility personnel must answer questions about the types of problems they want to detect: what types of sensors are needed, where should they be placed, and what sampling frequency is needed?
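The polling point can be made concrete with a small sketch. It assumes polls happen at exact multiples of a fixed interval and that an event is visible for a fixed window; the function names and the 3-minute slug are hypothetical values chosen for illustration, not data from the article.

```python
import math


def polls_during_event(event_start_s, event_duration_s, poll_interval_s):
    """Return the poll times (seconds) that land inside the event window.

    Assumes polls occur at 0, T, 2T, ... and the event is visible on
    the half-open window [start, start + duration).
    """
    event_end_s = event_start_s + event_duration_s
    t = math.ceil(event_start_s / poll_interval_s) * poll_interval_s
    times = []
    while t < event_end_s:
        times.append(t)
        t += poll_interval_s
    return times


def detection_chance(event_duration_s, poll_interval_s):
    """Chance that at least one poll lands in the event, assuming the
    event starts at a random moment relative to the polling clock."""
    return min(1.0, event_duration_s / poll_interval_s)


# Hypothetical 3-minute slug that arrives 1 minute after a poll.
missed = polls_during_event(60, 180, 300)  # 5-minute polling
caught = polls_during_event(60, 180, 60)   # 1-minute polling
print(missed, caught, detection_chance(180, 300))
```

With 5-minute polling this particular slug passes between two polls and is never seen, while 1-minute polling catches it three times. More generally, an event shorter than the polling interval is only caught some of the time, depending on when it happens to arrive.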

The sampling frequency can be estimated by evaluating time series data directly. A starting point is provided by the Nyquist-Shannon sampling theorem, which dates from the 1940s. Harry Nyquist was a contributor to the field of “information theory”, whose establishment is credited to Claude Shannon of Bell Labs. The theory provides much of the foundation for the fields of signal processing and computing. The theorem states that a signal must be sampled at more than twice the highest frequency it contains. Put in terms of features, the sampling interval needed to detect a behavior must be shorter than half the behavior’s duration. For example, Figure 1 shows a chlorine residual signal that was polled at 1-minute intervals. A number of “features” have been labeled in the detail. According to the sampling theorem, the 2-minute feature would require a sampling interval shorter than the 1-minute polling interval, while the longest sampling intervals for the 23-minute, 75-minute (1.25 hr), and 90-minute (1.5 hr) features would be 11, 37, and 44 minutes when rounded down to the nearest minute. A polling interval of, say, 30 seconds would be needed to make them all detectable. These features are so visually discernible because of large trend shifts that are not always present in process signals. An automated detection scheme would leverage multiple ways of characterizing signals.
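The arithmetic behind those intervals can be sketched in a few lines. This assumes the rule of thumb as applied in the example above (the sampling interval must be strictly shorter than half the feature's duration, expressed in whole minutes); the function name is hypothetical.

```python
import math


def max_whole_minute_interval(feature_minutes):
    """Largest whole-minute sampling interval that is strictly shorter
    than half the feature duration. A result of 0 means no whole-minute
    interval is short enough and sub-minute polling is needed."""
    return max(0, math.ceil(feature_minutes / 2) - 1)


# Feature durations (minutes) labeled in Figure 1.
features = [2, 23, 75, 90]
intervals = {d: max_whole_minute_interval(d) for d in features}
print(intervals)  # {2: 0, 23: 11, 75: 37, 90: 44}
```

The 0 for the 2-minute feature is why a sub-minute polling interval such as 30 seconds is needed to capture all four features.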

The sampling theorem is strictly true only for idealized cases and is most applicable to signals with limited noise. The field of chaotic time series analysis offers an approach that better accommodates noisy signals. The approach suggests that the sampling interval be the “time delay” t_{d} at which a signal is statistically independent of a copy of itself (not auto-correlated). The idea is parsimony: each new sample carries unique information that is separable from the accompanying noise. As shown in Figure 2, t_{d} for a sine wave is ¼ of its period, whereas the sampling theorem indicates an interval twice as long. This indicates that if the sine were to degenerate into a noisy signal, shorter sample intervals would be needed to reveal the remaining information.
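One way to estimate such a time delay is to find the first lag at which the autocorrelation of the signal falls to zero. The sketch below does this for a clean sine wave. It is an illustration under stated assumptions rather than the authors' procedure: it uses a circular autocorrelation (reasonable here only because the test signal contains a whole number of periods), a small tolerance `tol` for floating-point zero, and a hypothetical function name.

```python
import math


def first_uncorrelated_lag(signal, tol=1e-9):
    """Return the first lag (in samples) at which the normalized
    circular autocorrelation drops to zero (within tol), or None."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    variance = sum(c * c for c in centered)
    if variance == 0:
        return None  # a constant signal has no defined autocorrelation
    for lag in range(1, n // 2):
        r = sum(centered[i] * centered[(i + lag) % n] for i in range(n))
        if r / variance <= tol:
            return lag
    return None


# Sine wave with a period of 100 samples, 10 full periods.
period = 100
sine = [math.sin(2 * math.pi * i / period) for i in range(10 * period)]
t_d = first_uncorrelated_lag(sine)
print(t_d, period / 4)
```

For the sine wave the autocorrelation is a cosine of the lag, so it first reaches zero at a quarter of the period, which matches the ¼-period delay described above. A real process signal would need a more careful estimate, since noise makes the zero crossing less clean.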

We are reminded of wastewater plant operators who take a single chlorine effluent sample per shift, only to find regular fecal coliform permit violations. Choosing a spot sampling frequency appears simple on the surface, but choosing one that can actually detect the behavior of a water quality time series is quite challenging. The examples above also show how much that frequency matters in helping to maintain water quality.

Question,

In Figure 1, is the chlorine residual (mg/L) free or total?

Because (I think) if it is total, it is too low, and if it is free, it is too high.

Here (Chile), as a rule of thumb, a free chlorine measurement above 0.5 mg/L indicates no coliform. So we take samples once a week, just for information.

Thanks.

@Karim,

The chlorine residual is measured as total chlorine because the system uses chloramination. The chlorine residual is therefore not really too high or too low for this particular utility, especially in warmer weather conditions.

The regulatory assumption that maintaining a free chlorine residual of 0.5 mg/L ensures no total coliform (I assume total versus fecal) is not always valid. It has been my observation that there are many times in which turbidity levels on a distribution system can be well above 1.0 NTU. In theory, this would make it very difficult to disinfect properly, as pathogens could easily be shielded from inactivation by the insoluble particles in the water.

While the regulatory assumption may be true under low turbidity conditions, it is a leap of faith to assume it is true under all system operational conditions. Respectfully, John

Hi Noah,

This is a re-post of a comment I made earlier on LinkedIn.

My experience in data collection tells me to take as many samples as practical for the conditions. Taking a measurement is like taking a picture with a camera. The result is a static representation of a condition in that time frame. A single sample gives the minimal data. Multiple samples give a more complete picture of what is going on with a sensor.

Consider this: it takes a little more effort (but costs essentially the same) to collect the same measurement under different time frames. The same sensor measurement under different sampling schemes can give an instantaneous reading, an averaged reading and, if computed, the Standard Deviation of that average.

Where this matters in the real world is in how an operator will interpret the data that is collected. Given the scenario of a single sample taken at fixed intervals, a noisy measurement will give readings ‘all over the place’ or may give extreme indications. The next step up is to take multiple samples and compute a mean average. This works well as far as giving more repeatable data for an operator, but it can mask a sensor-specific condition that you should be aware of. Computing a Standard Deviation for the Mean Average reading gives you the ‘dynamics’ of the Mean Average reading. Looking at both components, Mean Average AND Standard Deviation, gives you an accurate portrayal of what the sensor sees.
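A minimal sketch of that idea, using only Python's standard library. The two sample windows are made-up readings chosen so that both have the same average; the function name is hypothetical.

```python
import statistics


def summarize_window(samples):
    """Summarize one window of readings from a single sensor:
    the latest (instantaneous) value, the mean, and the sample
    standard deviation of the window."""
    return {
        "last": samples[-1],
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
    }


# Hypothetical chlorine readings (mg/L): same average, very different noise.
steady = summarize_window([1.98, 2.02, 2.00, 2.01, 1.99])
noisy = summarize_window([1.2, 2.8, 2.0, 2.9, 1.1])
print(steady)
print(noisy)
```

Both windows average about 2.0 mg/L, so the mean alone cannot tell them apart. The standard deviation is what shows that the second sensor is swinging widely, and the last instantaneous reading of 1.1 mg/L would look alarming if viewed on its own.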

Simply put, an operator with a sharp eye can tell, by watching a sensor in real time, the amount of ‘noise’ in the measurement and where the reading range ‘should be’ for a given condition. Given a data set to look at, one can draw different conclusions from ‘live’ samples than from ‘averaged’ samples. Adding the third component of Standard Deviation gives the true condition (stability) of the reading.

Peter (in his LinkedIn post) also made the point on ‘real-time’ readings. No reading is truly ‘real-time’ in the strict sense. Real-time is actually the last measurement that was received by the database. In the case of an averaged reading, it is the value for the last time frame of multiple samples.

In data collection, the ability to collect meaningful data is a mix of how you measure and how you sample. (end of earlier LinkedIn remarks made by me)

The above article by John Cook is spot on. Thanks for sharing it.

At Finland-based Liqum Water Technology we produce a patented wireless, on-line, real-time water quality change monitor. By monitoring the electrochemistry, it alerts instantly to deviations from the desired quality profile of the water. There is a user screen where the quality-change data is displayed 24/7 and updated every 60 seconds, but the system can also be configured to trigger e-mail and text message alerts to key personnel. Our LEW-100 monitor has recently won an innovation award, presented by the President of Finland. The LEW-100 is used in sectors such as mining, beverage production, bio energy, paper-making, and environment; in Finland it’s already used by two water utilities and others are evaluating it. Our technology can replace or complement sampling routines, depending on the environment and circumstances. Please take a look at http://www.liqum.com.

I went over this site and I think you have a lot of fantastic info, saved to bookmarks.