Hi Water Quality and Water Security Followers,
We have another series dealing with Water Quality that will feature Eyal Brill, PhD. He is the Chief Developer of the BlueBox Intelligent Event Detection System from Whitewater’s Quality and Security Division and a professor at Holon Institute of Technology. He will be providing us with his insights on the statistics and modeling techniques used by the water industry to detect water quality events. This will be a bi-monthly series. I hope you all enjoy it.
A water quality event is a situation in which the quality of the water prevents its safe distribution for drinking purposes. When such a situation occurs, a few rules of thumb are of keen interest:
- It is highly important to provide an early warning of such events.
- Ideally, we would like to predict an event before it happens (e.g., in the case of a pipe burst).
- Event types should be classified as organic, inorganic, or a water source change.
- The spatial effect of the event should be evaluated, in order to assist decision makers in the situation management process.
…That being said,
Water quality measurement values are assumed to have a “normal distribution”. The term normal refers to a bell-shaped distribution, symmetric about the average, in which about 99.7% of the samples are found within +/-3 standard deviations of the average. Is this assumption true?
Let’s examine some data from the real world. Figure 1 shows the turbidity distribution from an actual customer site. The distribution is based on 40,000 samples (polling interval of 1 minute) taken over a period of several months.
The horizontal axis shows the turbidity measurement. The vertical axis shows frequency. The red line is a continuous fit for a normal distribution. The green line is a continuous fit for a gamma distribution. Even an untrained eye can see that the green line fits the discrete distribution better than the red line. So what is the reason for this?
A normal distribution is symmetric: values at equal distances above and below the average are equally likely to occur. In our sample the average is 0.3 with a standard deviation of 0.26. Thus, according to the definition of the normal distribution, a value of -0.1 (about 1.5 standard deviations below the average) should also occur. However, we know that turbidity with a value of -0.1 does not exist!
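To put a number on this argument, the short sketch below asks how much probability a normal distribution with the quoted average (0.3) and standard deviation (0.26) would assign to impossible turbidity values. It uses only Python's standard library; the variable names are illustrative and are not taken from any BlueBox code.

```python
from statistics import NormalDist

# Average and standard deviation quoted in the article.
turbidity_model = NormalDist(mu=0.3, sigma=0.26)

# Probability that the normal model assigns to physically impossible readings.
p_negative = turbidity_model.cdf(0.0)         # any value below zero
p_below_minus_01 = turbidity_model.cdf(-0.1)  # values of -0.1 or lower

print(f"P(turbidity < 0)     = {p_negative:.1%}")
print(f"P(turbidity <= -0.1) = {p_below_minus_01:.1%}")
```

Under these assumptions the normal model would place roughly one reading in eight below zero, which real turbidity data can never show.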
Moreover, according to the definition of the normal distribution, nearly all of the samples (about 99.9%) should have a value less than 1.08 (0.3 plus three standard deviations). However, we know from the actual data that more than 1% of the samples have a value greater than that!
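The arithmetic behind this limit can be checked in a few lines. The sketch below is only an illustration: it computes the three-standard-deviation limit and the share of samples a normal model would expect above it, and the sample count of 40,000 is the figure quoted for the customer site.

```python
from statistics import NormalDist

MEAN, SD = 0.3, 0.26   # values quoted in the article
N_SAMPLES = 40_000     # sample count quoted in the article

upper_limit = MEAN + 3 * SD  # 0.3 + 3 * 0.26 = 1.08

# Share of readings a normal model expects above the limit (one-sided tail).
normal_tail = 1.0 - NormalDist(MEAN, SD).cdf(upper_limit)
expected_count = normal_tail * N_SAMPLES

print(f"upper limit           = {upper_limit:.2f}")
print(f"expected share above  = {normal_tail:.3%}")
print(f"expected count above  = {expected_count:.0f} of {N_SAMPLES}")
```

A normal model therefore expects only a few dozen of the 40,000 readings above the limit, far fewer than the more than 1% (over 400 readings) reported from the actual data.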
The gamma distribution, on the other hand, fits the data better. It is defined only for positive values and is skewed to the right. This distribution is a member of a family of distributions called “long tail” distributions. This family fits situations in which some rare members may be 5, 10 or even 20 standard deviations from the average.
The most common example of such a distribution is income distribution. Assuming that you earn the average income, haven’t you seen people who earn twenty times your income?!
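As a rough check of how much heavier the gamma tail is, the sketch below builds a gamma distribution with the same average and standard deviation as the data and compares its tail above 1.08 with the normal tail. Two things here are assumptions rather than facts from the article: the parameters are obtained by moment matching (the fit in Figure 1 may have been produced differently), and `gamma_cdf` is a small helper written for this example from the standard power series of the incomplete gamma function.

```python
import math
from statistics import NormalDist

MEAN, SD = 0.3, 0.26  # values quoted in the article

# Assumption: moment matching, i.e. shape * scale = mean, shape * scale^2 = variance.
shape = (MEAN / SD) ** 2
scale = SD ** 2 / MEAN

def gamma_cdf(x: float, shape: float, scale: float) -> float:
    """P(X <= x) for a gamma distribution, using the power series of the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    t = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= t / (shape + n)
        total += term
        n += 1
    return total * math.exp(shape * math.log(t) - t - math.lgamma(shape))

limit = MEAN + 3 * SD  # 1.08
normal_tail = 1.0 - NormalDist(MEAN, SD).cdf(limit)
gamma_tail = 1.0 - gamma_cdf(limit, shape, scale)

print(f"share above {limit:.2f}, normal model: {normal_tail:.3%}")
print(f"share above {limit:.2f}, gamma model:  {gamma_tail:.3%}")
```

With these moment-matched parameters the gamma model should put a little over 1% of readings above 1.08, roughly ten times the normal prediction and in line with what the site data shows.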
Conclusion: in many cases, water quality measurements do not have a “normal distribution”. Their analysis should be adjusted accordingly. Otherwise, valid samples will be mistakenly classified as abnormal.
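The effect on alarms can be illustrated with simulated data. The sketch below is a toy experiment, not the BlueBox method: it draws 40,000 synthetic readings from a gamma distribution (the shape 1.33 and scale 0.225 are assumed values chosen to roughly match an average of 0.3 and a standard deviation of 0.26), then compares a "mean plus three standard deviations" limit with a limit taken from the data's own 99.9th percentile.

```python
import random
import statistics

random.seed(0)  # fixed seed so the experiment is repeatable

# Synthetic, event-free "turbidity" readings; parameters are assumptions.
SHAPE, SCALE = 1.33, 0.225
samples = [random.gammavariate(SHAPE, SCALE) for _ in range(40_000)]

# Limit derived from the normal assumption.
mean = statistics.fmean(samples)
sd = statistics.stdev(samples)
three_sigma_limit = mean + 3 * sd

# Limit derived from the data itself: the empirical 99.9th percentile.
ranked = sorted(samples)
percentile_limit = ranked[int(0.999 * len(ranked))]

def flagged_fraction(data, limit):
    """Share of readings that would raise an alarm at the given limit."""
    return sum(value > limit for value in data) / len(data)

print(f"3-sigma limit    {three_sigma_limit:.2f} flags "
      f"{flagged_fraction(samples, three_sigma_limit):.2%} of valid samples")
print(f"percentile limit {percentile_limit:.2f} flags "
      f"{flagged_fraction(samples, percentile_limit):.2%} of valid samples")
```

Every reading in this experiment is valid by construction, so anything flagged is a false alarm. The three-sigma limit flags well over 1% of them, while a limit taken from the observed distribution keeps the rate near the intended 0.1%.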