Forecasting Water Demands: Part IV

Let us briefly review what we learned from Edwin Roehl and John Cook of ADMI in Part III. The most common empirical approach to demand forecasting is ordinary least squares (OLS), which relates variables using straight lines. Over long time periods this can be an accurate approach, though that is not always the case. Part III showed that accurate demand forecasting depends on the ability to model chaotic behavior, because weather plays a major role in water demand and weather behaves highly chaotically. Chaotic behavior can be modeled using state space reconstruction (SSR) (Abarbanel 1996), and Part III demonstrated the mathematical basis for SSR.


As was previously explained, the location for the case study was a utility in the Temperate Zone near the Atlantic coast in the US. The case study period was January 1, 1994 to July 31, 2004 (3,865 days), which started with the onset of consistent data collection and archival at the water utility. It straddles five years of record drought from 1998 to 2002, and El Niños in 1998 and 2003 that brought sustained rains. Daily weather observations were obtained from the National Oceanic and Atmospheric Administration (NOAA) National Data Center for five stations in the region around the utility’s service area. The observed variables were daily maximum and minimum ambient temperatures (Tmax, Tmin) and daily cumulative precipitation (Precip). The original intent was to evaluate how spatial as well as temporal weather variability affects demand; however, problems such as gaps and baseline shifts were found in all of the data. Data from an outlying location (hereafter referred to as “Outlying”) was least afflicted and deemed the only data set immediately useful[1]. The study evaluated forecasting 7, 30, and 90 to 180 days into the future. Because of space limitations in the blog, only the 90 to 180-day forecast modeling is described.


Part IV of this series begins with Figure 6, which shows cross correlation plots of QN versus TmaxN and PrecipN. These are created by time-stepping one time series relative to the other and calculating the Pearson coefficient R at each time step. The (+/-) signs of R confirm that QN increases with ambient TmaxN and decreases with PrecipN (sprinklers are on when it’s hot and dry). With respect to TmaxN, R2 grows from a significant 0.17 at tp=0 to a large peak of 0.32 at tp=140 days. With respect to PrecipN, R2 grows from an insignificant 0.01 at tp=0 to a large peak of 0.28 at tp=187 days. These results show that large changes in prevailing weather patterns affect demand for a long time and that forecasting to a horizon of up to six months is possible.
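To make the time-stepping idea concrete, here is a minimal sketch in Python of a lagged Pearson correlation. It is not the authors' code: the function name `lagged_pearson` and the synthetic series are assumptions for illustration only, and the study's data are not reproduced here.

```python
import numpy as np

def lagged_pearson(driver, response, max_lag):
    """Pearson R between driver[t] and response[t + lag], for lag = 0..max_lag."""
    driver = np.asarray(driver, dtype=float)
    response = np.asarray(response, dtype=float)
    n = len(driver)
    r = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        x = driver[: n - lag]   # e.g. weather on day t
        y = response[lag:]      # e.g. demand `lag` days later
        r[lag] = np.corrcoef(x, y)[0, 1]
    return r

# Synthetic illustration only: a "demand" that echoes the driver 5 days later.
rng = np.random.default_rng(0)
weather = rng.normal(size=400)
demand = np.roll(weather, 5) + 0.1 * rng.normal(size=400)

r = lagged_pearson(weather, demand, max_lag=20)
best_lag = int(np.argmax(r ** 2))   # lag with the largest R2
```

Plotting `r` (or `r ** 2`) against the lag gives the kind of curve shown in Figure 6, where the sign of R indicates the direction of the relationship and the peak indicates the delay at which the driver is most informative.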

“Prediction Sub-Model 2” predicts the normalized demand QN (QNP) from inputs for Baseline, TmaxS, TmaxN, and PrecipN[2]. The TmaxS inputs provide time-of-year information that interacts with the normalized weather variables to boost prediction accuracy. The prediction sub-model outputs QP1 and QNP are summed to produce the final prediction QP2. A model can be configured to forecast a parameter at a future time by shifting the delays tp of its inputs and “retraining” the affected ANN sub-models. Here, only Prediction Sub-Model 2 uses real-time inputs and has to be retrained.
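Footnote [2] describes each input with a delay tp, a spacing td, and a dimension dM. The sketch below shows one way such delayed inputs could be assembled, assuming the usual SSR convention that an input contributes the samples x(t - tp), x(t - tp - td), ..., x(t - tp - (dM - 1)·td). The function names `delay_vector` and `moving_window_average` are hypothetical, and the trailing form of the moving window average (MWA) is also an assumption.

```python
import numpy as np

def delay_vector(series, t, tp, td=0, dM=1):
    """Return [x(t - tp), x(t - tp - td), ..., x(t - tp - (dM - 1) * td)]."""
    idx = [t - tp - k * td for k in range(dM)]
    if min(idx) < 0:
        raise ValueError("not enough history before day t")
    return np.array([series[i] for i in idx], dtype=float)

def moving_window_average(series, window):
    """Trailing moving window average; the first window - 1 values are NaN."""
    series = np.asarray(series, dtype=float)
    out = np.full(len(series), np.nan)
    csum = np.cumsum(np.insert(series, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

# Illustration with a dummy series whose value equals its day index.
days = np.arange(400.0)
tmaxn_inputs = delay_vector(days, t=300, tp=30, td=90, dM=2)   # days 270 and 180
precipn_input = delay_vector(days, t=300, tp=190, dM=1)        # day 110
smoothed = moving_window_average(days, window=90)
```

Under this reading, the five variables in footnote [2] contribute 1 + 2 + 1 + 2 + 1 = 7 delayed samples in total, and forecasting further ahead amounts to increasing each tp by the forecast horizon before retraining.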

Figures 7 and 8 show the predictions QNP and QP2 at tp=0, 90, 120, 150, 180 days respectively from 2000 onward. The ANNs readily represent the conditions that led to the significant demand decline in 2003. The R2 and RMSE (root mean squared error) for QNP were 0.71 and 1.1 MGD (million gallons/day) for 2000 onward, which improved to 0.95 and 0.36 MGD for QP2. The errors of the forecast models (tp > 0) at the beginning of the time series and adjacent to the large data gap indicate unlearned behavior because of missing data. The expected per-day forecast errors for tp=0, 90, 180 days are 0.36, 0.56, 0.60 MGD respectively. Over 90 and 180 days, the expected cumulative errors are ±50 and ±108 MG.
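For readers who want to compute the same two statistics on their own series, here is a small sketch. It assumes that R2 means the squared Pearson coefficient, as in the cross correlation discussion earlier, rather than some other definition; the function names and the four-point data are made up for illustration.

```python
import numpy as np

def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    r = np.corrcoef(measured, predicted)[0, 1]
    return float(r ** 2)

def rmse(measured, predicted):
    """Root mean squared error, in the same units as the inputs (e.g. MGD)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((measured - predicted) ** 2)))

# Made-up numbers: a prediction that tracks the shape but is biased by 0.5.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
r2_value = r_squared(measured, predicted)
rmse_value = rmse(measured, predicted)
```

The example also shows why both statistics are worth reporting: a squared correlation can be perfect while the RMSE is not, because correlation ignores a constant offset.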

[1] Station-to-station correlations for each variable type were found to be high, indicating that errors and gaps at one station could be reconstructed by correlating to time series at other stations.

[2] k=5 input variables: Baseline using tp=0, dM=1; TmaxS using tp=0, td=90, dM=2; 30-day MWA TmaxN (all other inputs use 90-day MWA) using tp=0, dM=1; TmaxN using tp=30, td=90, dM=2; and PrecipN using tp=190, dM=1.

Fig 6. Cross correlation plots of QN versus TmaxN and PrecipN.


For comparison, given an average flow of 17 MGD, the demand over 90 and 180 days would be 1,530 and 3,060 MG respectively, corresponding to percent errors of 3.3% and 3.5% respectively. These high accuracies result from the seasonally periodic nature of demand, the very high correlation between demand and weather variables, and the use of ANNs in an SSR framework that can accommodate highly complex, nonlinear and chaotic variable interactions.
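The cumulative figures appear to follow from simple multiplication of the per-day values by the horizon. The sketch below reworks that arithmetic using only numbers quoted in the text; the function name `cumulative_error` is hypothetical, and treating the per-day error as accumulating linearly is an assumption that happens to match the quoted totals.

```python
def cumulative_error(horizon_days, rmse_mgd, avg_flow_mgd):
    """Return (cumulative error in MG, cumulative demand in MG, percent error)."""
    error_mg = rmse_mgd * horizon_days        # per-day error times number of days
    demand_mg = avg_flow_mgd * horizon_days   # average flow times number of days
    return error_mg, demand_mg, 100.0 * error_mg / demand_mg

# Per-day forecast errors (MGD) quoted for the 90- and 180-day horizons.
per_day_rmse_mgd = {90: 0.56, 180: 0.60}
average_flow_mgd = 17.0

for days, rmse_mgd in per_day_rmse_mgd.items():
    err, dem, pct = cumulative_error(days, rmse_mgd, average_flow_mgd)
    print(f"{days} days: +/-{err:.0f} MG of {dem:,.0f} MG ({pct:.1f}%)")
```

Because the horizon cancels out, the percent error is simply the per-day error divided by the average flow, which is why the two horizons give such similar percentages.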


Fig 7. Measured, predicted, and forecast QN. At d = 0, 90, 120, 150, 180 days, R2 = 0.90, 0.77, 0.79, 0.80, 0.72 and RMSE = 0.36, 0.56, 0.52, 0.51, 0.60 MGD respectively.

Fig 8. Measured, predicted, and forecast QP2. At td = 0, 90, 120, 150, 180 days, R2 = 0.95, 0.87, 0.83, 0.84, and 0.83 respectively.


Figure 9 shows the “response surface” (Roehl, Conrads and Cook 2003) of Prediction Sub-Model 2, which reveals the functional form of its mapping of TmaxN at τp=30 days and PrecipN at τp=190 days to QNP. Note that the response surface is non-linear (non-planar), and remember that this sub-model has seven inputs drawn from five variables[2]. All but the variables selected for the horizontal axes, here TmaxN and PrecipN, are “unshown”. The values to which the unshown variables are set affect the surface’s shape. Here they correspond to mid-summer: TmaxS90 at τp=0 and 90 days = 78 and 90 F respectively; Baseline = 13 MGD; 30-day MWA TmaxN = 0; and TmaxN at τp=120 days = 0. The vertical range of QNP ≈ 4 MGD. The horizontal plane at QNP = 0.0 marks the boundary between above-average and below-average demand.
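A response surface of this kind can be generated from any trained model by sweeping two inputs over a grid while holding the unshown inputs at fixed values. The sketch below illustrates the procedure only. `toy_model` is a made-up stand-in, not the study's ANN, and the three-element input layout is an assumption chosen to keep the example short.

```python
import numpy as np

def response_surface(model, fixed_inputs, x_index, y_index, x_values, y_values):
    """Evaluate `model` on a grid of two inputs, holding all others fixed."""
    surface = np.empty((len(y_values), len(x_values)))
    for i, y in enumerate(y_values):
        for j, x in enumerate(x_values):
            inputs = np.array(fixed_inputs, dtype=float)  # fresh copy each time
            inputs[x_index] = x
            inputs[y_index] = y
            surface[i, j] = model(inputs)
    return surface

def toy_model(v):
    """Hypothetical nonlinear stand-in: rises with v[0], falls with v[1]."""
    return float(np.tanh(0.3 * v[0] - 0.5 * v[1]) + 0.1 * v[2])

tmaxn_grid = np.linspace(-10.0, 10.0, 5)    # shown variable on the first axis
precipn_grid = np.linspace(-2.0, 2.0, 4)    # shown variable on the second axis
fixed = [0.0, 0.0, 0.0]                     # third entry is an "unshown" input

surface = response_surface(toy_model, fixed, 0, 1, tmaxn_grid, precipn_grid)
```

Changing the values in `fixed` and recomputing the grid is how one would see the effect the text describes, where the settings of the unshown variables change the shape of the surface.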

Fig 9. ANN Model

In summary, the response surface shows that:

a) demand QNP is greatest when conditions are hottest and driest;

b) demand is lowest when conditions are coolest and wettest;

c) demand is less sensitive to TmaxN when PrecipN is low; and

d) demand is less sensitive to PrecipN when TmaxN is high.

There are no real surprises regarding the response surface, but it is reassuring to see that the functional relationships follow intuitive expectations and are now quantified.




This study determined that more than 90% of demand variability is attributable to the Baseline and weather, which are already being monitored. Prediction accuracy was significantly improved by augmenting the Baseline and standard weather variables with real-time weather inputs. The possibility of accurately forecasting demand three to six months into the future is supported by initially high model performance statistics that decline slowly as the prediction date is pushed forward into the future. This would allow a utility to forecast through late winter and spring to predict demand for the warm weather months, which have the greatest year-to-year variability. Further, the model’s “What ifs?” capability, e.g., “What if it stops raining…or starts raining…significantly?”, makes it a powerful addition to a utility’s risk management strategy.





Abarbanel, H.D.I. 1996. Analysis of Observed Chaotic Data, New York, 4-12, 39.: Springer-Verlag New York, Inc.

Ballard, R. 2003. “Forecasting with Neural Networks – A Review,” National Social Science J., Feb. 24, 2003.

Conrads, P.A. and E.A. Roehl. 2004. Integration of Data Mining Techniques with Mechanistic Models to Determine the Impacts of Non-Point Source Loading on Dissolved Oxygen in Tidal Waters, In Proc. South Carolina Environmental Conference, Myrtle Beach, March 2004.: SC-AWWA.

Charytoniuk, W., Box, E.D., Lee, W.J., Chen, M.S., Kotas, P., and P. Van Olinda. 2000. Neural-Network-Based Demand Forecasting in a Deregulated Environment, In IEEE Transactions on Industry Applications, 36(3).

Devine, T.W., Roehl, E.A., and J.B. Busby. 2003. Virtual Sensors – Cost Effective Monitoring, In Proc. Air and Waste Management Association Annual Conference, June 2003.

Jensen, B.A. 1994. Expert Systems – Neural Networks, Instrument Engineers’ Handbook Third Edition, Radnor PA.: Chilton.

Roehl, E.A., Conrads, P.A., and T.A. Roehl. 2000. Real-Time Control of the Salt Front in a Complex, Tidally Affected River Basin, Proceedings of the Artificial Neural Networks in Engineering Conference, St. Louis, 947-954.

Roehl, E.A., Conrads, P.A., and J.B. Cook. 2003. Discussion of Using Complex Permittivity and Artificial Neural Networks for Contaminant Prediction, J. Env. Engineering., Nov. 2003, pp. 1069-1071.: ASCE.




About noahmorgenstern

Entrepreneurial Warlock, mCouponing evangelist, NFC Rabbi, Innovation and Business Intelligence Imam, Secular World Shaker, and General All Around Good Guy

