
Forecasting Water Demands: Part III

WaterFront Followers,

Let us briefly review what we learned in Part 2, which is based on the work of John Cook and Edwin Roehl of ADMI. The most common empirical approach to demand forecasting is ordinary least squares (OLS) regression, which relates variables using straight lines. Over long time periods this approach can be accurate, but not always. Part 2 showed that the ability to model chaotic behavior is critical to forecasting demand accurately, because weather plays a major role in water demand and weather behaves highly chaotically. Chaotic behavior can be modeled using state space reconstruction (SSR) (Abarbanel 1996), and Part 2 presented the mathematical basis for SSR. Let us now consider a practical case study.


The case study location was a utility in the Temperate Zone near the Atlantic coast of the US. The case study period was January 1, 1994 to July 31, 2004 (3,865 days), beginning with the onset of consistent data collection and archiving at the utility. It spans the five-year record drought of 1998 to 2002, as well as the El Niños of 1998 and 2003, which brought sustained rains. Daily weather observations for five stations in the region around the utility’s service area were obtained from the National Oceanic and Atmospheric Administration (NOAA) National Climatic Data Center. The observed variables were daily maximum and minimum ambient temperatures (Tmax, Tmin) and daily cumulative precipitation (Precip). The original intent was to evaluate how spatial as well as temporal weather variability affects demand; however, problems such as gaps and baseline shifts were found in all of the data. Data from an outlying location (hereafter “Outlying”) were the least afflicted and were deemed the only data set immediately useful[1]. The study evaluated forecasts 7, 30, and 90 to 180 days into the future. Because of space limitations in the blog, only the 90- to 180-day forecast modeling is described here.
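As a quick arithmetic check of the stated period length, this short Python sketch counts the days from January 1, 1994 through July 31, 2004, assuming both endpoints are included in the study period.

```python
from datetime import date

# Study period as stated in the text; both endpoints are assumed to be included.
START = date(1994, 1, 1)
END = date(2004, 7, 31)

# Subtracting two dates yields a timedelta; add 1 so the last day is counted too.
n_days = (END - START).days + 1
print(f"{n_days:,} days")  # prints "3,865 days"
```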

Correlation analysis quickly revealed that demand is highly correlated with Tmax, somewhat less with Precip, and that Tmin contributes very little information not already accounted for by the other variables. Figure 1 shows actual and “standard” 90-day moving window averages (MWA) of Tmax and Precip; the standards are denoted TmaxS and PrecipS. TmaxS and PrecipS are calculated by computing an average value of each variable for each day of the year. Some summers and winters are warmer than others, and Tmax is much less variable than Precip. Most years exhibit spring rains, followed by a drier period, and then rainfall peaks in the latter half of the summer. Some years receive more rain than others. The high winter rains of the El Niños of 1998 and 2003 are apparent. In most years Precip peaks one to two months after Tmax peaks. This is not true for 2003, when Precip rose unusually early. Note also that 2003 was cooler than the previous five years. The early Precip and lower Tmax after five years of drought probably led customers to irrigate less in 2003.
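The two smoothing steps behind TmaxS and PrecipS can be sketched in plain Python. This is a minimal sketch under stated assumptions: the 90-day window is taken as trailing rather than centered, and the “standard” is computed by averaging the smoothed values over all years for each calendar day. The article specifies neither choice, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from datetime import date, timedelta


def moving_window_average(values, window=90):
    """Moving-window average (MWA); None until a full window is available.

    Assumption: a trailing window ending on day i (the article does not say
    whether the window is trailing or centered).
    """
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def day_of_year_standard(dates, values):
    """Average values across all years for each calendar day (month, day).

    Assumption: the "standard" is a simple day-of-year mean over all years.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, v in zip(dates, values):
        if v is None:  # skip days without a full MWA window
            continue
        key = (d.month, d.day)
        sums[key] += v
        counts[key] += 1
    means = {k: sums[k] / counts[k] for k in sums}
    # Map the day-of-year averages back onto the original timeline.
    return [means.get((d.month, d.day)) for d in dates]


# Usage with a synthetic, purely seasonal Tmax series (hypothetical data).
start = date(1994, 1, 1)
dates = [start + timedelta(days=i) for i in range(3 * 365)]
tmax = [20 + 10 * math.sin(2 * math.pi * i / 365.25) for i in range(len(dates))]
tmax_mwa = moving_window_average(tmax, window=90)
tmax_s = day_of_year_standard(dates, tmax_mwa)
```

Smoothing first and then averaging by calendar day keeps the standard on the same 90-day MWA basis as the actual series it is plotted against.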

[1] Station-to-station correlations for each variable type were found to be high, indicating that errors and gaps at one station could be reconstructed from the correlated time series at other stations.
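The footnote’s idea of reconstructing gaps from a well-correlated neighboring station can be sketched as a simple linear regression of the target station on the neighbor, calibrated on days when both stations report. This is an assumed approach for illustration only; the article does not say which reconstruction method, if any, was applied, and the function names and station values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit ys ≈ intercept + slope * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fill_gaps(target, neighbor):
    """Replace None entries in `target` with values predicted from `neighbor`.

    Assumption: a single linear relationship between the two stations holds
    over the whole record (no baseline shifts).
    """
    # Calibrate only on days where both stations reported.
    pairs = [(n, t) for t, n in zip(target, neighbor) if t is not None and n is not None]
    xs, ys = zip(*pairs)
    slope, intercept = fit_line(xs, ys)
    filled = []
    for t, n in zip(target, neighbor):
        if t is None and n is not None:
            filled.append(intercept + slope * n)
        else:
            filled.append(t)  # keep observed values; leave a gap if both are missing
    return filled


# Hypothetical daily Tmax (deg F) at a target station with two gaps.
neighbor = [70.0, 72.0, 75.0, 74.0, 71.0]
target = [71.0, None, 76.0, None, 72.0]
print(fill_gaps(target, neighbor))  # [71.0, 73.0, 76.0, 75.0, 72.0]
```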

Figure 1. 90-day MWA standards (TmaxS and PrecipS) and actual Tmax and Precip. R² for TmaxS vs. PrecipS = 0.71.

Figure 2 shows the 30-day MWA demand (Q). It shows dramatic growth between 1998 and 2000; the shape of the annual demand curve changes from year to year; and the increased peak-to-trough difference from 1999 onward indicates increased demand for seasonal irrigation. The 2003 Q is also significantly lower than in neighboring years. Overlain on Q are two options for a “baseline demand” (Baseline), defined as the minimum Q for the calendar year, which occurs in late winter. The black line connects each year’s minimum, while the gray line (selected for this study) changes only when a new “high low” is observed (no backsliding).
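The gray “no backsliding” Baseline amounts to a running maximum of the yearly minima. A minimal sketch, assuming daily Q values tagged with their calendar year; the function names and demand values are hypothetical, and the interpolation between yearly points used in the figure is omitted.

```python
def yearly_minima(years, q):
    """Minimum demand Q for each calendar year (the points on the black line)."""
    mins = {}
    for y, v in zip(years, q):
        if v is None:
            continue
        mins[y] = v if y not in mins else min(mins[y], v)
    return dict(sorted(mins.items()))


def baseline_no_backsliding(mins):
    """Running maximum of yearly minima: changes only on a new "high low"."""
    baseline = {}
    best = float("-inf")
    for year, m in mins.items():  # assumes chronological order, as yearly_minima returns
        best = max(best, m)
        baseline[year] = best
    return baseline


# Hypothetical yearly minima of 30-day MWA demand, in MGD.
mins = {1998: 20.1, 1999: 22.4, 2000: 21.8, 2001: 23.0}
print(baseline_no_backsliding(mins))  # {1998: 20.1, 1999: 22.4, 2000: 22.4, 2001: 23.0}
```

The 2000 dip to 21.8 is ignored because it is below the 1999 “high low”; the black line would follow it.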

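Figure 2. 30-day MWA demand (Q), with interpolated yearly minimum demand and interpolated yearly minimum demand without backsliding.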

Figure 3: Super-model architecture used to predict demand QP2.

Figure 3 shows that the forecasting model is a “super-model” composed of four “sub-models”, each with a specific purpose. Figure 4 shows demand predictions QP1 made by an ANN[1] (Prediction Sub-Model 1) using only Baseline, TmaxS, and PrecipS as inputs[2]. These models apply the equations of state space reconstruction theory explained in Part 2. For a multivariate process of k independent variables, the state vector is:

$$Y(t) = \{[x_1(t), x_1(t - t_{d1}), \ldots, x_1(t - (d_{L1} - 1)t_{d1})], \ldots, [x_k(t), x_k(t - t_{dk}), \ldots, x_k(t - (d_{Lk} - 1)t_{dk})]\} \qquad \text{(eq. 1)}$$

where each term xi(t – j·tdi), for j = 0, …, dLi – 1, represents a different dimension in state space, and therefore a different element of the state vector. Values of dL and td are estimated analytically or experimentally from the data. The mathematical formulations of the models are derived from those of the state vectors. To forecast a dependent variable of interest y(t), that is, to predict it from prior measurements of k independent variables (Roehl et al. 2000):
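Equation 1 is a time-delay embedding: each variable contributes its current value plus dL – 1 delayed copies. A minimal sketch, assuming integer (daily) time steps and that the delays td and dimensions dL have already been chosen; the function name and toy series are hypothetical.

```python
def state_vector(series, t, delays, dims):
    """Assemble the state vector Y(t) of eq. 1 as one flat list.

    series[i] -- values of x_i indexed by integer time step (assumed daily)
    delays[i] -- time delay t_di, in steps
    dims[i]   -- local dimension d_Li
    """
    y = []
    for x, td, dl in zip(series, delays, dims):
        for j in range(dl):  # x_i(t), x_i(t - td), ..., x_i(t - (dl - 1) td)
            idx = t - j * td
            if idx < 0:
                raise IndexError(f"t={t} has too little history for td={td}, dL={dl}")
            y.append(x[idx])
    return y


# Two toy variables sampled at integer steps (hypothetical data).
x1 = list(range(10))
x2 = [10 * v for v in range(10)]
print(state_vector([x1, x2], t=9, delays=[2, 3], dims=[3, 2]))  # [9, 7, 5, 90, 60]
```

The nested brackets of eq. 1 are flattened here because each element is simply one coordinate of the state space.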

$$y(t) = F\{[x_1(t - t_{p1}), x_1(t - t_{p1} - t_{d1}), \ldots, x_1(t - t_{p1} - (d_{M1} - 1)t_{d1})], \ldots, [x_k(t - t_{pk}), x_k(t - t_{pk} - t_{dk}), \ldots, x_k(t - t_{pk} - (d_{Mk} - 1)t_{dk})]\} \qquad \text{(eq. 2)}$$

where F is an empirical function such as an ANN, each term xi(t – tpi – j·tdi) is a different input to F, and tpi is an additional time delay. For each variable, tpi is set in one of three ways: 1) the time delay at which the input variable becomes uncorrelated with all other inputs but can still provide useful information about y(t); 2) the time delay of the most recent available measurement of xi; or 3) the time delay at which the input variable is most highly correlated with y(t). Here, the state space local dimension dL of Equation 1 is replaced by a model input variable dimension dM, which is determined experimentally. dM ≤ dL, and dM tends to decrease with increasing k.
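In practice, Equation 2 becomes a table of lagged inputs on which F is trained. A minimal sketch, assuming integer daily steps; the function names and the stand-in series are hypothetical. The example uses the settings given in the footnotes for Normalization Sub-Model 2 (k=1, tp=0, td=90, dM=2). When dM=1, td has no effect and any value may be passed.

```python
def model_inputs(series, t, tps, tds, dms):
    """Inputs to F for predicting y(t), per eq. 2, as one flat list."""
    row = []
    for x, tp, td, dm in zip(series, tps, tds, dms):
        for j in range(dm):  # x_i(t - tp), x_i(t - tp - td), ...
            idx = t - tp - j * td
            if idx < 0:  # guard against Python's negative-index wraparound
                raise IndexError(f"t={t} has too little history")
            row.append(x[idx])
    return row


def design_matrix(series, tps, tds, dms):
    """Stack input rows for every t with enough history; also return those t."""
    first_t = max(tp + (dm - 1) * td for tp, td, dm in zip(tps, tds, dms))
    last_t = min(len(x) for x in series)
    ts = list(range(first_t, last_t))
    return [model_inputs(series, t, tps, tds, dms) for t in ts], ts


# Settings like those listed for Normalization Sub-Model 2:
# one input (TmaxS), tp = 0, td = 90 days, dM = 2.
tmax_s = [float(i) for i in range(200)]  # hypothetical stand-in series
rows, ts = design_matrix([tmax_s], tps=[0], tds=[90], dms=[2])
print(ts[0], rows[0])  # 90 [90.0, 0.0]
```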


While the model is statistically accurate, QP1 does not track measured Q well through the decline in 2003. The black line in Figure 4 is the model’s prediction error, which is also the “normalized” demand (QN): demand with the Baseline, TmaxS, and PrecipS “components” largely removed. Normalizing variables tends to amplify atypical behaviors, making them easier to study. Figure 5 shows Q together with Tmax and Precip after normalization by Normalization Sub-Models 1 and 2, respectively (TmaxN and PrecipN). Normalization Sub-Model 1[3] uses only TmaxS as an input. Normalization Sub-Model 2[4] also uses TmaxS as an input, but at two time delays, effectively decorrelating TmaxN and PrecipN. Annual peak Qs are marked with dotted lines for visual comparison with lows in TmaxN and PrecipN. The difficulty of discerning clear relationships among these variables underscores the need for a process model, which will be demonstrated in the next article on Forecasting Demand.
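Normalization, as described above, keeps whatever a sub-model cannot explain: normalized = measured – predicted, so QN = Q – QP1. The sketch below swaps in ordinary least squares with two inputs for the ANN used in the article, an assumption made only to keep the example dependency-free; the two inputs mirror Normalization Sub-Model 2’s use of one variable at two time delays. R² is computed as the squared Pearson correlation, which is assumed (not confirmed by the article) to be the statistic quoted in the figure captions. All function names and data are hypothetical.

```python
def fit_two_inputs(x1, x2, y):
    """OLS fit y ≈ b0 + b1*x1 + b2*x2 via the centered normal equations.

    Stand-in for the article's ANN sub-models (assumption for illustration).
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero if x1 and x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def normalize(y, x1, x2):
    """Normalized series = measured minus the part explained by the inputs."""
    b0, b1, b2 = fit_two_inputs(x1, x2, y)
    return [c - (b0 + b1 * a + b2 * b) for a, b, c in zip(x1, x2, y)]


def r_squared(u, v):
    """Squared Pearson correlation between two series."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv * suv / (suu * svv)


# Hypothetical inputs standing in for TmaxS(t) and TmaxS(t - 90), plus a
# target that depends on them with a small periodic "atypical" component.
x1 = [float(i) for i in range(20)]
x2 = [float((i * i) % 7) for i in range(20)]
y = [2 + 3 * a - b + ((7 * i) % 5 - 2) for i, (a, b) in enumerate(zip(x1, x2))]

y_n = normalize(y, x1, x2)
print(round(r_squared(y, x1), 3), round(r_squared(y_n, x1), 3))
```

By construction, OLS residuals are uncorrelated with the inputs used in the fit, so the normalized series has an R² of essentially zero against x1 while the raw target is strongly correlated with it; what remains is the periodic component the inputs cannot explain.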

[1] Referring to Equation 2, k=3 input variables: Baseline, TmaxS, and PrecipS. All tp=0, all dM=1.

[2] The models and plots in this paper were generated using the iQuest™ data mining software.

[3] k=1, tp=0, dM=1

[4] k=1, tp=0, td=90, dM=2

Figure 4: Measured, Predicted, and Normalized 90-day MWA Demand from 1994. Measured versus predicted R² = 0.96 and root mean square error (RMSE) = 0.95 MGD for the entire period; for 2000 onward, R² falls to 0.71 and RMSE rises to 1.1 MGD.

Figure 5: TmaxN, PrecipN, and Q. R² for TmaxN vs. PrecipN is 0.098 (previously 0.20 for Tmax vs. Precip).





Abarbanel, H.D.I. 1996. Analysis of Observed Chaotic Data. New York: Springer-Verlag, pp. 4-12, 39.

Ballard, R. 2003. “Forecasting with Neural Networks – A Review.” National Social Science J., Feb. 24, 2003.

Charytoniuk, W., Box, E.D., Lee, W.J., Chen, M.S., Kotas, P., and P. Van Olinda. 2000. Neural-Network-Based Demand Forecasting in a Deregulated Environment. IEEE Transactions on Industry Applications, 36(3).

Conrads, P.A. and E.A. Roehl. 2004. Integration of Data Mining Techniques with Mechanistic Models to Determine the Impacts of Non-Point Source Loading on Dissolved Oxygen in Tidal Waters. In Proc. South Carolina Environmental Conference, Myrtle Beach, March 2004. SC-AWWA.

Devine, T.W., Roehl, E.A., and J.B. Busby. 2003. Virtual Sensors – Cost Effective Monitoring. In Proc. Air and Waste Management Association Annual Conference, June 2003.

Jensen, B.A. 1994. Expert Systems – Neural Networks. In Instrument Engineers’ Handbook, Third Edition. Radnor, PA: Chilton.

Roehl, E.A., Conrads, P.A., and T.A. Roehl. 2000. Real-Time Control of the Salt Front in a Complex, Tidally Affected River Basin. In Proc. Artificial Neural Networks in Engineering Conference, St. Louis, pp. 947-954.

Roehl, E.A., Conrads, P.A., and J.B. Cook. 2003. Discussion of “Using Complex Permittivity and Artificial Neural Networks for Contaminant Prediction.” J. Env. Engineering, Nov. 2003, pp. 1069-1071. ASCE.




About noahmorgenstern

Entrepreneurial Warlock, mCouponing evangelist, NFC Rabbi, Innovation and Business Intelligence Imam, Secular World Shaker, and General All Around Good Guy

