WaterFront Followers,

Let us briefly review what we learned in Part 2, which was based on the work of John Cook and Edwin Roehl of ADMI. The most common empirical approach to demand forecasting is ordinary least squares (OLS), which relates variables using straight lines. Over long time periods this can be an accurate approach, though that is not always the case. From Part 2, we learned that the ability to model chaotic behavior is critical to forecasting demand accurately because weather plays a major role in water demand and weather behaves chaotically. Chaotic behavior can be modeled using state space reconstruction (SSR) (Abarbanel 1996), and Part 2 demonstrated the mathematical basis for SSR. Let us now consider a practical case study.

**CASE STUDY**

The location for the case study was a utility in the Temperate Zone near the Atlantic coast of the US. The case study period was January 1, 1994 to July 31, 2004 (3,865 days), which started with the onset of consistent data collection and archival at the water utility. It straddles five years of record drought from 1998 to 2002, and El Niños in 1998 and 2003 that brought sustained rains. Daily weather observations were obtained from the National Oceanic and Atmospheric Administration (NOAA) National Data Center for five stations in the region around the utility’s service area. The observed variables were daily maximum and minimum ambient temperatures (T_{max}, T_{min}) and daily cumulative precipitation (Precip). The original intent was to evaluate how spatial as well as temporal weather variability affects demand; however, problems such as gaps and baseline shifts were found in all of the data. Data from an outlying location (hereafter referred to as “Outlying”) were least afflicted and were deemed the only data set immediately useful[1]. The study evaluated forecasts 7, 30, and 90 to 180 days into the future. Because of space limitations in the blog, only the 90 to 180-day forecast modeling is described.

Correlation analysis quickly revealed that demand is highly correlated with T_{max}, somewhat less with Precip, and that T_{min} contributed very little information not accounted for by the other variables. Figure 1 shows actual and *“standard”* 90-day moving window averages (MWA) of T_{max} and Precip; the standard versions are denoted T_{max}S and PrecipS. T_{max}S and PrecipS are calculated by computing an average value for each variable for each day of the year. Some summers and winters are warmer than others, and T_{max} is much less variable than Precip. Most years exhibit spring rains, followed by a drier period, and then rainfall peaks in the latter half of the summer. Some years receive more rain than others. The high winter rains of the El Niños of 1998 and 2003 are apparent. In most years Precip peaks one to two months after T_{max} peaks. This is not true for 2003, when Precip rose unusually early. Also note that 2003 was cooler than the previous five years. The early Precip and lower T_{max} after five years of drought probably led customers to irrigate less in 2003.
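As a rough illustration of how a "standard" series such as T_{max}S could be computed, here is a minimal Python sketch. It is not the study's implementation. The function names, the trailing (rather than centered) window, and the handling of the first few samples are all assumptions made for illustration. The text also does not say whether smoothing comes before or after the day-of-year averaging, so the two steps are kept as separate functions.

```python
from collections import defaultdict
from datetime import date, timedelta


def day_of_year_standard(dates, values):
    """Average each calendar day (month, day) across all years in the record."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, v in zip(dates, values):
        key = (d.month, d.day)
        sums[key] += v
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def trailing_mwa(values, window):
    """Trailing moving window average (assumed form of the MWA).

    Early points are averaged over the samples available so far.
    """
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


# Made-up data: two non-leap years with constant values of 10 and 20.
dates = [date(2001, 1, 1) + timedelta(days=i) for i in range(730)]
values = [10.0] * 365 + [20.0] * 365
standard = day_of_year_standard(dates, values)
print(standard[(7, 4)])  # average of the two July 4 values
```

With a daily record, `trailing_mwa(values, 90)` would give a 90-day window. Subtracting the standard from the actual series then highlights years that were warmer, cooler, wetter, or drier than typical.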

[1] Station-to-station correlations for each variable type were found to be high, indicating that errors and gaps at one station could be reconstructed by correlating to time series at other stations.

Figure 2 shows the 30-day MWA demand (Q). Demand grew dramatically between 1998 and 2000, the shape of the annual demand curve changes from year to year, and the larger peak-to-trough difference from 1999 onwards indicates increased demand for seasonal irrigation. The 2003 Q is also significantly lower than in neighboring years. Overlain onto Q are two options for a “baseline demand” (Baseline), which is the minimum Q for the calendar year and occurs in late winter. The black line connects each year’s minimum, while the gray line (selected for this study) changes only when a new “high low” is observed (no backsliding).
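The "no backsliding" baseline can be read as a running maximum of the annual minima. The Python sketch below shows that reading under stated assumptions: the function names and the demand numbers are made up, and the interpolation between years that the figure appears to use is omitted, so the result is one step value per year.

```python
def annual_minima(years, q):
    """Minimum (smoothed) demand observed in each calendar year, keyed by year."""
    minima = {}
    for y, v in zip(years, q):
        if y not in minima or v < minima[y]:
            minima[y] = v
    return minima


def baseline_no_backsliding(minima):
    """Running maximum of the annual minima.

    The baseline rises only when a year's minimum exceeds every earlier
    minimum (a new "high low"); otherwise it holds its previous value.
    """
    baseline = {}
    highest = float("-inf")
    for y in sorted(minima):
        highest = max(highest, minima[y])
        baseline[y] = highest
    return baseline


# Made-up annual minima for illustration only.
minima = {1998: 10.0, 1999: 12.0, 2000: 11.0, 2001: 13.0}
print(baseline_no_backsliding(minima))
# {1998: 10.0, 1999: 12.0, 2000: 12.0, 2001: 13.0}
```

In this toy example the year 2000 minimum dips below the 1999 minimum, so the baseline holds at the 1999 level instead of following the dip.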

Figure 3 shows that the forecasting model is a “super-model” composed of four “sub-models”, each having a specific purpose. Figure 4 shows demand predictions QP1 made by an ANN[1] (Prediction Sub-Model-1) using only the Baseline, T_{max}S, and PrecipS as inputs[2]. To build it, we need to apply the equations from state space reconstruction theory explained in Part 2. For a multivariate process of k independent variables:

$$Y(t) = \{[x_{1}(t),\, x_{1}(t - t_{d1}),\, \ldots,\, x_{1}(t - (d_{L1} - 1)\,t_{d1})],\ \ldots,\ [x_{k}(t),\, x_{k}(t - t_{dk}),\, \ldots,\, x_{k}(t - (d_{Lk} - 1)\,t_{dk})]\} \qquad \text{(eq. 1)}$$

where each x(t,t_{di}) represents a different dimension in state space, and therefore a different element in a state vector. Values of d_{L} and t_{d} are estimated analytically or experimentally from the data. The mathematical formulations for models are derived from those for state vectors. To forecast, that is, to predict a dependent variable of interest y(t) from prior measurements of k independent variables (Roehl et al. 2000):

$$y(t) = F\{[x_{1}(t - t_{p1}),\, x_{1}(t - t_{p1} - t_{d1}),\, \ldots,\, x_{1}(t - t_{p1} - (d_{M1} - 1)\,t_{d1})],\ \ldots,\ [x_{k}(t - t_{pk}),\, x_{k}(t - t_{pk} - t_{dk}),\, \ldots,\, x_{k}(t - t_{pk} - (d_{Mk} - 1)\,t_{dk})]\} \qquad \text{(eq. 2)}$$

where F is an empirical function such as an ANN, each x(t,t_{pi},t_{di}) is a different input to F, and t_{pi} is yet another time delay. For each variable, t_{pi} is either: 1) constrained to the time delay at which an input variable becomes uncorrelated to all other inputs, but can still provide useful information about y(t); or 2) constrained to the time delay of the most recent available measurement of x_{i}; or 3) the time delay at which an input variable is most highly correlated to y(t). Here, the state space local dimension d_{L} of Equation 1 is replaced with a model input variable dimension d_{M}, which is determined experimentally. d_{M} ≤ d_{L}, and d_{M} tends to decrease with increasing k.
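To make the indexing in Equation 2 concrete, the following Python sketch assembles the delayed inputs that would be passed to F. It is a minimal illustration, not the study's software. The function names are hypothetical, time is assumed to be an integer index into evenly spaced daily samples, and F itself (the ANN) is left out.

```python
def delayed_inputs(series, t, t_p, t_d, d_m):
    """Inputs for one variable in eq. 2:
    x(t - t_p), x(t - t_p - t_d), ..., x(t - t_p - (d_m - 1) * t_d).
    """
    indices = [t - t_p - j * t_d for j in range(d_m)]
    if min(indices) < 0 or max(indices) >= len(series):
        raise IndexError("not enough history for the requested delays")
    return [series[i] for i in indices]


def input_vector(variables, t):
    """Concatenate the delayed inputs of all k variables at time index t.

    `variables` is a list of (series, t_p, t_d, d_m) tuples, one per variable.
    """
    vec = []
    for series, t_p, t_d, d_m in variables:
        vec.extend(delayed_inputs(series, t, t_p, t_d, d_m))
    return vec


# Toy series whose value equals its own index, so the delays are easy to read.
x = list(range(200))
# One variable with t_p = 0, t_d = 90, d_m = 2 (two inputs, 90 days apart).
print(input_vector([(x, 0, 90, 2)], t=150))  # [150, 60]
```

Setting d_m = 1 for every variable reduces the vector to one undelayed (or t_p-delayed) value per variable. Larger d_m values add more history for the same variable at a spacing of t_d.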

While the model is statistically accurate, QP1 does not track measured Q well through the decline in 2003. The black line is the model’s prediction error, which is also the “normalized” demand (QN) after having the Baseline, T_{max}S, and PrecipS “components” largely removed. Normalizing variables tends to amplify atypical behaviors for study. Figure 5 shows Q, along with T_{max} (T_{max}N) and Precip (PrecipN) after being normalized using Normalization Sub-Models 1 and 2 respectively. Normalization Sub-Model 1[3] uses only T_{max}S as an input. Normalization Sub-Model 2[4] also uses T_{max}S as an input, but at two time delays, effectively decorrelating T_{max}N and PrecipN. Annual peak Qs are marked with dotted lines for comparison by inspection to low T_{max}N and PrecipN. The difficulty in ascertaining clear relationships between these variables underscores the need for creating a process model, which will be demonstrated in the next article on Forecasting Demand.
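The idea of normalizing a variable as a model's prediction error can be sketched in a few lines of Python. This is only an illustration under loud assumptions. The study uses ANNs, whereas this sketch swaps in a single-input straight-line fit to stay self-contained. The sign convention (measured minus predicted) and the function names are also assumptions.

```python
def fit_line(x, y):
    """Least squares fit y ~ a + b*x for a single input (stand-in for the ANN)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def normalize(actual, standard):
    """Residual of `actual` after removing the part explained by `standard`.

    Assumed convention: residual = measured - predicted.
    """
    a, b = fit_line(standard, actual)
    return [yi - (a + b * xi) for xi, yi in zip(standard, actual)]


# Made-up numbers: the last observation is unusually high relative to its standard.
standard = [1.0, 2.0, 3.0, 4.0]
actual = [3.0, 5.0, 7.0, 12.0]
residuals = normalize(actual, standard)
print(residuals[-1] > 0)  # True: the unusual point stands out as a positive residual
```

Applied with T_{max}S as `standard` and T_{max} as `actual`, the residual would play the role of T_{max}N. The typical seasonal pattern is removed, and what remains is the departure from it.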

[1] Referring to Equation 2, k=3 input variables: Baseline, T_{max}S, and PrecipS. All t_{p}=0, all d_{M}=1.

[2] The models and plots in this paper were generated using the iQuest^{TM} data mining software.

[3] k=1, t_{p}=0, d_{M}=1

[4] k=1, t_{p}=0, t_{d}=90, d_{M}=2

—————————————————–

**LITERATURE CITED FOR THIS SERIES**

Abarbanel, H.D.I. 1996. *Analysis of Observed Chaotic Data*, pp. 4-12, 39. New York: Springer-Verlag New York, Inc.

Ballard, R. 2003. “Forecasting with Neural Networks – A Review,” *National Social Science J*., Feb. 24, 2003.

Conrads, P.A. and E.A. Roehl. 2004. Integration of Data Mining Techniques with Mechanistic Models to Determine the Impacts of Non-Point Source Loading on Dissolved Oxygen in Tidal Waters, In *Proc. South Carolina Environmental Conference*, Myrtle Beach, March 2004. SC-AWWA.

Charytoniuk, W., Box, E.D., Lee, W.J., Chen, M.S., Kotas, P., and P. Van Olinda. 2000. Neural-Network-Based Demand Forecasting in a Deregulated Environment, In *IEEE Transactions on Industry Applications*, 36(3).

Devine, T.W., Roehl, E.A., and J.B. Busby. 2003. Virtual Sensors – Cost Effective Monitoring, In Proc. *Air and Waste Management Association Annual Conference*, June 2003.

Jensen, B.A. 1994. *Expert Systems – Neural Networks, Instrument Engineers’ Handbook Third Edition*, Radnor, PA: Chilton.

Roehl, E.A., Conrads, P.A., and T.A. Roehl. 2000. Real-Time Control of the Salt Front in a Complex, Tidally Affected River Basin, Proceedings of the Artificial Neural Networks in Engineering Conference, St. Louis, 947-954.

Roehl, E.A., Conrads, P.A., and J.B. Cook. 2003. Discussion of Using Complex Permittivity and Artificial Neural Networks for Contaminant Prediction, *J. Env. Engineering*, Nov. 2003, pp. 1069-1071. ASCE.
