
Using Data to Build Process Models: Part I – Introduction

We have a great new series this week from guest bloggers John B. Cook and Edwin Roehl of ADMI on “Using Data to Build Process Models.” As always we’d love to hear your comments and feedback.

Difference between Data and Information

The power of data collection lies not in the ability to collect and store data but in the ability to extract the valuable information it contains. Many utilities are “data rich” but “information poor.” Or, stated differently, utilities find themselves collecting rivers of data while performing very little synthesis of that data. Without synthesis, the information needed to improve decision-making and processes never becomes available to be converted into knowledge. However, with the information technologies now available, such as real-time data collection and data mining, this need not be the case. Most utilities already have a wealth of data ready to be transformed into a rich source of information. Moreover, data collection is becoming easier as sensors become more robust and more varied.

Data mining has been defined as “the search for valuable information in large volumes of data. It is a cooperative effort of humans and computers.” The purpose of this introductory paper is to describe some of the techniques available for extracting information from data to build process models, and to demonstrate how large amounts of data have been evaluated with sophisticated data mining techniques that enabled utility managers and other decision-makers to make prudent, lower-risk decisions. Note that information theory has a strong mathematical foundation, which we forgo here for the sake of brevity.

What Can We Know?

Many natural environmental systems, treatment plants, and customer service operations have accumulated large databases. Given the limited resources of utilities and the growing demands placed upon them, decision-making is more critical than ever; hence there is a need to transform these databases so as to leverage the information hidden in them. These databases are generally underutilized, but this need not be the case: the software and computing power now exist to determine what can be known about a process from its data. By “process” we could mean a natural system, such as the movement of groundwater, or an engineered one, such as a water treatment plant or distribution system. In either case, knowing how a process works requires information extracted from collected data. By itself, data is simply a record of one or more physical processes, such as a plot of temperature versus time, stream flow versus time, or average water consumption per customer per day. In contrast, information in this context tells us about critical relationships among variables, the probabilities that events will occur, and how variables evolve over time: how fast and in what direction.

The process of extracting information from data is called “data mining.” Data mining is a powerful analytical approach for synthesizing information from large databases, used in fields as diverse as financial services, banking, customer service, biology, manufacturing, and e-commerce. It is also used in environmental fields such as water and wastewater treatment, water quality modeling, hydrology, and demand forecasting. Over the course of this series, we shall examine how data can be used to build process models for applications such as:

  1. Minimizing disinfection by-product (DBP) formation
  2. Optimizing coagulation
  3. Demand forecasting
  4. Optimizing filtration performance
  5. Determining groundwater movement
  6. Developing regional total maximum daily loads (TMDLs)

Data mining techniques include advanced multivariate analysis and modeling, multi-dimensional visualization, machine learning, and artificial intelligence (AI), including artificial neural networks. Questions that data mining can help answer by building process models include, but are not limited to:

  • Why does a wastewater or water treatment plant process run better at some times than at others?  At times, the answer is obvious; at other times, no logical explanation is apparent.
  • What are the critical factors to ensure that a wastewater treatment plant (WWTP) is able to nitrify or remove nutrients consistently?
  • How can water demand be accurately predicted, and how will it change if certain conservation measures are taken, if the weather changes, or if the rate structure or population changes?
  • What is the optimal operational and backwash mode for filtration?
  • How can the coagulation process be optimized to remove total organic carbon (TOC)?
  • What is the optimal use of limited water resources in a drainage basin?
  • How can costs be reduced without adversely impacting service levels?
  • How can the impacts from point source versus non-point source pollution be accurately separated?
  • What can be done to reduce disinfection by-products without major capital expenditure?
  • How can customer service records be used to enhance service levels?
  • How can data mining techniques be used to help develop an asset management program or a program for predicting critical failures?

The examples above are only a small sample of the ways databases can be used to extract critical information that helps utility managers, engineers, resource managers, operators, and financial analysts make sense of what is happening on a larger scale and over a longer timeframe. While these examples differ, they have much in common. First, the conclusions are not apparent at the start of each project; collecting large quantities of data and looking at simple trends yields only limited information. Second, the information needed to make decisions is extracted from large databases covering a variety of conditions. And third, the extracted information is used to build a process model and/or decision support system (DSS), which enables decisions to be made accurately and rapidly and to reflect the wisdom of sound operational history.
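To make the idea of a data-driven process model concrete, here is a minimal Python sketch that fits a multiple linear regression by ordinary least squares. It is one of the simplest forms of the multivariate modeling listed earlier and is not the authors' own method. The scenario is hypothetical: daily water demand explained by maximum air temperature and rainfall, with the demand values generated from an assumed relation (demand = 20 + 1.2 * temperature - 0.5 * rainfall) so that the fitted coefficients can be checked against known values. The names `fit_ols` and `predict` are illustrative, not from any library.

```python
# Sketch: a data-driven process model via multiple linear regression (OLS).
# A simple stand-in for multivariate modeling; all data are HYPOTHETICAL.


def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... + bk*xk by solving (X^T X) b = X^T y."""
    X = [[1.0, *r] for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def predict(coefs, row):
    """Evaluate the fitted model for one set of inputs."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


# Hypothetical records: (max air temperature in deg C, rainfall in mm) per day.
weather = [(18, 0), (22, 5), (25, 0), (28, 2), (31, 0), (20, 12), (27, 8), (33, 1)]
# Demand (ML/day) generated from an ASSUMED relation so the fit can be verified.
demand = [20 + 1.2 * t - 0.5 * r for t, r in weather]

if __name__ == "__main__":
    coefs = fit_ols(weather, demand)
    print("intercept, temperature, rainfall:", [round(c, 3) for c in coefs])
    print("what-if demand, 35 deg C dry day:", round(predict(coefs, (35, 0)), 1))
```

Once fitted, the model can answer a what-if question of the kind posed in the list above, such as expected demand on a hot, dry day. With real plant records, the fit should be judged against data held out from fitting, and nonlinear methods such as artificial neural networks may capture relationships that a linear model misses.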


About noahmorgenstern

Entrepreneurial Warlock, mCouponing evangelist, NFC Rabbi, Innovation and Business Intelligence Imam, Secular World Shaker, and General All Around Good Guy


