Universidade Federal de Santa Maria

Ci. e Nat., Santa Maria, v.42, e110, 2020

DOI:10.5902/2179460X33910

ISSN 2179-460X

Received: 25/07/18  Accepted: 20/12/19  Published: 23/12/20

Statistics

Prediction of water consumption by consumer categories: a case study

Predição do consumo de água por categoria de consumo: um estudo de caso

Jorge Alberto Achcar I

Marcos Valerio Araujo II

Claudio Luis Piratelli III

Ricardo Puziol de Oliveira IV

I   Universidade de São Paulo, Ribeirão Preto, SP - achcar@fmrp.usp.br

II  Universidade de Araraquara, Araraquara, SP - marcos.dearaujo@aegea.com.br

III Universidade de Araraquara, Araraquara, SP - clpiratelli@uniara.com.br

IV Universidade de São Paulo, São Paulo, SP - rpuziol.oliveira@gmail.com

ABSTRACT

This study introduces a new Bayesian model for predicting water consumption in a medium-sized municipality in the State of São Paulo, Brazil. A stratified random sample of consumers in different consumer categories (residential, industrial, public and commercial) is selected, with 55 consecutive monthly measurements of water consumption for each consumer. The proposed model is compared with time series models commonly used in forecasting (moving average models and ARIMA models). The Bayesian model for the consumption data assumes the presence of a random effect that captures the possible dependence between the monthly consumption for the different categories. A hierarchical Bayesian analysis is carried out using MCMC (Markov Chain Monte Carlo) methods to generate samples from the joint posterior distribution of interest. A detailed discussion of the results is presented, showing the advantages and disadvantages of each model in terms of feasibility for the municipality's water supply company. The results of this study can be generalized to water consumption data for any municipality.

Keywords: water consumption, time series models, water consumption forecast, Bayesian model, MCMC methods

RESUMO

Este estudo introduz um novo modelo bayesiano de previsão para o consumo de água em um município de médio porte do Estado de São Paulo, Brasil. Para o estudo, foi selecionada uma amostra aleatória estratificada de consumidores classificados em diferentes categorias (residencial, industrial, pública e comercial) considerando 55 medições consecutivas mensais do consumo de água para cada consumidor. O modelo proposto é comparado com alguns modelos usuais de séries temporais (modelos de médias móveis e modelos ARIMA) comumente usados em previsões. O modelo Bayesiano para os dados de consumo pressupõe a presença de um efeito aleatório que captura a possível dependência entre o consumo mensal para as diferentes categorias. Uma análise Bayesiana hierárquica é feita usando métodos MCMC (Monte Carlo em Cadeias de Markov) para gerar amostras da distribuição a posteriori conjunta de interesse. Uma discussão detalhada dos resultados obtidos é apresentada, mostrando as vantagens e desvantagens de cada modelo proposto em termos de viabilidade para a empresa de abastecimento de água do município. Os resultados deste estudo podem ser generalizados para dados de consumo de água para qualquer outro município.

Palavras-chave: consumo de água, modelos de séries temporais, previsão do consumo de água, modelo Bayesiano, métodos MCMC

1 INTRODUCTION

Water consumption by the population is a subject of great interest in the area of basic sanitation, given the rapid growth of the world population and the limited water available in municipal supply systems. Several factors can influence water consumption, such as the category of the consumer unit (residential, commercial, industrial or public), the socioeconomic category of consumers, and the time of year (month and season). New, simple statistical models for consumption forecasting may help municipal water utility managers make important medium- or long-term strategic decisions to prevent the collapse of the system.

In general, each water supply company has large temporal databases associated with each consumer unit, collected through automated supervisory and control systems, that can be used in studies leading to better statistical models and better forecasts. This study introduces a new statistical model under the Bayesian paradigm, which may be more accurate and simpler for water utilities to use than traditional time series models, such as the popular moving average or ARIMA models, when predicting water consumption for each consumer unit. More accurate forecasts from statistical models that are simple to implement may be of great interest to water utilities when, for some reason, the monthly water consumption of a unit cannot be read for some time, or when planning the expansions of water supply capacity normally carried out by water supply companies in each municipality.

The literature contains many studies related to water consumption. Zhow et al. (2000, 2002) proposed a time series forecasting model for daily water consumption in Melbourne (Australia), considering the effects of the following factors: trend, seasonality, climatic correlation and autocorrelation.

Altunkaynak et al. (2005) introduced a fuzzy logic method for predicting future monthly water consumption in Istanbul (Turkey) as an alternative to the usage of the Markov or ARIMA models.

Ally and Wanakule (2004) presented an approach for short term forecasting of municipal water use based on a deterministic smoothing algorithm.

Balling, Gaber and Jones (2008) used a time series of monthly water use anomalies to compare with anomalies in temperature, precipitation and the hydrological index. More than 70% of monthly variability in water supply was explained by atmospheric conditions.

Silva et al. (2008) used a multiple linear regression model to study how much several socio-economic and climatic variables contribute to per capita water consumption, and proposed a statistical model to project the demand for water in the city of Cuiabá, Mato Grosso, Brazil. The results indicated that the climatic variables did not affect consumption; the variables that contributed were the socio-economic class and the per capita consumption of electric energy. Feil and Haetinger (2014) studied the water supply system in the city of Lajeado, Rio Grande do Sul, Brazil, from 2000 to 2007. The study addressed the prediction of water consumption by the population of Lajeado between 2008 and 2032, using a mathematical model employed by the water utility to assess the likelihood of a water shortage. They observed that the variables affecting per capita consumption were relative air humidity and total population. The forecast indicated that demand would exceed the maximum flow rate of treated water produced from 2026 onward.

Dias et al. (2010) evaluated the impact of a change in household income on the consumption of treated water provided by the concessionaire (in this case, the Minas Gerais Sanitation Company - Copasa) in the city of Belo Horizonte, Brazil, for 35 months covering the period from August 2003 to June 2006. For this purpose, they used data from the Monthly Employment Survey (PME) of the Brazilian Geography and Statistics Institute (IBGE), encompassing 3,100 households and 10,200 residents, and the residential consumption of the six districts served by Copasa. The results showed an intrinsic relationship between per capita consumption and monthly income. Perry (1981) used a linear regression model to forecast next-day consumption, with temperature and past consumption data as covariates, in order to optimize the use of water lifting stations and reduce electricity costs. Kher and Sorooshian (1986) compared regression models for monthly consumption with the following covariates: monthly consumption history, family income, water tariffs, average monthly rainfall, average monthly temperature and the effective evaporation for each month. Smith (1988) developed an autoregressive time series model for daily prediction of consumption, including explanatory variables such as water tariffs, number of connections, socio-economic characteristics, and type of consumption (residential, commercial, industrial). Rhoades and Walski (1991) applied a multiple regression model to forecast monthly consumption from temperature, population, and precipitation for utility planning in the city of Austin, Texas, USA.

Gwaivangmin and Jiya (2017) presented a model based on an artificial neural network to predict hourly water demand in nodes for water distribution in a city in Nigeria, Africa. The results of the model were useful for supervisory control and monitoring of the water demand nodes in the city, solving the perennial problem of water scarcity.

Wong, Zhang and Chen (2010) studied the effects of five factors on daily water consumption in Hong Kong as support for an effective urban water resource. They used a series of statistical models in order to find the major factors influencing urban water use (seasonality, followed by calendar effects).

Bougadis, Adamowski and Diduch (2005) forecasted water demand in the city of Ottawa, Ontario, Canada. They studied rainfall and maximum air temperature influences on past water demand. Three different artificial neural network and regression models and seven time series models were compared. It was found that the existing infrastructure would not meet the water demand for the projected population in 2021. They also found that water demand on a weekly basis is more significantly correlated with the amount of rainfall than the frequency of rainfall.

Some recent studies on water consumption focus on statistical analyses of spatial and demographic factors in cities (House-Petters, Pratt and Chang, 2010; Chang, Parandvash and Shandas, 2010) or on behavioral aspects of residential consumers (Makki et al., 2013; Willis et al., 2013).

Many other papers related to water consumption data have also been published (Amaral and Shirota, 2000; Dias et al., 2010; Narchi, 1989; Jain and Ormsbee, 2002; Kher and Sorooshian, 1984; Maidment and Parzen, 1984; Perry, 1981; Smith, 1988; Thomas, 2000).

Some studies, such as those reported by Crommelynk et al. (1992), Stark et al. (2000), Jain and Ormsbee (2002), Silva (2003), and Falkemberg et al. (2003), consider prediction models for water consumption based on artificial intelligence techniques.

1.1 Data set and main goals

In this study, data from Matão, a medium-sized city in the central region of the state of São Paulo, Brazil, are considered as a case study. The city has approximately 35,000 consumer units, and the data cover 55 consecutive months (monthly data from January 2012 to July 2016). From this population, a stratified random sample (by neighborhood) of 3,000 consumers was selected. In addition to the monthly water consumption for each unit, the sampled data contain information on the consumer unit, consumer category, type of economy, and region of the city. In particular, we consider the statistical analysis of the time series of monthly average consumption over the 55 months in each of the consumer categories (residential, industrial, public and commercial).

The main goals of the study are:

• The introduction of a Bayesian model in the presence of lagged effects, covariates (months and quarters) and a random factor which captures the possible dependence structure among the averages of the water consumption measurements in each month for different categories of consumers (commercial, industrial, public and residential).

• For comparative purposes, other existing time series models are also assumed, such as the popular moving average models and the ARIMA models commonly used in forecasting water consumption in different consumer segments. In this comparative study the possible advantages and disadvantages of each proposed model for the forecast of water consumption in each category of consumer are discussed.

A secondary goal of the study is:

• Statistical analysis to verify if there is a significant difference among the means of water consumer categories based on data analysis of the averages of consumers over the 55 months using descriptive statistics and ANOVA (analysis of variance) models.

1.2 Preliminary statistical data analysis for the consumer categories

Initially, the water consumption averages over the 55 months of follow-up (from January 2012 to July 2016) are analyzed by consumer category. That is, the longitudinal consumption data for each of the 3,000 consumer units are replaced by a single quantity, the unit's average water consumption over the 55 months. This data set shows great variability among the water consumption averages (in cubic meters). The overall mean for all data (n = 3,000 observations) is equal to 16.66. The sample means, standard deviations and medians for each category of consumer are obtained using Minitab® software, version 16 (Table 1).

Table 1 - Descriptive statistics for the average water consumption by category

| Category | n | Mean | S.D. | Median |
|---|---|---|---|---|
| Commercial | 342 | 16.03 | 10.65 | 14.18 |
| Industrial | 9 | 31.62 | 22.89 | 24.95 |
| Public | 17 | 23.74 | 20.73 | 13.58 |
| Residential | 2632 | 16.65 | 7.80 | 15.25 |

From the results in Table 1, two categories of water consumers (industrial and public) are observed to have very high sample means (well above the overall mean) when compared to the means of the other water consumer categories (commercial and residential). Figure 1 presents the box-plots for the average consumption in the 55 months in each category of consumer.
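As a quick consistency check on Table 1, the overall mean reported in the text can be recovered as the sample-size-weighted average of the category means. The short Python sketch below uses only the values reported in Table 1; the variable names are illustrative.

```python
# Category sizes and means as reported in Table 1 (cubic meters).
table1 = {
    "Commercial": (342, 16.03),
    "Industrial": (9, 31.62),
    "Public": (17, 23.74),
    "Residential": (2632, 16.65),
}

total_n = sum(n for n, _ in table1.values())
overall_mean = sum(n * mean for n, mean in table1.values()) / total_n

print(total_n)                 # 3000
print(round(overall_mean, 2))  # 16.66
```

Because residential units make up about 88% of the sample, the overall mean stays close to the residential mean even though the industrial and public means are much higher.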

Figure 1 - Box-plots for the average water consumption in different categories of consumers

A high degree of variability of the averages within each category of consumer is observed in the box plots in Figure 1. Preliminary analysis of the data eliminates two observations (incomplete data); that is, n = 2998 average observations are considered for the statistical analysis of the data. Figure 2 shows the histograms of the average consumption over the 55 months on the original scale and on the log scale. The data are closer to normally distributed on the logarithmic scale.

Figure 2 - Histograms for average water consumption on the original scale and on the logarithmic scale

1.3 Use of an ANOVA model to compare the averages of category consumers over the 55 months

From the descriptive analysis of average monthly water consumption data of the city considered in this study, great differences among the means of consumption in different categories of consumer are observed. In order to confirm the possible difference in the average monthly water consumptions, an ANOVA (analysis of variance) model is used, considering the data (average consumption) transformed to the logarithmic scale to have better normality, an assumption needed to validate the inferences.

Analysis of variance (ANOVA) is a statistical methodology for testing whether a given factor has a significant effect on the dependent variable Y. Let µj denote the true mean of the dependent variable at level j of the factor. The ANOVA technique tests the hypothesis that there are no differences among the means µj, assuming that the variability of the observations is the same in each group (constant variance). For more details on ANOVA, see Montgomery and Runger (2011). Table 2 presents the ANOVA results obtained using the statistical software MINITAB® version 16, considering the data set (n = 2998 average consumer units).

Table 2 - ANOVA (water consumption by category, log scale)

| Source | DF | SS | MS | F | p |
|---|---|---|---|---|---|
| Category | 3 | 11.374 | 3.791 | 14.68 | < 0.001 |
| Error | 2994 | 773.10 | 0.258 | | |
| Total | 2997 | 784.473 | | | |

(DF: degrees of freedom; SS: sum of squares; MS: mean square; F: Snedecor's F statistic; p: p-value)

From the results in Table 2, a significant difference among the levels of consumer category (commercial, residential, public and industrial) is observed: the p-value for the test of equality of means is less than 0.05. These results confirm the preliminary data analysis in terms of identifying the levels with higher or lower water consumption. The normality and constant-variance assumptions for the residuals were checked via residual plots, which are not presented here to save space.
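The F statistic in Table 2 follows from the reported sums of squares and degrees of freedom: each mean square is SS/DF, and F is the ratio of the between-category mean square to the error mean square. A minimal Python sketch, using only the first two rows of Table 2 (variable names are illustrative):

```python
# Sums of squares and degrees of freedom as reported in Table 2.
ss_between, df_between = 11.374, 3
ss_error, df_error = 773.10, 2994

ms_between = ss_between / df_between  # mean square for the category factor
ms_error = ss_error / df_error        # mean square error
f_stat = ms_between / ms_error

print(round(ms_between, 3), round(ms_error, 3), round(f_stat, 2))
# 3.791 0.258 14.68
```

The two sums of squares also add up to the total in the last row (11.374 + 773.10 ≈ 784.473), and the degrees of freedom add up to 2997 = n − 1 with n = 2998.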

2 METHODS

For the statistical analysis of the time series over the 55 months (the main goal of the study), the data set consists of the monthly averages of the consumer units in each category, transformed to the logarithmic scale. That is, we have four time series with 55 observations each. Figure 3 presents the time series plots for the data.

Figure 3 - Time series plots of the monthly average water consumption (log scale) for each consumer category

2.1 Use of a Bayesian Regression Model

Since the four time series for each category of average consumer (residential, commercial, public and industrial) are measured at the same time, it is possible to have dependence among the four time series. Thus, in this section a Bayesian model with the inclusion of a random effect or latent factor that captures the possible dependence between the four series and lagged effects is proposed:

(1)

where $i = 1, \ldots, 55$; $j = 1, 2, 3, 4$ (categories); $Y_{1i} = \log(\text{commercial}_i)$; $Y_{2i} = \log(\text{industrial}_i)$; $Y_{3i} = \log(\text{public}_i)$; $Y_{4i} = \log(\text{residential}_i)$; $w_i$ is a random effect that captures the possible dependence between the series on the same date; $\epsilon_{ji}$ is a random error, assumed to be independent and normally distributed with mean zero and constant variance $\sigma_j^2$. The random factor $w_i$ is also assumed to be normally distributed, with mean zero and constant variance $\sigma_w^2$. The four seasons denoted by 1 (January, February, March), 2 (April, May, June), 3 (July, August, September) and 4 (October, November, December) are considered. For a hierarchical Bayesian analysis of model (1), MCMC (Markov Chain Monte Carlo) methods are used (see, for example, Gelfand and Smith, 1990; Casella and George, 1992; Chib and Greenberg, 1995). OpenBugs software version 3.2.2 (Spiegelhalter et al., 2003) is used to simulate samples from the joint posterior distribution of interest.
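To illustrate how a shared random effect induces dependence between series observed on the same dates, the following Python sketch simulates two series under a simplified additive structure, Y_ji = µ_j + w_i + ε_ji. This structure is an assumption made only for illustration: the covariates and lagged terms of model (1) are omitted, and the means and standard deviations are hypothetical values, not estimates from the paper.

```python
import math
import random

def simulate_shared_effect(n, mu, sigma_w, sigma_eps, seed=0):
    """Simulate series Y[j][i] = mu[j] + w[i] + eps[j][i] with a shared w[i]."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, sigma_w) for _ in range(n)]  # one effect per date
    return [
        [mu_j + w[i] + rng.gauss(0.0, s_j) for i in range(n)]
        for mu_j, s_j in zip(mu, sigma_eps)
    ]

def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def implied_correlation(sigma_w, s_j, s_k):
    """Correlation between two series implied by the additive structure."""
    var_w = sigma_w ** 2
    return var_w / math.sqrt((var_w + s_j ** 2) * (var_w + s_k ** 2))

# Hypothetical values chosen only for illustration.
sigma_w, sigma_eps = 0.2, [0.2, 0.3]
series = simulate_shared_effect(20000, mu=[2.8, 3.4],
                                sigma_w=sigma_w, sigma_eps=sigma_eps)
empirical = pearson(series[0], series[1])
theoretical = implied_correlation(sigma_w, *sigma_eps)
print(round(empirical, 3), round(theoretical, 3))
```

Under this simplified structure the errors are independent across categories, so all of the correlation between two series on the same date comes from the variance of the shared effect.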

For the Bayesian analysis, gamma prior distributions G(1,1) are assumed for $1/\sigma_j^2$, j = 1,2,3,4, and a G(0.1,0.1) prior for $1/\sigma_w^2$, where G(a,b) denotes a gamma distribution with mean a/b and variance a/b²; normal N(0, 10) prior distributions are assumed for the regression parameters βj0, βj1, βj2, βj3, βj4 and βj5, j = 1,2,3,4. In simulating from the joint posterior distribution, a burn-in sample of size 51,000 was discarded to eliminate the effect of the initial values in the iterative process; after this burn-in, a further 100,000 samples were generated and every 100th sample was kept, giving a final sample of size 1,000 used to obtain the posterior summaries of interest. Convergence of the simulation algorithm was verified via time series (trace) plots of the simulated Gibbs samples.
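The prior moments and the size of the final posterior sample can be checked with a few lines of arithmetic. The Python sketch below assumes the shape/rate parameterization stated in the text (mean a/b, variance a/b²); the index convention used for thinning is an illustrative choice, not necessarily the one applied by OpenBugs.

```python
def gamma_mean_var(a, b):
    """Mean and variance of a gamma G(a, b) with mean a/b and variance a/b^2."""
    return a / b, a / b ** 2

print(gamma_mean_var(1, 1))               # (1.0, 1.0)
mean_w, var_w = gamma_mean_var(0.1, 0.1)
print(round(mean_w, 6), round(var_w, 6))  # 1.0 10.0

# Burn-in of 51,000 draws, then 100,000 further draws keeping every 100th.
burn_in, n_after_burn_in, thin = 51_000, 100_000, 100
kept = range(burn_in + thin - 1, burn_in + n_after_burn_in, thin)
print(len(kept))                          # 1000
```

Both priors on the precisions have mean 1, but the G(0.1,0.1) prior for the random-effect precision has ten times the variance of the G(1,1) prior, so it is considerably less informative.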

3 RESULTS

Table 3 shows the posterior summaries of interest. Although most covariates do not show significant effects (zero is included in the 95% credible intervals of all regression parameters except the intercepts), the model is useful for predicting water consumption in a future month, for a specified year and season, for consumers in each category.

It is important to point out that using the Bayesian model (1) we can find forecasts for water consumption in the different categories (commercial, industrial, public and residential) at any time of year with the effects of months and seasons.

Figure 4 presents the observed and fitted time series. A good fit is observed for the proposed models with the observed data. Table 4 shows the values for evaluation of the fit for the model according to the MAPE, MAD and MSD criteria (see appendix 1).

Table 3 - Posterior summaries for the consumption time series by category

| Parameter | Mean | SD | Low 95% | Upper 95% |
|---|---|---|---|---|
| β10 | 2.587 | 0.2429 | 2.123 | 3.04 |
| β11 | 0.004418 | 0.002468 | -0.0460 | 0.009074 |
| β12 | 0.02961 | 0.03109 | -0.0311 | 0.08895 |
| β13 | 0.02704 | 0.1231 | -0.2002 | 0.2689 |
| β14 | -0.0008 | 0.1229 | -0.2408 | 0.2379 |
| β15 | -0.03367 | 0.09228 | -0.2165 | 0.1460 |
| β20 | 3.308 | 0.2837 | 2.746 | 3.862 |
| β21 | -0.00161 | 0.002658 | -0.00669 | 0.003496 |
| β22 | 0.0393 | 0.0363 | -0.03072 | 0.1134 |
| β23 | 0.1142 | 0.1040 | -0.08761 | 0.3230 |
| β24 | 0.02778 | 0.1107 | -0.1814 | 0.2521 |
| β25 | -0.1236 | 0.08162 | -0.2882 | 0.03511 |
| β30 | 3.046 | 0.2823 | 2.5140 | 3.6040 |
| β31 | -0.00189 | 0.002677 | -0.00717 | 0.003391 |
| β32 | 0.03399 | 0.0369 | -0.03945 | 0.1068 |
| β33 | 0.06106 | 0.1112 | -0.1531 | 0.2882 |
| β34 | 0.01865 | 0.1088 | -0.1962 | 0.2407 |
| β35 | -0.05838 | 0.08668 | -0.2309 | 0.1113 |
| β40 | 2.565 | 0.2399 | 2.085 | 3.035 |
| β41 | 0.004087 | 0.002482 | -0.0570 | 0.009102 |
| β42 | 0.03387 | 0.03252 | -0.03113 | 0.09801 |
| β43 | 0.02329 | 0.1266 | -0.218 | 0.2743 |
| β44 | 0.01928 | 0.1252 | -0.2223 | 0.2590 |
| β45 | -0.02654 | 0.09492 | -0.2077 | 0.1629 |
| σw² | 0.01112 | 0.04117 | 0.01996 | 0.00679 |
| σ1² | 0.04732 | 0.23969 | 0.07300 | 0.03289 |
| σ2² | 0.00499 | 0.33795 | 0.11390 | 0.04766 |
| σ3² | 0.07674 | 0.37119 | 0.11994 | 0.05321 |
| σ4² | 0.04842 | 0.23894 | 0.07616 | 0.03402 |

Table 4 - MAPE, MAD and MSD values (Bayesian model)

| Category | MAPE | MAD | MSD |
|---|---|---|---|
| Commercial | 3.9839 | 0.6236 | 0.6000 |
| Industrial | 13.5190 | 4.3167 | 31.9516 |
| Public | 14.6645 | 3.4846 | 22.7087 |
| Residential | 4.5658 | 0.7322 | 0.9596 |

Figure 4 - Observed and fitted time series (Bayesian model)
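The fit criteria reported above can be computed as in the Python sketch below. It assumes the usual definitions of these measures (mean absolute percentage error, mean absolute deviation and mean squared deviation); the paper's own definitions are given in its appendix 1, which is not reproduced here, and the observed and fitted values in the example are hypothetical.

```python
def mape(observed, fitted):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    return 100.0 * sum(abs((y - f) / y) for y, f in zip(observed, fitted)) / len(observed)

def mad(observed, fitted):
    """Mean absolute deviation between observed and fitted values."""
    return sum(abs(y - f) for y, f in zip(observed, fitted)) / len(observed)

def msd(observed, fitted):
    """Mean squared deviation between observed and fitted values."""
    return sum((y - f) ** 2 for y, f in zip(observed, fitted)) / len(observed)

# Hypothetical observed and fitted consumption values (cubic meters).
observed = [10.0, 20.0]
fitted = [9.0, 22.0]
print(round(mape(observed, fitted), 4),
      round(mad(observed, fitted), 4),
      round(msd(observed, fitted), 4))
```

MAPE is scale-free, which makes it the most convenient of the three for comparing categories with very different consumption levels; MAD and MSD are expressed in the units (and squared units) of the data.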

4 COMPARISON WITH OTHER PREDICTION MODELS

Water supply companies normally use simple independent time series models, such as moving average models, or more sophisticated models such as ARIMA (Autoregressive Integrated Moving Average) models, also known as Box and Jenkins models (Box et al., 1994), to forecast upcoming water consumption measurements in each category of consumer.

4.1 Use of moving average models

As a first alternative to the proposed Bayesian model (1), a moving average time series model is used to forecast water consumption (see, for example, Box et al., 1994; or Morettin and Toloi, 1987). The moving average model considers a stationary and locally constant time series $Z_1, Z_2, \ldots, Z_n$, composed of its level plus noise,

$$Z_t = \mu_t + a_t, \tag{2}$$

where $t = 1, 2, \ldots, n$; $E(a_t) = 0$; $\mathrm{var}(a_t) = \sigma_a^2$ and $\mu_t$ is an unknown parameter that varies over time. The moving average technique consists of calculating the arithmetic mean of the most recent $k$ observations, that is,

$$M_t = \frac{Z_t + Z_{t-1} + \cdots + Z_{t-k+1}}{k}. \tag{3}$$

The length of the mean is given by $k$. Thus, $M_t$ is an estimate of $\mu_t$ that uses only the last $k$ observations and ignores older ones: in each period the oldest observation is replaced by the most recent one, and a new average is calculated.

The forecast of future values is given by the last calculated moving average, that is,

$$\hat{Z}_t(h) = M_t, \tag{4}$$

or,

$$\hat{Z}_t(h) = \hat{Z}_{t-1}(h+1) + \frac{Z_t - Z_{t-k}}{k}, \tag{5}$$

for all $h > 0$. The above equation corrects the prediction of $Z_{t+h}$ at every instant; that is, with each new observation in the series, the forecast of $Z_{t+h}$ is updated. Assuming that the noise $a_t$ has a normal distribution with mean zero and variance $\sigma_a^2$, the forecast $\hat{Z}_t(h)$ has a normal distribution with mean $\mu_t$ and variance $\sigma_a^2 / k$. Therefore, a confidence interval for $\mu_t$ with confidence coefficient $100(1 - \alpha)\%$ is given by,

$$\left( \hat{Z}_t(h) - z_{\alpha/2} \frac{\sigma_a}{\sqrt{k}},\; \hat{Z}_t(h) + z_{\alpha/2} \frac{\sigma_a}{\sqrt{k}} \right), \tag{6}$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.
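The moving average forecast and its confidence interval can be sketched in a few lines of Python. This is a minimal illustration of the steps described above, not the MINITAB® implementation: the series values and the noise standard deviation sigma_a are hypothetical, and sigma_a is treated as known.

```python
import math
from statistics import NormalDist

def moving_average(series, k):
    """Arithmetic mean of the k most recent observations."""
    if k <= 0 or k > len(series):
        raise ValueError("k must be between 1 and len(series)")
    return sum(series[-k:]) / k

def forecast_interval(series, k, sigma_a, alpha=0.05):
    """Forecast (last moving average) with a 100(1 - alpha)% interval."""
    m_t = moving_average(series, k)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 normal quantile
    half_width = z * sigma_a / math.sqrt(k)
    return m_t, m_t - half_width, m_t + half_width

# Hypothetical monthly averages (cubic meters) and noise standard deviation.
z_series = [15.0, 16.0, 17.0, 18.0]
forecast, lower, upper = forecast_interval(z_series, k=3, sigma_a=1.0)

# Recursive update: previous moving average plus (newest - dropped) / k.
previous = moving_average(z_series[:-1], 3)
updated = previous + (z_series[-1] - z_series[-4]) / 3

print(forecast, updated)
print(round(lower, 3), round(upper, 3))
```

With a length of 3 months, as used for Table 5, each new month of data replaces the oldest of the three observations, so the forecast for the next month can be refreshed with a single addition and subtraction.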

Figure 5 presents the observed and fitted time series by moving averages for the different levels of the category factor considering a length of 3 months. From the fitted moving average models, we find the average forecast values considering a length of 3 months for the next month (56th month) and also 95% confidence intervals for the average water consumption (use of the MINITAB® Statistical Software, version 16) in each consumer category (see Table 5).

Table 5 - Forecast for average consumption of water by category of consumer (moving average models)

| Category | Forecast | MAPE | MAD | MSD |
|---|---|---|---|---|
| Commercial | 17.2824 | 2.7707 | 0.4420 | 0.3655 |
| Industrial | 32.8148 | 10.9492 | 3.5031 | 22.0466 |
| Public | 24.1569 | 10.0761 | 2.4247 | 11.6008 |
| Residential | 18.1097 | 3.7239 | 0.6083 | 0.75049 |

The prediction values can be used to determine water consumption for each type of consumer for the next month. This, despite being a very simple model for predicting water consumption, may be useful for water distribution companies. The model should be updated and continually remade with the information from the previous three months.

4.2 Use of ARIMA (Autoregressive Integrated Moving Average) Models

A class of models widely used for time series, especially in finance, is the ARIMA (Autoregressive Integrated Moving Average) class, usually known as the Box-Jenkins method (Box et al., 1994), which is accurate for short-term forecasting but less so for long-term forecasting. The analysis of a time series in the time domain relies on the serial correlation coefficient, or autocorrelation coefficient, which measures the dependence between successive values of the series.

Most prediction problems involve the use of time series data. Montgomery et al. (2008) suggest that forecasting problems are often classified as short-term, medium-term and long-term.

The use of available observations at time t of a time series used for forecasting a value at some future time is usually considered as a basis for economic and business planning, production planning, inventory control and production, and process control and optimization (Box et al., 1994).

Figure 5 - Observed and fitted time series for mean water consumption for each category of consumer (moving average model)

Generally, predictions are made at time t, taking the current month Yt and the previous months Y1, Y2, ..., Yt−1, to predict future values given by Ft+1, Ft+2, ..., Ft+m. Stochastic modeling of hydrological time series has been widely used for the management of water resources systems, for example in reservoir design and in assessing the occurrence of future hydrological events. For example, stochastic models are used to generate synthetic series of water supply that may occur in the future and to estimate the probability distribution of key decision-making parameters related to storage. In addition, stochastic models can be used to forecast water supplies and demands days, weeks, months and years in advance (Fortin et al., 2004).

ARIMA modeling is essentially an exploratory oriented approach with great flexibility to assemble an appropriate model that is adapted from the structure of the data itself. The stochastic nature of the time series can be roughly modeled with the aid of the autocorrelation function and partial autocorrelation function where random variables, periodic components, cyclic patterns and serial correlation can be discovered. As a result, the predictions of the series values can easily be obtained with a high degree of precision (Ho and Xie, 1998).

The process is constructed through the identification of the model, estimation of parameters and verification for the fit of the proposed model (Ho and Xie, 1998).

The ARIMA models contain three components: autoregressive (AR), integrated (I) and moving average (MA). The AR part describes the relationship between current and past observations. The MA part represents the autocorrelation structure of the error. The I part represents the order of differencing applied to the series to eliminate non-stationarity. The model is generally denoted by ARIMA(p, d, q), where p is the order of the autoregression, d is the order of differencing and q is the order of the moving average. A brief description of this class of models is given as follows (see, for example, Muhammad, 2012).

• AR model: An AR(p) model expresses the current value of the time series as a linear combination of the previous p values and a white noise term (random shock). Following Bell (1984), the AR(p) model is written as:

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + a_t, \tag{7}$$

where $\phi_1, \ldots, \phi_p$ are the AR(p) parameters, $a_t$ is the random shock at time t, normally distributed with mean zero and variance $\sigma_a^2$, and p is the order of the AR(p) model.

• MA model: The MA(q) model expresses the current value of a time series as a linear combination of the current noise and the q previous values of white noise. The (pure) moving average (MA) model is (Bell, 1984):

$$Y_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}. \tag{8}$$

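• ARMA model: To increase flexibility in modeling real time series, the autoregressive and moving average components are combined, leading to the ARMA(p,q) model (Bell, 1984):

$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}. \tag{9}$$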

Series that are explained both by their own lagged values and by lagged noise terms are called ARMA(p,q) models. If the process is stationary, an ARMA model can be used to represent the data. If it is not stationary, differencing is applied to make the series stationary, and this leads to the ARIMA model (Akgun, 2003).

• ARIMA model: The first of these conditions implies that the series Yt given in (9) is stationary. In practice, the series Yt may not be stationary while its first difference Yt − Yt−1 is; if Yt − Yt−1 is not stationary either, we may need to take the second difference (Yt − Yt−1) − (Yt−1 − Yt−2), and so on. In general, we may need to take the dth difference of Yt (although d is rarely greater than 2). This gives the ARIMA(p, d, q) model, where d is the order of differencing. Thus, an ARIMA(p, d, q) can be given for t = t + v by,

(10)
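The two building blocks described in the list above, differencing and the ARMA recursion, can be sketched in Python as follows. The sign convention for the moving average terms (subtracted from the current shock) is an assumption following the common Box-Jenkins convention, and the function names and numerical values are hypothetical.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing: Y_t - Y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def arma_one_step(y_hist, a_hist, phi, theta, const=0.0):
    """One-step-ahead ARMA(p, q) forecast with the future shock set to zero.

    y_hist and a_hist hold past values and past shocks, oldest first;
    phi[0] and theta[0] multiply the most recent value and shock.
    """
    ar_part = sum(c * y for c, y in zip(phi, reversed(y_hist)))
    ma_part = sum(c * a for c, a in zip(theta, reversed(a_hist)))
    return const + ar_part - ma_part

# A quadratic trend becomes constant after two rounds of differencing.
print(difference([1, 4, 9, 16], d=1))   # [3, 5, 7]
print(difference([1, 4, 9, 16], d=2))   # [2, 2]

# ARMA(1, 1) forecast with hypothetical coefficients and history.
print(round(arma_one_step([1.0, 2.0], [0.5, 1.0], phi=[0.5], theta=[0.2]), 4))
```

An ARIMA(p, d, q) forecast combines the two steps: the series is differenced d times, the ARMA recursion is applied to the differenced series, and the forecasts are then summed back to the original scale.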

4.3 ARIMA models results

In this section we consider the fit of ARIMA models (10) for the four classes of consumers (residential, public, industrial and commercial) using MINITAB® software, version 16. The results presented are for the best and most parsimonious (fewest parameters) models. To verify the suitability of the models, residual plots (normality and constant variance), the ACF (autocorrelation function), the PACF (partial autocorrelation function), and Ljung-Box chi-squared tests (Ljung and Box, 1978) were considered.
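For reference, the Ljung-Box statistic used in these adequacy checks is Q = n(n + 2) Σ r_k² / (n − k), summed over lags k = 1, ..., h, where r_k is the lag-k sample autocorrelation of the residuals. The Python sketch below is a plain implementation of this formula on a hypothetical toy series, not the MINITAB® routine.

```python
def autocorrelation(x, k):
    """Lag-k sample autocorrelation of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom

def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )

# Toy series with an obvious trend, used only to exercise the formula.
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
print(round(autocorrelation(toy, 1), 4))  # 0.4
print(round(ljung_box_q(toy, 1), 4))      # 1.4
```

For residuals of a fitted ARMA(p, q) model, Q is compared with a chi-squared distribution with h − p − q degrees of freedom; large values indicate remaining autocorrelation in the residuals.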

• Commercial

Assuming the ARIMA model (10) with p = 3, d = 0 and q = 1 for consumer data of the commercial category, Table 6 shows the model’s parameter estimates. In Figure 6 we have the graphs for the residuals and ACF from which it can be observed that the assumptions of the model are reasonably verified.

Table 6 - Estimators of ARIMA model parameters (commercial)

| Type | Coefficient | SE | T | p |
|---|---|---|---|---|
| AR 1 | 0.1447 | 0.2927 | 0.49 | 0.623 |
| AR 2 | 0.3168 | 0.2662 | 1.19 | 0.240 |
| AR 3 | 0.3432 | 0.1394 | 2.46 | 0.017 |
| MA 1 | -0.6144 | 0.2995 | -2.05 | 0.045 |
| Constant | 3.1149 | 0.1783 | 17.47 | < 0.01 |
| Mean | 15.9436 | 0.9129 | | |

Figure 6 - Residual plots and ACF (commercial)

• Industrial

Assuming the ARIMA model (10) with p = 3, d = 0 and q = 2 for the consumer data of the industrial category, Table 7 shows the model parameter estimates. In Figure 7, we have the graphs of the residuals and ACF. From this figure it is observed that the assumptions of the model are reasonably verified.

• Public

Assuming the ARIMA model (10) with p = 3, d = 0 and q = 2 for public category consumer data, Table 8 shows the model parameter estimates. In Figure 8, we have the graphs of the residuals and ACF. It is observed that the assumptions of the model are reasonably verified.

·         Residential

Assuming the ARIMA model (10) with p = 4, d = 0 and q = 2 for consumer data of the residential category, Table 9 shows the model parameter estimates. From Figure 9, it is observed that the assumptions of the model are reasonably verified.

In Figure 10, we have the observed and fitted series (ARIMA models). From the fitted ARIMA models, we can find the forecast values for the next month (month 56) and also 95% confidence intervals for the average water consumption in each consumer category (see Table 10).

Table 7 - Estimators of ARIMA model parameters (industrial)

Type       Coefficient   SE       T       p
AR 1        0.2125       0.1942    1.09   0.279
AR 2       -0.7384       0.1838   -4.02   < 0.01
AR 3        0.2986       0.1542    1.94   0.059
MA 1       -0.0268       0.1713   -0.16   0.876
MA 2       -0.8995       0.1500   -6.00   < 0.01
Constant   38.902        1.656    23.49   < 0.01
Mean       31.700        1.349

Figure 7 - Residual plots and ACF (industrial)

Table 8 - Estimators of ARIMA model parameters (public)

Type       Coefficient   SE       T        p
AR 1       -1.1645       0.1672    -6.96   < 0.01
AR 2       -0.2160       0.2221    -0.97   0.336
AR 3        0.4880       0.1364     3.58   < 0.001
MA 1       -1.6290       0.1446   -11.26   < 0.01
MA 2       -0.9288       0.1602    -5.80   < 0.01
Constant   45.079        2.130     21.17   < 0.01
Mean       23.820        1.125

Figure 8 - Residual plots and ACF (public)

5 DISCUSSION OF THE OBTAINED RESULTS

From the results obtained for the three proposed time series models (moving average, ARIMA and the Bayesian model with a random effect that captures the possible correlation among the consumption averages in the four categories of consumers: commercial, industrial, public and residential), it is observed that the forecasts are close to one another, with a slight gain for the Bayesian model, since its forecast for the 56th month is closer to the value of the previous month (see Table 11). In terms of a general assessment based on the MAPE, MAD and MSD values (see Table 11), there is also only a small variation among the three models proposed in this study. It is important to point out that the Bayesian model with a random effect has a structure that captures the dependence among the water consumption of all categories at the same time. The other models assume independent time series, which is not realistic for the available data set.

Table 9 - Estimators of ARIMA model parameters (residential)

Type       Coefficient   SE       T        p
AR 1       -1.1826       0.1406    -8.41   < 0.01
AR 2        0.1905       0.2233    -0.85   0.398
AR 3        0.8070       0.1976     4.08   < 0.01
AR 4        0.3600       0.1447     2.49   0.016
MA 1       -1.6983       0.0501   -33.90   < 0.01
MA 2       -0.7898       0.1248    -6.33   < 0.01
Constant   13.690        0.5773    23.71   < 0.01
Mean       16.592        0.6997

Figure 9 - Residual plots and ACF (residential)

Figure 10 - Observed and fitted time series (ARIMA) for the average water consumption for different categories of consumers

Table 10 - Forecast for average water consumption by consumer category (ARIMA)

Category      Predicted   MAPE      MAD      MSD
Commercial    16.8203      3.8103   0.6072    0.6039
Industrial    32.8460     14.522    4.5860   36.1231
Public        24.9056     13.417    3.1107   17.5526
Residential   18.0965      5.2752   0.8626    1.3129

The Bayesian model (1) has an additional advantage for the water utility when forecasting consumption in the different categories of consumers: the estimated model obtained using MCMC simulation methods can be used for different times and seasons without the monthly updating required by the traditional moving average and ARIMA models. In addition, the ARIMA(p, d, q) model is an exploratory model that requires trying several choices of p, d and q, which makes its practical use difficult for water utilities, where consumption forecasts must be made monthly. In general, these water supply companies need ready estimated models that can be used for several consecutive months, with the model updated only after reasonably long periods of use. This is certainly another favorable point for the use of the Bayesian model by the water supply utility.

Table 11 - Forecast for average water consumption by consumer category (all models) at month 56 (forecast for next month)

Category      Model   Prediction
Commercial    MA      17.2824
              ARIMA   16.8203
              Bayes   18.2302
Industrial    MA      32.8148
              ARIMA   32.8460
              Bayes   31.2073
Public        MA      24.1569
              ARIMA   24.9056
              Bayes   21.9453
Residential   MA      18.1097
              ARIMA   18.0965
              Bayes   19.8154

To better evaluate the fit of the three proposed models, we reanalyze the data using only the first 54 months and hold out the observed consumption of the 55th month for comparison with the forecast value of each model. Considering the three models previously used (moving average model, ARIMA model and Bayesian model with a random effect), Table 12 shows the water consumption forecasts for the 55th month in each category of consumer. In the ARIMA model we consider (p, d, q) = (3,0,1) for the commercial category and (p, d, q) = (3,0,2) for the industrial, public and residential categories. For the Bayesian model (1), we assume the same prior distributions for the parameters of the model as used before and the same MCMC simulation scheme using the OpenBUGS software to find the consumption forecast for the 55th month and the 3rd season in each consumer category (the obtained posterior summaries are shown in Table 13). From the results in Table 12, good prediction is observed (especially for the public and residential categories) considering the Bayesian model. In addition, this estimated Bayesian model could be used by the water company for a fixed and long period of time without the need for remodeling, which makes the model very attractive for predicting water consumption. The use of the models proposed in this study can be generalized by considering other categorical variables such as the region of the municipality, the socioeconomic range of consumers and rainy and non-rainy periods, among several other factors, which gives the models introduced a high level of applicability.

Table 12 - Forecast for average water consumption by consumer category (all models) at month 55 (models fitted to the first 54 months)

Category      Observed   Model   Prediction
Commercial    17.3912    MA      17.2824
                         ARIMA   16.8203
                         Bayes   18.2302
Industrial    30.0000    MA      32.8148
                         ARIMA   32.8460
                         Bayes   31.2073
Public        21.8824    MA      24.1569
                         ARIMA   24.9056
                         Bayes   21.9453
Residential   17.7135    MA      18.1097
                         ARIMA   18.0965
                         Bayes   19.8154
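The hold-out comparison can be made explicit by computing the absolute forecast error of each model in each category. The sketch below takes the observed and predicted values exactly as printed in Table 12; the dictionary layout and variable names are assumptions made for this illustration.

```python
# Observed consumption in the held-out month, as printed in Table 12.
observed = {
    "commercial": 17.3912,
    "industrial": 30.0000,
    "public": 21.8824,
    "residential": 17.7135,
}

# Forecasts by category and model, as printed in Table 12.
forecasts = {
    "commercial": {"MA": 17.2824, "ARIMA": 16.8203, "Bayes": 18.2302},
    "industrial": {"MA": 32.8148, "ARIMA": 32.8460, "Bayes": 31.2073},
    "public": {"MA": 24.1569, "ARIMA": 24.9056, "Bayes": 21.9453},
    "residential": {"MA": 18.1097, "ARIMA": 18.0965, "Bayes": 19.8154},
}

# Absolute error |forecast - observed| for every category and model.
abs_error = {
    cat: {model: abs(value - observed[cat]) for model, value in models.items()}
    for cat, models in forecasts.items()
}

# Model with the smallest absolute error in each category.
closest = {cat: min(errs, key=errs.get) for cat, errs in abs_error.items()}

for cat, errs in abs_error.items():
    summary = ", ".join(f"{m}: {e:.4f}" for m, e in errs.items())
    print(f"{cat:12s} {summary}  -> closest: {closest[cat]}")
```

A single held-out month gives only a coarse comparison, so these errors are best read alongside the MAPE, MAD and MSD criteria of Appendix 1.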

Table 13 - Posterior summaries of time series of consumption by category (Bayesian model deleting observation 55)

Parameter   Mean        SD          Lower 95%   Upper 95%
β10          2.584      0.2563       2.078      3.092
β11          0.004594   0.002518    -0.0183     0.009748
β12          0.02961    0.03368     -0.03691    0.09588
β13          0.02327    0.1284      -0.224      0.2775
β14         -0.00301    0.122       -0.2492     0.2329
β15         -0.02827    0.09485     -0.2157     0.1531
β20          3.314      0.2858       2.744      3.874
β21         -0.00135    0.002722    -0.006706   0.004215
β22          0.04041    0.03633     -0.02739    0.1136
β23          0.1142     0.1086      -0.1009     0.3312
β24          0.02099    0.1098      -0.19       0.2439
β25         -0.1214     0.08097     -0.2784     0.04419
β30          3.05       0.2916       2.466      3.612
β31         -0.00185    0.00283     -0.007361   0.003684
β32          0.0344     0.03732     -0.03931    0.1083
β33          0.0535     0.111       -0.158      0.2692
β34          0.02492    0.1122      -0.1903     0.2453
β35         -0.05854    0.08638     -0.2237     0.1109
β40          2.562      0.2507       2.064      3.051
β41          0.004177   0.002495    -0.0815     0.009164
β42          0.03547    0.03278     -0.0275     0.09659
β43          0.02235    0.1288      -0.2307     0.2723
β44          0.01242    0.1287      -0.2387     0.2631
β45         -0.01964    0.09748     -0.2118     0.1724
1/σ²_w      86.96       23.31       47.71       138.3
1/σ²_1      20.78       4.209       13.57       29.53
1/σ²_2      13.86       2.809        8.95       20.04
1/σ²_3      12.76       2.63         8.13       18.25
1/σ²_4      20.23       4.21        13.06       29.19

Appendix 1. Evaluation criteria for the fit of the time series model

      Mean absolute percentage error (MAPE)

MAPE expresses the accuracy as a percentage of the error and is given by,

$$\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \qquad (11)$$

where $y_t$ is an observation (with $y_t \neq 0$), $\hat{y}_t$ is the fitted value and n is the number of observations. Since this number is a percentage, it may be easier to understand than other statistics. For example, if the MAPE is 5, the forecast is off by 5% on average.

      Mean absolute deviation (MAD)

MAD expresses accuracy in the same units as the data, which helps to conceptualize the magnitude of the error, and is given by,

$$\mathrm{MAD} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right| \qquad (12)$$

where $y_t$ is an observation, $\hat{y}_t$ is the fitted value and n is the number of observations. Outliers (discordant points) have less effect on MAD than on MSD.

      Mean square deviation (MSD)

Another commonly used measure of the accuracy of fitted time series values is given by,

$$\mathrm{MSD} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2 \qquad (13)$$

where $y_t$ is an observation, $\hat{y}_t$ is the fitted value and n is the number of observations. Outliers (discordant points) have a greater effect on MSD than on MAD.
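The three criteria of this appendix can be computed directly from the observed and fitted values. The sketch below follows the standard definitions (mean of absolute percentage errors times 100, mean of absolute errors, mean of squared errors); the function names and the sample data are illustrative assumptions.

```python
def mape(observed, fitted):
    """Mean absolute percentage error; observations must be nonzero."""
    n = len(observed)
    return 100.0 / n * sum(abs((y - f) / y) for y, f in zip(observed, fitted))

def mad(observed, fitted):
    """Mean absolute deviation, in the units of the data."""
    n = len(observed)
    return sum(abs(y - f) for y, f in zip(observed, fitted)) / n

def msd(observed, fitted):
    """Mean squared deviation; penalizes large errors more than MAD."""
    n = len(observed)
    return sum((y - f) ** 2 for y, f in zip(observed, fitted)) / n

# Hypothetical observed and fitted consumption values.
y_obs = [10.0, 20.0]
y_fit = [9.0, 22.0]

print(mape(y_obs, y_fit), mad(y_obs, y_fit), msd(y_obs, y_fit))
```

In this toy example the second error is twice the first, so it contributes twice as much to MAD but four times as much to MSD, which is why outliers weigh more heavily on MSD.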

REFERENCES

Akgün, B. (2003). Identification of periodic autoregressive moving average models. PhD thesis. Middle East Technical University.

Altunkaynak, A., Özger, M., Çakmakci, M. (2005). Water consumption prediction of Istanbul city by using fuzzy logic approach. Water Resources Management, 19(5), 641–654.

Aly, A. H., Wanakule, N. (2004). Short-term forecasting for urban water consumption. Journal of Water Resources Planning and Management, 130(5), 405–410.

Amaral, A. M. P., Shirota, R. (2000). Mean residential consumption of treated water: an application of time series models in Piracicaba (in Portuguese). Revista Agrícola, 49(1), 55–72.

Balling, R. C., Gober, P., Jones, N. (2008). Sensitivity of residential water consumption to variations in climate: An intraurban analysis of Phoenix, Arizona. Water Resources Research, 44(10).

Bell, W. R. (1984). An introduction to forecasting with time series models. Insurance: Mathematics and Economics, 3(4), 241–255.

Bougadis, J., Adamowski, K., Diduch, R. (2005). Short-term municipal water demand forecasting. Hydrological Processes: An International Journal, 19(1), 137–148.

Box, G. E., Jenkins, G. M., Reinsel, G. C., Ljung, G. M. (2015). Time series analysis: forecasting and control. John Wiley & Sons.

Casella, G., George, E. I. (1992). Explaining the Gibbs sampler. The American Statistician, 46(3), 167–174.

Chang, H., Parandvash, G. H., Shandas, V. (2010). Spatial variations of single-family residential water consumption in Portland, Oregon. Urban Geography, 31(7), 953–972.

Chib, S., Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49(4), 327–335.

Crommelynck, V., Duquesne, C., Mercier, M., Miniussi, C. (1992). Daily and hourly water consumption forecasting tools using neural networks. In: Proc. of the AWWA’s annual computer specialty conference, pp. 665–676.

Dias, D., Martinez, C. B., Libanio, M. (2010). Evaluation of the impact of income variation on household consumption of water (in Portuguese). Engenharia Sanitária Ambiental, 15(2), 155–166.

Feil, A., Haetinger, C. (2014). Prediction of water consumption via mathematical modeling of water supply system (in Portuguese). Revista DAE, 195, 32–46.

Fortin, V., Perreault, L., Salas, J. (2004). Retrospective analysis and forecasting of streamflows using a shifting level model. Journal of Hydrology, 296(1-4), 135–163.

Gelfand, A. E., Smith, A. F. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410), 398–409.

Gwaivangmin, B., Jiya, J. (2017). Water demand prediction using artificial neural network for supervisory control. Nigerian Journal of Technology, 36(1), 148–154.

Ho, S., Xie, M. (1998). The use of ARIMA models for reliability forecasting and analysis. Computers & Industrial Engineering, 35(1-2), 213–216.

House-Peters, L., Pratt, B., Chang, H. (2010). Effects of urban spatial structure, sociodemographics, and climate on residential water consumption in Hillsboro, Oregon. Journal of the American Water Resources Association, 46(3), 461–472.

Jain, A., Ormsbee, L. E. (2002). Short-term water demand forecast modeling techniques: Conventional methods versus AI. Journal-American Water Works Association, 94(7), 64–72.

Kher, L. K., Sorooshian, S. (1986). Identification of water demand models from noisy data. Water Resources Research, 22(3), 322–330.

Ljung, G. M., Box, G. E. (1978). On a measure of lack of fit in time series models. Biometrika, 65(2), 297–303.

Maidment, D. R., Parzen, E. (1984). Cascade model of monthly municipal water use. Water Resources Research, 20(1), 15–23.

Makki, A. A., Stewart, R. A., Panuwatwanich, K., Beal, C. (2013). Revealing the determinants of shower water end use consumption: enabling better targeted urban water conservation strategies. Journal of Cleaner Production, 60, 129–146.

Montgomery, D. C., Runger, G. C. (2010). Applied statistics and probability for engineers. John Wiley & Sons.

Montgomery, D. C., Jennings, C. L., Kulahci, M. (2015). Introduction to time series analysis and forecasting. John Wiley & Sons.

Morettin, P. A., Toloi, C. M. (1987). Time series forecast (in Portuguese). Atual.

Muhammad, M. K. I. B. (2012). Time Series Modeling Using Markov and Arima Models. PhD thesis. Universiti Teknologi Malaysia.

Narchi, H. (1989). Domestic demand for water (in Portuguese). Revista DAE, 154, 1–7.

Nucci, N. L. R. (1983). Assessment of urban water demand. Economic and urban aspects. The built area as a possible explanatory and prospective factor (in Portuguese). Revista DAE, 135, 22–49.

Perry, P. F. (1981). Demand forecasting in water supply networks. Journal of the Hydraulics Division, 107(9), 1077–1087.

Rhoades, S. D., Walski, T. M. (1991). Using regression analysis to project pumpage. Journal-American Water Works Association, 83(12), 45–50.

Silva, C. S. (2003). Multivariate prediction of hourly water demand in urban water supply systems. PhD thesis. University of São Paulo.

Silva, W. T. P., Silva, L. M., Chichorro, J. F. (2008). Water resources management: Per capita water consumption perspectives in Cuiabá (in Portuguese). Engenharia Sanitaria Ambiente, 13(1), 8–14.

Smith, J. A. (1988). A model of daily municipal water use for short-term forecasting. Water Resources Research, 24(2), 201–206.

Spiegelhalter, D., Thomas, A., Best, N., Lunn, D. (2003). WinBUGS user manual.

Stark, H. L. (2003). The application of artificial neural networks to water demand modelling.

Thomas, S. P. (2000). Prediction of Water Consumption: Interface of Water and Sewage Installations with public services (in Portuguese). Navegar Editora.

Willis, R. M., Stewart, R. A., Giurco, D. P., Talebpour, M. R., Mousavinejad, A. (2013). End use water consumption in households: impact of socio-demographic factors and efficient devices. Journal of Cleaner Production, 60, 107–115.

Wong, J. S., Zhang, Q., Chen, Y. D. (2010). Statistical modeling of daily urban water consumption in Hong Kong: Trend, changing patterns, and forecast. Water Resources Research, 46(3).

Zhou, S., McMahon, T., Walton, A., Lewis, J. (2002). Forecasting operational demand for an urban water supply zone. Journal of Hydrology, 259(1-4), 189–202.

Zhou, S. L., McMahon, T. A., Walton, A., Lewis, J. (2000). Forecasting daily urban water demand: a case study of Melbourne. Journal of Hydrology, 236(3-4), 153–164.