Universidade Federal de Santa Maria
Ci. e Nat., Santa Maria, v.42, e110, 2020
DOI:10.5902/2179460X33910
ISSN 2179460X
Received: 25/07/18 Accepted: 20/12/19 Published: 23/12/20
Statistics
Prediction of water consumption by consumer categories: a case study
Jorge Alberto Achcar ^{I}
Marcos Valerio Araujo ^{II}
Claudio Luis Piratelli ^{III}
Ricardo Puziol de Oliveira ^{IV}
^{I} Universidade de São Paulo, Ribeirão Preto, SP  achcar@fmrp.usp.br
^{II} Universidade de Araraquara, Araraquara, SP  marcos.dearaujo@aegea.com.br
^{III} Universidade de Araraquara, Araraquara, SP  clpiratelli@uniara.com.br
^{IV} Universidade de São Paulo, São Paulo, SP  rpuziol.oliveira@gmail.com
ABSTRACT
This study introduces a new Bayesian model for predicting water consumption in a medium-sized municipality in the State of São Paulo, Brazil. A stratified random sample of consumers in different consumer categories (residential, industrial, public and commercial) was selected, with 55 consecutive monthly measurements of water consumption, and the proposed model is compared with time series models commonly used in forecasting (moving average models and ARIMA models). The Bayesian model assumes a random effect that captures the possible dependence among the monthly consumption of the different categories. A hierarchical Bayesian analysis is carried out using MCMC (Markov Chain Monte Carlo) methods to generate samples from the joint posterior distribution of interest. The results are discussed in detail, showing the advantages and disadvantages of each model in terms of feasibility for the municipality's water supply company. The methodology of this study can be extended to water consumption data from any municipality.
Keywords: water consumption, time series models, water consumption forecast, Bayesian model, MCMC methods
1 INTRODUCTION
Water consumption by the population is a subject of great interest in basic sanitation, given the large growth of the world's population and the limited water available in municipal supply systems. Several factors can influence water consumption, such as the category of the consumer unit (residential, commercial, industrial or public), the socioeconomic category of consumers, the time of year (month) and the season. New, simple statistical formulations for consumption forecasting may be of great interest to municipal water utility managers, supporting important medium- or long-term strategic decisions that help prevent the collapse of the system.
In general, each water supply company holds large temporal databases for each consumer unit, collected through automated supervisory and control systems, which can be used to build better statistical models and forecasts. This study introduces a new statistical model under the Bayesian paradigm that may be more accurate and simpler to use than traditional time series models, such as the popular moving average models or ARIMA models, for predicting water consumption for each consumer unit. More accurate forecasts from easily implemented statistical models may be valuable to water utilities when, for some reason, the monthly water consumption of a unit cannot be read for some time, or when planning expansions of water storage, which are normally carried out by the water supply company of each municipality.
The literature contains many studies related to water consumption. Zhou et al. (2000, 2002) proposed a time series forecasting model for daily water consumption in Melbourne (Australia), considering the effects of trend, seasonality, climatic correlation and autocorrelation.
Altunkaynak et al. (2005) introduced a fuzzy logic method for predicting monthly water consumption in Istanbul (Turkey) as an alternative to Markov or ARIMA models.
Ally and Wanakule (2004) presented an approach for short-term forecasting of municipal water use based on a deterministic smoothing algorithm.
Balling, Gaber and Jones (2008) used a time series of monthly water use anomalies to compare with anomalies in temperature, precipitation and the hydrological index. More than 70% of monthly variability in water supply was explained by atmospheric conditions.
Silva et al. (2008) used a multiple linear regression model to assess the contribution of some socioeconomic and climatic variables to per capita water consumption, and proposed a statistical model to project water demand in the city of Cuiabá, Mato Grosso, Brazil. The results indicated that the climatic variables did not influence consumption; the variables that did contribute were socioeconomic class and per capita consumption of electric energy. Feil and Haetinger (2014) studied the water supply system of the city of Lajeado, Rio Grande do Sul, Brazil, from 2000 to 2007, predicting water consumption by the population of Lajeado between 2008 and 2032 through a mathematical model used by the water utility to assess the likelihood of a water shortage. Their results showed that the variables affecting per capita consumption were relative air humidity and total population, and the forecast indicated that the maximum flow rate of treated water produced would be exceeded from 2026 onward.
Dias et al. (2010) evaluated the impact of changes in household income on the consumption of treated water supplied by the concessionaire (the Minas Gerais Sanitation Company, Copasa) in the city of Belo Horizonte, Brazil, over 35 months from August 2003 to June 2006. They used data from the Monthly Employment Survey (PME) of the Brazilian Institute of Geography and Statistics (IBGE), covering 3,100 households and 10,200 residents, together with the residential consumption of the six districts operated by Copasa. The results showed a close relationship between per capita consumption and monthly income. Perry (1981) used a linear regression model for next-day forecasts, with temperature and past consumption as covariates, in order to optimize the operation of water pumping stations and reduce electricity costs. Kher and Sorooshian (1986) compared regression models for monthly consumption using as covariates the monthly consumption history, family income, water tariffs, average monthly rainfall, average monthly temperature and the effective evaporation in each month. Smith (1988) developed an autoregressive time series model for daily consumption prediction that included explanatory variables such as water tariffs, number of connections, socioeconomic characteristics and type of consumption (residential, commercial, industrial). Rhoades and Walski (1991) applied a multiple regression model for monthly consumption forecasting, with temperature, population and precipitation as variables, for utility planning in the city of Austin, Texas, USA.
Gwaivangmin and Jiya (2017) presented an artificial neural network model to predict hourly water demand at the distribution nodes of a city in Nigeria, Africa. The model results were useful for supervisory control and monitoring of the water demand nodes in the city, addressing its perennial problem of water scarcity.
Wong, Zhang and Chen (2010) studied the effects of five factors on daily water consumption in Hong Kong to support effective management of urban water resources. They used a series of statistical models to identify the major factors influencing urban water use: seasonality, followed by calendar effects.
Bougadis, Adamowski and Diduch (2005) forecasted water demand in the city of Ottawa, Ontario, Canada, studying the influence of rainfall and maximum air temperature on past water demand. Three artificial neural network and regression models and seven time series models were compared. They found that the existing infrastructure would not meet the water demand of the population projected for 2021, and that weekly water demand is more significantly correlated with the amount of rainfall than with the frequency of rainfall.
Some recent studies on water consumption focus on statistical analyses of spatial and demographic factors in cities (House-Peters, Pratt and Chang, 2010; Chang, Parandvash and Shandas, 2010) or on behavioral aspects of residential consumers (Makki et al., 2013; Willis et al., 2013).
Many other papers related to water consumption data have also been published (Amaral and Shirota, 2000; Dias et al., 2010; Narchi, 1989; Jain and Ormsbee, 2002; Kher and Sorooshian, 1984; Maidment and Parzen, 1984; Perry, 1981; Smith, 1988; Thomas, 2000).
Some studies, such as those reported by Crommelynk et al. (1992), Stark et al. (2000), Jain and Ormsbee (2002), Silva (2003) and Falkemberg et al. (2003), consider prediction models for water consumption based on artificial intelligence techniques.
1.1 Data set and main goals
In this study, data from Matão, a medium-sized city in the central region of the state of São Paulo, Brazil, with approximately 35,000 consumer units, are considered as a case study, covering 55 consecutive months (monthly data from January 2012 to July 2016). From this population, a stratified random sample (by neighborhood) of 3,000 consumers was selected. In addition to the monthly water consumption of each unit, the sampled records contain information on the consumer unit, consumer category, type of economy and region of the city. In particular, we analyze the time series of monthly average consumption over the 55 months in each consumer category (residential, industrial, public and commercial).
The main goals of the study are:
• The introduction of a Bayesian model in the presence of lagged effects, covariates (months and quarters) and a random factor which captures the possible dependence structure among the averages of the water consumption measurements in each month for different categories of consumers (commercial, industrial, public and residential).
• For comparison, existing time series models are also fitted, namely the popular moving average models and the ARIMA models commonly used to forecast water consumption in different consumer segments. The possible advantages and disadvantages of each model for forecasting water consumption in each consumer category are discussed.
A secondary goal of the study is:
• A statistical analysis to verify whether there is a significant difference among the mean water consumption of the consumer categories, based on the 55-month averages of the consumers, using descriptive statistics and ANOVA (analysis of variance) models.
1.2 Preliminary statistical data analysis for the consumer categories
Initially, we analyze the average water consumption over the 55 months of follow-up in the different consumer categories. That is, the longitudinal consumption data of each consumer unit are replaced by a single quantity, its average water consumption over the 55 months (January 2012 to July 2016), for each of the 3,000 sampled consumer units. These averages (in cubic meters) show great variability. The overall mean of all data (n = 3,000 observations) is 16.66. The sample means, standard deviations and medians for each consumer category, obtained with Minitab® software, version 16, are given in Table 1.
Table 1 - Descriptive statistics for the average water consumption (m³) by category

| Category | n | Mean | S.D. | Median |
|---|---|---|---|---|
| Commercial | 342 | 16.03 | 10.65 | 14.18 |
| Industrial | 9 | 31.62 | 22.89 | 24.95 |
| Public | 17 | 23.74 | 20.73 | 13.58 |
| Residential | 2632 | 16.65 | 7.80 | 15.25 |
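The collapsing step described above (each unit's monthly readings replaced by their average, then summarized by category) can be sketched in Python. This is a minimal illustration under stated assumptions, not the Minitab workflow used in the study; the function names and the dictionary layout (`monthly_by_unit`, `category_of`) are hypothetical. As a consistency check, the n-weighted average of the category means in Table 1 reproduces the overall mean of 16.66.

```python
from statistics import mean, median, stdev

def unit_averages(monthly_by_unit):
    """Replace each unit's monthly series (m^3) by its average over the follow-up."""
    return {unit: mean(series) for unit, series in monthly_by_unit.items()}

def summarize_by_category(avg_by_unit, category_of):
    """Return {category: (n, mean, sd, median)} of the unit averages."""
    groups = {}
    for unit, avg in avg_by_unit.items():
        groups.setdefault(category_of[unit], []).append(avg)
    return {
        cat: (len(v), mean(v), stdev(v) if len(v) > 1 else 0.0, median(v))
        for cat, v in groups.items()
    }

def pooled_mean(summary):
    """Overall mean as the n-weighted average of the category means."""
    total_n = sum(s[0] for s in summary.values())
    return sum(s[0] * s[1] for s in summary.values()) / total_n

# (n, mean, S.D., median) per category, as reported in Table 1
table1 = {
    "Commercial": (342, 16.03, 10.65, 14.18),
    "Industrial": (9, 31.62, 22.89, 24.95),
    "Public": (17, 23.74, 20.73, 13.58),
    "Residential": (2632, 16.65, 7.80, 15.25),
}
print(round(pooled_mean(table1), 2))  # 16.66
```

Working from per-category summaries is enough for the pooled mean, but the S.D. and median columns must be computed from the unit averages themselves, which is why the summary function keeps each group's values.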
Table 1 shows that two categories of water consumers (industrial and public) have much higher sample means than the overall mean and than the means of the other categories (commercial and residential), although these two categories contain only 9 and 17 units, respectively, and have large standard deviations. Figure 1 presents the boxplots of the 55-month average consumption for each consumer category.
Figure 1 - Boxplots for the average water consumption in different categories of consumers
The boxplots in Figure 1 show a high degree of variability of the averages within each consumer category. In the preliminary analysis, two observations with incomplete data were removed, so n = 2,998 averages are used in the statistical analysis. Figure 2 shows the histograms of the 55-month average consumption on the original and logarithmic scales; the data are closer to normality on the logarithmic scale.
Figure 2 - Histograms for average water consumption on the original scale and on the logarithmic scale
1.3 Use of an ANOVA model to compare the averages of category consumers over the 55 months
The descriptive analysis of the average monthly water consumption data shows large differences among the mean consumption of the consumer categories. To confirm these differences, an ANOVA (analysis of variance) model is fitted to the average consumption transformed to the logarithmic scale, which improves normality, an assumption required for valid inference.
Analysis of variance (ANOVA) is a statistical methodology for testing whether a given factor has a significant effect on a dependent variable Y. Let µ_{j} denote the true mean of the dependent variable at level j of the factor. ANOVA tests the null hypothesis that all means µ_{j} are equal, assuming that the variance of the observations is the same in every group (constant variance). For more details on ANOVA, see Montgomery and Runger (2011). Table 2 presents the ANOVA results obtained with the statistical software MINITAB® version 16 for the data set of n = 2,998 consumer-unit averages.
Table 2 - ANOVA (water consumption by category, log scale)

| Source | DF | SS | MS | F | p |
|---|---|---|---|---|---|
| Category | 3 | 11.374 | 3.791 | 14.68 | < 0.001 |
| Error | 2994 | 773.10 | 0.258 | | |
| Total | 2997 | 784.473 | | | |
(DF: degrees of freedom; SS: sum of squares; MS: mean square; F: Snedecor's F statistic; p: p-value)
Table 2 shows a significant difference among the levels of the consumer category factor (commercial, residential, public and industrial): the p-value of the test of equal means is less than 0.05. These results are consistent with the preliminary data analysis, which identified the categories with higher and lower water consumption. The assumptions of normality and constant variance of the residuals were checked with residual plots, which are not presented here to save space.
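The one-way ANOVA computation behind Table 2 can be sketched in a few lines of Python: the between-group and within-group sums of squares, their degrees of freedom, the mean squares and the F ratio. This is a minimal, dependency-free sketch with hypothetical function names, not the MINITAB procedure itself; a p-value would additionally require the F distribution (for instance `scipy.stats.f.sf(F, df_between, df_within)`), which is left out here. Recomputing F from the SS and DF columns of Table 2 gives back 14.68.

```python
from statistics import mean

def one_way_anova(groups):
    """One-way ANOVA for a list of samples (e.g. log average consumption per category)."""
    all_obs = [y for g in groups for y in g]
    n, k = len(all_obs), len(groups)
    grand = mean(all_obs)
    group_means = [mean(g) for g in groups]
    # Between-group (factor) and within-group (error) sums of squares
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    df_between, df_within = k - 1, n - k
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    return {
        "df": (df_between, df_within, n - 1),
        "ss": (ss_between, ss_within, ss_between + ss_within),
        "ms": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }

def f_from_table(ss_between, df_between, ss_within, df_within):
    """F statistic recomputed from the SS and DF columns of an ANOVA table."""
    return (ss_between / df_between) / (ss_within / df_within)

# SS and DF from Table 2 (category factor vs. error)
print(round(f_from_table(11.374, 3, 773.10, 2994), 2))  # 14.68
```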
2 METHODS
For the time series analysis over the 55 months (the main goal of the study), the data set consists of the monthly average consumption of the consumer units in each category, transformed to the logarithmic scale. That is, there are four time series, each with 55 observations. Figure 3 presents the time series plots of these data.
Figure 3 - Time series plots of the monthly average water consumption (log scale) by consumer category
2.1 Use of a Bayesian Regression Model
Since the four time series of average consumption (residential, commercial, public and industrial) are measured at the same times, they may be dependent. Thus, this section proposes a Bayesian model with lagged effects and a random effect (latent factor) that captures the possible dependence among the four series:

(1) 
where i = 1, ..., 55 (months); j = 1, 2, 3, 4 (categories); Y_{1i} = log(commercial_{i}); Y_{2i} = log(industrial_{i}); Y_{3i} = log(public_{i}); Y_{4i} = log(residential_{i}); w_{i} is a random effect that captures the possible dependence among the series at the same date; and the ϵ_{ji} are independent random errors with a normal distribution with mean zero and constant variance σ_{j}^{2}. The random effect w_{i} is assumed to follow a normal distribution with mean zero and constant variance σ_{w}^{2}. Four seasons (quarters) are considered: 1 (January, February, March), 2 (April, May, June), 3 (July, August, September) and 4 (October, November, December). For a hierarchical Bayesian analysis of model (1), MCMC (Markov Chain Monte Carlo) methods are used (see, for example, Gelfand and Smith, 1990; Casella and George, 1992; Chib and Greenberg, 1995), with OpenBUGS software version 3.2.2 (Spiegelhalter et al., 2003) used to simulate samples from the joint posterior distribution of interest.
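Why a shared effect w_{i} induces dependence can be seen directly: conditional on the mean structure, two categories j ≠ k in the same month satisfy Cov(Y_{ji}, Y_{ki}) = σ_{w}^{2}, so their correlation is σ_{w}^{2} / √((σ_{w}^{2} + σ_{j}^{2})(σ_{w}^{2} + σ_{k}^{2})). The Python sketch below simulates only this error structure. Because the mean part of model (1) is not reproduced here, it assumes constant category means (hypothetical log-scale values), and the variances are illustrative, not the estimates in Table 3; the function names are also hypothetical. A long simulated run is used only so that the sample correlation is close to the theoretical one.

```python
import math
import random

def implied_correlation(var_w, var_j, var_k):
    """Correlation between Y_ji and Y_ki (j != k) induced by the shared effect w_i,
    conditional on the mean structure."""
    return var_w / math.sqrt((var_w + var_j) * (var_w + var_k))

def simulate(n_months, mu, var_w, var_eps, seed=1):
    """Simulate Y_ji = mu_j + w_i + eps_ji with constant (hypothetical) means mu_j."""
    rng = random.Random(seed)
    series = [[] for _ in mu]
    for _ in range(n_months):
        w = rng.gauss(0.0, math.sqrt(var_w))  # one draw per month, shared by all categories
        for j, m in enumerate(mu):
            series[j].append(m + w + rng.gauss(0.0, math.sqrt(var_eps[j])))
    return series

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative variances (not the Table 3 estimates): theoretical correlation 0.5
y1, y2 = simulate(20000, mu=[2.7, 3.4], var_w=0.5, var_eps=[0.5, 0.5], seed=7)
print(implied_correlation(0.5, 0.5, 0.5), round(pearson(y1, y2), 2))
```

With σ_{w}^{2} = 0 the series become independent, which is the assumption implicitly made when each category is modeled separately.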
For the Bayesian analysis, gamma prior distributions G(1,1) are assumed for 1/σ_{j}^{2}, j = 1, 2, 3, 4, and a G(0.1,0.1) prior for 1/σ_{w}^{2}, where G(a,b) denotes a gamma distribution with mean a/b and variance a/b^{2}; normal N(0, 10) prior distributions are assumed for the regression parameters β_{j0}, β_{j1}, β_{j2}, β_{j3}, β_{j4} and β_{j5}, j = 1, 2, 3, 4. In the simulation of samples from the joint posterior distribution, a burn-in sample of size 51,000 was discarded to eliminate the effect of the initial values of the iterative process; after the burn-in, a further 100,000 samples were generated and every 100th sample was kept, giving a final sample of size 1,000 used to obtain the posterior summaries of interest. Convergence of the simulation algorithm was verified with time series plots of the simulated Gibbs samples.
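The burn-in and thinning scheme can be illustrated on a much smaller problem than model (1): a Gibbs sampler for a single normal sample, with a normal prior on the mean and a G(a, b) prior on the precision 1/σ², using the same kind of conjugate normal/gamma full conditionals that Gibbs sampling relies on. This is an illustrative sketch, not the OpenBUGS model used in the study; the function name is hypothetical, and the N(0, 10) prior is read here as having variance 10 (BUGS parameterizes the normal by its precision, so the intended meaning may differ). Python's `random.gammavariate` takes a scale parameter, so the rate b enters as 1/b.

```python
import math
import random

def gibbs_normal(y, n_iter, burn_in, thin, a=1.0, b=1.0, prior_var=10.0, seed=123):
    """Gibbs sampler for y_i ~ N(beta, 1/tau), beta ~ N(0, prior_var), tau ~ G(a, b) (rate b).

    Returns the kept draws as (beta, sigma^2) pairs after burn-in and thinning.
    """
    rng = random.Random(seed)
    n, sum_y = len(y), sum(y)
    beta, tau = 0.0, 1.0  # arbitrary starting values; burn-in removes their influence
    draws = []
    for it in range(burn_in + n_iter):
        # beta | tau, y: normal full conditional
        prec = tau * n + 1.0 / prior_var
        beta = rng.gauss(tau * sum_y / prec, math.sqrt(1.0 / prec))
        # tau | beta, y: Gamma(a + n/2, rate = b + SSR/2); gammavariate expects a scale
        ssr = sum((yi - beta) ** 2 for yi in y)
        tau = rng.gammavariate(a + n / 2.0, 1.0 / (b + ssr / 2.0))
        # discard the burn-in, then keep every `thin`-th draw
        if it >= burn_in and (it - burn_in + 1) % thin == 0:
            draws.append((beta, 1.0 / tau))
    return draws

# The study's settings (burn_in=51000, n_iter=100000, thin=100) keep 100000 // 100 = 1000 draws.
# A scaled-down run on illustrative data:
y = [3.0 + 0.05 * ((i % 5) - 2) for i in range(20)]
draws = gibbs_normal(y, n_iter=10000, burn_in=1000, thin=10)
print(len(draws))  # 1000
print(sum(d[0] for d in draws) / len(draws))  # posterior mean of beta, close to 3.0
```

Thinning reduces the autocorrelation between stored draws and the storage cost; it does not replace the convergence checks based on trace plots mentioned above.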
3 RESULTS
Table 3 shows the posterior summaries of interest. Although some covariates do not show significant effects (the value zero is included in the 95% credible intervals of the corresponding regression parameters), the model is useful for predicting water consumption in a future month, at a specified year and season, for consumers in each category.
Using the Bayesian model (1), forecasts of water consumption can be obtained for the different categories (commercial, industrial, public and residential) at any time of the year, accounting for the effects of months and seasons.
Figure 4 presents the observed and fitted time series, showing a good fit of the proposed model to the observed data. Table 4 gives the MAPE, MAD and MSD goodness-of-fit measures for the model (see appendix 1).
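The appendix with the exact definitions of the fit measures is not part of this section; the sketch below assumes the usual definitions reported by Minitab, with errors e_t = y_t − ŷ_t: MAPE = (100/n) Σ|e_t / y_t|, MAD = (1/n) Σ|e_t| and MSD = (1/n) Σ e_t². The function name is hypothetical.

```python
def accuracy_measures(actual, fitted):
    """MAPE (%), MAD and MSD of fitted values; assumes no actual value is zero."""
    n = len(actual)
    errors = [a - f for a, f in zip(actual, fitted)]
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mad = sum(abs(e) for e in errors) / n
    msd = sum(e * e for e in errors) / n
    return mape, mad, msd

print(accuracy_measures([10.0, 20.0], [11.0, 18.0]))  # (10.0, 1.5, 2.5)
```

MAPE is scale-free, whereas MAD and MSD are in the units (and squared units) of the series, so only MAPE is directly comparable across categories with very different consumption levels.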
Table 3 - Posterior summaries for the consumption time series by category

| Parameter | Mean | SD | Lower 95% | Upper 95% |
|---|---|---|---|---|
| β10 | 2.587 | 0.2429 | 2.123 | 3.04 |
| β11 | 0.004418 | 0.002468 | 0.0460 | 0.009074 |
| β12 | 0.02961 | 0.03109 | 0.0311 | 0.08895 |
| β13 | 0.02704 | 0.1231 | 0.2002 | 0.2689 |
| β14 | 0.0008 | 0.1229 | 0.2408 | 0.2379 |
| β15 | 0.03367 | 0.09228 | 0.2165 | 0.1460 |
| β20 | 3.308 | 0.2837 | 2.746 | 3.862 |
| β21 | 0.00161 | 0.002658 | 0.00669 | 0.003496 |
| β22 | 0.0393 | 0.0363 | 0.03072 | 0.1134 |
| β23 | 0.1142 | 0.1040 | 0.08761 | 0.3230 |
| β24 | 0.02778 | 0.1107 | 0.1814 | 0.2521 |
| β25 | 0.1236 | 0.08162 | 0.2882 | 0.03511 |
| β30 | 3.046 | 0.2823 | 2.5140 | 3.6040 |
| β31 | 0.00189 | 0.002677 | 0.00717 | 0.003391 |
| β32 | 0.03399 | 0.0369 | 0.03945 | 0.1068 |
| β33 | 0.06106 | 0.1112 | 0.1531 | 0.2882 |
| β34 | 0.01865 | 0.1088 | 0.1962 | 0.2407 |
| β35 | 0.05838 | 0.08668 | 0.2309 | 0.1113 |
| β40 | 2.565 | 0.2399 | 2.085 | 3.035 |
| β41 | 0.004087 | 0.002482 | 0.0570 | 0.009102 |
| β42 | 0.03387 | 0.03252 | 0.03113 | 0.09801 |
| β43 | 0.02329 | 0.1266 | 0.218 | 0.2743 |
| β44 | 0.01928 | 0.1252 | 0.2223 | 0.2590 |
| β45 | 0.02654 | 0.09492 | 0.2077 | 0.1629 |
| σ^{2}_{w} | 0.01112 | 0.04117 | 0.01996 | 0.00679 |
| σ^{2}_{1} | 0.04732 | 0.23969 | 0.07300 | 0.03289 |
| σ^{2}_{2} | 0.00499 | 0.33795 | 0.11390 | 0.04766 |
| σ^{2}_{3} | 0.07674 | 0.37119 | 0.11994 | 0.05321 |
| σ^{2}_{4} | 0.04842 | 0.23894 | 0.07616 | 0.03402 |
Table 4 - MAPE, MAD and MSD values (Bayesian model)

| Category | MAPE | MAD | MSD |
|---|---|---|---|
| Commercial | 3.9839 | 0.6236 | 0.6000 |
| Industrial | 13.5190 | 4.3167 | 31.9516 |
| Public | 14.6645 | 3.4846 | 22.7087 |
| Residential | 4.5658 | 0.7322 | 0.9596 |
Figure 4 - Observed and fitted time series (Bayesian model)
4 COMPARISON WITH OTHER PREDICTION MODELS
Water supply companies normally forecast water consumption for the upcoming measurements in each consumer category using simple, independent time series models such as moving average models, or more sophisticated models such as ARIMA (Autoregressive Integrated Moving Average) models, also known as Box-Jenkins models (Box et al., 1994).
4.1 Use of moving average models
As a first alternative to the proposed Bayesian model (1), a moving average time series model is used to forecast water consumption (see, for example, Box et al., 1994; or Morettin and Toloi, 1987). The moving average model considers a stationary, locally constant time series Z_{1}, Z_{2}, ..., Z_{n}, composed of its level plus additive noise:

$$Z_t = \mu_t + a_t, \qquad (2)$$
where t = 1, 2, ..., n; E(a_{t}) = 0; var(a_{t}) = σ^{2}_{a}; and µ_{t} is an unknown parameter that varies over time. The moving average technique consists of calculating the arithmetic mean of the k most recent observations, that is,

$$M_t = \frac{Z_t + Z_{t-1} + \cdots + Z_{t-k+1}}{k}. \qquad (3)$$
The length of the moving average is k. Thus, M_{t} is an estimate of µ_{t} that uses only the last k observations and ignores older ones: in each period the oldest observation is replaced by the most recent one and a new average is calculated.
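The forecast of future values is given by the last calculated moving average, that is,

$$\hat{Z}_t(h) = M_t, \qquad (4)$$
or,
$$\hat{Z}_t(h) = \hat{Z}_{t-1}(h+1) + \frac{Z_t - Z_{t-k}}{k}, \qquad (5)$$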
for all h > 0. Equation (5) shows that the forecast of Z_{t+h} is corrected at every instant; that is, it is updated with each new observation of the series. If the noise a_{t} is normally distributed with mean zero and variance σ^{2}_{a}, then the forecast M_{t} is normally distributed with mean µ_{t} and variance σ^{2}_{a}/k (with µ_{t} taken as constant over the last k periods). Therefore, a confidence interval for µ_{t} with confidence coefficient 100(1 − α)% is given by,

$$M_t \pm z_{\alpha/2}\,\frac{\sigma_a}{\sqrt{k}}, \qquad (6)$$
where z_{α/2} is the upper α/2 quantile of the standard normal distribution (z_{0.025} ≈ 1.96 for 95% intervals).
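Equations (3) to (6) translate directly into code. The Python sketch below computes the moving averages, checks the recursive update of equation (5), and returns the forecast with its interval. It is a minimal sketch with hypothetical function names; σ_{a} is treated as known, whereas in practice it has to be estimated, and the series is illustrative rather than the study data.

```python
import math

def moving_averages(z, k):
    """M_t = (Z_t + ... + Z_{t-k+1}) / k for every t with k available observations, eq. (3)."""
    return [sum(z[t - k:t]) / k for t in range(k, len(z) + 1)]

def ma_forecast(z, k, sigma_a, z_crit=1.959964):
    """Forecast of Z_{t+h} for any h > 0 (the last moving average, eq. (4))
    and the interval M_t +/- z_crit * sigma_a / sqrt(k), eq. (6).

    sigma_a is treated as known here; in practice it must be estimated.
    """
    m_t = sum(z[-k:]) / k
    half_width = z_crit * sigma_a / math.sqrt(k)
    return m_t, (m_t - half_width, m_t + half_width)

z = [10.0, 12.0, 14.0, 16.0, 18.0]   # illustrative series, not the study data
ms = moving_averages(z, 3)            # [12.0, 14.0, 16.0]
# Recursive form, eq. (5): M_t = M_{t-1} + (Z_t - Z_{t-k}) / k
assert math.isclose(ms[-1], ms[-2] + (z[-1] - z[-4]) / 3)
print(ma_forecast(z, 3, sigma_a=3.0))
```

Because the forecast is the same for every horizon h, the moving average model gives a flat forecast path; it is suited to the one-month-ahead use described below, not to longer horizons.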
Figure 5 presents the observed and fitted time series from moving averages of length 3 months for each level of the category factor. From the fitted moving average models, we obtain the forecasts of average consumption for the next month (month 56) and 95% confidence intervals for the average water consumption in each consumer category, computed with MINITAB® Statistical Software, version 16 (see Table 5).
Table 5 - Forecast of average water consumption by consumer category (moving average models)

| Category | Forecast | MAPE | MAD | MSD |
|---|---|---|---|---|
| Commercial | 17.2824 | 2.7707 | 0.4420 | 0.3655 |
| Industrial | 32.8148 | 10.9492 | 3.5031 | 22.0466 |
| Public | 24.1569 | 10.0761 | 2.4247 | 11.6008 |
| Residential | 18.1097 | 3.7239 | 0.6083 | 0.75049 |
These forecasts can be used to estimate the water consumption of each consumer category in the next month. Despite its simplicity, this model may be useful to water distribution companies. The model should be updated each month, recomputing the average with the information from the three most recent months.
4.2 Use of ARIMA (Autoregressive Integrated Moving Average) Models
A class of models widely used for time series, especially in finance, is the ARIMA (Autoregressive Integrated Moving Average) class, usually known as the Box-Jenkins method (Box et al., 1994), which is accurate for short-term forecasts but less so for long-term forecasts. The analysis of a time series in the time domain is based on the serial correlation coefficient, or autocorrelation coefficient, which measures the dependence between successive values of the series.
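The sample autocorrelation coefficient at lag k is usually computed as r_k = Σ_{t=1}^{n−k} (Y_t − Ȳ)(Y_{t+k} − Ȳ) / Σ_{t=1}^{n} (Y_t − Ȳ)², the estimator plotted in ACF graphs. The Python sketch below implements it (function names are hypothetical). As a rough guide, individual values outside ±2/√n, about ±0.27 for a series of n = 55 months, suggest non-zero autocorrelation.

```python
def acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a series."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]

def significance_band(n):
    """Rough +/- 2/sqrt(n) band for judging individual autocorrelations."""
    return 2.0 / n ** 0.5

print(acf([1.0, 2.0, 3.0, 4.0], 2))      # [0.25, -0.3]
print(round(significance_band(55), 2))   # 0.27
```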
Most prediction problems involve time series data. Montgomery et al. (2008) note that forecasting problems are often classified as short-term, medium-term or long-term.
Forecasting a value at some future time from the observations of a time series available at time t is a basis for economic and business planning, production planning, inventory and production control, and process control and optimization (Box et al., 1994).
Figure 5 - Observed and fitted time series for mean water consumption in each consumer category (moving average model)
Generally, predictions are made at time t using the current value Y_{t} and the previous values Y_{1}, Y_{2}, ..., Y_{t−1} to predict the future values F_{t+1}, F_{t+2}, ..., F_{t+m}. Stochastic modeling of hydrological time series has been widely used in the management of water resources systems, for example in reservoir design and in assessing the occurrence of future hydrological events. Stochastic models are used to generate synthetic series of water supply that may occur in the future, which are then used to estimate the probability distribution of key decision-making parameters related to storage. Stochastic models can also be used to forecast water supply and demand days, weeks, months and years in advance (Fortin et al., 2004).
ARIMA modeling is essentially an exploratory, data-oriented approach with great flexibility to build an appropriate model adapted to the structure of the data. The stochastic nature of the time series can be approximated with the aid of the autocorrelation function (ACF) and the partial autocorrelation function (PACF), from which random variations, periodic components, cyclic patterns and serial correlation can be identified. As a result, predictions of the series values can be obtained with a high degree of precision (Ho and Xie, 1998).
The modeling process consists of model identification, parameter estimation and diagnostic checking of the fitted model (Ho and Xie, 1998).
ARIMA models contain three components: autoregressive (AR), integrated (I) and moving average (MA). The AR part describes the relationship between current and past observations; the MA part represents the autocorrelation structure of the errors; and the I part represents the degree of differencing needed to remove non-stationarity from the series. The model is denoted ARIMA(p, d, q), where p is the order of the autoregressive part, d is the order of differencing and q is the order of the moving average part. A brief description of this class of models is given below (see, for example, Muhammad, 2012).
· AR model: An AR(p) model expresses the current value of a time series as a linear combination of its previous p values and a white noise term (random shock). Bell (1984) expresses the current value of the time series in the AR(p) model as:

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + a_t, \qquad (7)$$
where φ_{1}, ..., φ_{p} are the AR(p) parameters, a_{t} is the random shock at time t, normally distributed with mean zero and variance σ^{2}_{a}, and p is the order of the AR(p) model.
· MA model: The MA(q) model expresses the current value of a time series as a linear combination of the current noise and the q previous values of the white noise. The (pure) moving average model is (Bell, 1984):

$$Y_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}, \qquad (8)$$
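· ARMA model: To increase flexibility in modeling real time series, autoregressive and moving average components are combined, leading to the ARMA(p,q) model (Bell, 1984):

$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}. \qquad (9)$$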
A series explained both by its own lagged values and by lagged noise terms is called an ARMA(p,q) process. If the process is stationary, an ARMA model can be used to represent the data; if it is not, differencing is applied to make the series stationary, which leads to the ARIMA model (Akgun, 2003).
· ARIMA model: The ARMA model (9) assumes that the series Y_{t} is stationary. In practice, Y_{t} may not be stationary while its first difference Y_{t} − Y_{t−1} is; if Y_{t} − Y_{t−1} is not stationary either, the second difference (Y_{t} − Y_{t−1}) − (Y_{t−1} − Y_{t−2}) may be needed, and so on. In general, the d^{th} difference of Y_{t} may be needed (although d is rarely greater than 2), which gives the ARIMA(p, d, q) model, where d is the order of differencing. An ARIMA(p, d, q) model can be written as

$$\phi(B)\,(1 - B)^d\, Y_t = \theta(B)\, a_t, \qquad (10)$$
where B is the backshift operator (B Y_{t} = Y_{t−1}), φ(B) = 1 − φ_{1}B − ⋯ − φ_{p}B^{p} and θ(B) = 1 − θ_{1}B − ⋯ − θ_{q}B^{q}.
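Two practical pieces of ARIMA modeling can be sketched in Python: the differencing operator (1 − B)^d and the one-step forecast of an ARMA model with the sign convention of equation (9), here with an added constant term, as in the fitted models of Tables 6 to 8. MINITAB reports both a Constant and a Mean; for a stationary model they are related by Constant = Mean × (1 − φ_{1} − ⋯ − φ_{p}), which the last line checks for the commercial category assuming the three AR estimates in Table 6 are positive. The function names are hypothetical and this is not MINITAB's estimation code.

```python
def difference(y, d=1):
    """Apply (1 - B)^d to a series: repeated first differences."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y

def arma_one_step(history, shocks, phi, theta, const=0.0):
    """One-step forecast of an ARMA(p, q) with a constant term, signs as in eq. (9):
    Y_hat_{t+1} = const + sum_i phi_i * Y_{t+1-i} - sum_j theta_j * a_{t+1-j}.
    `history` holds past values and `shocks` past residuals, most recent last."""
    ar = sum(p * history[-i] for i, p in enumerate(phi, start=1))
    ma = sum(th * shocks[-j] for j, th in enumerate(theta, start=1))
    return const + ar - ma

def implied_mean(const, phi):
    """Mean of a stationary ARMA process with constant: const / (1 - sum(phi))."""
    return const / (1.0 - sum(phi))

print(difference([1, 4, 9, 16], d=2))  # [2, 2]
# Commercial category (Table 6), assuming the three AR estimates are positive;
# the result is close to the reported Mean of 15.9436
print(round(implied_mean(3.1149, [0.1447, 0.3168, 0.3432]), 2))
```

Future shocks are unknown, so in multi-step forecasts they are set to zero and earlier forecasts replace unobserved values of Y.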
4.3 ARIMA models results
In this section, ARIMA models (10) are fitted to the four consumer categories (residential, public, industrial and commercial) using MINITAB® software, version 16. The results presented correspond to the best and most parsimonious models (those with the fewest parameters). To check the suitability of the models, residual plots (normality and constant variance), the ACF (autocorrelation function), the PACF (partial autocorrelation function) and Ljung-Box chi-squared tests (Ljung and Box, 1978) were used.
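The Ljung-Box statistic combines the first h residual autocorrelations r_k as Q = n(n + 2) Σ_{k=1}^{h} r_k² / (n − k). For residuals of an ARMA(p, q) fit, Q is compared with a chi-squared distribution with h − p − q degrees of freedom, and large values indicate autocorrelation left in the residuals. The Python sketch below computes Q only (hypothetical function name); a p-value could be obtained with, for example, `scipy.stats.chi2.sf(q, h - p - q)`, which is left out to keep the example dependency-free.

```python
def ljung_box_q(residuals, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

# Toy residual series; compare Q with a chi-squared(h - p - q) critical value
print(ljung_box_q([1.0, 2.0, 3.0, 4.0], h=2))
```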
· Commercial
Assuming the ARIMA model (10) with p = 3, d = 0 and q = 1 for the commercial category data, Table 6 shows the parameter estimates. The residual plots and ACF in Figure 6 indicate that the model assumptions are reasonably satisfied.
Table 6 - Estimates of ARIMA model parameters (commercial)

| Type | Coefficient | SE | T | p |
|---|---|---|---|---|
| AR 1 | 0.1447 | 0.2927 | 0.49 | 0.623 |
| AR 2 | 0.3168 | 0.2662 | 1.19 | 0.240 |
| AR 3 | 0.3432 | 0.1394 | 2.46 | 0.017 |
| MA 1 | 0.6144 | 0.2995 | 2.05 | 0.045 |
| Constant | 3.1149 | 0.1783 | 17.47 | < 0.01 |
| Mean | 15.9436 | 0.9129 | | |
Figure 6 - Residual plots and ACF (commercial)
· Industrial
Assuming the ARIMA model (10) with p = 3, d = 0 and q = 2 for the consumer data of the industrial category, Table 7 shows the model parameter estimates. In Figure 7, we have the graphs of the residuals and ACF. From this figure it is observed that the assumptions of the model are reasonably verified.
· Public
Assuming the ARIMA model (10) with p = 3, d = 0 and q = 2 for public category consumer data, Table 8 shows the model parameter estimates. In Figure 8, we have the graphs of the residuals and ACF. It is observed that the assumptions of the model are reasonably verified.
· Residential
Assuming the ARIMA model (10) with p = 4, d = 0 and q = 2 for the consumption data of the residential category, Table 9 shows the parameter estimates. Figure 9 indicates that the model assumptions are reasonably satisfied.
Figure 10 shows the observed and fitted series (ARIMA models). From the fitted ARIMA models, we obtain the forecasts for the next month (month 56) and 95% confidence intervals for the average water consumption in each consumer category. Table 10 reports these forecasts together with the MAPE, MAD and MSD accuracy measures.
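A quick consistency check on Table 6 can be sketched as follows. It assumes MINITAB's convention, for models with d = 0, that the reported Constant equals Mean × (1 − φ_{1} − ... − φ_{p}), where φ_{i} are the AR coefficients. The helper `implied_mean` is hypothetical. With the commercial estimates, all reported as positive in Table 6, it recovers a mean of about 15.95, close to the reported 15.9436; the small gap is consistent with the coefficients being rounded to four decimals.

```python
from typing import Sequence


def implied_mean(constant: float, ar_coefs: Sequence[float]) -> float:
    """Process mean implied by an ARMA constant, assuming
    constant = mean * (1 - phi_1 - ... - phi_p)."""
    denom = 1.0 - sum(ar_coefs)
    if abs(denom) < 1e-12:
        # 1 - sum(phi) = 0 means a unit root at B = 1: no finite mean.
        raise ValueError("AR polynomial has a unit root at B = 1")
    return constant / denom


# Commercial category, Table 6: Constant = 3.1149, AR 1..3 as reported.
mu = implied_mean(3.1149, [0.1447, 0.3168, 0.3432])
print(round(mu, 2))  # 15.95
```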
Table 7 – Estimates of the ARIMA model parameters (industrial)
Type        Coefficient   SE       T       p
AR 1        0.2125        0.1942   1.09    0.279
AR 2        0.7384        0.1838   4.02    < 0.01
AR 3        0.2986        0.1542   1.94    0.059
MA 1        0.0268        0.1713   0.16    0.876
MA 2        0.8995        0.1500   6.00    < 0.01
Constant    38.902        1.656    23.49   < 0.01
Mean        31.700        1.349
Figure 7 – Residual plots and ACF (industrial)
Table 8 – Estimates of the ARIMA model parameters (public)
Type        Coefficient   SE       T       p
AR 1        1.1645        0.1672   6.96    < 0.01
AR 2        0.2160        0.2221   0.97    0.336
AR 3        0.4880        0.1364   3.58    < 0.001
MA 1        1.6290        0.1446   11.26   < 0.01
MA 2        0.9288        0.1602   5.80    < 0.01
Constant    45.079        2.130    21.17   < 0.01
Mean        23.820        1.125
Figure 8 – Residual plots and ACF (public)
5 DISCUSSION OF THE OBTAINED RESULTS
From the results obtained for the three time series models (moving average, ARIMA, and the Bayesian model with a random effect that captures the possible correlation among the consumption averages of the four consumer categories: commercial, industrial, public and residential), the forecasts are approximately close to each other, with a slight gain for the Bayesian model, whose forecasts for the 56^{th} month are closer to the values observed in the previous month for the industrial and public categories (see Tables 11 and 12). In a general assessment based on the MAPE, MAD and MSD values (see Table 10 for the ARIMA models), there is also little variation among the values obtained for the three models considered in this study. It is important to point out that the Bayesian model with a random factor has a structure that captures the dependence among the water consumption of all categories at the same time. The other models treat the series as independent, an assumption that is not realistic for the available data set.
Table 9 – Estimates of the ARIMA model parameters (residential)
Type        Coefficient   SE       T       p
AR 1        1.1826        0.1406   8.41    < 0.01
AR 2        0.1905        0.2233   0.85    0.398
AR 3        0.8070        0.1976   4.08    < 0.01
AR 4        0.3600        0.1447   2.49    0.016
MA 1        1.6983        0.0501   33.90   < 0.01
MA 2        0.7898        0.1248   6.33    < 0.01
Constant    13.690        0.5773   23.71   < 0.01
Mean        16.592        0.6997
Figure 9 – Residual plots and ACF (residential)
Figure 10 – Observed and fitted time series (ARIMA) of the average water consumption for the different consumer categories
Table 10 – Forecasts of average water consumption by consumer category (ARIMA) with MAPE, MAD and MSD
Category      Predicted   MAPE      MAD      MSD
Commercial    16.8203     3.8103    0.6072   0.6039
Industrial    32.8460     14.522    4.5860   36.1231
Public        24.9056     13.417    3.1107   17.5526
Residential   18.0965     5.2752    0.8626   1.3129
An additional advantage of the Bayesian model (1) for the water utility's consumption forecasts in different consumer categories is that the model estimated by MCMC simulation can be used for different times and seasons without the monthly updating required by the traditional moving average and ARIMA models. In addition, the ARIMA(p, d, q) model is exploratory: it requires trying several values of p, d and q, which makes its practical use difficult for water utilities, where consumption forecasts must be made every month. In general, these water supply companies need ready-to-use estimated models that can be applied for several consecutive months, with the model updated only after reasonably long periods of use. This is another point in favor of the Bayesian model for the water supply utility.
Table 11 – Forecasts of average water consumption by consumer category (all models) for month 56 (next-month forecast)
Category      MA        ARIMA     Bayes
Commercial    17.2824   16.8203   18.2302
Industrial    32.8148   32.8460   31.2073
Public        24.1569   24.9056   21.9453
Residential   18.1097   18.0965   19.8154
To better evaluate the fit of the three models, we reanalyze the data using only the first 54 months and keep the observed consumption of the 55^{th} month for comparison with each model's forecast. Considering the three models used previously (moving average model, ARIMA model and Bayesian model with a random effect), Table 12 shows the water consumption forecasts for the 55^{th} month in each consumer category. In the ARIMA model, we consider (p, d, q) = (3, 0, 1) for the commercial category and (p, d, q) = (3, 0, 2) for the industrial, public and residential categories. For the Bayesian model (1), we assume the same prior distributions for the model parameters as before and the same MCMC simulation scheme, using the OpenBUGS software, to obtain the consumption forecast for the 55^{th} month and the 3^{rd} season in each consumer category (the posterior summaries obtained are shown in Table 13). The results in Table 12 show good predictions from the Bayesian model, especially for the industrial and public categories. In addition, this estimated Bayesian model could be used by the water company for a fixed, long period of time without the need for remodeling, which makes it very attractive for predicting water consumption. The models proposed in this study can be generalized to other categorical variables, such as the region of the municipality, the socioeconomic range of consumers, and rainy and non-rainy periods, among several other factors, which indicates their broad applicability.
Table 12 – Forecasts of average water consumption by consumer category (all models) for month 55, using only the first 54 months
Category      Observed   MA        ARIMA     Bayes
Commercial    17.3912    17.2824   16.8203   18.2302
Industrial    30.0000    32.8148   32.8460   31.2073
Public        21.8824    24.1569   24.9056   21.9453
Residential   17.7135    18.1097   18.0965   19.8154
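The comparison in Table 12 reduces to simple arithmetic: the absolute forecast error of each model is |forecast − observed|. The sketch below transcribes the values as printed in Table 12 and reports, for each category, the model with the smallest error. The dictionary layout and the function name `absolute_errors` are illustrative choices, not part of the original analysis.

```python
# Values transcribed from Table 12: observed consumption in month 55 and
# each model's forecast (MA, ARIMA, Bayes).
table12 = {
    "commercial": (17.3912, {"MA": 17.2824, "ARIMA": 16.8203, "Bayes": 18.2302}),
    "industrial": (30.0000, {"MA": 32.8148, "ARIMA": 32.8460, "Bayes": 31.2073}),
    "public": (21.8824, {"MA": 24.1569, "ARIMA": 24.9056, "Bayes": 21.9453}),
    "residential": (17.7135, {"MA": 18.1097, "ARIMA": 18.0965, "Bayes": 19.8154}),
}


def absolute_errors(observed: float, forecasts: dict) -> dict:
    """Absolute forecast error |forecast - observed| for each model."""
    return {model: abs(f - observed) for model, f in forecasts.items()}


for category, (observed, forecasts) in table12.items():
    errors = absolute_errors(observed, forecasts)
    best = min(errors, key=errors.get)  # model with the smallest error
    summary = ", ".join(f"{m}: {e:.4f}" for m, e in errors.items())
    print(f"{category:<12} {summary}  -> closest: {best}")
```

With the printed values, the Bayesian model has the smallest error for the industrial and public categories, the moving average model for the commercial category and the ARIMA model for the residential category.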
Table 13 – Posterior summaries for the consumption time series by category (Bayesian model without observation 55)
Parameter    Mean       SD         Lower 95%   Upper 95%
β10          2.584      0.2563     2.078       3.092
β11          0.004594   0.002518   0.0183      0.009748
β12          0.02961    0.03368    0.03691     0.09588
β13          0.02327    0.1284     0.224       0.2775
β14          0.00301    0.122      0.2492      0.2329
β15          0.02827    0.09485    0.2157      0.1531
β20          3.314      0.2858     2.744       3.874
β21          0.00135    0.002722   0.006706    0.004215
β22          0.04041    0.03633    0.02739     0.1136
β23          0.1142     0.1086     0.1009      0.3312
β24          0.02099    0.1098     0.19        0.2439
β25          0.1214     0.08097    0.2784      0.04419
β30          3.05       0.2916     2.466       3.612
β31          0.00185    0.00283    0.007361    0.003684
β32          0.0344     0.03732    0.03931     0.1083
β33          0.0535     0.111      0.158       0.2692
β34          0.02492    0.1122     0.1903      0.2453
β35          0.05854    0.08638    0.2237      0.1109
β40          2.562      0.2507     2.064       3.051
β41          0.004177   0.002495   0.0815      0.009164
β42          0.03547    0.03278    0.0275      0.09659
β43          0.02235    0.1288     0.2307      0.2723
β44          0.01242    0.1287     0.2387      0.2631
β45          0.01964    0.09748    0.2118      0.1724
1/σ^{2}_{w}  86.96      23.31      47.71       138.3
1/σ^{2}_{1}  20.78      4.209      13.57       29.53
1/σ^{2}_{2}  13.86      2.809      8.95        20.04
1/σ^{2}_{3}  12.76      2.63       8.13        18.25
1/σ^{2}_{4}  20.23      4.21       13.06       29.19
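Table 13 reports precisions 1/σ^{2} rather than standard deviations, which matches the BUGS convention of parameterizing the normal distribution by its precision τ = 1/σ^{2}. Assuming that parameterization, a rough plug-in conversion to standard deviations can be sketched as below. The helper `precision_to_sd` is hypothetical, and 1/√E[τ] is not the posterior mean of σ; an exact summary would transform each MCMC draw before averaging.

```python
import math


def precision_to_sd(tau: float) -> float:
    """Convert a precision tau = 1/sigma^2 to a standard deviation sigma."""
    if tau <= 0:
        raise ValueError("precision must be positive")
    return 1.0 / math.sqrt(tau)


# Posterior means of the precisions as reported in Table 13.
posterior_mean_precision = {
    "sigma_w": 86.96,
    "sigma_1": 20.78,
    "sigma_2": 13.86,
    "sigma_3": 12.76,
    "sigma_4": 20.23,
}

for name, tau in posterior_mean_precision.items():
    # Plug-in value only: 1/sqrt(E[tau]) differs from E[1/sqrt(tau)].
    print(f"{name}: about {precision_to_sd(tau):.3f}")
```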
Appendix 1. Evaluation criteria for the fit of the time series model
• Mean absolute percentage error (MAPE)
MAPE expresses accuracy as a percentage error and is given by
$$\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_{t} - \hat{y}_{t}}{y_{t}}\right|, \qquad y_{t} \neq 0, \qquad (11)$$
where y_{t} is an observation, \hat{y}_{t} is the fitted value and n is the number of observations. Since MAPE is a percentage, it may be easier to interpret than the other statistics. For example, a MAPE of 5 means that, on average, the forecast is off by 5%.
• Mean absolute deviation (MAD)
MAD expresses accuracy in the same units as the data, which helps to conceptualize the magnitude of the error. It is given by
$$\mathrm{MAD} = \frac{1}{n}\sum_{t=1}^{n}\left|y_{t} - \hat{y}_{t}\right|, \qquad (12)$$
where y_{t} is an observation, \hat{y}_{t} is the fitted value and n is the number of observations. Outliers (discordant points) have less effect on MAD than on MSD.
• Mean square deviation (MSD)
Another commonly used measure of the accuracy of fitted time series values is given by
$$\mathrm{MSD} = \frac{1}{n}\sum_{t=1}^{n}\left(y_{t} - \hat{y}_{t}\right)^{2}, \qquad (13)$$
where y_{t} is an observation, \hat{y}_{t} is the fitted value and n is the number of observations. Outliers (discordant points) have a greater effect on MSD than on MAD.
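Equations (11) to (13) translate directly into code. The sketch below is a minimal Python version with hypothetical function names; MINITAB computes these measures internally, so this only makes the definitions concrete. The toy observations and fitted values are illustrative, not the study's data.

```python
from typing import Sequence


def _check(y: Sequence[float], y_hat: Sequence[float]) -> None:
    if len(y) != len(y_hat) or len(y) == 0:
        raise ValueError("y and y_hat must be non-empty and of equal length")


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (requires y_t != 0)."""
    _check(y, y_hat)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(y, y_hat)) / len(y)


def mad(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute deviation, in the units of the data."""
    _check(y, y_hat)
    return sum(abs(a - f) for a, f in zip(y, y_hat)) / len(y)


def msd(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean squared deviation, in squared units of the data."""
    _check(y, y_hat)
    return sum((a - f) ** 2 for a, f in zip(y, y_hat)) / len(y)


# Toy observations and fitted values.
y = [10.0, 20.0, 40.0]
y_hat = [11.0, 18.0, 40.0]
print(round(mape(y, y_hat), 4), mad(y, y_hat), round(msd(y, y_hat), 4))
# 6.6667 1.0 1.6667
```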
REFERENCES
Akgün, B. (2003). Identification of periodic autoregressive moving average models. PhD thesis. Middle East Technical University.
Altunkaynak, A., Özger, M., Çakmakci, M. (2005). Water consumption prediction of Istanbul city by using fuzzy logic approach. Water Resources Management, 19(5), 641–654.
Aly, A. H., Wanakule, N. (2004). Short-term forecasting for urban water consumption. Journal of Water Resources Planning and Management, 130(5), 405–410.
Amaral, A. M. P., Shirota, R. (2000). Mean residential consumption of treated water: an application of time series models in Piracicaba (in Portuguese). Revista Agrícola, 49(1), 55–72.
Balling, R. C., Gober, P., Jones, N. (2008). Sensitivity of residential water consumption to variations in climate: An intraurban analysis of Phoenix, Arizona. Water Resources Research, 44(10).
Bell, W. R. (1984). An introduction to forecasting with time series models. Insurance: Mathematics and Economics, 3(4), 241–255.
Bougadis, J., Adamowski, K., Diduch, R. (2005). Short-term municipal water demand forecasting. Hydrological Processes: An International Journal, 19(1), 137–148.
Box, G. E., Jenkins, G. M., Reinsel, G. C., Ljung, G. M. (2015). Time series analysis: forecasting and control. John Wiley & Sons.
Casella, G., George, E. I. (1992). Explaining the Gibbs sampler. The American Statistician, 46(3), 167–174.
Chang, H., Parandvash, G. H., Shandas, V. (2010). Spatial variations of single-family residential water consumption in Portland, Oregon. Urban Geography, 31(7), 953–972.
Chib, S., Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49(4), 327–335.
Crommelynck, V., Duquesne, C., Mercier, M., Miniussi, C. (1992). Daily and hourly water consumption forecasting tools using neural networks. In: Proc. of the AWWA's Annual Computer Specialty Conference, pp. 665–676.
Dias, D., Martinez, C. B., Libanio, M. (2010). Evaluation of the impact of income variation on household consumption of water (in Portuguese). Engenharia Sanitária e Ambiental, 15(2), 155–166.
Feil, A., Haetinger, C. (2014). Prediction of water consumption via mathematical modeling of water supply system (in Portuguese). Revista DAE, 195, 32–46.
Fortin, V., Perreault, L., Salas, J. (2004). Retrospective analysis and forecasting of streamflows using a shifting level model. Journal of Hydrology, 296(1–4), 135–163.
Gelfand, A. E., Smith, A. F. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410), 398–409.
Gwaivangmin, B., Jiya, J. (2017). Water demand prediction using artificial neural network for supervisory control. Nigerian Journal of Technology, 36(1), 148–154.
Ho, S., Xie, M. (1998). The use of ARIMA models for reliability forecasting and analysis. Computers & Industrial Engineering, 35(1–2), 213–216.
House-Peters, L., Pratt, B., Chang, H. (2010). Effects of urban spatial structure, sociodemographics, and climate on residential water consumption in Hillsboro, Oregon. Journal of the American Water Resources Association, 46(3), 461–472.
Jain, A., Ormsbee, L. E. (2002). Short-term water demand forecast modeling techniques: Conventional methods versus AI. Journal-American Water Works Association, 94(7), 64–72.
Kher, L. K., Sorooshian, S. (1986). Identification of water demand models from noisy data. Water Resources Research, 22(3), 322–330.
Ljung, G. M., Box, G. E. (1978). On a measure of lack of fit in time series models. Biometrika, 65(2), 297–303.
Maidment, D. R., Parzen, E. (1984). Cascade model of monthly municipal water use. Water Resources Research, 20(1), 15–23.
Makki, A. A., Stewart, R. A., Panuwatwanich, K., Beal, C. (2013). Revealing the determinants of shower water end use consumption: enabling better targeted urban water conservation strategies. Journal of Cleaner Production, 60, 129–146.
Montgomery, D. C., Runger, G. C. (2010). Applied statistics and probability for engineers. John Wiley & Sons.
Montgomery, D. C., Jennings, C. L., Kulahci, M. (2015). Introduction to time series analysis and forecasting. John Wiley & Sons.
Morettin, P. A., Toloi, C. M. (1987). Time series forecast (in Portuguese). Atual.
Muhammad, M. K. I. B. (2012). Time series modeling using Markov and ARIMA models. PhD thesis. Universiti Teknologi Malaysia.
Narchi, H. (1989). Domestic demand for water (in Portuguese). Revista DAE, 154, 1–7.
Nucci, N. L. R. (1983). Assessment of urban water demand. Economic and urban aspects. The built area as a possible explanatory and prospective factor (in Portuguese). Revista DAE, 135, 22–49.
Perry, P. F. (1981). Demand forecasting in water supply networks. Journal of the Hydraulics Division, 107(9), 1077–1087.
Rhoades, S. D., Walski, T. M. (1991). Using regression analysis to project pumpage. Journal-American Water Works Association, 83(12), 45–50.
Silva, C. S. (2003). Multivariate prediction of hourly water demand in urban water supply systems. PhD thesis. University of São Paulo.
Silva, W. T. P., Silva, L. M., Chichorro, J. F. (2008). Water resources management: per capita water consumption perspectives in Cuiabá (in Portuguese). Engenharia Sanitária e Ambiental, 13(1), 8–14.
Smith, J. A. (1988). A model of daily municipal water use for short-term forecasting. Water Resources Research, 24(2), 201–206.
Spiegelhalter, D., Thomas, A., Best, N., Lunn, D. (2003). WinBUGS user manual.
Stark, H. L. (2003). The application of artificial neural networks to water demand modelling.
Thomas, S. P. (2000). Prediction of water consumption: interface of water and sewage installations with public services (in Portuguese). Navegar Editora.
Willis, R. M., Stewart, R. A., Giurco, D. P., Talebpour, M. R., Mousavinejad, A. (2013). End use water consumption in households: impact of sociodemographic factors and efficient devices. Journal of Cleaner Production, 60, 107–115.
Wong, J. S., Zhang, Q., Chen, Y. D. (2010). Statistical modeling of daily urban water consumption in Hong Kong: Trend, changing patterns, and forecast. Water Resources Research, 46(3).
Zhou, S., McMahon, T., Walton, A., Lewis, J. (2002). Forecasting operational demand for an urban water supply zone. Journal of Hydrology, 259(1–4), 189–202.
Zhou, S. L., McMahon, T. A., Walton, A., Lewis, J. (2000). Forecasting daily urban water demand: a case study of Melbourne. Journal of Hydrology, 236(3–4), 153–164.
Copyright (c) 2020 Ciência e Natura
This work is licensed under a Creative Commons AttributionNonCommercialShareAlike 4.0 International License.