Universidade Federal de Santa Maria

Ci. e Nat., Santa Maria v.42, Special Edition: Micrometeorologia, e4, 2020

DOI:10.5902/2179460X45217

ISSN 2179-460X

Received: 27/05/20 Accepted: 27/05/20 Published: 28/08/20

Special Edition

Monthly rainfall forecast study in southeastern Brazil using multi-layer perceptron (MLP) neural networks

Estudo mensal de previsão de chuva no sudeste do Brasil utilização de redes neurais multi-camadas perceptron (MLP)

Cleber Souza Corrêa ^I

Diogo Machado Custodio ^II

Haroldo de Campos Velho ^III

^IDivision of Atmospheric Sciences, Institute of Aeronautics and Space (IAE), São José dos Campos, São Paulo, Brazil. E-mail: clebercsc@fab.mil.br.

^IIDivision of Atmospheric Sciences, Institute of Aeronautics and Space (IAE), São José dos Campos, São Paulo, Brazil. E-mail: diogodmc@fab.mil.br.

^IIIComputing and Applied Mathematics Laboratory, Instituto Nacional de Pesquisas Espaciais (INPE), São José dos Campos, São Paulo, Brazil. E-mail: haroldo.camposvelho@gmail.com.

ABSTRACT

This work uses the MLP neural network technique to make monthly rainfall forecast estimates for Guarulhos airport in southeastern Brazil using a time series of approximately 70 years. Neural network structures with two or more hidden layers showed a better result, minimizing the prediction error.

Keywords: Neural network; Time series; Monthly Rainfall

RESUMO

Este trabalho usa a técnica de rede neural MLP para fazer estimativas de previsão de precipitação mensal para o aeroporto de Guarulhos, no sudeste do Brasil, usando séries temporais de aproximadamente 70 anos. Estruturas da rede neural com duas ou mais camadas apresentaram melhores resultados, minimizando os erros de predição.

Palavras-chave: Rede neural; Séries temporais; Precipitação mensal

1 INTRODUCTION

Rainfall is a nonlinear variable that may show seasonal variation and change significantly from one year to the next. Neural networks are a widely used computational technique for such problems: different network structures and models can be optimized to minimize the estimation error, at the cost of intensive processing.

CORRÊA et al. (2019), using wavelet and cross-wavelet analysis, observed multi-decadal cycles of 2.66, 5.33, 10.66 and 21.33 years between the monthly number of sunspots and the Southern Oscillation (SOI) and Pacific Decadal Oscillation (PDO) indexes. The sunspot series was also compared with the average monthly rainfall at the meteorological stations of the airports of Belém, Fortaleza, São Luiz, and Natal, showing that in the north/northeast of Brazil the multi-decadal precipitation cycles followed the sunspot variability, with a strong 11-year signal and a weaker 22-year signal. Because monthly rainfall time series behave in a very complex way, the cross-wavelet methodology made it possible to observe, in the Brazilian tropical region, the correlation between long historical series of monthly rainfall and the sunspot series, and thus the associated multi-decadal cycles.

An environmental variable such as rainfall has behavior that can be decomposed into different spatial scales, such as the mesoscale or macroscale, as well as the regional synoptic level. This study analyzes the monthly rainfall time series of Guarulhos Airport (in southeastern Brazil) over a period of 70 years by fitting a neural network, to assess whether the network can reach a small error and represent processes with great temporal variability and nonlinearity.

2 MATERIAL AND METHODS

2.1 Monthly Rainfall time series data

Monthly rainfall data were obtained from the Climatological Database of the Airspace Control Department (DECEA) of the Aeronautics Command, available at http://clima.icea.gov.br/clima/. The data are for Guarulhos Airport, in the state of São Paulo. In a long historical series, monthly precipitation data show very complex characteristics: they may have great spatial variability and may be non-stationary or only semi-stationary. The forecast function generated a set of 24 monthly values of accumulated rainfall.
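As a rough illustration of how a 24-month forecast can be produced from a one-step-ahead model, the sketch below builds lagged input windows and feeds each prediction back into the window. It is not the NNFOR code: the function names, the stand-in `seasonal_naive` model, the placeholder series and the choice to treat all 21 inputs as lagged values are assumptions made only for this example.

```python
def make_lagged_samples(series, n_lags):
    """Return (inputs, targets); each input holds the n_lags values preceding its target."""
    inputs = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    targets = series[n_lags:]
    return inputs, targets


def recursive_forecast(history, one_step_model, n_lags, horizon=24):
    """Forecast `horizon` steps by feeding each prediction back into the input window."""
    window = list(history[-n_lags:])
    forecast = []
    for _ in range(horizon):
        next_value = one_step_model(window)
        forecast.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward by one month
    return forecast


def seasonal_naive(window):
    """Hypothetical stand-in for a trained network: repeat the value from 12 months earlier."""
    return window[-12]


# Placeholder series with a 12-month cycle (not the Guarulhos data).
history = [float(month % 12) for month in range(120)]
inputs, targets = make_lagged_samples(history, n_lags=21)
forecast = recursive_forecast(history, seasonal_naive, n_lags=21, horizon=24)
print(len(inputs), len(forecast))
```

In practice a trained network would replace `seasonal_naive`; the windowing and feedback loop stay the same.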

2.2 Neural Networks

This work uses the NNFOR package (version 0.9.6, 2019) in R to forecast time series with neural networks; the references for this package are CRONE & KOURENTZES (2010) and KOURENTZES et al. (2014). The NNFOR package has two different implementations, Multi-layer Perceptron (MLP) and Extreme Learning Machine (ELM), with possible configuration differences. In this work the automated default configurations were used. It is important to remember that the usual purpose of training a multilayer perceptron is to get good generalization on unseen data, for example in time series forecasting applications. Maximum generalization performance will occur before the training error reaches its minimum value. Training a network on a noisy data set until the global minimum is reached may ultimately worsen its predictive ability. One way to ensure that this does not happen, and that the generalization performance is good, is to split the data into multiple sets: a training set, a validation set, and a test set.
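The splitting idea can be sketched as follows. This is a minimal illustration, not the procedure implemented in NNFOR: the 70/15/15 proportions, the function names, the placeholder series and the made-up validation errors are all assumptions. The split keeps the chronological order of the series, and `best_epoch` picks the training epoch with the lowest validation error, which is the point where training would be stopped.

```python
def chronological_split(series, train_frac=0.70, val_frac=0.15):
    """Split a time series into train/validation/test blocks without shuffling."""
    n = len(series)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = series[:n_train]
    validation = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, validation, test


def best_epoch(validation_errors):
    """Index of the epoch with the lowest validation error (early-stopping point)."""
    return min(range(len(validation_errors)), key=validation_errors.__getitem__)


# Placeholder for about 70 years of monthly values (840 months); not real data.
monthly_rain = [float(month % 12) for month in range(840)]
train, validation, test = chronological_split(monthly_rain)

# Made-up validation errors: they fall, then rise as the network starts to overfit.
stop_at = best_epoch([9.0, 6.5, 5.8, 6.1, 7.4])
print(len(train), len(validation), len(test), stop_at)
```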

The training set is used to actually train the network. The validation set can be used to assess the generalization ability of the network while training is taking place. The accuracy function of the R package forecast was used to estimate the errors of the rainfall forecasts, with the forecasts combined using the median operator; this methodology follows HYNDMAN & KOEHLER (2006) and HYNDMAN & ATHANASOPOULOS (2014). The measures calculated are the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE). The errors between the predicted and the observed values are calculated in the accuracy function, using 5 randomly selected parts of the series.
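To make the two error measures and the median combination concrete, the sketch below computes them on a few made-up numbers. It is only an illustration of the standard definitions, not the code of the R accuracy function; the function names and all values are hypothetical.

```python
import math
from statistics import median


def rmse(observed, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed, predicted):
    """Mean Absolute Error: mean of the absolute differences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def median_combine(member_forecasts):
    """Combine the forecasts of several networks, month by month, with the median."""
    return [median(values) for values in zip(*member_forecasts)]


# Made-up forecasts from three networks for three months (not results from the paper).
member_forecasts = [
    [100.0, 50.0, 10.0],
    [120.0, 40.0, 30.0],
    [110.0, 70.0, 20.0],
]
combined = median_combine(member_forecasts)
observed = [113.0, 46.0, 20.0]
print(combined, rmse(observed, combined), mae(observed, combined))
```

Because the RMSE squares each difference, it penalizes large misses more heavily than the MAE, which is why the two measures are usually reported together.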

2.3 Multi-layer Perceptron (MLP)

The MLP function fits MLP neural networks for time series forecasting. Because it has an MLP architecture, it can be seen as a general practical tool for nonlinear input-output mapping. Specifically, let k be the number of network inputs and m the number of outputs. The network input-output relationship defines a mapping from a k-dimensional Euclidean input space to an m-dimensional Euclidean output space, and this mapping is infinitely continuously differentiable (GARDNER & DORLING, 1998).
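The mapping from k inputs to m outputs can be sketched as a forward pass through fully connected layers. This is a minimal illustration with assumed details: the tanh hidden activation, the linear output, the random untrained weights and the function names are choices made for the example, not the NNFOR internals.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Map a k-dimensional input to an m-dimensional output."""
    activation = list(x)
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_output_layer = index == len(layers) - 1
        # tanh is smooth, so the whole mapping stays differentiable; the output is linear.
        activation = z if is_output_layer else [math.tanh(v) for v in z]
    return activation


rng = random.Random(0)
k, hidden, m = 21, 5, 1  # sizes of the simplest architecture described in this paper
layers = [init_layer(k, hidden, rng), init_layer(hidden, m, rng)]
y = forward([0.1] * k, layers)
print(len(y))
```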

Figure 1 shows the three neural network structures used, each with 21 inputs, 20 training repetitions and one output: (a) MLP with one hidden layer of 5 nodes; (b) MLP with two hidden layers of 10 and 5 nodes; and (c) MLP with three hidden layers of 30, 10 and 5 nodes.

Figure 1 – The three neural network models used in this work: (a) MLP with 5 hidden nodes, (b) MLP with 10 nodes in hidden layer 1 and 5 in hidden layer 2 and (c) MLP with 30 nodes in hidden layer 1, 10 in hidden layer 2 and 5 in hidden layer 3
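To give a sense of how much the three architectures differ in size, the sketch below counts their trainable parameters. It assumes fully connected layers with one bias per node, which is an assumption for illustration; the function name is hypothetical and the count is not taken from the NNFOR output.

```python
def count_parameters(n_inputs, hidden_sizes, n_outputs=1):
    """Count weights and biases of a fully connected MLP (one bias per node)."""
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


architectures = {"(a)": [5], "(b)": [10, 5], "(c)": [30, 10, 5]}
for label, hidden_sizes in architectures.items():
    print(label, count_parameters(21, hidden_sizes))
```

Under these assumptions the counts are 116, 281 and 1031 parameters for (a), (b) and (c), so the largest network has roughly nine times as many parameters as the smallest.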

3 RESULTS AND DISCUSSION

Figure 2 (a) shows the time series with its autocorrelation function, which reveals a significant seasonality with a period of 12 months. Figure 2 (b) shows the scatter plots for each time lag, for a total of 16 months.

Figure 2 – (a) Monthly rainfall time series with the estimated autocorrelation function (ACF) and (b) scatter by time lag for 16 lags
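The way an autocorrelation function exposes a 12-month seasonality can be sketched on a synthetic series. The series below is an artificial cosine with a 12-month period, not the Guarulhos data, and the function name is hypothetical; the formula is the usual sample autocorrelation.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    return [
        sum(deviations[t] * deviations[t + lag] for t in range(n - lag)) / denominator
        for lag in range(max_lag + 1)
    ]


# Synthetic 20-year monthly series with a pure annual cycle (illustrative only).
synthetic = [100.0 + 80.0 * math.cos(2 * math.pi * t / 12) for t in range(240)]
r = acf(synthetic, max_lag=16)
strongest_lag = max(range(1, 17), key=lambda lag: r[lag])
print(strongest_lag, r[6] < 0)
```

For this synthetic series the strongest positive autocorrelation among lags 1 to 16 falls at lag 12, and the value at lag 6 is negative, which is the signature of an annual cycle.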

Figure 3 – Monthly rainfall forecast for 24 months; the forecast is shown in blue at the end of the series: (a) MLP with 5 hidden nodes, (b) MLP with 10 nodes in hidden layer 1 and 5 in hidden layer 2 and (c) MLP with 30 nodes in hidden layer 1, 10 in hidden layer 2 and 5 in hidden layer 3

Table 1 shows that the forecast error decreases markedly as the number of hidden layers, and the number of nodes in them, increases. Compared with the first network, which has a single hidden layer of 5 nodes, the three-layer network reduced the error by a factor of about 5: the RMSE fell from 76.87 to 15.96 and the MAE from 51.99 to 10.09. This preliminary analysis indicates that using three hidden layers gave good results.

Table 1 – Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) for the three neural network architectures

	One layer (5 hidden)	Two layers (10, 5 hidden)	Three layers (30, 10, 5 hidden)
RMSE	76.87	56.44	15.96
MAE	51.99	37.55	10.09
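The size of the improvement can be checked directly from the values in Table 1. The short sketch below computes, for each measure, the ratio between the one-layer error and the error of the deeper networks; the variable and function names are only illustrative.

```python
# Error values copied from Table 1, ordered as one, two and three hidden layers.
table = {
    "RMSE": [76.87, 56.44, 15.96],
    "MAE": [51.99, 37.55, 10.09],
}


def reduction_factors(errors):
    """Ratio of the one-layer error to the error of each deeper network."""
    baseline = errors[0]
    return [baseline / value for value in errors[1:]]


factors = {name: reduction_factors(errors) for name, errors in table.items()}
for name, (two_layers, three_layers) in factors.items():
    print(f"{name}: two layers x{two_layers:.2f}, three layers x{three_layers:.2f}")
```

The three-layer ratios are about 4.8 for the RMSE and 5.2 for the MAE, consistent with a reduction by a factor of about 5, while the two-layer network improves on the one-layer network by a factor of roughly 1.4.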

4 CONCLUSION

To improve the results, further analysis will be carried out using other rainfall time series and testing other neural network architectures, which require intensive processing.

Acknowledgments

The authors would like to thank the Instituto de Aeronáutica e Espaço (IAE).

References

CORRÊA, C.S.; GUEDES, R.L.; CORRÊA, K.A.B.; PILAU, F.G. Multidecadal Cycles Study in the Climate Indexes Series Using Wavelet Analysis in North/Northeast Brazil. Anuário do Instituto de Geociências – UFRJ, 42(1), 66-73, 2019. doi: 10.11137/2019_1_66_73.

CRONE, S.F.; KOURENTZES, N. Feature selection for time series prediction – A combined filter and wrapper approach for neural networks. Neurocomputing, 73(10), 1923-1936, 2010.

HYNDMAN, R.J.; KOEHLER, A.B. Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679-688, 2006.

HYNDMAN, R.J.; ATHANASOPOULOS, G. Optimally reconciling forecasts in a hierarchy. Foresight, Fall 2014, no. 35, 42-48, 2014.

KOURENTZES, N.; BARROW, D.K.; CRONE, S.F. Neural network ensemble operators for time series forecasting. Expert Systems with Applications, 41(9), 4235-4244, 2014.