Stream-Flow Forecasting of Small Rivers Based on LSTM

2020·Arxiv

Abstract

Stream-flow forecasting for small rivers has always been of great importance, yet it remains comparatively challenging because of the special features of rivers with smaller volume. Artificial Intelligence (AI) methods have long been employed in this area, but forecast quality still leaves room for improvement. In this paper, we propose a forecasting method based on the Long Short-Term Memory (LSTM) deep learning model, which is designed for time-series data. We collected stream-flow data from one hydrologic station in Tunxi, China, and precipitation data from 11 surrounding rainfall stations, and used LSTM to forecast the stream flow at that hydrologic station 6 hours into the future. We evaluated the prediction results using three criteria: root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²). By comparing LSTM's predictions with those of Support Vector Regression (SVR) and Multilayer Perceptron (MLP) models, we showed that LSTM performs better, achieving an RMSE of 82.007, an MAE of 27.752, and an R² of 0.970. We also conducted extended experiments on the LSTM model to discuss the factors that influence its performance.

Index Terms—machine learning, deep learning, LSTM, stream-flow forecasting, small rivers

I. INTRODUCTION

Worldwide, floods are considered one of the most common and widely distributed natural risks to life and property [1]. According to [2], in 2017, floods were the most influential disaster with respect to the number of people affected: 59.6% of the people affected by natural disasters were affected by floods. Because they are sudden and uncertain, floods remain a comparatively hard-to-prevent disaster, and more advanced control methods are urgently needed. Among them, flood forecasting is a crucial one. A timely and precise advance warning allows ample time for mitigating actions and reduces the damage caused by the disaster. However, when it comes to medium or small rivers, various problems pose additional challenges to the forecasting process. Because of the low capacity of those rivers, floods are often abrupt in appearance, rapid in confluence, and short in forecast period [3]. Thus, more sophisticated forecast methods are always in high demand.

The main traditional stream-flow forecasting methods employ either physical hydrologic models or traditional machine learning algorithms. Hydrologic models that use river stage, stream flow, or runoff volume data to forecast floods are mainly based on mathematical and physical analysis of the hydrologic process. They are therefore usually deterministic, and their forecast results are normally presented as time series of estimates [4]. The results of these models deteriorate easily if the input data contain a certain degree of error or environmental noise [5]. With the development of artificial intelligence and the arrival of big data, researchers began to use data-driven models instead of mathematical or physical models to study various aspects of hydrological phenomena. Data-driven models focus less on the exact logic and physical theory behind the forecast and more on the potential relationships hidden in large amounts of data. They thus remarkably reduce the work caused by the non-linearity and noise complexity of hydrological models, and improve the accuracy of the forecast. However, traditional machine learning models treat every input and output in isolation, and thus have limited performance in prediction tasks that involve time-series data, where each data point is related to recent data. The prediction results of most traditional models are accompanied by considerable errors.

As described above, physical models and traditional machine learning algorithms both have limited performance due to (1) erroneous and chaotic data, and (2) the special nature of time-series prediction. In this paper, we use the LSTM (Long Short-Term Memory) model, a kind of recurrent memory neural network developed from the RNN (Recurrent Neural Network), for stream-flow prediction to address these two problems. As a sophisticated machine learning model, LSTM works well with the chaotic data that result from the complexity of the real environment and the instability of medium or small rivers. Moreover, our prediction method is innovative compared with methods using traditional models: LSTM has memory, and every output is based on previous outputs, so it can exploit the information between time-series data points and works better in predicting stream-flow changes, which follow a trend over time. The main idea of this paper is to use LSTM to analyze a large amount of stream-flow data, together with rainfall data collected from various precipitation stations along the rivers, to estimate the future stream flow at a certain point in a river, and to compare the prediction results with those of the traditional machine learning model SVR (Support Vector Regression) and the deep learning model MLP (Multilayer Perceptron). The results of the comparative experiment conducted in this paper show that the LSTM model contributes to stream-flow forecasting of small rivers with respect to:

1) Better model stability. Unlike the other two models, LSTM produces forecasts without frequent and obvious fluctuations of the stream-flow line in cases of small rainfalls.

2) Better model reliability. LSTM is more accurate in forecasting stream flow peaks, which is vital to early warning of floods.

3) More intelligent capture of the features of data. Through extended experiments, we observed that LSTM is able to read different combinations of input data, including historical stream-flow volume, rainfall data, and areal rainfall data, and improve model accuracy based on all of them.

The rest of this paper is organized as follows. Section II lists works related to the development and current state of stream-flow forecasting. Section III introduces the RNN model, which is the origin and foundation of LSTM, and the LSTM model itself. Section IV presents the complete experiment process for testing the performance of LSTM, including data preparation, model training, selection of comparative models, choice of evaluation criteria, final results, and extended experiments on LSTM performance. Finally, Section V concludes the paper.

II. RELATED WORKS

In recent years, more and more stream-flow forecasting methods based on data-driven AI models have been developed and put into practice. According to Yaseen et al. [6], there are mainly five areas of focus internationally: ANN (Artificial Neural Network), SVM (Support Vector Machine), Fuzzy (fuzzy logic method), EC (Evolutionary Computing methods), and W-AI (Wavelet-complementary modeling).

An ANN is a kind of artificial intelligence information processing system that resembles the biological neural networks of the human brain [7]. In 2002, Hsu et al. [8] proposed the self-organizing linear output map (SOLO), a kind of multivariate ANN procedure, to forecast rainfall-runoff. In 2005, Cigizoglu [9] tested the performance of GRNN (Generalized Regression Neural Network) in forecasting and estimating intermittent daily mean flows. In 2010, Kagoda et al. [10] used RBFNN (Radial Basis Function Neural Network) to perform 1-day forecasts of stream flow and showed that it is a comparatively superior method.

SVM has become popular in the last 20 years as an effective method for noisy problems. In 2005, Sivapragasam and Liong [11] tested the performance of SVM in stream-flow prediction and obtained promising results. In 2006, Asefa et al. [12] used the SVM approach to predict seasonal and hourly multi-scale stream flow. In 2011, Noori et al. [13] assessed the effect of input variable determination on SVM model performance using PCA, the Gamma test, and forward selection techniques for monthly stream-flow prediction.

The theory of fuzzy sets was introduced by Lotfi A. Zadeh in 1965. Fuzzy logic has been used to deal with the uncertainties inside the variables in models. In 2007, a neuro-fuzzy model was introduced by El-Shafie et al. [14] to forecast the monthly inflow of the Nile river. In 2009, Özger [15] utilized the Mamdani and the Takagi-Sugeno (TS) fuzzy inference systems for stream-flow value prediction. In 2012, Sanikhani and Kisi [16] developed two different adaptive neuro-fuzzy (ANFIS) techniques to estimate monthly river flow.

Evolutionary Computing (EC) is the collective name for Evolutionary Algorithms (EA), which apply selection, mutation, and reproduction to a population of individual structures that undergo evolution [4]. In 1999, Savic et al. [17] conducted the first research on the use of Evolutionary Computing in stream-flow modeling. The performance of Genetic Programming and ANN in stream-flow forecasting was compared by Makkeasorn et al. [18] in 2008. In 2009, Guven [19] investigated the river inflow prediction ability of Linear Genetic Programming (LGP) and compared it with the MLP and GRNN methods; the results showed that LGP performed better.

Wavelet Transform (WT) is a method that focuses on handling time-series data. A wavelet and neuro-fuzzy conjunction model was employed by Shiri and Kisi [20] in 2010 to build daily, monthly, and yearly stream-flow models. In 2014, a wavelet transform-genetic algorithm-neural network model (WAGANN) was proposed by Sahay and Srivastava [21] for forecasting monsoon river flows one day ahead.

III. MODELS

A. Recurrent Neural Network (RNN)

First developed in the 1980s, the RNN owes its special character to its structure: the neurons are connected with each other and self-looped, so the structure is able to display dynamic temporal behavior and remember information from the previous step [22]. The basic logic of a classic RNN unit is as follows [23].

Each unit has an input, a hidden state h_t, and an output, where the subscript t denotes the time step. First, the hidden state from the previous time step is combined with the current input, each multiplied by its own weight matrix, and the result is transformed by a nonlinear function (conventionally tanh or sigmoid) to give the new hidden state. Then, the hidden state is multiplied by its own weight matrix and transformed by another nonlinear function to produce the output. In this way, the current output is affected by the previous hidden state, which gives the network a short-term memory.
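For readers who prefer symbols, a common way to write the recurrent update described above is sketched below. The names x_t (input), y_t (output), W, U, and V (weight matrices), and φ_h and φ_o (the two nonlinear functions) are assumptions introduced for illustration, and bias terms are omitted for brevity.

```latex
\begin{aligned}
h_t &= \phi_h\left(W h_{t-1} + U x_t\right) \\
y_t &= \phi_o\left(V h_t\right)
\end{aligned}
```

Because h_t depends on h_{t-1}, unrolling the first line over many time steps multiplies by W repeatedly, which is the source of the exponential dependence on the weights discussed next.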

One significant problem of the classical RNN is that, because of its recurrent structure, the back-propagated error depends exponentially on the weights. Thus, the error signals of an RNN vanish or blow up over long time spans [24].

B. Long Short-Term Memory network (LSTM)

In the LSTM unit, U and W are the weights of the inputs into the different gates: the input gate, the input modulation gate, the forget gate, and the output gate. The remaining quantities are the bias vectors, the cell state, and the hidden state. All these controllers determine how much information to receive from the previous loop, and how much to pass to the new state.

By actively choosing useful information to store and others to reject, LSTM provides a solution to the gradient explosion and vanishing problem faced by RNN.
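The gate structure can be sketched with the standard LSTM formulation below. This is the usual textbook form rather than a transcription of this paper's own equations. The symbols i_t, g_t, f_t, and o_t (input, input modulation, forget, and output gates), b (bias vectors), c_t (cell state), and the choice of U for the input x_t and W for the previous hidden state h_{t-1} are assumptions made for illustration. σ is the logistic sigmoid and ⊙ is element-wise multiplication.

```latex
\begin{aligned}
i_t &= \sigma\left(U_i x_t + W_i h_{t-1} + b_i\right) \\
f_t &= \sigma\left(U_f x_t + W_f h_{t-1} + b_f\right) \\
o_t &= \sigma\left(U_o x_t + W_o h_{t-1} + b_o\right) \\
g_t &= \tanh\left(U_g x_t + W_g h_{t-1} + b_g\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

The additive update of c_t is what lets the forget gate keep or discard stored information without repeatedly squashing it through a nonlinearity.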

IV. EXPERIMENTS

In this section, the complete experiment process of stream-flow forecasting for the rivers in Tunxi, China, using a data-driven Artificial Intelligence model is presented, including data preparation, model training, selection of comparative models, choice of evaluation criteria, final results, and extended experiments on LSTM performance.

A. Data Collection and Division

The data for the experiment were collected from Tunxi District, Huangshan City, Anhui Province, China. According to [25], the Tunxi catchment has a drainage area of 2696.76 km². Its altitude is low in the east and increases gradually towards the west. Under the influence of the continental monsoon climate, rainfall differs considerably between years. Within one year, rainfall is also unevenly distributed: more than 50% of the annual precipitation falls between April and June. Stream-flow changes in the Tunxi area show the typical features of small rivers, complexity and abruptness, which makes the area suitable for testing the forecast ability of models.

The experiment data consist of the stream-flow volume data of Tunxi, collected from a hydrologic station, and rainfall data from 11 precipitation stations located upstream of the hydrologic station. In total, 18648 pieces of data were collected from 1981 to 2003.

B. Data Pre-processing

The experiment uses the stream-flow data and the precipitation data from 11 rainfall stations over the past 12 hours to forecast the stream-flow volume of the 6th hour in the future. To transform the raw data into a form suitable for supervised learning, a series-to-supervised function is used. After the transformation, the data take the form shown in Tab. I.

Q(t+X) represents the stream-flow data, with X ranging from -12 to 5, that is, from 12 hours in the past to the 6th hour in the future. P1(t+X) to P11(t+X) represent the precipitation data of the 11 rainfall stations over the same range. The 1st to 144th columns (from Q(t-12) to P11(t-1)) are selected as the features (x set); they contain the stream-flow and precipitation data of the past 12 hours. The 205th column (Q(t+5)) is selected as the target (y set), which is the stream-flow data of the 6th hour in the future.

When the data are transformed into time-series format, only those rows that have enough data before and after them to form a complete time series are kept. Thus, some rows at the beginning and at the end are discarded. After the 12-6 transformation mentioned above, 18237 pieces of data are kept. They are divided at a ratio of roughly 7:3: 13000 pieces are used as the training set, and 5237 as the test set.
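The windowing and column selection described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: the function name `series_to_supervised`, its signature, the synthetic `rows`, the assumption that Q is the first variable in each hourly record, and the chronological 70% split are all assumptions made for the example.

```python
def series_to_supervised(rows, names, n_in=12, n_out=6):
    """Frame a multivariate series as a supervised-learning table.

    rows  : list of equal-length records, one per hourly time step
    names : variable names, e.g. ["Q", "P1", ..., "P11"]
    Returns (header, table); each table row spans t-n_in ... t+n_out-1.
    """
    offsets = list(range(-n_in, n_out))  # -12, ..., -1, 0, ..., 5
    header = []
    for k in offsets:
        tag = "(t)" if k == 0 else f"(t{k:+d})"
        header.extend(f"{name}{tag}" for name in names)
    table = []
    # Keep only time steps t that have a complete window before and after them.
    for t in range(n_in, len(rows) - n_out + 1):
        window = []
        for k in offsets:
            window.extend(rows[t + k])
        table.append(window)
    return header, table


names = ["Q"] + [f"P{i}" for i in range(1, 12)]
# Synthetic stand-in for the hourly records (NOT the Tunxi data).
rows = [[float(t * 100 + j) for j in range(12)] for t in range(100)]

header, table = series_to_supervised(rows, names)
X = [r[:144] for r in table]   # columns 1-144: Q(t-12) ... P11(t-1)
y = [r[204] for r in table]    # column 205: Q(t+5)

split = int(len(table) * 0.7)  # chronological split; the ratio is illustrative
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
```

With 12 variables per hour, the 12 lagged hours give 12 × 12 = 144 feature columns, and the blocks for t through t+4 occupy columns 145 to 204, which is why Q(t+5) lands in column 205.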

As the data of this experiment are collected from different stations over a long time span, the different data series do not share the same scale. To improve model performance, the data are normalized to [0,1] using the MinMaxScaler function in the sklearn package. The formula is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is a raw value and $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the same variable.
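As a rough illustration of this normalization, the sketch below applies per-column min-max scaling in plain Python. It is a stand-in for what MinMaxScaler computes, not the authors' pipeline. The helper names `fit_min_max` and `apply_min_max`, the toy numbers, and the choice to take the minimum and maximum from the training rows only are assumptions, since the paper does not state which rows the scaler was fitted on.

```python
def fit_min_max(rows):
    """Return per-column minima and maxima of a list of records."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, lo, hi):
    """Scale each column with x' = (x - min) / (max - min)."""
    scaled = []
    for row in rows:
        scaled.append([
            (v - a) / (b - a) if b > a else 0.0  # constant column maps to 0
            for v, a, b in zip(row, lo, hi)
        ])
    return scaled


# Toy records: column 0 plays the role of stream flow, column 1 of rainfall.
train = [[10.0, 0.0], [30.0, 2.0], [20.0, 4.0]]
test = [[40.0, 1.0]]

lo, hi = fit_min_max(train)          # statistics from training rows only
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)
```

Under this assumption, a test value larger than any training value maps above 1, which is worth keeping in mind for flood peaks that exceed the training range.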

C. Model Training

The LSTM model used in the experiment is built with Keras, a Python deep learning library. The number of hidden-layer nodes is one of the parameters that need to be determined. In our experiments, the model with 64 nodes performs best. The optimizer, batch size, and number of epochs also influence the performance of the model. The choice of optimizer influences how the loss function is minimized, and thus how the model converges to its final state. Standard choices include momentum, Adagrad, RMSProp, and Adam. Based on our experiments, the Adam optimizer is chosen. Batch size determines the amount of data processed at a time. With batches, the model updates multiple times before it has processed the whole dataset, which affects the training dynamics. As a small batch size greatly slows down training and a big batch size causes overfitting, the batch size is set to 72 as a balance. The number of epochs is the number of times the model runs through the whole dataset. According to Fig. 1, the loss on the test set is lowest at approximately 30 epochs, so the number of epochs is set to 30 in this experiment.

TABLE I HEADINGS OF DATA AFTER SERIES TO SUPERVISED TRANSFORMATION

Fig. 1. Loss of train and test sets of LSTM models with different epochs.
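To make the effect of the batch size and epoch settings concrete, the short calculation below counts how many weight updates they imply. It assumes that all 13000 training samples are used for fitting (no separate validation hold-out) and that the final, smaller batch of each epoch is kept. Neither detail is stated in the paper.

```python
import math

n_train = 13000    # training samples reported in the data section
batch_size = 72    # batch size chosen in the experiment
epochs = 30        # number of epochs chosen in the experiment

# One weight update per batch; the last batch of an epoch may be smaller.
updates_per_epoch = math.ceil(n_train / batch_size)
last_batch_size = n_train - (updates_per_epoch - 1) * batch_size
total_updates = updates_per_epoch * epochs
```

Under these assumptions the model makes 181 updates per epoch (180 full batches plus one batch of 40 samples), or 5430 updates over the whole training run.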

D. Comparative Models Selecting

To evaluate the performance of the proposed LSTM model, two other models are chosen for comparison. The first, SVR, is a traditional machine learning method, while the second, MLP, is a deep learning method.

• SVR: Support Vector Regression is a model derived from the Support Vector Machine. According to [26], “The idea of SVR is based on the computation of a linear regression function in a high dimensional feature space where the input data are mapped via a nonlinear function.” The kernel function and two parameters, C and gamma, must be determined to set up the model. In this paper, the RBF kernel function is selected, and the combination (C=0.095, gamma=0.165) is chosen by grid search.

• MLP: Multilayer perceptrons are a class of ANN in which the nonlinear computing elements are arranged in a feedforward layered structure [27]. In this paper, an MLP model with one hidden layer is selected.

E. Evaluation Criteria

Three metrics are used in this paper as the evaluation criteria: root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²).

RMSE is a common measure of the difference between the predicted and the observed values. Its formula is the following:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2}$$

where $m$ denotes the total number of values, $\hat{y}_i$ denotes the predicted value, and $y_i$ denotes the observed value. Taking the square root brings the error back to the same scale as the input. The RMSE value is always non-negative, and a lower RMSE value means a better prediction.

MAE works in a similar way to RMSE except that the error is linear. Its formula is the following:

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|\hat{y}_i - y_i\right|$$

Since it works in a linear way, MAE does not penalize big errors more than small errors, but presents them as they are. Similar to RMSE, the MAE value is always non-negative, and a lower MAE value means a better prediction.

R², or the coefficient of determination, is a metric based on MSE (MSE is the square of RMSE). It differs from the preceding two metrics in that the scale of the outcome does not depend on the scale of the input. The formula is the following:

$$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2}$$

where $\bar{y}$ is the mean of all observed values. The denominator is the total variation of the observed values. In most cases, the R² value lies in the range [0,1], and a higher value means a better prediction.
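The three criteria can be checked on a small example. The sketch below implements them in plain Python under the usual definitions, with R² computed against the mean of the observed values. The function names and the four sample values are made up for illustration and are not results from the experiment.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    m = len(y_true)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(y_true, y_pred)) / m)


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    m = len(y_true)
    return sum(abs(p - o) for o, p in zip(y_true, y_pred)) / m


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum / total sum of squares."""
    mean_obs = sum(y_true) / len(y_true)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_true, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not data from the paper).
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 320.0, 380.0]

scores = (rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

In this toy case the errors are ±10 and ±20, so MAE is 15 while RMSE is about 15.8. The gap reflects the extra weight that RMSE gives to the two larger errors.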

F. Results

After the data are fed in and the models are run, the errors of the SVR, MLP, and LSTM models are listed in Tab. II, and the prediction results are plotted in Fig. 2. Tab. II shows that LSTM has the best performance among the three models with respect to all three evaluation criteria. Thus, statistically, the LSTM model has the best prediction accuracy. Fig. 2 shows that the LSTM prediction of stream-flow data fits the actual situation very closely. Unlike the SVR and MLP predictions, the LSTM prediction does not yield obvious nonexistent small peaks or valleys. Moreover, with respect to the prediction of major stream-flow peaks, the LSTM model is considerably better than the MLP model, and slightly better than the SVR model. The results show that the remember-forget ability of LSTM greatly helps the model to predict non-linear time-series data and to perform relatively better in forecasting the stream flow of rivers. However, LSTM still has errors in major peak prediction: most of the major peak predictions exceed the actual value by approximately 10 percent. Better results may be achieved by adjusting the training process or by using a larger and better data set.

G. Extended Experiments for LSTM

1) Combinations of input data: Tab. III shows the errors of LSTM models fed with different combinations of input data, while all other conditions stay the same as in the standard experiment. The results show that historical stream-flow data play a significant role in the accuracy of the forecast, but rainfall (and areal rainfall) data are also indispensable. However, once rainfall-related data are present, different combinations of rainfall data types do not make a large difference to the result. Rainfall data are relatively more helpful than areal rainfall data.

Fig. 2. Prediction results of SVR, MLP, and LSTM models

TABLE II COMPARISON OF ERRORS OF MODELS

2) Change of predict time step: The predict time step is how far into the future the LSTM model predicts. The standard model in this paper has a predict time step of 6. That is, upon receiving the newest data, the model gives a prediction for the 6th hour in the future. Fig. 3 shows the values of the three evaluation criteria for LSTM models with different predict time steps, while all other conditions stay the same as in the standard model. The results imply that the predict time step is negatively correlated with the accuracy of the model, which makes sense since it is harder for models to predict further into the future.

3) Change of encoder time step: The encoder time step is the number of hours of historical data fed into the LSTM model. The encoder time step of the standard model in this paper is 12. Fig. 4 shows the errors of models with different encoder time steps. Approximately, models with an encoder time step in the range [12,14] have the best forecast accuracy.
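Changing either time step changes the shape of the supervised table. The helper below is a hypothetical sketch of that arithmetic; the name `window_layout` is an assumption. It further assumes 12 variables per hour (stream flow plus 11 rainfall stations), stream flow as the first variable in each hourly block, and a single unbroken hourly record, none of which the paper states explicitly for the extended experiments.

```python
def window_layout(encoder_steps, predict_step, n_vars=12):
    """Column layout of the supervised table for a pair of time steps.

    Returns (n_feature_columns, target_column, rows_lost), with the target
    column counted from 1 as in the data pre-processing description.
    """
    n_features = encoder_steps * n_vars
    # Blocks for t ... t+predict_step-2 sit between the features and the target block.
    target_column = n_features + (predict_step - 1) * n_vars + 1
    # Rows without a full window at the start or end of the record are dropped.
    rows_lost = encoder_steps + predict_step - 1
    return n_features, target_column, rows_lost


standard = window_layout(12, 6)          # the standard model in this paper
shorter_horizon = window_layout(12, 3)   # predict the 3rd hour instead
longer_history = window_layout(14, 6)    # feed 14 hours of history
```

For the standard setting this reproduces the 144 feature columns and the target in column 205. A longer encoder window widens the input, while a shorter predict step only moves the target column.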

V. CONCLUSION

Forecasting is critical in saving human lives and property from flood disasters. This paper proposed a method of stream-flow forecasting using the LSTM network, a kind of deep learning neural network derived from the RNN and equipped with a remember-forget system that keeps gradients from blowing up or vanishing. To demonstrate its advantage in time-series forecasting with non-linear features, it is compared with the machine learning SVR model and the deep learning MLP model in forecasting the stream flow of Tunxi, China. The results of the experiment show that the LSTM model provides more stable and more accurate predictions than the SVR and MLP models.

However, there is still room for improvement in the LSTM stream-flow forecasting model: the results show errors in peak volume forecasts that cannot be ignored. The model may be improved in the following ways. First, due to limits of time and hardware capacity, the parameter choices of the LSTM model are based only on simple tests and lack thorough study. Moreover, most of the default parameters of the model remain at their original values without adjustment. More study of parameter adjustment may improve the model's accuracy. Second, the data used in the experiment span more than 20 years. Because of the limited technology and management of the past, the original data show a certain degree of disorder and deficiency, and various kinds of amendments were made to the data. Feeding in data of higher quality may improve the model's performance.

REFERENCES

[1] S. Balica, I. Popescu, L. Beevers, and N. G. Wright, “Parametric and physically based modelling techniques for flood risk and vulnerability assessment: a comparison,” Environmental modelling & software, vol. 41, pp. 84–92, 2013.

[2] P. Wallemacq, “Natural disasters in 2017: Lower mortality, higher cost,” Brussels, Belgium: Centre for Research on the Epidemiology of Disasters, 2018.

[3] J. Feng and F. Pan, “A hydrologic forecast method based on lstm-bp,” Computer and Modernization, no. 7, p. 19, 2018.

[4] S. Han and P. Coulibaly, “Bayesian flood forecasting methods: A review,” Journal of Hydrology, vol. 551, pp. 340–351, 2017.

[5] H. G. Damavandi, R. Shah, D. Stampoulis, Y. Wei, D. Boscovic, and J. Sabo, “Accurate prediction of streamflow using long short-term memory network: A case study in the brazos river basin in texas,” International Journal of Environmental Science and Development, vol. 10, no. 10, 2019.

[6] Z. M. Yaseen, A. El-Shafie, O. Jaafar, H. A. Afan, and K. N. Sayl, “Artificial intelligence based models for stream-flow forecasting: 2000–2015,” Journal of Hydrology, vol. 530, pp. 829–844, 2015.

[7] S. Haykin, Neural networks: a comprehensive foundation. Prentice Hall PTR, 1994.

TABLE III COMPARISON OF ERRORS OF LSTM MODELS WITH DIFFERENT COMBINATIONS OF INPUT DATA

Fig. 3. Errors of LSTM models with different predict time steps

[8] K.-l. Hsu, H. V. Gupta, X. Gao, S. Sorooshian, and B. Imam, “Self-organizing linear output map (solo): An artificial neural network suitable for hydrologic modeling and analysis,” Water Resources Research, vol. 38, no. 12, pp. 38–1, 2002.

[9] H. K. Cigizoglu, “Application of generalized regression neural networks to intermittent flow forecasting and estimation,” Journal of Hydrologic Engineering, vol. 10, no. 4, pp. 336–341, 2005.

[10] P. A. Kagoda, J. Ndiritu, C. Ntuli, and B. Mwaka, “Application of radial basis function neural networks to short-term streamflow forecasting,” Physics and Chemistry of the Earth, Parts A/B/C, vol. 35, no. 13-14, pp. 571–581, 2010.

[11] C. Sivapragasam and S.-Y. Liong, “Flow categorization model for improving forecasting,” Hydrology Research, vol. 36, no. 1, pp. 37–48, 2005.

[12] T. Asefa, M. Kemblowski, M. McKee, and A. Khalil, “Multi-time scale stream flow predictions: The support vector machines approach,” Journal of hydrology, vol. 318, no. 1-4, pp. 7–16, 2006.

[13] R. Noori, A. Karbassi, A. Moghaddamnia, D. Han, M. Zokaei-Ashtiani, A. Farokhnia, and M. G. Gousheh, “Assessment of input variables determination on the svm model performance using pca, gamma test, and forward selection techniques for monthly stream flow prediction,” Journal of Hydrology, vol. 401, no. 3-4, pp. 177–189, 2011.

[14] A. El-Shafie, M. R. Taha, and A. Noureldin, “A neuro-fuzzy model for inflow forecasting of the nile river at aswan high dam,” Water resources management, vol. 21, no. 3, pp. 533–556, 2007.

[15] M. Özger, “Comparison of fuzzy inference systems for streamflow prediction,” Hydrological Sciences Journal, vol. 54, no. 2, pp. 261–273, 2009.

[16] H. Sanikhani and O. Kisi, “River flow estimation and forecasting by using two different adaptive neuro-fuzzy approaches,” Water resources management, vol. 26, no. 6, pp. 1715–1729, 2012.

[17] D. A. Savic, G. A. Walters, and J. W. Davidson, “A genetic programming approach to rainfall-runoff modelling,” Water Resources Management, vol. 13, no. 3, pp. 219–231, 1999.

[18] A. Makkeasorn, N.-B. Chang, and X. Zhou, “Short-term streamflow forecasting with global climate change implications–a comparative study between genetic programming and neural network models,” Journal of hydrology, vol. 352, no. 3-4, pp. 336–354, 2008.

[19] A. Guven, “Linear genetic programming for time-series modelling of daily flow rate,” Journal of earth system science, vol. 118, no. 2, pp. 137–146, 2009.

[20] J. Shiri and O. Kisi, “Short-term and long-term streamflow forecasting using a wavelet and neuro-fuzzy conjunction model,” Journal of Hydrology, vol. 394, no. 3-4, pp. 486–493, 2010.

[21] R. R. Sahay and A. Srivastava, “Predicting monsoon floods in rivers embedding wavelet transform, genetic algorithm and neural network,” Water resources management, vol. 28, no. 2, pp. 301–317, 2014.

[22] J. Zhang, Y. Zhu, X. Zhang, M. Ye, and J. Yang, “Developing a long short-term memory (lstm) based model for predicting water table depth in agricultural areas,” Journal of hydrology, vol. 561, pp. 918–929, 2018.

[23] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, “How to construct deep recurrent neural networks,” arXiv preprint arXiv:1312.6026, 2013.

[24] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[25] A. Dong, L. Zhi-Jia, W. Yong-Tuo, Y. Cheng, and D. Yi-Heng, “Flood forecasting model based on geographical information system,” Proceedings of the International Association of Hydrological Sciences, vol. 368, pp. 192–196, 2015.

[26] D. Basak, S. Pal, and D. C. Patranabis, “Support vector regression,” Neural Information Processing-Letters and Reviews, vol. 11, no. 10, pp. 203–224, 2007.

Fig. 4. Errors of LSTM models with different encoder time steps

[27] H. Bourlard and Y. Kamp, “Auto-association by multilayer perceptrons and singular value decomposition,” Biological cybernetics, vol. 59, no. 4-5, pp. 291–294, 1988.