Shear Stress Distribution Prediction in Symmetric Compound Channels Using Data Mining and Machine Learning Models

2019·Arxiv

Abstract

Predicting the shear stress distribution in open channels is of great importance in hydraulic engineering because it directly affects the design of stable channels. In this study, a series of experimental tests was first conducted to assess the shear stress distribution in prismatic compound channels. Shear stress was measured around the whole wetted perimeter of the compound channel for different floodplain widths and flow depths under subcritical and supercritical conditions. A set of data mining and machine learning models, namely Random Forest (RF), M5P, Random Committee (RC), KStar and Additive Regression (AR), was then applied to the measured data to predict the shear stress distribution in the compound channel. Among these five models, RF gave the most precise results, with the highest R² of 0.9. Finally, the RF model was compared with two well-known analytical models, the Shiono and Knight Method (SKM) and the Shannon method, to assess its performance in predicting the shear stress distribution. The results showed that the RF model has the best prediction performance compared with the SKM and Shannon models.

Keywords: Compound channel, Machine learning, SKM model, Shear stress distribution, Data mining models

1. Introduction

In the design of hydraulic structures, the boundary shear stress distribution is an essential factor for understanding most flow characteristics, such as flow resistance, sediment transport, and cavitation. The stress distribution is reported to depend on parameters such as the flume geometry, the hydraulic conditions, the boundary roughness and, in particular, the streamwise velocity component and the secondary flow pattern (Chiu and Chiou, 1986; Chiu and Lin, 1983; Flintham and Carling, 1988; Ghosh and Roy, 1970; Knight et al., 1994). Since the compound cross-section is the section closest to that of natural rivers, understanding the distribution of shear stress along the periphery of compound channels is essential. Furthermore, the study of river morphology and the engineering of river beds and banks depend on it. In addition, the analysis and design of flood control structures depend on extensive knowledge of the shear stress distribution along the flood route. The literature includes various investigations covering different methods and case studies (Khatua and Patra, 2007; Knight and Hamed, 1984; Naik and Khatua, 2016; Rezaei and Knight, 2010; Tominaga et al., 1989). Because direct and indirect measurement of shear stress is difficult and time-consuming, many analytical, semi-analytical, and numerical methods have been developed (Shiono and Knight, 1988; Khodashenas and Paquier, 1999; Yang and Lim, 2005; Yang et al., 2012; Bonakdari et al., 2015; Sheikh Khozani et al., 2017a; Sheikh Khozani et al., 2017b). Rezaei and Knight (2009) modified the Shiono and Knight method (SKM) to predict the shear stress distribution in compound channels with non-prismatic floodplains. Sheikh Khozani and Bonakdari (2016) compared five different analytical models for estimating the shear stress distribution in prismatic compound channels with rectangular shapes. They investigated the performance of each model in estimating shear stress in each section of the compound channel and concluded that the Tsallis entropy method gives good results with fewer calculations.

Soft computing and data mining methods are increasingly being applied to forecast various hydraulic and hydrological phenomena (Genç et al., 2015; Bonakdari et al., 2018; Sheikh Khozani et al., 2018a; 2018b; Azad et al., 2018; Jahanpanah et al., 2019; Sanikhani et al., 2019; Anitescu et al., 2019; Guo et al., 2019).

For estimating the shear stress distribution, Sheikh Khozani et al. (2017) applied a Randomized Neural Network (RNN) model to circular channels and compared their results with those of the Shannon entropy model. These researchers proposed a matrix-based equation. Khuntia et al. (2018) developed a neural network model to predict the force applied to the walls of compound channel cross-sections. Sheikh Khozani et al. (2019) applied different data mining models to estimate the apparent shear stress in compound channels and concluded that the Bagging-M5P model gives more accurate results for the apparent shear stress.

To the authors' knowledge, few studies have estimated the shear stress distribution in compound channels using data mining models. Therefore, a set of experiments was carried out at different flow depths and flow conditions, and the resulting data were used to predict the shear stress distribution in a smooth compound channel. About 1812 shear stress data points were applied to five different models: Additive Regression (AR), M5P, KStar, Random Forest (RF), and Random Committee (RC). The performance of each model in predicting the shear stress distribution is investigated, and the most accurate model is selected. The output of the most accurate model is then compared with two analytical models, the Shiono and Knight Method (SKM) and the Shannon model.

2. Experimental Apparatus and Procedure

In this study, the experiments were conducted in a flume 18 m long. All experiments were performed in a compound channel with a simple rectangular cross-section. The flume width and depth are 1200 mm and 400 mm, respectively, and the bed slope is 2.003 × 10⁻³. The main channel, constructed of PVC, is 398 mm wide and 50 mm deep, with floodplains 400 mm wide. Using L-shaped aluminum sections, the floodplain width of the prismatic compound channel was set to 100 mm, 200 mm, 300 mm and 400 mm. In this study, the distribution of shear stress in the prismatic compound channel with 100 mm floodplain width is investigated (see Figs. 1 and 2).

Fig. 1. General view of an experimental flume.

Fig. 2. The cross-section of prismatic compound channels with different floodplain widths.

In the experiments, uniform flow was controlled by a series of adjustable tailgates located at the end of the flume. In the test codes, OPC denotes overbank flow in the channel; the first three digits after OPC give the floodplain width and the two digits that follow denote the flow discharge. Local boundary shear stress was measured with a Preston tube of 4.77 mm outer diameter around the wetted perimeter of the channel, at 25 mm transverse intervals on the bed and 10 mm vertical intervals on the walls. These measurements were taken at one section, 14 m from the channel inlet. The range of hydraulic parameters of the experimental data is presented in Table 1. The shear stress distribution was measured for different floodplain widths.

Table 1 The range of the main hydraulic parameters in the prismatic compound channel.

According to the results of different studies, the shear stress distribution in a smooth compound channel is related to the geometry of the channel (the width of the floodplain and the channel wetted perimeter (L)), the transverse coordinate (y), the bankfull depth (h), the depth of flow over the main channel (H), the slope of the channel bed, the flow velocity (V), the fluid density, the gravitational acceleration (g) and the hydraulic radius (R), so the dimensionless shear stress can be expressed as a function of these parameters. In this study, the parameters of this function are the input variables applied to each model, and the dimensionless shear stress is the output variable.

3. Material and methods

3.1. Data mining methods

The term "data mining" was first used by the economist Michael Lovell in the Review of Economic Studies (1983). Data mining is a process that discovers trends and patterns (Han et al., 2011). It is a subfield of statistics and computer science whose goal is to discover patterns in data sets, extract trends and information from them, and organize the extracted information into a structure suitable for further use (Witten et al., 2016).

In addition to the analysis step, data mining involves data management, inference considerations, pre-processing and post-processing of data, visualization, and interestingness metrics (Khuntia et al. 2018). Unlike data analysis, data mining employs statistical or machine learning techniques to estimate, predict and model patterns in the target dataset (Olson, 2007). The most common applications of data mining methods are association learning, anomaly detection, cluster detection, classification, and regression.

3.1.1. Random forest

Random forests (RFs) are methods for regression, classification and related tasks that work by constructing a multitude of decision trees, and they belong to the category of ensemble learning methods. The method was first introduced by Ho (1995), who used the random subspace method to implement the stochastic discrimination approach to classification proposed by Eugene Kleinberg (Barandiaran, 1998). An extension of the RF algorithm was later developed, and its name has been registered as a trademark (Breiman, 2001). Sun et al. (2018) proposed a new RF algorithm for classification based on cooperative game theory, in which the power of each feature is evaluated using the Banzhaf power index by traversing the possible coalitions of features. Chen et al. (2018) proposed an adaptive variable step method based on RFs, which accelerates the training process and reduces the number of information gain calculations. According to their results, the proposed approach is suitable for most decision tree-based models.

In this study, the optimum parameter settings of the RF model for batch size, maximum tree depth, number of decimal places, number of execution slots, number of features, number of iterations, and number of seeds are 100, 0, 2, 1, 0, 100 and 4, respectively.
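The idea behind an RF regressor can be sketched in a few lines. The following is a minimal pure-Python illustration and not the implementation used in this study: it assumes bootstrap resampling, a random subset of features at each split, and averaging of the tree outputs. The function names and default values are hypothetical and chosen only for the example.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations of the values from their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, max_depth, n_features, rng, depth=0):
    """Grow a regression tree; each split looks at a random subset of features."""
    if depth >= max_depth or len(set(targets)) == 1:
        return mean(targets)  # leaf node: predict the mean target
    best = None
    for j in rng.sample(range(len(rows[0])), n_features):
        # Candidate thresholds: every observed value except the largest,
        # so that both sides of the split are non-empty.
        for threshold in sorted({r[j] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[j] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[j] > threshold]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, threshold)
    if best is None:  # every sampled feature is constant
        return mean(targets)
    _, j, threshold = best
    left = [(r, t) for r, t in zip(rows, targets) if r[j] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[j] > threshold]
    return (j, threshold,
            build_tree([r for r, _ in left], [t for _, t in left],
                       max_depth, n_features, rng, depth + 1),
            build_tree([r for r, _ in right], [t for _, t in right],
                       max_depth, n_features, rng, depth + 1))


def predict_tree(node, row):
    """Follow the splits down to a leaf and return its value."""
    while isinstance(node, tuple):
        j, threshold, left, right = node
        node = left if row[j] <= threshold else right
    return node


def fit_forest(rows, targets, n_trees=100, max_depth=5, n_features=1, seed=1):
    """Train n_trees trees, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx],
                                 max_depth, n_features, rng))
    return forest


def predict_forest(forest, row):
    """Forest prediction: the average of the individual tree predictions."""
    return mean(predict_tree(tree, row) for tree in forest)
```

Each row here would hold the input variables of one measurement point and each target the corresponding dimensionless shear stress; averaging over many randomized trees is what reduces the variance of a single tree.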

3.1.2. M5P Model

The M5 algorithm was first introduced by Quinlan (1992), and M5P is an upgraded version of it. Model trees can handle large data sets effectively and are robust when dealing with missing data.

As shown in Fig. 3, which gives the schematic diagram of the M5 algorithm, the process first splits the input data (or input space) into subspaces.

Fig. 3. The schematic diagram of M5 algorithm.

Figure 3 shows the input space divided into subspaces S1, S2, and S3. The variation within each subspace is minimized using linear regression approaches. Next, to create a tree-like structure, the information from the previous step is used to build several nodes. In this step, the standard deviation reduction (SDR) is used to reduce the error at the node (Eq. 1) (Wang and Witten, 1996):

- Si= subspaces

- Sd= the standard deviation

An SDR lower than the expected error creates over-training problems. To overcome this problem, a smoothing process is needed that combines all the models from the root to the leaf, which establishes the final model of the leaf. Finally, the value obtained from the leaf is combined with the value predicted by the linear regression model for that node (Eq. 2) (Behnood et al., 2017):

- E’= Predicted value for the next higher node

- e = Predicted value for the current node

- a = Model prediction value

- n = Quantity of the training samples

- k = Constant value
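The two M5 steps described above can be illustrated numerically. The sketch below assumes the standard M5 forms of these expressions, SDR = sd(S) − Σ (|Si|/|S|)·sd(Si) for the split criterion and E' = (n·e + k·a)/(n + k) for the smoothing step. The function names and the default k = 15 are assumptions made for illustration, not settings reported in this paper.

```python
from statistics import pstdev


def sdr(parent, subsets):
    """Standard deviation reduction obtained by splitting `parent` into `subsets`."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)


def smooth(e, a, n, k=15.0):
    """Blend the prediction passed up from the current node (e) with the
    prediction of the linear model at the next higher node (a)."""
    return (n * e + k * a) / (n + k)


# A split that separates low and high target values removes all the spread.
print(sdr([1, 1, 5, 5], [[1, 1], [5, 5]]))
# With n equal to k, the smoothed value lies halfway between e and a.
print(smooth(e=10.0, a=20.0, n=15))
```

The candidate split with the largest SDR is the one chosen at a node, and the smoothing weight n/(n + k) gives more influence to nodes reached by many training samples.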

In this paper, the optimum parameter settings of the M5P model for batch size, number of decimal places, number of instances and number of seeds are 100, 0, 2, 4, and 3, respectively.

3.1.3. K-Star model (K*)

The K* algorithm, an instance-based learner and memory-based classifier, was presented by Cleary and Trigg (1995) in the proceedings of a machine learning conference. The distance metric of the K* technique is based on the concept of entropy, and the transformation of one instance into another is assumed to occur in a “random walk away” manner. Classification is performed by summing these transformation probabilities. In general, there is little evidence on how K* handles noisy classes and attributes, or attributes with mixed values in a dataset (Tejera Hernández, 2015).

In order to specify the K* technique, we have (Eq. 3 to Eq. 5):

It satisfies Eq. 6 as a consequence:

Eq. 7 defines the probability function P*:

The following properties have been satisfied by P*:

Finally, the K* function will be defined as Eq. 9:

In this study, the optimum parameter settings of the KStar model for batch size, global blend, and minimum number of places are 100, 1, and 1, respectively.

3.1.4. Additive regression method

Additive regression is a nonparametric regression method first introduced by Friedman and Stuetzle (1981). It is an essential part of the alternating conditional expectations algorithm. The method employs a one-dimensional smoother to build a restricted class of nonparametric regression models (Eq. 10). As a result, it is less affected by the curse of dimensionality than a p-dimensional smoother. It is also more flexible than a standard linear model, while remaining more interpretable than a general regression surface. Multicollinearity, overfitting and model selection are known difficulties when applying an additive regression method.

Considering a data set of n units (i = 1 to n), in which xi denotes the predictors and yi the outcome value, the additive model is given by Eq. 10:

The additive regression model can be fitted using the backfitting algorithm presented by Buja et al. (1989). Yoshida (2018) employed a semiparametric method to explore the structure of additive regression models.

The optimum parameter settings of the AR model for number of iterations and shrinkage are 12 and 1, respectively.
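The backfitting idea mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the study: the one-dimensional smoother is replaced by a crude group-mean smoother that averages residuals over points sharing the same predictor value, and the names `group_mean_smoother` and `backfit` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_mean_smoother(x, r):
    """Crude 1-D smoother: average the residuals r over points with equal x."""
    groups = defaultdict(list)
    for xi, ri in zip(x, r):
        groups[xi].append(ri)
    means = {xi: mean(v) for xi, v in groups.items()}
    return [means[xi] for xi in x]


def backfit(X, y, n_iter=10):
    """Fit y ~ alpha + sum_j f_j(x_j); returns alpha and the fitted f_j values."""
    n, p = len(X), len(X[0])
    alpha = mean(y)
    f = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove the intercept and all components but f_j.
            partial = [y[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                       for i in range(n)]
            fj = group_mean_smoother([row[j] for row in X], partial)
            centre = mean(fj)
            f[j] = [v - centre for v in fj]  # keep each component centred at zero
    return alpha, f


# Example: an additive target on a small grid of two predictors.
X = [[a, b] for a in range(4) for b in range(3)]
y = [a * a + 2.0 * b for a, b in X]
alpha, f = backfit(X, y)
fitted = [alpha + f[0][i] + f[1][i] for i in range(len(y))]
```

Each pass updates one component function at a time against the residual left by the others, which is why only one-dimensional smoothing is ever required.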

3.1.5. Random Committee

Random committee belongs to the category of committee machines, which work on the basis of an ensemble of predictors, e.g. ANNs or decision trees (Hwang and Hu, 2001). It is thus an ensemble classifier whose members are trained as individual classifiers. For a new input, the committee prediction is generated by combining the estimates of the individual committee members. The random committee functions as a meta-learning technique that uses a number of randomized classifiers, and the average of the estimates of these classifiers provides the final result.

Hwang and Hu (2001) documented the concept of the Random Committee. They described the architecture and algorithm, in which several base classifiers are constructed using different random seeds, and the average of the estimates generated by the base classifiers forms the final prediction.

Fig. 4. The ME architecture. The outputs of the gating network modulate the outputs of the

Let x be the input vector and y the output vector, so that f(x) is the function and P(y|f(x)) the conditional density. Consider a set of NQ test points, let fq = (fq1, ..., fqNQ) be the vector of the corresponding unknown response variables, split the input data set into M data sets D = {D1, ..., DM}, and consider the data that are not in Di; in general, we then have:

This can be approximated as in Eq. 13:

Combining Bayes’ formula with this approximation gives Eq. 14:

The approximate predictive density is then calculated as in Eq. 15:

In this case, Eˆ and are estimated based on

With

The above combination of the committee members' predictions resembles the Bayesian committee machine (Hwang and Hu, 2001).

The optimum parameter settings of the RC model for batch size, number of decimal places, number of execution slots, number of iterations, and number of seeds are 100, 1, 1, 15 and 1, respectively.
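The committee principle described in this section, several randomized base learners built with different seeds and averaged, can be sketched as below. The base learner here is a hypothetical randomized regression stump chosen only to keep the example short; the study does not state that this base learner was used. The defaults of 15 members and seed 1 simply mirror the settings quoted above.

```python
import random
from statistics import mean


def random_stump(rows, targets, rng):
    """Randomized base learner: split on a random feature at a random threshold."""
    j = rng.randrange(len(rows[0]))
    threshold = rng.choice([r[j] for r in rows])
    left = [t for r, t in zip(rows, targets) if r[j] <= threshold]
    right = [t for r, t in zip(rows, targets) if r[j] > threshold]
    left_mean = mean(left)  # never empty: the threshold is an observed value
    right_mean = mean(right) if right else left_mean
    return lambda row: left_mean if row[j] <= threshold else right_mean


def random_committee(rows, targets, n_members=15, seed=1, base=random_stump):
    """Train n_members base learners, each with its own random seed."""
    seeds = random.Random(seed)
    return [base(rows, targets, random.Random(seeds.randrange(2 ** 32)))
            for _ in range(n_members)]


def committee_predict(members, row):
    """Final prediction: the average of the members' estimates."""
    return mean(member(row) for member in members)
```

Because every member sees the same data and differs only in its random seed, the spread between members comes entirely from the randomization of the base learner.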

3.2. Analytical models

3.2.1. SKM Model

The Navier–Stokes equation for a fluid element in steady uniform flow can be written as:

stresses. Furthermore, are gravitational acceleration and fluid density, respectively.

An analytical solution of the Navier–Stokes equation for predicting the lateral variation of the depth-averaged velocity in compound channels was proposed by Shiono and Knight (1988). It accounts for the 3D flow through depth-integrated parameters, which simplifies its use, as follows:

where s is the channel side wall slope. The remaining symbols denote the local flow depth, the depth-averaged velocity, the dimensionless eddy viscosity, the Darcy–Weisbach friction factor (f) and the lateral coordinate (y), respectively. Shiono and Knight (1988) proposed an analytical solution, initially ignoring the secondary flow term on the right-hand side of Equation (19). They concluded that, even when the secondary current term is ignored, the velocity profile can be determined relatively accurately. However, when the bed friction, f, or the turbulent friction is increased, the relationship between the depth-averaged velocity and the bed shear stress may be compromised to the point that both profiles can no longer be predicted accurately at the same time.

Shiono and Knight (1991) proposed a secondary current model in order to improve the analytical results. From experimental results, they concluded that, within certain regions of the flow, the depth-averaged term on the right-hand side of differential Equation (19) varies linearly in the y-direction on the floodplains and in the main channel, so that its derivative can be replaced by a constant in the main channel and on the floodplains. Hence:

For a flat bed region, the differential Equation (21) may be written as follows:

According to Shiono and Knight (1991), the analytical solution of Equation (22) for a prismatic compound channel with a flat bed region and vertical side walls is expressed as follows:

At an interface between selected panels, different boundary conditions can be used to determine the unknown parameters A.

Having the depth-averaged velocity, the bed shear stress can be calculated as:

It should be noted that the SKM cannot model the shear stress distribution on the walls of rectangular compound channels.
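As a worked illustration of the last step, the sketch below converts a depth-averaged velocity into a bed shear stress. It assumes the usual Darcy–Weisbach relation used with the SKM, in which the bed shear stress equals (f/8) times the fluid density times the square of the depth-averaged velocity. The function name, the default water density of 1000 kg/m³ and the sample values are illustrative assumptions, not data from this study.

```python
def bed_shear_stress(f, u_d, rho=1000.0):
    """Bed shear stress (N/m^2) from the Darcy-Weisbach friction factor f,
    the depth-averaged velocity u_d (m/s) and the fluid density rho (kg/m^3)."""
    return (f / 8.0) * rho * u_d ** 2


# Hypothetical lateral profile of depth-averaged velocity across the bed (m/s).
velocities = [0.30, 0.40, 0.50]
profile = [bed_shear_stress(0.02, u) for u in velocities]
```

Because the stress scales with the square of the velocity, errors in the predicted velocity profile are amplified in the predicted bed shear stress.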

3.2.2. Shannon Model

Based on the Shannon entropy concept, Sterling and Knight (2002) developed equations to estimate the shear stress distribution in channels. They proposed equations for predicting the shear stress distribution along the wetted perimeter of circular channels without a flat bed. They also presented equations for predicting the shear stress distribution separately on the wall and bed of trapezoidal channels and of circular channels with sediment. Sheikh Khozani and Bonakdari (2016) used these models to estimate the shear stress distribution and compared them with other analytical models. The equations suggested by Sterling and Knight are as follows:

In these equations, the shear stresses are evaluated for the wall and the bed of the floodplain or main channel, respectively, using the corresponding wall and bed wetted perimeters, an offset taken as 5 mm in the study of Sterling and Knight (2002), and the Lagrange multipliers for the wall and bed of the compound channel subsections, which are calculated as:

These expressions involve the fluid density, the gravitational acceleration (g), the hydraulic radius (R) and the channel slope. The maximum shear stress was computed using the relations proposed by Knight et al. (1994). These equations have also been used in studies by other researchers such as Bonakdari et al. (2015), Sheikh and Bonakdari (2015), and Sheikh Khozani and Bonakdari (2018).

4. Model performance evaluation

According to Dawson et al. (2007), a single statistical criterion is not sufficient for evaluating a model. To investigate the performance of each model in estimating the shear stress distribution in compound channels, five commonly used criteria were utilized: the coefficient of determination (R²), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Nash-Sutcliffe Efficiency (NSE), and BIAS. These statistical indexes are calculated as:

In these expressions, the shear stress values predicted by the models are compared with the shear stress values observed in the laboratory; the mean values of the observed and predicted shear stresses are also used, and n is the number of samples.

These indexes were used by Sheikh Khozani et al. (2019) to investigate model performance in modeling apparent shear stress in compound channels.
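The five criteria named in this section can be computed as in the following sketch. It uses the textbook definitions, which are assumptions here because the paper's own expressions are not reproduced: R² is taken as the squared Pearson correlation between observed and predicted values, and BIAS as the mean of predicted minus observed values. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def rmse(obs, pred):
    """Root mean square error."""
    return sqrt(mean((p - o) ** 2 for o, p in zip(obs, pred)))


def mae(obs, pred):
    """Mean absolute error."""
    return mean(abs(p - o) for o, p in zip(obs, pred))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    mo = mean(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - residual / sum((o - mo) ** 2 for o in obs)


def bias(obs, pred):
    """Mean error, assuming the sign convention predicted minus observed."""
    return mean(p - o for o, p in zip(obs, pred))


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)
```

Under these definitions a prediction that is perfectly correlated with the observations but offset or scaled still has R² equal to 1, which is why RMSE, MAE, NSE and BIAS are reported alongside it.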

5. Results and discussion

5.1. Selection of the best statistical model

All five models were applied to shear stress distribution data measured in a straight rectangular compound channel. About 1812 data points were used in the modeling procedure, of which 70% were used for the training stage and 30% for the testing stage. The results of the testing stage are shown in Figure 5 as scatter plots and hydrographs. According to this figure, the Additive Regression model gave the worst predictions of the shear stress distribution, with an R² of 0.6745. As seen in Figure 5, the Additive Regression model predicted the same shear stress values at different y/P in each test. The hydrograph also shows that this model was not able to estimate the shear stress over the whole wetted perimeter.

The M5P and KStar models show broadly similar results. As seen in the hydrograph, these models are weak in predicting the maximum and minimum shear stresses on the walls and beds of the main channel and floodplains, but at other y/P they are more accurate than the Additive Regression model. The RC and RF models predict the maximum and minimum shear stress values better than the other models. The scatter plot of Figure 5 clearly shows that the RF model, with an R² of 0.9003, gave more precise results than the AR, KStar, M5P, and RC models. Therefore, the predictions of the RF model are compared with the two analytical models (the SKM and Shannon models) in the next section.
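The 70/30 partition of the data set mentioned above can be sketched as follows. Whether the records were shuffled before splitting is not stated in the text, so the random shuffle, the seed and the function name are assumptions made for illustration.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=1):
    """Shuffle the records and split them into training and testing subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Stand-in indices for the roughly 1812 measured shear stress values.
train, test = train_test_split(range(1812))
```

Fixing the seed makes the partition reproducible, so that all five models are trained and tested on the same subsets.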

Fig. 5. The results of predicted shear stress values by data mining models in the testing period

The statistical criteria for comparing the five data mining models are presented in Table 2. As seen in this table, the performance of the RF model is superior to that of the other models, with the lowest RMSE of 0.971. In addition, the AR model gave the worst estimates of the shear stress distribution in compound channels, with an RMSE of 0.1707. Based on Figure 5 and Table 2, the RF model was selected as the best of the models considered for obtaining the most accurate predictions of the shear stress distribution in compound channels.

Table 2 Statistical parameters in the comparison between the soft computing methods.

5.2. Comparison of the models

Five different data mining methods were investigated for estimating the shear stress distribution in a prismatic compound channel with a rectangular cross-section. The RF model performed better than the other models in all subsections of the compound channel. In this section, the performance of the RF model is compared with that of the Shannon and SKM models in predicting the shear stress distribution. Figure 6 compares the two analytical models with the RF model. As seen in Figures 6a and b, the SKM model predicts the shear stress on the bed of the main channel better than on the bed of the floodplains. The SKM model can estimate only the bed shear stress and is not able to predict wall shear stresses. According to Fig. 6, the SKM model overestimates the bed shear stress of the main channel and underestimates the bed shear stress of the floodplains. As the floodplain width increases, the accuracy of the SKM predictions for the bed of the main channel decreases.

On the other hand, for wider floodplains the shear stress predictions for the floodplain bed are more precise. In addition, as the floodplain width increases, the SKM model estimates the pattern of shear stress on the floodplain bed with higher accuracy, as seen in Figures 6e, f, g, and h. The Shannon model performs better than the SKM model. In all subsections the Shannon model overestimates the shear stress, but it performs better for wall shear stress than for bed shear stress. When the floodplain width is 100 mm, the performance of the Shannon model for the main channel bed shear stress is roughly the same as that of the SKM model. As the floodplain width increases, the results of the SKM model become weaker than those of the Shannon model. Of the three models, the RF model gives the best results, with the highest accuracy, as seen in Fig. 6. In addition to giving the most accurate predictions of shear stress around the whole wetted perimeter, the RF model estimates the pattern of the shear stress distribution very well. With the RF model, the shear stress can be estimated along the whole channel boundary using only hydraulic parameters of the channel such as y/L, whereas the Shannon entropy model requires the Lagrange multiplier to be computed and its results are not as accurate as those of the RF model. Moreover, the SKM model can estimate only the bed shear stress, and it requires the depth-averaged velocity to be calculated first, which makes computing the shear stress time-consuming.

Fig. 6. The shear stress distribution prediction in the compound channel by RF, Shannon and

The statistical results of the comparison between the RF, Shannon and SKM models are given in Table 3. Lower values of RMSE and MAE indicate better model performance. As mentioned before, the SKM model predicts only the bed shear stress of the floodplains and the main channel, so the SKM results in Table 3 cover only these predictions. According to this table, the RF model, with the lowest values of RMSE and MAE, gives the best estimates of the shear stress distribution in compound channels. The Shannon entropy model performs better than the SKM model in predicting shear stress values. The NSE value indicates model performance, which is graded as very good for 0.75 < NSE ≤ 1, good for 0.65 < NSE ≤ 0.75, satisfactory for 0.5 < NSE ≤ 0.65, acceptable for 0.4 < NSE ≤ 0.5, and unsatisfactory for NSE ≤ 0.4. As seen in Table 3, the NSE values obtained for the RF model are higher than 0.95; therefore, the RF model is graded as very good for estimating shear stress values. For OPC-100, OPC-200, OPC-300, and OPC-400, the RF model gives the most precise results, with RMSE of 0.0166, 0.0255, 0.0338, and 0.0518, respectively, compared with the Shannon and SKM models. Overall, based on Figure 6 and Table 3, the RF model is the most robust of the models considered in this study for estimating the shear stress distribution in compound channels. It is worth adding that R² and BIAS are used to assess how good regression models are, and that in some cases such models can overestimate (or underestimate) the training data. To overcome these issues, Bayesian methods can be used to improve the regression model (Vu-Bac et al. 2014, 2015, 2016).
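The NSE grading scale quoted above maps directly onto a small lookup function. The sketch below encodes exactly the intervals given in the text; the function name is hypothetical.

```python
def nse_grade(nse):
    """Return the performance grade for a Nash-Sutcliffe efficiency value."""
    if nse > 0.75:
        return "very good"
    if nse > 0.65:
        return "good"
    if nse > 0.5:
        return "satisfactory"
    if nse > 0.4:
        return "acceptable"
    return "unsatisfactory"


# An NSE above 0.95 falls in the top band of the scale.
grade = nse_grade(0.96)
```

Note that each upper bound is inclusive, so a value of exactly 0.75 is graded as good rather than very good.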

Table 3 Statistical parameters in the comparison between the RF, Shannon and SKM models.

6. Conclusion

In this research, the shear stress distribution in a compound channel was investigated. A series of experiments was performed in prismatic compound channels of simple rectangular cross-section with floodplain widths of 100 mm, 200 mm, 300 mm and 400 mm, using a flume at the University of Birmingham. The results were used with five different data mining methods, the AR, M5P, KStar, RC and RF models, to predict the shear stress distribution. The AR model, with an R² of 0.6745, was not able to estimate the shear stress accurately over the whole wetted perimeter. The M5P and KStar models did not give adequate results for the maximum and minimum shear stresses on the walls and beds of the main channel and floodplains; however, at other locations on the perimeter they were more accurate than the AR model. The maximum and minimum shear stress values are predicted better by the RC and RF models than by the other models. The RF model, with an R² of about 0.9, gives the most precise predictions among the data mining models. The Shannon and SKM analytical models were compared with the RF model. The SKM model can predict only the bed shear stress of the floodplains and the main channel, whereas the Shannon model can also predict wall shear stresses and does so more accurately than bed shear stresses. The accuracy of the SKM predictions for the main channel bed decreases as the floodplain width increases, while the shear stress predictions for the floodplain bed are more precise for wider floodplains. The results showed that the RF machine learning model has lower values of RMSE and MAE than the two well-known analytical models in predicting the shear stress distribution around the whole wetted perimeter. The Random Forest technique can estimate the shear stress along the whole channel boundary using only hydraulic parameters of the channel, whereas the Lagrange multiplier and the depth-averaged velocity are needed in the Shannon entropy and SKM models, respectively, and their results are not as accurate as those of the RF model.

References

Anitescu C, Atroshchenko E, Alajlan N, Rabczuk T. Artificial Neural Network Methods for the Solution of Second Order Boundary Value Problems. Computers, Materials & Continua. 2019;59(1):345-359.

Azad A, Farzin S, Kashi H, Sanikhani H, Karami H, Kisi O. Prediction of river flow using hybrid neuro-fuzzy models. Arabian Journal of Geosciences. 2018;11:718.

Barandiaran I. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1998;20.

Behnood A, Behnood V, Gharehveran MM, Alyamac KE. Prediction of the compressive strength of normal and high-performance concretes using M5P model tree algorithm. Construction and Building Materials. 2017;142:199-207.

Bonakdari H, Sheikh Khozani Z, Zaji AH, Asadpour N. Evaluating the apparent shear stress in prismatic compound channels using the Genetic Algorithm based on Multi-Layer Perceptron: A comparative study. Applied Mathematics and Computation. 2018;338:400-411.

Bonakdari H, Sheikh Z, Tooshmalani M. Comparison between Shannon and Tsallis entropies for prediction of shear stress distribution in open channels. Stochastic Environmental Research and Risk Assessment. 2015;29(1):1-11.

Breiman L. Random forests. Machine Learning. 2001;45:5-32.

Buja A, Hastie T, Tibshirani R. Linear smoothers and additive models. The Annals of Statistics. 1989:453-510.

Chen M, Wang X, Feng B, Liu W. Structured random forest for label distribution learning. Neurocomputing. 2018;320:171-82.

Chiu C-L, Chiou J-D. Structure of 3-D flow in rectangular open channels. Journal of Hydraulic Engineering. 1986;112:1050-67.

Chiu C-L, Lin G-F. Computation of 3-D flow and shear in open channels. Journal of Hydraulic Engineering. 1983;109:1424-40.

Cleary JG, Trigg LE. K*: An instance-based learner using an entropic distance measure. Machine Learning Proceedings 1995: Elsevier; 1995. p. 108-14.

Dawson CW, Abrahart RJ, See LM. HydroTest: A web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts. Environmental Modeling & Software. 2007;22:1034-1052.

Flintham T, Carling P. Prediction of mean bed and wall boundary shear in uniform and compositely rough channels. International Conference on River Regime, Hydraulics Research Limited, Wallingford, Oxon, UK. 1988. p. 267-287.

Friedman JH, Stuetzle W. Projection pursuit regression. Journal of the American Statistical Association. 1981;76:817-23.

Genç O, Gonen B, Ardıçlıoğlu M. A comparative evaluation of shear stress modeling based on machine learning methods in small streams. Journal of Hydroinformatics. 2015;17(5):805-816.

Ghosh S, Roy N. Boundary shear distribution in open channel flow. Journal of Hydraulics Division. 1970;59:967-994.

Guo H, Zhuang X, Rabczuk T. A Deep Collocation Method for the Bending Analysis of Kirchhoff Plate. Computers, Materials & Continua. 2019;59(2):433-456.

Han J, Pei J, Kamber M. Data mining: concepts and techniques: Elsevier; 2011.

Ho TK. Random decision forests. Document Analysis and Recognition, 1995, Proceedings of the Third International Conference on: IEEE; 1995. p. 278-82.

Hwang J-N, Hu YH. Handbook of neural network signal processing: CRC Press; 2001. https://www7.in.tum.de/~trespvol/papers/combine_incl_proof.pdf.

Jahanpanah E, Khosravinia P, Sanikhani H, Kisi O. Estimation of discharge with free overfall in rectangular channel using artificial intelligence models. Flow Measurement and Instrumentation. 2019;67:118-130.

Khatua KK, Patra KC. Boundary shear stress distribution in compound open channel flow. ISH Journal of Hydraulic Engineering. 2007;13:39-54.

Khodashenas SR, Paquier A. A geometrical method for computing the distribution of boundary shear stress across irregular straight open channels. Journal of Hydraulic Research. 1999;37:381-8.

Khuntia JR, Devi K, Khatua KK. Boundary Shear Stress Distribution in Straight Compound

Channel Flow Using Artificial Neural Network. Journal of Hydrologic Engineering. 2018;

23(5), 04018014.

Knight DW, Hamed ME. Boundary shear in symmetrical compound channels. Journal of

Hydraulic Engineering. 1984;110:1412-30.

Knight DW, Yuen K, Alhamid A. Boundary shear stress distributions in open channel flow in Physical Mechanisms of mixing and Transport in the Environment, Ed. Beven, K. &

Chatwin, PC & Millbark, J. J Wiley. 1994.

Naik B, Khatua KK. Boundary shear stress distribution for a converging compound channel.

ISH Journal of Hydraulic Engineering. 2016;22:212-9.

Olson DL. Data mining in business services. Service Business. 2007;1:181-93.

Quinlan JR. Learning with continuous classes. 5th Australian joint conference on artificial

intelligence: Singapore; 1992. p. 343-8.

Rezaei B, Knight DW. Application of the Shiono and Knight Method in compound channels

with non-prismatic floodplains. Journal of Hydraulic Research. 2009;47:716-26.

Rezaei B, Knight DW. Overbank flow in compound channels with nonprismatic floodplains.

Journal of Hydraulic Engineering. 2010;137:815-24.

Rezaei, Bahram. “Overbank flow in compound channels with prismatic and non-prismatic

floodplains." PhD diss., University of Birmingham, 2006.

Sanikhani H, Kisi O, Maroufpoor E, Yaseen ZM. Temperature-based modeling of reference

evapotranspiration using several artificial intelligence models: application of different

modeling scenarios. Theoretical and Applied Climatology. 2019;135:449-462.

Sheikh Khozani Z, Bonakdari H, Ebtehaj I. An analysis of shear stress distribution in circular channels with sediment deposition based on Gene Expression Programming. International

Journal of Sediment Research. 2017a;32:575-84.

Sheikh Khozani Z, Bonakdari H, Ebtehaj I. An expert system for predicting shear stress

distribution in circular open channels using gene expression programming.” Water Science

and Engineering. 2018a;11(2):167-176.

Sheikh Khozani Z, Bonakdari H, Zaji AH. Efficient shear stress distribution detection in

circular channels using Extreme Learning Machines and the M5 model tree algorithm. Urban

Water Journal. 2017b;14:999-1006.

Sheikh Khozani Z, Bonakdari H, Zaji AH. Estimating shear stress in a rectangular channel

with rough boundaries using an optimized SVM method. Neural Computing and

Applications.2018b;30(8):2555-2567.

Sheikh Khozani Z, Bonakdari H, Zaji AH. Estimating the shear stress distribution in circular

channels based on the randomized neural network technique. Applied Soft Computing.

2017c;58:441-8.

Sheikh Khozani Z, Bonakdari H. A comparison of five different models in predicting the

shear stress distribution in straight compound channels. Scientia Iranica Transaction A, Civil

Engineering. 2016;23:2536.

Sheikh Khozani Z, Bonakdari, H. Formulating the shear stress distribution in circular open

channels based on the Renyi entropy. Physica A: Statistical Mechanics and its Applications,

2018;490:114-126.

Sheikh Khozani Z, Khosravi Kh, Pham BT, Klove B, Wan Mohtar WHM, Yaseen ZM.

Determination of compound channel apparent shear stress: Application of novel data mining

models. Hydroinformatics, 2019, https://doi.org/10.2166/hydro.2019.037

Sheikh Z, Bonakdari H. Prediction of boundary shear stress in circular and trapezoidal

channels with entropy concept. Urban Water Journal, 13(6), 629–636.

Shiono K, Knight D. Turbulent open-channel flows with variable depth across the channel.

Journal of Fluid Mechanics. 1991; 22: 617-646.

Shiono K, Knight D. Two-dimensional analytical solution for a compound channel. Proc, 3rd Int Symp on refined flow modeling and turbulence measurements1988. p. 503-10.

Sterling M, Knight D. An attempt at using the entropy approach to predict the transverse

distribution of boundary shear stress in open channel flow. Stochastic Environmental

Research and Risk Assessment. 2002;16:127-42.

Sun J, Zhong G, Huang K, Dong J. Banzhaf random forests: Cooperative game theory based

random forests with consistency. Neural Networks. 2018;106:20-9.

Tejera Hernández DC. An Experimental Study of K* Algorithm. International Journal of

Information Engineering & Electronic Business. 2015;7.

Tominaga A, Nezu I, Ezaki K, Nakagawa H. Three-dimensional turbulent structure in straight open channel flows. Journal of hydraulic research. 1989;27:149-73.

Vu-Bac N, Lahmer T, Zhang Y, Zhuang X, Rabczuk T. Stochastic predictions of interfacial

characteristic of polymeric nanocomposites (PNCs). Composites Part B: Engineering.

2014;59:80-95.

Vu-Bac N, Raee R, Zhuang X, Lahmer T, Rabczuk T. Uncertainty quantication for multiscale modeling of polymer nanocomposites with correlated parameters. Composites Part B:

Engineering. 2015;68:446-464.

Vu-Bac, N., Lahmer, T., Zhuang, X., Nguyen-Thoi, T., & Rabczuk, T. (2016). A software

framework for probabilistic sensitivity analysis for computationally expensive models.

Advances in Engineering Software. 2016;100:19-31.

Wang Y, Witten IH. Induction of model trees for predicting continuous classes. 1996.

Witten IH, Frank E, Hall MA, Pal CJ. Data Mining: Practical machine learning tools and

techniques: Morgan Kaufmann; 2016.

Yang K, Nie R, Liu X, Cao S. Modeling depth-averaged velocity and boundary shear stress

in rectangular compound channels with secondary flows. Journal of Hydraulic Engineering.

2012;139:76-83.

Yang S-Q, Lim S-Y. Boundary shear stress distributions in trapezoidal channels. Journal of

Hydraulic Research. 2005;43:98-102.

Yoshida T. Semiparametric method for model structure discovery in additive regression

models. Econometrics and Statistics. 2018;5:124-36.