Comprehensive performance analysis of training functions in flow prediction model using artificial neural network

Higher Himalayan catchments are often poorly monitored for hydrological activities involving flood flow prediction for the safety of riverside communities and the successful operation of hydropower projects. This study aimed to evaluate the comparative performance of artificial neural network (ANN) based flow prediction models using 10 years of daily river flow data from the Kaligandaki catchment at Kotagaun, Nepal, a snow-fed catchment in the Himalayan region. The flow prediction models were trained and tested at a hydrological station using the previous 3 days' river flow data to predict the flow 1 day ahead. Eight different training functions were employed in an ANN model for a comprehensive statistical assessment of the accuracy and precision of each training function. The principal result of this study is a comprehensive comparison of the performance of the various training functions and the identification of the most efficient one for the study case. Among the training functions investigated, the Levenberg-Marquardt backpropagation function exhibits the best performance, with Nash-Sutcliffe efficiency, root mean square error and mean absolute error values of 0.866, 209.578 and 75.422, respectively. This study provides a fundamental basis for accurate flow prediction in topographically challenged catchments where hydrological monitoring and data collection may be limited. In particular, this model will help to improve early warning systems, hydrological planning, and the safety of riverside communities in the Himalayan region.


INTRODUCTION
In modern times, the exploitation of river water and its energy for human needs has become widespread (White, 1943). Researchers are constantly focusing on the wise use of river water with maximum utilization while maintaining a healthy river ecosystem (Suwal et al., 2020; Yuqin et al., 2019; Vugteveen et al., 2006; Elosegi and Sabater, 2013; Ekka et al., 2020; Joshi et al., 2018; Boulton, 2000). Due to the limitation of the available water resources, efficient management and optimized use of water resources is imperative (Simonovic, 2012). While optimizing river water use, prediction of the future flow characteristics of water bodies becomes crucial. Reliable flow forecasting not only ensures effective management of water resources and supports sustainable environmental management practices, but also protects communities from the impact of floods and other water hazards by establishing early warning systems (Alfieri et al., 2012). A plethora of research has been conducted to develop tools and methodologies for precise flow forecasting (Firat, 2008; Adamowski, 2008; Nash and Sutcliffe, 1970; Yaseen et al., 2019; Shamseldin, 2010; Kasiviswanathan and Sudheer, 2013; Ahani et al., 2018). However, due to the vastly diverse characteristics of river basins all over the world, developing a reliable common methodology applicable to all types of river basins is highly challenging (Gharib and Davies, 2021; Hapuarachchi et al., 2011; Pagano et al., 2014), especially in the context of Nepal, which is bestowed with plentiful water bodies in a highly varied geographical and climatic region within a very small area, and where the water basins are poorly monitored due to the limitations of hydrological data measurement and geographical extremities. This study applies an artificial neural network (ANN) to predict 1-day-ahead river discharge in the Kaligandaki River, using 8 different training functions. The study will provide fundamental knowledge for river discharge prediction in Nepal, which can assist in water resource management and hazard management practices in the country. The study further investigates the performance of different training functions employed in an ANN model through a detailed statistical analysis of each training function.
Flow forecasting is a highly challenging application, as it involves prediction of natural future events that are governed by a number of parameters interacting in a complex non-linear fashion (Steere et al., 2000; Jain et al., 2018). Several machine-learning techniques are gaining attention for flow forecasting due to their constant improvement and efficacy in flow prediction precision (Mosavi et al., 2018; Dodangeh et al., 2020; Rathod et al., 2023). Recent studies have applied artificial intelligence for enhanced prediction of various flow characteristics (Pandey et al., 2022, 2023; Shivashankar et al., 2022). Dodangeh et al. (2020) integrated multi-time resampling, random subsampling and bootstrapping algorithms into machine-learning models and obtained improved results relative to traditional models. Plumb et al. (2005) employed four different training functions in three ANN packages and observed that all models had high accuracy and no model outperformed the others in all categories of analyses. Khan et al. (2019) carried out a comparative analysis of Levenberg-Marquardt, Bayesian regularization and scaled conjugate gradient based ANN models and showed that the Bayesian regularization model performed better than the other algorithms in terms of fitness, regression value, mean square error and number of epochs. Yonaba et al. (2010) compared three different activation functions for multi-step-ahead streamflow forecasting, and their results demonstrated that the tangent sigmoid activation function had the best predictive ability for their study case. Dash et al. (2010) developed a hybrid neural network model using an ANN in conjunction with a genetic algorithm for prediction of groundwater levels in the Mahanadi River basin of Orissa State, India. They employed three training functions, and the simulations suggested that the Bayesian regularization model was the most efficient among those tested. Afzaal et al. (2019) estimated groundwater levels for two watersheds in Prince Edward Island, Canada, with deep learning techniques and ANN models, and found that the convolutional neural network outperformed the other models. Aqil et al. (2007) developed three adaptive techniques to study the behaviour of ANN and neuro-fuzzy systems in modelling of daily and hourly runoff; their study revealed that the neuro-fuzzy system performed better than the other models. Bui et al. (2012) developed two models for landslide susceptibility assessment using Bayesian regularization and Levenberg-Marquardt techniques in ANN and found that both models produced highly accurate results, with the former slightly superior to the latter.
The focus of previous research has primarily been on two training functions: Levenberg-Marquardt and Bayesian regularization. However, a comprehensive analysis of other training functions for large hydrological time-series data has not been attempted. This study uses 10 years of daily river flow data from Kotagaun in the Kaligandaki basin to conduct a thorough analysis of 8 different training functions for predicting 1-day-ahead discharge. The study also includes detailed analyses using various statistical parameters. As a result, this research provides significant insights into selecting the best training function for hydrological data prediction in a Himalayan catchment without relying on meteorological data.

Study area and data collection
This study focuses on flow forecasting in the Kaligandaki River basin at Kotagaun, one of the major catchments in Nepal with diverse geographic and climatic characteristics (Fig. 1). The Kaligandaki River originates in the Mustang region of the Himalayas in Nepal. It has a length of approximately 630 km, ultimately joining the Trishuli River at Devghat in central Nepal, and has a catchment area of approximately 11 830 km². It is one of the major tributaries of the Gandaki River, a significant river system in Nepal. The basin is characterized by high mountain peaks, with elevations ranging from 202 to 8 147 m amsl. The basin holds a total hydropower potential of approximately 2 108.97 MW (Bagale, 2017). Currently, the only hydropower project operational in this area is Kaligandaki A, which generates 144 MW of electricity. In the current study, time-series data of 10 years, from 2001 to 2010, were utilized in the model. The first 7 years of data were used for training and the last 3 years for testing, i.e. a 70:30 ratio, which is considered ideal for hydrological time-series data in ANN models (Khosravi et al., 2020; Lei et al., 2021; Nguyen et al., 2021; Abraham, 2002).

Artificial neural network
An artificial neural network (ANN) is a machine-learning algorithm inspired by the structure and functioning of biological neurons in human brains. It consists of a collection of interconnected nodes or artificial neurons that collaboratively process and analyse complex data inputs. The concept of ANN was initially proposed by Warren McCulloch and Walter Pitts, who developed the McCulloch-Pitts neuron, which is considered the basic building block of many neural network designs (Abraham, 2002). Subsequently, Frank Rosenblatt (1958) introduced a simple neural network model that could learn to classify patterns into different categories (Rosenblatt, 1958). Since then, several researchers have developed improved ANN models for a variety of applications (Yaseen et al., 2019; Shamseldin, 2010; Zainuddin et al., 2019; Khashei and Bijari, 2011; Zhang et al., 2020; Khashei et al., 2012; Jahangir et al., 2019).
The basic building block of an ANN is the artificial neuron, which receives one or more input signals, performs a mathematical operation on them, and produces an output signal that is transmitted to other neurons in the network. The connections between the neurons are represented by weights, which are adjusted during the training process to optimize the network's performance on a specific task. Figure 2 illustrates the basic architecture of an ANN.
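The computation performed by a single artificial neuron, as described above, can be sketched in a few lines. The Python example below uses made-up weights, bias, and inputs (none of them from the study) and a logistic sigmoid activation:

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a logistic sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # output bounded in (0, 1)

# Illustrative only: a neuron with three inputs (e.g., three scaled lagged flows)
out = neuron([0.2, 0.5, 0.1], weights=[0.4, -0.3, 0.9], bias=0.05)
print(round(out, 4))
```

Training adjusts the weights and biases of many such neurons; the training functions compared in this study differ only in how those updates are computed at each iteration.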

Statistical evaluation
The flow prediction models were prepared for the Kaligandaki catchment using 8 different training functions. The obtained results were analysed using statistical indices, namely NSE, RMSE, and MAE. NSE measures how well the predictions match the observed data relative to a model that simply predicts the mean of the observed data. In NSE, a value of 1 indicates a perfect match, a value of 0 indicates performance equal to predicting the observed mean, and a negative value indicates that the predicted values are worse than the mean of the observed data. RMSE is calculated by taking the square root of the average of the squared differences between the model predictions and the corresponding observed values. A lower RMSE value signifies better performance in predicting the data, while a larger RMSE value indicates higher discrepancies between the predicted and observed values. MAE represents the average absolute deviation of the predicted values from the observed data. Similar to RMSE, a lower MAE value indicates better model performance, while a larger value indicates higher deviation of the predicted values from the observed values. Table 1 presents the range and the mathematical expression of each of the statistical tools employed in this study.
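The three indices can be sketched directly from the definitions above. The following Python snippet is a minimal illustration using made-up values, not the study's data:

```python
import math

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - sum((o-p)^2) / sum((o-mean(o))^2)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def rmse(obs, pred):
    """Root mean square error between observed and predicted series."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error between observed and predicted series."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

# Illustrative discharge-like values (not measured data)
obs = [120.0, 150.0, 300.0, 900.0, 400.0]
pred = [110.0, 160.0, 280.0, 850.0, 430.0]
print(nse(obs, pred), rmse(obs, pred), mae(obs, pred))
```

Note that a model that always predicts the observed mean scores NSE = 0, which is the baseline interpretation used in the text above.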

Comparison with observed data
Figure 3 illustrates the time-series of the predicted and observed data for each model and their respective scatter plots. The models demonstrate good performance in data prediction, with R² values ranging from 0.806 to 0.866, indicating a reasonable match between the predicted and observed values. However, for each model the regression line was observed to lie slightly below the 1:1 line, indicating a tendency to underestimate. This underestimation is particularly dominant for all models during the monsoon season (June, July, August and September), when the discharge is high with fluctuating values. Conversely, for the RB, CGBPR and OSSB models, a slight overestimate was observed in the dry season, as depicted in the time-series shown in Fig. 3. Among the 8 models, the most accurate predictions were achieved by the LMB and BRB models, with R² values of 0.866 and 0.860, respectively. This result is comparable to that obtained by Heng et al. (2022), who demonstrated the superior prediction capability of LMB and BRB models. The LMB model combines the gradient descent method with the Gauss-Newton method, resulting in higher convergence rates (Sapna et al., 2012; Singh et al., 2007). Both LMB and BRB are effective in handling noisy data (Mahapatra and Sood, 2012; Kayri, 2016; Payal et al., 2013; Wali and Tyagi, 2020; Jazayeri et al., 2016), contributing to their superior performance compared to the other models. On the other hand, the RB and CGBPR models exhibit the least accurate predictions among the 8 models, with R² values of 0.806 and 0.814, respectively.

Deviation of predicted data
The outputs from the 8 models were analysed to evaluate their respective prediction accuracy. The deviation of the predicted from the observed data was plotted, along with a box-and-whisker plot of the deviation values for all models (Fig. 4). All models exhibited a similar deviation pattern with slight variations. Deviations were larger for all models during the monsoon season of 2009, mainly due to the high level of fluctuation and the presence of noisy data during that period. Additionally, box-and-whisker plots for the observed and predicted data were generated, as shown in Fig. 5. The median values for all models were found to be similar to that of the observed data, except for the RB and CGBPR models, which have slightly higher medians of 266.8 and 223.6, respectively, as shown in Table 2. Moreover, the difference between the 3rd quartile and 1st quartile values closely resembles that of the observed values, except for the RB and CGBPR models, with values of 344.8 and 264.6, respectively, which indicates that the output values for these models are more concentrated in the middle 50%, having lower variability and an inability to predict fluctuating data correctly.
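The box-and-whisker quantities referred to above, the median and the interquartile range (Q3 minus Q1), can be computed as sketched below; the deviation series here is hypothetical, chosen only to show the calculation:

```python
import statistics

def box_summary(values):
    """Median and interquartile range (Q3 - Q1), the core quantities
    of a box-and-whisker plot."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"median": q2, "iqr": q3 - q1}

# Hypothetical deviations (predicted minus observed), not the study's data
dev = [-30.0, -10.0, 0.0, 5.0, 12.0, 40.0, 80.0]
print(box_summary(dev))
```

A narrower IQR for a model's predictions than for the observations is exactly the symptom noted above for RB and CGBPR: the outputs bunch in the middle 50% and fail to reproduce the extremes.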

Performance analysis
The Taylor diagram serves as an excellent tool for assessing the level of agreement between the observed data and the predicted data from various models (Apaydin et al., 2021; Ali Ghorbani et al., 2018; Reddy et al., 2021). In this study, the BRB and LMB models gave SD values closest to the observed value, at 551.5 and 547.8, respectively. Similarly, the highest values of CC were observed for the LMB and BRB models, with values of 0.866 and 0.862, respectively. Therefore, the Taylor diagram confirms the LMB model as having the highest accuracy of prediction, followed by the BRB model.
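As a sketch of what the Taylor diagram summarizes, the snippet below computes the correlation coefficient (CC), the standard deviations, and the centered RMS difference for two illustrative series. The diagram's geometry rests on the law-of-cosines identity E'² = σo² + σp² − 2·σo·σp·CC, which the snippet verifies numerically; the series are invented, not the study's data:

```python
import math

def taylor_stats(obs, pred):
    """CC, standard deviations, and centered RMS difference -- the three
    quantities a Taylor diagram displays (population statistics)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    cc = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / (n * so * sp)
    # Centered RMSD: differences of anomalies about each series' own mean
    crmsd = math.sqrt(sum(((p - mp) - (o - mo)) ** 2
                          for o, p in zip(obs, pred)) / n)
    return cc, so, sp, crmsd

obs = [1.0, 2.0, 4.0, 3.0, 5.0]
pred = [1.5, 2.5, 3.5, 3.0, 4.5]
cc, so, sp, crmsd = taylor_stats(obs, pred)
# Law-of-cosines identity behind the Taylor diagram
assert abs(crmsd ** 2 - (so ** 2 + sp ** 2 - 2 * so * sp * cc)) < 1e-9
```

This identity is why a single point on the diagram encodes all three statistics at once: SD as radius, CC as angle, and centered RMSD as the distance from the observation point.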

Statistical analysis
In this study, we conducted an in-depth analysis utilizing three prominent statistical metrics commonly employed in hydrological data assessment: NSE, RMSE, and MAE. Our objective was to assess and compare the predictive capabilities of each of the 8 training function models.
For NSE, a value of 1 indicates perfect prediction by the model. In this study, the highest NSE value was obtained for the LMB model, followed by the BRB and SCGB models, as presented in Table 3 and Fig. 7. The RB model, followed by the CGBPR and CGBFR models, showed the worst performance based on NSE values.

CONCLUSION
In conclusion, this study presents an analysis of 8 different ANN training functions for predicting the daily river discharge of the Kaligandaki River at Kotagaun. It highlights the effective application of these models in a remotely situated basin with scarce meteorological data. The LMB model demonstrated the highest accuracy in flow prediction, as evidenced by its high NSE value of 0.866 and optimal SD, RMSE, and MAE values. This was closely followed by the BRB model, which also showed commendable performance. The superiority of the LMB and BRB models is attributed to their robust handling of noisy data, underpinned by LMB's integration of gradient descent and Gauss-Newton methods, ensuring effective convergence (Sapna et al., 2012; Singh et al., 2007; Mahapatra and Sood, 2012). These findings substantiate the LMB model's prominence as the most suitable training function for the specified area and input parameters, offering significant implications for similar hydrological modelling endeavours.
Aggarwal and Kumar (2015) developed ANN models using various training functions to predict hourly temperature for 24 hours, concluded that training functions are among the most important parameters of an ANN model, and found that the Levenberg-Marquardt backpropagation training function outperformed the other training functions for their model. Similarly, Tabbussum and Dar (2020) developed a flood forecasting model using an ANN algorithm with five different training functions and concluded that the Levenberg-Marquardt model performed the best among those tested.

Figure 1. Catchment characteristics of the study area

Training functions for ANN
Training functions are vital parameters in an ANN model, playing a significant role in determining the accuracy and precision of the predicted output. A training function serves as a tool within the ANN framework, encompassing a specific procedure or algorithm to train the network by iteratively updating its weights and biases based on the provided input and desired output data. The training function continues its iterative process until a specified stopping condition is met, continuously enhancing the predictive capacity of the ANN model. Thus, training functions are the focal parameters that determine the performance of the model (Aggarwal and Kumar, 2015). In this study, we explore the predictive capabilities of 8 well-established training functions for the flow prediction model of the Kaligandaki River at Kotagaun. By systematically analysing and comparing the results obtained, we aim to uncover valuable insights into the efficacy of various training functions and their impact on overall model performance. The 8 training functions under investigation were as follows:
1. Levenberg-Marquardt backpropagation (LMB)
2. Resilient backpropagation (RB)
3. Conjugate gradient backpropagation with Powell-Beale restarts (CGBPB)
4. Conjugate gradient backpropagation with Fletcher-Reeves updates (CGBFR)
5. Conjugate gradient backpropagation with Polak-Ribiere updates (CGBPR)
6. Bayesian regularization backpropagation (BRB)
7. Scaled conjugate gradient backpropagation (SCGB)
8. One-step secant backpropagation (OSSB)

Model setup
The ANN model was implemented in MATLAB. For this study, daily discharge values measured on the Kaligandaki River at Kotagaun over a period of 10 years, from 2001 to 2010, were used. The dataset was divided in two, with the first 7 years of data used for model training and the remaining 3 years for model testing. Each model therefore has to handle a large volume of data, containing large fluctuations, noise, and even faulty values, and requiring complex non-linear relations to be established for effective prediction. ANNs are well-established tools for such predictions (Reddy et al., 2021). The input parameters for the model comprised the 3 previous days' discharge values and the day of prediction (day number out of 365). These inputs were used to train the ANN model and predict the flow. The model architecture consisted of 3 hidden layers with sigmoid activation functions, while the output layer employed a linear activation function. The study evaluates the performance of 8 different ANN training function models in predicting 1-day-ahead river discharge from the previous 3 days' data. After the predictions were made, 3 major statistical tools for hydrological data analysis, namely Nash-Sutcliffe efficiency (NSE), root mean square error (RMSE), and mean absolute error (MAE), were computed for all 8 training function models to investigate and analyse their respective predictive capabilities.
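The data preparation described above can be sketched as follows. This is a Python illustration (the study's models were implemented in MATLAB) with a toy flow series; `make_dataset` and `split_70_30` are hypothetical helper names, and the day-number feature is simplified (it ignores leap years):

```python
def make_dataset(flows):
    """Build (input, target) pairs: the previous 3 days' flow plus a
    day-number feature predict the next day's flow, mirroring the
    input scheme described in the text."""
    X, y = [], []
    for t in range(3, len(flows)):
        day_number = (t % 365) + 1  # simplified day-of-year feature
        X.append([flows[t - 3], flows[t - 2], flows[t - 1], day_number])
        y.append(flows[t])
    return X, y

def split_70_30(X, y):
    """Chronological 70:30 train/test split (no shuffling for time series)."""
    cut = int(len(X) * 0.7)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

flows = [100.0 + i for i in range(10)]  # toy series, not the study's data
X, y = make_dataset(flows)
(train_X, train_y), (test_X, test_y) = split_70_30(X, y)
```

Keeping the split chronological, as here, preserves the paper's design of training on the first 7 years and testing on the last 3, rather than mixing past and future samples.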

Figure 2. Schematic diagram of ANN architecture

Figure 5. Box-and-whisker plot for the observed and predicted data

The Taylor diagram was utilized to visually represent the accuracy and predictive ability of the 8 different training function models through comparison of the correlation coefficient (CC), standard deviation (SD), and root mean square difference (RMSD). The Taylor diagrams for the 8 models show similar properties, with all values converging to the same location, as shown in Fig. 6. Lower values of SD and RMSD, and higher values of CC, indicate a better match between the predicted model and the observed data. The lowest RMSD values were observed for the LMB and BRB models, at 209.6 and 212.8, respectively. The observed data have an SD value of 573.5, while the predictions of all models have a lower value than the observed data, indicating that the predicted values cluster more closely around the average value than the observed values do. The BRB and LMB models gave the closest SD values to the observed data.

Figure 6. Taylor diagram for the different training function models

Table 1. Mathematical expressions of the statistical indices, where X_i is the observed value, X̄ is the mean of the observed values, Y_i is the predicted value, and n is the total number of observations

Table 2. Statistical evaluation of predicted data by different training functions

Table 3. Statistical evaluation of different training functions

RMSE values indicate the average error of the predicted model. The RMSE values for the 8 models ranged from 209.578 to 252.212, with the lowest value for the LMB model and the highest for the RB model, as shown in Table 3 and Fig. 7. MAE indicates the average absolute deviation of the predicted values from the observed values. For the 8 training functions in this study, the CGBPB model had the lowest MAE, followed by the BRB, LMB, SCGB and CGBFR models, which have similar values. The CGBPR model had the highest MAE, followed by the RB and OSSB models.