Thesis on

COMPARATIVE STUDY OF FORECASTING MODELS BASED ON WEATHER PARAMETERS

Submitted for the award of DOCTOR OF PHILOSOPHY Degree in STATISTICS

SUBMITTED BY
Mohita Anand Sharma

UNDER THE SUPERVISION OF
Dr. J.B. Singh, Senior Professor, Statistics

SHOBHIT INSTITUTE OF ENGINEERING & TECHNOLOGY
A DEEMED-TO-BE UNIVERSITY
MODIPURAM, MEERUT - 250110 (INDIA)
2012

Shobhit University Campus: NH-58, Modipuram, Meerut 250110, INDIA; T.: +91-121-2575091/92; F.: +91-121-2575724; E.: [email protected]; U.: www.shobhituniversity.ac.in

University Certificate

This is to certify that the thesis, entitled "Comparative Study of Forecasting Models Based on Weather Parameters", which is being submitted by Ms. Mohita Anand Sharma for the award of the Degree of Doctor of Philosophy in Statistics to the Faculty of Humanities, Physical and Mathematical Sciences of Shobhit University, Meerut, a deemed-to-be University established by GOI u/s 3 of the UGC Act 1956, is a record of bonafide investigations and extensions of the problems carried out by her under my supervision and guidance. To the best of my knowledge, the matter embodied in this thesis is the original work of the candidate herself and has not been submitted for the award of any other degree or diploma of any University or Institution. It is further certified that she has worked with me for the required period in the Faculty of Humanities, Physical and Mathematical Sciences, Shobhit University, Meerut (U.P.), India.

Prof. J.B.
Singh (Supervisor)
Senior Professor, Statistics

DECLARATION

I hereby declare that the work presented in this thesis, entitled "Comparative Study of Forecasting Models Based on Weather Parameters", in fulfillment of the requirements for the award of the Degree of Doctor of Philosophy, submitted in the Faculty of Humanities, Physical and Mathematical Sciences at Shobhit University, Meerut, a deemed-to-be University established by GOI u/s 3 of the UGC Act 1956, is an authentic record of my own research work carried out under the supervision of Prof. J.B. Singh. I also declare that the work embodied in the present thesis (i) is my original work and has not been copied from any journal/thesis/book, and (ii) has not been submitted by me for any other degree or diploma of any University/Institution.

[Mohita Anand Sharma]

ACKNOWLEDGEMENT

Research is well versed with booms and hiccups, but despite these, one relishes the moment at the far end when one emerges from this entrenched routine. During this mercurial period one gets along with innumerable individuals to whom one owes something or more. It is my endeavor here to mention at least a few of the people who lent their support for the smooth accomplishment of my doctoral work.

Foremost, I would like to express my heartiest gratitude to Prof. J.B. Singh, my guide, for being the openhanded driving force behind my research activities. It was a great opportunity to complete my doctoral program under his scholarly and innovative guidance. I owe him for his efficient supervision, constant inspiration, encouragement and stimulating discussions throughout the research work.

I am thankful to the Chancellor Dr. Shobhit Kumar, Pro-Vice-Chancellor Kunwar Shekhar Vijendra, Vice-Chancellor Prof. R.P. Agarwal and Dean Prof. S.C. Agarwal for providing an amiable environment for conducting research in the University. I would like to acknowledge the significant contribution of Prof. Irene Sarkar and Prof.
Sanjay Sharma for their valuable guidance, encouragement and necessary support in the development of the models in this research work.

Sincere gratitude to my parents, Late Mohan Swaroop Anand and Late Nisha Anand, who instilled inspiration in my life, who brought me up, nurtured me and imparted to me the real virtues of humanity, empathy and kindness. They are my real motivators, who sacrificed in order to bring me to my present position and blessed me with their grace and affection. I wish they were alive to see me achieve this goal, but I am sure they must be blessing me from heaven. Special thanks to my loving and caring younger brother Surya Anand for his enormous all-round support from day one. I am profoundly thankful for the love and affection of my elder sister Sanchita, colleague Anshul Ujaliyan and best friends Mahima & Nida, who encouraged me constantly. Special thanks also to Ms. Rajni Nayyar and her family for devoting their valuable time to help me, directly and indirectly.

I am immensely thankful to the India Meteorological Department, Dehradun, for providing the valuable data for this research work.

This murky world is a difficult place to walk without the blessings and teachings of certain people. I am grateful to my teachers, who bestowed upon me the real lessons of life. I would like to thank all my family and friends who have directly or indirectly contributed to my research endeavor. Honest recognition to my in-laws Shree Madan Pal Sharma and Smt. Brijbala Sharma, along with the complete family, for encouraging me and taking care of my children. An endless word of thanks to my backbone, my lovable husband Mr. Prashant Kumar Sharma, as he touches each and every aspect of my life. The incredible love of my kids has boosted me with their charming smiles and activities throughout the day since birth.

Finally, but most importantly, I pay my reverence to GOD, preserver and protector, with whose grace I stand tall at present.
He showed me the right path in all ups and downs and moments of despair throughout the tenure of this work. I bow my head in complete submission before Him.

CONTENTS

Proem
List of Tables
List of Figures

Chapter 1: Introduction
1.1 Scope
1.2 Motivation
1.3 Overview
1.4 Contribution
1.5 Objectives
1.6 Study area

Chapter 2: Review of Literature
2.1 Probability Distribution
2.2 Multiple Regression (MR)
2.3 Autoregressive Integrated Moving Average (ARIMA)
2.4 Artificial Neural Network (ANN)
2.5 Comparison among MR, ARIMA and ANN

Chapter 3: Fitting of Probability Distribution
3.1 Introduction
3.2 Descriptive Statistics
3.3 Methodology
  3.3.1 Fitting the probability distribution
  3.3.2 Testing the goodness of fit
  3.3.3 Identification of best fit probability distribution
3.4 Probability Distribution Pattern
  3.4.1 Introduction
  3.4.2 Rainfall
  3.4.3 Maximum temperature
  3.4.4 Minimum temperature
  3.4.5 Relative humidity at 7 AM
  3.4.6 Relative humidity at 2 PM
  3.4.7 Pan evaporation
  3.4.8 Bright sunshine
3.5 Conclusion

Chapter 4: Weather Forecasting Models
4.1 Introduction
4.2 Correlation Analysis
4.3 Methodology for forecasting models
  4.3.1 Multiple Linear Regression
  4.3.2 Autoregressive Integrated Moving Average
  4.3.3 Artificial Neural Network
  4.3.4 Hybrid Approach
  4.3.5 Performance Evaluation Criteria
4.4 Development of forecasting models for weather parameters
  4.4.1 Introduction
  4.4.2 Rainfall
  4.4.3 Maximum temperature
  4.4.4 Minimum temperature
  4.4.5 Relative humidity at 7 AM
  4.4.6 Relative humidity at 2 PM
  4.4.7 Pan evaporation
  4.4.8 Bright sunshine
4.5 Comparison of prediction ability of forecasting models
  4.5.1 Introduction
  4.5.2 Rainfall
  4.5.3 Maximum temperature
  4.5.4 Minimum temperature
  4.5.5 Relative humidity at 7 AM
  4.5.6 Relative humidity at 2 PM
  4.5.7 Pan evaporation
  4.5.8 Bright sunshine
4.6 Conclusion

Chapter 5: Identification of Precise Weather Forecasting Model
5.1 Introduction
5.2 Validation of weather forecasting model
  5.2.1 Rainfall
  5.2.2 Maximum temperature
  5.2.3 Minimum temperature
  5.2.4 Relative humidity at 7 AM
  5.2.5 Relative humidity at 2 PM
  5.2.6 Pan evaporation
  5.2.7 Bright sunshine
5.3 Conclusion

Chapter 6: Summary and Future Scope
6.1 Summary
6.2 Future scope

Bibliography

Appendices:
(A) Procedure followed for Stepwise Regression Analysis
(B) Programs for developing ANN models
(C) Programs for developing Hybrid MLR_ANN models

List of Reprints (Attached) of Publications

PROEM

Many business and economic time series are non-stationary and contain trend and seasonal variation, so accurate forecasting of such series is an important task for effective decision-making in marketing, production, weather forecasting and many other sectors. Weather forecasting in particular is among the most crucial and challenging operational tasks undertaken worldwide. Many methodologies decompose a time series into linear and non-linear components, both of which must be forecast. The first chapter indicates the open problems and scope of the research work, delineating the motivations that drove us to identify the proposed methods; it further gives a concise summary of the contributions and states the objectives and the study area. The literature is briefly reviewed in the second chapter, covering recent progress in the prediction of weather parameters. The third chapter presents the descriptive statistics of the seasonal and weekly weather data considered for the monsoon months of the study, and presents the methodology for fitting a probability distribution to each weather parameter using goodness of fit tests.
The fourth chapter deals with the methodology of the traditional and the proposed hybrid forecasting models developed for each weather parameter, assessing their predictive ability graphically. The fifth chapter corroborates the precise weather forecasting model for each parameter by comparing the models through performance evaluation criteria. Finally, the last chapter winds up with the future scope of the work.

LIST OF TABLES

3.1 Summary of statistics for Rainfall.
3.2 Summary of statistics for Maximum Temperature.
3.3 Summary of statistics for Minimum Temperature.
3.4 Summary of statistics for Relative Humidity at 7 AM.
3.5 Summary of statistics for Relative Humidity at 2 PM.
3.6 Summary of statistics for Pan Evaporation.
3.7 Summary of statistics for Bright Sunshine.
3.8 Description of various probability distribution functions.
3.9(a) Distributions fitted for Rainfall data sets.
3.9(b) Distributions with highest score for Rainfall data sets.
3.9(c) Parameters of the distributions fitted for Rainfall data sets.
3.9(d) Best fit probability distribution for Rainfall.
3.10(a) Distributions fitted for Maximum Temperature data sets.
3.10(b) Distributions with highest score for Maximum Temperature data sets.
3.10(c) Parameters of the distributions fitted for Maximum Temperature data sets.
3.10(d) Best fit probability distribution for Maximum Temperature.
3.11(a) Distributions fitted by the tests for Minimum Temperature data sets.
3.11(b) Distributions with highest score for Minimum Temperature data sets.
3.11(c) Parameters of the distributions fitted for Minimum Temperature data sets.
3.11(d) Best fit probability distribution for Minimum Temperature.
3.12(a) Distributions fitted for Relative Humidity at 7 AM data sets.
3.12(b) Distributions with highest score for Relative Humidity at 7 AM data sets.
3.12(c) Parameters of the distributions fitted for Relative Humidity at 7 AM data sets.
3.12(d) Best fit probability distribution for Relative Humidity at 7 AM.
3.13(a) Distributions fitted for Relative Humidity at 2 PM data sets.
3.13(b) Distributions with highest score for Relative Humidity at 2 PM data sets.
3.13(c) Parameters of the distributions fitted for Relative Humidity at 2 PM data sets.
3.13(d) Best fit probability distribution for Relative Humidity at 2 PM.
3.14(a) Distributions fitted for Pan Evaporation data sets.
3.14(b) Distributions with highest score for Pan Evaporation data sets.
3.14(c) Parameters of the distributions fitted for Pan Evaporation data sets.
3.14(d) Best fit probability distribution for Pan Evaporation.
3.15(a) Distributions fitted for Bright Sunshine data sets.
3.15(b) Distributions with highest score for Bright Sunshine data sets.
3.15(c) Parameters of the distributions fitted for Bright Sunshine data sets.
3.15(d) Best fit probability distribution for Bright Sunshine.
4.1 Inter-correlation coefficients between weather parameters for the total data set.
5.1 Comparison of the performance of forecasting models for Rainfall.
5.2 Comparison of the performance of forecasting models for Maximum Temperature.
5.3 Comparison of the performance of forecasting models for Minimum Temperature.
5.4 Comparison of the performance of forecasting models for Relative Humidity at 7 AM.
5.5 Comparison of the performance of forecasting models for Relative Humidity at 2 PM.
5.6 Comparison of the performance of forecasting models for Pan Evaporation.
5.7 Comparison of the performance of forecasting models for Bright Sunshine.

LIST OF FIGURES

3.1 Mean, standard deviation and range of weekly Rainfall.
3.2 50 years weekly Rainfall for monsoon period.
3.3 Mean, standard deviation and range of weekly Maximum Temperature.
3.4 50 years weekly Maximum Temperature for monsoon period.
3.5 Mean, standard deviation and range of weekly Minimum Temperature.
3.6 50 years weekly Minimum Temperature for monsoon period.
3.7 Mean, standard deviation and range of weekly Relative Humidity at 7 AM.
3.8 50 years weekly Relative Humidity at 7 AM for monsoon period.
3.9 Mean, standard deviation and range of weekly Relative Humidity at 2 PM.
3.10 50 years weekly Relative Humidity at 2 PM for monsoon period.
3.11 Mean, standard deviation and range of weekly Pan Evaporation.
3.12 50 years weekly Pan Evaporation for monsoon period.
3.13 Mean, standard deviation and range of weekly Bright Sunshine.
3.14 50 years weekly Bright Sunshine for monsoon period.
4.1 An (m x n x o) artificial neural network structure, showing a multilayer perceptron.
4.2 Plots of autocorrelation and partial autocorrelation coefficients against time lags for the weekly average Rainfall parameter.
4.3 Artificial neural network structure for weekly average Rainfall prediction.
4.4 Mapping of the number of epochs obtained for the desired goal for the ANN model for Rainfall.
4.5 Hybrid MLR_ANN structure for weekly average Rainfall prediction.
4.6 Mapping of the number of epochs obtained for the desired goal for the hybrid MLR_ANN model for Rainfall.
4.7 Plots of autocorrelation and partial autocorrelation coefficients against time lags for the weekly average Maximum Temperature parameter.
4.8 Artificial neural network structure for weekly average Maximum Temperature prediction.
4.9 Mapping of the number of epochs obtained for the desired goal for the ANN model for Maximum Temperature.
4.10 Plots of autocorrelation and partial autocorrelation coefficients against time lags for the weekly average Minimum Temperature parameter.
4.11 Artificial neural network structure for weekly average Minimum Temperature prediction.
4.12 Mapping of the number of epochs obtained for the desired goal for the ANN model for Minimum Temperature.
4.13 Hybrid MLR_ANN structure for weekly average Minimum Temperature prediction.
4.14 Mapping of the number of epochs obtained for the desired goal for the hybrid MLR_ANN model for Minimum Temperature.
4.15 Plots of autocorrelation and partial autocorrelation coefficients against time lags for the weekly average Relative Humidity at 7 AM parameter.
4.16 Artificial neural network structure for weekly average Relative Humidity at 7 AM prediction.
4.17 Mapping of the number of epochs obtained for the desired goal for the ANN model for Relative Humidity at 7 AM.
4.18 Plots of autocorrelation and partial autocorrelation coefficients against time lags for the weekly average Relative Humidity at 2 PM parameter.
4.19 Artificial neural network structure for weekly average Relative Humidity at 2 PM prediction.
4.20 Mapping of the number of epochs obtained for the desired goal for the ANN model for Relative Humidity at 2 PM.
4.21 Plots of autocorrelation and partial autocorrelation coefficients against time lags for the weekly average Pan Evaporation parameter.
4.22 Artificial neural network structure for weekly average Pan Evaporation prediction.
4.23 Mapping of the number of epochs obtained for the desired goal for the ANN model for Pan Evaporation.
4.24 Hybrid MLR_ANN structure for weekly average Pan Evaporation prediction.
4.25 Mapping of the number of epochs obtained for the desired goal for the hybrid MLR_ANN model for Pan Evaporation.
4.26 Plots of autocorrelation and partial autocorrelation coefficients against time lags for the weekly average Bright Sunshine parameter.
4.27 Artificial neural network structure for weekly average Bright Sunshine prediction.
4.28 Mapping of the number of epochs obtained for the desired goal for the ANN model for Bright Sunshine.
4.29 Hybrid MLR_ANN structure for weekly average Bright Sunshine prediction.
4.30 Mapping of the number of epochs obtained for the desired goal for the hybrid MLR_ANN model for Bright Sunshine.
4.31 Plots of the actual and predicted weekly average Rainfall for the training data set using Multiple Linear Regression, ARIMA, Artificial Neural Network, Hybrid MLR_ARIMA and Hybrid MLR_ANN models.
4.32 Plots of the actual and predicted weekly average Maximum Temperature for the training data set using Multiple Linear Regression, ARIMA and Artificial Neural Network models.
4.33 Plots of the actual and predicted weekly average Minimum Temperature for the training data set using Multiple Linear Regression, ARIMA, Artificial Neural Network, Hybrid MLR_ARIMA and Hybrid MLR_ANN models.
4.34 Plots of the actual and predicted weekly average Relative Humidity at 7 AM for the training data set using Multiple Linear Regression, ARIMA and Artificial Neural Network models.
4.35 Plots of the actual and predicted weekly average Relative Humidity at 2 PM for the training data set using Multiple Linear Regression, ARIMA and Artificial Neural Network models.
4.36 Plots of the actual and predicted weekly average Pan Evaporation for the training data set using Multiple Linear Regression, ARIMA, Artificial Neural Network, Hybrid MLR_ARIMA and Hybrid MLR_ANN models.
4.37 Plots of the actual and predicted weekly average Bright Sunshine for the training data set using Multiple Linear Regression, ARIMA, Artificial Neural Network, Hybrid MLR_ARIMA and Hybrid MLR_ANN models.
5.1 Plots of the actual and predicted weekly average Rainfall for the testing data set using Multiple Linear Regression, ARIMA, Artificial Neural Network, Hybrid MLR_ARIMA and Hybrid MLR_ANN models.
5.2 Plots of the actual and predicted weekly average Maximum Temperature (°C) for the testing data set using Multiple Linear Regression, ARIMA and Artificial Neural Network models.
5.3 Plots of the actual and predicted weekly average Minimum Temperature (°C) for the testing data set using Multiple Linear Regression, ARIMA, Artificial Neural Network, Hybrid MLR_ARIMA and Hybrid MLR_ANN models.
5.4 Plots of the actual and predicted weekly average Relative Humidity at 7 AM for the testing data set using Multiple Linear Regression, ARIMA and Artificial Neural Network models.
5.5 Plots of the actual and predicted weekly average Relative Humidity at 2 PM for the testing data set using Multiple Linear Regression, ARIMA and Artificial Neural Network models.
5.6 Plots of the actual and predicted weekly average Pan Evaporation for the testing data set using Multiple Linear Regression, ARIMA, Artificial Neural Network, Hybrid MLR_ARIMA and Hybrid MLR_ANN models.
5.7 Plots of the actual and predicted weekly average Bright Sunshine for the testing data set using Multiple Linear Regression, ARIMA, Artificial Neural Network, Hybrid MLR_ARIMA and Hybrid MLR_ANN models.

CHAPTER 1
INTRODUCTION

1.1 Scope

Weather forecasting is an important issue in the field of meteorology all over the world. Several factors contribute significantly to increasing forecasting accuracy; one among them is the development of statistical methods for enhancing the scope and accuracy of model predictions. Numerous efforts have been devoted to developing and improving existing time series weather forecasting models using different techniques. The role of statistical methodology in predicting weather parameters is considered most important for obtaining precise estimates of them.
Although high-speed computers, meteorological satellites and weather radars are tools that have played major roles in improving weather forecasts, the improvement in initial conditions is the result of an increased number of observations and better use of those observations in computational techniques. Many efforts have accordingly been made by researchers to identify the most precise weather forecasting models. Combinations of linear and non-linear models are among the most popular and widely used hybrid models for improving forecasting accuracy. The present study is planned to investigate the potential of the existing Multiple Linear Regression, Autoregressive Integrated Moving Average and Artificial Neural Network models to forecast weather parameters. A comparative study of the existing and proposed weather forecasting models is performed to identify precise and reliable weather forecasting models.

1.2 Motivation

The prediction of weather conditions can have significant impacts on various sectors of society in different parts of the country. Forecasts are used by government and industry to protect life and property and to improve the efficiency of operations, and by individuals to plan a wide range of daily activities. The notable improvement in forecast accuracy achieved since the 1950s is a direct outgrowth of technological developments, basic and applied research, and the application of new knowledge and methods by weather forecasters. Advance knowledge of the weather parameters in a particular region is advantageous for effective planning. Several studies on forecasting weather variables based on time series data for particular regions have been carried out at the national and international level, in both the farm and non-farm sectors.
It was observed that a combination of two or more computational models (a hybrid model) decomposes a time series into linear and non-linear components and proves to be a better approach than single models, because the hybrid model produces smaller forecasting errors. On the contrary, some studies have also reported that hybrid approaches are not always better. Such uncertainty in weather forecasting models opens up new opportunities for the selection of a precise forecasting model. These aspects motivate this thesis to explore the existing opportunities to identify the precise weather forecasting model. Predictions of weather parameters provided by such identified models, based on time series data, will be of particular interest to weather forecasters.

1.3 Overview

The prime contribution of this thesis is to compare existing weather forecasting models and to select the precise model based on predictive ability. The methodology consists of four stages applied to the study-period data of each weather parameter: (i) computation of descriptive statistics; (ii) statistical analysis to identify the best fit probability distribution; (iii) development of weather forecasting models and comparison of their predictive ability; and (iv) identification of the precise and reliable weather forecasting model. These four components aim to reduce forecasting errors by relaxing certain assumptions of traditional forecasting techniques, and they are interlinked. The first component explains the different measures of general statistics of the time series data, to explore the real situation of the different weather parameters; the objective of this phase is to understand the distribution pattern of the weather data. The second component is concerned with fitting a suitable probability distribution to each weather parameter independently, using different goodness of fit tests.
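The first of these stages, computing descriptive statistics for a weekly weather series, can be sketched as follows. This is an illustrative Python sketch, not the thesis's SAS programs; the synthetic gamma-distributed "rainfall" values are an assumption standing in for the actual Pantnagar observations.

```python
# Illustrative sketch (not the thesis's SAS code): descriptive statistics
# for a weekly weather series, using synthetic data in place of the
# Pantnagar observations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
weekly_rainfall = rng.gamma(shape=2.0, scale=30.0, size=850)  # mm, synthetic

summary = {
    "mean": np.mean(weekly_rainfall),
    "std": np.std(weekly_rainfall, ddof=1),       # sample standard deviation
    "min": np.min(weekly_rainfall),
    "max": np.max(weekly_rainfall),
    "skewness": stats.skew(weekly_rainfall),
    "kurtosis": stats.kurtosis(weekly_rainfall),  # excess kurtosis
}
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

In the study, the same summary would be produced for each of the seven parameters, both seasonally and week by week over the monsoon period.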
This methodology also establishes the analytical devices and testing procedures for future application. In the third stage, weather forecasting models were developed using the time series weather data, and their predictive ability was compared using graphical and numerical performance evaluation criteria. Finally, hybrid models were developed and an appropriate forecasting model was identified for future application by researchers in the related field.

1.4 Contribution

This thesis is divided into six chapters, beginning with the acknowledgement and table of contents and ending with the appendices. The first chapter consists of a brief introduction which outlines the scope and motivation, describes the overview of the proposed methodology together with the objectives and study area of the thesis, and summarizes the major contributions in brief. The second chapter describes recent advances in predicting weather parameters, providing a brief review of the literature in the related field. The third chapter explains the descriptive statistics of the seasonal and weekly weather data and presents the methodology for fitting probability distributions to the weather parameters using goodness of fit testing; the procedure for identifying the best fit probability distribution is explained for each weather parameter. The fourth chapter describes the methodology of the Multiple Linear Regression (MLR), Autoregressive Integrated Moving Average (ARIMA), Artificial Neural Network (ANN) and hybrid forecasting models in brief. The forecasting models developed for each weather variable are presented, and hybrid models combining Multiple Linear Regression with ARIMA and with ANN are proposed. Finally, a comparison of the prediction ability of the forecasting models is presented graphically for each weather parameter. The fifth chapter describes a comparison of the models, designed to identify the precise weather forecasting model.
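The thesis's MLR, ANN and hybrid models were implemented in SAS and Matlab (see the appendices). As a rough, hypothetical sketch of the residual-modelling idea behind a hybrid MLR_ANN model, one can fit a linear regression first and then train a small neural network on its residuals; the data, predictors, and network size below are illustrative assumptions, not the thesis's actual configuration.

```python
# Hypothetical hybrid linear + neural sketch: a linear regression
# captures the linear component, and a small neural network is trained
# on what the linear model leaves behind. All names and data are
# illustrative, not the thesis's actual models.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))                 # e.g. lagged weather predictors
y = 2.0 * X[:, 0] - X[:, 1] + np.sin(X[:, 2]) + rng.normal(0, 0.1, 400)

mlr = LinearRegression().fit(X, y)            # stage 1: linear component
residuals = y - mlr.predict(X)

ann = MLPRegressor(hidden_layer_sizes=(10,), activation="tanh",
                   max_iter=2000, random_state=0)
ann.fit(X, residuals)                         # stage 2: non-linear residuals

hybrid_pred = mlr.predict(X) + ann.predict(X)
rmse_mlr = np.sqrt(np.mean((y - mlr.predict(X)) ** 2))
rmse_hybrid = np.sqrt(np.mean((y - hybrid_pred) ** 2))
print(f"training RMSE, MLR only: {rmse_mlr:.3f}; hybrid: {rmse_hybrid:.3f}")
```

The design rationale, as the thesis notes, is that the linear stage handles the linear structure while the network absorbs the non-linear remainder; whether the hybrid actually outperforms a single model on unseen data must be checked on a held-out test set.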
Finally, the precise weather forecasting model is identified on the basis of the minimum values of the mean error, mean absolute error, root mean square error and prediction error, and the maximum value of the correlation coefficient. The last chapter presents the main summary of the thesis and discusses directions for future work. The bibliography is presented just after the last chapter. Graphs comparing the actual and predicted weekly average weather parameters for the training and testing data sets, using all the statistical techniques, are included in the respective chapters. The details of the computer programs used, written in the standard software packages SAS and Matlab, are presented in the appendices at the end, along with the research papers already published in relation to our research work on weather forecasting models.

1.5 Objectives

Advance knowledge of the weather parameters in a particular region is very helpful for sound planning. A reliable prediction of the Indian monsoon in a region on seasonal and inter-seasonal time scales is not only scientifically challenging but also important for future planning. The role of statistical techniques in predicting the weather parameters at a particular place and time depends on an understanding of the past time series data. The transient behavior of weather parameters over a particular period of time makes it difficult to predict them correctly and consistently. The Indian economy in general, and the fields of agriculture and industry in particular, depend upon weather conditions. Frequent fluctuations in weather parameters in different parts of India confront government and non-government planning agencies. In recent times, the concept of combined (hybrid) weather forecasting models has been introduced to increase prediction accuracy, and the problem of identifying the precise weather forecasting model is therefore of particular interest.
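The evaluation criteria named above can be computed directly from an actual/predicted pair. The sketch below is illustrative Python with toy numbers; "prediction error" is taken here as RMSE expressed as a percentage of the mean of the actual values, which is an assumption about the thesis's exact definition.

```python
# Illustrative computation of the performance evaluation criteria:
# mean error, mean absolute error, root mean square error, prediction
# error (assumed here to be RMSE as a percentage of the mean actual)
# and the correlation coefficient. Data are toy values.
import numpy as np

actual = np.array([12.0, 30.5, 8.2, 45.1, 22.3, 17.8])
predicted = np.array([10.5, 28.0, 9.9, 41.7, 25.0, 16.2])

err = predicted - actual
me = np.mean(err)                          # mean error (bias)
mae = np.mean(np.abs(err))                 # mean absolute error
rmse = np.sqrt(np.mean(err ** 2))          # root mean square error
pe = 100.0 * rmse / np.mean(actual)        # prediction error, per cent (assumed)
r = np.corrcoef(actual, predicted)[0, 1]   # correlation coefficient

print(f"ME={me:.2f}  MAE={mae:.2f}  RMSE={rmse:.2f}  PE={pe:.1f}%  r={r:.3f}")
```

A model is then preferred when ME, MAE, RMSE and PE are small and r is large, which is exactly the selection rule stated above.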
Thus, providing reliable prediction and forecasting of weather parameters, in the Himalaya in particular and India in general, is an important challenge for planners and scientists. Keeping this in view, a comparative study of weather forecasting models, together with a proposed hybrid model for seasonal and inter-seasonal time series data, is planned with the following objectives:

(i) To study the distribution pattern of weather parameters.
(ii) To develop weather forecasting models.
(iii) To compare the predictive ability of the developed models.
(iv) To identify the precise and reliable weather forecasting model.

1.6 Study Area

The present study is based on a 50-year time series of weather data observed at the Pantnagar station and collected from the IMD-approved meteorological observatory at Dehradun, India. India lies in the Eastern Hemisphere, around latitude 22°00'N and longitude 77°00'E. The Pantnagar station is located at 29°N latitude and 79°3'E longitude, approximately 243.89 meters above mean sea level, in the Tarai region of Uttarakhand, within the Shivalik Ranges of the Himalayan foothills. On average the region has a humid subtropical climate, with hot summers (40-42°C) and cold winters (2-4°C), and monsoon rains occurring from June to September. July is the rainiest month, followed by August. In September, depressions from the Bay of Bengal affect the local weather, causing heavy rains. With the withdrawal of the monsoon in September, the intensity of rainfall decreases rapidly, until by November it is practically rainless. Rain gauge records suggest that the annual average rainfall in and around Pantnagar is of the order of 1400 mm. More than 80% of the rain is received from the southwest monsoon during the four-month period from June to September, and the rainfall of the rainy season differs significantly from that of the dry season.
Winter precipitation in the region, associated with the passage of Western Disturbances, falls as snow in the Higher Central Himalaya. The average monsoon season in and around the Pantnagar region ranges between 15 and 20 weeks. A 17-week data set, from 4th June to 30th September of each year, is taken as the inter-seasonal monsoon period for our study. The data comprise seven parameters, viz. rainfall, maximum and minimum temperature, relative humidity at 7:00 AM and 2:00 PM, pan evaporation and bright sunshine hours, collected during the monsoon months as a time series of 850 weeks of weather data from 1961 to 2010.

CHAPTER 2
REVIEW OF LITERATURE

Weather is a continuous, data-intensive, multi-dimensional, dynamic and chaotic process, and these properties make weather forecasting a formidable challenge. It is one of the most imperative and demanding operational responsibilities carried out by meteorological services all over the world. At present, the assessment of the nature and causes of seasonal climate variability is still at a conceptual stage, since it is a complicated procedure that involves numerous specialized fields of expertise (Guhathakurata, 2006); therefore, in the field of meteorology, all decisions are to be taken in the face of the uncertainty associated with local and global climatic variables. Several authors have discussed the vagueness associated with weather systems, and the chaotic features of atmospheric phenomena have also attracted the attention of modern scientists (Sivakumar 2001; Sivakumar et al. 1999; Men et al. 2004). Different scientists around the globe have developed stochastic weather models, which are often used to predict and warn about natural disasters caused by abrupt changes in climatic conditions.
The variables defining weather conditions vary continuously with time, forming a time series for each parameter, and these series can be used to develop a forecasting model, either statistically or by other means (Chatfield 1994; Montgomery and Lynwood 1996). Weather prediction modeling involves a combination of computer models, observation, and knowledge of trends and patterns. Generally, two methods are used to forecast weather: (a) the empirical approach and (b) the dynamical approach (Lorenz, 1969). The first approach is based upon the occurrence of analogues and is often referred to by meteorologists as analogue forecasting. This approach is useful for predicting local-scale weather when recorded cases are plentiful. The second approach is based upon equations and forward simulations of the atmosphere, and is often referred to as computer modeling. Because of grid coarseness, the dynamical approach is only useful for modeling large-scale weather phenomena and may not predict short-term weather efficiently. Many weather prediction systems use a combination of empirical and dynamical techniques. At the macro level, weather forecasting is usually done using data gathered by remote sensing satellites. Weather parameters like maximum temperature, minimum temperature, extent of rainfall, cloud conditions, and wind streams and their directions are projected using images taken by these meteorological satellites to assess future trends. Satellite-based systems are inherently costly and require a complete support system; moreover, they are capable of providing only information that is generalized over a large geographical area. Numerical weather prediction was first attempted in the early 1920s, but its practical use began only in the middle of the twentieth century. A number of forecast models, both global and regional, are now being used to create forecasts.
This chapter is intended to provide a brief review of the literature in the field of weather forecasting models in general, and of comparative studies of weather forecasting models in particular. The chapter is divided into five parts, and the research work done in each field is reviewed in a separate section. 2.1 Probability distribution Analysis of weather data strongly depends on its probability distribution pattern. Establishing a probability distribution that provides a good fit to a weather parameter has long been a topic of interest in hydrology, meteorology and other fields. Several studies have been conducted in India and abroad on weather analysis, and best-fit probability distribution functions such as the normal, log-normal, Gumbel, Weibull and Pearson-type distributions were identified. Fisher (1924) studied the influence of rainfall on the yield of wheat at Rothamsted. He showed that it is the distribution of rainfall during a season, rather than its total amount, which influences crop yield. Tippet (1929) subsequently applied the technique to the distribution of sunshine and found that sunshine has a beneficial effect on the wheat crop throughout the year. Another useful line of work relating to the study of rainfall distribution was introduced by Manning (1956). He transformed the skew frequency distribution of rainfall to approximate closely the theoretical normal distribution, showing that fifteen observations were enough to obtain reasonably good estimates of the distribution and confidence limits. Further, Rao et al. (1963) used the extreme value distribution (EVD) of rainfall (Chow, 1964) to predict flood and drought situations in parts of India. Abraham (1965) applied Fisher's method to examine the joint relationship of crop yield with weather variables (rainfall and temperature).
Rai and Jay (1966) studied humidity and upper-wind temperature over Madras in relation to the occurrence of precipitation, and examined the vertical distribution of temperature and humidity associated with dry or wet days over the same area. Benson (1968) advocated large-scale planning for improved flood-plain management and expanding water-resources development, suggesting a uniform procedure for all government agencies wherever records are available. From among the Pearson type I, Gumbel and log-normal distributions, the log-Pearson type III distribution was selected as the base method, with provision for departures from the base method where justified, and continuing study leading towards improvement or revision of the method was recommended. Kulkarni and Pant (1969) studied the cumulative frequency distribution of rainfall of different intensities during the south-west monsoon for 20 stations in India. The distribution was found to be exponential, and curves were fitted to the observed data by the method of least squares. Mooley et al. (1970) studied the statistical distribution of rainfall during the south-west and north-east monsoon seasons at representative stations in India; the gamma distribution was fitted to the rainfall data and tested by the chi-square test. Bhargava et al. (1974) showed that for a number of crops the distribution of rainfall over the season has a great influence on the yield. Krishnan and Kushwaha (1972) studied the mathematical distribution of accumulated rainfall for 2 pentads, 4 pentads, ..., 20 pentads commencing from the onset of the monsoon for typical arid-zone stations. For Jaipur, the distribution beyond a month was normal, while for Jodhpur it was not normal at all. Raman Rao et al. (1975) analyzed the daily rainfall data collected at Bijapur for the years 1921 to 1970.
Parthsarthy and Dhar (1976) studied the trends and periodicities in the annual rainfall of the meteorological subdivisions of Madhya Pradesh over a 60-year period. The frequency distribution of annual rainfall in east and west M.P. was found to be normal, and a significant increase of 15% of the mean annual rainfall per 30 years was observed in west M.P. Cunnane et al. (1978) and Gringorten (1963) used plotting rules on extreme-probability paper for extreme value analysis. Mukherjee et al. (1979) undertook studies to improve the weather bulletin, beginning with a detailed study of rainfall even within the same district; they observed wide variation in the intensity and distribution of rainfall. Mukherjee et al. (1980) studied the monthly, seasonal and annual rainfall distribution for 16 stations in Pune and 11 stations in Ahmed Nagar over a 50-year period. A combined study of the two districts showed that the rainfall distribution in the western part of Pune is the same as in the western part of Ahmed Nagar, while rainfall in the eastern part of Pune is the same as in the eastern part of Ahmed Nagar. Kulandaivelu (1984) analyzed the daily precipitation data of Coimbatore for a period of 70 years, fitting an incomplete gamma distribution model to weekly totals. The data indicated the likely commencement of rains, periods of drought, and the length and end of the growing season; based on the assured rainfall at the 50% probability level, a suitable cropping system was suggested for Coimbatore. Phien and Ajirajah (1984) showed that for annual floods, annual maximum rainfall, annual stream flow and annual rainfall, the log-Pearson type III distribution was highly suitable, as evaluated by the chi-square and Kolmogorov-Smirnov tests.
Biswas and Khambete (1989) computed the lowest amount of rainfall at different probability levels by fitting the gamma distribution probability model to week-by-week total rainfall of 82 stations in the dry farming tract of Maharashtra. Rao and Singh (1990) studied the distributions of weather variables and developed a methodology for forecasting extreme values of weather variables at Pantnagar. They observed that the square-root model (y = a + b√x + cx) is appropriate for predicting wheat yield from meteorological observations. The Gumbel distribution was applied by Mukherjee et al. (1991) to estimate the return period of the highest one-day rainfall. Lin et al. (1993) stated that, in accordance with their probability distributions, all stations in the same area can be classified into clusters, and the characteristics within a cluster show a spatial relationship to a certain extent. Chapman (1994) evaluated five daily rainfall generating models with several methods and found that the Srikanthan-McMahon model performed well when calibrated with long rainfall records. Nese and Jon (1994) estimated the potential effect of biases on the mean and standard deviation of a temperature distribution; biasing simulations were performed on various normal distributions, and it was shown that these biases can affect other relevant climatic statistics as well. Duan et al. (1995) suggested that for modeling daily rainfall amounts the Weibull, and to a lesser extent the exponential, distribution is suitable. Extreme value analysis was done for seven stations of the Krishna-Godavari agro-climatic zone in Andhra Pradesh, India (Kulshrestha et al., 1995); a similar analysis for 14 stations of Gujarat state ascertained the most suitable type of distribution (Kulshrestha et al., 1999). Statistical distributions have been used to define extremes with given return periods (Aggarwal et al. 1988; Bhatt et al. 1996).
Upadhaya and Singh (1998) stated that it is possible to predict rainfall fairly accurately using various probability distributions for certain return periods, although rainfall varies over space and time and is erratic in nature. Sen and Eljadid (1999) reported that for monthly rainfall in arid regions the gamma probability distribution gives the best fit, which enables one to construct regional maps of the two gamma parameters, shape and scale. Rai and Chandrahas (1996) studied the effect of the intensity and distribution pattern of weather parameters at different stages of crop growth on rice yield. They found that temperature and sunshine hours are effective during the growing phase, whereas sunshine hours were ineffective during the early growth phase. Ogunlela (2001) studied the stochastic analysis of rainfall events in Ilorin using probability distribution functions and concluded that the log-Pearson type III distribution best described the peak daily rainfall data for Ilorin. Kar (2002) predicted extreme rainfall for the mid-central table zone of Orissa using the extreme value type-I distribution and concluded that it was a good fit for predicting one-day maximum rainfall. Tao et al. (2002) recommended the generalized extreme value model as the most suitable distribution after a systematic assessment procedure, for its ability to represent extreme-value processes and its relatively simple parameter estimation. Topaloglu (2002) reported that the Gumbel probability model, estimated by the method of moments and evaluated by chi-square tests, was the best model in the Seyhan river basin. Fowler et al. (2003) used two methods to assess rainfall extremes and their probabilities: one used a percentile approach (Kar, 2002) and the other used statistical distributions of rainfall (Hennersy et al., 1997).
Salami (2004) studied meteorological data for Texas and found that the Gumbel distribution fits adequately for both evaporation and temperature data, while for precipitation data the log-Pearson type III distribution conforms more accurately. Lee (2005) indicated that the log-Pearson type III distribution fits 50% of the stations when describing the rainfall distribution characteristics of the Chia-Nan plain area. Bhakar et al. (2006) carried out frequency analysis of consecutive days' peak rainfall at Banswara, Rajasthan, India, and found the gamma distribution to be the best fit among the distributions compared, as tested by the chi-square value. Deidda and Puliga (2006) found, for left-censored records of Sardinia, that some weaknesses are evident in the generalized Pareto distribution. Kwaku et al. (2007) revealed that the log-normal distribution was the best-fit probability distribution for one to five consecutive days' maximum rainfall for Accra, Ghana. The analysis of Hanson et al. (2008) indicated that the Pearson type III distribution fits the full record of daily precipitation data, while the Kappa distribution best describes the observed distribution of wet-day daily rainfall. Olofintoye et al. (2009) found that 50% of the stations in Nigeria follow the log-Pearson type III distribution for peak daily rainfall, while 40% and 10% of the stations follow the Pearson type III and log-Gumbel distributions respectively. Sharma and Singh (2010) studied the distribution pattern of extreme rainfall values for the Pantnagar data; the generalized extreme value distribution was observed to be the best-fit probability distribution in most of the weekly periods. A wide variety of previous studies have thus explored the probability distribution of daily rainfall for the purpose of rainfall frequency analysis; however, we are unaware of any studies that have applied recent methodological developments to this question.
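The best-fit selection procedure running through the studies reviewed above (fit several candidate distributions, then rank them with a goodness-of-fit test) can be sketched as follows. This is an illustrative sketch only, not code from the thesis: the synthetic rainfall series and the particular candidate set are assumptions.

```python
# Illustrative sketch: fit candidate distributions to a weekly rainfall
# series and rank them by the Kolmogorov-Smirnov statistic.
# The synthetic data below stand in for 850 weeks of monsoon rainfall.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
rainfall = rng.gamma(shape=2.0, scale=35.0, size=850)  # synthetic, in mm

candidates = {
    "gamma": stats.gamma,
    "log-normal": stats.lognorm,
    "weibull": stats.weibull_min,
    "gumbel": stats.gumbel_r,
}

results = {}
for name, dist in candidates.items():
    params = dist.fit(rainfall)                       # maximum-likelihood fit
    ks, p = stats.kstest(rainfall, dist.cdf, args=params)
    results[name] = (ks, p)

best = min(results, key=lambda name: results[name][0])  # smallest KS statistic
for name, (ks, p) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name:10s} KS = {ks:.4f}")
print("best fit:", best)
```

The same loop extends directly to the other distributions named in the review (log-Pearson type III, generalized extreme value, and so on) by adding entries to the candidate dictionary.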
This research seeks to reexamine the question of which continuous distribution best fits the weekly average monsoon weather variables. Our primary objective is to determine a suitable distribution of the monsoon season for each weekly average weather variable by comparing different probability distributions. 2.2 Multiple Regression (MR) Forecasting models based on time series data are being developed for the prediction of different variables. Regression is an empirical statistical technique widely used in business, the social and behavioral sciences, the biological sciences, climate prediction, and many other areas. Linear and non-linear multiple regression models of different orders are also used for prediction based on time series data; such models can consider more than one predictor for rainfall prediction. The multiple regression approach has some limitations, such as multicollinearity, interrelations among predictors, extreme observations, and non-linear relationships between the dependent and independent variables. Goulden (1962) found relationships between monthly averages of weather parameters and crop yield using the multiple regression technique. Ramchandran (1967) analyzed the normal rainfall of 167 observatory stations distributed over India and neighbouring countries, using regression equations representing monthly and annual rainfall as linear functions of latitude, longitude and elevation above sea level. Bali (1970) obtained more precise results with the regression method for calculating average yields, and pointed out the inadequacy of the methods then employed for forecasting crop yield in India. Huda et al. (1975) reported that a second-degree multiple regression can be employed for studying the relationship between rice yield and weather variables. Huda et al.
(1976) applied a second-degree multiple regression equation to quantify the relationship between maize yield and meteorological data, finding that maize yield was affected differently by different weather variables during different stages of growth. Singh et al. (1979) gave guidance for forecasting the yield rate by traditional and objective methods on the basis of biometrical characters as well as weather parameters, and further regression studies on the relationship of crop yield with weather factors followed. Agrawal et al. (1980) studied regression models for forecasting the yield of rice in Raipur district on weekly data using weather variables. Khatri et al. (1983) used stepwise regression analysis on historical and rainfall data from crop estimation surveys to develop a forecasting model. Hastenrath (1988) developed a statistical model using the regression method to predict the Indian summer monsoon rainfall anomaly. Singh (1988) developed a suitable pre-harvest forecasting model for sugarcane yield with the help of multiple regression techniques. Singh and Bapat (1988) developed a pre-harvest forecast model, using stepwise regression to select the yield attributes finally entered into the model. Pal (1995) studied the relationship between weather parameters and yields using linear multiple regression and second-degree multiple regression equations based on time series weather data. Sparks (1997) developed a multiple regression model for time series data to predict production in arid climates at high elevations. Shashi Kumar et al. (1998) showed that principal components regression gives more precise estimates than ordinary least squares regression analysis. Vaccari et al. (1999) modeled plant motion time-series and nutrient recovery data for advanced life support using multivariable polynomial regression. Hassani et al.
(2003) proposed a human height prediction model based on multiple polynomial regression, which successfully forecast height growth potential with precision and was helpful in child growth studies. Sen (2003) presented a long-range summer monsoon rainfall forecast model based on a power regression technique, using the previous year's El Niño state, Eurasian snow cover, north-west Europe temperature, European pressure gradient, 50 hPa wind pattern, Arabian Sea SST, east Asian pressure and south Indian Ocean temperature as inputs; the experimental results showed a model error of 4%. Nkrintra et al. (2005) described the development of a statistical forecasting method for summer monsoon rainfall over Thailand using multiple linear regression and local polynomial-based non-parametric approaches, with sea surface temperature (SST), sea level pressure (SLP), wind speed, the El Niño-Southern Oscillation (ENSO) and the Indian Ocean Dipole (IOD) chosen as predictors; the experiments indicated a correlation of 0.6 between observed and forecast rainfall. Sohn et al. (2005) developed a prediction model for the occurrence of heavy rain in South Korea using multiple linear regression, decision trees and artificial neural networks, with 45 synoptic factors generated by a numerical model as potential predictors. Anderson et al. (2006) examined the possibility of forecasting traffic volumes by using a multiple linear regression model to perform what is termed direct demand forecasting, and obtained results consistent with the traditional four-step methodology. Zaw and Naing (2008) modeled monthly rainfall prediction over Myanmar using multiple polynomial regression (MPR) and compared it with a multiple linear regression (MLR) model; experiments indicated that the MPR-based prediction model has higher accuracy than MLR. Radhika and Shashi (2009) used time series data of daily maximum temperature and found support vector machines (SVMs), trained as a non-linear regression method, suitable for weather prediction. Kannan et al.
(2010) analyzed five years of rainfall input data using the Karl Pearson correlation coefficient and predicted rainfall for future years by multiple linear regression. Ghani and Ahmad (2010) applied six types of linear regression, including stepwise multiple regression, to select suitable control variables for forecasting fish landings. 2.3 Autoregressive Integrated Moving Average (ARIMA) Two popular models for seasonal time series are multiplicative seasonal ARIMA (autoregressive integrated moving average) models (Box and Jenkins 1976) and ARIMA component (structural) models. Despite the rising popularity of ARIMA component models in the recent time series literature, empirical studies comparing these models with seasonal ARIMA models have been relatively rare. Cottrell et al. (1995) proposed a systematic methodology to determine which weights are non-significant and to eliminate them to simplify the architecture, attempting to combine the statistical techniques of linear and nonlinear time series with the connectionist approach. Zhang and Qi (2003) investigated how to effectively model time series with both seasonal and trend patterns, finding that combined detrending and deseasonalization is the most effective data preprocessing approach. Campbell and Diebold (2005) used a simple time-series approach to modeling and forecasting daily average temperature in U.S. cities and found it useful from the vantage point of participants in the weather derivatives market. Iqbal et al. (2005) used ARIMA to forecast the area and production of wheat in Pakistan, further suggesting that the scope for higher area and production lies in adequate availability of inputs, educating and training the farming community, soil conservation and reclamation, and especially supportive government policies regarding wheat cultivation in the country.
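As a minimal illustration of the autoregressive backbone of the ARIMA models discussed above, the sketch below fits an AR(2) model by ordinary least squares to a synthetic series. It is not the thesis code, and a full Box-Jenkins analysis would add differencing (the "I" component), moving-average terms, and order selection via the ACF/PACF; the coefficients and series length here are assumptions.

```python
# Illustrative sketch: the AR(p) core of an ARIMA(p, 0, 0) model,
# fitted by ordinary least squares to a synthetic weekly series.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 2
true_phi = np.array([0.6, 0.25])          # assumed AR(2) coefficients

# Simulate y[t] = 0.6*y[t-1] + 0.25*y[t-2] + noise.
y = np.zeros(n)
for t in range(p, n):
    y[t] = true_phi @ y[t - p:t][::-1] + rng.normal()

# Lagged design matrix: column j holds y[t-1-j] for t = p..n-1.
X = np.column_stack([y[p - 1 - j:n - 1 - j] for j in range(p)])
phi_hat, *_ = np.linalg.lstsq(X, y[p:], rcond=None)

# One-step-ahead forecast from the last p observations.
forecast = phi_hat @ y[-p:][::-1]
print("estimated phi:", np.round(phi_hat, 3))
```

With 500 observations the least-squares estimates recover the simulated coefficients closely, which is the property the Box-Jenkins identification stage relies on.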
Zhou and Hu (2008) proposed a hybrid modeling and forecasting approach based on the grey and Box-Jenkins autoregressive moving average (ARMA) models to forecast gyro drift, concluding that the hybrid method has higher forecasting precision for complex problems than either single method. Kal et al. (2010) developed a framework to determine the optimal inventory policy in an environment where lead-time demand is generated by an ARIMA process. Alnaa and Ahiakpor (2011) used an ARIMA model to predict inflation in Ghana; inflation was predicted to be highest in the months of March, April and May. They further suggested that inflation has a long memory and that, once the inflation spiral is set in motion, it takes at least 12 periods (months) to bring it to a stable state. Badmus and Ariyo (2011) utilized ARIMA for forecasting the cultivated area and production of maize in Nigeria, concluding that the total cropped area can be increased in future if land reclamation and conservation measures are adopted. Saima et al. (2011) proposed a hybrid fuzzy time series model that develops an interval type-2 fuzzy model based on ARIMA; the interval type-2 fuzzy logic system (IT2-FLS) is used to handle the uncertainty in the time series data and obtain accurate forecasting results. Ghosh et al. (2012) developed an ARIMA-based model to depict the future prospects of the coal-based thermal power sector of India; the evidence showed that India needs to identify alternative sources of power generation in order to grow sustainably and without environmental damage. 2.4 Artificial Neural Network (ANN) An artificial neural network is a powerful data modeling tool that provides a methodology for solving many types of non-linear problems that are difficult to solve by traditional techniques. A neural network makes very few assumptions, as opposed to the normality assumptions commonly made in statistical methods.
From a statistician's point of view, neural networks are analogous to nonparametric, nonlinear regression models. The ANN approach has several advantages over conventional phenomenological or semi-empirical models, since it requires only a known input data set, without further assumptions (Gardner and Dorling, 1998; Nagendra and Khare, 2006). It exhibits rapid information processing and is able to develop a mapping between the input and output variables; such a mapping can subsequently be used to predict desired outputs as a function of suitable inputs (Nagendra and Khare, 2006). ANNs use many simplifications relative to actual biological neurons, which help in exploiting the computational principles employed in a massively parallel machine (Haykin 1999). Neural networks adaptively change their synaptic weights through the process of learning. Feed-forward neural networks with back-propagation (BKP) of error have been used in the past for modeling and forecasting various parameters of interest from time series data; Cottrell et al. (1995), for example, used time series modeling to provide a method for weight elimination in ANNs. Over the last few decades, voluminous development in the application of ANNs has opened up new avenues for forecasting tasks involving atmosphere-related phenomena (Gardner and Dorling, 1998; Hsieh and Tang, 1998). As indicated by Adielsson (2005), prediction in an artificial neural network (ANN) can take place in any data situation, without limitation, based on the initial training. Thus, for forecasting, certain statistical techniques can be combined with the connectionist approach of the ANN, exploiting the information contained in linear or nonlinear time series. The knowledge that an ANN gains about a problem domain is encoded in the weights assigned to its connections; the ANN can then be thought of as a black box, taking in and giving out information (Roadknight et al. 1997).
Non-linear ANN models have been widely used for solving forecasting problems, as identified by Hill et al. (1996), Faraway and Chatfield (1998), Kaashoek and Van Dijk (2001), Tseng et al. (2002), Altun et al. (2007), Fallah-Ghalhary (2009), Wu et al. (2010) and El-Shafie et al. (2011). Hu (1964) initiated the implementation of the ANN, an important soft computing methodology, in weather forecasting. Forecasting the behavior of complex systems has been a broad application domain for neural networks; in particular, electric load forecasting (Park et al. 1991), economic forecasting (Refenes et al. 1994), forecasting natural physical phenomena (Weigend et al. 1994), river flow forecasting (Atiya et al. 1996) and forecasting student admission in colleges (Puri et al. 2007) have been widely studied. A successful application of ANN to rainfall forecasting was made by French et al. (1992), who applied a neural network to forecast one-hour-ahead, two-dimensional rainfall fields on a regular grid. Moro et al. (1994) applied a neural network approach to weather forecasting using local data. Kalogirou et al. (1997) implemented an ANN to reconstruct the rainfall time series over Cyprus. Kuligowski and Barros (1998) analyzed a precipitation forecasting model using the neural network approach. Lee et al. (1998) applied an artificial neural network to rainfall prediction by splitting the available data into homogeneous subpopulations. Wong et al. (1999) constructed fuzzy rule bases with the aid of SOM and back-propagation neural networks, and then used the rule base to develop a predictive model for rainfall over Switzerland using spatial interpolation. Atiya et al. (1997) studied generalization performance in large networks, that is, producing appropriate outputs for input samples not encountered during training, and found that it is best described by the training data size and the number of synaptic weights.
Share prices have also been treated as time series to forecast future stock prices (Mathur et al. 1998). Maqsood et al. (2002a, 2002b) used neurocomputing-based weather monitoring and analysis models. Anmala et al. (2000) reported that recurrent networks may perform better than standard feed-forward networks in predicting monthly runoff. Sahai et al. (2000) applied the ANN technique to five time series of June, July, August and September monthly rainfall and seasonal rainfall. The previous five years' values from all five time series were used to train the ANN to predict the next year, and good performance in predicting rainfall was found. Toth et al. (2000) investigated the capability of ANNs in short-term rainfall forecasting using historical rainfall data as the only input information. Kishtawal et al. (2003) assessed the feasibility of a nonlinear technique based on a genetic algorithm, an artificial intelligence technique, for the prediction of summer rainfall over India. Guhathakurta (2006) was the first to implement the ANN technique to predict summer monsoon rainfall over a state of India. Miao et al. (2006) developed seven different ANN methods, each of which can be used in a different analysis in place of classical statistical methods, for identifying factors influencing corn yield and grain quality variability. Chattopadhyay (2007) found that a neural network with three nodes in the hidden layer is the best predictive model for predicting average summer-monsoon rainfall over India. Paras et al. (2007) concluded that neural networks are capable of modeling a weather forecast system, and that the statistical indicators chosen are capable of extracting trends that can be used as features for developing the models. Hayati et al. (2007) showed that the multi-layer perceptron (MLP) network has minimum forecasting error for each season and can be considered a good method for temperature forecasting. Kumar et al.
(2007) presented reasonably good artificial intelligence approaches for regional rainfall forecasting for Orissa state, India, on monthly and seasonal time scales. The study emphasized the value of using large-scale climate teleconnections for regional rainfall forecasting and the significance of artificial intelligence approaches in predicting uncertain rainfall. Hung et al. (2009) developed an ANN model applied to real-time rainfall forecasting and flood management in Bangkok, Thailand, showing that the ANN forecasts are superior to those obtained by the persistence model; rainfall forecasts for Bangkok from 1 to 3 h ahead were found highly satisfactory. Sharma et al. (2011) proposed a hybrid of MR with ANN for the Himalayan monsoon data, suggesting that hybrid techniques can be used as a reliable rainfall forecasting tool in the Himalaya. 2.5 Comparison among MR, ARIMA and ANN Comparative evaluations of the performance of all three models have been conducted on weather variables and many other time series data. Many studies, conducted by Lek (1996), Starett (1998), Manel (1999), Salt (1999), Ozesmi (1999), Gail (2005) and Diane (2007) with their co-authors, and by Pastor (2005), compared the two methods and showed that the ANN method is more accurate than multiple linear regression in predicting the dependent variable. The performance of ANNs and traditional statistical methods was also compared and discussed by Kumar (2005), Pao (2006), Wang and Elhag (2007), Zhang (2001) and Wang et al. (2010). Dutta and Shekhar (1988) compared neural networks to a multiple regression model in an application to predict bond ratings based on ten financial variables; the neural network model consistently outperformed the regression model, yielding a success rate of 88.3 percent versus 64.7 percent for regression.
In addition, the neural network was never off by more than one rating, while the regression model was often off by several ratings. Marquez et al. (1991) compared the performance of neural network models to various regression models. They tested the data with the correct regression model, with two other regression models that were one step in either direction away from the correct model on a ladder of expressions, and with two neural network models. The neural network was generally within two percent of the mean absolute percentage error of the correct model for the linear and inverse cases. The neural network's performance on the log data was poorer, and in general the neural networks did not perform quite as well as the other regression models, but the authors concluded that neural networks have considerable potential as an alternative to regression. Specht and Donald (1991) examined the use of neural networks to perform the function of multiple linear regression. They compared neural networks with standard linear regression in four cases: the regression model was correctly specified with all assumptions valid; the regression model was correctly specified, but the data contained an outlier; the regression variables exhibited multicollinearity; and the regression model was incorrectly specified by omitting an interaction. The authors concluded that neural networks are robust and relatively insensitive to problems with bad data, model assumptions, and faulty model construction. Chang et al. (1991) used neural networks to forecast rainfall based on time series data. The data showed both seasonal and cyclical components, which were incorporated into the input data set. The forecasts were for one month into the future; based on the mean square error, the neural network outperformed the unnamed statistical approach. Dulibar (1991) performed a study to predict the performance of carriers in a particular segment of the transportation sector.
In the study, she compared a neural network model to several regression models. The measure of performance used was percent capacity utilization. The neural network performed better than some of the regression models and not as well as others. According to the author, a possible reason for the poorer performance of the neural network model is that the firms involved in the study were explicitly included in the regression models but not in the neural network model. Raghupathi et al. (1991) found that a neural network provided 86 % correct classifications and would therefore likely provide a good model for the bankruptcy prediction process. Salchenberger et al. (1992) used neural networks to predict thrift failures. They compared a neural network to the more traditional logit model, using five financial ratios to classify an institution as one that would fail or not fail. For each data set created, the neural network performed at least as well as the logit model. Also, as the classification cutoff was lowered, the neural network committed fewer Type I errors than the logit model. In Tam and Kiang (1992), neural networks were compared to several popular discriminant analysis methods in a bank failure classification application. The sample consisted of 59 matched pairs of Texas banks. The neural network model showed better predictive accuracy on the test set than the other methods. Wu and Yen (1992) proposed a neural network structure and an associated design-oriented procedure for neural network development for regression applications. The proposed methodology was illustrated by two practical applications: a linear regression case concerning the relation between marginal cost and cumulative production, and a nonlinear regression case concerning the yield of wheat as a function of the application rate of fertilizer. They compared the results of the regression techniques with those of neural networks.
Fletcher and Goss (1993) used financial ratios to compare neural networks to a logit model. The output of the models represented the probability that a particular firm would fail. The neural network was 82.4 % accurate at a 0.5 cutoff versus 77 % for the logit model, with similar results at other cutoff values. The neural network also had less error variance and lower prediction risk than the logit model. Yi and Prybutok (1996) compared neural networks to multiple regression and ARIMA models in an application to predict the maximum ozone concentration in a large metropolitan area. The independent variables consisted of nine meteorological and auto emission measures. The neural network model was statistically superior to both the regression and ARIMA models. Empirical results showing that neural networks outperform linear regression as data quality varies were described by Bansal et al. (1993) and Marquez et al. (1991). Michaelides et al. (1995) compared the performance of ANN with multiple linear regression in estimating missing rainfall data over Cyprus. Goh (1996, 1998) compared ANN and multiple regression in construction, management and engineering applications. Comrie (1997) examined multiple regression models and neural networks for a range of cities under different climate and ozone regimes, enabling a comparative study of the two approaches, and found that neural network techniques are better than regression models for daily ozone prediction. Ostensibly, a neural network is simply an extension of regression modeling and can be regarded as a flexible non-linear regression model (Menamin and Stuart, 1997). Venkatesan et al. (1997) used a neural network technique to predict the monsoon rainfall of India using a few predictors and compared the results with linear regression techniques, showing that the model based on the neural network technique performed better.
Baker (1998) compared linear regression and neural network methods for forecasting educational spending and found that neural networks provide comparable prediction accuracy. Man-Chung et al. (1998) showed that conjugate gradient with multiple linear regression (MLR) weight initialization requires a lower computation cost and learns better than steepest descent with random initialization for financial time series data collected from the Shanghai Stock Exchange. Ranasinghe et al. (1999) compared ANN and multiple regression analysis in estimating willingness to pay for urban water supply and found that the forecasting error of the best ANN model was about half that of the best multiple regression model. Hippert et al. (2000) proposed a hybrid forecasting system that combines linear models and multilayer neural networks to forecast hourly temperatures based on past observed temperatures and the maximum and minimum forecast temperatures supplied by the weather service. Mathur et al. (2001) made a comparative study of neural network and regression models for predicting stock prices, and the results verified the suitability and superiority of the neural network model over the regression model. Victor (2001) described a neural network forecasting model as an alternative to regression models for messy data problems and limitations in variable structure specification. Zhang (2003) applied a hybrid methodology that combines both ARIMA and ANN models to take advantage of the unique strengths of ARIMA and ANN in linear and nonlinear modeling, respectively. Maqsood et al. (2004) developed neural-network-based ensemble models and applied them to hourly weather forecasting in southern Saskatchewan. The experimental results show that the ensemble network can be trained effectively without excessively compromising the performance. Further, compared to the regression models, the ensemble networks forecast the weather parameters with higher accuracy. Suhartono et al.
(2005) made a comparative study of forecasting models for trend and seasonal time series, concluding that a more complex model does not always yield better forecasts than a simpler one, especially on the testing samples. The study also showed that the FFNN model always yields better forecasts on the training data, which indicates an overfitting problem. Taskaya-Temizel et al. (2005) suggested that the use of a nonlinear component may degrade the performance of hybrid methods, and that a simpler hybrid comprising a linear AR model with a TDNN outperforms the more complex hybrid in tests on benchmark economic and financial time series. Somvanshi et al. (2006) examined two fundamentally different approaches, ARIMA and ANN, for designing a model to predict the behavioral pattern of rainfall from its past behavior. The study revealed that the ANN model can be used as an approximate forecasting tool to predict rainfall, and that it outperforms the ARIMA model. Pandey et al. (2008) made a comparative study of neural network and fuzzy time series forecasting techniques for crop yield and observed that the neural network produces more accurate results than fuzzy time series methods. Pao (2008) studied multiple linear regression and neural network models with seven explanatory variables of corporate characteristics and three external macro-economic control variables to analyze the important determinants of the capital structures of high-tech and traditional industries in Taiwan, respectively. The ANN models achieved a better fit and forecast than the regression models for debt ratio. Aladag et al. (2009) studied a new hybrid approach combining ERNN and ARIMA models for time series data, resulting in the best forecasting accuracy. Kulshrestha et al. (2009) found that ANN gives more accurate predictions of the probability of extreme rainfall than the Fisher-Tippett Type II distribution.
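Most of the hybrid schemes reviewed above follow the decomposition popularized by Zhang (2003): a linear model captures the autoregressive structure of the series, and an ANN is then trained on the residuals to capture the nonlinear part. A minimal sketch of this idea, with an AR(p) fitted by least squares standing in for ARIMA and scikit-learn's MLPRegressor standing in for the ANN; the synthetic series, lag order and network size are illustrative assumptions, not the configuration used in this thesis:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# Hypothetical weekly series standing in for an observed weather series.
t = np.arange(260)
y = 60 + 25 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 8, t.size)

p = 4  # lag order used by both components (illustrative choice)

def lag_matrix(series, p):
    # Row j holds [x_j, ..., x_{j+p-1}], used to predict x_{j+p}.
    return np.column_stack([series[i:series.size - p + i] for i in range(p)])

# 1) Linear component: AR(p) fitted by ordinary least squares.
X = lag_matrix(y, p)
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y[p:], rcond=None)
linear_fit = design @ coef
resid = y[p:] - linear_fit  # what the linear model could not explain

# 2) Nonlinear component: ANN trained on lagged residuals.
Xr = lag_matrix(resid, p)
ann = MLPRegressor(hidden_layer_sizes=(8,), max_iter=3000, random_state=0)
ann.fit(Xr, resid[p:])

# Hybrid fitted value = linear prediction + ANN-modelled residual.
hybrid = linear_fit[p:] + ann.predict(Xr)
```

Out-of-sample forecasting would apply the same two steps recursively: forecast the linear part, then add the ANN's estimate of the residual.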
Khashei and Bijari (2011) proposed hybrid methodologies combining linear models such as ARIMA with nonlinear models such as ANNs, and found them more effective than traditional hybrid methodologies. Sharma and Singh (2011) studied forecasting models to identify the appropriate model for the prediction of rainfall, concluding that the ANN approach is better than the other models. Zaefizadeh et al. (2011) showed that the mean deviation index of estimation under the ANN technique was significantly lower, about one-third of its value under MLR, because there was a significant interaction between genotype and environment that affected estimation by the MLR method. They therefore recommended the ANN approach as a better predictor of yield in barley than multiple linear regression. Recently, Ghodsi and Zakerinia (2012) used ARIMA, ANN and fuzzy regression for price forecasting; fuzzy regression was found to be the best method. Teri and Onal (2012) applied MLR and ANN to forecast monthly river flow in Turkey; the performance of the models suggested that the flow could be forecast easily from the available flow data using ANN. After this thorough review of forecasting models, it becomes necessary to compare the efficiency of different weather forecasting models and to identify the most precise one. In the present work, MLR, ARIMA and ANN models, along with hybrid models of MLR with ARIMA and ANN, were applied, and a comparative analysis was carried out using different analytical methods on weekly average time series data so as to identify the appropriate model.

CHAPTER 3
FITTING OF PROBABILITY DISTRIBUTION

Introduction
Descriptive Statistics
Methodology
Probability Distribution Pattern
Conclusion

3.1 Introduction

Establishing a best-fit probability distribution for different parameters has long been a topic of interest in the field of meteorology.
The investigation of weather parameters depends strongly upon their distribution pattern. The present study is planned to identify the best-fit probability distribution, based on the distribution pattern, for each data set. Sixteen probability distributions were identified out of the large number commonly used for this type of study. The descriptive statistics are computed first for each weather parameter for the different study periods; the parameters are discussed and explained through tables and graphs. The test statistics D, A² and χ² are computed for all 16 probability distributions, and the best-fit probability distribution is identified on the basis of the highest rank computed through each of the three tests independently. The best-fit probability distribution so obtained is presented with its test statistic value for each study period. It was further weighted using the highest scores of the selected probability distributions for each study period, and the combined total test score of all three test statistics was computed for all 16 probability distributions. The distribution having the maximum score over the 18 sets of data was identified. The parameters of the best-fit probability distribution for the different data sets of each weather parameter are presented. The fitted distributions were used to generate random numbers for each data set, and finally the best-fit probability distribution for each weather parameter was identified using the least squares method.

3.2 Descriptive Statistics

The weekly data of seven parameters, viz. rainfall, maximum and minimum temperature, relative humidity at 7.00 am and 2.00 pm, bright sunshine hours and pan evaporation, for the four monsoon months were recorded. The monsoon season in this region lasts between 15 and 20 weeks. Keeping this in view, 17 weeks of weather data, from 4 June to 30 September of each year, were considered for the present study.
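For each study period the statistics reported below are the sample mean, standard deviation, skewness coefficient and coefficient of variation, together with the weekly maximum and minimum. A minimal sketch of how these can be computed with NumPy and SciPy; the rainfall values here are made-up placeholders, not the observed 50-year series:

```python
import numpy as np
from scipy import stats

# Hypothetical weekly rainfall totals (mm) for one study week across years;
# the thesis computes the same statistics on 50 years of observed data.
rain = np.array([0.0, 12.4, 55.0, 103.2, 29.1, 250.0, 80.5, 3.1, 140.8, 61.0])

mean = rain.mean()
sd = rain.std(ddof=1)                # sample standard deviation
skew = stats.skew(rain, bias=False)  # skewness coefficient
cv = sd / mean                       # coefficient of variation (unitless)

print(round(mean, 2), round(sd, 2), round(skew, 4), round(cv, 4),
      rain.max(), rain.min())
```

A high coefficient of variation flags a week with large relative fluctuation, and a large positive skewness flags a right-tailed distribution, exactly the interpretations used in the commentary on Tables 3.1 to 3.7.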
The descriptive statistics of the seasonal and weekly weather data sets were computed, giving the mean, standard deviation, skewness coefficient and coefficient of variation for all seven parameters. The minimum and maximum weekly values are also presented for each weather parameter. The standard deviation indicates the fluctuation of a parameter; the coefficient of skewness, computed for each parameter, describes the shape of the distribution curve; and the coefficient of variation expresses the relative variability in the data. The details of each parameter are also presented in the form of graphs, and the period-wise details of each weather parameter are presented and discussed in the subsequent sections.

Rainfall (mm)

The study period-wise summary of rainfall is presented in table 3.1 along with mean, standard deviation, skewness, coefficient of variation, maximum and minimum values.

Table 3.1. Summary of statistics for Rainfall (Max and Min are weekly totals, mm).

Study Period  Data (From-To)   Mean     SD      Skewness  CV      Max      Min
Seasonal      4 Jun-30 Sep     73.20    80.40   0.4402    0.4022  443.20   0.00
1 week        4 Jun-10 Jun     29.07    53.92   3.1482    1.8546  263.60   0.00
2 week        11 Jun-17 Jun    41.21    57.11   2.3822    1.3859  291.80   0.00
3 week        18 Jun-24 Jun    46.32    51.61   1.941     1.1142  217.40   0.00
4 week        25 Jun-1 Jul     57.17    52.47   1.4613    0.9177  245.20   0.00
5 week        2 Jul-8 Jul      75.44    86.77   1.8573    1.1501  361.60   0.00
6 week        9 Jul-15 Jul     102.50   88.37   1.1507    0.8621  355.10   0.00
7 week        16 Jul-22 Jul    96.77    78.75   1.6998    0.8139  396.00   1.80
8 week        23 Jul-29 Jul    109.31   98.87   1.2496    0.9045  434.20   0.00
9 week        30 Jul-5 Aug     74.83    59.20   0.8854    0.7911  223.30   0.90
10 week       6 Aug-12 Aug     93.99    87.17   1.8554    0.9275  443.20   0.00
11 week       13 Aug-19 Aug    104.07   94.04   1.4451    0.9036  422.80   3.10
12 week       20 Aug-26 Aug    99.55    87.83   1.4541    0.8823  395.20   0.00
13 week       27 Aug-2 Sep     75.55    82.09   1.8776    1.0865  413.20   0.00
14 week       3 Sep-9 Sep      79.26    81.90   1.5354    1.0333  353.20   0.00
15 week       10 Sep-16 Sep    60.43    68.96   1.5986    1.1411  296.20   0.00
16 week       17 Sep-23 Sep    62.25    92.92   1.7897    1.4926  347.60   0.00
17 week       24 Sep-30 Sep    36.59    62.98   1.9873    1.7211  230.40   0.00

The mean seasonal rainfall over the 50 years was 73.20 mm, and the mean weekly rainfall varied from 29.07 mm in the first week of June to 109.31 mm in the last week of July. The maximum weekly rainfall lay between 217.40 mm in the third week of June (1975) and 443.20 mm in the fourth week of August (2000). The weekly minimum rainfall was zero in most weeks over the 50 years; the highest weekly minimum, 3.10 mm, occurred in the third week of August (2002). The standard deviation of seasonal rainfall for the 50 years was 80.40 mm, while the weekly standard deviation ranged from 51.61 mm in the third week of June to 98.87 mm in the fourth week of July. The graphical representation of the weekly rainfall is shown in figure 3.1, and the weekly rainfall statistics for all 850 weeks of the 50 years are presented in figure 3.2.

Figure 3.1 Mean, standard deviation and range of weekly Rainfall.

The maximum coefficient of variation for the weekly data, 1.8546, was observed in the first week of June, indicating the greatest fluctuation in the rainfall data set, that is, a large variation in the occurrence of rainfall over the 50 years. The skewness of the seasonal data was 0.4402, and the weekly skewness ranged from 0.8854 in the first week of August to 3.1482 in the first week of June, showing a large degree of asymmetry of the distribution around its mean.
Maximum Temperature (°C)

The summary of statistics for maximum temperature is presented in table 3.2 along with mean, standard deviation, skewness, coefficient of variation, maximum and minimum values. The seasonal mean maximum temperature over the 50 years was 33.08 °C, and the mean weekly maximum temperature varied from 31.78 °C in the second week of September to 37.75 °C in the first week of June. The maximum value of the seasonal maximum temperature was 43.20 °C, in the two years 1966 and 1967, and the weekly maximum temperature lay between 34.00 °C in the third week of September (2007) and 43.20 °C in the first week of June (1966 and 1967). It was moreover observed that the minimum value of the seasonal maximum temperature was 23.60 °C, in the year 2005. The weekly minimum of the maximum temperature lay between 23.60 °C in the fourth week of June (2005) and 30.20 °C in the first week of June (2002). The standard deviation of the seasonal maximum temperature for the 50 years was 2.67 °C, while the weekly standard deviation ranged from 1.19 °C in the fourth week of September to 3.33 °C in the first week of June. The maximum coefficient of variation for the weekly data, 0.1018, was observed in the second week of June, indicating a large fluctuation in the maximum temperature data set. The skewness of the seasonal data was -0.2994, and the weekly skewness ranged from -1.3196 in the first week of September to 0.9789 in the second week of July, indicating the degree of asymmetry of the distribution around its mean. The graphical representation of the weekly maximum temperature is shown in figure 3.3, and the weekly maximum temperature statistics for all 850 weeks of the 50 years are presented in figure 3.4.

Table 3.2. Summary of statistics for Maximum Temperature (°C).
Study Period  Data (From-To)   Mean    SD     Skewness  CV      Max     Min
Seasonal      4 Jun-30 Sep     33.08   2.67   -0.2994   0.0239  43.20   23.60
1 week        4 Jun-10 Jun     37.75   3.33   -0.3093   0.0881  43.20   30.20
2 week        11 Jun-17 Jun    36.44   3.71   -0.5374   0.1018  42.70   24.40
3 week        18 Jun-24 Jun    35.53   3.01   0.4169    0.0847  42.90   29.30
4 week        25 Jun-1 Jul     33.83   2.55   -0.7018   0.0754  41.90   23.60
5 week        2 Jul-8 Jul      33.80   1.89   0.5194    0.0558  39.30   29.80
6 week        9 Jul-15 Jul     32.84   1.95   0.9789    0.0591  39.70   29.60
7 week        16 Jul-22 Jul    32.25   1.48   -0.2906   0.0459  34.90   28.60
8 week        23 Jul-29 Jul    32.10   1.43   -0.1761   0.0444  34.90   28.20
9 week        30 Jul-5 Aug     32.23   1.39   0.1455    0.0433  36.00   29.20
10 week       6 Aug-12 Aug     32.35   1.26   0.5677    0.0390  36.00   29.90
11 week       13 Aug-19 Aug    31.88   1.35   -0.3130   0.0425  34.10   28.50
12 week       20 Aug-26 Aug    31.89   1.19   0.0737    0.0373  35.40   29.10
13 week       27 Aug-2 Sep     32.11   1.32   -0.5801   0.0411  35.10   28.60
14 week       3 Sep-9 Sep      31.79   1.79   -1.3196   0.0565  35.30   24.80
15 week       10 Sep-16 Sep    31.78   1.39   -0.3879   0.0436  34.80   28.20
16 week       17 Sep-23 Sep    31.85   1.26   -0.9759   0.0397  34.00   28.20
17 week       24 Sep-30 Sep    31.91   1.46   -0.7634   0.0458  34.20   28.10

Figure 3.3 Mean, standard deviation and range of weekly Maximum Temperature.

Minimum Temperature (°C)

The summary of statistics for minimum temperature in the different periods is presented in table 3.3 along with mean, standard deviation, skewness, coefficient of variation, maximum and minimum values. The seasonal mean minimum temperature over the 50 years was 24.43 °C, and the mean weekly minimum temperature varied from 21.57 °C in the last week of September to 25.28 °C in the fourth week of June. The maximum value of the seasonal minimum temperature was 29.20 °C, in the year 1995, and the weekly maximum of the minimum temperature lay between 24.50 °C in the third week of September (1998) and 29.20 °C in the second week of June (1995).
It was further observed that the minimum value of the seasonal minimum temperature was 17.20 °C, in the year 1984. The weekly minimum of the minimum temperature lay between 17.20 °C in the last week of September (1984) and 23.20 °C in the second, third and fourth weeks of July (1976, 1979 and 1981, respectively) and also in the first week of August (1975). The standard deviation of the seasonal minimum temperature for the 50 years was 1.51 °C, while the weekly standard deviation ranged from 0.68 °C in the third week of July to 1.81 °C in the third week of June. The maximum coefficient of variation for the weekly data, 0.0714, was observed in the last week of September, indicating fluctuation in the minimum temperature data set. The skewness of the seasonal data was -0.6409, and the weekly skewness ranged from -2.3776 in the third week of August to 0.0132 in the third week of June, indicating the degree of asymmetry of the distribution around its mean. The graphical representation of the weekly minimum temperature is shown in figure 3.5, and the weekly minimum temperature statistics for all 850 weeks of the 50 years are presented in figure 3.6.

Table 3.3. Summary of statistics for Minimum Temperature (°C).
Study Period  Data (From-To)   Mean    SD     Skewness  CV      Max     Min
Seasonal      4 Jun-30 Sep     24.43   1.51   -0.6409   0.0192  29.20   17.20
1 week        4 Jun-10 Jun     24.07   1.75   -0.2130   0.0727  28.30   18.80
2 week        11 Jun-17 Jun    24.70   1.54   0.0132    0.0625  29.20   21.50
3 week        18 Jun-24 Jun    25.01   1.81   2.0372    0.0723  33.50   21.50
4 week        25 Jun-1 Jul     25.28   1.70   2.1990    0.0673  34.00   20.00
5 week        2 Jul-8 Jul      25.22   1.17   -1.4444   0.0463  27.90   20.10
6 week        9 Jul-15 Jul     25.25   0.89   -0.0492   0.0350  27.30   23.20
7 week        16 Jul-22 Jul    25.04   0.68   -0.4531   0.0272  26.20   23.20
8 week        23 Jul-29 Jul    25.17   0.93   0.2800    0.0367  27.20   23.20
9 week        30 Jul-5 Aug     25.11   0.80   0.1502    0.0318  27.10   23.20
10 week       6 Aug-12 Aug     25.01   0.77   -0.3468   0.0308  26.60   22.70
11 week       13 Aug-19 Aug    24.80   0.90   -2.3776   0.0361  26.40   20.30
12 week       20 Aug-26 Aug    24.65   0.70   -0.5365   0.0284  26.50   22.60
13 week       27 Aug-2 Sep     24.45   0.81   -2.1114   0.0331  25.80   20.70
14 week       3 Sep-9 Sep      24.01   0.70   -0.4238   0.0292  25.30   22.20
15 week       10 Sep-16 Sep    23.44   0.80   -1.4321   0.0341  24.80   20.10
16 week       17 Sep-23 Sep    22.54   1.34   -1.4456   0.0596  24.50   17.60
17 week       24 Sep-30 Sep    21.57   1.54   -0.7552   0.0714  24.90   17.20

Figure 3.5 Mean, standard deviation and range of weekly Minimum Temperature.

Relative Humidity at 7 AM (%)

The study period-wise summary of Relative Humidity at 7 AM is presented in table 3.4 along with mean, standard deviation, skewness, coefficient of variation, maximum and minimum values. The seasonal mean Relative Humidity at 7 AM over the 50 years was 86.71 %, and the weekly mean varied from 66.80 % in the first week of June to 91.65 % in the fourth week of August.
The maximum value of the seasonal average Relative Humidity at 7 AM was 98 %, in the four years 1988, 1995, 1996 and 1998, and the weekly maximum lay between 87 % in the first week of June (1971 and 1984) and 98 % in the second and third weeks of August (1988 and 1995, respectively) and also in the first week of September (1996 and 1998). It was moreover observed that the lowest value of the seasonal average Relative Humidity at 7 AM was 38 %, in the year 1966. The weekly minimum lay between 38 % in the first week of June (1966) and 85 % in the second week of September (1981 and 2008). The standard deviation of the seasonal average Relative Humidity at 7 AM for the 50 years was 9.35 %, while the weekly standard deviation ranged from 2.97 % in the second week of September to 12.16 % in the first week of June. The maximum coefficient of variation for the weekly data, 0.1820, was observed in the first week of June, indicating fluctuation in the Relative Humidity at 7 AM data set. The skewness of the seasonal data was -0.6237, and the weekly skewness ranged from -1.7081 in the fourth week of July to 0.0113 in the first week of June, indicating the degree of asymmetry of the distribution around its mean. The graphical representation of the weekly Relative Humidity at 7 AM is shown in figure 3.7, and the weekly Relative Humidity at 7 AM statistics for all 850 weeks of the 50 years are presented in figure 3.8.

Table 3.4. Summary of statistics for Relative Humidity at 7 AM (%).
Study Period  Data (From-To)   Mean    SD      Skewness  CV      Max   Min
Seasonal      4 Jun-30 Sep     86.71   9.35    -0.6237   0.0286  98    38
1 week        4 Jun-10 Jun     66.80   12.16   0.0113    0.1820  87    38
2 week        11 Jun-17 Jun    72.75   11.87   -0.1560   0.1632  94    46
3 week        18 Jun-24 Jun    78.93   10.79   -0.9558   0.1367  95    50
4 week        25 Jun-1 Jul     84.49   6.26    -1.0339   0.0740  95    63
5 week        2 Jul-8 Jul      86.64   4.85    -0.4663   0.0560  96    75
6 week        9 Jul-15 Jul     87.71   6.23    -1.3950   0.0710  96    63
7 week        16 Jul-22 Jul    89.65   4.00    -0.4663   0.0446  95    80
8 week        23 Jul-29 Jul    90.20   5.05    -1.7081   0.0560  97    69
9 week        30 Jul-5 Aug     90.67   3.20    -0.1685   0.0353  96    84
10 week       6 Aug-12 Aug     90.64   3.59    -0.48305  0.0396  98    81
11 week       13 Aug-19 Aug    90.96   3.39    -0.2252   0.0373  98    83
12 week       20 Aug-26 Aug    91.65   3.10    -0.6682   0.0338  97    83
13 week       27 Aug-2 Sep     90.89   3.65    -0.6405   0.0402  97    79
14 week       3 Sep-9 Sep      91.60   3.29    -0.7568   0.0359  98    80
15 week       10 Sep-16 Sep    91.61   2.97    -0.2493   0.0325  96    85
16 week       17 Sep-23 Sep    90.44   3.25    -0.6645   0.0359  96    80
17 week       24 Sep-30 Sep    88.43   5.23    -1.4181   0.0591  95    69

Figure 3.7 Mean, standard deviation and range of weekly average Relative Humidity at 7 AM.

Relative Humidity at 2 PM (%)

The summary of statistics for Relative Humidity at 2 PM for the different study periods is presented in table 3.5 along with mean, standard deviation, skewness, coefficient of variation, maximum and minimum values. The seasonal mean Relative Humidity at 2 PM over the 50 years was 65.70 %, and the weekly mean varied from 39.98 % in the first week of June to 74.45 % in the eleventh week, that is, the third week of August. The coefficient of variation of the seasonal average Relative Humidity at 2 PM was 0.0537, while the maximum coefficient of variation for the weekly data, 0.3903, was observed in the first week of June, indicating a relatively high fluctuation in the Relative Humidity at 2 PM data set.
The skewness of the seasonal data was -0.4778, and the weekly skewness ranged from -1.0936 in the fourth week of June to 0.3368 in the fourth week of August, indicating the degree of asymmetry of the distribution around its mean. The maximum value of the seasonal average Relative Humidity at 2 PM was 92 %, in the year 1988, and the weekly maximum lay between 72 % in the first week of June (1962) and 92 % in the second week of August (1988). It was further observed that the minimum value of the seasonal average Relative Humidity at 2 PM was 16 %, in the two years 1965 and 2005. The weekly minimum lay between 16 % in the second week of June (1965 and 2005) and 62 % in the fourth week of August (1973). The standard deviation of the seasonal average Relative Humidity at 2 PM for the 50 years was 13.60 %, whereas the weekly standard deviation ranged from 5.01 % in the twelfth week, that is, the fourth week of August, to 16.47 % in the second week of June. The graphical representation of the weekly average Relative Humidity at 2 PM is shown in figure 3.9, and the weekly Relative Humidity at 2 PM statistics for all 850 weeks of the 50 years are presented in figure 3.10.

Table 3.5. Summary of statistics for Relative Humidity at 2 PM (%).
Study Period  Data (From-To)   Mean    SD      Skewness  CV      Max    Min
Seasonal      4 Jun-30 Sep     65.70   13.60   -0.4778   0.0537  92     16
1 week        4 Jun-10 Jun     39.98   15.60   0.3116    0.3903  72     17
2 week        11 Jun-17 Jun    47.15   16.47   0.1156    0.3494  80     16
3 week        18 Jun-24 Jun    55.70   13.92   -0.7909   0.2499  79     20
4 week        25 Jun-1 Jul     63.29   10.33   -1.0936   0.1632  82     24
5 week        2 Jul-8 Jul      65.18   10.44   -0.3999   0.1602  88     35
6 week        9 Jul-15 Jul     70.70   8.91    -0.4001   0.1260  85     50
7 week        16 Jul-22 Jul    72.09   6.77    0.2866    0.0938  85     61
8 week        23 Jul-29 Jul    73.34   7.36    -0.1840   0.1004  88     56.6
9 week        30 Jul-5 Aug     73.13   6.43    -1.0567   0.0879  85     50
10 week       6 Aug-12 Aug     72.63   7.10    0.1527    0.0978  92     59
11 week       13 Aug-19 Aug    74.45   6.21    0.1023    0.0834  89     61
12 week       20 Aug-26 Aug    72.97   5.01    0.3368    0.0686  87     62
13 week       27 Aug-2 Sep     70.54   7.22    -0.3006   0.1024  83     53
14 week       3 Sep-9 Sep      70.26   7.86    -0.2773   0.1118  85.6   50
15 week       10 Sep-16 Sep    69.85   8.03    0.1521    0.1150  89     54
16 week       17 Sep-23 Sep    65.63   8.02    -0.5267   0.1221  81     42
17 week       24 Sep-30 Sep    60.01   9.66    -0.4532   0.1610  78     36

Figure 3.9 Mean, standard deviation and range of weekly average Relative Humidity at 2 PM.

Pan Evaporation (mm)

The summary of statistics for Pan Evaporation is presented in table 3.6 along with mean, standard deviation, skewness, coefficient of variation, maximum and minimum values. The seasonal mean Pan Evaporation over the 50 years was 5.31 mm, and the weekly mean varied from 3.83 mm in the third week of September to 10.59 mm in the first week of June. The maximum value of the seasonal average Pan Evaporation was 18.50 mm, in the year 1967, and the weekly maximum lay between 5.20 mm in the third week of September (1996) and 18.50 mm in the first week of June (1967). It was also observed that the minimum value of the seasonal average Pan Evaporation was zero.
The weekly minimum Pan Evaporation lay between zero mm, in the second and fourth weeks of July (both in 1976) and the second week of August (1972), and 4.20 mm in the first week of June (1971). The standard deviation of the seasonal average Pan Evaporation for the 50 years was 2.67 mm, with the weekly standard deviation ranging from 0.76 mm in the last week of September to 2.94 mm in the second week of June. The maximum coefficient of variation for the weekly data, 0.4402, was observed in the fourth week of July, indicating fluctuation in the Pan Evaporation data set. The skewness of the seasonal data was -0.4380, and the weekly skewness ranged from -0.0483 in the last week of September to 0.9927 in the third week of August, indicating the degree of asymmetry of the distribution around its mean. The graphical representation of the weekly average Pan Evaporation is shown in figure 3.11, and the weekly Pan Evaporation statistics for all 850 weeks of the 50 years are presented in figure 3.12.

Table 3.6. Summary of statistics for Pan Evaporation (mm).
Study Period  Data (From-To)   Mean    SD     Skewness  CV      Max     Min
Seasonal      4 Jun-30 Sep     5.31    2.67   -0.4380   0.1646  18.50   0.00
1 week        4 Jun-10 Jun     10.59   2.78   0.1739    0.2623  18.50   4.20
2 week        11 Jun-17 Jun    9.00    2.94   0.5974    0.3264  17.20   2.90
3 week        18 Jun-24 Jun    8.19    2.67   0.7330    0.3259  15.90   3.60
4 week        25 Jun-1 Jul     6.26    1.83   0.7244    0.2921  12.10   2.10
5 week        2 Jul-8 Jul      5.89    2.17   0.4327    0.3676  11.40   1.40
6 week        9 Jul-15 Jul     5.29    1.92   -0.2450   0.3633  9.50    0.00
7 week        16 Jul-22 Jul    4.88    1.75   0.7634    0.3585  10.20   1.50
8 week        23 Jul-29 Jul    4.12    1.82   0.3147    0.4402  9.80    0.00
9 week        30 Jul-5 Aug     4.15    1.23   -0.2453   0.2957  6.50    1.60
10 week       6 Aug-12 Aug     4.04    1.45   -0.6027   0.3573  6.70    0.00
11 week       13 Aug-19 Aug    4.14    1.49   0.9927    0.3608  9.50    1.60
12 week       20 Aug-26 Aug    4.04    1.20   -0.0942   0.2958  6.40    1.50
13 week       27 Aug-2 Sep     4.07    1.26   0.2181    0.3087  7.60    1.80
14 week       3 Sep-9 Sep      3.97    1.04   -0.2653   0.2608  6.30    1.60
15 week       10 Sep-16 Sep    3.88    0.91   -0.9353   0.2332  5.50    1.20
16 week       17 Sep-23 Sep    3.83    0.83   -0.3608   0.2163  5.20    1.90
17 week       24 Sep-30 Sep    3.85    0.76   -0.0483   0.1978  5.60    2.30

Figure 3.11 Mean, standard deviation and range of weekly average Pan Evaporation.

Bright Sunshine (hours)

The summary of Bright Sunshine for the different study periods is presented in table 3.7 along with mean, standard deviation, skewness, coefficient of variation, maximum and minimum values. The seasonal mean Bright Sunshine over the 50 years was 6.38 hours, and the weekly mean varied from 5.30 hours in the fourth week of July to 8.60 hours in the first week of June. The maximum value of the seasonal average Bright Sunshine was 11.60 hours, in the two years 1986 and 2009, and the weekly maximum lay between 8.80 hours in the third week of August (1965) and 11.60 hours in the first and third weeks of June (1986 and 2009, respectively).
It was also observed that the minimum value of the seasonal average Bright Sunshine was 0.70 hours, in the year 2009. The weekly minimum lay between 0.70 hours in the third week of August (2009) and 4.20 hours in the first week of June (2000). The standard deviation of the seasonal average Bright Sunshine for the 50 years was 2.16 hours, with the weekly standard deviation ranging from 1.72 hours in the fourth week of July to 2.21 hours in the third week of June. The maximum coefficient of variation for the weekly data, 0.3711, was observed in the second week of August, indicating a reasonable fluctuation in the Bright Sunshine data set. The skewness of the seasonal data was 0.1935, and the weekly skewness ranged from -1.2914 in the second week of June to 0.4982 in the second week of August, indicating the degree of asymmetry of the distribution around its mean. The graphical representation of the weekly average Bright Sunshine is shown in figure 3.13, and the weekly Bright Sunshine statistics for all 850 weeks of the 50 years are presented in figure 3.14.

Table 3.7. Summary of statistics for Bright Sunshine (hours).
Study period | Data (from–to) | Mean | Std. dev. | Skewness | Coeff. of variation | Maximum | Minimum
Seasonal | 4 June–30 Sep | 6.38 | 2.16 | 0.1935 | 0.1060 | 11.60 | 0.70
1 week | 4 June–10 June | 8.60 | 1.80 | -0.5602 | 0.2092 | 11.60 | 4.20
2 week | 11 June–17 June | 7.79 | 1.97 | -1.2914 | 0.2528 | 11.30 | 1.30
3 week | 18 June–24 June | 7.12 | 2.21 | -0.4005 | 0.3107 | 11.60 | 1.70
4 week | 25 June–1 July | 6.19 | 1.74 | 0.0237 | 0.2812 | 10.00 | 2.40
5 week | 2 July–8 July | 6.15 | 1.92 | -0.4944 | 0.3126 | 9.30 | 1.70
6 week | 9 July–15 July | 5.66 | 2.10 | -0.2721 | 0.3705 | 9.70 | 0.90
7 week | 16 July–22 July | 5.54 | 1.77 | -0.3552 | 0.3197 | 8.90 | 1.40
8 week | 23 July–29 July | 5.30 | 1.72 | 0.2553 | 0.3255 | 9.10 | 1.20
9 week | 30 July–5 Aug | 5.66 | 1.80 | -0.1008 | 0.3186 | 9.30 | 2.00
10 week | 6 Aug–12 Aug | 5.56 | 2.07 | 0.4982 | 0.3711 | 11.30 | 0.90
11 week | 13 Aug–19 Aug | 5.41 | 1.73 | -0.6706 | 0.3192 | 8.80 | 0.70
12 week | 20 Aug–26 Aug | 5.52 | 1.89 | 0.1980 | 0.3429 | 9.30 | 1.60
13 week | 27 Aug–2 Sep | 6.02 | 2.02 | 0.0019 | 0.3360 | 9.90 | 1.60
14 week | 3 Sep–9 Sep | 6.03 | 2.20 | -0.1090 | 0.3644 | 9.50 | 2.40
15 week | 10 Sep–16 Sep | 6.49 | 2.06 | -0.1149 | 0.3166 | 10.70 | 2.60
16 week | 17 Sep–23 Sep | 7.31 | 1.90 | -0.4497 | 0.2601 | 10.60 | 2.30
17 week | 24 Sep–30 Sep | 8.02 | 1.87 | -0.9273 | 0.2328 | 10.30 | 3.20

Figure 3.13 Mean, standard deviation and range of weekly average Bright Sunshine.

3.3 Methodology
The weather parameter data were analyzed to identify the best fit probability distribution for each period of study. Three statistical goodness-of-fit tests were carried out in order to select the best fit probability distribution, on the basis of the highest rank with the minimum value of the test statistic. The appropriate probability distributions were identified for the different data sets using the maximum overall score, based on the sum of the individual point scores obtained from the three selected goodness-of-fit tests. Random numbers were then generated for the actual and estimated weekly weather parameters for each period of study using the parameters of the selected distributions.
3.3.1 Fitting the probability distribution
The probability distributions, viz.
normal, lognormal, gamma, Weibull, Pearson and generalized extreme value, were fitted to the data to evaluate the best fit probability distribution for the weather parameters. In addition, the different forms of these distributions were also tried, so that in total 16 probability distributions, viz. normal, lognormal (2P, 3P), gamma (2P, 3P), generalized gamma (3P, 4P), log-gamma, Weibull (2P, 3P), Pearson 5 (2P, 3P), Pearson 6 (3P, 4P), log-Pearson 3 and generalized extreme value, were applied to find the best fit probability distribution. The description of the various probability distribution functions, viz. the density function, range and parameters involved, is presented in table 3.8.
3.3.2 Testing the goodness of fit
A goodness-of-fit test measures the compatibility of a random sample with a theoretical probability distribution. The goodness-of-fit tests are applied for testing the following hypotheses:
H0: the weather parameter data follow the specified distribution.
Ha: the weather parameter data do not follow the specified distribution.
Table 3.8. Description of various probability distribution functions.
Gamma (3P): $f(x)=\dfrac{(x-\gamma)^{\alpha-1}}{\beta^{\alpha}\Gamma(\alpha)}\exp\!\left(-\dfrac{x-\gamma}{\beta}\right)$; range $\gamma<x<+\infty$; parameters: $\alpha$ shape ($\alpha>0$), $\beta$ scale ($\beta>0$), $\gamma$ location ($\gamma=0$ yields the two-parameter gamma distribution), $\Gamma(\cdot)$ the gamma function.

Gamma (2P): $f(x)=\dfrac{x^{\alpha-1}}{\beta^{\alpha}\Gamma(\alpha)}\exp(-x/\beta)$; range $0\le x<+\infty$; parameters as above with $\gamma=0$.

Generalized Extreme Value: $f(x)=\dfrac{1}{\sigma}\exp\!\left(-(1+kz)^{-1/k}\right)(1+kz)^{-1-1/k}$ for $k\neq 0$ and $f(x)=\dfrac{1}{\sigma}\exp\!\left(-z-e^{-z}\right)$ for $k=0$, where $z=(x-\mu)/\sigma$; range $1+kz>0$ for $k\neq0$, $-\infty<x<+\infty$ for $k=0$; parameters: $k$ shape, $\sigma$ scale ($\sigma>0$), $\mu$ location.

Generalized Gamma (4P): $f(x)=\dfrac{k(x-\gamma)^{k\alpha-1}}{\beta^{k\alpha}\Gamma(\alpha)}\exp\!\left(-\left(\dfrac{x-\gamma}{\beta}\right)^{k}\right)$; range $\gamma\le x<+\infty$; parameters: $k$, $\alpha$ shape ($>0$), $\beta$ scale ($>0$), $\gamma$ location ($\gamma=0$ yields the three-parameter generalized gamma distribution).

Generalized Gamma (3P): $f(x)=\dfrac{k\,x^{k\alpha-1}}{\beta^{k\alpha}\Gamma(\alpha)}\exp\!\left(-(x/\beta)^{k}\right)$; range $0\le x<+\infty$.

Log-Gamma: $f(x)=\dfrac{(\ln x)^{\alpha-1}}{x\,\beta^{\alpha}\Gamma(\alpha)}\exp\!\left(-\dfrac{\ln x}{\beta}\right)$; range $0<x<+\infty$; parameters: $\alpha$ shape ($>0$), $\beta$ scale ($>0$).

Lognormal (3P): $f(x)=\dfrac{1}{(x-\gamma)\sigma\sqrt{2\pi}}\exp\!\left(-\dfrac{1}{2}\left(\dfrac{\ln(x-\gamma)-\mu}{\sigma}\right)^{2}\right)$; range $\gamma<x<+\infty$; parameters: $\sigma$ shape ($>0$), $\mu$ scale, $\gamma$ location ($\gamma=0$ yields the two-parameter lognormal distribution).

Lognormal (2P): $f(x)=\dfrac{1}{x\sigma\sqrt{2\pi}}\exp\!\left(-\dfrac{1}{2}\left(\dfrac{\ln x-\mu}{\sigma}\right)^{2}\right)$; range $0<x<+\infty$.

Log-Pearson 3: $f(x)=\dfrac{1}{x\,|\beta|\,\Gamma(\alpha)}\left(\dfrac{\ln x-\gamma}{\beta}\right)^{\alpha-1}\exp\!\left(-\dfrac{\ln x-\gamma}{\beta}\right)$; range $0<x\le e^{\gamma}$ for $\beta<0$, $e^{\gamma}\le x<+\infty$ for $\beta>0$; parameters: $\alpha$ shape ($>0$), $\beta$ scale, $\gamma$ location.

Normal: $f(x)=\dfrac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\dfrac{1}{2}\left(\dfrac{x-\mu}{\sigma}\right)^{2}\right)$; range $-\infty<x<+\infty$; parameters: $\mu$ mean, $\sigma$ standard deviation ($\sigma>0$).

Pearson 5 (3P): $f(x)=\dfrac{\exp\!\left(-\beta/(x-\gamma)\right)}{\beta\,\Gamma(\alpha)\left((x-\gamma)/\beta\right)^{\alpha+1}}$; range $\gamma<x<+\infty$; parameters: $\alpha$ shape ($>0$), $\beta$ scale ($>0$), $\gamma$ location ($\gamma=0$ yields the two-parameter Pearson 5 distribution).

Pearson 5 (2P): as above with $\gamma=0$; range $0<x<+\infty$.

Pearson 6 (4P): $f(x)=\dfrac{\left((x-\gamma)/\beta\right)^{\alpha_{1}-1}}{\beta\,B(\alpha_{1},\alpha_{2})\left(1+(x-\gamma)/\beta\right)^{\alpha_{1}+\alpha_{2}}}$; range $\gamma\le x<+\infty$; parameters: $\alpha_{1}$, $\alpha_{2}$ shape ($>0$), $\beta$ scale ($>0$), $\gamma$ location ($\gamma=0$ yields the three-parameter Pearson 6 distribution), $B(\cdot,\cdot)$ the beta function.

Pearson 6 (3P): as above with $\gamma=0$; range $0\le x<+\infty$.

Weibull (3P): $P(x)=1-\exp\!\left(-\left(\dfrac{x-\gamma}{\beta}\right)^{\alpha}\right)$; range $\gamma\le x<+\infty$; parameters: $\alpha$ shape ($>0$), $\beta$ scale ($>0$), $\gamma$ location ($\gamma=0$ yields the two-parameter Weibull distribution).

Weibull (2P): $P(x)=1-\exp\!\left(-(x/\beta)^{\alpha}\right)$; range $0\le x<+\infty$.

The following goodness-of-fit tests, viz. the
Kolmogorov-Smirnov and Anderson-Darling tests, were used along with the chi-square test at the 0.01 level of significance for the selection of the best fit probability distribution. The test statistics are explained briefly below.

Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov statistic ($D$) is defined as the largest vertical difference between the theoretical cumulative distribution function $F$ and the empirical cumulative distribution function (ECDF):

$D=\max_{1\le i\le n}\left(F(x_{i})-\dfrac{i-1}{n},\ \dfrac{i}{n}-F(x_{i})\right)$   (1)

where $x_{i}$, $i=1,2,\ldots,n$, is the ordered random sample and the ECDF is

$F_{n}(x)=\dfrac{1}{n}\,\left[\text{number of observations}\le x\right]$.   (2)

This test is used to decide whether a sample comes from a hypothesized continuous distribution.

Anderson-Darling Test
The Anderson-Darling statistic ($A^{2}$) is defined as

$A^{2}=-n-\dfrac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(X_{i})+\ln\left(1-F(X_{n-i+1})\right)\right]$.   (3)

It compares the fit of an observed cumulative distribution function to an expected cumulative distribution function, and gives more weight to the tails than the Kolmogorov-Smirnov test.

Chi-Squared Test
The Chi-Squared statistic is defined as

$\chi^{2}=\sum_{i=1}^{k}\dfrac{(O_{i}-E_{i})^{2}}{E_{i}}$   (4)

where $O_{i}$ is the observed frequency and $E_{i}$ the expected frequency for bin $i$ ($i=1,2,\ldots,k$). $E_{i}$ is calculated as

$E_{i}=F(x_{2})-F(x_{1})$   (5)

where $F$ is the CDF of the probability distribution being tested and $x_{1}$, $x_{2}$ are the limits of bin $i$. The number of bins $k$ is computed from

$k=1+\log_{2}n$   (6)

where $n$ is the sample size. This test is for continuous sample data only and is used to determine whether a sample comes from a population with a specific distribution.

3.3.3 Identification of best fit probability distribution
The three goodness-of-fit tests mentioned above were applied to the weather parameter data, treating each data set separately. The test statistic of each test was computed and tested at the α = 0.01 level of significance.
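The three statistics of equations (1)–(6) can be computed directly. The following is a minimal sketch (not the thesis's EasyFit workflow) using SciPy; the gamma sample and the candidate distribution are illustrative assumptions, and equal-probability bins are used so that every expected frequency in the chi-square test equals n/k.

```python
# Sketch: goodness-of-fit statistics of eqs. (1)-(6) with SciPy.
# Sample and candidate distribution are illustrative, not thesis data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=3.0, size=50)   # e.g. 50 yearly values
n = len(sample)

# Fit a candidate distribution: two-parameter gamma (location fixed at 0).
a, loc, scale = stats.gamma.fit(sample, floc=0)
dist = stats.gamma(a, loc=loc, scale=scale)

# Kolmogorov-Smirnov statistic D, eq. (1), on the ordered sample.
x = np.sort(sample)
cdf = dist.cdf(x)
i = np.arange(1, n + 1)
D = max(np.max(cdf - (i - 1) / n), np.max(i / n - cdf))

# Anderson-Darling statistic A^2, eq. (3).
A2 = -n - np.sum((2 * i - 1) * (np.log(cdf) + np.log(1 - cdf[::-1]))) / n

# Chi-square statistic, eqs. (4)-(6): k = 1 + log2(n) bins of equal
# probability under the fitted CDF, so each expected frequency is n/k.
k = int(np.ceil(1 + np.log2(n)))
edges = dist.ppf(np.linspace(0, 1, k + 1)[1:-1])    # k-1 interior bin edges
observed = np.bincount(np.searchsorted(edges, sample), minlength=k)
chi2 = np.sum((observed - n / k) ** 2 / (n / k))
```

The smallest values of D, A² and χ² over the candidate distributions then determine the rank-1 distribution for each test.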
Accordingly, the different probability distributions were ranked from 1 to 16 on the basis of the minimum test statistic value, and the distribution holding the first rank was selected for each of the three tests independently. All the probability distributions were then assessed on the basis of the total score obtained by combining the three tests. A maximum score of 16 was awarded to the probability distribution ranked first on the test statistic, with progressively smaller scores awarded to the distributions ranked 2 to 16. The scores over the three tests were then summed to identify a fourth distribution on the basis of the highest total score obtained. The probability distribution having the maximum score was included as a fourth probability distribution, in addition to the three probability distributions identified previously. On the basis of the four identified probability distributions, the procedure for obtaining the best fitted probability distribution is as follows.

Generating random numbers
The four probability distributions identified for each data set were used to select the best probability distribution. The parameters of these four probability distributions were used to generate the random numbers.

Least square method
The least square method was used to identify the best fit probability distribution. Random numbers were generated from each distribution and the residual sum ($R$) was computed over the observations of the data set:

$R=\sum_{i=1}^{n}\left(Y_{i}-\hat{Y}_{i}\right)^{2}$   (7)

where $Y_{i}$ is the actual observation and $\hat{Y}_{i}$ the estimated observation ($i=1,2,\ldots,n$). The distribution having the minimum sum of residuals was considered to be the best fit probability distribution for that particular data set. Finally, the best fit probability distributions for the weather parameters were obtained for the different sets of data, and the best fit distribution for each set of data was identified.
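The rank-score-select procedure above can be sketched as follows. This is a simplified illustration, not the thesis's implementation: only four candidate distributions are ranked, a single Kolmogorov-Smirnov ranking stands in for the three tests, and the distribution quantiles at plotting positions i/(n+1) stand in for the simulated random numbers of the least-squares step.

```python
# Sketch: rank candidate distributions by a goodness-of-fit statistic,
# score them (rank 1 gets the highest score), then pick the distribution
# with the smallest residual sum of eq. (7).  Candidates are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = np.sort(rng.gamma(2.0, 3.0, size=50))
candidates = {
    "gamma": stats.gamma,
    "lognorm": stats.lognorm,
    "weibull_min": stats.weibull_min,
    "genextreme": stats.genextreme,
}

# Fit each candidate and compute its Kolmogorov-Smirnov statistic.
fitted, ks_stat = {}, {}
for name, dist in candidates.items():
    fitted[name] = dist(*dist.fit(data))
    ks_stat[name] = stats.kstest(data, fitted[name].cdf).statistic

# Rank 1 = smallest statistic; award the highest score to rank 1.
order = sorted(ks_stat, key=ks_stat.get)
score = {name: len(candidates) - r for r, name in enumerate(order)}

# Least-squares step, eq. (7): distribution quantiles at the plotting
# positions i/(n+1) stand in for the simulated random numbers.
n = len(data)
p = np.arange(1, n + 1) / (n + 1)
residual = {name: np.sum((data - f.ppf(p)) ** 2) for name, f in fitted.items()}
best = min(residual, key=residual.get)
```

In the thesis this scoring is repeated for D, A² and χ² over 16 distributions and the three scores are summed before the least-squares comparison.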
The convergence and performance of the best fit probability distributions were evaluated with the EasyFit 5.5 software.

3.4 Probability Distribution
3.4.1 Introduction
The methodology presented above is applied to the 50 years of weather data, classified into 18 data sets: 1 seasonal and 17 weekly, so as to study the distribution pattern at different levels. The test statistics D, A² and χ² are computed for each data set, and the combined total test score is obtained for each data set over all probability distributions. The distribution is identified using the maximum overall score, based on the sum of the individual point scores obtained from the three selected goodness-of-fit tests. The distributions having the highest score and the best fit are listed in tabular form, together with the parameters of the identified distributions for each data set of weather parameters. These parameter values are used to generate random numbers for each data set, and the least square method is applied in the analysis. The residuals are computed for each data set of weather parameters, and the sum of the deviations is obtained for every identified distribution. The probability distribution having the minimum deviation is treated as the best selected probability distribution for the individual data set. The results for the individual parameters are discussed below.

3.4.2 Rainfall
The test statistics D, A² and χ² are computed for 16 probability distributions for each rainfall data set. The probability distributions holding the first rank, along with their test statistics, are presented in table 3.9(a). It is observed that for seasonal rainfall the Pearson 5 (3P) distribution is ranked first by the Kolmogorov-Smirnov and Anderson-Darling tests, while the Normal distribution is ranked first by the Chi-square test.
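Generating random numbers from an identified distribution, as done for every data set before the least-squares comparison, can be sketched as below. The parameter values are illustrative and not taken from the tables; note that SciPy's `genextreme` uses a shape parameter c = −k relative to the k in the GEV density of table 3.8.

```python
# Sketch: simulate values from a fitted Generalized Extreme Value
# distribution and evaluate the residual sum of eq. (7) against an
# observed sample.  All numbers here are illustrative assumptions.
import numpy as np
from scipy import stats

k, sigma, mu = 0.40, 22.5, 13.5                      # illustrative GEV fit
gev = stats.genextreme(c=-k, loc=mu, scale=sigma)    # SciPy convention: c = -k

rng = np.random.default_rng(42)
simulated = np.sort(gev.rvs(size=50, random_state=rng))

# Synthetic "observed" weekly values; in the thesis these are the data.
observed = np.sort(rng.gamma(2.0, 12.0, size=50))
R = float(np.sum((observed - simulated) ** 2))        # eq. (7)
```

Repeating this for each of the four identified distributions and keeping the minimum R reproduces the selection rule of section 3.3.3.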
Thus these probability distributions are identified as the best fits on the basis of the three tests independently. The fourth probability distribution, identified as having the highest score, is presented in table 3.9(b) together with the scores; distributions sharing the same highest score are also included among the selected probability distributions. For the fourth week data set, that is, the last week of June, the Generalized Extreme Value and Lognormal (3P) distributions are fitted on the basis of the highest score. For the ninth week data set, that is, the first week of August, the Weibull (2P, 3P) distributions are fitted with 36 as the highest score, and for the thirteenth week, that is, the last week of August, the Gamma (2P) and Generalized Extreme Value distributions are selected with 38 as the highest score. It is observed that the Generalized Extreme Value distribution is fitted in more than 50% of the weeks. The identified distributions are listed in table 3.9(c), where the parameters of the identified distributions for each data set are given. These parameter values are used to generate random numbers for each data set, and the least square method is used for selecting the best fit probability distribution. The probability distribution having the minimum deviation is treated as the best selected probability distribution for the individual rainfall data set, as presented in table 3.9(d). The Normal distribution is the best fitted distribution for seasonal rainfall and is also observed in the sixth week data set, that is, the second week of July. The Generalized Extreme Value distribution is observed six times in the weekly data sets, namely in the first, second, tenth, twelfth, thirteenth and fifteenth weeks, that is, the first week of June, the second weeks of June, August and September, and the last two weeks of August, indicating the highest contribution of this distribution.
Further, we observe that the Gamma (3P), Log-Pearson 3, Pearson 6 (3P) and Lognormal (3P) distributions are also found as best fitted probability distributions for the weekly rainfall data sets.

Table 3.9(a). Distributions fitted for Rainfall data sets (distribution ranking first under each test, with its test statistic).
Study period | Kolmogorov-Smirnov | Anderson-Darling | Chi-square
Seasonal | Pearson 5 (3P), 0.0681 | Pearson 5 (3P), 0.2976 | Normal, 1.3956
1 Week | Gen. Extreme Value, 0.1762 | Gen. Extreme Value, 2.1837 | Gen. Extreme Value, 8.3819
2 Week | Gen. Extreme Value, 0.1371 | Gen. Extreme Value, 1.1632 | Gen. Extreme Value, 2.9090
3 Week | Gen. Extreme Value, 0.0717 | Gen. Extreme Value, 0.3888 | Pearson 6 (3P), 2.5659
4 Week | Gen. Extreme Value, 0.0735 | Gen. Extreme Value, 0.3163 | Lognormal (3P), 0.8179
5 Week | Pearson 6 (3P), 0.0912 | Pearson 6 (3P), 0.5049 | Gen. Gamma (4P), 0.4317
6 Week | Gen. Gamma (4P), 0.0777 | Gen. Extreme Value, 0.4326 | Normal, 2.5651
7 Week | Gen. Extreme Value, 0.0931 | Gen. Extreme Value, 0.4959 | Gen. Gamma (3P), 0.7131
8 Week | Lognormal (3P), 0.0702 | Lognormal (3P), 0.3222 | Lognormal (3P), 1.0616
9 Week | Weibull (2P), 0.0719 | Log-Pearson 3, 0.2836 | Lognormal (3P), 0.9351
10 Week | Lognormal (3P), 0.0584 | Gen. Extreme Value, 0.1731 | Gen. Extreme Value, 0.9746
11 Week | Gamma (2P), 0.0689 | Log-Pearson 3, 0.2110 | Log-Pearson 3, 0.0377
12 Week | Gen. Extreme Value, 0.0792 | Gen. Extreme Value, 0.4724 | Gen. Extreme Value, 1.9015
13 Week | Gamma (2P), 0.1048 | Gen. Extreme Value, 0.8361 | Gamma (2P), 2.1996
14 Week | Gen. Extreme Value, 0.0966 | Gen. Extreme Value, 0.6122 | Gen. Extreme Value, 1.7454
15 Week | Weibull (3P), 0.0852 | Pearson 6 (3P), 0.5338 | Lognormal (3P), 1.0211
16 Week | Gamma (3P), 0.1337 | Pearson 6 (3P), 1.1343 | Pearson 6 (3P), 2.4038
17 Week | Gamma (3P), 0.1596 | Pearson 6 (3P), 2.4205 | Gen. Extreme Value, 3.9689

Table 3.9(b). Distributions with highest score for Rainfall data sets.
Study period | Distribution(s) with highest score | Score
Seasonal | Pearson 5 (3P) | 42
1 Week | Gen. Extreme Value | 42
2 Week | Gen. Extreme Value | 43
3 Week | Gen. Extreme Value | 39
4 Week | Gen. Extreme Value and Lognormal (3P) | 39
5 Week | Gamma (2P) | 34
6 Week | Gen. Extreme Value | 38
7 Week | Gen. Extreme Value | 41
8 Week | Lognormal (3P) | 42
9 Week | Weibull (2P) and Weibull (3P) | 36
10 Week | Gen. Extreme Value | 41
11 Week | Gamma (2P) | 43
12 Week | Gen. Extreme Value | 39
13 Week | Gamma (2P) and Gen. Extreme Value | 38
14 Week | Gen. Extreme Value | 40
15 Week | Pearson 6 (3P) | 38
16 Week | Pearson 6 (3P) | 39
17 Week | Gen. Extreme Value | 35

Table 3.9(c). Parameters of the distributions fitted for Rainfall data sets.
Seasonal: Normal (σ=29.44, μ=73.195); Pearson 5 (3P) (α=112.06, β=33906.0, γ=-232.11)
1 week: Gen. Extreme Value (k=0.57245, σ=12.389, μ=5.8622)
2 week: Gen. Extreme Value (k=0.40315, σ=22.496, μ=13.506)
3 week: Gen. Extreme Value (k=0.31177, σ=24.99, μ=20.906); Pearson 6 (3P) (α1=0.57131, α2=2.8403E+8, β=2.2429E+10)
4 week: Gen. Extreme Value (k=0.16122, σ=33.769, μ=31.333); Lognormal (3P) (σ=0.80884, μ=3.9156, γ=-10.454)
5 week: Gamma (2P) (α=0.75596, β=99.791); Gen. Gamma (4P) (k=1.5013, α=0.31028, β=197.88, γ=0.01); Pearson 6 (3P) (α1=0.63702, α2=4.6454E+7, β=5.5026E+9)
6 week: Gen. Extreme Value (k=0.10111, σ=62.632, μ=59.431); Gen. Gamma (4P) (k=3.1445, α=0.19166, β=270.22, γ=0.01); Normal (σ=88.365, μ=102.5)
7 week: Gen. Extreme Value (k=0.18365, σ=47.572, μ=58.857); Gen. Gamma (3P) (k=0.98854, α=1.4834, β=64.094)
8 week: Lognormal (3P) (σ=0.79731, μ=4.5807, γ=-21.17)
9 week: Log-Pearson 3 (α=2.7764, β=-0.64072, γ=5.6879); Lognormal (3P) (σ=0.73597, μ=4.2048, γ=-11.153); Weibull (2P) (α=1.0891, β=79.975); Weibull (3P) (α=1.2235, β=79.444, γ=0.33401)
10 week: Gen. Extreme Value (k=0.21148, σ=50.642, μ=51.512); Lognormal (3P) (σ=0.73608, μ=4.4708, γ=-19.338)
11 week: Gamma (2P) (α=1.2248, β=84.969); Log-Pearson 3 (α=12.356, β=-0.30187, γ=7.9246)
12 week: Gen. Extreme Value (k=0.1263, σ=58.585, μ=57.435)
13 week: Gamma (2P) (α=0.84718, β=89.184); Gen. Extreme Value (k=0.24491, σ=45.313, μ=35.09)
14 week: Gen. Extreme Value (k=0.2002, σ=49.471, μ=38.627)
15 week: Lognormal (3P) (σ=1.4308, μ=3.3643, γ=-1.496); Pearson 6 (3P) (α1=0.53587, α2=2.4525E+5, β=2.7661E+7); Weibull (3P) (α=0.64552, β=61.049, γ=0.01)
16 week: Gamma (3P) (α=0.23816, β=209.68, γ=0.01); Pearson 6 (3P) (α1=0.27517, α2=7.8369E+7, β=1.7728E+10)
17 week: Gamma (3P) (α=0.18131, β=139.08, γ=0.01); Gen. Extreme Value (k=0.56455, σ=16.239, μ=6.8411); Pearson 6 (3P) (α1=0.21422, α2=3.4782E+7, β=5.9395E+9)

Table 3.9(d). Best fit probability distribution for Rainfall.
Study period | Best fit
Seasonal | Normal
1 Week | Gen. Extreme Value
2 Week | Gen. Extreme Value
3 Week | Pearson 6 (3P)
4 Week | Lognormal (3P)
5 Week | Pearson 6 (3P)
6 Week | Normal
7 Week | Gen. Gamma (3P)
8 Week | Lognormal (3P)
9 Week | Log-Pearson 3
10 Week | Gen. Extreme Value
11 Week | Log-Pearson 3
12 Week | Gen. Extreme Value
13 Week | Gen. Extreme Value
14 Week | Gen. Extreme Value
15 Week | Pearson 6 (3P)
16 Week | Pearson 6 (3P)
17 Week | Gamma (3P)

3.4.3 Maximum Temperature
The test statistics D, A² and χ² are computed for 16 probability distributions for each maximum temperature data set. The probability distributions holding the first rank, along with their test statistics, are presented in table 3.10(a). It is observed that for seasonal maximum temperature the Weibull (2P) distribution is ranked first by the Kolmogorov-Smirnov and Chi-square tests, while the Log-Pearson 3 distribution is ranked first by the Anderson-Darling test.
Thus these probability distributions are identified as the best fits on the basis of the three tests independently. The fourth probability distribution, identified as having the highest score, is presented in table 3.10(b) with the scores. The Log-Pearson 3 distribution has the highest score of 43 for the seasonal maximum temperature data set, and also for the third and seventh weeks with scores of 44 and 47, respectively. Pearson 5 (3P) is observed consecutively in the fifth and sixth weeks with scores of 40 and 44, respectively, and also in the eleventh week with a score of 37. Similarly, Weibull (3P) is observed successively in the last two weeks and also in the fourteenth week, that is, the first week of September. The identified distributions are listed in table 3.10(c), where the parameters of the identified distributions for each data set are given. These parameter values are used to generate random numbers for each data set, and the least square method is used for selecting the best fit probability distribution. The probability distribution having the minimum deviation is treated as the best selected probability distribution for the individual maximum temperature data set, as presented in table 3.10(d).

Table 3.10(a). Distributions fitted for Maximum Temperature data sets (distribution ranking first under each test, with its test statistic).
Study period | Kolmogorov-Smirnov | Anderson-Darling | Chi-square
Seasonal | Weibull (2P), 0.0746 | Log-Pearson 3, 0.4322 | Weibull (2P), 2.2681
1 Week | Gen. Extreme Value, 0.0661 | Gen. Extreme Value, 0.1641 | Pearson 6 (4P), 0.3980
2 Week | Pearson 6 (4P), 0.0647 | Pearson 6 (4P), 0.3325 | Pearson 5 (2P), 0.2498
3 Week | Pearson 5 (3P), 0.0785 | Log-Pearson 3, 0.2532 | Log-Pearson 3, 1.4033
4 Week | Gen. Extreme Value, 0.1139 | Lognormal (3P), 1.1606 | Lognormal (3P), 4.2667
5 Week | Pearson 5 (3P), 0.0710 | Pearson 5 (3P), 0.1673 | Weibull (3P), 0.3935
6 Week | Gen. Extreme Value, 0.0500 | Pearson 5 (3P), 0.1330 | Log-Pearson 3, 0.7313
7 Week | Weibull (3P), 0.0611 | Log-Pearson 3, 0.1767 | Log-Pearson 3, 1.0778
8 Week | Pearson 5 (3P), 0.0667 | Gen. Extreme Value, 0.2656 | Gen. Extreme Value, 0.3337
9 Week | Pearson 6 (3P), 0.0871 | Gen. Gamma (3P), 0.2481 | Weibull (2P), 1.7466
10 Week | Weibull (3P), 0.0775 | Gen. Extreme Value, 0.3098 | Gen. Extreme Value, 1.6984
11 Week | Pearson 5 (3P), 0.0754 | Lognormal (3P), 0.4681 | Log-Pearson 3, 0.3302
12 Week | Weibull (2P), 0.0631 | Pearson 6 (4P), 0.3450 | Pearson 5 (3P), 2.3160
13 Week | Gen. Extreme Value, 0.0736 | Weibull (2P), 0.2783 | Weibull (2P), 2.7992
14 Week | Gen. Extreme Value, 0.0785 | Weibull (3P), 0.3401 | Weibull (3P), 3.8631
15 Week | Weibull (3P), 0.0557 | Log-Pearson 3, 0.1649 | Pearson 5 (2P), 1.3690
16 Week | Weibull (3P), 0.0691 | Weibull (3P), 0.2918 | Weibull (3P), 1.9102
17 Week | Weibull (3P), 0.0642 | Weibull (3P), 0.2485 | Weibull (3P), 1.4818

The Weibull (2P) distribution represents the best fitted probability distribution for seasonal maximum temperature and is also observed in the ninth and twelfth week data sets, that is, the first and fourth weeks of August, respectively. Further, Log-Pearson 3 is observed consecutively in the sixth and seventh week data sets, that is, the second and third weeks of July. Similarly, Weibull (3P) is observed successively in the sixteenth and seventeenth weeks, that is, the last two weeks of September, and also in the fifth and fourteenth weeks, that is, the first weeks of July and September, respectively. We further observe that the Generalized Gamma (3P), Generalized Extreme Value, Pearson 5 (2P, 3P) and Lognormal (3P) distributions are found as best fitted probability distributions for the weekly maximum temperature data sets.
Table 3.10(b). Distributions with highest score for Maximum Temperature data sets.
Study period | Distribution(s) with highest score | Score
Seasonal | Log-Pearson 3 | 43
1 Week | Gen. Extreme Value | 44
2 Week | Pearson 6 (4P) | 45
3 Week | Log-Pearson 3 | 44
4 Week | Lognormal (3P) | 44
5 Week | Pearson 5 (3P) | 40
6 Week | Pearson 5 (3P) | 44
7 Week | Log-Pearson 3 | 47
8 Week | Log-Gamma | 45
9 Week | Pearson 6 (4P) | 41
10 Week | Gen. Extreme Value | 39
11 Week | Pearson 5 (3P) | 37
12 Week | Pearson 6 (4P) | 36
13 Week | Weibull (2P) | 44
14 Week | Weibull (3P) | 45
15 Week | Gen. Gamma (3P) | 36
16 Week | Weibull (3P) | 47
17 Week | Weibull (3P) | 48

Table 3.10(c). Parameters of the distributions fitted for Maximum Temperature data sets.
Seasonal: Log-Pearson 3 (α=25.842, β=-0.00473, γ=3.6207); Weibull (2P) (α=50.1, β=33.395)
1 week: Gen. Extreme Value (k=-0.41111, σ=3.5854, μ=36.758); Pearson 6 (4P) (α1=1530.1, α2=2181.0, β=142.91, γ=-62.564)
2 week: Pearson 5 (2P) (α=87.078, β=3137.7); Pearson 6 (4P) (α1=1.5633E+6, α2=2.6044E+5, β=289.5, γ=-1701.3)
3 week: Log-Pearson 3 (α=100.04, β=0.00839, γ=2.7275); Pearson 5 (3P) (α=68.215, β=1634.5, γ=11.209)
4 week: Gen. Extreme Value (k=-0.35246, σ=2.3597, μ=33.096); Lognormal (3P) (σ=0.02174, μ=4.7508, γ=-81.87)
5 week: Pearson 5 (3P) (α=69.116, β=1040.8, γ=18.517); Weibull (3P) (α=2.597, β=5.1432, γ=29.223)
6 week: Gen. Extreme Value (k=-0.05566, σ=1.6168, μ=31.989); Log-Pearson 3 (α=7.0789, β=0.02172, γ=3.3361); Pearson 5 (3P) (α=21.315, β=171.78, γ=24.383)
7 week: Log-Pearson 3 (α=23.506, β=-0.00954, γ=3.6966); Weibull (3P) (α=5.0304, β=6.9459, γ=25.872)
8 week: Gen. Extreme Value (k=-0.28636, σ=1.4404, μ=31.595); Log-Gamma (α=6023.6, β=5.7572E-4); Pearson 5 (3P) (α=331.76, β=8638.6, γ=5.9613)
9 week: Gen. Gamma (3P) (k=1.0026, α=543.3, β=0.0603); Pearson 6 (3P) (α1=2984.7, α2=668.66, β=7.2093); Pearson 6 (4P) (α1=3429.0, α2=950.5, β=10.414, γ=-5.3796); Weibull (2P) (α=28.524, β=32.749)
10 week: Gen. Extreme Value (k=-0.10418, σ=1.113, μ=31.808); Weibull (3P) (α=2.4245, β=3.2317, γ=29.477)
11 week: Log-Pearson 3 (α=22.25, β=-0.00908, γ=3.6631); Lognormal (3P) (σ=0.03054, μ=3.792, γ=-12.46); Pearson 5 (3P) (α=319.73, β=7827.0, γ=7.3207)
12 week: Pearson 5 (3P) (α=308.23, β=6381.1, γ=11.114); Pearson 6 (4P) (α1=4004.5, α2=10022.0, β=157.47, γ=-31.041); Weibull (2P) (α=33.748, β=32.317)
13 week: Gen. Extreme Value (k=-0.53325, σ=1.4321, μ=31.809); Weibull (2P) (α=29.152, β=32.632)
14 week: Gen. Extreme Value (k=-0.60001, σ=1.873, μ=31.454); Weibull (3P) (α=20.417, β=29.794, γ=2.7487)
15 week: Gen. Gamma (3P) (k=1.0009, α=528.38, β=0.0605); Log-Pearson 3 (α=14.722, β=-0.01149, γ=3.6271); Pearson 5 (2P) (α=521.2, β=16533.0); Weibull (3P) (α=5.682, β=7.2696, γ=25.058)
16 week: Weibull (3P) (α=1.5522E+7, β=1.5144E+7, γ=-1.5144E+7)
17 week: Weibull (3P) (α=41.383, β=48.366, γ=-15.802)

Table 3.10(d). Best fit probability distribution for Maximum Temperature data sets.
Study period | Best fit
Seasonal | Weibull (2P)
1 Week | Pearson 6 (4P)
2 Week | Pearson 5 (2P)
3 Week | Pearson 5 (3P)
4 Week | Lognormal (3P)
5 Week | Weibull (3P)
6 Week | Log-Pearson 3
7 Week | Log-Pearson 3
8 Week | Pearson 5 (3P)
9 Week | Weibull (2P)
10 Week | Gen. Extreme Value
11 Week | Pearson 5 (3P)
12 Week | Weibull (2P)
13 Week | Gen. Extreme Value
14 Week | Weibull (3P)
15 Week | Gen. Gamma (3P)
16 Week | Weibull (3P)
17 Week | Weibull (3P)

3.4.4 Minimum Temperature
The test statistics D, A² and χ² are computed for 16 probability distributions for each minimum temperature data set. The probability distributions holding the first rank, along with their test statistics, are presented in table 3.11(a). It is observed that for seasonal minimum temperature the Lognormal (3P) distribution is ranked first by the Kolmogorov-Smirnov test, the Weibull (3P) distribution by the Anderson-Darling test, and the Gamma (3P) distribution by the Chi-square test. Thus these probability distributions are identified as the best fits on the basis of the three tests independently. The fourth probability distribution, identified as having the highest score, is presented in table 3.11(b) with the scores.
Distributions sharing the same highest score are also included among the selected probability distributions.

Table 3.11(a). Distributions fitted by the tests for Minimum Temperature data sets (distribution ranking first under each test, with its test statistic).
Study period | Kolmogorov-Smirnov | Anderson-Darling | Chi-square
Seasonal | Lognormal (3P), 0.0606 | Weibull (3P), 0.1507 | Gamma (3P), 0.2870
1 Week | Lognormal (3P), 0.0667 | Lognormal (3P), 0.2740 | Weibull (3P), 3.4178
2 Week | Weibull (2P), 0.0883 | Log-Pearson 3, 0.6013 | Weibull (2P), 2.0818
3 Week | Gamma (3P), 0.0785 | Gen. Extreme Value, 0.3980 | Weibull (2P), 1.6087
4 Week | Gen. Extreme Value, 0.1264 | Gen. Extreme Value, 1.7918 | Weibull (2P), 2.5028
5 Week | Gen. Extreme Value, 0.1562 | Weibull (3P), 1.4011 | Gamma (2P), 4.8657
6 Week | Gamma (3P), 0.0886 | Gen. Extreme Value, 0.3089 | Gen. Extreme Value, 3.6625
7 Week | Gen. Gamma (4P), 0.0712 | Gen. Extreme Value, 0.4068 | Pearson 6 (4P), 1.5668
8 Week | Gen. Extreme Value, 0.0576 | Gen. Extreme Value, 0.2106 | Gen. Extreme Value, 2.1897
9 Week | Normal, 0.0710 | Lognormal (2P), 0.3093 | Normal, 0.7627
10 Week | Gen. Extreme Value, 0.0751 | Normal, 0.2347 | Pearson 5 (3P), 0.4314
11 Week | Gen. Extreme Value, 0.1462 | Weibull (3P), 1.5743 | Weibull (3P), 5.8003
12 Week | Weibull (2P), 0.1372 | Weibull (2P), 1.1944 | Weibull (2P), 5.3154
13 Week | Weibull (2P), 0.1203 | Weibull (3P), 0.3749 | Pearson 5 (3P), 3.3573
14 Week | Weibull (2P), 0.0887 | Log-Pearson 3, 0.3438 | Gen. Extreme Value, 3.2287
15 Week | Weibull (3P), 0.0896 | Weibull (3P), 0.6710 | Gamma (2P), 1.2229
16 Week | Gen. Extreme Value, 0.0723 | Weibull (3P), 0.3678 | Pearson 5 (2P), 1.5721
17 Week | Weibull (3P), 0.0859 | Weibull (3P), 0.3871 | Weibull (2P), 2.6457

Table 3.11(b). Distributions with highest score for Minimum Temperature data sets.
Study period | Distribution(s) with highest score | Score
Seasonal | Lognormal (3P) | 36
1 Week | Lognormal (3P) and Pearson 6 (4P) | 42
2 Week | Log-Pearson 3 | 42
3 Week | Gen. Extreme Value and Gen. Gamma (4P) | 39
4 Week | Gen. Extreme Value | 43
5 Week | Weibull (3P) | 34
6 Week | Log-Gamma | 39
7 Week | Normal | 38
8 Week | Gen. Extreme Value | 46
9 Week | Normal | 39
10 Week | Gen. Extreme Value | 35
11 Week | Pearson 6 (4P) | 40
12 Week | Weibull (2P) | 45
13 Week | Weibull (3P) | 42
14 Week | Gen. Extreme Value | 43
15 Week | Gamma (2P) | 39
16 Week | Pearson 6 (4P) | 33
17 Week | Weibull (3P) | 46

The probability distributions fitted for the first week data set, that is, the first week of June, are the Lognormal (3P) and Pearson 6 (4P) distributions, with the highest score of 42, while for the third week of June the Generalized Extreme Value and Generalized Gamma (4P) distributions are selected with 39 as the highest score. The identified distributions are listed in table 3.11(c), where the parameters of the identified distributions for each data set are given. These parameter values are further utilized to generate random numbers for each data set, and the least square method is used for selecting the best fit probability distribution. The probability distribution having the minimum deviation is treated as the best selected probability distribution for the individual minimum temperature data set, as presented in table 3.11(d). The Weibull (3P) distribution represents the best fitted distribution for seasonal minimum temperature and is also observed in the fifteenth week data set, that is, the second week of September. Further, the Generalized Extreme Value distribution is observed repeatedly in the sixth, seventh and eighth weeks, that is, the second, third and fourth weeks of July, respectively, and also in the fourth week, that is, the last week of June. It was also observed that Weibull (2P) appears four times among the 17 weeks, namely in the second, twelfth, fourteenth and seventeenth weeks. Moreover, the Gamma (2P, 3P), Normal, Pearson 5 (2P, 3P) and Pearson 6 (4P) distributions are obtained as best fitted probability distributions for the weekly minimum temperature data sets. Table 3.11(c). Parameters of the distributions fitted for Minimum Temperature data sets.
Seasonal: Gamma (3P) (α=417.41, β=0.02319, γ=14.749); Lognormal (3P) (σ=0.02324, μ=2.9883, γ=4.5826); Weibull (3P) (α=9.1205, β=3.7283, γ=20.897)
1 week: Lognormal (3P) (σ=0.02953, μ=4.0735, γ=-34.706); Pearson 6 (4P) (α1=2.1298E+6, α2=4.1770E+5, β=201.05, γ=-1001.1); Weibull (3P) (α=4.9884, β=8.4329, γ=16.313)
2 week: Log-Pearson 3 (α=91.984, β=-0.00655, γ=3.8076); Weibull (2P) (α=19.808, β=25.259)
3 week: Gamma (3P) (α=8.8541, β=0.56396, γ=20.017); Gen. Extreme Value (k=-0.04248, σ=1.3598, μ=24.28); Gen. Gamma (4P) (k=0.66312, α=23.481, β=0.04531, γ=19.635); Weibull (2P) (α=21.248, β=25.449)
4 week: Gen. Extreme Value (k=-0.09791, σ=1.1745, μ=24.703); Weibull (2P) (α=22.761, β=25.681)
5 week: Gamma (2P) (α=466.36, β=0.05407); Gen. Extreme Value (k=-0.43734, σ=1.1081, μ=24.927); Weibull (3P) (α=10.717, β=11.146, γ=14.543)
6 week: Gamma (3P) (α=189.35, β=0.06414, γ=13.095); Gen. Extreme Value (k=-0.27557, σ=0.88526, μ=24.93); Log-Gamma (α=8468.0, β=3.8121E-4)
7 week: Gen. Extreme Value (k=-0.39862, σ=0.71759, μ=24.841); Gen. Gamma (4P) (k=1.2543, α=139.59, β=0.19991, γ=14.797); Normal (σ=0.68218, μ=25.044); Pearson 6 (4P) (α1=7.5120E+5, α2=1.0096E+6, β=596.77, γ=-418.98)
8 week: Gen. Extreme Value (k=-0.16051, σ=0.85926, μ=24.791)
9 week: Lognormal (2P) (σ=0.03148, μ=3.2226); Normal (σ=0.79908, μ=25.106)
10 week: Gen. Extreme Value (k=-0.34377, σ=0.79003, μ=24.759); Normal (σ=0.77006, μ=25.008); Pearson 5 (3P) (α=402.16, β=6301.9, γ=9.2897)
11 week: Gen. Extreme Value (k=-0.47302, σ=0.81554, μ=24.607); Pearson 6 (4P) (α1=6.8551E+7, α2=1.2744E+7, β=541.25, γ=-2886.7); Weibull (3P) (α=29.428, β=20.996, γ=4.1713)
12 week: Weibull (2P) (α=42.758, β=24.921)
13 week: Pearson 5 (3P) (α=1117.8, β=31616.0, γ=-3.9018); Weibull (2P) (α=32.235, β=24.83); Weibull (3P) (α=8.6285E+7, β=5.0049E+7, γ=-5.0049E+7)
14 week: Gen. Extreme Value (k=-0.43381, σ=0.74913, μ=23.815); Log-Pearson 3 (α=15.75, β=-0.00742, γ=3.295); Weibull (2P) (α=40.329, β=24.304)
15 week: Gamma (2P) (α=859.04, β=0.02729); Weibull (3P) (α=24.716, β=16.039, γ=7.7422)
16 week: Gen. Extreme Value (k=-0.71079, σ=1.43, μ=22.358); Pearson 5 (2P) (α=251.28, β=5641.6); Pearson 6 (4P) (α1=5.4524E+6, α2=2.5203E+6, β=806.8, γ=-1722.9); Weibull (3P) (α=2.2691E+8, β=2.1828E+8, γ=-2.1828E+8)
17 week: Weibull (2P) (α=16.24, β=22.186); Weibull (3P) (α=10.426, β=13.7, γ=8.5124)

Table 3.11(d). Best fit probability distribution for Minimum Temperature.
Study period | Best fit
Seasonal | Weibull (3P)
1 Week | Pearson 6 (4P)
2 Week | Weibull (2P)
3 Week | Gamma (3P)
4 Week | Gen. Extreme Value
5 Week | Gamma (2P)
6 Week | Gen. Extreme Value
7 Week | Gen. Extreme Value
8 Week | Gen. Extreme Value
9 Week | Normal
10 Week | Pearson 5 (3P)
11 Week | Pearson 6 (4P)
12 Week | Weibull (2P)
13 Week | Pearson 5 (3P)
14 Week | Weibull (2P)
15 Week | Weibull (3P)
16 Week | Pearson 5 (2P)
17 Week | Weibull (2P)

3.4.5 Relative Humidity at 7 AM
The test statistics D, A² and χ² are computed for 16 probability distributions for each data set of average relative humidity at 7 AM. The probability distributions holding the first rank, along with their test statistics, are presented in table 3.12(a). It is observed that for seasonal average relative humidity at 7 AM, the Weibull (3P) distribution is ranked first by the Kolmogorov-Smirnov and Anderson-Darling tests and the Log-Pearson 3 distribution by the Chi-square test. Thus these probability distributions are identified as the best fits on the basis of the three tests independently. The fourth probability distribution, identified as having the highest score, is presented in table 3.12(b) with the scores; distributions sharing the same highest score are also included among the selected probability distributions. The probability distributions fitted for the second week of June are the Generalized Extreme Value and Generalized Gamma distributions, with the highest score of 46.
While for fifteenth week data set, that is, second week of September, LogPearson 3 and Normal distributions are fitted having 37 as the highest score are selected. Table 3.12(a). Distributions fitted for Relative Humidity at 7 AM data sets. Study period Seasonal 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 14 Week 15 Week 16 Week 17 Week Test ranking first position Anderson Darling Kolmogorov Smirnov Chi-square Distribution Weibull (3P) Statistic 0.0644 Distribution Weibull (3P) Statistic 0.2282 Distribution Log-Pearson 3 Statistic 2.4455 Gen. Gamma (2P) 0.0809 Gen. Extreme 0.4724 Pearson 5 (2P) 3.1401 Gen. Gamma (4P) Gen. Extreme 0.0817 0.0645 Gen. Gamma (4P) Weibull (3P) 0.3753 0.3266 Gen. Extreme Weibull (3P) 2.5458 0.9433 Weibull (3P) Gen. Extreme 0.1067 0.0754 Weibull (3P) Weibull (3P) 0.4077 0.2839 Weibull (3P) Pearson 5 (3P) 0.6927 0.5172 Weibull (3P) Gen. Extreme 0.1003 0.1175 Gen. Extreme Gen. Extreme 0.5463 0.5201 Log-gamma Pearson 5 (3P) 0.9289 0.7461 Weibull (2P) Weibull (2P) 0.1140 0.0953 Weibull (3P) Gen. Extreme 0.3695 0.5176 Weibull (3P) Weibull (2P) 3.3202 1.2428 Gen. Extreme Gen. Extreme 0.0917 0.0819 Log-Pearson 3 Gen. Extreme 0.3199 0.2937 Gen. Extreme Pearson 6 (4P) 0.7057 0.7896 Weibull (3P) Gen. Extreme 0.0889 0.0999 Weibull (3P) Weibull (3P) 0.3774 0.3997 Weibull (2P) Pearson 5 (3P) 0.9981 3.9081 Weibull (3P) 0.1165 Weibull (3P) 0.6708 Pearson 5 (3P) 1.7364 Gen. Extreme Gen. Extreme 0.0931 0.1110 Gen. Extreme Gen. Extreme 0.5232 0.5244 Weibull (2P) Normal 0.9164 1.6683 Log-Pearson 3 0.1056 Log-Pearson 3 0.5137 Log-Pearson 3 2.3387 Table 3.12(b). Distributions with highest score for Relative Humidity at 7 AM data sets. Study Period Seasonal 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 14 Week 15 Week 16 Week 17 Week Distributions with highest Score Distribution Score Weibull (3P) 42 Log-Gamma 42 Gen. Extreme and Gen. 
Gamma (4P) 46 Weibull (3P) 46 Weibull (3P) 47 Log-Pearson 3 35 Gen. Extreme Value 42 Gamma (3P) 36 Weibull (3P) 45 Gen. Extreme 43 Gen. Extreme 46 Normal 41 Weibull (3P) 44 Gen. Extreme 34 Weibull (3P) 42 Log-Pearson 3 and Normal 37 Normal 38 Log-Pearson 3 47 Table 3.12(c). Parameters of the distributions fitted for Relative Humidity at 7 AM data sets. Study Period Seasonal 1 week 2 week 3 week 4 week 5 week 6 week 7 week 8 week 8 week 9 week 10 week 11 week 12 week 13 week 14 week 15 week 16 week 17 week Distributions Log-Pearson 3 Weibull (3P) Gen. Extreme Gen. Gamma (3P) Log-Gamma Pearson 5 (2P) Gen. Extreme Value Gen. Gamma (4P) Gen. Extreme Value Weibull (3P) Weibull (3P) Gen. Extreme Value Log-Pearson 3 Pearson5 (3P) Weibull (3P) Gen. Extreme Value Log-gamma Weibull (3P) Gamma (3P) Gen. Extreme Value Pearson 5 (3P) Weibull (2P) Weibull (3P) Gen. Extreme Value Weibull (2P) Gen. Extreme Value Log-Pearson 3 Gen. Extreme Value Normal Pearson 6 (4P) Weibull (2P) Weibull (3P) Gen. Extreme Value Pearson 5 (3P) Weibull (3P) Pearson 5 (3P) Weibull (3P) Gen. Extreme Value Log-Pearson 3 Normal Weibull (2P) Gen. 
Extreme Value Normal Log-Pearson 3 Parameters =8.5299 =-0.00988 =4.5465 =16.171 =33.249 =54.544 k=-0.23796 =12.069 =62.183 k=0.99828 =29.997 =2.2137 =496.56 =0.00843 =28.444 =1835.6 k=-0.35205 =12.544 =68.866 k=6.8227 =0.27514 =46.462 =42.88 k=-0.73561 =11.967 =77.559 =6.1010E+7 =4.8501E+8 =-4.8501E+8 =25.424 =128.95 =-41.738 k=-0.49866 =5.3036 =85.431 =11.507 =-0.01676 =4.653 =300.04 =25916.0 =-0.02967 =7.4322 =31.766 =56.865 k=-0.58918 =6.5341 =86.52 =3503.9 =0.00128 =4.1714E+7 =1.9459E+8 =-1.9459E+8 =209.79 =0.27939 =31.041 k=-0.48214 =4.3697 =88.611 =376.91 =29802.0 =10.246 =18.978 =92.549 =6.1271E+7 =2.2208E+8 =-2.2208E+8 k=-0.37372 =3.3895 =89.667 =32.887 =92.033 k=-0.42925 =3.8021 =89.633 =10.99 =-0.01207 =4.6388 k=-0.37052 =3.5769 =89.887 =3.3925 =90.956 1=5.3567E+5 2=58139.0 =83.891 =-682.0 =34.646 =92.952 =17.665 =45.223 =47.786 k=-0.43112 =3.8265 =89.876 =498.21 =41655.0 =7.1435 =8.8403 =28.468 =63.942 =618.57 =51280.0 =8.5325 =9.0316 =26.325 =66.64 k=-0.37424 =3.1488 =90.672 =41.964 =-0.00503 =4.7282 =2.9724 =91.606 =35.621 =92.888 k=-0.43177 =3.3948 =89.546 =3.2459 =90.442 =1.4453 =-0.05156 =4.5549 Table 3.12(d). Best fit probability distribution for Relative Humidity at 7 AM. STUDY PERIOD Seasonal 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 14 Week 15 Week 16 Week 17 Week BEST-FIT Log-Pearson 3 Gen. Gamma (3P) Gen. Gamma (4P) Weibull (3P) Weibull (3P) Log-Pearson 3 Weibull (3P) Gamma (3P) Weibull (2P) Gen. Extreme Value Gen. Extreme Value Gen. Extreme Value Weibull (2P) Gen. Extreme Value Pearson 5 (3P) Log-Pearson 3 Normal Log-Pearson 3 The distributions identified are thus listed in table 3.12(c), where the parameter of these identified distribution for each data set are mentioned. Random numbers are generated using the parametric values for each data set and the least square method is worn for selecting the best fit probability distribution. 
The probability distribution having the minimum deviation is treated as the best-fit probability distribution for each data set of average relative humidity at 7 AM, as presented in table 3.12(d). The Log-Pearson 3 distribution is the best-fit distribution for seasonal average relative humidity at 7 AM and is also observed in the fifth, fifteenth and seventeenth week data sets, that is, the first week of July and the second and last weeks of September, respectively. Further, the Generalized Extreme Value distribution recurs in the ninth, tenth and eleventh weeks, that is, the first three weeks of August, and also in the thirteenth week, that is, the last week of August. Moreover, the Gamma (3P), Generalized Gamma (3P, 4P), Normal, Pearson 5 (3P) and Weibull (2P, 3P) distributions are found to be the best-fit probability distributions for the weekly average relative humidity at 7 AM data sets.

3.4.6 Relative Humidity at 2 PM
The test statistics D, A² and χ² are computed for 16 probability distributions for each data set of average relative humidity at 2 PM. The probability distribution ranking first under each test, along with its test statistic, is presented in table 3.13(a). For seasonal average relative humidity at 2 PM, the Generalized Extreme Value distribution is fitted by the Kolmogorov-Smirnov test, the Weibull (2P) distribution by the Anderson-Darling test and the Weibull (3P) distribution by the Chi-square test, based on first rank. These probability distributions are thus identified as the best fit by the three tests independently. The fourth probability distribution, the one having the highest score, is presented in table 3.13(b) together with its score. Distributions sharing the same highest score are also included among the selected probability distributions.
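The Kolmogorov-Smirnov statistic D used throughout is the largest gap between the empirical CDF of a data set and the CDF of a candidate distribution. The sketch below illustrates the computation; the sample values and the Normal(85, 5) candidate are hypothetical stand-ins, not values from the thesis data.

```python
import math

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # the empirical CDF jumps from i/n to (i+1)/n at x; check both sides
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d

def normal_cdf(x, mu=85.0, sigma=5.0):
    # CDF of a Normal distribution expressed via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

sample = [78, 82, 84, 85, 86, 88, 90, 91]  # hypothetical weekly humidity values
D = ks_statistic(sample, normal_cdf)       # smaller D indicates a closer fit
```

Ranking the 16 candidate distributions by D (and analogously by A² and χ²) gives the first-rank entries reported in the tables.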
Three probability distributions share the highest score of 45 for seasonal average relative humidity at 2 PM: the Generalized Extreme Value and Weibull (2P, 3P) distributions. Moreover, for the third week data set, that is, the third week of June, the Pearson 6 (4P) and Weibull (3P) distributions are fitted with the highest score of 41. For the fourth week data set, that is, the fourth week of June, the Generalized Gamma (4P) and Lognormal (3P) distributions are fitted with the highest score of 37. Similarly, for the sixth week, that is, the second week of July, the Generalized Extreme Value and Log-Pearson 3 distributions are selected with the highest score of 38. Further, for the ninth week data set, that is, the first week of August, the Normal and Pearson 6 (4P) distributions are selected with the highest score of 40. Also, for the twelfth week, that is, the fourth week of August, the Lognormal (2P) and Pearson 5 (2P) distributions are selected with the highest score of 37. The distributions thus identified are listed in table 3.13(c), where the parameters of each identified distribution are given for each data set. These parameter values are used to generate random numbers for each data set, and the least-squares method is used to select the best-fit probability distribution. The probability distribution having the minimum deviation is treated as the best-fit probability distribution for each data set of average relative humidity at 2 PM, as presented in table 3.13(d). The Weibull (2P) distribution is the best-fit distribution for seasonal average relative humidity at 2 PM and is also observed in the tenth and fifteenth week data sets, that is, the second weeks of August and September, respectively.
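The final selection step described above — generate random numbers from each candidate's fitted parameters, then keep the candidate whose simulated sample deviates least from the observed one in the least-squares sense — can be sketched as follows. The two Normal candidates and their parameter values are hypothetical illustrations, not fits from the thesis.

```python
import random

def lsq_deviation(observed, simulated):
    # sum of squared differences between sorted observed and sorted simulated values
    obs, sim = sorted(observed), sorted(simulated)
    return sum((o - s) ** 2 for o, s in zip(obs, sim))

random.seed(1)
observed = [random.gauss(73.0, 6.0) for _ in range(50)]  # stand-in weekly series

# candidate distributions with their (hypothetical) fitted parameters
candidates = {
    "Normal(73, 6)": lambda: random.gauss(73.0, 6.0),
    "Normal(60, 10)": lambda: random.gauss(60.0, 10.0),
}
deviations = {name: lsq_deviation(observed, [draw() for _ in range(50)])
              for name, draw in candidates.items()}
best = min(deviations, key=deviations.get)  # distribution with minimum deviation
```

The candidate whose parameters match the data-generating process yields the smaller deviation and is therefore selected as the best fit.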
Further, we observe Weibull (3P) distribution as recurrence in the sixteenth and seventeenth week, which are the last two weeks of September, and is also observed in the fifth and ninth week, that is, first week of July and August, respectively. Besides, Generalized Extreme Value, Generalized Gamma (4P) distribution, Log-Pearson 3 distribution, Normal distribution, Pearson 5 (2P) distribution, Pearson 6 (4P) distributions are found as the best fitted probability distributions for the weekly average relative humidity at 2 PM data sets. Table 3.13(a). Distributions fitted for Relative Humidity at 2 PM data sets. Test ranking first position Anderson Darling Study period Kolmogorov Smirnov Distribution Statistic Distribution Statistic Distribution Statistic Seasonal 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 14 Week 15 Week 16 Week 17 Week Gen. Extreme 0.0712 Weibull (2P) 0.3734 Weibull (3P) 2.0229 Gen. Extreme 0.0875 Gen. Extreme 0.3977 Weibull (2P) 0.5576 Log-Pearson 3 Weibull (3P) 0.0758 0.0570 Log-Pearson 3 Weibull (3P) 0.3080 0.1785 Lognormal (2P) Gamma (3P) 1.2683 0.8789 Gen. Extreme Log-Pearson 3 0.0995 0.0779 Weibull (3P) Weibull (3P) 0.3831 0.2791 Gamma (2P) Log-Pearson 3 1.2667 2.4067 Gen. Extreme Gen. Extreme 0.0796 0.0881 Gen. Extreme Gen. Extreme 0.2436 0.3657 Pearson 6 (4P) Gen. Extreme 0.6941 0.5427 Normal Weibull (3P) 0.0749 0.1113 Gen. Extreme Weibull (3P) 0.2524 0.3488 Weibull (2P) Pearson 6 (4P) 1.0323 3.6623 Weibull (2P) Log-Pearson 3 0.1026 0.0672 Normal Weibull (3P) 0.4765 0.2489 Weibull (2P) Gen. Extreme 2.2518 2.5681 Pearson 5 (2P) Gamma (3P) 0.0826 0.0915 Lognormal (2P) Gen. Extreme 0.3529 0.3627 Normal Gen. Extreme 1.5444 4.0812 Gen. Gamma (4P) 0.0692 Normal 0.2625 Weibull (3P) 0.6171 Weibull (2P) Weibull (3P) 0.0842 0.0465 Gen. Extreme Weibull (3P) 0.2904 0.1661 Gen. Gamma (4P) Gamma (3P) 0.7798 1.0619 Weibull (3P) 0.0632 Weibull (3P) 0.1706 Weibull (3P) 1.2512 Chi-square Table 3.13(b). 
Distributions with highest score for Relative Humidity at 2 PM data sets. Study Period Seasonal 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 14 Week 15 Week 16 Week 17 Week Distributions with highest Score Distribution Score Gen. Extreme and Weibull (2P, 3P) 45 Gen. Extreme 46 Gen. Extreme 41 Pearson 6 (4P) and Weibull (3P) 41 Gen. Gamma (4P) and Lognormal (3P) 37 Log-Pearson 3 46 Gen. Extreme Value and Log-Pearson 3 38 Gen. Extreme 48 Normal 42 Normal and Pearson 6 (4P) 40 Normal 44 Weibull (3P) 44 Lognormal (2P) and Pearson 5 (2P) 37 Gen. Extreme 41 Gen. Gamma (4P) 45 Gen. Extreme 44 Weibull (3P) 40 Weibull (3P) 48 Table 3.13(c). Parameters of the distributions fitted for Relative Humidity at 2 PM data sets. Study Period Seasonal 1 week 2 week 3 week 4 week 5 week Distributions Gen. Extreme Value Weibull (2P) Weibull (3P) Gen. Extreme Value Weibull (2P) Gen. Extreme Value Log-Pearson 3 Lognormal (2P) Gamma (3P) Pearson 6 (4P) Weibull (3P) Gamma (2P) Gen. Extreme Value Gen. Gamma (4P) Lognormal (3P) Weibull (3P) Log-Pearson 3 Weibull (3P) Parameters k=-0.52018 =3.8843 =64.855 =21.659 =67.161 =8.9724 =27.338 =39.844 k=-0.14233 =14.596 =33.374 =2.8071 =44.234 k=-0.21603 =16.209 =40.701 =9.6018 =-0.12479 =4.984 =0.3828 =3.7858 =240.94 =0.92958 =-168.36 1=10915.0 2=5529.0 =426.83 =-786.39 =26.877 =302.82 =-240.97 =37.54 =1.6858 k=-0.51157 =10.857 =60.881 k=3.107 =191.32 =80.612 =-373.76 =0.02348 =6.0912 =-378.52 =24.603 =206.72 =-138.95 =3.9062 =-0.08697 =4.5031 =6.1205 =58.567 =10.777 Table 3.13(c). Continued Study Period 6 week 7 week 8 week 9 week 10 week 11 week 12 week 13 week 14 week 15 week 16 week 17 week Distributions Gen. Extreme Value Log-Pearson 3 Pearson 6 (4P) Gen. Extreme Value Gen. 
Extreme Value Normal Weibull (2P) Parameters k=-0.43316 =9.59 =68.177 =8.1847 =-0.04591 =4.626 1=22.745 2=4.5724E+7 =9.2934E+7 =24.56 k=-0.15802 =6.3781 =69.278 k=-0.32505 =7.5835 =70.865 =7.3628 =73.338 =11.591 =76.209 Normal =6.4266 =73.128 Pearson 6 (4P) Weibull (3P) 1=2.4128E+6 2=3.6815E+5 =550.43 =-3534.3 Normal =7.1024 =72.632 Weibull (2P) =12.408 =75.186 Gen. Extreme Value k=-0.27237 =6.2423 =72.21 Log-Pearson 3 Weibull (3P) =443.55 =-0.00397 =6.069 =3.0131 =18.909 =57.572 Lognormal (2P) =0.06761 =4.2878 Normal Pearson 5 (2P) =5.0058 =72.972 =219.35 =15934.0 Gamma (3P) =154.71 =0.58977 =-20.676 Gen. Extreme k=-0.36324 =7.5515 =68.249 Gen. Gamma (4P) k=18.772 =8.3733 =368.55 =-341.2 Normal Weibull (3P) =7.855 =70.258 =4.9722 =36.763 =36.513 Gen. Extreme Value k=-0.26115 =8.0205 =66.905 Gen. Gamma (4P) Weibull (2P) k=2.7474 =0.73193 =22.975 =52.467 =10.378 =72.787 Gamma (3P) =173.41 =0.61761 =-41.471 Weibull (3P) =7.3775 =52.812 =16.088 Weibull (3P) =6.7096 =58.176 =5.7427 =22.354 =116.68 =-40.776 Table 3.13(d). Best fit probability distribution for Relative Humidity at 2 PM. STUDY PERIOD Seasonal 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 14 Week 15 Week 16 Week 17 Week BEST-FIT Weibull (2P) Gen. Extreme Value Log-Pearson 3 Pearson 6 (4P) Gen. Gamma (4P) Weibull (3P) Pearson 6 (4P) Gen. Extreme Value Normal Weibull (3P) Weibull (2P) Log-Pearson 3 Pearson 5 (2P) Gen. Extreme Value Normal Weibull (2P) Weibull (3P) Weibull (3P) 3.4.7 Pan Evaporation The test statistic D, A2 and 2 for each data set, of average Pan Evaporation is computed for 16 probability distribution. The probability distribution having the first rank along with their test statistic is presented in table 3.14(a). 
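The third test statistic, χ², compares observed histogram counts of a data set with the counts expected under a fitted distribution. A minimal sketch (the bin counts below are hypothetical, not from the thesis data):

```python
def chi_square_statistic(observed_counts, expected_counts):
    # chi-square goodness of fit: sum over bins of (O - E)^2 / E
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected_counts))

# hypothetical histogram of a weekly series versus a fitted distribution's
# expected counts in the same bins
obs = [4, 9, 15, 12, 6, 4]
exp = [5.0, 10.0, 13.0, 12.0, 7.0, 3.0]
chi2 = chi_square_statistic(obs, exp)  # smaller values indicate a closer fit
```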
It is observed that for seasonal average pan evaporation, Log-Pearson 3 is fitted using Kolmogorov Smirnov test, Generalized Extreme Value is fitted using Anderson Darling test and Gamma (3P) distribution is fitted using Chi-square test based on first rank. Thus these probability distributions are identified as the best fit based on these three tests independently. The fourth probability distribution identified, which is having the highest score, is presented in table 3.14(b) with their scores. Those distributions which are having same highest score are also included in the selected probability distribution. The probability distributions selected, based on the highest fit score as 35 for the sixth week data set, that is, second week of July are Pearson 6 (3P) and Generalized Gamma (3P) distributions. Table 3.14(a). Distributions fitted for Pan Evaporation data sets. Test ranking first position Anderson Darling Study period Kolmogorov Smirnov Distribution Statistic Distribution Statistic Distribution Statistic Seasonal 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 14 Week 15 Week 16 Week 17 Week Log-Pearson 3 0.0537 Gen. Extreme 0.1404 Gamma (3P) 1.0902 Log-Pearson 3 0.0596 Normal 0.1440 Log-Pearson 3 0.6899 Pearson 6 (3P) Weibull (3P) 0.0626 0.1092 Pearson 5 (3P) Gamma (3P) 0.1640 0.3565 Lognormal (2P) Gen. Extreme 1.0880 0.6749 Pearson 5 (3P) Lognormal (3P) 0.0744 0.0982 Gen. Extreme Pearson 5 (3P) 0.3528 0.3740 Log-Gamma Gen. Gamma (3P) 1.2723 0.5740 Pearson 6 (3P) Weibull (2P) 0.0845 0.0985 Normal Pearson 5 (3P) 0.4287 0.5113 Pearson 6 (3P) Gamma (2P) 1.3534 2.3513 Normal Gen. Extreme 0.0659 0.1014 Normal Gen. Extreme 0.2283 0.3822 Normal Gen. Extreme 1.4441 1.8532 Gen. Gamma (4P) Normal 0.0842 0.0984 Gen. Extreme Gen. Extreme 0.2938 0.5842 Weibull (3P) Normal 0.7581 0.3715 Gen. Extreme Normal 0.1113 0.0955 Normal Normal 0.7725 0.4676 Gen. Gamma (4P) Normal 2.7838 1.2692 Gen. 
Extreme 0.0557 Weibull (3P) 0.2112 Pearson 6 (4P) 0.5223 Weibull (3P) Gen. Gamma (4P) 0.0870 0.0708 Weibull (3P) Gen. Gamma (4P) 0.3080 0.1977 Gamma(2P) Lognormal (2P) 1.7154 0.6335 Weibull (2P) 0.0911 Gen. Extreme 0.3561 Log Gamma 0.4024 Chi-square Table 3.14(b). Distributions with highest score for Pan Evaporation data sets. Study Period Seasonal 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 14 Week 15 Week 16 Week 17 Week Distributions with highest Score Distribution Score Weibull (3P) 45 Normal 43 Pearson 6 (3P) 40 Gen. Extreme 41 Gen. Extreme 45 Lognormal (3P) 34 Pearson 6 (3P) and Gen. Gamma (3P) 35 Gen. Extreme 40 Normal 42 Gen. Extreme 48 Gen. Extreme 40 Gen. Extreme 45 Weibull (3P) 44 Normal 48 Weibull (3P) 45 Weibull (3P) 45 Gen. Gamma (4P) 41 Gen. Extreme and Weibull (2P) 43 Table 3.14(c). Parameters of the distributions fitted for Pan Evaporation data sets. Study Period Seasonal 1 week 2 week 3 week 4 week 5 week 6 week 7 week 8 week 9 week 10 week 11 week 12 week 13 week 14 week 15 week 16 week 17 week Distributions Gamma (3P) Gen. Extreme Value Log-Pearson 3 Weibull (3P) Log-Pearson 3 Normal Lognormal (2P) Pearson 5 (3P) Pearson 6 (3P) Gamma (3P) Gen. Extreme Value Weibull (3P) Gen. Extreme Value Log-Gamma Pearson 5 (3P) Gen. Gamma (3P) Lognormal (3P) Pearson 5 (3P) Gen. Gamma (3P) Normal Pearson 6 (3P) Gamma (2P) Gen. Extreme Value Pearson 5 (3P) Weibull (2P) Normal Gen. Extreme value Gen. Gamma (4P) Gen. Extreme Value Weibull (3P) Gen. Extreme Value Normal Gen. Extreme Value Gen. Gamma (4P) Normal Weibull (3P) Normal Gen. Extreme Value Pearson 6 (4P) Weibull (3P) Gamma (2P) Weibull (3P) Gen. Gamma (4P) Lognormal (2P) Gen. 
Extreme Value Log-Gamma Weibull (2P) Parameters =176.56 =0.06712 =-6.5449 k=-0.46176 =0.94758 =5.071 =5.3329 =-0.07625 =2.0609 =6.9709 =5.419 =0.24249 =7.448 =-0.10341 =3.0937 =2.7786 =10.592 =0.33983 =2.1431 =53.101 =1082.3 =-11.77 1=9.4905 2=3.9194E+5 =3.7171E+5 =4.3022 =1.2891 =2.6401 k=-0.01007 =2.1805 =6.9489 =1.9242 =5.4847 =3.3242 k=-0.06906 =1.5321 =5.4741 =34.648 =0.0517 =66.621 =946.37 =-8.1624 k=0.9814 =7.1148 =0.79576 =0.13189 =2.7742 =-10.276 =100.88 =2128.3 =-15.42 k=1.0123 =7.9218 =0.69854 =1.9227 =5.292 1=8.5077 2=1.7018E+8 =1.0754E+8 =7.7831 =0.62674 k=-0.14136 =1.5403 =4.1801 =49.793 =579.97 =-7.0088 =3.1558 =5.3419 =1.8146 =4.122 k=-0.41167 =1.3065 =3.7864 k=7.3788 =0.35509 =5.7372 k=-0.52272 =1.5798 =3.703 =3.5914 =4.5836 k=-0.16384 =1.3216 =3.562 =1.493 =4.138 k=-0.4002 =1.2408 =3.6924 k=4.1478 =0.48649 =4.1257 =1.1742 =1.1954 =4.042 =3.5458 =4.1785 =0.27855 =1.2559 =4.068 k=-0.40987 =1.1073 =3.6661 1=4.3526E+5 2=2.8077E+5 =273.81 =-420.49 =4.6889 =4.5552 =-0.1909 =18.396 =0.21092 =20.584 =15.172 =-10.901 k=16.334 =0.11141 =3.3346 =1.7597 =0.23367 =1.3172 k=-0.34383 =0.79331 =3.6023 =41.233 =0.03222 =5.7794 =4.117 While for the seventeenth week, that is, last week of September, Generalized Extreme Value and Weibull (2P) distributions are having 43 as the highest fit score are selected. The distributions identified are thus listed in table 3.14(c) where the parameter of these identified distribution for each data set are mentioned. The least square method is utilized for selecting the best fit probability distribution after generating random number for each data set with the help of the parametric values obtained. The probability distribution having minimum deviation is treated as the best selected probability distribution for the individual data set for the average pan evaporation as presented in table 3.14(d). Table 3.14(d). Best fit probability distribution for Pan Evaporation. 
STUDY PERIOD: BEST-FIT
Seasonal: Gamma (3P)
1 Week: Log-Pearson 3
2 Week: Pearson 6 (3P)
3 Week: Weibull (3P)
4 Week: Gen. Extreme Value
5 Week: Gen. Gamma (3P)
6 Week: Normal
7 Week: Pearson 5 (3P)
8 Week: Normal
9 Week: Gen. Extreme Value
10 Week: Gen. Gamma (4P)
11 Week: Normal
12 Week: Gen. Gamma (4P)
13 Week: Normal
14 Week: Weibull (3P)
15 Week: Gamma (2P)
16 Week: Gen. Gamma (4P)
17 Week: Weibull (2P)

The Gamma (3P) distribution is the best-fit distribution for seasonal average pan evaporation. Further, the Normal distribution plays a vital role, appearing four times as the best fit in the weekly data sets, in the sixth, eighth, eleventh and thirteenth weeks, that is, the second and last weeks of July and the third and last weeks of August. The Generalized Gamma (4P) distribution is observed thrice in the weekly data sets, in the tenth, twelfth and sixteenth weeks, that is, the second and fourth weeks of August and the third week of September. Besides, the Gamma (2P), Generalized Extreme Value, Generalized Gamma (3P), Log-Pearson 3, Pearson 5 (3P), Pearson 6 (3P) and Weibull (2P, 3P) distributions are obtained as the best-fit probability distributions for the weekly average pan evaporation data sets.

3.4.8 Bright Sunshine
The test statistics D, A² and χ² are computed for 16 probability distributions for each data set of average bright sunshine. The probability distribution ranking first under each test, along with its test statistic, is presented in table 3.15(a). For seasonal average bright sunshine, the Generalized Extreme Value distribution is fitted by the Kolmogorov-Smirnov and Anderson-Darling tests and the Log-Gamma distribution by the Chi-square test, based on first rank. These probability distributions are thus identified as the best fit by the three tests independently.
The fourth probability distribution, the one having the highest score, is presented in table 3.15(b) together with its score. Distributions sharing the same highest score are also included among the selected probability distributions. The probability distributions with the highest score of 45 for seasonal average bright sunshine are the Generalized Extreme Value and Log-Gamma distributions. Moreover, for the eleventh week data set, that is, the third week of August, the Weibull (3P) and Generalized Gamma (4P) distributions are selected with the highest score of 36. For the last week of August, the Generalized Extreme Value and Pearson 6 (4P) distributions are selected, also with the highest score of 36. The distributions thus identified are listed in table 3.15(c), where the parameters of each identified distribution are given for each data set. Random numbers are generated using the parameter values for each data set, and the least-squares method is applied to select the best-fit probability distribution. The probability distribution having the minimum deviation is treated as the best-fit probability distribution for each data set of average bright sunshine, as presented in table 3.15(d). The Log-Gamma distribution is the best-fit distribution for seasonal average bright sunshine. Further, the Generalized Extreme Value distribution appears six times in the weekly data sets, in the first, fifth, eighth, eleventh, fourteenth and seventeenth weeks, that is, the first weeks of June, July and September, the last weeks of July and September, and the third week of August. The Log-Pearson 3 distribution is observed thrice in the weekly data sets, in the sixth, thirteenth and sixteenth weeks, that is, the second week of July, the last week of August and the third week of September, respectively.
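The "score" used to select a fourth candidate combines the rankings produced by the three goodness-of-fit tests. The exact scoring rule is defined earlier in the thesis; the sketch below assumes a simple illustrative rule in which a distribution ranked r-th out of n by a test earns n − r points, summed over the three tests.

```python
def aggregate_scores(rankings, n_dists):
    """rankings: {test name: list of distribution names, best first}.
    Assumed rule: rank r out of n_dists earns (n_dists - r) points per test."""
    scores = {}
    for order in rankings.values():
        for r, dist in enumerate(order, start=1):
            scores[dist] = scores.get(dist, 0) + (n_dists - r)
    return scores

# toy example with 3 candidate distributions and the three tests
rankings = {
    "Kolmogorov-Smirnov": ["Gen. Extreme Value", "Log-Gamma", "Normal"],
    "Anderson-Darling":   ["Gen. Extreme Value", "Normal", "Log-Gamma"],
    "Chi-square":         ["Log-Gamma", "Gen. Extreme Value", "Normal"],
}
scores = aggregate_scores(rankings, n_dists=3)
best = max(scores, key=scores.get)  # distribution with the highest total score
```

Under such a rule a distribution that ranks near the top under all three tests accumulates the highest score, which is why ties (two distributions sharing the highest score) can occur, as reported in the tables.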
In addition, Gamma (2P, 3P) distributions, Generalized Gamma (3P) distribution, Pearson 5 (2P) distribution, Pearson 6 (4P) distribution and Weibull (2P, 3P) distributions are found as the best fitted probability distributions for the weekly average bright sunshine data sets. Table 3.15(a). Distributions fitted for Bright Sunshine data sets. Test ranking first position Study period Kolmogorov Smirnov Distribution Statistic Distribution Statistic Distribution Statistic Seasonal 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 14 Week 15 Week 16 Week 17 Week Gen. Extreme 0.0709 Gen. Extreme 0.1333 Log Gamma 0.0521 Gen. Extreme 0.0619 Gen. Extreme 0.1545 Gen. Extreme 0.8832 Weibull (3P) Weibull (3P) 0.0852 0.0769 Weibull (3P) Weibull (3P) 0.4103 0.3180 Normal Gamma (2P) 3.8528 1.2523 Gamma (3P) Gen. Extreme 0.0536 0.0605 Gen. Extreme Gen. Extreme 0.1784 0.2168 Pearson 5 (3P) Gen. Extreme 0.9833 0.5146 Log-Pearson 3 Weibull (3P) 0.0546 0.0772 Log-Pearson 3 Weibull (3P) 0.1712 0.2748 Log-Pearson 3 Pearson 6 (4P) 0.4699 1.1762 Gen. Gamma (3P) Gen. Gamma (4P) 0.1039 0.0755 Gen. Extreme Gen. Extreme 0.7188 0.1798 Weibull (2P) Pearson 5 (2P) 2.6207 0.9543 Pearson 6 (4P) Gen. Extreme 0.0674 0.0752 Gen. Extreme Gen. Gamma (4P) 0.3030 0.4851 Pearson 5 (2P) Pearson 6 (3P) 0.7440 0.8082 Lognormal (2P) 0.0719 Log-Pearson 3 0.3558 Gamma (2P) 0.9435 Gen. Gamma (4P) Gen. Extreme Gen. Extreme Log-Pearson 3 Gen. Extreme 0.0725 0.0628 0.0900 0.0647 0.0704 Log-Pearson 3 Gen. Extreme Gen. Extreme Log-Pearson 3 Gen. Extreme 0.2354 0.4214 0.3967 0.1547 0.2137 Pearson 6 (4P) Weibull (2P) Weibull (2P) Gen. Extreme Log-Pearson 3 2.1774 1.2300 0.2969 0.2675 0.1466 Anderson Darling Chi-square Table 3.15(b). Distributions with highest score for Bright Sunshine data sets. 
Study Period Seasonal 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 14 Week 15 Week 16 Week 17 Week Distributions with highest Score Distribution Score Gen. Extreme and Log-Gamma 45 Gen. Extreme 48 Gen. Gamma (4P) 42 Weibull (3P) 42 Gen. Extreme 45 Gen. Extreme 48 Log-Pearson 3 45 Weibull (3P) 46 Gen. Gamma (3P) 44 Gen. Gamma (4P) 46 Gen. Extreme 37 Weibull (3P) and Gen. Gamma (4P) 36 Gen. Gamma (3P) 43 Gen. Extreme and Pearson 6 (4P) 36 Gen. Extreme 46 Gen. Extreme 46 Log-Pearson 3 47 Gen. Extreme 45 Table 3.15(c). Parameters of the distributions fitted for Bright Sunshine data sets. Study Period Seasonal 1 week 2 week 3 week 4 week 5 week 6 week 7 week 8 week 9 week 10 week Distributions Gen. Extreme Value Log-Gamma Gen. Extreme Value Gen. Gamma (4P) Normal Weibull (3P) Gamma (2P) Weibull (3P) Gamma (3P) Gen. Extreme Value Pearson 5 (3P) Gen. Extreme Value Log-Pearson 3 Pearson 6 (4P) Weibull (3P) Gen. Extreme Value Gen. Gamma (3P) Weibull (2P) Gen. Extreme Value Gen. Gamma (4P) Pearson 5 (2P) Gen. Extreme Value Parameters k=-0.19099 =0.65064 =6.104 =302.87 =0.0061 k=-0.5255 =1.9833 =8.1687 k=4.5392E+7 =3.6791 =1.4983E+8 =-1.4983E+8 =1.9683 =7.786 =42.341 =65.371 =-56.739 =10.361 =0.6874 =5.5538 =11.374 =-3.388 =99.275 =0.17517 =-11.197 k=-0.24962 =1.7275 =5.5468 =186.61 =4420.6 =-17.66 k=-0.48463 =2.103 =5.6586 =1.9754 =-0.33814 =2.3106 1=34811.0 2=84746.0 =671.58 =-270.33 =5.5963 =9.1116 =-2.8707 k=-0.10922 =1.5379 =4.5656 k=0.98179 =9.0462 =0.56163 =3.1144 =5.8867 k=-0.33192 =1.8814 =5.0503 k=7.6955 =0.18288 =6.6278 =1.865 =7.2398 =35.996 k=-0.12606 =1.8538 =4.7016 Table 3.15(c). Parameters of the distributions fitted for Bright Sunshine data sets. Study Period 10 week 11 week 12 week 13 week 14 week 15 week 16 week 17 week Distributions Pearson 5 (2P) Pearson 6 (4P) Gen. Extreme Value Gen. Gamma (4P) Pearson 6 (3P) Weibull (3P) Gamma (2P) Gen. Gamma (3P) Lognormal (2P) Log-Pearson 3 Gen. Extreme Value Gen. 
Gamma (4P) Log-Pearson 3 Pearson 6 (4P) Gen. Extreme Value Weibull (2P) Gen. Extreme Value Weibull (2P) Gen. Extreme Value Log-Pearson 3 Gen. Extreme Value Log-Pearson 3 Parameters =4.645 =21.397 1=71.073 2=88.787 =15.705 =-7.1531 k=-0.51525 =1.8675 =4.9998 k=9.3109 =0.41883 =9.1555 =-1.8893 1=7.1213 2=1.1589E+8 =8.7428E+7 =12.653 =18.439 =-12.289 =8.5054 =0.64853 k=0.98578 =8.2386 =0.64853 =0.37365 =1.6434 =8.0178 =-0.1333 =2.7121 k=-0.27324 =2.0501 =5.2841 k=6.1523 =0.28286 =7.5382 =1.3255 =4.8308 =-0.17463 =2.5739 1=9730.1 2=13924.0 =216.92 =-145.57 k=-0.34374 =2.3213 =5.2994 =2.7832 =6.7243 k=-0.35206 =2.1715 =5.8174 =3.3259 =7.1501 k=-0.45339 =2.063 =6.7856 =2.1593 =-0.20805 =2.3975 k=-0.74188 =2.0831 =7.7856 =1.7031 =-0.21526 =2.4145 Table 3.15(d). Best fit probability distribution for Bright Sunshine. STUDY PERIOD BEST-FIT Seasonal Log-Gamma 1 Week Gen. Extreme Value 2 Week Weibull (3P) Gamma (2P) 3 Week 4 Week Gamma (3P) 5 Week Gen. Extreme Value 6 Week Log-Pearson 3 7 Week Weibull (3P) 8 Week Gen. Extreme 9 Week Pearson 5 (2P) 10 Week Pearson 6 (4P) 11 Week Gen. Extreme 12 Week Gen. Gamma (3P) 13 Week Log-Pearson 3 14 Week Gen. Extreme 15 Week Weibull (2P) 16 Week Log-Pearson 3 Gen. Extreme Value 17 Week 3.5 Conclusion In this chapter before identifying the best fit probability distribution, the descriptive statistics are computed for each weather parameters for different study period. The result of weather parameters analysis for identifying the best fit probability distribution revealed that the distribution pattern for different data set can be identified out of a large number of commonly used probability distributions by using different goodness of fit tests. After observing the weekly weather parameters independently we can conclude them as follows: Rainfall The data represented that the maximum value of weekly rainfall is 443.20 mm in fourth week of August in year 2000. 
The Normal distribution is the best-fit probability distribution for seasonal rainfall and is also observed in the second week of July. Moreover, the Generalized Extreme Value distribution is observed six times in the weekly data, in the first and second weeks of June, the second weeks of August and September, and the last two weeks of August, indicating the highest contribution of this distribution. In addition, the Gamma (3P), Log-Pearson 3, Pearson 6 (3P) and Lognormal (3P) distributions are obtained as the best-fit probability distributions for the weekly rainfall data sets.

Maximum Temperature
The data showed that the seasonal maximum temperature ranged from 23.6 °C in the year 2005 to 43.2 °C in the years 1966 and 1967. The Weibull (2P) distribution is the best-fit distribution for seasonal maximum temperature and is also observed in the ninth and twelfth weeks, that is, the first and fourth weeks of August. Further, Log-Pearson 3 is observed consecutively in the second and third weeks of July. Similarly, Weibull (3P) is observed successively in the last two weeks of September and also in the first weeks of July and September. Moreover, the Generalized Gamma (3P), Generalized Extreme Value, Pearson 5 (2P, 3P) and Lognormal (3P) distributions are the best-fit probability distributions for the weekly maximum temperature data sets.

Minimum Temperature
The seasonal minimum temperature ranged from 17.2 °C in the year 1984 to 29.2 °C in the year 1995. The Weibull (3P) distribution is the best-fit distribution for seasonal minimum temperature and is also observed in the second week of September. Further, the Generalized Extreme Value distribution is obtained repeatedly in three consecutive weeks, that is, the second, third and fourth weeks of July, and also in the last week of June.
Also, Weibull (2P) appears four times among the 17 weeks, that is, in the second week of June, the first week of September, and the fourth weeks of August and September. Besides, the Gamma (2P, 3P), Normal, Pearson 5 (2P, 3P) and Pearson 6 (4P) distributions are obtained as the best-fit probability distributions for the weekly minimum temperature data sets.

Relative Humidity at 7 AM
The seasonal average relative humidity at 7 AM ranged from 38% (minimum) to 98% (maximum). The Log-Pearson 3 distribution is the best-fit distribution for seasonal average relative humidity at 7 AM and is also observed in the first week of July and the second and last weeks of September. Likewise, the Generalized Extreme Value distribution recurs in the first three weeks of August and also in the last week of August. Moreover, the Gamma (3P), Generalized Gamma (3P, 4P), Normal, Pearson 5 (3P) and Weibull (2P, 3P) distributions are found to be the best-fit probability distributions for the weekly average relative humidity at 7 AM data sets.

Relative Humidity at 2 PM
The seasonal average relative humidity at 2 PM ranged from 16% in the years 1965 and 2005 to 92% in the year 1988. The Weibull (2P) distribution is the best-fit distribution for seasonal average relative humidity at 2 PM and is also observed in the tenth and fifteenth week data sets, that is, the second weeks of August and September, respectively. The Weibull (3P) distribution recurs in the last two weeks of September and is also observed in the first weeks of July and August. Besides, the Generalized Extreme Value, Generalized Gamma (4P), Log-Pearson 3, Normal, Pearson 5 (2P) and Pearson 6 (4P) distributions are obtained as the best-fit probability distributions for the weekly average relative humidity at 2 PM data sets.
Pan Evaporation
The data show that the seasonal average pan evaporation ranged from zero mm to 18.5 mm in the year 1967. The Gamma (3P) distribution represents the best-fitted distribution for seasonal average pan evaporation. Additionally, the Normal distribution plays a vital role by appearing four times as the best fit in the weekly data sets, that is, in the second and last weeks of July and also in the third and last weeks of August. Moreover, the Generalized Gamma (4P) distribution is observed thrice in the weekly data sets, that is, in the second and fourth weeks of August and the third week of September. Besides, the Gamma (2P), Generalized Extreme Value, Generalized Gamma (3P), Log-Pearson 3, Pearson 5 (3P), Pearson 6 (3P) and Weibull (2P, 3P) distributions are obtained as the best-fitted probability distributions for the weekly average pan evaporation data sets.

Bright Sunshine
The seasonal average bright sunshine ranged from 0.70 hours in the year 2009 to 11.6 hours in the years 1986 and 2009. The Log-Gamma distribution represents the best-fitted distribution for seasonal average bright sunshine. Further, the Generalized Extreme Value distribution plays an essential role by appearing six times in the weekly data sets: in the first weeks of June, July and September, the third week of August, and the last weeks of July and September. Moreover, the Log-Pearson 3 distribution is observed thrice in the weekly data sets, that is, in the second week of July, the last week of August and the third week of September. In addition, the Gamma (2P, 3P), Generalized Gamma (3P), Pearson 5 (2P), Pearson 6 (4P) and Weibull (2P, 3P) distributions are found to be the best-fitted probability distributions for the weekly average bright sunshine data sets. The best-fit probability distributions for the seasonal and weekly weather parameters differ across the study period.
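As an illustration of how such best-fit distributions can be identified (the data, the candidate list and the ranking criterion here are my own assumptions for a sketch, not the thesis procedure), several candidate families can be fitted by maximum likelihood and ranked by the Kolmogorov-Smirnov statistic with scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Illustrative weekly-rainfall-like sample (mm); NOT the thesis data.
sample = rng.gamma(shape=2.0, scale=15.0, size=50)

# Candidate families analogous to those considered in the thesis.
candidates = {
    "Normal": stats.norm,
    "Gamma": stats.gamma,
    "Weibull (2P)": stats.weibull_min,
    "Gen. Extreme Value": stats.genextreme,
    "Lognormal": stats.lognorm,
}

results = []
for name, dist in candidates.items():
    params = dist.fit(sample)                          # maximum-likelihood fit
    ks_stat, _ = stats.kstest(sample, dist.cdf, args=params)
    results.append((ks_stat, name))

# Smaller KS statistic means closer agreement with the empirical CDF.
for ks, name in sorted(results):
    print(f"{name:20s} KS = {ks:.4f}")
```

The distribution with the smallest test statistic would be reported as the best fit for that data set.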
In general, the Generalized Extreme Value and Weibull (2P, 3P) distributions are most commonly the best-fit probability distributions for most of the weeks across the different weather parameters.

CHAPTER 4
WEATHER FORECASTING MODELS

Introduction
Correlation Analysis
Methodology for forecasting models
Development of forecasting model for weather parameters
Comparison of prediction ability of forecasting models
Conclusion

4.1 Introduction
Early civilizations used recurring astronomical and meteorological events to monitor seasonal changes in the weather. In arid and non-arid regions, the most dominant meteorological parameter is rainfall, which reflects wet- and dry-period characteristics and is measured at point locations but assumed to represent the surrounding areas. The occurrence of rainfall depends on several other weather parameters. This chapter describes all seven weather parameters under study, for which weekly data on rainfall, maximum and minimum temperature, relative humidity at 7.00 am and 2.00 pm, bright sunshine hours and pan evaporation were recorded for the four monsoon months. The monsoon season in this region lasts 15 to 20 weeks; thus 17 weeks of weather data, from 4 June to 30 September, over 50 years (1961-2010) were considered for the present study.

4.2 Correlation Analysis
The inter-correlation coefficients between the different parameters, based on the 50-year data set, are computed and presented in Table 4.1. From Table 4.1 it is observed that maximum temperature is positively correlated with minimum temperature and highly positively correlated with pan evaporation and bright sunshine. Relative humidity at 7 am is also highly positively correlated with relative humidity at 2 pm. Rainfall is negatively correlated with maximum temperature, minimum temperature, pan evaporation and bright sunshine hours, while it is positively correlated with relative humidity at 7 am and 2 pm.
It can be observed that the highest correlation is between relative humidity at 7 am and relative humidity at 2 pm, and the lowest correlation is between rainfall and minimum temperature.

Table 4.1. Inter-correlation coefficients between weather parameters for the total data set.

Parameters                 Max. Temp.  Min. Temp.  RH (7 am)  RH (2 pm)  Pan Evap.  Bright Sunshine  Rainfall
Maximum temperature          1.00000     0.19218   -0.82554   -0.83234    0.77801       0.57776      -0.47340
Minimum temperature          0.19218     1.00000   -0.11481    0.07373    0.14586      -0.14112      -0.00666
Relative humidity (7 am)    -0.82554    -0.11481    1.00000    0.83736   -0.77588      -0.48374       0.36459
Relative humidity (2 pm)    -0.83234     0.07373    0.83736    1.00000   -0.73340      -0.66752       0.52490
Pan evaporation              0.77801     0.14586   -0.77588   -0.73340    1.00000       0.43054      -0.28520
Bright sunshine              0.57776    -0.14112   -0.48374   -0.66752    0.43054       1.00000      -0.53330
Rainfall                    -0.47340    -0.00666    0.36459    0.52490   -0.28520      -0.53330       1.00000

4.3 Methodology for Forecasting Models
The methodologies for the forecasting models, viz. Multiple Linear Regression (MLR), Autoregressive Integrated Moving Average (ARIMA) and Artificial Neural Network (ANN) models, are given briefly in the next subsections. The hybrid approach for developing weather forecasting models and the performance evaluation criteria are also discussed in this section.

4.3.1 Multiple Linear Regression Model
Multiple regression analysis includes a number of independent parameters at the same time for predicting a dependent parameter (Snedecor and Cochran, 1967). In this study, the multiple linear regression equation is fitted to the weekly weather parameters, treating one as the dependent parameter and the six others as independent parameters, and is given below in generalized form:

Y = β0 + β1X1 + β2X2 + ... + β6X6 + ε    (4.1)

where β0 = intercept, βi = regression coefficient of the ith independent parameter (i = 1, 2, ..., 6), ε = error term, and Xi = ith weather parameter.
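A least-squares fit of the generalized regression in equation (4.1) can be sketched in Python with numpy (synthetic data for illustration; the thesis analyses were carried out in SAS, and the coefficient values below are my own assumptions, not thesis results):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 595                            # training set size used in the thesis
X = rng.normal(size=(n, 6))        # six weekly weather predictors (synthetic)
true_beta = np.array([5.0, 1.2, -0.7, 0.0, 2.1, -1.5, 0.3])  # [b0, b1..b6]
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.5, size=n)

# Solve Y = b0 + b1*X1 + ... + b6*X6 + e by ordinary least squares.
A = np.column_stack([np.ones(n), X])          # prepend the intercept column
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

resid = y - A @ beta_hat
r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
print("intercept:", round(beta_hat[0], 2), "R^2:", round(r2, 3))
```

The fitted coefficients play the role of the β's in equation (4.1), and R² corresponds to the "coefficient of determination" percentages reported later in the chapter.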
To identify the significant parameters for predicting the dependent parameter from the six independent parameters, stepwise regression analysis was used. The stepwise process starts with a simple regression model in which only the independent parameter most strongly correlated with the dependent parameter is included. The correlations are then examined to find an additional independent parameter that explains the largest portion of the error remaining from the initial regression model. The procedure repeats until the model includes all the significantly contributing parameters. A possible bias in the stepwise regression procedure results from considering only one parameter at a time.

4.3.2 Autoregressive Integrated Moving Average Model
Equally spaced univariate time series data, transfer function data and intervention data are analyzed and forecast using the Autoregressive Integrated Moving Average (ARIMA) or Autoregressive Moving Average (ARMA) model. An ARIMA model predicts a value in a response time series as a linear combination of its own past values, past errors (also called shocks or innovations), and current and past values of other time series. The ARIMA approach was first popularized by Box and Jenkins (1976), and ARIMA models are often referred to as Box-Jenkins models. ARIMA (p, d, q) models are an extension of the AR model that uses three components for modeling the serial correlation in time series data. The first component is the autoregressive (AR) term, where there is a memory of past events; it uses 'p' lags of the time series. The second component is the integration (I) term, which accounts for stabilizing the data or making it stationary, making it easier to forecast. Each integration order corresponds to differencing the time series: I(d) means differencing the data 'd' times.
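As a small illustration of the differencing step just described (my example, not the thesis data), first-order differencing (d = 1) replaces Yt by Wt = Yt - Yt-1 and removes a linear trend:

```python
import numpy as np

t = np.arange(10)
y = 2.0 * t + 3.0        # series with a deterministic linear trend
w = np.diff(y, n=1)      # d = 1: W_t = Y_t - Y_{t-1}
print(w)                 # the trend is removed, leaving a constant series
```

Applying np.diff again (n=2) would correspond to d = 2, and so on; each differencing shortens the series by one observation.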
The third component is the moving average (MA) term of the forecast errors: the longer the historical data, the more accurate the forecasts become, as the model learns over time. The MA (q) model uses 'q' lags of the forecast errors to improve the forecast. For a dependent weekly parameter time series {Yt : 1 ≤ t ≤ n}, the pure ARIMA model is written mathematically as:

Wt = μ + [θ(B) / φ(B)] at    (4.2)

where
t = time index of the weekly parameter,
Wt = the response series, Wt = (1 - B)^d Yt,
μ = the mean term of the weekly parameter,
B = the backshift operator, that is, BXt = Xt-1,
φ(B) = 1 - φ1B - ... - φpB^p, the autoregressive operator,
θ(B) = 1 - θ1B - ... - θqB^q, the moving average operator,
at = the independent disturbance (random error), and
d = the degree of differencing.

In ARIMA (p, d, q) modeling, the foremost step is to decide whether the time series is stationary or non-stationary. If it is non-stationary, it is transformed into a stationary time series by applying an appropriate degree of differencing, that is, by selecting a suitable value of 'd'. The appropriate values of p and q are chosen by examining the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the time series data set.

4.3.3 Artificial Neural Network Model
Artificial Neural Networks are massively parallel adaptive networks of simple non-linear computing elements called neurons, which are intended to abstract and model some of the functionality of the human nervous system in an attempt to partially capture some of its computational strengths. An Artificial Neural Network (ANN) is loosely based on biological neural systems in that it is made up of an interconnected system of neurons. A neural network can adaptively identify patterns between input and output data sets in a fashion somewhat analogous to the human learning process. Neural networks are highly robust with respect to the underlying data distributions, and no assumptions are made about relationships between parameters.
Artificial Neural Networks (ANNs) provide a methodology for solving many types of nonlinear problems that are difficult to solve by traditional techniques. In artificial neural network software, all inputs and outputs are normalized between 0 and 1, so an appropriate process of normalization and denormalization of the data is needed before and after program execution. The simplest way is to divide each value by the series maximum for normalization and, after execution, to multiply the result by the same amount. There are many neural network models, but the basic structure involves a system of layered, interconnected nodes and neurons, as presented in Figure 4.1. The nodes are arranged to form an input layer, with neurons in each hidden layer connected to all neurons in neighboring layers. The input layer supplies data to the hidden layer and does not contain activation or transfer functions. A typical feed-forward network might use a dot-product activation function that, for each neuron Bj (j = 1, 2, ..., n) in the hidden layer, is computed as:

Bj = Σ(i=1..m) wij Ai + woj Ao    (4.3)

with input nodes Ai (i = 1, 2, ..., m) and weights wij between node Ai and neuron Bj. The bias node (Ao) typically has a constant input of 1, with a matching weight woj. A similar calculation is made for each neuron Ck (k = 1, 2, ..., o) in the output layer (o = 1 for the example in Figure 4.1), using weights wjk between neurons Bj and Ck (with wok and Bo for the bias). Each neuron value is subsequently passed through a transfer function, which may be linear or nonlinear (Zurada, 1992). A common choice of nonlinear transfer function is a sigmoid, of the general form:

f(u) = 1 / (1 + e^(-u))    (4.4)

where u = Bj (or Ck).

Figure 4.1. An (m x n x o) artificial neural network structure, showing a multilayer perceptron with input nodes Ai (i = 1, 2, ..., m), hidden neurons Bj (j = 1, 2, ..., n) and output neurons Ck (k = 1, 2, ..., o). Nonlinearities are incorporated into the network via the activation and transfer functions in each neuron.
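As an illustrative sketch (the layer sizes and random weights below are my own assumptions, not the thesis models), equations (4.3) and (4.4) amount to the following computation for one hidden layer:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_hidden = 6, 8                            # m inputs, n hidden neurons (example)
A = rng.uniform(0, 1, size=m)                 # normalized inputs A_i in [0, 1]
W = rng.uniform(-1, 1, size=(m, n_hidden))    # weights w_ij
w0 = rng.uniform(-1, 1, size=n_hidden)        # bias weights w_oj (bias input A_o = 1)

def sigmoid(u):
    # Equation (4.4): f(u) = 1 / (1 + e^(-u))
    return 1.0 / (1.0 + np.exp(-u))

# Equation (4.3) for all j at once, followed by the transfer function.
B = sigmoid(A @ W + w0)
print(B.round(3))                             # hidden-layer outputs, each in (0, 1)
```

The output layer repeats the same dot-product-plus-bias computation on B, which is why the whole forward pass is just a chain of such steps.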
Complexities in the data are captured through the number of neurons in the hidden layer. When adopting a neural network for practical purposes, it is desirable to restrict the connections between neurons. This is done by fixing some of the weights to zero so that they can be dropped from the calculations. The working principle for the subsequent adjustment of weights follows the error propagated through the network: if increasing a given weight leads to more error, the weight is adjusted downwards, and if increasing a weight leads to less error, the weight is adjusted upwards. These upward and downward adjustments continue until the weights and the error settle down. To avoid overfitting the data, a neural network is usually trained on a subset of inputs and outputs to determine the weights, and subsequently validated on the remaining (quasi-independent) data to measure the accuracy of prediction.

4.3.4 Hybrid Approach
In recent times, combined models rather than single time series models have been preferred for prediction purposes. Several researchers have used a hybrid principal component and ANN approach to improve the accuracy of the prediction results of their long-range forecasting investigations. In the present investigation, a hybrid approach to weather forecasting is tried in order to improve the accuracy of prediction. Several studies show that combinations of ANN with ARIMA offer a competitive edge over each individual model. Taskaya et al. (2007), however, noted the degraded performance of ARIMA-neural network hybrids when the relationship between the linear and non-linear components departs from the additive assumption. The combination of MLR with ARIMA and of MLR with ANN is proposed in the present study.
The hybrids of multiple linear regression with ARIMA and with ANN techniques are used to analyze the weekly data of all seven weather parameters studied, and are included in the comparative study to identify the most precise weather forecasting model.

Hybrid Model of Multiple Linear Regression and Autoregressive Integrated Moving Average (MLR_ARIMA)
The composition of a multiple linear regression model with an autoregressive integrated moving average model is proposed in this section to develop a new hybrid model. It is assumed that predictive performance improves by integrating the two single models. For this purpose, the significantly contributing parameters selected through stepwise regression analysis are used to develop the MLR_ARIMA model, and its performance is compared with all other models.

Hybrid Model of Multiple Linear Regression and Artificial Neural Network (MLR_ANN)
It has been observed in recent research that a single model may not be sufficient to identify all the characteristics of a time series. Hybrid models decompose a time series into linear and non-linear components and prove to be a better approach than a single model. In this section, the hybrid model of multiple linear regression with a neural network approach is proposed to yield more accurate results. As with the previous model, the significantly contributing parameters selected through stepwise regression analysis in the multiple linear regression model are used to develop the hybrid MLR_ANN model.

4.3.5 Performance Evaluation Criteria
Many analytical methods have been proposed for the evaluation and inter-comparison of different models, which can be assessed in terms of graphical representations and numerical computations. The graphical performance criterion involves a linear-scale plot of the predicted and observed weather parameters for the training and testing data sets for all the models.
The numerical performance criteria involve:

Mean error (BIAS): BIAS = (1/N) Σ(i=1..N) (Yi - Ŷi)    (4.5)

Mean absolute error (MAE): MAE = (1/N) Σ(i=1..N) |Yi - Ŷi|    (4.6)

Root mean square error (RMSE): RMSE = √[(1/N) Σ(i=1..N) (Yi - Ŷi)²]    (4.7)

Prediction error (PE): PE = (Ŷi - Yi) / Yi    (4.8)

Correlation coefficient (r): this is obtained by performing a regression between the predicted values and the actual values and is computed as

r = [Σ(i=1..N) (Yi - Ȳ)(Ŷi - Ŷm)] / √{[Σ(i=1..N) (Yi - Ȳ)²][Σ(i=1..N) (Ŷi - Ŷm)²]}    (4.9)

where N is the total number of forecast outputs, Yi and Ŷi are the actual and predicted values respectively for i = 1, 2, ..., N, and Ȳ and Ŷm are the mean values of the actual and predicted values respectively. For the best prediction, the BIAS, MAE and RMSE values should be small, and PE should be sufficiently small, i.e., close to 0, while r should be close to 1 (between 0 and 1), indicating better agreement between the observed and predicted values. The performance of the weather forecasting models was evaluated using MATLAB version 7.0.1, the students' academic version of SAS, and Microsoft Excel.

4.4 Development of Forecasting Models for Weather Parameters
4.4.1 Introduction
One of the most common problems a contemporary data analyst encounters is drawing significant conclusions about an intricate system using data on a single measured parameter. The methodology presented above was applied to the 50 years of monsoon weather data for the months of June, July, August and September. The weather data were further classified into weeks for analysis, giving a total of 850 data sets (17 weeks x 50 years) of weekly parameters. The most significantly contributing parameters were selected using stepwise regression analysis based on a training data set of 35 years, i.e., 595 data sets; the remaining data sets (15 years) were used for testing the developed models and for comparing the actual and predicted values.
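As a sketch, the numerical criteria in equations (4.5)-(4.9) can be computed directly from paired actual and predicted values (the numbers below are illustrative, not thesis results):

```python
import numpy as np

y_obs  = np.array([10.0, 12.0, 9.0, 14.0, 11.0])   # actual values Y_i
y_pred = np.array([ 9.5, 12.4, 8.7, 13.6, 11.5])   # predicted values

bias = np.mean(y_obs - y_pred)                     # (4.5) mean error
mae  = np.mean(np.abs(y_obs - y_pred))             # (4.6) mean absolute error
rmse = np.sqrt(np.mean((y_obs - y_pred) ** 2))     # (4.7) root mean square error
pe   = (y_pred - y_obs) / y_obs                    # (4.8) per-point prediction error
r    = np.corrcoef(y_obs, y_pred)[0, 1]            # (4.9) correlation coefficient

print(f"BIAS={bias:.3f} MAE={mae:.3f} RMSE={rmse:.3f} r={r:.3f}")
```

Small BIAS, MAE, RMSE and PE with r near 1 would indicate close agreement between observed and predicted series, as the criteria above require.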
4.4.2 Rainfall
The multiple linear regression model is fitted to predict weekly rainfall as the dependent parameter, taking as independent parameters the other weekly parameters: maximum temperature, minimum temperature, relative humidity at 7 am and 2 pm, pan evaporation and bright sunshine. The most significantly contributing parameters are selected using stepwise regression analysis based on the 595 data sets of 35 years, and the best-fit multiple regression model is given below:

Y = 391.90674 - 8.65592X1 - 1.89389X3 + 2.46673X4 + 4.70662X5 - 8.50348X6    (4.10)

where maximum temperature (X1), relative humidity at 7 am (X3), relative humidity at 2 pm (X4), pan evaporation (X5) and bright sunshine (X6), with a coefficient of determination of 37.66%, are the parameters contributing most to predicting rainfall (Y) as the dependent parameter. The parameter minimum temperature (X2) appears to have the least influence on rainfall and hence does not appear in the proposed multiple regression model. The stepwise regression procedure for selecting the significant parameters for the rainfall parameter is given in Appendix A (a). We next performed ARIMA modeling of the data using all the weather parameters. The rainfall time series data set is stationary, so no transformation of the data set is required. We then used the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the weekly rainfall time series (see Figure 4.2) to estimate the values of 'p' and 'q' of the ARIMA model. Note that while both the ACF and PACF have significant terms at lags 1 and 17, they have their maximum correlation coefficient (0.152) at lag 1. This suggests that an ARIMA of order 1 is the best fit for the rainfall weather data set. Using an iterative model-building process of identification, estimation and diagnostic checking with all the weather parameters, we finally selected an ARIMA (1, 0, 1) model as the most appropriate fit for the observed data. Figure 4.2.
Plots of autocorrelation and partial autocorrelation coefficients against time lags for the weekly Rainfall parameter. Then an ANN model-building process was performed using all the weather parameters. We selected the best-suited architecture of a feed-forward neural network model for our weekly rainfall data by comparing methods and varying the number of layers and the number of neurons in each network. The proposed model had an input environment with all the weather parameters, three hidden layers (8 neurons in the first hidden layer, 10 in the second and 12 in the third) and one neuron in the output layer (see Figure 4.3).

Figure 4.3. Artificial neural network structure for the weekly Rainfall prediction parameter.
Figure 4.4. Mapping of the number of epochs required to reach the desired goal for the ANN model for the Rainfall parameter.

We used a scaled conjugate gradient algorithm for training this multilayer perceptron, with a tan-sigmoid activation function in the first, second and third hidden layers and a log-sigmoid activation function in the output layer. A set of random values distributed uniformly between -1 and +1 was used to initialize the weights of the neural network model. 515 epochs were used to train the neural network model with a goal of 0.003 (see Figure 4.4). The program used for training the 595 data sets with all the weather parameters for the ANN model predicting the rainfall parameter is given in Appendix B (a). We next performed hybrid MLR_ARIMA modeling of the data using the five weather parameters selected as significant through stepwise regression earlier, to predict the rainfall weather parameter. We used the same ACF and PACF of the rainfall weather parameter, with maximum correlation coefficient (0.152) at lag 1 (see Figure 4.2), suggesting an MLR_ARIMA (1, 0, 1) model as the most appropriate fit for the significantly selected weather parameters data set.
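The feed-forward structure described for the rainfall model (six inputs, hidden layers of 8, 10 and 12 neurons, tan-sigmoid hidden activations, a log-sigmoid output, and weights initialized uniformly in [-1, +1]) can be sketched in numpy. This shows only the forward pass with random initial weights, not the scaled conjugate gradient training used in the thesis:

```python
import numpy as np

rng = np.random.default_rng(7)
sizes = [6, 8, 10, 12, 1]        # inputs -> three hidden layers -> output

# Initialize all weights and biases uniformly in [-1, +1], as in the thesis.
weights = [rng.uniform(-1, 1, size=(a, b)) for a, b in zip(sizes, sizes[1:])]
biases  = [rng.uniform(-1, 1, size=b) for b in sizes[1:]]

def tansig(u):                   # tan-sigmoid transfer function
    return np.tanh(u)

def logsig(u):                   # log-sigmoid transfer function, output in (0, 1)
    return 1.0 / (1.0 + np.exp(-u))

def forward(x):
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = tansig(h @ W + b)    # hidden layers: tan-sigmoid activation
    return logsig(h @ weights[-1] + biases[-1])  # output layer: log-sigmoid

x = rng.uniform(0, 1, size=6)    # one normalized weekly input vector (example)
print(forward(x))                # normalized prediction in (0, 1)
```

Because inputs and outputs are normalized to [0, 1] (as described in Section 4.3.3), the log-sigmoid output would be denormalized (multiplied back by the series maximum) to obtain a rainfall value.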
Finally, we performed a hybrid MLR_ANN model-building process using the same significant parameters suggested by the multiple regression model. We selected the same best-suited feed-forward neural network architecture as obtained while developing the ANN model for the weekly rainfall data set. This proposed hybrid MLR_ANN model had a different input environment, that is, only the significant parameters selected through stepwise regression analysis were used, but the same three hidden layers (8 neurons in the first hidden layer, 10 in the second and 12 in the third) and one neuron in the output layer as considered earlier while developing the ANN model (see Figure 4.5). We used the same scaled conjugate gradient algorithm for training this multilayer perceptron, with a tan-sigmoid activation function in the first, second and third hidden layers and a log-sigmoid activation function in the output layer. A new set of random values distributed uniformly between -1 and +1 was used to initialize the weights of the MLR_ANN model. 415 epochs were used to train the neural network model with the same goal of 0.003 (see Figure 4.6). The program used for training the 595 data sets with the significantly selected parameters for the hybrid MLR_ANN model of the dependent rainfall weather parameter is given in Appendix C (a).

Figure 4.5. Hybrid MLR_ANN structure for the weekly Rainfall prediction parameter.
Figure 4.6. Mapping of the number of epochs required to reach the desired goal for the hybrid MLR_ANN model for the Rainfall parameter.

4.4.3 Maximum Temperature
The multiple regression model is fitted to predict weekly maximum temperature as the dependent parameter, taking as independent parameters the other weekly parameters: rainfall, minimum temperature, relative humidity at 7 am and 2 pm, pan evaporation and bright sunshine.
The most significantly contributing parameters are selected using stepwise regression analysis based on the 595 data sets of 35 years, and the best-fit multiple regression model is given below:

Y = 34.9866 - 0.0025X1 + 0.3343X2 - 0.0778X3 - 0.0703X4 + 0.2465X5 + 0.0683X6    (4.11)

where rainfall (X1), minimum temperature (X2), relative humidity at 7 am (X3) and at 2 pm (X4), pan evaporation (X5) and bright sunshine (X6), with a coefficient of determination of 84.32%, are the contributing parameters for predicting maximum temperature (Y) as the dependent parameter. The stepwise regression procedure for selecting the significant parameters for maximum temperature is given in Appendix A (b). We next performed ARIMA modeling of the data using all the weather parameters. The maximum temperature time series data set is found to be stationary. We then used the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the weekly maximum temperature time series (see Figure 4.7) to estimate the parameters (p and q) of the ARIMA model. Note that while both the ACF and PACF have significant terms at lags 1, 16 and 17, they have their maximum correlation coefficient (0.463) at lag 1. This suggests that an ARIMA of order 1 is the best fit for the maximum temperature data. Using an iterative model-building process of identification, estimation and diagnostic checking with all the weather parameters, we finally selected an ARIMA (1, 0, 1) model as the most appropriate fit for the observed data. Further, we performed an ANN model-building process using all the same weather parameters. We selected the best-suited architecture of a feed-forward neural network model for our weekly maximum temperature data by comparing methods and varying the number of layers and the number of neurons in each network.
The proposed model had an input environment with all the weather parameters, three hidden layers (8 neurons in the first hidden layer, 10 in the second and 12 in the third) and one neuron in the output layer (see Figure 4.8). We used a scaled conjugate gradient algorithm for training this multilayer perceptron, with a tan-sigmoid activation function in all four layers, that is, the three hidden layers and the output layer. A set of random values distributed uniformly between -1 and +1 was used to initialize the weights of the neural network model. 386 epochs were used to train the neural network model with a goal of 0.00000117083 (see Figure 4.9). The program used for training the 595 data sets with all the weather parameters for the ANN model predicting the maximum temperature parameter is given in Appendix B (b). Since all the parameters were selected as significant in the stepwise regression analysis, there is no possibility of developing hybrid models for the maximum temperature weather parameter; that is, the hybrid models would be the same as the ARIMA and ANN models.

Figure 4.7. Plots of autocorrelation and partial autocorrelation coefficients against time lags for the weekly Maximum Temperature parameter.
Figure 4.8. Artificial neural network structure for the weekly Maximum Temperature prediction parameter.
Figure 4.9. Mapping of the number of epochs required to reach the desired goal for the ANN model for Maximum Temperature.

4.4.4 Minimum Temperature
The multiple regression model is fitted to predict weekly minimum temperature as the dependent parameter, taking as independent parameters the other weekly parameters: maximum temperature, rainfall, relative humidity at 7 am and 2 pm, pan evaporation and bright sunshine.
The most significantly contributing parameters are selected using stepwise regression analysis based on the 595 data sets of 35 years, and the best-fit multiple regression model is given below:

Y = 2.64644 + 0.52100X1 - 0.01740X3 + 0.09727X4 - 0.09121X6    (4.12)

where minimum temperature (Y) is the dependent parameter, with a 30.13% contribution from the significant parameters maximum temperature (X1), relative humidity at 7 am (X3), relative humidity at 2 pm (X4) and bright sunshine (X6). The parameters rainfall (X2) and pan evaporation (X5) appear to have the least influence on minimum temperature and hence do not appear in the proposed multiple regression model. The stepwise regression procedure for selecting the significant parameters for the minimum temperature parameter is given in Appendix A (c). We next performed ARIMA modeling of the data using all the weather parameters. The minimum temperature time series data set is stationary, so no transformation of the data set is required. We then used the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the weekly minimum temperature time series (see Figure 4.10) to estimate the values of 'p' and 'q' of the ARIMA model. Note that while both the ACF and PACF have significant terms commonly at lags 1, 15, 16 and 17, they have their maximum correlation coefficient (0.435) at lag 1. This suggests that an ARIMA of order 1 is the best fit for the minimum temperature data set. Using an iterative model-building process of identification, estimation and diagnostic checking with all the weather parameters, we initially selected an ARIMA (1, 0, 1) model as the most appropriate fit for the observed data. However, after applying ARIMA (1, 0, 1) a significant lag 18 remained in the autocorrelation plots, so we applied an ARIMA (1, 0, 18) model and obtained an appropriate fit for the observed data. Then an ANN model-building process was performed using all the weather parameters.
We selected the best-suited architecture of a feed-forward neural network model for our weekly minimum temperature data by comparing methods and varying the number of layers and the number of neurons in each network. The proposed model had an input environment with all the weather parameters, three hidden layers (8 neurons in the first hidden layer, 10 in the second and 12 in the third) and one neuron in the output layer (see Figure 4.11). We used a scaled conjugate gradient algorithm for training this multilayer perceptron, with a tan-sigmoid activation function for all the hidden layers and the output layer. A set of random values distributed uniformly between -1 and +1 was used to initialize the weights of the neural network model. 594 epochs were used to train the neural network model with a goal of 0.000001717 (see Figure 4.12). The program used for training the 595 data sets with all the weather parameters for the ANN model predicting the minimum temperature parameter is given in Appendix B (c).

Figure 4.10. Plots of autocorrelation and partial autocorrelation coefficients against time lags for the weekly Minimum Temperature parameter.
Figure 4.11. Artificial neural network structure for the weekly Minimum Temperature prediction parameter.
Figure 4.12. Mapping of the number of epochs required to reach the desired goal for the ANN model for the Minimum Temperature parameter.

We next performed hybrid MLR_ARIMA modeling of the data using the four weather parameters selected as significant through stepwise regression earlier, to predict the minimum temperature weather parameter. We used the same ACF and PACF of the minimum temperature weather parameter (see Figure 4.10), applied the ARIMA (1, 0, 18) model and obtained an appropriate fit using the significantly selected weather parameters data set, as applied earlier.

Figure 4.13. Hybrid MLR_ANN structure for the weekly Minimum Temperature prediction parameter.
Finally, we performed a hybrid MLR_ANN model-building process using the same significant parameters suggested by the multiple regression model. We selected the same best-suited feed-forward neural network architecture as obtained while developing the ANN model for the weekly minimum temperature data set. This proposed hybrid MLR_ANN model had a different input environment, that is, only the significant parameters selected through stepwise regression analysis were used, but the same three hidden layers (8 neurons in the first hidden layer, 10 in the second and 12 in the third) and one neuron in the output layer were considered (see Figure 4.13). We used the same scaled conjugate gradient algorithm for training this multilayer perceptron, with a tan-sigmoid activation function for all the hidden layers and the output layer. A new set of random values distributed uniformly between -1 and +1 was used to initialize the weights of the MLR_ANN model. 1028 epochs were used to train the neural network model with the same goal of 0.000001717 (see Figure 4.14). The program used for training the 595 data sets with the significantly selected parameters for the hybrid MLR_ANN model of the dependent minimum temperature weather parameter is given in Appendix C (b).

Figure 4.14. Mapping of the number of epochs required to reach the desired goal for the hybrid MLR_ANN model for the Minimum Temperature parameter.

4.4.5 Relative Humidity at 7 AM
The multiple regression model is fitted to predict the weekly average relative humidity at 7 am as the dependent parameter, taking as independent parameters the other weekly parameters: maximum temperature, minimum temperature, rainfall, relative humidity at 2 pm, pan evaporation and bright sunshine.
The most significantly contributing parameters are selected using stepwise regression analysis based on the 595 data sets of 35 years, and the best-fit multiple regression model is given below:
Y = 116.06296 - 1.31618X1 - 0.21386X2 - 0.01111X3 + 0.33298X4 - 0.60294X5 + 0.29476X6 (4.13)
where relative humidity at 7 am (Y) is the dependent parameter, with a 79.04% contribution from the significant parameters maximum temperature (X1), minimum temperature (X2), rainfall (X3), relative humidity at 2 pm (X4), pan evaporation (X5) and bright sunshine (X6). The stepwise regression procedure for selecting the significant parameters for the relative humidity at 7 am parameter is given in Appendix A (d). We next performed ARIMA modeling of the data using all the weather parameters. Inspection of the relative humidity at 7 AM time series shows that the data are stationary and therefore require no transformation. We used the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the weekly average relative humidity at 7 AM series (see figure 4.15) to estimate the orders (p and q) of the ARIMA model. While both the ACF and PACF have significant terms at lags 1, 15, 16 and 17, the ACF attains its maximum correlation coefficient (0.531) at lag 17 and the PACF attains its maximum (0.513) at lag 1. Using an iterative model building process of identification, estimation and diagnostic checking with all the weather parameters, we initially selected an ARIMA (1, 0, 1) model; however, a significant spike at lag 10 remained in the residual autocorrelation plots, so we applied an ARIMA (1, 0, 10) model and obtained an appropriate fit for the observed data without residual lags. Next, we performed ANN model building using all the same weather parameters.
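A fitted equation such as (4.13) maps one week's predictor values directly to a predicted relative humidity. The following sketch evaluates the equation with its coefficients as printed above; the input values in the example are made up for illustration, not taken from the thesis data.

```python
# Coefficients of Eq. (4.13): intercept, then X1..X6 as defined in the text.
COEF = [116.06296, -1.31618, -0.21386, -0.01111, 0.33298, -0.60294, 0.29476]

def predict_rh7(x):
    """x = [X1 max temp, X2 min temp, X3 rainfall, X4 RH at 2 pm,
    X5 pan evaporation, X6 bright sunshine] for one week."""
    return COEF[0] + sum(c * v for c, v in zip(COEF[1:], x))

# Illustrative (made-up) weekly values:
print(predict_rh7([32.0, 24.0, 50.0, 70.0, 5.0, 6.0]))
```

The signs are readable at a glance: higher maximum temperature and pan evaporation lower the predicted morning humidity, while higher afternoon humidity raises it.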
We selected the best suited architecture of the Feed Forward Neural Network model for our weekly relative humidity at 7 am data by comparing training methods and varying the number of layers and the number of neurons in each network. The proposed model had an input layer carrying all the parameters, three hidden layers (8 neurons in the first hidden layer, 10 neurons in the second and 12 neurons in the third) and one neuron in the output layer (see figure 4.16). We used the Scaled Conjugate Gradient algorithm for training this multilayer perceptron, with a tan-sigmoid activation function in the three hidden layers and a log-sigmoid activation function in the output layer. A set of random values distributed uniformly between -1 and +1 was used to initialize the weights of the neural network model, and 747 epochs were required to train it to the goal of 0.0000166 (see figure 4.17). The program used for training the 595 data sets with all the weather parameters in the ANN model for predicting the relative humidity at 7 AM parameter is given in Appendix B (d). Since the stepwise regression analysis selected all the parameters as significant, hybrid models cannot be developed for the relative humidity at 7 AM parameter; that is, the hybrid models would be the same as the ARIMA and ANN models.
Figure 4.15. Plots of autocorrelation and partial autocorrelation coefficients against time lags for the weekly average Relative Humidity at 7 AM parameter.
Figure 4.16. Artificial neural network structure for weekly average Relative Humidity at 7 AM prediction.
Figure 4.17. Mapping of the number of epochs needed to reach the desired goal for the ANN model for the Relative Humidity at 7 AM parameter.
4.4.6.
Relative Humidity at 2 PM
The multiple regression model is fitted to predict the weekly average relative humidity at 2 pm as the dependent parameter, taking as independent parameters the weekly maximum temperature, minimum temperature, relative humidity at 7 am, rainfall, pan evaporation and bright sunshine. The most significantly contributing parameters are selected using stepwise regression analysis based on the 595 data sets of 35 years, and the best-fit multiple regression model is given below:
Y = 46.85286 - 1.85133X1 + 1.74419X2 + 0.51835X3 + 0.02415X4 - 0.40761X5 - 1.02706X6 (4.14)
where relative humidity at 2 pm (Y) is the dependent parameter, with an 84.45% contribution from the significant parameters maximum temperature (X1), minimum temperature (X2), relative humidity at 7 am (X3), rainfall (X4), pan evaporation (X5) and bright sunshine (X6). The stepwise regression procedure for selecting the significant parameters for the relative humidity at 2 pm parameter is given in Appendix A (e). We next performed ARIMA modeling of the data using all the weather parameters. The relative humidity at 2 pm time series required no transformation, since the data set is stationary. We then used the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the weekly average relative humidity at 2 pm series (see figure 4.18) to estimate the values of 'p' and 'q' of the ARIMA model. While both the ACF and PACF have significant terms at lags 1, 15, 16 and 17, both attain their maximum correlation coefficient (0.622) at lag 1. This suggests that an ARIMA model of order 1 best fits the data. Using an iterative model building process of identification, estimation and diagnostic checking with all the weather parameters, we finally selected an ARIMA (1, 0, 0) model as the most appropriate fit for the observed data. Further, we performed ANN model building using all the same weather parameters.
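The identification step above, reading the lag-1 autocorrelation off the series and taking an AR term of order 1, can be sketched with plain numpy. This is an illustration of the Box-Jenkins reasoning, not the thesis software: a synthetic AR(1) series with coefficient 0.6 stands in for the humidity data, which is not reproduced here, and the lag-1 autocorrelation serves as the Yule-Walker estimate of the AR(1) coefficient of an ARIMA (1, 0, 0) model.

```python
import numpy as np

def sample_acf(y, k):
    """Lag-k sample autocorrelation, as read off an ACF plot."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.dot(y[:-k], y[k:]) / np.dot(y, y))

rng = np.random.default_rng(7)
y = np.zeros(595)                        # 595 weekly values, as in the text
for t in range(1, 595):                  # synthetic AR(1), phi = 0.6
    y[t] = 0.6 * y[t - 1] + rng.normal()

r1 = sample_acf(y, 1)                    # ACF at lag 1
phi_hat = r1                             # Yule-Walker AR(1) estimate
print(phi_hat)
```

For a pure AR(1) process the ACF decays geometrically while the PACF cuts off after lag 1, which is the pattern the text uses to justify p = 1, q = 0.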
We selected the best suited architecture of the Feed Forward Neural Network model for our weekly relative humidity at 2 PM data by comparing training methods and varying the number of layers and the number of neurons in each network. The proposed model had an input layer carrying all the weather parameters, three hidden layers (8 neurons in the first hidden layer, 10 neurons in the second and 12 neurons in the third) and one neuron in the output layer (see figure 4.19). We used the Scaled Conjugate Gradient algorithm for training this multilayer perceptron, with a tan-sigmoid activation function in the three hidden layers and the output layer. A set of random values distributed uniformly between -1 and +1 was used to initialize the weights of the neural network model, and 544 epochs were required to train it to the goal of 0.00003155 (see figure 4.20). The program used for training the 595 data sets with all the weather parameters in the ANN model for predicting the relative humidity at 2 pm parameter is given in Appendix B (e).
Figure 4.18. Plots of autocorrelation and partial autocorrelation coefficients against time lags for the weekly average Relative Humidity at 2 PM parameter.
Since the stepwise regression analysis selected all the parameters as significant, hybrid models cannot be developed for the relative humidity at 2 pm parameter; that is, the hybrid models would be the same as the ARIMA and ANN models.
Figure 4.19. Artificial neural network structure for weekly average Relative Humidity at 2 PM prediction.
Figure 4.20. Mapping of the number of epochs needed to reach the desired goal for the ANN model for the Relative Humidity at 2 PM parameter.
4.4.7. Pan Evaporation
The multiple regression model is fitted to predict the weekly average pan evaporation as the dependent parameter, taking as independent parameters the weekly maximum temperature, minimum temperature, relative humidity at 7 am and 2 pm, rainfall and bright sunshine.
The most significantly contributing parameters are selected using stepwise regression analysis based on the 595 data sets of 35 years, and the best-fit multiple regression model is given below:
Y = -4.10879 + 0.53040X1 - 0.07685X3 - 0.02902X4 + 0.00351X5 (4.15)
where maximum temperature (X1), relative humidity at 7 am (X3), relative humidity at 2 pm (X4) and rainfall (X5), with a coefficient of determination of 69.22%, are the most contributing parameters for predicting pan evaporation (Y) as the dependent parameter. The parameters minimum temperature (X2) and bright sunshine (X6) appear to have the least influence on pan evaporation and hence do not enter the proposed multiple regression model. The stepwise regression procedure for selecting the significant parameters for the pan evaporation parameter is given in Appendix A (f). We next performed ARIMA modeling of the data using all the weather parameters. The pan evaporation time series is found to be stationary, so no transformation of the data set was required. We then used the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the weekly average pan evaporation series (see figure 4.21) to estimate the orders (p and q) of the ARIMA model. While both the ACF and PACF have significant terms at lags 1, 15, 16 and 17, the ACF attains its maximum correlation coefficient (0.605) at lag 17 and the PACF attains its maximum (0.548) at lag 1. Using an iterative model building process of identification, estimation and diagnostic checking, we finally selected an ARIMA (1, 0, 17) model as the most appropriate fit for the observed data. Further, we performed ANN model building using all the weather parameters.
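The stepwise selection used throughout this chapter can be illustrated with a forward-selection sketch. This is an assumption-laden toy version, not the SAS-style procedure of Appendix A: here a variable enters only if it raises R-squared by at least a fixed margin, whereas the thesis used entry/stay significance levels. The data are synthetic, with the true signal placed on two of six candidate columns.

```python
import numpy as np

def r2(X, y):
    """R-squared of an ordinary least squares fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def forward_select(X, y, min_gain=0.01):
    """Greedy forward selection: add the column giving the largest
    R-squared gain, stopping when the gain falls below min_gain."""
    chosen, remaining, best = [], list(range(X.shape[1])), 0.0
    while remaining:
        score, j = max((r2(X[:, chosen + [j]], y), j) for j in remaining)
        if score - best < min_gain:
            break
        chosen.append(j); remaining.remove(j); best = score
    return chosen, best

rng = np.random.default_rng(1)
X = rng.normal(size=(595, 6))            # 6 candidate predictors, 595 weeks
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=595)
sel, score = forward_select(X, y)
print(sel, score)
```

With this setup the procedure recovers the two informative columns and rejects the four noise columns, mirroring how minimum temperature and bright sunshine were dropped from Eq. (4.15).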
We selected the best suited architecture of the Feed Forward Neural Network model for our weekly pan evaporation data by comparing training methods and varying the number of layers and the number of neurons in each network. The proposed model had an input layer carrying all the weather parameters, three hidden layers (8 neurons in the first hidden layer, 10 neurons in the second and 12 neurons in the third) and one neuron in the output layer (see figure 4.22).
Figure 4.21. Plots of autocorrelation and partial autocorrelation coefficients against time lags for the weekly average Pan Evaporation parameter.
Figure 4.22. Artificial neural network structure for weekly average Pan Evaporation prediction.
We used the Scaled Conjugate Gradient algorithm for training this multilayer perceptron, with a tan-sigmoid activation function in the three hidden layers and a log-sigmoid activation function in the output layer. A set of random values distributed uniformly between -1 and +1 was used to initialize the weights of the neural network model, and 117 epochs were required to train it to the goal of 0.00000285 (see figure 4.23). The program used for training the 595 data sets with all the weather parameters in the ANN model for predicting the pan evaporation parameter is given in Appendix B (f). We next performed hybrid MLR_ARIMA modeling of the data using the four weather parameters selected as significant through stepwise regression earlier, for predicting the pan evaporation parameter. We used the same ACF and PACF, with maximum correlation coefficients of 0.605 at lag 17 and 0.548 at lag 1, respectively (see figure 4.21), suggesting an MLR_ARIMA (1, 0, 17) model as the most appropriate fit on the significantly selected weather parameters data set.
Figure 4.23. Mapping of the number of epochs needed to reach the desired goal for the ANN model for the Pan Evaporation parameter.
Finally, we performed hybrid MLR_ANN model building using the same, most significant parameters suggested by the multiple regression model. We retained the best suited Feed Forward Neural Network architecture obtained while developing the ANN model for the weekly average pan evaporation data set. This proposed hybrid MLR_ANN model had a different input layer, that is, only the significant parameters selected through stepwise regression analysis, but the same three hidden layers (8 neurons in the first hidden layer, 10 neurons in the second and 12 neurons in the third) and one neuron in the output layer (see figure 4.24). We used the same Scaled Conjugate Gradient algorithm for training this multilayer perceptron, with a tan-sigmoid activation function in the three hidden layers and a log-sigmoid activation function in the output layer. A new set of random values distributed uniformly between -1 and +1 was used to initialize the weights of the MLR_ANN model, and 76 epochs were required to train it to the goal of 0.00000285 (see figure 4.25). The program used for training the 595 data sets with the significantly selected parameters in the hybrid MLR_ANN model for the dependent pan evaporation parameter is given in Appendix C (c).
Figure 4.24. Hybrid MLR_ANN structure for weekly average Pan Evaporation prediction.
Figure 4.25. Mapping of the number of epochs needed to reach the desired goal for the hybrid MLR_ANN model for the Pan Evaporation parameter.
4.4.8. Bright Sunshine
The multiple regression model is fitted to predict the weekly average bright sunshine as the dependent parameter, taking as independent parameters the weekly maximum temperature, minimum temperature, relative humidity at 7 am and 2 pm, pan evaporation and rainfall.
The most significantly contributing parameters are selected using stepwise regression analysis based on the 595 data sets of 35 years, and the best-fit multiple regression model is given below:
Y = 7.86562 + 0.12606X1 - 0.13358X2 + 0.03621X3 - 0.07713X4 - 0.00597X6 (4.16)
where bright sunshine (Y) is the dependent parameter, with a 46.68% contribution from the significant parameters maximum temperature (X1), minimum temperature (X2), relative humidity at 7 am (X3), relative humidity at 2 pm (X4) and rainfall (X6). The parameter pan evaporation (X5) appears to have the least influence on bright sunshine and hence does not enter the proposed multiple regression model. The stepwise regression procedure for selecting the significant parameters for the bright sunshine parameter is given in Appendix A (g). We next performed ARIMA modeling of the data using all the weather parameters. The bright sunshine time series is found to be stationary, so no transformation of the data set was required. We used the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the weekly average bright sunshine series (see figure 4.26) to estimate the orders (p and q) of the ARIMA model. While both the ACF and PACF have significant terms at lags 1 and 17, both attain their maximum correlation coefficient (0.356) at lag 1. This suggests that an ARIMA model of order 1 best fits the bright sunshine data set. Using an iterative model building process of identification, estimation and diagnostic checking with all the weather parameters, we finally selected an ARIMA (1, 0, 0) model as the most appropriate fit for the observed data. Then we performed ANN model building using all the same weather parameters. We selected the best suited architecture of the Feed Forward Neural Network model for our weekly bright sunshine data by comparing training methods and varying the number of layers and the number of neurons in each network.
The proposed model had an input layer carrying all the weather parameters, three hidden layers (8 neurons in the first hidden layer, 10 neurons in the second and 12 neurons in the third) and one neuron in the output layer (see figure 4.27). We used the Scaled Conjugate Gradient algorithm for training this multilayer perceptron, with a tan-sigmoid activation function in the three hidden layers and a purelin (linear) activation function in the output layer. A set of random values distributed uniformly between -1 and +1 was used to initialize the weights of the neural network model, and 570 epochs were required to train it to the goal of 0.00000219 (see figure 4.28). The program used for training the 595 data sets with all the weather parameters in the ANN model for predicting the bright sunshine parameter is given in Appendix B (g).
Figure 4.26. Plots of autocorrelation and partial autocorrelation coefficients against time lags for the weekly average Bright Sunshine parameter.
Figure 4.27. Artificial neural network structure for weekly average Bright Sunshine prediction.
Figure 4.28. Mapping of the number of epochs needed to reach the desired goal for the ANN model for the Bright Sunshine parameter.
We next performed hybrid MLR_ARIMA modeling of the data using the five weather parameters selected as significant through stepwise regression earlier, for predicting the bright sunshine parameter. We used the same ACF and PACF of the bright sunshine series, with maximum correlation coefficient 0.356 at lag 1 (see figure 4.26), suggesting an ARIMA (1, 0, 0) model as the most appropriate fit on the significantly selected weather parameters data set.
Figure 4.29. Hybrid MLR_ANN structure for weekly average Bright Sunshine prediction.
Finally, we performed hybrid MLR_ANN model building using the same, most significant parameters suggested by the multiple regression model.
We retained the best suited Feed Forward Neural Network architecture obtained while developing the ANN model for the weekly average bright sunshine data set. This proposed hybrid MLR_ANN model had a different input layer, that is, only the significant parameters selected through stepwise regression analysis, but the same three hidden layers (8 neurons in the first hidden layer, 10 neurons in the second and 12 neurons in the third) and one neuron in the output layer (see figure 4.29) were retained. We used the same Scaled Conjugate Gradient algorithm for training this multilayer perceptron, with a tan-sigmoid activation function in the three hidden layers and a purelin activation function in the output layer, as in the ANN model. A new set of random values distributed uniformly between -1 and +1 was used to initialize the weights of the MLR_ANN model, and 1257 epochs were required to train it to the same goal of 0.00000219 (see figure 4.30). The program used for training the 595 data sets with the significantly selected parameters in the hybrid MLR_ANN model for the dependent bright sunshine parameter is given in Appendix C (d).
Figure 4.30. Mapping of the number of epochs needed to reach the desired goal for the hybrid MLR_ANN model for the Bright Sunshine parameter.
4.5 Comparison of prediction ability of forecasting models
4.5.1. Introduction
Identifying the most suitable model is an important task, accomplished by comparing the predictive ability of the forecasting models. The methodology presented above was applied to the training data set of 35 years, that is, 595 weekly weather data sets for the monsoon period. For the training data sets, the comparison among the developed models is made for each weather parameter by comparing the actual and predicted values.
The comparison is made on the basis of analytical methods, evaluated both graphically through linear scale plots and numerically through the mean error (BIAS), mean absolute error (MAE), root mean square error (RMSE), prediction error (PE) and correlation coefficient (r). The comparisons between the predicted and observed values of all seven parameters for the training data set are discussed graphically in the following subsections.
4.5.2. Rainfall
The models developed through MLR, ARIMA, ANN, hybrid MLR_ARIMA and hybrid MLR_ANN in section 4.4.2 are compared using the graphical method, so as to identify the best model for the rainfall parameter by comparing the actual and predicted values. In figure 4.31, we compare the actual rainfall with the rainfall predicted by all five developed models for the 595 training data sets, observing that the values predicted by the ANN and hybrid MLR_ANN models tend closer to the actual weekly rainfall values.
4.5.3. Maximum Temperature
The models developed through multiple linear regression, ARIMA and neural network in section 4.4.3 are compared using the graphical method, so as to identify the best model for the maximum temperature parameter by comparing the actual and predicted values. In figure 4.32, we compare the actual weekly maximum temperature with the values predicted by the three developed models for the 595 training data sets. The values obtained through the ANN model tend most closely to overlap the actual weekly maximum temperature values, as can be observed at a few points, viz. the last week of 1961, the second week of 1965 and 1973, the first week of 1980 and the seventh week of 1982 and 1986.
4.5.4.
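The five criteria listed above are straightforward to compute. The sketch below uses conventional textbook definitions as an assumption (the thesis states its exact formulas in section 4.3.5, not reproduced here); in particular, PE is taken here as the sum of absolute errors relative to the sum of absolute actual values.

```python
import numpy as np

def metrics(actual, pred):
    """Assumed conventional definitions of the five comparison criteria."""
    a, p = np.asarray(actual, dtype=float), np.asarray(pred, dtype=float)
    e = p - a
    return {
        "BIAS": float(e.mean()),                         # mean error
        "MAE": float(np.abs(e).mean()),                  # mean absolute error
        "RMSE": float(np.sqrt((e ** 2).mean())),         # root mean square error
        "PE": float(np.abs(e).sum() / np.abs(a).sum()),  # prediction error (assumed form)
        "r": float(np.corrcoef(a, p)[0, 1]),             # correlation coefficient
    }

# Tiny made-up example: three actual and predicted weekly values.
m = metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
print(m)
```

A model is preferred when BIAS, MAE, RMSE and PE are small in magnitude and r is large, which is exactly the reading applied to the tables in chapter 5.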
Minimum Temperature
The models developed through MLR, ARIMA, ANN, hybrid MLR_ARIMA and hybrid MLR_ANN in section 4.4.4 are compared using the graphical method, so as to identify the best model for the minimum temperature parameter by comparing the actual and predicted values. In figure 4.33, we compare the actual weekly minimum temperature with the values predicted by all the developed models for the 595 training data sets. For one or two weeks of a few years, the values predicted by the hybrid MLR_ANN model tended closer to the actual minimum temperature values.
4.5.5. Relative Humidity at 7 AM
The models developed through multiple regression, ARIMA and neural network in section 4.4.5 are compared using the graphical method, so as to identify the best model for the relative humidity at 7 am parameter by comparing the actual and predicted values. In figure 4.34, we compare the actual weekly relative humidity at 7 am with the values predicted by the three developed models for the 595 training data sets. The ANN model gives more precise results, as can be seen graphically at a few points, viz. the first week of 1964, 1965, 1966, 1975 and 1992; the seventh week of 1969; the second week of 1979 and 1987; and the twelfth week of 1982.
4.5.6. Relative Humidity at 2 PM
The models developed through multiple regression, ARIMA and neural network in section 4.4.6 are compared using the graphical method, so as to identify the best model for the relative humidity at 2 pm parameter by comparing the actual and predicted values.
In figure 4.35, we compare the actual weekly relative humidity at 2 pm with the values predicted by the three developed models for the 595 training data sets, indicating the ANN model as the preferred model.
4.5.7. Pan Evaporation
The models developed through multiple regression, ARIMA, ANN, hybrid MLR_ARIMA and hybrid MLR_ANN in section 4.4.7 are compared using the graphical method, so as to identify the best model for the pan evaporation parameter by comparing the actual and predicted values. Figure 4.36 compares the actual weekly pan evaporation with the values predicted by all the developed models for the 595 training data sets. The values obtained through the ANN and hybrid MLR_ANN models tend partly to cover the actual weekly pan evaporation, but the hybrid MLR_ANN model leans closer to the actual values, as observed at a few points, viz. the first week of 1961, 1962, 1964, 1965, 1967, 1980 and 1984; the third week of 1973, 1992 and 1994; the sixth week of 1972; and the tenth week of 1972 and 1975.
4.5.8. Bright Sunshine
The models developed through MLR, ARIMA, ANN, hybrid MLR_ARIMA and hybrid MLR_ANN in section 4.4.8 are compared using the graphical method, so as to identify the best model for the bright sunshine parameter by comparing the actual and predicted values. Figure 4.37 compares the actual weekly bright sunshine with the values predicted by all the developed models for the 595 training data sets, indicating the hybrid MLR_ANN model as the preferred model.
4.6 Conclusion
The complex nature of the weekly weather parameter records has been studied using Multiple Linear Regression, Autoregressive Integrated Moving Average, Artificial Neural Network, hybrid MLR_ARIMA and hybrid MLR_ANN techniques. The weekly weather data for the months of June, July, August and September over a period of 35 years for the Pantnagar region were used to develop and train the models. Since parameter selection (the input pattern) is always a challenging task in modeling, we reduced this complexity by proposing hybrid models that use only the parameters found significant through stepwise regression analysis, so as to obtain valid, unbiased results. The results showed variation among all the parameters during the four months; the correlation between relative humidity at 7 am and relative humidity at 2 pm was the highest (0.83736), while that between rainfall and minimum temperature was the lowest (-0.00666). The five models mentioned above were developed for each weather parameter. In developing the ANN and hybrid MLR_ANN models, the three hidden layers and the scaled conjugate gradient training algorithm were the same for all the weather parameters, except for the input parameters and the weights and biases considered. From the graphical presentation for each weather parameter, it was concluded that the ANN model is preferred over the MLR and ARIMA models for all the weekly weather parameters. Finally, the study reveals that the hybrid MLR_ANN model can be used as an appropriate forecasting tool to estimate the weather parameters, in contrast to the MLR, ARIMA, ANN and hybrid MLR_ARIMA models.
CHAPTER 5
IDENTIFICATION OF PRECISE WEATHER FORECASTING MODEL
5.1 Introduction
The performance of all the developed weather forecasting models was evaluated in the previous chapter, and the comparison of the prediction ability of the forecasting models indicated the better performance of the Artificial Neural Network model. All seven weather parameters were used to develop the MLR, ARIMA and ANN models, while the most significantly contributing variables selected for each weather parameter using stepwise regression analysis were used to develop the hybrid MLR_ARIMA and hybrid MLR_ANN models. The comparison between the actual and predicted values, to identify the precise weather forecasting model, is made in this chapter and applied to a quasi-independent testing set of 255 data sets of 15 years of weekly weather data for the monsoon period, which was not used while developing the models. The comparison is made on the basis of analytical methods, evaluated both graphically through linear scale plots and numerically through the mean error (BIAS), mean absolute error (MAE), root mean square error (RMSE), prediction error (PE) and correlation coefficient (r). The numerical computational methodology is used as explained in section 4.3.5. The appropriateness and effectiveness of these models is demonstrated by comparing the actual values of all seven weather parameters with their predicted values; the results are presented in the next section.
5.2. Validation of weather forecasting model
5.2.1 Rainfall
A comparison of the actual weekly rainfall with the values predicted by the proposed models and the other traditional forecasting models is presented graphically in figure 5.1.
It is clearly perceptible that the predicted values for the 17 monsoon weeks of the first three years, 1996 to 1998, varied only slightly from the actual values. There is, however, a large variation in a few monsoon weeks of the two years 1999 and 2000, since the predicted values follow a trend while the actual values fluctuate strongly. In the years 2001 and 2002 the actual values are slightly smaller than the predicted values, and a large variation, due to actual values departing from the trend observed in previous years, is again seen for two to three monsoon weeks of the years 2003, 2004 and 2005. It is also observed that in the last monsoon weeks of 2003 and the beginning monsoon weeks of 2004, very little variation is seen between the actual values and the values predicted by the hybrid MLR_ANN model, and a similar observation holds for a few monsoon weeks of the years 2007 and 2009, indicating that the hybrid MLR_ANN model is a proficient predictor in comparison to the other models considered. The estimates, viz. mean error, mean absolute error, root mean square error, prediction error and correlation coefficient, are presented in table 5.1 for the testing data set. The bias for the testing data set is the least for the hybrid MLR_ANN model, compared with those obtained from the hybrid MLR_ARIMA, ANN, ARIMA and MLR models.
Table 5.1. Comparison of the performance of forecasting models for the Rainfall parameter.
Techniques BIAS MAE RMSE PE CC
MLR 3.39615 53.91620 75.58739 0.66798 0.60251
ARIMA 1.57248 52.87119 74.90918 0.65503 0.61154
ANN -6.23647 49.74627 74.74657 0.61632 0.61441
MLR_ARIMA 3.59415 53.90974 75.50548 0.66790 0.60344
MLR_ANN -6.28196 49.33765 74.34915 0.61125 0.61894
The MAE further shows that the hybrid MLR_ANN model is more precise than the ANN model, and the hybrid MLR_ANN model shows a smaller RMSE than the other models.
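The selection of the preferred model from Table 5.1 amounts to minimizing the error measures and maximizing the correlation. A small sketch, with the values copied from the table above, makes that reading mechanical:

```python
# Values transcribed from Table 5.1 (rainfall, testing data set).
table = {
    "MLR":       {"MAE": 53.91620, "RMSE": 75.58739, "PE": 0.66798, "CC": 0.60251},
    "ARIMA":     {"MAE": 52.87119, "RMSE": 74.90918, "PE": 0.65503, "CC": 0.61154},
    "ANN":       {"MAE": 49.74627, "RMSE": 74.74657, "PE": 0.61632, "CC": 0.61441},
    "MLR_ARIMA": {"MAE": 53.90974, "RMSE": 75.50548, "PE": 0.66790, "CC": 0.60344},
    "MLR_ANN":   {"MAE": 49.33765, "RMSE": 74.34915, "PE": 0.61125, "CC": 0.61894},
}
best_rmse = min(table, key=lambda m: table[m]["RMSE"])  # lowest RMSE
best_cc = max(table, key=lambda m: table[m]["CC"])      # highest correlation
print(best_rmse, best_cc)   # both point to the hybrid MLR_ANN model
```

The same one-line reductions applied to MAE and PE pick out the hybrid MLR_ANN model as well, consistent with the conclusion drawn in the text.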
The PE obtained on the testing data from the MLR, hybrid MLR_ARIMA, ARIMA and ANN models is 0.66798, 0.66790, 0.65503 and 0.61632, respectively, while for the hybrid MLR_ANN model it is 0.61125, the least, indicating it as the precise prediction model. Further, the correlation coefficient is found to be highest for the hybrid MLR_ANN model in comparison to the other models. Thus, the graphical representation as well as the numerical estimates both favor the hybrid multiple linear regression with artificial neural network (MLR_ANN) model as the preferred performer in comparison to the other models, and it can be concluded that this hybrid technique can be used as an effective rainfall forecasting tool in the Himalayan region.
5.2.2 Maximum Temperature
A comparison of the performance for the weekly maximum temperature parameter is presented graphically in figure 5.2. The weekly monsoon maximum temperature for the years 1996, 1999, 2000, 2001, 2002, 2005, 2006, 2007 and 2008 clearly shows some variation between the predicted and actual values for most of the weeks of each year. But for the years 1997, 1998, 2003, 2004, 2009 and a few starting weeks of 2010, very little variation is observed between the actual values and the values predicted by the ANN model, indicating that the ANN predicts comparatively better than the other traditional models. The estimates, viz. mean error, mean absolute error, root mean square error, prediction error and correlation coefficient, are also presented in table 5.2 for the same data set. Since the stepwise regression analysis selected all the parameters as significant, hybrid models could not be developed for maximum temperature in the study area. The bias obtained for the testing data set is smaller for the artificial neural network than the values obtained from the autoregressive integrated moving average and multiple linear regression models.
Table 5.2. Comparison of the performance of forecasting models for Maximum Temperature.
Techniques    BIAS      MAE       RMSE      PE        CC
MLR           0.87990   1.05235   1.45396   0.03228   0.88964
ARIMA         0.87050   1.04583   1.45192   0.03208   0.88881
ANN           0.79330   1.00275   1.40719   0.03076   0.88995

The MAE measure for the testing data set is 1.05235 for the multiple linear regression model and 1.04583 for the ARIMA model, while the same error measure is considerably lower, at 1.00275, for the artificial neural network model. The ANN model also shows a smaller value of RMSE than the MLR and ARIMA models. The PE obtained for the testing data is 0.03228 for the multiple linear regression model and 0.03208 for ARIMA, whereas for the artificial neural network model it is 0.03076, the smallest, marking it as the preferred prediction model. Moreover, the neural network model has the highest correlation coefficient among all the models. These numerical estimates support the conclusion that the ANN model performs better than the ARIMA and MLR models, which coincides with the previous results.

5.2.3 Minimum Temperature

A comparison of the performance of the proposed model with those of the other forecasting models for the weekly minimum temperature parameter is presented graphically in figure 5.3. The weekly minimum temperature for the monsoon season shows fluctuations between the actual and predicted values. The predicted values developed through the hybrid MLR_ANN model, following a trend based on previous years, showed the least variation from the actual values for a few weeks of the years 1997, 1998, 1999, 2000, 2004 and 2009. Thus the hybrid MLR_ANN model proved to be the better performer, which can be analysed more precisely through the numerical performance of each model. The estimates, viz. mean error, mean absolute error, root mean square error, prediction error and correlation coefficient, are presented in table 5.3 for the testing data set.

Table 5.3. Comparison of the performance of forecasting models for Minimum Temperature.
Techniques    BIAS       MAE       RMSE      PE        CC
MLR           -0.79700   1.09592   1.38227   0.04416   0.44361
ARIMA         -0.71713   1.08877   1.29726   0.04387   0.49334
ANN           -0.77922   1.08745   1.33547   0.04382   0.49699
MLR_ARIMA     -0.72705   1.08911   1.31870   0.04388   0.46572
MLR_ANN       -0.72980   1.03726   1.29656   0.04179   0.50767

It can be seen from table 5.3 that the bias for the testing data set is -0.72980 for the hybrid MLR_ANN model, which is neither the lowest nor the highest among the compared models. Since the bias does not prove to be a decisive criterion for comparison among the developed models, we move on to the other performance evaluation criteria. Next, the MAE for the testing data set is 1.09592, 1.08911, 1.08877 and 1.08745 for the MLR, hybrid MLR_ARIMA, ARIMA and ANN models respectively, while the same error measure is lowest, at 1.03726, for the hybrid MLR_ANN model. The RMSE is also found to be the least, at 1.29656, for the hybrid MLR_ANN model in comparison to the other forecasting models. The PE obtained for the testing data from the MLR model is 0.04416, from the hybrid MLR_ARIMA 0.04388, from the ARIMA model 0.04387 and from the ANN 0.04382, whereas for the hybrid MLR_ANN model it is 0.04179, the smallest, indicating it as the preferred prediction model. Further, the correlation coefficient is observed to be highest for the hybrid MLR_ANN model. Since both the graphical representation and the numerical estimates sustain the hybrid MLR_ANN model as the preferred performer, it can be concluded that this hybrid technique can be used as a reliable minimum temperature forecasting tool in the Himalaya.

5.2.4 Relative Humidity at 7 AM

A comparison of the performance of the actual values with the predicted values of relative humidity at 7 am is presented graphically in figure 5.4.
The predicted values of the 17 weeks of weekly average relative humidity at 7 am in the monsoon season showed the least variation of the ANN predictions from the actual values for most of the weeks of all 15 years, except for the September weeks of 1998, 2003 and 2007 and a few weeks of 1999, 2004, 2008 and 2010. Thus, it can be said that the ANN model is a more efficient performer than the other developed models. In table 5.4, the estimates, viz. mean error, mean absolute error, root mean square error, prediction error and correlation coefficient, for the testing data set are also given. Since in the stepwise regression analysis all the parameters are selected as significant, there is no possibility of developing hybrid models for the Relative Humidity at 7 AM weather parameter in the present study area.

Table 5.4. Comparison of the performance of forecasting models for Relative Humidity at 7 AM.

Techniques    BIAS      MAE       RMSE      PE        CC
MLR           1.13423   3.16400   4.11119   0.03663   0.89181
ARIMA         1.13182   3.08161   4.01749   0.03568   0.89820
ANN           1.06157   2.92431   3.87023   0.03385   0.90479

It can be observed from table 5.4 that the bias for the testing data set is smaller for the artificial neural network than the values obtained from the ARIMA and MLR models. The MAE measure for the testing data set is 3.16400 for the multiple linear regression model and 3.08161 for the ARIMA model, while the same error measure is considerably lower, at 2.92431, for the artificial neural network model. The artificial neural network model also shows a smaller value of RMSE compared to those of the MLR and ARIMA models. The PE obtained for the testing data from the MLR, ARIMA and ANN models is 0.03663, 0.03568 and 0.03385 respectively, the ANN model giving the lowest value among the three prediction models. Further, the correlation coefficient is observed to be highest for the artificial neural network model, at 0.90479, in comparison to 0.89181 and 0.89820 for the multiple linear regression and ARIMA models respectively.
These numerical estimates confirm that the ANN model performs better and can be used as a forecasting tool for relative humidity at 7 am in the Himalayan range, which validates the previous findings.

5.2.5 Relative Humidity at 2 PM

A comparison of the performance of the actual relative humidity at 2 pm with the values predicted by the three forecasting models is presented graphically in figure 5.5. The weekly average relative humidity at 2 pm of the monsoon season showed little variation between the actual values and the values predicted by the ANN for most of the weeks of all years. A noticeable variation of the predicted values from the actual values can be observed in a week of 1996, 2005, 2006 and 2007. Thus, it can be said that the ANN model is a better performer in comparison to the other developed models. Table 5.5 presents the estimates, viz. mean error, mean absolute error, root mean square error, prediction error and correlation coefficient, for the testing data set. Since in the stepwise regression analysis all the parameters are selected as significant, there is no possibility of developing hybrid models for the Relative Humidity at 2 PM weather parameter in the present study area.

Table 5.5. Comparison of the performance of forecasting models for Relative Humidity at 2 PM.

Techniques    BIAS      MAE       RMSE      PE        CC
MLR           1.71266   3.47877   4.93385   0.05245   0.93181
ARIMA         1.58552   3.38308   4.83330   0.05100   0.93333
ANN           0.45569   3.28471   4.55504   0.04952   0.93479

It can be seen from the table above that the bias for the testing data set is the least for the artificial neural network model among the considered models. The MAE measure for the testing data set is 3.47877 for the multiple linear regression model and 3.38308 for the ARIMA model, while the same error measure is considerably lower, at 3.28471, for the artificial neural network model.
The artificial neural network model shows a smaller value of RMSE compared to those of the multiple linear regression and ARIMA models. The PE obtained for the testing data is 0.05245 and 0.05100 for the multiple linear regression and ARIMA models respectively, whereas for the artificial neural network model it is 0.04952, which is smaller, so the ANN can be preferred as the predictive model. Moreover, the correlation coefficient is observed to be highest for the artificial neural network model. Thus, these numerical estimates support the graphical presentation, indicating that the artificial neural network model performs better than the multiple linear regression and autoregressive integrated moving average models.

5.2.6 Pan Evaporation

A comparison of the performance of the proposed model with those of the other forecasting models for the weekly average pan evaporation parameter is presented graphically in figure 5.6. The monsoon weekly average pan evaporation showed the least variation between the actual values and the values predicted through the hybrid MLR_ANN model for most of the weeks of almost all the years, except for the two years 1999 and 2000: since the predictions for these two years also follow the pattern developed from the previous years, they differ noticeably from the actually observed values of weekly average pan evaporation. This indicates that the hybrid MLR_ANN model is a better performer in comparison to the other models. The estimates, viz. mean error, mean absolute error, root mean square error, prediction error and correlation coefficient, are presented in table 5.6 for the testing data set. The bias for the testing data set is the highest (i.e., closest to zero) for the hybrid MLR_ANN model compared with the values obtained from the hybrid MLR_ARIMA, ANN, ARIMA and MLR models.

Table 5.6. Comparison of the performance of forecasting models for Pan Evaporation.
Techniques    BIAS       MAE       RMSE      PE        CC
MLR           -0.61433   1.12179   1.50834   0.20349   0.80829
ARIMA         -0.55696   1.09292   1.46588   0.19826   0.80699
ANN           -0.42118   1.07216   1.43977   0.19449   0.80355
MLR_ARIMA     -0.58701   1.09695   1.48079   0.19899   0.80593
MLR_ANN       -0.40118   1.07019   1.40800   0.19413   0.81262

The MAE for the testing data set is 1.12179 for the multiple linear regression model, 1.09695 for the hybrid MLR_ARIMA model, 1.09292 for the ARIMA model and 1.07216 for the ANN, while the same error measure is noticeably lower, at 1.07019, for the hybrid MLR_ANN model. Moreover, the hybrid MLR_ANN model shows a smaller value of RMSE, at 1.40800, in comparison to the other models. The PE obtained for the testing data from the multiple linear regression model is 0.20349, from the hybrid MLR_ARIMA 0.19899, from the ARIMA model 0.19826 and from the ANN 0.19449, whereas for the hybrid MLR_ANN model it is 0.19413, the smallest, indicating it as the preferred prediction model. Further, the correlation coefficient is found to be the highest for the hybrid MLR_ANN model. These numerical estimates also support the graphical presentation, indicating that the hybrid multiple linear regression with artificial neural network (MLR_ANN) model performs best and can be used as a reliable pan evaporation forecasting tool in the Himalaya.

5.2.7 Bright Sunshine

A comparison of the performance of the proposed model with those of the other forecasting models for the weekly average bright sunshine parameter is presented graphically in figure 5.7. The weekly average bright sunshine of the monsoon season reflects variation between the actual and predicted values through all five considered models; only for a few weeks among the years is little variation observed between the actual values and the values predicted by the hybrid MLR_ANN model, indicating the hybrid MLR_ANN model as a preferred performer over the other methods considered and developed. The estimates, viz.
mean error, mean absolute error, root mean square error, prediction error and correlation coefficient, are presented in table 5.7 for the testing data set. The bias for the testing data set is higher for the hybrid MLR_ANN model than the values obtained from the hybrid MLR_ARIMA, ANN, ARIMA and MLR models.

Table 5.7. Comparison of the performance of forecasting models for Bright Sunshine.

Techniques    BIAS      MAE       RMSE      PE        CC
MLR           0.11512   1.21750   1.48545   0.20045   0.79686
ARIMA         0.07957   1.19586   1.46565   0.19689   0.79746
ANN           0.24863   1.22118   1.46282   0.20106   0.81908
MLR_ARIMA     0.08059   1.19610   1.46593   0.19693   0.79742
MLR_ANN       0.30745   1.17726   1.43552   0.19383   0.81924

The MAE for the testing data set is 1.21750 for the MLR model, 1.19610 for the hybrid MLR_ARIMA model, 1.19586 for the ARIMA model and 1.22118 for the ANN model, while the same error measure is noticeably lower, at 1.17726, for the hybrid MLR_ANN model. The hybrid MLR_ANN model also shows a smaller value of RMSE, at 1.43552, compared to 1.48545, 1.46565, 1.46282 and 1.46593 for the MLR, ARIMA, ANN and hybrid MLR_ARIMA models respectively, for the testing data set. In general, a good forecast may have a relatively high BIAS value but relatively low MAE and RMSE values (if the predicted variable is well correlated with the independent variables), or a low BIAS value and high MAE and RMSE values (if the predicted variable is poorly correlated with the independent variables). The PE obtained for the testing data from the MLR model is 0.20045, from the hybrid MLR_ARIMA model 0.19693, from the ARIMA model 0.19689 and from the ANN model 0.20106, whereas for the hybrid MLR_ANN model it is 0.19383, the smallest, indicating it as the preferred prediction model. Further, the correlation coefficient is found to be the highest for the hybrid MLR_ANN model, at 0.81924.
These numerical estimates thus support that the hybrid multiple linear regression with artificial neural network (MLR_ANN) model performs better and can be preferred as a reliable bright sunshine forecasting tool in the Himalaya.

5.3 CONCLUSION

Hybrid MLR_ANN, hybrid MLR_ARIMA, artificial neural network, autoregressive integrated moving average and multiple linear regression models are used to study the behaviour of the weekly weather parameters. The weekly weather parameter data for the months of June, July, August and September over a period of 15 years for the Pantnagar region were used to identify the precise weather forecasting models. Since parameter selection in the models is always a challenging task, we introduced into the hybrid model building only those parameters found significant using stepwise regression analysis during the training period. The comparison among the five models shows that the predictions follow the trend based on previous years while the actual values fluctuate. The proposed hybrid MLR_ANN model was observed to be the most precise model in comparison to the MLR, ARIMA, ANN and hybrid MLR_ARIMA models. All the prediction models are consistent, but the finest model is the one having the least mean absolute error, root mean square error and prediction error and the highest correlation coefficient, as observed for the hybrid MLR_ANN model. It was observed that the ANN model is also a precise weather forecasting model compared to the MLR and ARIMA models, which coincides with the previous findings. Finally, the study reveals that the hybrid MLR_ANN model can be used as an appropriate forecasting tool to estimate the weather parameters, in comparison to the multiple linear regression, ARIMA, ANN and hybrid MLR_ARIMA models.

CHAPTER 6
SUMMARY AND FUTURE SCOPE

6.1.
Summary

The role of statistical techniques in providing reliable predictions of weather parameters is considered most important in the field of meteorology all over the world. These predictions influence agricultural as well as industrial strategies. In India, the months of June, July, August and September are identified as the summer monsoon months. The onset of the summer monsoon in early June marks the beginning of the principal rainy season for the Himalaya. The monsoon season in and around the India Meteorological Department (IMD) Pantnagar observatory, situated in the foothills of the Himalayas, ranges between 15 and 20 weeks. Accordingly, our study is based on a time series weather data set collected at the IMD observatory at Pantnagar, India, over a period of 50 years. Pantnagar is located at 29° N, 79.45° E, approximately 293.89 metres above mean sea level, in the tarai region of Uttarakhand. Assuming that the monsoon season in and around Pantnagar ranges between 15 and 20 weeks, we consider a 17-week data set for our study during 1961-2010. The weekly data comprise seven weather parameters, viz. Rainfall, Maximum and Minimum Temperature, Relative Humidity at 7 AM and 2 PM, Bright Sunshine and Pan Evaporation, collected during the monsoon months June to September. Providing reliable prediction and forecasting of weather parameters in the Himalayas in particular, and in India in general, is thus an important challenge for planners and scientists. The present study is planned with the following objectives:

1. To study the distribution pattern of weather parameters.
2. To predict weather parameters using different forecasting models.
3. To compare the prediction ability of these models.
4. To identify the precise weather forecasting model.
5. To study the reliability of the developed models by comparing the forecast values with the observed values.
In this research work we took up a study to identify a precise and reliable weather forecasting model through comparison of several existing and proposed models. The thesis is divided into seven chapters, and the salient results obtained and the main significance of the study are summarized in the following paragraphs.

Probability Distribution: The descriptive statistics are computed for each weather parameter for the different study periods. The best-fit probability distribution was identified out of a large number of commonly used probability distributions by using different goodness-of-fit tests. It was observed that the best-fit probability distributions obtained for the weather parameter data sets are different. For seasonal weather parameters, the Normal, Weibull (3P), Log-Pearson 3, Gamma (3P) and Log-Gamma distributions represent the best-fit distributions for Rainfall, Maximum Temperature, Relative Humidity at 7 AM, Pan Evaporation and Bright Sunshine respectively, while the Weibull (2P) distribution was fitted for both Minimum Temperature and Relative Humidity at 2 PM. The best-fit probability distributions for the weekly weather parameters over the different study periods also differ. In general, the Generalized Extreme Value and Weibull (2P, 3P) distributions are most commonly the best-fit probability distributions for most of the weeks among the different weather parameters.

Forecasting Models: The weekly weather parameter data for the months of June, July, August and September over a period of 50 years for the Pantnagar region were used, of which 35 years of data were employed to develop and train the models and 15 years were used to test and validate the developed models. Variation during the four months was present in all the parameters; the correlation was maximum between relative humidity at 7 am and relative humidity at 2 pm (0.83736) and minimum between rainfall and minimum temperature (-0.00666).
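The distribution-selection procedure summarized above can be sketched as follows, assuming the goodness-of-fit comparison is based on the Kolmogorov-Smirnov statistic (one common choice; a thesis-grade analysis may also use chi-square or Anderson-Darling tests). The candidate names are SciPy identifiers standing in for the Normal, Gamma, Weibull and Generalized Extreme Value families mentioned in the text.

```python
# Sketch: fit each candidate distribution by maximum likelihood and pick the
# one with the smallest Kolmogorov-Smirnov statistic.
import numpy as np
from scipy import stats

def best_fit_distribution(data, candidates=("norm", "gamma", "weibull_min", "genextreme")):
    data = np.asarray(data, dtype=float)
    ks_stats = {}
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)                       # maximum-likelihood parameter estimates
        ks_stats[name], _ = stats.kstest(data, name, args=params)
    # Smallest K-S statistic = closest agreement between empirical and fitted CDFs.
    return min(ks_stats, key=ks_stats.get), ks_stats
```

Note that applying the K-S test with parameters estimated from the same sample biases the test toward acceptance; a careful analysis would adjust the critical values (e.g. Lilliefors-style corrections) accordingly.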
Since parameter selection in the models is always a difficult task, we introduced into the hybrid model building only those parameters found significant using stepwise regression analysis during the training period. Hybrid MLR_ANN, hybrid MLR_ARIMA, Artificial Neural Network, ARIMA and Multiple Linear Regression models are used to study the behaviour of the weekly weather parameters. The models were developed and compared individually for each weather parameter, graphically and numerically, through Mean Error, Mean Absolute Error, Root Mean Square Error, Prediction Error and Correlation Coefficient. Finally, the study reveals that the hybrid MLR_ANN model is an appropriate forecasting tool to estimate the weather parameters, in comparison to the multiple linear regression, ARIMA, ANN and hybrid MLR_ARIMA models.

6.2. Future Scope

Weather forecasting is the art of using astronomical and meteorological events to monitor changes in the weather. Modern technology, particularly computers and weather satellites, together with the availability of data provided by coordinated meteorological observing networks, has resulted in enormous improvements in the accuracy of weather forecasting. Still, with the ever-growing demand for more accurate and reliable weather forecasts, the field opens up to additional investigation. There are some observations in this regard, and issues related to our research investigation, which can be addressed in future. As with the traditional hybrid linear and non-linear methodologies, we can generally say that the performance of the hybrid MLR_ANN model will not be worse than that of either of its components used in isolation, so it can be applied as an appropriate methodology for combining linear and non-linear models for time series forecasting. Further, more hybrid models can be developed using different ANN ensembles and by using more weather parameters.
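A hybrid MLR_ANN model of the kind compared here is commonly built by letting the linear regression capture the linear component of the series and training a neural network on the regression residuals, then summing the two predictions. The following scikit-learn sketch shows one such construction; the layer size, iteration count and helper names are illustrative assumptions, not the thesis's actual configuration.

```python
# Hedged sketch of a hybrid MLR_ANN forecaster: MLR models the linear part,
# a small neural network models the residual (non-linear) part.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

def fit_hybrid_mlr_ann(X_train, y_train):
    mlr = LinearRegression().fit(X_train, y_train)
    residuals = y_train - mlr.predict(X_train)        # non-linear structure left over
    ann = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                       random_state=0).fit(X_train, residuals)
    return mlr, ann

def predict_hybrid(mlr, ann, X):
    # Final forecast = linear component + modelled residual.
    return mlr.predict(X) + ann.predict(X)
```

The same decomposition idea underlies the hybrid MLR_ARIMA variant, with an ARIMA model in place of the network for the residual series.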
Moreover, the inclusion of other seasonal factors may also improve the forecasting accuracy. It is expected that instead of developing hybrid models based on two techniques, the integrated hybrid model may be developed by combining more than two techniques together. This integrated hybrid model may produce more precise model for future and may be an upcoming topic for further studies. BIBLIOGRAPHY 1. Abraham, T.P. (1965): Isolation of effect of weather on productivity including other risk such as damage by pests and diseases. Journal of Indian Society of Agriculture Statistics, Vol. 17, No. 2, pp. 208. 2. Adielsson, S. (2005): Statistical and neural networks analysis of pesticides losses to surface water in small agricultural catchments in Sweden. M.Sc. Thesis, Sweden University, Sweden. 3. Aggarwal, M.C., Katiyar, V.S. and Ramu Babu. (1988): Probability analysis of annual maximum dailt rainfall of U.P. Himalayas. Indian Journal of Soil conservation, Vol. 16, No. 1, pp. 35-42. 4. Agrawal, R., Jain, R.C., Jha, M.P. and Singh, D. (1980): Forecasting of rice yield using climatic variables. Indian Journal of Agricultural Science, Vol. 50, No. 9, pp. 680-684. 5. Aladag, C.H., Egrioglu, E. and Kadilar, C. (2009): Forecasting nonlinear time series with a hybrid methodology. Applied Mathematics letters, Vol. 22, pp. 1467-1470. (www.elsevier.com/locate/aml) 6. Alnaa, S.E. and Ahiakpor, F. (2011): ARIMA approach to predicting inflation in Ghana. Journal of Economics and International Finance, Vol. 3, No. 5, pp. 328-336. 7. Altun, H., Bilgil, A., & Fidan, B.C. (2007): Treatment of multidimensional data to enhance network estimators in regression problems. Expert System with Applications, Vol.32, No. 2, pp. 599-605. 8. Anderson, M.D., Sharfi, K. and Gholston, S.E. (2006): Direct demand forecasting model for small urban communities using multiple linear regression. Journal of the transportation Research Board, Vol. 1981, pp. 114-117. 9. Anmala, J., Zhang, B. 
and Govindaraju, R.S. (2000): Comparison of ANNs and empirical approaches for predicting watershed runoff. Journal of water Resource Planning Management, ASCE, Vol. 126, No. 3, pp. 156-166. 10. Atiya, A. El-Shoura, S., Shaheen, S. and El-sherif, M. (1996): River flow forecasting using neural networks. World congr. Neural Networks, pp. 461-464. 11. Atiya, A., and C. Ji. (1997): How Initial Condition Affect Generalization Performance in Large Networks. IEEE Trans. Neural Networks, Vol. 8, pp. 448-451. 12. Badmus, M.A. and Ariyo, O.S. (2011): Forecasting cultivated areas and production of maize in Nigeria using ARIMA model. Asian Journal of Agricultural Sciences, Vol. 3, No. 3, pp. 171-176. 13. Baker, B.D. (1998): A comparison of linear regression and neural network methods for forecasting educational spending. Annual meeting of the American Education finance association mobile, Alabama. 14. Bali, Y.N. (1970): On objective estimation of crop yield at pre-harvest stage. Agricultural Situations in India, Vol. 25, No. 3, pp. 267-271. 15. Bansal, A., Kauffman, R.J., Weitz, R.R. (1993): Comparing the modeling performance of regression and neural networks as data quality varies: A Business value approach, Journal of management information system, Vol. 10, pp. 11-32. 16. Benson, Manuel, A. (1968): Uniform flood frequency estimating methods for federal agencies. Water resources research, Vol. 4, No. 5, pp. 891-895. 17. Bhargava, P.N. (1974): The influence of rainfall on crop production. Research Journal of Jawahar Lal Nehru Krishi Vishwavidyalaya, Vol. 32, No. 1, 2. 18. Bhakar, S.R., Bansal, A.K., Chhajed, N, and Purohit, R.C. (2006): Frequency analysis of consecutive day’s maximum rainfall at Banswara, Rajasthan, India. ARPN Journal of Engineering and Applied Science, Vol. 1, No. 3, pp. 64-67. 19. Bhatt, V.K., Tewari, A.K. and Sharma, A.K. (1996): Probability models for prediction of annual maximum daily rainfall of Datai. Indian Journal of Soil conservation, Vol. 24, No. 
1, pp. 25-27. 20. Biswas, B.C. and Khambete, N.K. (1989): Distribution of short period rainfall over dry farming tract of Maharashtra. Journal of Maharashtra Agricultural University. 21. Box, G.E.P., and Jenkins, G.M. (1976): Time Series Analysis: Forecasting and Control, Holden-day Publication, San Francisco. 22. Campbell, S.D. and Diebold, F.X. (2005): Weather Forecasting for Weather Derivatives. Journal of the American Statistical Association, Vol. 100, pp. 6-16. 23. Chang, I., Rapiraju, S., Whiteside, M. and Hwang, G. (1991): A Neural Network to Time Series Forecasting. Proceedings of the Decision Sciences Institute, Vol.3, pp. 1716-1718. 24. Chapman, T. (1994): Stochastic models for daily rainfall. The Institution of Engineers, Australia, National Conference Publication, Vol. 94, No. 15, pp. 7-12. 25. Chatfield, C. (1994): The Analysis of Time Series-An Introduction. Chapman and Hall. 26. Chattopadhay, S. (2007): Multiplayer feed forward artificial neural network model to predict the average summer monsoon rainfall in India. Acta Geophysica, Vol. 55, No. 3, pp. 369-382. 27. Chow, V.T. (1964): Handbook of Applied Hydrology, McGraw Hill Book Co., New York. 28. Comrie. (1997): Comparing Neural Networks and Regression Models for Ozone forecasting. Journal of Air & Waste Management Association, Vol. 47, pp. 653-663. 29. Cottrell, M., Girard, B., Girard, Y., Mangeas, M. and Muller, C. (1995): Neural modeling for time series: A statistical stepwise method for weight elimination. IEEE Transactions on Neural Network, Vol. 6, No. 6, pp. 1355-1363. 30. Cunnane, C. (1978): Unbiased potting positions: A review. Journal of Hydrology, Vol. 39, pp. 205-222. 31. Diane, M.L. and David, P.A. (2007): For predicting facial cal from concentrations. Hydrological science journal, Vol. 52, pp. 713-731. 32. Deidda, R. and Puliga, M. (2006): Sensitivity of goodness-of-fit statistics of rainfall data rounding off. Physics and Chemistry of the Earth, Vol. 31, pp. 1240-1251. 33. 
Duan, j., Sikka, A.K., and Grant, G.E. (1995): A comparison of stochastic models for generating daily precipitation at the H.J. Andrews Experiment Forest. Northwest Science, Vol. 69, No. 4, pp. 318-329. 34. Dulibar, Katherine, A. (1991): Contrasting Neural Nets with Regression in Predicting Performance in the Transportation Industry. IEEE Transactions Neural Network. 35. Dutta, S. and Shekhar, S. (1988): Bond Rating: A Non-conservative Application of Neural Network. International Conference on Neural Networks, pp. 443-450. 36. El-Shafie, A.H., El-Shafie, A., El-Mazoghi, H.G., Shehata, A. and Taha, Mohd.R. (2011): Artificial neural network technique for rainfall forecasting applied to Alexandria, Egypt. International Journal of the Physical Science, Vol.6, No. 6, pp. 1306-1316. 37. Fallah-Ghalhary, G.A., Mousavi-Baygi, M. and Majid, H.N. (2009): Annual rainfall forecasting by using mamdani fuzzy inference system. Research Journal of Enviromental Sciences, Vol. 3, pp. 400-413. 38. Faraway, J. and Chatfield, C. (1998): Time series forecasting usiong neural network: A case study. Applied Statistics, Vol. 47, pp. 231-250. 39. Fisher R.A. (1924): The influence of the rainfall on the yield of wheat at Rothamsted. Philosophical transaction of the Royal Society of London, Series B, Vol. 213, pp. 89-142. 40. Fletcher, D., and Goss, E. (1993): Forecasting with neural networks: an application using bankruptcy data. Information and Management, Vol. 24, No. 3, pp. 159-167. 41. Fowler, H.J. and Kilsby, C.G. (2003): Implication of changes in seasonal and annual extreme rainfall. Geophysical Research letters, Vol. 30, No. 13, pp. 53 (1-4). 42. French, M.N., Krajewski, W.F. and Cuykendal, R.R. (1992): Rainfall forecasting in space and time using neural networks. Journal of Hydrology, Vol. 137, pp. 1-31. 43. Gail, B., Viswanthan, C., Nelakantan, T.R., Srinivasa, L., Girones, R., Lees, D., Allard, A., and Vantarakis, A. 
(2005): Artificial neural networks prediction of viruses in shellfish. Applied and Environment Microbiology, Vol. 31, pp. 5244-5253. 44. Gardner, M.W. and Dorling, S.R. (1998): Artificial Neural Network (Multilayer Perceptron)-a review of applications in atmospheric science. Atmospheric Environment, Vol. 32, pp. 2627-2636. 45. Ghani Md., I.M. and Ahmad, S. (2010): Stepwise Multiple Regression Method to forecast fish landing. Procedia-Social and Behavioral Sciences, Vol. 8, pp. 549-554. 46. Ghodsi, R. And Zakerinia, M.S. (2012): Forecasting short term electricity price using ANN and fuzzy regression. International Journal of Academic Research in Business and Social Science, Vol. 2, No. 1, pp. 286-293. 47. Ghosh. S., Sengupta, P.P. and Maity, B. (2012): Evidence on the future prospects of Indian thermal power sector in the perspective of depleting coal reserve. Global Journal of Business Research, Vol. 6, No. 1, pp. 77-90. 48. Goh, B.H. (1996): Residential construction demand forecasting using economic indicators: A comparative study of artificial neural networks and multiple regressions. Construction management and economics, Vol. 14, No. 1, pp. 25-34. 49. Goh, B.H. (1998): Forecasting residential construction demand in Singapore: A comparative study of the accuracy of time series, regression and artificial neural network techniques. Engineering, construction and architectural management, Vol.5, No 3, pp.261-275. 50. Goulden, C.H. (1962): Methods of statistical analysis. Wiley, New York, pp. 467. Paper. Journal of Geophysics Res., Vol. 68, pp. 813-814. 51. Gringorten, I.I. (1963): A plotting rule for extreme probability paper. Journal of Geophysics Res., Vol. 68, pp. 813-814. 52. Guhathakurta, P. (2006). Long range monsoon rainfall prediction of 2005 for the districts and sub-division kerala with artificial neural network. Current science, Vol. 90, pp. 773779. 53. Hanson, L.S. and Vogel, R. (2008). The probability distribution of Daily Rainfall in the united status. 
Conference Proceeding Paper-World Environment and Water Resources Congress, pp. 1-10. 54. Hassani, H., Zokaei, M, and Amidi, A. (2003): A new approach to polynomial regression and its application to polynomial growth of human height. Proceeding of the Hawaii International Conference. 55. Hastenrath, S. (1988): Prediction of Indian summer monsoon rainfall: further exploration. Journal of Climatology, Vol. 1, pp. 298-304. 56. Hayati, M. and Mohebi, Z. (2007): Temperature forecasting based on neural network approach. World Applied Science Journal, Vol. 2, No. 6, pp. 613-620. 57. Haykin, S. (1999): Neural Network- A Comprehensive Foundation. Addison Wesley Longman. 58. Hennersy, K.J., Gregory, J.M. and Mitchel, J.F.B. (1997): Changes in daily precipitation under enhanced greenhouse conditions. Climate Dynamics, Vol. 13, pp. 667-680. 59. Hippert, H.S., Pedreira, C.E. and Souza, R.C. (2000): Combining neural networks and ARIMA models for hourly temperature forecast. Neural Networks, Vol. 4, pp. 414-419. 60. Hill, T., O’Connor, M., & Remus, W. (1996): Neural network models for time series forecasts. Management Science, Vol. 42, No. 7, pp. 1802-1092. 61. Hsieh, W.W. and Tang, T. (1998): Applying Neural Network models to prediction and Data Analysis in Metrology and Oceanography. Bulletin of the American Metrological Society, Vol. 79, pp. 1855-1869. 62. Hu, M.J.C. (1964): Application of Adaline system to weather forecasting. Technical Report. Stanford Electron. 63. Huda, A.K.S., Ghildyal, B.P., Tomar, V.S. and Jain, R.C. (1975): Contribution of climatic variables in predicting rice yield. Agricultural Meteorology, Vol. 15, pp.71-86. 64. Huda, A.K.S., Ghildyal, B.P. and Tomar, V.S. (1976): Contribution of climatic variables in predicting maize yield under monsoon conditions. Agricultural Meteorology, Vol. 17, No. 1, pp. 33-47. 65. Hung, N.Q., Babel, M.S., Weesakul, S. and Tripathi, N.K. (2009): An artificial neural network model for rainfall forecasting in Bangkok, Thailand. 
Hydrology and Earth System Sciences, Vol. 13, pp. 1413-1425.
66. Iqbal, N., Bakhsh, K., Maqbool, A. and Ahmad, A.S. (2005): Use of the ARIMA model for forecasting wheat area and production in Pakistan. Journal of Agriculture and Social Sciences, Vol. 1, No. 2, pp. 120-122.
67. Kaashoek, J.F. and Van Dijk, H.K. (2001): Long term values of euro/dollar and European exchange rates: A neural network analysis. Medium Econometrische Toepassingen, Vol. 10, No. 4, pp. 26-29.
68. Kal, N., Jim, C. and Moula, C. (2010): The inventory policy using ESWSO measure for the ARIMA lead-time demand and discrete stochastic lead-time. Journal of Academy of Business and Economics, Vol. 10, No. 2.
69. Kalogirou, S.A., Neocleous, C.N., Michaelides, S.C. and Schizas, C.N. (1997): A time series construction of precipitation records using artificial neural networks. EUFIT '97, September 8-11, pp. 2409-2413.
70. Kannan, M., Prabhakaran, S. and Ramachandran, P. (2010): Rainfall forecasting using data mining technique. International Journal of Engineering and Technology, Vol. 2, No. 6, pp. 397-401.
71. Kar, G. (2002): Rainfall probability analysis for sustainable crop production strategies in coastal Orissa. Journal of Agricultural Meteorology, Vol. 4, No. 2, pp. 181-185.
72. Khashei, M. and Bijari, M. (2011): A new hybrid methodology for nonlinear time series forecasting. Modelling and Simulation in Engineering, Vol. 2011, 5 pages.
73. Khatri, T.J., Patel, R.M. and Mistry, R.M. (1983): Crop weather analysis for pre-harvest forecasting of groundnut yield in Surat and Bulsar districts of Gujarat State. G.A.U. Research Journal, Vol. 9, No. 1, pp. 29-32.
74. Kishtawal, C.M., Basu, S., Patadia, F. and Thapliyal, P.K. (2003): Forecasting summer rainfall over India using Genetic Algorithm. Geophysical Research Letters, Vol. 30, doi: 10.1029/2003GL018504.
75. Krishnan, A. and Kushwaha, R.S. (1972): Mathematical distribution of rainfall in arid and semi-arid zones of Rajasthan.
Indian Journal of Meteorology and Geophysics, Vol. 23, No. 2, pp. 153-160.
76. Kuligowski, R.J. and Barros, A.P. (1998): Localized precipitation forecasts from a numerical weather prediction model using artificial neural networks. Weather and Forecasting, Vol. 13, pp. 1194-1205.
77. Kulkarni, N.S. and Pant, M.B. (1969): Cumulative frequency distribution of rainfall of different intensities. Indian Journal of Meteorology and Geophysics, Vol. 20, No. 2, pp. 109-114.
78. Kulshrestha, M.S., Shekh, A.M., Rao, B.B. and Upadhyay, U.G. (1995): Extreme value analysis of rainfall of Krishna Godavari Basin, Andhra Pradesh. Water and Energy, 2001, pp. 96-101.
79. Kulshrestha, M.S. and Shekh, A.M. (1999): Extreme value analysis of rainfall of Gujarat. Vayu Mandal, Bulletin of the Indian Meteorological Society, January-December, pp. 45-48.
80. Kulshrestha, M., George, R.K. and Shekh, A.M. (2009): Application of artificial neural networks to predict the probability of extreme rainfall and comparison with the probability by Fisher-Tippett Type II distributions. International Journal of Applied Mathematics and Computation, Vol. 1, No. 3, pp. 118-131.
81. Kulandaivelu, R. (1984): Probability analysis of rainfall and evolving cropping system for Coimbatore. Mausam, Vol. 5, No. 3, pp. 257-258.
82. Kumar, U.A. (2005): Comparison of neural networks and regression analysis: a new insight. Expert Systems with Applications, Vol. 29, No. 2, pp. 424-430.
83. Kumar, D.N., Reddy, M.J. and Maity, R. (2007): Regional rainfall forecasting using large scale climate tele-connections and artificial intelligence techniques. Journal of Intelligent Systems, Vol. 16, No. 4, pp. 307-322.
84. Kwaku, X.S. and Duke, O. (2007): Characterization and frequency analysis of one day annual maximum and two to five consecutive days maximum rainfall of Accra, Ghana. ARPN Journal of Engineering and Applied Sciences, Vol. 2, No. 5, pp. 27-31.
85. Lee, C.
(2005): Application of rainfall frequency analysis on studying rainfall distribution characteristics of the Chia-Nan plain area in southern Taiwan. Journal of Crop, Environment and Bioinformatics, Vol. 2, pp. 31-38.
86. Lee, S., Cho, S. and Wang, P.M. (1998): Rainfall prediction using artificial neural networks. Journal of Geographic Information and Decision Analysis, Vol. 2, No. 2, pp. 233-242.
87. Lek, S., Delacoste, M., Baran, P., Dimopoulos, I., Lauga, J. and Aulagnier, A. (1996): Application of neural networks to modelling nonlinear relationships in ecology. Ecological Modelling, Vol. 90, pp. 39-52.
88. Lin, L.-J. (1993): Scaling up reinforcement learning for robot control. Proceedings of the Tenth International Conference on Machine Learning.
89. Lorenz, E.N. (1969): Three approaches to atmospheric predictability. Bulletin of the American Meteorological Society, Vol. 50, pp. 345-349.
90. Man-Chang, C., Chi-Cheong, W. and Chi-Chung, L. (1998): Financial time series forecasting by neural network using conjugate gradient learning algorithm and multiple linear regression weight initialization.
91. Manel, S., Dias, S.M. and Ormerod, S.J. (1999): Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a study with a Himalayan river bird. Ecological Modelling, Vol. 120, pp. 337-347.
92. Manning, H.L. (1956): Calculation of confidence limits of monthly rainfall. Journal of Agricultural Statistics, Vol. 47, No. 2, pp. 154-156.
93. Maqsood, I., Khan, M.R. and Abraham, A. (2002a): Intelligent weather monitoring systems using connectionist models. Neural, Parallel and Scientific Computations, Vol. 10, pp. 157-178.
94. Maqsood, I., Khan, M.R. and Abraham, A. (2002b): Neuro-computing based Canadian weather analysis. In: Proceedings of the 2nd International Workshop on Intelligent Systems Design and Applications (ISDA'02), Atlanta, Georgia, August 2002, Dynamic Publishers, Atlanta, Georgia, pp. 39-44.
95.
Maqsood, I., Khan, M.R. and Abraham, A. (2004): An ensemble of neural networks for weather forecasting. Neural Computing & Applications, Vol. 13, pp. 112-122.
96. Marquez, L., Hill, T., Worthley, R. and Remus, W. (1991): Neural network models as alternatives to regression. IEEE Proceedings of the Twenty-Fourth Annual Hawaii International Conference on System Sciences, Vol. 4, pp. 129-135.
97. Mathur, S., Shukla, A.K. and Pant, R.P. (2001): A comparative study of neural network and regression models for estimating stock prices. International Conference on Optimization Techniques and its Applications in Engineering and Technology.
98. Mathur, S. (1998): Stock Market Forecasting Using Neural Networks. An MBA Project Report submitted to the School of Management Studies, IGNOU, New Delhi.
99. Men, B., Xiejing, Z. and Liang, L. (2004): Chaotic analysis on monthly precipitation on hills region in middle Sichuan of China. Nature and Science, Vol. 2, pp. 45-51.
100. McMenamin, J.S. (1997): A primer on neural networks for forecasting. Journal of Business Forecasting, pp. 17-22.
101. Miao, Y., Mulla, D. and Robert, P. (2006): Identifying important factors influencing corn yield and grain quality variability using artificial neural networks. Springer, Vol. 7, pp. 117-135.
102. Michaelides, S.C., Neocleous, C.C. and Schizas, C.N. (1995): Artificial neural networks and multiple linear regression in estimating missing rainfall data. Proceedings of the DSP'95 International Conference on Digital Signal Processing, Limassol, Cyprus, pp. 668-673.
103. Mooley, D.A. and Appa Rao, G. (1970): Statistical distribution of pentad rainfall over India during monsoon season. Indian Journal of Meteorology and Geophysics, Vol. 21, No. 2, pp. 219-230.
104. Moro, Q.I., Alonso, L. and Vivaracho, C.E. (1994): Application of neural networks to weather forecasting with local data. Applied Informatics, pp. 68-70.
105. Montgomery, D.C. and Johnson, L.A. (1996): Forecasting and Time Series Analysis. McGraw-Hill.
106. Mukharjee, A.K., Shyamala, B. and Majumdar, R. (1979): Study of normal rainfall of Satara district. Mausam, Vol. 30, No. 4, pp. 493-500.
107. Mukherjee, A.K., Shyamala, B. and Lakshmi, S. (1980): Study of normal distribution over central Madhya Maharashtra. Mausam, Vol. 31, No. 2, pp. 247-260.
108. Mukherjee, K., Kaur, S. and Mehra, A.K. (1991): Applicability of extreme value distribution for analysis of rainfall over India. Mausam, Vol. 42, pp. 29-32.
109. Nagendra, S.M.S. and Khare, M. (2006): Artificial neural network approach for modelling nitrogen dioxide dispersion from vehicular exhaust emissions. Ecological Modelling, Vol. 190, pp. 99-115.
110. Nese, J.M. (1994): Systematic biases in manual observations of daily maximum and minimum temperature. Journal of Climate, Vol. 7, No. 5, pp. 834-848.
111. Nkrintra, S. (2005): Seasonal forecasting of Thailand summer monsoon rainfall. International Journal of Climatology, Vol. 25, No. 5, pp. 649-664.
112. Ogunlela, A.O. (2001): Stochastic analysis of rainfall events in Ilorin, Nigeria. Journal of Agricultural Research and Development, Vol. 1, pp. 39-50.
113. Olofintoye, O.O., Sule, B.F. and Salami, A.W. (2009): Best-fit probability distribution model for peak daily rainfall of selected cities in Nigeria. New York Science Journal, Vol. 2, No. 3, pp. 1-12.
114. Ozesmi, S.L. and Ozesmi, U. (1999): An artificial neural network approach to spatial habitat modelling with interspecific interaction. Ecological Modelling, Vol. 116, pp. 15-31.
115. Pal, Y. (1995): Sugarcane weather relationship for north-west region U.P. Ph.D. Thesis, G.B.P.U.A.&T., Pantnagar, pp. 161.
116. Pandey, A.K., Sinha, A.K. and Srivastava, V.K. (2008): A comparative study of neural-network and fuzzy time series forecasting techniques. Case study: wheat production forecasting. International Journal of Computer Science and Network Security, Vol. 8, No. 9, pp. 382-387.
117. Pao, H.T.
(2006): Comparing linear and nonlinear forecasts for Taiwan's electricity consumption. Energy, Vol. 31, pp. 1993-1805.
118. Pao, H.T. (2008): A comparison of neural network and multiple regression analysis in modeling capital structure. Expert Systems with Applications, Vol. 35, pp. 720-727.
119. Paras, Mathur, S., Kumar, A. and Chandra, M. (2007): A feature based neural network model for weather forecasting. International Journal of Computational Intelligence, Vol. 4, No. 3, pp. 209-216.
120. Park, D.C. and Osama, M. (1991): Artificial neural network based peak load forecasting. IEEE Proceedings of the Southeastcon '91, Vol. 1, pp. 225-228.
121. Parthasarathy, B. and Dhar, O.N. (1976): A study of trends and periodicities in the seasonal and annual rainfall in India. Indian Journal of Meteorology, Hydrology and Geophysics, Vol. 27, No. 1, pp. 23-28.
122. Pastor, O. (2005): Unbiased sensitivity analysis and pruning techniques in ANN for surface ozone modelling. Ecological Modelling, Vol. 182, pp. 149-158.
123. Phien, H.N. and Ajirajah, T.J. (1984): Applications of the Log-Pearson Type-3 distribution in hydrology. Journal of Hydrology, Vol. 73, pp. 359-372.
124. Puri, P. and Kohli, M. (2007): Forecasting student admission in colleges with neural networks. International Journal of Computer Science and Network Security, Vol. 7, No. 11, pp. 298-303.
125. Radhika, Y. and Shashi, M. (2009): Atmospheric temperature prediction using support vector machines. International Journal of Computer Theory and Engineering, Vol. 1, No. 1, pp. 55-58.
126. Raghupathi, W., Schkade, L.L. and Raju, B.S. (1991): A neural network approach to bankruptcy prediction. Proceedings of the IEEE 24th Annual Hawaii International Conference on System Sciences; reprinted in Neural Networks in Finance and Investing, pp. 141-158.
127. Rai Sircar, N.C. and Jayaraman, N.C.
(1966): Study of upper winds, temperature and humidity over Madras in relation to precipitation occurrences there during the monsoon season. Indian Journal of Meteorology and Geophysics, Vol. 17, No. 4, pp. 649-651.
128. Rai, T. and Chandrahas (1996): Forecast of rice yield using linear discriminant score of weather parameters and input variables. Journal of the Indian Society of Agricultural Statistics, Vol. 52, No. 1, pp. 96.
129. Raman Rao, B.V., Kavi, P.S. and Sridharan, P.C. (1975): Study of rainy days and wet spells at Bijapur. Annals of Arid Zone, Vol. 14, No. 4, pp. 371-372.
130. Ramchandran, G. (1967): Rainfall distribution in India in relation to longitude, latitude and elevation. Indian Journal of Meteorology and Geophysics, Vol. 18, No. 2, pp. 227-232.
131. Ranasinghe, M., Hua, G.B. and Barathithason, T. (1999): A comparative study of neural networks and multiple regression analysis in estimating willingness to pay for urban water supply.
132. Rao, A. and Singh, J.B. (1990): A study on the distribution of weather parameters and their influence on forecasting the yield of wheat. Journal of the Indian Society of Agricultural Statistics, Vol. 13, No. 3, pp. 338.
133. Rao, K.N. and Jagannathan, P. (1963): Climatic changes in India. Proceedings of the Symposium on Changes of Climate, UNESCO, pp. 53-66.
134. Refenes, A., Zapranis, A. and Francis, G. (1994): Stock performance modeling using neural networks: A comparative study using regression models. Neural Networks, Vol. 7, pp. 375-388.
135. Roadknight, C.M., Balls, G.R., Mills, G.E. and Palmer-Brown, D. (1997): Modeling complex environmental data. IEEE Transactions on Neural Networks, Vol. 8, No. 4, pp. 852-861.
136. Sahai, A.K., Soman, M.K. and Satyan, V. (2000): All India summer monsoon rainfall prediction using an artificial neural network. Climate Dynamics, Vol. 16, pp. 291-302.
137. Saima, H., Jaafar, J., Belhaouari, S. and Jillani, T.A. (2011): ARIMA based interval type-2 fuzzy model for forecasting.
International Journal of Computer Applications, Vol. 28, No. 3, pp. 17-21.
138. Salami, A.W. (2004): Prediction of the annual flow regime along Asa River using probability distribution model. AMSE Periodicals, Lyon, France, Modelling C-2004, Vol. 65, No. 2, pp. 41-56 (http://www.amse-modeling.org/content_amse 2004. com).
139. Salchenberger, L.M., Cinar, E.M. and Lash, N.A. (1992): Neural networks: A new tool for predicting thrift failures. Decision Sciences, Vol. 23, No. 4, pp. 899-916.
140. Salt, D.W., Yildiz, N. and Livingstone, D.J. (1999): The use of artificial neural networks in QSAR. Pesticide Science, Vol. 36, pp. 161-170.
141. Sen, N. (2003): New forecast models for Indian south-west monsoon season rainfall. Current Science, Vol. 84, No. 10, pp. 1290-1291.
142. Sen, Z. and Eljadid, A.G. (1999): Rainfall distribution functions for Libya and rainfall prediction. Hydrological Sciences Journal, Vol. 44, No. 5, pp. 665-680.
143. Sharma, M.A. and Singh, J.B. (2010): Use of probability distribution in rainfall analysis. New York Science Journal, Vol. 3, No. 9, pp. 40-49.
144. Sharma, M.A. and Singh, J.B. (2011): Comparative study of rainfall forecasting models. New York Science Journal, Vol. 4, No. 7, pp. 115-120.
145. Sharma, M.A., Sarkar, I. and Singh, J.B. (2011): Statistical forecasting of rainfall in the Himalaya: A case study. The 5th International Conference MSAST of IMBIC, December, pp. 127-132.
146. Shashi Kumar, Awasthi, R.P. and Kumar, S. (1998): Yield forecasting in apple based on meteorological parameters. Indian Journal of Horticulture, Vol. 55, No. 3, pp. 190-195.
147. Singh, B.H. and Bapat, S.R. (1988): Pre-harvest forecast models for prediction of sugarcane yield. Indian Journal of Agricultural Science, Vol. 8, No. 6, pp. 465-469.
148. Singh, D., Singh, H.P., Singh, Padam and Jha, M.P. (1979): A study of pre-harvest forecasting of yield of jute. Indian Journal of Agricultural Science, Vol. 13, No. 3, pp. 167-169.
149. Singh, K.K.
(1988): Estimation of sugarcane and cotton yields at some selected stations in India based on weather parameters. Ph.D. Thesis, B.H.U., Varanasi, pp. 155.
150. Sivakumar, B., Liong, S.Y., Liaw, C.Y. and Phoon, K.K. (1999): Singapore rainfall behavior: Chaotic? Journal of Hydrologic Engineering, ASCE, Vol. 4, pp. 38-48.
151. Sivakumar, B. (2001): Rainfall dynamics at different temporal scales: A chaotic perspective. Hydrology and Earth System Sciences, Vol. 5, pp. 645-651.
152. Sohn, T., Lee, J.H., Lee, S.H. and Ryu, C.S. (2005): Statistical prediction of heavy rain in South Korea. Advances in Atmospheric Sciences, Vol. 22, No. 5, pp. 703-710.
153. Somvanshi, V.K., Pandey, O.P., Agarwal, P.K., Kalanker, N.V., Prakash, M.R. and Chand, R. (2006): Modelling and prediction of rainfall using artificial neural network and ARIMA techniques. Journal of Indian Geophysical Union, Vol. 10, No. 2, pp. 141-151.
154. Starrett, S.K., Najjar, Y., Adams, S.G. and Hill, J. (1998): Modeling pesticide leaching from golf courses using artificial neural networks. Communications in Soil Science and Plant Analysis, Vol. 29, pp. 3093-3106.
155. Snedecor, G.W. and Cochran, W.G. (1967): Statistical Methods. The Iowa State University Press, Ames, Iowa, 6th ed.
156. Sparks, D. (1997): A model for predicting pecan production under arid conditions at high elevations. Journal of the American Society for Horticultural Science, Vol. 122, No. 5, pp. 648-652.
157. Specht, D.F. (1991): A general regression neural network. IEEE Transactions on Neural Networks, Vol. 2, No. 6, pp. 568-576.
158. Suhartono, Subanar and Guritno, S. (2005): A comparative study of forecasting models for trend and seasonal time series: Does complex model always yield better forecast than simple models? Jurnal Teknik Industri, Vol. 7, No. 1, pp. 22-30.
159. Tam, K.Y. and Kiang, M.Y. (1992): Managerial applications of neural networks: the case of bank failure predictions. Management Science, Vol. 38, No. 7, pp. 926-947.
160.
Tao, D.Q., Nguyen, V.T. and Bourque, A. (2002): On selection of probability distributions for representing extreme precipitations in southern Quebec. Annual Conference of the Canadian Society for Civil Engineering, pp. 1-8.
161. Taskaya-Temizel, T. and Ahmad, K. (2005): Are ARIMA neural network hybrids better than single models? Proceedings of the International Joint Conference on Neural Networks, Montreal, Canada.
162. Taskaya-Temizel, T. and Ahmad, K. (2007): A comparative study of autoregressive neural network hybrids. Neural Networks, Vol. 5, pp. 3192-3197.
163. Terzi, O. and Onal, S. (2012): Application of artificial neural networks and multiple linear regression to forecast monthly river flow in Turkey. African Journal of Agricultural Research, Vol. 7, No. 8, pp. 1317-1323.
164. Tippett, L.H.C. (1929): On the effect of sunshine on wheat yield at Rothamsted. Journal of Agricultural Science, Vol. 60, No. 2.
165. Topaloglu, F. (2002): Determining suitable probability distribution models for flow and precipitation series of the Seyhan River Basin. Turkish Journal of Agriculture, Vol. 26, pp. 189-194.
166. Toth, E., Montanari, A. and Brath, A. (2000): Comparison of short-term rainfall prediction models for real-time flood forecasting. Journal of Hydrology, Vol. 239, pp. 132-147.
167. Tseng, F.M., Yu, H.C. and Tzeng, G.H. (2002): Combining neural network model with seasonal time series ARIMA model. Technological Forecasting and Social Change, Vol. 69, No. 1, pp. 71-78.
168. Upadhaya, A. and Singh, S.R. (1998): Estimation of consecutive days maximum rainfall by various methods and their comparison. Indian Journal of Soil Conservation, Vol. 26, No. 2, pp. 193-201.
169. Vaccari, D.A. and Levri, J. (1999): Multivariable empirical modeling of ALS systems using polynomials. Life Support and Biosphere Science, Vol. 6, pp. 265-271.
170. Venkatesan, C., Raskar, S.D., Tambe, S.S., Kulkarni, B.D. and Keshavamurty, R.N.
(1997): Prediction of all India monsoon rainfall using error-back-propagation neural networks. Meteorology and Atmospheric Physics, Vol. 62, pp. 225-240.
171. Prybutok, V.R. and Yi, J. (2001): Neural networks forecasting model as an alternative to OLS regression model for handling messy data. Commerce and Economic Review, Korean publication by Industrial Development Institute at Kyungsung University, Vol. 17, No. 1, pp. 137-158.
172. Wang, Y.M. and Elhag, T.M.S. (2007): A comparison of neural network, evidential reasoning and multiple regression analysis in modelling bridge risks. Expert Systems with Applications, Vol. 32, No. 2, pp. 336-348.
173. Wang, Z.L. and Sheng, H.H. (2010): Rainfall prediction using generalized regression neural network: case study Zhengzhou. Computational and Information Science (ICCIS), pp. 1265-1268.
174. Weigend, A., Refenes, A. and Abu-Mostafa, Y. (Eds.) (1996): Forecasting natural physical phenomena. Proceedings of the Neural Networks in the Capital Markets Conference. Pasadena, CA: World Scientific.
175. Wilks, D.S. (1998): Multi-site generalization of a daily stochastic precipitation model. Journal of Hydrology, Vol. 210, pp. 178-191.
176. Wong, K.W., Wong, P.M., Gedeon, T.D. and Fung, C.C. (1999): Rainfall prediction using neural fuzzy technique. URL: www.it.murdoch.edu.au/nwrong/publications/SIC97.pdf, pp. 213-221.
177. Wu, C.L., Chau, K.W. and Fan, C. (2010): Prediction of rainfall time series using modular artificial neural networks coupled with data-preprocessing techniques. Journal of Hydrology, Vol. 389, No. 1-2, pp. 146-167.
178. Wu, F.Y. and Yen, K.K. (1992): Application of neural network in regression analysis. Computers and Industrial Engineering, Vol. 23, No. 1-4.
179. Wu, J., Liu, M. and Jin, L. (2010): A hybrid support vector regression approach for rainfall forecasting using particle swarm optimization and projection pursuit technology. International Journal of Computational Intelligence and Applications (IJCIA), Vol.
9, No. 2, pp. 87-104.
180. Yi, J. and Prybutok, V.R. (1996): A neural network model forecasting for prediction of daily maximum ozone concentration in an industrial urban area. Environmental Pollution, Vol. 92, pp. 349-357.
181. Zaefizadeh, M., Khayatnezhad, M. and Gholamin, R. (2011): Comparison of multiple linear regression (MLR) and artificial neural network (ANN) in predicting the yield using its components in the hulless barley. American-Eurasian Journal of Agriculture & Environment Science, Vol. 10, No. 1, pp. 60-64.
182. Zaw, W.T. and Naing, T.T. (2008): Empirical statistical modeling of rainfall prediction over Myanmar. World Academy of Science, Engineering and Technology, Vol. 46, pp. 565-568.
183. Zhang, G.P. (2001): An investigation of neural networks for linear time series forecasting. Computers & Operations Research, Vol. 28, pp. 1183-1202.
184. Zhang, G.P. (2003): Time series forecasting using a hybrid ARIMA and neural networks model. Neurocomputing, Vol. 50, pp. 159-175.
185. Zhang, G.P. and Qi, M. (2003): Neural network forecasting for seasonal and trend time series. European Journal of Operational Research, Vol. 160, No. 2, pp. 501-514.
186. Zhou, Z.J. and Hu, C.H. (2008): An effective hybrid approach based on grey and ARMA for forecasting gyro drifts. Chaos, Solitons & Fractals, Vol. 35, No. 3, pp. 525-529.
187. Zurada, J.M. (1992): Introduction to Artificial Neural Systems. West Publishing Company, Saint Paul, Minnesota.
188.
www.google.com

APPENDIX A(a)
Procedure followed for Stepwise Regression Analysis for Dependent Variable: Rainfall

Stepwise Selection: Step 1
Variable Relative Humidity 2PM Entered: R-Square = 0.2996 and C(p) = 71.6676

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1            963290         963290     253.60    <.0001
Error            593           2252448     3798.39389
Corrected Total  594           3215738

Variable                 Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept                        -119.44984          12.16003        366524      96.49    <.0001
Relative Humidity 2PM               2.89509           0.18180        963290     253.60    <.0001

Bounds on condition number: 1, 1
--------------------------------------------------------------------------------------------------

Stepwise Selection: Step 2
Variable Bright Sunshine Entered: R-Square = 0.3444 and C(p) = 31.2555

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              2           1107452         553726     155.48    <.0001
Error            592           2108286     3561.29453
Corrected Total  594           3215738

Variable                 Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept                           6.22513          22.99585     260.97892       0.07    0.7867
Relative Humidity 2PM               1.95314           0.23001        256785      72.10    <.0001
Bright Sunshine                    -9.84784           1.54782        144161      40.48    <.0001

Bounds on condition number: 1.7074, 6.8295
--------------------------------------------------------------------------------------------------

Stepwise Selection: Step 3
Variable Relative Humidity 7AM Entered: R-Square = 0.3545 and C(p) = 23.7044

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              3           1139917         379972     108.18    <.0001
Error            591           2075821     3512.38808
Corrected Total  594           3215738

Variable                 Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept                          66.26725          30.19235         16920       4.82    0.0286
Relative Humidity 7AM              -1.38703           0.45622         32465       9.24    0.0025
Relative Humidity 2PM               2.81011           0.36281        210708      59.99    <.0001
Bright Sunshine                    -9.17876           1.55283        122722      34.94    <.0001

Bounds on condition number: 4.3072, 27.888
--------------------------------------------------------------------------------------------------

Stepwise Selection: Step 4
Variable Maximum Temperature Entered: R-Square = 0.3666 and C(p) = 14.2837

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              4           1178736         294684      85.35    <.0001
Error            590           2037002     3452.54544
Corrected Total  594           3215738

Variable                 Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept                         379.40528          98.06595         51678      14.97    0.0001
Maximum Temperature                -6.25287           1.86477         38820      11.24    0.0009
Relative Humidity 7AM              -2.28785           0.52608         65295      18.91    <.0001
Relative Humidity 2PM               2.35796           0.38415        130078      37.68    <.0001
Bright Sunshine                    -8.74845           1.54488        110716      32.07    <.0001

Bounds on condition number: 4.9125, 61.686
--------------------------------------------------------------------------------------------------

Stepwise Selection: Step 5
Variable Pan Evaporation Entered: R-Square = 0.3766 and C(p) = 6.7459

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              5           1211156         242231      71.17    <.0001
Error            589           2004582     3403.36528
Corrected Total  594           3215738

Variable                 Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept                         391.90674          97.44921         55045      16.17    <.0001
Maximum Temperature                -8.65592           2.00849         63212      18.57    <.0001
Relative Humidity 7AM              -1.89389           0.53770         42223      12.41    0.0005
Relative Humidity 2PM               2.46673           0.38303        141151      41.47    <.0001
Pan Evaporation                     4.70662           1.52496         32420       9.53    0.0021
Bright Sunshine                    -8.50348           1.53589        104323      30.65    <.0001

Bounds on condition number: 5.1343, 98.501
--------------------------------------------------------------------------------------------------

All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the model.
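The steps above come from stepwise selection with entry and stay significance levels of 0.15. As an illustrative sketch only (not the code used in this thesis), the forward half of that procedure can be written in a few lines of NumPy. For simplicity the sketch uses a fixed F-to-enter cutoff of about 2.07 (roughly the upper 0.15 critical value of F(1, inf)) in place of the exact partial-F p-value, and it omits the backward-elimination check of true stepwise selection; the variable names and synthetic data are hypothetical.

```python
import numpy as np

def _rss(X, y):
    """Residual sum of squares of the least-squares fit of y on [1, X]."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ beta
    return float(r @ r)

def forward_stepwise(X, y, names, f_enter=2.07):
    """Greedy forward selection: at each step, enter the candidate with the
    largest partial F = (RSS_old - RSS_new) / (RSS_new / (n - p - 1)),
    provided it exceeds f_enter (a stand-in for the 0.15 entry level)."""
    n = len(y)
    selected = []
    while True:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        if not remaining:
            break
        rss_old = _rss(X[:, selected], y)
        best_j, best_f = None, 0.0
        for j in remaining:
            rss_new = _rss(X[:, selected + [j]], y)
            df_err = n - len(selected) - 2      # parameters so far + candidate + intercept
            f = (rss_old - rss_new) / (rss_new / df_err)
            if f > best_f:
                best_j, best_f = j, f
        if best_j is None or best_f < f_enter:
            break
        selected.append(best_j)
    return [names[j] for j in selected]

# Hypothetical demonstration data: y depends on x0 and x2 but not x1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * rng.normal(size=200)
result = forward_stepwise(X, y, ["x0", "x1", "x2"])
```

With the data above, the strongest predictor (x0) enters first, mirroring how Relative Humidity 2PM, with the largest partial F, entered at Step 1 of Appendix A(a).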
Summary of Stepwise Selection
Step   Variable Entered         Number In    Partial R-Square    Model R-Square    C(p)       F Value    Pr > F
1      Relative Humidity 2PM    1            0.2996              0.2996            71.6676    253.60     <.0001
2      Bright Sunshine          2            0.0448              0.3444            31.2555     40.48     <.0001
3      Relative Humidity 7AM    3            0.0101              0.3545            23.7044      9.24     0.0025
4      Maximum Temperature      4            0.0121              0.3666            14.2837     11.24     0.0009
5      Pan Evaporation          5            0.0101              0.3766             6.7459      9.53     0.0021

APPENDIX A(b)
Procedure followed for Stepwise Regression Analysis for Dependent Variable: Maximum Temperature

Stepwise Selection: Step 1
Variable Relative Humidity 7AM Entered: R-Square = 0.7151 and C(p) = 477.2735

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1        3097.61259     3097.61259    1488.50    <.0001
Error            593        1234.05268        2.08103
Corrected Total  594        4331.66528

Variable                 Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept                          53.93488           0.53853         20873    10030.3    <.0001
Relative Humidity 7AM              -0.23778           0.00616    3097.61259    1488.50    <.0001

Bounds on condition number: 1, 1
--------------------------------------------------------------------------------------------------

Stepwise Selection: Step 2
Variable Pan Evaporation Entered: R-Square = 0.7720 and C(p) = 266.0520

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              2        3343.92268     1671.96134    1002.08    <.0001
Error            592         987.74260        1.66848
Corrected Total  594        4331.66528

Variable                 Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept                          44.80651           0.89273    4203.00312    2519.05    <.0001
Relative Humidity 7AM              -0.15463           0.00879     516.16007     309.36    <.0001
Pan Evaporation                     0.36545           0.03008     246.31008     147.63    <.0001

Bounds on condition number: 2.5379, 10.152
--------------------------------------------------------------------------------------------------

Stepwise Selection: Step 3
Variable Relative Humidity 2PM Entered: R-Square = 0.8034 and C(p) = 150.1939

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              3        3480.07053     1160.02351     805.05    <.0001
Error            591         851.59475        1.44094
Corrected Total  594        4331.66528

Variable                 Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept                          44.14902           0.83238    4053.61208    2813.17    <.0001
Relative Humidity 7AM              -0.09487           0.01022     124.06086      86.10    <.0001
Relative Humidity 2PM              -0.06352           0.00653     136.14785      94.49    <.0001
Pan Evaporation                     0.29327           0.02892     148.17111     102.83    <.0001

Bounds on condition number: 3.9748, 30.293
--------------------------------------------------------------------------------------------------

Stepwise Selection: Step 4
Variable Minimum Temperature Entered: R-Square = 0.8374 and C(p) = 24.8486

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              4        3627.17774      906.79444     759.43    <.0001
Error            590         704.48753        1.19405
Corrected Total  594        4331.66528

Variable                 Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept                          35.78241           1.06880    1338.35354    1120.86    <.0001
Minimum Temperature                 0.34132           0.03075     147.10722     123.20    <.0001
Relative Humidity 7AM              -0.07114           0.00955      66.26731      55.50    <.0001
Relative Humidity 2PM              -0.08928           0.00639     233.44846     195.51    <.0001
Pan Evaporation                     0.23873           0.02678      94.87433      79.46    <.0001

Bounds on condition number: 4.1845, 48.364
--------------------------------------------------------------------------------------------------

Stepwise Selection: Step 5
Variable Rainfall Entered: R-Square = 0.8417 and C(p) = 10.5253

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              5        3646.03416      729.20683     626.43    <.0001
Error            589         685.63112        1.16406
Corrected Total  594        4331.66528

Variable                 Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept                          35.85264           1.05543    1343.24493    1153.93    <.0001
Rainfall                           -0.00295        0.00073325      18.85641      16.20    <.0001
Minimum Temperature                 0.32834           0.03053     134.61214     115.64    <.0001
Relative Humidity 7AM              -0.07620           0.00951      74.69547      64.17    <.0001
Relative Humidity 2PM              -0.07638           0.00707     135.76660     116.63    <.0001
Pan Evaporation                     0.24756           0.02653     101.32602      87.05    <.0001

Bounds on condition number: 4.9388, 73.488
--------------------------------------------------------------------------------------------------

Stepwise Selection: Step 6
Variable Bright Sunshine Entered: R-Square = 0.8432 and C(p) = 7.0000

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              6        3652.41694      608.73616     526.96    <.0001
Error            588         679.24833        1.15518
Corrected Total  594        4331.66528

Variable                 Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept                          34.98664           1.11408    1139.25818     986.21    <.0001
Rainfall                           -0.00252        0.00075311      12.93483      11.20    0.0009
Minimum Temperature                 0.33439           0.03053     138.62525     120.00    <.0001
Relative Humidity 7AM              -0.07787           0.00950      77.56213      67.14    <.0001
Relative Humidity 2PM              -0.07036           0.00750     101.72363      88.06    <.0001
Pan Evaporation                     0.24658           0.02644     100.50376      87.00    <.0001
Bright Sunshine                     0.06837           0.02909       6.38279       5.53    0.0191

Bounds on condition number: 5.5926, 104.02
--------------------------------------------------------------------------------------------------

All variables left in the model are significant at the 0.1500 level.
All variables have been entered into the model.
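The C(p) values printed at each step are Mallows' statistic, C(p) = SSE_p / MSE_full - (n - 2p), where p counts the parameters including the intercept and MSE_full comes from the model containing all candidate variables. The step-5 and step-6 values above can be reproduced from the printed sums of squares; this quick check assumes n = 595 observations, as implied by the 594 total degrees of freedom:

```python
# Mallows' C(p): SSE_p / MSE_full - (n - 2p), with p = parameters including intercept.
def mallows_cp(sse_p, p, n, mse_full):
    return sse_p / mse_full - (n - 2 * p)

# Step 5 of Appendix A(b): SSE = 685.63112 with 6 parameters; the full
# six-variable model (step 6) has error mean square MSE = 1.15518, n = 595.
cp5 = mallows_cp(685.63112, 6, 595, 1.15518)   # close to the printed C(p) = 10.5253
cp6 = mallows_cp(679.24833, 7, 595, 1.15518)   # close to 7: the full model's C(p) equals p
```

A model with C(p) near p is considered nearly unbiased, which is why selection stops once C(p) approaches the parameter count, as at step 6.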
Summary of Stepwise Selection

Step  Variable Entered        Number        Partial    Model
                              Variables In  R-Square   R-Square   C(p)      F Value   Pr > F
1     Relative humidity 7AM        1        0.7151     0.7151     477.273   1488.50   <.0001
2     Pan Evaporation              2        0.0569     0.7720     266.052    147.63   <.0001
3     Relative Humidity 2PM        3        0.0314     0.8034     150.194     94.49   <.0001
4     Minimum Temperature          4        0.0340     0.8374     24.8486    123.20   <.0001
5     Rainfall                     5        0.0044     0.8417     10.5253     16.20   <.0001
6     Bright Sunshine              6        0.0015     0.8432      7.0000      5.53   0.0191

APPENDIX A(c)
Procedure followed for Stepwise Regression Analysis for Dependent Variable: Minimum Temperature

Stepwise Selection: Step 1
Variable Maximum Temperature Entered: R-Square = 0.0421 and C(p) = 217.5398

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1          62.20914       62.20914      26.06    <.0001
Error            593        1415.79260        2.38751
Corrected Total  594        1478.00175

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                20.25823      0.78396    1594.25897     667.75    <.0001
Maximum Temperature       0.11984      0.02348      62.20914      26.06    <.0001

Bounds on condition number: 1, 1
--------------------------------------------------------------------------------
Stepwise Selection: Step 2
Variable Relative Humidity 2PM Entered: R-Square = 0.2887 and C(p) = 11.3595

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              2         426.74307      213.37154     120.16    <.0001
Error            592        1051.25867        1.77577
Corrected Total  594        1478.00175

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                -0.76196      1.61540       0.39509       0.22    0.6373
Maximum Temperature       0.55257      0.03636     410.09498     230.94    <.0001
Relative Humidity 2PM     0.10114      0.00706     364.53393     205.28    <.0001

Bounds on condition number: 3.2251, 12.9
--------------------------------------------------------------------------------
Stepwise Selection: Step 3
Variable Bright Sunshine Entered: R-Square = 0.2987 and C(p) = 4.9410

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              3         441.48427      147.16142      83.91    <.0001
Error            591        1036.51748        1.75384
Corrected Total  594        1478.00175

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                 0.51214      1.66446       0.16605       0.09    0.7584
Maximum Temperature       0.55250      0.03614     409.98422     233.76    <.0001
Relative Humidity 2PM     0.09160      0.00775     245.12691     139.77    <.0001
Bright Sunshine          -0.09958      0.03435      14.74120       8.41    0.0039

Bounds on condition number: 3.9343, 26.6
--------------------------------------------------------------------------------
Stepwise Selection: Step 4
Variable Relative Humidity 7AM Entered: R-Square = 0.3013 and C(p) = 4.7836

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              4         445.26193      111.31548      63.59    <.0001
Error            590        1032.73982        1.75041
Corrected Total  594        1478.00175

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                 2.64644      2.20810       2.51435       1.44    0.2312
Maximum Temperature       0.52100      0.04199     269.50114     153.96    <.0001
Relative humidity 7AM    -0.01740      0.01185       3.77766       2.16    0.1423
Relative Humidity 2PM     0.09727      0.00865     221.37195     126.47    <.0001
Bright Sunshine          -0.09121      0.03479      12.03446       6.88    0.0090

Bounds on condition number: 4.9125, 61.686
--------------------------------------------------------------------------------
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the model.
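Each F value in the parameter tables above is a partial F-test: the variable's Type II sum of squares divided by the error mean square of the fitted model, with the variable kept (or entered) only if the corresponding p-value clears the 0.1500 level. Reproducing the Step 4 figure for Relative humidity 7AM from the values above (Python as a calculator):

```python
# Partial F behind the Step 4 decision above (Appendix A(c)):
# F = Type II SS of the variable / error mean square of the model.
type2_ss = 3.77766   # Type II SS, Relative humidity 7AM
mse = 1.75041        # Error mean square of the Step 4 model
f_value = type2_ss / mse
print(round(f_value, 2))  # 2.16
```

The result matches the tabulated F of 2.16, whose Pr > F of 0.1423 is just below the 0.1500 stay level, which is why the variable remains in the model.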
Summary of Stepwise Selection

Step  Variable Entered        Number        Partial    Model
                              Variables In  R-Square   R-Square   C(p)      F Value   Pr > F
1     Maximum Temperature          1        0.0421     0.0421     217.540     26.06   <.0001
2     Relative Humidity 2PM        2        0.2466     0.2887     11.3595    205.28   <.0001
3     Bright Sunshine              3        0.0100     0.2987      4.9410      8.41   0.0039
4     Relative humidity 7AM        4        0.0026     0.3013      4.7836      2.16   0.1423

APPENDIX A(d)
Procedure followed for Stepwise Regression Analysis for Dependent Variable: Relative Humidity at 7 AM

Stepwise Selection: Step 1
Variable Maximum Temperature Entered: R-Square = 0.7151 and C(p) = 208.3268

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1             39178          39178    1488.50    <.0001
Error            593             15608       26.32032
Corrected Total  594             54786

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept               186.94704      2.60296        135767    5158.24    <.0001
Maximum Temperature      -3.00741      0.07795         39178    1488.50    <.0001

Bounds on condition number: 1, 1
--------------------------------------------------------------------------------
Stepwise Selection: Step 2
Variable Relative Humidity 2PM Entered: R-Square = 0.7660 and C(p) = 67.4853

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              2             41967          20983     969.06    <.0001
Error            592             12819       21.65334
Corrected Total  594             54786

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept               128.80293      5.64091         11290     521.38    <.0001
Maximum Temperature      -1.81043      0.12697    4402.23745     203.31    <.0001
Relative Humidity 2PM     0.27977      0.02465    2789.17590     128.81    <.0001

Bounds on condition number: 3.2251, 12.9
--------------------------------------------------------------------------------
Stepwise Selection: Step 3
Variable Pan Evaporation Entered: R-Square = 0.7804 and C(p) = 29.1147

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              3             42755          14252     700.12    <.0001
Error            591             12030       20.35615
Corrected Total  594             54786

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept               118.78898      5.70115    8837.36012     434.14    <.0001
Maximum Temperature      -1.34028      0.14444    1752.60870      86.10    <.0001
Relative Humidity 2PM     0.25027      0.02437    2147.56886     105.50    <.0001
Pan Evaporation          -0.71005      0.11410     788.29066      38.72    <.0001

Bounds on condition number: 4.4397, 32.356
--------------------------------------------------------------------------------
Stepwise Selection: Step 4
Variable Rainfall Entered: R-Square = 0.7870 and C(p) = 12.5982

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              4             43117          10779     545.01    <.0001
Error            590             11669       19.77784
Corrected Total  594             54786

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept               119.67441      5.62340    8957.43103     452.90    <.0001
Maximum Temperature      -1.42546      0.14376    1944.38902      98.31    <.0001
Rainfall                 -0.01290      0.00302     361.56017      18.28    <.0001
Relative Humidity 2PM     0.28687      0.02550    2503.59168     126.59    <.0001
Pan Evaporation          -0.62239      0.11432     586.18727      29.64    <.0001

Bounds on condition number: 4.5267, 51.506
--------------------------------------------------------------------------------
Stepwise Selection: Step 5
Variable Bright Sunshine Entered: R-Square = 0.7896 and C(p) = 7.4216

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              5             43257     8651.37901     441.99    <.0001
Error            589             11529       19.57350
Corrected Total  594             54786

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept               115.86488      5.77261    7885.47968     402.87    <.0001
Maximum Temperature      -1.43045      0.14303    1957.70893     100.02    <.0001
Rainfall                 -0.01089      0.00309     242.83120      12.41    0.0005
Relative Humidity 2PM     0.31277      0.02715    2597.72731     132.72    <.0001
Pan Evaporation          -0.60811      0.11386     558.38604      28.53    <.0001
Bright Sunshine           0.31774      0.11875     140.13310       7.16    0.0077

Bounds on condition number: 4.5274, 76.775
--------------------------------------------------------------------------------
Stepwise Selection: Step 6
Variable Minimum Temperature
Entered: R-Square = 0.7904 and C(p) = 7.0000

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              6             43304     7217.36338     369.62    <.0001
Error            588             11482       19.52637
Corrected Total  594             54786

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept               116.06296      5.76706    7908.60919     405.02    <.0001
Maximum Temperature      -1.31618      0.16063    1311.05245      67.14    <.0001
Minimum Temperature      -0.21386      0.13743      47.28526       2.42    0.1202
Rainfall                 -0.01111      0.00309     252.18817      12.92    0.0004
Relative Humidity 2PM     0.33298      0.03007    2395.09251     122.66    <.0001
Pan Evaporation          -0.60294      0.11377     548.45500      28.09    <.0001
Bright Sunshine           0.29476      0.11952     118.75498       6.08    0.0139

Bounds on condition number: 5.7236, 114.04
--------------------------------------------------------------------------------
All variables left in the model are significant at the 0.1500 level.
All variables have been entered into the model.

Summary of Stepwise Selection

Step  Variable Entered        Number        Partial    Model
                              Variables In  R-Square   R-Square   C(p)      F Value   Pr > F
1     Maximum Temperature          1        0.7151     0.7151     208.327   1488.50   <.0001
2     Relative Humidity 2PM        2        0.0509     0.7660     67.4853    128.81   <.0001
3     Pan Evaporation              3        0.0144     0.7804     29.1147     38.72   <.0001
4     Rainfall                     4        0.0066     0.7870     12.5982     18.28   <.0001
5     Bright Sunshine              5        0.0026     0.7896      7.4216      7.16   0.0077
6     Minimum Temperature          6        0.0009     0.7904      7.0000      2.42   0.1202

APPENDIX A(e)
Procedure followed for Stepwise Regression Analysis for Dependent Variable: Relative Humidity at 2 PM

Stepwise Selection: Step 1
Variable Maximum Temperature Entered: R-Square = 0.6899 and C(p) = 581.3400

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1             79294          79294    1319.49    <.0001
Error            593             35636       60.09429
Corrected Total  594            114930

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept               207.83164      3.93313        167795    2792.20    <.0001
Maximum Temperature      -4.27851      0.11778         79294    1319.49    <.0001

Bounds on condition number: 1, 1
--------------------------------------------------------------------------------
Stepwise Selection: Step 2
Variable Minimum Temperature Entered: R-Square = 0.7698 and C(p) = 281.4895

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              2             88469          44235     989.66    <.0001
Error            592             26460       44.69677
Corrected Total  594            114930

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept               156.25954      4.94592         44614     998.16    <.0001
Maximum Temperature      -4.58359      0.10379         87175    1950.36    <.0001
Minimum Temperature       2.54574      0.17768    9175.42530     205.28    <.0001

Bounds on condition number: 1.0439, 4.1758
--------------------------------------------------------------------------------
Stepwise Selection: Step 3
Variable Relative humidity 7AM Entered: R-Square = 0.8076 and C(p) = 140.5917

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              3             92813          30938     826.71    <.0001
Error            591             22117       37.42265
Corrected Total  594            114930

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                61.95949      9.85360    1479.65519      39.54    <.0001
Maximum Temperature      -2.94557      0.17926         10104     270.00    <.0001
Minimum Temperature       2.27424      0.16452    7150.90949     191.09    <.0001
Relative humidity 7AM     0.53384      0.04955    4343.70133     116.07    <.0001

Bounds on condition number: 3.7196, 25.149
--------------------------------------------------------------------------------
Stepwise Selection: Step 4
Variable Bright Sunshine Entered: R-Square = 0.8324 and C(p) = 48.8003

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              4             95664          23916     732.41    <.0001
Error            590             19266       32.65388
Corrected Total  594            114930

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                61.71801      9.20442    1468.13280      44.96    <.0001
Maximum Temperature      -2.34854      0.17923    5606.88575     171.71    <.0001
Minimum Temperature       1.81464      0.16136    4129.69912     126.47    <.0001
Relative humidity 7AM     0.53482      0.04629    4359.54612     133.51    <.0001
Bright Sunshine          -1.31787      0.14104    2850.99934      87.31    <.0001

Bounds on condition number: 4.2612, 42.321
--------------------------------------------------------------------------------
Stepwise Selection: Step 5
Variable Rainfall Entered: R-Square = 0.8424 and C(p) = 12.9769

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              5             96814          19363     629.53    <.0001
Error            589             18116       30.75732
Corrected Total  594            114930

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                49.13831      9.16702     883.75694      28.73    <.0001
Maximum Temperature      -2.09240      0.17892    4206.57361     136.77    <.0001
Minimum Temperature       1.76062      0.15685    3875.11862     125.99    <.0001
Relative humidity 7AM     0.55662      0.04506    4692.75771     152.57    <.0001
Rainfall                  0.02307      0.00377    1149.72761      37.38    <.0001
Bright Sunshine          -1.03245      0.14462    1567.47160      50.96    <.0001

Bounds on condition number: 4.5083, 62.612
--------------------------------------------------------------------------------
Stepwise Selection: Step 6
Variable Pan Evaporation Entered: R-Square = 0.8445 and C(p) = 7.0000

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              6             97056          16176     532.15    <.0001
Error            588             17874       30.39725
Corrected Total  594            114930

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                46.85286      9.14906     797.17553      26.23    <.0001
Maximum Temperature      -1.85133      0.19729    2676.73218      88.06    <.0001
Minimum Temperature       1.74419      0.15604    3797.88959     124.94    <.0001
Relative humidity 7AM     0.51835      0.04680    3728.50755     122.66    <.0001
Rainfall                  0.02415      0.00377    1246.78191      41.02    <.0001
Pan Evaporation          -0.40761      0.14432     242.47563       7.98    0.0049
Bright Sunshine          -1.02706      0.14379    1550.87910      51.02    <.0001

Bounds on condition number: 5.5465, 102.7
--------------------------------------------------------------------------------
All variables left in the model are significant at the 0.1500 level.
All variables have been entered into the model.
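The overall F value printed for each ANOVA block above is the ratio of the Model and Error mean squares, each being the corresponding sum of squares divided by its degrees of freedom. Reproducing the final Step 6 figure from Appendix A(e) (Python as a calculator; SAS prints the sums of squares rounded to five significant digits, so the Error mean square is taken directly from the table):

```python
# Overall ANOVA F for the final Step 6 model above (Appendix A(e)):
# F = Model mean square / Error mean square.
ms_model = 97056 / 6    # Model SS / Model DF, as printed: 16176
ms_error = 30.39725     # Error mean square from the table
f_value = ms_model / ms_error
print(round(f_value, 2))  # 532.15
```

The result agrees with the tabulated F Value of 532.15 with Pr > F < .0001.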
Summary of Stepwise Selection

Step  Variable Entered        Number        Partial    Model
                              Variables In  R-Square   R-Square   C(p)      F Value   Pr > F
1     Maximum Temperature          1        0.6899     0.6899     581.340   1319.49   <.0001
2     Minimum Temperature          2        0.0798     0.7698     281.489    205.28   <.0001
3     Relative humidity 7AM        3        0.0378     0.8076     140.592    116.07   <.0001
4     Bright Sunshine              4        0.0248     0.8324     48.8003     87.31   <.0001
5     Rainfall                     5        0.0100     0.8424     12.9769     37.38   <.0001
6     Pan Evaporation              6        0.0021     0.8445      7.0000      7.98   0.0049

APPENDIX A(f)
Procedure followed for Stepwise Regression Analysis for Dependent Variable: Pan Evaporation

Stepwise Selection: Step 1
Variable Maximum Temperature Entered: R-Square = 0.6528 and C(p) = 72.6197

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1        3055.56776     3055.56776    1115.01    <.0001
Error            593        1625.05738        2.74040
Corrected Total  594        4680.62514

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept               -22.73697      0.83990    2008.26447     732.84    <.0001
Maximum Temperature       0.83988      0.02515    3055.56776    1115.01    <.0001

Bounds on condition number: 1, 1
--------------------------------------------------------------------------------
Stepwise Selection: Step 2
Variable Relative humidity 7AM Entered: R-Square = 0.6846 and C(p) = 13.8232

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              2        3204.44481     1602.22240     642.55    <.0001
Error            592        1476.18034        2.49355
Corrected Total  594        4680.62514

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                -4.47870      2.49508       8.03440       3.22    0.0732
Maximum Temperature       0.54616      0.04495     368.11018     147.63    <.0001
Relative humidity 7AM    -0.09767      0.01264     148.87705      59.70    <.0001

Bounds on condition number: 3.5101, 14.04
--------------------------------------------------------------------------------
Stepwise Selection: Step 3
Variable Rainfall Entered: R-Square = 0.6876 and C(p) = 10.1554

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              3        3218.32405     1072.77468     433.57    <.0001
Error            591        1462.30110        2.47428
Corrected Total  594        4680.62514

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                -5.95940      2.56285      13.37857       5.41    0.0204
Maximum Temperature       0.58167      0.04722     375.44478     151.74    <.0001
Relative humidity 7AM    -0.09613      0.01261     143.85708      58.14    <.0001
Rainfall                  0.00237      0.00100      13.87924       5.61    0.0182

Bounds on condition number: 3.9035, 26.17
--------------------------------------------------------------------------------
Stepwise Selection: Step 4
Variable Relative Humidity 2PM Entered: R-Square = 0.6922 and C(p) = 3.4119

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              4        3239.73488      809.93372     331.64    <.0001
Error            590        1440.89027        2.44219
Corrected Total  594        4680.62514

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                -4.10879      2.62176       5.99821       2.46    0.1176
Maximum Temperature       0.53040      0.05001     274.75956     112.51    <.0001
Relative humidity 7AM    -0.07685      0.01412      72.38297      29.64    <.0001
Relative Humidity 2PM    -0.02902      0.00980      21.41083       8.77    0.0032
Rainfall                  0.00351      0.00107      26.51822      10.86    0.0010

Bounds on condition number: 4.5211, 59.696
--------------------------------------------------------------------------------
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the model.
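The R-Square printed at each step above is simply the Model sum of squares as a fraction of the Corrected Total sum of squares. Checking the Step 4 figure from Appendix A(f) (Python as a calculator; values copied from the table above):

```python
# R-Square bookkeeping for Step 4 above (Appendix A(f)):
# R^2 = Model SS / Corrected Total SS.
ss_model = 3239.73488
ss_total = 4680.62514
r_square = ss_model / ss_total
print(round(r_square, 4))  # 0.6922
```

The result matches the R-Square = 0.6922 that SAS reports alongside C(p) = 3.4119.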
Summary of Stepwise Selection

Step  Variable Entered        Number        Partial    Model
                              Variables In  R-Square   R-Square   C(p)      F Value   Pr > F
1     Maximum Temperature          1        0.6528     0.6528     72.6197   1115.01   <.0001
2     Relative humidity 7AM        2        0.0318     0.6846     13.8232     59.70   <.0001
3     Rainfall                     3        0.0030     0.6876     10.1554      5.61   0.0182
4     Relative Humidity 2PM        4        0.0046     0.6922      3.4119      8.77   0.0032

APPENDIX A(g)
Procedure followed for Stepwise Regression Analysis for Dependent Variable: Bright Sunshine

Stepwise Selection: Step 1
Variable Relative Humidity 2PM Entered: R-Square = 0.4143 and C(p) = 55.0909

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1        1051.50791     1051.50791     419.47    <.0001
Error            593        1486.50536        2.50675
Corrected Total  594        2538.01328

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                12.76168      0.31239    4183.56363    1668.92    <.0001
Relative Humidity 2PM    -0.09565      0.00467    1051.50791     419.47    <.0001

Bounds on condition number: 1, 1
--------------------------------------------------------------------------------
Stepwise Selection: Step 2
Variable Rainfall Entered: R-Square = 0.4518 and C(p) = 15.7398

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              2        1146.64727      573.32364     243.94    <.0001
Error            592        1391.36601        2.35028
Corrected Total  594        2538.01328

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                11.98536      0.32616    3173.63613    1350.32    <.0001
Relative Humidity 2PM    -0.07684      0.00540     475.26076     202.21    <.0001
Rainfall                 -0.00650      0.00102      95.13936      40.48    <.0001

Bounds on condition number: 1.4277, 5.7107
--------------------------------------------------------------------------------
Stepwise Selection: Step 3
Variable Minimum Temperature Entered: R-Square = 0.4589 and C(p) = 9.9418

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              3        1164.58858      388.19619     167.05    <.0001
Error            591        1373.42470        2.32390
Corrected Total  594        2538.01328

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                14.57439      0.98662     507.10468     218.21    <.0001
Minimum Temperature      -0.11087      0.03990      17.94131       7.72    0.0056
Relative Humidity 2PM    -0.07521      0.00540     450.04584     193.66    <.0001
Rainfall                 -0.00660      0.00102      97.97309      42.16    <.0001

Bounds on condition number: 1.4446, 11.66
--------------------------------------------------------------------------------
Stepwise Selection: Step 4
Variable Relative humidity 7AM Entered: R-Square = 0.4618 and C(p) = 8.6718

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              4        1172.11215      293.02804     126.57    <.0001
Error            590        1365.90113        2.31509
Corrected Total  594        2538.01328

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                12.90426      1.35205     210.88565      91.09    <.0001
Minimum Temperature      -0.08534      0.04227       9.43668       4.08    0.0439
Relative humidity 7AM     0.02245      0.01246       7.52357       3.25    0.0719
Relative Humidity 2PM    -0.08930      0.00950     204.69657      88.42    <.0001
Rainfall                 -0.00627      0.00103      85.77070      37.05    <.0001

Bounds on condition number: 4.4776, 43.059
--------------------------------------------------------------------------------
Stepwise Selection: Step 5
Variable Maximum Temperature Entered: R-Square = 0.4668 and C(p) = 5.2364

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              5        1184.61776      236.92355     103.11    <.0001
Error            589        1353.39551        2.29779
Corrected Total  594        2538.01328

Variable                 Parameter    Standard    Type II SS    F Value    Pr > F
                          Estimate       Error
Intercept                 7.86562      2.54542      21.94101       9.55    0.0021
Maximum Temperature       0.12606      0.05404      12.50562       5.44    0.0200
Minimum Temperature      -0.13358      0.04691      18.62797       8.11    0.0046
Relative humidity 7AM     0.03621      0.01374      15.96364       6.95    0.0086
Relative Humidity 2PM    -0.07713      0.01080     117.10102      50.96    <.0001
Rainfall                 -0.00597      0.00103      76.53762      33.31    <.0001

Bounds on condition number: 5.8389, 93.791
--------------------------------------------------------------------------------
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the model.

Summary of Stepwise Selection

Step  Variable Entered        Number        Partial    Model
                              Variables In  R-Square   R-Square   C(p)      F Value   Pr > F
1     Relative Humidity 2PM        1        0.4143     0.4143     55.0909    419.47   <.0001
2     Rainfall                     2        0.0375     0.4518     15.7398     40.48   <.0001
3     Minimum Temperature          3        0.0071     0.4589      9.9418      7.72   0.0056
4     Relative humidity 7AM        4        0.0030     0.4618      8.6718      3.25   0.0719
5     Maximum Temperature          5        0.0049     0.4668      5.2364      5.44   0.0200

APPENDIX B(a)
Program for developing ANN model for Rainfall

% Training for Rainfall
% Creating the feed forward network
net = newff([0.0236 0.0432; 0.0172 0.0335; 0.038 0.098; 0.016 0.092; 0.0000 0.0185; 0.0007 0.0116], ...
    [8,10,12,1], {'tansig','tansig','tansig','logsig'}, 'trainscg');
% Enter the pattern and target
p = [0.0381 0.0265 0.074 0.044 0.0122 0.0066
     0.0347 0.0233 0.080 0.052 0.0065 0.0069
     ……………………………
     0.0328 0.024  0.093 0.065 0.0041 0.0088
     0.032  0.0229 0.094 0.064 0.0034 0.0078];
p = p';
t = [0.0056 0.0738 ……… 0.0272 0.0112];
t = t';
% set initial weights & biases (matrix rows restored from the flattened listing)
net.IW{1,1} = [-0.8754  0.5691 -0.5960 -0.7580 -0.2941 -0.5808
                0.6511 -0.2726 -0.0521  0.5720 -0.6031  0.5401
                0.5375  0.3341  0.0824  0.2887  0.0589 -0.9999
                0.3523  0.2958  0.9987  0.8578  0.1649  0.2789
               -0.4838 -0.3115  0.6352  0.8444  0.6760 -0.9644
                0.1908 -0.8439  0.2789 -0.5051  0.8061  0.1336
                0.0303  0.3956  0.0722 -0.1205  0.0424  0.9686
               -0.8503  0.9714 -0.9321  0.3208 -0.3348  0.1436];
net.LW{2,1} = [ 0.1669  0.6254  0.5631  0.0461 -0.0918 -0.2504  0.0376 -0.1060
                0.3322  0.6177 -0.2275 -0.1279  0.0275 -0.0126  0.2725  0.0139
               -0.2849  0.7109 -0.1916 -0.9661  0.9048  0.3261  0.6344 -0.6618
                0.6657 -0.0168  0.7077 -0.1694  0.2635  0.2823 -0.1229  0.8654
                0.5925  0.8494 -0.8233  0.2653  0.2465  0.1414  0.6385  0.1062
                0.0052 -0.2024 -0.2001  0.9764  0.1687 -0.6857  0.8503  0.7989
                0.9569  0.7751  0.4519  0.2634  0.2132 -0.5242 -0.2326  0.4896
               -0.0316  0.0919 -0.7745 -0.4365 -0.0007 -0.4960 -0.0084 -0.9707
                0.2303  0.5802  0.8999 -0.6458  0.7177 -0.5838  0.9549 -0.4680
                0.2209 -0.2903 -0.6861  0.1895 -0.2585  0.2821 -0.6412 -0.0255];
net.LW{3,2} = [-0.8486 -0.6747  0.3305  0.3452 -0.2362  0.7200  0.3338  0.3797 -0.2056 -0.3913
                0.8301 -0.6520  0.0935  0.4716 -0.1065 -0.1097  0.3176 -0.6392 -0.8581 -0.0295
               -0.0533 -0.0064  0.8642  0.6741  0.8208 -0.4858 -0.0578  0.1573 -0.2212 -0.5869
               -0.2314  0.1757  0.7960  0.7165  0.8207 -0.0764  0.0396  0.3412  0.9951 -0.1821
               -0.6669 -0.7202 -0.3719 -0.8201 -0.6210  0.4288 -0.6957  0.3734 -0.6647 -0.2793
                0.4430 -0.1451 -0.1824 -0.2177  0.2350 -0.7160 -0.6020 -0.8999 -0.2222  0.6489
                0.0563  0.9196 -0.1114 -0.5283 -0.4075 -0.5730 -0.5688  0.4672 -0.2678  0.1231
                0.6867 -0.6748  0.6196  0.2423  0.0344 -0.3697 -0.3770  0.1384 -0.6303 -0.1273
                0.9198  0.2298  0.0153 -0.6400 -0.5125  0.8340  0.5722 -0.9242  0.0430  0.8806
               -0.5146  0.9841 -0.1578  0.0088  0.4637  0.6674  0.4519  0.8357  0.9454  0.8613
                0.6740  0.6118 -0.6070  0.3347  0.4817  0.5853  0.4767 -0.6444 -0.7207  0.0067
               -0.2896  0.1226 -0.8477 -0.5013  0.2867  0.4182 -0.4899  0.4453  0.5295 -0.4088];
% the last two entries (0.8994, 0.8689) appeared displaced further down in the
% original listing; a 1-by-12 output-layer weight vector requires 12 entries
net.LW{4,3} = [-0.2489 0.1706 -0.8142 0.9684 0.7488 -0.1751 0.5437 -0.0984 0.3068 -0.3575 0.8994 0.8689];
net.b{1} = [0.3660; -0.4437; -0.4755; -0.9644; 0.9308; -0.6543; -0.2085; -0.3939];
net.b{2} = [-0.0064; -0.4747; 0.9436; -0.9568; -0.9078; 0.6698; 0.4679; 0.2685; 0.5845; -0.5981];
net.b{3} = [0.3816; -0.2013; 0.2325; -0.6459; -0.3839; 0.4836; -0.1686; 0.1700; -0.3120; -0.0293; -0.6155; -0.0767];
net.b{4} = [0.0487];
% set the training parameters
net.trainParam.show = 100;
net.trainParam.epochs = 100000;
% net.trainParam.time = 115;
net.trainParam.goal = 0.003;
% train the network
net = train(net,p,t);
% simulate the trained network
y = sim(net,p)

APPENDIX B(b)
Program for developing ANN model for Maximum Temperature

% Training for Maximum temperature
% Creating the feed forward network
net = newff([0.0000 0.4432; 0.0172 0.0340; 0.0380 0.0980; 0.0160 0.0920; 0.0000 0.0185; 0.0007 0.0116], ...
    [8,10,12,1], {'tansig','tansig','tansig','tansig'}, 'trainscg');
% Enter the pattern and target
p = [0.0056
0.0265 0.0740 0.0440 0.0122 0.0066
     0.0738 0.0233 0.0800 0.0520 0.0065 0.0069
     ……………………………
     0.0272 0.0240 0.0930 0.0650 0.0041 0.0088
     0.0112 0.0229 0.0940 0.0640 0.0034 0.0078];
p = p';
t = [0.0381 0.0347 ……… 0.0328 0.032];
t = t';
% set initial weights & biases (matrix rows restored from the flattened listing)
net.IW{1,1} = [ 0.1830  0.9827 -0.8075  0.9702 -0.2300 -0.1041
                0.5743  0.6219 -0.7227  0.3095 -0.1905  0.4341
                0.9856 -0.7181  0.2101  0.5207 -0.6333  0.0682
               -0.9022 -0.7562  0.0774 -0.2469 -0.1813 -0.7190
               -0.6815 -0.3828 -0.3076  0.6001 -0.4974 -0.1432
                0.4774 -0.1611 -0.9719  0.5146  0.7935  0.6651
               -0.6774  0.9253 -0.9237  0.3601  0.5258  0.8976
                0.4031  0.8722  0.6227  0.5242  0.7912  0.1363];
net.LW{2,1} = [ 0.7155 -0.4739  0.8768 -0.2460 -0.1839 -0.0928  0.8877  0.8881
               -0.7649 -0.3336 -0.7183 -0.0781 -0.9393  0.4054 -0.8248  0.7563
               -0.0680  0.3087  0.3761  0.3995  0.4740 -0.4405 -0.8136 -0.7103
                0.3111  0.8226 -0.5903 -0.3393 -0.5643  0.8982  0.7213  0.7331
                0.2030 -0.8793 -0.6909  0.0901 -0.1530  0.3457  0.8256  0.0985
                0.5692  0.9367  0.4832 -0.5163 -0.0804 -0.3912 -0.7015  0.8838
                0.1234 -0.7528 -0.5495 -0.5984 -0.5395 -0.9752 -0.0063  0.9674
               -0.3328 -0.3471  0.7951  0.4012  0.8578 -0.0749 -0.0243  0.8066
               -0.1534 -0.7072 -0.8250  0.7536 -0.6620 -0.4152  0.7060  0.6676
                0.0302 -0.0263 -0.6002 -0.6233  0.6482 -0.9846 -0.2759  0.4045];
net.LW{3,2} = [-0.7892  0.7948 -0.5644  0.1354 -0.3702  0.8356 -0.5480  0.1078  0.4876 -0.4043
                0.5195 -0.4049  0.3418  0.7638  0.5408  0.1434  0.3398 -0.6380  0.5458  0.7920
                0.4372 -0.9754 -0.0865  0.4747 -0.0022  0.0400 -0.7902  0.3215  0.9271 -0.5078
                0.8674  0.9659  0.0349  0.5260 -0.0902  0.7690 -0.3179  0.4917 -0.2053  0.4285
               -0.6505  0.9109  0.1923  0.2575 -0.7420 -0.2751  0.8238 -0.5874  0.4226 -0.0624
                0.0962  0.0151  0.3048  0.2957 -0.6096  0.2236  0.0455  0.8854 -0.4382  0.6825
                0.6181  0.0326 -0.7992 -0.2536  0.4944 -0.3513  0.3241  0.8275 -0.3580  0.8405
               -0.2734  0.4844  0.2711 -0.7802 -0.6452 -0.1765 -0.3122 -0.7835 -0.0471 -0.7535
                0.8305  0.2894 -0.1686  0.3789  0.2623 -0.4946 -0.7184 -0.3682 -0.7482  0.6708
                0.3940 -0.7313  0.0449  0.0082  0.3848 -0.4528 -0.6386  0.4990 -0.8370 -0.6484
                0.4221 -0.4647 -0.9156 -0.8721 -0.3680 -0.4029 -0.6183  0.3187 -0.5224 -0.3911
                0.8366 -0.5711  0.9745  0.3079 -0.1101  0.1189  0.9379 -0.2884 -0.1105  0.2908];
% the last three entries (0.0952, 0.1070, 0.1474) appeared displaced further down
% in the original listing; a 1-by-12 output-layer weight vector requires 12 entries
net.LW{4,3} = [-0.0692 0.1473 0.1255 0.0563 0.2041 -0.0443 0.1328 -0.1395 0.1926 0.0952 0.1070 0.1474];
net.b{1} = [-0.2063; -0.9503; 0.0815; 0.0754; 0.5538; 0.1737; -0.3515; -0.0912];
net.b{2} = [-0.9913; 0.5159; 0.1396; -0.5876; -0.1088; 0.2714; 0.7654; -0.9691; -0.4134; 0.9016];
net.b{3} = [0.8224; -0.4396; -0.1421; -0.8167; 0.5647; -0.0764; 0.1147; -0.6510; 0.8410; 0.1498; 0.4667; 0.8328];
net.b{4} = [-0.0534];
% set the training parameters
net.trainParam.show = 100;
net.trainParam.epochs = 100000;
% net.trainParam.time = 115;
net.trainParam.goal = 0.00000117083;
% train the network
net = train(net,p,t);
% simulate the trained network
y = sim(net,p)

APPENDIX B(c)
Program for developing ANN model for Minimum Temperature

% Training for Minimum temperature
% Creating the feed forward network
net = newff([0.0236 0.0432; 0.0000 0.4432; 0.038 0.098; 0.016 0.092; 0.0000 0.0185; 0.0007 0.0116], ...
    [8,10,12,1], {'tansig','tansig','tansig','tansig'}, 'trainscg');
% Enter the pattern and target
p = [0.0381 0.0056 0.074 0.044 0.0122 0.0066
     0.0347 0.0738 0.08  0.052 0.0065 0.0069
     ……………………………
     0.0328 0.0272 0.093 0.065 0.0041 0.0088
     0.0320 0.0112 0.094 0.064 0.0034 0.0078];
p = p';
t = [0.0265 0.0233 ……… 0.024 0.0229];
t = t';
% set initial weights & biases (matrix rows restored from the flattened listing)
net.IW{1,1} = [ 0.5536  0.0430  0.2723 -0.3851  0.0014  0.0443
                0.9695  0.4651  0.5682  0.2038  0.1346  0.5991
                0.5100 -0.7832  0.7564  0.2457 -0.4487 -0.1632
                0.9535 -0.7101 -0.6487  0.1373 -0.7912 -0.1160
               -0.5104 -0.5715 -0.0948  0.2581  0.5745 -0.6275
               -0.2897 -0.4698 -0.2563  0.1590  0.3275 -0.2306
                0.6673  0.1785 -0.8307  0.7457  0.8579  0.9837
               -0.0878 -0.5797  0.6688 -0.0553  0.0985  0.9830];
net.LW{2,1} = [-0.6010  0.0488  0.4981  0.1298 -0.4060  0.4664  0.1657 -0.6048
               -0.1352 -0.8005 -0.9650  0.5459 -0.3057 -0.8117 -0.4227  0.2894
                0.8353  0.0013  0.7041  0.7583 -0.4720 -0.9661 -0.3515  0.7300
               -0.6360 -0.2446 -0.9679 -0.3216  0.7613 -0.4419 -0.9072  0.0408
                0.9010  0.4517  0.0144 -0.1730  0.2443 -0.7843  0.7669  0.1533
                0.3172  0.3161 -0.6449  0.0063 -0.4995  0.0023  0.8137  0.6739
                0.9079  0.9721  0.3042 -0.7417  0.6953  0.5513 -0.6486 -0.4126
                0.0550 -0.6494 -0.6131 -0.7381  0.3837 -0.7441  0.0888  0.9754
                0.8614 -0.6895  0.8226  0.0267 -0.4730 -0.8976 -0.4635 -0.6496
               -0.3869  0.2148 -0.2552  0.7695  0.5063  0.6647  0.3153  0.4653];
% the final rows of LW{3,2} and the last three entries of LW{4,3} appeared
% displaced further down in the original listing; they are restored here so that
% the matrices have their required 12-by-10 and 1-by-12 dimensions
net.LW{3,2} = [ 0.3018 -0.9096 -0.1047  0.6253  0.4674 -0.5164  0.1890  0.8921 -0.1206  0.8019
                0.6692 -0.2640 -0.2445 -0.9090  0.3149  0.7342  0.4179  0.5778 -0.8540 -0.9360
               -0.3564 -0.2943  0.0032 -0.1677  0.5503  0.4887  0.8478  0.2316  0.2608  0.0614
               -0.9227 -0.6927 -0.5703 -0.8114 -0.0176 -0.8090 -0.7305  0.3228  0.3272 -0.6274
                0.0439  0.8080 -0.0190 -0.4571  0.3789  0.0237 -0.9561 -0.3868  0.4882 -0.8610
                0.3212 -0.7708  0.2412  0.7825 -0.0516 -0.1919  0.0745 -0.4586  0.1078 -0.7197
               -0.4023  0.0042 -0.9565 -0.9942 -0.3228  0.7342  0.5418 -0.5751  0.4136  0.1952
                0.8292  0.2899 -0.8530 -0.6065 -0.0009 -0.7381  0.3235 -0.6587  0.5778 -0.2424
                0.5352  0.0719  0.0653  0.1451  0.0245  0.8829  0.5687 -0.5043  0.3079 -0.5644
                0.5371 -0.4090 -0.6623 -0.1085  0.0607  0.6793 -0.9470 -0.9720 -0.6044  0.5173
                0.4455 -0.7501 -0.9675  0.2503  0.4483  0.4314  0.0003  0.5707  0.5835  0.4133
                0.5935  0.2180 -0.1015  0.1169 -0.8114 -0.6446 -0.0937  0.4337 -0.5388 -0.3810];
net.LW{4,3} = [0.5012 -0.1792 -0.1174 -0.0282 0.3103 0.6469 0.0914 0.5573 -0.1817 0.0855 -0.0692 -0.3121];
net.b{1} = [-0.8705; -0.7203; -0.6073; 0.1588; 0.2553; 0.9526; -0.4646; -0.0111];
net.b{2} = [0.9235; 0.5283; -0.0836; 0.6827; -0.3115; 0.2728; 0.4533; -0.1677; 0.4202; -0.8918];
net.b{3} = [-0.8857; -0.5030; 0.1551; 0.8718; 0.5743; -0.0888; -0.0141; -0.5285; 0.8396; 0.3056; -0.5666; -0.8621];
net.b{4} = [0.3720];
% set the training parameters
net.trainParam.show = 100;
net.trainParam.epochs = 100000;
% net.trainParam.time = 115;
net.trainParam.goal = 0.000001717;
% train the network
net = train(net,p,t);
% simulate the trained network
y = sim(net,p)

APPENDIX B(d)
Program for developing ANN model for Relative
Humidity at 7 AM % Training for Relative Humidity at 7 AM % Creating the feed forward network net = newff([0.0236 0.0432; 0.0172 0.0335;0.0000 0.4432;0.016 0.092; 0.0000 0.0185; 0.0007 0.0116], [8,10,12,1],{'tansig','tansig','tansig','logsig'},'trainscg'); % Enter the pattern and target p = [ 0.0381 0.0265 0.0056 0.044 0.0122 0.0066 0.0347 0.0233 0.0738 0.052 0.0065 0.0069 ………………………………………………….. 0.0328 0.0240 0.0272 0.065 0.0041 0.0088 0.0320 0.0229 0.0112 0.064 0.0034 0.0078]; p = p'; t = [0.074 0.080 …..… 0.093 0.094]; t = t'; % set initial weights & biases net.IW{1,1} = [0.3993 0.0559 -0.5671 -0.5619 0.1204 0.4636 0.6472 0.8795 -0.3328 0.0796 0.8470 -0.5849 0.1825 0.0479 0.6551 -0.0231 -0.9869 -0.0046 -0.0719 0.1821 -0.4123 -0.0032 0.1074 -0.3365 0.0785 -0.3795 0.4819 -0.1454 0.8002 0.3035 0.6397 0.5071 0.4646 0.1765 -0.2689 0.4831 0.0287 0.0059 0.0119 0.0260 -0.5120 0.8247 -0.2506 0.3626 0.9474 -0.0514 -0.4404 -0.4746]; net.LW{2,1} = [0.9260 0.0727 0.2154 0.8846 0.1803 0.1020 -0.6001 0.9212 -0.0342 -0.8027 0.5245 -0.2896 -0.0707 -0.0686 0.1523 0.4920 -0.3674 -0.6659 -0.5690 0.9120 -0.3458 -0.7657 0.7953 -0.6038 0.7335 -0.2760 0.2560 0.1025 -0.7507 -0.5497 0.4720 0.0999 0.1976 0.5876 0.1307 0.1320 0.0643 0.1058 0.7823 0.6258 -0.7387 -0.8299 -0.7319 0.7221 -0.9057 0.3150 0.5930 -0.7231 0.1501 -0.3159 -0.0717 0.5868 -0.2530 -0.6144 0.6100 -0.3937 0.7887 -0.7121 -0.5279 0.2268 -0.8491 -0.2602 -0.3849 0.7716 0.1960 -0.2286 -0.3955 0.8952 0.6934 0.4108 -0.0136 -0.2832 -0.0316 -0.1811 0.8871 -0.8321 -0.7273 -0.3724 -0.7060 -0.6157]; net.LW{3,2} = [0.9303 -0.0139 0.0380 0.4166 0.7025 0.5978 -0.4936 -0.2215 -0.6515 0.8029 -0.1378 -0.6000 0.2000 -0.0268 0.6782 -0.2308 0.4191 0.0127 0.0514 0.5411 -0.1286 -0.7915 0.7275 -0.4092 -0.1055 0.6563 -0.1320 0.0308 -0.5607 0.4708 -0.0443 -0.2298 -0.4990 0.8830 0.8076 -0.3750 -0.1501 0.7050 0.8523 -0.6972 0.7379 0.3482 -0.5548 -0.5223 -0.1830 -0.6471 -0.4609 -0.8649 -0.5113 0.5506 -0.5910 0.3072 -0.2210 0.4513 -0.3957 0.8798 -0.2536 
0.3912 -0.7545 0.2224 0.5711 0.3076 0.7320 0.5271 0.8774 -0.6874 -0.9355 0.7616 -0.4140 0.7713 -0.1706 -0.3568 0.1034 -0.3087 0.2274 -0.4005 0.0936 -0.0178 -0.2205 0.4447 -0.4210 -0.3363 0.2312 -0.2641 -0.8573 -0.7018 -0.6343 0.6316 -0.2708 0.6061 -0.5789 0.2237 0.7362 -0.1129 0.7386 -0.0261 0.3808 0.3231 0.7188 -0.6021 0.3247 -0.1514 net.LW{4,3}= [0.9768 -0.8569 -0.9742 0.4239 0.5066 0.2937 0.0338 -0.5995 -0.4561]; net.b{1} = [-0.0096 -0.1848 -0.9375 0.2745 -0.1984 -0.9723 0.2074 0.6717]; net.b{2} = [-0.1418 0.2738 0.3458 -0.6403 -0.2777 0.0127 0.8046 0.0245 0.5307 0.0546]; net.b{3} = [-0.8557 0.5106 0.2926 0.8038 -0.5511 -0.1513 0.1840 -0.2164 0.7711 0.1041 0.7744 0.8390]; net.b{4} = [ -0.7024]; % set the training parameters 0.5864 -0.0388 0.3095 0.7165 0.3825 0.5295 -0.4221 0.8255 -0.0006 0.7476 -0.5177 0.3511 0.6002 -0.3358 0.5794 0.7911 0.6939 -0.2067 0.5844 -0.1898]; 0.1514 net.trainParam.show=100; net.trainParam.epochs=100000; % net.trainParam.time=115; net.trainParam.goal=0.0000166; % train the network net =train(net,p,t); % simulate the trained network y = sim(net,p) APPENDIX B(e) Program for developing ANN model for Relative Humidity at 2 PM % Training for Relative Humidity at 2 PM % Creating the feed forward network net = newff([0.0236 0.0432; 0.0172 0.0335;0.038 0.098;0.0000 0.4432;0.0000 0.0185; 0.0007 0.0116],[8,10,12,1],{'tansig','tansig','tansig','tansig'},'trainscg'); % Enter the pattern and target p = [0.0381 0.0265 0.0740 0.0056 0.0122 0.0066 0.0347 0.0233 0.0800 0.0738 0.0065 0.0069 ………………………………………………….. 0.0328 0.0240 0.0930 0.0272 0.0041 0.0088 0.0320 0.0229 0.0940 0.0112 0.0034 0.0078]; p = p'; t = [0.044 0.052 …….. 
0.065 0.064];
t = t';
% set initial weights & biases
net.IW{1,1} = [0.7359 0.1843 0.9574 0.0541 -0.5302 0.1915
-0.1483 -0.3240 0.5386 -0.6799 -0.2885 0.4064
-0.0541 0.8281 -0.4575 -0.1265 -0.6093 0.3627
0.2008 0.0897 0.7760 -0.4324 -0.3566 0.3755
0.7916 -0.1845 0.6725 -0.5910 0.9761 -0.5051
0.1901 0.5141 -0.5460 0.3930 -0.2015 -0.4294
-0.3855 0.9105 -0.0842 0.0241 0.6178 -0.1599
0.7207 0.5913 0.8029 -0.2480 0.5532 -0.3286];
net.LW{2,1} = [-0.1515 0.0597 0.3223 0.0325 0.4205 0.0592 0.0392 0.9734
-0.5860 0.8262 0.1230 -0.0356 0.2368 0.3968 -0.8404 -0.6009
0.4300 -0.7895 -0.4803 0.2303 -0.6325 -0.0268 0.9145 0.2492
0.2832 0.7708 0.6287 0.4624 -0.9252 -0.2994 -0.8239 0.5658
-0.0173 -0.0276 -0.5713 -0.7164 0.0090 0.1134 0.2793 0.6552
0.3992 -0.2867 0.8809 -0.5434 0.7010 -0.6220 -0.1316 0.8412
-0.1605 -0.0369 0.3897 -0.4651 0.7298 -0.2298 0.5777 -0.4485
-0.0646 -0.6122 0.1675 0.5054 0.1979 -0.4009 -0.0385 -0.0661
-0.2886 0.5084 0.1057 0.4251 -0.4550 0.7736 0.0845 -0.9929
0.6689 0.2743 -0.4505 0.5538 -0.4464 0.3179 0.4715 -0.5543];
net.LW{3,2} = [0.6800 -0.3204 0.5834 -0.5676 -0.1674 0.6121 0.6261 -0.4545 0.6760 -0.7316
-0.1814 0.5979 0.6621 0.6532 0.6253 -0.5038 0.7562 -0.2141 0.1145 -0.5323
-0.0929 0.7605 0.7157 -0.8474 -0.5593 -0.6711 -0.0027 -0.3229 -0.1120 0.5892
0.1920 -0.6137 -0.3324 0.4786 -0.7747 0.7508 0.3353 -0.8811 -0.5460 -0.7564
0.1314 0.5747 -0.0391 0.6691 -0.7323 -0.4133 0.8818 -0.3188 0.3723 -0.8967
0.1848 -0.2698 0.4607 -0.8323 -0.0265 -0.7108 -0.8618 -0.5132 -0.2076 0.2117
-0.7576 0.8972 -0.8588 0.0226 0.5123 0.4206 0.6499 0.4395 0.5257 0.2483
-0.5329 -0.5751 -0.8528 -0.5422 0.5042 -0.0519 -0.4689 -0.8644 -0.6629 -0.4070
0.0594 0.7665 -0.3846 -0.0663 0.0937 0.4045 -0.2184 0.3992 -0.1855 0.6990
0.6108 -0.8160 -0.4312 -0.0655 -0.8265 -0.1432 -0.0616 0.1586 -0.6882 0.4354
-0.4937 0.2059 0.5420 -0.2606 -0.9036 0.8971 -0.4698 0.8181 -0.6451 -0.1679
0.6038 -0.1116 0.6119 0.0779 0.5170 0.1730 -0.1262 0.8056 0.6722 0.3270];
net.LW{4,3} = [-0.0766 -0.4366 -0.3554 -0.8038 -0.3609 0.1600 -0.5794 -0.1973 0.0354 0.6728 -0.8659 -0.8034];
net.b{1} = [-0.9832; 0.0068; 0.9386; -0.4354; -0.7773; 0.4431; 0.8719; -0.0728];
net.b{2} = [0.8723; 0.5704; -0.0178; -0.3930; -0.0104; 0.3372; -0.6547; -0.7064; -0.4553; 0.5818];
net.b{3} = [-0.7931; 0.4295; 0.3078; -0.6710; -0.4580; -0.1164; 0.0229; 0.5187; -0.0030; 0.2545; 0.4883; 0.7336];
net.b{4} = [-0.1122];
% set the training parameters
net.trainParam.show = 100;
net.trainParam.epochs = 100000;
% net.trainParam.time = 115;
net.trainParam.goal = 0.00003155;
% train the network
net = train(net,p,t);
% simulate the trained network
y = sim(net,p)

APPENDIX B(f)
Program for developing ANN model for Pan Evaporation

% Training for Pan Evaporation
% Creating the feed forward network
net = newff([0.0236 0.0432; 0.0172 0.0335; 0.0380 0.0980; 0.016 0.092; 0.0000 0.4432; 0.0007 0.0116], [8,10,12,1], {'tansig','tansig','tansig','logsig'}, 'trainscg');
% Enter the pattern and target
p = [0.0381 0.0265 0.074 0.044 0.0056 0.0066
0.0347 0.0233 0.08 0.052 0.0738 0.0069
……………….………………………………..
0.0328 0.024 0.093 0.065 0.0272 0.0088
0.032 0.0229 0.094 0.064 0.0112 0.0078];
p = p';
t = [0.0122 0.0065 …….
0.0041 0.0034];
t = t';
% set initial weights & biases
net.IW{1,1} = [-0.4092 -0.9839 -0.5397 0.2725 0.5089 -0.3987
-0.8912 -0.6087 0.7342 -0.2753 -0.9905 -0.8467
-0.4415 0.8863 0.6940 -0.7260 -0.2904 0.6507
0.7042 -0.4639 -0.2243 -0.8664 -0.7879 -0.1902
-0.4116 0.9310 -0.9683 -0.8323 0.0182 0.4231
0.0729 0.9679 -0.3609 -0.6115 0.0171 0.4061
-0.2921 -0.6636 0.8394 0.8251 -0.8558 0.0283
0.8973 -0.5499 0.6815 -0.1661 -0.9342 0.4265];
net.LW{2,1} = [-0.0233 -0.7616 0.3765 0.8237 -0.2273 -0.6645 -0.6830 0.0294
-0.2323 0.6478 -0.5732 0.0491 -0.8631 -0.6252 -0.1364 -0.0355
0.6712 0.7559 0.7828 -0.9838 -0.1100 -0.1012 0.7689 -0.2783
-0.2397 0.1351 0.9838 -0.7191 0.2285 -0.1063 -0.5203 0.0590
0.8100 -0.6106 -0.5614 -0.5072 0.4814 -0.7074 -0.7250 0.3408
0.5096 -0.5903 0.6014 -0.6320 -0.6054 0.1244 -0.8258 0.0195
-0.8097 -0.6649 0.6742 -0.9578 0.5613 0.7347 0.1129 -0.0748
-0.0109 -0.9295 -0.1454 -0.7372 -0.5355 -0.3679 0.8520 -0.9246
0.6237 -0.6560 0.0031 -0.5621 0.1688 -0.8662 -0.3380 -0.6366
-0.9506 0.0001 -0.6194 0.0373 0.0526 -0.7463 0.2886 -0.2908];
net.LW{3,2} = [-0.0625 0.5352 0.3186 0.7122 -0.0415 0.5259 -0.7025 -0.0979 0.2204 -0.5865
0.7363 -0.2367 0.0692 0.8259 -0.2239 -0.3696 -0.6591 -0.2831 0.7458 -0.4855
0.1753 -0.6956 0.1714 -0.1048 0.4919 0.1439 -0.7313 0.8375 -0.6558 0.8587
0.1097 -0.2748 0.0113 0.0071 0.1185 -0.7124 0.4206 -0.5631 -0.9303 0.1635
0.3956 -0.8018 -0.5243 0.3055 -0.1133 0.6456 -0.3963 -0.9289 0.4450 0.4664
-0.0344 0.8625 0.4825 0.7005 -0.6820 0.4453 -0.5271 -0.4359 -0.3107 -0.2429
0.0025 -0.7645 0.2320 -0.4138 0.6134 -0.7840 0.0236 0.8105 0.5238 0.3795
-0.7037 0.2739 -0.0718 0.6901 0.8057 -0.8171 -0.1247 0.2110 -0.1482 0.1574
0.0832 -0.6319 -0.0688 -0.0815 0.7360 -0.2731 0.0732 0.6095 0.2183 -0.1434
-0.1513 0.0365 -0.2848 -0.1230 0.0706 0.8660 0.4661 0.1130 -0.6999 0.2671
-0.4168 0.2675 -0.1002 -0.0853 -0.2539 -0.3619 0.4522 -0.1427 0.1569 -0.8366
-0.1903 0.2990 -0.7653 -0.7780 -0.3581 0.9953 0.9215 -0.3592 0.0038 0.1498];
net.LW{4,3} = [0.4256 -0.0068 0.0028 0.0153 0.0247 -0.0619 -0.1291 -0.6879 0.0065 0.2692 0.0086 0.0468];
net.b{1} = [-0.9901; 0.1540; -0.0596; 0.9989; -0.8190; -0.4594; 0.0302; 0.3135];
net.b{2} = [-0.7062; 0.5665; -0.2220; 0.6866; -0.3450; 0.4251; -0.7662; -0.0620; 0.6925; -0.9648];
net.b{3} = [0.8081; -0.5466; -0.1421; -0.8709; -0.4226; 0.2599; 0.0083; -0.7717; -0.7710; -0.3026; 0.4866; -0.8168];
net.b{4} = [-0.8644];
% set the training parameters
net.trainParam.show = 100;
net.trainParam.epochs = 100000;
% net.trainParam.time = 115;
net.trainParam.goal = 0.00000285;
% train the network
net = train(net,p,t);
% simulate the trained network
y = sim(net,p)

APPENDIX B(g)
Program for developing ANN model for Bright Sunshine

% Training for Bright Sunshine
% Creating the feed forward network
net = newff([0.0236 0.0432; 0.0172 0.0335; 0.038 0.098; 0.016 0.0920; 0.0000 0.0185; 0.0000 0.4432], [8,10,12,1], {'tansig','tansig','tansig','purelin'}, 'trainscg');
% Enter the pattern and target
p = [0.0381 0.0265 0.074 0.044 0.0122 0.0056
0.0347 0.0233 0.08 0.052 0.0065 0.0738
…………………………………..
0.0328 0.024 0.093 0.065 0.0041 0.0272
0.032 0.0229 0.094 0.064 0.0034 0.0112];
p = p';
t = [0.0066 0.0069 ……..
0.0088 0.0078];
t = t';
% set initial weights & biases
net.IW{1,1} = [-0.8535 0.5669 -0.1547 -0.5520 -0.7675 0.4131
0.1915 -0.6454 -0.0462 0.3010 0.5427 0.9058
-0.2946 -0.9411 -0.2110 -0.2481 0.9858 -0.8979
-0.3001 -0.2426 0.4265 0.8768 0.2576 0.6613
-0.5004 0.0477 0.9804 0.8947 -0.7140 0.5993
0.0643 -0.8281 0.9127 -0.5197 0.1732 -0.8671
-0.3453 -0.5308 0.6877 0.0677 -0.6298 -0.6339
0.5272 0.6893 -0.9391 -0.9730 0.9129 0.3312];
net.LW{2,1} = [-0.0165 -0.8781 0.1787 0.4896 0.6519 0.3350 0.0164 0.5407
0.8321 0.9109 -0.5751 0.0363 0.3604 -0.4380 -0.0497 0.1764
0.6101 -0.1766 -0.1908 0.8318 0.6970 -0.1683 0.0181 0.5219
0.5407 -0.1219 -0.5670 0.4181 0.3672 -0.4965 -0.9079 -0.9077
0.7888 -0.9364 0.6827 -0.2635 0.2550 -0.7381 0.7438 -0.3497
-0.8629 -0.9476 -0.8186 0.4906 -0.4659 0.3562 0.4617 -0.5893
-0.3644 -0.6823 0.2343 -0.2374 -0.7231 -0.8462 -0.0139 0.2752
0.0644 0.3444 -0.3105 -0.6914 0.4428 0.5727 0.6512 -0.7346
0.1995 0.2863 -0.8312 0.7299 0.7664 0.0098 -0.0303 0.1676
0.3587 -0.1578 0.7947 -0.7489 0.6266 0.9098 -0.5373 -0.7023];
net.LW{3,2} = [-0.9401 -0.5537 -0.1126 -0.6935 -0.2994 -0.4936 0.0817 0.2900 0.8568 -0.3653
-0.7923 -0.0465 -0.5556 -0.2343 0.8661 0.2658 -0.9185 -0.0165 -0.3367 0.6481
-0.4383 0.5638 -0.2955 0.4648 -0.3946 -0.7700 0.5356 0.3340 -0.8727 -0.5542
0.0759 -0.3463 -0.1862 -0.9984 -0.2188 -0.5141 0.7060 0.2972 -0.9402 -0.2873
0.4266 0.4748 0.0738 0.1115 -0.1253 0.3665 -0.1424 -0.1122 -0.0345 0.3118
-0.0085 0.8326 0.5058 -0.6702 0.0791 0.6774 0.6242 -0.0511 0.9090 0.1599
0.1457 -0.3860 0.1273 -0.6735 0.0185 0.7677 0.2216 0.7718 0.6336 0.9643
0.5422 -0.0460 0.7534 0.3091 0.4671 -0.0825 -0.7114 -0.0155 -0.1920 -0.1915
0.4048 0.3088 0.1604 0.2299 -0.7419 0.7806 -0.7848 0.1871 0.4994 0.2946
-0.8022 -0.1262 0.4338 -0.6027 -0.6282 0.3462 -0.3962 0.3971 -0.6612 -0.2806
0.0607 0.6229 0.7158 0.1017 -0.6392 -0.4200 -0.1771 0.1633 0.6820 0.8914
0.3986 0.7341 -0.4737 0.6460 -0.3920 0.2116 -0.5327 -0.0280 -0.7218 0.4172];
net.LW{4,3} = [-0.2916 0.4238 -0.0932 0.1281 0.0077 0.0180 0.1889 -0.0356 -0.7170 0.0207 -0.0054 -0.0314];
net.b{1} = [0.2532; -0.6654; 0.5751; 0.6534; -0.5557; 0.6703; -0.3919; -0.2879];
net.b{2} = [-0.7839; -0.5423; -0.1824; -0.9420; -0.0624; -0.0237; -0.7544; 0.0924; 0.5790; 0.9473];
net.b{3} = [0.9031; 0.6027; 0.0509; -0.9412; -0.4695; -0.1232; 0.2131; -0.6329; 0.8309; -0.1721; 0.4098; 0.0188];
net.b{4} = [-0.7555];
% set the training parameters
net.trainParam.show = 100;
net.trainParam.epochs = 100000;
% net.trainParam.time = 115;
net.trainParam.goal = 0.00000219;
% train the network
net = train(net,p,t);
% simulate the trained network
y = sim(net,p)

APPENDIX C(a)
Program for developing Hybrid MLR_ANN model for Rainfall

% Training for Hybrid MLR_ANN for Rainfall
% Creating the feed forward network
net = newff([0.0236 0.0432; 0.0380 0.0980; 0.0160 0.0920; 0.0000 0.0185; 0.0007 0.0116], [8,10,12,1], {'tansig','tansig','tansig','logsig'}, 'trainscg');
% Enter the pattern and target
P = [0.0381 0.0740 0.0440 0.0122 0.0066
0.0347 0.0800 0.0520 0.0065 0.0069
…………………………………………..
0.0328 0.0930 0.0650 0.0041 0.0088
0.0320 0.0940 0.0640 0.0034 0.0078];
P = P';
T = [0.0056 0.0738 ……… 0.0272 0.0112];
T = T';
% set initial weights & biases
net.IW{1,1} = [-0.0165 -0.5602 -0.9254 0.5428 -0.0347
0.9585 0.4071 0.9759 -0.2952 -0.4694
-0.5800 0.0012 0.6181 0.6248 -0.2739
-0.4363 0.9256 -0.5187 -0.6159 -0.3227
-0.6768 -0.9767 0.6415 0.5018 -0.7421
0.8588 -0.5151 -0.5123 0.3539 -0.2847
0.6738 -0.8447 -0.4710 -0.8824 0.6447
0.2823 0.0128 -0.0896 0.0977 0.0253];
net.LW{2,1} = [0.4033 0.8710 0.0954 0.3346 0.0840 -0.4327 -0.8064 -0.4690
-0.2987 -0.0491 0.8702 0.1368 0.4376 0.8301 0.3531 -0.7863
0.8230 -0.9564 -0.8084 0.3470 -0.3655 -0.0281 -0.0353 0.1396
0.7490 -0.3869 -0.0854 0.8928 0.1007 0.2776 -0.1711 0.5790
0.5934 -0.5088 0.0043 0.6617 0.0539 0.8461 0.2708 -0.2750
0.7333 -0.7863 0.5775 0.8317 0.7859 -0.2347 -0.3770 -0.7176
-0.5908 0.8086 -0.0282 0.5866 -0.2306 0.9521 0.5944 0.0665
-0.4093 0.5681 0.5001 -0.0451 0.1344 0.9377 0.0803 -0.6780
-0.7031 -0.3047 -0.9889 0.5810 -0.1917 -0.4595 0.3169 -0.2686
0.3083 0.0619 0.2205 0.2461 -0.5910 -0.0581 0.4316 -0.4020];
net.LW{3,2} = [-0.2992 0.0736 0.1809 -0.1987 0.0344 0.3314 0.7406 -0.3810 0.0014 -0.3096
0.0585 -0.8555 0.8627 -0.8211 -0.0035 0.7772 -0.5794 -0.5452 -0.9230 0.2821
-0.9921 0.5174 0.3439 -0.4363 0.1567 -0.5887 0.2269 0.1921 -0.7470 0.2161
-0.2624 -0.1125 -0.3643 -0.5726 -0.6557 -0.4331 0.3500 -0.8120 -0.3295 -0.8066
0.8511 -0.2858 0.8399 -0.4140 -0.8231 -0.9964 0.9448 -0.1758 -0.5992 0.5170
0.6006 0.1218 0.2986 0.0855 -0.1872 0.8996 0.2479 0.5829 -0.5938 -0.0209
0.1483 -0.4478 0.3409 -0.0821 0.8429 -0.2655 -0.1589 0.2656 0.3838 0.6305
0.9359 -0.3821 0.5353 -0.0969 -0.9974 0.5099 0.1330 -0.4520 -0.4556 -0.3269
-0.7623 0.2730 0.1193 -0.0292 0.0892 0.0018 -0.5218 -0.7267 -0.1069 -0.7265
0.5796 -0.3841 -0.9472 -0.1770 0.7049 0.3017 0.2925 0.1331 0.8959 -0.9942
-0.2095 0.0033 -0.4574 0.7348 -0.2188 0.4866 0.5246 0.6487 -0.0395 0.9646
-0.1944 -0.2774 0.5098 0.9488 0.0063 -0.0366 0.4547 0.2978 0.6319 0.3724];
net.LW{4,3} = [0.9979 0.5627 -0.5817 0.9911 -0.4803 -0.7350 0.4904 0.1651 0.5294 0.0905 0.0321 0.5415];
net.b{1} = [0.5913; -0.7489; -0.1915; 0.0567; 0.0964; 0.5604; 0.1472; -0.0661];
net.b{2} = [-0.0586; 0.3628; -0.7196; 0.0973; 0.6545; -0.3077; 0.1530; -0.3279; -0.1812; 0.9241];
net.b{3} = [0.0867; 0.1228; 0.8659; 0.0213; 0.2901; 0.7938; -0.1097; 0.9280; -0.0195; 0.2999; 0.9186; -0.5942];
net.b{4} = [-0.6442];
% set the training parameters
net.trainParam.show = 100;
net.trainParam.epochs = 100000;
% net.trainParam.time = 115;
net.trainParam.goal = 0.003;
% train the network
net = train(net,P,T);
% simulate the trained network
y = sim(net,P)

APPENDIX C(b)
Program for developing Hybrid MLR_ANN model for Minimum Temperature

% Training for Hybrid MLR_ANN for Minimum temperature
% Creating the feed forward network
net = newff([0.0236 0.0432; 0.038 0.098; 0.016 0.092; 0.0007 0.0116], [8,10,12,1], {'tansig','tansig','tansig','tansig'}, 'trainscg');
% Enter the pattern and target
P = [0.0381 0.074 0.044 0.0066
0.0347 0.080 0.052 0.0069
………………………………..
0.0328 0.093 0.065 0.0088
0.0320 0.094 0.064 0.0078];
P = P';
T = [0.0265 0.0233 ..…….
0.0240 0.0229];
T = T';
% set initial weights & biases
net.IW{1,1} = [0.1089 -0.6727 -0.7924 -0.5323
0.2135 0.5008 0.7948 0.4625
-0.1546 0.2521 0.3830 -0.5170
0.2101 0.2383 0.2221 0.0331
-0.3497 0.8230 -0.5889 -0.5643
-0.8754 0.4537 0.1024 -0.7260
0.0623 -0.2027 0.3841 0.5354
-0.4644 -0.5981 -0.9038 0.2432];
net.LW{2,1} = [-0.4277 0.0179 0.9611 -0.2847 0.1105 -0.1778 -0.3126 -0.8705
0.4252 -0.0500 -0.8309 0.2823 0.0191 -0.7340 0.2093 0.9023
-0.3982 0.3143 0.6544 0.6092 -0.6228 -0.3388 0.8481 0.8864
0.0745 -0.1689 0.8537 -0.1139 -0.7820 0.1127 -0.8242 0.9426
-0.4584 0.9795 -0.6219 -0.5359 0.6711 -0.8649 -0.2676 0.5663
-0.3195 -0.1308 -0.4303 -0.6655 -0.6043 -0.7324 0.2152 -0.2900
-0.1622 0.4071 0.0196 0.3368 -0.7206 0.9249 0.7834 -0.0955
-0.1597 -0.2529 -0.9409 0.0042 -0.2452 0.3254 -0.5294 0.7569
-0.5632 -0.8681 -0.0340 0.7844 0.2597 -0.6888 -0.5017 -0.5120
-0.3300 -0.7983 0.9042 -0.5364 -0.2758 -0.3137 0.0577 -0.0872];
net.LW{3,2} = [0.0216 0.6615 -0.5375 0.0296 -0.0124 0.3568 -0.6221 -0.4996 0.2910 0.7199
0.0868 0.7066 -0.1379 0.0388 0.0877 0.4924 -0.6638 -0.5075 -0.6423 0.0039
0.0775 -0.4218 -0.6704 0.8214 0.1289 0.3250 -0.2822 0.2744 -0.7502 -0.5108
0.6607 0.0993 0.8013 0.4702 -0.4041 0.2922 -0.4714 0.2923 -0.2316 -0.6579
-0.8421 -0.3035 0.4866 0.4739 -0.3395 -0.1389 -0.4692 -0.2069 -0.2946 0.3367
0.4515 -0.1562 -0.5675 0.3154 0.6925 -0.8764 -0.1842 -0.4420 0.8363 -0.2085
-0.5962 -0.7712 -0.0588 0.5991 -0.3036 0.6456 -0.4865 -0.6244 -0.6184 -0.9430
-0.8884 0.1542 -0.1099 -0.9410 0.8017 0.4101 0.3107 0.0703 -0.4632 0.9070
-0.4986 -0.4603 0.5624 0.5252 0.4839 0.7567 0.1131 -0.7504 0.6066 0.8308
-0.0587 -0.4974 -0.9043 0.6578 0.5451 0.2982 0.4288 0.2797 -0.1239 0.0244
0.6421 0.7434 0.5822 -0.0453 -0.4225 -0.4394 0.7287 0.1186 0.5679 -0.9753
-0.6122 -0.2440 -0.6833 0.8350 -0.9318 0.3295 -0.0001 0.0673 0.1758 0.7662];
net.LW{4,3} = [-0.2025 0.2587 0.5378 0.1469 0.2400 0.4419 0.0444 0.6404 0.4148 0.4669 -0.3847 -0.0810];
net.b{1} = [-0.6335; -0.9164; 0.6704; -0.8871; 0.2425; -0.0358; 0.4393; 0.6078];
net.b{2} = [0.8546; -0.5453; 0.9412; -0.0433; 0.2023; -0.1785; -0.6180; -0.5442; 0.1806; -0.9093];
net.b{3} = [0.1424; 0.8841; 0.1328; 0.6196; 0.8312; -0.2198; 0.8906; -0.2929; 0.8971; 0.8391; 0.3129; 0.2420];
net.b{4} = [0.6705];
% set the training parameters
net.trainParam.show = 100;
net.trainParam.epochs = 100000;
% net.trainParam.time = 115;
net.trainParam.goal = 0.00000170076;
% train the network
net = train(net,P,T);
% simulate the trained network
y = sim(net,P)

APPENDIX C(c)
Program for developing Hybrid MLR_ANN model for Pan Evaporation

% Training for Hybrid MLR_ANN for Pan Evaporation
% Creating the feed forward network
net = newff([0.0236 0.0432; 0.0380 0.0980; 0.016 0.092; 0.0000 0.4432], [8,10,12,1], {'tansig','tansig','tansig','logsig'}, 'trainscg');
% Enter the pattern and target
P = [0.0381 0.074 0.044 0.0056
0.0347 0.080 0.052 0.0738
…………………………………
0.0328 0.093 0.065 0.0272
0.0320 0.094 0.064 0.0112];
P = P';
T = [0.0122 0.0065 .…….. 0.0041 0.0034];
T = T';
% set initial weights & biases
net.IW{1,1} = [-0.7272 0.9862 0.5016 -0.7559
0.3100 -0.3557 0.3844 0.3400
0.1637 0.4781 0.0540 0.2945
0.3590 0.2991 0.0643 -0.4392
-0.2047 0.3757 0.1413 -0.0442
0.8556 -0.1522 -0.6192 -0.2727
-0.2320 0.8968 -0.3116 -0.7358
0.5899 -0.6269 0.5045 0.3510];
net.LW{2,1} = [-0.2135 0.4473 -0.7220 -0.1285 0.0845 -0.7815 -0.2210 -0.0254
0.3368 0.8847 0.4520 -0.7329 0.8688 -0.6782 0.7876 0.0363
0.3701 -0.1084 0.6084 0.1106 -0.3238 0.2478 -0.1009 0.7651
0.0292 0.2022 0.4127 -0.4327 0.0904 0.2583 0.0766 -0.2817
-0.8854 0.6838 0.4883 -0.0478 -0.6398 -0.9361 0.8232 -0.3119
-0.6100 0.8362 0.3270 0.8323 -0.0021 0.7533 0.9206 -0.4554
-0.2608 -0.8379 0.7985 -0.5701 -0.2513 -0.9704 -0.3494 0.8006
-0.6165 0.8619 -0.7470 0.1843 0.5553 0.1230 -0.0332 0.4142
0.6901 0.1444 -0.1737 0.3201 0.2916 -0.9144 -0.0566 -0.7507
-0.2038 -0.7971 -0.5943 0.7631 0.8431 0.4336 -0.7287 0.0105];
net.LW{3,2} = [0.1557 0.4094 0.4130 -0.0939 -0.7884 -0.6147 -0.9166 0.1717 -0.8233 -0.5406
-0.5574 -0.0167 0.6716 ...
-0.1286 0.8273 0.0231 -0.4696 0.3720 0.1806 -0.0803
0.5380 -0.7225 0.1275 0.2300 -0.0864 0.3762 0.0812 -0.0660 -0.0953 0.0520
0.2769 -0.2214 0.7927 -0.5422 -0.1949 -0.8049 0.5569 0.1738 0.6784 -0.8245
-0.6488 -0.7821 0.4875 -0.7928 -0.2820 0.5986 -0.2970 0.8247 -0.3944 -0.0402
-0.7540 0.6614 -0.5384 0.5072 -0.4650 0.5057 0.6111 0.5349 -0.7210 -0.4863
-0.4538 0.8376 0.8278 0.0409 0.9084 0.4262 -0.8118 -0.5475 -0.6702 -0.6423
-0.3983 -0.0980 -0.6887 0.2576 0.3332 -0.5090 0.5357 -0.6282 0.7363 -0.7128
0.3516 -0.7168 -0.1879 0.6679 -0.7172 0.0069 0.7357 -0.4856 -0.4966 -0.9634
-0.0472 -0.4132 -0.1695 -0.2286 0.0003 -0.1881 -0.1011 0.1542 0.2805 0.3366
0.2556 0.4694 -0.3536 0.1578 -0.5193 0.5135 0.6031 0.3752 -0.0742 0.0552
0.5750 0.1297 0.4909 0.5988 0.7012 0.2281 -0.7378 0.8156 0.6793 0.9887];
net.LW{4,3} = [0.3270 -0.6599 -0.6565 0.0350 0.1762 0.1506 -0.4282 0.8604 0.4187 0.4022 0.4682 -0.1866];
net.b{1} = [0.4896; -0.6720; -0.4822; -0.3595; 0.0715; 0.2632; 0.8832; -0.1039];
net.b{2} = [0.7302; -0.4762; -0.9153; -0.6191; 0.1788; -0.1960; -0.5998; -0.9473; 0.4340; -0.7052];
net.b{3} = [-0.8028; 0.5738; -0.0520; -0.8137; 0.4453; 0.1055; -0.1666; -0.4724; -0.9803; 0.1444; -0.2031; -0.7824];
net.b{4} = [-0.3111];
% set the training parameters
net.trainParam.show = 100;
net.trainParam.epochs = 100000;
% net.trainParam.time = 115;
net.trainParam.goal = 0.00000285;
% train the network
net = train(net,P,T);
% simulate the trained network
y = sim(net,P)

APPENDIX C(d)
Program for developing Hybrid MLR_ANN model for Bright Sunshine

% Training for Hybrid MLR_ANN for Bright Sunshine
% Creating the feed forward network
net = newff([0.0236 0.0432; 0.0172 0.0335; 0.038 0.098; 0.016 0.0920; 0.0000 0.4432], [8,10,12,1], {'tansig','tansig','tansig','purelin'}, 'trainscg');
% Enter the pattern and target
P = [0.0381 0.0265 0.074 0.044 0.0056
0.0347 0.0233 0.080 0.052 0.0738
…………………………………………
0.0328 0.0240 0.093 0.065 0.0272
0.0320 0.0229 0.094 0.064 0.0112];
P = P';
T = [0.0066 0.0069 ……….
0.0088 0.0078]; T = T'; % set initial weights & biases net.IW{1,1} = [ 0.2535 0.8032 0.4855 0.0653 -0.2864 -0.2039 0.2511 -0.6987 0.6960 -0.7380 0.3083 -0.6499 -0.3556 -0.1896 -0.8882 -0.9468 -0.8670 -0.8440 0.9455 -0.1885 -0.3951 0.6232 0.2924 0.3258 0.3214 0.5504 0.1191 -0.2252 -0.9820 0.9634 -0.5582 0.1118 0.9788 0.1732 0.3051 0.5210 -0.3703 -0.7575 -0.5050 0.6340]; net.LW{2,1} = [ -0.8838 0.4402 -0.8666 0.9213 -0.1844 -0.3731 0.4570 -0.6265 0.8433 -0.8236 0.1431 0.8929 -0.3520 0.4636 0.3173 -0.9237 -0.4854 -0.1954 -0.7909 -0.0581 0.1324 -0.7104 0.0966 0.8494 -0.0146 0.5240 -0.4160 0.4259 0.3998 0.2870 0.6868 -0.7486 -0.2532 0.8772 0.5227 0.1875 0.7124 0.5905 0.8099 0.4093 -0.9875 -0.4068 -0.9776 0.8645 0.0461 0.2457 -0.7013 0.3713 -0.2534 0.2298 0.8699 0.5598 -0.1372 0.8524 -0.1342 -0.4537 -0.4086 -0.5370 0.5958 0.7701 -0.8375 0.8347 -0.3304 0.1210 0.4934 0.0580 -0.8741 0.3820 -0.9255 0.4894 -0.0820 0.3495 0.1691 -0.8404 -0.9516 0.0078 -0.4245 0.1088 -0.8149 -0.9201]; net.LW{3,2} = [ 0.3296 0.7735 0.0756 0.8328 0.7670 -0.4081 0.3805 -0.5196 0.4570 0.2741 -0.4748 0.0588 0.7263 -0.5607 0.2643 0.7994 0.5495 -0.7471 -0.3801 -0.1198 0.8962 -0.4212 -0.1303 0.6565 0.3993 -0.3729 -0.2272 0.7454 0.6997 0.8395 -0.5805 0.1323 -0.7514 0.4964 0.6184 0.1572 0.6246 0.5602 -0.5959 0.5630 -0.4586 -0.5891 0.6003 -0.5022 -0.6595 0.2273 -0.6526 -0.5330 -0.0464 -0.6340 -0.8230 0.8332 -0.5941 0.3858 0.3046 0.7668 0.1651 -0.1934 -0.7275 -0.0343 0.5300 -0.6276 0.2037 0.4591 0.5458 -0.1765 0.4139 -0.5261 0.8647 -0.6960 -0.6874 -0.6446 0.9723 -0.7997 -0.5703 -0.3138 -0.8361 0.8022 0.1301 -0.7174 0.2161 0.1939 0.7449 -0.8591 0.4944 -0.3420 -0.2602 -0.0357 0.1117 0.0230 -0.5189 -0.6313 0.1436 -0.5145 0.6738 -0.7721 -0.3001 -0.4725 0.3466 0.2635 0.1509 0.0777 net.LW{4,3} = [-0.0691 0.4847 0.0491 -0.4330 -0.4161 -0.0993 0.2504 0.3449 -0.1747]; net.b{1} = [0.9807 0.6875 0.5157 0.5344 -0.2808 -0.1161 -0.8841 0.6626]; net.b{2} = [0.1246 -0.4557 0.1187 0.7682 0.4334 0.0097 -0.5773 -0.2063 
0.5837 0.9203]; net.b{3} = [-0.8444 0.6469 -0.0153 0.7519 0.3512 0.1348 0.2617 -0.4950 0.8987 0.1456 0.4628 -0.8149]; net.b{4} = [-0.3443]; % set the training parameters 0.5363 -0.5147 -0.2824 0.4528 -0.3140 -0.1528 0.7358 -0.5239 0.3372 -0.6164 -0.0875 0.4186 0.4828 0.0139 0.6723 0.1270 0.0417 0.1827 -0.7798 0.7594]; 0.4180 net.trainParam.show=100; net.trainParam.epochs=100000; % net.trainParam.time=115; net.trainParam.goal=0.00000219; % train the network net = train(net,p,t); % simulate the trained network y = sim(net,p)
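Every listing in Appendices B and C follows the same recipe: build a four-layer feed-forward network with `newff`, assign the initial weights and biases, train with scaled conjugate gradient (`trainscg`) until the MSE goal or the epoch limit is reached, and read the fitted outputs with `sim`. As a minimal cross-language sketch of the forward pass that `sim(net,p)` performs (pure Python; the tiny 2-3-1 network and its weights are hypothetical stand-ins for the 6-8-10-12-1 architectures and thesis weights above):

```python
import math

def tansig(x):
    # MATLAB tansig: 2/(1 + exp(-2x)) - 1, i.e. tanh
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def logsig(x):
    # MATLAB logsig: 1/(1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def layer(W, b, x, f):
    """One layer: apply f(W*x + b) element-wise; W is a list of rows."""
    return [f(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]

def sim(weights, biases, transfer, x):
    """Propagate one input vector through all layers, like sim(net, p)."""
    for W, b, f in zip(weights, biases, transfer):
        x = layer(W, b, x, f)
    return x

# Hypothetical 2-3-1 network for illustration only.
weights = [[[0.4, -0.2], [0.1, 0.3], [-0.5, 0.7]],  # IW{1,1}: 3x2
           [[0.2, -0.1, 0.6]]]                       # LW{2,1}: 1x3
biases = [[0.1, -0.3, 0.0], [0.05]]
transfer = [tansig, logsig]

y = sim(weights, biases, transfer, [0.0381, 0.0265])
print(y)  # a single output in (0, 1), as with a logsig output layer
```

Replacing the final `logsig` with an identity function corresponds to the `purelin` output layers of the Bright Sunshine models, and with `tansig` to the Relative Humidity at 2 PM and Minimum Temperature models.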