Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 2, Issue 12, December 2012) Study and Analysis of Decision Tree Based Irrigation Methods in Agriculture System Ravindra M1, V. Lokesha2, Prasanna Kumara3, Alok Ranjan4 1 Research Scholar, Dept. of Computer Scince, Singhania University, Rajasthan – INDIA 2 Acharya Institute of Technology, Bangalore – INDIA 3 Asst. Professor, Dept. of MCA, Cambridge Institute of Technology, Bangalore – INDIA 4 Asst. Professor, Dept. of CSE, Canara Engg. College, Mangalore – INDIA Data Mining is also known as Knowledge Extraction, Data/Pattern Analysis, Data Archeology, Data Dredging, Information Harvesting, Business Intelligence etc. Data Mining is not specific to one kind of data or media; instead, it is applicable to any kind of information in the repository [4]. Its fundamental objective is to provide insight and understanding about the structure of the data and its important features, and to discover and extract patterns contained in the data set. Data mining brings together a multitude of disciplines, such as database systems, statistics, artificial intelligence, data visualization, and others. The discovered knowledge can be applied to Information Management, Query Processing, DecisionMaking, Process Control and many other applications [5]. Abstract - Most of the energy coupling materials currently available have been around for decades, their use for the specific purpose of power harvesting has not been thoroughly examined until recently, when the power requirements of many electronic devices has reduced drastically. The objective of this research is to focus on the power source characteristics of various transducer devices in order to find some basic way to compare the relative energy densities of each type of device and, where possible, the comparative energy densities within subcategories of harvesting techniques. The successful implementation of decision tree helps to select the optimized utilization of the available methods in the irrigation to the fields to help farmer’s in making the decision in selecting the best suited pump set for irrigation. The various parameters such as irrigation types, the area coverage of the field in terms of acres, capacity of the motor being used for pumping the water, the height at which the water is being pumped etc., are considered while making the decision in the selection of best suited pump set for the irrigation. II. DATA M INING P ROCESS Data Mining is a convergence of three key technologies i.e. Increasing Computing Power, Statistical and Learning Algorithms, and Improved Data Collection and Management. The idea of Data Mining is drawn from Artificial Intelligence, Machine Learning, Pattern Recognition, Statistics and Database Systems [5]. The main techniques of data mining are Association Analysis, Clustering Analysis, Classification, Prediction, Time-Series Patterns and Bias Analysis. The steps in the data mining process are: i. Problem Definition: Defining the business problem is the first and the most important step in the Data Mining Process., ii. Data Collection and Enhancement: Involves the steps such as a. Define Data-Sources, b. Join and Deformalize-Data, c. Enrich Data, d. Transform-Data iii. Modeling Strategies: Fall into two categories: supervised learning and unsupervised learning, iv. Training, Validation, and Testing of Models: Partitioning data sets into one set of data used to train a model, another data set used to validate the model, and a third used to test the trained and validated model, v. Analyzing Results: Diagnosing and evaluating the results obtained from a Data Mining Model, vi. Modeling Iterations vii. Implementing Results. Keywords- Classification, Decision Trees, Data Mining techniques, irrigation system, harvesting techniques. I. INTRODUCTION The development of Information Technology lead to a large amount of data generation, and are stored in the repositories, thus most of the organizations become „data rich and information poor‟.[1] The conversion of huge volume of data into highly valued information, which are used to aid in decision making in various business processes. The extraction of knowledge from large sets of data help the business organizations to focus on most important information‟s in their data warehouses.[2] Data mining is the science of extracting useful information from large data sets or databases. The different kinds of information which could be mined are: i) Business Transactions Data, ii) Scientific Data, iii) Media and Personal Data, iv) Surveillance video and images, v) Spatial data, vi) World Wide Web repositories etc.[3] The Data Mining techniques are the result of long process of research and product development. 167 International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 2, Issue 12, December 2012) Test metrics are used to assess how accurately the model predicts the known values. If the model performs well and meets the business requirements, it can then be applied to new data to predict the future. Since the classification model uses the Decision Tree algorithm, rules are generated with the predictions and probabilities. III. C LASSIFICATION Classification is a data mining function that assigns items in a collection to target categories or classes. In data mining, classification is one of the most important tasks. It maps the data in to predefined targets. It is a supervised learning as targets are predefined. The aim of the classification is to build a classifier based on some cases with some attributes to describe the objects or one attribute to describe the group of the objects. Then, the classifier is used to predict the group attributes of new cases from the domain based on the values of other attributes [6]. The goal of classification is to accurately predict the target class for each case in the data. Classification technique is capable of processing a wider variety of data than regression and is growing in popularity. The classification task begins with a data set in which the class assignments are known. In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown. Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model; the other for testing the model. Scoring a classification model results in class assignments and probabilities for each case.[7] The basic classification techniques are: i. Decision Tree Based Methods, ii. Rule-Based Methods, iii. Neural Networks, iv. Naïve Bayes and Bayesian Belief Networks, v. Support Vector Machines etc. The Classification methods are used in many applications such as customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling etc. A classification model is tested by applying it to test data with known target values and comparing the predicted values with the known values. The test data must be compatible with the data used to build the model and must be prepared in the same way that the build data was prepared. Typically the build data and test data come from the same historical data set. A percentage of the records are used to build the model; the remaining records are used to test the model. IV. DECISION T REES Most Data Mining techniques are based on inductive learning, where a model is constructed explicitly or implicitly by generalizing from a sufficient number of training examples. Traditionally, data collection was regarded as one of the most important stages in data analysis. The number of variables selected was usually small and the collection of their values could be done manually. In the case of computer-aided analysis, the analyst had to enter the collected data into a statistical computer package or an electronic spreadsheet. Due to the high cost of data collection, people learned to make decisions based on limited information. Since the dawn of the Information Age, accumulating data has become easier and storing it is inexpensive [8]. It has been estimated that the amount of stored information doubles every twenty months unfortunately, as the amount of machine-readable information increases; the ability to understand and make use of it does not keep pace with its growth. Training Set Tree Induction Algorithm Induction Learn Model Deduction Model Test Set Apply Model Figure 1: Decision Tree Model 168 Decision Tree International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 2, Issue 12, December 2012) There are two main types of data mining: VerificationOriented (the system verifies the user‟s hypothesis) and Discovery-Oriented (the system finds new rules and patterns autonomously). Each type has its own methodology. In data mining, a Decision Tree is a predictive model which can be used to represent both Classifiers and Regression models. In Operations Research, on the other hand, decision trees refer to a hierarchical model of decisions and their consequences. The decision maker employs decision trees to identify the strategy most likely to reach her goal. When a decision tree is used for classification tasks, it is more appropriately referred to as a Classification Tree. When it is used for regression tasks, it is called Regression Tree. Classification Trees are used to classify an object or an instance to a predefined set of classes (such as risky/nonrisky) based on their attributes values. Classification Trees are frequently used in applied fields such as finance, marketing, engineering and medicine. Classification trees are usually represented graphically as hierarchical structures, making them easier to interpret than other techniques. The use of a decision tree is a very popular technique in data mining. In the opinion of many researchers, decision trees are popular due to their simplicity and transparency. Decision trees (DTs) are either univariate or multivariate [9]. Univariate Decision Trees (UDTs) approximate the underlying distribution by partitioning the feature space recursively with axis parallel hyperplanes. The underlying function, or relationship between inputs and outputs, is approximated by a synthesis of the hyper-rectangles generated from the partitions. Multivariate Decision Trees (MDTs) have more complicated partitioning methodologies and are computationally more expensive than UDTs [10]. In the process of determining the right choice for the harvesting for the farmers, the various parameters such as irrigation types, the area coverage of the field in terms of acres, capacity of the motor being used for pumping the water, the height at which the water is being pumped etc., The following points are to be considered during are considered while making the decision in the selection of best suited pump set for the irrigation: a. Selection of Power Supply: i. Single Phase ii. Three Phase, b. Area available for irrigation, iii. Head to lift water supply, iv. Additional head for sprinkler irrigation. v. Type of cultivation, vi. Source of water (Plenty or Limited). vii. Economy of the power (to decide the cost method of irrigation), viii. Distance from the transformer, ix. Type of the Pump set: a). Mono block, b). Submersible, c) New Open Source, x. Required season for irrigation etc. I S D H Q P L H Q P Q Q L P L P L Figure 2: Decision Tree I-Irrigation, S-Sprinkler, D-Drip, H-Height, Q-Quantity, P-Plenty, L-Limited Tree Classification Rules are : i. Type(Drip)^Distance(<100)^Height(<50)^Quantity (Limited) ^ Acres(<5) => 2HP Good ii. Type(Drip)^Distance(>100)^Height(<50)^Quantity (Plenty)^Acres(<10) => 5HP Good iii. Type(Sprinkler)^Distance(<100)^Height(<100)^Quant ity(Limited)^Acres(<10) => 5HP Good iv. Type(Sprinkler)^Distance(>100)^Height(<50)^Quantit y(Plenty)^Acres(<5) => 2HP Good. The Information Gain for the Decision Tree is calculated using the formula I=(Log 2 N= log N/log 2). So the information gain for the left and right hand side of the tree is: P=35, N=29, I(P,N) = -(35/(35+29))log2 (35/(35+29)) (29/(35+29))log 2 (29/(35+29)) = -0.547 log 2 (0.547) - 0.453 log 2 (0.453) = -0.547 * -0.87 - 0.453 * -1.142 = 0.476+0.517 = 0.993. (Left) P=18, N=14, I(PSub1,NSub1) = -(18/(18+14))log2 (18/(18+14)) = (14/(18+14))log 2 (14/(18+14)) = -0.56 log 2 (0.56) - 0.44 log 2 (0.44) = -0.56 * -0.837 - 0.44 * -1.18 = 0.469+0.5192 = 0.9882 169 International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 2, Issue 12, December 2012) (Right) : P=17, N=15, I(PSub1,NSub1)= -(17/(17+15))log2 (17/(17+15)) (15/(17+15))log2 (15/(17+15)) = -0.53 log 2 (0.53) - 0.47 log 2 (0.47) = -0.53 * -0.92 - 0.47 * -1.089 = 0.4876+0.512 = 0.9996 E(Type) = ((18+14)/64) * 0.9882 + ((17+15)/64) * 0.9996 = 0.5 * 0.9882 + 0.5 * 0.9996 = 0.4943 + 0.4998 = 0.9941 Therefore the Information Gain for the Irrigation Type is : I(Type of Irrigation) = 0.993 – 0.9941= -0.0011. Sprinkler->Distance-> Height…>Left P=4, N=4 = (4/(8))log 2 (4/(8))-(4/(8))log 2 (4/(8)) = 0.5log2(0.5))-0.5log2(0.5) == -0.5*-1 - 0.5*-1 = 1.0 Type I Drip->Distance->Left P=9, N=7 = -(9/(16))log 2 (9/(16))-(7/(16))log 2 (7/(16)) = -0.562log2 (0.562))-0.437log2 (0.437) = (-0.562*-0.831)-(0.437*-1.194) = 0.9888 Drip->Distance-> Right P=9, N=7 = (9/(16))log 2 (9/(16))-(7/(16))log 2 (7/(16)) = 0.562log2 (0.562))-0.437log2 (0.437) = (-0.562*-0.831)-(0.437*-1.194) = 0.9888 Sub Type P N Drip(D) Sprinkler(S) D->Distance(DD) S->Distance(SD) DD-Height(DDH) SD-Height(SDH) DDH-Qty(DDHQ) DDH-Qty(SDHQ) DDHQ->Plenty(DDHQP) SDHQ->Limited(SDHQL) 35 18 17 17 09 09 09 08 04 05 29 14 15 15 07 07 07 08 04 03 I(PSubi, NSubi) 0.993 0.9882 0.9996 0.9996 0.9888 0.9888 0.9888 1.0000 1.0000 0.9460 Table 1: Information Gain summary V. CONCLUSION Classification methods are typically strong in modeling interactions. The goal of classification result integration algorithms is to generate more certain, precise and accurate system results. Numerous methods have been suggested for the creation of ensemble of classifiers. The table lists the information gains with various types of parameters that could be used in the agriculture irrigation by a farmer in his/her fields. By comparing the various parameters considered, one could identify the best suitable method of irrigation for his/her fields, which could help them in the cultivation, and thus maximize the profit. Sprinkler->Distance-> Left P=9, N=7 = -(9/(16))log 2 (9/(16))-(7/(16))log 2 (7/(16)) = -0.562log2 (0.562))-0.437log2(0.437) = (-0.562*-0.831)-(0.437*-1.194) = 0.9888 Sprinkler->Distance-> Right P=8, N=8 = -(8/(16))log 2 (8/(16))-(8/(16))log 2 (8/(16)) = -0.5log2(0.5))-0.5log2(0.5) = -0.5*-1 - 0.5*-1 = 1.0 REFERENCES Drip->Distance-> Height…>Left P=4, N=4 = -(4/(8))log 2 (4/(8))-(4/(8))log 2 (4/(8)) = -0.5log2(0.5))-0.5log2(0.5) = -0.5*-1 - 0.5*-1 = 1.0 [1 ] Shen-Ming Gu, Yun Zheng, Lin-Ting Guan, Yue-ting Zhuang: “The Explore of Some Cases with Data Mining Techniques”, International Conference on Electronic Computer Technology, 2009, 978-0-76953559-3/09, DOI 10.1109/ICECT.2009.47. [2 ] Mrs. Bharati M. Ramageri: “Data Mining Techniques and Applications”, Indian Journal of Computer Science and Engineering, Vol. 1 No. 4 301-305, ISSN : 0976-5166. [3 ] B.N Lakshmi, G.H. Raghunandan: “A Conceptual Overview of Data Mining”, Proc. National Conference on Innovation in Emerging Technology, 2011, pp. 27-32. [4 ] Prasanna Kumara, Alok Ranjan : “Data Mining And Its Significance In Industrial Applications”, International Journal of Advanced Research in Computer Science, 2012, Vol 3, No. 2, ISSN No. 09765697. [5 ] Jochen Hipp, Ulrich Günter, and Udo Grimmer: “Data Quality Mining, unpublished. Drip->Distance-> Height…>Right P=5, N=3 = -(5/(8))log 2 (5/(8))-(3/(8))log 2 (3/(8)) = -0.62log2(0.62))-0.375log2(0.375)) = -0.62*-0.670 - 0.375*-1.415 = 0.9460 Sprinkler->Distance-> Height…>Right P=5, N=3 = (5/(8))log 2 (5/(8))-(3/(8))log 2 (3/(8)) 0.62log2 (0.62))-0.375log2 (0.375)) = -0.62*-0.670 - 0.375*-1.415 = 0.9460 170 International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 2, Issue 12, December 2012) [9 ] Abbass, H.A., Towsey M., &Finn G. (2001). C-Net: “A Method for Generating Nondeterministic and Dynamic Multivariate Decision Trees.” Knowledge and Information Systems: An International Journal, Springer-Verlag, 5(2). [10 ] D. Shanthi, Dr. G. Sahoo, Dr. N. Saravanan: Decision Tree Classifiers to Determine the Patient‟s Post-Operative Recovery Decision, International Journal of Artificial Intelligence and Expert Systems (IJAE), Volume (1): Issue (4). [6 ] Shelly Gupta, Dharminder Kumar, and Anand Sharma: Data Mining Classification Techniques Applied for Breast Cancer Diagnosis and Prognosis, Indian Journal of Computer Science and Engineering (IJCSE), Vol. 2 No. 2 Apr-May 2011, ISSN : 0976-516, Pg. No : 188-195. [7 ] Tan, Steinbach, Kumar : Data Mining Classification - on: Basic Concepts, Decision Trees, and Model Evaluation [8 ] Data Mining with Decision Trees - Theory and Applications: World Scientific Publishing Co. Pte. Ltd. 171