9th International Research/Expert Conference
“Trends in the Development of Machinery and Associated Technology”
TMT 2005, Antalya, Turkey, 26-30 September, 2005

WINE RECOGNITION ANALYSIS BY USING DATA MINING

Kivanc Kilicer
T.C. Bahcesehir University
Bahcesehir, Istanbul
Turkey

Adem Karahoca
T.C. Bahcesehir University
Bahcesehir, Istanbul
Turkey

ABSTRACT
The aim of this study is to evaluate and understand the indicators of wine quality by using data mining methods. We used the Java-based program “Weka” to compare the effects of 13 constituents found in each of three types of wine. Our wine recognition dataset contains the results of a chemical analysis of 178 wines grown in the same region of Italy but derived from three different cultivars; the analysis yielded 13 measurements per wine. This dataset is often used to test and compare the performance of various classification algorithms.
Keywords: Data Mining, Wine, Classification, Clustering, Discretization, Bayes, K-means

1. INTRODUCTION
The wine recognition dataset contains the results of a chemical analysis of 178 wines grown in the same region of Italy but derived from three different cultivars. It is often used to test and compare the performance of various classification algorithms. The analysis determined the quantities of 13 constituents found in each of the three types of wine [1, 5]. These are:
1) Alcohol
2) Malic acid
3) Ash
4) Alcalinity of ash
5) Magnesium
6) Total phenols
7) Flavanoids
8) Nonflavanoid phenols
9) Proanthocyanins
10) Color intensity
11) Hue
12) OD280/OD315 of diluted wines
13) Proline
By using data mining methods, our main aim is to evaluate these figures and to understand the indicators of wine classification.

2. MATERIAL AND METHOD
To evaluate our dataset, we use the pre-processing, classification and clustering methods of data mining. Association rule mining is not applicable because the attributes are numeric.
In pre-processing, our main aim is to summarize the data in the best way. With the help of discretization and range removal, we try to visualize the constituent ratios found in the three types of wine.
In classification, our main aim is to build decision trees from the data and to make predictions that are as close to reality as possible. We use the J48 classifier to build the trees. To understand entropy and gain values, we use Bayes theorem. To reach the best results we also use the SMO and Multilayer Perceptron classifiers [2]. During classification, we also make a prediction with the help of the Weka software.
In clustering, our main aim is to observe the effects of the variables in one chart. Cobweb and K-means are the methods we use to obtain the cluster means and to make judgements about the constituents.

3. RESULTS
We used 178 wines, of which 59 belong to “Class1”, 71 to “Class2” and 48 to “Class3”. In the figures, “Class1” is shown in blue, “Class2” in red and “Class3” in cyan. Our dataset yielded the following graphs of the 13 constituents present in these three classes:

Figure 1. Thirteen constituents in three different classes of wine quality

3.1. Preprocessing by Discretization
To get a clearer picture of our dataset, we discretized each constituent into 12 bins of equal frequency:

Figure 2. Discretizing the constituents into twelve bins

3.2 Classification by Bayes Theorem
Using the NaiveBayes classification method, we obtained prior probabilities P(h) of 33% for Class 1, 40% for Class 2 and 27% for Class 3. The Bayes classification method gave a good prediction rate. Posterior probabilities are also shown below.
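
The prior percentages above follow directly from the class counts reported in Section 3 (59, 71 and 48 wines out of 178). The short Python sketch below reproduces that arithmetic; it is only an illustration and not part of the Weka workflow used in the study, and the names `class_counts` and `class_priors` are hypothetical.

```python
# Class counts as reported in Section 3 (178 wines in total).
class_counts = {"Class1": 59, "Class2": 71, "Class3": 48}


def class_priors(counts):
    """Return the prior probability of each class as count / total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


priors = class_priors(class_counts)
for label, p in priors.items():
    # One decimal place; rounding to whole percentages gives 33, 40 and 27.
    print(f"{label}: {p:.1%}")
```

Rounded to whole percentages, the three values agree with the priors reported for NaiveBayes.
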
The entropy value is close to zero, which means that our predictions should hold almost no surprises [3].

Figure 3. The prediction of Bayes theorem can be accepted with 98.3 per cent

We also used other classification methods, namely Multilayer Perceptron [4] and SMO, and obtained good results with them as well. To make a prediction with SMO, we added an instance with “?” as its class value to the dataset and chose “output prediction” in the options menu. The last line of the dataset is:

13.42,4.65,2.55,20,93,3,.9,.47,1.32,4.3,.94,2.35,580,?

The table below shows that Weka’s prediction is “class 3” with 95% certainty. This means that a wine with these values for each constituent would be expected to belong to class 3.

Figure 4. Output prediction of SMO: the wine is assigned to class 3 when the constituent levels are 13.42, 4.65, 2.55, 20, 93, 3, 0.9, 0.47, 1.32, 4.3, 0.94, 2.35, 580 respectively

3.3 Decision Trees and Clustering
Our analysis with the Weka J48 decision tree showed an important relationship between flavanoids and color intensity. Color intensity is a good indicator of wine type. In addition, the amount of proline can affect the class if the flavanoid amount is above 1.57.
Using the Cobweb and K-means clusterers [6], we saw the same relationship between flavanoids and color intensity. Overall, flavanoids, proline and color intensity are the variables that most affect the class of a wine.

Figure 5. Clustering

4. CONCLUSIONS AND DISCUSSION
Our classification, clustering and decision tree methods show that the most important factors underlying the wine quality are the color intensity, flavanoid, alcohol and proline attributes.
On the other hand, the three classes are clustered in different areas, which means that the classification of the wine types is well made. Based on their content, the wines are more than 95% separable into the three classes. With this reliable analysis, it is now easier to determine the type of an unknown third-party wine, because our output prediction worked very well in the experiment described above. A future analysis may help us to divide the regions of Italy and other countries into different clusters according to the types of cultivation in their grape fields.

5. REFERENCES
[1] Kment, Petr and Mihalijevic, Martin, Differentiation of Czech Wines Using Multielement Composition – A Comparison With Vineyard Soil, Faculty of Science, Institute of Geochemistry, Mineralogy and Mineral Resources, Charles University, Czech Republic
[2] S. Aeberhard, D. Coomans and O. de Vel, Comparison of Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02 (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland
[3] S. Aeberhard, D. Coomans and O. de Vel, The Classification Performance of RDA, Tech. Rep. no. 92-01 (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland
[4] Forina, M. et al., PARVUS - An Extendible Package for Data Exploration, Classification and Correlation, Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy
[5] Ying, Guang-Guo and Williams, Bryan, Dissipation of Herbicides in Soil and Grapes in a South Australian Vineyard (1999), Department of Environmental Science and Management, University of Adelaide, Australia
[6] J. Leonard and P. Andrieux, Infiltration Characteristics of Soils in Mediterranean Vineyards in Southern France (1998), INRA, UFR Science du Sol, 2 Place Viala, 34060, Montpellier cedex 2, France
