Data Mining
Database Systems
Timothy Vu

Mining
Mining is the extraction of valuable minerals or other geological materials from the earth. Materials recovered this way include bauxite, coal, diamonds, iron, precious metals, lead, limestone, nickel, phosphate, rock salt, tin, and uranium. In a wider sense, mining also covers petroleum, natural gas, and even water. What is mined is usually something valuable, rare, or useful.

What is Data Mining
Data mining, also known as knowledge discovery in databases (KDD), is the process of automatically searching large volumes of data for patterns. To do this, data mining uses computational techniques from statistics, machine learning, and pattern recognition.
Machine learning: a method for creating computer programs by analyzing data sets.
Pattern recognition: classifying data (patterns) based on either a priori knowledge or statistical information extracted from the patterns.

Why Data Mining
Data mining helps individuals and companies find useful information in large amounts of data so they can make better decisions. It can help to:
- Reduce risks
- Find problems and issues
- Save money
- Make high-confidence predictions
- Simplify information

Discussion Topics
1) Classification
2) Regression
3) Association
4) Clustering

Classifiers
Decision-tree classifiers: each leaf node has an associated class, and each internal node has a predicate that decides which branch an instance follows.
Bayesian classifiers: estimate the distribution of attribute values for each class from the training data, then predict the class with the maximum probability.
Neural-net classifiers: use the training data to train artificial neural networks.

Regression
Regression deals with predicting a value rather than a class.
Linear regression: predict a value Y from attributes X1, ..., Xn with a linear polynomial Y = a0 + a1*X1 + ... + an*Xn. Finding the coefficients that fit the data best is a form of curve fitting.

Associations
Association mining finds associations or relationships between two or more items, usually expressed as rules of the form antecedent => consequent.
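To make the decision-tree classifier on the Classifiers slide concrete, here is a minimal sketch of how a tree classifies an instance. It starts at the root, applies each internal node's predicate, and stops at a leaf, whose class is the answer. The `Leaf`/`Node` types, the `classify` function, and the income/degree tree are hypothetical illustrations. The sketch covers only classification with a hand-built tree, not how a tree is learned from training data.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str  # the class associated with this leaf


@dataclass
class Node:
    predicate: Callable[[dict], bool]  # test on an instance's attributes
    if_true: "Tree"
    if_false: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, instance: dict) -> str:
    """Follow internal-node predicates from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.predicate(instance) else tree.if_false
    return tree.label


# Hypothetical hand-built tree (illustrative only, not taken from the slides).
tree = Node(
    predicate=lambda x: x["income"] > 50_000,
    if_true=Leaf("good"),
    if_false=Node(
        predicate=lambda x: x["degree"] in {"bachelors", "masters"},
        if_true=Leaf("average"),
        if_false=Leaf("bad"),
    ),
)

print(classify(tree, {"income": 80_000, "degree": "none"}))     # good
print(classify(tree, {"income": 30_000, "degree": "masters"}))  # average
```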
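The Bayesian classifier on the Classifiers slide does not say how the per-class attribute distributions are modelled. The sketch below assumes the common naive Bayes simplification, in which attributes are treated as independent within each class, and adds add-one (Laplace) smoothing so that an unseen value does not force a probability of zero. The weather data and the names `train_naive_bayes` and `predict` are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count classes and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    # value_counts[cls][i][v]: rows of class cls whose attribute i equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    domains = defaultdict(set)  # distinct values seen for each attribute index
    for row, cls in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[cls][i][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class with the maximum (unnormalized) posterior probability."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_cls, best_score = None, -1.0
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(cls)
        for i, v in enumerate(row):
            # Assumed naive independence plus add-one smoothing for P(value | cls)
            score *= (value_counts[cls][i][v] + 1) / (n_cls + len(domains[i]))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


# Hypothetical toy training data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)

print(predict(model, ("sunny", "hot")))   # no
print(predict(model, ("rainy", "cool")))  # yes
```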
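One common way to find the coefficients that "fit the data best" for the linear polynomial on the Regression slide is ordinary least squares, which minimizes the sum of squared prediction errors. Below is a minimal sketch for a single attribute, Y = a0 + a1*X. The function name `fit_line` and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a0 + a1 * x; returns (a0, a1)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx             # slope
    a0 = mean_y - a1 * mean_x  # intercept
    return a0, a1


# Hypothetical points lying exactly on y = 1 + 2x
a0, a1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a0, a1)       # 1.0 2.0
print(a0 + a1 * 5)  # predicted value at x = 5: 11.0
```

With several attributes X1, ..., Xn, the same least-squares idea applies, but the coefficients are usually found with a linear-algebra solver rather than these closed-form sums.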
Support: the fraction of the population that satisfies both the antecedent and the consequent of the rule. Example: milk => screwdrivers, a rule with low support because few purchases include both items.
Confidence: how often the consequent is true when the antecedent is true. Example: milk => bread.

Clustering
Clustering is the classification of similar objects into different groups. More precisely, it is the partitioning of a data set into subsets (clusters) so that the data in each subset ideally share some common trait, often proximity according to a defined distance measure.

Applications of Data Mining
1. Predictions
- Stock market
- Earthquakes
- NBA games
2. Association
- Store inventory
- Fashion trends
3. Descriptive patterns
- Disease analysis
- Image recognition
- Fraud detection

[Figure: Gather Data]
[Figure: Electrocardiogram]
[Figure: Disease Analysis]
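The support and confidence definitions on the Associations slide can be computed directly over a list of purchase transactions. In the minimal sketch below, each transaction is a set of items. The baskets are made-up toy data, and the names `support` and `confidence` are hypothetical.

```python
def support(transactions, antecedent, consequent):
    """Fraction of all transactions containing every antecedent and consequent item."""
    items = set(antecedent) | set(consequent)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    ante = set(antecedent)
    with_ante = [t for t in transactions if ante <= set(t)]
    if not with_ante:
        return 0.0  # the rule never applies
    both = sum(1 for t in with_ante if set(consequent) <= set(t))
    return both / len(with_ante)


# Hypothetical shopping baskets
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "screwdrivers"},
    {"bread", "eggs"},
    {"milk", "bread"},
]

print(support(baskets, {"milk"}, {"screwdrivers"}))  # 0.2
print(confidence(baskets, {"milk"}, {"bread"}))      # 0.75
```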
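The Clustering slide does not name an algorithm. One widely used choice is k-means (Lloyd's algorithm), which partitions points by proximity under Euclidean distance. It repeatedly assigns each point to its nearest centroid, then moves each centroid to the mean of its cluster. In the minimal sketch below, the function name, the sample points, and the choice of the first k points as initial centroids are assumptions. Practical implementations usually use random or k-means++ seeding instead.

```python
import math


def k_means(points, k, iterations=100):
    """Partition points into k clusters by Euclidean distance (Lloyd's algorithm)."""
    # Assumed deterministic initialization: the first k points become the centroids.
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two visible groups
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print([len(c) for c in clusters])  # [3, 3]
```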