From Data Mining To Knowledge Discovery: An Overview

Course: CIS 864
Class: 6
Presenter: MANMOHAN K. UTTARWAR
On: Monday, January 29, 2001

CIS 830: Advanced Topics in Artificial Intelligence
Kansas State University
Department of Computing and Information Sciences

Contents
- Reasoning
- Advances
- KDD: Definition
- Data Mining & KDD
- The KDD Process
- Stages of KDD
- Primary Tasks of Data Mining
- Components of Data Mining Algorithms
- Popular Data Mining Methods
- Application Issues of Data Mining
- Guidelines for Selecting Potential KDD Applications
- Research & Application Challenges for KDD

An Overview
- Explosive growth of business, government & scientific databases
- Ability to interpret & digest data
- Inappropriate tools & techniques for database analysis

Advances in Storage Technology
- Wal-Mart: 20 million transactions/day
- Health care: multi-gigabyte
- Mobil Oil: 100 TB of oil exploration data
- NASA: EOS generates 50 GB/hr of remotely sensed image data

KDD: Definition
The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
- non-trivial process: multiple process
- valid: justified patterns/models
- novel: previously unknown
- useful: can be used
- understandable: by human and machine

Data Mining & KDD
Data Mining
- Step in KDD process
- Consists of particular data mining algorithms
  • Under specified computational efficiency limitations
  • Produces a specific enumeration of patterns
The KDD process is the process of
using data mining methods (algorithms) to extract what is deemed knowledge, according to the specifications of measures and thresholds, using the database along with any required preprocessing, subsampling & transformations of that database.

The KDD Process
Preliminaries
- Developing an understanding of the application domain
- Creating a target data set
- Choosing the data mining tasks
Preprocessing
- Data cleaning and pre-processing
- Data reduction and projection
- Choosing the data mining algorithms
Data Mining Application
- Interpretation of mined patterns
- Consolidating discovered knowledge

[Figure: Stages of KDD]

[Figure: Tasks in KDD Process]

Primary Tasks of Data Mining
- Classification: finding the description of several predefined classes and classifying a data item into one of them.
- Regression: mapping a data item to a real-valued prediction variable.
- Clustering: identifying a finite set of categories or clusters to describe the data.
- Dependency modeling: finding a model which describes significant dependencies between variables.
- Deviation and change detection: discovering the most significant changes in the data.
- Summarization: finding a compact description for a subset of data.

[Figure: Primary Tasks of Data Mining]

Components of Data Mining Algorithms
- Model Representation: the language L describing discoverable patterns
- Model Evaluation: how well a pattern meets the criteria of the KDD process
- Search Methods
-- Parameter search
-- Model search

Popular Data Mining Methods
- Decision Trees & Rules
- Non-Linear Regression & Classification Methods
- Example-Based Methods
- Probabilistic Graphical Dependency Models
- Relational Learning Models

[Figure: Non-Linear & Nearest Neighbor Classifier]

[Figure: SEMMA Process (Simple DM Algo.)]
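
The example-based methods and the nearest neighbor classifier named in the slides above can be illustrated with a small sketch. This is not taken from the presentation: the function name `knn_classify`, the choice of Euclidean distance, the majority-vote rule, and the toy data are all assumptions made for illustration. In terms of the three components of a data mining algorithm, the stored examples play the role of the model representation, the vote among neighbors is the decision rule to be evaluated, and `k` is a parameter one would search over.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs, where features are
    equal-length tuples of numbers. Euclidean distance is an assumption
    of this sketch, not something the slides specify.
    """
    if not train:
        raise ValueError("training set must not be empty")
    # Order the stored examples by their distance to the query point.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Count the labels of the k closest examples and return the most common.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in the plane.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

print(knn_classify(train, (1.1, 1.0)))  # prints A
print(knn_classify(train, (5.1, 5.0)))  # prints B
```

No model is fitted ahead of time here; all work happens at query time, which is what makes the method example-based. On large databases the full sort per query becomes expensive, so a practical system would likely use an index structure instead.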

Application Issues of KDD
- Database marketing
- Analysis and selection of stocks
- Scientific applications such as
-- Astronomy
-- Molecular biology
-- Global climate change modeling
- Fraud detection & prevention

Guidelines for Selecting a Potential KDD Application
Practical Criteria
-- Potential for significant impact of an application
-- No good alternatives exist
-- Organizational support
-- Potential for privacy / legal issues
Technical Criteria
-- Availability of significant data
-- Relevance of attributes
-- Low noise levels
-- Confidence intervals
-- Prior knowledge

Challenges for KDD
- Larger databases
- High dimensionality
- Over-fitting
- Assessing statistical significance
- Changing data & knowledge
- Missing & noisy data
- Complex relationships between fields
- Understandability of patterns
- User interaction & prior knowledge
- Integration with other systems

Conclusions
- KDD: a desirable end product
- Many approaches exist, each with advantages & problems
- Barriers in obtaining quality data