Download Overview of Data Mining Methods (MS PPT)

Data Mining Tools Overview
Business Intelligence for Managers

Data Mining Definition Revisited
- Analysis of large quantities of data
- Knowledge discovery in databases
- Extracting implicit, previously unknown information from large volumes of raw data

Instances and Features
- Typically, the database will be a collection of instances
- Each instance will have values for a given set of features
- From database theory: instances are rows, features are columns

Classification
- Supervised learning
- Suppose instances have been categorized into classes and the database includes this categorization
- Goal: using the “knowledge” in the database, classify a given instance

Classifiers
[Figure: feature values X1, X2, X3, …, Xn go into the classifier, which outputs the category Y; the classifier draws on a DB collection of instances with known categories]

Classifier intelligence
- A classifier’s intelligence will be based on a dataset consisting of instances with known categories
- Typical goal of a classifier: predict a category for a new instance that is rationally consistent with the dataset

BI Examples
- A loans officer in a bank uses a system that automatically approves or disapproves a loan application based on previous loan applications and decisions
- An admissions officer in a university uses a system that automatically makes an admission decision (accept, reject, wait-list), based on previous applicants’ data and the decisions made on them

Data mining method example: k-nearest neighbors
- For a given instance T, get the top k database instances that are “nearest” to T
  - Select a reasonable distance measure
- Inspect the categories of these k instances and choose the category C that represents the most instances
- Conclude that T belongs to category C

Clustering (Chapter 5 of text)
- Unsupervised learning
- Classes/categories are not known, but unexpected groupings (clusters) are discovered
- Clustering provides insight into the population segments

[Figure: "Clustering", a plot of instances with Feature 1 and Feature 2 as the axes]

Goal of Clustering
- Input: the database of instances, and possibly some predetermined number of clusters
- Output: the same database of instances partitioned into clusters

BI Examples
- After clustering the current university student population, it was discovered that there is a large group of female marketing majors coming from a particular exclusive school who tend to get high grades
  - Business response: focus recruitment on that school; push the university’s marketing program
- Customer segment characteristics and spending patterns can direct business strategies

Data mining method example: k-means
- Guess the number of clusters (k)
- Guess cluster centers from the samples (these will be called centroids)
- Determine cluster membership based on the distance from the centroids
- Repeatedly refine the centroids by getting the average (mean) of the members of each cluster

Summary
- Two sub-areas of data mining: supervised (classification) and unsupervised (clustering) learning methods
- For both types of methods, intelligent systems can be created to support business decision making
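
The k-nearest neighbors and k-means steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method of any particular tool: the slides do not name a distance measure, so Euclidean distance is assumed here, and the function names (euclidean, knn_classify, k_means), the iteration cap, the random seed, and the toy loan and point data are all hypothetical.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Assumed distance measure: straight-line distance between feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(database, target, k=3):
    """Classify `target` by majority vote among its k nearest instances.

    `database` is a list of (features, category) pairs with known categories.
    """
    nearest = sorted(database, key=lambda row: euclidean(row[0], target))[:k]
    votes = Counter(category for _, category in nearest)
    return votes.most_common(1)[0][0]


def k_means(instances, k, iterations=100, seed=0):
    """Partition `instances` into k clusters; returns (centroids, clusters)."""
    rng = random.Random(seed)
    # Guess the initial centroids by picking k of the samples.
    centroids = [tuple(p) for p in rng.sample(instances, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Membership: each instance joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for inst in instances:
            idx = min(range(k), key=lambda i: euclidean(inst, centroids[i]))
            clusters[idx].append(inst)
        # Refinement: each centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved, so stop early
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical toy data: (income, existing debt) with a past loan decision.
loans = [
    ((50, 5), "approve"), ((60, 10), "approve"), ((55, 8), "approve"),
    ((20, 15), "reject"), ((25, 20), "reject"), ((18, 12), "reject"),
]
print(knn_classify(loans, (52, 7), k=3))  # prints "approve"

# Hypothetical toy data: two visibly separate groups of points.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```

In the loan example the three nearest past applications to (52, 7) were all approved, so the vote is unanimous. In k-means the result can depend on the initial guess of the centroids, which is why the seed is fixed here; with real data it is common to try several starting guesses and several values of k.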
