Data Mining Overview
Business Intelligence

Data Mining Defined
Knowledge discovery in databases: extracting implicit, previously unknown information from large volumes of raw data.

Instances and Features
Typically, the database will be a collection of instances.
Each instance will have values for a given set of features.
From database theory: instances are rows, features are columns.

Classification
Supervised learning.
Suppose instances have been categorized into classes and the database includes this categorization.
Goal: using the “knowledge” in the database, classify a given instance.

Classifiers
[Figure: feature values X1, X2, X3, …, Xn are fed into a classifier, built from a DB (a collection of instances with known categories), which outputs a category Y.]

Classifier Intelligence
A classifier’s intelligence will be based on a dataset consisting of instances with known categories.
Typical goal of a classifier: predict a category for a new instance that is rationally consistent with the dataset.

BI Examples
A loan officer in a bank uses a system that automatically approves or disapproves a loan application based on previous loan applications and the decisions made on them.
An admissions officer in a university uses a system that automatically makes an admission decision (accept, reject, wait-list) based on previous applicants’ data and the decisions made on them.

Data mining method example: k-nearest neighbors
For a given instance T, get the top k database instances that are “nearest” to T.
Select a reasonable distance measure.
Inspect the categories of these k instances and choose the category C that represents the most instances.
Conclude that T belongs to category C.

Clustering
Unsupervised learning.
Classes/categories are not known in advance; instead, unexpected groupings (clusters) are discovered.
Clustering provides insight into population segments.
[Figure: instances plotted against Feature 1 and Feature 2, forming clusters.]

Goal of Clustering
Input: the database of instances and, possibly, a predetermined number of clusters.
Output: the same database of instances, partitioned into clusters.

BI Examples
After clustering the current university student population, it was discovered that there is a large group of female marketing majors coming from a particular exclusive school who tend to get high grades.
Business response: focus recruitment on that school; push the university’s marketing program.
Customer segment characteristics and spending patterns can direct business strategies.

Data mining method example: k-means
Guess the number of clusters (k).
Guess cluster centers from the samples (these are called centroids).
Determine cluster membership based on the distance from the centroids.
Repeatedly refine the centroids by taking the average (mean) of the members of each cluster, then re-determine membership, until the clusters stop changing.

Summary
Two sub-areas of data mining have been discussed: supervised learning (classification) and unsupervised learning (clustering) methods.
For both types of methods, intelligent systems can be created to support business decision making.
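The k-nearest neighbors steps described above can be sketched in Python as follows. This is a minimal illustration, not the presentation's own implementation: it assumes each instance is a tuple of numeric feature values and uses Euclidean distance as the "reasonable distance measure". The names `euclidean`, `knn_classify` and `loans`, and the loan data itself, are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors (one possible distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(db, t, k=3, distance=euclidean):
    """Classify instance t by majority vote among its k nearest instances in db.

    db is a list of (features, category) pairs: each pair is one row of the
    database, and the features are that row's column values.
    """
    # Rank the database instances by distance to t and keep the top k.
    nearest = sorted(db, key=lambda row: distance(row[0], t))[:k]
    # Count the categories of those k instances.
    votes = Counter(category for _, category in nearest)
    # Conclude that t belongs to the category C with the most votes.
    return votes.most_common(1)[0][0]


# Hypothetical loan history: (two made-up numeric features, past decision).
loans = [
    ((5.0, 1.0), "approve"),
    ((6.0, 0.5), "approve"),
    ((5.5, 1.5), "approve"),
    ((1.0, 4.0), "reject"),
    ((2.0, 3.5), "reject"),
]

print(knn_classify(loans, (5.2, 1.2), k=3))  # approve
print(knn_classify(loans, (1.5, 3.8), k=3))  # reject
```

The distance function is a parameter so a different measure can be swapped in. With two categories, choosing an odd k avoids tied votes.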
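The k-means steps described above can likewise be sketched in Python. This is a minimal sketch under stated assumptions: samples are tuples of numeric features, distance is Euclidean via `math.dist` (Python 3.8+), and the initial centroids are guessed by picking k samples at random. The names `kmeans` and `samples` and the sample points are hypothetical. The loop stops when memberships no longer change or after `max_iter` rounds.

```python
import math
import random


def mean(points):
    """Component-wise average of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(samples, k, max_iter=100, seed=0):
    """Partition samples into k clusters; returns (centroids, cluster index per sample)."""
    rng = random.Random(seed)
    # Guess the initial centroids by picking k distinct samples.
    centroids = rng.sample(samples, k)
    assignment = None
    for _ in range(max_iter):
        # Membership: each sample joins the cluster of its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in samples
        ]
        if new_assignment == assignment:
            break  # memberships stopped changing, so the clusters are stable
        assignment = new_assignment
        # Refine each centroid as the mean of its current members.
        for j in range(k):
            members = [p for p, a in zip(samples, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return centroids, assignment


# Hypothetical two-feature samples forming two well-separated groups.
samples = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(samples, k=2)
print(labels)  # the first three samples share one label, the last three the other
```

Because the result depends on the initial guess, practical implementations often run k-means several times from different starting centroids and keep the best partition.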