Data Mining
By Susan Miertschin
3/24/2011

Data Mining – Task Types
- Data mining is useful for certain types of tasks
- As new algorithms are developed and evolve, new task types or extensions of existing task types may evolve

Data Mining – Task Types
- Classification
- Clustering
- Discovering Association Rules
- Discovering Sequential Patterns – Sequence Analysis
- Regression
- Detecting Deviations from Normal

Prediction versus Description
- The purpose of some tasks is to describe the status quo
- The purpose of some tasks is to be able to predict something based on something else

Supervised versus Unsupervised
- The purpose of some tasks is to describe the status quo
  - Techniques in this category are referred to as unsupervised
- The purpose of some tasks is to be able to predict something based on something else
  - Techniques in this category are referred to as supervised

Description and Examples
Data Mining – Task Types

Data Mining – Task Types
- Classification
  - Fit items into slots – Larson
  - Assign items in a collection to target categories or classes – Oracle book (http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/classify.htm)
- Clustering
- Discovering Association Rules
- Discovering Sequential Patterns – Sequence Analysis
- Regression
- Detecting Deviations from Normal

Classification
- Here is data about loan applicants. Which are good risks and which are poor risks?
- Which of our current customers are likely to increase their current rate of spending if given an affinity card?
- Which consumers are likely to buy a new cell phone product if I send them a direct mailing?

Classification
- Predictive or Descriptive?
- Supervised or Unsupervised?
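
As a minimal sketch of the loan-applicant question, the following assigns a new applicant to the class of the most similar past applicant (1-nearest-neighbor). The features, the records, and the scaling factor are hypothetical illustrations, not data from the slides, and nearest-neighbor is only a stand-in technique, since the slides do not name an algorithm at this point.

```python
import math

# Hypothetical history of past applicants:
# (annual income in $1000s, debt-to-income ratio) -> known risk class
TRAINING = [
    ((85.0, 0.10), "good"),
    ((72.0, 0.15), "good"),
    ((64.0, 0.20), "good"),
    ((30.0, 0.55), "poor"),
    ((25.0, 0.60), "poor"),
    ((38.0, 0.50), "poor"),
]


def classify(applicant, training=TRAINING):
    """Return the class of the closest past applicant (1-nearest-neighbor)."""
    def distance(a, b):
        # Multiply the ratio by 100 (an assumed scaling) so that both
        # features vary over a comparable numeric range.
        return math.hypot(a[0] - b[0], 100.0 * (a[1] - b[1]))

    _, nearest_label = min(training, key=lambda row: distance(applicant, row[0]))
    return nearest_label


print(classify((70.0, 0.18)))
print(classify((28.0, 0.58)))
```

The function only slots a new item into one of the classes already present in the history, which matches the "fit items into slots" description of classification.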

Classification
- Predictive
- Supervised

Classification – Predictive – Supervised
- Predict which loan applicants are good risks and which are poor risks
- Predict which of our current customers are likely to increase their current rate of spending if given an affinity card
- Predict which consumers are likely to buy a new cell phone product if I send them a direct mailing
- Predict which patients will respond better to treatment A or treatment B

Data Mining – Task Types
- Classification
- Clustering
  - Divide data into groups with similar characteristics – Larson
  - Find clusters of data objects similar in some way to one another – Oracle book (http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/clustering.htm)
- Discovering Association Rules
- Discovering Sequential Patterns – Sequence Analysis
- Regression
- Detecting Deviations from Normal

Clustering
- Find customers similar to each other based on geographical distance to nearest storefront location, number of small dogs owned, number of cats owned, and number of children in household
  - Purpose? Target niche markets, plan new stores
- Find cardiologists who are similar with respect to likelihood of prescribing a certain class of medication for treatment of congestive heart failure (based on hospital patient records) and patient mix demographics
  - Purpose? Target these cardiologists for a particular marketing effort related to a new pharmaceutical product

Clustering
- Predictive or Descriptive?
- Supervised or Unsupervised?
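
The customer-similarity example can be sketched with a plain k-means loop. This is one possible clustering technique among many; the slides do not prescribe it. The customer records, the two chosen features (miles to the nearest store, number of children), and the starting centers are hypothetical.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means: assign each point to its nearest center, then move
    each center to the mean of the points assigned to it."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Index of the nearest center by squared Euclidean distance.
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Recompute centers; an empty cluster keeps its previous center.
        centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical customers: (miles to nearest store, children in household)
customers = [(1.0, 0), (1.5, 1), (2.0, 0), (12.0, 3), (13.0, 2), (14.0, 4)]

# Starting centers are picked by hand here; real tools usually choose them randomly.
centers, clusters = kmeans(customers, centers=[customers[0], customers[3]])
for center, members in zip(centers, clusters):
    print(center, members)
```

No customer carries a label in advance; the groups emerge only from how close the records are to one another.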

Clustering
- Descriptive
- Unsupervised

Data Mining – Task Types
- Classification
- Clustering
- Discovering Association Rules
  - Find patterns in group membership – Larson
  - Find the probability of the co-occurrence of items in a collection – Oracle book (http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/market_basket.htm)
  - Produce dependency rules which will predict occurrence of an item or event based on occurrences of other items
- Discovering Sequential Patterns – Sequence Analysis
- Regression
- Detecting Deviations from Normal

Association
- Also called market basket analysis
- Customers who bought this book also bought this other book in the same purchase at what rate?
- Patients who were treated with drug X developed side effect B at a particular rate. What else did the side effect B people have in common?

Association
- Predictive or Descriptive?
- Supervised or Unsupervised?

Association
- Predictive
- Supervised

Association – Predictive – Supervised
- Predict which second book a customer choosing a first book might like
- Predict which patients undergoing treatment with drug X will develop side effect B

Data Mining – Task Types
- Classification
- Clustering
- Discovering Association Rules
- Discovering Sequential Patterns – Sequence Analysis
  - Predict future routes based on past routes – Larson
  - Given a set of objects, each associated with its own timeline of events, find rules that predict sequential dependencies among different events – Swaroop and Golden (http://www.rhsmith.umd.edu/faculty/bgolden/classes_links/2009_jan_data%20mining_BUDT%20758.pdf)
- Regression
- Detecting Deviations from Normal

Sequence Analysis
- A user is on web page A. What page is the user most likely to navigate to next?
- A customer buys a Kindle. What is the customer most likely to do next?
- Does the sequential pattern of events in an event log tell us anything about server outages? Network outages?
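
The web-page question can be sketched by counting which page directly follows which in past sessions and reporting the most frequent successor. This first-order counting approach is an assumed simplification, not a method named in the slides, and the sessions below are hypothetical.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each page is directly followed by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions


def most_likely_next(transitions, page):
    """Return (next_page, estimated_probability), or None if the page
    was never observed with a successor."""
    counts = transitions.get(page)
    if not counts:
        return None
    next_page, hits = counts.most_common(1)[0]
    return next_page, hits / sum(counts.values())


# Hypothetical click paths, one list per visitor session.
sessions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "C"],
    ["B", "C"],
]

transitions = build_transitions(sessions)
print(most_likely_next(transitions, "A"))
```

In these sessions page A is followed by B twice and by C once, so the sketch reports B with an estimated probability of two thirds.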

Sequence Analysis
- Predictive or Descriptive?
- Supervised or Unsupervised?

Sequence Analysis
- Predictive and Descriptive methods
- Supervised and Unsupervised methods

Sequence Analysis – Predictive and/or Descriptive

Data Mining – Task Types
- Classification
- Clustering
- Discovering Association Rules
- Discovering Sequential Patterns – Sequence Analysis
- Regression
  - Predict the value of a continuous variable (as opposed to a categorical variable) – Larson
  - Regression predicts a number – Oracle book (http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/regress.htm)
- Detecting Deviations from Normal

Regression
- Here is data about loan applicants. Which are good risks and which are poor risks?
- Which of our current customers are likely to increase their current rate of spending if given an affinity card?
- Which consumers are likely to buy a new cell phone product if I send them a direct mailing?

Regression
- Predictive or Descriptive?
- Supervised or Unsupervised?

Regression
- Predictive
- Supervised

Regression – Predictive – Supervised
- Predict which loan applicants are good risks and which are poor risks
- Predict which of our current customers are likely to increase their current rate of spending if given an affinity card
- Predict which consumers are likely to buy a new cell phone product if I send them a direct mailing
- Predict which patients will respond better to treatment A or treatment B

Data Mining – Task Types
- Classification
- Clustering
- Discovering Association Rules
- Discovering Sequential Patterns – Sequence Analysis
- Regression
- Detecting Deviations from Normal – Anomaly Detection
  - Identify cases that are unusual within homogeneous data – Oracle book (http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/anomalies.htm)

Detecting Deviations from Normal – Anomaly Detection
- Is there anything unusual about this pattern of credit card charges?
- Is there anything unusual about this event log that would indicate an unauthorized intrusion?
- Is there anything unusual about this pattern of accidents and treatments?

Detecting Deviations from Normal – Anomaly Detection
- Predictive or Descriptive?
- Supervised or Unsupervised?

Detecting Deviations from Normal – Anomaly Detection
- Predictive
- Supervised

Anomaly Detection – Predictive – Supervised
- This pattern of credit card charges implies a stolen credit card
- This pattern of events in the event log indicates an unauthorized intrusion
- This pattern of accidents and treatments indicates the likelihood of insurance fraud

Data Mining – Algorithms
- Different algorithms are available for different data mining tasks
- Different tools exist that implement different algorithms and different versions of algorithms

Algorithms Available in Analysis Services
- Decision Trees
- Linear Regression
- Naïve Bayes
- Clustering Algorithms
- Association Rules
- Sequence Clustering
- Time Series Analysis
- Neural Networks
- Logistic Regression
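
Returning to the credit card question from the anomaly detection slides, a minimal sketch of "identify cases that are unusual" is to flag charges that lie far from the typical charge. The version below uses the modified z-score based on the median and the median absolute deviation, with the commonly quoted cutoff of 3.5. This is a simple statistical stand-in, not one of the Analysis Services algorithms listed above, and the charge amounts are hypothetical.

```python
import statistics


def unusual_charges(charges, threshold=3.5):
    """Flag charges whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. It is less distorted by the outliers
    themselves than a mean-and-standard-deviation z-score.
    """
    median = statistics.median(charges)
    mad = statistics.median(abs(x - median) for x in charges)
    if mad == 0:
        # All typical values are identical; the score is undefined here,
        # so this sketch simply reports nothing.
        return []
    return [x for x in charges if 0.6745 * abs(x - median) / mad > threshold]


# Hypothetical recent charges on one card, in dollars.
charges = [23.5, 41.0, 18.2, 36.9, 29.4, 31.1, 27.8, 950.0]
print(unusual_charges(charges))
```

A flagged charge is only a candidate for review. Deciding that it implies a stolen card, as in the slide above, takes additional evidence.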