Final Exam Review
• The following is a list of items that you should review in preparation for the exam. Note that not every item in the following slides may be on the exam, and there may be items on the exam that are not in these slides.

Overview of three techniques
• Decision Tree
• Clustering
• Association Rule

What is classification?
• Determining to what group a data element belongs
– Or “attributes” of that “entity”
• Examples
– Determining whether a customer should be given a loan
– Flagging a credit card transaction as a fraudulent charge
– Categorizing a news story as finance, entertainment, or sports

What is Cluster Analysis?
Grouping data so that elements in a group will be
• Similar (or related) to one another
• Different (or unrelated) from elements in other groups
Distance within clusters is minimized
Distance between clusters is maximized
Image source: http://www.baseball.bornbybits.com/blog/uploaded_images/Takashi_Saito-703616.gif

Association Mining
Find out which items predict the occurrence of other items
Also known as “affinity analysis” or “market basket” analysis
Uses
• What products are bought together?
• Amazon’s recommendation engine
• Telephone calling patterns

Match Scenario with Data Mining Technique
• Which data mining technique (Decision Trees, Clustering, or Association Rules) would be most appropriate to answer each question below?
– What products are bought at the same time as Coke?
– What is the probability that a 57-year-old female in a low-income family will die of cancer?
– How many types of customers visit fresh grocery?

Interpret your model
• You should be able to interpret your model from two aspects:
– First, whether it is a good model
– Second, how you can use your model to help you answer a question or make a decision

Basic Statistic Information
• Be able to understand the basics of your data by looking at the explore window with descriptive statistics
– Distribution, average, range, etc.
– And what those numbers can tell you
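
As a rough illustration of the descriptive statistics listed above, the following Python sketch computes the average, median, and range of a small data set and counts values per bin, which is what a histogram displays. The spending values, the helper names `describe` and `histogram`, and the bin width are all assumptions made up for this example; they are not taken from the course data or from the tool used in class.

```python
from statistics import mean, median

# Hypothetical spending amounts, invented for illustration only.
spending = [5, 8, 10, 12, 12, 15, 18, 20, 25, 150]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "average": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width  # lower edge of the bin
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


stats = describe(spending)
bins = histogram(spending, 50)

print(stats)
for start, count in bins.items():
    # One '#' per observation in the bin.
    print(f"{start:>4} to {start + 49:<4} {'#' * count}")
```

In this made-up data the average (27.5) is about twice the median (13.5) because a single customer spends far more than the rest. Nine of the ten values fall in the lowest bin, so most people do not spend a lot.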
What can you tell from this histogram? Do most people spend a lot or not?

Decision Tree
• Whether it is a good model
– Use the Subtree Assessment Plot to find the Average Square Error and/or Misclassification Rate. A lower average square error and misclassification rate suggest a better model.
– Think about why these numbers can give you the optimal number of leaves.
• How to use your model
– Follow the tree path that matches the descriptions in your question.
Why is the optimal number of leaves 13?
What is the likelihood of a 52-year-old man with an affluence of 5 buying an organic product?

Cluster and Segment
• Whether it is a good model
– You want higher cohesion within each cluster and higher separation between clusters.
– A higher Root Mean Square Standard Deviation suggests lower cohesion. A higher distance to the nearest cluster suggests higher separation.
• How to use your model
– Be able to tell how each cluster differs from your overall result.
Which model is better in terms of cluster cohesion?
For each model, which cluster has the highest cohesion?
How may the maximum number of clusters in your model affect cohesion and separation?
Are the sales of stretch jeans in cluster 2 better than the average sales of stretch jeans in the entire population?

Association Rule
• Whether it is a good model
– Confidence: the chance that Y is bought when X has been bought
– Support: the chance that X and Y are bought together
– Lift: the ratio of confidence to the chance that Y is bought at all; equivalently, the ratio of support to the chance that X and Y would be bought together by coincidence
• How to use your model
– Be able to give suggestions based on your analysis
Is Coke often bought with beer or Pepsi? Why?
Can you suggest two products that should be placed close to each other?
Can you suggest two products that should not be placed together? Why?
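
The three association rule measures can be checked by hand on a tiny example. The sketch below is only an illustration under stated assumptions: the eight baskets are invented, and the function names `support`, `confidence`, and `lift` are hypothetical helpers, not part of the software used in class. It uses the standard definition of lift for a rule X → Y, which is confidence divided by the share of baskets that contain Y.

```python
# Hypothetical market baskets, invented for illustration only.
transactions = [
    {"coke", "chips"},
    {"coke", "chips"},
    {"coke", "chips", "beer"},
    {"coke", "pepsi"},
    {"chips", "beer"},
    {"beer"},
    {"pepsi"},
    {"pepsi", "beer"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(x, y, baskets):
    """Chance that y is in a basket, given that x is in it."""
    return support({x, y}, baskets) / support({x}, baskets)


def lift(x, y, baskets):
    """Confidence of x -> y relative to how often y is bought overall."""
    return confidence(x, y, baskets) / support({y}, baskets)


chips_support = support({"coke", "chips"}, transactions)
chips_confidence = confidence("coke", "chips", transactions)
chips_lift = lift("coke", "chips", transactions)
pepsi_lift = lift("coke", "pepsi", transactions)

print("coke -> chips:", chips_support, chips_confidence, chips_lift)
print("coke -> pepsi lift:", pepsi_lift)
```

With these made-up baskets, the rule coke → chips has support 0.375, confidence 0.75, and lift 1.5. A lift above 1 means the two items appear together more often than coincidence would predict, which supports placing them near each other. The rule coke → pepsi has a lift of about 0.67, below 1, so the two tend to substitute for each other instead of being bought together.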