SEG 4630 E-Commerce Data Mining
Final Review

Hong Cheng
SEEM, Chinese University of Hong Kong
www.se.cuhk.edu.hk/~hcheng
May 22, 2017

Final
  Time/Location
    Time: 9:30-11:30 am, Dec. 15, Tuesday
    Location: 103 John Fulton Center
  Coverage: Chaps 2, 4-8
  You can bring two A4-size, double-sided cheat sheets
  Calculator IS needed.

Chapter 2 (1)
  Calculate data distribution
    Mean, median, variance and standard deviation
  Calculate distance between data objects
    Minkowski distance
    Distance between binary variables: symmetric and asymmetric
    Cosine similarity

Chapter 2 (2)
  Data normalization
    Min-max normalization
    Z-score normalization
    Decimal scaling
  Data reduction
    Dimensionality reduction methods
    Sampling

Chapters 4-5 (1)
  Decision tree
    Calculate information gain, gini index, gain ratio
  Bayes theorem and Naïve Bayesian
    Calculate probabilities from training datasets
  Lazy classifier and k-nearest neighbor
    Calculate based on different k values and different distance measures
    Differences between eager and lazy classifiers

Chapters 4-5 (2)
  Accuracy and error measures
    Training error vs. validation error
    Confusion matrix
  ROC curve
    True positive rate (TPR) and false positive rate (FPR)
    Area under curve (AUC)
  Evaluation methods
    Hold out
    Cross validation
  Ensemble, bagging: know the principle

Chapters 6-7 (1)
  Frequent patterns and association rules
    Support, confidence
    Generate association rules from frequent itemsets
  Apriori algorithm
    Candidate generation and test
      Self joining
      Pruning
    Database scan
  FP-growth algorithm
    Build FP-tree
    Extract conditional DB

Chapters 6-7 (2)
  Closed itemsets and maximal itemsets
  Lift/Interest measure
  Constraints
    Monotonic
    Antimonotonic
    Convertible constraints
  Sequence pattern mining: know the principle
    Max-gap
    Min-gap
    Max-span

Chapter 8
  K-means clustering
    Algorithm and calculation
    Advantages and disadvantages
  Hierarchical clustering: MIN, MAX, Group average
    Step-wise calculation
    Update distance matrix
    Advantages and disadvantages
  Density-based clustering
    Know the principle
  Evaluating clustering quality
    SSE, silhouette, entropy, purity

Questions?
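
Worked sketch for Chapter 2 (1), distance between data objects. The functions below are one possible way to practice the calculations, not code from the course. The function names and the q/r/s/t contingency counts are illustrative assumptions; the binary measures assume 0/1 vectors of equal length.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def cosine_similarity(x, y):
    """Cosine of the angle between x and y (undefined for a zero vector)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def binary_dissimilarity(x, y, symmetric=True):
    """Dissimilarity between two 0/1 vectors from their contingency counts."""
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # both 1
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # only x is 1
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # only y is 1
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)  # both 0
    if symmetric:
        return (r + s) / (q + r + s + t)
    # Asymmetric: 0/0 matches carry no information and are ignored.
    return (r + s) / (q + r + s)
```

For example, minkowski((0, 0), (3, 4), 2) is the familiar Euclidean distance of a 3-4-5 triangle, and the asymmetric measure is never smaller than the symmetric one because it drops the 0/0 matches from the denominator.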
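
Worked sketch for Chapter 2 (2), data normalization. This is a minimal illustration of the three methods named on the slide, with assumptions stated: the function names are made up for this example, z-score uses the population standard deviation (a textbook may use the sample version instead), and decimal scaling picks the smallest power of ten that brings every absolute value below 1.

```python
import statistics

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    scale = (new_max - new_min) / (hi - lo)  # assumes hi != lo
    return [(v - lo) * scale + new_min for v in values]

def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population, not sample
    return [(v - mean) / std for v in values]

def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer giving max |v'| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]
```

With values ranging from -986 to 917, decimal scaling divides by 1000, giving -0.986 and 0.917.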
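
Worked sketch for Chapters 4-5 (1), decision tree split measures. The code below is a hedged illustration of information gain, gini index and gain ratio for one categorical attribute. The function names and the toy data in the usage note are assumptions for practice, not material from the slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction from splitting on a categorical attribute."""
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    # Weighted entropy of the partitions after the split.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain divided by the split information of the attribute."""
    split_info = entropy(values)  # assumes the attribute has > 1 value
    return information_gain(values, labels) / split_info
```

A two-class set split evenly, such as ["y", "y", "n", "n"], has entropy 1 bit and gini 0.5. An attribute that separates those classes perfectly, such as ["a", "a", "b", "b"], yields an information gain of 1 bit.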
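
Worked sketch for Chapters 4-5 (2), confusion matrix and ROC quantities. This is a small illustration, under the assumption of a binary problem with the positive class encoded as 1. The function names and the dictionary keys are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FN, FP, TN) for a binary classification result."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return tp, fn, fp, tn

def rates(actual, predicted, positive=1):
    """Accuracy, true positive rate and false positive rate."""
    tp, fn, fp, tn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "tpr": tp / (tp + fn),  # fraction of actual positives found
        "fpr": fp / (fp + tn),  # fraction of actual negatives misflagged
    }
```

Each (FPR, TPR) pair obtained at one decision threshold is a single point on the ROC curve; sweeping the threshold traces the whole curve.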
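
Worked sketch for Chapters 6-7, support, confidence and lift. The code below is a minimal example assuming transactions are given as Python sets of item names. The function names and the toy basket data in the usage note are illustrative assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Lift of the rule; 1 means the two sides are independent."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))
```

Usage on a hypothetical four-transaction database:

```python
baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "diaper"},
]
print(support(baskets, {"diaper", "beer"}))       # 2 of 4 transactions
print(confidence(baskets, {"diaper"}, {"beer"}))  # 2 of the 3 with diaper
print(lift(baskets, {"diaper"}, {"beer"}))        # above 1: positive correlation
```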
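
Worked sketch for Chapter 8, one k-means iteration and SSE. This is a rough illustration of the assign, update and evaluate steps, assuming squared Euclidean distance, points given as tuples, and no empty clusters. All function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points]

def update(points, labels, k):
    """Recompute each centroid as the mean of its members."""
    centroids = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        # Assumption: every cluster has at least one member.
        centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return centroids

def sse(points, labels, centroids):
    """Sum of squared errors of points to their assigned centroids."""
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
```

Repeating assign and update until the labels stop changing gives the usual k-means loop; SSE does not increase from one iteration to the next.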