SEG 4630
E-Commerce Data Mining
— Final Review —
Hong Cheng
SEEM
Chinese University of Hong Kong
www.se.cuhk.edu.hk/~hcheng
May 22, 2017
E-Commerce Data Mining
Final Time/Location

Time: 9:30-11:30 am, Tuesday, Dec. 15
Location: 103 John Fulton Center
Coverage: Chapters 2, 4-8
You may bring two A4-size, double-sided cheat sheets.
A calculator IS needed.
Chapter 2 (1)

Calculate data distribution
    Mean, median, variance and standard deviation
Calculate distance between data objects
    Minkowski distance
    Distance between binary variables: symmetric and asymmetric
    Cosine similarity
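
As a revision aid, the quantities listed above can be sketched in a few lines of Python. This is only an illustration, not the course's reference solution: the function names and the sample numbers are assumptions made for the example, and the variance shown is the population form (dividing by N); use `statistics.variance` if the sample form (dividing by N-1) is expected.

```python
import math
import statistics

def minkowski(x, y, h):
    """Minkowski distance of order h (h=1: Manhattan, h=2: Euclidean)."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1.0 / h)

def binary_dissimilarity(x, y, symmetric=True):
    """Dissimilarity between two 0/1 vectors of equal length.

    q = both 1, r = 1 in x only, s = 1 in y only, t = both 0.
    Symmetric:  (r + s) / (q + r + s + t)
    Asymmetric: (r + s) / (q + r + s), i.e. 0-0 matches are ignored.
    Assumes the denominator is non-zero.
    """
    q = sum(1 for a, b in zip(x, y) if (a, b) == (1, 1))
    r = sum(1 for a, b in zip(x, y) if (a, b) == (1, 0))
    s = sum(1 for a, b in zip(x, y) if (a, b) == (0, 1))
    t = sum(1 for a, b in zip(x, y) if (a, b) == (0, 0))
    denominator = q + r + s + (t if symmetric else 0)
    return (r + s) / denominator

def cosine_similarity(x, y):
    """cos(x, y) = (x . y) / (||x|| * ||y||); assumes no all-zero vector."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical data, chosen only so the arithmetic is easy to check by hand.
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.mean(values))       # mean equals 5
print(statistics.median(values))     # median equals 4.5
print(statistics.pvariance(values))  # population variance equals 4
print(statistics.pstdev(values))     # population standard deviation equals 2

x_bin = [1, 0, 1, 0, 0, 0]
y_bin = [1, 0, 1, 0, 1, 0]
print(minkowski((1, 2), (3, 5), 1))                         # |1-3| + |2-5| = 5
print(minkowski((1, 2), (3, 5), 2))                         # sqrt(4 + 9)
print(binary_dissimilarity(x_bin, y_bin))                   # 1/6
print(binary_dissimilarity(x_bin, y_bin, symmetric=False))  # 1/3
print(cosine_similarity((1, 2), (2, 4)))                    # parallel vectors, close to 1
```

The asymmetric variant differs from the symmetric one only in dropping the 0-0 matches from the denominator, which is why it gives the larger value on the same pair of vectors.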
Chapter 2 (2)

Data normalization
    Min-max normalization
    Z-score normalization
    Decimal scaling
Data reduction
    Dimensionality reduction methods
    Sampling
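
The three normalization methods named above can be sketched as follows. The function names and the sample values are assumptions for illustration; the z-score version uses the population standard deviation, and all three assume the input is not constant (otherwise a denominator is zero).

```python
import statistics

def min_max(values, new_min=0.0, new_max=1.0):
    """Map [min, max] of the data linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer giving max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

# Hypothetical inputs for hand-checking.
print(min_max([10, 20, 30]))                # [0.0, 0.5, 1.0]
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))    # mean 5, standard deviation 2
print(decimal_scaling([-986, 917]))         # j = 3, so divide by 1000
```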
Chapters 4-5 (1)

Decision tree
    Calculate information gain, gini index, gain ratio
Bayes' theorem and Naïve Bayesian classifier
    Calculate probabilities from training datasets
Lazy classifier and k-nearest neighbor
    Calculate based on different k values and different distance measures
    Differences between eager and lazy classifiers
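
For the decision-tree calculations, a minimal sketch of the three split measures is given below. Each node is represented by a list of class counts, and a split is a list of such lists, one per child. The helper names and the counts (a parent with 9 positive and 5 negative records, split three ways) are illustrative assumptions, not data taken from the course.

```python
import math

def entropy(counts):
    """Entropy in bits of a node with the given class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def gini(counts):
    """Gini index of a node: 1 - sum of squared class proportions."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def weighted(measure, partitions):
    """Impurity after a split: child impurities weighted by child size."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * measure(p) for p in partitions)

def information_gain(parent, partitions):
    return entropy(parent) - weighted(entropy, partitions)

def split_info(partitions):
    """Entropy of the partition sizes themselves."""
    return entropy([sum(p) for p in partitions])

def gain_ratio(parent, partitions):
    return information_gain(parent, partitions) / split_info(partitions)

# Hypothetical counts: [positives, negatives] per node.
parent = [9, 5]
children = [[2, 3], [4, 0], [3, 2]]

print(round(entropy(parent), 3))                     # about 0.940
print(round(information_gain(parent, children), 3))  # about 0.247
print(round(gain_ratio(parent, children), 3))        # about 0.156
print(round(gini(parent), 3))                        # about 0.459
print(round(weighted(gini, children), 3))            # Gini index after the split
```

Gain ratio divides the information gain by the split information, which penalizes splits that produce many small children.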
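
For the k-nearest-neighbor item on this slide, the sketch below shows how the predicted class can change with k. It is an assumption-laden illustration: the training points, labels and function names are hypothetical, voting is by simple majority, and ties in the vote go to the label seen first among the nearest neighbors.

```python
from collections import Counter

def minkowski(x, y, h=2):
    """Minkowski distance of order h (h=1: Manhattan, h=2: Euclidean)."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1.0 / h)

def knn_predict(train, query, k, h=2):
    """Majority vote among the k training records closest to `query`.

    `train` is a list of (feature_vector, label) pairs. No model is built
    in advance (lazy learning): all work happens at prediction time.
    """
    neighbors = sorted(train, key=lambda item: minkowski(item[0], query, h))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training set; distances from (0, 0) are all distinct.
train = [
    ((1, 0), "B"),
    ((0, 2), "A"),
    ((2, 1), "A"),
    ((4, 0), "B"),
    ((3, 3), "B"),
]
query = (0, 0)

for k in (1, 3, 5):
    print(k, knn_predict(train, query, k))       # Euclidean distance
    print(k, knn_predict(train, query, k, h=1))  # Manhattan distance
```

With this data the nearest point is labelled B, the next two are labelled A, and the remaining two are B, so k = 1, 3, 5 give B, A, B respectively under both distance measures.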
Chapters 4-5 (2)

Accuracy and error measures
    Training error vs. validation error
    Confusion matrix
ROC curve
    True positive rate (TPR) and false positive rate (FPR)
    Area under curve (AUC)
Evaluation methods
    Holdout
    Cross-validation
Ensemble, bagging: know the principle
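
The confusion-matrix and ROC quantities above can be computed as in the following sketch. The labels and scores are hypothetical, class 1 is treated as the positive class, the ROC points are obtained by sweeping a threshold over the distinct scores, and the AUC uses the trapezoidal rule; these choices are assumptions for the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn

def rates(tp, fn, fp, tn):
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "tpr": tp / (tp + fn),  # true positive rate (recall, sensitivity)
        "fpr": fp / (fp + tn),  # false positive rate
    }

def roc_points(y_true, scores):
    """(FPR, TPR) points, predicting positive when score >= threshold."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical classifier output.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

counts = confusion_counts(y_true, y_pred)
print(counts)                           # (3, 0, 1, 2)
print(rates(*counts))
print(auc(roc_points(y_true, scores)))  # 8/9: 8 of 9 positive-negative pairs are ranked correctly
```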
Chapters 6-7 (1)

Frequent patterns and association rules
    Support, confidence
    Generate association rules from frequent itemsets
Apriori algorithm
    Candidate generation and test
    Self joining
    Pruning
    Database scan
FP-growth algorithm
    Build FP-tree
    Extract conditional DB
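
The Apriori steps listed above (database scan, self-join, pruning) and rule generation from frequent itemsets can be sketched as below; FP-growth is not covered here. This is a simplified illustration under stated assumptions: the transactions are hypothetical, the minimum support is given as an absolute count, and the self-join is done by taking unions of frequent k-itemsets rather than the usual prefix-based join (the candidates that survive pruning are the same).

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(transactions, min_count):
    """Return {itemset: support count} for all frequent itemsets."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Database scan: count each candidate, keep the frequent ones.
        counts = {c: support_count(c, transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Self-join: unions of frequent k-itemsets that have k+1 items.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k))]
        k += 1
    return frequent

def rules(frequent, n_transactions, min_conf):
    """List (antecedent, consequent, support, confidence) for strong rules."""
    out = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = count / frequent[lhs]  # sup(A and B) / sup(A)
                if conf >= min_conf:
                    out.append((lhs, itemset - lhs, count / n_transactions, conf))
    return out

# Hypothetical market-basket data.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]
frequent = apriori(transactions, min_count=3)
for lhs, rhs, sup, conf in rules(frequent, len(transactions), min_conf=0.8):
    print(sorted(lhs), "->", sorted(rhs), sup, conf)
```

With a minimum count of 3, the candidates {Bread, Diapers, Beer} and {Milk, Diapers, Beer} are pruned without a scan because one of their 2-item subsets is infrequent, and {Bread, Milk, Diapers} is counted but fails the threshold.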
Chapters 6-7 (2)

Closed itemsets and maximal itemsets
Lift/Interest measure
Constraints
    Monotonic
    Anti-monotonic
    Convertible constraints
Sequential pattern mining: know the principle
    Max-gap
    Min-gap
    Max-span
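
As a hedged illustration of the first two items on this slide, the sketch below computes lift from raw counts and picks out the closed and maximal itemsets from a table of frequent itemsets. The counts and the itemset table are hypothetical, and the function names are assumptions made for the example.

```python
def lift(n, count_a, count_b, count_ab):
    """lift(A -> B) = P(A and B) / (P(A) * P(B)).

    A value of 1 means A and B are independent; below 1, negatively
    correlated; above 1, positively correlated.
    """
    return (count_ab / n) / ((count_a / n) * (count_b / n))

def closed_and_maximal(frequent):
    """Split {itemset: support count} into closed and maximal itemsets.

    Closed:  no proper superset has the same support.
    Maximal: no proper superset is frequent.
    Only frequent supersets need checking for closedness, because a
    superset with equal support would itself be frequent.
    """
    closed, maximal = [], []
    for itemset, count in frequent.items():
        supersets = [s for s in frequent if itemset < s]
        if not supersets:
            maximal.append(itemset)
        if all(frequent[s] < count for s in supersets):
            closed.append(itemset)
    return closed, maximal

# Hypothetical counts: 1000 transactions, 200 with A, 800 with B, 150 with both.
print(lift(1000, 200, 800, 150))  # 0.9375, slightly negative correlation

# Hypothetical frequent itemsets with their support counts.
frequent = {
    frozenset({"Bread"}): 4,
    frozenset({"Milk"}): 4,
    frozenset({"Diapers"}): 4,
    frozenset({"Beer"}): 3,
    frozenset({"Bread", "Milk"}): 3,
    frozenset({"Bread", "Diapers"}): 3,
    frozenset({"Milk", "Diapers"}): 3,
    frozenset({"Diapers", "Beer"}): 3,
}
closed, maximal = closed_and_maximal(frequent)
print(len(closed), len(maximal))  # 7 closed, 4 maximal
```

Here {Beer} is frequent but not closed, because {Diapers, Beer} has the same support; every maximal itemset is also closed, but not the other way round.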
Chapter 8

K-means clustering
    Algorithm and calculation
    Advantages and disadvantages
Hierarchical clustering: MIN, MAX, Group average
    Step-wise calculation
    Update distance matrix
    Advantages and disadvantages
Density-based clustering
    Know the principle
Evaluating clustering quality
    SSE, silhouette, entropy, purity
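
A minimal sketch of the K-means loop and of the SSE measure named above is given here. The points and the initial centroids are hypothetical, Euclidean distance is assumed, and the code assumes no cluster becomes empty during the iterations.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def update(points, labels, k):
    """Update step: each centroid becomes the mean of its members."""
    centroids = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return centroids

def kmeans(points, centroids, max_iter=100):
    """Alternate the two steps until the centroids stop changing."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, len(centroids))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign(points, centroids), centroids

def sse(points, labels, centroids):
    """Sum of squared errors: squared distance of each point to its centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Hypothetical 2-D points and initial centroids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(points, [(1, 1), (8, 8)])
print(labels)                          # [0, 0, 0, 1, 1, 1]
print(centroids)
print(sse(points, labels, centroids))  # 8/3, i.e. 4/3 per cluster
```

Rerunning with different initial centroids is a quick way to see the sensitivity to initialization that is usually listed among the disadvantages.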
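
For the step-wise hierarchical clustering calculation on this slide, the sketch below merges the closest pair of clusters and then updates the distance matrix under MIN (single link), MAX (complete link) or group average. The distance values are hypothetical and contain no ties; the function name and the dictionary-based matrix are assumptions made for the example.

```python
def agglomerative(labels, dist, linkage="min"):
    """Return the merge sequence as (cluster_a, cluster_b, distance) tuples.

    `dist` maps a pair of point labels, e.g. ("A", "B"), to their distance.
    `linkage` is "min", "max" or "average" (group average).
    """
    clusters = [frozenset([lab]) for lab in labels]
    # Distance "matrix" keyed by an unordered pair of clusters.
    d = {frozenset([frozenset([a]), frozenset([b])]): v
         for (a, b), v in dist.items()}
    merges = []
    while len(clusters) > 1:
        # Step 1: find the closest pair of current clusters.
        a, b = min(((x, y) for i, x in enumerate(clusters)
                    for y in clusters[i + 1:]),
                   key=lambda pair: d[frozenset(pair)])
        merges.append((sorted(a), sorted(b), d[frozenset((a, b))]))
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)]
        # Step 2: update the distance matrix for the new cluster.
        for x in clusters:
            dax, dbx = d[frozenset((a, x))], d[frozenset((b, x))]
            if linkage == "min":
                new = min(dax, dbx)
            elif linkage == "max":
                new = max(dax, dbx)
            else:  # group average, weighted by the sizes of a and b
                new = (len(a) * dax + len(b) * dbx) / (len(a) + len(b))
            d[frozenset((merged, x))] = new
        clusters.append(merged)
    return merges

# Hypothetical pairwise distances between four points.
dist = {
    ("A", "B"): 1, ("A", "C"): 4, ("A", "D"): 5,
    ("B", "C"): 3, ("B", "D"): 6, ("C", "D"): 2,
}
for linkage in ("min", "max", "average"):
    print(linkage, agglomerative(["A", "B", "C", "D"], dist, linkage))
```

All three linkages merge {A, B} at distance 1 and {C, D} at distance 2 on this data; they differ only in the final merge distance: 3 for MIN, 6 for MAX and 4.5 for group average.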
Questions?