CSCI 6405, Fall 2003: Data Mining and Data Warehousing
Instructor: Qigang Gao, Office: CS219, Tel: 494-3356, Email: [email protected]
Teaching Assistant: Christopher Jordan, Email: [email protected]
Office Hours: TR, 1:30 - 3:00 PM
30 October 2003

Lecture Outline
Part III: Data Mining Methods/Algorithms
4. Data mining primitives (ch4, optional)
5. Classification data mining (ch7)
6. Association data mining (ch6)
7. Characterization data mining (ch5)
8. Clustering data mining (ch8)
Part IV: Mining Complex Types of Data
9. Mining the Web (ch9)
10. Mining spatial data (ch9)
Project Presentations

Ass3: Oct (14) 16 - Oct 30
Ass4: Oct 30 - Nov 13
Project due: Dec 8 (see ~prof6405/Doc/proj.guide)
* Final Exam: Monday, Dec 08, 19:00

Classification by Neural Networks
* Machine learning by simulating the architecture of the human brain: a human brain has about 10^11 neurons, connected to each other via a huge number of so-called synapses.
* The first learning algorithm came in 1959, when Rosenblatt suggested that if a target output value is provided for a single neuron with fixed inputs, the weights can be incrementally changed so that the neuron learns to produce that output, using the perceptron learning rule.
* Neural networks are modeled on the human brain, with nodes (neurons) and connections (synapses):
  - Input nodes: receive the input signals (object features).
  - Output nodes: give the output signals (classification).
  - Intermediate nodes: contained in hidden layers, carrying the weight pattern.
* Back propagation networks are the most often used form of neural network for updating the distributed weights.

A Neuron
(Figure: inputs x0, x1, ..., xn are multiplied by weights w0, w1, ..., wn and summed; a bias term µk is applied to the weighted sum, and the result is passed through an activation function f to produce the output y.)
* The n-dimensional input vector x is mapped into the variable y by means of the scalar product and a nonlinear function mapping.
* For example, with the sign function as activation:
  y = sign( Σ_{i=0..n} wi*xi + µk )
* A small code sketch of this computation appears after the magazine example below.

Categorization of Customers for Buying a Magazine
* The input nodes are wholly interconnected to the hidden nodes, and the hidden nodes are wholly connected to the output nodes.
* In an untrained network, the branches between the nodes have equal weights.
* During the training stage, the network receives examples of input/output pairs corresponding to records in the database and adapts the weights of the different branches until the inputs match the appropriate outputs.
(Figure: the network learning to recognize the readers of both the car magazine and the comics.)
* The figure shows the internal state of the network after training. The configuration of the internal nodes shows that there is a certain connection between the car-magazine readers and the comic readers. However, the network does not provide a rule that identifies this association.
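To make the neuron computation concrete, here is a minimal sketch in Python of y = sign( Σ wi*xi + µk ) as defined above. The input values, weights, and bias are made-up illustration numbers, not data from the course notes.

```python
# A single neuron: weighted sum of the inputs plus a bias term,
# passed through the sign activation function.
def neuron_output(x, w, bias):
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum + bias >= 0 else -1

# Illustrative values only: a three-feature input vector.
x = [0.5, -1.0, 2.0]   # input vector x
w = [0.4, 0.7, -0.2]   # weight vector w
bias = 0.1             # bias term (mu_k in the slides)
print(neuron_output(x, w, bias))   # weighted sum + bias = -0.8, so output is -1
```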
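The perceptron learning rule mentioned earlier (Rosenblatt, 1959) fits in a few more lines: for each training tuple, compare the neuron's output with the target and adjust each weight in proportion to the error. This is a sketch of the single-neuron rule only, not backpropagation; the OR-function training data and the learning rate are assumptions chosen for illustration. The loop mirrors the Network Training steps summarized on the next slide.

```python
# Perceptron learning rule: w_i <- w_i + eta * (target - output) * x_i
def train_perceptron(samples, epochs=10, eta=0.1):
    w = [0.0, 0.0]  # weights (the slides suggest random values; zeros keep this deterministic)
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:                             # feed the tuples one by one
            s = sum(wi * xi for wi, xi in zip(w, x)) + bias   # net input (linear combination)
            output = 1 if s >= 0 else -1                      # activation function
            error = target - output                           # compute the error
            w = [wi + eta * error * xi for wi, xi in zip(w, x)]
            bias += eta * error                               # update the weights and the bias
    return w, bias

# Toy task: learn logical OR, with targets coded as -1/+1.
samples = [([0, 0], -1), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, bias = train_perceptron(samples)   # converges to w = [0.2, 0.2], bias = -0.2
```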
Network Training
* The ultimate objective of training is to obtain a set of weights that classifies almost all the tuples in the training data correctly.
* Steps:
  1. Initialize the weights with random values.
  2. Feed the input tuples into the network one by one.
  3. For each unit:
     - Compute the net input to the unit as a linear combination of all its inputs.
     - Compute the output value using the activation function.
     - Compute the error.
     - Update the weights and the bias.

Neural Networks (cont)
* Analogy to biological systems (indeed a great example of a good learning system).
* Can handle real-valued attributes and learn nonlinear functions.
* Massive parallelism allows for computational efficiency.
* The knowledge learned is a weighted network, which is not readily interpretable by humans.

Summary
* Classification is an extensively studied problem (mainly in statistics, machine learning, and neural networks).
* Classification is probably one of the most widely used data mining techniques, with many extensions.
* Scalability is still an important issue for database applications; combining classification with database techniques is therefore a promising topic.
* Research directions: classification of non-relational data, e.g., text, spatial, and multimedia data.

6. MINING ASSOCIATION RULES (ch6)
* Association rule mining
* The Apriori algorithm
* Mining various kinds of association/correlation rules
* Constraint-based association mining
* Summary

What Is Association Mining?
* Association rule mining:
  - First proposed by Agrawal, Imielinski, and Swami [AIS93].
  - Finding frequent patterns of associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, etc.
  - Frequent pattern: a pattern (set of items, sequence, etc.) that occurs frequently in a database.
* Motivation: finding regularities in data about relationships:
  - What products were often purchased together? Beer and diapers?!
  - What are the subsequent purchases after buying a PC?
  - What kinds of DNA are sensitive to this new drug?
  - Web log (click stream) analysis.

Applications
* Each association rule is a piece of knowledge describing a relationship between two particular item sets. Association rules can reveal meaningful togetherness of items and combinations of attributes. The generated rules are often intuitively plausible, so a business owner can combine them with industry knowledge to improve the business.
* Market basket analysis: analyzing customer buying habits by finding associations among the different items that customers place in their "shopping baskets". "Which items are likely to be purchased together by a customer on a given trip to the store?" One basket tells us about one customer, but all purchases made by all customers together carry much more information.

Market Basket Analysis (cont)
1) Analyze grocery sale transactions:

Customer  Items
-----------------------------------------------
1         orange juice, soda
2         orange juice, milk, window cleaner
3         orange juice, detergent
4         orange juice, detergent, soda
5         soda, window cleaner

The association rule "when a customer buys orange juice, he/she will also buy soda":
  orange juice => soda (40%, 50%)
Support: 40% of the transactions contain both items (2 of 5).
Confidence: 50% of the transactions containing orange juice also contain soda (2 of 4).
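As a quick check of the percentages above, here is a small sketch that recomputes the support and confidence of orange juice => soda from the five transactions. The helper function name is mine, not from the course notes.

```python
# support(X => Y)    = fraction of all transactions containing X union Y
# confidence(X => Y) = fraction of the transactions containing X that also contain Y
transactions = [
    {"orange juice", "soda"},
    {"orange juice", "milk", "window cleaner"},
    {"orange juice", "detergent"},
    {"orange juice", "detergent", "soda"},
    {"soda", "window cleaner"},
]

def support_confidence(transactions, x, y):
    n_x  = sum(1 for t in transactions if x <= t)        # transactions containing X
    n_xy = sum(1 for t in transactions if (x | y) <= t)  # transactions containing X and Y
    return n_xy / len(transactions), n_xy / n_x

s, c = support_confidence(transactions, {"orange juice"}, {"soda"})
print(f"support = {s:.0%}, confidence = {c:.0%}")  # support = 40%, confidence = 50%
```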
Applications (cont)
2) Video store data analysis (~prof6405/Doc/thesisPothier.pdf):
  Coupon => New releases (10%, 52%)
If a customer uses a coupon in a transaction, then there is 52% confidence that he/she will also rent a new release; this pattern occurs in over 10% of all transactions.
3) CRM application (www.amazon.com): "Customers who bought this book (book x) also bought:"
  book x => book y, book z, ...
4) Adding customer data into market basket analysis:
  {age(30...39), income(42K...48K)} => {buys(DVD player)}

What Is an Association Rule?
* An association rule is an expression of the form X => Y, where X and Y are sets of items, called itemsets. For a transaction database, the rule means that transactions containing the items in X tend to also contain the items in Y.
* E.g., the database:

ID  Items
------------
1   A,B,C,E
2   A,B,D,G
3   A,B,C,D
4   B,C,K

Some association rules:
  {A} => {B} (75%, 100%)
  {B} => {A} (75%, 75%)
  {A,B} => {C} (50%, 66%)
  {C} => {A,B} (50%, 66%)
  {B,C} => {K} (25%, 33%) ...
- In a transaction database, each transaction is a set of items that may contain X, Y, both, or neither.
- A rule is only meaningful if it passes the required measures: the support rate and the confidence rate.

Basic Concepts: Frequent Patterns and Association Rules

Transaction-id  Items bought
----------------------------
10              A, B, C
20              A, C
30              A, D
40              B, E, F

(Figure: Venn diagram of the customers who buy beer, the customers who buy diapers, and the customers who buy both.)
* Itemset X = {x1, ..., xk}.
* Find all the rules X => Y with minimum confidence and support:
  - support, s: the probability that a transaction contains X ∪ Y, i.e., P(X ∪ Y).
  - confidence, c: the conditional probability that a transaction containing X also contains Y, i.e., P(Y|X).
* Let min_support = 50% and min_conf = 50%:
  A => C (50%, 66.7%)
  C => A (50%, 100%)
  {A,C} => B ?

Mining Association Rules: An Example
Using the same transactions, with min. support 50% and min. confidence 50%:

Frequent pattern  Support
-------------------------
{A}               75%
{B}               50%
{C}               50%
{A, C}            50%

For the rule {A} => {C}:
  support = support({A} ∪ {C}) = 50%
  confidence = support({A} ∪ {C}) / support({A}) = 66.7%

Strategy of the Apriori Algorithm
Two major phases:
1. Mine all frequent itemsets from the database.
   * Use the support factor as a filter.
   * Apply the heuristic (Apriori) knowledge for an efficient search.
2. Derive all valid rules from the mined frequent itemsets.
   * Use the confidence factor as a filter.
   * Apply the prior knowledge again.

What Is the Apriori Property?
Apriori property: all non-empty subsets of a frequent itemset must also be frequent.
E.g., if supp(I1,I2,I3) >= T_supp, then
  supp(I1) >= T_supp, supp(I2) >= T_supp, supp(I3) >= T_supp,
  supp(I1,I2) >= T_supp, supp(I1,I3) >= T_supp, supp(I2,I3) >= T_supp,
where T_supp is the minimum support rate. Conversely, if supp(I1) < T_supp, then
  supp(I1,I2) < T_supp, supp(I1,I3) < T_supp, supp(I1,I2,I3) < T_supp.

What Is the Apriori Property? (cont)
In general, if S and R are two itemsets and supp(S) < T_supp, then supp(S ∪ R) < T_supp: if itemset S is not frequent, then extending S with the items of R, i.e. forming the itemset S ∪ R, cannot occur more frequently than S alone. E.g.:

TID  Items
----------
1    A,B,C
2    A,C
3    A,B
4    B,C,D

If supp(A) < 80%, then without any calculation we can be sure that any itemset containing A also has support below 80%, i.e. supp(A,B) < 80%, supp(A,C) < 80%, supp(A,B,C) < 80%, and supp(A,B,C,D) < 80%.
Code sketches of both Apriori phases on this example follow below.
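Phase 1 of the algorithm can be sketched compactly on the four-transaction example above: join frequent (k-1)-itemsets into candidate k-itemsets, prune every candidate that has an infrequent (k-1)-subset (the Apriori property), and then scan the database to count the support of the survivors. This is a minimal sketch; the function name is mine, and the hash-tree and other optimizations of the real algorithm are omitted.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return a dict {frozenset itemset: support} of all frequent itemsets."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    # L1: frequent 1-itemsets (support filter)
    freq = {frozenset([i]): sum(i in t for t in transactions) / n for i in items}
    freq = {s: v for s, v in freq.items() if v >= min_support}
    result, k = dict(freq), 2
    while freq:
        # Join step: candidate k-itemsets from frequent (k-1)-itemsets
        candidates = {a | b for a in freq for b in freq if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset (Apriori property)
        candidates = {c for c in candidates
                      if all(frozenset(sub) in freq for sub in combinations(c, k - 1))}
        # Scan the database to count support of the surviving candidates
        freq = {c: sum(c <= t for t in transactions) / n for c in candidates}
        freq = {c: v for c, v in freq.items() if v >= min_support}
        result.update(freq)
        k += 1
    return result

# The TID table above, with min_support = 50%:
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "B"}, {"B", "C", "D"}]
frequent = apriori(transactions, 0.5)
# -> {A},{B},{C} at 75%; {A,B},{A,C},{B,C} at 50%.
# D is infrequent, so no itemset containing D is ever generated.
```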
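Phase 2 then derives the rules: for every frequent itemset with at least two items, each non-empty proper subset X becomes a candidate left-hand side, and the rule X => Y is kept when conf = supp(X ∪ Y) / supp(X) meets the threshold. This continues the sketch above (it reuses apriori() and its transactions); again, the names are illustrative.

```python
from itertools import combinations

def derive_rules(freq, min_conf):
    """Generate rules X => Y from frequent itemsets, filtered by confidence."""
    rules = []
    for itemset, supp in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):                  # all non-empty proper subsets
            for lhs in map(frozenset, combinations(itemset, r)):
                # supp(X u Y) / supp(X); lhs is frequent by the Apriori property,
                # so its support is always available in freq
                conf = supp / freq[lhs]
                if conf >= min_conf:
                    rules.append((set(lhs), set(itemset - lhs), supp, conf))
    return rules

for x, y, s, c in derive_rules(frequent, min_conf=0.5):
    print(f"{x} => {y} ({s:.0%}, {c:.0%})")   # e.g. {'A'} => {'B'} (50%, 67%)
```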