Probability and Information: A brief review
Data Mining -- Probability | H Liu (ASU) & G Dong (WSU) | 7/03

Probability
- Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance - how wonderful it is!
- Probability is a degree of belief in the truth of a sentence: 1 means true, 0 means false, and 0 < P < 1 expresses an intermediate degree of belief in the truth of the sentence.
- Degree of truth (fuzzy logic) vs. degree of belief.

- All probability statements must indicate the evidence with respect to which the probability is being assessed.
- Prior or unconditional probability vs. posterior or conditional probability.

Basic probability notation
- Prior probability.
- Proposition: P(Sunny).
- Random variable: P(Weather = Sunny).
- Each random variable has a domain, e.g., {Sunny, Cloudy, Rain, Snow}.
- Probability distribution: P(Weather) = <0.7, 0.2, 0.08, 0.02>.
- A random variable is not a number; a number may be obtained by observing a random variable.
- A random variable can be continuous or discrete.

Conditional probability
- Definition: P(A|B) = P(A ^ B) / P(B).
- Product rule: P(A ^ B) = P(A|B) P(B).
- Probabilistic inference does not work like logical inference.

The axioms of probability
- All probabilities are between 0 and 1.
- Necessarily true (valid) propositions have probability 1; necessarily false (unsatisfiable) propositions have probability 0.
- The probability of a disjunction: P(A v B) = P(A) + P(B) - P(A ^ B).

The joint probability distribution
- The joint distribution completely specifies the probability assignments to all propositions in the domain.
- A probabilistic model consists of a set of random variables (X1, ..., Xn); an atomic event is an assignment of particular values to all the variables.
- Marginalization rule for random variables Y and Z: P(Y) = Σ_z P(Y, z), summing over all values z of Z.
- Let's see an example next.

Joint probability
- An example with two Boolean variables:

              Toothache    !Toothache
    Cavity      0.04          0.06
    !Cavity     0.01          0.89

- Observations: the atomic events are mutually exclusive and collectively exhaustive.
- What are P(Cavity), P(Cavity v Toothache), P(Cavity ^ Toothache), and P(Cavity|Toothache)?

Bayes' rule
- Deriving the rule via the product rule: P(B|A) = P(A|B) P(B) / P(A).
- P(A) can be viewed as a normalization factor that makes P(B|A) + P(!B|A) = 1, since P(A) = P(A|B) P(B) + P(A|!B) P(!B).
- A more general case: P(X|Y) = P(Y|X) P(X) / P(Y).
- Bayes' rule conditionalized on evidence E: P(X|Y, E) = P(Y|X, E) P(X|E) / P(Y|E).

Independence
- Independent events A, B: P(B|A) = P(B), P(A|B) = P(A), P(A, B) = P(A) P(B).
- Conditional independence: P(X|Y, Z) = P(X|Z) - given Z, X and Y are independent.
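The queries posed on the Joint probability slide, Bayes' rule, and an independence check can all be answered by summing entries of the joint table. The following Python sketch is not part of the original slides; the variable names are ours, and it simply works through the toothache/cavity numbers given above.

```python
# Minimal sketch: marginalization, the slide's queries, Bayes' rule, and an
# independence check over the joint distribution P(Cavity, Toothache).
# Indexed as joint[cavity][toothache] with True/False keys.
joint = {
    True:  {True: 0.04, False: 0.06},   # Cavity
    False: {True: 0.01, False: 0.89},   # !Cavity
}

# Marginalization: P(Cavity) = sum over toothache values of P(Cavity, toothache)
p_cavity = sum(joint[True].values())                          # 0.10
p_toothache = joint[True][True] + joint[False][True]          # 0.05

# Queries from the "Joint probability" slide
p_cavity_or_toothache = (joint[True][True] + joint[True][False]
                         + joint[False][True])                # 0.11
p_cavity_and_toothache = joint[True][True]                    # 0.04
p_cavity_given_toothache = p_cavity_and_toothache / p_toothache   # 0.80

# Bayes' rule recovers the same conditional from the other direction:
# P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache)
p_toothache_given_cavity = joint[True][True] / p_cavity       # 0.40
via_bayes = p_toothache_given_cavity * p_cavity / p_toothache # 0.80

# Independence check: P(Cavity ^ Toothache) != P(Cavity) P(Toothache),
# so the two variables are not independent.
independent = abs(p_cavity_and_toothache - p_cavity * p_toothache) < 1e-12

print(p_cavity, p_cavity_or_toothache, p_cavity_and_toothache)
print(p_cavity_given_toothache, via_bayes, independent)
```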
Entropy
- Entropy measures the homogeneity/purity of a set of examples.
- Viewed as information content: the less you need to know (to determine the class of a new case), the more information you already have.
- With two classes (P, N) in S, with p and n instances respectively, let t = p + n and view [p, n] as the class distribution of S. Then:
    Entropy(S) = -(p/t) log2(p/t) - (n/t) log2(n/t)
- E.g., p = 9, n = 5: Entropy(S) = Entropy([9,5]) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940.
- E.g., Entropy([14,0]) = 0; Entropy([7,7]) = 1.

Entropy curve
- For p/(p+n) between 0 and 1, the 2-class entropy is:
    0 when p/(p+n) is 0,
    1 when p/(p+n) is 0.5,
    0 when p/(p+n) is 1,
    monotonically increasing between 0 and 0.5,
    monotonically decreasing between 0.5 and 1.
- (Figure: the two-class entropy curve, rising from 0 to a peak of 1 at p/(p+n) = 0.5 and falling back to 0.)
- When the data is pure, we only need to send 1 bit.
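As a worked check of the two-class entropy formula and the three examples above, here is a short Python sketch; the entropy helper is ours, not something defined on the slides.

```python
import math

def entropy(p, n):
    """Two-class entropy of a set with p positive and n negative examples,
    as defined on the Entropy slide: -(p/t)log2(p/t) - (n/t)log2(n/t)."""
    t = p + n
    result = 0.0
    for count in (p, n):
        if count > 0:                  # 0 * log2(0) is taken to be 0
            frac = count / t
            result -= frac * math.log2(frac)
    return result

print(entropy(9, 5))    # ~0.940, the [9,5] example from the slide
print(entropy(14, 0))   # 0.0 -- a pure set
print(entropy(7, 7))    # 1.0 -- a maximally mixed set
```

Sweeping p from 0 to t with n = t - p reproduces the entropy curve described on the last slide: 0 at the two pure ends, 1 at p/(p+n) = 0.5.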