Repetition
Tue 1 March 2016
Change log: some typos have been fixed on slides 21 and 23 (19 March 2016)
Change log: some typos have been fixed on slide 18 (28 April 2016)

Literature
•  Reading is required ☺

Definition: The probability of an event
•  The probability of an event is the sum of the probabilities of the outcomes it contains. Therefore the probability of event A, i.e. P(A), is the sum of the probability function P(x) over all outcomes x in A:
   P(A) = Σ P(x), summing over all x in A
•  The probability function P(x) lies between zero and one for every value of x in the sample space Ω, and the sum of P(x) over all values x in the sample space Ω is equal to 1.
•  An event is defined as any subset A of the sample space Ω.

https://www.mathsisfun.com/numbers/sigma-calculator.html

What is the probability of a string of three vowels?
Calculations:
6 × 6 × 6 = 216
26 × 26 × 26 = 17576
216 × 1/17576 (multiplication can be seen as repeated addition)
→ 216/17576 = 0.01228949
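If it helps to check the arithmetic, here is a minimal Python sketch of the same calculation (assuming, as the slide's numbers imply, six vowels in a 26-letter alphabet):

```python
# Probability that a random three-letter string consists only of vowels,
# assuming 6 of the 26 letters count as vowels (as in the slide's numbers).
vowels = 6
letters = 26
length = 3

favourable = vowels ** length   # 6 * 6 * 6 = 216 three-vowel strings
possible = letters ** length    # 26 * 26 * 26 = 17576 three-letter strings
probability = favourable / possible

print(favourable, possible, round(probability, 8))  # 216 17576 0.01228949
```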
Definition: Random variable
•  A random variable is a variable whose value is subject to variations due to chance.
•  A random variable can take on a set of possible different values (like other mathematical variables), but, in contrast to other mathematical variables, each value has an associated probability.
•  A random variable's possible values represent the possible outcomes of a statistical experiment. Therefore: the x in the probability function P(x) above is a random variable.

Mass and Density
•  A random variable is a function defined on a sample space whose outputs are numerical values.
•  Random variables can be discrete, that is, taking any of a specified finite or countable list of values, or continuous, taking any numerical value in an interval.
•  The function f(x), mapping a point (i.e. a discrete value) in the sample space to the "probability" value, is called a probability mass function, abbreviated as pmf.
•  The function f(x), mapping an interval (i.e. a continuous value) in the sample space to the "probability" value, is called a probability density function, abbreviated as pdf (very roughly said…).
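As an illustration of a discrete random variable and its pmf, here is a small Python sketch (the fair six-sided die is just an example, not taken from the slides):

```python
from fractions import Fraction

# Probability mass function of a fair six-sided die:
# each outcome x in the sample space gets probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# The pmf values lie between 0 and 1 and sum to 1 over the sample space.
assert all(0 <= p <= 1 for p in pmf.values())
assert sum(pmf.values()) == 1

# Probability of an event = sum of the pmf over the outcomes it contains,
# e.g. the event "the die shows an even number".
even = {2, 4, 6}
print(sum(pmf[x] for x in even))  # 1/2
```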
Probability Distributions

•  Certain random variables occur very often in probability theory because they describe many natural or physical processes well. Their distributions therefore have gained special importance in probability theory.
•  Some fundamental discrete distributions are the discrete Bernoulli, binomial, Poisson and geometric distributions, etc.
•  Important continuous distributions include the continuous uniform, normal, etc.
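For illustration only, here is a short sketch evaluating one discrete pmf and one continuous pdf from their textbook formulas (the parameter values are arbitrary examples):

```python
import math

def binomial_pmf(k, n, p):
    """Discrete: probability of k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Continuous: density of the normal distribution at x (a density, not a probability)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

print(binomial_pmf(3, 10, 0.5))   # P(exactly 3 heads in 10 fair coin flips) ≈ 0.117
print(normal_pdf(0.0))            # density of the standard normal at its mean ≈ 0.399
```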
Entropy and Probability

•  Entropy is the expected amount of information of a random variable, i.e.:
•  how surprised we are when we learn the value of a variable, or
•  how difficult it is to predict a variable.
•  Entropy can be seen as complementary to probability:
•  Probability helps us make educated guesses about events; it tells us how predictable an event is.
•  Entropy tells us how "surprising" something can be.
•  The amount of surprise is interpreted in terms of "informativeness".
•  If something is very unpredictable, it is very surprising, so the amount of new information that it carries is very high. Conversely…
•  For this reason, when the probability is 0.5 (so the outcome is very uncertain), entropy is at its peak.

In LT, entropy-based measures are used to assess…
•  how informative a feature is (in classification): is this feature useful for classification or not? Can we get rid of it or not?
•  how informative a word is: does this word carry lots of information or not?
•  … the amount of information that the observation of one word gives about the occurrence of another word. For example, do 2 words tend to occur together or not? (see mutual information)
•  etc.

Entropy and Surprisal (= self-information)
•  Surprisal is a measure of the information content (in bits if the log is base 2) associated with an event in a probability space: I(x) = −log₂ P(x)
•  It measures the amount of surprise generated by the event x. The smaller the probability of x, the bigger the surprisal is.
•  The entropy of a random event is the expected value of its self-information (surprisal).
Remember: the negative logarithm means the number of times we divide 1 by the base to reach the argument. The negative of log_y(x) is log_y(1/x).
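A minimal sketch of surprisal and entropy for a binary variable, illustrating the points above (the probability values are arbitrary examples):

```python
import math

def surprisal(p):
    """Self-information of an event with probability p, in bits (base-2 log)."""
    return -math.log2(p)

def binary_entropy(p):
    """Expected surprisal of a yes/no variable with P(yes) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return p * surprisal(p) + (1 - p) * surprisal(1 - p)

print(surprisal(0.5))          # 1.0 bit: a fair coin flip is mildly surprising
print(surprisal(0.01))         # ≈ 6.64 bits: a rare event is very surprising
print(binary_entropy(0.5))     # 1.0 bit: maximum uncertainty, entropy at its peak
print(binary_entropy(0.99))    # ≈ 0.08 bits: nearly predictable, low entropy
```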
Negative log calculator
•  https://www.easycalculation.com/negative-log.php

Review with additional reading
•  Axioms: https://www.probabilitycourse.com/chapter1/1_3_2_probability.php

Question 1: Axiom 2
•  Yes or no?

Lab1: Task 6
We cannot treat the two events as disjoint and simply add their individual probabilities to get the probability of the union, because the sum of their probabilities would be greater than 1, which is impossible according to Axiom 2.

Question 2: Axiom 3 – Disjoint events (mutually exclusive = they cannot happen together)
•  In a presidential election, there are four candidates. Call them A, B, C, and D. Based on our polling analysis, we estimate that A has a 20 percent chance of winning the election, while B has a 40 percent chance of winning. What is the probability that A or B wins the election?
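A minimal sketch of how Axiom 3 applies here, assuming the two events are disjoint because only one candidate can win:

```python
# Axiom 3 (additivity): for disjoint events, P(A or B) = P(A) + P(B).
# "A wins" and "B wins" are disjoint, since only one candidate can win.
p_a_wins = 0.20
p_b_wins = 0.40

p_a_or_b_wins = p_a_wins + p_b_wins
print(round(p_a_or_b_wins, 2))  # 0.6
```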
Theorems/Rules (→ derived from the 3 axioms mathematically)

https://www.probabilitycourse.com/chapter1/1_3_3_finding_probabilities.php
Using the 3 axioms of probability, we can derive (mathematically) the following (and many other) theorems (aka rules):

Disjoint events = Mutually exclusive events
•  When 2 events are mutually exclusive, it is impossible for them to happen together. Therefore, P(A ∩ B) = 0.

Independent events
Mathematically, 2 events are independent if the probability of their intersection is equal to the product of their individual probabilities:
P(A ∩ B) = P(A) * P(B)
This formula is the joint probability of independent events.
Watch out! This multiplication of events has nothing to do with the multiplication of outcomes!!!!
Warning! One common mistake is to confuse independence and being disjoint. These are completely different concepts. When two events A and B are disjoint it means that if one of them occurs, the other one cannot occur, i.e., A ∩ B = ∅. Thus, event A usually gives a lot of information about event B, which means that they cannot be independent.
Additional reading: https://www.probabilitycourse.com/chapter1/1_4_1_independence.php

Joint probability of independent events: P(A ∩ B) = P(A) * P(B)
Repetition:
•  Joint probability is the likelihood of more than one event occurring at the same time. Joint probability is calculated by multiplying the probability of event A, expressed as P(A), by the probability of event B, expressed as P(B).
•  Ex: suppose a statistician wishes to know the probability that the number 5 will occur twice when two dice are rolled at the same time. Since each die has six possible outcomes, the probability of a five occurring on each die is 1/6 or 0.1666.
•  P(A) = 0.1666
•  P(B) = 0.1666
•  P(A,B) = 0.1666 × 0.1666 = 0.02777
•  This means the joint probability that a 5 will be rolled on both dice at the same time is 0.02777.
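The dice example, sketched in Python:

```python
# Joint probability of independent events: P(A ∩ B) = P(A) * P(B).
# A = "the first die shows a 5", B = "the second die shows a 5";
# the dice do not influence each other, so the events are independent.
p_a = 1 / 6                 # ≈ 0.1666 (the slide truncates the decimals)
p_b = 1 / 6

p_a_and_b = p_a * p_b       # 1/36 ≈ 0.02777
print(round(p_a_and_b, 5))  # 0.02778
```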
Disjoint ≠ Independent: Lab1 – Task 6
…

Say that we have the following event probabilities (computed on a corpus):
•  Event 1: the probability that an email contains the word "spam" is 0.03
•  Event 2: the probability that an email contains the word "viagra" is 0.04
•  Event 3: the probability that an email contains the word "spam" OR the word "viagra" is 0.06
•  Event 4: the probability that an email contains both the word "spam" AND the word "viagra" is 0.01
•  Questions:
•  Are events 1 and 2 disjoint (mutually exclusive)?
•  Are events 1 and 2 independent?

Joint Probability of dependent events
Two events are dependent if the outcome or occurrence of the first affects the outcome or occurrence of the second, so that the probability is changed. This means: the conditional probability of an event B in relationship to an event A is the probability that event B occurs given that event A has already occurred. The notation for conditional probability is P(B|A).
When two events, A and B, are dependent, the probability of both occurring is:
P(A and B) = P(A) * P(B|A)   or   P(A and B) = P(B) * P(A|B)
See multiplication rule

Don't use intuition, use calculations
•  According to the probability counts computed on our corpus, we can say that:
•  Events 1 and 2 are NOT disjoint, because the intersection of event 1 and event 2 is a non-empty set with probability 0.01.
•  Condition for disjoint events: P(A ∩ B) = 0.
•  Events 1 and 2 are not independent, because the joint probability is not equal to the product of the individual probabilities of the 2 events:
•  Joint probability = 0.01
•  Product of individual probabilities: 0.03 * 0.04 = 0.0012
•  0.01 ≠ 0.0012
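The same checks, sketched in Python with the corpus probabilities given above:

```python
# Event probabilities from the corpus (slides above).
p_spam = 0.03            # email contains "spam"
p_viagra = 0.04          # email contains "viagra"
p_both = 0.01            # email contains both words (joint probability)

# Disjoint (mutually exclusive)? Disjoint events have probability 0 of co-occurring.
print("disjoint:", p_both == 0)                     # False: P(A ∩ B) = 0.01 ≠ 0

# Independent? Independent events satisfy P(A ∩ B) = P(A) * P(B).
print("independent:", p_both == p_spam * p_viagra)  # False: 0.01 ≠ 0.0012

# The words are dependent: knowing that "spam" occurs raises the chance of "viagra".
print("P(viagra | spam) =", p_both / p_spam)        # ≈ 0.333, much higher than 0.04
```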
Additional reading
•  Rules of probability: https://people.richland.edu/james/lecture/m170/ch05-rul.html
•  Conditional probability: https://people.richland.edu/james/lecture/m170/ch05-cnd.html

Manipulations !!!
•  We manipulate data in order to derive information we do not have…

Marginal Probability = Law of Total Probability
•  Joint probability is the probability that two events will occur simultaneously.
•  Marginal probability is the probability of the occurrence of a single event.
•  Marginal probabilities are computed by adding across rows and down columns.

P(A ∩ B) and Marginals
•  Joint probabilities of two events: event (rain) and event (wind)
•  The sample space has probability 1; the joint probabilities of the two events sum to 1.

Benefits
•  Once we have the marginal probabilities of single events, we can use them for other calculations! For example, if we need to compute conditional probability….

Joint, Marginal & Conditional Probabilities
What is important is to understand the relation between the joint, the marginal and the conditional probabilities, and the way we can derive them from each other. In particular, given that we know the joint probabilities of the events we are interested in, we can always derive the marginal and conditional probabilities from them, whereas the opposite does not hold (except under some special conditions).
(The joint probabilities in the table sum up to 1.) What if we want the simple probabilities? Once we have the joint probabilities and the simple probabilities, we can combine these to get conditional probabilities.
Theoretically, we should imagine the sample space (rectangle) divided into mutually exclusive partitions. An event in a partitioned sample space relates to the individual partitions and not to the sample space as a whole.

Marginalization = Law of Total Probability
•  The marginalization formula is applied via tabulated data, exactly in the way we did before. Cf. the multiplication rule: P(A,B) = P(A|B) P(B)
Read more: https://www.probabilitycourse.com/chapter1/1_4_2_total_probability.php
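A sketch of marginalization on a small joint table in the spirit of the rain/wind example; the joint probabilities below are invented for illustration and are not from the slides:

```python
# Hypothetical joint probabilities P(rain, wind); the four cells are mutually
# exclusive partitions of the sample space and sum to 1 (numbers invented).
joint = {
    ("rain", "wind"): 0.20,
    ("rain", "no wind"): 0.10,
    ("no rain", "wind"): 0.25,
    ("no rain", "no wind"): 0.45,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9

# Marginal probabilities: add over the other variable (Law of Total Probability).
p_rain = sum(p for (r, w), p in joint.items() if r == "rain")   # 0.20 + 0.10 = 0.30
p_wind = sum(p for (r, w), p in joint.items() if w == "wind")   # 0.20 + 0.25 = 0.45

# Conditional probability from joint and marginal: P(rain | wind) = P(rain, wind) / P(wind).
p_rain_given_wind = joint[("rain", "wind")] / p_wind            # ≈ 0.444

# The multiplication rule recovers the joint: P(rain, wind) = P(rain | wind) * P(wind).
print(round(p_rain, 2), round(p_wind, 2), round(p_rain_given_wind, 3))
print(round(p_rain_given_wind * p_wind, 2))                     # 0.2 again
```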
Remember: from the definition of conditional probability we can derive the Multiplication Rule
1) One way to compute the probability of A and B (i.e. the joint probability) is to take the probability of B by itself and multiply it with the probability of A given B.
2) Another way to compute the joint probability of A and B is to start with the simple probability of A and multiply that by the probability of B given A.

Bayes Law is a way of deriving conditional probabilities
•  Bayes' law states that, given events A and B in the sample space Ω, the conditional probability of A given B is equal to the simple probability of A multiplied by the inverse conditional probability (i.e. the probability of B given A), divided by the simple probability of B:
   P(A|B) = P(A) P(B|A) / P(B)
It follows directly from the definition of conditional probabilities, simply using the multiplication/chain rule. It allows us to invert a conditional probability when we are in a situation where we need to know the probability of A given B, but our data gives us only the probability of B given A. Provided that we can derive the simple probabilities of A and B (maybe through marginalization), we can get the probabilities that we need.

Quiz 2 (only one answer is correct)
We know P(Sym|Dis) = 0.9: given the disease, the patient has certain symptoms.
We would like to know whether, given certain symptoms, we can conclude that the patient has the disease: P(Dis|Sym).

Quiz 2: Solutions
1.  The probability is 0.1: incorrect. [We cannot compute P(Dis|Sym) from P(Sym|Dis) without additional information.]
2.  The probability is 0.9: incorrect. [We cannot compute P(Dis|Sym) from P(Sym|Dis) without additional information.]
3.  Nothing: correct. [We cannot compute P(Dis|Sym) from P(Sym|Dis) without additional information.]

Break down
•  P(Sym|Dis) = 0.9 → P(B|A) = 0.9
•  P(Dis|Sym) = ? → P(A|B) = ?
•  Bayes:
•  P(A|B) = P(A) P(B|A) / P(B)
•  P(A) = ?
•  P(B) = ?
We could invert the formula via Bayes' law, but…
We need additional info, i.e. P(A) and P(B).
We use marginalization / the Law of Total Probability to derive P(A) and P(B), if we know the joint probabilities of the 2 events.
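A sketch of the whole chain (marginalization, then Bayes' law) for the disease/symptom quiz; the joint probabilities below are invented so that P(Sym|Dis) = 0.9 holds, purely to make the arithmetic concrete:

```python
# Hypothetical joint probabilities P(disease, symptom); in the quiz we only know
# P(Sym | Dis) = 0.9, which by itself is not enough to get P(Dis | Sym).
joint = {
    ("dis", "sym"): 0.009,
    ("dis", "no sym"): 0.001,
    ("no dis", "sym"): 0.099,
    ("no dis", "no sym"): 0.891,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9

# Marginalization / Law of Total Probability gives the simple probabilities.
p_dis = joint[("dis", "sym")] + joint[("dis", "no sym")]        # P(A)   = 0.01
p_sym = joint[("dis", "sym")] + joint[("no dis", "sym")]        # P(B)   = 0.108
p_sym_given_dis = joint[("dis", "sym")] / p_dis                  # P(B|A) ≈ 0.9, as in the quiz

# Bayes' law inverts the conditional: P(A|B) = P(A) * P(B|A) / P(B).
p_dis_given_sym = p_dis * p_sym_given_dis / p_sym
print(round(p_dis_given_sym, 3))  # ≈ 0.083: far from 0.9, which is why "nothing" was the correct answer
```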
The end