Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Basic Probability - Part 1 Kwang In Kim A. I. Lab Terms The set of all possible subsets of set S is called power set S and denoted as 2S A subset of S containing single element AS is called singleton (singleton {A} and element A are distinct!) When S consists of infinitely many element, size of S can be classified into Countable, if correspondence can be made for each element of S with distinct natural number Uncountable, if not countable Contents Acting under uncertainty Basic Idea of probability Tools for Probability Probability axioms σ-algebra Basic notation probability distribution, random variables Acting under Uncertainty Causes Consequences Logic does not fit everywhere Laziness: There may be too many cases and their causal relations Ignorance: It may be practically impossible to obtain all information necessary for conclusion Acting under Uncertainty (Cont’d) Examples Dental Diagnosis If symptom(toothache) then disease(cavity) If symptom(toothache) then disease(cavity) or disease(abscess) or … No wrong; But it will be difficult to consider all Coin tossing experiment: If I toss a coin, will head appear? Wrong; Many other reasons exist Too many factors to consider: coin weight, temperature, tossing direction, speed, etc. It is sometimes required to do reasoning even under uncertainty (with incomplete knowledge) Probability Probability provides a way of summarizing the uncertainty (or randomness) represents degree of our belief / knowledge provides tool for manipulating this Example If symptom(toothache) then disease(cavity) with 80% chance (or 0.8 probability) “Head appeared with 60/100 relative frequency for this coin” Axioms of Probability As did many great scientific discoveries, probability theory did not first started in axiomatic forms It has discovered however, that some desirable properties cannot be satisfied simultaneously Probability axioms are results of moderating between desiderata and provide background on which facilities for reasoning and analysis developed Actual reasoning with probability (assigning, estimating, manipulating probabilities) will be treated in detail in last of this course Coin Tossing Example Suppose you toss a coin and are interested in whether it shows head (H) or tail (T) Because of randomness, you do not know which outcome will come in advance However, you know the set of all possible outcomes is {H,T} Suppose your belief indicates H and T are equally probable. Then, in probability language we say the probability that neither H nor T appears is 0 H appears is ½ T appears is ½ either H or T appears is 1 Note: the above are all possible subsets of {H,T}: {H} {T} {H,T} Coin Tossing Example (Cont’d) Suppose you toss two coins and observe their faces The set of all possible outcomes is {HH, HT, TH, TT} Assuming H/T are equally probable for both coins, the probability of No H/T is 0 each of HH, HT, TH, TT is ¼ only one H is ½ at least one H is ¾ anyone of above is 1 Again, the above are possible subsets of {HH, HT, TH, TT} {HH},{HT},{TH},{TT} {TH, HT} {TH, HT, HH} {HH, HT, TH, TT} each of which is subset of power set 2{HH, HT, TH, TT}: {, {HH},{HT},{TH},{TT}, {HH, HT}, {HH, TH}, {HH, TT}, {HT, TH}, {HT,TT},{TH,TT}, {HH, HT, TH}, {HH, HT, TT}, {HH, TH, TT}, {HT, TH, TT}, {HH, HT, TH, TT}} Basic Ingredients Suppose you are performing an experiment Sample space S The set of all possible outcomes of an experiment E.g., {HH, HT, TH, TT} in two-coin tossing experiments Event Specification of state of world or question that can be answered “yes” or “no” E.g., Event (A) is subset of sample space (S): AS or element of power set A2S Probability (measure) is a function defined on events “TH observed?” “At least one H observed?” Real Interval Example Suppose you have real interval [0,1] and choose at random (uniformly) a number within this interval Then Intuitively, the probability that the chosen number is length of [0,1] is 1 lengths of intervals [0,0],[0.7,0.7], and [0.95,0.95] are all 0 lengths of intervals [0.2,0.3] and [0.4,0.6] are 0.1 and 0.2, respectively lengths of intervals [0.2,0.5] and [0.4,0.6] are 0.3 and 0.2, respectively within [0,1] is 1 outside [0,1] is 0 within [0.7,0.7] is 0 within [0.5, 0.7] is 0.2 outside [0.5, 0.7] is 0.8 in the union of [0.2,0.3] and [0.4,0.6] is 0.3 in the intersection of [0.2,0.5] and [0.4,0.6] is 0.1 Length of subinterval used as probability measure Tentative Axioms of Probability Given sample space S it might be desirable if we can define function P (called probability) such that P()=0 0<P(A)<1 for all A2S P(S)=1 P(i(Ai))=i(P(Ai)) for all disjoint events (AiAj= if i≠j and Ai 2S for all different i): additivity P(AC)=1-P(A) P(AB)=P(A)+P(B)-P(AB) Anomaly in Uncountable Set Previous plan is OK when S is finite or countably infinite (discrete) For uncountably infinite S additivity makes nonsense Union of infinite events with probability 0 can make event of probability one; This makes analysis tough E.g., for [0,1] as S Each singletons is of probability 0 Uncountable union of singletons constitute S which is of probability 1 Tentative Axioms of Probability, Modified Given a sample space S we define a function P such that 0<P(A)<1 for all A2S P(S)=1 P(i(Ai))=i(P(Ai)) for all countable disjoint collection {Ai}: countable additivity P(AC)=1-P(A) P(AB)=P(A)+P(B)-P(AB) Would this then, be enough? Anomaly in Uncountable Set (Cont’d) Unfortunately, it has shown that no useful probability measure P can be defined for 2[0,1] (and other many interesting spaces) This is intrinsic property of sample space [0,1] (and other many interesting spaces!) We restrict our attention on an interesting collection subsets of S should be rich enough to include interesting events should not be too rich so that probability can be defined It might be desirable if satisfies if A then AC if countable collection {Ai} then i(Ai) S if countable collection {Ai} then i(Ai) Such a collection is called a σ-algebra Examples of σ-Algebra For countably large sample space S 2S is default σ-algebra For R (uncountable set) {, R}: trivial to include any interesting events 2R : too rich to define any proper P Borel σ-algebra σ-algebra generated* by all open intervals ((a,b)R) includes virtually all interesting events in R (can be extended to RN ) default σ-algebra in R, RN, etc. * Smallest σ-algebra including all open intervals Axioms of Probability Probability is function P defined on σalgebra on sample space S such that 0<P(A)<1 for all A P(S)=1 P(i(Ai))=i(P(Ai)) for all countable events (AiAj= if i≠j and Ai for all i) Following properties follow immediately P()=0 P(Ac)=1-P(A) P(AB)=P(A)+P(B)-P(AB) How do we describe probability?: Probability Distribution Function For finite or countable S, prescribe P({x}) for all xS (table) p(x)=P({x}) is called the probability mass function For S=R? For any AR, define P(A)=length(A)=xAdx. Is this a probability? Let f(x) be any nonnegative function defined on R such that xRf(x)dx=1. Define P(A)= xAf(x)dx. Is this a probability? Such f is called the probability density function (pdf) Example: f(x)=exp(x2)/(2)½ the Gaussian or normal density Similar results hold for RN Probability mass functions and pdfs are called probability distribution functions Random Variables Suppose a probability P is defined on S Sometimes we are interested in defining a function X from S to another space N and doing probabilistic reasoning in N In this case, probability on N can be induced from that of S E.g., when you toss three coins you might be interested in only the number of heads (rather than exact outcome) S: {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} N: {0,1,2,3} X:S N summarizes events in S Probability of X=2 can be induced from P: P(X-1(2))=P ({HHT,HTH,THH})=3/8 Such function X is called a random variable Random Variables (Cont’d) Suppose a probability P is defined on S For a function X:SN to be a random variable it should induce a probability G on N based on P: For each event BN, X-1(B)S and X-1(BC)S For events B,C N, X-1(BC)S, etc. Inverse image of σ-algebra on N should be included in the σ-algebra on S Probability distribution functions can be induced in N In practice, virtually any function you can think of satisfies this condition Often, output space N is R: X is called a random variable RN: X is called a random vector R: X is called a random process References Part of this slide comes from Dr. Jahwan Kim’s Lecture slide “Probability” Introductory textbook Textbooks for measure theoretic probability S. Ross, A first course in probability, 6th ed. P. Billingsley, Probability and Measure, 3rd ed. D. Pollard, User’s Guide to Measure Theoretic Probability Reference book E. Jaynes, Probability Theory: The Logic of Science