Introduction to Probability Theory

These notes were prepared using material from “Artificial Intelligence: A Modern Approach” by Stuart Russell and Peter Norvig, and “Mathematical Statistics with Applications” by Dennis D. Wackerly et al.

Probability can be thought of as a measure of one’s belief in the occurrence of a future event. Some events cannot be predicted with certainty, but the relative frequency with which they occur in a long series of trials is quite stable. This relative frequency is commonly used as a measure of belief in the outcome of a single event.

Example. Estimate the probability of a head if in 10,000 tosses of a coin there were 7500 heads and 2500 tails. We assign the probability 7500/10000 = 0.75 to a head for this coin.

An experiment is the process by which an observation is made. For example, a single toss of a coin is an experiment. Each experiment may result in one or more outcomes. A simple event cannot be decomposed and results in one outcome; it corresponds to one sample point. A compound event may result in more than one outcome.

Example. A single toss of a balanced die can result in 1, 2, 3, 4, 5, or 6. The event “observe 1” is a simple event, but the event “observe an odd number” is compound, consisting of three different outcomes: 1, 3, 5.

The sample space associated with an experiment is the set consisting of all possible sample points. For example, in a toss of a die, the sample space consists of the simple events with outcomes 1, 2, 3, 4, 5, 6.

Example. What is the sample space associated with the experiment of tossing two dice? A possible outcome of this experiment is “observe 1 on the top of the first die and observe 3 on the top of the second die”. What is the size of the sample space for this experiment? Each outcome is a tuple (x, y), and there are 6 × 6 = 36 possible tuples in this experiment.
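The relative-frequency estimate and the two-dice sample space can be sketched in a few lines of Python. This is only an illustration: the helper name relative_frequency and the variable names are assumptions made for this example, and the total number of tosses is taken to be the sum of the head and tail counts from the coin example.

```python
from itertools import product


def relative_frequency(event_count, trials):
    """Estimate a probability as the fraction of trials in which the event occurred."""
    if trials <= 0:
        raise ValueError("trials must be a positive integer")
    return event_count / trials


# Coin example: counts of heads and tails observed in a long series of tosses.
heads, tails = 7500, 2500
p_head = relative_frequency(heads, heads + tails)

# Sample space for two dice: every ordered pair (x, y) of faces.
faces = range(1, 7)
two_dice = list(product(faces, repeat=2))

print(p_head)         # 0.75
print(len(two_dice))  # 36
```

Enumerating the pairs explicitly, rather than only computing 6 × 6, makes it easy to count compound events later, for example the pairs whose faces sum to 7.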
Probability is a measure: a number P(X), called the probability of an event X, satisfying three axioms:

1) $0 \le P(X = x_i) \le 1$ for every value $x_i$;
2) $\sum_i P(X = x_i) = 1$, where the sum runs over all sample points $x_i$;
3) if $x_1$ and $x_2$ are mutually exclusive (or disjoint) events, then $P(X = x_1 \lor X = x_2) = P(X = x_1) + P(X = x_2)$.

A probability model consists of a sample space of mutually exclusive possible events together with a probability measure for each outcome.

Example. Consider an experiment of tossing a die. Outcomes 1, 2, 3, 4, 5, 6 are mutually exclusive. Assuming the die is balanced, we assign a probability of 1/6 to each event. The probabilities add up to 1 = 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6. To calculate the probability of the compound event “observe an odd number”, we add the probabilities of the simple events that are included in the compound event:

P(X=odd) = P(X=1) + P(X=3) + P(X=5) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2

We denote an event by a variable using uppercase letters. Variables in probability theory are called random variables. The set of all values a random variable can take on is called its domain; we denote the values of a random variable by lowercase letters. If a random variable can take on only a finite or countably infinite number of distinct values, then it is called discrete; otherwise it is called continuous. The probability distribution for a discrete variable X can be represented by a formula, a table, or a graph giving P(X=x) for each value x.

Example. Let Weather be a random variable with possible values sunny, rain, cloudy, snow. We use the abbreviation P(Weather) = <0.6, 0.1, 0.29, 0.01>, assuming a predefined ordering <sunny, rain, cloudy, snow>. In other words, P is a vector of numbers that defines a probability distribution.

Conditional Probability and Independence of Events

The probability of an event sometimes depends on whether we know that other events have occurred. For example, the probability of “1” in a toss of a die is 1/6, but if we know that an odd number has fallen, then P(1 | odd) = 1/3 (there are three simple events that are odd, and “1” is one of them). P(1) = 1/6 is called the unconditional or prior probability, and P(1 | odd) is called the conditional or posterior probability.
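The die model, the Weather distribution, and the value P(1 | odd) = 1/3 can be checked with a short Python sketch. The function names check_model, prob, and restrict are hypothetical helpers introduced for this illustration, and exact fractions are used for the die only to avoid rounding noise.

```python
from fractions import Fraction
from math import isclose

# Probability model for a balanced die: each simple event gets 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Weather distribution with the ordering <sunny, rain, cloudy, snow>.
weather = dict(zip(["sunny", "rain", "cloudy", "snow"], [0.6, 0.1, 0.29, 0.01]))


def check_model(model):
    """Check that every probability lies in [0, 1] and that they sum to 1."""
    in_range = all(0 <= p <= 1 for p in model.values())
    return in_range and isclose(float(sum(model.values())), 1.0)


def prob(event, model):
    """Probability of a compound event: sum over its mutually exclusive outcomes."""
    return sum(model[outcome] for outcome in event)


def restrict(model, given):
    """Renormalize the model on the outcomes that are known to have occurred."""
    total = prob(given, model)
    if total == 0:
        raise ValueError("cannot condition on an event of probability zero")
    return {outcome: model[outcome] / total for outcome in given}


odd = {1, 3, 5}
p_odd = prob(odd, die)
p_one_given_odd = restrict(die, odd)[1]

print(check_model(die), check_model(weather))  # True True
print(p_odd)            # 1/2
print(p_one_given_odd)  # 1/3
```

Restricting the model to the odd outcomes and renormalizing mirrors the counting argument in the text: three equally likely outcomes remain, so each has probability 1/3.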
The conditional probability of an event A, given that an event B has occurred, is equal to

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)},$$

provided $P(B) > 0$.

Example. In a toss of a balanced die:

$$P(1 \mid \text{odd}) = \frac{P(1 \cap \text{odd})}{P(\text{odd})} = \frac{1/6}{1/2} = \frac{1}{3}$$

(The intersection of “1” and “odd” is “1”.)

If the occurrence of an event A does not depend on the occurrence of an event B, then we say that A and B are independent, and

$$P(A \mid B) = P(A)$$
$$P(B \mid A) = P(B)$$
$$P(A \cap B) = P(A)\,P(B)$$

The multiplicative law of probability (the product rule):

$$P(A \cap B) = P(A)\,P(B \mid A)$$

In general,

$$P(A_1 \cap A_2 \cap \dots \cap A_k) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_k \mid A_1 \cap A_2 \cap \dots \cap A_{k-1})$$

The additive law of probability:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

The intersection of two or more events is frequently of interest to an experimenter. Let Y1 and Y2 be discrete random variables. The joint probability distribution for Y1 and Y2 is given by

$$p(y_1, y_2) = P(Y_1 = y_1, Y_2 = y_2),$$

where $p(y_1, y_2) \ge 0$ for all $y_1, y_2$, and the values $p(y_1, y_2)$ sum to 1 over all pairs $(y_1, y_2)$.

The product rule extends to random variables:

$$P(Y_1 = y_1, Y_2 = y_2, \dots, Y_k = y_k) = P(Y_1 = y_1)\,P(Y_2 = y_2 \mid Y_1 = y_1)\,P(Y_3 = y_3 \mid Y_1 = y_1, Y_2 = y_2) \cdots P(Y_k = y_k \mid Y_1 = y_1, Y_2 = y_2, \dots, Y_{k-1} = y_{k-1})$$

The next example demonstrates how the joint probability distribution for two variables, Province and Sector, is estimated from raw data. Source: http://www.jmc2007compendium.com/V2-ATAPE-P-7.php

A. Data Matrix (M1). The rows contain the geographical areas and the columns contain the economic sectors.

B. Joint Probability Table (M2). Each cell in M2 is computed by dividing the corresponding cell in M1 by the grand total.

The marginal probability is defined as

$$P(Y_1 = y_1) = \sum_{y_2} P(Y_1 = y_1, Y_2 = y_2) \quad \text{(marginalization)}$$
$$\phantom{P(Y_1 = y_1)} = \sum_{y_2} P(Y_1 = y_1 \mid Y_2 = y_2)\,P(Y_2 = y_2) \quad \text{(conditioning)}$$

Example. Use the joint probability table for Province and Sector to answer the following questions.

1. Find the marginal probability P(Province=isabela).

P(Province=isabela) = P(Province=isabela, Sector=agri) + P(Province=isabela, Sector=industry) + P(Province=isabela, Sector=service) + P(Province=isabela, Sector=undefined) = 0.187 + 0.049 + 0.188 + 0.059 = 0.483

2. Find the marginal probability for each value of Sector.

P(Sector=agri) = P(Sector=agri, Province=batanes) + P(Sector=agri, Province=cagayan) + P(Sector=agri, Province=isabela) + P(Sector=agri, Province=vizcaya) + P(Sector=agri, Province=quirino) = 0.003 + 0.156 + 0.187 + 0.064 + 0.022 ≈ 0.433 (the three-decimal cell values shown sum to 0.432; 0.433 is the value used below, and with it the four Sector marginals sum to 1)

Similarly,
P(Sector=industry) = 0.001 + 0.023 + 0.049 + 0.022 + 0.005 = 0.100
P(Sector=service) = 0.354
P(Sector=undefined) = 0.113

3. Find the conditional probability P(Province=isabela | Sector=x) for all values x of Sector.

P(Province=isabela | Sector=agri) = 0.187 / 0.433 = 0.432
P(Province=isabela | Sector=industry) = 0.049 / 0.100 = 0.490
P(Province=isabela | Sector=service) = 0.188 / 0.354 = 0.531
P(Province=isabela | Sector=undefined) = 0.059 / 0.113 = 0.522

4. Calculate the marginal probability P(Province=isabela) using conditioning.

P(Province=isabela) = 0.432 × 0.433 + 0.490 × 0.100 + 0.531 × 0.354 + 0.522 × 0.113 = 0.483

This agrees with the value obtained by marginalization in question 1.
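The laws and the Province and Sector calculation can be reproduced with a small Python sketch. Two things here are assumptions made for illustration: the events A = {1, 2} and B = {1, 3, 5} on a balanced die do not come from the notes, and only the isabela cells and the Sector marginals quoted in the text are used, since the full table is not reproduced here. Because the cells are rounded to three decimals, results can differ from tabulated marginals in the last digit.

```python
from fractions import Fraction

# Part 1: independence and the additive law on a balanced die.
# A and B are hypothetical events chosen for this illustration.
die = {face: Fraction(1, 6) for face in range(1, 7)}


def prob(event):
    """Sum the probabilities of the outcomes in the event."""
    return sum(die[outcome] for outcome in event)


A = {1, 2}
B = {1, 3, 5}

# In Python, & is set intersection and | is set union.
independent = prob(A & B) == prob(A) * prob(B)
additive_law_holds = prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Part 2: marginalization and conditioning for Province=isabela.
# Joint probabilities P(Province=isabela, Sector=s) as quoted in the text.
joint_isabela = {"agri": 0.187, "industry": 0.049, "service": 0.188, "undefined": 0.059}
# Sector marginals P(Sector=s) as quoted in the text.
p_sector = {"agri": 0.433, "industry": 0.100, "service": 0.354, "undefined": 0.113}

# Marginalization: sum the joint probabilities over all sectors.
p_isabela = sum(joint_isabela.values())

# Conditional probabilities P(Province=isabela | Sector=s).
p_isabela_given = {s: joint_isabela[s] / p_sector[s] for s in p_sector}

# Conditioning: weight each conditional probability by the sector marginal.
p_isabela_by_conditioning = sum(p_isabela_given[s] * p_sector[s] for s in p_sector)

print(independent, additive_law_holds)  # True True
print(round(p_isabela, 3))
print({s: round(p, 3) for s, p in p_isabela_given.items()})
print(round(p_isabela_by_conditioning, 3))
```

Marginalization and conditioning return the same number because each conditional probability was obtained by dividing a joint cell by the sector marginal that it is multiplied by again in the final sum.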