Introduction to Probability Theory
These notes were prepared using material from “Artificial Intelligence: A Modern Approach” by Stuart
Russell and Peter Norvig, and “Mathematical Statistics with Applications” by Dennis D. Wackerly et al.
Probability can be thought of as a measure of one’s belief in the occurrence of a future event. Some
events cannot be predicted with certainty, but the relative frequency with which they occur in a long
series of trials is quite stable. This relative frequency is commonly used as a measure of belief in the
outcome of a single event.
Example. Estimate the probability of a head if in 10000 tosses of a coin there were 7500 heads and 2500
tails. We assign the probability 7500/10000 = 0.75 to a head for this coin.
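The relative-frequency estimate can be sketched in a few lines of Python. The function name `relative_frequency` is an illustrative choice, not something from the source textbooks; the counts are the 7500 heads and 2500 tails of the example.

```python
from fractions import Fraction


def relative_frequency(event_count, other_count):
    """Estimate P(event) as the fraction of trials in which the event occurred."""
    total = event_count + other_count
    if total == 0:
        raise ValueError("at least one trial is required")
    return Fraction(event_count, total)


# Counts from the coin example: 7500 heads, 2500 tails.
p_head = relative_frequency(7500, 2500)
print(p_head, float(p_head))
```

Using `Fraction` keeps the estimate exact; converting to `float` gives the decimal form used in the text.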
An experiment is the process by which an observation is made. For example, a single toss of a coin is an
experiment. Each experiment may result in one or more outcomes. A simple event cannot be
decomposed and results in one outcome. It corresponds to one sample point. A compound event may
result in more than one outcome.
Example. A single toss of a balanced die can result in 1, 2, 3, 4, 5, 6. An event “observe 1” is a simple
event, but an event “observe odd number” is compound consisting of three different outcomes: 1, 3, 5.
The sample space associated with an experiment is the set consisting of all possible sample points. For
example, in a toss of a die, the sample space consists of the simple events with outcomes 1, 2, 3, 4, 5, 6.
Example. What is the sample space associated with the experiment of tossing two dice? A possible
outcome of such an experiment is “observe 1 on the top of the first die and observe 3 on the top of the
second die”. What is the size of the sample space for this experiment? Each outcome is a tuple (x, y),
and there are 6 × 6 = 36 possible tuples in this experiment.
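As a quick check of the count, the two-dice sample space can be enumerated directly. This is a minimal sketch; the variable names are illustrative.

```python
from itertools import product

faces = range(1, 7)
# Each outcome is a tuple (x, y): x is the first die, y is the second.
sample_space = list(product(faces, repeat=2))

print(len(sample_space))
print((1, 3) in sample_space)
```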
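Probability is a measure: a number P(X), called the probability of an event X, satisfying three axioms:
1) 0 ≤ P(X = xi) ≤ 1 for every value xi
2) Σi P(X = xi) = 1, where the sum runs over all possible values xi
3) If x1 and x2 are mutually exclusive (or disjoint) events, then P(x1 ∪ x2) = P(x1) + P(x2)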
A probability model consists of a sample space of mutually exclusive possible events together with a
probability measure for each outcome.
Example. Consider an experiment of tossing a die. Outcomes 1, 2, 3, 4, 5, 6 are mutually exclusive.
Assuming the die is balanced, we assign a probability of 1/6 to each outcome. The probabilities add up
to 1 = 1/6+1/6+1/6+1/6+1/6+1/6. To calculate the probability of the compound event “observe an odd
number”, we add the probabilities of the simple events that are included in the compound event:
P(X=odd) = P(X=1) + P(X=3) + P(X=5) = 1/6 + 1/6 + 1/6 = 3/6 = ½
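The die model above can be written down as a small table of probabilities. The sketch below assumes a balanced die, and `event_probability` is a hypothetical helper name that adds up the simple events making up a compound event.

```python
from fractions import Fraction

# Probability model for a balanced die: each simple event gets 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}


def event_probability(model, outcomes):
    """Add the probabilities of the simple events in a compound event."""
    return sum((model[o] for o in outcomes), Fraction(0))


assert sum(die.values()) == 1  # the probabilities add up to 1
p_odd = event_probability(die, {1, 3, 5})
print(p_odd)
```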
We denote an event by a variable using uppercase letters. Variables in probability theory are called
random variables. The set of all values a random variable can take on is called its domain; we denote the
values of a random variable by lowercase letters. If a random variable can take on only a finite or
countably infinite number of distinct values, then it is called discrete; otherwise, it is called continuous. The
probability distribution for a discrete variable X can be represented by a formula, table or a graph:
P(X=x).
Example. Let Weather be a random variable with possible values sunny, rain, cloudy, snow. We use the
abbreviation P(Weather) = <0.6, 0.1, 0.29, 0.01>, assuming a predefined ordering <sunny, rain, cloudy, snow>.
In other words, P is a vector of numbers that defines a probability distribution.
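The vector notation maps naturally onto a lookup table. The following sketch pairs the predefined ordering with the numbers from the example and checks that the result is a valid distribution; `is_distribution` is an illustrative helper, not part of the source notes.

```python
import math

values = ("sunny", "rain", "cloudy", "snow")  # the predefined ordering
probabilities = (0.6, 0.1, 0.29, 0.01)

weather = dict(zip(values, probabilities))


def is_distribution(dist, tol=1e-9):
    """Check that every entry lies in [0, 1] and that the entries sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in dist.values())
    return in_range and math.isclose(sum(dist.values()), 1.0, abs_tol=tol)


print(weather["cloudy"])
print(is_distribution(weather))
```

The tolerance is there because floating-point sums are rarely exactly 1.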
Conditional Probability and Independence of Events.
The probability of an event sometimes depends on whether we know that other events have occurred. For
example, the probability of “1” in a toss of a die is 1/6, but if we know that an odd number has come up, then
P(1 | odd) = 1/3 (there are three simple events that are odd). P(1) = 1/6 is called the unconditional or prior
probability and P(1 | odd) is called the conditional or posterior probability.
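The conditional probability of an event A, given that an event B has occurred, is equal to
P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0
Example. In a toss of a balanced die:
P(1 | odd) = P(1 ∩ odd) / P(odd) = (1/6) / (1/2) = 1/3
(The intersection of “1” and “odd” is “1”.)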
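The die example can be reproduced by treating events as sets of outcomes, so that intersection is the set operator `&`. This is a sketch under the balanced-die assumption; `prob` and `conditional` are illustrative names.

```python
from fractions import Fraction

die = {face: Fraction(1, 6) for face in range(1, 7)}


def prob(model, event):
    """Probability of an event given as a set of outcomes."""
    return sum((model[o] for o in event), Fraction(0))


def conditional(model, a, b):
    """P(A | B) = P(A ∩ B) / P(B); events are sets of outcomes."""
    p_b = prob(model, b)
    if p_b == 0:
        raise ZeroDivisionError("P(B) must be positive")
    return prob(model, a & b) / p_b


one = {1}
odd = {1, 3, 5}
print(prob(die, one), conditional(die, one, odd))
```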
If the occurrence of an event A does not depend on the occurrence of an event B, then we say that A
and B are independent, and
P(A | B) = P(A)
P(B | A) = P(B)
P(A ∩ B) = P(A)P(B)
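Independence can be tested numerically by comparing P(A ∩ B) with P(A)P(B). The sketch below uses the two-dice sample space with illustrative events chosen for this example: "first die is odd" and "second die shows 6" come out independent, while "first die shows 1" and "first die is odd" do not.

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes


def prob(event):
    """Probability of an event given as a predicate over outcomes (x, y)."""
    favourable = sum(1 for outcome in space if event(outcome))
    return Fraction(favourable, len(space))


def independent(a, b):
    """True when P(A ∩ B) = P(A) P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)


def first_odd(o):
    return o[0] % 2 == 1


def second_six(o):
    return o[1] == 6


def first_one(o):
    return o[0] == 1


print(independent(first_odd, second_six))
print(independent(first_one, first_odd))
```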
The multiplicative law of probability (the product rule)
P(A ∩ B) = P(A)P(B | A)
In general,
P(A1 ∩ A2 ∩ … ∩ Ak) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2) … P(Ak | A1 ∩ A2 ∩ … ∩ Ak-1)
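To see the general product rule at work, consider a hypothetical urn (not from the source notes) holding 3 red and 2 blue balls, from which three balls are drawn without replacement. The sketch compares the chained conditional probabilities for "all three are red" with a direct count over all ordered draws.

```python
from fractions import Fraction
from itertools import permutations

balls = ["r1", "r2", "r3", "b1", "b2"]  # hypothetical urn: 3 red, 2 blue

# Product rule: P(R1 ∩ R2 ∩ R3) = P(R1) P(R2 | R1) P(R3 | R1 ∩ R2)
chain = Fraction(3, 5) * Fraction(2, 4) * Fraction(1, 3)

# Direct count over all equally likely ordered draws of three balls.
draws = list(permutations(balls, 3))
all_red = [d for d in draws if all(b.startswith("r") for b in d)]
direct = Fraction(len(all_red), len(draws))

print(chain, direct)
```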
The additive law of probability
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
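The additive law can be checked on a single balanced die. The events below ("odd" and "at most 3") are illustrative choices; they overlap in {1, 3}, which is why the intersection term has to be subtracted.

```python
from fractions import Fraction


def prob(event):
    """Probability of a set of faces on a balanced six-sided die."""
    return Fraction(len(event), 6)


odd = {1, 3, 5}
at_most_three = {1, 2, 3}

lhs = prob(odd | at_most_three)
rhs = prob(odd) + prob(at_most_three) - prob(odd & at_most_three)
print(lhs, rhs)
```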
The intersection of two or more events is frequently of interest to an experimenter. Let Y1 and Y2 be
discrete random variables. The joint probability distribution for Y1 and Y2 is given by
P(Y1=y1, Y2=y2) = p(y1, y2)
p(y1, y2) ≥ 0 for all y1, y2
P(Y1=y1, Y2=y2, … , Yk=yk) = P(Y1=y1) P(Y2=y2 | Y1=y1) P(Y3=y3 | Y1=y1, Y2=y2) … P(Yk=yk | Y1=y1, Y2=y2, … , Yk-1=yk-1)
The next example demonstrates how the joint probability distribution for two variables, Province and
Sector, is estimated from raw data. Source: http://www.jmc2007compendium.com/V2-ATAPE-P-7.php
A. Data Matrix (M1). The rows contain the geographical areas and the columns contain the economic sectors.
B. Joint Probability Table (M2). Each cell in M2 is computed by dividing the corresponding cell in M1 by the
grand total.
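The step from counts to joint probabilities can be sketched as follows. The counts below are hypothetical placeholders, not the data from the cited source; only the procedure (divide each cell by the grand total) follows the text.

```python
# Hypothetical counts (NOT the source data): rows are areas, columns are sectors.
counts = {
    "area_a": {"agri": 30, "industry": 10, "service": 20},
    "area_b": {"agri": 15, "industry": 5, "service": 20},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Joint probability table: each cell divided by the grand total.
joint = {
    area: {sector: n / grand_total for sector, n in row.items()}
    for area, row in counts.items()
}

print(grand_total)
print(joint["area_a"]["agri"])
```

By construction the cells of `joint` sum to 1, as a joint distribution must.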
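The marginal probability is defined as
P(Y1=y1) = Σy2 P(Y1=y1, Y2=y2)   (marginalization)
= Σy2 P(Y1=y1 | Y2=y2) P(Y2=y2)   (conditioning)
where both sums run over all values y2 of Y2.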
Example. Use the joint probability table for Province and Sector to answer the following
questions.
1. Find the marginal probability P(Province = isabela)
P(Province=isabela) = P(Province=isabela, Sector=agri) + P(Province=isabela, Sector=industry)
+ P(Province=isabela, Sector=service) + P(Province=isabela, Sector=undefined)
= 0.187 + 0.049 + 0.188 + 0.059 ≈ 0.482
2. Find the marginal probability for each value of Sector
P(Sector=agri) = P(Sector=agri, Province=batanes) + P(Sector=agri, Province=cagayan)
+ P(Sector=agri, Province=isabela) + P(Sector=agri, Province=vizcaya)
+ P(Sector=agri, Province=quirino) = 0.003 + 0.156 + 0.187 + 0.064 + 0.022 ≈ 0.433
Similarly,
P(Sector=industry) = 0.001 + 0.023 + 0.049 + 0.022 + 0.005 = 0.100
P(Sector=service) = 0.354
P(Sector=undefined) = 0.113
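3. Find the conditional probability P(Province=isabela | Sector=x) for all values x of
Sector.
P(Province=isabela | Sector=agri) = 0.187 / 0.433 ≈ 0.432
P(Province=isabela | Sector=industry) = 0.049 / 0.100 = 0.490
P(Province=isabela | Sector=service) = 0.188 / 0.354 ≈ 0.531
P(Province=isabela | Sector=undefined) = 0.059 / 0.113 ≈ 0.522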
4. Calculate the marginal probability P(Province=isabela) using conditioning
P(Province=isabela) = 0.432 × 0.433 + 0.490 × 0.100 + 0.531 × 0.354 + 0.522 × 0.113 ≈ 0.483,
which agrees with the result of question 1 up to rounding.
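The four questions above can be replayed in code using only the numbers printed in the text; the full joint table is not reproduced here, so the dictionaries hold just the Isabela row and the Sector marginals. Note that the printed cells sum to 0.483, which differs in the last digit from the 0.482 stated in question 1, presumably because the table entries are rounded.

```python
import math

# Joint probabilities P(Province=isabela, Sector=x) as printed in the text.
joint_isabela = {"agri": 0.187, "industry": 0.049,
                 "service": 0.188, "undefined": 0.059}
# Marginal probabilities P(Sector=x) as printed in the text.
p_sector = {"agri": 0.433, "industry": 0.100,
            "service": 0.354, "undefined": 0.113}

# Marginalization: sum the joint probabilities over all sectors.
by_marginalization = sum(joint_isabela.values())

# Conditioning: P(isabela | x) = P(isabela, x) / P(x), then weight by P(x).
conditional = {x: joint_isabela[x] / p_sector[x] for x in p_sector}
by_conditioning = sum(conditional[x] * p_sector[x] for x in p_sector)

print(round(by_marginalization, 3))
print({x: round(p, 3) for x, p in conditional.items()})
print(round(by_conditioning, 3))
print(math.isclose(by_marginalization, by_conditioning))
```

Both routes give the same marginal because each product P(isabela | x) P(x) is just the joint probability P(isabela, x) again.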