BIOINF 2118
01-Introduction to Probability
2013-01-08, p.1 of 4
Probability and Statistics
Probability: Process → Data
“Given a process or mechanism,
after many repetitions what kinds of outcomes (data) can we expect?”
Statistics: Data → Process
“Given some data,
what can we say about the process or mechanism that gave rise to the data?”
Example: Diagnostic testing
A patient arrives in the clinic. The doctor suspects that the patient suffers a particular illness.
The true state, “healthy” or “sick”, is unknown; therefore the doctor orders a diagnostic test.
True state of patient = θ = “the process”. The result = X = “the data”.
Top arrow: probability: If we knew the process, then the probabilities of the result would be “known”
(at least roughly, from previous patients’ data).
Bottom arrow: statistics: After the test, we still do not know the true state of the patient,
but from the data X we now have better knowledge.
                      DATA
PROCESS          X=negative  X=positive  X=indeterminate  TOTAL
θ = “healthy”       0.95        0.03          0.02         1.00
θ = “sick”          0.03        0.95          0.02         1.00
TABLE 1: each row is a model; the collection of rows is a model family.
Here θ is the unknown “true state of nature”, and X, the test result, is an observation.
A test result X is generated by a process under a particular state of nature θ.
The sample space is the set of possible observations,
X = {negative, positive, indeterminate}.
The parameter space is the set of possible “states of nature”,
Θ = {healthy, sick}.
Each table entry is a conditional probability,
Pr(X | θ).
The “healthy” row is a probability distribution;
the “sick” row is another probability distribution.
For example, if θ = “healthy”, the probability distribution is:
Pr(X=negative) = 0.95, Pr(X=positive) = 0.03, Pr(X=indeterminate) = 0.02.
The pair of rows is a model family (or a model).
Each column is a likelihood function, L. (We write L : Θ → ℝ⁺.)
For example if X=negative is observed, then
L(θ = “healthy”) = 0.95, L(θ = “sick”) = 0.03.
In the context of a likelihood, these numbers are NOT probabilities. (Columns don’t add to one.)
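The row/column distinction can be checked numerically. The following is a minimal sketch, not part of the original notes: it stores Table 1 as a Python dictionary (the notes refer to R; Python and the names `model`, `likelihood`, `row_totals` and `column_totals` are assumptions made for this illustration). Each row sums to one, while the columns do not.

```python
# Table 1 as a nested dict: model[theta][x] = Pr(X = x | theta).
model = {
    "healthy": {"negative": 0.95, "positive": 0.03, "indeterminate": 0.02},
    "sick":    {"negative": 0.03, "positive": 0.95, "indeterminate": 0.02},
}

def likelihood(x):
    """Return the column of Table 1 for observation x, as a function of theta."""
    return {theta: dist[x] for theta, dist in model.items()}

# Each row is a probability distribution over X, so it sums to 1.
row_totals = {theta: round(sum(dist.values()), 10) for theta, dist in model.items()}

# Each column is a likelihood function over theta; its values need not sum to 1.
column_totals = {x: round(sum(likelihood(x).values()), 10) for x in model["healthy"]}

print(row_totals)
print(column_totals)
print(likelihood("negative"))
```

Here `likelihood("negative")` returns the pair of values 0.95 and 0.03 quoted above for X=negative.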
Now suppose that the prevalence of the disease is 10%. Prevalence = Pr(θ = “sick”).
The following table is the joint distribution of θ and X.
                  θ = “healthy”   θ = “sick”   TOTAL
X=negative            0.855          0.003      0.858
X=positive            0.027          0.095      0.122
X=indeterminate       0.018          0.002      0.020
TOTAL                 0.90           0.10       1.00
TABLE 2: joint probabilities for each combination
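Each joint probability is the prevalence-based prior times the conditional probability from Table 1, Pr(θ, X) = Pr(θ) Pr(X | θ). A short sketch of that arithmetic follows; the language (Python) and the variable names are assumptions for illustration, while the numbers come from Table 1 and the 10% prevalence.

```python
# Prior from the stated prevalence: Pr(theta = "sick") = 0.10.
prior = {"healthy": 0.90, "sick": 0.10}

# Table 1: model[theta][x] = Pr(X = x | theta).
model = {
    "healthy": {"negative": 0.95, "positive": 0.03, "indeterminate": 0.02},
    "sick":    {"negative": 0.03, "positive": 0.95, "indeterminate": 0.02},
}

# Joint distribution: Pr(theta, x) = Pr(theta) * Pr(x | theta).
joint = {
    (theta, x): prior[theta] * model[theta][x]
    for theta in prior
    for x in model[theta]
}

# Row totals of the joint table: the marginal distribution of X.
marginal_x = {
    x: sum(joint[(theta, x)] for theta in prior)
    for x in model["healthy"]
}

for x, p in marginal_x.items():
    print(x, round(joint[("healthy", x)], 3), round(joint[("sick", x)], 3), round(p, 3))
```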
Interpretations of Probability
• Frequency interpretation:
“Pr(X = 2 | θ = “sick”) = 0.95” means:
“If I do the test repeatedly on a large number of sick patients,
then in the long run roughly 95% of the test results will equal 2 (positive).”
• Subjective (Bayesian) interpretation, before data is observed:
Pr(θ = “sick”) = 0.10, which means:
“Given what I know now,
my current belief is that there’s a 10% chance that this patient is sick.”
Connection with decision-making:
This belief sometimes represents a willingness to gamble that the patient is sick,
if the payoff is above the ratio 9-to-1 (0.9/0.1), but not if it’s below 9-to-1.
• Subjective (Bayesian) interpretation, after data is observed:
                 X=negative (1)   X=positive (2)   X=indeterminate (3)
θ = “healthy”        0.9965           0.221              0.9
θ = “sick”           0.0035           0.779              0.1
TOTAL                1.0000           1.000              1.0
TABLE 3: conditional probabilities, given X
Pr(θ = “sick” | X = 2) = 0.779, which means:
“Given what I knew before, plus what I know now (the data),
my current belief is that there’s a 77.9% chance that this patient is sick.”
Table 3 combines the two types of probability: belief and frequency.
Now the gambling odds are 0.779/(1-0.779) = 3.52.
“Statistics” is assessing whether the patient is healthy or sick, after observing X.
We saw this above, in the form of the posterior probability, 0.779.
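The posterior probabilities in TABLE 3 follow from dividing each joint probability by the total for the observed X. The sketch below reproduces the 0.779 figure and the odds of about 3.52; it is an illustration only, and the function name `posterior` and the use of Python are assumptions rather than anything specified in the notes.

```python
prior = {"healthy": 0.90, "sick": 0.10}
model = {
    "healthy": {"negative": 0.95, "positive": 0.03, "indeterminate": 0.02},
    "sick":    {"negative": 0.03, "positive": 0.95, "indeterminate": 0.02},
}

def posterior(prior, model, x):
    """Pr(theta | X = x) = Pr(theta) Pr(x | theta) / sum over theta of the same product."""
    joint = {theta: prior[theta] * model[theta][x] for theta in prior}
    total = sum(joint.values())  # Pr(X = x)
    return {theta: p / total for theta, p in joint.items()}

post = posterior(prior, model, "positive")
odds_sick = post["sick"] / post["healthy"]  # odds in favour of "sick"

print(round(post["sick"], 3))   # 0.779
print(round(odds_sick, 2))      # 3.52
```

Calling `posterior(prior, model, "negative")` or `posterior(prior, model, "indeterminate")` gives the other two columns of the table.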
When the prevalence is not known, we have to use the frequency interpretation of probability,
using the models in TABLE 1. A great tool is the likelihood ratio, LR:
LR(X) = Pr(X | θ = “sick”) / Pr(X | θ = “healthy”).
LR(X=1) = 0.03/0.95 ≈ 1/32, LR(X=2) = 0.95/0.03 ≈ 32, LR(X=3) = 0.02/0.02 = 1.
Here we see that the observation X=negative lowers the probability of “sick”, and X=positive
raises it. Observing X=indeterminate does not provide any information, as reflected in LR=1.
For each value of X, we can see in what way the value of the probability changes, but we cannot
say what the final probability is because we do not know the initial probability.
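One way to see this dependence on the initial probability is the odds form of Bayes' rule: posterior odds = prior odds × LR. The sketch below applies it to a few prevalences. Only the 10% value appears in these notes; the other two are hypothetical values chosen for illustration, and the function names are likewise assumptions.

```python
model = {
    "healthy": {"negative": 0.95, "positive": 0.03, "indeterminate": 0.02},
    "sick":    {"negative": 0.03, "positive": 0.95, "indeterminate": 0.02},
}

def likelihood_ratio(x):
    """LR(x) = Pr(X = x | sick) / Pr(X = x | healthy)."""
    return model["sick"][x] / model["healthy"][x]

def posterior_sick(prevalence, x):
    """Posterior Pr(sick | x) via posterior odds = prior odds * LR."""
    prior_odds = prevalence / (1.0 - prevalence)
    posterior_odds = prior_odds * likelihood_ratio(x)
    return posterior_odds / (1.0 + posterior_odds)

# 0.10 is the prevalence used in the notes; 0.01 and 0.50 are hypothetical.
for prevalence in (0.01, 0.10, 0.50):
    print(prevalence, round(posterior_sick(prevalence, "positive"), 3))
```

The same likelihood ratio of about 32 leads to quite different final probabilities, which is why the LR alone describes the direction and strength of the change but not the final probability.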
Experiments
• An experiment is any process in which the outcome is uncertain.
• Examples: rolling a die, conducting a clinical trial, conducting a survey, getting married, ….
• The sample space X is the set of possible outcomes.
• Example: For our diagnostic test, X = {1, 2, 3}. For rolling a die, X = {1, 2, 3, 4, 5, 6}.
Sets and Subsets
• A sample space X is a set.
• An outcome is an element of the sample space, x ∈ X.
• An event is a subset of the sample space. For example, A = {2, 4, 6} is the event of rolling an even number with a die.
• An event A implies another event B if every outcome in A also belongs to B. This relation is denoted A ⊆ B, “A is a subset of B”.
• A parameter space Θ is a set.
• A hypothesis is a subset of the parameter space Θ.
Empty, Finite and Infinite Sets
• The empty set contains no outcomes. It is denoted by ∅. For all events A, ∅ ⊆ A.
• Sets may be finite or infinite. X = {1, 2, 3, 4, 5, 6} is finite.
• Infinite sets may be countably infinite or uncountably infinite. X = [0,1] is uncountable. X = {1, 2, 3, …} is countable (but infinite).
Union, Intersection, Complement
Concept                   Symbol      R function or value
union (either/or)         A ∪ B       union( )
intersection (both)       A ∩ B       intersect( )
complement (not)          A^c         setdiff( )
empty set, or null set    ∅ or { }    NULL, character(0)
set product               ×           expand.grid( )
subset                    ⊆           all(is.element( ))
element of                ∈           is.element( )
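For readers working outside R, the same operations have rough analogues in Python's built-in `set` type. The sketch below is an illustration only: the events `A` and `B` are hypothetical examples on the die sample space, and `itertools.product` stands in for `expand.grid( )`.

```python
from itertools import product

X = {1, 2, 3, 4, 5, 6}   # sample space for rolling a die
A = {2, 4, 6}            # event: even number
B = {4, 5, 6}            # hypothetical event: at least 4

union = A | B                 # either/or, like union( ) in R
intersection = A & B          # both, like intersect( )
complement_A = X - A          # not A, like setdiff(X, A)
empty = set()                 # the empty set
pairs = set(product(X, X))    # set product: all ordered pairs of two rolls
A_is_subset = A <= X          # subset test, like all(is.element(A, X))
two_in_A = 2 in A             # element-of test, like is.element(2, A)

print(sorted(union), sorted(intersection), sorted(complement_A))
print(len(pairs), A_is_subset, two_in_A)
```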
Disjoint Events
• A and B are disjoint if and only if A ∩ B = ∅.
• Events A_1, A_2, … are disjoint or mutually exclusive if, for every i ≠ j, A_i ∩ A_j = ∅.
Semi-formal definition of probability
A probability space is a sample space X, together with a mapping Pr from events in the sample
space to [0,1] (in mathematical notation, Pr: 2^X → [0,1]) that satisfies three axioms:
Axiom 1: For every event A ⊆ X, Pr(A) ≥ 0.
(To be technically correct: there may be very esoteric sets which cannot be assigned a probability.)
Axiom 2: Pr(X) = 1.
Axiom 3: For every “countable” sequence of disjoint events A_1, A_2, …,
Pr(A_1 ∪ A_2 ∪ …) = Pr(A_1) + Pr(A_2) + … .
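On a finite sample space the axioms can be checked directly. The sketch below assumes a fair die, so that Pr(A) is the number of outcomes in A divided by 6. The fairness assumption, the function name `pr`, and the example events are all introduced here for illustration and do not come from the notes. Exact fractions avoid rounding error.

```python
from fractions import Fraction

X = frozenset(range(1, 7))  # sample space for one roll of a die

def pr(event):
    """Probability of an event under the ASSUMED uniform (fair-die) model."""
    event = frozenset(event)
    if not event <= X:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(X))

even = {2, 4, 6}
low_odd = {1, 3}  # hypothetical event, disjoint from `even`

print(pr(even) >= 0)                                 # Axiom 1
print(pr(X) == 1)                                    # Axiom 2
print(even & low_odd == set())                       # the two events are disjoint
print(pr(even | low_odd) == pr(even) + pr(low_odd))  # Axiom 3, for two events
```

Additivity fails for events that overlap. For example, `pr({2, 4, 6} | {4, 5, 6})` is 2/3, not 1/2 + 1/2.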
Some probability theorems
.
.
.
.
Some formal definitions:
Given a parameter space Θ and a sample space X,
a model family indexed by Θ is a set of probability distributions
{Pr(· | θ) : θ ∈ Θ}.
When X is observed, the likelihood function is the function L : Θ → ℝ⁺
defined by L(θ) = Pr(X | θ).
(Later, we’ll modify this slightly for “continuous distributions”.)
STUDY CAREFULLY ALL NOTATION AND DEFINITIONS.
The likelihood function: a function on Θ, the parameter space.
The probability model: a distribution on X, the sample space.