Statistical decision making
Statistical analysis of data can inform us on making
important decisions
Statistical data always has an element of uncertainty
The fundamental question is what is our best guess of
what the samples we have tell us about the complete set
of data.
Frequentist statistics
Frequency interpretation of probability: any given
experiment can be considered as one of an infinite
sequence of possible repetitions of the same
experiment, each capable of producing statistically
independent results.
The frequentist approach to drawing conclusions from
data is effectively to require that the correct conclusion
be drawn with a given (high) probability among this
notional set of repetitions.
Example:
If a fair coin is flipped sufficiently many times, then
the inference that it is a fair coin is likely to be drawn.
How many flips is enough?
If an unfair coin is flipped sufficiently many times,
then the inference that it is not a fair coin is likely to be
drawn. How many flips is enough?
Sample mean and population mean
X1, X2, …, Xn: random numerical measurements
E.g., Xn results from the nth coin flip, with value 0 or 1.
m = (X1 + X2 + … + Xn)/n: sample mean
μ = true expected value of X.
The law of large numbers implies that the sample mean
converges to the true mean as n grows; the central limit
theorem describes how the deviations are distributed.
If n is large then with high probability, the sample
mean is close to the true mean.
How large is large? How close is close?
Central limit theorem
Coin flips: binomial distribution
Large N: distribution is approximately normal
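Normal distribution with mean \mu and standard deviation \sigma:
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}
[Figure: some normal distributions]
The probability that the variable takes a value between
a and b is the area under the graph between a and b.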
Confidence interval
The more samples one takes, the more likely it is that
the sample mean is close to the true population mean.
One would like a relationship between N and the
probability that |m − μ| is smaller than a given fixed value.
Cost benefit: there is often a cost to gathering data but
often a bigger cost associated with an incorrect
conclusion derived from insufficient data.
Error: how precise do you need to be? versus
Probability of error: what risk are you willing to take
that you are incorrect?
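One hedged way to relate N, the error, and the probability is the normal approximation to the sample mean of coin flips. The sketch below assumes the worst-case variance p(1 − p) ≤ 1/4; the function name `flips_needed` is illustrative and not from the lecture.

```python
import math
from statistics import NormalDist

def flips_needed(err, confidence):
    """Approximate number of flips so that |m - mu| < err with the given probability.

    Assumes the normal approximation and the worst-case variance p(1-p) <= 1/4.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z / (2 * err)) ** 2)

for err in (0.05, 0.01):
    print(f"error {err}: about {flips_needed(err, 0.95)} flips for 95% confidence")
```

Under these assumptions, cutting the error by a factor of 5 costs about 25 times as many flips, which is the cost-benefit trade-off in concrete terms.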
Confidence interval example
You want to know whether a coin is (likely to be) fair.
You flip it 100 times. You observe that it comes up
heads 60 times.
Your question: what is the probability that it would
come up heads 60 times (or more) if the coin is a fair
coin?
Put otherwise: how likely is it that a fair coin will come
up heads 60 or more times out of 100?
Plot of probabilities of a given number
of heads out of 100 flips of a fair coin:
100th row of Pascal’s triangle
The probability of 60 or more
heads from 100 coin flips is
about 3 percent.
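The figure of about 3 percent can be checked by summing the binomial probabilities directly. This is a minimal sketch; the helper name `prob_at_least` is hypothetical.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more heads in n independent flips with heads probability p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

tail = prob_at_least(60, 100)
print(f"P(60 or more heads out of 100) = {tail:.4f}")
```

The binomial coefficients `comb(100, j)` are the entries of the row of Pascal's triangle mentioned above.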
Confidence intervals
Hypothesis: the expected value of h, the proportion of
trials on which the coin should land on heads in the
long run, will be within a certain error of the sample
average, with high probability.
E: the experiment of flipping the coin N times
H: the observed number of heads.
Desired: if E is repeated infinitely often, then the sample
mean m = H/N will be within err of the true mean h a high
proportion P of the time.
We are 100P percent confident that the true mean lies
in the interval (H/N − err, H/N + err).
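As an illustration, the interval can be sketched for the earlier observation of 60 heads in 100 flips. The code uses the normal approximation with the observed proportion in the variance (often called a Wald interval); that choice is an assumption here, not necessarily the construction used in the lecture.

```python
import math
from statistics import NormalDist

def wald_interval(heads, flips, confidence=0.95):
    """Normal-approximation confidence interval for the heads probability."""
    m = heads / flips                                # sample mean H/N
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided critical value
    err = z * math.sqrt(m * (1 - m) / flips)         # half-width of the interval
    return m - err, m + err

low, high = wald_interval(60, 100)
print(f"95% interval for h: ({low:.3f}, {high:.3f})")
```

With these numbers the lower end lies just above 0.5, which agrees with the small tail probability found for 60 or more heads from a fair coin.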
Central tendency cont.
Coin flips: can compute the binomial
distribution explicitly and the probabilities
associated with various outcomes.
The confidence interval derives from
adding the probabilities of the various
outcomes corresponding to that interval
and excluding the remaining probabilities.
The precise statement is a subtle reflection
of the approximability of the binomial
curve by a Gaussian curve.
Fair coin example
Example: Suppose that a coin has an unknown
probability r of landing on heads.
Bayesian approach:
Posterior probability: the conditional probability of the
causes, given the observed effects.
Example: probability that a coin is fair, given that it has
landed on heads on some observed proportion of
tosses.
Prior: distribution of an uncertain quantity, before any
measurements are made.
Second example: signal embedded in noise
In a lot of situations, there is a pattern in data but it is
invisible because of noise in the data.
Consider an example of a sound clip, which can be
represented by a graph as follows
[Figures: noisy audio, 5 sec; average of 100 samples of noisy audio; 500 sample average; 5000 sample average; 25000 sample average; reconstructions, 20 ms]
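The effect of averaging can be sketched with synthetic data. The code below assumes that the averages are taken over repeated noisy copies of the same clip, uses a sine wave as a stand-in for the audio, and adds independent Gaussian noise; all of these are illustrative assumptions rather than the lecture's actual data.

```python
import math
import random

def rms_error_after_averaging(n_copies, n_points=100, noise_sd=1.0, seed=0):
    """RMS distance between the true signal and the average of n_copies noisy copies."""
    rng = random.Random(seed)
    # Stand-in signal: one period of a sine wave.
    signal = [math.sin(2 * math.pi * t / n_points) for t in range(n_points)]
    sums = [0.0] * n_points
    for _ in range(n_copies):
        for t in range(n_points):
            sums[t] += signal[t] + rng.gauss(0.0, noise_sd)
    average = [s / n_copies for s in sums]
    squared = sum((a - s) ** 2 for a, s in zip(average, signal))
    return math.sqrt(squared / n_points)

errors = {n: rms_error_after_averaging(n) for n in (100, 500, 5000)}
for n, e in errors.items():
    print(f"{n:5d} copies: RMS error {e:.4f}")
```

Under these assumptions the error shrinks roughly like 1/sqrt(n), so each tenfold gain in accuracy needs about a hundred times as many copies.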
Prior:
Example: P(H | r, N) = \binom{N}{H} r^H (1-r)^{N-H}
denotes the prior probability that a coin would land on
heads H times out of N tosses if it were already known
that the coin has probability r of landing on heads.
Posterior probability:
Example: what is the probability that a coin is fair if it
landed on heads H times out of N tosses?
Bayes theorem
Bayes theorem is a way to compute a posterior
probability if a prior conditional probability is known
and a likelihood is known. In our example,
P(r | H, N) = \frac{P(H | r, N) P(r)}{P(H | N)}
Problem: we do not know
But our best guess is
P(r): side information is needed.
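To see how the side information P(r) enters, here is a minimal sketch that assumes a uniform prior on r over a grid of values. The uniform prior, the grid, and the function name `posterior_on_grid` are assumptions made for illustration only.

```python
from math import comb

def posterior_on_grid(heads, flips, grid_size=1001):
    """Posterior weights for r on an evenly spaced grid, assuming a uniform prior."""
    rs = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size  # assumption: every value of r equally plausible
    likelihood = [comb(flips, heads) * r**heads * (1 - r)**(flips - heads) for r in rs]
    unnormalised = [l * p for l, p in zip(likelihood, prior)]
    total = sum(unnormalised)  # plays the role of the denominator in Bayes theorem
    return rs, [u / total for u in unnormalised]

rs, post = posterior_on_grid(60, 100)
post_mean = sum(r * w for r, w in zip(rs, post))
most_likely = rs[post.index(max(post))]
print(f"posterior mean of r: {post_mean:.3f}, most likely r: {most_likely:.2f}")
```

A different prior, for example one concentrated near r = 0.5 because most coins are close to fair, would pull the posterior toward 0.5.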
Percentage of women who develop breast cancer over the next 10, 20, and 30 years, by current age†

Current age    10 years    20 years    30 years
30             0.43        1.86        4.13
40             1.45        3.75        6.87
50             2.38        5.60        8.66
60             3.45        6.71        8.65
†Source: Altekruse SF, Kosary CL, Krapcho M,
Neyman N, Aminou R, Waldron W, Ruhl J, Howlader
N, Tatalovich Z, Cho H, Mariotto A, Eisner MP, Lewis
DR, Cronin K, Chen HS, Feuer EJ, Stinchcomb DG,
Edwards BK (eds). SEER Cancer Statistics Review, 1975–
2007, National Cancer Institute. Bethesda, MD, based
on November 2009 SEER data submission, posted to
the SEER Web site, 2010.
The mammogram question
In 2009, the U.S. Preventive Services Task Force (USPSTF) — a
group of health experts that reviews published research and
makes recommendations about preventive health care — issued
revised mammogram guidelines. Those guidelines included the
following:
Screening mammograms should be done every two years
beginning at age 50 for women at average risk of breast cancer.
Screening mammograms before age 50 should not be done
routinely and should be based on a woman's values regarding the
risks and benefits of mammography.
Doctors should not teach women to do breast self-exams.
The mammogram question (cont)
These guidelines differed from those of the American Cancer
Society (ACS). ACS mammogram guidelines established in 2003
called for yearly mammogram screening beginning at age 40
for women at average risk of breast cancer. The ACS said the
breast self-exam is optional in breast cancer screening.
USPSTF acknowledges that women who have screening
mammograms die of breast cancer less frequently than do
women who don't get mammograms. Recent randomized trials
put figures at 15 to 29 percent lower. These figures have to be
taken in context. The USPSTF says the benefits of screening
mammograms don't outweigh the harms for women ages 40 to
49. Potential harms may include false-positive results that lead
to unneeded breast biopsies, anxiety and distress.
Update: ACS now recommending age 45 for annual screening
New ACS recommendations
As of Oct 20 2015: ACS suggests women ages 40 to 44
should have the choice to start annual breast cancer
screening with mammograms (x-rays of the breast) if
they wish to do so.
Women age 45 to 54 should get mammograms every
year. Women 55 and older may switch to
mammograms every 2 years.
Some women – because of their family history, a
genetic tendency, or certain other factors – should be
screened with MRIs along with mammograms.
What does it have to do with US vehicle fatalities?
Bayesian analysis of USPSTF recommendation
The rate of incidence of new cancer in women aged 40
is about 1 percent
Of existing tumors, about 80 percent show up in
mammograms.
9.6% of women who do not have breast cancer will
have a false positive mammogram
Suppose a woman aged 40 has a positive mammogram.
What is the probability that the woman actually has
breast cancer?
According to Casscells, Schoenberger, and Graboys
(1978), Eddy (1982), Gigerenzer and Hoffrage (1995),
and many other studies, only about 15% of
doctors can compute this probability correctly.
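One way to work this out is to count outcomes in a hypothetical group of women instead of manipulating probabilities. The sketch below treats the 1 percent figure as the fraction of the group that has cancer; the group size of 10,000 and the function name are illustrative assumptions.

```python
def positive_predictive_value(population, prevalence, sensitivity, false_positive_rate):
    """Fraction of positive tests that come from women who actually have cancer."""
    with_cancer = population * prevalence
    without_cancer = population - with_cancer
    true_positives = with_cancer * sensitivity             # cancers that show up
    false_positives = without_cancer * false_positive_rate  # healthy but flagged
    return true_positives / (true_positives + false_positives)

p = positive_predictive_value(10_000, 0.01, 0.80, 0.096)
print(f"P(cancer | positive mammogram) = {p:.3f}")
```

Out of 10,000 women, 80 have cancer and test positive while about 950 test positive without cancer, so the answer is about 7.8 percent, far below the 80 percent that many people guess.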
False positives in a medical test
False positives: a medical test for a disease may return a positive
result indicating that patient displays a marker that correlates with
presence of the disease.
Bayes' formula: probability that a positive result is a false positive.
The majority of positive results for a rare disease may be false
positives, even if the test is accurate.
Example
A test correctly identifies a patient who has a particular disease 99% of
the time, or with probability 0.99
The same test incorrectly identifies a patient who does not have the
disease 5% of the time, or with probability 0.05.
Is it true that only 5% of positive test results are false?
Suppose that only 0.1% of the population has that disease: a randomly
selected patient has a 0.001 prior probability of having the disease.
A: the condition in which the patient has the disease
B: evidence of a positive test result.
P(A | B) = \frac{P(B | A) P(A)}{P(B | A) P(A) + P(B | not A) P(not A)} = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999} \approx 0.019
The probability that a positive result is a false positive is about
1 − 0.019 = 0.981, or 98.1%.
The vast majority of patients who test positive do not have the disease.
Even so, the fraction of patients who test positive who do have the disease
(0.019) is 19 times the fraction of people who have not yet taken the
test who have the disease (0.001). Retesting may help.
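The remark about retesting can be sketched by feeding the posterior from one positive result back in as the prior for the next. This assumes the two test results are independent given the patient's true status, which real tests may not satisfy; the function name `update` is hypothetical.

```python
def update(prior, p_pos_given_disease, p_pos_given_healthy):
    """Posterior probability of disease after one positive result (Bayes' formula)."""
    numerator = p_pos_given_disease * prior
    return numerator / (numerator + p_pos_given_healthy * (1 - prior))

after_one = update(0.001, 0.99, 0.05)      # first positive result
after_two = update(after_one, 0.99, 0.05)  # second positive, assumed independent
print(f"after one positive test:  {after_one:.3f}")
print(f"after two positive tests: {after_two:.3f}")
```

Under that independence assumption a second positive result raises the probability of disease from about 2 percent to about 28 percent.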
To reduce false positives, a test should be very accurate in reporting a
negative result when the patient does not have the disease. If the test
reported a negative result in patients without the disease with
probability 0.999, then
P(A | B) = 0.99 x 0.001/(0.99 x 0.001 + 0.001 x 0.999) ≈ 0.498,
so only about half of the positive results would be false positives.
False negatives: a medical test for a disease may return a negative
result indicating that the patient does not have the disease even though
the patient actually has it.
Bayes formula for negations:
P(A | not B) = P(not B | A) P(A) / (P(not B | A) P(A) + P(not B | not A) P(not A))
In our example, P(A | not B) = 0.01 x 0.001/(0.01 x 0.001 + 0.95 x 0.999) = 0.0000105, or
about 0.001 percent. When a disease is rare, false negatives will not
be a major problem with the test.
If 60% of the population had the disease, false negatives would be
more prevalent, happening about 1.55 percent of the time.
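The false negative figures can be reproduced with a short sketch. The parameter names are illustrative: `sensitivity` is the 0.99 chance of a positive result in a patient with the disease, and `specificity` is the 0.95 chance of a negative result in a patient without it.

```python
def prob_disease_given_negative(prevalence, sensitivity, specificity):
    """Probability that a patient with a negative result actually has the disease."""
    missed = (1 - sensitivity) * prevalence   # diseased patients who test negative
    cleared = specificity * (1 - prevalence)  # healthy patients who test negative
    return missed / (missed + cleared)

rare = prob_disease_given_negative(0.001, 0.99, 0.95)
common = prob_disease_given_negative(0.60, 0.99, 0.95)
print(f"disease in 0.1% of population: {rare:.7f}")
print(f"disease in 60% of population:  {common:.4f}")
```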
Clicker question
On a certain island, 1 pct of the population has a
certain disease. A certain test for the disease is
successful in detecting the disease, if it is present, 80%
of the time. The rate of positive test results in the
population is 4%.
What is the probability that someone who tests positive
actually has the disease?
A) 1%
B) 2%
C) 4%
D) 8%
Prosecutor's fallacy
The context in which the accused has been brought to court is
falsely assumed to be irrelevant to judging how
confident a jury can be in evidence against them that carries a
statistical measure of doubt.
This fallacy usually results in assuming that the prior
probability that a piece of evidence would implicate a
randomly chosen member of the population is equal to the
probability that it would implicate the defendant.
Defendant’s fallacy
Comes from not grouping the evidence together.
In a city of ten million, a one in a million DNA
characteristic gives any one person that has it a 1 in 10
chance of being guilty, or a 90% chance of being
innocent.
Factoring in another piece of incriminating evidence would give
a much smaller probability of innocence.
OJ Simpson
In the courtroom
Bayesian inference can be used by an individual juror to see
whether the evidence meets his or her personal threshold for
'beyond a reasonable doubt'.
G: the event that the defendant is guilty.
E: the event that the defendant's DNA matches DNA found at the crime scene.
P(E | G): probability of observing E if the defendant is guilty.
P(G | E): probability of guilt assuming the DNA match (event E).
P(G): juror's “personal estimate” of the probability that the
defendant is guilty, based on the evidence other than the DNA
match.
P(G | E) = \frac{P(E | G) P(G)}{P(E)}
On the basis of other evidence, a juror decides that there is a 30% chance that the
defendant is guilty. Forensic testimony suggests that a person chosen at random
would have a 1 in a million, or 10^-6, chance of having a DNA match to the crime
scene.
Bayesian inference:
E can occur in two ways: the defendant is guilty (with prior probability 0.3) so his
DNA is present with probability 1, or he is innocent (with prior probability 0.7) and
he is unlucky enough to be one of the 1 in a million matching people.
P(G | E) = (0.3 x 1.0)/(0.3 x 1.0 + 0.7/1 million) = 0.99999766667
The approach can be applied successively to all the pieces of evidence presented in
court, with the posterior from one stage becoming the prior for
the next.
What should P(G) be? For a crime known to have been committed by an adult male living in a town
containing 50,000 adult males, the appropriate initial prior probability might be
1/50,000.
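The dependence on the prior can be sketched by running the same calculation with the juror's prior of 0.3 and with the town-based prior of 1/50,000. The function name `prob_guilty_given_match` is hypothetical, and a guilty defendant is assumed to match with probability 1, as in the example above.

```python
def prob_guilty_given_match(prior_guilt, random_match_prob, p_match_if_guilty=1.0):
    """P(G | E) from Bayes' theorem, with P(E) split over guilty and innocent."""
    guilty_and_match = p_match_if_guilty * prior_guilt
    innocent_and_match = random_match_prob * (1 - prior_guilt)
    return guilty_and_match / (guilty_and_match + innocent_and_match)

juror = prob_guilty_given_match(0.3, 1e-6)
town = prob_guilty_given_match(1 / 50_000, 1e-6)
print(f"prior 0.3:      P(G | E) = {juror:.11f}")
print(f"prior 1/50,000: P(G | E) = {town:.4f}")
```

With the much smaller prior the same DNA match gives a posterior of about 0.95, noticeably short of the near certainty obtained from a prior of 0.3.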
O.J.
Nicole Brown was murdered at her home in Los Angeles on the
night of June 12, 1994. The prime suspect was her ex-husband
O.J. Simpson, at the time a well-known celebrity famous both as a
TV actor and as a retired professional football star. This murder
led to one of the most heavily publicized murder trials in the U.S.
during the last century. The fact that the murder suspect had
previously physically abused his wife played an important role in
the trial. The famous defense lawyer Alan Dershowitz, a member
of the team of lawyers defending the accused, tried to belittle the
relevance of the fact by stating that only 0.1% of the men who
physically abuse their wives actually end up murdering them.
Question: Was the fact that O.J. Simpson had previously physically
abused his wife irrelevant to the case?
E = all the evidence, that Nicole Brown was murdered
and was previously physically abused by her husband.
G = O.J. Simpson is guilty
What about P(G | E)?
Posterior odds = prior odds x Bayes factor
In the example above, the
juror who has a prior probability of 0.3 for the defendant being
guilty would now express that in the form of odds of 3:7 in favour
of the defendant being guilty. The Bayes factor is one million, and
the resulting posterior odds are 3 million to 7, or about 429,000 to
one, in favour of guilt.
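The odds form can be checked with exact fractions. This is a small sketch; the Bayes factor of one million is taken as the ratio P(E | G) / P(E | not G) = 1 / 10^-6 from the DNA example.

```python
from fractions import Fraction

prior_odds = Fraction(3, 7)          # prior probability 0.3 expressed as odds 3:7
bayes_factor = Fraction(10**6, 1)    # P(E | G) / P(E | not G)
posterior_odds = prior_odds * bayes_factor
posterior_prob = posterior_odds / (1 + posterior_odds)

print(f"posterior odds: {float(posterior_odds):,.0f} to 1")
print(f"posterior probability: {float(posterior_prob):.11f}")
```

Converting the posterior odds back to a probability recovers the same value as the direct application of Bayes' theorem.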
In the UK, Bayes' theorem was explained to the jury in the odds
form by a statistician expert witness in the rape case of Regina
versus Denis John Adams.
The Court of Appeal upheld the conviction, but it also gave its
opinion that "To introduce Bayes' Theorem, or any similar method,
into a criminal trial plunges the jury into inappropriate and
unnecessary realms of theory and complexity, deflecting them from
their proper task."
Bayesian assessment of forensic DNA data remains controversial.
Gardner-Medwin: the criterion for conviction should be not the probability of guilt, but
rather the probability of the evidence, given that the defendant is
innocent (akin to a frequentist p-value).
If the posterior probability of guilt is to be computed by Bayes'
theorem, the prior probability of guilt must be known.
A: The known facts and testimony could have arisen if the
defendant is guilty. B: The known facts and testimony could have
arisen if the defendant is innocent. C: The defendant is guilty.
Gardner-Medwin: the jury should believe both A and not-B in
order to convict. A and not-B implies the truth of C, but B and C
could both be true. See also Lindley's paradox.
Other court cases in which probabilistic arguments played some
role: the Howland will forgery trial, the Sally Clark case, and the
Lucia de Berk case.