Review
● In most card games cards are dealt without replacement.
What is the probability of being dealt an ace and then
a 3? Choose the closest answer.
a) 0.0045
b) 0.0059
c) 0.0060
d) 0.1553
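One way to check the card question is to multiply the probability of the first draw by the conditional probability of the second. The sketch below assumes a standard 52-card deck with 4 aces and 4 threes (the slide does not state the deck); the variable names are illustrative.

```python
from fractions import Fraction

# Assumption: a standard 52-card deck with 4 aces and 4 threes.
p_ace_first = Fraction(4, 52)

# Without replacement: after one ace is dealt, 51 cards remain
# and all 4 threes are still in the deck.
p_three_given_ace = Fraction(4, 51)

# P(ace and then 3) = P(ace) x P(3 | ace)
p_ace_then_three = p_ace_first * p_three_given_ace
print(p_ace_then_three, round(float(p_ace_then_three), 4))
```

Dealing with replacement would use 4/52 for both draws and give a slightly smaller value, which is why two of the answer choices are so close together.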
Review
● What is the probability of throwing two 6s in a row
with a fair die?
a) 0.0278
b) 0.0333
c) 0.1389
d) 0.333
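Because the two rolls are independent, the probabilities simply multiply. A minimal check, with illustrative variable names:

```python
from fractions import Fraction

# A fair die shows a 6 with probability 1/6 on every roll.
p_six = Fraction(1, 6)

# Independent rolls: P(6 and then 6) = P(6) x P(6)
p_two_sixes = p_six * p_six
print(p_two_sixes, round(float(p_two_sixes), 4))
```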
Tree Diagrams
● Tree diagrams help us think through conditional
probabilities by showing sequences of events as paths
that look like branches of a tree
● We often make tree diagrams when reversing the
conditioning
● Suppose we want to know Prob(A | B), but we know only
Prob(A), Prob(B) and Prob(B | A)
● We also know Prob(A and B), since Prob(A and B) = Prob(A) ×
Prob(B | A)
● From this information, we can find Prob(A | B)
● When we reverse the conditioning of the conditional
probability that we are originally given, we use Bayes'
Theorem
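Written out, the reversal described above is a two-step calculation: first the definition of conditional probability, then the multiplication rule for Prob(A and B). The notation P(·) here is shorthand for Prob(·).

```latex
\[
P(A \mid B) \;=\; \frac{P(A \text{ and } B)}{P(B)}
           \;=\; \frac{P(A)\,P(B \mid A)}{P(B)}
\]
```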
Example – false positive rates
● Assume there is a screening test for a certain cancer that
is positive 95 percent of the time if someone has the
cancer. Also assume that if someone doesn't have the
cancer, the test is positive just 1 percent of the time.
Assume further that 0.5 percent of people actually have
this type of cancer. What is the probability that someone
who tested positive for this cancer does not actually have
the cancer, i.e. what share of positive results are false
positives?
Example – false positive rates
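Using Bayes' Rule:
P(no cancer | positive)
= P(no cancer) × P(positive | no cancer) / P(positive)
= (0.995 × 0.01) / (0.995 × 0.01 + 0.005 × 0.95)
= 0.00995 / 0.0147 ≈ 0.68
● About 68% of people who test positive for cancer do not
actually have cancer!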
Example – false positive rates
● What percent of the people who test positive for this
cancer actually have cancer?
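The screening example can be worked through numerically by following the two branches of the tree that end in a positive test. This is a sketch using only the three numbers given in the example; the variable names are illustrative.

```python
# Values given in the screening example.
p_cancer = 0.005               # 0.5 percent have the cancer
p_pos_given_cancer = 0.95      # test is positive for 95% of people with cancer
p_pos_given_no_cancer = 0.01   # test is positive for 1% of people without it

p_no_cancer = 1 - p_cancer

# The two tree branches that end in a positive test.
p_pos_and_cancer = p_cancer * p_pos_given_cancer
p_pos_and_no_cancer = p_no_cancer * p_pos_given_no_cancer
p_pos = p_pos_and_cancer + p_pos_and_no_cancer

# Reverse the conditioning with Bayes' rule.
p_no_cancer_given_pos = p_pos_and_no_cancer / p_pos
p_cancer_given_pos = p_pos_and_cancer / p_pos

print(round(p_no_cancer_given_pos, 3), round(p_cancer_given_pos, 3))
```

The two results are complements, so they add up to 1.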
Example – HIV test
● HIV prevalence is 0.006 in the US population, so 0.994 do
not have HIV. There is an HIV test that says positive 99%
of the time if you have the disease (1% false negative)
and says negative 98% of the time if you don't have the
disease (2% false positive). What is the probability that
someone actually has HIV if the test says positive?
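The HIV question has the same structure as the cancer example, so it can be handled by one small helper. The function name `posterior` and its parameter names are illustrative, not from the slides; "sensitivity" is the 99% figure and "specificity" is the 98% figure.

```python
def posterior(prior, sensitivity, specificity):
    """Return P(condition | positive test) using Bayes' rule.

    prior       -- P(condition)
    sensitivity -- P(positive | condition)
    specificity -- P(negative | no condition)
    """
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Numbers from the HIV example.
p_hiv_given_pos = posterior(prior=0.006, sensitivity=0.99, specificity=0.98)
print(round(p_hiv_given_pos, 3))
```

Even with an accurate test, the low prevalence means most positive results come from the much larger group of people without the disease.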
Chapter 6: Modeling Random Events: The Normal and Binomial Models
Probability Model and Distributions
● A probability model is a description of how a statistician
thinks data are produced
● Uniform
● Linear
● Normal
● Other
● A probability distribution or probability distribution
function (pdf) is a table or graph that gives all the
outcomes of a random experiment and their probabilities
Discrete vs. Continuous
● A random variable is called discrete if the outcomes are
values that can be listed or counted
● Number of classes taken
● The roll of a die
● A random variable is called continuous if the outcomes
cannot be listed because they occur over a range
● Time to finish the exam
● Exact weight
Discrete or Continuous
Classify the following as discrete or continuous
● Length of your left thumb
● Number of children in a family
● Number of devices in the house that connect to the
Internet
● Sodium concentration in the bloodstream
Discrete Probability Distributions
● The most common way to display a pdf for discrete data
is with a table
● The probability distribution table always has two
columns (or rows)
● The first, x, displays all the possible outcomes
● The second, P(x), displays the probabilities for these
outcomes
Examples of Probability Distribution tables
● Important: The sum of all the probabilities must equal 1

Die Roll
x      P(x)
1      1/6
2      1/6
3      1/6
4      1/6
5      1/6
6      1/6

Raffle Prize
x      P(x)
95     0.01
995    0.005
-5     0.985
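The rule that the probabilities must sum to 1 is easy to check mechanically. The sketch below stores each table as a dictionary from outcome to probability; the helper name `is_valid_distribution` is an illustrative choice.

```python
import math
from fractions import Fraction

# The two tables from the slide, as outcome -> probability.
die_roll = {x: Fraction(1, 6) for x in range(1, 7)}
raffle_prize = {95: 0.01, 995: 0.005, -5: 0.985}


def is_valid_distribution(table):
    """Check that every probability is in [0, 1] and that they sum to 1."""
    probs = [float(p) for p in table.values()]
    in_range = all(0 <= p <= 1 for p in probs)
    # isclose tolerates tiny floating-point rounding in the sum.
    return in_range and math.isclose(sum(probs), 1.0)


print(is_valid_distribution(die_roll), is_valid_distribution(raffle_prize))
```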
Example – Playing Dice
● Roll a fair six-sided die. You will win $4 if you roll a 5
or a 6. You will lose $5 if you roll a 1. You will lose $1
if you roll a 2. For any other outcome, you will win or
lose nothing ($0). What is the probability distribution
table for the amount you will win?
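One way to build the table for the dice game is to map each face to its payout and then add up 1/6 for every face that leads to the same amount. This is a sketch; losses are recorded as negative winnings, and the names `payout` and `table` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Payout (in dollars) for each face, as described in the example.
payout = {1: -5, 2: -1, 3: 0, 4: 0, 5: 4, 6: 4}

# Accumulate probability for each distinct amount won.
table = defaultdict(Fraction)
for face, amount in payout.items():
    table[amount] += Fraction(1, 6)  # each face of a fair die has probability 1/6

print("x", "P(x)")
for amount in sorted(table):
    print(amount, table[amount])
```

Faces that share a payout (3 and 4, or 5 and 6) are merged into a single row, so the table has four rows rather than six.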
Continuous Probability Distribution Functions
● Often represented as a curve.
● The area under the curve between two values of x
represents the probability of x being between the two
values
● The total area under the curve must equal 1
● The curve cannot lie below the x-axis
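To make the "area equals probability" idea concrete, the sketch below uses a made-up density, f(x) = x/2 on the interval from 0 to 2, and approximates areas with the trapezoid rule. The density and the helper `area_under` are hypothetical illustrations, not taken from the slides.

```python
def density(x):
    """Hypothetical density: f(x) = x / 2 on [0, 2], and 0 elsewhere."""
    return x / 2 if 0 <= x <= 2 else 0.0


def area_under(f, a, b, steps=1000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * width)
    return total * width


total_area = area_under(density, 0, 2)       # should be 1 for a valid density
p_between = area_under(density, 0.5, 1.5)    # P(0.5 < x < 1.5)
print(round(total_area, 6), round(p_between, 6))
```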
The Normal Model
● The Normal Model is a good fit if:
● The distribution is unimodal
● The distribution is approximately symmetric
● The distribution is approximately bell shaped
● A Normal distribution is defined by the mean μ and
standard deviation σ. Shorthand for a normal
distribution is N(μ, σ)
● The Normal distribution is also called the Gaussian
distribution or the Bell Curve
Standardizing with z-scores
● Reminder: z-scores are standardized scores
● Z-scores are used to compare individual data values to
their mean relative to their standard deviation
● The formula for calculating the z-score of a data value
is:
z = (data value - mean) / (standard deviation)
z-scores
● Standardizing data into z-scores shifts the data by
subtracting the mean and rescales the values by dividing
by their standard deviation
● Standardizing into z-scores does not change the shape
of the distribution
● Standardizing into z-scores changes the center by
making the mean 0
● Standardizing into z-scores changes the spread by
making the standard deviation 1
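The effect of standardizing can be seen on a small data set. The values below are made up for illustration, and the sketch assumes the population standard deviation (`pstdev`); using the sample standard deviation instead would give a slightly different spread.

```python
from statistics import mean, pstdev

# Made-up data values, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)        # center of the original data
sigma = pstdev(data)   # spread of the original data (population SD, an assumption)

# Shift by the mean, then rescale by the standard deviation.
z_scores = [(x - mu) / sigma for x in data]

print(z_scores)
print(mean(z_scores), pstdev(z_scores))
```

The z-scores keep the same ordering and relative spacing as the original values; only the center and the scale change.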
Shape, center, and spread of z-scores
● Z-scores for normally distributed variables are also
normally distributed, but with mean 0 and standard
deviation 1
z ~ N(0, 1)
● Z-scores for a variable with some other distribution (right
skewed, uniform, etc.) will follow the same shape as the
original distribution, but with mean 0 and standard
deviation 1
When is a z-score big?
● A z-score gives us an indication of how unusual a value
is because it tells us how many standard deviations the
value is from the mean
● Remember that a negative z-score tells us that the data
value is below the mean, while a positive z-score tells us
that the data value is above the mean
● The larger a z-score is in absolute value (negative or
positive), the more unusual the data value is
Calculating percentiles and probabilities with normal models
● Since z-scores tell us whether or not an observation is
unusual, they can also tell us how unusual the
observation is (i.e. how likely it is to observe such a
value)
● So far we have only been able to tell how unusual an
observation is if it was exactly 1, 2, or 3 standard
deviations from the mean (using the Empirical Rule)
● What happens if we have a z-score of 2.5 or -1.3?
Calculating percentiles using the z-table
● ACT scores are distributed normally with mean 21 and
standard deviation 5. If Adam got a 27 on his ACT,
what is his percentile score?
Note: percentile score means the percent of scores that are
below the observed value
● First we compute our z-score:
z = (27 - 21) / 5 = 1.20
● Now we go to the z-table.
Using the z-table
● We have z = 1.20
● z-values occur on the outer edges of the z-table;
probabilities are in the middle
Note: It's best to round z-scores to 2 decimal places since the
z-table displays z-scores up to two decimal places
Calculating percentiles using the z-table
● ACT scores are distributed normally with mean 21 and
standard deviation 5. If Adam got a 27 on his ACT,
what is his percentile score?
● With a z-score of 1.20 we found the value 0.8849
Adam's score is at the 88.49th percentile, i.e. he scored
higher than 88.49% of the test takers.
Percentiles to Probabilities
If a score of 27 is higher than about 88.49% of all
scores on this test, this means that the probability of
scoring lower than 27 is 0.8849.
P(ACT score < 27) = 0.8849
Similarly, the probability of scoring higher than 27 is the
complement of this probability:
P(ACT score > 27) = 1 – 0.8849 = 0.1151
Note: Complementary probabilities sum to 1; the area under
the normal curve is equal to 1, so when we know the probability of one
side, we get the other side by subtracting it from 1.
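The ACT calculation can be reproduced without a printed z-table. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) as a stand-in for the table lookup; it is one possible way to check the numbers, not the method used on the slides.

```python
from statistics import NormalDist

# ACT scores: normal with mean 21 and standard deviation 5 (from the example).
act = NormalDist(mu=21, sigma=5)

score = 27
z = (score - act.mean) / act.stdev   # standardize Adam's score

p_below = act.cdf(score)   # area to the left: P(ACT score < 27)
p_above = 1 - p_below      # complement: P(ACT score > 27)

print(round(z, 2), round(p_below, 4), round(p_above, 4))
```

Looking up the z-score in a standard normal model, `NormalDist().cdf(z)`, gives the same left-hand area as `act.cdf(score)`, which is exactly what the z-table encodes.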
Example - z-scores
What percent of standard normal is found where
z < -1.1? Draw a picture first.
Example - z-scores
What percent of standard normal is found where
z > -2.09? Drawing a picture first may help.
a) 2.09%
b) 98.17%
c) 1.83%
d) 0.0183%
Example - z-scores
What percent of standard normal is found where
-1< z < 2.5?
Example - z-scores
What percent of standard normal is found where
z > 13?
a) approximately 100%
b) approximately 0%
c) 1%
d) Cannot calculate with the z-table given, the table does
not go up to z = 13
Example – z-scores
ACT scores are distributed normally with mean 21 and
standard deviation 5. What percent of scores fall
between 19 and 28 on the ACT?
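Answers to the z-score exercises above can be checked against a computed normal curve after working them by hand with the table. This sketch uses `NormalDist` from the standard `statistics` module in place of the z-table; the variable names are illustrative, and results may differ from table answers in the last decimal place because the table rounds z to two places.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal model, N(0, 1)

# Left tail: area below z = -1.1
p_below_neg_1_1 = std.cdf(-1.1)

# Right tail: area above z = -2.09 is the complement of the area below it
p_above_neg_2_09 = 1 - std.cdf(-2.09)

# Between two z-scores: subtract the smaller left-hand area from the larger
p_between = std.cdf(2.5) - std.cdf(-1)

# Far right tail: z = 13 is beyond any printed table
p_above_13 = 1 - std.cdf(13)

# ACT scores, N(21, 5): percent of scores between 19 and 28
act = NormalDist(mu=21, sigma=5)
p_act_19_to_28 = act.cdf(28) - act.cdf(19)

for p in (p_below_neg_1_1, p_above_neg_2_09, p_between, p_above_13, p_act_19_to_28):
    print(f"{100 * p:.2f}%")
```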
Example – finding observed value from percentile
Let's assume SAT scores are ~ N(1500, 300). If Sophie
scored at the 76th percentile, what was her actual score?
● We are given a percentile, so now start in the middle
of the z-table and work out to find the z-score
Example – finding observed value from percentile
● Let's assume SAT scores are ~ N(1500, 300). If Sophie
scored at the 76th percentile, what was her actual score?
From the table we found the corresponding z-score of 0.71
for the 76th percentile, so:
score = mean + z × standard deviation = 1500 + 0.71 × 300 = 1713
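Going from a percentile back to a score means un-standardizing: multiply the z-score by the standard deviation and add the mean. The sketch below does this with the table value z = 0.71 and, for comparison, with `NormalDist.inv_cdf` from the standard `statistics` module, which plays the role of reading the table from the inside out. The small difference between the two results comes from the table's rounding.

```python
from statistics import NormalDist

# SAT scores: N(1500, 300), as assumed in the example.
sat = NormalDist(mu=1500, sigma=300)

# Using the z-score read from the table for the 76th percentile.
z_table = 0.71
score_from_table = sat.mean + z_table * sat.stdev

# Using the inverse of the cumulative distribution directly.
score_exact = sat.inv_cdf(0.76)

print(round(score_from_table), round(score_exact))
```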
Example – finding observed value from percentile
Let's assume SAT scores are ~ N(1500, 300). If Snookie
scored at the 3rd percentile, what was her actual score?
Example – finding observed value from percentile
Let's assume SAT scores are ~ N(1500, 300). Between what
two scores do the middle 50% of SAT test takers score?
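The last two questions use the same inverse lookup. For the middle 50%, 25% of scores lie below the lower cutoff and 25% lie above the upper cutoff, so the cutoffs are the 25th and 75th percentiles. A sketch using `NormalDist.inv_cdf` from the standard `statistics` module in place of the z-table (variable names are illustrative):

```python
from statistics import NormalDist

# SAT scores: N(1500, 300), as assumed in the examples.
sat = NormalDist(mu=1500, sigma=300)

# Score at the 3rd percentile (below the mean, so the z-score is negative).
score_3rd = sat.inv_cdf(0.03)

# Middle 50%: from the 25th percentile to the 75th percentile.
lower = sat.inv_cdf(0.25)
upper = sat.inv_cdf(0.75)

print(round(score_3rd), round(lower), round(upper))
```

Because the normal curve is symmetric, the two cutoffs for the middle 50% are the same distance from the mean of 1500.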