1/16/15
Probability Distributions
Section 1: How Can We Summarize Possible Outcomes and Their Probabilities?
Learning Objectives
1.  Random variable
2.  Probability distributions for discrete random variables
3.  Mean of a probability distribution
4.  Summarizing the spread of a probability distribution
5.  Probability distribution for continuous random variables
Learning Objective 1:
Randomness
•  Suppose that the numerical values that a variable assumes are the result of some random phenomenon, e.g.,
   •  Selecting a random sample from a population
   or
   •  Performing a randomized experiment
Learning Objective 1:
Random Variable
•  A random variable is a numerical measurement of the outcome of a random phenomenon.
Learning Objective 1:
Random Variable
•  Use letters near the end of the alphabet, such as x, to symbolize a particular value of the random variable.
•  Use a capital letter, such as X, to refer to the random variable itself.
Example: Flip a coin three times
•  X = number of heads in the 3 flips; defines the random variable
•  x = 2; represents a possible value of the random variable
Learning Objective 2:
Probability Distribution
•  The probability distribution of a random variable specifies its possible values and their probabilities.
Note: In spite of the randomness of the variable, there is a pattern of randomness that allows us to specify probabilities for the outcomes.
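As a rough illustration of the coin-flip example, the distribution of X = number of heads in 3 flips can be tabulated by listing every possible sequence. This is only a sketch: it assumes a fair coin (the slides do not state the coin's bias), and the variable names are made up for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 equally likely sequences of three flips (fair coin is an assumption).
outcomes = list(product("HT", repeat=3))

# X = number of heads; count how many sequences produce each value x.
counts = Counter(seq.count("H") for seq in outcomes)

# P(x) = (number of sequences with x heads) / (total number of sequences)
distribution = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in distribution.items():
    print(f"P({x}) = {p}")
```

Under the fair-coin assumption, the possible values 0, 1, 2, 3 receive probabilities 1/8, 3/8, 3/8, 1/8, which is exactly the "possible values and their probabilities" pairing that a probability distribution specifies.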
Learning Objective 2:
Probability Distribution of a Discrete Random Variable
•  A discrete random variable X has separate values (such as 0, 1, 2, …) as its possible outcomes.
•  Its probability distribution assigns a probability P(x) to each possible value x:
   •  For each x, the probability P(x) falls between 0 and 1.
   •  The sum of the probabilities for all the possible x values equals 1.
Learning Objective 2:
Example
•  What is the estimated probability of at least three home runs?
   P(3) + P(4) + P(5) = 0.13 + 0.03 + 0.01 = 0.17
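The two conditions for a discrete distribution, and the "at least three home runs" sum, can be checked with a few lines of Python. This is a sketch only: the probabilities are the home-run values that appear in this transcript's mean calculation, and the function name `is_valid_distribution` is a hypothetical helper.

```python
import math

# Home-run probabilities P(x) for x = 0..5, as used in the transcript's mean calculation.
home_runs = {0: 0.23, 1: 0.38, 2: 0.22, 3: 0.13, 4: 0.03, 5: 0.01}

def is_valid_distribution(dist):
    """Check both conditions: every P(x) lies in [0, 1] and the P(x) sum to 1."""
    each_ok = all(0.0 <= p <= 1.0 for p in dist.values())
    total_ok = math.isclose(sum(dist.values()), 1.0, abs_tol=1e-9)
    return each_ok and total_ok

# P(X >= 3) = P(3) + P(4) + P(5)
p_at_least_three = sum(p for x, p in home_runs.items() if x >= 3)

print(is_valid_distribution(home_runs))
print(round(p_at_least_three, 2))
```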
Learning Objective 3:
The Mean of a Discrete Probability Distribution
•  The mean of a probability distribution for a discrete random variable is
   µ = ∑ x ⋅ P(x)
   where the sum is taken over all possible values of x.
•  The mean of a probability distribution is denoted by the parameter µ.
•  The mean is a weighted average; values of x that are more likely receive greater weight P(x).
Learning Objective 3:
Expected Value of X
•  The mean of a probability distribution of a random variable X is also called the expected value of X.
•  The expected value reflects not what we’ll observe in a single observation, but rather what we expect for the average in a long run of observations.
•  It is not unusual for the expected value of a random variable to equal a number that is NOT a possible outcome.
Learning Objective 3:
Example
•  Find the mean of this probability distribution.
The mean:
µ = ∑ x ⋅ P(x)
  = 0(0.23) + 1(0.38) + 2(0.22) + 3(0.13) + 4(0.03) + 5(0.01) = 1.38
Learning Objective 4:
The Standard Deviation of a Probability Distribution
The standard deviation of a probability distribution, denoted by the parameter σ, measures its spread.
•  Larger values of σ correspond to greater spread.
•  Roughly, σ describes how far the random variable falls, on the average, from the mean of its distribution.
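The mean calculation for the home-run distribution can be reproduced in code, along with its standard deviation. The slides do not give a formula for σ; the sketch below assumes the standard definition σ = √(∑ (x − µ)² ⋅ P(x)), and the function names are illustrative.

```python
import math

# Home-run probabilities P(x) from the mean example.
home_runs = {0: 0.23, 1: 0.38, 2: 0.22, 3: 0.13, 4: 0.03, 5: 0.01}

def mean(dist):
    """Weighted average: mu = sum of x * P(x) over all possible x."""
    return sum(x * p for x, p in dist.items())

def standard_deviation(dist):
    """Assumed standard definition: sqrt of sum of (x - mu)**2 * P(x)."""
    mu = mean(dist)
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    return math.sqrt(variance)

print(round(mean(home_runs), 2))
print(round(standard_deviation(home_runs), 2))
```

The mean comes out to 1.38, matching the hand calculation, even though 1.38 home runs is not itself a possible outcome.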
Learning Objective 5:
Continuous Random Variable
•  A continuous random variable has an infinite continuum of possible values in an interval.
•  Examples are: time, age and size measures such as height and weight.
•  Continuous variables are measured in a discrete manner because of rounding.
Learning Objective 5:
Probability Distribution of a Continuous Random Variable
•  A random variable is continuous if its set of possible values forms an interval.
•  Its probability distribution is specified by a density curve: the probability of any interval is the area under the curve and above that interval.
•  Each interval has probability between 0 and 1.
•  The interval containing all possible values has probability equal to 1.
Section 2: How Can We Find Probabilities for Bell-Shaped Distributions?
Learning Objectives
1.  Normal Distribution
2.  68-95-99.7 Rule for normal distributions
3.  Z-Scores and the Standard Normal Distribution
4.  The Standard Normal Table: Finding Probabilities
5.  Using the TI-calculator: find probabilities
6.  Using the Standard Normal Table in Reverse
7.  Using the TI-calculator: find z-scores
8.  Probabilities for Normally Distributed Random Variables
9.  Percentiles for Normally Distributed Random Variables
10.  Using Z-scores to Compare Distributions
Learning Objective 1:
Normal Distribution
The normal distribution is symmetric, bell-shaped and characterized by its mean µ and standard deviation σ.
•  The normal distribution is the most important distribution in statistics
   •  Many distributions have an approximate normal distribution
   •  Approximates many discrete distributions well when there are a large number of possible outcomes
   •  Many statistical methods use it even when the data are not bell shaped
Learning Objective 1:
Normal Distribution
•  Normal distributions are
   •  Bell shaped
   •  Symmetric around the mean
•  The mean (µ) and the standard deviation (σ) completely describe the density curve
   •  Increasing/decreasing µ moves the curve along the horizontal axis
   •  Increasing/decreasing σ controls the spread of the curve
Learning Objective 1:
Normal Distribution
•  Within what interval do almost all of the men’s heights fall? Women’s heights?
Learning Objective 2:
68-95-99.7 Rule for Any Normal Curve
•  68% of the observations fall within one standard deviation of the mean
•  95% of the observations fall within two standard deviations of the mean
•  99.7% of the observations fall within three standard deviations of the mean
Learning Objective 2:
Example: 68-95-99.7% Rule
•  Heights of adult women
   •  can be approximated by a normal distribution
   •  µ = 65 inches; σ = 3.5 inches
•  68-95-99.7 Rule for women’s heights
   •  68% are between 61.5 and 68.5 inches  [ µ ± σ = 65 ± 3.5 ]
   •  95% are between 58 and 72 inches  [ µ ± 2σ = 65 ± 2(3.5) = 65 ± 7 ]
   •  99.7% are between 54.5 and 75.5 inches  [ µ ± 3σ = 65 ± 3(3.5) = 65 ± 10.5 ]
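The three intervals for women's heights follow mechanically from µ and σ. A minimal sketch, with `empirical_rule_intervals` as a hypothetical helper name:

```python
def empirical_rule_intervals(mu, sigma):
    """Return {percentage: (low, high)} for 1, 2 and 3 standard deviations."""
    return {
        pct: (mu - k * sigma, mu + k * sigma)
        for k, pct in ((1, 68.0), (2, 95.0), (3, 99.7))
    }

# Heights of adult women: mu = 65 inches, sigma = 3.5 inches.
intervals = empirical_rule_intervals(65, 3.5)
for pct, (low, high) in intervals.items():
    print(f"{pct}% are between {low} and {high} inches")
```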
Learning Objective 2:
Example: 68-95-99.7% Rule
•  What proportion of women are less than 69 inches tall?
[Figure: normal curve of women’s heights (height values) with 65 at the mean and 68.5 at +1 standard deviation; 68% of the area lies between -1 and +1 (by 68-95-99.7 Rule), 16% lies above +1, and the area below +1 is ? = 84%]
Learning Objective 3:
Z-Scores and the Standard Normal Distribution
•  The z-score for a value x of a random variable is the number of standard deviations that x falls from the mean:
   z = (x − µ) / σ
•  A negative (positive) z-score indicates that the value is below (above) the mean
•  z-scores can be used to calculate the probabilities of a normal random variable using the normal tables in the back of the book
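The z-score formula translates directly into a one-line function. The sketch below reuses the women's heights values (µ = 65, σ = 3.5); the function name is illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x falls from the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Women's heights: mu = 65 inches, sigma = 3.5 inches.
print(z_score(68.5, 65, 3.5))   # one standard deviation above the mean
print(z_score(61.5, 65, 3.5))   # one standard deviation below the mean
```

A positive result means the value lies above the mean and a negative result means it lies below, matching the sign rule stated on the slide.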
Learning Objective 3:
Z-Scores and the Standard Normal Distribution
•  A standard normal distribution has mean µ=0 and standard deviation σ=1
•  When a random variable X has a normal distribution and its values are converted to z-scores (by subtracting the mean and dividing by the standard deviation), the new random variable Z whose values are these z-scores has the standard normal distribution.
Learning Objective 4:
Table A: Standard Normal Probabilities
Table A enables us to find normal probabilities
•  It tabulates the normal cumulative probabilities falling below the point µ+zσ
To use the table:
•  Find the corresponding z-score
•  Look up the closest standardized score (z) in the table.
   •  First column gives z to the first decimal place
   •  First row gives the second decimal place of z
•  The corresponding probability found in the body of the table gives the probability of falling below the z-score
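The entries of a standard normal table are values of the cumulative probability P(Z < z), which can be computed from the error function. The sketch below is not how Table A itself was produced; it assumes the usual identity Φ(z) = ½(1 + erf(z/√2)), and `table_a_lookup` is a hypothetical helper that mimics the two-decimal z and four-decimal probability of a printed table. The input z = 1.43 is only a sample value.

```python
import math

def standard_normal_cdf(z):
    """Cumulative probability P(Z < z) for a standard normal Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_a_lookup(z):
    """Mimic a table lookup: z rounded to two decimals, probability to four."""
    return round(standard_normal_cdf(round(z, 2)), 4)

print(table_a_lookup(0.0))
print(table_a_lookup(1.43))
```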
Learning Objective 4:
Example: Using Table A
•  Find the probability that a normal random variable takes a value less than 1.43 standard deviations above µ: P(z<1.43)=.9236
TI Calculator: normalcdf(-1e99,1.43,0,1) = .9236
Learning Objective 4:
Example: Using Table A
•  Find the probability that a normal random variable takes a value greater than 1.43 standard deviations above µ: P(z>1.43)=1-.9236=.0764
TI Calculator: normalcdf(1.43,1e99,0,1) = .0764
Learning Objective 4:
Example:
•  Find the probability that a normal random variable assumes a value within 1.43 standard deviations of µ
   •  Probability below 1.43σ = .9236
   •  Probability below -1.43σ = .0764 (1-.9236)
   •  P(-1.43<z<1.43) = .9236-.0764 = .8472
TI Calculator: normalcdf(-1.43,1.43,0,1) = .8473
Learning Objective 5:
Using the TI Calculator
To calculate the cumulative probability
•  2nd DISTR; 2:normalcdf(lower bound, upper bound, mean, sd)
•  Use –1E99 for negative infinity and 1E99 for positive infinity
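For readers without a TI calculator, the same lower-bound/upper-bound calculation can be sketched with Python's standard library. The function below is a hypothetical stand-in that borrows the calculator's name and argument order; it is not the calculator's own implementation, and it follows the slide's convention of using ±1e99 for infinity.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """P(lower < X < upper) for X normal with the given mean and sd."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

print(round(normalcdf(-1e99, 1.43), 4))   # area below z = 1.43
print(round(normalcdf(1.43, 1e99), 4))    # area above z = 1.43
print(round(normalcdf(-1.43, 1.43), 4))   # area within 1.43 sd of the mean
```

Results can differ from table-based arithmetic in the fourth decimal place, because the table values are already rounded before they are added or subtracted.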
Learning Objective 5:
Find Probabilities Using TI Calculator
•  Find probability to the left of -1.64
   •  P(z<-1.64) = normalcdf(-1e99,-1.64,0,1) = .0505
•  Find probability to the right of 1.56
   •  P(z>1.56) = normalcdf(1.56,1e99,0,1) = .0594
•  Find probability between -.50 and 2.25
   •  P(-.5<z<2.25) = normalcdf(-.5,2.25,0,1) = .6792
Learning Objective 6:
How Can We Find the Value of z for a Certain Cumulative Probability?
•  To solve some of our problems, we will need to find the value of z that corresponds to a certain normal cumulative probability
•  To do so, we use Table A in reverse
•  Rather than finding z using the first column (value of z up to one decimal) and the first row (second decimal of z)
   •  Find the probability in the body of the table
   •  The z-score is given by the corresponding values in the first column and row
Learning Objective 6:
How Can We Find the Value of z for a Certain Cumulative Probability?
•  Example: Find the value of z for a cumulative probability of 0.025.
•  Look up the cumulative probability of 0.025 in the body of Table A.
•  A cumulative probability of 0.025 corresponds to z = -1.96.
•  Thus, the probability that a normal random variable falls at least 1.96 standard deviations below the mean is 0.025.
Learning Objective 6:
How Can We Find the Value of z for a Certain Cumulative Probability?
•  Example: Find the value of z for a cumulative probability of 0.975.
•  Look up the cumulative probability of 0.975 in the body of Table A.
•  A cumulative probability of 0.975 corresponds to z = 1.96.
•  Thus, the probability that a normal random variable takes a value no more than 1.96 standard deviations above the mean is 0.975.
Learning Objective 7:
Using the TI Calculator to Find Z-Scores for a Given Probability
•  2nd DISTR 3:invNorm; Enter
•  invNorm(percentile,mean,sd)
   •  Percentile is the probability under the curve from negative infinity to the z-score
•  Enter
Learning Objective 7:
Examples
•  The probability that a standard normal random variable assumes a value that is ≤ z is 0.975. What is z? invNorm(.975,0,1) = 1.96
•  The probability that a standard normal random variable assumes a value that is > z is 0.025. What is z? invNorm(1-.025,0,1) = invNorm(.975,0,1) = 1.96
•  The probability that a standard normal random variable assumes a value that is ≥ z is 0.881. What is z? invNorm(1-.881,0,1) = -1.18
•  The probability that a standard normal random variable assumes a value that is < z is 0.119. What is z? invNorm(.119,0,1) = -1.18
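Going from a cumulative probability back to a z-score can also be sketched with Python's standard library. `inv_norm` below is a hypothetical helper patterned on the calculator's argument order (area to the left, mean, sd); when the stated probability is an area to the right, it must first be converted to the area to the left.

```python
from statistics import NormalDist

def inv_norm(area_left, mean=0.0, sd=1.0):
    """Value whose cumulative probability (area to the left) is area_left."""
    return NormalDist(mean, sd).inv_cdf(area_left)

print(round(inv_norm(0.975), 2))       # P(Z <= z) = 0.975
print(round(inv_norm(0.025), 2))       # P(Z <= z) = 0.025
print(round(inv_norm(1 - 0.881), 2))   # P(Z >= z) = 0.881, so area to the left is 0.119
```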
Learning Objective 7:
Example
•  Find the z-score z such that the probability within z standard deviations of the mean is 0.50.
•  invNorm(.75,0,1) = .67
•  invNorm(.25,0,1) = -.67
•  Probability = P(-.67<Z<.67) = .5
Learning Objective 8:
Finding Probabilities for Normally Distributed Random Variables
1.  State the problem in terms of the observed random variable X, i.e., P(X<x)
2.  Standardize X to restate the problem in terms of a standard normal variable Z:
    P(X < x) = P(Z < z = (x − µ)/σ)
3.  Draw a picture to show the desired probability under the standard normal curve
4.  Find the area under the standard normal curve using Table A
Learning Objective 8:
P(X<x)
•  Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. What percentage of adults have systolic blood pressure less than 100?
•  P(X<100) = P(Z < (100 − 120)/20) = P(Z < −1.00) = .1587
•  normalcdf(-1E99,100,120,20) = .1587
•  15.9% of adults have systolic blood pressure less than 100
Learning Objective 8:
P(X>x)
•  Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. What percentage of adults have systolic blood pressure greater than 100?
•  P(X>100) = 1 – P(X<100)
   P(X<100) = P(Z < (100 − 120)/20) = P(Z < −1.00) = .1587
•  P(X>100) = 1-.1587 = .8413
•  normalcdf(100,1e99,120,20) = .8413
•  84.1% of adults have systolic blood pressure greater than 100
Learning Objective 8:
P(X>x)
•  Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. What percentage of adults have systolic blood pressure greater than 133?
•  P(X>133) = 1 – P(X<133)
   P(X<133) = P(Z < (133 − 120)/20) = P(Z < .65) = .7422
•  P(X>133) = 1-.7422 = .2578
•  normalcdf(133,1E99,120,20) = .2578
•  25.8% of adults have systolic blood pressure greater than 133
Learning Objective 8:
P(a<X<b)
•  Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. What percentage of adults have systolic blood pressure between 100 and 133?
•  P(100<X<133) = P(X<133) − P(X<100)
   = P(Z < (133 − 120)/20) − P(Z < (100 − 120)/20)
   = P(Z < .65) − P(Z < −1.00) = .7422 − .1587 = .5835
•  normalcdf(100,133,120,20) = .5835
•  58% of adults have systolic blood pressure between 100 and 133
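The standardize-then-look-up procedure used in the blood pressure examples can be sketched as follows. The sketch replaces the Table A lookup with a computed standard normal cumulative probability (via the error function, an assumption of this illustration), and the helper names are hypothetical.

```python
import math

def standard_normal_cdf(z):
    """P(Z < z) for a standard normal Z (stands in for the Table A lookup)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_between(a, b, mu, sigma):
    """P(a < X < b), found by standardizing both bounds to z-scores."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return standard_normal_cdf(z_b) - standard_normal_cdf(z_a)

mu, sigma = 120, 20   # adult systolic blood pressure

p_below_100 = standard_normal_cdf((100 - mu) / sigma)       # P(Z < -1.00)
p_above_133 = 1 - standard_normal_cdf((133 - mu) / sigma)   # 1 - P(Z < 0.65)
p_between = prob_between(100, 133, mu, sigma)

print(round(p_below_100, 4), round(p_above_133, 4), round(p_between, 4))
```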
Learning Objective 9:
Find X Value Given Area to Left
•  Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. What is the 1st quartile?
•  P(X<x) = .25, find x:
   •  Look up .25 in the body of Table A to find z = -0.67
   •  Solve equation to find x:
      x = µ + zσ = 120 + (−0.67) * 20 = 106.6
•  Check:
   •  P(X<106.6) = P(Z<-0.67) = 0.25
•  TI Calculator: invNorm(.25,120,20) = 106.5 (the rounded table value z = -0.67 gives 106.6)
Learning Objective 9:
Find X Value Given Area to Right
•  Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. 10% of adults have systolic blood pressure above what level?
•  P(X>x) = .10, find x.
   •  P(X>x) = 1-P(X<x)
   •  Look up 1-0.1 = 0.9 in the body of Table A to find z = 1.28
   •  Solve equation to find x:
      x = µ + zσ = 120 + (1.28) * 20 = 145.6
•  Check:
   •  P(X>145.6) = P(Z>1.28) = 0.10
•  TI Calculator: invNorm(.9,120,20) = 145.6
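The "find z, then solve x = µ + zσ" steps can be sketched in code. Here the z-score comes from Python's `statistics.NormalDist` instead of a reverse table lookup, so it is not rounded to two decimals; the function name is illustrative.

```python
from statistics import NormalDist

def x_for_area_left(area_left, mu, sigma):
    """Value x with P(X < x) = area_left, using x = mu + z * sigma."""
    z = NormalDist().inv_cdf(area_left)   # z-score for the cumulative probability
    return mu + z * sigma

mu, sigma = 120, 20   # adult systolic blood pressure

first_quartile = x_for_area_left(0.25, mu, sigma)       # area to the left is 0.25
top_ten_cutoff = x_for_area_left(1 - 0.10, mu, sigma)   # area to the right is 0.10

print(round(first_quartile, 1), round(top_ten_cutoff, 1))
```

Because z is not rounded here, the first quartile comes out slightly below the table-based 106.6.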
Learning Objective 10:
Using Z-scores to Compare Distributions
Z-scores can be used to compare observations from different normal distributions
•  Example:
   •  You score 650 on the SAT, which has µ=500 and σ=100, and 30 on the ACT, which has µ=21.0 and σ=4.7. On which test did you perform better?
   •  Compare z-scores
      SAT: z = (650 − 500)/100 = 1.5
      ACT: z = (30 − 21)/4.7 = 1.91
   •  Since your z-score is greater for the ACT, you performed better on this exam
Section 3: How Can We Find Probabilities When Each Observation Has Two Possible Outcomes?
Learning Objectives
1.  The Binomial Distribution
2.  Conditions for a Binomial Distribution
3.  Probabilities for a Binomial Distribution
4.  Factorials
5.  Examples using Binomial Distribution
6.  Do the Binomial Conditions Apply?
7.  Mean and Standard Deviation of the Binomial Distribution
8.  Normal Approximation to the Binomial
Learning Objective 1:
The Binomial Distribution
•  Each observation is binary: it has one of two possible outcomes.
•  Examples:
   •  Accept, or decline an offer from a bank for a credit card.
   •  Have, or do not have, health insurance.
   •  Vote yes or no on a referendum.
Learning Objective 2:
Conditions for the Binomial Distribution
•  Each of n trials has two possible outcomes: “success” or “failure”.
•  Each trial has the same probability of success, denoted by p.
•  The n trials are independent.
•  The binomial random variable X is the number of successes in the n trials.
Learning Objective 3:
Probabilities for a Binomial Distribution
•  Denote the probability of success on a trial by p.
•  For n independent trials, the probability of x successes equals:
   P(x) = [n! / (x!(n − x)!)] p^x (1 − p)^(n − x),   x = 0, 1, 2, ..., n
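The binomial formula can be written as a short function. This is a minimal sketch: `binomial_pmf` is a hypothetical name, and n = 4, p = 0.5 are made-up illustration values that do not come from the slides.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = n! / (x!(n - x)!) * p**x * (1 - p)**(n - x)."""
    if not 0 <= x <= n:
        return 0.0
    # comb(n, x) computes n! / (x!(n - x)!) exactly.
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical illustration values (not from the slides).
n, p = 4, 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(probs)
print(sum(probs))   # probabilities over x = 0..n add up to 1
```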
Learning Objective 4:
Factorials
Rules for factorials:
•  n! = n*(n-1)*(n-2)…2*1
•  1! = 1
•  0! = 1
For example,
•  4! = 4*3*2*1 = 24
Learning Objective 5:
Example: Finding Binomial Probabilities
•  John Doe claims to possess ESP.
•  An experiment is conducted:
   •  A person in one room picks one of the integers 1, 2, 3, 4, 5 at random.
   •  In another room, John Doe identifies the number he believes was picked.
   •  Three trials are performed for the experiment.
   •  Doe got the correct answer twice.
Learning Objective 5:
Example 1
If John Doe does not actually have ESP and is actually guessing the number, what is the probability that he’d make a correct guess on two of the three trials?
•  The three ways John Doe could make two correct guesses in three trials are: SSF, SFS, and FSS.
•  Each of these has probability: (0.2)^2 (0.8) = 0.032.
•  The total probability of two correct guesses is 3(0.032) = 0.096.
Learning Objective 5:
Example 1
The probability of exactly 2 correct guesses is the binomial probability with n = 3 trials, x = 2 correct guesses and p = 0.2 probability of a correct guess.
P(2) = [3! / (2!1!)] (0.2)^2 (0.8)^1 = 3(0.04)(0.8) = 0.096
2nd Vars 0:binompdf(n,p,x)
binompdf(3,.2,2) = 0.096
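The counting argument behind the ESP example (three orderings, each with probability 0.032) can be checked by brute-force enumeration. This sketch lists every success/failure sequence for three trials and keeps those with exactly two successes; the variable names are illustrative.

```python
from itertools import product
from math import isclose

p_success = 0.2   # chance of guessing one of five integers correctly
n_trials = 3

ways = []     # sequences with exactly two successes
total = 0.0   # their combined probability
for seq in product("SF", repeat=n_trials):
    if seq.count("S") == 2:
        ways.append("".join(seq))
        # Each such sequence has probability (0.2)^2 * (0.8).
        total += p_success**2 * (1 - p_success)

print(ways)
print(round(total, 3))
```

The enumeration agrees with the binomial formula: the coefficient 3!/(2!1!) = 3 simply counts the qualifying sequences.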
Learning Objective 5:
Binomial Example 2
•  1000 employees, 50% female
•  None of the 10 employees chosen for management training were female.
•  The probability that no females are chosen is:
   P(0) = [10! / (0!10!)] (0.50)^0 (0.50)^10 = 0.001
•  binompdf(10,.5,0) = 9.765625E-4
•  It is very unlikely (one chance in a thousand) that none of the 10 selected for management training would be female if the employees were chosen randomly
Learning Objective 6:
Do the Binomial Conditions Apply?
•  Before using the binomial distribution, check that its three conditions apply:
   •  Binary data (success or failure).
   •  The same probability of success for each trial (denoted by p).
   •  Independent trials.
Learning Objective 6:
Do the Binomial Conditions Apply to Example 2?
•  The data are binary (male, female).
•  If employees are selected randomly, the probability of selecting a female on a given trial is 0.50.
•  With random sampling of 10 employees from a large population, the outcome of one trial does not depend on the outcome of another trial.
Learning Objective 7:
Binomial Mean and Standard Deviation
•  The binomial probability distribution for n trials with probability p of success on each trial has mean µ and standard deviation σ given by:
   µ = np,   σ = √(np(1 − p))
Learning Objective 7:
Example: Racial Profiling?
•  Data:
   •  262 police car stops in Philadelphia in 1997.
   •  207 of the drivers stopped were African-American.
   •  In 1997, Philadelphia’s population was 42.2% African-American.
•  Does the number of African-Americans stopped suggest possible bias, being higher than we would expect (other things being equal, such as the rate of violating traffic laws)?
Learning Objective 7:
Example: Racial Profiling?
•  Assume:
   •  262 car stops represent n = 262 trials.
   •  Successive police car stops are independent.
   •  P(driver is African-American) is p = 0.422.
•  Calculate the mean and standard deviation of this binomial distribution:
   µ = 262(0.422) = 111
   σ = √(262(0.422)(0.578)) = 8
Learning Objective 7:
Example: Racial Profiling?
•  Recall: Empirical Rule
   •  When a distribution is bell-shaped, close to 100% of the observations fall within 3 standard deviations of the mean.
   µ − 3σ = 111 − 3(8) = 87
   µ + 3σ = 111 + 3(8) = 135
Learning Objective 7:
Example: Racial Profiling?
•  If there is no racial profiling, we would not be surprised if between about 87 and 135 of the 262 drivers stopped were African-American.
•  The actual number stopped (207) is well above these values.
•  The number of African-Americans stopped is too high, even taking into account random variation.
Limitation of the analysis:
Different people do different amounts of driving, so we don’t really know that 42.2% of the potential stops were African-American.
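The arithmetic in the car-stop example can be sketched in code using the binomial mean and standard deviation, µ = np and σ = √(np(1 − p)). The helper name is hypothetical; the slides round µ and σ to 111 and 8 before forming the interval, while the sketch keeps full precision until the end.

```python
import math

def binomial_mean_sd(n, p):
    """Mean n*p and standard deviation sqrt(n*p*(1 - p)) of a binomial."""
    return n * p, math.sqrt(n * p * (1 - p))

n, p, observed = 262, 0.422, 207   # stops, population share, drivers stopped

mu, sigma = binomial_mean_sd(n, p)
low, high = mu - 3 * sigma, mu + 3 * sigma   # range covering nearly all outcomes
z = (observed - mu) / sigma                  # how unusual the observed count is

print(round(mu), round(sigma))
print(round(low), round(high))
print(observed > high, round(z, 1))
```

The observed count of 207 lies about 12 standard deviations above the mean, far outside the three-standard-deviation range. The limitation noted on the slide still applies, since the assumed p = 0.422 may not reflect who is actually driving.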
Learning Objective 8:
Approximating the Binomial Distribution with
the Normal Distribution
•  The binomial distribution can be well approximated by the normal distribution when the expected number of successes, np, and the expected number of failures, n(1-p), are both at least 15.
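The "at least 15" guideline is easy to express as a check. A minimal sketch, with `normal_approx_ok` as a hypothetical helper name and the threshold exposed as a parameter; the two calls reuse the n and p values from the car-stop and management-training examples.

```python
def normal_approx_ok(n, p, threshold=15):
    """True when both n*p and n*(1 - p) are at least the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold

print(normal_approx_ok(262, 0.422))   # car stops: about 110.6 and 151.4
print(normal_approx_ok(10, 0.5))      # management training: 5 and 5
```

By this guideline the car-stop example is large enough for a bell-shaped approximation, while the 10-employee example is not and is better handled with exact binomial probabilities.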