Statistics for Science
(ENV)
Chapter 6
Continuous Random Variables
Continuous Random Variables
6.1 Continuous Probability Distributions
6.2 The Uniform Distribution
6.3 The Normal Probability Distribution
6.4 Approximating the Binomial Distribution by Using the Normal Distribution
Continuous Probability Distributions
• A continuous random variable may assume
any numerical value in one or more intervals
• Use a continuous probability distribution to
assign probabilities to intervals of values
Probability density function (PDF)
• The PDF p(x) is not itself a probability; it is a density
• After integration over an interval, it becomes a probability:
$P(a < X < b) = \int_a^b p(x)\,dx$
(Area under the graph from a to b)
• As a result, $P(X = a) = \int_a^a p(x)\,dx = 0$
• $\mu = E(X) = \int_{-\infty}^{+\infty} x\,p(x)\,dx$
• $\sigma^2 = \mathrm{Var}(X) = E\left((X-\mu)^2\right) = \int_{-\infty}^{+\infty} (x-\mu)^2\,p(x)\,dx$
Properties of Probability density function (PDF)
• Properties of p(x): p(x) is a continuous function such that
1. p(x) ≥ 0 for all x
2. The total area under the curve of p(x) is equal to 1
• Essential point: An area under a PDF is a probability
Area and Probability
• The blue area under the curve f(x) from
x = a to x = b is the probability that x
could take any value in the range a to b
– Symbolized as P(a ≤ x ≤ b)
– Or as P(a < x < b), because each of the
interval endpoints has a probability of 0
Distribution Shapes
• Symmetrical and rectangular
– The uniform distribution (Section 6.2)
• Symmetrical and bell-shaped
– The normal distribution (Section 6.3)
• Skewed
– Skewed either left or right (Section 6.5)
The Uniform Distribution
• If c and d are numbers on the real line (c < d), the probability curve describing the uniform distribution is
$f(x) = \begin{cases} \dfrac{1}{d-c} & \text{for } c \le x \le d \\ 0 & \text{otherwise} \end{cases}$
• The probability that x is any value between the values a and b (a < b) is
$P(a \le x \le b) = \dfrac{b-a}{d-c}$
• Note: The number ordering is c ≤ a < b ≤ d
The Uniform Distribution
Continued
• The mean µX and standard deviation σX of a uniform random variable x are
$\mu_X = \dfrac{c+d}{2}$
$\sigma_X = \dfrac{d-c}{\sqrt{12}}$
• These are the parameters of the uniform distribution with endpoints c and d (c < d)
The Uniform Probability Curve
Notes on the Uniform Distribution
• The uniform distribution is symmetrical
– Symmetrical about its center µX
– µX is the median
• The uniform distribution is rectangular
– For endpoints c and d (c < d) the width of the distribution is d – c and the height is 1/(d – c)
– The area under the entire uniform distribution is 1
• Because width × height = (d – c) × [1/(d – c)] = 1
• So P(c ≤ x ≤ d) = 1
Example 6.1: Uniform Waiting Time #1
• Suppose you arrive at an elevator entry at a random time.
• The amount of time, X, that you need to wait for the elevator is uniformly distributed between zero and four minutes
$f(x) = \begin{cases} \dfrac{1}{4-0} = \dfrac{1}{4} & \text{for } 0 \le x \le 4 \\ 0 & \text{otherwise} \end{cases}$
– So c = 0 and d = 4, and
$P(a \le x \le b) = \dfrac{b-a}{4-0} = \dfrac{b-a}{4}$
Example 6.1: Uniform Waiting Time #2
• What is the probability that you have to wait
at least 2.5 minutes for an elevator?
– The probability is the area under the uniform
distribution in the interval [2.5, 4] minutes
• The probability is the area of a rectangle with height ¼
and base 4 – 2.5 = 1.5
– P(x ≥ 2.5) = P(2.5 ≤ x ≤ 4) = ¼ × 1.5 = 0.375
Example 6.1: Uniform Waiting Time #3
• What is the probability that you have to wait less than one minute for an elevator?
– P(x ≤ 1) = P(0 ≤ x ≤ 1) = ¼ × 1 = 0.25
Example 6.1: Uniform Waiting Time #4
• Expect to wait the mean time µX = (0 + 4)/2 = 2 minutes
• With a standard deviation σX of
$\sigma_X = \dfrac{4-0}{\sqrt{12}} = 1.1547$ minutes
– The mean plus/minus one standard deviation is µX ± σX = 2 ± 1.1547 = [0.8453, 3.1547] minutes
• The probability that the waiting time is within one standard deviation of the mean time is
P(0.8453 ≤ x ≤ 3.1547) = ¼ × (3.1547 – 0.8453) = 0.57735
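The elevator calculations can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the helper name `uniform_prob` is an illustrative assumption, and it simply applies the formula P(a ≤ x ≤ b) = (b – a)/(d – c).

```python
import math

def uniform_prob(a, b, c, d):
    """P(a <= X <= b) for X uniform on [c, d]; [a, b] is clipped to [c, d]."""
    lo, hi = max(a, c), min(b, d)
    return max(hi - lo, 0.0) / (d - c)

c, d = 0.0, 4.0                   # waiting time is uniform on [0, 4] minutes
mean = (c + d) / 2                # (c + d) / 2
sd = (d - c) / math.sqrt(12)      # (d - c) / sqrt(12)

p_at_least_2_5 = uniform_prob(2.5, d, c, d)              # wait at least 2.5 minutes
p_at_most_1 = uniform_prob(c, 1.0, c, d)                 # wait less than 1 minute
p_within_1sd = uniform_prob(mean - sd, mean + sd, c, d)  # within one sd of the mean

print(mean, round(sd, 4))
print(p_at_least_2_5, p_at_most_1, round(p_within_1sd, 5))
```

Clipping the interval to [c, d] keeps the helper usable when a requested endpoint falls outside the support, where the density is 0.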
Assignment: waiting for the green light
• Cars arrive at a traffic light randomly in time. The
traffic light timing is set so that the green light
lasts for 80 seconds and is followed by 40 seconds
of red light.
• What is the probability that a car does not have
to wait in front of this traffic light?
• What is the probability that a car has to wait
less than 20 seconds?
• What is the average waiting time?
The Normal Probability Distribution
• The normal probability distribution is defined by the equation
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
for all values x on the real number line
– µ is the mean and σ is the standard deviation
– π = 3.14159… and e = 2.71828… is the base of natural logarithms
The Normal Probability Distribution
Continued
• The normal curve is symmetrical about its mean µ
– The mean is in the middle under the curve
– So µ is also the median
• It is tallest over its mean µ
• The area under the entire normal curve is 1
– The area under either half of the curve is 0.5
The Normal Probability Distribution
Continued
[Figure: Characteristics of a Normal Distribution (normal distribution with µ = 0, σ² = 1). Annotations: the normal curve is symmetrical; theoretically, the curve extends to infinity; mean, median, and mode are equal. Axes: x (horizontal), f(x) (vertical).]
Properties of the Normal Distribution
• There are an infinite number of normal curves
– The shape of any individual normal curve depends on its specific mean µ and standard deviation σ
• The standard normal distribution N(0,1) is a normal distribution with a mean of 0 and a standard deviation of 1
• Its density is proportional to exp(–z²/2); it is also called the z distribution, where
$z = \dfrac{x-\mu}{\sigma}$
Properties of the Normal Distribution
Continued
• The tails of the normal extend to infinity in
both directions
– The tails get closer to the horizontal axis but never
touch it
• The area under the normal curve to the right
of the mean equals the area under the normal
to the left of the mean
– The area under each half is 0.5
The Position and Shape of the Normal Curve
(a) The mean µ positions the peak of the normal curve over the real axis
(b) The variance σ² measures the width or spread of the normal curve
Relationship with other distributions
• The binomial distribution B(n, p) is
approximately normal N(np, np(1 − p)) for
large n and for p not too close to zero or one.
• The Poisson(λ) distribution is approximately
normal N(λ, λ) for large values of λ.
Normal Probabilities
• Suppose X is a normally distributed random variable with mean µ and standard deviation σ
• The probability that X could take any value in the range between two given values a and b (a < b) is P(a ≤ X ≤ b)
Normal Probabilities
Continued
P(a ≤ X ≤ b) is the area colored in blue under
the normal curve and between the values x =
a and x = b
Three Important Areas under the Normal Curve
1. P(µ – σ ≤ X ≤ µ + σ) = 0.6826
– So 68.26% of all possible observed values of x are within
(plus or minus) one standard deviation of µ
2. P(µ – 2σ ≤ X ≤ µ + 2σ) = 0.9544
– So 95.44% of all possible observed values of x are within
(plus or minus) two standard deviations of µ
3. P(µ – 3σ ≤ X ≤ µ + 3σ) = 0.9973
– So 99.73% of all possible observed values of x are within
(plus or minus) three standard deviations of µ
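These three areas can be reproduced without a table. The sketch below is an illustration added here, not part of the original slides; it uses the identity P(µ – kσ ≤ X ≤ µ + kσ) = erf(k/√2), which holds for any normal distribution. The helper name `prob_within_k_sd` is an assumption. The exact values round to 0.6827 and 0.9545 for k = 1 and k = 2; the slide's 0.6826 and 0.9544 come from doubling four-decimal table entries (2 × 0.3413 and 2 × 0.4772).

```python
import math

def prob_within_k_sd(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal X."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within_k_sd(k), 4))
```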
Three Important Percentages
The Standard Normal Distribution
• If x is normally distributed with mean µ and standard deviation σ, then the random variable z
$z = \dfrac{x-\mu}{\sigma}$
is normally distributed with mean 0 and standard deviation 1
• This normal is called the standard normal distribution
The Standard Normal Distribution
Continued
• z measures the number of standard deviations that x is from the mean µ
– The algebraic sign on z indicates on which side of µ is x
– z is positive if x > µ (x is to the right of µ on the number line)
– z is negative if x < µ (x is to the left of µ on the number line)
The Standard Normal Table
• The standard normal table is a table that lists
the area under the standard normal curve to
the right of the mean (z=0) up to the z value of
interest
– See Table 6.1, Table A.3 in Appendix A, and the
table on the back of the front cover
• This table is so important that it is repeated 3 times in
the textbook!
• Always look at the accompanying figure for guidance on
how to use the table
The Standard Normal Table
Continued
• The values of z (accurate to the nearest hundredth)
in the table range from 0.00 to 3.09 in
increments of 0.01
– z accurate to tenths are listed in the far left
column
– The hundredths digit of z is listed across the top of
the table
• The areas under the normal curve to the right
of the mean up to any value of z are given in
the body of the table
A Standard Normal Table
The Standard Normal Table Example
• Find P(0 ≤ z ≤ 1)
– Find the area listed in the table corresponding to a z value
of 1.00
– Starting from the top of the far left column, go down to
“1.0”
– Read across the row z = 1.0 until under the column headed
by “.00”
– The area is in the cell that is the intersection of this row
with this column
– As listed in the table, the area is 0.3413, so
• P(0 ≤ z ≤ 1) = 0.3413
Areas under the Standard Normal Curve
Calculating P(-2.53 ≤ z ≤ 2.53)
• First, find P(0 ≤ z ≤ 2.53)
– Go to the table of areas under the standard normal curve
– Go down left-most column for z = 2.5
– Go across the row 2.5 to the column headed by .03
– The area to the right of the mean up to a value of z = 2.53 is the value contained in the cell that is the intersection of the 2.5 row and the .03 column
– The table value for the area is 0.4943
Calculating P(-2.53 ≤ z ≤ 2.53)
Continued
• From last slide, P(0 ≤ z ≤ 2.53)=0.4943
• By symmetry of the normal curve, this is also
the area to the left of the mean down to a
value of z = –2.53
• Then P(-2.53 ≤ z ≤ 2.53) = 0.4943 + 0.4943 =
0.9886
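The table lookup and the symmetry step can be mirrored in Python. This is a sketch under the assumption that the standard library's `statistics.NormalDist` (Python 3.8+) is available; the helper name `table_area` is illustrative and returns what the body of the table lists, namely the area between the mean (z = 0) and z.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def table_area(z):
    """Area under the standard normal curve between 0 and |z|."""
    return Z.cdf(abs(z)) - 0.5

area = table_area(2.53)   # P(0 <= z <= 2.53)
total = 2 * area          # P(-2.53 <= z <= 2.53), by symmetry

print(round(area, 4), round(total, 4))
```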
Calculating P(z ≥ -1)
• An example of finding the area under the
standard normal curve to the right of a
negative z value
• Shown is finding the area under the standard
normal for z ≥ -1
Calculating P(z ≥ 1)
Finding Normal Probabilities
1. Formulate the problem in terms of x values
2. Calculate the corresponding z values, and restate
the problem in terms of these z values
3. Find the required areas under the standard normal
curve by using the table
Note: It is always useful to draw a picture showing the
required areas before using the normal table
Example: scores of IQ test
• Most IQ tests are standardized with a mean
score of 100 and a standard deviation of 15. If
scores on an IQ test are normally distributed, what
is the probability that a person who takes the test
will score between 90 and 110?
• Solution: We want to know the probability that the test
score falls between 90 and 110. Since the mean is 100 and
the sd is 15, the z values are (90 – 100)/15 = –0.67 and (110 – 100)/15 = 0.67, so we need
P(–0.67 ≤ z ≤ 0.67) from the table. The answer is 2 × 0.2486 = 0.4972.
Example: scores of IQ test 2
• With the same assumptions as in the last example,
what is the probability that a person has an IQ
score above 130?
• Solution: We have z = (130 – 100)/15 = 2. From
the table, P(z > 2) = 0.5 – 0.4772 = 0.0228, or 2.28%
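Both IQ calculations can be checked numerically. The sketch below is an added illustration using `statistics.NormalDist` from the Python standard library, with the stated mean of 100 and standard deviation of 15. Because it does not round z to two decimals, the first probability comes out slightly below the table-based 0.4972.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)    # IQ scores, assumed normally distributed

p_90_110 = iq.cdf(110) - iq.cdf(90)  # P(90 <= X <= 110)
p_above_130 = 1 - iq.cdf(130)        # P(X > 130)

print(round(p_90_110, 4), round(p_above_130, 4))
```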
The Cumulative Normal Curve
• The cumulative normal curve is shown below:
[Figure: Cumulative area under the standard normal curve. The cumulative normal table gives the shaded area.]
The Cumulative Normal Table
• Useful for finding the probabilities of
threshold values like P(z ≤ a) or P(z ≥ b)
– Find P(z ≤ 1)
• Find directly from cumulative normal table that P(z ≤ 1)
= 0.8413
– Find P(z ≥ 1)
• Find directly from cumulative normal table that P(z ≤ 1)
= 0.8413
• Because areas under the normal sum to 1
P(z ≥ 1) = 1 – P(z ≤ 1) = 1 – 0.8413 = 0.1587
Cumulative area under the standard normal curve
Cumulative area under the standard
normal curve: example
• Albert Einstein, the famous physicist of the last
century, was considered to "only" have an IQ
of about 160. What is the probability that a
person has an IQ score above 160?
• Solution: We have z = (160 – 100)/15 = 4. From
Table 6.1, P(z < 4) = 0.99997
P(z > 4) = 1 – 0.99997 = 0.00003
or about 3 in 100,000 persons
Example: probability of getting an “A”
• Only the top 10% of students in this class can
get the “A” grade. If the final scores of the
class are normally distributed, what is the
minimum z-value one should have in order
to get an “A”?
• Solution: We need to find z such that the
cumulative area under the standard normal
curve is 0.9. From Table 6.1, we get z ≥ 1.28.
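Reading the table "backwards" like this corresponds to the inverse of the cumulative distribution function. The sketch below is an added illustration using `statistics.NormalDist.inv_cdf` from the Python standard library. The class mean of 70 and standard deviation of 10 are hypothetical values chosen only to show how a z cutoff converts to a score; they are not from the slides.

```python
from statistics import NormalDist

# z value with 90% of the area to its left (top 10% lies to its right)
z_cut = NormalDist().inv_cdf(0.90)

# Hypothetical class parameters, for illustration only
class_mean, class_sd = 70.0, 10.0
min_score = class_mean + z_cut * class_sd  # x = mu + z * sigma

print(round(z_cut, 2), round(min_score, 1))
```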
Example 6.2: The Car Mileage Case
• Want the probability that the mileage of a randomly selected
midsize car will be between 32 and 35 mpg
– Let x be the random variable of mileage of midsize cars
– Want P(32 ≤ x ≤ 35 mpg)
• Given:
– x is normally distributed
– µ = 33 mpg
– σ = 0.7 mpg
• Procedure (from previous slide):
1. Formulate the problem in terms of x (as above)
2. Restate in terms of corresponding z values (see next slide)
3. Find the indicated area under the standard normal curve using the z
table (see the slide after that)
Example 6.2: The Car Mileage Case #2
• For x = 32 mpg, the corresponding z value is
$z = \dfrac{x-\mu}{\sigma} = \dfrac{32-33}{0.7} = \dfrac{-1}{0.7} = -1.43$
• So the mileage of 32 mpg is 1.43 standard deviations
below (to the left of) the mean
• For x = 35 mpg, the corresponding z value is
$z = \dfrac{x-\mu}{\sigma} = \dfrac{35-33}{0.7} = \dfrac{2}{0.7} = 2.86$
• Then P(32 ≤ x ≤ 35 mpg) = P(-1.43 ≤ z ≤ 2.86)
Example 6.2: The Car Mileage Case #3
• Want: the area under the normal curve
between 32 and 35 mpg
• Will find: the area under the standard normal
curve between -1.43 and 2.86
Example 6.2: The Car Mileage Case #4
• Break this into two pieces, around the mean
– The area to the left of µ between -1.43 and 0
• By symmetry, this is the same as the area between 0 and 1.43
• From the standard normal table, this area is 0.4236
– The area to the right of µ between 0 and 2.86
• From the standard normal table, this area is 0.4979
– The total area of both pieces is 0.4236 + 0.4979 = 0.9215
– Then, P(-1.43 ≤ z ≤ 2.86) = 0.9215
• Returning to x, P(32 ≤ x ≤ 35 mpg) = 0.9215
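The three-step procedure for the mileage case can be sketched in Python. This is an added illustration using `statistics.NormalDist` from the standard library, with µ = 33 and σ = 0.7 as given. Since it does not round the z values to two decimals, the result can differ from the table-based 0.9215 in the fourth decimal place.

```python
from statistics import NormalDist

mu, sigma = 33.0, 0.7             # mean and sd of mileage, in mpg

# Step 2: restate the x values as z values
z_low = (32 - mu) / sigma         # about -1.43
z_high = (35 - mu) / sigma        # about 2.86

# Step 3: area under the standard normal curve between the two z values
Z = NormalDist()
p_32_35 = Z.cdf(z_high) - Z.cdf(z_low)

print(round(z_low, 2), round(z_high, 2), round(p_32_35, 4))
```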
Illustrating the Results of Example 6.2
Finding Z Points on a Standard Normal Curve
Example 6.5: Stocking to prevent out of inventory
• A discount store sells packs of 50 DVDs
• Want to know how many packs to stock so that there
is only a 5 percent chance of stocking out during the
week
– Let x be the random variable of weekly demand
– Let st be the number of packs so there is only a 5%
probability that weekly demand will exceed st
– Want the value of st so that P(x ≥ st) = 0.05
• Given x is normally distributed, µ = 100 packs, and σ
= 10 packs
Example 6.5 #2
Example 6.5 #3
• In panel (a), st is located on the horizontal axis under
the right-tail of the normal curve having mean µ =
100 and standard deviation σ = 10
• The z value corresponding to st is
$z = \dfrac{st-\mu}{\sigma} = \dfrac{st-100}{10}$
• In panel (b), the right-tail area is 0.05, so the area
under the standard normal curve between the
mean z = 0 and this z value is 0.5 – 0.05 = 0.45
Example 6.5 #4
• Use the standard normal table to find the z value
associated with a table entry of 0.45
– The table does not list 0.45 exactly; instead find the values that bracket 0.45
– For a table entry of 0.4495, z = 1.64
– For a table entry of 0.4505, z = 1.65
– For an area of 0.45, use the z value midway between these
– So z = 1.645
$\dfrac{st-100}{10} = 1.645$
Example 6.5 #5
• Combining this result with the result from a
prior slide:
$\dfrac{st-100}{10} = 1.645$
• Solving for st gives st = 100 + (1.645 × 10) =
116.45
– Rounding up, 117 packs should be stocked so that
the probability of running out will not be more
than 5 percent
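The stocking level can also be found directly from the inverse cumulative distribution. The sketch below is an added illustration using `statistics.NormalDist.inv_cdf` from the Python standard library, with µ = 100 and σ = 10 as given. The variable names are assumptions.

```python
import math
from statistics import NormalDist

demand = NormalDist(mu=100, sigma=10)  # weekly demand in packs

# P(x >= st) = 0.05 is the same as P(x <= st) = 0.95
st = demand.inv_cdf(0.95)

# Round up so the stock-out probability does not exceed 5 percent
packs_to_stock = math.ceil(st)

print(round(st, 2), packs_to_stock)
```

Rounding up rather than to the nearest integer matters here: stocking 116 packs would leave a stock-out probability slightly above 5 percent.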
Finding a Tolerance Interval
Finding a tolerance interval [µ ± kσ] that
contains 99% of the measurements in a
normal population
Example:
• Given a population with mean height of 170
cm and sd=15 cm
• What is the interval in height which contains
99% of the population?
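One way to approach this example is sketched below. It is an added illustration, not the slides' own solution. A symmetric interval containing 99% of the population leaves 0.5% in each tail, so k is the z value with cumulative area 0.995. The sketch uses `statistics.NormalDist.inv_cdf` from the Python standard library.

```python
from statistics import NormalDist

mu, sigma = 170.0, 15.0   # mean height and sd, in cm
coverage = 0.99           # fraction of the population to contain

# Each tail holds (1 - coverage) / 2, so k has cumulative area 0.995
k = NormalDist().inv_cdf(1 - (1 - coverage) / 2)

low, high = mu - k * sigma, mu + k * sigma

print(round(k, 3), round(low, 1), round(high, 1))
```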
Approximating the Binomial Distribution by Using the
Normal Distribution
• The figure below shows several binomial
distributions
• Can see that as n gets larger and as p gets closer to
0.5, the graph of the binomial distribution tends to
have the symmetrical, bell-shaped, form of the
normal curve
Normal Approximation to the Binomial
Continued
• Generalize the observation from the last slide for large n
• Suppose x is a binomial random variable,
where n is the number of trials, each having a
probability of success p
– Then the probability of failure is 1 – p
• If n and p are such that np ≥ 5 and
n(1 – p) ≥ 5, then x is approximately normal
with
$\mu = np$ and $\sigma = \sqrt{np(1-p)}$
Example 6.8: Normal Approximation to a
Binomial #1
• Suppose there is a binomial variable X with n =
50 and p = 0.5
• Want the probability of X = 23 successes in the
n = 50 trials
– Want P(x = 23)
– Approximating using the normal curve with
$\mu = np = 50 \times 0.50 = 25$
$\sigma = \sqrt{np(1-p)} = \sqrt{50 \times 0.50 \times 0.50} = 3.5355$
Example 6.8: Normal Approximation to a
Binomial #2
• With continuity correction, find the
normal probability P(22.5 ≤ x ≤ 23.5)
Example 6.8: Normal Approximation to a
Binomial #3
• For x = 22.5, the corresponding z value is
$z = \dfrac{x-\mu}{\sigma} = \dfrac{22.5-25}{3.5355} = -0.71$
• For x = 23.5, the corresponding z value is
$z = \dfrac{x-\mu}{\sigma} = \dfrac{23.5-25}{3.5355} = -0.42$
• Then P(22.5 ≤ x ≤ 23.5) = P(-0.71 ≤ z ≤ -0.42) = 0.2611
– 0.1628 = 0.0983
• Therefore the estimate of the binomial probability
P(x = 23) is 0.0983
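To see how good the approximation is, the exact binomial probability can be compared with the continuity-corrected normal area. This is an added sketch, not part of the slides; the helper names `binom_pmf` and `normal_approx_pmf` are assumptions, and only the Python standard library is used. Because the z values are not rounded here, the approximate value differs slightly from the table-based 0.0983.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_approx_pmf(k, n, p):
    """Normal approximation to P(X = k) with continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    approx = NormalDist(mu, sigma)
    return approx.cdf(k + 0.5) - approx.cdf(k - 0.5)

n, p, k = 50, 0.5, 23
exact = binom_pmf(k, n, p)
approx = normal_approx_pmf(k, n, p)

print(round(exact, 4), round(approx, 4))
```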
Example:
Suppose that you are taking a multiple-choice test where each
question has 4 possible answers. If there are 48 questions on the
test, what is the probability that you would get exactly 14
questions correct by simply guessing at the answers?
If you are just guessing, then the probability of getting a question correct is
p = 1/4, and the probability of guessing wrong is q = 3/4. With a sample of n = 48
questions, this example meets the criteria for using the normal approximation to
the binomial:
pn = 0.25*(48) = 12
qn = 0.75*(48) = 36 (both greater than 10)
Therefore, the distribution showing the number correct out of 48 questions will be
approximately normal with parameters:
mean = pn = 12; sd = sqrt(npq) = sqrt(9) = 3
We want the proportion of this distribution that corresponds to X = 14, that is, the
portion bounded by X = 13.5 and X = 14.5.
The binomial distribution (normal approximation) for the number of answers
guessed correctly (X) on a 48-question multiple-choice test. The shaded portion
corresponds to the probability of guessing exactly 14 questions.
Notice that the score X = 14 corresponds to an interval bounded by X = 13.5
and X = 14.5.
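The slides stop at setting up the interval; the remaining steps can be sketched as follows. This is an added illustration using `statistics.NormalDist` from the Python standard library, and the resulting probability is computed here rather than quoted from the source.

```python
import math
from statistics import NormalDist

n, p = 48, 0.25
mu = n * p                        # mean = 12
sd = math.sqrt(n * p * (1 - p))   # sd = sqrt(9) = 3

# z values for the interval bounds X = 13.5 and X = 14.5
z_low = (13.5 - mu) / sd
z_high = (14.5 - mu) / sd

# Area under the standard normal curve between the two z values
Z = NormalDist()
p_14 = Z.cdf(z_high) - Z.cdf(z_low)

print(round(z_low, 2), round(z_high, 2), round(p_14, 4))
```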
Summary
6.1 Continuous Probability Distributions
6.2 The Uniform Distribution
6.3 The Normal Probability Distribution
6.4 Approximating the Binomial Distribution by Using the Normal Distribution