Download Chap 06

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Central limit theorem wikipedia , lookup

Transcript
Chapter 6 ~ Normal Probability Distributions
常態分佈
P ( a  x  b)
a

b
x
Normal Probability Distributions
常態分佈(normal distribution)又稱
高斯分佈(Gaussian distribution)。
德國的10馬克紙幣, 以高斯(Gauss,
1777-1855)為人像, 人像左側有一
常態分佈之P.D.F.及其圖形。
Chapter Goals
• Learn about the normal, bell-shaped, or
Gaussian distribution
• How probabilities are found
• How probabilities are represented
• How normal distributions are used in the real
world
6.1 ~ Normal Probability
Distributions
 The normal probability distribution is the most
important distribution in all of statistics
• Many continuous random variables have normal
or approximately normal distributions
• Need to learn how to describe a normal
probability distribution
6.1 ~ Normal Probability
Distributions
 Exampel: 如果調查台灣1000位成年男性的身高,
將會發現身高特別高和特別低的人佔極少數,絕
大多數的人身高都在中間(如170公分)附近。
 如果這1000人的平均身高是170公分,那麼身高
離170公分越遠的人,所佔的比例就越少。簡單
的說,大略呈現以170公分為中心,並往兩旁遞
減的分佈圖。(如下頁)
 如果調查的是體重、收入、智力等變項,也很可
能出現類似的分佈。
6.1 ~ Normal Probability
Distributions
90
80
70
人數
60
50
40
30
20
10
0
150
155
160
165
170
身高
175
180
185
190
6.1 ~ Normal Probability
Distributions
 上圖的分佈是間斷的,可是理論上身高是連續
的。(因為任何兩個人之間,存在另一人,其身
高介在他們之間)
 如果調查更多的人(如10萬人),那麼上圖的長
條圖中間斷現象逐漸會消除。一旦調查人數非
常之大,那麼上圖的長條圖會變成平滑的曲線
圖,如下圖中的平滑曲線所示。
6.1 ~ Normal Probability
Distributions
90
80
70
人數
60
50
40
30
20
10
0
150
155
160
165
170
175
180
185
190
身高
Demo: http://163.20.15.7/math/LSC/Normal/Normal.htm
Empirical Rule: If a variable is normally distributed, then:
1. Approximately 68% of the observations lie within 1 standard
deviation of the mean
2. Approximately 95% of the observations lie within 2 standard
deviations of the mean
3. Approximately 99.7% of the observations lie within 3 standard
deviations of the mean

  3   2
 

 
  2
  3
6.1 Normal Probability Distribution
1. A continuous random variable
2. Normal probability distribution function:
1 ( x)

e 2 
2
1
 2p
This is the function for the normal (bell-shaped) curve
f ( x) =
π = 3.1416,e :自然對數之底 2.7183,x介在
正負無限大,μ:平均數,σ:標準差。一旦確
定μ和σ,帶入公式算得 f(x)
3. The probability that x lies in some interval is the area under the
curve
Probabilities for a Normal
Distribution
• Illustration
b
P(a  x  b) =  f ( x )dx
a
• The definite integral
is a calculus topic
a
b
x
問題:要決定常態分佈的形狀,就必須知道平均數μ
和變異數σ2(或者標準差σ)。常態分佈取決於兩個
參數(parameter)μ和σ2。只要μ或σ2不同,曲線就
不同。
0.1
0.09
0.08
0.07
A:  = 170,  = 5
f (X )
0.06
B:  = 175,  = 5
0.05
0.04
0.03
C:  = 170,  = 10
0.02
0.01
0
140
145
150
155
160
165
170
175
180
185
190
195
200
Notes

We will learn how to compute probabilities for one special
normal distribution: the standard normal distribution (標準常
態分佈) (μ= 0和σ2 = 1)

Transform all other normal probability questions to this
special distribution

Recall the empirical rule: the percentages that lie within
certain intervals about the mean come from the normal
probability distribution

We need to refine the empirical rule to be able to find the
percentage that lies between any two numbers
Percentage, Proportion &
Probability
 Basically the same concepts
• Percentage (30%) is usually used when talking
about a proportion (3/10) of a population
• Probability is usually used when talking about
the chance that the next individual item will
possess a certain property
6.2 ~ The Standard Normal
Distribution
 There are infinitely many
normal probability
distributions
• They are all related to the standard normal
distribution
• The standard normal distribution is the
normal distribution of the standard variable z
(the z-score)
Standard Normal Distribution
f (Z )
Properties:
 The total area under the normal curve is equal to 1
 The distribution is mounded and symmetric; it extends
indefinitely in both directions, approaching but never touching
the horizontal axis
 The distribution has a mean of 0 and a standard deviation of 1
 The mean divides the area in half, 0.50 on each side
0.5
 Nearly all the area is
0.4
between z = -3.00 and
0.3
z = 3.00
0.2
0.1
0
-3
-2
-1
0
Z
1
2
3

Table 3, Appendix B lists the probabilities
associated with the intervals from the mean (0) to
a specific value of z

Probabilities of other intervals are found using
the table entries, addition, subtraction, and the
properties above
Table 3, Appendix B Entries
0
z
 The table contains the area under the standard normal
curve between 0 and a specific value of z
Example
Find the area under the
standard normal curve
between z = 0 and z =
1.45
0
145
.
• A portion of Table 3:
z
0.00
0.01
0.02
0.03
0.04
0.05
..
.
1.4
..
.
P(0  z  145
. ) = 0.4265
0.4265
0.06
z
Example
Find the area under the
normal curve to the right
of z = 1.45; P(z > 1.45)
Area asked for
0.4265
0
P( z  145
. ) = 0.5000  0.4265 = 0.0735
145
.
z
Example
Find the area to the left
of z = 1.45; P(z < 1.45)
0.5000
0.4265
0
P( z  145
. ) = 0.5000  0.4265 = 0.9265
145
.
z
Notes
• The addition and subtraction used in the previous
examples are correct because the “areas” represent
mutually exclusive events
•
The symmetry of the normal distribution is a key factor
in determining probabilities associated with values
below (to the left of) the mean. For example: the area
between the mean and z = -1.37 is exactly the same as
the area between the mean and z = +1.37.
•
When finding normal distribution probabilities, a
sketch(草圖) is always helpful
Example
Area from table
0.3962
Area asked for
 Find the area between
the mean (z = 0) and
z = -1.26
126
.
P( 126
.  z  0) = 0.3962
0
126
.
z
Example
 Find the area to the left
of -0.98; P(z < -0.98)
Area asked for
0.98 0
P ( z   0.98) = 0.5000  0.3365 = 0.1635
Area from table
0.3365
0.98
Example
 Find the area between
z = -2.30 and z = 1.80
0.4893
 2.30
0.4641
0
P ( 2.30  z  1.80) = P ( 2.30  z  0)  P ( 0  z  1.80)
= 0.4893  0.4641 = 0.9534
1.80
Example
Find the area between
Area asked
z = -1.40 and z = -0.50 for
-1.40
- 0.50 0 0.50
1.40
P( 1.40  z   0.50) = P (0  z  1.40)  P (0  z  0.50)
= 0.4192  0.1915 = 0.2277
Normal Distribution Note
The normal distribution table may also be used to
determine a z-score if we are given the area (working
backwards)
 Example: What is the z-score associated with the 85th
percentile?

0.3500
15%
implies
P
0
z
Solution
 In Table 3 Appendix B, find the “area” entry that is
closest to 0.3500:
z
0.00
0.01
0.02
0.03
0.04
0.05
..
.
1.0
0.3485 0.3500 0.3508
..
.
• The area entry closest to 0.3500 is 0.3508
• The z-score that corresponds to this area is 1.04
• The 85th percentile in a standard normal distribution is 1.04
Example
 What z-scores bound the middle 90% of a standard normal
distribution?
0.4500
90%
implies
0
z
0
z
Solution
 The 90% is split into two equal parts by the mean. Find
the area in Table 3 closest to 0.4500:
z
0.00
0.01
0.02
0.03
0.04
0.05
..
.
1.6
0.4495 0.4500 0.4505
..
.
• 0.4500 is exactly half way between 0.4495 and 0.4505
• Therefore, z = 1.645
• z = -1.645 and z = 1.645 bound the middle 90% of a
normal distribution
6.3 ~ Applications of Normal
Distributions
 Apply the techniques learned for the z distribution
to all normal distributions
• Start with a probability question in terms of
x-values
• Convert, or transform, the question into an
equivalent probability statement involving
z-values
Standardization
 Suppose x is a normal random variable with mean m and
standard deviation s
x
• The random variable z =

has a standard
normal distribution

0
c
c

x
z
Example
 A bottling(裝瓶) machine is adjusted to fill bottles with a
mean of 32.0 oz of soda and standard deviation of 0.02.
Assume the amount of fill is normally distributed and a
bottle is selected at random:
1) Find the probability the bottle contains between 32.00 oz and
32.025 oz
2) Find the probability the bottle contains more than 31.97 oz
Solutions:
32.00   32.00  32.0
=
= 0.00
1) When x = 32.00 ; z =

0.02
32.025   32.025  32.0
=
=
=
= 1.25
When x 32.025; z

0.02
Solution Continued
Area asked for
32.0
0
32.025
125
.
x
z
32.0  32.0 x  32.0 32.025  32.0 



÷
P ( 32.0  x  32.025) = P 


0.02
0.02
0.02
= P ( 0  z  1.25) = 0. 3944
2)
Example, Part 2
3197
.
 1.50
32.0
0
x
z
x  32.0
31.97  32.0 


÷ = P( z  1.50)
P( x  31.97) = P
 0.02

0.02
= 0.5000  0.4332 = 0.9332
Notes
 The normal table may be used to answer many kinds of
questions involving a normal distribution
• Often we need to find a cutoff point: a value of x such
that there is a certain probability in a specified interval
defined by x.
 Example: The waiting time x at a certain bank is
approximately normally distributed with a mean of 3.7
minutes and a standard deviation of 1.4 minutes. The
bank would like to claim that 95% of all customers are
waited on by a teller within c minutes. Find the value of
c that makes this statement true.
Solution
0.0500
0.5000 0.4500
3.7
0
P ( x  c) = 0.95
 x  3.7  c  3.7  =
÷ 0.95
P
 1.4
1.4 
  c  3.7  =
÷ 0.95
P z

1.4 
c
1645
.
x
z
c  3.7
= 1645
.
14
.
c = (1645
. )(14
. )  3.7 = 6.003
c  6 minutes
Review: P.30
Example
 A radar unit is used to measure
the speed of automobiles on an
expressway(高速公路) during
rush-hour traffic. The speeds
of individual automobiles are
normally distributed with a
mean of 62 mph. Find the
standard deviation of all speeds
if 3% of the automobiles travel
faster than 72 mph.
0.0300
0.4700
62
72
x
0
188
.
z
Solution
P( x  72) = 0.03
x
;
z=

P( z  1.88) = 0.03
72  62
1.88 =

1.88  = 10
 = 10 / 1.88 = 5.32
Notation
 If x is a normal random variable with mean m and
standard deviation s, this is often denoted:
x ~ N(m, s)

Example: Suppose x is a normal random variable
with  = 35 and  = 6. A convenient notation to
identify this random variable is: x ~ N(35, 6).
6.4 ~ Notation
 z-score used throughout statistics in a variety of
ways
• Need convenient notation to indicate the area
under the standard normal distribution
• z(a) is the algebraic name, for the z-score (point on
the z axis) such that there is a of the area
(probability) to the right of z(a)
Illustrations
z(0.10) represents the
value of z such that the
area to the right under
the standard normal
curve is 0.10
010
.
0
z(0.10)
z
z(0.80) represents the
value of z such that the
area to the right under
the standard normal
curve is 0.80
0.80
z(0.80) 0
z
Example
• Find the numerical value of z(0.10):
Table shows this area (0.4000)
0.10 (area information
from notation)
0
z(0.10)
z
• Use Table 3: look for an area as close as possible to 0.4000
• z(0.10) = 1.28
Example
• Find the numerical value of z(0.80):
Look for 0.3000; remember
that z must be negative
z(0.80) 0
z
• Use Table 3: look for an area as close as possible to 0.3000
• z(0.80) = -0.84
Example
• Find the numerical value of z(0.99):
0.01
z
z(0.99)
0
• Because of the symmetrical nature of the normal distribution,
z(0.99) = -z(0.01)
• Using Table 3: z(0.99) = -2.33
Example
• Find the z-scores that bound the middle 0.99 of
the normal distribution:
0.005
0.005
0.495
z(0.995)
or
-z(0.005)
0.495
0
• Use Table 3:
z(0.005) = 2.575 and z(0.995) = -z(0.005) = -2.575
z(0.005)
6.5 ~ Normal Approximation of the
Binomial
 Recall: the binomial distribution is a probability
distribution of the discrete random variable x, the
number of successes observed in n repeated
independent trials
• Binomial probabilities can be reasonably estimated
by using the normal probability distribution
Background & Histogram
 Background: Consider the distribution of the
binomial variable x when n = 20 and p = 0.5
• Histogram: P( x)
0.18
0.16
0.14
0.12
0.10
0.08
0.06
0.04
0.02
0.00
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
x
The histogram may be approximated by a normal curve
Notes

The normal curve has mean and standard deviation
from the binomial distribution:
 = np = (20)(0.5) = 10
 = npq = (20)(0.5)(0.5) = 5  2.236

Can approximate the area of the rectangles with the
area under the normal curve

The approximation becomes more accurate as n
becomes larger
Two Problems
1. As p moves away from 0.5, the binomial distribution is
less symmetric, less normal-looking
Solution: The normal distribution provides a reasonable
approximation to a binomial probability distribution
whenever the values of np and n(1 - p) both equal or
exceed 5
2. The binomial distribution is discrete, and the normal
distribution is continuous
Solution: Use the continuity correction factor. Add or
subtract 0.5 to account for the width of each rectangle.
Example
• Research indicates 40% of all students entering a certain
university withdraw(撤回) from a course during their first
year. What is the probability that fewer than 650 of this
year’s entering class of 1800 will withdraw from a class?
ANS: Let x be the number of students that withdraw from a
course during their first year
x has a binomial distribution: n = 1800, p = 0.4
The probability function is given by:
 1800
x
1800 x
P( x ) = 
(
0
.
4
)
(
0
.
6
)
for x = 0, 1, 2, ... ,1800

 x 
Solution
 Use the normal approximation method:
 = np = (1800)(0.4) = 720
 = npq = (1800)(0.4)(0.6) = 432  20.78
P( x is fewer than 650) = P( x  650)
= P( x  649.5)
(for discrete variable x )
(for a continuous variable x )
x  720 649.5  720

= P


 20.78
20.78 
= P( z  3.39)
= 0.5000  0.4997 = 0.0003