Chapter 6 ~ Normal Probability Distributions
[Figure: a normal curve with the area P(a ≤ x ≤ b) between x = a and x = b shaded]

Chapter Goals
• Learn about the normal (bell-shaped, or Gaussian) distribution
• Learn how probabilities are found
• Learn how probabilities are represented
• Learn how normal distributions are used in the real world

6.1 ~ Normal Probability Distributions
• The normal probability distribution is the most important distribution in all of statistics
• Many continuous random variables have normal or approximately normal distributions
• We need to learn how to describe a normal probability distribution

Normal Probability Distribution
1. It is the distribution of a continuous random variable
2. Its description involves two functions:
a. A function to determine the ordinates (heights) of the graph picturing the distribution
b. A function to determine probabilities
3. Normal probability distribution function:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
This is the function for the normal (bell-shaped) curve, where μ is the mean and σ is the standard deviation
4. The probability that x lies in some interval is the area under the curve over that interval

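As a cross-check, the density in item 3 can be coded directly. This is a minimal sketch in Python (our choice; the slides use no code): `normal_pdf` is a hypothetical helper name, and the comparison uses `statistics.NormalDist` from the Python 3.8+ standard library.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Ordinate of the normal curve: e^(-(1/2)((x - mu)/sigma)^2) / (sigma * sqrt(2*pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Peak height of the standard normal curve, 1/sqrt(2*pi).
print(round(normal_pdf(0.0), 4))  # 0.3989

# Agrees with the standard library, here for x ~ N(35, 6) at x = 41 (arbitrary values).
print(math.isclose(normal_pdf(41, 35, 6), NormalDist(35, 6).pdf(41)))  # True
```

The value returned is a height on the graph, not a probability; as item 4 says, probabilities are areas.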
The Normal Probability Distribution
[Figure: a normal curve with standard deviation σ; the horizontal axis is marked at μ - 3σ, μ - 2σ, μ - σ, μ, μ + σ, μ + 2σ, μ + 3σ]

Probabilities for a Normal Distribution
• Illustration: the probability that x lies between a and b is the shaded area under the curve from a to b:
$$P(a \le x \le b) = \int_a^b f(x)\,dx$$

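The integral can be approximated numerically without a calculus course. The sketch below uses composite Simpson's rule, a standard numerical technique that the slides do not mention, and compares it with the difference of two cumulative areas from Python's `statistics.NormalDist().cdf`; the interval a = -1, b = 2 is an arbitrary choice for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float) -> float:
    """Standard normal density f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of f from a to b (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)  # weights 4, 2, 4, 2, ...
    return total * h / 3

a, b = -1.0, 2.0
area = simpson(normal_pdf, a, b)
exact = NormalDist().cdf(b) - NormalDist().cdf(a)  # area left of b minus area left of a
print(round(area, 4), round(exact, 4))  # 0.8186 0.8186
```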
Notes
• The definite integral is a calculus topic
• We will use the TI-83/84 to find probabilities for normal distributions
• We will learn how to compute probabilities for one special normal distribution: the standard normal distribution
• We will learn to transform all other normal probability questions into questions about this special distribution
• Recall the empirical rule: the percentages that lie within certain intervals about the mean (about 68%, 95%, and 99.7% within 1, 2, and 3 standard deviations) come from the normal probability distribution
• We need to refine the empirical rule to be able to find the percentage that lies between any two numbers

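The empirical-rule percentages can be recovered from the standard normal curve. A minimal Python sketch using `statistics.NormalDist` (standard library, Python 3.8+): the area within k standard deviations of the mean is the area to the left of k minus the area to the left of -k.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

for k in (1, 2, 3):
    within = z.cdf(k) - z.cdf(-k)  # area between -k and +k
    print(k, round(within, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The same subtraction works for any two numbers, not only whole multiples of the standard deviation, which is the refinement the last note calls for.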
Percentage, Proportion & Probability
• These are basically the same concept
• Percentage (30%) is usually used when talking about a proportion (3/10) of a population
• Probability is usually used when talking about the chance that the next individual item will possess a certain property
• Area is the graphic representation of all three when we draw a picture to illustrate the situation

6.2 ~ The Standard Normal Distribution
• There are infinitely many normal probability distributions, one for each combination of mean and standard deviation
• They are all related to the standard normal distribution
• The standard normal distribution is the normal distribution of the standard variable z (the z-score)

Standard Normal Distribution
Properties:
• The total area under the normal curve is equal to 1
• The distribution is mounded and symmetric; it extends indefinitely in both directions, approaching but never touching the horizontal axis
• The distribution has a mean of 0 and a standard deviation of 1
• The mean divides the area in half, 0.50 on each side
• Nearly all the area (about 99.7%) is between z = -3.00 and z = 3.00
Notes:
• Table 3, Appendix B lists the probabilities associated with the intervals from the mean (0) to a specific value of z
• Probabilities of other intervals are found using the table entries, addition, subtraction, and the properties above

Table 3, Appendix B Entries
[Figure: the area under the standard normal curve between 0 and z, shaded]
• The table contains the area under the standard normal curve between 0 and a specific value of z

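Each Table 3 entry is the area to the left of z minus the 0.5000 to the left of the mean. A short Python sketch, assuming the usual two-decimal layout (row = z to one decimal place, column = the second decimal place); `table3_area` is a hypothetical helper name built on `statistics.NormalDist`.

```python
from statistics import NormalDist

def table3_area(z: float) -> float:
    """Area under the standard normal curve between 0 and z, for z >= 0."""
    return NormalDist().cdf(z) - 0.5

# One row of the table: z = 1.4 plus the column headings 0.00 through 0.09.
row_14 = [round(table3_area(1.4 + col / 100), 4) for col in range(10)]
print(row_14)
```

For a negative z, symmetry gives the same area as for -z, which is why only nonnegative z values need to be tabulated.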
Example
Example: Find the area under the standard normal curve between z = 0 and z = 1.45
• A portion of Table 3: in the row for z = 1.4 and the column headed 0.05 (since 1.4 + 0.05 = 1.45), the entry is 0.4265
$$P(0 \le z \le 1.45) = 0.4265$$

Using the TI 83/84
• To find the area between 0 and 1.45, do the following:
– Press 2nd DISTR 2, which selects normalcdf(
– Enter the lower bound, 0
– Enter a comma
– Then enter the upper bound, 1.45
– Close the parentheses if you like, then press ENTER
• The value 0.426 is shown as the answer
• Interpretation of the result: the probability that z lies between 0 and 1.45 is 0.426

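For checking answers on a computer, here is a hypothetical Python stand-in named `normalcdf` after the TI command. It is not a TI API, just a sketch built on `statistics.NormalDist`, with the mean and standard deviation defaulting to 0 and 1 as they do on the calculator.

```python
from statistics import NormalDist

def normalcdf(lower: float, upper: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Area under the N(mean, sd) curve between lower and upper (mimics the TI normalcdf( command)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

print(round(normalcdf(0, 1.45), 3))  # 0.426
print(round(normalcdf(0, 1.45), 4))  # 0.4265, matching Table 3
```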
Example
Example: Find the area under the normal curve to the right of z = 1.45; P(z > 1.45)
[Figure: the area to the right of z = 1.45 (the area asked for) shaded; the area 0.4265 between 0 and 1.45 marked]
$$P(z > 1.45) = 0.5000 - 0.4265 = 0.0735$$

Using the TI 83/84
• To find the area between 1.45 and ∞, do the following:
– Press 2nd DISTR 2, which selects normalcdf(
– Enter the lower bound, 1.45
– Enter a comma
– Then enter 1 2nd EE 99 (this displays as 1E99, a very large number that stands in for ∞)
– Close the parentheses if you like, then press ENTER
• The value 0.074 is shown as the answer
• Interpretation of the result: the probability that z is greater than 1.45 is 0.074

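Three routes to the same right-tail area, sketched in Python with `statistics.NormalDist`: the table arithmetic from the previous example, the complement rule (total area 1 minus the area to the left), and the calculator's trick of using 1E99 as a stand-in for ∞.

```python
from statistics import NormalDist

z = NormalDist()

table_way = 0.5000 - 0.4265         # right half of the curve minus the Table 3 entry for 1.45
complement_way = 1 - z.cdf(1.45)    # total area 1 minus the area to the left of 1.45
ti_way = z.cdf(1e99) - z.cdf(1.45)  # like normalcdf(1.45, 1E99)

print(round(table_way, 4), round(complement_way, 4), round(ti_way, 4))  # 0.0735 0.0735 0.0735
```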
Example
Example: Find the area to the left of z = 1.45; P(z < 1.45)
[Figure: the area to the left of z = 1.45 shaded, made up of 0.5000 to the left of 0 and 0.4265 between 0 and 1.45]
$$P(z < 1.45) = 0.5000 + 0.4265 = 0.9265$$

Using the TI 83/84
• To find the area between -∞ and 1.45, do the following:
– Press 2nd DISTR 2, which selects normalcdf(
– Enter the lower bound, -1 2nd EE 99 (that is, -1E99, standing in for -∞)
– Enter a comma
– Then enter 1.45
– Close the parentheses if you like, then press ENTER
• The value 0.926 is shown as the answer
• Interpretation of the result: the probability that z is less than 1.45 is 0.926

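Because `statistics.NormalDist.cdf` already returns the area to the left, the left-tail question needs no arithmetic in this Python sketch; the last line checks that the pieces to the left and right of z = 1.45 add to the total area of 1.

```python
from statistics import NormalDist

z = NormalDist()

left = z.cdf(1.45)            # same as normalcdf(-1E99, 1.45)
table_way = 0.5000 + 0.4265   # left half plus the Table 3 entry
right = 1 - left              # area to the right of 1.45

print(round(left, 4), round(table_way, 4))  # 0.9265 0.9265
print(left + right)                         # 1.0
```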
Notes
• The addition and subtraction used in the previous examples are valid because the “areas” represent mutually exclusive events
• The symmetry of the normal distribution is a key factor in determining probabilities associated with values below (to the left of) the mean. For example, the area between the mean and z = -1.37 is exactly the same as the area between the mean and z = +1.37.
• When finding normal distribution probabilities, a sketch is always helpful

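Example
Example: Find the area between the mean (z = 0) and z = -1.26
[Figure: the area between z = -1.26 and 0 (the area asked for) shaded, with z = 1.26 marked as its mirror image]
• By symmetry, this area equals the Table 3 entry for z = 1.26:
$$P(-1.26 \le z \le 0) = 0.3962$$
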
Using the TI 83/84
• Find the area to the left of z = -0.98
• Use -1E99 for -∞: press 2nd DISTR and enter normalcdf(-1E99, -0.98), which gives 0.164
[Figure: the area to the left of z = -0.98 (the area asked for) shaded]

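A Python sketch of the two negative-z calculations above, using `statistics.NormalDist`: the area from -1.26 to the mean, checked against its mirror image from the mean to +1.26, and the left-tail area for z = -0.98.

```python
from statistics import NormalDist

z = NormalDist()

neg_piece = z.cdf(0) - z.cdf(-1.26)  # P(-1.26 <= z <= 0)
pos_piece = z.cdf(1.26) - z.cdf(0)   # P(0 <= z <= 1.26), its mirror image
print(round(neg_piece, 4), round(pos_piece, 4))  # 0.3962 0.3962

left_tail = z.cdf(-0.98)             # like normalcdf(-1E99, -0.98)
print(round(left_tail, 3))           # 0.164
```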
Example
Example: Find the area between z = -2.30 and z = 1.80
[Figure: the area between -2.30 and 1.80 shaded, split at 0 into 0.4893 (from -2.30 to 0) and 0.4641 (from 0 to 1.80)]
$$P(-2.30 \le z \le 1.80) = P(-2.30 \le z \le 0) + P(0 \le z \le 1.80) = 0.4893 + 0.4641 = 0.9534$$

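The split at the mean can be checked in Python with `statistics.NormalDist`. The unrounded total comes out to 0.9533 at four places, while the table method gives 0.9534; the small difference comes from rounding each table entry to four places before adding. Both agree with the calculator's 0.953 to three places.

```python
from statistics import NormalDist

z = NormalDist()

below_mean = z.cdf(0) - z.cdf(-2.30)  # P(-2.30 <= z <= 0); the table gives 0.4893
above_mean = z.cdf(1.80) - z.cdf(0)   # P(0 <= z <= 1.80); the table gives 0.4641
total = below_mean + above_mean

print(round(total, 4))                       # 0.9533
print(round(z.cdf(1.80) - z.cdf(-2.30), 3))  # 0.953, in one step like normalcdf(-2.3, 1.80)
```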
Using the TI 83/84
Find the area between z = -2.30 and z = 1.80
• Press 2nd DISTR, enter normalcdf(-2.3, 1.80), and press ENTER
• 0.953 is given as the answer
• Remember, the function normalcdf is of the form normalcdf(lower limit, upper limit, mean, standard deviation). If you're working with a distribution other than the standard normal (recall mean = 0, standard deviation = 1), you must enter the values for the mean and standard deviation; if you leave them out, the calculator uses 0 and 1

Normal Distribution Note
• The normal distribution table may also be used to determine a z-score if we are given the area (working backwards)
• Example: What is the z-score associated with the 85th percentile?

Using the TI 83/84
• There is another function in the DISTR list that is used to find the value of z (or x) when the probability is given. For the previous problem, we are actually asking for the value of z such that 85% of the distribution lies below it.

Using the TI 83/84
• Use 2nd DISTR invNorm( to calculate this value
• Entering 2nd DISTR invNorm(.85) and pressing ENTER gives a value of 1.036
• So the z-score associated with the 85th percentile is about 1.036

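Python's `statistics.NormalDist.inv_cdf` plays the role of invNorm(: both take an area to the left and return the value that has that area below it. A minimal sketch, with a forward check.

```python
from statistics import NormalDist

z85 = NormalDist().inv_cdf(0.85)        # like invNorm(.85)
print(round(z85, 3))                    # 1.036
print(round(NormalDist().cdf(z85), 6))  # 0.85, going forwards recovers the area
```

For a normal distribution other than the standard one, `NormalDist(mean, sd).inv_cdf(p)` returns the x-value directly.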
Example
Example: What z-scores bound the middle 90% of a standard normal distribution?

Using the TI 83/84
• The invNorm( function on the TI-83/84 works with the area from -∞ up to the value of z we are interested in (the area to the left). Therefore, we must get a little creative to solve some problems.
• The fact that the total area equals one comes in very handy here!
• In this example, we are interested in the values of z that bound the middle 90%, so the two tails together hold the remaining 10%. Because the distribution is symmetric, we divide this in two, which gives 5% in each tail.

Using the TI 83/84
• Now use 2nd DISTR invNorm with .05 as the argument: invNorm(.05)
• This gives an answer of -1.645
– Since the distribution is symmetric, the upper limit is 1.645, so 90% of the distribution lies between -1.645 and 1.645

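The same tail-splitting reasoning, as a Python sketch with `statistics.NormalDist.inv_cdf` in place of invNorm(: put (1 - 0.90)/2 in each tail, find the lower bound, and use symmetry for the upper one.

```python
from statistics import NormalDist

middle = 0.90
tail = (1 - middle) / 2             # 0.05 in each tail
lower = NormalDist().inv_cdf(tail)  # like invNorm(.05)
upper = -lower                      # symmetry about the mean

print(round(lower, 3), round(upper, 3))  # -1.645 1.645
```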
Using the TI 83/84
Now let’s work the problems on page 279
6.3 ~ Applications of Normal Distributions
• Apply the techniques learned for the z distribution to all normal distributions
• Start with a probability question in terms of x-values
• Convert, or transform, the question into an equivalent probability statement involving z-values

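Standardization
• Suppose x is a normal random variable with mean μ and standard deviation σ
• The random variable
$$z = \frac{x - \mu}{\sigma}$$
has a standard normal distribution
[Figure: a normal curve with two aligned scales; the x-values μ and c correspond to the z-values 0 and (c - μ)/σ]
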
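Standardization is a one-line formula. The Python sketch below applies it with illustrative values μ = 50, σ = 8, c = 62 (not from the text) and checks, using `statistics.NormalDist`, that the area to the left of c under N(μ, σ) equals the area to the left of z under the standard normal curve; `standardize` is a hypothetical helper name.

```python
from statistics import NormalDist

def standardize(x: float, mu: float, sigma: float) -> float:
    """z = (x - mu) / sigma: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

mu, sigma, c = 50.0, 8.0, 62.0  # illustrative values only
z = standardize(c, mu, sigma)
print(z)  # 1.5

# Area left of c under N(mu, sigma) equals area left of z under N(0, 1).
print(abs(NormalDist(mu, sigma).cdf(c) - NormalDist().cdf(z)) < 1e-12)  # True
```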
Example
Example: A bottling machine is adjusted to fill bottles with a mean of 32.0 oz of soda and a standard deviation of 0.02 oz. Assume the amount of fill is normally distributed and a bottle is selected at random:
1) Find the probability the bottle contains between 32.00 oz and 32.025 oz
2) Find the probability the bottle contains more than 31.97 oz
Solutions:
1) When x = 32.00:
$$z = \frac{x - \mu}{\sigma} = \frac{32.00 - 32.0}{0.02} = 0.00$$
When x = 32.025:
$$z = \frac{x - \mu}{\sigma} = \frac{32.025 - 32.0}{0.02} = 1.25$$

Solution Continued
[Figure: the area between x = 32.0 (z = 0) and x = 32.025 (z = 1.25) shaded as the area asked for]
$$P(32.0 \le x \le 32.025) = P\left(\frac{32.0 - 32.0}{0.02} \le \frac{x - 32.0}{0.02} \le \frac{32.025 - 32.0}{0.02}\right) = P(0 \le z \le 1.25) = 0.3944$$

Example, Part 2
2)
[Figure: the area to the right of x = 31.97 (z = -1.50) shaded; the mean 32.0 corresponds to z = 0]
$$P(x > 31.97) = P\left(\frac{x - 32.0}{0.02} > \frac{31.97 - 32.0}{0.02}\right) = P(z > -1.50) = 0.5000 + 0.4332 = 0.9332$$

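Both parts of the bottling example can be checked in Python by building the fill distribution directly with `statistics.NormalDist`, so the standardization happens inside `cdf`. This is a sketch for checking answers, not a replacement for the worked steps.

```python
from statistics import NormalDist

fill = NormalDist(mu=32.0, sigma=0.02)  # ounces of soda per bottle

part1 = fill.cdf(32.025) - fill.cdf(32.00)  # P(32.00 <= x <= 32.025)
part2 = 1 - fill.cdf(31.97)                 # P(x > 31.97)

print(round(part1, 4))  # 0.3944
print(round(part2, 4))  # 0.9332
```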
Notes
• The normal table may be used to answer many kinds of questions involving a normal distribution
• Often we need to find a cutoff point: a value of x such that there is a certain probability in a specified interval defined by x
Example: The waiting time x at a certain bank is approximately normally distributed with a mean of 3.7 minutes and a standard deviation of 1.4 minutes. The bank would like to claim that 95% of all customers are waited on by a teller within c minutes. Find the value of c that makes this statement true.

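Solution
[Figure: a normal curve centered at 3.7 (z = 0); the area to the left of c is 0.5000 + 0.4500 = 0.95, the area to the right of c is 0.0500, and c corresponds to z = 1.645]
$$P(x \le c) = 0.95$$
$$P\left(\frac{x - 3.7}{1.4} \le \frac{c - 3.7}{1.4}\right) = 0.95$$
$$P\left(z \le \frac{c - 3.7}{1.4}\right) = 0.95$$
$$\frac{c - 3.7}{1.4} = 1.645$$
$$c = (1.645)(1.4) + 3.7 = 6.003$$
c ≈ 6 minutes
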
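`statistics.NormalDist.inv_cdf` finds the cutoff directly on the x-scale. This Python sketch uses the unrounded z-value (about 1.64485) instead of the table's 1.645, so c comes out as 6.0028 rather than 6.003; both round to about 6 minutes.

```python
from statistics import NormalDist

wait = NormalDist(mu=3.7, sigma=1.4)  # waiting time in minutes

c = wait.inv_cdf(0.95)                # 95% of waits fall below c
print(round(c, 4))                    # 6.0028

# The same answer through the z-score route used on the slide.
z95 = NormalDist().inv_cdf(0.95)
print(round(z95 * 1.4 + 3.7, 4))      # 6.0028
```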
Example
Example: A radar unit is used to measure the speed of automobiles on an expressway during rush-hour traffic. The speeds of individual automobiles are normally distributed with a mean of 62 mph. Find the standard deviation of all speeds if 3% of the automobiles travel faster than 72 mph.
[Figure: a normal curve centered at 62 mph (z = 0); the area 0.0300 to the right of 72 mph (z = 1.88) is shaded, with 0.4700 between 62 and 72]

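Solution
$$P(x > 72) = 0.03; \qquad z = \frac{x - \mu}{\sigma}$$
• The area between 0 and z must be 0.5000 - 0.0300 = 0.4700, and Table 3 gives z = 1.88 for that area:
$$P(z > 1.88) = 0.03$$
$$1.88 = \frac{72 - 62}{\sigma}$$
$$1.88\,\sigma = 10$$
$$\sigma = 10 / 1.88 = 5.32$$
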
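Working backwards for σ takes two steps in this Python sketch: find the z-value with 3% of the area to its right using `statistics.NormalDist.inv_cdf`, then solve z = (72 - 62)/σ for σ. The last line checks that the resulting distribution really puts 3% above 72 mph.

```python
from statistics import NormalDist

mu, cutoff, upper_tail = 62.0, 72.0, 0.03  # mph, mph, area to the right of the cutoff

z = NormalDist().inv_cdf(1 - upper_tail)   # inv_cdf wants the area to the LEFT
sigma = (cutoff - mu) / z

print(round(z, 2), round(sigma, 2))                     # 1.88 5.32
print(round(1 - NormalDist(mu, sigma).cdf(cutoff), 4))  # 0.03
```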
Notation
• If x is a normal random variable with mean μ and standard deviation σ, this is often denoted: x ~ N(μ, σ)
• Example: Suppose x is a normal random variable with μ = 35 and σ = 6. A convenient notation to identify this random variable is: x ~ N(35, 6).

6.4 ~ Notation
• The z-score is used throughout statistics in a variety of ways
• We need a convenient notation to indicate the area under the standard normal distribution
• z(α) is the algebraic name for the z-score (point on the z axis) such that there is α of the area (probability) to the right of z(α)

Illustrations
• z(0.10) represents the value of z such that the area to the right under the standard normal curve is 0.10
[Figure: the area 0.10 to the right of z(0.10), which lies to the right of 0]
• z(0.80) represents the value of z such that the area to the right under the standard normal curve is 0.80
[Figure: the area 0.80 to the right of z(0.80), which lies to the left of 0]

Example
Example: Find the numerical value of z(0.10):
• The notation tells us the area to the right of z(0.10) is 0.10, so the area between 0 and z(0.10) is 0.5000 - 0.1000 = 0.4000
• Look in Table 3 for the entry closest to 0.4000
• z(0.10) = 1.28

Example
Example: Find the numerical value of z(0.80):
• The area to the right of z(0.80) is 0.80, more than half, so z(0.80) must be negative; the area between z(0.80) and 0 is 0.8000 - 0.5000 = 0.3000
• Use Table 3: look for an area as close as possible to 0.3000
• z(0.80) = -0.84

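Because z(α) is defined by the area to its right, while `statistics.NormalDist.inv_cdf` (like the calculator's invNorm() works with the area to the left, a Python helper just asks for 1 - α. `z_alpha` is a hypothetical name for this sketch.

```python
from statistics import NormalDist

def z_alpha(alpha: float) -> float:
    """z(alpha): the z-score with area alpha to its RIGHT under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)  # convert right-tail area to left-tail area

print(round(z_alpha(0.10), 2))  # 1.28
print(round(z_alpha(0.80), 2))  # -0.84
```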
Notes
• The values of z that will be used regularly come from one of the following situations:
1. The z-score such that there is a specified area in one tail of the normal distribution
2. The z-scores that bound a specified middle proportion of the normal distribution

Example
Example: Find the numerical value of z(0.99):
[Figure: z(0.99) lies to the left of 0, with area 0.01 in the left tail]
• Because of the symmetrical nature of the normal distribution, z(0.99) = -z(0.01) = -2.33

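Example
Example: Find the z-scores that bound the middle 0.99 of the normal distribution:
[Figure: the middle area 0.99, split into 0.495 on each side of 0, with 0.005 in each tail; the left bound is z(0.995), or -z(0.005), and the right bound is z(0.005)]
z(0.005) = 2.575 and z(0.995) = -z(0.005) = -2.575
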
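The symmetry facts on the last two slides can be checked with the same right-tail convention. In this Python sketch (`z_alpha` is again a hypothetical helper built on `statistics.NormalDist`), the unrounded bound for the middle 0.99 is about 2.5758, which the slide's table-based value 2.575 approximates.

```python
from statistics import NormalDist

def z_alpha(alpha: float) -> float:
    """z(alpha): the z-score with area alpha to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)

# z(0.99) = -z(0.01) by symmetry.
print(round(z_alpha(0.99), 2), round(-z_alpha(0.01), 2))  # -2.33 -2.33

# Middle 0.99: 0.005 in each tail.
upper = z_alpha(0.005)
lower = z_alpha(0.995)
print(round(lower, 4), round(upper, 4))  # -2.5758 2.5758
```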
6.5 ~ Normal Approximation of the Binomial
• Recall: the binomial distribution is a probability distribution of the discrete random variable x, the number of successes observed in n repeated independent trials
• Binomial probabilities can be reasonably estimated by using the normal probability distribution

Background & Histogram
• Background: Consider the distribution of the binomial variable x when n = 20 and p = 0.5
• Histogram:
[Figure: histogram of P(x) for x = 0, 1, ..., 20; vertical axis P(x) from 0.00 to 0.18]
• The histogram may be approximated by a normal curve

Notes
• The normal curve takes its mean and standard deviation from the binomial distribution (with q = 1 - p):
$$\mu = np = (20)(0.5) = 10$$
$$\sigma = \sqrt{npq} = \sqrt{(20)(0.5)(0.5)} = \sqrt{5} \approx 2.236$$
• We can approximate the area of the rectangles with the area under the normal curve
• The approximation becomes more accurate as n becomes larger

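A Python sketch comparing the binomial histogram bars for n = 20, p = 0.5 with the corresponding areas under the normal curve with μ = 10 and σ = √5. Each bar at x is one unit wide, so its normal counterpart is the area from x - 0.5 to x + 0.5. Uses `math.comb` (Python 3.8+) and `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

n, p = 20, 0.5
q = 1 - p
mu = n * p                    # 10.0
sigma = math.sqrt(n * p * q)  # sqrt(5), about 2.236
curve = NormalDist(mu, sigma)

for x in (8, 10, 12):
    bar = math.comb(n, x) * p**x * q**(n - x)       # binomial P(x): height (and area) of the bar
    area = curve.cdf(x + 0.5) - curve.cdf(x - 0.5)  # normal area over the same unit-wide bar
    print(x, round(bar, 4), round(area, 4))
```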
Two Problems
1. As p moves away from 0.5, the binomial distribution is less symmetric and less normal-looking
Solution: The normal distribution provides a reasonable approximation to a binomial probability distribution whenever the values of np and n(1 - p) both equal or exceed 5
2. The binomial distribution is discrete, and the normal distribution is continuous
Solution: Use the continuity correction factor. Add or subtract 0.5 to account for the width of each rectangle (the rectangle for the value x extends from x - 0.5 to x + 0.5).

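Both fixes are mechanical. This Python sketch encodes the rule of thumb from problem 1 and the ±0.5 widening from problem 2; `normal_approx_ok` and `continuity_bounds` are hypothetical helper names, and the second helper covers only the inclusive case P(low ≤ x ≤ high).

```python
def normal_approx_ok(n: int, p: float) -> bool:
    """Rule of thumb: both np and n(1 - p) must be at least 5."""
    return n * p >= 5 and n * (1 - p) >= 5

def continuity_bounds(low: int, high: int) -> tuple:
    """Binomial P(low <= x <= high) becomes normal P(low - 0.5 < x < high + 0.5)."""
    return low - 0.5, high + 0.5

print(normal_approx_ok(20, 0.5))  # True  (np = 10, n(1 - p) = 10)
print(normal_approx_ok(20, 0.1))  # False (np = 2)
print(continuity_bounds(8, 12))   # (7.5, 12.5)
```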
Example
Example: Research indicates 40% of all students entering a certain university withdraw from a course during their first year. What is the probability that fewer than 650 of this year's entering class of 1800 will withdraw from a course?
• Let x be the number of students that withdraw from a course during their first year
• x has a binomial distribution: n = 1800, p = 0.4
• The probability function is given by:
$$P(x) = \binom{1800}{x} (0.4)^{x} (0.6)^{1800 - x}, \quad \text{for } x = 0, 1, 2, \ldots, 1800$$

Solution
• Use the normal approximation method:
$$\mu = np = (1800)(0.4) = 720$$
$$\sigma = \sqrt{npq} = \sqrt{(1800)(0.4)(0.6)} = \sqrt{432} \approx 20.78$$
$$P(x \text{ is fewer than } 650) = P(x \le 649) \quad \text{(for the discrete variable } x\text{)}$$
$$= P(x < 649.5) \quad \text{(for a continuous variable } x\text{)}$$
$$= P\left(\frac{x - 720}{20.78} < \frac{649.5 - 720}{20.78}\right) = P(z < -3.39) = 0.5000 - 0.4997 = 0.0003$$

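A Python sketch of the same calculation with `statistics.NormalDist`, plus an exact binomial sum for comparison. The exact sum is computed through logarithms (`math.lgamma`) because terms like 0.4**649 * 0.6**1151 are far too small for ordinary floating-point numbers; this log-space detail is our implementation choice, not part of the slide.

```python
import math
from statistics import NormalDist

n, p = 1800, 0.4
mu = n * p                          # np = 720
sigma = math.sqrt(n * p * (1 - p))  # sqrt(432), about 20.78

# Fewer than 650 means x <= 649; the continuity correction extends that bar to 649.5.
approx = NormalDist(mu, sigma).cdf(649.5)

def log_pmf(x: int) -> float:
    """Natural log of the binomial probability P(x) for n = 1800, p = 0.4."""
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + x * math.log(p) + (n - x) * math.log(1 - p))

exact = sum(math.exp(log_pmf(x)) for x in range(650))  # P(x <= 649), summed exactly

print(round(approx, 4))  # 0.0003
print(exact)
```

Printing both shows how close the normal approximation comes for a sample this large.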
Random Number Generation
• With each rand execution, the TI-84 Plus generates the same random-number sequence for a given seed value. The TI-84 Plus factory-set seed value for rand is 0.
• To generate a different random-number sequence, store any nonzero seed value to rand. To restore the factory-set seed value, store 0 to rand or reset the defaults (Chapter 18).
• Note: The seed value also affects the randInt(, randNorm(, and randBin( instructions.
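The seed behavior described for the TI-84 Plus is common to pseudo-random generators. As a loose analogy only, here is the same idea with Python's standard `random` module, which uses a different algorithm than the calculator, so the numbers will not match a TI-84 Plus; only the behavior (same seed, same sequence) carries over.

```python
import random

random.seed(0)
first = [random.random() for _ in range(3)]

random.seed(0)      # reseeding with the same value restarts the same sequence
second = [random.random() for _ in range(3)]

random.seed(12345)  # a different seed gives a different sequence
third = [random.random() for _ in range(3)]

print(first == second)  # True
print(first == third)   # False
```

As on the calculator, the seed also governs related functions such as `random.randint` and `random.gauss`, since they draw from the same generator.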