Continuous (Normal) Distributions
Topics Covered
• Discrete probability distributions
– The Uniform Distribution
– The Binomial Distribution
– The Poisson Distribution
• Each is appropriately applied in certain
situations and to particular phenomena
Continuous Random Variables
• A continuous random variable can assume any real
number value within an interval (e.g., rainfall, pH)
• Some random variables that are technically discrete
exhibit such a tremendous range of values that it is
desirable to treat them as if they were continuous
variables, e.g. population
• Continuous random variables are described by
probability density functions
Probability Density Functions
• Probability density functions are defined using
the same rules required of probability mass
functions, with some additional requirements:
• The function must have a non-negative value
throughout the interval a to b, i.e.
f(x) >= 0 for a <= x <= b
• The area under the curve defined by f(x), within
the interval a to b, must equal 1
[Figure: curve f(x) plotted against x over the interval from a to b; the area under the curve equals 1]
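The two requirements above can be checked numerically. The following is a minimal sketch, not part of the original slides: the helper name `is_valid_density` and the two example functions on [0, 1] are hypothetical, and the check only samples the function at slice midpoints, so it is an approximation rather than a proof.

```python
def is_valid_density(f, a, b, n=100_000, tol=1e-6):
    """Approximate check that f(x) >= 0 on [a, b] and that the area under f is 1."""
    width = (b - a) / n
    area = 0.0
    for i in range(n):
        x = a + (i + 0.5) * width      # midpoint of the i-th slice
        fx = f(x)
        if fx < 0:                     # requirement 1: non-negative everywhere sampled
            return False
        area += fx * width             # midpoint-rule estimate of the area
    return abs(area - 1.0) < tol       # requirement 2: total area equals 1


# Hypothetical candidate functions on [0, 1], chosen only for illustration
print(is_valid_density(lambda x: 2 * x, 0, 1))   # area under 2x on [0, 1] is 1
print(is_valid_density(lambda x: 3 * x, 0, 1))   # area under 3x on [0, 1] is 1.5
```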
The Normal Distribution
• The most common probability distribution is the
normal distribution
• The normal distribution is a continuous distribution
that is symmetric and bell-shaped
Source: http://mathworld.wolfram.com/NormalDistribution.html
The Normal Distribution
• Many naturally occurring variables are approximately
normally distributed (e.g. heights, weights, annual
temperature variations, test scores, IQ scores, etc.)
• A normal distribution can also be produced by
tracking the errors made in repeated
measurements of the same thing; Carl Friedrich
Gauss was a 19th century mathematician and
astronomer who found that the distribution of the
repeated errors of determining the position of the
same star formed a normal (or Gaussian) distribution
The Normal Distribution
• The probability density function of the normal
distribution:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-0.5\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
• You can see how the value of the distribution at x
is a function of the mean (μ) and standard deviation (σ)
Source: http://en.wikipedia.org/wiki/Normal_distribution
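To make the dependence on the mean and standard deviation concrete, here is a small sketch that evaluates the density directly from the formula and compares it with Python's standard-library `statistics.NormalDist`. The function name `normal_pdf` and the parameter values (mean 55, standard deviation 20) are illustrative choices, not taken from the figure on this slide.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Illustrative parameters (an assumption for this sketch)
mu, sigma = 55.0, 20.0
for x in (35.0, 55.0, 75.0):
    # The hand-written formula and the library value should agree
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

The density is highest at the mean and takes the same value at points equally far above and below it, which is the symmetry the slides describe.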
The Poisson Distribution & The Normal Distribution
[Figure: the probability density function of a normal distribution approximating
the probability mass function of a binomial distribution]
Source: http://en.wikipedia.org/wiki/Normal_Distribution
The Normal Distribution
• As with all frequency distributions, the area
under the curve between any two x values
corresponds to the probability of obtaining an
x value in that range
• The total area under the curve is equal to one
Source: http://en.wikipedia.org/wiki/Normal_Distribution
The Normal Distribution
• The way we use the normal distribution is to
find the probability of a continuous random
variable falling within a range of values ([c, d])
[Figure: normal curve f(x) plotted against x; the probability of the variable falling between c and d is the area under the curve between c and d]
• The areas under normal curves are given in
tables such as Table A.2 in Appendix A of the
textbook (Rogerson)
Normal Tables
• Variables with normal distributions may have an
infinite number of possible means and standard
deviations
Source: http://en.wikipedia.org/wiki/Normal_distribution
• Normal tables are standardized (a standard
normal distribution with a mean of zero and a
standard deviation of one)
• Before using a normal table, we must transform
our data to a standardized normal distribution
Standardization of Normal Distributions
• The standardization is achieved by converting
the data into z-scores:
$$z_i = \frac{x_i - \bar{x}}{s}$$
• The z-score is the means used to transform our
normal distribution into a standard normal distribution
(μ = 0 and σ = 1)
Standardization
Month   T (°F)   Z-score
J       39.53    -1.56
F       46.36    -1.03
M       46.42    -1.02
A       60.32     0.05
M       66.34     0.51
J       75.49     1.22
J       75.39     1.21
A       77.29     1.36
S       68.64     0.69
O       57.57    -0.16
N       54.88    -0.37
D       48.2     -0.89
Original data: mean = 59.70, standard deviation = 12.97
Z-scores: mean = 0, standard deviation = 1
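The standardization of the monthly temperatures can be reproduced with a few lines of Python. This is a sketch rather than the slides' own procedure; it assumes the sample standard deviation (n - 1 denominator, `statistics.stdev`), which is the choice that matches the reported value of 12.97.

```python
from statistics import mean, stdev

# Monthly temperatures (°F) from the standardization table, January to December
temps = [39.53, 46.36, 46.42, 60.32, 66.34, 75.49,
         75.39, 77.29, 68.64, 57.57, 54.88, 48.2]

x_bar = mean(temps)
s = stdev(temps)                               # sample standard deviation (n - 1)
z_scores = [(x - x_bar) / s for x in temps]    # z = (x - mean) / standard deviation

print(round(x_bar, 2), round(s, 2))
print([round(z, 2) for z in z_scores])
# After standardization the mean is (numerically) 0 and the standard deviation is 1
print(round(mean(z_scores), 10), round(stdev(z_scores), 10))
```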
Standardization of Normal Distributions
• Example I – A data set with μ = 55 and σ = 20:
$$z = \frac{x - \mu}{\sigma} = \frac{x - 55}{20}$$
• If one of our data values x = 90 then:
$$z = \frac{x - \mu}{\sigma} = \frac{90 - 55}{20} = \frac{35}{20} = 1.75$$
• Using z-scores in conjunction with standard normal
tables (like Table A.2 on page 214 of Rogerson) we can
look up areas under the curve associated with intervals,
and thus probabilities (P(Z > 1.75) or P(Z <= 1.75))
Look Up Standard Normal Tables
• Using our example z-score of 1.75, we find the position of
1.75 in the table and use the value found there
Look Up Standard Normal Tables
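[Figure: standard normal curve f(x) centered at μ = 0, with the tail above z = +1.75 marked as .0401]
P(Z > 1.75) = 0.0401
P(Z <= 1.75) = 0.9599
[Figure: normal curve f(x) centered at μ = 55, with the tail above x = 90 marked as .0401]
P(X > 90) = 0.0401
P(X <= 90) = 0.9599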
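The table lookup for Example I can be cross-checked in software. The sketch below uses Python's standard-library `statistics.NormalDist` in place of the printed table (a substitution made here, not something the slides use); the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 55.0, 20.0      # Example I parameters
x = 90.0

z = (x - mu) / sigma                          # standardize: z = (x - mu) / sigma
upper_tail = 1.0 - NormalDist().cdf(z)        # P(Z > z), the value a tail-area table lists
print(z, round(upper_tail, 4), round(1.0 - upper_tail, 4))   # 1.75 0.0401 0.9599

# The same probability computed on the original scale, without standardizing first
print(round(1.0 - NormalDist(mu, sigma).cdf(x), 4))          # 0.0401
```

Both routes give the same tail area, which is why a single standardized table is enough for every normal distribution.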
Standardization of Normal Distributions
• Example II – If we have a data set with μ = 55 and σ =
20, we calculate z-scores using:
$$z = \frac{x - \mu}{\sigma} = \frac{x - 55}{20}$$
• If one of our data values x = 20 then:
$$z = \frac{x - \mu}{\sigma} = \frac{20 - 55}{20} = \frac{-35}{20} = -1.75$$
• Using z-scores in conjunction with standard normal
tables (like Table A.2 on page 214 of Rogerson) we can
look up areas under the curve associated with intervals,
and thus probabilities (P(X>20) or P(X<=20))
[Figure: standard normal curve f(x) centered at μ = 0, with z = -1.75 and z = +1.75 marked]
Look Up Standard Normal Tables
• Using our example z-score of -1.75, we find the position of
1.75 in the table and use the value found there; because the
normal distribution is symmetric, the table does not need to
repeat positive and negative values
Look Up Standard Normal Tables
[Figure: standard normal curve f(x) centered at μ = 0; the tail below -1.75 and the tail above +1.75 each contain 4.01% (.0401) of the area]
Look Up Standard Normal Tables
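[Figure: standard normal curve f(x) centered at μ = 0, with the tail below z = -1.75 marked as 4.01% (.0401)]
P(Z <= -1.75) = 0.0401
P(Z > -1.75) = 0.9599
[Figure: normal curve f(x) centered at μ = 55, with the tail below x = 20 marked as 4.01% (.0401)]
P(X <= 20) = 0.0401
P(X > 20) = 0.9599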
Finding the P(x) for Various Intervals
1. P(Z ≥ a) = (table value)
• The table gives the value of P(x) in the
tail above a
2. P(Z ≤ a) = [1 – (table value)]
• Total area under the curve = 1, and
we subtract the area of the tail
3. P(0 ≤ Z ≤ a) = [0.5 – (table value)]
• Total area under the curve = 1, thus
the area above the mean (z = 0) is equal to 0.5,
and we subtract the area of the tail
Finding the P(x) for Various Intervals
4. P(Z ≤ a) = (table value), for negative a
• The table gives the value of P(x) in the
tail below a (look up |a|), equivalent to P(Z ≥ a)
when a is positive
5. P(Z ≥ a) = [1 – (table value)], for negative a
• This is equivalent to P(Z ≤ a) when
a is positive
6. P(a ≤ Z ≤ 0) = [0.5 – (table value)], for negative a
• This is equivalent to P(0 ≤ Z ≤ a)
when a is positive
Finding the P(x) for Various Intervals
7. P(a ≤ Z ≤ b) if a < 0 and b > 0
= (0.5 – P(Z<a)) + (0.5 – P(Z>b))
= 1 – P(Z<a) – P(Z>b)
or
= [0.5 – (table value for a)] +
[0.5 – (table value for b)]
= [1 – {(table value for a) +
(table value for b)}]
• With this set of building blocks, you should be able to
calculate the probability for any interval using a standard
normal table
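The building blocks can be combined into one small routine. The following sketch is an illustration, not the textbook's procedure: `table_value` is a hypothetical stand-in for a lookup in a tail-area table (it returns the upper-tail area for |a|, computed with `statistics.NormalDist`), and `prob_between` uses only those table values, as the rules above do.

```python
from statistics import NormalDist


def table_value(a):
    """Upper-tail area P(Z >= |a|), as a tail-area normal table would list it."""
    return 1.0 - NormalDist().cdf(abs(a))


def prob_between(a, b):
    """P(a <= Z <= b) for a <= b, built only from table values."""
    if a >= 0:                       # both bounds at or above the mean
        return table_value(a) - table_value(b)
    if b <= 0:                       # both bounds at or below the mean
        return table_value(b) - table_value(a)
    # a < 0 < b: subtract both tails from the total area of 1 (rule 7)
    return 1.0 - (table_value(a) + table_value(b))


print(round(prob_between(0.9, 1.1), 4))     # both bounds positive
print(round(prob_between(-0.5, -0.3), 4))   # both bounds negative
print(round(prob_between(-0.5, 1.1), 4))    # bounds on either side of the mean
```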
Finding the P(x) – Example
• Suppose we are in charge of buying stock for a
shoe store. We will assume that shoe sizes are
normally distributed within a gender.
Let’s say the mean of women’s shoe sizes is 20
cm and the standard deviation is 5 cm
• If our store sells 300 pairs of a popular style each
week, we can make some projections about how
many pairs we need to stock of a given size,
assuming shoes fit feet that are within +/- 0.5 cm of
the length of the shoe (because of course we must
use intervals here!)
Finding the P(x) – Example
• 1. How many pairs should we stock of the 25
cm size?
• We first need to convert this to a range
according to the assumption specified above,
since we can only evaluate P(x) over an
interval, so P(24.5 cm ≤ x ≤ 25.5 cm) is what we
need to find
• Now we need to convert the bounds of our
interval into z-scores:
$$Z_{lower} = \frac{x - \mu}{\sigma} = \frac{24.5 - 20}{5} = \frac{4.5}{5} = 0.9$$
$$Z_{upper} = \frac{x - \mu}{\sigma} = \frac{25.5 - 20}{5} = \frac{5.5}{5} = 1.1$$
Finding the P(x) – Example
• 1. How many pairs should we stock of the 25
cm size?
• We now have our interval expressed in terms of
z-scores as P(0.9 ≤ Z ≤ 1.1) for use in a standard
normal distribution, which we can evaluate using
our standard normal tables
[Figure: standard normal curve with the area between z = 0.9 and z = 1.1 marked]
• We can calculate this area by finding
P(0.9 ≤ Z ≤ 1.1) = P(Z ≥ 0.9) - P(Z ≥ 1.1)
= 0.1841 – 0.1357
= 0.0484
• Now multiply our P(0.9 ≤ Z ≤ 1.1) by total sales per
week to get the number of shoes we should stock in
that size: 300 × 0.0484 = 14.52 ≈ 15
Finding the P(x) – Example
• 2. How many pairs should we stock of the 18
cm size? (assuming shoes fit feet that are +/- 0.5 cm of the length of
the shoe)
• Again, we convert this to a range according to the
assumption specified above, since we can only
evaluate P(x) over an interval, so P(17.5 cm ≤ x ≤
18.5 cm) is what we need to find
• Now we need to convert the bounds of our
interval into z-scores:
$$Z_{lower} = \frac{x - \mu}{\sigma} = \frac{17.5 - 20}{5} = \frac{-2.5}{5} = -0.5$$
$$Z_{upper} = \frac{x - \mu}{\sigma} = \frac{18.5 - 20}{5} = \frac{-1.5}{5} = -0.3$$
Finding the P(x) – Example
• 2. How many pairs should we stock of the 18 cm
size?
• We now have our interval expressed in terms of
z-scores as P(-0.5 ≤ Z ≤ -0.3) for use in a standard
normal distribution, which we can evaluate using our
standard normal tables
[Figure: standard normal curve with the area between z = -0.5 and z = -0.3 marked]
• We can calculate this area by finding
P(-0.5 ≤ Z ≤ -0.3) = P(Z ≤ -0.3) - P(Z ≤ -0.5)
= 0.3821 – 0.3085
= 0.0736
• Now multiply our P(-0.5 ≤ Z ≤ -0.3) by total sales
per week to get the number of shoes we should stock
in that size: 300 × 0.0736 = 22.08 ≈ 22
Finding the P(x) – Example
• 3. How many pairs should we stock in the 18 to
25 cm size range?
• As always, we convert this to a range according
to the assumption specified above, since we can
only evaluate P(x) over an interval, so P(17.5 cm
≤ x ≤ 25.5 cm) is what we need to find
• We have already found the appropriate Z-scores
$$Z_{lower} = \frac{x - \mu}{\sigma} = \frac{17.5 - 20}{5} = \frac{-2.5}{5} = -0.5$$
$$Z_{upper} = \frac{x - \mu}{\sigma} = \frac{25.5 - 20}{5} = \frac{5.5}{5} = 1.1$$
Finding the P(x) – Example
• 3. How many pairs should we stock in the 18 to
25 cm size range?
• We now have our interval expressed in terms of
z-scores as P(-0.5 ≤ Z ≤ 1.1) for use in a standard
normal distribution, which we can evaluate using
our standard normal tables
[Figure: standard normal curve with the area between z = -0.5 and z = 1.1 marked]
• We can calculate this area by finding
P(-0.5 ≤ Z ≤ 1.1) = 1 – [P(Z ≤ -0.5) + P(Z ≥ 1.1)]
= 1 – [0.3085 + 0.1357]
= 1 – 0.4442
= 0.5558
• Now multiply our P(-0.5 ≤ Z ≤ 1.1) by total sales
per week to get the number of shoes we should stock in
that range: 300 × 0.5558 = 166.74 ≈ 167
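The three stocking questions can be answered in one pass by working directly on the centimeter scale. This is a sketch under the example's stated assumptions (mean 20 cm, standard deviation 5 cm, 300 pairs per week, fit within +/- 0.5 cm); the function name `expected_pairs` is hypothetical, and `statistics.NormalDist` replaces the printed table, so results may differ from the slide values in the last digit because the table rounds to four decimals.

```python
from statistics import NormalDist

WEEKLY_SALES = 300                     # pairs sold per week (from the example)
sizes = NormalDist(mu=20, sigma=5)     # shoe sizes in cm (from the example)
FIT = 0.5                              # a shoe fits feet within +/- 0.5 cm


def expected_pairs(smallest, largest):
    """Expected weekly pairs for shoe sizes from `smallest` to `largest` (cm)."""
    # Widen the size range by the fit tolerance, then take the area under the curve
    p = sizes.cdf(largest + FIT) - sizes.cdf(smallest - FIT)
    return WEEKLY_SALES * p


for label, lo, hi in [("25 cm", 25, 25), ("18 cm", 18, 18), ("18 to 25 cm", 18, 25)]:
    print(label, round(expected_pairs(lo, hi), 2))
```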
Commonly Used Probabilities
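[Figure: normal curve f(x) with the x-axis marked at -3σ, -2σ, -1σ, μ, +1σ, +2σ, +3σ; 68% of the area lies within ±1σ, 95% within ±2σ, and 99.7% within ±3σ]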
Commonly Used Probabilities
• For a normal distribution, about 68% of the observations
lie within one standard deviation of the mean
[Figure: normal curve f(x) with -σ, μ, and +σ marked on the x-axis]
$$z = \frac{x - \mu}{\sigma} = \frac{(\mu + \sigma) - \mu}{\sigma} = 1$$
• Standard normal distribution:
P(z > 1) = 0.1587, so P(-1 ≤ z ≤ 1) = 1 – 2*0.1587 = 0.6826
• Normal distribution as illustrated in the figure:
P(μ-σ ≤ x ≤ μ+σ) = 0.6826 ≈ 0.68
Commonly Used Probabilities
• For a normal distribution, about 95% of the observations
lie within two standard deviations of the mean
[Figure: normal curve f(x) with -2σ, μ, and +2σ marked on the x-axis]
$$z = \frac{x - \mu}{\sigma} = \frac{(\mu + 2\sigma) - \mu}{\sigma} = 2$$
• Standard normal distribution:
P(z > 2) = 0.0228, so P(-2 ≤ z ≤ 2) = 1 – 2*0.0228 = 0.9544
• Normal distribution as illustrated in the figure:
P(μ-2σ ≤ x ≤ μ+2σ) = 0.9544 ≈ 0.95
Commonly Used Probabilities
• For a normal distribution, about 99.7% of the observations
lie within three standard deviations of the mean
[Figure: normal curve f(x) with -3σ, μ, and +3σ marked on the x-axis]
$$z = \frac{x - \mu}{\sigma} = \frac{(\mu + 3\sigma) - \mu}{\sigma} = 3$$
• Standard normal distribution:
P(z > 3) = 0.0013, so P(-3 ≤ z ≤ 3) = 1 – 2*0.0013 = 0.9974
• Normal distribution as illustrated in the figure:
P(μ-3σ ≤ x ≤ μ+3σ) = 0.9974 ≈ 0.997
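These three probabilities follow the same pattern (one minus twice the upper tail), so they can be generated in a loop. The sketch below uses `statistics.NormalDist` rather than the printed table, and the helper name `within_k_sigma` is illustrative; because the table rounds tail areas to four decimals, the computed values can differ from the slide values in the last digit.

```python
from statistics import NormalDist

std_normal = NormalDist()    # standard normal: mean 0, standard deviation 1


def within_k_sigma(k):
    """P(-k <= z <= k): subtract both tails (equal by symmetry) from 1."""
    upper_tail = 1.0 - std_normal.cdf(k)     # P(z > k)
    return 1.0 - 2.0 * upper_tail


for k in (1, 2, 3):
    print(k, round(1.0 - std_normal.cdf(k), 4), round(within_k_sigma(k), 4))
```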