THE NORMAL DISTRIBUTION
Chapter 6
Prob. Model for a Continuous RV
Continuous RVs typically involve the
measurement of attributes such as length,
weight, time and temperature (values on an interval).
(Discrete RVs involve counting.)
Graphical form: a smooth curve.
Ex. Symmetrical distributions: Normal, t-dist.
This curve is denoted by f(x) and is called a
probability density function (pdf).
Empirical Rule - (pg 117)
A variable having approximately a bell
shaped distribution should have:
1. Approximately 68% of the observations
fall within one SD of the mean.
2. Approximately 95% of the observations
fall within two SD of the mean.
3. Almost all the observations (99.7%) fall
within three SD of the mean.
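The three percentages in the Empirical Rule can be checked against the exact normal curve. The sketch below is an illustration only, assuming Python's standard-library `statistics.NormalDist`; the helper name `area_within` is hypothetical and not part of the notes.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal, N(0, 1)

def area_within(k: float) -> float:
    """Area under the N(0,1) curve within k SDs of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```

The printed areas are about 0.683, 0.954 and 0.997, which is where the rounded 68%, 95% and 99.7% come from.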
The areas under a pdf correspond to
probabilities for x.
The area A is the probability that x assumes
a value between a and b. That is,
$$P(a \le X \le b) = \int_a^b f(x)\,dx = A.$$
Note: When X is a continuous RV,
P(X = a) = 0. Hence
$$P(a \le X \le b) = P(a < X < b).$$
A pdf for a continuous RV must satisfy:
1. f(x) is non-negative (f(x) ≥ 0).
2. The total area under the curve
representing f(x) equals 1. That is,
$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
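To make "area under the pdf equals probability" concrete, the sketch below approximates the area numerically with a midpoint rule and compares it with the cumulative probabilities. The choice of N(0, 1), the interval from -1 to 2, and the helper name `area_under_pdf` are assumptions made only for illustration.

```python
from statistics import NormalDist

def area_under_pdf(dist, a, b, steps=10_000):
    """Approximate the area under dist.pdf between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

X = NormalDist(mu=0.0, sigma=1.0)      # illustrative choice of distribution
a, b = -1.0, 2.0

approx = area_under_pdf(X, a, b)       # area A between a and b
exact = X.cdf(b) - X.cdf(a)            # P(a <= X <= b) from the cdf
point = area_under_pdf(X, a, a)        # zero-width interval: P(X = a)
total = area_under_pdf(X, -10.0, 10.0) # practically the whole curve

print(f"area = {approx:.6f}, cdf difference = {exact:.6f}")
print(f"P(X = a) = {point}, total area = {total:.6f}")
```

The zero-width interval has area 0, which is why including or excluding the endpoints does not change the probability.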
The Normal Dist’n (pg 115)
- One of the most useful and frequently
  encountered continuous RVs.
- Symmetrical about μ.
- Its spread is determined by the value of
  its standard deviation.
Note: The normal dist’n with a mean of μ
and a SD of σ is denoted by N(μ, σ).
A RV X with mean μ and SD σ is normally
dist’d if its pdf is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$
where π = 3.14159...
e = 2.71828...
Notation: X ~ N(μ, σ)
Read: "X (a RV) is normally dist’ed with
mean μ and SD σ".
Ex. X ~ N(100, 15)
Mean = 100
Variance = 15² (or 225)
SD = 15
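The pdf formula can be typed in directly and compared with a library implementation. This is a minimal sketch using the X ~ N(100, 15) example from the notes; the function name `normal_pdf` is hypothetical, and `statistics.NormalDist` is used only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal pdf: 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

X = NormalDist(mu=100, sigma=15)  # N(mean, SD), as in the notes' notation

for x in (70, 85, 100, 115, 130):
    print(x, normal_pdf(x, 100, 15), X.pdf(x))

print("variance:", X.variance)  # SD squared
```

Note that `NormalDist` takes the SD (not the variance) as its second argument, matching the N(μ, σ) convention used here.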
The Standard Normal Dist’n
The standard normal dist’n is denoted as
Z ~ N(0, 1)
Mean = μ = 0
SD = σ = 1.0
To calculate probabilities we need
calculus or tables (always use tables).
Probability calculations with normal
dist’ns.
Use the fact that if
X ~ N(μ, σ)
then
$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1),$$
the standardized normal RV. (Read ‘Z is
normal 0,1’.)
• Only Z (standard normal dist’n) is
  tabulated: Table B2 in your book.
  (Will provide a different table - easier to
  use!)
Words: Z equals the distance from X to μ,
measured in standard deviations.
Note: The Z transformation
$$Z = \frac{X - \mu}{\sigma}$$
is sometimes referred to as the z-score.
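The z-score is a one-line computation. The sketch below reuses the X ~ N(100, 15) example from earlier in the notes and checks that a probability for X equals the corresponding probability for Z; the function name `z_score` and the value x = 130 are illustrative assumptions.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Distance from x to mu, measured in standard deviations."""
    return (x - mu) / sigma

mu, sigma = 100.0, 15.0
x = 130.0
z = z_score(x, mu, sigma)           # 130 is two SDs above the mean

p_x = NormalDist(mu, sigma).cdf(x)  # P(X <= 130) for X ~ N(100, 15)
p_z = NormalDist(0.0, 1.0).cdf(z)   # P(Z <= z) for Z ~ N(0, 1)
print(z, p_x, p_z)
```

Both probabilities agree, which is why a single table for N(0, 1) is enough.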
Areas Under the Standard Normal N(0,1)
Curve
Basic Properties:
1. The total area under the N(0,1) curve is 1.
2. The N(0,1) curve is symmetric about 0.
3. Most of the area under the N(0,1) curve lies
between -3 and 3.
Using the standard normal table:
Ex: Determine the area under the N(0,1)
curve that
a. lies to the left of 2.11
P(Z < 2.11) = P(Z ≤ 2.11)
= 0.9826
b. lies to the right of -1.25
P(Z > -1.25)
= 1 - P(Z ≤ -1.25)
= 1 - 0.1056
= 0.8944
c. lies between -0.5 and 2.47 inclusive.
P(-0.5 ≤ Z ≤ 2.47)
= P(Z ≤ 2.47) - P(Z < -0.5)
= 0.9932 - 0.3085
= 0.6847
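The three table lookups above can be reproduced in software. This is a sketch that substitutes Python's `statistics.NormalDist` for the printed table; its `cdf` method plays the role of the "area to the left" column, and the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # defaults to mean 0, SD 1

left_of_2_11 = Z.cdf(2.11)                 # a. area to the left of 2.11
right_of_neg_1_25 = 1 - Z.cdf(-1.25)       # b. area to the right of -1.25
between = Z.cdf(2.47) - Z.cdf(-0.5)        # c. area between -0.5 and 2.47

print(round(left_of_2_11, 4))
print(round(right_of_neg_1_25, 4))
print(round(between, 4))
```

Rounded to four decimals, the results match the table answers 0.9826, 0.8944 and 0.6847.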
Finding the z-score for a specified area:
Ex. Determine the z-score having an
area of
a. 0.025 to its right
Same as: area to its left is 0.975.
From table:
0.975 = P(Z ≤ 1.96)
Hence, the z-score is 1.96.
b. 0.05 to its left
There is no 0.05 in the table, but
0.0495 = P(Z ≤ -1.65)
0.0505 = P(Z ≤ -1.64)
Hence (using interpolation)
the z-score is -1.645.
Notation: z_α denotes the z-score having
area α (alpha) to its right under the
N(0,1) curve.
From above: z_0.025 = 1.96
What is z_0.05?
z_0.05 = 1.645
(Because of symmetry
and part (b) above)
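Reading the table "backwards" corresponds to the inverse cdf. The sketch below assumes `statistics.NormalDist.inv_cdf` as a stand-in for the table; the helper name `z_alpha` is hypothetical and simply encodes "area α to the right".

```python
from statistics import NormalDist

Z = NormalDist()

def z_alpha(alpha: float) -> float:
    """z-score with area alpha to its RIGHT under the N(0,1) curve."""
    return Z.inv_cdf(1 - alpha)

print(round(z_alpha(0.025), 2))    # area 0.975 to the left
print(round(Z.inv_cdf(0.05), 3))   # part (b): area 0.05 to the LEFT
print(round(z_alpha(0.05), 3))     # by symmetry, the negative of part (b)
```

No interpolation is needed here, since `inv_cdf` works for any probability strictly between 0 and 1.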
Working with Normally Distributed
Variables
To determine a percentage or probability for
a normally dist'ed variable:
Steps:
1. Sketch the normal curve.
2. Shade the region of interest and mark
delimiting x-values.
3. Compute the z-scores for the delimiting
x-values found in (2).
4. Use the table provided to obtain the area
under the N(0,1) curve.
Ex. Each year, thousands of college
seniors take the Graduate Record
Examination (GRE). The scores are
transformed so that they have a mean
of 500 and a SD of 100. Furthermore,
the scores are known to be normally
dist’ed. Determine the percentage of
students that score:
a. between 350 and 600 inclusive.
μ = 500 and σ = 100
P(350 ≤ X ≤ 600) = ? (Shaded area).
To use Normal tables, first transform the
normal RV X into the standard normal RV Z:
$$\frac{350 - 500}{100} \le \frac{X - 500}{100} \le \frac{600 - 500}{100}$$
or
-1.5 ≤ Z ≤ 1
Hence
P(350 ≤ X ≤ 600)
= P(-1.5 ≤ Z ≤ 1)
= P(Z ≤ 1) - P(Z ≤ -1.5)
= 0.8413 - 0.0668
= 0.7745
This represents the area under N(0,1) over the
interval from -1.5 to 1.
b. 375 or greater.
P(X ≥ 375)
= P(Z ≥ (375 - 500)/100)
= P(Z ≥ -1.25)
= 1 - P(Z < -1.25)
= 1 - 0.1056
= 0.8944
c. below 750.
P(X < 750)
= P(Z < (750 - 500)/100)
= P(Z < 2.5)
= 0.9938
d. between 300 and 450.
P(300 < X < 450)
= P( -2 < Z < -0.5)
= P (Z < -0.5) - P(Z < -2)
= 0.3085 - 0.0228
= 0.2857
e. between 587 and 650.
P(587 < X < 650)
= P( 0.87 < Z < 1.5)
= P (Z < 1.5) - P(Z < 0.87)
= 0.9332 - 0.8078
= 0.1254
f. exactly equal to 680.
P(X = 680) = P(Z = 1.8) = 0.0
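Parts (a) to (f) of the GRE example can be cross-checked without standardizing by hand. This sketch assumes `statistics.NormalDist` in place of the table; because the table rounds z to two decimals and areas to four, the software answers can differ from the table answers in the fourth decimal.

```python
from statistics import NormalDist

gre = NormalDist(mu=500, sigma=100)  # GRE scores, X ~ N(500, 100)

p_a = gre.cdf(600) - gre.cdf(350)    # a. between 350 and 600 (table: 0.7745)
p_b = 1 - gre.cdf(375)               # b. 375 or greater      (table: 0.8944)
p_c = gre.cdf(750)                   # c. below 750           (table: 0.9938)
p_d = gre.cdf(450) - gre.cdf(300)    # d. between 300 and 450 (table: 0.2857)
p_e = gre.cdf(650) - gre.cdf(587)    # e. between 587 and 650 (table: 0.1254)
p_f = gre.cdf(680) - gre.cdf(680)    # f. exactly 680: zero-width interval

for label, p in zip("abcdef", (p_a, p_b, p_c, p_d, p_e, p_f)):
    print(f"{label}. {100 * p:.2f}%")
```

Multiplying by 100 gives the percentage of students asked for in the question.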
g. What score is exceeded by exactly 5%?
(95th percentile).
P(Z < a) = 0.95 → a = 1.645
(Excel: norminv(0.95, 0, 1) returns 1.645)
1.645 = (x - 500)/100
x = 1.645 × 100 + 500
= 164.5 + 500
= 664.5
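The percentile calculation in part (g) is the z-score formula solved for x, that is x = μ + zσ. A minimal sketch, assuming `statistics.NormalDist.inv_cdf` as the counterpart of the Excel call mentioned above:

```python
from statistics import NormalDist

mu, sigma = 500.0, 100.0

z_95 = NormalDist().inv_cdf(0.95)             # z with area 0.95 to its left
score_from_z = mu + z_95 * sigma              # un-standardize: x = mu + z*sigma
score_direct = NormalDist(mu, sigma).inv_cdf(0.95)  # same thing in one step

print(round(z_95, 3), round(score_from_z, 1), round(score_direct, 1))
```

Both routes give a 95th percentile of about 664.5.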
Determining if the Distribution is Normal
(pg 124)
To assess whether a variable is normally
dist’ed:
• Large sample → look at histogram.
  Bell shaped?
• Small/large sample → look at normal
  probability plots.
  Fairly linear?
Normal probability plot:
Scatter plot of observed values on
horizontal axis and normal scores on
vertical axis.
Normal scores - the observations we
would expect to get for a variable having a
N(0,1) dist’n.
→ If the plot is roughly linear, then accept as
reasonable that the variable is approx.
normal.
Ex. The serum bilirubin levels of 9 patients
admitted to a hospital are as follows:
20.5  26.6  14.8  23.4  21.3  22.9  12.7  19.2  15.2
a. Construct a normal probability plot for
these data.
First, arrange data in ascending order
and obtain the normal scores.
Normal Scores?
Sample size = 9
Idealized sample from a N(0,1) of size
9?
Cumulative Prob.   Normal Scores   Sorted Obser.
0.1                -1.28           12.7
0.2                -0.84           14.8
0.3                -0.52           15.2
0.4                -0.25           19.2
0.5                 0.00           20.5
0.6                 0.25           21.3
0.7                 0.52           22.9
0.8                 0.84           23.4
0.9                 1.28           26.6
b. Assess the normality of serum bilirubin
level.
The normal probability plot in (a) looks fairly
linear suggesting that serum bilirubin is
approximately normally dist’ed.
c. Outliers in the data?
No, none of the data values seem to fall
outside the overall pattern of the plot.
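The normal scores for the bilirubin data can be generated rather than looked up. The sketch below takes the cumulative probabilities i/(n+1), which give 0.1, ..., 0.9 for n = 9 as in the example, and pairs the resulting z-values with the sorted observations. The correlation coefficient at the end is an added numerical stand-in for judging "fairly linear" by eye; it is an assumption of this sketch, not a method from the notes, and `pearson_r` is a hypothetical helper.

```python
import math
from statistics import NormalDist

data = [20.5, 26.6, 14.8, 23.4, 21.3, 22.9, 12.7, 19.2, 15.2]
n = len(data)
sorted_obs = sorted(data)

# Normal scores: z-values at cumulative probabilities i/(n+1), i = 1..n
Z = NormalDist()
normal_scores = [Z.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Points of the normal probability plot: (observed value, normal score)
for x, z in zip(sorted_obs, normal_scores):
    print(f"{x:5.1f}  {z:6.2f}")

r = pearson_r(sorted_obs, normal_scores)
print(f"correlation of plotted points: {r:.3f}")
```

A correlation close to 1 is consistent with the roughly linear pattern described in part (b).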
Sampling Distribution
Recall:
Descriptive measures of a
population are called parameters.
Example: μ, σ², p.
Descriptive measures calculated
from a sample are called statistics.
Example: x̄, S², p̂.
The distribution of a statistic is called the
sampling distribution of the statistic.
The Sampling Distribution of the Sample
Mean (pg 125)
Question: What is the dist'n of x̄?
Idea: From repeated sampling
Sample 1 of size n → x̄_1
Sample 2 of size n → x̄_2
⋮
Sample m of size n → x̄_m
Questions:
What is the dist'n of x̄?
What is the mean of x̄?
What is the SD of x̄?
Central Limit Theorem (CLT)
If relatively large samples of size n are
drawn from any population, the sampling
dist'n of x̄ is approximately normal.
- If the popul'n dist'n is normal, the
  sampling dist'n of x̄ will be exactly
  normally dist'd.
- If the population dist'n is non-normal,
  the sampling dist'n of x̄ will be, for large
  samples (n ≥ 30), approximately normally
  dist'd (by the CLT).
Mean and SD of x̄?
The Mean and SD of x̄
The mean of x̄, for samples of size n, is
equal to the mean of the original popul'n.
That is:
$$\mu_{\bar{x}} = \mu$$
The SD of x̄, for samples of size n, equals
the SD of the parent popul’n divided by the
square root of the sample size.
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
The SD of a statistic is called the standard
error of the statistic.
σ/√n is called the standard error of the mean.
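The repeated-sampling idea can be simulated. The sketch below draws m samples of size n from a skewed (exponential) population and compares the mean and SD of the sample means with μ and σ/√n. The population, the seed, and the values n = 30 and m = 5000 are all assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(0)   # fixed seed so the run is repeatable
n, m = 30, 5000  # sample size and number of repeated samples (assumed values)

# Skewed, non-normal population: exponential with mean 1 and SD 1
pop_mean, pop_sd = 1.0, 1.0

sample_means = []
for _ in range(m):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
standard_error = pop_sd / math.sqrt(n)  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {pop_mean})")
print(f"SD of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {standard_error:.3f})")
```

A histogram of `sample_means` would look roughly bell shaped even though the population itself is strongly skewed.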
Example 1
Suppose it is known that the response time
of healthy subjects to a particular stimulus
is normally distributed with a mean of 15
seconds and a standard deviation of 4 seconds.
a. What are the mean and SD of the parent
popul'n?
X: Response time in seconds of
healthy subjects to the particular
stimulus
μ = 15 seconds
σ = 4 seconds
→ X ~ N(15, 4)
b. If 5 healthy subjects are randomly
selected and the average response time
to the stimulus is calculated, what are the
mean and SD of the sample mean?
Mean: μ = 15 seconds
SD: σ/√n = 4/√5 ≈ 1.79 seconds
x̄ ~ N(15, 1.79)
c. Plot the two distributions found in (a)
original population and (b) average of 5
observations.
d. Find the probability that a randomly
selected subject will have a response
time of 17 seconds or more.
P[X > 17] = ?
P[X > 17]
= P[Z > (17 - 15)/4]
= P[Z > 0.50]
= 1 - P[Z ≤ 0.50]
= 1 - 0.6915
= 0.3085
e. Find the probability that a random
sample of 5 subjects will have a mean
response time of 17 seconds or more.
P[x̄ > 17] = ?
In this case, the transformation needed
to standardize the normal RV is:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
P[x̄ > 17]
= P[Z > (17 - 15)/1.79]
= P[Z > 1.12]
= 1.0 - P[Z ≤ 1.12]
= 1.0 - 0.8686
= 0.1314
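Parts (d) and (e) differ only in which SD is used: σ for a single subject, σ/√n for the mean of n subjects. The sketch below makes that contrast explicit, assuming `statistics.NormalDist` in place of the table and taking σ = 4 as in the worked solution. Small differences from the table answers come from rounding z to two decimals.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 15.0, 4.0, 5

single = NormalDist(mu, sigma)                 # one subject: X ~ N(15, 4)
sample_mean = NormalDist(mu, sigma / sqrt(n))  # mean of 5 subjects: SD = 4/sqrt(5)

p_single = 1 - single.cdf(17)       # d. P[X > 17]     (table: 0.3085)
p_mean = 1 - sample_mean.cdf(17)    # e. P[xbar > 17]  (table: 0.1314)

print(f"standard error of the mean: {sample_mean.stdev:.2f}")
print(f"P[X > 17]    = {p_single:.4f}")
print(f"P[xbar > 17] = {p_mean:.4f}")
```

The sample mean is less variable than a single observation, so it is less likely to land 2 seconds or more above μ.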
Example 2
The mean and SD of the total cholesterol
value for a certain population are 200 and 20
mg/100 ml, respectively. If 45 individuals
are selected at random from this population
and their average total cholesterol value is
calculated,
a. Is it reasonable to assume a normal
dist’n for the sample mean? Why or
why not?
Yes, since the sample size n = 45 is large
(n ≥ 30), the CLT applies. Hence
x̄ ~ N(200, 20/√45)
b. Find the probability that the sample
mean of the 45 total cholesterol levels
will be between 190 and 205 mg/100 ml.
P[190 < x̄ < 205] = ?
= P[(190 - 200)/(20/√45) < Z < (205 - 200)/(20/√45)]
= P[-3.35 < Z < 1.68]
= P[Z < 1.68] - P[Z < -3.35]
= 0.9535 - 0.0004
= 0.9531
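Example 2 can be checked the same way. The sketch below assumes `statistics.NormalDist` for the approximate sampling distribution of the mean; the normal shape itself rests on the CLT, as argued in part (a), and the result differs slightly from the table answer because the table rounds the z-values.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 200.0, 20.0, 45

standard_error = sigma / sqrt(n)         # 20 / sqrt(45)
xbar = NormalDist(mu, standard_error)    # approximate dist'n of the sample mean

z_low = (190 - mu) / standard_error      # table uses -3.35
z_high = (205 - mu) / standard_error     # table uses 1.68
p_between = xbar.cdf(205) - xbar.cdf(190)  # table: 0.9531

print(f"z-values: {z_low:.2f}, {z_high:.2f}")
print(f"P[190 < xbar < 205] = {p_between:.4f}")
```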