Sampling: Distribution of the Sample Mean (Sigma Known)
o If a population follows the normal distribution:
o The population is represented by \( X_1, X_2, \ldots, X_N \).
o The distribution of each X is normal with mean μ and standard deviation σ.
o \( X \sim \mathrm{Norm}(\mu, \sigma) \Rightarrow X_1 \sim \mathrm{Norm}(\mu, \sigma),\ X_2 \sim \mathrm{Norm}(\mu, \sigma), \ldots \)
o The sample mean is \( \bar{X} = (X_1 + X_2 + \cdots + X_n)/n \), with n < N.
o The distribution of the sample mean is:
\[ \bar{X} \sim \mathrm{Norm}\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \]
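A quick simulation can make the claim \( \bar{X} \sim \mathrm{Norm}(\mu, \sigma/\sqrt{n}) \) concrete. This is a minimal Python sketch, not part of the original slides: the population values (μ = 15, σ = 14, n = 49), the number of trials, and the seed are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, trials=10_000, seed=1):
    """Draw `trials` samples of size n from Norm(mu, sigma) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]

mu, sigma, n = 15, 14, 49  # assumed illustrative values
means = simulate_sample_means(mu, sigma, n)
print(round(statistics.fmean(means), 2))  # close to mu = 15
print(round(statistics.stdev(means), 2))  # close to sigma / sqrt(n) = 14 / 7 = 2
```

The spread of the simulated means should sit near σ/√n rather than near σ, which is the point of the slide.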
o If a population follows the normal distribution, the sampling distribution of the sample mean will also follow the normal distribution.
o To determine the probability that a sample mean falls within a particular region, use:
\[ z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
Note that \( \sigma/\sqrt{n} \) is called the Standard Error of the Mean.
Sampling: Distribution of the Sample Mean (Sigma Unknown)
o If the population does not follow the normal distribution, but the sample has at least 30 observations, the sample means will approximately follow the normal distribution (central limit theorem).
o To determine the probability that a sample mean falls within a particular region, use the sample standard deviation s in place of σ:
\[ z = \frac{\bar{X} - \mu}{s/\sqrt{n}} \]
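The central limit theorem claim can be checked by simulation even when the population is clearly not normal. This Python sketch assumes a hypothetical skewed population (exponential with mean 10, so σ = 10), samples of n = 36, and a fixed seed; none of these values come from the slides.

```python
import math
import random
import statistics

MEAN_POP = 10.0        # assumed exponential population: mean = SD = 10
n, trials = 36, 10_000
rng = random.Random(7) # fixed seed for reproducibility

# Sampling distribution of the mean for n = 36 draws from a skewed population.
means = [statistics.fmean(rng.expovariate(1 / MEAN_POP) for _ in range(n))
         for _ in range(trials)]

se = MEAN_POP / math.sqrt(n)  # sigma / sqrt(n) = 10 / 6
share_within = sum(abs(m - MEAN_POP) <= 1.96 * se for m in means) / trials
print(round(share_within, 3))  # near 0.95 if the means are close to normal

# In practice sigma is unknown, so one observed sample's s stands in for it.
sample = [rng.expovariate(1 / MEAN_POP) for _ in range(n)]
z = (statistics.fmean(sample) - MEAN_POP) / (statistics.stdev(sample) / math.sqrt(n))
print(round(z, 2))
```

Even though individual draws are heavily right-skewed, about 95% of the sample means land within 1.96 standard errors of μ, as a normal sampling distribution would predict.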
Example: Suppose that X has a distribution with μ = 15 and σ = 14.
a) If a random sample of size n = 49 is drawn, find \( \mu_{\bar{X}} \), \( \sigma_{\bar{X}} \), and \( P(15 \le \bar{X} \le 17) \).
b) If a random sample of size n = 64 is drawn, find \( \mu_{\bar{X}} \), \( \sigma_{\bar{X}} \), and \( P(15 \le \bar{X} \le 17) \).
c) Why should you expect the probability in part b to be higher than in part a?
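a) \( \mu_{\bar{X}} = \mu = 15 \) and \( \sigma_{\bar{X}} = \sigma/\sqrt{n} = 14/\sqrt{49} = 2 \).
When \( \bar{X} = 15 \), z = 0 (since the mean is 15).
When \( \bar{X} = 17 \), z = (17-15)/2 = 1.
\[ P(15 \le \bar{X} \le 17) = P(0 \le z \le 1) = 0.3413 \]
b) \( \mu_{\bar{X}} = \mu = 15 \) and \( \sigma_{\bar{X}} = 14/\sqrt{64} = 1.75 \).
When \( \bar{X} = 15 \), z = 0.
When \( \bar{X} = 17 \), z = (17-15)/1.75 = 1.14.
\[ P(15 \le \bar{X} \le 17) = P(0 \le z \le 1.14) = 0.3729 \]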
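c) A larger sample has less variability in its mean: the standard error σ/√n drops from 2 to 1.75, so the same interval from 15 to 17 spans more standard errors. Therefore, increasing the sample size increases the probability that the sample mean is close to the population mean.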
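Parts a and b can be reproduced with Python's standard library instead of a printed z table. In this sketch, `prob_mean_between` is a hypothetical helper name; it assumes the sample mean is normal with mean μ and standard error σ/√n, as on the previous slides.

```python
import math
from statistics import NormalDist

def prob_mean_between(lo, hi, mu, sigma, n):
    """P(lo <= sample mean <= hi) when the sample mean ~ Norm(mu, sigma / sqrt(n))."""
    se = sigma / math.sqrt(n)   # standard error of the mean
    std_normal = NormalDist()   # mean 0, SD 1
    return std_normal.cdf((hi - mu) / se) - std_normal.cdf((lo - mu) / se)

p_a = prob_mean_between(15, 17, mu=15, sigma=14, n=49)  # se = 2, z from 0 to 1
p_b = prob_mean_between(15, 17, mu=15, sigma=14, n=64)  # se = 1.75, z from 0 to about 1.143
print(round(p_a, 4))  # 0.3413
print(p_b > p_a)      # True: the larger sample gives the higher probability
```

The value for part b comes out slightly above 0.3729 because the code does not round z to 1.14 before looking up the area.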
Point Estimate
o Definition:
The statistic computed from sample information and used to estimate the population parameter.
o Examples:
o The sample mean \( \bar{X} \) is a point estimate of the population mean μ.
o The sample standard deviation s is a point estimate of the population standard deviation σ.
o The sample proportion p is a point estimate of the population proportion π.
Confidence Interval
o Definition:
A range of values constructed from sample data so that the population parameter is likely to lie within that range with a specified probability. The specified probability is called the level of confidence.
o Example:
We are 90% sure that the mean yearly income of construction workers in the New York area is between $61,000 and $69,000.
Confidence Interval (Sigma Known)
o What if we know that a population has a normal distribution (or the sample size is at least 30) with known standard deviation σ, but the mean μ of the population is unknown?
If the mean of a sample of size n is \( \bar{X} \), then we can say with a K% level of confidence that the mean μ falls within the interval:
\[ \bar{X} \pm z\frac{\sigma}{\sqrt{n}} \]
where
\[ P\left(\bar{X} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z\frac{\sigma}{\sqrt{n}}\right) = K \]
The CI is given by:
\[ \left[\bar{X} - z\frac{\sigma}{\sqrt{n}},\ \bar{X} + z\frac{\sigma}{\sqrt{n}}\right] \]
\( \bar{X} \), σ and n are known. We can obtain z from Appendix B.1 by looking for a value of z that satisfies:
(Area from 0 to z) = K/2 = (level of confidence)/2
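Confidence Interval (Sigma Known): Proof
\[ P\left(\bar{X} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z\frac{\sigma}{\sqrt{n}}\right) = K \]
\[ \Rightarrow P\left(-z\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le z\frac{\sigma}{\sqrt{n}}\right) = K \]
\[ \Rightarrow P\left(-z \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z\right) = 2P\left(0 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z\right) = K \]
\[ \Rightarrow P\left(0 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z\right) = K/2 \]
Here \( (\bar{X} - \mu)/(\sigma/\sqrt{n}) \) follows the standard normal distribution, and the factor 2 comes from its symmetry about 0. So z is the value whose area from 0 to z equals K/2.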
Example (ex2, page 301): A sample of 81 observations is taken from a normal distribution with an SD of 5 and a sample mean of 40. Determine the 95% CI.
The interval is \( \bar{X} \pm z\frac{\sigma}{\sqrt{n}} \) with \( \bar{X} = 40 \), σ = 5 and n = 81.
(Area from 0 to z) = K/2 = 0.95/2 = 0.475
Using Appendix B.1, (Area from 0 to 1.96) = 0.475 ⇒ z = 1.96.
\[ CI = \left[\bar{X} - z\frac{\sigma}{\sqrt{n}},\ \bar{X} + z\frac{\sigma}{\sqrt{n}}\right] = \left[40 - 1.96\frac{5}{\sqrt{81}},\ 40 + 1.96\frac{5}{\sqrt{81}}\right] = [38.911,\ 41.089] \]
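The arithmetic of Example ex2 can be checked in a few lines. This is a sketch: `z_interval` is a hypothetical helper name, and z = 1.96 is taken from the table as in the slide rather than computed.

```python
import math

def z_interval(xbar, sigma, n, z):
    """Return (lower, upper) for xbar +/- z * sigma / sqrt(n) (sigma known)."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = z_interval(xbar=40, sigma=5, n=81, z=1.96)
print(round(lo, 3), round(hi, 3))  # 38.911 41.089
```

The half-width is 1.96 × 5/9 = 1.0889 (rounded), so the upper limit 40 + 1.0889 rounds to 41.089.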
o When looking for a value of z in the expression \( \bar{X} \pm z\frac{\sigma}{\sqrt{n}} \):
o 95% of the sample means selected from a population will be within 1.96 standard errors (σ/√n) of the population mean μ. (The z value for a confidence level of 95% is 1.96.)
o 99% of the sample means will lie within 2.58 standard errors of the population mean. (The z value for a confidence level of 99% is 2.58.)
o How did we get the 1.96 and the 2.58 for the 95% and 99% confidence intervals?
o For the 95% CI:
  - The probability area is 0.95/2 = 0.475.
  - In Appendix B.1, the z value for .475 is 1.96.
o Use the same reasoning and calculations for the 99% CI:
  - The probability area is 0.99/2 = 0.495.
  - In Appendix B.1, the z value for .495 is 2.58.
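Instead of searching Appendix B.1, the same z values can be computed from the inverse of the standard normal CDF. This sketch uses Python's `statistics.NormalDist.inv_cdf`; `z_for_confidence` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Return z such that the standard normal area from 0 to z equals level / 2."""
    # The area below z is 0.5 (the left half) plus level / 2.
    return NormalDist().inv_cdf(0.5 + level / 2)

for level in (0.95, 0.99):
    print(level, round(z_for_confidence(level), 2))  # 0.95 1.96, then 0.99 2.58
```

The unrounded values are about 1.960 and 2.576; the table values are these rounded to two decimals.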
o Example: We take a sample (n > 30) of recent college graduates and compute the sample mean annual starting salary. The mean is $39,000 and the standard error of the mean is $200. Between what values does the 95% CI lie?
o The confidence limits are $39,000 ± 1.96($200), that is, $38,608 and $39,392.
Standard Error Estimation
o When the population SD is known: \( \sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \)
o When the population SD is unknown: \( s_{\bar{X}} = \dfrac{s}{\sqrt{n}} \)
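Both standard-error formulas are one line each once the inputs are known. In this sketch the data values and the "known" σ = 1.2 are hypothetical, chosen only to show the computation; note that `statistics.stdev` divides by n - 1, which is the usual sample standard deviation s.

```python
import math
import statistics

data = [12.1, 9.8, 11.4, 10.2, 13.0, 10.9, 9.5, 11.7]  # hypothetical sample
n = len(data)

s = statistics.stdev(data)       # sample SD (divides by n - 1)
se_estimated = s / math.sqrt(n)  # s_xbar = s / sqrt(n): population SD unknown

sigma = 1.2                      # hypothetical known population SD
se_known = sigma / math.sqrt(n)  # sigma_xbar = sigma / sqrt(n)

print(round(s, 3), round(se_estimated, 3), round(se_known, 3))
```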
Confidence Interval for the Population Mean if SD σ Known or n ≥ 30
o σ known: \( \bar{X} \pm z\dfrac{\sigma}{\sqrt{n}} \)
o σ unknown but n ≥ 30: \( \bar{X} \pm z\dfrac{s}{\sqrt{n}} \)
Unknown Population SD & a Small Sample
o This situation is not covered by the central limit theorem. However, we may be able to reason that the population is normal or reasonably close to a normal distribution.
o Under these conditions, we replace the standard normal distribution with the t distribution.
o The t distribution is a continuous distribution with many similarities to the standard normal distribution.
The Student’s t Distribution
o Developed in the early 1900s by William Gosset.
o Also called “Student’s” distribution.
o Gosset was concerned with the behavior of the term:
\[ t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \]
o Gosset was worried about the discrepancy between s and σ when s was calculated from a very small sample.
Characteristics of the t Distribution
o Like the z distribution, it is a continuous distribution.
o Like the z distribution, it is bell-shaped and symmetrical.
o There is a family of t distributions. They all have a mean of 0, but the SD differs according to sample size (the SD is larger for smaller n).
o It is more spread out and flatter at the center than the standard normal distribution. As the sample size increases, the t distribution approaches the standard normal distribution, because the error in using s to estimate σ decreases with larger samples.
Confidence Interval for the Population Mean, σ Unknown
\[ \bar{X} \pm t\frac{s}{\sqrt{n}} \]
When to Use the t Distribution
Is the population normal?
o No: Is n 30 or more?
  - No: use a nonparametric test.
  - Yes: use the z distribution.
o Yes: Is the population SD known?
  - No: use the t distribution.
  - Yes: use the z distribution.
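The flowchart's decision logic fits in a small function. This is a sketch: `choose_method` and its string return values are hypothetical names, and the function only encodes the three questions on the slide.

```python
def choose_method(population_normal: bool, sigma_known: bool, n: int) -> str:
    """Return 'z', 't', or 'nonparametric' following the t-distribution flowchart."""
    if population_normal:
        # Normal population: z if sigma is known, otherwise t.
        return "z" if sigma_known else "t"
    # Non-normal population: the CLT justifies z only when n >= 30.
    return "z" if n >= 30 else "nonparametric"

# Tire example: normality assumed, sigma unknown, n = 10.
print(choose_method(population_normal=True, sigma_known=False, n=10))  # t
```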
Example: A tire manufacturer would like to investigate the tread life of its tires. A sample of 10 tires driven 50,000 miles revealed a sample mean of 0.32 inch of tread remaining with an SD of 0.09 inch. Construct a 95% CI for the population mean.
o We assume the population distribution is normal.
o Since the population SD is unknown and n = 10 is small, we use the formula \( \bar{X} \pm t\dfrac{s}{\sqrt{n}} \).
o To find the value of t, we use Appendix B.2. Locate the 95% CI column and move down to df = 9 (10-1). The value in the cell is 2.262.
o Substitute the values in the formula:
\[ \bar{X} \pm t\frac{s}{\sqrt{n}} = 0.32 \pm 2.262\frac{0.09}{\sqrt{10}} = 0.32 \pm 0.064 = (0.256,\ 0.384) \]
It is reasonable to conclude that the population mean is in this interval. The manufacturer can be 95% confident that the mean remaining tread depth is between 0.256 and 0.384 inches.
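The t-based interval can be sketched the same way as the z interval, with the critical value read from Appendix B.2. Python's `statistics` module has no t quantile function, so `t_crit` is passed in from the table; `t_interval` is a hypothetical helper name. The second call reproduces the sugar example that follows.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n).

    t_crit must be looked up for df = n - 1 (for example, from Appendix B.2).
    """
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Tire example: n = 10, df = 9, 95% -> t = 2.262
tire = t_interval(0.32, 0.09, 10, 2.262)
print([round(v, 3) for v in tire])  # [0.256, 0.384]

# Sugar example: n = 16, df = 15, 90% -> t = 1.753
sugar = t_interval(60, 20, 16, 1.753)
print(sugar[0] <= 63 <= sugar[1])   # True: 63 lies inside the interval
```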
Appendix B.2, page 785 (excerpt): t values by confidence interval c
Degrees of freedom: df = n - 1 = 10 - 1 = 9
df | 90%   | 95%   | 98%   | 99%
9  | 1.833 | 2.262 | 2.821 | (not shown)
Example (ex12, page 309): The ASPA wants to estimate the mean yearly sugar consumption. A sample of 16 people reveals the mean yearly consumption to be 60 pounds with a standard deviation of 20 pounds.
a) What is the value of the population mean? What is the best estimate for this value?
The population mean is unknown, but the best estimate is 60, the sample mean.
b) Explain why we need to use the t distribution. What assumption do we need to make?
Use the t distribution because the population standard deviation is unknown and the sample size is small. However, we must assume the population is normally distributed.
c) For a 90% confidence interval, what is the value of t?
1.753, obtained from Appendix B.2 for a CI of 90% and df = 16-1 = 15.
d) Develop the 90% confidence interval for the population mean.
Between 51.235 and 68.765, found by
\[ 60 \pm 1.753\left(\frac{20}{\sqrt{16}}\right) = 60 \pm 8.765 \]
e) Would it be reasonable to conclude that the population mean is 63 pounds?
Yes, that value is reasonable because it is inside the interval.
Sample Size for Estimating Population Mean
\[ n = \left(\frac{z s}{E}\right)^2 \]
o n is the sample size;
o z is the standard normal value corresponding to the desired level of confidence;
o s is an estimate of the population SD;
o E is the maximum allowable error (half the width of the CI).
If the result is not a whole number, round up.
Choosing an Appropriate Sample Size
2. The maximum allowable error E
o It is the amount added to and subtracted from the sample mean to determine the endpoints of the CI.
o It is the amount of error the researchers are willing to tolerate.
o A small allowable error will require a large sample. A large allowable error will permit a smaller sample.
3. The population SD
o If the population is widely dispersed, a large sample is required.
o If the population is concentrated (homogeneous), the required sample size will be smaller.
o It may be necessary to estimate the population SD.
Example: A student wants to determine the mean amount of
earnings per month of city council members. The error in
estimating the mean is to be less than $100 with a 95% level of
confidence. The student found a report by the Department of
Labor that estimated the SD to be $1,000. What is the required
sample size?
E=100, z=1.96 and s=1000
Example (cont’d)
o The maximum allowable error, E, is $100. The value of z for a 95% level of confidence is 1.96, and the estimate for the SD is $1,000. Substituting these values in the formula:
\[ n = \left(\frac{z s}{E}\right)^2 = \left(\frac{(1.96)(1{,}000)}{100}\right)^2 = (19.6)^2 = 384.16 \]
o A sample of 385 is required to meet the specifications.
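The sample-size formula, including the rule to round up, can be sketched with `math.ceil`. Here `sample_size_mean` is a hypothetical helper name; the inputs are the ones from the example (E = $100, s = $1,000) at both the 95% and the 99% levels.

```python
import math

def sample_size_mean(z, s, e):
    """n = (z * s / E)^2, rounded up to the next whole number."""
    return math.ceil((z * s / e) ** 2)

print(sample_size_mean(1.96, 1000, 100))  # 385 (raw value 384.16)
print(sample_size_mean(2.58, 1000, 100))  # 666 (raw value 665.64)
```

Rounding up (rather than to the nearest integer) guarantees that the error bound E is still met.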
What if the student wanted to increase the level of confidence to 99%?
The corresponding z value is 2.58.
\[ n = \left(\frac{z s}{E}\right)^2 = \left(\frac{(2.58)(1{,}000)}{100}\right)^2 = (25.8)^2 = 665.64 \]
The recommended sample size is now 666. Notice the change in the required sample size for the different levels of confidence: an increase of 281 observations. This could greatly increase the cost and the time of the study. Therefore, the level of confidence should be considered carefully.
Proportion
o The fraction, ratio, or percent indicating the part of the sample or the population having a particular trait of interest.
Example: A recent survey indicated that 92 out of 100 people surveyed favored the continued use of daylight saving time in the summer. The sample proportion is 92/100, or .92, or 92%.
Assumptions for Proportion CI Construction
1. The binomial conditions have been met:
  a. The sample data are the result of counts.
  b. There are only 2 possible outcomes (success and failure).
  c. The probability of a success remains the same from one trial to the next.
  d. The trials are independent.
2. The values nπ and n(1-π) should both be ≥ 5 (π is the population proportion), so that the normal approximation (z distribution) justified by the CLT can be used.
Sample Proportion
\[ p = \frac{X}{n} \]
where X is the number of successes in the sample. If π is the population proportion, then p is a point estimator for π.
Confidence Interval for a Population Proportion
\[ p \pm z\sigma_p \]
Standard Error of the Sample Proportion
\[ \sigma_p = \sqrt{\frac{p(1-p)}{n}} \]
Confidence Interval for a Population Proportion
\[ p \pm z\sqrt{\frac{p(1-p)}{n}} \]
Example: The union representing ABC company is considering a merger with the Teamsters Union. According to ABC union bylaws, at least three-fourths of the union membership must approve any merger. A random sample of 2,000 current ABC members reveals that 1,600 plan to vote for the merger proposal. What is the estimate of the population proportion? Develop a 95% confidence interval for the population proportion. Basing your decision on this sample information, can you conclude that the necessary proportion of ABC members favor the merger? Why?
o The sample size is n = 2000.
o The number that approve the merger is X = 1600.
o The sample proportion is p = X/n = 1600/2000 = 0.8.
o We determine the 95% CI. The z value is 1.96.
\[ p \pm z\sqrt{\frac{p(1-p)}{n}} = .80 \pm 1.96\sqrt{\frac{.80(1-.80)}{2{,}000}} = .80 \pm .018 \]
Example (cont’d)
The endpoints are .782 and .818. The lower limit is greater than .75, so the entire interval lies above three-fourths. We conclude that the merger proposal will likely pass, because the interval estimate indicates that more than 75% of the union membership favors it.
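The proportion interval for the merger example can be reproduced directly. This is a sketch: `proportion_interval` is a hypothetical helper name, and it uses the normal approximation p ± z√(p(1-p)/n) from the slides.

```python
import math

def proportion_interval(x, n, z):
    """Return (lower, upper) for p +/- z * sqrt(p(1 - p) / n), with p = x / n."""
    p = x / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

lo, hi = proportion_interval(1600, 2000, 1.96)
print(round(lo, 3), round(hi, 3))  # 0.782 0.818
print(lo > 0.75)                   # True: the whole interval lies above three-fourths
```

Here n·p = 1,600 and n·(1 - p) = 400, both well above 5, so the normal approximation is reasonable.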
Sample Size for the Population Proportion
Three items need to be specified:
1. The desired level of confidence.
2. The margin of error in the population proportion.
3. An estimate of the population proportion.
\[ n = p(1-p)\left(\frac{z}{E}\right)^2 \]
If an estimate of π is not available, use p = 0.5. Because p(1-p) is largest at p = 0.5, this gives a conservative (largest) sample size.
Example: A group of students want to estimate the proportion of cities with subsidized transportation systems. They want the estimate to be within .10 of the population proportion. The desired level of confidence is 90%. No estimate for the population proportion is available. What is the required sample size?
o E = .10
o The level of confidence is 90%. The corresponding z value is 1.65.
o No estimate for p is available, so we use .50.
\[ n = p(1-p)\left(\frac{z}{E}\right)^2 = (.5)(1-.5)\left(\frac{1.65}{.10}\right)^2 = 68.0625 \]
Round up, so a random sample of 69 cities is needed.
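The proportion sample-size formula can be sketched the same way, defaulting to p = 0.5 when no estimate is available. `sample_size_proportion` is a hypothetical helper name; z = 1.65 is the table value the example uses for 90%.

```python
import math

def sample_size_proportion(z, e, p=0.5):
    """n = p(1 - p)(z / E)^2, rounded up; p = 0.5 when no estimate is available."""
    return math.ceil(p * (1 - p) * (z / e) ** 2)

print(sample_size_proportion(1.65, 0.10))  # 69 (raw value 68.0625)
```

Passing any other p (for example, p = 0.2) can only lower the result, since p(1-p) peaks at 0.5.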