CHAPTER 7
Homework: 5, 7, 9, 11, 17, 22, 23, 25, 29, 33, 37, 41, 45, 51, 59, 65, 77, 79
<Example 7.1>:
The U.S. Bureau of the Census publishes annual
price figures for new mobile homes in
Construction Reports. The information is obtained
from sampling, not from a census. Suppose a
random sample of 40 new mobile homes yields the
prices shown in Table 7.1. Data are in thousands
of dollars, rounded to the nearest hundred dollars.
Table 7.1
8.4 9.9 13.9 13.9 14.1 15.8 16.6 16.8
17.0 17.6 18.8 20.0 21.4 22.1 22.4 22.8
22.8 24.0 24.4 24.7 24.8 26.2 26.4 26.7
26.8 27.0 28.4 29.3 29.3 29.5 30.6 31.1
31.4 31.9 32.2 32.4 33.0 33.5 35.1 37.2
(a) Find the sample mean and sample standard
deviation.
(b) Find a point estimator for the population mean
and a point estimator for the population standard
deviation.
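Parts (a) and (b) can be checked with Python's standard `statistics` module. This is a minimal sketch, not the SAS procedure that produced Output 7.01. `statistics.stdev` divides by n − 1, which gives the usual sample standard deviation. The sample mean and sample standard deviation also serve as the point estimates asked for in (b).

```python
from statistics import mean, stdev

# Table 7.1: prices of 40 new mobile homes (thousands of dollars)
prices = [
    8.4, 9.9, 13.9, 13.9, 14.1, 15.8, 16.6, 16.8,
    17.0, 17.6, 18.8, 20.0, 21.4, 22.1, 22.4, 22.8,
    22.8, 24.0, 24.4, 24.7, 24.8, 26.2, 26.4, 26.7,
    26.8, 27.0, 28.4, 29.3, 29.3, 29.5, 30.6, 31.1,
    31.4, 31.9, 32.2, 32.4, 33.0, 33.5, 35.1, 37.2,
]

x_bar = mean(prices)   # sample mean = point estimate of the population mean
s = stdev(prices)      # sample standard deviation (divisor n - 1)

print(f"n = {len(prices)}, mean = {x_bar:.3f}, sd = {s:.3f}")
```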
Output 7.01: Prices of Mobile Homes
SAS summary statistics
Analysis Variable : PRICE

   N     Mean    Std Dev    Minimum    Maximum
  ---------------------------------------------
  40    24.26       7.20       8.40      37.20
  ---------------------------------------------
[Figure 7.1: Prices of Mobile Homes]
<DISCUSSION>:
(1) A point estimator by itself provides no information
about its own reliability. An interval estimator does
provide this information. Therefore,
we want to construct an interval estimator of the
population mean.
(2) Usually, we need to make several reasonable
assumptions about the population before we can
actually construct the interval estimator.
Interval Estimator: An interval estimator is a rule that
tells us how to use sample data to form an interval
that estimates a population parameter.
Confidence Coefficient:
The confidence coefficient of an interval estimator is
the probability that the interval estimator encloses the
population parameter.
Confidence Level:
The confidence level is the confidence coefficient
expressed as a percentage.
Sec 7.1:
Large sample interval
estimation of the population mean
(a) Is the sample representative?
We need to make sure that the sample is randomly
selected from the population. This is one of the
most basic principles of statistical inference.
(b) Check the sample size
If the sample size is greater than 30, you can
use the large sample procedure because:
(i) the central limit theorem ensures at least
approximate normality of the distribution of the
sample mean, and
(ii) the law of large numbers ensures that the
sample standard deviation provides a good
estimate of the population standard deviation.
(c) Find the sample mean $\bar{X}$ and the standard error of
the sample mean:
$$S_{\bar{X}} = \frac{S}{\sqrt{n}}$$
(d) Find $z_{\alpha/2}$, where $\alpha$ is equal to one minus the
confidence coefficient.
The $(1-\alpha)100\%$ confidence interval for the
population mean $\mu$ is then given by
$$\bar{X} \pm z_{\alpha/2}\, S_{\bar{X}}$$
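Steps (c) and (d) can be sketched in Python with only the standard library. Here `statistics.NormalDist().inv_cdf` replaces the printed normal table for finding $z_{\alpha/2}$. The function name `z_interval` is hypothetical. The usage line simply plugs in the Example 7.1 summary statistics for illustration.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, s, n, confidence):
    """Large-sample (1 - alpha)100% confidence interval for the population mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    se = s / math.sqrt(n)                     # S_xbar = S / sqrt(n)
    return x_bar - z * se, x_bar + z * se

# Illustrative usage with the Example 7.1 summary statistics (n = 40)
low, high = z_interval(24.26, 7.20, 40, 0.95)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```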
Interpretation of a confidence interval:
For any single computed confidence interval of the
population mean, we do not know whether it contains
the population mean or not. But if this interval
construction procedure is applied to a large
number of random samples, then about $(1-\alpha)100\%$
of the resulting intervals will contain
the unknown population mean.
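A simulation can illustrate this long-run interpretation. The sketch assumes a hypothetical normal population with μ = 50 and σ = 10, chosen only for illustration. It draws 2000 random samples of size 40 and builds the large-sample 95% interval from each one. It then records the fraction of intervals that cover μ. With these settings the fraction should be close to 0.95, but not exactly equal to it.

```python
import math
import random
from statistics import NormalDist, mean, stdev

# Hypothetical population and simulation settings (assumed for illustration)
MU, SIGMA, N, REPS, CONF = 50.0, 10.0, 40, 2000, 0.95

random.seed(1)
z = NormalDist().inv_cdf(1 - (1 - CONF) / 2)
covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar, se = mean(sample), stdev(sample) / math.sqrt(N)
    if x_bar - z * se <= MU <= x_bar + z * se:   # does this interval enclose mu?
        covered += 1

coverage = covered / REPS
print(f"{coverage:.3f} of {REPS} intervals contain mu")
```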
<Example 7.2> (Basic)
Find a 90% confidence interval for the
population mean, if
(a). n = 125, sample mean = 13.1, and sample
variance = 0.086.
(b). n = 50, sample mean = 21.9, and sample
variance = 3.44.
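Example 7.2 gives the sample variance rather than the standard deviation, so the standard error is $\sqrt{s^2/n}$. Below is a minimal Python sketch of both parts. The helper name is hypothetical, and $z_{0.05} \approx 1.645$ comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def z_interval_from_variance(x_bar, variance, n, confidence):
    # Large-sample interval: x_bar -/+ z_{alpha/2} * sqrt(variance / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(variance / n)
    return x_bar - z * se, x_bar + z * se

ci_a = z_interval_from_variance(13.1, 0.086, 125, 0.90)   # part (a)
ci_b = z_interval_from_variance(21.9, 3.44, 50, 0.90)     # part (b)
print("(a)", [round(v, 3) for v in ci_a])
print("(b)", [round(v, 3) for v in ci_b])
```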
<Example 7.3> (Basic)
Acid rain, caused by the reaction of certain air
pollutants with rainwater, appears to be a growing
problem in the northeastern United
States. Pure rainfall through clean air registers a
pH value of 5.7. Suppose that water samples from
40 rainfalls are analyzed for pH and that the sample mean and
standard deviation are equal to 3.7 and 0.5,
respectively. Find a 99% C.I. for the average pH
in rainfall and interpret this interval. What
assumptions must be made for the confidence
interval to be valid?
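Because n = 40 > 30, the large-sample interval applies. The sketch below uses `statistics.NormalDist` for $z_{0.005} \approx 2.576$ instead of a table value. It is a worked check of the arithmetic only; the interpretation and the assumptions are still left to the reader.

```python
import math
from statistics import NormalDist

n, x_bar, s = 40, 3.7, 0.5
z = NormalDist().inv_cdf(1 - 0.01 / 2)   # z_{0.005} for 99% confidence
margin = z * s / math.sqrt(n)
low, high = x_bar - margin, x_bar + margin
print(f"99% CI for mean pH: ({low:.3f}, {high:.3f})")
```

Note that the whole interval lies well below 5.7, the pH of pure rainfall.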
<Example 7.4> (Intermediate)
Suppose that a random sample (with sample
mean $\bar{x}$) of size $n \ge 30$ is to be taken from a
population with mean $\mu$ and standard deviation $\sigma$.
(a) Determine the probability that the interval
$$[\bar{x} - 2\sigma_{\bar{x}},\ \bar{x} + 2\sigma_{\bar{x}}], \qquad \sigma_{\bar{x}} = \sigma/\sqrt{n},$$
will contain the population mean.
(b) Interpret your result in part (a) in terms of
percentages.
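The interval contains $\mu$ exactly when $|\bar{x} - \mu| \le 2\sigma_{\bar{x}}$. Equivalently, the standardized mean $Z = (\bar{x} - \mu)/\sigma_{\bar{x}}$ satisfies $|Z| \le 2$. The central limit theorem makes Z approximately standard normal for $n \ge 30$, so the requested probability is $P(-2 \le Z \le 2)$. A one-line check with the standard library:

```python
from statistics import NormalDist

Z = NormalDist()                 # standard normal (CLT approximation)
prob = Z.cdf(2) - Z.cdf(-2)      # P(-2 <= Z <= 2)
print(f"P(interval contains mu) = {prob:.4f}")
```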
<Example 7.5> (Advanced)
(a) Figure 7.3 is the box plot of the density of the earth data
set.
Table 1 Measurements of the density of the earth
5.50 5.61 4.88 4.07 5.26 5.55
5.36 5.29 5.58 5.65 5.57 5.53
5.62 5.29 5.44 5.34 5.79 5.10
5.27 5.39 5.42 5.47 5.34 5.46
5.30 5.75 5.86 5.63
(b) Figure 7.5 is the box plot of the same data without
one extreme value. Briefly discuss the main features of this
distribution, such as symmetry and outliers.
(c) What is your estimate of the density of the earth based
on this density data set?
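Output 7.02 reports summary statistics with and without the extreme value 4.07, and these can be reproduced with a short Python sketch. The filter removes that single value by exact match. This is only an illustration; the decision to drop it belongs to part (b).

```python
from statistics import mean, stdev

# Table 1: measurements of the density of the earth (28 values)
density = [
    5.50, 5.61, 4.88, 4.07, 5.26, 5.55,
    5.36, 5.29, 5.58, 5.65, 5.57, 5.53,
    5.62, 5.29, 5.44, 5.34, 5.79, 5.10,
    5.27, 5.39, 5.42, 5.47, 5.34, 5.46,
    5.30, 5.75, 5.86, 5.63,
]
trimmed = [x for x in density if x != 4.07]   # drop the single extreme value

for label, data in [("all", density), ("without 4.07", trimmed)]:
    print(f"{label:>13}: n = {len(data)}, mean = {mean(data):.2f}, "
          f"sd = {stdev(data):.2f}, min = {min(data)}, max = {max(data)}")
```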
Output 7.02: SAS summary statistics
Analysis Variable: DENSITY (All)

   N    Mean    Std Dev    Minimum    Maximum
  --------------------------------------------
  28    5.40       0.33       4.07       5.86
  --------------------------------------------

Analysis Variable: DENSITY (without 4.07)

   N    Mean    Std Dev    Minimum    Maximum
  --------------------------------------------
  27    5.45       0.21       4.88       5.86
  --------------------------------------------
[Figure 7.3: Density of the Earth (box plot of all measurements)]

[Figure 7.5: Density of the Earth (box plot without the extreme value 4.07)]
Sec 7.2: Small sample estimation
of population mean:
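The Student t Distribution
(a) The p.d.f. of a t random variable with n degrees of freedom is given by
$$f(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{x^2}{n}\right)^{-(n+1)/2}, \qquad -\infty < x < \infty.$$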
(b) The limiting distribution (as n approaches
infinity) of t is the standard normal distribution.
(c) The Student t distribution is symmetric about 0, which is its
median (and its mean when n > 1).
(d) The variance of a Student t random variable, $n/(n-2)$ for $n > 2$,
is larger than the variance (equal to 1) of a standard normal
random variable.
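Properties (a) to (d) can be checked numerically. The sketch below codes the p.d.f. from (a) directly. It uses `math.lgamma` instead of `math.gamma` so that large degrees of freedom do not overflow. The helper names are hypothetical.

```python
import math

def t_pdf(x, n):
    """Student t density with n degrees of freedom, formula (a)."""
    log_c = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
             - 0.5 * math.log(n * math.pi))
    return math.exp(log_c) * (1 + x * x / n) ** (-(n + 1) / 2)

def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_variance(n):
    # Property (d): variance n / (n - 2), defined for n > 2 and always above 1
    return n / (n - 2)

for df in (4, 30, 1000):
    # Peak height approaches the normal peak as df grows (property (b))
    print(df, round(t_pdf(0, df), 4), round(normal_pdf(0), 4), round(t_variance(df), 4))
```

With 4 d.f., the density at 0 is 3/8, which is lower than the normal peak of about 0.399. At x = 3 the t density is higher than the normal density. This is the "heavier tails" picture behind property (d).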
[Figure 7.7: Student t Distribution (normal curve versus t curve with 4 d.f.; horizontal axis: Quantile)]
(2). Steps to construct the confidence
interval:
(a) Is this sample representative?
We need to make sure that the sample is randomly
selected from the population.
(b) Is the population approximately normal?
Note that this small sample confidence interval is
applicable only if the population has an (approximately) normal
distribution. This can be assessed roughly by
using graphical tools to display the data.
(c) Find the sample mean $\bar{x}$ and the standard error
$s_{\bar{x}} = s/\sqrt{n}$.
(d) Find $t_{\alpha/2,\,n-1}$ from Table VI, where $(1-\alpha)$ is
the confidence coefficient and n is the
sample size.
(e) The $(1-\alpha)100\%$ confidence interval is
$$\left[\bar{x} - t_{\alpha/2,\,n-1}\, s_{\bar{x}},\ \bar{x} + t_{\alpha/2,\,n-1}\, s_{\bar{x}}\right]$$
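Steps (c) to (e) can be sketched in Python as follows. The standard library has no t quantile function, so this sketch takes $t_{\alpha/2,n-1}$ as an input. It keeps a few standard 95% table values in a small dictionary. `scipy.stats.t.ppf(0.975, df)` returns the same values to more digits if SciPy is available. The function name and the six data values in the usage line are hypothetical.

```python
import math
from statistics import mean, stdev

# Two-sided 95% critical values t_{0.025, df}, as printed in a standard t table
T_025 = {5: 2.571, 11: 2.201, 19: 2.093}

def t_interval(data, t_crit):
    """Small-sample interval x_bar -/+ t_{alpha/2, n-1} * s / sqrt(n).

    Assumes a random sample from an approximately normal population.
    """
    n = len(data)
    x_bar = mean(data)
    se = stdev(data) / math.sqrt(n)   # s_xbar = s / sqrt(n)
    return x_bar - t_crit * se, x_bar + t_crit * se

# Hypothetical usage: six made-up observations, so n - 1 = 5 d.f.
low, high = t_interval([1, 2, 3, 4, 5, 6], T_025[5])
print(f"95% t interval: ({low:.3f}, {high:.3f})")
```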
<Example 7.6> (Basic)
A very costly experiment has been conducted
to evaluate a new process for producing synthetic
diamonds. Six diamonds have been produced by
the new process, with recorded weights of 0.46, 0.61,
0.52, 0.48, 0.57, and 0.54 carat.
(a) What assumptions do we need to use the t
confidence interval?
(b) Find a 95% C.I. of the population mean.
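Assuming the weights are a random sample from an approximately normal population (part (a)), the 95% interval uses $t_{0.025,5} = 2.571$ from the t table. The Python sketch below reproduces the Output 7.03 summary statistics and the interval.

```python
import math
from statistics import mean, stdev

weights = [0.46, 0.61, 0.52, 0.48, 0.57, 0.54]   # carats
n = len(weights)
t_crit = 2.571   # t_{0.025, 5} from a t table (95% confidence, n - 1 = 5 d.f.)

x_bar, s = mean(weights), stdev(weights)
margin = t_crit * s / math.sqrt(n)
print(f"mean = {x_bar:.3f}, sd = {s:.3f}, "
      f"95% CI = ({x_bar - margin:.3f}, {x_bar + margin:.3f})")
```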
24
Output 7.03: SAS summary statistics
Analysis Variable: DIAMOND

   N    Mean    Std Dev    Minimum    Maximum
  --------------------------------------------
   6    0.53      0.056       0.46       0.61
  --------------------------------------------
<Example 7.7> (Basic)
A random sample of size n=12 was selected
from a normally distributed population. The
sample mean is 47.1.
(a) Find the 95% C.I. of the population mean if
the sample variance is 4.7.
(b) Find the 95% C.I. of the population mean if
the sample size is 64 and sample variance is 4.7.
(c) Interpret the intervals in (a) and (b).
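Below is a Python sketch of parts (a) and (b), computed from the summary statistics. Part (a) uses $t_{0.025,11} = 2.201$ from the t table. For part (b), n = 64 > 30, so the sketch assumes the large-sample value $z_{0.025} = 1.96$ as an approximation. The exact $t_{0.025,63}$ is about 2.00, which widens the interval only slightly. The helper name is hypothetical.

```python
import math

def t_interval_from_summary(x_bar, variance, n, t_crit):
    # x_bar -/+ t_crit * sqrt(variance / n)
    se = math.sqrt(variance / n)
    return x_bar - t_crit * se, x_bar + t_crit * se

ci_a = t_interval_from_summary(47.1, 4.7, 12, 2.201)   # (a): t_{0.025, 11}
ci_b = t_interval_from_summary(47.1, 4.7, 64, 1.96)    # (b): z approximation
print("(a)", [round(v, 2) for v in ci_a])
print("(b)", [round(v, 2) for v in ci_b])
```

The larger sample gives a noticeably narrower interval around the same sample mean.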
<Example 7.8> (Advanced)
It is recognized that cigarette smoking has a
deleterious effect on lung function. A recent study
found that the carbon monoxide diffusing capacity
(DC) of the lung for 20 current smokers is as
follows:
61.675 71.210 73.003 73.154 76.014
82.115 84.023 88.017 88.602 89.222
90.479 90.677 91.052 92.295 100.615
102.754 103.768 106.755 108.579 123.086
Output 7.04: SAS summary statistics
Analysis Variable: CARBON

   N     Mean    Std Dev    Minimum    Maximum
  ---------------------------------------------
  20    89.85      14.90      61.68     123.09
  ---------------------------------------------
(a). Find the sample mean and sample standard
deviation.
(b). What assumptions do you need to construct the t
confidence interval of DC for current smokers?
(c). Use a stem-and-leaf display to check these
assumptions.
(d). Find a 95% C.I. of DC for current smokers.
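Parts (a), (c) and (d) can be sketched in Python. The stem-and-leaf display here uses tens as stems and the units digit as the leaf, with decimals truncated; that layout is an assumption chosen for readability. The interval uses $t_{0.025,19} = 2.093$ from the t table.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

dc = [61.675, 71.210, 73.003, 73.154, 76.014, 82.115, 84.023, 88.017,
      88.602, 89.222, 90.479, 90.677, 91.052, 92.295, 100.615, 102.754,
      103.768, 106.755, 108.579, 123.086]

# (a) summary statistics
x_bar, s = mean(dc), stdev(dc)

# (c) stem-and-leaf display: stem = tens, leaf = units digit (decimals truncated)
stems = defaultdict(list)
for x in sorted(dc):
    stems[int(x) // 10].append(int(x) % 10)
for stem in range(min(stems), max(stems) + 1):   # empty stems are shown too
    print(f"{stem:>3} | {''.join(str(d) for d in stems[stem])}")

# (d) 95% t interval with t_{0.025, 19} = 2.093
margin = 2.093 * s / math.sqrt(len(dc))
print(f"mean = {x_bar:.2f}, sd = {s:.2f}, "
      f"95% CI = ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```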
[Figure 7.8: Diffusing Capacity (Quantiles)]
Sec 7.3: Large Sample Estimation of the
parameter for Binomial Population:
Properties of Sample Proportion
(1) Sample proportion is p = x / n.
(2) The expectation of sample proportion is
E( p ) = p.
(3). Variance of p is approximately
s p
2
p (1 p )

n
31
(4). For a sufficiently large sample, the sampling
distribution of $\hat{p}$ is approximately normal.
(5). Usually, the sample size is large enough if the
interval $[\hat{p} - 3\sigma_{\hat{p}},\ \hat{p} + 3\sigma_{\hat{p}}]$ does not include 0
or 1.
If the sample size is large enough, then an
approximate $(1-\alpha)100\%$ confidence interval for p is given
by
$$[\hat{p} - z_{\alpha/2}\,\sigma_{\hat{p}},\ \hat{p} + z_{\alpha/2}\,\sigma_{\hat{p}}]$$
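The Python sketch below combines properties (3) to (5) with the interval formula. It estimates $\sigma_{\hat{p}}$ by plugging in $\hat{p}$ and applies the 3-sigma rule of thumb before returning an interval. The function name and the counts in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_interval(x, n, confidence):
    """Approximate (1 - alpha)100% interval for a binomial p.

    Returns None when rule (5) suggests the sample is too small.
    """
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated sigma_{p_hat}
    # Rule (5): p_hat -/+ 3 * se should not include 0 or 1
    if p_hat - 3 * se <= 0 or p_hat + 3 * se >= 1:
        return None
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    return p_hat - z * se, p_hat + z * se

# Hypothetical counts for illustration
print(proportion_interval(50, 100, 0.95))   # roughly (0.402, 0.598)
print(proportion_interval(1, 20, 0.95))     # None: too few successes
```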
<Example 7.9> (Basic)
Suppose that 6841 U.S. households are
selected at random in order to estimate the
proportion, p, of all U.S. households that have a
computer. If 2470 out of the 6841 households
chosen have a computer, find the 95% C.I. for p.
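Below is a worked Python check for Example 7.9 using the large-sample proportion interval. With n = 6841 the 3-sigma condition is easily met. The sketch asserts it anyway, so that a failure would be visible.

```python
import math
from statistics import NormalDist

x, n = 2470, 6841
p_hat = x / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
assert 0 < p_hat - 3 * se and p_hat + 3 * se < 1   # large-sample check

z = NormalDist().inv_cdf(0.975)   # z_{0.025}
low, high = p_hat - z * se, p_hat + z * se
print(f"p_hat = {p_hat:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```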
<Example 7.10> (Intermediate)
A telephone survey conducted by the Florida State
University's Policy Science Program found that 74% of the
983 responding Florida adults were in favor of raising the
drinking age from 19 to 21.
(a). What assumption is necessary for the 74% to be a valid
point estimate of the percentage of all adult
Floridians who favor raising the drinking age from 19 to
21?
(b). Is it possible that the assumption in part (a) might not
be satisfied in a telephone survey?
(c). Assuming that the assumption in part (a) was satisfied
by the pollsters, find a 95% C.I. for p.
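The survey reports only the percentage, so the Python sketch for part (c) uses $\hat{p} = 0.74$ directly with n = 983. It is valid only under the random-sampling assumption discussed in parts (a) and (b).

```python
import math
from statistics import NormalDist

p_hat, n = 0.74, 983          # reported proportion and number of respondents
se = math.sqrt(p_hat * (1 - p_hat) / n)
assert p_hat - 3 * se > 0 and p_hat + 3 * se < 1   # large-sample check

z = NormalDist().inv_cdf(0.975)
low, high = p_hat - z * se, p_hat + z * se
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```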
Sec 7.4: Sample Size, Width of the
Confidence Interval, and Confidence Level:
(1). Width and Confidence Level:
For a fixed sample size, the width of a confidence interval
increases when the confidence level of
the interval increases.
<Example 7.11> (Intermediate)
The U.S. Energy Information Administration
surveys households in order to obtain data on
monthly fuel expenditures for household vehicles.
Suppose 60 monthly fuel expenditures for
household vehicles are randomly selected and
their mean is equal to 58.56 with a standard
deviation of 20.65.
(a). Find a 95% confidence interval for the mean household
monthly fuel expenditure on vehicles.
(b). Find a 90% confidence interval for the mean household
monthly fuel expenditure on vehicles.
(c). Find a 99% confidence interval for the mean household
monthly fuel expenditure on vehicles.
(d). Discuss the relationship between the confidence
level and the width of the confidence interval.
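All three intervals share the same standard error, $s/\sqrt{n}$; only $z_{\alpha/2}$ changes. The Python sketch below computes each interval and its width, which makes the pattern in part (d) explicit.

```python
import math
from statistics import NormalDist

n, x_bar, s = 60, 58.56, 20.65
se = s / math.sqrt(n)

widths = {}
for conf in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    low, high = x_bar - z * se, x_bar + z * se
    widths[conf] = high - low
    print(f"{conf:.0%} CI: ({low:.2f}, {high:.2f}), width = {widths[conf]:.2f}")
```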
(2). Sample Size and Width of
Confidence Interval:
Error Bound of Confidence Interval
The error bound of the estimator of the population
mean at confidence level $1-\alpha$ is
$$B = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right).$$
It is (approximately) half the width
of the confidence interval.
Sample Size Requirements for Estimating
the Population Mean
The sample size required for a $(1-\alpha)100\%$
confidence interval for the population mean with
a given error bound B is given by
$$n = \frac{(z_{\alpha/2}\,\sigma)^2}{B^2},$$
rounded up to the next whole number.
<Example 7.12> (Intermediate)
Suppose that you wish to estimate the mean pH of
rainfalls in an area that suffers heavy pollution due to the
discharge of smoke from a power plant.
(a). Assume that you know that the sample standard deviation
is 0.5 pH and that you wish to estimate the population mean
to within 0.1 with probability 0.95. Approximately
how many rainfalls would you have to include in your sample?
(b). Would it be valid to select all your specimens from a
single rainfall?
(c). Suppose that the sample s.d. is 0.2. Repeat part (a). Do
you trust your answer? Explain.
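Below is a Python sketch of the sample-size formula $n = (z_{\alpha/2}\sigma/B)^2$, rounded up, applied to parts (a) and (c). The helper name is hypothetical, and the given standard deviation is plugged in for σ.

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma, bound, confidence):
    """n = (z_{alpha/2} * sigma / B)^2, rounded up to the next whole number."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / bound) ** 2)

n_a = sample_size_mean(0.5, 0.1, 0.95)   # part (a)
n_c = sample_size_mean(0.2, 0.1, 0.95)   # part (c)
print(n_a, n_c)
```

One reason for caution in part (c): the result falls below the n > 30 guideline of Sec 7.1. The normal approximation behind the formula is therefore less secure there.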
Sample Size Requirement for the
Population Proportion
The sample size required for a $(1-\alpha)100\%$
confidence interval for the population proportion p
with a given error bound B is given by
$$n = \frac{(z_{\alpha/2})^2\, p(1-p)}{B^2}$$
<Example 7.13> (Advanced)
Find $\sigma_{\hat{p}}$ for each of the following values of p when
n = 100.
(a). p = 0.5.
(b). p = 0.6.
(c). p = 0.7.
(d). p = 0.4.
(e). p = 0.3.
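Example 7.13 is a direct application of $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. The short Python sketch below evaluates it for each listed p with n = 100.

```python
import math

n = 100
# sigma_{p_hat} = sqrt(p (1 - p) / n) for each value of p in Example 7.13
sigmas = {p: math.sqrt(p * (1 - p) / n) for p in (0.5, 0.6, 0.7, 0.4, 0.3)}
for p, sigma in sigmas.items():
    print(f"p = {p}: sigma_p_hat = {sigma:.4f}")
```

The standard error is largest at p = 0.5 and takes the same value at p and 1 − p. This is why p = 0.5 is the conservative choice when nothing is known about p.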
[Figure 7.10: Standard error of the sample proportion plotted against p (vertical axis: Standard Error; horizontal axis: Probabilities, 0.1 to 0.9)]
<Example 7.14> (Intermediate)
Suppose that you want to estimate a binomial
parameter p correct to within 0.04 (i.e., B = 0.04)
at the 95% confidence level.
(a). How large should the sample size n be?
(b). You suspect that p is equal to some value
between 0.1 and 0.3. How large should the
sample size n be?
(c). You suspect that p is equal to some value
between 0.6 and 0.8. How large should the
sample size n be?
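$p(1-p)$ increases as p moves toward 0.5. A conservative sketch therefore uses p = 0.5 when nothing is known, and otherwise the value in the suspected range that lies closest to 0.5. The Python helpers below are hypothetical and simply apply $n = z_{\alpha/2}^2\, p(1-p)/B^2$, rounded up.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p, bound, confidence):
    """n = z_{alpha/2}^2 * p (1 - p) / B^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / bound ** 2)

def conservative_p(low, high):
    # p(1 - p) is largest at 0.5, so take the value in [low, high] closest to 0.5
    return min(max(0.5, low), high)

n_a = sample_size_proportion(0.5, 0.04, 0.95)                       # (a) no prior information
n_b = sample_size_proportion(conservative_p(0.1, 0.3), 0.04, 0.95)  # (b) uses p = 0.3
n_c = sample_size_proportion(conservative_p(0.6, 0.8), 0.04, 0.95)  # (c) uses p = 0.6
print(n_a, n_b, n_c)
```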