CHAPTER 6
Sampling error and confidence intervals
[Diagram: population (parameter), sample (statistic), error]
Section 1 Sampling error of the mean
Section 2 t distribution
Section 3 Confidence intervals for the population mean
Section 1
sampling error of mean
A simple random sample is a sample of size n
drawn from a population of size N in such a way
that every possible sample of size n has the
same probability of being selected. Variability
among the simple random samples drawn from
the same population is called sampling variability,
and the probability distribution that characterizes
some aspect of the sampling variability (usually,
but not always, the mean) is called a sampling
distribution. Sampling distributions allow us
to make objective statements about population
parameters without measuring every object in the
population.
[Example 1]
The population mean of diastolic blood pressure (DBP) in
Chinese adult men is 72 mmHg, with a
standard deviation of 5 mmHg.
Ten participants are chosen at random
from this population, and the sample mean
and sample standard deviation are calculated.
Suppose the sampling is repeated 100 times.
What is the result?
[Diagram: from the population (μ = 72, σ = 5), 100 samples are drawn, giving (X̄1, S1), (X̄2, S2), (X̄3, S3), ..., (X̄100, S100).]
If random samples are repeatedly drawn from a
population with mean μ and standard deviation
σ, we find that:
1 The sample means differ from one another.
2 A sample mean is not necessarily equal to the
population mean μ.
3 The distribution of the sample means is symmetric
about μ.
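The repeated sampling in Example 1 can be imitated with a short simulation. This is only a sketch: it assumes the population is normal (the slides give just μ = 72 and σ = 5), and the function name `draw_sample_means` and the seed are illustrative choices, not part of the original material.

```python
import random
import statistics

def draw_sample_means(mu, sigma, n, repeats, seed=0):
    """Draw `repeats` samples of size n from N(mu, sigma); return their means.

    Assumption: the population is normal. The slides do not state its shape.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means

means = draw_sample_means(mu=72, sigma=5, n=10, repeats=100)
print("smallest and largest sample mean:", round(min(means), 2), round(max(means), 2))
print("mean of the 100 sample means:", round(statistics.mean(means), 2))
print("SD of the 100 sample means:", round(statistics.stdev(means), 2))
```

The sample means vary from draw to draw, few if any equal 72 exactly, and they cluster around 72, which is what the three observations describe.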
HOW TO EXPLORE THE SAMPLING
DISTRIBUTION FOR THE MEAN?
The difference between a sample
statistic and the population parameter, or
the difference among sample statistics,
is called sampling error.

In real life we sample only once, but we
realize that our sample comes from a
theoretical sampling distribution of all
possible samples of a particular size. The
sampling distribution concept provides a link
between sampling variability and probability.
Choosing a random sample is a chance
operation and generating the sampling
distribution consists of many repetitions of
this chance operation.
Central limit theorem
When sampling from a normally
distributed population with mean μ,
the distribution of the sample mean
will be normal with mean μ.
[Figure: population distribution of X with μ = 50 and σ = 10, and sampling distributions of X̄ with $\mu_{\bar{X}} = 50$: for n = 4, $\sigma_{\bar{X}} = 5$; for n = 16, $\sigma_{\bar{X}} = 2.5$.]
Central limit theorem
When sampling from a non-normally
distributed population with mean μ,
the distribution of the sample mean
will be approximately normal with
mean μ as long as n is large
enough (n > 50).
$\mu_{\bar{X}} = \mu$, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
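A simulation can make the non-normal case concrete. The sketch below assumes an exponential population with mean 1 and standard deviation 1, chosen here only because it is clearly skewed; the slides do not name a specific population. It compares the observed spread of the sample means with σ/√n.

```python
import math
import random
import statistics

def sample_means(n, repeats, seed=0):
    """Means of `repeats` samples of size n from an exponential population.

    Assumption: exponential population with rate 1 (mean 1, SD 1).
    """
    rng = random.Random(seed)
    return [
        statistics.mean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]

n = 64  # larger than the n > 50 rule of thumb used in the slides
means = sample_means(n, repeats=2000)
observed_se = statistics.stdev(means)
theoretical_se = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1

print("mean of sample means:", round(statistics.mean(means), 3))
print("observed SD of sample means:", round(observed_se, 3))
print("sigma / sqrt(n):", round(theoretical_se, 3))
```

Although the population is skewed, the sample means centre on the population mean and their standard deviation is close to σ/√n.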
Standard error (SE) can be used
to assess sampling error of mean.
Although sampling error is
inevitable, it can be calculated
accurately.
Calculation of standard error (SE)
Theoretical value of SE: $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
Estimate of SE: $s_{\bar{X}} = s/\sqrt{n}$
s↑ → SE↑
n↑ → SE↓
Example 5.2
An analyst randomly chose a sample
(n = 100) and measured the subjects' weights, obtaining a
mean of 72 kg and a standard deviation of
15 kg.
Question: what is the standard error?
Solution:
$s_{\bar{X}} = s/\sqrt{n} = 15/\sqrt{100} = 1.5$ kg
Exercise 5.1
Consider a sample of 100 measurements
with mean 121 cm and standard
deviation 7 cm, drawn from a normal
population. Compute its standard
error.
Solution:
$s_{\bar{X}} = s/\sqrt{n} = 7/\sqrt{100} = 0.7$ cm
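The standard error calculation used in the two problems above is a single formula, s/√n. A minimal sketch follows; the function name `standard_error` is an illustrative choice.

```python
import math

def standard_error(s, n):
    """Estimated standard error of the mean: s / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return s / math.sqrt(n)

print(standard_error(15, 100))  # Example 5.2: 1.5 (kg)
print(standard_error(7, 100))   # Exercise 5.1: 0.7 (cm)
```

Doubling s doubles the SE, while quadrupling n halves it, which matches the s↑ → SE↑ and n↑ → SE↓ relationships.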
Section 2
t distribution
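1. Definition
[Diagram: a variable X ~ N(μ, σ²) is transformed to N(0, 1), with mean 0 and standard deviation 1, by $Z = (X - \mu)/\sigma$. Under random sampling, the sample means $\bar{X}_1 (s_1), \bar{X}_2 (s_2), \ldots, \bar{X}_k (s_k)$ are standardized by $Z = (\bar{X} - \mu)/\sigma_{\bar{X}}$.]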
Usually the standard deviation σ is
unknown, so we can only obtain s
and then calculate $s_{\bar{X}}$.
Replacing $\sigma_{\bar{X}}$ with $s_{\bar{X}}$ gives
$t = (\bar{X} - \mu)/s_{\bar{X}} \sim t$ distribution, $\nu = n - 1$
This sampling distribution was developed
by W. S. Gosset and published under the
pseudonym "Student" in 1908. It is
therefore sometimes called "Student's
t distribution", and it is really a family of
distributions that depends on n - 1.
Z distribution: $Z = (\bar{X} - \mu)/\sigma_{\bar{X}}$
t distribution: $t = (\bar{X} - \mu)/s_{\bar{X}}$, $\nu = n - 1$
2. The characteristics of the t distribution
[Figure 4: graph of the t distribution
with different degrees of freedom]
1 It is symmetric about 0.
2 The shape of the t curve is determined
by the degrees of freedom, df = n - 1.
3 The t distribution approaches the
standard normal distribution as n
tends to infinity.
t critical value with one-sided probability → $t_{\alpha,\nu}$
t critical value with two-sided probability → $t_{\alpha/2,\nu}$
Example 5.2
With n = 15, find t0 such that
P(-t0 ≤ t ≤ t0) = 0.90
Solution
From the t table with df = 15 - 1 = 14, the two-tailed
shaded area equals 0.10, so
-t0 = -1.761 and t0 = 1.761
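Where no t table is at hand, the critical value can be approximated numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names and the numerical settings (2000 integration steps, a search range of 0 to 100) are assumptions made for illustration, and the `math.gamma` call limits it to moderate degrees of freedom.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] and symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(upper_tail, df):
    """Find t0 with P(T > t0) = upper_tail (0 < upper_tail < 0.5) by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

# Two-tailed area 0.10 means 0.05 in each tail, with df = 15 - 1 = 14.
print(round(t_critical(0.05, 14), 3))  # 1.761
```

The result agrees with the table value used in the example.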
Section 3
confidence intervals for
the population mean
[Diagram: statistical methods divide into descriptive statistics and inferential statistics; inferential statistics divides into parameter estimation (point estimation and interval estimation) and hypothesis testing.]
1. Basic concepts
Parameter estimation:
Infer the population parameter
from the sample statistics.
Point Estimate
A single-valued estimate:
$\bar{X}$ estimates μ, $s$ estimates σ, and $p$ estimates the population proportion.
A single element chosen from a sampling
distribution.
Conveys little information about the actual
value of the population parameter or about the
accuracy of the estimate.
Confidence Interval or Interval
Estimation
An interval or range of values
believed to include the unknown
population parameter.
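What "believed to include" means in practice can be illustrated by building many intervals and counting how often they contain the true mean. The sketch below assumes a normal population with μ = 72 and a known σ = 5, borrowed from Example 1, and uses intervals of the form x̄ ± z·σ/√n; the function name `coverage`, the sample size of 25 and the seed are illustrative.

```python
import math
import random
from statistics import NormalDist, mean

def coverage(mu, sigma, n, repeats, confidence=0.95, seed=1):
    """Fraction of z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repeats):
        xbar = mean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / repeats

rate = coverage(mu=72, sigma=5, n=25, repeats=1000)
print("share of intervals containing mu:", rate)
```

The share should be close to 0.95: the confidence level describes the long-run behaviour of the procedure, not the probability attached to any single computed interval.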
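[Figure: sampling distribution of $\bar{X}$ centered at $\mu_{\bar{X}} = \mu$. Point estimation gives a single value; interval estimation gives a range from the lower limit to the upper limit that covers the central area 1 - α, leaving α/2 in each tail.]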
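2. Methods
1. σ is known: Z distribution
2. σ is unknown, n > 50: Z distribution, with s used in place of σ
$Z = (\bar{X} - \mu)/\sigma_{\bar{X}}$
CI: $(\bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n})$
3. σ is unknown, n ≤ 50: t distribution
$t = (\bar{X} - \mu)/s_{\bar{X}}$
CI: $(\bar{x} - t_{\alpha/2,\nu}\,s/\sqrt{n},\ \bar{x} + t_{\alpha/2,\nu}\,s/\sqrt{n})$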
Example 5.3
A horticultural scientist is developing a new
variety of apple. One of the important traits, in
addition to taste, color, and storability, is the
uniformity of the fruit size. To estimate the
weight she samples 100 mature fruit and
calculates a sample mean of 220 g and a
standard deviation of 5 g.
Construct a 95% confidence interval for
the population mean μ from her sample.
Solution
$\bar{X} - Z_{\alpha/2}\, s_{\bar{X}} < \mu < \bar{X} + Z_{\alpha/2}\, s_{\bar{X}}$
$L_1 = 220 - 1.96 \times 5/\sqrt{100} = 219.02$ g
$L_2 = 220 + 1.96 \times 5/\sqrt{100} = 220.98$ g
The 95% confidence interval for the
population mean is 219.02 g
to 220.98 g.
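The large-sample calculation above can be reproduced in a few lines. This is a sketch using the standard library's `NormalDist` to obtain the critical value instead of a table; the function name `z_interval` is an illustrative choice.

```python
import math
from statistics import NormalDist

def z_interval(mean, sd, n, confidence=0.95):
    """Confidence interval for the mean based on the Z distribution."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

low, high = z_interval(220, 5, 100)
print(round(low, 2), round(high, 2))  # 219.02 220.98
```

Passing a different `confidence`, for example 0.99, widens the interval because the critical value grows.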
Exercise
A forester is interested in estimating the
average number of "count trees" per acre. A
random sample of n = 64 one-acre plots is selected
and examined. The average (mean) number of
count trees per acre is found to be 27.3, with a
standard deviation of 12.1. Use this information
to construct a 95% confidence interval for μ.
Solution
$\bar{X} - Z_{\alpha/2}\, s_{\bar{X}} < \mu < \bar{X} + Z_{\alpha/2}\, s_{\bar{X}}$
$L_1 = 27.3 - 1.96 \times 12.1/\sqrt{64} = 24.34$
$L_2 = 27.3 + 1.96 \times 12.1/\sqrt{64} = 30.26$
The 95% confidence interval for the
population mean is 24.34 to 30.26.
The forester is 95% confident that the
population mean for "count trees" per
acre is between 24.34 and 30.26.
Example 5.4
An ecologist samples 25 plants and
measures their heights. He finds that
the sample has a mean of 15 cm and a
sample standard deviation of 4 cm.
What is the 95% confidence interval for
the population mean μ?
Solution
$\bar{X} - t_{\alpha/2,\nu}\,(s/\sqrt{n}) < \mu < \bar{X} + t_{\alpha/2,\nu}\,(s/\sqrt{n})$
df = 25 - 1 = 24
$t_{0.05/2,\,24} = 2.064$
$L_1 = 15 - 2.064 \times 4/\sqrt{25} = 13.349$ cm
$L_2 = 15 + 2.064 \times 4/\sqrt{25} = 16.651$ cm
The plant ecologist is 95% confident that
the population mean for heights of
these plants is between 13.349 and
16.651 cm.
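The small-sample interval has the same shape as the Z-based one, with the t critical value in place of Z. In the sketch below the critical value is passed in as an argument, read from a t table as in the example, because the Python standard library has no t distribution; the function name `t_interval` is illustrative.

```python
import math

def t_interval(mean, s, n, t_crit):
    """Confidence interval for the mean using a t critical value.

    t_crit is the two-sided critical value for df = n - 1, taken from a t table.
    """
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin

# Example 5.4: n = 25, so df = 24 and the two-sided 5% critical value is 2.064.
low, high = t_interval(15, 4, 25, t_crit=2.064)
print(round(low, 3), round(high, 3))  # 13.349 16.651
```

With the same data, using 1.96 instead of 2.064 would give a narrower interval, which understates the uncertainty when σ is estimated from a small sample.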
Exercise 1
A doctor samples 25 men and
measures their heights. He finds that
the sample has a mean of 172.12 cm
and a sample standard deviation of 4.50 cm.
What is the 95% confidence interval for
the population mean μ?
Solution
$L_1 = 172.12 - 2.064 \times 4.5/\sqrt{25} = 170.26$ cm
$L_2 = 172.12 + 2.064 \times 4.5/\sqrt{25} = 173.98$ cm
The 95% confidence interval for the population
mean is 170.26 cm to 173.98 cm.
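Exercise 2
Random samples of size 9 are repeatedly
drawn from a normal distribution with a
mean of 65 and a standard deviation of 18.
Describe the sampling distribution of the mean.
Solution
The sample mean is normally distributed with mean 65 and standard error $18/\sqrt{9} = 6$.
Lower limit: $L_1 = 65 - 2.306 \times 18/\sqrt{9} = 51.16$
Upper limit: $L_2 = 65 + 2.306 \times 18/\sqrt{9} = 78.84$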
PROBLEMS
1.
What is the difference between SD and SE?
2.
What is the medical reference range?
What is the confidence interval for the
population mean?