STATISTICS
Sampling and Sampling Distributions
Professor Ke-Sheng Cheng
Department of Bioenvironmental Systems Engineering
National Taiwan University
Random sample
• Let the random variables X1, X2, …, Xn have a joint density $f_{X_1, X_2, \ldots, X_n}(\cdot, \ldots, \cdot)$ that factors as follows:
$$f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$$
where f(·) is the common density of each Xi. Then (X1, X2, …, Xn) is defined to be a random sample of size n from a population with density f(·).
5/6/2017
Laboratory for Remote Sensing Hydrology and Spatial Modeling,
Dept of Bioenvironmental Systems Engineering, National Taiwan Univ.
• If X1, X2, …, Xn is a random sample of size n from f(·), then X1, X2, …, Xn are stochastically independent.
• Histogram -- A frequency (or relative frequency)
plot of observed data is called a frequency
histogram (or relative frequency histogram).
Frequency Histogram
Cumulative frequency
Relative cumulative frequency
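The four kinds of frequency named on these slides can be tabulated from raw data. The following is a minimal Python sketch, not the course's own program; the function name `frequency_table`, the data values, and the bin edges are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def frequency_table(data, edges):
    """Tabulate frequency, relative, cumulative and relative cumulative counts.

    Bins are [edges[k], edges[k+1]); the last bin also includes its right edge.
    Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue
        k = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[k] += 1
    n = sum(counts)
    relative = [c / n for c in counts]
    cumulative = list(accumulate(counts))
    relative_cumulative = [c / n for c in cumulative]
    return counts, relative, cumulative, relative_cumulative


# Hypothetical observations and bin edges, for illustration only
data = [2.1, 3.4, 3.9, 5.0, 5.2, 6.8, 7.1, 7.7, 8.3, 9.6]
edges = [2, 4, 6, 8, 10]
counts, relative, cumulative, relative_cumulative = frequency_table(data, edges)
print(counts, relative, cumulative, relative_cumulative)
```

Plotting `counts` against the bins gives the frequency histogram; `relative_cumulative` is the empirical counterpart of a cumulative distribution function evaluated at the bin edges.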
Statistic
• A statistic is a function of observable random
variables, which is itself an observable random
variable and does not contain any unknown
parameters.
• A statistic must be observable because we
intend to use it to make inferences about the
density functions of the random variables.
• For example, if a random variable has a probability density function $N(\mu, \sigma^2)$ where μ and σ are unknown, then $\sum_{i=1}^{n} X_i^2 - \mu$ is not a statistic.
• If a statistic is not observable, then it cannot be used to make inferences about the parameters of the density function.
• An observation of a random sample of size n can be regarded as n independent observations of a random variable.
• One of the central problems in statistics is to
find suitable statistics to represent parameters
of the probability distribution function of a
random variable.
Population $N(\mu, \sigma^2)$ with parameters $(\mu, \sigma^2)$: unknown
Sample $\{x_1, \ldots, x_n\}$ with statistics $(\bar{x}, s^2)$: observable
Sample moments
• Let X1, X2, …, Xn be a random sample from the density f(·). Then the rth sample moment about 0 is defined as
$$M_r' = \frac{1}{n} \sum_{i=1}^{n} X_i^r$$
• In particular, if r = 1, we have the sample mean $\bar{X}_n$; that is,
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$
• Also, the rth sample moment about the sample mean is defined as
$$M_r = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^r$$
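Both kinds of sample moment are direct averages, so they are easy to compute. The following Python sketch follows the two definitions literally; the function names `raw_moment` and `central_moment` and the data are hypothetical.

```python
def raw_moment(xs, r):
    """r-th sample moment about 0: (1/n) * sum of x_i ** r."""
    return sum(x ** r for x in xs) / len(xs)


def central_moment(xs, r):
    """r-th sample moment about the sample mean: (1/n) * sum of (x_i - mean) ** r."""
    mean = raw_moment(xs, 1)
    return sum((x - mean) ** r for x in xs) / len(xs)


# Hypothetical observations, for illustration only
xs = [1.0, 2.0, 3.0, 4.0]
print(raw_moment(xs, 1), raw_moment(xs, 2), central_moment(xs, 2))
```

For these data the second central moment equals the second raw moment minus the squared sample mean, which mirrors the population identity Var(X) = E[X²] − (E[X])².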
• Theorem – Let X1, X2, …, Xn be a random sample from the density f(·). The expected value of the rth sample moment about 0 is equal to the rth population moment; i.e., $E[M_r'] = \mu_r'$. Also,
$$\mathrm{Var}[M_r'] = \frac{1}{n}\{E[X^{2r}] - (E[X^r])^2\} = \frac{1}{n}[\mu_{2r}' - (\mu_r')^2]$$
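The variance result follows from the independence of the Xi. A short derivation, assuming only the definitions already given and that the moments involved exist:

```latex
\begin{aligned}
\mathrm{Var}[M_r']
  &= \mathrm{Var}\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i^r\right]
   = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}[X_i^r]
   && \text{(independence)} \\
  &= \frac{1}{n^2}\cdot n\,\{E[X^{2r}] - (E[X^r])^2\}
   && \text{(common density } f(\cdot)\text{)} \\
  &= \frac{1}{n}\,[\mu_{2r}' - (\mu_r')^2]
\end{aligned}
```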
• Special case: r = 1
$$\mathrm{Var}[\bar{X}] = \frac{1}{n}\{E[X^2] - (E[X])^2\} = \frac{1}{n}[\mu_2' - (\mu_1')^2] = \frac{\mathrm{Var}(X)}{n} = \sigma_X^2 / n$$
Sample statistics
• Let X1, X2, …, Xn be a random sample from the distribution of a random variable X. Sample mean and sample variance of the distribution are respectively defined to be
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
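Note that the sample variance divides by n − 1, not by n as the second sample moment about the mean does. The Python sketch below implements the two definitions above and cross-checks them against the standard library's `statistics.variance`, which also uses the n − 1 divisor; the function names and the data are hypothetical.

```python
import statistics


def sample_mean(xs):
    """Sample mean: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)


# Hypothetical observations, for illustration only
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_mean(xs), sample_variance(xs), statistics.variance(xs))
```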
Estimating the mean
• Given a random sample x1, x2, …, xn from a probability density function f(·) with unknown mean μ and finite variance σ², we want to estimate the mean using the random sample.
• Using only a finite number of values of X (a random
sample of size n), can any reliable inferences be made
about E(X), the average of an infinite number of values
of X?
• Will the estimate be more reliable if the size of the
random sample is larger?
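These questions can be explored by simulation. The sketch below is a Python stand-in, not the R program used in the lecture: it draws repeated samples from a normal population and looks at how the sample means spread for different sample sizes. The population values mu = 60 and sigma = 20 are assumptions chosen to resemble the plotted results, and the function name, seed, and replication count are hypothetical.

```python
import random
import statistics


def simulate_sample_means(n, replications, mu=60.0, sigma=20.0, seed=0):
    """Return the sample means of `replications` random samples of size n.

    Assumes a normal population N(mu, sigma^2); mu and sigma are illustrative.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(replications):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


for n in (30, 100, 1000):
    means = simulate_sample_means(n, replications=500)
    # columns: n, mean of sample means, std. dev. of sample means, sigma / sqrt(n)
    print(n, round(statistics.fmean(means), 2),
          round(statistics.stdev(means), 2), round(20.0 / n ** 0.5, 2))
```

Under these assumptions the mean of the sample means stays near mu for every n, while their standard deviation shrinks roughly like sigma / sqrt(n), in line with Var[X̄] = σ²/n.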
R-program demonstration
[Figure: Mean of sample means w.r.t. sample size (horizontal axis: sample size, 0 to 5000; vertical axis: 59.8 to 60.2)]
[Figure: Mean of sample standard deviations w.r.t. sample size (horizontal axis: sample size, 0 to 5000; vertical axis: 19.84 to 20.02)]
[Figure: Standard deviation of sample means w.r.t. sample size (horizontal axis: sample size, 0 to 5000; vertical axis: 0 to 5). Fitted curve: $y = 19.938\,x^{-0.4998}$, $R^2 = 0.9995$]
What is the theoretical basis? Y = f(x) = ?
Histograms of sample mean and standard deviation
ns=30
Histograms of sample mean and standard deviation
ns=5000
Weak Law of Large Numbers (WLLN)
• Let f(·) be a density with mean μ and variance σ², and let $\bar{X}_n$ be the sample mean of a random sample of size n from f(·). Let ε and δ be any two specified numbers satisfying ε > 0 and 0 < δ < 1. If n is any integer greater than $\sigma^2 / (\varepsilon^2 \delta)$, then
$$P[-\varepsilon < \bar{X}_n - \mu < \varepsilon] \ge 1 - \delta$$
Recall the theorem
• (Example) Suppose that some distribution with an unknown mean has its variance equal to 1. How large a random sample must be taken such that the probability will be at least 0.95 that the sample mean $\bar{X}_n$ will lie within 0.5 of the population mean?
$$\sigma^2 = 1, \quad \varepsilon = 0.5, \quad \delta = 1 - 0.95 = 0.05$$
$$n > \frac{\sigma^2}{\delta \varepsilon^2} = \frac{1}{(0.05)(0.5)^2} = 80$$
(Example) How large a random sample must be taken in order that you are 99% certain that $\bar{X}_n$ is within 0.5σ of μ?
$$\varepsilon = 0.5\sigma, \quad \delta = 1 - 0.99 = 0.01$$
$$n > \frac{\sigma^2}{(0.01)(0.5\sigma)^2} = 400$$
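Both examples apply the same bound, n > σ²/(ε²δ). A small Python sketch of that calculation follows; the function name `wlln_sample_size_bound` is hypothetical, and in the second example the value sigma = 3 is arbitrary because σ cancels out of the ratio.

```python
def wlln_sample_size_bound(variance, epsilon, delta):
    """Return sigma^2 / (epsilon^2 * delta); any integer n above this bound suffices."""
    if variance < 0 or epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("need variance >= 0, epsilon > 0 and 0 < delta < 1")
    return variance / (epsilon ** 2 * delta)


# First example: variance 1, within 0.5 of the mean with probability at least 0.95
bound_1 = wlln_sample_size_bound(1.0, 0.5, 1 - 0.95)

# Second example: within 0.5 * sigma with probability at least 0.99.
# sigma cancels in the ratio, so an arbitrary illustrative value is used.
sigma = 3.0
bound_2 = wlln_sample_size_bound(sigma ** 2, 0.5 * sigma, 1 - 0.99)

print(round(bound_1, 6), round(bound_2, 6))
```

The bound holds for any distribution with finite variance, which is why it is conservative: for a specific distribution, a smaller n may already achieve the stated probability.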
Raingauge network design
• Assume that there are already some raingauge stations in a catchment, and that we are interested in determining the optimal number of stations that should exist to achieve a desired accuracy in the estimation of mean rainfall.
• Two approaches
– (1) Standard deviation of the sample mean should not exceed a certain portion of the population mean.
– (2) $P[\mu - \varepsilon < \bar{x}_n < \mu + \varepsilon] \ge 1 - \delta$
Criterion 1
Standard deviation of the sample mean should not exceed a certain portion of the population mean.
$$\bar{X}_n \sim N(\mu, \sigma^2/n), \qquad (\bar{X}_n - \mu) \sim N(0, \sigma^2/n)$$
$$\sigma_{\bar{X}_n} = \frac{\sigma}{\sqrt{n}} \le \varepsilon\mu \;\Rightarrow\; \sqrt{n} \ge \frac{\sigma}{\varepsilon\mu} = \frac{CV}{\varepsilon} \;\Rightarrow\; n \ge \left(\frac{CV}{\varepsilon}\right)^2$$
where CV = σ/μ is the coefficient of variation.
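Criterion 1 reduces to n ≥ (CV/ε)², where CV = σ/μ is the coefficient of variation and ε is the allowed portion of the mean. The Python sketch below evaluates that bound; the function name and the input values are hypothetical, and in practice CV would be estimated from the existing stations.

```python
import math


def gauges_for_criterion_1(cv, epsilon):
    """Smallest integer n with cv / sqrt(n) <= epsilon, i.e. n >= (cv / epsilon) ** 2."""
    if cv <= 0 or epsilon <= 0:
        raise ValueError("cv and epsilon must be positive")
    # The small tolerance guards against floating-point round-off just above an integer.
    return math.ceil((cv / epsilon) ** 2 - 1e-9)


# Hypothetical values: rainfall CV of 0.5, standard error at most 12.5% of the mean
n_required = gauges_for_criterion_1(cv=0.5, epsilon=0.125)
print(n_required)
```

Because n grows with the square of CV/ε, halving the allowed error quadruples the number of stations under this criterion.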
Criterion 2
$$P[\mu - \varepsilon < \bar{x}_n < \mu + \varepsilon] \ge 1 - \delta$$
• From the weak law of large numbers,
$$n > \frac{\sigma^2}{\delta \varepsilon^2}$$
What assumptions have we made in these approaches to network design?
Data independence
What are the practical considerations in monitoring network design?
The Central Limit Theorem
• Let f(·) be a density with mean μ and finite variance σ². Let $\bar{X}_n$ be the sample mean of a random sample of size n from f(·). Then
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}$$
approaches the standard normal distribution as n approaches infinity.
• The importance of the CLT is the fact that the mean $\bar{X}_n$ of a random sample from any distribution with finite variance σ² and mean μ is approximately distributed as a normal random variable with mean μ and variance σ²/n.
$$\bar{X}_n = \mu + Z_n \frac{\sigma}{\sqrt{n}} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
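The CLT can be checked numerically with a clearly non-normal population. The Python sketch below is a stand-in for the R demonstration, not a copy of it: it samples from an exponential distribution with rate 1 (so μ = 1 and σ = 1, an assumption made for this example) and reports how often the standardized mean Z_n falls within ±1.96, which has probability about 0.95 under the standard normal. Function names, the seed, and the replication count are hypothetical.

```python
import random
import statistics


def standardized_means(n, replications, seed=0):
    """Z_n values for samples of size n from Exponential(rate=1), where mu = sigma = 1."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    zs = []
    for _ in range(replications):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        xbar = statistics.fmean(sample)
        zs.append((xbar - mu) / (sigma / n ** 0.5))
    return zs


def fraction_within(zs, c=1.96):
    """Share of Z_n values with absolute value at most c."""
    return sum(abs(z) <= c for z in zs) / len(zs)


for n in (2, 10, 25, 50, 100):
    zs = standardized_means(n, replications=2000)
    # columns: n, mean of Z_n, std. dev. of Z_n, share within +/- 1.96
    print(n, round(statistics.fmean(zs), 3), round(statistics.stdev(zs), 3),
          round(fraction_within(zs), 3))
```

A histogram of `zs` for each n would show the skewness of the exponential population fading as n grows.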
R-program demonstration
- Central Limit Theorem
[Figure: Central Limit Theorem demonstration, panels for n = 2, 10, 25, 50, and 100]
Sampling distributions
• Given random samples of certain probability densities, we are often interested in knowing the probability densities of sampling statistics.
– Poisson distribution
– Exponential distribution
– Normal distribution
– Chi-square distribution
– Standard normal and chi-square distributions
– Student’s t-distribution
Poisson distribution
Exponential distribution
Normal distribution
Chi-square distribution
Standard normal and chi-square
distributions
Student’s t-distribution
Student’s t distribution with k degrees of freedom
• The "Student's" distribution was published in 1908 by W. S. Gosset. Gosset, however, was employed at a brewery that forbade the publication of research by its staff members. To circumvent this restriction, Gosset used the name "Student", and consequently the distribution was named "Student's t-distribution".
Order statistics