STATISTICS
Sampling and Sampling Distributions
Professor Ke-Sheng Cheng
Department of Bioenvironmental Systems Engineering
National Taiwan University
5/9/2017 Laboratory for Remote Sensing Hydrology and Spatial Modeling, Dept of Bioenvironmental Systems Engineering, National Taiwan Univ.

Random sample
• Let the random variables $X_1, X_2, \ldots, X_n$ have a joint density $f_{X_1, X_2, \ldots, X_n}(\cdot, \cdot, \ldots, \cdot)$ that factors as follows:
$$f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = f(x_1)\, f(x_2) \cdots f(x_n)$$
where $f(\cdot)$ is the common density of each $X_i$. Then $(X_1, X_2, \ldots, X_n)$ is defined to be a random sample of size n from a population with density $f(\cdot)$.
• If $X_1, X_2, \ldots, X_n$ is a random sample of size n from $f(\cdot)$, then $X_1, X_2, \ldots, X_n$ are stochastically independent.

Statistic
• A statistic is a function of observable random variables that is itself an observable random variable and does not contain any unknown parameters.
• A statistic must be observable because we intend to use it to make inferences about the density functions of the random variables.
• For example, if a random variable X has the probability density function $N(\mu, \sigma^2)$ where μ and σ are unknown, then a function of X that involves μ or σ, such as $X - \mu$ or $X/\sigma$, is not a statistic.
• If a quantity is not observable, then it cannot be used to make inferences about the parameters of the density function.
• An observation of a random sample of size n can be regarded as n independent observations of a random variable.
• One of the central problems in statistics is to find suitable statistics to represent the parameters of the probability distribution function of a random variable.
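A minimal Python sketch of the two definitions above. It assumes a standard normal population purely for illustration. The joint density of a random sample is the product of the common marginal density, and a statistic is computed from the observed values alone, with no unknown μ or σ. The helper names (`normal_pdf`, `joint_density_iid`, `sum_of_squares`) and the sample values are hypothetical.

```python
import math
from typing import Callable, Sequence


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x (standard normal by default)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def joint_density_iid(xs: Sequence[float], f: Callable[[float], float]) -> float:
    """Joint density of a random sample: f(x1) * f(x2) * ... * f(xn)."""
    return math.prod(f(x) for x in xs)


def sum_of_squares(xs: Sequence[float]) -> float:
    """A statistic: uses only the observed values, never the unknown mu or sigma."""
    return sum(x * x for x in xs)


sample = [0.2, -1.1, 0.7]  # illustrative values, not data from the slides
print(joint_density_iid(sample, normal_pdf))
print(sum_of_squares(sample))
```

By contrast, a quantity such as `sum((x - mu) ** 2 for x in sample)` could only be computed if `mu` were known, so it is not a statistic.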
[Figure: a sample $\{x_1, \ldots, x_n\}$ (observable) yields statistics $(\bar{x}, s^2)$, which are used to infer the unknown parameters $(\mu, \sigma^2)$ of the population $N(\mu, \sigma^2)$.]

Sample moments
• Let $X_1, X_2, \ldots, X_n$ be a random sample from the density $f(\cdot)$. Then the rth sample moment about 0 is defined as
$$M_r' = \frac{1}{n} \sum_{i=1}^{n} X_i^r$$
• In particular, if r = 1, we have the sample mean $\bar{X}_n$; that is,
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$
• Also, the rth sample moment about the sample mean is defined as
$$M_r = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^r$$
• Theorem: Let $X_1, X_2, \ldots, X_n$ be a random sample from the density $f(\cdot)$. The expected value of the rth sample moment about 0 is equal to the rth population moment; i.e., $E[M_r'] = \mu_r'$. Also (provided $\mu_{2r}'$ exists),
$$\mathrm{Var}[M_r'] = \frac{1}{n}\left\{E[X^{2r}] - (E[X^r])^2\right\} = \frac{1}{n}\left[\mu_{2r}' - (\mu_r')^2\right]$$
• Special case: r = 1
$$\mathrm{Var}[\bar{X}] = \frac{1}{n}\left\{E[X^2] - (E[X])^2\right\} = \frac{1}{n}\left[\mu_2' - (\mu_1')^2\right] = \frac{\mathrm{Var}(X)}{n} = \frac{\sigma_X^2}{n}$$

Sample statistics
• Let $X_1, X_2, \ldots, X_n$ be a random sample from the distribution of a random variable X. The sample mean and sample variance are respectively defined to be
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
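A minimal Python sketch of the definitions above. The function names are hypothetical:

- `raw_moment` computes $M_r'$.
- `central_moment` computes $M_r$.
- `sample_variance` computes $S^2$ with divisor n − 1.
- `simulated_var_of_mean` checks the special case $\mathrm{Var}[\bar{X}] = \sigma^2/n$ by Monte Carlo.

The Monte Carlo check assumes a normal population with σ = 2 and n = 25 purely for illustration.

```python
import random
import statistics
from typing import Sequence


def raw_moment(xs: Sequence[float], r: int) -> float:
    """r-th sample moment about 0: M'_r = (1/n) * sum(X_i ** r)."""
    return sum(x ** r for x in xs) / len(xs)


def central_moment(xs: Sequence[float], r: int) -> float:
    """r-th sample moment about the sample mean: M_r = (1/n) * sum((X_i - Xbar) ** r)."""
    xbar = raw_moment(xs, 1)
    return sum((x - xbar) ** r for x in xs) / len(xs)


def sample_variance(xs: Sequence[float]) -> float:
    """S^2 with divisor n - 1; equals M_2 * n / (n - 1)."""
    n = len(xs)
    return central_moment(xs, 2) * n / (n - 1)


def simulated_var_of_mean(sigma: float, n: int, reps: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo estimate of Var[Xbar_n] for samples of size n from N(0, sigma^2)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n)) for _ in range(reps)]
    return statistics.pvariance(means)


xs = [2.0, 4.0, 4.0, 5.0, 7.0]
print(raw_moment(xs, 1), central_moment(xs, 2), sample_variance(xs))
# Theory for sigma = 2, n = 25: Var[Xbar_n] = sigma^2 / n = 0.16
print(simulated_var_of_mean(sigma=2.0, n=25))
```

$M_2$ divides by n while $S^2$ divides by n − 1, so the two differ by the factor n/(n − 1). For the five values shown they are 2.64 and 3.3.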
Estimating the mean
• Given a random sample $x_1, x_2, \ldots, x_n$ from a probability density function $f(\cdot)$ with unknown mean μ and finite variance σ², we want to estimate the mean using the random sample.
• Using only a finite number of values of X (a random sample of size n), can any reliable inferences be made about E(X), the average of an infinite number of values of X?
• Will the estimate be more reliable if the size of the random sample is larger?

R-program demonstration
[Figure: Standard deviation of sample means w.r.t. sample size. Fitted power curve $y = 19.938x^{-0.4998}$, $R^2 = 0.9995$. What is the theoretical basis? $Y = f(x) = ?$]
[Figure: Histograms of sample mean and sample standard deviation, ns = 30]
[Figure: Histograms of sample mean and sample standard deviation, ns = 5000]

Weak Law of Large Numbers (WLLN)
• Let $f(\cdot)$ be a density with mean μ and variance σ², and let $\bar{X}_n$ be the sample mean of a random sample of size n from $f(\cdot)$. Let ε and δ be any two specified numbers satisfying ε > 0 and 0 < δ < 1. If n is any integer greater than $\sigma^2 / (\varepsilon^2 \delta)$, then
$$P[-\varepsilon < \bar{X}_n - \mu < \varepsilon] \ge 1 - \delta$$

Recall the theorem

• (Example 1) Suppose that some distribution with an unknown mean has its variance equal to 1. How large a random sample must be taken such that the probability will be at least 0.95 that the sample mean $\bar{X}_n$ will lie within 0.5 of the population mean?
$\sigma^2 = 1$, $\varepsilon = 0.5$, $1 - \delta = 0.95 \Rightarrow \delta = 0.05$
$$n > \frac{1}{(0.05)(0.5)^2} = 80$$
• (Example 2) How large a random sample must be taken in order that you are at least 99% certain that $\bar{X}_n$ is within 0.5σ of μ?
$\varepsilon = 0.5\sigma$, $1 - \delta = 0.99 \Rightarrow \delta = 0.01$
$$n > \frac{\sigma^2}{(0.01)(0.5\sigma)^2} = 400$$
What if we know in advance that the random sample is to be drawn from a normal distribution?
A much smaller sample size is required if the distribution is known in advance.

The Central Limit Theorem
• Let $f(\cdot)$ be a density with mean μ and finite variance σ². Let $\bar{X}_n$ be the sample mean of a random sample of size n from $f(\cdot)$. Then the distribution of
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}$$
approaches the standard normal distribution as n approaches infinity.
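Examples 1 and 2 above rely on Chebyshev's inequality, which holds for any distribution. When the population is known to be normal, $\bar{X}_n \sim N(\mu, \sigma^2/n)$ exactly. Then $P(|\bar{X}_n - \mu| < \varepsilon) \ge 1 - \delta$ requires only $n \ge \sigma^2 z_{1-\delta/2}^2 / \varepsilon^2$, where $z_{1-\delta/2}$ is the standard normal quantile.

The Python sketch below compares the two sample sizes and adds a small check of the CLT standardization $Z_n$. The function names, the exponential population (mean 1, variance 1) and the ±1.96 interval are illustrative assumptions, not the slides' R programs. `chebyshev_sample_size` returns the smallest n with $n \ge \sigma^2/(\varepsilon^2\delta)$, which is where Chebyshev's bound first reaches 1 − δ.

```python
import math
import random
import statistics
from statistics import NormalDist


def chebyshev_sample_size(sigma2: float, eps: float, delta: float) -> int:
    """Smallest n with sigma^2 / (n * eps^2) <= delta, so that by Chebyshev's
    inequality P(|Xbar_n - mu| < eps) >= 1 - delta for any distribution."""
    bound = sigma2 / (eps ** 2 * delta)
    return math.ceil(bound - 1e-9)  # small tolerance guards against round-off


def normal_sample_size(sigma2: float, eps: float, delta: float) -> int:
    """Smallest n with P(|Xbar_n - mu| < eps) >= 1 - delta when Xbar_n ~ N(mu, sigma^2 / n)."""
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)  # two-sided standard normal quantile
    return math.ceil(sigma2 * z ** 2 / eps ** 2 - 1e-9)


def clt_coverage(n: int, reps: int = 10_000, seed: int = 7) -> float:
    """Fraction of standardized means Z_n = (Xbar_n - mu) / (sigma / sqrt(n)) in [-1.96, 1.96]
    for samples from an exponential population with mean 1 and variance 1 (assumed)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        z = (xbar - mu) / (sigma / math.sqrt(n))
        if -1.96 <= z <= 1.96:
            hits += 1
    return hits / reps


# Example 1 (sigma^2 = 1, eps = 0.5, delta = 0.05) and Example 2 (eps = 0.5 sigma, delta = 0.01)
print(chebyshev_sample_size(1.0, 0.5, 0.05), chebyshev_sample_size(1.0, 0.5, 0.01))
print(normal_sample_size(1.0, 0.5, 0.05), normal_sample_size(1.0, 0.5, 0.01))
print(clt_coverage(n=50))  # close to 0.95 if the normal approximation is adequate
```

With the inputs of Examples 1 and 2, the Chebyshev sizes are 80 and 400, while the normal-theory sizes are 16 and 27. This illustrates the remark that a much smaller sample suffices when the distribution is known.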
• The importance of the CLT lies in the fact that the mean $\bar{X}_n$ of a random sample from any distribution with finite variance σ² and mean μ is, for large n, approximately distributed as a normal random variable with mean μ and variance σ²/n:
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \overset{\text{approx.}}{\sim} N(0, 1), \qquad \bar{X}_n \overset{\text{approx.}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right)$$

R-program demonstration - Central Limit Theorem

Sampling distributions
• Given random samples from certain probability densities, we are often interested in knowing the probability densities of sample statistics.
– Poisson distribution
– Exponential distribution
– Normal distribution
– Chi-square distribution
– Standard normal and chi-square distributions
– Student's t-distribution

Poisson distribution
Exponential distribution

Chi-squared distribution
$$f_X(x; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{(k/2) - 1} e^{-x/2}\, I_{[0, \infty)}(x), \qquad k = 1, 2, \ldots$$
$$E[X] = k, \qquad \mathrm{Var}[X] = 2k, \qquad m_X(t) = (1 - 2t)^{-k/2} \ \text{ for } t < 1/2$$
• The chi-squared distribution is a special case of the gamma distribution with shape parameter α = k/2 and scale parameter β = 2.

Normal distribution
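The chi-squared formulas above can be checked numerically. This Python sketch evaluates the density on the log scale to avoid overflow in Γ. It then does three things:

- It confirms with a midpoint-rule integral that the density has total mass 1, mean k and variance 2k.
- It compares the density with the gamma density of shape k/2 and scale 2.
- Turning to normal samples, it simulates $Q = (n-1)S^2/\sigma^2$. By a standard result (assumed here, not taken from the slides), Q follows a chi-squared distribution with n − 1 degrees of freedom.

The function names, integration grid, σ = 3 and simulation settings are illustrative choices.

```python
import math
import random
import statistics


def chi2_pdf(x: float, k: int) -> float:
    """Chi-squared density with k degrees of freedom, evaluated on the log scale."""
    if x <= 0.0:
        return 0.0  # support is [0, inf); the single point x = 0 does not affect integrals
    log_f = (k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2.0) - math.lgamma(k / 2)
    return math.exp(log_f)


def gamma_pdf(x: float, shape: float, scale: float) -> float:
    """Gamma density x**(shape - 1) * exp(-x / scale) / (Gamma(shape) * scale**shape)."""
    if x <= 0.0:
        return 0.0
    log_f = (shape - 1) * math.log(x) - x / scale - math.lgamma(shape) - shape * math.log(scale)
    return math.exp(log_f)


def chi2_moments_numeric(k: int, upper: float = 200.0, steps: int = 200_000):
    """Midpoint-rule estimates of (total mass, mean, variance) of the chi-squared density."""
    h = upper / steps
    m0 = m1 = m2 = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        w = chi2_pdf(x, k) * h  # probability mass of the small cell around x
        m0 += w
        m1 += x * w
        m2 += x * x * w
    return m0, m1, m2 - m1 ** 2


def scaled_sample_variances(n: int, sigma: float, reps: int = 20_000, seed: int = 3):
    """Mean and variance of Q = (n - 1) S^2 / sigma^2 over simulated N(0, sigma^2) samples."""
    rng = random.Random(seed)
    qs = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        qs.append((n - 1) * s2 / sigma ** 2)
    return statistics.fmean(qs), statistics.variance(qs)


k = 5
print(chi2_moments_numeric(k))                       # approximately (1, k, 2k)
print(chi2_pdf(3.0, k), gamma_pdf(3.0, k / 2, 2.0))  # the two densities agree
print(scaled_sample_variances(n=k + 1, sigma=3.0))   # approximately (k, 2k)
```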
For a random sample from a normal distribution, the sample mean and sample standard deviation are independently distributed. [This holds only for the normal distribution.]

Chi-square distribution
F distribution with degrees of freedom m and n
$$f_X(x) = \frac{\Gamma[(m+n)/2]}{\Gamma(m/2)\,\Gamma(n/2)} \left(\frac{m}{n}\right)^{m/2} \frac{x^{(m-2)/2}}{[1 + (m/n)x]^{(m+n)/2}}\, I_{(0, \infty)}(x)$$

Standard normal and chi-square distributions

Student's t-distribution
Student's t distribution with k degrees of freedom
As the number of degrees of freedom increases, the Student's t distribution approaches the standard normal distribution.
• The "Student's" distribution was published in 1908 by W. S. Gosset. Gosset, however, was employed at a brewery that forbade the publication of research by its staff members. To circumvent this restriction, Gosset used the name "Student", and consequently the distribution was named "Student's t-distribution".
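The F density above can be evaluated directly in a few lines. So can the Student's t density with k degrees of freedom, $f(t) = \frac{\Gamma((k+1)/2)}{\sqrt{k\pi}\,\Gamma(k/2)}\left(1 + \frac{t^2}{k}\right)^{-(k+1)/2}$ (a standard formula, assumed here).

This Python sketch checks numerically that the F(5, 10) density integrates to 1 and has mean n/(n − 2) = 1.25. It also measures how far the t density is from the standard normal density as k grows, illustrating the convergence stated above. The function names, degrees of freedom and grids are illustrative assumptions.

```python
import math
from typing import Callable


def f_pdf(x: float, m: int, n: int) -> float:
    """F density with m and n degrees of freedom (zero outside (0, inf)), on the log scale."""
    if x <= 0.0:
        return 0.0
    log_c = (math.lgamma((m + n) / 2) - math.lgamma(m / 2) - math.lgamma(n / 2)
             + (m / 2) * math.log(m / n))
    return math.exp(log_c + ((m - 2) / 2) * math.log(x) - ((m + n) / 2) * math.log1p((m / n) * x))


def t_pdf(t: float, k: int) -> float:
    """Student's t density with k degrees of freedom (standard formula, assumed here)."""
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)
    return math.exp(log_c - ((k + 1) / 2) * math.log1p(t * t / k))


def std_normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def integrate(func: Callable[[float], float], lower: float, upper: float,
              steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(func(lower + (i + 0.5) * h) for i in range(steps))


def max_gap_to_normal(k: int) -> float:
    """Largest |t density - standard normal density| on the grid -6.00, -5.99, ..., 6.00."""
    grid = [i / 100 for i in range(-600, 601)]
    return max(abs(t_pdf(z, k) - std_normal_pdf(z)) for z in grid)


print(integrate(lambda x: f_pdf(x, 5, 10), 0.0, 200.0))  # close to 1
for k in (1, 5, 30, 100):
    print(k, max_gap_to_normal(k))  # the gap shrinks as k grows
```

Both densities are computed through `math.lgamma` and `math.log1p`, so the Γ terms do not overflow for large degrees of freedom.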
Order statistics

Extremal Types Theorem
• Reference source

Gumbel Distribution (Extreme Value Type I)