Lecture 7: Sampling and Sampling Distributions

Collecting Data: Sampling
In finance, economics, or any other field, it is usually impossible to access data on the entire population, mainly because of money and time restrictions.

• Population: the complete set of all items about which information is desired. N = population size, which can be very large or even infinite.
• Sample: an observed SUBSET of the POPULATION. n = sample size, with n < N.
• Random sampling: the procedure of selecting n objects from a population of N so that each member of the population has an equal chance (probability) of selection.
• Parameter: a specific characteristic of the population, such as the mean, variance, or standard deviation.
• Sample statistic: a specific characteristic of a sample.

If the data set is the entire population, the population mean and variance are:
$$\mu = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$

If the data set is a sample, the sample mean and variance are:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Sampling Distributions of SAMPLE MEANS
Different samples may result in different sample means.

Example: consider the population 1, 2, 3, 4, so N = 4, μ = 2.5 and σ² = 1.25.
Consider all possible samples of size 2. The total number of possible samples is
$$\binom{4}{2} = \frac{4!}{2!\,(4-2)!} = 6$$

Each sample has an equal chance of occurrence, so the selection probability of each sample is 1/6:

Sample   Elements   Sample mean          Probability
1        1, 2       (1 + 2)/2 = 1.5      1/6
2        1, 3       (1 + 3)/2 = 2        1/6
3        1, 4       (1 + 4)/2 = 2.5      1/6
4        2, 3       (2 + 3)/2 = 2.5      1/6
5        2, 4       (2 + 4)/2 = 3        1/6
6        3, 4       (3 + 4)/2 = 3.5      1/6

As the table shows, different samples may result in different sample means!
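The enumeration above is small enough to reproduce by brute force. The following is a minimal Python sketch (standard library only; variable names are illustrative) that lists the six equally likely samples of size 2 from {1, 2, 3, 4}. It also averages the six sample means and their squared deviations from μ, so the results can be compared with E(x̄) = μ and with the finite-population formula for Var(x̄) used in this lecture.

```python
from itertools import combinations
from statistics import mean, pvariance

population = [1, 2, 3, 4]
N, n = len(population), 2

mu = mean(population)            # population mean: 2.5
sigma2 = pvariance(population)   # population variance (divides by N): 1.25

# Every sample of size n drawn without replacement; each is equally likely.
samples = list(combinations(population, n))   # C(4, 2) = 6 samples
sample_means = [mean(s) for s in samples]

for s, xbar in zip(samples, sample_means):
    print(s, xbar, "probability 1/6")

# Mean and variance of the sampling distribution of x-bar (each weight is 1/6)
e_xbar = mean(sample_means)
var_xbar = mean([(xbar - mu) ** 2 for xbar in sample_means])

# Finite-population formula: (sigma^2 / n) * (N - n) / (N - 1)
var_formula = sigma2 / n * (N - n) / (N - 1)

print(e_xbar, round(var_xbar, 4), round(var_formula, 4))   # 2.5 0.4167 0.4167
```

Averaging over all samples and applying the formula give the same variance, about 0.42.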
Let us see what the average of the sample means is, i.e., the expected value of x̄:
$$E(\bar{x}) = \sum \bar{x}\,P(\bar{x}) = 1.5\cdot\tfrac{1}{6} + 2\cdot\tfrac{1}{6} + \dots + 3.5\cdot\tfrac{1}{6} = 2.5 = \mu$$
We can generalize this result as:
$$E(\bar{x}) = \sum \bar{x}\,P(\bar{x}) = \mu$$

• What is the variance of the sample means?
$$\mathrm{Var}(\bar{x}) = E(\bar{x}-\mu)^2 = \sum (\bar{x}-\mu)^2\,P(\bar{x}) = (1.5-2.5)^2\cdot\tfrac{1}{6} + \dots + (3.5-2.5)^2\cdot\tfrac{1}{6} \approx 0.42$$

• What is the relation between Var(x̄) and σ²?
• If the population is small (finite, and sampled without replacement), then
$$\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$$
Here (N - n)/(N - 1) is the correction factor for a finite population.
• If the population is large, then
$$\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}$$
• In our example:
$$\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} = \frac{1.25}{2}\cdot\frac{4-2}{4-1} = \frac{1.25}{2}\cdot\frac{2}{3} \approx 0.42$$

What is the distribution of the sample mean?
• In general, when the population is large, we have learned that
$$E(\bar{x}) = \mu_{\bar{x}} = \mu, \qquad \mathrm{Var}(\bar{x}) = \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$$
• Consider our example:
[Figure: probability of each sample mean in the example (x-axis: sample mean, 1.5 to 3.5; y-axis: probability)]
The figure shows that the distribution of the sample mean is symmetric around the mean and looks similar to a Normal distribution.

A Bunch of Proofs and the Central Limit Theorem
• Consider a population composed of elements X_1, X_2, …, X_N with mean μ and variance σ².
• 1) When we pick a RANDOM sample of size n, X_1, X_2, …, X_n, these random variables are INDEPENDENT of each other. The sample mean is nothing but a linear combination of independent random variables, so:
$$E(\bar{x}) = E\Big(\frac{1}{n}\sum_{i=1}^{n} X_i\Big) = \frac{1}{n}E(X_1 + X_2 + \dots + X_n) = \frac{1}{n}\big[E(X_1) + E(X_2) + \dots + E(X_n)\big] = \frac{1}{n}(n\mu) = \mu$$
Because the X_i are independent, the variance of their sum is the sum of their variances:
$$\mathrm{Var}(\bar{x}) = \mathrm{Var}\Big(\frac{1}{n}\sum_{i=1}^{n} X_i\Big) = \Big(\frac{1}{n}\Big)^2 \mathrm{Var}(X_1 + \dots + X_n) = \Big(\frac{1}{n}\Big)^2 (n\sigma^2) = \frac{\sigma^2}{n}$$
• 2) If the population X_1, X_2, …, X_N with mean μ and variance σ² is Normally distributed, then $\bar{x} \sim N(\mu, \sigma^2/n)$.
• 3) The Central Limit Theorem (CLT) generalizes this property:
• IF the SAMPLE SIZE n is LARGE (n ≥ 30), then $\bar{x} \sim N(\mu, \sigma^2/n)$ approximately.
• Note: the true distribution of X does not have to be known, nor does it have to be Normal.

How do we benefit from the CLT?
• Example 1: The weights of people traveling by air in some region have a mean of 163 pounds and a standard deviation of 18 pounds. What is the probability that the average weight of 36 people will be greater than 167 pounds?
• Information about the population: μ = 163 pounds, σ = 18 pounds, n = 36 ≥ 30.
• By the CLT, $\bar{x} \sim N(\mu_{\bar{x}} = 163,\ \sigma^2/n = 18^2/36 = 9)$, so $\sigma_{\bar{x}} = 3$:
$$P(\bar{x} > 167) = P\Big(\frac{\bar{x}-\mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{167-163}{\sqrt{18^2/36}}\Big) = P(Z > 1.33) = 1 - P(Z \le 1.33) = 0.0918$$

ACCEPTANCE INTERVALS
• When we observe a sample mean x̄, we know that it comes from a Normal distribution when n is large: $\bar{x} \sim N(\mu, \sigma^2/n)$.
• So we can use the Empirical Rule.
EMPIRICAL RULE: For many LARGE populations, the empirical rule provides the following approximations (in our case, with mean $\mu_{\bar{x}}$ and standard deviation $\sigma_{\bar{x}}$):
• Approximately 68% of the observations are in the interval $\mu_{\bar{x}} \pm \sigma_{\bar{x}}$.
• Approximately 95% of the observations are in the interval $\mu_{\bar{x}} \pm 2\sigma_{\bar{x}}$.
• Almost all of the observations are in the interval $\mu_{\bar{x}} \pm 3\sigma_{\bar{x}}$.
The third rule says that x̄ will be in the interval $[\mu_{\bar{x}} - 3\sigma_{\bar{x}},\ \mu_{\bar{x}} + 3\sigma_{\bar{x}}]$ with almost 100% probability.
For the Normal distribution we can find the EXACT boundaries of the confidence intervals.

Confidence Intervals
• Example: Suppose we are told that health insurance claims have a historical mean of $4000 and a standard deviation of $2000. You take a random sample of 100 claims. What is the 95% confidence interval for the sample mean? Interpret the result.
• μ = $4000, σ = $2000, n = 100. Here we will find the 95% confidence interval.
• In general, the 100(1 - α)% interval is:
$$\mu \pm z_{\alpha/2}\,\sigma_{\bar{x}}$$
Here $z_{\alpha/2}$ is the standard Normal table value whose upper-tail probability is α/2.
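The Normal-table look-ups used so far (the tail probability in Example 1, the exact coverage behind the empirical rule, and the critical value $z_{\alpha/2}$ just defined) can also be computed with `statistics.NormalDist` from Python's standard library (Python 3.8+). A minimal sketch, with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

std = NormalDist()                       # standard Normal Z ~ N(0, 1)

# Example 1: mu = 163 lb, sigma = 18 lb, n = 36 (n >= 30, so the CLT applies)
mu, sigma, n = 163, 18, 36
se = sigma / sqrt(n)                     # sigma_xbar = 18 / 6 = 3
xbar_dist = NormalDist(mu, se)           # approximate distribution of x-bar

p_exact = 1 - xbar_dist.cdf(167)         # P(x-bar > 167) with unrounded z = 4/3
z = round((167 - mu) / se, 2)            # z rounded to 1.33, as when using a table
p_table = 1 - std.cdf(z)                 # P(Z > 1.33)
print(round(p_exact, 4), round(p_table, 4))

# Exact Normal coverage behind the empirical rule: P(-k <= Z <= k)
for k in (1, 2, 3):
    print(k, round(std.cdf(k) - std.cdf(-k), 4))

# Critical value z_{alpha/2}: the point with upper-tail probability alpha/2
alpha = 0.05
z_crit = std.inv_cdf(1 - alpha / 2)
print(round(z_crit, 2))                  # 1.96
```

Rounding z to 1.33 reproduces the table answer 0.0918; the unrounded z = 4/3 gives about 0.0912. The coverage values come out near 0.6827, 0.9545 and 0.9973, which is where the 68% / 95% / "almost all" approximations come from.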
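In our case α = 1 - 0.95 = 0.05 and α/2 = 0.025, thus $z_{0.025} = 1.96$.
• P(-z ≤ Z ≤ z) = 0.95 with z = 1.96, and $Z = (\bar{x} - \mu_{\bar{x}})/\sigma_{\bar{x}}$, so
$$P\Big(-1.96 \le \frac{\bar{x}-\mu_{\bar{x}}}{\sigma_{\bar{x}}} \le 1.96\Big) = 0.95 \;\Rightarrow\; P(\mu_{\bar{x}} - 1.96\,\sigma_{\bar{x}} \le \bar{x} \le \mu_{\bar{x}} + 1.96\,\sigma_{\bar{x}}) = 0.95$$
• Thus, with 95% probability (confidence), we can say that the sample mean lies in
$$\mu_{\bar{x}} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \Big[4000 - 1.96\cdot\frac{2000}{\sqrt{100}},\ 4000 + 1.96\cdot\frac{2000}{\sqrt{100}}\Big] = [3608,\ 4392]$$

Sampling Distributions of Sample Variance
• The variance of the population is
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
• The sample variance is
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
• If n is a small proportion of N, i.e., n/N is small (N is large), then $E(s^2) = \sigma^2$.

CONFIDENCE INTERVALS
• The 100(1 - α)% interval for the sample mean x̄ is $\mu_{\bar{X}} \pm z_{\alpha/2}\,\sigma_{\bar{X}}$:
$$P(\mu_{\bar{X}} - z_{\alpha/2}\sigma_{\bar{X}} \le \bar{X} \le \mu_{\bar{X}} + z_{\alpha/2}\sigma_{\bar{X}}) = 1 - \alpha \quad (= 0.95 \text{ for } \alpha = 0.05)$$
since $\bar{X} \sim N(\mu_X, \sigma_X^2/n)$.
• NOTE: we assume that the n observations are taken from a NORMALLY distributed POPULATION.
• Equivalently, $P(-z_{\alpha/2}\sigma_{\bar{X}} \le \bar{X} - \mu_X \le z_{\alpha/2}\sigma_{\bar{X}}) = 1 - \alpha$.
• The 100(1 - α)% confidence interval for the population mean μ is $\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ if we know σ:
$$P(\bar{x} - z_{\alpha/2}\sigma_{\bar{x}} \le \mu \le \bar{x} + z_{\alpha/2}\sigma_{\bar{x}}) = 1 - \alpha \;\Rightarrow\; P\Big(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\Big) = 1 - \alpha$$
• If we know the population standard deviation σ, we plug it into the CONFIDENCE INTERVAL and USE the standard NORMAL table to find $z_{\alpha/2}$.
• If we do NOT know σ, we can use the SAMPLE VARIANCE s² as an estimator of the population variance:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
As we know, s² is a consistent estimator, i.e., $s^2 \to \sigma^2$ as n grows.
• If we do NOT know σ and use the sample variance s² as its estimator, then we do NOT use the standard Normal distribution but Student's t distribution with n - 1 degrees of freedom to find $t_{\alpha/2,\,n-1}$:
$$P\Big(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\Big) = 1 - \alpha$$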
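The interval for the sample mean in the insurance example, and the role of the n - 1 divisor, can be checked numerically. In this minimal sketch, `acceptance_interval` is a hypothetical helper name; `statistics.variance` divides by n - 1 and `statistics.pvariance` divides by N. The second part averages s² over all samples of size 2 from the earlier population {1, 2, 3, 4}: because n/N = 1/2 is not small there, the average of s² equals σ²·N/(N - 1) ≈ 1.667 rather than σ² = 1.25, which is why E(s²) = σ² is stated for small n/N.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist, pvariance, variance


def acceptance_interval(mu, sigma, n, conf=0.95):
    """Hypothetical helper: mu +/- z_{alpha/2} * sigma / sqrt(n), the interval
    that contains the sample mean with probability `conf` when x-bar is Normal."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for conf = 0.95
    half_width = z * sigma / sqrt(n)          # z times the standard error of x-bar
    return mu - half_width, mu + half_width


# Health insurance claims: mu = $4000, sigma = $2000, n = 100
low, high = acceptance_interval(4000, 2000, 100)
print(round(low), round(high))                # 3608 4392

# s^2 (divide by n - 1) versus sigma^2 (divide by N) for the population {1, 2, 3, 4}
population = [1, 2, 3, 4]
N = len(population)
sigma2 = pvariance(population)                # 1.25
s2_values = [variance(s) for s in combinations(population, 2)]
e_s2 = sum(s2_values) / len(s2_values)        # average of s^2 over all 6 samples
print(sigma2, round(e_s2, 4), round(sigma2 * N / (N - 1), 4))
```

As N grows with n fixed, the factor N/(N - 1) tends to 1 and the average of s² approaches σ².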
• NOTE: we assume that the n observations are taken from a NORMALLY distributed POPULATION.
• We cannot use the N(0, 1) table, since the population variance is NOT known.

Some Properties of Student's t Distribution
• It is symmetric around its mean, 0.
• It approaches the Normal distribution as n increases (in practice, for n > 30).

Examples
• Example 8.3 from the textbook (population variance known): Suppose that shopping times for customers at a local grocery store are normally distributed. A random sample of 16 shoppers in the local grocery store had a mean of 25 minutes. Assume σ = 6 minutes. Find the standard error of the sample mean, the margin of error, and the width of a 95% confidence interval for the population mean.
• Standard error of the sample mean = standard deviation / √n: $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 6/\sqrt{16} = 1.5$
• Margin of error: $z_{\alpha/2}\,\sigma_{\bar{x}} = z_{0.05/2}(1.5) = z_{0.025}(1.5) = 1.96(1.5) = 2.94$
• Width of the 95% confidence interval = 2 × margin of error = 2 × 2.94 = 5.88
• 95% confidence interval: $\bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}} = 25 \pm 2.94$, i.e., $P(25 - 2.94 \le \mu \le 25 + 2.94) = P(22.06 \le \mu \le 27.94) = 0.95$

• Example 8.5 from the textbook (population variance NOT known): Gasoline prices rose drastically during the early years of this century. Suppose that a recent study was conducted using truck drivers with equivalent years of experience to test run 24 trucks of a particular model over the same highway. Estimate the population mean fuel consumption for this truck model with 90% confidence if the fuel consumption, in miles per gallon, for these 24 trucks was: 15.5, 21, 18.5, 19.3, 19.7, …, 21.8
• What do we know about the population? Nothing: we do not know the population variance. So, with n = 24, we will use the sample variance to estimate the population variance.
• Note: we must assume that the population is Normal. How can we test this assumption?
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{15.5 + \dots + 21.8}{24} = 18.68$$
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{(15.5 - 18.68)^2 + \dots + (21.8 - 18.68)^2}{24 - 1} = 2.873, \qquad s = 1.695$$
• The 90% confidence interval (α = 0.10, α/2 = 0.05) is:
$$P\Big(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\Big) = 0.90$$
$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 18.68 \pm t_{0.05,\,24-1}\frac{1.695}{\sqrt{24}} = 18.68 \pm (1.714)(0.346) = 18.68 \pm 0.5930 = [18.09,\ 19.27]$$
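Both textbook examples follow the same pattern: point estimate ± critical value × standard error. Below is a minimal sketch with two hypothetical helpers, `z_interval` (σ known) and `t_interval` (σ unknown). Python's standard library has no Student's t quantile function, so the sketch takes $t_{0.05,23} = 1.714$ from the t table as an input; with SciPy installed, `scipy.stats.t.ppf(0.95, 23)` would supply it instead. Example 8.5 uses only the summary statistics reported above, since the full data list is elided.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """Hypothetical helper: CI for mu when sigma is known,
    xbar +/- z_{alpha/2} * sigma / sqrt(n). Returns (se, margin, (low, high))."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sigma / sqrt(n)            # standard error of the sample mean
    margin = z * se                 # margin of error
    return se, margin, (xbar - margin, xbar + margin)


def t_interval(xbar, s, n, t_crit):
    """Hypothetical helper: CI for mu when sigma is unknown,
    xbar +/- t_{alpha/2, n-1} * s / sqrt(n). t_crit comes from a t table."""
    se = s / sqrt(n)
    margin = t_crit * se
    return se, margin, (xbar - margin, xbar + margin)


# Example 8.3: xbar = 25 min, sigma = 6 min, n = 16, 95% confidence
se, margin, (lo, hi) = z_interval(25, 6, 16)
print(se, round(margin, 2), round(2 * margin, 2), round(lo, 2), round(hi, 2))
# -> 1.5 2.94 5.88 22.06 27.94

# Example 8.5: n = 24, xbar = 18.68, s = 1.695, 90% confidence, t_{0.05, 23} = 1.714
se, margin, (lo, hi) = t_interval(18.68, 1.695, 24, t_crit=1.714)
print(round(se, 3), round(margin, 3), round(lo, 2), round(hi, 2))
# -> 0.346 0.593 18.09 19.27
```

Keeping the critical value as a parameter makes the only difference between the two cases explicit: the z-interval uses the standard Normal quantile, while the t-interval uses the t quantile with n - 1 degrees of freedom.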