IIMC Long Duration Executive Education
Executive Programme in Business Management
Statistics for Managerial Decisions
Sampling Theory
Prof. Saibal Chattopadhyay, IIM Calcutta

A Brief Review
• Uncertainty and Randomness: Theory of Probability
• Decision Making Under Uncertainty: Utility Theory
• Random Variables & Probability Distributions: Binomial, Poisson, Normal, Exponential
• Joint Distribution of Two Random Variables: Marginal Distributions, Mean, Variance, Covariance, Correlation Coefficient, Independence of Random Variables
• Regression Approach to the Analysis of Bivariate Data: Curve Fitting and the Least Squares Principle

Sampling Theory
• Census vs. Sampling
• Judgment Sampling vs. Probability Sampling

Different Probability Sampling Procedures
• Simple Random Sampling: With Replacement (SRSWR) & Without Replacement (SRSWOR)
• Stratified Random Sampling
• Systematic Sampling

Preliminary Concepts
• Finite population: N units having values $Y_1, Y_2, \ldots, Y_N$
• Parameter: a function of the population values
Examples:
• Population mean: $\mu = \sum_{i=1}^{N} Y_i / N$
• Population SD: $\sigma = \sqrt{\sum_{i=1}^{N} (Y_i - \mu)^2 / N}$
• Population proportion: $P$

Simple Random Sampling With Replacement (SRSWR)
• n units are drawn from N units
• Each unit drawn is returned before the next draw
• All possible choices are equally likely
• There are $N^n$ possible (ordered) samples of size n
• Each sample has probability $1/N^n$
• The same unit may appear more than once in the same sample
• The values of the sampled units are random variables!

SRSWR
Denote the sample values as $y_1, y_2, \ldots, y_n$. Consider $y_1$, the first sample value. It could be any one of the N values of the population: $y_1$ takes each of the values $Y_1, Y_2, \ldots, Y_N$ with probability $1/N$. Thus
$P(y_1 = Y_1) = P(y_1 = Y_2) = \cdots = P(y_1 = Y_N) = 1/N$.

SRSWR
What about $y_2$? Sampling is done with replacement, so the composition of the population is unchanged; the second sample value $y_2$ is therefore identically distributed as $y_1$. The same is true for all subsequent sample values.
• Sample values are identically distributed:
$P(y_i = Y_1) = P(y_i = Y_2) = \cdots = P(y_i = Y_N) = 1/N$, for all $i = 1, 2, \ldots, n$.

SRSWR
Are the sample values independent?
For $i \neq j$, $P(y_i = Y_1 \text{ and } y_j = Y_2) = 1/N^2$, while $P(y_i = Y_1) = 1/N$ and $P(y_j = Y_2) = 1/N$. The joint probability equals the product of the marginal probabilities, so $y_i$ and $y_j$ are independent. The same holds for all pairs of values.
• Sample values are identically distributed
• Sample values are independent and identically distributed (IID) random variables

SRS Without Replacement (SRSWOR)
• n units are drawn at random from N units
• A unit once drawn is not returned before the next unit is drawn
• All possible choices are equally likely
• There are ${}^{N}C_{n}$ possible samples of size n
• Each sample has probability $1/{}^{N}C_{n}$
• The units in a sample are all distinct
• The values of the sampled units are random variables!

SRSWOR
Are the sample units still identically distributed?
For $y_1$ the distribution is the same as in SRSWR. What about $y_2$?
$P(y_2 = Y_1 \mid y_1 = Y_i) = 0$ if $i = 1$, and $1/(N-1)$ otherwise. Hence
$P(y_2 = Y_1) = \frac{1}{N} \cdot 0 + (N-1) \cdot \frac{1}{N} \cdot \frac{1}{N-1} = \frac{1}{N}$, the same as in SRSWR!
• Yes; the sample units are identically distributed

SRSWOR
Are the sample units still independent?
$P(y_1 = Y_1, y_2 = Y_2) = 1/[N(N-1)]$, but $P(y_1 = Y_1) = 1/N = P(y_2 = Y_2)$, and $1/[N(N-1)] \neq 1/N^2$.
So $y_1$ and $y_2$ are not independent; the same is true for any two sample values.
• No: the sample units are not independent. What about the nature of their dependence?

SRSWOR
Are the sampled units uncorrelated?
• No; the covariance between any two of them is $-\sigma^2/(N-1)$

What is a Statistic?
• A function of the sample values
Examples:
• Sample mean
• Sample SD
• Sample proportion
• A statistic is a random variable
• The probability distribution of a statistic is called its sampling distribution
• The mean of a statistic is called its expectation
• The SD of a statistic is called its standard error (SE)
• Role of the SE: it compares the efficacy of different sampling procedures; the smaller the SE, the better the sampling procedure

Sampling Distribution of the Sample Mean in Simple Random Sampling
• Finite population of size N
• Population mean $= \mu$ and SD $= \sigma$
• A random sample of size n is drawn (WR/WOR)
• The statistic is the sample mean $\bar{y}$
• Expectation $= \mu$ (both SRSWR and SRSWOR)
• SE $= \sigma/\sqrt{n}$ for SRSWR
• SE $= (\sigma/\sqrt{n}) \cdot \sqrt{\text{FPC}}$ for SRSWOR
• FPC = Finite Population Correction $= (N-n)/(N-1)$

Comparing SRSWR and SRSWOR
• For $n = 1$, FPC = 1, so SRSWR and SRSWOR are equivalent
• For $n > 1$, FPC < 1, so SRSWOR has the smaller SE and is better than SRSWR
• Limiting behaviour: as N becomes large with n fixed, FPC approaches 1, so the two sampling methods are asymptotically equivalent. Intuitively obvious!

Can we use SRS always?
• SRS is too fair!
• It ignores the typical composition of a population
Example: suppose the population is characterized by sex: males and females.
• $N_1$ males and $N_2$ females in the population
• $N_1$ ‘too large’ compared to $N_2$, say at least 80% are males
• Will an SRS be representative here?

Drawback of SRS
• Possibly not; the sample will most likely have too few females, maybe none at all!
• Not a representative sample, at least for a social survey
• We need representation of all sections of the society
• How can we ensure that? Divide the population into several parts!
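The SRS results above (identically distributed draws, the covariance $-\sigma^2/(N-1)$ under SRSWOR, and the two SE formulas with the FPC) can be checked by brute force on a tiny population, because every possible sample is equally likely and can be listed. The sketch below is illustrative only: the toy population `Y = [2, 4, 6, 8, 10]` and the helper names `pop_sd`, `sampling_distribution` and `cov_first_two_draws_wor` are hypothetical choices, not part of the course material.

```python
from itertools import combinations, permutations, product
from math import sqrt
from statistics import fmean


def pop_sd(values):
    """Population SD with divisor N, matching sigma in the slides."""
    mu = fmean(values)
    return sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def sampling_distribution(population, n, with_replacement):
    """Return (E(ybar), SE(ybar)) by enumerating every equally likely sample."""
    N = len(population)
    if with_replacement:
        # SRSWR: all N**n ordered sequences of unit labels are equally likely
        samples = product(range(N), repeat=n)
    else:
        # SRSWOR: all C(N, n) subsets of distinct units are equally likely
        samples = combinations(range(N), n)
    means = [fmean(population[i] for i in s) for s in samples]
    expectation = fmean(means)
    se = sqrt(fmean((m - expectation) ** 2 for m in means))
    return expectation, se


def cov_first_two_draws_wor(population):
    """Cov(y1, y2) under SRSWOR: average over all N(N-1) equally likely ordered pairs."""
    N = len(population)
    mu = fmean(population)
    return fmean((population[i] - mu) * (population[j] - mu)
                 for i, j in permutations(range(N), 2))


Y = [2, 4, 6, 8, 10]  # hypothetical toy population, N = 5
n = 2
N, mu, sigma = len(Y), fmean(Y), pop_sd(Y)
fpc = (N - n) / (N - 1)  # finite population correction

for label, wr, se_formula in [
    ("SRSWR", True, sigma / sqrt(n)),
    ("SRSWOR", False, sigma / sqrt(n) * sqrt(fpc)),
]:
    e, se = sampling_distribution(Y, n, wr)
    print(f"{label}: E(ybar) = {e:.4f} (mu = {mu:.4f}), "
          f"SE = {se:.4f} (formula: {se_formula:.4f})")

print(f"SRSWOR Cov(y1, y2) = {cov_first_two_draws_wor(Y):.4f}, "
      f"-sigma^2/(N-1) = {-sigma ** 2 / (N - 1):.4f}")
```

Full enumeration is only feasible for very small N and n, since the number of samples grows as $N^n$ or ${}^{N}C_{n}$; its value here is that it checks the formulas exactly rather than by simulation.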
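To put a number on the worry that an SRS may contain no females at all: under SRSWOR, ${}^{N_1}C_{n}$ of the ${}^{N}C_{n}$ equally likely samples are all-male, and under SRSWR each draw is male independently with probability $N_1/N$. The figures used below ($N = 100$, $N_1 = 80$, $n = 10$) are hypothetical, and `prob_no_females` is an illustrative helper, not from the slides.

```python
from math import comb


def prob_no_females(N, N1, n, with_replacement=False):
    """P(an SRS of size n contains no females) when N1 of the N units are males."""
    if with_replacement:
        # SRSWR: each of the n draws is independently male with probability N1/N
        return (N1 / N) ** n
    # SRSWOR: C(N1, n) of the C(N, n) equally likely samples contain only males
    # (math.comb returns 0 when n > N1, giving probability 0)
    return comb(N1, n) / comb(N, n)


# Hypothetical population: 100 units, 80 males and 20 females; sample size 10
for wr in (False, True):
    label = "SRSWR" if wr else "SRSWOR"
    print(label, round(prob_no_females(100, 80, 10, wr), 4))
```

With these numbers roughly one SRSWOR sample in ten contains no females at all, even though the expected number of females in the sample is $n N_2 / N = 2$.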
Stratified Random Sampling
The population has N units: $N_1$ of the first type (males) and $N_2 = N - N_1$ of a second type (females). Total sample size = n.
• Divide n into two parts, $n_1$ and $n_2$
• Draw a sample of $n_1$ units from the $N_1$ units
• Independently, draw a sample of $n_2$ units from the $N_2$ units
• Use SRS for drawing the units from the subpopulations (strata)
• Combine the two sub-samples to get a stratified random sample of size $n = n_1 + n_2$

How to Choose $n_1$ and $n_2$?
Proportional allocation:
• Choose $n_1$ and $n_2$ proportional to the subpopulation sizes $N_1$ and $N_2$
• $n_1 = (n/N) \cdot N_1$ and $n_2 = (n/N) \cdot N_2$
Optimum allocation:
• Choose $n_1$ and $n_2$ proportional to $N_1\sigma_1$ and $N_2\sigma_2$, i.e. the subpopulation sizes times the subpopulation SDs $\sigma_1$ and $\sigma_2$ (Neyman allocation, assuming equal sampling cost per unit)

Systematic Sampling
• Units are arranged in a sequence
• $N = n \cdot k$; units numbered 1 to N; sample size = n
• Divide the population into n groups of k consecutive units each
• Select one unit at random from the first group (units 1 to k)
• Select every k-th unit thereafter
• There are k possible samples, each with probability $1/k$
• Gives a sample uniformly spread over the population

Central Limit Theorem
• Sampling from a normal population
• Mean $= \mu$ and SD $= \sigma$
• SRS of size n (with replacement)
• Statistic = sample mean $\bar{y}$
• Expectation $= \mu$; SE $= \sigma/\sqrt{n}$
• $Z = (\bar{y} - \mu)/(\sigma/\sqrt{n})$ is $N(0, 1)$
What happens if sampling is done from a non-normal distribution? The distribution of the sample mean is no longer normal, though the formulae for the expectation and SE are still true.
Can we say anything more? Yes, provided the sample size n is ‘large’!
How large is ‘large’? $n \geq 30$ will do!!
What happens if n is ‘large’?
• The distribution of the sample mean is approximately normal
• The approximation gets better and better as n becomes larger
• This holds regardless of the underlying distribution from which sampling is done (provided it has a finite variance)
This is the Central Limit Theorem.

Multistage Sampling Methodologies
• Generally used to counter the presence of an unknown nuisance parameter
• Used in situations where the optimal sample size required is not known a priori
• Sampling is done in two or more stages
• First stage: select a ‘small’ pilot sample of size m
• Use this pilot sample to get an estimate E of the unknown sample size
• STOP if E is less than m
• Second stage: otherwise, select a second sample of size $E - m$

Some Standard Sampling Distributions
1. Chi-Square Distribution
• n IID $N(0, 1)$ variables: $Z_1, Z_2, \ldots, Z_n$
• $Y$ = sum of squares of $Z_1, Z_2, \ldots, Z_n$ $= Z_1^2 + Z_2^2 + \cdots + Z_n^2$
• Y is chi-square with n degrees of freedom (d.f.)
• Mean = n; SD $= \sqrt{2n}$
• $(Y - n)/\sqrt{2n}$ is approximately standard normal for large n
• The distribution is positively skewed; probability tables are available

2. t-Distribution
• Z is $N(0, 1)$
• Y is chi-square with d.f. = n
• Z and Y are independently distributed
• The sampling distribution of $t = Z/\sqrt{Y/n}$ is called the t-distribution with d.f. = n
• Similar to $N(0, 1)$; approaches $N(0, 1)$ as n becomes large ($n \geq 30$); probability tables for $n < 30$ are available

3. F-Distribution
• $Y_1$ is chi-square with d.f. = $n_1$
• $Y_2$ is chi-square with d.f. = $n_2$
• $Y_1$ and $Y_2$ are independently distributed
• The sampling distribution of $F = (Y_1/n_1)/(Y_2/n_2)$ is the F-distribution with d.f. = $(n_1, n_2)$
These distributions are useful for hypothesis-testing problems when samples are available from a normal population (exactly or approximately).

References
Text Book for the Course
• Shenoy, G.V. & Pant, M. Statistical Methods in Business and Social Sciences. Macmillan India Limited.
Suggested Reading
• Mukhopadhyay, Nitis, Datta, Sujay & Chattopadhyay, Saibal. Applications of Sequential Methodologies. Marcel Dekker, New York, 2004.