IIMC Long Duration Executive Education
Executive Programme in Business Management
Statistics for Managerial Decisions

Sampling Theory
Prof. Saibal Chattopadhyay
IIM Calcutta

A Brief Review
• Uncertainty and Randomness: Theory of Probability
• Decision Making Under Uncertainty: Utility Theory
• Random Variables & Probability Distributions: Binomial, Poisson, Normal, Exponential
• Joint Distribution of Two Random Variables: Marginal Distributions, Mean, Variance, Covariance, Correlation Coefficient, Independence of Random Variables
• Regression Approach to the Analysis of Bivariate Data: Curve Fitting and the Least Squares Principle

Sampling Theory
• Census vs. Sampling
• Judgment Sampling vs. Probability Sampling

Different Probability Sampling Procedures
• Simple Random Sampling: With Replacement (SRSWR) & Without Replacement (SRSWOR)
• Stratified Random Sampling
• Systematic Sampling

Preliminary Concepts
• Finite Population: $N$ units having values $Y_1, Y_2, \ldots, Y_N$
• Parameter: a function of the population values
Examples:
• Population Mean: $\mu = \sum_{i=1}^{N} Y_i / N$
• Population SD: $\sigma = \sqrt{\sum_{i=1}^{N} (Y_i - \mu)^2 / N}$
• Population Proportion: $P$

Simple Random Sampling With Replacement (SRSWR)
• $n$ units are drawn from $N$ units
• Each unit drawn is returned before the next draw
• All possible choices are equally likely
• $N^n$ possible samples of size $n$ each
• Each sample has probability $1/N^n$
• The same unit may repeat in the same sample
• Values of sampled units are random variables!

SRSWR
Denote the sample values as $y_1, y_2, \ldots, y_n$.
Consider $y_1$ (the first sample value).
This could be any one of the $N$ values of the population.
$y_1$ takes each of the values $Y_1, Y_2, \ldots, Y_N$ with probability $1/N$.
Thus $P(y_1 = Y_1) = P(y_1 = Y_2) = \cdots = P(y_1 = Y_N) = 1/N$.

SRSWR
What about $y_2$?
Sampling is done with replacement, so the composition of the population is unchanged.
The second sample value $y_2$ is identically distributed as $y_1$.
The same is true for all subsequent sample values.
• Sample values are identically distributed:
$P(y_i = Y_1) = P(y_i = Y_2) = \cdots = P(y_i = Y_N) = 1/N$, for all $i = 1, 2, \ldots, n$.

SRSWR
Are the sample values independent?
For $i \neq j$: $P(y_i = Y_1 \text{ and } y_j = Y_2) = 1/N^2$, while $P(y_i = Y_1) = 1/N$ and $P(y_j = Y_2) = 1/N$.
⇒ $y_i$ and $y_j$ are independent.
The same is true for all pairs of values.
• Sample values are identically distributed
• They are independent and identically distributed (IID) random variables

SRS Without Replacement (SRSWOR)
• $n$ units are drawn at random from $N$ units
• A unit once drawn is not returned before drawing the next unit
• All possible choices are equally likely
• $\binom{N}{n}$ possible samples of size $n$ each
• Each sample has probability $1/\binom{N}{n}$
• Units in a sample are all distinct
• Values of sampled units are random variables!

SRSWOR
Are the sample units still identically distributed?
For $y_1$ the distribution is the same as in SRSWR.
What about $y_2$?
$P(y_2 = Y_1 \mid y_1 = Y_i) = 0$ if $i = 1$, and $= 1/(N-1)$ otherwise.
$P(y_2 = Y_1) = \frac{1}{N} \cdot 0 + (N-1) \cdot \frac{1}{N} \cdot \frac{1}{N-1} = \frac{1}{N}$, the same as in SRSWR!
• Yes; units are identically distributed

SRSWOR
Are the sample units still independent?
$P(y_1 = Y_1, y_2 = Y_2) = \frac{1}{N(N-1)}$, but $P(y_1 = Y_1) \cdot P(y_2 = Y_2) = \frac{1}{N} \cdot \frac{1}{N} = \frac{1}{N^2}$.
⇒ $y_1$ and $y_2$ are not independent.
The same is true for all pairs of sample values.
• No; sample units are not independent
What about their dependence?

SRSWOR
Are the sampled units uncorrelated?
• No; the covariance between any two of them is $-\sigma^2/(N-1)$

What is a Statistic?
• A function of the sample values
Examples:
• Sample Mean
• Sample SD
• Sample Proportion

SRSWOR
• A Statistic is a Random Variable
• Probability Distribution of a Statistic: called its Sampling Distribution
• Mean of a Statistic: called its Expectation
• SD of a Statistic: called its Standard Error (SE)
• Role of SE: compares the efficacy of different sampling procedures
• The smaller the SE, the better the sampling procedure

Sampling Distribution of Sample Mean in Simple Random Sampling
• Finite Population of size $N$
• Population mean $= \mu$ and SD $= \sigma$
• Random Sample of size $n$ drawn (WR/WOR)
• Statistic is the Sample Mean
• Expectation $= \mu$ (both SRSWR and SRSWOR)
• SE $= \sigma/\sqrt{n}$ for SRSWR
• SE $= (\sigma/\sqrt{n}) \cdot \sqrt{\mathrm{FPC}}$ for SRSWOR
• FPC = Finite Population Correction $= (N-n)/(N-1)$

Comparing SRSWR and SRSWOR
• For $n = 1$, FPC $= 1$, so SRSWR and SRSWOR are equivalent
• For $n > 1$, FPC $< 1$, so SRSWOR has the smaller SE and is better than SRSWR
• Limiting Behaviour: as $N$ becomes large with $n$ fixed, FPC tends to 1 and both sampling methods are asymptotically equivalent. Intuitively obvious!

Can we use SRS always?
• SRS is too fair!
• It ignores the typical composition of a population
Example: Suppose the population is characterized by sex (males and females)
• $N_1$ males and $N_2$ females in the population
• $N_1$ is 'too large' compared to $N_2$; say at least 80% are males
• Will an SRS be representative here?

Drawback of SRS
• Possibly not; most likely the sample will have too few females, maybe none at all!
• Not a representative sample, at least for a social survey
• We need representation of all sections of the society
• How can we ensure that? Divide the population into several parts!
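
The expectation and SE formulas for the sample mean can be checked by brute force on a toy population. The sketch below is only an illustration: the function name `sampling_distribution_summary` and the five-unit population are assumptions made for this example, not part of the course material. It enumerates every SRSWR sample (all $N^n$ ordered draws) and every SRSWOR sample (all $\binom{N}{n}$ subsets), then compares the enumerated SE with $\sigma/\sqrt{n}$ and $(\sigma/\sqrt{n}) \cdot \sqrt{(N-n)/(N-1)}$.

```python
import itertools
import math
import statistics


def sampling_distribution_summary(population, n):
    """Enumerate all SRSWR and SRSWOR samples of size n (hypothetical helper).

    Returns the expectation and standard error of the sample mean under each
    scheme, alongside the values given by the textbook formulas.
    """
    N = len(population)
    mu = statistics.fmean(population)
    sigma = statistics.pstdev(population)  # population SD (divisor N)

    # SRSWR: every ordered n-tuple is equally likely (N**n samples).
    wr_means = [statistics.fmean(s)
                for s in itertools.product(population, repeat=n)]
    # SRSWOR: every subset of n distinct units is equally likely (N choose n).
    wor_means = [statistics.fmean(s)
                 for s in itertools.combinations(population, n)]

    fpc = (N - n) / (N - 1)  # finite population correction
    return {
        "mu": mu,
        "wr_expectation": statistics.fmean(wr_means),
        "wor_expectation": statistics.fmean(wor_means),
        "wr_se_enumerated": statistics.pstdev(wr_means),
        "wr_se_formula": sigma / math.sqrt(n),
        "wor_se_enumerated": statistics.pstdev(wor_means),
        "wor_se_formula": sigma / math.sqrt(n) * math.sqrt(fpc),
    }


# Toy population chosen purely for illustration.
summary = sampling_distribution_summary([2, 4, 6, 8, 10], n=2)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

For this toy population the enumerated and formula-based SEs agree, and the SRSWOR value is the smaller of the two, which is the comparison made in the slides above. Full enumeration is only feasible for very small $N$ and $n$; for realistic sizes one would simulate instead.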
Stratified Random Sampling
The population has $N$ units: $N_1$ of a first type (males) and $N_2 = N - N_1$ of a second type (females).
Total Sample Size $= n$
• Divide $n$ into two parts, $n_1$ and $n_2$
• Draw $n_1$ units from the $N_1$ units
• Independently draw $n_2$ units from the $N_2$ units
• Use SRS for drawing the units from the subpopulations (strata)
• Combine the two sub-samples to get a Stratified Random Sample of size $n = n_1 + n_2$

How to Choose $n_1$ and $n_2$?
Proportional Allocation:
• Choose $n_1$ and $n_2$ proportional to the subpopulation sizes $N_1$ and $N_2$
• $n_1 = (n/N) \cdot N_1$ and $n_2 = (n/N) \cdot N_2$
Optimum Allocation:
• Choose $n_1$ and $n_2$ proportional to $N_1 \sigma_1$ and $N_2 \sigma_2$, where $\sigma_1$ and $\sigma_2$ are the subpopulation SDs (for strata of equal size this reduces to allocation proportional to the SDs)

Systematic Sampling
• Units are arranged in a sequence
• $N = n \cdot k$; units are numbered 1 to $N$; sample size $= n$
• Divide the population into $n$ groups of $k$ consecutive units each
• Select one unit at random from the first group, with units 1 to $k$
• Select every $k$-th unit thereafter
• $k$ possible samples; probability of each $= 1/k$
• Gives a sample uniformly spread over the population

Central Limit Theorem
• Sampling from a normal population
• Mean $= \mu$ and SD $= \sigma$
• SRS of size $n$ (With Replacement)
• Statistic = Sample Mean $\bar{y}$
• Expectation $= \mu$; SE $= \sigma/\sqrt{n}$
• $Z = (\bar{y} - \mu)/(\sigma/\sqrt{n})$ is $N(0, 1)$

What happens if sampling is done from a non-normal distribution?
The distribution of the sample mean is no longer normal, though the formulae for Expectation and SE are still true.
Can we say anything more? Yes, provided the sample size $n$ is 'large'!
How large is 'large'? As a rule of thumb, $n \geq 30$ will do!

What happens if $n$ is 'large'?
• The distribution of the sample mean is still normal, but only approximately
• The approximation gets better and better as $n$ becomes larger and larger
• True regardless of the underlying distribution from which sampling is done, provided it has a finite variance
⇒ Central Limit Theorem

Multistage Sampling Methodologies
• Generally used to counter the presence of an unknown nuisance parameter
• Used in situations where the optimal sample size required is not known a priori
• Sampling is done in two or more stages
• First Stage: select a 'small' pilot sample of size $m$
• Use this pilot sample to get an estimate $E$ of the unknown required sample size
• STOP if $E$ is at most $m$
• Second Stage: otherwise, select a second sample of size $E - m$

Some Standard Sampling Distributions
1. Chi-Square Distribution
• $n$ IID $N(0, 1)$ variables: $Z_1, Z_2, \ldots, Z_n$
• $Y$ = sum of squares of $Z_1, Z_2, \ldots, Z_n$ $= Z_1^2 + Z_2^2 + \cdots + Z_n^2$
• $Y$ is Chi-Square with $n$ degrees of freedom (d.f.)
• Mean $= n$; SD $= \sqrt{2n}$
• $(Y - n)/\sqrt{2n}$ is approximately Standard Normal for large $n$
• The distribution is positively skewed; probability tables are available

Some Standard Sampling Distributions
2. t-distribution
• $Z$ is $N(0, 1)$
• $Y$ is Chi-Square with d.f. $= n$
• $Z$ and $Y$ are independently distributed
• The sampling distribution of $t = Z/\sqrt{Y/n}$ is called the t-distribution with d.f. $= n$
• Similar to $N(0, 1)$; approaches $N(0, 1)$ as $n$ becomes large ($n \geq 30$); probability tables for $n < 30$ are available

Some Standard Sampling Distributions
3. F-distribution
• $Y_1$ is Chi-Square with d.f. $= n_1$
• $Y_2$ is Chi-Square with d.f. $= n_2$
• $Y_1$ and $Y_2$ are independently distributed
• The sampling distribution of $F = (Y_1/n_1)/(Y_2/n_2)$ is the F-distribution with d.f. $= (n_1, n_2)$
⇒ Useful for hypothesis-testing problems when we have samples available from a normal population (exact or approximate)

References
Text Book for the Course
• Statistical Methods in Business and Social Sciences: Shenoy, G.V. & Pant, M. (Macmillan India Limited)
Suggested Reading
• Applications of Sequential Methodologies: Mukhopadhyay, Nitis, Datta, Sujay & Chattopadhyay, Saibal. (Marcel Dekker, New York, 2004)
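
Worked Illustration: Proportional Allocation and Systematic Sampling
As a small supplement to the slides on stratified and systematic sampling, the two selection rules can be sketched in a few lines of code. This is a minimal sketch under stated assumptions: the function names `proportional_allocation` and `systematic_sample`, the stratum sizes, and the optional `start` argument are all hypothetical choices made for this illustration. It assumes $N$ is an exact multiple of $k$, as in the slides, and it does not round fractional allocations.

```python
import random


def proportional_allocation(n, stratum_sizes):
    """Split a total sample size n across strata in proportion to their sizes.

    Implements n_h = (n / N) * N_h. Results may be fractional; rounding to
    whole units is left to the user (an assumption of this sketch).
    """
    N = sum(stratum_sizes)
    return [n * N_h / N for N_h in stratum_sizes]


def systematic_sample(N, k, start=None, rng=random):
    """Return the unit numbers (1..N) of a systematic sample with interval k.

    A random start is drawn from the first group (units 1..k) unless `start`
    is supplied; every k-th unit thereafter is selected.
    """
    if N % k != 0:
        raise ValueError("this sketch assumes N = n * k exactly")
    if start is None:
        start = rng.randint(1, k)  # inclusive on both ends
    if not 1 <= start <= k:
        raise ValueError("start must lie in the first group, 1..k")
    return list(range(start, N + 1, k))


# 80% males and 20% females, total sample of 50 (illustrative numbers only).
allocation = proportional_allocation(50, [800, 200])   # -> [40.0, 10.0]

# N = 20 units, k = 5, so n = 4; a fixed start of 3 is used for reproducibility.
chosen_units = systematic_sample(20, 5, start=3)       # -> [3, 8, 13, 18]
```

With a random start there are exactly $k$ possible systematic samples, each selected with probability $1/k$, which matches the description in the Systematic Sampling slide.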
