Survey
Chapter 8: Determining the Sample Plan and Sample Size

Basic Concepts of Samples and Sampling
Population
Sample
Sample unit
Census
Sampling error

Defining the Population
The population is the entire group under study, as specified by the research objectives. Examples:
Supermarkets that are part of chains (Safeway, Publix, Woodman's) located in Wisconsin.
Persons in charge of household financial planning located in Pima County, Arizona.

Sample and Sample Unit
A sample is a subset of the population that should represent the entire group. A sample is said to be "representative" if, in fact, it represents the entire population.
A sample unit is the basic level of investigation: a college student, a housewife, a purchasing agent, a supermarket, a bank, etc.

Census
A census is defined as an accounting of the complete population.
The U.S. census, for example, is taken every 10 years.
It is very difficult to take a census when the population is large and/or not easily accessed.

Sampling Error
Sampling error is any error that occurs because a sample is used. It is caused by two factors:
1. The method of sample selection (the PLAN)
2. The size of the sample
If the average height in this class is really 5'9" and a sample of the people in the first row shows their average height to be 5'6", we have a sampling error of 3".

Sample Frame
The sample frame is a master list of all the sample units in the population. If the population were defined as all chain supermarkets in Wisconsin, then a list containing all such stores would be the sample frame.
Sometimes a physical list is not possible. The sample frame for a mall intercept survey would be all shoppers who were walking through the mall on the days data were collected.

Basic Concepts of Samples and Sampling
Population: the entire group under study, as defined by the objectives of the research plan.
Sample: a subset of the population that can represent the whole.
Sample unit: the most basic unit of investigation in the study.
Census: an accounting of the complete population.
Sampling error: the error that arises because a sample is used in the survey.

12.2 Reasons for Taking a Sample
Taking a census can be very costly if the population is large and/or hard to access.
Data analysis with hundreds of thousands of observations is cumbersome, even with today's computers.
We shall learn that samples, some surprisingly small, can produce very precise estimates of the population values.

Two Basic Sampling Methods
Probability samples: samples in which members of the population have a known chance (probability) of being selected into the sample.
Nonprobability samples: samples in which the chances of selecting members from the population into the sample are unknown.

Examples
If there are 10,000 students in this university and we took a sample of 100 of them, drawn from a "frame" provided to us by the Registrar's office, would we know the probability of drawing a member of the population into our sample?
Yes. 100/10,000 = .01, or 1%. Therefore, this would be a probability sample.

Examples, continued
What if we took the members of this class as our sample to represent all students in the university? Let's assume there are 30 students in the class.
Can you calculate the chance (probability) that every student in the university had of being part of our sample?
No. The 30 members of this class were certain to be included (probability 1.00), but all other students had no chance at all (probability 0). Selection was not left to chance, so we have a nonprobability sample.

Why is it important to distinguish between probability and nonprobability sampling methods?
It is only through probability samples that one may assess how precisely the sample values represent the population values.
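The Registrar example above can be sketched in a few lines of Python. This is only an illustration: the frame of student IDs, the seed, and the function name `inclusion_probability` are assumptions made for the sketch, not part of the original slides.

```python
import random


def inclusion_probability(sample_size: int, population_size: int) -> float:
    """Chance that any one member is drawn in a simple random sample."""
    return sample_size / population_size


# Hypothetical frame: student IDs 1..10,000 standing in for the Registrar's list.
frame = list(range(1, 10_001))

# Fixed seed only so the sketch is repeatable.
rng = random.Random(42)

# Draw 100 students without replacement; each has the same known chance.
sample = rng.sample(frame, 100)

p = inclusion_probability(len(sample), len(frame))
print(f"Each student's chance of selection: {p:.2%}")
```

Taking one class of 30 as the sample involves no such random draw, so no comparable calculation applies to it.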
Probability Sampling
Simple random sampling
Systematic sampling
Cluster sampling
Stratified sampling:
  Proportionate stratified sampling
  Disproportionate stratified sampling

Nonprobability Sampling
Convenience samples
Judgment samples
Referral samples (snowball samples)
Quota samples

12.3 Two Basic Sampling Methods
Sampling methods
  Probability samples: simple random, systematic, cluster, stratified
  Nonprobability samples: convenience, judgment, snowball, quota

Simple Random Sampling
1. Define the population.
2. Develop a frame that includes all of the population members.
3. Use some unbiased (random) method of drawing the sample units from the frame, such as a "blind draw" or a table of random numbers.

Advantages/Disadvantages of SRS
+ Can be simple (under certain conditions)
+ Does give you a probability sample
- Must predesignate (number, label) every member of the population as listed on the frame
- Requires complete listings on the frame, which may not be available in all situations

Systematic Sampling
1. Define the population.
2. Obtain a list of all population members as the frame.
3. Estimate the number of members on the list and divide by the sample size to get the skip interval:
$$\text{Skip interval} = \frac{\text{population list size}}{\text{sample size}}$$
4. Take a random start on the frame and draw a sample unit, skip the interval, draw another, and continue.

Systematic Sampling
+ One of the most prevalent types of sampling used in place of SRS: it has "economic efficiency" over SRS because it can be conducted more easily and in a shorter time period
+ Provides a probability sample
- May not be as precise as SRS; care must be taken that there are no "periodicities" in the frame
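A minimal sketch of the systematic sampling steps, assuming the frame is an ordinary Python list and the skip interval is rounded down to a whole number. The function name `systematic_sample` and the example frame are illustrative, not from the slides.

```python
import random


def systematic_sample(frame, sample_size, rng=None):
    """Draw a systematic sample: random start, then every skip-th unit."""
    if not 0 < sample_size <= len(frame):
        raise ValueError("sample_size must be between 1 and len(frame)")
    rng = rng or random.Random()
    # Skip interval = population list size / sample size (rounded down).
    skip = len(frame) // sample_size
    # Random start somewhere inside the first interval.
    start = rng.randrange(skip)
    return [frame[start + i * skip] for i in range(sample_size)]


# Hypothetical frame of 1,000 listed units, sample of 50 -> skip interval 20.
frame = list(range(1000))
picked = systematic_sample(frame, 50, random.Random(1))
print(picked[:5])
```

If the frame repeats in a cycle whose length matches the skip interval (a "periodicity"), every drawn unit falls at the same point in the cycle, which is why the slide warns about it.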
PROBABILITY SAMPLING METHODS
Cluster sampling: the population is divided into subgroups, and a sample of subgroups is drawn.
  Area sampling:
    One-step approach: select one group and take a census of it.
    Two-step approach: randomly select several groups and sample from each.
Stratified sampling: used when the population contains subgroups (strata) that may differ.
  A weighted mean across the strata gives the total mean.
  The sample may be proportionate or disproportionate.

NON-PROBABILITY SAMPLING METHODS
Convenience sampling
Judgment sampling
Referral sampling
Quota sampling

Nonprobability Sampling Methods
Convenience samples: easy to collect.
Judgment samples: based on judgmental selection criteria.
Quota samples: demographic characteristics in the same proportion as in the population.
Snowball samples: additional respondents selected on referral from initial respondents.

Designing the Sample Plan
1. Define the relevant population.
2. Obtain a listing of the population.
3. Design the sample plan (size and method).
4. Access the population.
5. Draw the sample.
   Substitution methods: drop-down substitution, oversampling, resampling.
6. Validate the sample (sample validation).
7. Resample, if necessary.

Concepts and Key Points Related to Sample Size
Sample size and sample accuracy
Sample size is unrelated to how representative the sample is of the population.
Sample size can affect the accuracy of the results.

Sample Representativeness v. Sample Accuracy
How representative a sample is of the population is determined by the sampling method or plan.
Sample accuracy refers to how close the statistic generated from the sample data is to the true population value that the statistic is estimating. Sample accuracy is related to the size of the sample.

Methods of Determining Sample Size
Arbitrary approach (e.g., 10% of the population)
Conventional approach (what others have done)
Cost basis approach (how many does the budget allow?)
Statistical analysis approach (adequate n for subgroup analysis)
Confidence interval approach (allows us to predetermine how precise our estimates are)

The confidence interval approach uses the concepts of variability, confidence intervals, the sampling distribution, and the standard error of a mean or of a percentage to create a valid sample.

Related Concepts
Variability: a summary of how dissimilar respondents' answers to a particular question are.
Confidence interval
Sampling distribution
Standard error of the mean, or standard error of a percentage

Variability: the amount of dissimilarity (or similarity) in respondents' answers to a question
Nominal data: "Do you prefer product A over product B?" Yes or No.
The greatest variability is 50% yes, 50% no.
We can conceptualize variability for nominal or ordinal data by examining bar charts. The more even the bars, the greater the variability.
[Bar charts: High Variance; Low Variance]

Variability: the amount of dissimilarity (or similarity) in respondents' answers to a question
Interval or ratio data: "On this scale from 1 to 5, how would you value a college degree?" Or, "How many miles do you drive in your personal car during a year?"
We can conceptualize variability for interval or ratio data by examining the distribution of scores. The flatter the distribution, the greater the variability. The more peaked, the less the variability.
Standard deviation is a measure of variability when data are either interval or ratio.

Confidence Intervals: a range whose endpoints define a certain percentage of responses to a question.
For normally distributed data, the range within ±1.96 z scores (standard deviations) of the mean contains 95% of all answers to a particular question.
Therefore, it allows us to make statements about how confident we are that data will fall within a certain range (95% confidence = ±1.96z; 99% confidence = ±2.58z).

Why Can We Apply Our Knowledge of the Normal Curve?
The Central Limit Theorem states that if we take many samples, each of sufficient size (at least 30 is the usual rule of thumb), then the means from these samples will form an approximately normal distribution.
It is this THEORETICAL SAMPLING DISTRIBUTION (a NORMAL curve) that allows us to use statistical inference.
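A small simulation can make the Central Limit Theorem statement concrete. The sketch below assumes a uniform population on [0, 1), which is flat rather than bell-shaped, and draws 2,000 samples of size 30. The population, the number of samples, and the seed are choices made for this illustration only.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the sketch is repeatable
N_SAMPLES = 2000         # number of repeated samples (illustrative)
SAMPLE_SIZE = 30         # size of each sample

# Known properties of a uniform population on [0, 1).
pop_mean = 0.5
pop_sd = (1 / 12) ** 0.5
# Standard error of the mean = population SD / sqrt(sample size).
std_error = pop_sd / SAMPLE_SIZE ** 0.5

# Mean of each repeated sample.
sample_means = [
    statistics.fmean([rng.random() for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

# Share of sample means within +/-1.96 standard errors of the true mean.
within = sum(abs(m - pop_mean) <= 1.96 * std_error for m in sample_means)
share_within = within / N_SAMPLES
print(f"Share of sample means within +/-1.96 SE: {share_within:.3f}")
```

Even though the individual observations are not normally distributed, the share printed should land close to 0.95, which is the basis for using z = ±1.96 at the 95% level of confidence.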
When we use z = ±1.96, we are capturing 95% of the sample means in a distribution of possible sample means. This allows us to be 95% confident: 95 times out of 100, our sample mean is not going to be too far removed from the population mean.

Standard Error of a Mean (or Percentage)
The standard error is a measure of how far away from the true population value a typical sample result is expected to fall.
It is calculated by dividing a measure of variability by the square root of the sample size: $s/\sqrt{n}$ for a mean and $\sqrt{pq/n}$ for a percentage.

How to Estimate Variability
1. Estimate based upon a former study of the same population.
2. Estimate based upon a pilot study of the population.
3a. For a percentage, set pq at 50 x 50 (the case of greatest variability).
3b. For a mean, estimate the range and divide by 6.

How to Set Accuracy (e = how precise we want our estimate of the population value to be)
Managers, working with researchers, determine e.
How precise do you need to be? ±5%?
The more precise, the lower e. The less precise, the higher e.
When estimating a percentage, e is set as a percentage, e.g., 5.
When estimating a mean, e is set in terms of the units being estimated, e.g., number of miles driven, number of Big Macs eaten, etc.

How Do We Set Level of Confidence (z)?
Decision made by management, with advice from researchers.
Rules of thumb:
95% level of confidence: z = 1.96
99% level of confidence: z = 2.58

SAMPLE SIZE FORMULA FOR A PERCENTAGE
$$n = \frac{z^2 (pq)}{e^2}$$
Where
n = the sample size
z = the z value (number of standard errors) associated with the chosen level of confidence
p = estimated percentage in the population, which determines the variability pq
q = (100 - p)
e = acceptable error

SAMPLE SIZE FORMULA FOR A MEAN
$$n = \frac{s^2 z^2}{e^2}$$
Where
n = the sample size
z = the z value (number of standard errors) associated with the chosen level of confidence
s = variability indicated by an estimated standard deviation
e = acceptable error

SAMPLING FROM SMALL POPULATIONS
If the sample exceeds 5% of the population size, use the Finite Multiplier to recalculate the sample size as follows...
$$\text{Sample size} = \text{sample size formula} \times \sqrt{\frac{N - n}{N - 1}}$$
where N is the population size and n is the sample size given by the sample size formula.

The Logic of Sample Accuracy
Sample Size and Survey Accuracy with p = 50% and q = 50%

Sample size    Accuracy (±) at 95%    Accuracy (±) at 99%
50             13.9%                  18.2%
200            6.9%                   9.1%
350            5.2%                   6.9%
500            4.4%                   5.8%
650            3.8%                   5.1%
800            3.5%                   4.6%
950            3.2%                   4.2%
1100           3.0%                   3.9%
1250           2.8%                   3.6%
1400           2.6%                   3.4%

[Chart: Sample Size Formula for a Percentage at 95% (z = 1.96): required sample size n plotted against p (10 to 50) for e = 3, e = 4, and e = 5]
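The sample size formulas can be tried out with a short Python sketch. The function names are illustrative, the values used for the mean (s = 20, e = 2) and for the small population (N = 1,000) are hypothetical, and the finite multiplier is assumed here to be the square-root form sqrt((N - n)/(N - 1)).

```python
import math


def sample_size_percentage(p, e, z=1.96):
    """n = z^2 * (p * q) / e^2, with p, q and e expressed in percent."""
    q = 100 - p
    return z ** 2 * (p * q) / e ** 2


def sample_size_mean(s, e, z=1.96):
    """n = s^2 * z^2 / e^2, with s and e in the units being measured."""
    return s ** 2 * z ** 2 / e ** 2


def accuracy_percentage(n, p=50, z=1.96):
    """Accuracy (+/- e, in percent) obtained from a sample of size n."""
    q = 100 - p
    return z * math.sqrt(p * q / n)


def small_population_size(n, population_size):
    """Apply the finite multiplier (assumed square-root form) to n."""
    return n * math.sqrt((population_size - n) / (population_size - 1))


# Required n at 95% confidence for several values of p and e.
for e in (3, 4, 5):
    sizes = [round(sample_size_percentage(p, e)) for p in (10, 20, 30, 40, 50)]
    print(f"e={e}: {sizes}")

# Accuracy at p = 50% for a sample of 200, at 95% and 99% confidence.
print(round(accuracy_percentage(200), 1), round(accuracy_percentage(200, z=2.58), 1))

# Hypothetical mean example: s = 20, e = 2, then a population of only 1,000.
n_mean = sample_size_mean(20, 2)
print(round(n_mean), round(small_population_size(n_mean, 1000)))
```

The results are rounded to the nearest whole number for display; in practice a fractional sample size would normally be rounded up so that the target accuracy is still met.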