Essentials of Marketing Research
Chapter 13: Determining Sample Size

WHAT DO STATISTICS MEAN?
• DESCRIPTIVE STATISTICS
  – NUMBER OF PEOPLE
  – TRENDS IN EMPLOYMENT
  – DATA
• INFERENTIAL STATISTICS
  – MAKE AN INFERENCE ABOUT A POPULATION FROM A SAMPLE

POPULATION PARAMETER VERSUS SAMPLE STATISTICS

POPULATION PARAMETER
• VARIABLES IN A POPULATION
• MEASURED CHARACTERISTICS OF A POPULATION
• GREEK LOWER-CASE LETTERS AS NOTATION, e.g., μ, σ

SAMPLE STATISTICS
• VARIABLES IN A SAMPLE
• MEASURES COMPUTED FROM SAMPLE DATA
• ENGLISH LETTERS FOR NOTATION, e.g., $\bar{X}$ or S

MAKING DATA USABLE
• Data must be organized into:
  – FREQUENCY DISTRIBUTIONS
  – PROPORTIONS
  – CENTRAL TENDENCY
    • MEAN, MEDIAN, MODE
  – MEASURES OF DISPERSION
    • range, deviation, standard deviation, variance

Frequency Distribution of Deposits
Amount             Frequency   Percent   Probability
Under $3,000           499        16        .16
$3,000-$4,999          530        17        .17
$5,000-$9,999          562        18        .18
$10,000-$14,999        718        23        .23
$15,000 or more        811        26        .26
Total                3,120       100       1.00

MEASURES OF CENTRAL TENDENCY
• MEAN - ARITHMETIC AVERAGE
• MEDIAN - MIDPOINT OF THE DISTRIBUTION
• MODE - THE VALUE THAT OCCURS MOST OFTEN

Number of Sales Calls Per Day by Salespersons
Salesperson   Number of sales calls
Mike              4
Patty             3
Billie            2
Bob               5
John              3
Frank             3
Chuck             1
Samantha          5
Total            26

Sales for Products A and B, Both Average 200
Product A   Product B
   196         150
   198         160
   199         176
   199         181
   200         192
   200         200
   200         201
   201         202
   201         213
   201         224
   202         240
   202         261

MEASURES OF DISPERSION
• THE RANGE
• STANDARD DEVIATION

[Figure: Low Dispersion Versus High Dispersion. Two panels, "Low Dispersion" and "High Dispersion"; horizontal axis: Value on Variable (150 to 210); vertical axis ticks 1 to 5.]

Standard Deviation
$S = \sqrt{S^2} = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$

THE NORMAL DISTRIBUTION
• NORMAL CURVE
• BELL-SHAPED
• ALMOST ALL OF ITS VALUES ARE WITHIN PLUS OR MINUS 3 STANDARD DEVIATIONS
• I.Q. IS AN EXAMPLE

[Figure: Normal Distribution, centered on the mean. Areas under the curve, left to right: 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, 2.14%.]

[Figure: An example of the distribution of Intelligence Quotient (IQ) scores. Horizontal axis: IQ (70, 85, 100, 115, 130); areas under the curve, left to right: 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, 2.14%.]

STANDARDIZED NORMAL DISTRIBUTION
• SYMMETRICAL ABOUT ITS MEAN
• MEAN IDENTIFIES HIGHEST POINT
• INFINITE NUMBER OF CASES - A CONTINUOUS DISTRIBUTION
• TOTAL AREA UNDER THE CURVE (TOTAL PROBABILITY) = 1.0
• MEAN OF ZERO, STANDARD DEVIATION OF 1

[Figure: A Standardized Normal Curve. Horizontal axis: -2, -1, 0, 1, 2.]

STANDARDIZED SCORES

• POPULATION DISTRIBUTION
• SAMPLE DISTRIBUTION
• SAMPLING DISTRIBUTION

[Figure: Population Distribution of x, marked at -σ, μ, and σ.]

[Figure: Sample Distribution of X, marked with $\bar{X}$ and S.]

[Figure: Sampling Distribution of $\bar{X}$, marked with $\mu_{\bar{X}}$ and $S_{\bar{X}}$.]

STANDARD ERROR OF THE MEAN
• STANDARD DEVIATION OF THE SAMPLING DISTRIBUTION

CENTRAL LIMIT THEOREM

PARAMETER ESTIMATES
• POINT ESTIMATES
• CONFIDENCE INTERVAL ESTIMATES

RANDOM SAMPLING ERROR AND SAMPLE SIZE ARE RELATED

SAMPLE SIZE
• VARIANCE (STANDARD DEVIATION)
• MAGNITUDE OF ERROR
• CONFIDENCE LEVEL

Determining Sample Size

Recap
Sample Accuracy
• How close the sample's profile is to the true population's profile
• Sample size is not related to representativeness
• Sample size is related to accuracy

Methods of Determining Sample Size
• Compromise between what is theoretically perfect and what is practically feasible.
• Remember, the larger the sample size, the more costly the research.
• Why sample one more person than necessary?

Methods of Determining Sample Size (continued)
• Arbitrary
  – Rule of thumb (e.g., a sample should be at least 5% of the population to be accurate)
  – Not efficient or economical
• Conventional
  – Assumes there is some "convention" or number believed to be the right size
  – Easy to apply, but can end up with a sample that is too small or too large

Methods of Determining Sample Size (continued)
• Cost Basis
  – based on budgetary constraints
• Statistical Analysis
  – certain statistical techniques require a certain number of respondents
• Confidence Interval
  – theoretically the most correct method

[Figure: Notion of Variability. Two distributions around the mean, one labeled "Little variability" and one labeled "Great variability".]

Notion of Variability
• Standard Deviation
  – approximates the average distance away from the mean for all respondents to a specific question
  – indicates the amount of variability in the sample
  – Example: compare standard deviations of 500 and 1,000; which exhibits more variability?

Measures of Variability
• Standard Deviation: indicates the degree of variation or diversity in the values in such a way as to be translatable into a normal curve distribution
• Variance = $\sum (x - \bar{x})^2 / (n - 1)$
• With a normal curve, the midpoint (apex) of the curve is also the mean, and exactly 50% of the distribution lies on either side of the mean.

Normal Curve and Standard Deviation
Number of standard deviations   Percent of area    Percent of area to
from the mean                   under the curve    the right or left
+/- 1.00 st dev                      68%                16%
+/- 1.64 st dev                      90%                 5%
+/- 1.96 st dev                      95%                 2.5%
+/- 2.58 st dev                      99%                 0.5%

Notion of Sampling Distribution
• The sampling distribution refers to what would be found if the researcher could take many, many independent samples
• The means for all of the samples should align themselves in a normal bell-shaped curve
• Therefore, there is a high probability that any given sample result will be close to, but not exactly equal to, the population mean.

[Figure: Normal, bell-shaped curve with the midpoint (mean) marked.]

Notion of Confidence Interval
• A confidence interval defines endpoints based on knowledge of the area under a bell-shaped curve.
• Normal curve
  – +/- 1.96 standard deviations around the mean theoretically cover 95% of the population
  – +/- 2.58 standard deviations around the mean theoretically cover 99% of the population

Notion of Confidence Interval (continued)
• Example
  – Mean = 12,000 miles
  – Standard Deviation = 3,000 miles
• We are confident that 95% of the respondents' answers fall between 6,120 and 17,880 miles
  12,000 + (1.96 * 3,000) = 17,880
  12,000 - (1.96 * 3,000) = 6,120

Notion of Standard Error of a Mean
• Standard error is an indication of how far away from the true population value a typical sample result is expected to fall.
• Formula
  – $s_{\bar{x}} = s / \sqrt{n}$
  – $s_p = \sqrt{(p \cdot q) / n}$
• where
  – $s_p$ is the standard error of the percentage
  – p = % found in the sample and q = (100 - p)
  – $s_{\bar{x}}$ is the standard error of the mean
  – s = standard deviation of the sample
  – n = sample size

Computing Sample Size Using the Confidence Interval Approach
• To compute sample size, three factors need to be considered:
  – amount of variability believed to be in the population
  – desired accuracy
  – level of confidence required in your estimates of the population values

Determining Sample Size Using a Mean
• Formula (for a percentage): $n = (p \cdot q \cdot z^2) / e^2$
• Formula (for a mean): $n = (s^2 z^2) / e^2$
• Where
  – n = sample size
  – z = level of confidence (indicated by the number of standard errors associated with it)
  – s = variability indicated by an estimated standard deviation
  – p = estimated percentage in the population (the product p * q expresses the variability)
  – q = (100 - p)
  – e = acceptable error in the sample estimate of the population

Determining Sample Size Using a Mean: An Example
• 95% level of confidence (z = 1.96)
• Standard deviation of 100 (from previous studies)
• Desired precision is 10 (+ or -)
• Therefore $n = (100^2 \cdot 1.96^2) / 10^2 = 384.16$, or about 384

Practical Considerations in Sample Size Determination
• How to estimate variability in the population
  – prior research
  – experience
  – intuition
• How to determine amount of precision desired
  – small samples are less accurate
  – how much error can you live with?
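The standard error and sample size formulas above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names are hypothetical, percentages are assumed to be expressed on a 0 to 100 scale as in the slides (q = 100 - p), and the result is left unrounded.

```python
import math


def standard_error_mean(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)


def standard_error_percentage(p, n):
    """Standard error of a percentage; p is on a 0-100 scale, q = 100 - p."""
    q = 100 - p
    return math.sqrt(p * q / n)


def sample_size_mean(s, e, z=1.96):
    """Sample size for estimating a mean: n = (s^2 * z^2) / e^2."""
    return (s ** 2 * z ** 2) / e ** 2


def sample_size_percentage(p, e, z=1.96):
    """Sample size for estimating a percentage: n = (p * q * z^2) / e^2."""
    q = 100 - p
    return (p * q * z ** 2) / e ** 2


if __name__ == "__main__":
    # Worked example from the slides: s = 100, e = 10, 95% confidence.
    n = sample_size_mean(s=100, e=10, z=1.96)
    print(f"Sample size for the mean example: {n:.2f}")

    # Illustrative (assumed) inputs for a percentage: p = 50%, e = 5 points.
    n_pct = sample_size_percentage(p=50, e=5, z=1.96)
    print(f"Sample size for a 50% estimate, +/- 5 points: {n_pct:.2f}")
```

The mean example reproduces the figure of about 384 given above. In practice the result is usually rounded to a whole number of respondents; the rounding rule is left to the reader here because the slides do not state one.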
Practical Considerations in Sample Size Determination (continued)
• How to calculate the level of confidence desired
  – risk
  – normally use either 95% or 99%

Determining Sample Size
• Higher n (sample size) needed when:
  – the standard error of the estimate is high (population has more variability in the sampling distribution of the test statistic)
  – higher precision (low degree of error) is needed (i.e., it is important to have a very precise estimate)
  – higher level of confidence is required
• Constraints: cost and access

Notes About Sample Size
• Population size does not determine sample size.
• What most directly affects sample size is the variability of the characteristic in the population.
  – Example: if all population elements have the same value of a characteristic, then we only need a sample of one!
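The points above about what drives sample size can be illustrated numerically with the mean formula n = (s^2 * z^2) / e^2. The sketch below is an illustration only: the function name and the scenario values are assumptions chosen for the comparison, not figures from the slides. Note that population size is not an input at all.

```python
def required_n(s, e, z):
    """n = (s^2 * z^2) / e^2; population size does not appear in the formula."""
    return (s ** 2 * z ** 2) / e ** 2


# Baseline: standard deviation 100, precision +/- 10, 95% confidence.
baseline = required_n(s=100, e=10, z=1.96)

# Change one factor at a time (hypothetical scenario values).
more_variability = required_n(s=200, e=10, z=1.96)   # standard deviation doubled
higher_precision = required_n(s=100, e=5, z=1.96)    # acceptable error halved
higher_confidence = required_n(s=100, e=10, z=2.58)  # 99% instead of 95%
no_variability = required_n(s=0, e=10, z=1.96)       # every element identical

scenarios = [
    ("baseline", baseline),
    ("more variability", more_variability),
    ("higher precision", higher_precision),
    ("higher confidence", higher_confidence),
    ("no variability", no_variability),
]

for label, n in scenarios:
    print(f"{label:18s} n = {n:8.1f}   ratio to baseline = {n / baseline:.2f}")
```

Doubling the standard deviation or halving the acceptable error each quadruples n, and moving from 95% to 99% confidence multiplies it by (2.58 / 1.96)^2, roughly 1.73. With zero variability the formula returns 0, which matches the slide's point that a single observation would then be enough.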