SAMPLING DESIGN AND PROCEDURES

Sampling Terminology

Sample: A subset, or some part, of a larger population.

Population (universe): Any complete group of entities that share some common set of characteristics.

Population element: An individual member of a population.

Census: An investigation of all the individual elements that make up a population.

Sample survey: A survey carried out using a sampling method, i.e., one in which only a portion of the population, and not the whole, is surveyed.

Sampling unit: One of the units into which an aggregate is divided for the purposes of sampling, each unit being regarded as individual and indivisible when the selection is made. The definition of the unit may be made on some natural basis, for example, households or persons.

PARAMETER & STATISTIC

Parameter: A characteristic of a population.

Statistic: A characteristic of a sample. (Estimating a parameter from a statistic is the prime objective of sampling analysis.)

Sampling frame: A list, map or other specification of the units which constitute the available information relating to the population designated for a particular sampling scheme. There is a frame corresponding to each stage of sampling in a multi-stage sampling scheme. The frame may or may not contain information about the size of the units or other supplementary information, but it should have enough detail that a unit, if included in the sample, can be located and taken up for inquiry.

Sampling error: That part of the difference between a population value and an estimate thereof, derived from a random sample, which is due to the fact that only a sample of values is observed, as distinct from errors due to imperfect selection, bias in response or estimation, errors of observation and recording, etc. The totality of sampling errors in all possible samples of the same size generates the sampling distribution of the statistic which is being used to estimate the parent value.

Why Sample?
• Budget and time constraints.
• Limited access to total population.
• Accurate and reliable results.
• Destruction of test units.
• Sampling reduces the costs of research in finite populations.

Sample vs. Census

                                      Conditions Favoring the Use of
Type of Study                         Sample          Census
1. Budget                             Small           Large
2. Time available                     Short           Long
3. Population size                    Large           Small
4. Variance in the characteristic     Small           Large
5. Cost of sampling errors            Low             High
6. Cost of nonsampling errors         High            Low
7. Nature of measurement              Destructive     Nondestructive
8. Attention to individual cases      Yes             No

Sampling Techniques

Nonprobability Sampling Techniques:
• Convenience Sampling
• Judgmental Sampling
• Quota Sampling
• Snowball Sampling

Probability Sampling Techniques:
• Simple Random Sampling
• Systematic Sampling
• Stratified Sampling
• Cluster Sampling
• Other Sampling Techniques

Convenience Sampling

Convenience sampling attempts to obtain a sample of convenient elements. Often, respondents are selected because they happen to be in the right place at the right time.
• use of students and members of social organizations
• mall intercept interviews without qualifying the respondents
• department stores using charge account lists
• "people on the street" interviews

  A    B    C    D    E
  1    6   11   16   21
  2    7   12   17   22
  3    8   13   18   23
  4    9   14   19   24
  5   10   15   20   25

Group D happens to assemble at a convenient time and place, so all the elements in this group are selected. The resulting sample consists of elements 16, 17, 18, 19 and 20. Note, no elements are selected from groups A, B, C and E.

Judgmental Sampling

Judgmental sampling is a form of convenience sampling in which the population elements are selected based on the judgment of the researcher.
• test markets
• purchase engineers selected in industrial marketing research
• bellwether precincts selected in voting behavior research
• expert witnesses used in court

  A    B    C    D    E
  1    6   11   16   21
  2    7   12   17   22
  3    8   13   18   23
  4    9   14   19   24
  5   10   15   20   25

The researcher considers groups B, C and E to be typical and convenient. Within each of these groups one or two elements are selected based on typicality and convenience.
The resulting sample consists of elements 8, 10, 11, 13, and 24. Note, no elements are selected from groups A and D.

Quota Sampling

Quota sampling may be viewed as two-stage restricted judgmental sampling. The first stage consists of developing control categories, or quotas, of population elements. In the second stage, sample elements are selected based on convenience or judgment.

Control Characteristic    Population Composition    Sample Composition
                          Percentage                Percentage    Number
Gender
  Male                    48                        48            480
  Female                  52                        52            520
  Total                   100                       100           1000

  A    B    C    D    E
  1    6   11   16   21
  2    7   12   17   22
  3    8   13   18   23
  4    9   14   19   24
  5   10   15   20   25

A quota of one element from each group, A to E, is imposed. Within each group, one element is selected based on judgment or convenience. The resulting sample consists of elements 3, 6, 13, 20 and 22. Note, one element is selected from each column or group.

Simple Random Sampling

• Each element in the population has a known and equal probability of selection.
• Each possible sample of a given size (n) has a known and equal probability of being the sample actually selected.
• This implies that every element is selected independently of every other element.

  A    B    C    D    E
  1    6   11   16   21
  2    7   12   17   22
  3    8   13   18   23
  4    9   14   19   24
  5   10   15   20   25

Select five random numbers from 1 to 25. The resulting sample consists of population elements 3, 7, 9, 16, and 24. Note, there is no element from group C.

Systematic Sampling

The sample is chosen by selecting a random starting point and then picking every i-th element in succession from the sampling frame. The sampling interval, i, is determined by dividing the population size N by the sample size n and rounding to the nearest integer.

When the ordering of the elements is related to the characteristic of interest, systematic sampling increases the representativeness of the sample. If the ordering of the elements produces a cyclical pattern, systematic sampling may decrease the representativeness of the sample.
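The two probability selection rules described above can be sketched in a few lines of Python. This is only an illustrative sketch: the frame of 25 numbered elements and the sample size of 5 mirror the grid used in this section, while the function names and the fixed seed are assumptions introduced here so the run is repeatable.

```python
import random


def simple_random_sample(frame, n, rng):
    """Draw n elements without replacement; every subset of size n is equally likely."""
    return sorted(rng.sample(frame, n))


def systematic_sample(frame, n, rng):
    """Pick a random start within the first interval, then take every i-th element."""
    N = len(frame)
    i = round(N / n)               # sampling interval i = N / n, rounded
    start = rng.randint(1, i)      # random starting point between 1 and i (inclusive)
    # Positions are 1-based in the text, so subtract 1 for list indexing.
    # The slice guards against an extra element when N / n was rounded down.
    return [frame[pos - 1] for pos in range(start, N + 1, i)][:n]


rng = random.Random(0)             # hypothetical seed, used only for repeatability
frame = list(range(1, 26))         # elements 1..25, as in the grid

srs = simple_random_sample(frame, 5, rng)
sys_sample = systematic_sample(frame, 5, rng)
print("Simple random sample:", srs)
print("Systematic sample:   ", sys_sample)
```

With N = 25 and n = 5 the interval is 5, so any systematic sample drawn this way falls along a single row of the grid, whereas the simple random sample may leave some groups unrepresented.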
As an example of systematic sampling, suppose there are 100,000 elements in the population and a sample of 1,000 is desired. In this case the sampling interval, i, is 100. A random number between 1 and 100 is selected. If, for example, this number is 23, the sample consists of elements 23, 123, 223, 323, 423, 523, and so on.

  A    B    C    D    E
  1    6   11   16   21
  2    7   12   17   22
  3    8   13   18   23
  4    9   14   19   24
  5   10   15   20   25

Select a random number between 1 and 5, say 2. The resulting sample consists of population elements 2, (2 + 5 =) 7, (2 + 5×2 =) 12, (2 + 5×3 =) 17, and (2 + 5×4 =) 22. Note, all the elements are selected from a single row.

Stratified Sampling

A two-step process in which the population is first partitioned into subpopulations, or strata. The strata should be mutually exclusive and collectively exhaustive, in that every population element should be assigned to one and only one stratum and no population elements should be omitted. Next, elements are selected from each stratum by a random procedure, usually SRS.

A major objective of stratified sampling is to increase precision without increasing cost.
• The elements within a stratum should be as homogeneous as possible, but the elements in different strata should be as heterogeneous as possible.
• The stratification variables should also be closely related to the characteristic of interest.
• Finally, the variables should decrease the cost of the stratification process by being easy to measure and apply.

  A    B    C    D    E
  1    6   11   16   21
  2    7   12   17   22
  3    8   13   18   23
  4    9   14   19   24
  5   10   15   20   25

Randomly select a number from 1 to 5 for each stratum, A to E. The resulting sample consists of population elements 4, 7, 13, 19 and 21. Note, one element is selected from each column.

HYPOTHESIS

A hypothesis:
• is a formally stated expectation about how a behavior operates.
• is a proposition that a researcher wants to verify.
• is an assumption about the population parameter.

Steps in hypothesis testing:
• Formulate a Null Hypothesis (H0).
• Formulate an Alternative Hypothesis (H1).
• Select a suitable test statistic.
• Specify a level of significance (α).
• Define a suitable decision criterion based on α and the test statistic.
• Make necessary assumptions, if required.
• Experiment and calculate the test statistic.
• State the conclusion or decision.

Central Limit Theorem

As the sample size gets large enough, the sampling distribution of the sample mean becomes almost normal regardless of the shape of the population.

The Null Hypothesis, H0
• It is a statement about the hypothesized value of a population parameter.
• It states the (numerical) assumption to be tested for possible rejection; the test proceeds under the assumption that the null hypothesis is TRUE.
• e.g., The average sale of a showroom is at least 3.0 lakh (H0: μ ≥ 3.0).
• It always contains the "=" sign.

The Alternative Hypothesis, H1
• Is the opposite of the null hypothesis.
• e.g., The average sale of a showroom is less than 3.0 lakh (H1: μ < 3.0).
• Never contains the "=" sign.
• May or may not be accepted.
• Is generally the hypothesis that the researcher believes to be true.

Level of Significance, α
• Defines unlikely values of the sample statistic if the null hypothesis is true.
• If we assume that the hypothesis is correct, the significance level indicates the percentage of sample statistics that fall outside certain limits.
• Typical values are 0.01, 0.05, 0.10.

Level of Significance, α, and the Rejection Region

[Figure: rejection regions and critical value(s) on the sampling distribution]
• H0: μ ≥ 3, H1: μ < 3: rejection region of area α in the lower tail.
• H0: μ ≤ 3, H1: μ > 3: rejection region of area α in the upper tail.
• H0: μ = 3, H1: μ ≠ 3: rejection regions of area α/2 in each tail.

One-Tailed Hypothesis Test

The term one-tailed signifies that all values that would cause H0 to be rejected lie in just one tail of the sampling distribution.

Two-Tailed Hypothesis Test

A two-tailed test is one in which values of the test statistic leading to rejection of the null hypothesis fall in both tails of the sampling distribution curve.

Summary of Errors Involved in Testing of Hypothesis

                         Inference Based on Sample Data
Real State of Affairs    H0 is Accepted                H0 is Rejected
H0 is True               Correct decision              Type I error
                         Confidence level = 1 − α      Significance level = α*
H0 is False              Type II error                 Correct decision
                         P(Type II error) = β          Power = 1 − β

*The term α represents the maximum probability of committing a Type I error.

α and β Have an Inverse Relationship

Reduce the probability of one error and the probability of the other goes up.

How to choose between Type I and Type II errors
• Reworking cost is low: prefer to risk a Type I error.
• Reworking cost is high: prefer to risk a Type II error.

TOSH of Means When the Population Standard Deviation Is Known (Z-test)

H0: μ = μ0 (or μ ≤ μ0, or μ ≥ μ0) vs. HA: μ ≠ μ0 (or μ > μ0, or μ < μ0, respectively)

$Z_{calc} = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$

Example: Bajaj Company claims that the length of life of its electric bulb is 1000 hours with a standard deviation of 30 hours. A random sample of 25 bulbs showed an average life of 960 hours. At the 5% level of significance, can we conclude that the sample has come from a population with a mean life of 1000 hours? Table value = 1.96

t-test: Standard Deviation Is Unknown and the Sample Is Small

H0: μ = μ0 (or μ ≤ μ0, or μ ≥ μ0) vs. HA: μ ≠ μ0 (or μ > μ0, or μ < μ0, respectively)

Testing a hypothesis about a mean; we do not know σ, which must be estimated by s. Calculate

$t_{calc} = \dfrac{\bar{X} - \mu_0}{s / \sqrt{n}}$

Example: The weight of a canned food product is specified as 500 g. For a sample of 8 cans the weights were observed as 480, 475, 510, 500, 505, 495, 504 and 515 g.
Test at the 5% level of significance whether, on average, the weight is as per specification. Table value = 2.365

Example: Two independent samples were collected. For the first sample of 42 items, the mean was 32.3 and the variance 9. The second sample of 57 items had a mean of 34 and a variance of 16. Using the 0.05 level of significance, test whether there is sufficient evidence to show that the second population has a larger mean.

Z-test for Two Population Means (μ1 & μ2) When the Population Standard Deviations Are Known

H0: μ1 = μ2 (or μ1 ≤ μ2, or μ1 ≥ μ2) vs. HA: μ1 ≠ μ2 (or μ1 > μ2, or μ1 < μ2, respectively)
n1 = ______, n2 = ______, α = ______

• Testing a hypothesis about two means.
• The process performance measure is approximately normally distributed.
• We "know" σ1 and σ2.
• Therefore this is a "Z-test": use the normal distribution.

Calculate the test statistic

$Z_{calc} = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$

Decision rules (DR):
• DR (≠ in HA): Reject H0 in favor of HA if $Z_{calc} < -Z_{\alpha/2}$ or if $Z_{calc} > +Z_{\alpha/2}$. Otherwise, fail to reject (FTR) H0.
• DR (> in HA): Reject H0 in favor of HA if $Z_{calc} > +Z_{\alpha}$. Otherwise, FTR H0.
• DR (< in HA): Reject H0 in favor of HA if $Z_{calc} < -Z_{\alpha}$. Otherwise, FTR H0.

Z-test for Two Population Means (μ1 & μ2) When the Population Standard Deviations Are Unknown and n Is Large

H0: μ1 = μ2 (or μ1 ≤ μ2, or μ1 ≥ μ2) vs. HA: μ1 ≠ μ2 (or μ1 > μ2, or μ1 < μ2, respectively)
n1 = ______, n2 = ______, α = ______

• Testing a hypothesis about two means.
• The process performance measure is approximately normally distributed.
• We "know" the sample standard deviations S1 and S2, which stand in for σ1 and σ2 because n is large.
• Therefore this is a "Z-test": use the normal distribution.

Calculate the test statistic

$Z_{calc} = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$

t-test for Two Population Means

H0: μ1 = μ2 (or μ1 ≤ μ2, or μ1 ≥ μ2) vs. HA: μ1 ≠ μ2 (or μ1 > μ2, or μ1 < μ2, respectively)
n = ______, α = ______

• Testing a hypothesis about a mean.
• The process performance measure is approximately normally distributed or we have "small" samples.
• We do not know σ, which must be estimated by s.
• Therefore this is a "t-test": use Student's t distribution.

Calculate

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s^{*} \sqrt{1/n_1 + 1/n_2}}$

with d.f. = n1 + n2 − 2.
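The three worked examples above (the bulb life Z-test, the can weight t-test, and the two-sample comparison) can be checked numerically. The following is a minimal sketch using only Python's standard library. The variable names are assumptions introduced here; the critical values 1.96 and 2.365 are the table values quoted in the text, while 1.645 is the usual one-tailed normal critical value at α = 0.05, which the text does not state explicitly. The third example is treated as a large-sample Z-test with the sample variances standing in for the population variances.

```python
import math
import statistics

# Example 1: one-sample Z-test (sigma known), H0: mu = 1000 vs HA: mu != 1000
mu0, sigma, n, xbar = 1000, 30, 25, 960
z_bulb = (xbar - mu0) / (sigma / math.sqrt(n))
reject_bulb = abs(z_bulb) > 1.96           # two-tailed table value from the text

# Example 2: one-sample t-test (sigma unknown, small sample), H0: mu = 500
weights = [480, 475, 510, 500, 505, 495, 504, 515]
s = statistics.stdev(weights)              # sample standard deviation (n - 1 divisor)
t_cans = (statistics.mean(weights) - 500) / (s / math.sqrt(len(weights)))
reject_cans = abs(t_cans) > 2.365          # table value for 7 d.f. from the text

# Example 3: two-sample Z-test, large samples, H0: mu1 >= mu2 vs HA: mu1 < mu2
n1, mean1, var1 = 42, 32.3, 9
n2, mean2, var2 = 57, 34, 16
z_two = (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)
reject_two = z_two < -1.645                # assumed one-tailed critical value

print(f"Bulbs:       Z = {z_bulb:.3f}, reject H0: {reject_bulb}")
print(f"Cans:        t = {t_cans:.3f}, reject H0: {reject_cans}")
print(f"Two samples: Z = {z_two:.3f}, reject H0: {reject_two}")
```

Under these assumptions the sketch rejects H0 for the bulbs and for the two-sample comparison, and fails to reject H0 for the can weights.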
In the two-sample t statistic, s* is the pooled standard deviation, given by

$s^{*2} = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

Paired Samples

When the two sets of observations are paired rather than independent, the difference is examined by a paired samples t test. To compute t for paired samples, the paired difference variable, denoted by D, is formed and its mean and variance calculated. Then the t statistic is computed. The degrees of freedom are n − 1, where n is the number of pairs. The relevant formulas are:

H0: μ_D = 0
H1: μ_D ≠ 0

$t_{n-1} = \dfrac{\bar{D} - \mu_D}{s_D / \sqrt{n}}$

Cross-Tabulations: Chi-square Test

A technique used for determining whether there is a statistically significant relationship between two categorical (nominal or ordinal) variables.

Telecommunications Company
• The marketing manager of a telecommunications company is reviewing the results of a study of potential users of a new cell phone.
• Random sample of 200 respondents.
• A cross-tabulation of data on whether target consumers would buy the phone (Yes or No) and whether the cell phone had Bluetooth wireless technology (Yes or No).

Question: Can the marketing manager infer that an association exists between Bluetooth technology and buying the cell phone?

[Table: Two-Way Tabulation of Bluetooth Technology and Whether Customers Would Buy Cell Phone]

Cross-Tabulations: Hypotheses

H0: There is no association between Bluetooth wireless technology and buying the cell phone (the two variables are independent of each other).
Ha: There is some association between the Bluetooth feature and buying the cell phone (the two variables are not independent of each other).
Conducting the Test

The test involves comparing the actual, or observed, cell frequencies in the cross-tabulation with a corresponding set of expected cell frequencies (E_ij).

Expected Values

$E_{ij} = \dfrac{n_i n_j}{n}$

where n_i and n_j are the marginal frequencies, that is, the total number of sample units in category i of the row variable and category j of the column variable, respectively, and n is the total sample size.

Computing Expected Values

The expected frequency for the first-row, first-column cell is given by

$E_{11} = \dfrac{100 \times 100}{200} = 50$

[Table: Observed and Expected Cell Frequencies]

Chi-square Test Statistic

$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}} = 72.00$

where r and c are the number of rows and columns, respectively, in the contingency table. The number of degrees of freedom associated with this chi-square statistic is given by the product (r − 1)(c − 1).

Chi-square Test Statistic in a Contingency Test

For d.f. = 1 and assuming α = .05, from Appendix 2, the critical chi-square value ($\chi^2_c$) is 3.84.

Decision rule: "Reject H0 if χ² > 3.84."

Computed χ² = 72.00

Since the computed chi-square value is greater than the critical value of 3.84, reject H0. The apparent relationship between "Bluetooth technology" and "would buy the cellular phone" revealed by the sample data is unlikely to have occurred by chance.

EXAMPLE

In a management institute, the A+, A and B grades allocated to students in their final examination were as follows. Using the 5% level of significance, determine whether the grading scale is independent of the specialization. Table value = 9.488

Specialization: Finance, Marketing, Operations
Grade: A+, A, B
Cell frequencies (as given): 20 25 10 15 20 08 05 15 07

Univariate Hypothesis: Papa John's restaurants are more likely to be located in a stand-alone location than in a shopping center.

Bivariate Hypothesis: Stand-alone locations are more likely to be profitable than are shopping center locations.
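Returning to the chi-square test of association described above, the expected frequencies and the test statistic can be computed directly from a contingency table. The sketch below is illustrative only. The observed counts are an assumption, chosen to be consistent with the marginal totals of 100, the sample size of 200, and the statistic of 72.00 quoted in the text; they are not taken from the study's own table. The function name is likewise hypothetical.

```python
def chi_square_statistic(observed):
    """Return (chi2, expected, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # E_ij = (row total i) * (column total j) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, expected, dof


# ASSUMED counts (rows: Bluetooth yes/no; columns: would buy yes/no)
observed = [[80, 20],
            [20, 80]]

chi2, expected, dof = chi_square_statistic(observed)
critical = 3.84    # table value for d.f. = 1 at alpha = .05, as quoted in the text
print("chi-square:", chi2)
print("E11:", expected[0][0], "d.f.:", dof)
print("reject H0:", chi2 > critical)
```

The same function accepts any r x c table, so it could be applied to the 3 x 3 grades example once its cell counts are laid out by row and column; the critical value would then be the 9.488 quoted for 4 degrees of freedom.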