GENERALIZED PROGRAM FOR STRATIFICATION & STRATA DETERIORATION ANALYSIS - GPSSD

Bonnie Brown Jacobson, Northeast Utilities Service Company

Most utilities use load research data collected from load research studies of various subgroups of their customer population. It is hoped that the results of these studies will yield accurate profiles of the demand patterns for these subgroups for use in ratemaking, forecasting and load management.

This program is a generalized routine for calculating the sample size required to satisfy the confidence limits of 90% ± 10% and 95% ± 5% for five separate sampling designs, all of which are currently utilized load research designs. Each design is further explored through the calculation of the approximate deterioration of data for each stratum. Summary tables are also generated for quick reference.

The sample sizes required for the appropriate operation of each of the following sample designs are automatically calculated in GPSSD, based on the assumption that the population size is large.

OVERALL SRS:

This design requires a simple random sample to be drawn from the total population without regard to strata boundaries. This would yield data relating to the mean that are statistically reliable (for the chosen confidence limits) for the total population. No estimate of reliability can be made for any individual stratum before the sample is chosen. The formula used for the sample size is:

$$n = \left( \frac{t\,\sigma}{e} \right)^2$$

Where:
   σ is the standard deviation of the population.
   t is the Student's t-value associated with the desired confidence interval.
   e is the percentage of the mean relating to the fiducial limits (for example, 10% of the mean).

SRS WITHIN STRATA:

This design requires a separate simple random sample to be drawn from each of the pre-assigned strata. The design would yield data relating to the mean that are statistically reliable (for the chosen confidence limits) for the individual strata, as well as a more rigorous reliability for the total population. The formula for the sample size is:

$$n = \sum_{i=1}^{j} \left( \frac{t\,\sigma_i}{e_i} \right)^2$$

Where:
   j is the total number of strata.
   t is the Student's t-value associated with the desired confidence interval.
   σ_i is the population standard deviation for the ith stratum.
   e_i is the percentage of the mean for the ith stratum relating to the fiducial limits.

OPTIMUM ALLOCATION:

This design requires a simple random sample to be drawn within each stratum, utilizing the most cost effective method for acquiring data relating to the population mean. Thus, the design yields statistically reliable data (for the chosen confidence limits) relating to the mean of the population. An estimate of reliability is made by the program for the individual stratum, but this reliability is for the most part much more conservative than the overall population mean reliability. The formula for the sample size is:

$$n = \frac{\left( \sum_{i=1}^{j} N_i \sigma_i \right)^2}{N^2 e^2 / t^2 + \sum_{i=1}^{j} N_i \sigma_i^2}$$

Where:
   j is the total number of strata.
   N_i is the population total within the ith stratum.
   N is the total population.
   σ_i is the population standard deviation for the ith stratum.
   t is the Student's t-value associated with the desired confidence interval.
   e is the percentage of the population mean relating to the fiducial limits.

SRS WITHIN STRATA-STABLE:

This design is identical to the aforementioned SRS within strata, except that the sample size is raised to 30 for any stratum whose calculated sample size is less than 30. This has been determined to be the least number of load research meters that allow a stable result. Thus, the sample yields statistically reliable data overall and per stratum.

OPTIMUM ALLOCATION-STABLE:

Similar to the above design, this design is identical to the aforementioned optimum allocation. Again, the stratum sample size is replaced by 30 if it is calculated to be less than 30. This design yields statistically reliable data overall and within strata (although somewhat more conservative than its SRS counterpart).
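The sample size calculations for the designs above can be sketched in Python. This is not the GPSSD source, only an illustration under stated assumptions: the function names are hypothetical, the formulas are the textbook forms from Cochran's Sampling Techniques, and three details (rounding each size up, capping a stratum sample at the stratum population, and keeping the finite population term in the optimum allocation) are inferred from the sample output rather than stated in the paper. The inputs are the MAX._KW and ANN_KWH population figures printed in the sample output.

```python
import math

# Each stratum is (population size, mean, standard deviation).
# Figures are the ANN_KWH rows of the paper's sample output.
ANN_KWH = [
    (4025, 140437.0, 82115.0),
    (2526, 256701.0, 146655.0),
    (1573, 552468.0, 753941.0),
    (1064, 1027734.0, 520079.0),
    (648, 2073734.0, 1073912.0),
]


def srs_size(mean, std, t, pct):
    """Large-population SRS size: n = (t * std / e)^2 with e = pct * mean."""
    e = pct * mean
    return math.ceil((t * std / e) ** 2)  # rounding up is an assumption


def srs_within_strata(strata, t, pct):
    """One SRS size per stratum, capped at the stratum population (assumption)."""
    return [min(srs_size(mean, std, t, pct), n_pop) for n_pop, mean, std in strata]


def optimum_allocation(strata, pop_mean, t, pct):
    """Neyman allocation with the finite population term (assumed, after Cochran)."""
    total_pop = sum(n_pop for n_pop, _, _ in strata)
    e = pct * pop_mean
    sum_n_sigma = sum(n_pop * std for n_pop, _, std in strata)
    sum_n_sigma2 = sum(n_pop * std * std for n_pop, _, std in strata)
    n = sum_n_sigma ** 2 / (total_pop ** 2 * (e / t) ** 2 + sum_n_sigma2)
    # Split n in proportion to N_i * sigma_i and round each share up.
    return [math.ceil(n * n_pop * std / sum_n_sigma) for n_pop, _, std in strata]


def stabilize(sizes, minimum=30):
    """The "stable" designs: raise any stratum sample below 30 meters to 30."""
    return [max(size, minimum) for size in sizes]


if __name__ == "__main__":
    # Overall SRS on MAX._KW (mean 169, standard deviation 166).
    print(srs_size(169.0, 166.0, 1.645, 0.10), srs_size(169.0, 166.0, 1.960, 0.05))
    print(srs_within_strata(ANN_KWH, 1.645, 0.10))
    print(optimum_allocation(ANN_KWH, 459537.0, 1.645, 0.10))
    print(stabilize(optimum_allocation(ANN_KWH, 459537.0, 1.645, 0.10)))
```

With these inputs the sketch gives 262 and 1483 meters for the overall SRS on MAX._KW, and a 14/15/48/23/29 split (129 meters) for optimum allocation at 90% ± 10% on ANN_KWH, which agrees with the totals in the sample output.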
PROC SPECIFICATIONS:

GPSSD is invoked by a PROC statement and controlled by the following other statements:

   PROC GPSSD options;
   MEAN variable;
   STD variable;
   N variable;
   DESC variable;
   SUBDESC variable;
   STRDES variable;

THE PROC GPSSD STATEMENT:

These options may appear in the PROC GPSSD statement:

STRATA = n or STRAT = n or S = n: specifies the number of strata desired for all sample designs, between 1 and 5. If the STRATA = option is omitted from the PROC statement, the default strata value is 5. If more than 5 strata are required, run the PROC several times.

DATA = data_set: specifies the SAS data set containing the population parameters to be utilized in the sample design. If the DATA = option is omitted from the PROC statement, the most recently created SAS data set is used. See the section concerning the input data set for the specific data set format.

THE MEAN STATEMENT:

   MEAN variable;

The MEAN statement identifies the variable name containing the population mean numbers. This statement is required.

THE STD STATEMENT:

   STD variable;

The STD statement identifies the variable name containing the population standard deviation numbers. This statement is required.

THE N STATEMENT:

   N variable;

The N statement identifies the variable name containing the population size numbers. This statement is required.

THE DESC STATEMENT:

   DESC variable;

The DESC statement identifies the variable name containing the population description. This statement is required.

THE SUBDES STATEMENT:

   SUBDES variable;

The SUBDES statement identifies the variable name containing the subpopulation description. If the SUBDESC statement is omitted, the subpopulation description is equal to the population description.

THE STRDES STATEMENT:

   STRDES variable;

The STRDES statement identifies the variable name containing the stratification variable description. This statement is required.

THE INPUT DATA SET:

The input data set must contain both the total population data and each of the population stratum data and, optionally, the total subpopulation data and each of the subpopulation stratum data. The input data set must take the following general form:

   Line 1: (Total population) mean, standard deviation, size, label, stratification variable label, (optionally) subpopulation label.
   Line 2: (Stratum 1) mean, standard deviation, size, label, stratification variable label, (optionally) subpopulation label.

Line 2 is repeated (up to 4 times), one line per stratum.

This general population information may optionally be followed by any subpopulation information. Up to 4Y subpopulation data groups may follow. The form is the same as the population (i.e., the first line is the general subpopulation data, followed by lines of stratum data). All results from these groups will be weighted estimates which simulate probable collection data. The order of the information in each line is arbitrary. Any value found missing is treated as a "zero" and the computation of sample sizes proceeds accordingly.

OUTPUT:

The first section of output indicates the required size of the sample for the overall SRS (90% ± 10%, 95% ± 5%), the SRS within strata (90% ± 10%, 95% ± 5%) and the optimum allocation (90% ± 10%, 95% ± 5%). The SRS within strata and optimum allocation are further explored. Each stratum sample size is listed with the expected standard error of the mean and the standard error as a percentage of the mean for each stratified sample design. This information is also printed for the total sample for each stratified design. With this information, the analyst can determine the likely accuracy of the data to be collected from each design.

The second section repeats this stratified information (since the weight for each stratum with regard to the original stratified population is equal to 1). If, for any of the above stratified designs, the stratum sample size is less than 30, it is raised to 30 and re-evaluated.
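The per-stratum accuracy figures and the 30-meter re-evaluation described in the OUTPUT section can be illustrated with a short Python sketch. The function names are hypothetical and the code is not taken from GPSSD. It assumes the standard error is the plain sigma over root n with no finite population correction, which is what the printed values appear to follow, and that a stratum sampled in full reports zero error, as the sample output appears to do.

```python
import math


def t_se_pct(mean, std, n, t, n_pop=None):
    """t times the standard error of the mean, as a percentage of the mean.

    Assumes SE = std / sqrt(n) with no finite population correction.
    If the sample covers the whole stratum, the error is taken as zero.
    """
    if n_pop is not None and n >= n_pop:
        return 0.0
    return 100.0 * t * std / math.sqrt(n) / mean


def reevaluate(mean, std, n, t, minimum=30):
    """Raise a stratum sample below the minimum to 30 meters and re-evaluate it."""
    stable_n = max(n, minimum)
    return stable_n, t_se_pct(mean, std, stable_n, t)


if __name__ == "__main__":
    # MAX._KW stratum 1 from the sample output: mean 65, standard deviation 10.
    print(round(t_se_pct(65.0, 10.0, 7, 1.645), 3))   # 7 meters at 90% +/- 10%
    print(reevaluate(65.0, 10.0, 7, 1.645))           # after raising to 30 meters
```

For MAX._KW stratum 1, seven meters give t times the standard error of about 9.565% of the mean, and raising the sample to 30 meters tightens that to about 4.621%, the same figures the sample output prints for that stratum.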
These analyses yield sample sizes for the SRS within strata-stable (90% ± 10%, 95% ± 5%) and the optimum allocation-stable (90% ± 10%, 95% ± 5%) designs.

If additional subpopulation and/or stratification variable data cards are included in the input data, the sample design process is repeated. Since all evaluation is done in relation to the original strata boundaries, these analyses can at best be only estimates of the accuracy of the data to be collected.

The first section of the optional data analyses is unweighted. It is only intended to give the analyst an idea of the sample size required if the subpopulation or new variable were the original. The second section of the optional data analyses reflects the potential behavior of the subpopulation and/or new variable within the original population-stratification variable framework. All sample sizes are weighted to simulate the true sample size applicable to the optional situation. The analyst should keep in mind that this information is an estimate of variable accuracy.

Following the optional analyses is a summary table for each variable and population/subpopulation situation. Listed are the required sample sizes for each stratum within each sample design. Also listed is the value of 100 minus t times the standard error (expressed as a percentage of the mean) for the "best" and "worst" strata. This quantity is relative to the mean in that it is a measure of how close to the true mean the stratum result should be. The best possible stratum would result in a value of 100.00. Evaluation should be based on this standard.

The final summary table lists the overall standard error times t, expressed as a percentage of the mean, for each sample design and for each variable and population/subpopulation situation.

ACKNOWLEDGEMENTS

I would like to thank the following people for their help in the development of both the PROC and this paper: Ms. Karen E. Greeley, Mr. James D. Oleksiw, Mrs. Jean H. Ehle.

REFERENCES

Cochran, W. G., Sampling Techniques, Third Edition, c. 1977, John Wiley & Sons, Inc., New York, New York.

Kish, L., Survey Sampling, c. 1965, John Wiley & Sons, Inc., New York, New York.

For more information, please contact Bonnie B. Jacobson, Consumer Economics, Northeast Utilities Service Company, P. O. Box 270, Hartford, CT 06101 or call (203) 666-6911, Ext. 5030.

SAMPLE OUTPUT

POPULATION: INT._COMMERCIAL
SUBSTRATIFICATION VARIABLE: MAX._KW
SUBPOPULATION: INT._COMMERCIAL

                 POPULATION     POP MEAN      POP STD    %(90)    %(95)
UNSTRATIFIED           9839      169.000      166.000     .100     .050
STRATA #1              4026       65.000       10.000     .100     .050
STRATA #2              2528      111.000       18.000     .100     .050
STRATA #3              1573      198.000       33.000     .100     .050
STRATA #4              1064      349.000       57.000     .100     .050
STRATA #5               648      671.000      152.000     .100     .050

UNSTRATIFIED SAMPLE SIZE WITH T-VALUE = 1.645:  262
UNSTRATIFIED SAMPLE SIZE WITH T-VALUE = 1.960: 1483

STRATA ACCURACY, SAMPLE WITH T-VALUE = 1.645

                 ------------ SRS -------------   ----- OPTIMUM ALLOCATION -----
                 SAMPLE  T * STD ERR      % OF    SAMPLE  T * STD ERR      % OF
                   SIZE      OF MEAN  POP MEAN      SIZE      OF MEAN  POP MEAN
STRATA #1             7          6.2     9.565         2         11.6    17.895
STRATA #2             8         10.5     9.431         2         20.9    18.863
STRATA #3             8         19.2     9.693         2         38.4    19.387
STRATA #4             8         33.2     9.499         2         66.3    18.998
STRATA #5            14         66.6     9.959         3        144.4    21.514
TOTAL SAMPLE         45          7.4     4.353        11         14.9     8.842

STRATA ACCURACY, SAMPLE WITH T-VALUE = 1.960

                 ------------ SRS -------------   ----- OPTIMUM ALLOCATION -----
                 SAMPLE  T * STD ERR      % OF    SAMPLE  T * STD ERR      % OF
                   SIZE      OF MEAN  POP MEAN      SIZE      OF MEAN  POP MEAN
STRATA #1            37          3.2     4.957         7          7.4    11.397
STRATA #2            41          5.5     4.964         8         12.5    11.237
STRATA #3            43          9.9     4.982         9         21.6    10.889
STRATA #4            41         17.4     4.999        10         35.3    10.123
STRATA #5            79         33.5     4.995        17         72.3    10.768
TOTAL SAMPLE        241          3.7     2.187        51          8.2     4.869

TOTAL SAMPLE ACCURACY: T * STANDARD ERROR OF MEAN, % OF TOTAL POPULATION MEAN

EVALUATION BASED ON NUMBER OF METERS FROM ORIGINAL STRATIFICATION VARIABLE ANALYSIS (WEIGHTED)

SAMPLE WITH T-VALUE = 1.645

                 ------------ SRS -------------   ----- OPTIMUM ALLOCATION -----
                 SAMPLE  T * STD ERR      % OF    SAMPLE  T * STD ERR      % OF
                   SIZE      OF MEAN  POP MEAN      SIZE      OF MEAN  POP MEAN
STRATA #1             7          6.2     9.565         2         11.6    17.895
STRATA #2             8         10.5     9.431         2         20.9    18.863
STRATA #3             8         19.2     9.693         2         38.4    19.387
STRATA #4             8         33.2     9.499         2         66.3    18.998
STRATA #5            14         66.6     9.959         3        144.4    21.514
TOTAL SAMPLE         45          7.4     4.353        11         14.9     8.842

SAMPLE WITH T-VALUE = 1.960

                 ------------ SRS -------------   ----- OPTIMUM ALLOCATION -----
                 SAMPLE  T * STD ERR      % OF    SAMPLE  T * STD ERR      % OF
                   SIZE      OF MEAN  POP MEAN      SIZE      OF MEAN  POP MEAN
STRATA #1            37          3.2     4.957         7          7.4    11.397
STRATA #2            41          5.5     4.964         8         12.5    11.237
STRATA #3            43          9.9     4.982         9         21.6    10.889
STRATA #4            41         17.4     4.999        10         35.3    10.123
STRATA #5            79         33.5     4.995        17         72.3    10.768
TOTAL SAMPLE        241          3.7     2.187        51          8.2     4.869

EVALUATION WITH MINIMUM 30 METERS/STRATA (WEIGHTED)

SAMPLE WITH T-VALUE = 1.645

                 ------------ SRS -------------   ----- OPTIMUM ALLOCATION -----
                 SAMPLE  T * STD ERR      % OF    SAMPLE  T * STD ERR      % OF
                   SIZE      OF MEAN  POP MEAN      SIZE      OF MEAN  POP MEAN
STRATA #1            30          3.0     4.621        30          3.0     4.621
STRATA #2            30          5.4     4.870        30          5.4     4.870
STRATA #3            30          9.9     5.006        30          9.9     5.006
STRATA #4            30         17.1     4.905        30         17.1     4.905
STRATA #5            30         45.7     6.803        30         45.7     6.803
TOTAL SAMPLE        150          4.0     2.354       150          4.0     2.354

SAMPLE WITH T-VALUE = 1.960

                 ------------ SRS -------------   ----- OPTIMUM ALLOCATION -----
                 SAMPLE  T * STD ERR      % OF    SAMPLE  T * STD ERR      % OF
                   SIZE      OF MEAN  POP MEAN      SIZE      OF MEAN  POP MEAN
STRATA #1            37          3.2     4.957        30          3.6     5.505
STRATA #2            41          5.5     4.964        30          6.4     5.803
STRATA #3            43          9.9     4.982        30         11.8     5.964
STRATA #4            41         17.4     4.999        30         20.4     5.844
STRATA #5            79         33.5     4.995        30         54.4     8.106
TOTAL SAMPLE        241          3.7     2.187       150          4.7     2.804

POPULATION: INT._COMMERCIAL
SUBSTRATIFICATION VARIABLE: ANN_KWH
SUBPOPULATION: INT._COMMERCIAL

                 POPULATION       POP MEAN        POP STD    %(90)    %(95)
UNSTRATIFIED           9836     459537.000     681517.000     .100     .050
STRATA #1              4025     140437.000      82115.000     .100     .050
STRATA #2              2526     256701.000     146655.000     .100     .050
STRATA #3              1573     552468.000     753941.000     .100     .050
STRATA #4              1064    1027734.000     520079.000     .100     .050
STRATA #5               648    2073734.000    1073912.000     .100     .050

UNSTRATIFIED SAMPLE SIZE WITH T-VALUE = 1.645: [illegible]
UNSTRATIFIED SAMPLE SIZE WITH T-VALUE = 1.960: 3380

STRATA ACCURACY, SAMPLE WITH T-VALUE = 1.645

                 ------------- SRS --------------   ------ OPTIMUM ALLOCATION ------
                 SAMPLE   T * STD ERR       % OF    SAMPLE   T * STD ERR       % OF
                   SIZE       OF MEAN   POP MEAN      SIZE       OF MEAN   POP MEAN
STRATA #1            93       14007.1      9.974        14       36101.4     25.706
STRATA #2            89       25572.2      9.962        15       62289.9     24.266
STRATA #3           504       55244.4     10.000        48      179012.2     32.402
STRATA #4            70      102255.4      9.950        23      178390.3     17.358
STRATA #5            73      206763.2      9.971        29      328046.7     15.819
TOTAL SAMPLE        829       16601.4      3.613       129       45566.8      9.916

STRATA ACCURACY, SAMPLE WITH T-VALUE = 1.960

                 ------------- SRS --------------   ------ OPTIMUM ALLOCATION ------
                 SAMPLE   T * STD ERR       % OF    SAMPLE   T * STD ERR       % OF
                   SIZE       OF MEAN   POP MEAN      SIZE       OF MEAN   POP MEAN
STRATA #1           526        7017.6      4.997        68       19517.5     13.898
STRATA #2           502       12829.2      4.998        76       32972.1     12.845
STRATA #3          1573           0.0      0.0         243       94796.1     17.159
STRATA #4           394       51354.4      4.997       114       95471.3      9.289
STRATA #5           413      103573.7      4.995       143      176017.8      8.488
TOTAL SAMPLE       3408        5895.4      1.283       644       22947.8      4.994

EVALUATION BASED ON NUMBER OF METERS FROM ORIGINAL STRATIFICATION VARIABLE ANALYSIS (WEIGHTED)

SAMPLE WITH T-VALUE = 1.645

                 ------------- SRS --------------   ------ OPTIMUM ALLOCATION ------
                 SAMPLE   T * STD ERR       % OF    SAMPLE   T * STD ERR       % OF
                   SIZE       OF MEAN   POP MEAN      SIZE       OF MEAN   POP MEAN
STRATA #1             7       51055.1     36.354         2       95515.4     68.013
STRATA #2             8       85293.9     33.227         2      170587.8     66.454
STRATA #3             8      438488.6     79.369         2      876977.6    158.738
STRATA #4             8      302475.6     29.431         2      604951.3     58.863
STRATA #5            14      472139.9     22.768         3     1019938.7     49.184
TOTAL SAMPLE         45       77827.7     16.936        11      157965.4     34.375

SAMPLE WITH T-VALUE = 1.960

                 ------------- SRS --------------   ------ OPTIMUM ALLOCATION ------
                 SAMPLE   T * STD ERR       % OF    SAMPLE   T * STD ERR       % OF
                   SIZE       OF MEAN   POP MEAN      SIZE       OF MEAN   POP MEAN
STRATA #1            37       26459.3     18.841         7       60831.6     43.316
STRATA #2            41       44891.2     17.488         8      101626.7     39.590
STRATA #3            43      225350.8     40.790         9      492574.7     89.159
STRATA #4            41      159196.4     15.490        10      322348.3     31.365
STRATA #5            79      236816.1     11.420        17      510505.3     24.618
TOTAL SAMPLE        241       39251.6      8.542        51       87051.5     18.943

EVALUATION WITH MINIMUM 30 METERS/STRATA (WEIGHTED)

SAMPLE WITH T-VALUE = 1.645

                 ------------- SRS --------------   ------ OPTIMUM ALLOCATION ------
                 SAMPLE   T * STD ERR       % OF    SAMPLE   T * STD ERR       % OF
                   SIZE       OF MEAN   POP MEAN      SIZE       OF MEAN   POP MEAN
STRATA #1            30       24662.0     17.561        30       24662.0     17.561
STRATA #2            30       44045.6     17.158        30       44045.6     17.158
STRATA #3            30      226434.6     40.986        30      226434.6     40.986
STRATA #4            30      156197.7     15.198        30      156197.7     15.198
STRATA #5            30      322532.9     15.553        30      322532.9     15.553
TOTAL SAMPLE        150       42163.7      9.175       150       42163.7      9.175

SAMPLE WITH T-VALUE = 1.960

                 ------------- SRS --------------   ------ OPTIMUM ALLOCATION ------
                 SAMPLE   T * STD ERR       % OF    SAMPLE   T * STD ERR       % OF
                   SIZE       OF MEAN   POP MEAN      SIZE       OF MEAN   POP MEAN
STRATA #1            37       26459.3     18.841        30       29384.5     20.924
STRATA #2            41       44891.2     17.488        30       52479.8     20.444
STRATA #3            43      225350.8     40.790        30      269794.3     48.834
STRATA #4            41      159196.4     15.490        30      186107.9     18.109
STRATA #5            79      236816.1     11.420        30      384294.5     18.532
TOTAL SAMPLE        241       39251.6      8.542       150       50237.5     10.932

DETERIORATION OF STRATA

POPULATION: INT._COMMERCIAL
SUBSTRATIFICATION VARIABLE: MAX._KW
SUBPOPULATION: INT._COMMERCIAL
STRATA = 5

                           #METERS FOR STRATA                 100% - (T * S.E.) (AS % OF MEAN)
DESIGN                    1     2     3     4     5   TOTALS    BEST STRATA    WORST STRATA
90% 10%  SRS              7     8     8     8    14       45          90.57           90.04
95%  5%  SRS             37    41    43    41    79      241          95.04           95.00
90% 10%  OPT              2     2     2     2     3       11          82.10           78.49
95%  5%  OPT              7     8     9    10    17       51          89.88           88.60
90% 10%  OVERALL SRS                                     262
95%  5%  OVERALL SRS                                    1483
90% 10%  SRS-STABLE      30    30    30    30    30      150          95.38           93.20
95%  5%  SRS-STABLE      37    41    43    41    79      241          95.04           95.00
90% 10%  OPT-STABLE      30    30    30    30    30      150          95.38           93.20
95%  5%  OPT-STABLE      30    30    30    30    30      150          94.49           91.89

DETERIORATION OF STRATA

POPULATION: INT._COMMERCIAL
SUBSTRATIFICATION VARIABLE: ANN_KWH
SUBPOPULATION: INT._COMMERCIAL
STRATA = 5

                           #METERS FOR STRATA                 100% - (T * S.E.) (AS % OF MEAN)
DESIGN                    1     2     3     4     5   TOTALS    BEST STRATA    WORST STRATA
90% 10%  SRS              7     8     8     8    14       45          77.23           20.63
95%  5%  SRS             37    41    43    41    79      241          88.58           59.21
90% 10%  OPT              2     2     2     2     3       11          50.82           31.99
95%  5%  OPT              7     8     9    10    17       51          75.38           10.84
90% 10%  OVERALL SRS                                     262
95%  5%  OVERALL SRS                                    1483
90% 10%  SRS-STABLE      30    30    30    30    30      150          84.80           59.01
95%  5%  SRS-STABLE      37    41    43    41    79      241          88.58           59.21
90% 10%  OPT-STABLE      30    30    30    30    30      150          84.80           59.01
95%  5%  OPT-STABLE      30    30    30    30    30      150          81.89           51.17

OVERALL SUMMARY TABLE

POPULATION: INT._COMMERCIAL
T * S.E. AS % OF MEAN (BASED ON ORIGINAL METER ALLOCATIONS)

DESIGN                  MAX._KW   ANN_KWH
90% 10%  SRS              4.353    16.936     0.0     0.0     0.0     0.0
95%  5%  SRS              2.187     8.542     0.0     0.0     0.0     0.0
90% 10%  OPT              8.842    34.375     0.0     0.0     0.0     0.0
95%  5%  OPT              4.869    18.943     0.0     0.0     0.0     0.0
90% 10%  SRS STABLE       2.354     9.175     0.0     0.0     0.0     0.0
95%  5%  SRS STABLE       2.187     8.542     0.0     0.0     0.0     0.0
90% 10%  OPT STABLE       2.354     9.175     0.0     0.0     0.0     0.0
95%  5%  OPT STABLE       2.804    10.932     0.0     0.0     0.0     0.0
90% 10%  OVERALL SRS      9.982    15.072     0.0     0.0     0.0     0.0
95%  5%  OVERALL SRS      4.999     7.548     0.0     0.0     0.0     0.0
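The "best" and "worst" strata figures in the deterioration tables follow from the quantity described in the OUTPUT section, 100 minus t times the standard error as a percentage of the mean. A minimal Python sketch, assuming the per-stratum percentages are already available and using a hypothetical function name, is shown here. It does not attempt to reproduce how the program treats a stratum whose error exceeds 100% of the mean.

```python
def deterioration(t_se_pcts):
    """Return (best, worst) values of 100 - (t * S.E. as % of mean) across strata.

    A value of 100.00 would mean the stratum estimate sits exactly on the true mean.
    """
    scores = [100.0 - pct for pct in t_se_pcts]
    return round(max(scores), 2), round(min(scores), 2)


if __name__ == "__main__":
    # MAX._KW, SRS within strata at 90% +/- 10%, percentages from the sample output.
    srs_90 = [9.565, 9.431, 9.693, 9.499, 9.959]
    print(deterioration(srs_90))
```

For the MAX._KW SRS design at 90% ± 10%, the sketch returns 90.57 for the best stratum and 90.04 for the worst, matching the first row of the deterioration table.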