* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Random Sampling, Universe, and Extrapolation Definitions
History of statistics wikipedia , lookup
Bootstrapping (statistics) wikipedia , lookup
Taylor's law wikipedia , lookup
Confidence interval wikipedia , lookup
German tank problem wikipedia , lookup
Resampling (statistics) wikipedia , lookup
Sampling (statistics) wikipedia , lookup
12/4/2014 Random Sampling, Universe, and Extrapolation Presented by: Frank Cohen Director of Analytics and Business Intelligence DoctorsManagement, LLC [email protected] 727.442.9117 CMS Pub.100-08 Chapter 3 Section 10.2 • If a particular probability sample design is properly executed, i.e., defining the universe, the frame, the sampling units, using proper randomization, accurately measuring the variables of interest, and using the correct formulas for estimation, then assertions that the sample and its resulting estimates are “not statistically valid” cannot legitimately be made. In other words, a probability sample and its results are always “valid.” Definitions • Universe or Population – A collection of units being studied. Units can be beneficiaries, claims, claim lines, procedures, drugs, tests, etc. • Sample Frame – A sample frame is a collection of units from which a sample will be drawn. The data are normally homogenous and share similar characteristics • Sample – a finite part of a statistical population whose properties are studied to gain information about the whole 1 12/4/2014 Definitions • Parameter – Considers the characteristics of the data set • Statistic – A numerical value, such as standard deviation or mean, that characterizes the sample or population from which it was derived • EPSEM – Equal Probability of Selection Method Why Sample? • Because it is often impossible to collect 100% of the data on 100% of the population (or universe) – This is called a census • It is an efficient and inexpensive way to infer the statistics of a sample to the universe (or sample frame). What Size Sample? • Large enough to minimize sampling error and not so small that it no longer fairly represents the population in question • Too large a sample cost more money and resources without added benefits • Too small a sample creates too much error and renders the results useless 2 12/4/2014 Ideally . . . • The sample is representative of the qualities of the population – The sample has the same characteristics as the population • It is of sufficient size to satisfy the assumptions of the statistical techniques used in our analysis Sample Size Rules of Thumb • Any probability sample will have some inaccuracy, or sample error • The larger the sample, the smaller the error (not always a good thing) • The more homogenous the variables, the smaller the error (always a good thing) • Sample size determination is a fairly complex undertaking Why Random Sampling? • It eliminates bias in selecting units • It enables us to infer (extrapolate) the results to a larger population based on what is learned from sample results • It allows us to estimate sampling error, which is critical for extrapolation • Randomness does not guarantee representativeness, particularly if the population is biased 3 12/4/2014 Random Sampling Methods 1. Simple Random sample – Every unit has an equal (non-zero) chance of being selected • Selecting claims to study payer behavior 2. Stratified sample – Breaking the universe into homogenous sampling frames from which a homogenous sample is drawn • • • • Procedure type (E&M v. Surgical) Beneficiary type (age, sex, etc.) Diagnosis Paid amounts Stratified Sample Example Random Sampling Methods 3. Cluster sampling – Organizes the units into similar subsets – Two stage • Random sample of beneficiaries and then random sample of claims for each – Multi-stage • Random sample of beneficiaries from which we draw a random sample of claims from which we draw a random sample of claim lines 4 12/4/2014 Cluster Sampling Practice Beneficiary Beneficiary Claim Claim Claim Claim Claim Claim Claim Line Claim Line Claim Line Claim Line Claim Line Claim Line Claim Line Claim Line Claim Line Claim Line Claim Line Claim Line Claim Line Claim Line Claim Line Claim Line Claim Line Claim Line How Do I Randomize? • You can use a software program – RAT-STATS, MiniTab, Excel, SQL, etc. • You can systematize the sample – Every n unit, such as every 10th or 25th or 50th unit • You can sort by some variable (such as claim ID or claim code) How about Statistically Valid? • There is a difference between a sample being random and it being statistically valid • Random just means that every unit had an equal chance of being selected • Statistically valid has to do with the representativeness of the sample 5 12/4/2014 Representativeness Universe Sample Summary for Paid Sample Summary for Paid Universe A nderson-Darling Normality Test A nderson-Darling Normality Test $36.00 $45.00 $54.00 $63.00 $72.00 $81.00 $90.00 $99.00 A -Squared P-V alue < 1206.57 0.005 A -Squared P-V alue < 4.17 0.005 Mean StDev V ariance Skew ness Kurtosis N 83.637 14.409 207.609 -0.884458 -0.691987 15198 Mean StDev V ariance Skew ness Kurtosis N 68.567 10.990 120.791 -1.40512 0.63790 40 Minimum 1st Q uartile Median 3rd Q uartile Maximum 34.410 68.580 91.050 94.110 98.550 $48.00 $56.00 $64.00 $72.00 $80.00 95% C onfidence Interv al for Mean 83.408 65.052 95% C onfidence Interv al for Median 91.050 14.248 68.230 Mean Median Median $86.00 $88.00 $90.00 $92.00 72.082 73.410 95% C onfidence Interv al for StDev 9 5 % Confidence Inter vals 14.572 Mean $84.00 45.180 68.230 72.840 76.940 78.840 95% C onfidence Interv al for Median 91.050 95% C onfidence Interv al for StDev 9 5 % Confidence Inter vals Minimum 1st Q uartile Median 3rd Q uartile Maximum 95% C onfidence Interv al for Mean 83.867 $64.00 $66.00 $68.00 $70.00 9.003 $72.00 14.112 $74.00 Simple Random Sampling • SRS works well when the population is homogeneous & readily available • Each element of the frame thus has an equal probability of selection. • Each unit in the sampling frame is assigned some unique identifier Systematic Sampling • First, arrange the sampling frame (or population) using some ordering technique and select at regular intervals – i.e., every 4th or all odd or even • Start from a random position 6 12/4/2014 Stratified Sampling • When the population can be described by a number of different characteristic groups, the frame can be organized into separate "strata" • Each stratum can then sampled as an independent subpopulation, subject to SRS • Most often, the strata are sampled proportionate to the population • If done properly, it reduces variability and increases precision 19 Types of Stratified Samples • Proportionate – The sample for each stratum has the same distribution proportion as the universe • Disproportionate – The sample has a different percent distribution than the population Example of Multimodal Distribution 7 12/4/2014 Pre-Stratification Summary Report for Paid Anderson-Darling Normality Test A-Squared P-Value Mean StDev Variance Skewness Kurtosis N 11.66 <0.005 11852 11588 134270636 1.09406 0.00206 195 Minimum 1st Quartile Median 3rd Quartile Maximum 227 2403 7604 16509 38706 80% Confidence Interval for Mean 0 6000 12000 18000 24000 30000 10784 36000 12919 80% Confidence Interval for Median 5974 8902 80% Confidence Interval for StDev 10890 12407 80% Confidence Intervals Mean Median 5000 7500 10000 12500 Post-Stratification Summary for Payment Summary for Payment Stratum = 1 Stratum = 2 Anderson-Darling Normality Test A-Squared P-Value < Anderson-Darling Normality Test 1.82 Mean StDev 2598280.2 Skewness 0.577664 Kurtosis -0.366345 N Minimum 30000 60000 90000 120000 P-Value 2851.8 1611.9 Variance 0 A-Squared 0.005 Mean StDev Variance 94 1st Quartile 1502.5 Median 2402.6 3rd Quartile 4135.6 Maximum 6914.3 0 30000 60000 90000 120000 9 5 % Con f id en ce In tervals 2380.1 Mean 1409.8 2750 1882.2 9 5 % Con f id en ce In tervals 972.4 9000 Mean StDev Variance 0.51 A-Squared 0.181 P-Value < 15875 Mean 1817 StDev 3301474 Variance Skewness -0.29747 Kurtosis -1.26105 1st Quartile 30 N 12712 Minimum 14342 Median 16197 3rd Quartile 16554 Maximum 19974 0 30000 60000 90000 120000 95% Confidence I nterval for Mean 15196 14916 1447 15600 16200 41 26524 Median 32418 3rd Quartile 38655 Maximum 38706 29502 33453 95% Confidence I nterval for Median 9 5 % Con f id en ce In tervals 16509 2443 1st Quartile 20428 95% Confidence I nterval for Mean 16553 95% Confidence I nterval for StDev 15000 6257 0.380632 -0.155208 95% Confidence I nterval for Median Mean 1.25 0.005 31478 39154899 Skewness N Median 1641.4 Anderson-Darling Normality Test Kurtosis Minimum 120000 10023.0 9600 Stratum = 4 A-Squared 90000 9661.0 Summary for Payment P-Value 60000 10098.2 11152.2 95% Confidence I nterval for StDev 8400 Anderson-Darling Normality Test 9 5 % Con f id en ce In tervals 8232.9 9079.3 3rd Quartile Maximum 8368.1 Mean Stratum = 3 30000 30 7511.7 95% Confidence I nterval for Median Median 3000 Summary for Payment 0 1st Quartile Median 8749.1 3164.8 95% Confidence I nterval for StDev 2500 -1.25596 95% Confidence I nterval for Mean 3181.9 95% Confidence I nterval for Median Median 0.16700 Kurtosis Minimum 95% Confidence I nterval for Mean 2521.6 0.61 0.101 9205.0 1221.0 1490753.7 Skewness N 226.8 28828 Mean 36023 95% Confidence I nterval for StDev Median 5137 30000 32500 8006 35000 Certainty Stratum • The statistical reason for selecting a certainty stratum is to capture and isolate the largest unit values so that their extremely large values do not influence sampling variability • This is a great way to deal with outliers • Certainty strata are not part of the extrapolation calculation but rather the face value is added on to the total 8 12/4/2014 Sampling Bias • Any event that causes one or more variables within a population to have a different chance of selection • This can lead to over or under representation of a group of variables • Bias isn’t always bad The Sampling Process 1. 2. 3. 4. 5. 6. 7. Define the population of interest Create a sampling frame Determine the sampling method Calculate the sample size Sample the data Analyze the results Infer to the population of interest Types of Appraisal Methods • Variable Appraisal – To measure a quantitative characteristic such as the dollar amount per claim, line or beneficiary – Continuous variable • Attribute Appraisal – to determine the number of items that meet a given set of criteria, such as the proportion of lines with improper modifier usage – Proportion or ratio (like a percentage of error) 9 12/4/2014 Sampling Units • Populations – All patients for a given provider • Beneficiaries – Claims for individual beneficiaries • Claim – Billable unit • Claim Line – Subset of a claim Sample Error • Sample error is an estimate of the potential error (or precision) the results have in relation to the population (or universe) • Most often, sample error is measured by confidence intervals Calculating Sample Error • For a variable appraisal, sample error is calculated as the standard deviation divided by the square root of the sample size • For an attribute appraisal, sample error is calculated as the square root of the proportion times 1- proportion, all divided by the sample size 10 12/4/2014 Example of SE • A sample of average charges for 99213 was taken from 50 practices in a given area • Mean = $82.40 and STDev = $15.55 – Assume normal distribution – 68.26% of values between $66.85 and $97.95 • • • • SE = Stdev/sqrt(N), or 15.55/sqrt(50), or 15.55/7.07 = 2.2 The standard error for our estimate of the mean of $82.40 is $2.20 Our precision is around 2.6% (pretty good!) 31 What is Margin of Error? • The Margin of error is a range of error that is based on the confidence interval we select. • The higher the CI, the larger the scores and the wider the margin of error – 99% = 2.576 – 95% = 1.96 – 90% = 1.645 – 80% = 1.28 Calculating the Margin of Error • The margin of error is a standard score times the standard error – Score values depend on how wide or narrow you want the margin of error to be – The higher the value, the higher the margin of error • Sample of 50, 95% margin of error – Mean = 82.40, stdev = 15.55, SE = 2.20 – Margin of error = score times SE • ½ Interval = 1.96 * 2.20 = 4.31 33 11 12/4/2014 What is a Confidence Interval (CI)? • The purpose of a confidence interval is to validate a point estimate; it tells us how far off our estimate is likely to be • A confidence interval specifies a range of values within which the unknown population parameter may lie – Normal CI values are 90, 95%, 99% and 99.9% • The width of the interval gives us some idea as to how uncertain we are about an estimate – A very wide interval may indicate that more data should be collected before anything very definite can be inferred from the data 34 Confidence Interval Example Note that in only six of the 100, the mean was not within the range of the upper and lower bound. Ideally, this should have been five, but it’s pretty close! Calculating the Confidence Interval • Using our average charge example: – Mean = 82.40, stdev = 15.55, SE = 2.20, ME = 4.42 – CI = 82.40 +/- 4.42, or – 95% CI = $77.98 to $86.82 • If I were to take 100 samples, in 95 of them the actual point estimate would be somewhere between $77.98 and $86.82 36 12 12/4/2014 Why is this Important? • In an audit situation, we want to be able to estimate precision – OIG states that precision should be no worse than 25% using a 80% confidence interval • In an extrapolation, we want to use sample error to our advantage – Most commonly, the extrapolation uses a point estimate minus the ½ interval of a 90% confidence interval Creating a Sample for Review • It is not necessary (and often ill-advised) to create a SVRS for an internal review – Obligates you to extrapolate the overpayments • Use nonprobability sampling – Convenience sampling – Quota sampling – Purposive sampling Nonprobability Sampling • Convenience samples – Selecting units that are easy and accessible – i.e., the last five encounters • Quota sampling – A specific quota is established and you choose any unit you want until the quota is met • Purposive sampling – This involves choosing the units for the sample that you think are most appropriate 13 12/4/2014 For More Information Frank D Cohen www.frankcohengroup.com [email protected] 727.442.9117 The Toolbox www.frankcohengroup.com Library>=Toolboxes Click on the link for the Post-Audit toolbox Password to unzip is 11417028 CEU Index #: 38881BYC 14