Statistical Sampling
Overview and Principles
Alvin Binns
205-220-4522
[email protected]
Scenario
• Provider X is identified as billing for an
excessive number of ambulance services. A
decision is made to pull all of the provider's
ambulance services for a specified two-year period.
Results:
- 3,000 claims, 7,000 lines, and $1.8M
in payments.
Reasons for Sampling
• Time
• Cost
• Available resources
• Available staff
What is Sampling?
• Sampling is the selection of observations
to acquire some knowledge of a statistical
universe (population).
• From the characteristics of samples, we can
infer the characteristics of universes, if the
sample is representative of the universe.
How Do I Get a Representative Sample?
• In order for statistics to be good estimates
of parameters, they must, on average, return
the value of the universe parameter
• When the expected value of a statistic
equals a universe parameter, we call the
statistic an unbiased estimator of that
universe parameter
How Do I Get a Representative Sample?
• How do you ensure that your statistic is an
unbiased estimator?
RANDOMIZATION!!!
Randomization
• A sample that is randomly selected from a
universe yields sample statistics that are
unbiased estimates of the universe
parameters
• Many software packages, such as SAS and
RAT-STATS, have a valid random number
generator
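The random selection described above can be sketched in a few lines. This is a minimal illustration only: it uses Python's standard library generator rather than SAS or RAT-STATS, and the frame of 3,000 claim numbers, the sample size of 100, and the seed value are all hypothetical.

```python
import random

def draw_simple_random_sample(frame, n, seed):
    """Draw n sampling units without replacement from the frame."""
    # Recording the seed lets the same selection be re-created later.
    rng = random.Random(seed)
    return sorted(rng.sample(frame, n))

# Hypothetical sampling frame: claim sequence numbers 1 through 3000.
frame = list(range(1, 3001))
sample = draw_simple_random_sample(frame, n=100, seed=20080930)
print(sample[:5])
```

Keeping the seed together with the frame is a design choice that makes the draw repeatable by anyone who later needs to check it.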
Probability Sampling
• Another idea behind random sampling is
that each sampling unit has a known
probability of being selected
Sampling Terms
• Universe: The set of events or things of interest
that the researcher wishes to investigate.
E.g., all Medicare beneficiaries who received a left
heart catheterization from Dr. John Doe between
January 1, 2007 and June 30, 2008, paid up to
September 30, 2008.
Sampling Terms
• Samples are usually drawn by taking a
subset of sampling units from the total
universe
• Sampling units are non-overlapping
collections of elements from the universe
that together cover the entire universe
(e.g., claims, beneficiaries)
Estimation
• We can infer the values of the universe
from the sample by the use of estimation
• Ideally, we would like to gather information
from the sample and then estimate that
value for the entire universe
• These estimates calculated from the sample
data are called statistics
Statistics
ENTITY:          Sample    → estimates → Census
CHARACTERISTIC:  Statistic → estimates → Parameter
Estimation
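• In a simple random sample where we had
sampled 100 units out of 1000, suppose we had a
$5,000 total overpayment from the sample
• The Mean Total Overpayment would then be:
\[ \hat{op} = N \bar{y}_t = 1000 \left( \frac{5000}{100} \right) = \$50{,}000 \]
where N is the number of units in the universe and \bar{y}_t is the
mean overpayment per sampled unit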
Why Should I Care?
• HCFA Ruling 86-1 allows the use of
statistical sampling for the purpose of
estimating a provider’s overpayment to the
Medicare trust fund
• Thus, we can use sampling to estimate
overpayments to providers and avoid having
to review the entire universe!
CMS Sampling Guidelines
• CMS guidelines for Statistical Sampling for
Overpayment Estimation
• Program Integrity Manual Section 3.10
• Some of the issues addressed are:
– Methodologies
– Sample Size
– Estimation techniques
Sampling Guidelines
• This replaces and clarifies (for older cases) the old
HCFA Sampling Guidelines Appendix (CR 1363)
– “This program memorandum (PM) provides clarified
guidance and direction for Medicare carriers to use
when conducting statistical sampling for overpayment
estimation. The attached replaces the prior Sampling
Guidelines Appendix for reviews conducted after
issuance of this PM. For reviews conducted prior to
this issuance, the attached are a clarification to aid
interpretation of the earlier instructions, particularly
where specific numbers are suggested”
Sampling Methodologies
• Simple Random Sampling
• Cluster Sampling
• Stratified Sampling
• Other Methodologies
Simple Random Sampling
• This is the most straightforward method of
sampling
• X number of sampling units are randomly
selected from Y total sampling units in the
Universe
• Each sampling unit has an equal probability
of being selected
Cluster Sampling
• A cluster sample is a probability sample in
which each sampling unit is a collection, or
cluster, of elements
• A good example is the random selection of
beneficiaries, then selecting all relevant
claims from each beneficiary
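The beneficiary example can be sketched as follows. The claim records, field names, and seed are hypothetical; the point is only that the random draw is made over beneficiaries (the sampling units), and every claim belonging to a chosen beneficiary is then pulled.

```python
import random

def draw_cluster_sample(claims, n_beneficiaries, seed):
    """Randomly pick beneficiaries, then keep every claim that belongs to them."""
    rng = random.Random(seed)
    # The sampling units are beneficiaries, so the frame lists each one once.
    frame = sorted({claim["beneficiary"] for claim in claims})
    chosen = set(rng.sample(frame, n_beneficiaries))
    sampled_claims = [claim for claim in claims if claim["beneficiary"] in chosen]
    return chosen, sampled_claims

# Hypothetical universe: 6 beneficiaries (A to F) with 1 to 3 claims each.
claims = [
    {"claim_id": claim_id, "beneficiary": beneficiary}
    for claim_id, beneficiary in enumerate("AABCCCDEEF", start=1)
]
chosen, sampled_claims = draw_cluster_sample(claims, n_beneficiaries=3, seed=7)
print(sorted(chosen), len(sampled_claims))
```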
Stratified Random Sampling
• A stratified random sample is one obtained by
separating the universe elements into nonoverlapping groups, called strata, and then
selecting a simple random sample from each
stratum
• An example of this would be samples involving
multiple procedure codes, selecting simple random
samples from each code
• Stratified random sampling generally has less
sampling variability than other sampling designs
Stratified Random Sampling –
Proportional Allocation Example
Procedure code   Universe units   Sample units
99211            50               5
99212            150              15
99213            300              30
99214            450              45
99215            50               5
Total            1000             100
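The allocation in this example gives each procedure code the same share of the sample that it has of the universe. A minimal sketch is below; the largest-remainder rounding step is an assumption added so that the allocation always sums to the sample size, and it is not needed for these figures, which divide evenly.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Allocate sample units to strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    # Whole-number share for each stratum (integer arithmetic avoids rounding noise).
    allocation = {code: sample_size * size // total for code, size in stratum_sizes.items()}
    leftover = sample_size - sum(allocation.values())
    # Assumption: leftover units go to the strata with the largest remainders.
    by_remainder = sorted(
        stratum_sizes,
        key=lambda code: sample_size * stratum_sizes[code] % total,
        reverse=True,
    )
    for code in by_remainder[:leftover]:
        allocation[code] += 1
    return allocation

universe = {"99211": 50, "99212": 150, "99213": 300, "99214": 450, "99215": 50}
allocation = proportional_allocation(universe, sample_size=100)
print(allocation)
```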
PIM 3.10 – Sample Sizes
• PIM 3.10 states about sample sizes:
– “It is neither possible nor desirable to specify a minimum sample
size that applies to all situations”
– “…real-world economic constraints must be taken into account.
As stated earlier, sampling is used when it is not administratively
feasible to review every sampling unit in the target universe. In
practice, sample sizes may be determined by available resources.
That does not mean, however, that the resulting estimate of
overpayment is not valid as long as proper procedures for the
execution of probability sampling have been followed. A challenge
to the validity of the sample that is sometimes made is that the
particular sample size is too small to yield meaningful results.
Such a challenge is without merit as it fails to take into account all
of the other factors that are involved in the sample design”
PIM 3.10 – Sample Sizes
• CSA procedure:
– If we can, we like to pull at least 10% of the
universe; however, this is not a rule that is set in
stone
– We must, at a minimum, pull at least 30
sampling units to satisfy distribution
requirements through the central limit theorem
PIM 3.10 – Overpayment
• PIM 3.10 also states:
– “In most situations the lower limit of a one-sided 90 percent confidence interval should be
used as the amount of overpayment to be
demanded for recovery from the physician or
supplier. The details of the calculation of this
lower limit involve subtracting some multiple
of the estimated standard error from the point
estimate, thus yielding a lower figure.”
PIM 3.10 – Overpayment
• It further states that:
– “This procedure, which, through confidence interval estimation,
incorporates the uncertainty inherent in the sample design, is a
conservative method that works to the financial advantage of
the physician or supplier. That is, it yields a demand amount for
recovery that is very likely less than the true amount of
overpayment, and it allows a reasonable recovery without
requiring the tight precision that might be needed to support a
demand for the point estimate. However, you are not precluded
from demanding the point estimate where high precision has been
achieved.”
PIM 3.10 – Overpayment
• What we really do then is calculate the
Mean Total Overpayment and subtract a
multiple of the standard error from it to
obtain the lower limit of the confidence
interval
PIM 3.10 – Overpayment
• Below is the formula for the total variance for cluster
sampling:
\[ V(\hat{op}) = N^2 \left( \frac{N - n}{N n} \right) \frac{\sum_{i=1}^{n} (y_i - \bar{y}_t)^2}{n - 1} \]
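The variance formula and the lower limit can be put together in a short sketch. The cluster overpayment amounts and the universe size below are hypothetical, and the multiplier of 1.282 (the rounded standard normal value for a one-sided 90 percent interval) is an assumption; the quoted guidance says only "some multiple" of the standard error.

```python
import math
import statistics

def overpayment_lower_limit(cluster_overpayments, universe_size, multiplier):
    """Return the point estimate, its standard error, and the lower limit."""
    n = len(cluster_overpayments)
    sample_mean = statistics.mean(cluster_overpayments)
    # statistics.variance uses the n - 1 divisor, matching the formula above.
    sample_variance = statistics.variance(cluster_overpayments)
    point_estimate = universe_size * sample_mean
    total_variance = (
        universe_size ** 2
        * ((universe_size - n) / (universe_size * n))
        * sample_variance
    )
    standard_error = math.sqrt(total_variance)
    lower_limit = point_estimate - multiplier * standard_error
    return point_estimate, standard_error, lower_limit

# Hypothetical data: overpayment totals for 4 sampled clusters out of 10.
point, se, lower = overpayment_lower_limit([0.0, 100.0, 200.0, 300.0], 10, 1.282)
print(f"point={point:,.2f} se={se:,.2f} lower={lower:,.2f}")
```

Because the standard error shrinks as n grows, a larger sample moves the lower limit closer to the point estimate.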
PIM 3.10 – Overpayment
• Look at how the overpayments work:
[Figure: two panels, "OP w/ Small Variance (Large n)" and "OP w/ Large Variance (Small n)", each showing $$ Overpayment with the 90% Upper Limit, the Mean Total Overpayment, and the 90% Lower Limit marked.]
Sample Size Comparison
Analysis Variable: CLUSTAMT
N    Mean        Std Dev     Sum          Minimum   Maximum
20   17,462.81   30,551.07   349,256.24   0.00      81,603.20
10   17,462.81   31,388.24   174,628.12   0.00      81,603.20
5    18,323.58   35,639.19   91,617.92    0.00      81,603.20
Estimation of Total Amount of Refund and Its Lower 1-Sided 90% C.I.
Sample Size   Univ. Size   Mean Total Overpayment   Std. Error    90% 1-Sided Lower Bound
20            44           $768,363.73              221,995.08    $483,766.04
10            44           $768,363.73              383,912.92    $276,187.37
5             44           $806,237.70              660,329.49    $91,617.92
Difference in lower bound, sample size of 5 vs. 10 beneficiaries: $184,569.45
Difference in lower bound, sample size of 10 vs. 20 beneficiaries: $207,578.67
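The figures in this comparison can be approximately re-derived from the summary statistics (N, Mean, Std Dev, Sum) using the cluster variance formula. The sketch below rests on two assumptions that the source does not state: a multiplier of 1.282, and a floor that sets the lower bound to the overpayment actually found in the sample when the computed bound falls below it, which is what the sample-size-5 row appears to show. Small differences from the reported values are expected because the inputs are rounded.

```python
import math

UNIVERSE_SIZE = 44
MULTIPLIER = 1.282  # assumption: rounded one-sided 90% standard normal value

def standard_error(n, std_dev, universe_size=UNIVERSE_SIZE):
    """Standard error of the estimated total from the sample standard deviation."""
    return universe_size * std_dev * math.sqrt((universe_size - n) / (universe_size * n))

def lower_bound(n, mean, std_dev, sample_sum):
    point_estimate = UNIVERSE_SIZE * mean
    computed = point_estimate - MULTIPLIER * standard_error(n, std_dev)
    # Assumption: never demand less than the overpayment found in the sample.
    return max(computed, sample_sum)

# (n, mean, std dev, sum) rows from the CLUSTAMT summary.
rows = [
    (20, 17462.81, 30551.07, 349256.24),
    (10, 17462.81, 31388.24, 174628.12),
    (5, 18323.58, 35639.19, 91617.92),
]
for n, mean, std_dev, sample_sum in rows:
    print(n, round(standard_error(n, std_dev), 2), round(lower_bound(n, mean, std_dev, sample_sum), 2))
```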
Bottom Line
• Large Sample Sizes
– Use when the expected overpayment is large
– Use in high-profile cases
– Resource intensive
– Increase precision even more using stratified sampling plans
• Small Sample Sizes
– Use when the expected overpayment is small
– Use in routine, low-dollar cases
– Not as resource intensive
– Do not work as well for stratified sampling
Sub-samples
• It is often beneficial to evaluate a sub-sample
(sample size of about 30) before moving to a
full statistical sample.
• A sub-sample gives a good idea of the point
estimate (Mean Total Overpayment).
• Sampling for consent settlements.
Summary for Sampling
• Define the Universe
• Determine the sampling methodology
• Create the sampling frame
• Determine sample size
• Create your sample
After the sampling review is completed
• Perform the overpayment projection
Questions?
Thank You!