Basic Statistics for the Healthcare Professional
Frank Cohen, MBB, MPA
Director of Analytics, Doctors Management, LLC
September 17, 2015
Purpose of Statistics

 Provide a numerical summary of the data being analyzed.
 Data (n)
   Factual information organized for analysis
   Numerical or other information represented in a form suitable for processing by computer
   Values from scientific experiments
 Provide the basis for making inferences about the future.
 Provide the foundation for assessing process capability.
 Provide a common language to be used throughout an organization to describe processes.

Relax… it won't be that bad!
Objectives
 Identify the basic tenets of statistics and statistical theory
 Define mean, median and other central measurements
 Define standard deviation, interquartile ranges and other measurements of variability
 Explain the difference between data analysis and statistics
 Describe hypothesis testing and other tests of statistical significance
 Articulate how to build relationships through regression analysis
3 Degrees of Separation
 Measures of Location (central tendency)
 Mean, Median and Mode
 1st statistical moment
 Measures of Variation (dispersion)
 Range, Standard deviation, Interquartile Range
 2nd statistical moment
 Measures of Error (estimation)
 Standard error and confidence intervals
 Measures of Relationships
 Covariance, Correlation, Regression
Descriptive Statistics
 Descriptive statistics are numbers that are used to summarize and describe data
   Mean conversion factor
   Cost per RVU
   Average collection by provider
   Work RVUs that define 1 FTE
   New office visits to initial consults
 Measures of central tendency include the mean, median, and mode
 Measures of variability include the range, variance, and standard deviation.
 Descriptive statistics are just descriptive and do not involve generalizing
beyond the data at hand
Inferential Statistics
 Inferential statistics depends upon a sample of a population to draw (or infer) conclusions about the population as a whole
   Inferences are made based on central tendency or any of a number of other aspects of a distribution
   For example, it is not practical to review every chart for a practice, so extrapolation is used to assess overpayment
 No given sample will exactly represent the population, so distribution techniques and sample error calculations are very important
Important Definitions
 Universe
   The complete set of objects included within the database in question
 Sample Frame
 A homogenous set of objects that the investigator is interested in studying
 Sample
 A subset of the population that is actually being studied
 Variable
 A characteristic of an individual or object that can have different values (as opposed to a
constant)
 Independent variable
 The variable that is systematically manipulated or measured by the investigator to determine
its impact on the outcome.
Important Definitions
 Dependent variable
 The outcome variable of interest
 Data
 The measurements that are collected by the investigator
 Statistic
 Summary measure of a sample
 Parameter
 Summary measure of a population
Types of Data
 Attribute Data (Qualitative)
   Is always binary; there are only two possible values (0, 1)
     Yes / No
     Go / No go
     Pass / Fail
 Variable Data (Quantitative)
   Variables are properties or characteristics of some event, object, or person that can take on different values or amounts (as opposed to constants such as π that do not vary)
   Independent variables
     Variables that are manipulated by the experimenter
   Dependent variables
     A variable that measures the experimental outcome
Discrete Variables
 Discrete variables are whole numbers (count numbers) that do not pass through the space between each number.
   The number of patients seen in a single day
   The number of surgical procedures reported by a provider
   The number of 99213 codes reported for the practice
   The number of different specialties
   The number of charts with coding errors
Continuous Variables
 Continuous variables are real numbers that can occupy an infinite amount of space between discrete values.
   Frequency distribution of E/M codes within a category
   Total number of calculated FTEs in a practice
   Minutes per work RVU
   Ratio of initial consults to new office visits
   Cost per RVU
   Charge per hour
Frequency Distribution
WELCOME TO THE FAMILY!
Frequency Distributions
 A frequency distribution shows the number of observations falling into
each of several ranges of values
 Frequency distributions are portrayed as frequency tables, histograms,
or polygons
 Frequency distributions can show either the actual number of
observations falling in each range or the percentage of observations
 From the frequency distribution table, you can calculate the mean, median,
mode, and range
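For example, here is a minimal Python sketch that computes the mean, median, mode, and range straight from a frequency table; the fee/count pairs are taken from the E/M fee table that appears later in the deck and are only meant to illustrate the mechanics.

```python
from collections import Counter
import statistics

# Frequency table of fees (fee -> number of claims), illustrative values
freq = Counter({46: 206, 82: 2080, 111: 6219, 174: 1563})

# Expand the table back into individual observations
observations = [fee for fee, count in freq.items() for _ in range(count)]

print("mean:  ", round(statistics.mean(observations), 2))
print("median:", statistics.median(observations))
print("mode:  ", statistics.mode(observations))
print("range: ", max(observations) - min(observations))
```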
Normal Distribution
 In many natural processes, random variation conforms to the normal distribution
 Characteristics
   Symmetric, unimodal, extends to +/- infinity
   Completely described by two parameters
     Mean and standard deviation
   68.27% of the data will fall within +/- 1 standard deviation
   95.45% of the data will fall within +/- 2 standard deviations
   99.73% of the data will fall within +/- 3 standard deviations
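A quick way to see those three coverage figures is to simulate normally distributed data and count how much of it falls within 1, 2, and 3 standard deviations of the mean; a rough sketch using numpy (the mean and standard deviation chosen here are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=80.0, scale=10.0, size=100_000)  # simulated normal data

mean, sd = data.mean(), data.std()
for k in (1, 2, 3):
    within = np.mean(np.abs(data - mean) <= k * sd)
    print(f"within +/- {k} SD: {within:.2%}")  # roughly 68.27%, 95.45%, 99.73%
```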
Normal Distribution - Illustrated
[Figure: bell curve with the horizontal axis marked in standard deviation units from -6 to +6.]
Properties of the Normal Distribution
1. It is bell-shaped
2. The mean, median and mode are equal and located in the center of the distribution
3. It is unimodal (has only one mode)
4. The curve is symmetric about the mean
5. The curve is continuous (for each value of x there is a corresponding value of y)
6. The curve never touches the x-axis (goes to infinity)
7. The total area under the curve is 1.00
8. The area under the curve that lies within 1, 2 and 3 standard deviations of the mean is approximately 68%, 95% and 99.7% respectively
Same Mean, Different Standard Deviations
Different Mean, Same Standard Deviation
Different Mean, Different Standard Deviation
Looking at the Curve
 Skew
 Skew measures the degree to which the distribution is biased (or skewed) right or left
of normal expectation
 Kurtosis (it’s not a disease)
 Beyond skewness, kurtosis tells us when our distribution may have high or low
variance, even if normal.
Skewness – the 3rd Moment
The third moment is used to define the skewness of a distribution.
Skewness is a measure of the symmetry of the shape of a distribution. If a distribution is symmetric, the skewness will be zero. If there is a long tail in the positive direction, skewness will be positive, while if there is a long tail in the negative direction, skewness will be negative.
Kurtosis – The 4th Moment
The fourth moment is used to define the kurtosis of a distribution.
Kurtosis is a measure of the flatness or peakedness of a distribution. Flat-looking distributions are referred to as "platykurtic," while peaked distributions are referred to as "leptokurtic."
A kurtosis value of 3 represents a normally distributed data set; values less than 3 approach platykurtic, while values greater than 3 approach leptokurtic.
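If you want to compute these two moments yourself, scipy exposes both; note that scipy reports excess kurtosis by default (normal = 0), so fisher=False is passed below to match the "normal = 3" convention used on this slide. The samples here are simulated purely for illustration.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(1)
symmetric = rng.normal(size=10_000)          # roughly symmetric, normal-like
right_tailed = rng.exponential(size=10_000)  # long tail in the positive direction

for name, sample in [("normal", symmetric), ("exponential", right_tailed)]:
    print(name,
          "skewness:", round(skew(sample), 3),
          "kurtosis:", round(kurtosis(sample, fisher=False), 3))
```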
Normal or Not?
Normal: "Summary Report for Data"        Not Normal: "Summary Report for Frequency"

[Key figures from the two Anderson-Darling summary reports shown on this slide:]

Statistic                     Data (normal)          Frequency (not normal)
Anderson-Darling A-Squared    0.13                   61.61
P-Value                       0.986                  <0.005
Mean                          0.00949                9.1974
StDev                         1.00143                5.1472
Variance                      1.00287                26.4937
Skewness                      -0.0073412             1.36825
Kurtosis                      -0.0347934             2.62067
N                             10000                  2507
Minimum                       -3.50651               1.0000
1st Quartile                  -0.66662               5.0000
Median                        0.01314                8.0000
3rd Quartile                  0.68064                11.0000
Maximum                       3.95809                41.0000
95% CI for Mean               -0.01014 to 0.02912    8.9959 to 9.3990
95% CI for Median             -0.01225 to 0.04014    8.0000 to 8.0000
95% CI for StDev              0.98774 to 1.01551     5.0086 to 5.2938
Benford’s Distribution
 The distribution of first digits in any series of naturally occurring numbers, according to Benford's law
 Each bar represents a digit, and the height of the bar is the percentage of numbers that start with that digit

First digit:           1      2      3      4     5     6     7     8     9
Percent distribution:  30.10  17.61  12.49  9.69  7.92  6.69  5.80  5.12  4.58
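The expected percentages above follow log10(1 + 1/d). A minimal Python sketch that reproduces them and tallies observed leading digits for a list of amounts; the charge values in the list are illustrative only, not from the presentation.

```python
import math
from collections import Counter

# Expected Benford percentages for first digits 1-9
expected = {d: 100 * math.log10(1 + 1 / d) for d in range(1, 10)}
print({d: round(p, 2) for d, p in expected.items()})  # 1: 30.1, 2: 17.61, ...

# Observed first-digit percentages for a list of charge amounts (illustrative values)
charges = [123.45, 87.20, 1450.00, 310.75, 92.10, 18.60, 2075.00, 640.00]
first_digits = Counter(str(abs(c)).lstrip("0.")[0] for c in charges)
total = sum(first_digits.values())
observed = {d: 100 * first_digits.get(str(d), 0) / total for d in range(1, 10)}
print({d: round(p, 1) for d, p in observed.items()})
```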
Even We Are Obliged to Follow Benford
[Chart: first-digit percent distributions for several data series (legend: Benford, Powers of 2, Area, Population) plotted against Benford's distribution.]

Weighted average charges for 11,066 individual procedure codes, taken from the 100% Medicare claims database
Measures of Position and Central Tendency

MEAN, MEDIAN, MODE, PERCENTILES
Measures of Central Tendency
 In the study of statistics there are many measurements of central
tendency.
 The three most common are the mean, median, and mode.
 These metrics are used to identify the approximate location of the
center of the data
Mean
 The mean, or arithmetic average, is found by adding a group of
numbers and dividing the sum by the number of items added.
 The mean is the best known and most used measure of central
tendency.
 The group of numbers is sometimes referred to as the data or data set.
 The mean measures the central location of the values within the
database
 The mean is useful for predicting but only where there are no extreme
values
The Mean (or Average)

Arithmetic Mean (average)

 Create a metric for each code using the same method
   i.e., divide the charge by the RVU
 Add each of the results together to get a grand total
 Divide the grand total by the number of samples

$\bar{x}_w = \frac{\sum_{i=1}^{n} x_i w_i}{\sum_{i=1}^{n} w_i}$

 Pros:
   Easy to calculate
   Eliminates frequency bias
 Cons:
   Does not take into account the frequency of occurrence
   Not accurate if data is not normally distributed

Code     CF
99201    80.83
99202    81.61
99203    81.67
99204    106.79
99205    78.81
99211    86.96
99212    87.78
99213    68.43
99214    70.59
99215    75.73
Total    819.20
Count    10.00
Average  81.92
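Here is a minimal sketch of those steps in Python, using the conversion factors from the table above. The simple average ignores how often each code was billed, which is the frequency caveat noted under "Cons."

```python
# Conversion factors (charge / RVU) for each E/M code, taken from the slide above
cf = {
    "99201": 80.83, "99202": 81.61, "99203": 81.67, "99204": 106.79,
    "99205": 78.81, "99211": 86.96, "99212": 87.78, "99213": 68.43,
    "99214": 70.59, "99215": 75.73,
}

grand_total = sum(cf.values())        # 819.20
average_cf = grand_total / len(cf)    # 81.92
print(f"grand total = {grand_total:.2f}, average CF = {average_cf:.2f}")
```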
Mean Sensitivity to Outliers
CF With Outliers – 89.8
CF Without Outliers – 77.6
Median
 The median is the middle number in a set of data that is arranged in
either ascending or descending order.
 One-half of the numbers will be on either side of the median.
 The median is good for use with non-normally distributed data as it
is far less affected by outliers
 Order the data in ascending order
 Count the number of records and divide by two
 Pick the middle number
 If an even number of records, get the average of the middle two
Example of Median Calculation

Odd Number of Records (9 codes, sorted by conversion factor)

Code    Fee   Frequency   RVU     CF
99213   111   6,219       1.622   68.43
99214   174   1,563       2.465   70.59
99215   254               3.354   75.73
99205   368   129         4.854   75.81
99201   78    25          0.965   80.83
99202   138   87          1.691   81.61
99203   205   246         2.51    81.67
99211   46    206         0.529   86.96
99204   400   654         3.854   103.79

 The median is the CF for 99201
   80.83

Even Number of Records (10 codes, sorted by conversion factor; 99212 added)

Code    Fee   Frequency   RVU     CF
99213   111   6,219       1.622   68.43
99214   174   1,563       2.465   70.59
99215   254               3.354   75.73
99205   368   129         4.854   75.81
99201   78    25          0.965   80.83
99202   138   87          1.691   81.61
99203   205   246         2.51    81.67
99212   82    2,080       0.993   82.58
99211   46    206         0.529   86.96
99204   400   654         3.854   103.79

 The median is halfway between 99201 and 99202
   (80.83 + 81.61) / 2 = 81.22
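The same calculation in Python, using the sorted conversion factors from the two tables above; statistics.median averages the two middle values automatically when the count is even.

```python
import statistics

cf_odd = [68.43, 70.59, 75.73, 75.81, 80.83, 81.61, 81.67, 86.96, 103.79]
cf_even = cf_odd + [82.58]  # adding code 99212 makes the record count even

print("median (9 records): ", statistics.median(cf_odd))    # 80.83
print("median (10 records):", statistics.median(cf_even))   # 81.22
```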
Median is Less Sensitive to Outliers
CF With Outliers – 81.0
CF Without Outliers – 80.0
Mean vs. Median
 This table shows that the median wage is substantially less than the average wage
 The reason for the difference is that the distribution of workers by wage level is highly skewed
   0.01% = $23,846,950
   0.10% = $2,802,020
   1.00% = $1,019,089
   10.0% = $161,139
Mode
 The mode is the one value within a data set that is reported the most
 There can be more than one mode
   A single mode is called unimodal
   Two modes is called bimodal
   Many modes is called multimodal
 A multimodal distribution can indicate groups of variables with different characteristics within the same data set
   Paid amounts for E/M visits vs. surgical procedures
UniModal
[Histogram of procedure code frequencies showing a single peak (unimodal distribution).]
Bimodal
[Minitab summary report for "Mean Universe" showing a two-peaked (bimodal) histogram: N = 1,014; mean = 44.853; StDev = 13.976; skewness = -0.24; kurtosis = -0.77; Anderson-Darling A-Squared = 23.62, p < 0.005; 90% CI for the mean = 44.131 to 45.576; 90% CI for the median = 40.797 to 48.913.]
Multimodal
[Histogram of payment amounts showing several distinct peaks (multimodal distribution).]
Percentiles
 The p-th percentile is the (n * p / 100)-th observation when the set of observations is arranged in order of magnitude, where n is the sample size
 Percentiles report the value for a given variable where a certain percent of observations fall above and below the value
   For example, at the 20th percentile, some 20% of values are lower and some 80% of values are higher
 A percentile is 1/100 of the total of an ordered data set
   Splits the data into hundredths
 All data are ordered around the median (50th percentile)
Other Standard Metrics
 Quartiles divide the data into four equal parts
   1st quartile = 25th percentile
   2nd quartile = 50th percentile (median)
   3rd quartile = 75th percentile
 Deciles divide the data into 10 equal parts
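For ungrouped data, these quantities can be read directly from numpy rather than the grouped-data formulas on the next slide; a small sketch with illustrative ages.

```python
import numpy as np

values = np.array([21, 22, 22, 23, 24, 25, 25, 26, 27, 29, 31, 34])  # illustrative ages

quartiles = np.percentile(values, [25, 50, 75])
deciles = np.percentile(values, list(range(10, 100, 10)))

print("quartiles (25th, 50th, 75th):", quartiles)
print("deciles (10th ... 90th):     ", deciles)
```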
Quartiles and Deciles
Quartiles (grouped data):

$Q_i = L_i + \frac{h_i}{f_i}\left(\frac{i \cdot n}{4} - F\right), \quad i = 1, 2, 3$

 L_i = lower limit of the i-th quartile class
 n = total number of observations in the distribution
 h_i = class width of the i-th quartile class
 f_i = frequency of the i-th quartile class
 F = cumulative frequency of the class prior to the i-th quartile class

Deciles (grouped data):

$D_i = L_i + \frac{h_i}{f_i}\left(\frac{i \cdot n}{10} - F\right), \quad i = 1, 2, \ldots, 9$

 L_i = lower limit of the i-th decile class
 n = total number of observations in the distribution
 h_i = class width of the i-th decile class
 f_i = frequency of the i-th decile class
 F = cumulative frequency of the class prior to the i-th decile class
Interquartile Range
 The interquartile range (IQR)
is a measure of variability, based
on dividing a data set into
quartiles.
 The IQR identifies the middle 50%
of the data
Example: Age Distribution
The p-th percentile is the (n * p / 100)-th observation when the set of observations is arranged in order of magnitude, where n is the sample size.

For the age distribution, n = 121.
The 75th percentile for the age distribution is the (121 * 75) / 100 = 90.75 ≈ 91st observation when the ages are arranged in increasing order of magnitude.
The 75th percentile of the ages is therefore 31 years; the 25th, 50th and 80th percentiles are the 31st, 61st, and 97th observations respectively, as shown in the table below.

Age    Frequency   Cumulative Frequency
21     6           6
22     16          22
23     11          33     <- the 31st observation (25th percentile) falls in this group
24     9           42
25     17          59
26     13          72     <- the 61st observation (50th percentile) falls in this group
27     6           78
28     5           83
29     4           87
30     3           90
31     1           91     <- the 91st observation (75th percentile) falls in this group
32     4           95
33     3           98     <- the 97th observation (80th percentile) falls in this group
34     2           100
35+    21          121
Total  121
Box Plot
[Diagram of a box plot on a quantitative scale, labeled from top to bottom:]
 Upper whisker: extends to the 75th percentile + 1.5 IQR
 Top of box: 75th percentile
 Inside the box: average/mean and the 50th percentile (median)
 Bottom of box: 25th percentile
 Lower whisker: extends to the 25th percentile - 1.5 IQR
 Individual symbols: single observations plotted beyond the whiskers

IQR: interquartile range, which is calculated by subtracting the 25th percentile of the data from the 75th percentile; consequently, it contains the middle 50% of the observations.
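The whisker limits described above (quartile plus or minus 1.5 x IQR) are straightforward to compute directly; a sketch using the fee values from the deck's E/M table.

```python
import numpy as np

fees = np.array([46, 78, 82, 111, 138, 174, 205, 254, 368, 400])  # fees from the E/M table

q1, q3 = np.percentile(fees, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = fees[(fees < lower_fence) | (fees > upper_fence)]

print(f"IQR = {iqr:.2f}, whiskers extend to [{lower_fence:.2f}, {upper_fence:.2f}]")
print("points plotted individually (outliers):", outliers)
```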
Box Plots
Other Ways to Graph Mean, Median and Mode
[Minitab summary report and dotplot of the fee for code 27137: N = 1,632; mean = $5,881.10; median = $5,008.40; StDev = $4,069.20; skewness = 2.52; kurtosis = 10.97; minimum = $231.70; maximum = $39,796.00; Anderson-Darling A-Squared = 72.45, p < 0.005; 95% CI for the mean = $5,683.50 to $6,078.70; 95% CI for the median = $4,933.50 to $5,154.70. In the dotplot, each symbol represents up to 8 observations.]
Measures of Variability
Variance
 A measure of the average squared distance of possible values from the expected value (arithmetic average)

$S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$

Code    Fee   Frequency   RVU     CF       Difference   Squared difference
99201   78    25          0.965   80.83    0.029        0.001
99202   138   87          1.691   81.61    0.809        0.654
99203   205   246         2.51    81.67    0.873        0.763
99204   400   654         3.854   103.79   22.988       528.461
99205   368   129         4.854   75.81    (4.986)      24.863
99211   46    206         0.529   86.96    6.157        37.903
99212   82    2,080       0.993   82.58    1.778        3.161
99213   111   6,219       1.622   68.43    (12.366)     152.917
99214   174   1,563       2.465   70.59    (10.212)     104.280
99215   254               3.354   75.73    (5.070)      25.700

Mean CF = 80.80; sum of differences = 0.000; sum of squared differences = 878.702

 Note that the differences (not squared) always add up to zero
Standard Deviation

$S = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$

 The measure of spread of values around the mean
 Using the same conversion factor table as above (mean CF = 80.80; sum of squared differences = 878.702):
   The sample variance is 878.702 / (10 - 1) = 97.6, and the standard deviation is its square root, 9.88
   For normally distributed data:
     68.2% of the population values will fall within one standard deviation of the mean (70.92 to 90.68)
     95.4% of the population values will fall within two standard deviations of the mean (61.04 to 100.56)
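The variance and standard deviation above can be reproduced with Python's statistics module, which uses the n - 1 denominator shown in the formulas; the rounded CFs give figures very close to the slide's.

```python
import statistics

# Conversion factors from the variance table above
cf = [80.83, 81.61, 81.67, 103.79, 75.81, 86.96, 82.58, 68.43, 70.59, 75.73]

mean = statistics.mean(cf)                        # 80.80
sum_sq_diff = sum((x - mean) ** 2 for x in cf)    # about 879 with these rounded CFs
variance = statistics.variance(cf)                # sum of squared differences / (n - 1), about 97.7
std_dev = statistics.stdev(cf)                    # about 9.88

print(f"mean = {mean:.2f}, sum of squared differences = {sum_sq_diff:.1f}")
print(f"sample variance = {variance:.1f}, sample standard deviation = {std_dev:.2f}")
```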
Coefficient of Variation

 The coefficient of variation (CV) is calculated by dividing the standard deviation by the mean
 The coefficient of variation is a measure of spread that describes the amount of variability relative to the mean
   Because the coefficient of variation is unitless, you can use it instead of the standard deviation to compare the spread of data sets that have different units or different means
 A large CV means that the data are more dispersed, while a lower CV means that the data are more tightly clustered around the mean
CV for Two Different Codes
99213
Specialty   Fee      Stdev   CV
GS          125.11   27.81   22.23%
CA          120.86   19.36   16.02%
FP          162.11   25.84   15.94%
GE          118.21   20.56   17.39%
IM          111.67   19.40   17.37%
OS          130.12   21.99   16.90%
PM          113.60   18.84   16.58%

99205
Specialty   Fee      Stdev   CV
GS          383.37   83.45   21.77%
CA          369.92   65.81   17.79%
FP          304.10   45.37   14.92%
GE          349.83   76.19   21.78%
IM          328.28   64.04   19.51%
OS          380.78   74.71   19.62%
PM          339.65   72.98   21.49%
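Each CV above is just the standard deviation divided by the mean; a sketch reproducing the 99213 column.

```python
# Fee mean and standard deviation by specialty for code 99213 (from the table above)
specialties = {
    "GS": (125.11, 27.81), "CA": (120.86, 19.36), "FP": (162.11, 25.84),
    "GE": (118.21, 20.56), "IM": (111.67, 19.40), "OS": (130.12, 21.99),
    "PM": (113.60, 18.84),
}

for name, (mean_fee, stdev_fee) in specialties.items():
    cv = stdev_fee / mean_fee
    print(f"{name}: CV = {cv:.2%}")  # e.g. GS -> 22.23%
```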
Interquartile Range
 The interquartile range (IQR) is the distance between the 75th percentile
and the 25th percentile
 The IQR is essentially the range of the middle 50% of the data
 Because it uses the middle 50%, the IQR is not affected by outliers or
extreme values
 IQR is often represented using Boxplots
Box Plots
Error and Confidence Intervals
For variable (continuous) data:

$\bar{x} \pm t_{(1-\alpha/2),\,(n-1)\,df} \cdot \frac{S}{\sqrt{n}}$

For attribute (proportion) data:

$\hat{p} \pm Z_{(1-\alpha/2)} \cdot SE_{\hat{p}}, \quad \text{where } SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
What is Sample Error
 In statistics, sampling error is the error caused by observing a sample instead of the whole population [Burns & Grove, 2009]
   The sample is never identical to the population
 Basically, all samples have some error when used to predict a value within the population
Causes of Sampling Error
 Population specific error
 Not understanding who or what to sample
 Sample frame error
 Occurs when the wrong sub-population is identified and/or used
 Selection error
 When data points are not selected correctly
 Non-response error
 Occurs when data are missing, variable fields are zero or other similar issues
 Sampling error
 Can occur when the wrong sample type is selected (e.g. SVRS, Cluster, Convenience)
Calculating the Margin of Error
 Margin of error rules go something like this:
   The larger the sample, the smaller the error
   The smaller the variance, the smaller the error
 SE can be calculated using two primary assessment types
   Attribute
   Variable
Example of SE for Variable Assessment
 A sample of average charges for 99213 was taken from 50 practices in a
given area
 Mean = $82.40 and STDev = $15.55
 Assume normal distribution
 68.26% of values between $66.85 and $97.95
 SE = Stdev/sqrt(N), or 15.55/sqrt(50), or
 15.55/7.07 = 2.2
 The standard error for our estimate of the mean of $82.40 is $2.20
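A sketch of the same arithmetic in Python:

```python
import math

mean_charge = 82.40
stdev = 15.55
n = 50

standard_error = stdev / math.sqrt(n)
print(f"SE = {standard_error:.2f}")  # about 2.20
print(f"+/- 1 SD range: {mean_charge - stdev:.2f} to {mean_charge + stdev:.2f}")  # 66.85 to 97.95
```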
Example for SE Calculation for Attribute Assessment
 In a chart review, a practice finds that $1,809 out of $5,742 was paid in error
   This equates to a paid error rate of 31.5%
 To calculate the sample error, we use this formula:
   Where p = .315
   (1 - p) = 1 - .315 = .685
   n = 5,742
   p(1 - p)/n = .216 / 5,742 = 0.0000376 (0.00376%)
   The square root of 0.0000376 is 0.00613 (0.613%)
   Plus and minus p = 30.89% to 32.11%
Example for SE Calculation for Attribute Assessment
 In a chart review, a practice finds that 6 out of 30 charts contained a medical necessity error
   This equates to a coding error rate of 20%
 To calculate the sample error, we use this formula:
   Where p = .2
   (1 - p) = 1 - .2 = .8
   n = 30
   p(1 - p)/n = .16 / 30 = 0.0053
   The square root of 0.0053 is 0.073 (7.3%)
   Plus and minus p = 12.7% to 27.3%
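Both attribute examples use the same formula, the square root of p(1 - p)/n; a sketch that reproduces them.

```python
import math

def attribute_se(p: float, n: int) -> float:
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Example 1: $1,809 of $5,742 paid in error (p = 31.5%)
se1 = attribute_se(0.315, 5742)
print(f"SE = {se1:.5f} ({se1:.3%}); range = {0.315 - se1:.3%} to {0.315 + se1:.3%}")

# Example 2: 6 of 30 charts with a medical necessity error (p = 20%)
se2 = attribute_se(0.20, 30)
print(f"SE = {se2:.4f} ({se2:.1%}); range = {0.20 - se2:.1%} to {0.20 + se2:.1%}")
```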
Common “Z” levels of confidence
 Z-score is a measure of standard
distance in a normally distributed
distribution
 Many believe that if the data set is large
(n > 30), you can assume a normal or
near normal distribution (NOT TRUE)
 Commonly used confidence levels
are 90%, 95%, and 99%
Margin of Error (1/2 Interval)
 The margin of error is the z or t score times the standard error
   z and t values depend on how wide or narrow you want the margin of error to be
   The higher the value, the higher the margin of error
 Sample of 50, 95% confidence level
   Mean = 82.40, stdev = 15.55, SE = 2.20
   Margin of error = (z or t) times SE
     z: 1.96 * 2.20 = 4.31
     t: 2.009 * 2.20 = 4.42
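The z and t multipliers come from the normal and t distributions; a sketch using scipy that reproduces the two margins above.

```python
import math
from scipy import stats

mean, stdev, n = 82.40, 15.55, 50
se = stdev / math.sqrt(n)                 # about 2.20

z = stats.norm.ppf(0.975)                 # about 1.96 for a 95% interval
t = stats.t.ppf(0.975, df=n - 1)          # about 2.009 with 49 degrees of freedom

print(f"z margin of error: {z * se:.2f}") # about 4.31
print(f"t margin of error: {t * se:.2f}") # about 4.42
```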
What is a Confidence Interval (CI)?
 The purpose of a confidence interval is to validate a point estimate; it tells us how far off our estimate is likely to be
 A confidence interval specifies a range of values within which the unknown population parameter may lie
   Normal CI values are 90%, 95%, 99% and 99.9%
 The width of the interval gives us some idea as to how uncertain we are about an estimate
   A very wide interval may indicate that more data should be collected before anything very definite can be inferred from the data
 This means when a sample is drawn there are ?? chances in 100 that the sample will reflect the sampling frame at large within the sampling error
Interpreting the CI
 Using our average charge example:
   Mean = 82.40, SD = 15.55, SE = 2.20, ME = 4.42 (t-score)
   CI = 82.40 +/- 4.42, or
   95% CI = $77.98 to $86.82
 False statement:
   I am 95% confident that the true average charge for this code is somewhere between $77.98 and $86.82
 True statement:
   If we drew many samples like this one, about 95% of the resulting confidence intervals would contain the true population mean
Attribute 95% Confidence Interval
 In attribute example 1, the SE was 0.00613 (.613%)
 The 95% half interval is .00613 * 1.96 = 0.012 (1.2%)
 Plus and minus the p of 31.5% = 30.3% to 32.7%
 In attribute example 2, the SE was 0.073 (7.3%)
 The 95% half interval is .073 * 1.96 = 0.143 (14.3%)
 Plus and minus the p of 20% = 5.7% to 34.3%
 The confidence interval range has a huge impact in inferential statistics
95% Confidence Intervals
To Change the Confidence Interval
 To narrow the confidence interval
   Decrease the variability
   Lower your z/t score
   Increase the sample size
 To get a better level of confidence
   Decrease the variability
   Accept a broader CI (i.e., 80%)
   Increase the sample size

$\bar{x} \pm Z\left(\frac{s}{\sqrt{n}}\right)$
CI Applications – Physician Productivity
 In physician productivity studies, the 95% CI gives us a range of work RVU values
 In 95% of the samples we take, the true mean for the population would fall somewhere between the lower and upper bound

            Lower Work RVUs      Mean Work RVUs       Upper Work RVUs
Specialty   per FTE (National)   per FTE (National)   per FTE (National)
GP          4,289.35             4,543.80             4,798.25
EM          6,153.49             6,732.48             7,109.50
PD          4,220.96             4,618.12             4,876.73
CV          4,005.59             4,382.48             4,627.90
PY          3,232.74             3,536.91             3,734.98
GS          4,435.13             4,852.44             5,124.18
FP          4,104.91             4,491.15             4,742.65
OS          4,394.81             4,808.32             5,077.59
CI Applications – Auditing
 If a hundred similar audits were performed, in 95 of them the actual mean damage would be somewhere between $107.02 and $164.44
 Assume the universe is 10,000 claims
 The difference between the mean and the lower bound of the 95% CI is $28.71
 This translates to a difference in damage estimates of $1,070,200 rather than $1,357,300 (a difference of $287,100)
[Minitab summary report for the overpaid amounts in the audit sample: N = 40; mean = $135.73; median = $131.82; StDev = $89.77; skewness = 0.16; kurtosis = -1.16; minimum = $8.27; maximum = $300.72; Anderson-Darling A-Squared = 0.69, p = 0.066; 95% CI for the mean = $107.02 to $164.44; 95% CI for the median = $96.97 to $174.64.]
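A sketch that reproduces the interval and the damage extrapolation from the summary statistics above; the 10,000-claim universe is the slide's stated assumption.

```python
import math
from scipy import stats

mean, stdev, n = 135.73, 89.77, 40
universe = 10_000  # assumed number of claims in the universe

se = stdev / math.sqrt(n)
t = stats.t.ppf(0.975, df=n - 1)
lower, upper = mean - t * se, mean + t * se
print(f"95% CI for mean overpayment: ${lower:.2f} to ${upper:.2f}")  # about 107.02 to 164.44

print(f"damages at the mean:        ${mean * universe:,.0f}")   # about $1,357,300
print(f"damages at the lower bound: ${lower * universe:,.0f}")  # about $1,070,200
```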
For More Information
 Frank D. Cohen
 www.drsmgmt.com
 [email protected]
 800.635.4040

To Get the Toolbox
 www.frankcohengroup.com
 Click on the Library tab (at top of page)
 Click on Workshop Toolboxes
 Select the Statistics toolbox
 Password is 68487122