Download Computing the Standard Deviation from a Geometric Mean

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Mean field particle methods wikipedia , lookup

Taylor's law wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Categorical variable wikipedia , lookup

Regression toward the mean wikipedia , lookup

Transcript
COMPUTING THE STANDARD DEVIATION FROM A GEOMETRlC MEAN
Daniel R. Bretlteim, KPMG Peat Marwick
INTRODUCTION
mean. The seven-step process outlined below was used
for tllese purposes:
TIle requirement to compute the standard deviation
from
geometric (rather than arithmetic) mean has
been prompted by the desire to medel " portion of the
Federal government's methodology for reimbursing
hospitals for servK"" provided to Medicare recipients.
The Health Care Financing Administralion (HCF A) h ..
used this approach at; the mecllanism to identify
unusually higil-L"()st cases, known as outliers. The
geomett'ic mean cluu'ge for each diagllosis-l"elated
group (DRG) serves as .. benchmark to which multiples
of the DRG's alandard devialion are added. The
resulting threshold mual be exceeded for an individual
case to achieve llouUieI.'lI status. These obseITations
are deleted from the data base prior to computing the
average charge for each DRG.
1.
Compute the natur91logar<llIID of the charge
variable for each observation.
2.
Compute the mean of the log values for each
diagnostic group.
fI.
3.
Merge each observation with the appropriate
group mean.
4.
Compote the antilog of the group mean.
5.
Compute the squared deviation of each
observation from the antilog of its gt'"oup
meaH.
Procedures available within the SAS* System for
computing averages are based pdmariJs on the
arithmetic mean. The arithmetic mean is distinguished
from other seldom-used statistics sucb as the
geometric mean, which averages the iOg'al'ithms of
numbers and the harmonic mean, wWch averages
reciproe~ls. F orttmately. ti,e SAS* System provides
tile flexibility Bnd data manipolaUon power to satisfy
tltis computing requirement.
6.
Sum the squared deviations for each grQUp.
7.
Divide the sum of squared deviations by the
numher of observations (less one) in each
group. The square root of tltis value yields
the sample standard deviation..
TIle statements used to accomplish these calculations
and (l<int tile results are displayed in Exhibit 2 on the
following page.
APPROACH
RESULTS
The data set displayed in Exhibit 1 (l<ovides a
simplified example of the raw data which were
Exhibit 3 displays several of the statistics associated
WiUl each D RG. The outlier tJU'esholds in Ulis eXRtnple
are computed by adding tltree multiples of the standard
deviatiou to the geomelric mean. Therefore, any case
exceeding its threshold would qualify as 1m outlier and
be deleted from tile data baBe.
eU('''mmtered.
Exhibit 1
Sample Raw Data
Q!lS
!llill
CHARGE
1
2
3
4
105
105
105
105
201
201
201
$18,200.00
16,750.00
17,345.00
22,750.00
5,950.00
5,538.00
7,103.00
1,238.00
1,310.00
1,540.00
"
6
7
8
9
10
379
379
319
*
These daia simulate a sample of ten cases and
as.·"",[ated charges Cor three DRGs. Tbe objective was
to develop the code necessary to compute for each
. DRG: 1) tbe geometric mean charge and 2) the
standard deviation of eh1l1'ges from the geometric
1430
SAS is a registered trademark of SAS luaIitute,
Inc., Cary, North Carolina, U~S.A.
EldUbit 2
SAS* System statements
options nodate:
libname in 'd;\sas':
* compute logarithm for each value *:
data one:
set in. raw:
logchg",logt charge.l ;
proc sort
data~one:
by drg;
'*
compute mean of log values for each group *:
proc means data=one mean noprint:
var logchg~
by drg:
output out~stats m€an~mchg;
* merge each observation with its group mean *:
data twa:
merge one stats:
by drg;
if first.drg then do:
sum_sd""O :
end:
* compute geometric mean *:
geo_mean ~ exp(mchg):
* accumulate squared deviations ~:
squardev ~ (charge - geo_mean)**2:
sum_sd + squardev;
* ,compute the standard deviation and threshold *.
if last.drg an~ _freq_ > 1 then do:
std_dev '" sqrt(sum_sd/Lfreq_ - 1)):
thres = geo_mean
output:
end:
~
!strl_dev
*
3}:
proc print data~two label:
var
drg _freq_ geo_mean std_dev thres:
fQrmat geo_mean std_dev thres dollarl0.2:
label
_freq_
'~'
geo mean ~ 'GEO. MEAN'
std~dev
thres
titlel
title2
'STD. DEV.'
' THRESHOLD' ;
'OUTLIER iHRESHOLDS BASED ON'~
'THREE STANDARD DEVIATIONS FROM THE GEOMETRIC
run~
1431
MEA~;:
Exhibit 3
Sample Outpnt
OUTLIER THRESHOLDS BASED ON
THREE STANDARD DEVIATIONS FROM THE GEOMETRIC MEAN
Q!lS
llil.G.
N
GFA).MEAN
STD.DEV.
1
2
3
105
4
201
379
3
3
$18,623.46
6,162.69
1,356.76
$2,729.59
812.30
157.90
•
1432
THRESHOLD
$26,812.23
8,599.59
1,830.48