On Efficient Confidence Intervals via Meta-Analysis of Geo-Spatial Data for Global Warming Control in Decision-Makers' Setup for Project-Managers
*Professor Dr. Ashok Sahai
Professor of Statistics; Dept. Mathematics & Computer Science.
The University of the West Indies; St. Augustine Campus @ TRINIDAD.
Email: [email protected]
&
Professor Dr. Neeraj Tiwari
Professor & Head; Dept. of Statistics
Kumaun University, S.S.J.Campus, Almora, UTTARAKHAND, INDIA.
Email: [email protected]
ABSTRACT.
With the advent of the Geographical Information System (GIS) and the Global Positioning System (GPS), it is quite common to have access to geospatial temporal/spatial panel data for analysis in a meta-data setup. In this context, researchers often employ pooling methods to combine individual study results. One of the simplest such methods is the fixed-effects model, which assumes that the true effect is equal for all studies. An alternative, and intuitively more appealing, method is the random-effects model. This paper addresses the problem of efficient confidence-interval estimation (in a decision-makers' setup for project managers) using the random-effects model in the aforesaid meta-data setting for the geospatial data at hand. Compared with the latest state of the art of such estimation strategies in the literature to date, this paper aims at a much more efficient estimation strategy, furthering the gainful use of geospatial panel data in global, continental, regional, and national contexts. This statistical theme is equally applicable to any area of application in which data-mapping arises, for example the topically significant one of global environmental pollution mitigation for arresting the critical phenomenon of global warming, which has become more readily tackleable as the impactful advances in GIS and GPS technologies have led to the concept of "managing the global village" in terms of geospatial meta-data. This last fact has been the special motivation for the authors to prepare this improved paper, which contains a much more efficient strategy of confidence-interval estimation for a decision-making team of managers in any relevant area of application.
*Contact Author's Alternative Email Address: [email protected]
1. INTRODUCTION.
Meta-analysis represents the statistical analysis of a collection of analytic results, motivated by the urge to integrate findings in the context of geo-spatial data that represent environmental and socio-economic indicators in a given setting. This paper is a sequel to Sahai & Tewarie (2007). It is quite common to have access to panel data generated by a set of similar spatial data for analysis in a meta-data setup. Within this context, researchers often employ pooling methods to evaluate the efficacy of meta-data analysis. One of the simplest techniques used to combine individual study results is the fixed effects model, which assumes that the true effect is equal for all studies. An alternative, and intuitively more appealing, method is the random effects model. The purpose of this paper is to address the estimation problem of the fixed effects model and to present a simulation study of an efficient estimation of the mean true effect using panel data and a random effects model, in order to establish an appropriate Confidence Interval (CI) estimation that is readily usable in a decision-makers' setup.
This paper improves upon Sahai & Tewarie (2007) by proposing a method of statistical estimation aimed at improving the accuracy of the resultant CIs for an efficient estimation of the mean true effect using panel data and a random effects model. The improvement in reliability, demonstrated by means of a simulation study, is two-fold: statistically, by using the concept of the Minimum Mean Square Error (MMSE) estimator of the reciprocal of the heterogeneity of the studies, and computationally, by computing the simulation averages and hence the Relative Efficiency (R.Eff.) of the estimators, using the strength of the 51,000 replications of the simulation.
To illustrate the universal applicability of the technique of meta-analysis of panel data targeted in this paper in areas additional to that of the socio-economic studies well illustrated in Sahai & Tewarie (2007), we take the example of the currently important problem of global warming and environmental pollution. For the decision-making managers at the United Nations, we could construct handy CIs to support their decisions and judgments regarding, for example, the allocation of resources, including funds to the various countries and continents, for combating this grave threat to humanity; for this, we would do well to make good use of the geospatial panel data. The meta-analysis would have to be carried out in a hierarchical setup. The overall global CIs would then be available to the decision-makers at the UN as a consequence of this hierarchically carried-out meta-analysis. The final global meta-analysis could use the continental geospatial data; the continental meta-analysis could use the data for the various countries in the continent; the country-wise meta-analysis could use the meta-data of the various states in the country; and finally the state-wise meta-analysis would use the geospatial meta-data of the various cities and counties therein. At the grass-roots level we might well use geo-temporal data as well. For example, for the study of environmental pollution in a city caused by harmful emissions from the automobiles commuting therein, the data would vary temporally across the peak, moderate, and less-busy hours of the day and night. It is worth mentioning that cases of missing or poor data in underdeveloped or developing countries in the aforesaid context could be dealt with by generating relevant synthetic data, to take care of such geo-spatial-temporal data gaps, through the use of well-known and powerful statistical techniques.
Incidentally, such meta-analyses of panel data are quickly gaining currency in medical research (e.g. Villar et al. (2001)), and are becoming increasingly popular in research investigations where information on the efficacy of a treatment is available from a number of clinical studies with similar treatment protocols. Often, when one study is considered separately, the data therein (as generated by the randomized controlled clinical trial(s)) are too small or too limited in scope (Olkin (1994)) to lead to unequivocal or generalizable conclusions concerning, for example, the effect of the socio-economic covariates under investigation in the target study of the predecessor paper.
A number of methods are available to set up the confidence limits for the overall mean effect(s) in the meta-analysis of panel data in the context of a random/fixed effects model generated by these data, say, on the socio-economic variable(s).
A popular and simple, commonly used method is the approach proposed by DerSimonian and Laird (1986). It is worth noting, in the context of panel-data/meta-analysis, that the simplest statistical technique for combining the individual study results is based on a fixed effects model. In the fixed effects model, it is assumed that the true effect is the same for all studies generating the panel data. On the other hand, a random effects model allows for variation in the true effect across these studies in the various states (in a federal setup such as the U.S.A. or India), districts, counties, municipalities, village panchayats, or blocks therein, and is therefore a more realistic model.
Halvorsen, in a systematic search of the first ten issues published in 1982 of each of four weekly journals (NEJM, JAMA, BMJ, and Lancet), found only one article (out of 589) that considered combining results using formal statistical methods. The basic difficulty one faces in trying to combine or integrate the results from various studies arises from the diversity among the studies at hand, in terms of the methods employed therein as also the design of these studies. In addition, due to different parent populations and varying sample sizes, each study has a different level of sampling error. Hence, while integrating the results from such varied studies, one ought to assign varying weights to the information stemming from the respective studies, these weights reflecting the relative "value" of each piece of information. In this context, Armitage (1984) highlighted the need for careful consideration in developing methods for drawing inferences from such heterogeneous, though logically related, studies. DerSimonian and Laird (1986) observed that, in this setting, it would be more appropriate to use a regression analysis to characterize the differences in study outcomes. In the context of a random effects model for the meta/panel-data analysis, there are a number of methods available to construct the confidence limits for the overall mean effects. Sidik and Jonkman (2002) proposed a simple confidence interval for meta-analysis, based on the t-distribution. Their approach, as per their simulation study, is quite likely to improve upon the coverage probability of the DerSimonian and Laird (1986) approach. In the present paper we propose a couple of more efficient constructions of this confidence interval. A simulation study is carried out to demonstrate that our propositions improve the coverage probability of both of the aforesaid methods.
2. THE PROBLEM FORMULATION.
The statistical inference problem is concerned with using the information from k independent studies in the meta-analysis setup. Let the random variable yi stand for the effect-size estimate from the ith study. It is in order to note here that some commonly used measures of effect size are the mean difference, standardized mean difference, risk difference, relative risk, and odds ratio. As the Odds Ratio (OR), which is of particular use in retrospective or case-control studies, is the most widely used, we confine ourselves to it for simplicity of illustration in this paper. Nevertheless, this involves no loss of generality, inasmuch as the details of this paper are analogously valid for the other measures of effect size.
Let nti and nci denote the sample sizes, and let pti and pci denote the proportions dying (i.e., not achieving the stipulated goal) for the treatment (t) and control (c) groups, respectively, where i stands for the study number, i = 1, 2, ..., k. Also, let xti and xci denote the observed numbers of deaths for the treatment and control groups, respectively, for study i. We note that for the ith study the following gives the observed log odds ratio, log(ORi), and the corresponding estimated variance:
$$OR_i \;=\; \frac{p_{ti}\,(1 - p_{ci})}{p_{ci}\,(1 - p_{ti})},$$

$$\widehat{\sigma}_i^{\,2} \;=\; \widehat{\mathrm{var}}\big(\log(OR_i)\big) \;=\; \frac{1}{x_{ti}} + \frac{1}{n_{ti} - x_{ti}} + \frac{1}{x_{ci}} + \frac{1}{n_{ci} - x_{ci}}.$$
The important point to be noted at this stage is that $\widehat{\sigma}_i^{\,2}$ is a very close estimate of the respective population variance $\sigma_i^2$, and that such close estimates are analogously available for the population variances in the cases of the other measures of effect size. For example, if the effect size $y_i$ happens to be the difference in proportions, $p_{ti} - p_{ci}$, we estimate the population variance $\sigma_i^2$ by

$$\widehat{\sigma}_i^{\,2} \;=\; \frac{p_{ti}(1 - p_{ti})}{n_{ti}} \;+\; \frac{p_{ci}(1 - p_{ci})}{n_{ci}}.$$
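For concreteness, the following minimal Python sketch (our illustration; the function name and the example counts are hypothetical, not taken from the paper) computes the per-study log odds ratio $y_i = \log(OR_i)$ and its estimated variance from the 2×2 counts, exactly as in the two formulas above.

```python
import numpy as np

def log_odds_ratio_and_variance(x_t, n_t, x_c, n_c):
    """Per-study effect size y_i = log(OR_i) and its estimated variance
    sigma_i^2 = 1/x_ti + 1/(n_ti - x_ti) + 1/x_ci + 1/(n_ci - x_ci)."""
    x_t, n_t, x_c, n_c = (np.asarray(a, dtype=float) for a in (x_t, n_t, x_c, n_c))
    p_t, p_c = x_t / n_t, x_c / n_c
    y = np.log((p_t * (1.0 - p_c)) / (p_c * (1.0 - p_t)))          # log(OR_i)
    var = 1.0/x_t + 1.0/(n_t - x_t) + 1.0/x_c + 1.0/(n_c - x_c)    # estimated sigma_i^2
    return y, var

# Hypothetical 2x2 counts for k = 3 studies (illustration only).
y_i, sigma2_i = log_odds_ratio_and_variance(
    x_t=[12, 30, 7], n_t=[100, 250, 60],
    x_c=[20, 45, 12], n_c=[110, 240, 65])
```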
Now, the general model is specified as follows:

$$y_i = \theta_i + e_i, \qquad i = 1, \ldots, k,$$

wherein

$$e_i \sim N(0, \sigma_i^2),$$

and

$$\theta_i = \mu + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \tau^2).$$

Hence, essentially, the model comes to be

$$y_i = \mu + \varepsilon_i + e_i, \qquad i = 1, \ldots, k.$$

It is important to note that whereas $\varepsilon_i$ stands for the random error across the studies, $e_i$ represents the random error within a study, and that $\varepsilon_i$ and $e_i$ are assumed to be independent. Also, the parameter $\tau^2$ is a measure of the heterogeneity between the k studies. We will refer to it in this paper as the "heterogeneity variance", as it is often called.
Perhaps the most important and crucial element in the panel-data/meta-analysis is the challenge of developing an efficient estimator of this heterogeneity variance $\tau^2$. DerSimonian and Laird (1986) proposed and used the following estimate:

$$\widehat{\tau}^{\,2} \;=\; \max\left\{\,0,\; \frac{\sum_{i=1}^{k} w_i\,(y_i - \widehat{\mu})^2 \;-\; (k - 1)}{\sum_{i=1}^{k} w_i \;-\; \sum_{i=1}^{k} w_i^2 \Big/ \sum_{i=1}^{k} w_i}\,\right\}, \qquad (2.1)$$

wherein $w_i = 1/\sigma_i^2$, and the weighted estimate of the mean effect is given by

$$\widehat{\mu} \;=\; \sum_{i=1}^{k} w_i\, y_i \Big/ \sum_{i=1}^{k} w_i. \qquad (2.2)$$

Here the weight $w_i$ is assumed to be known. Earlier, we noted that $\widehat{\sigma}_i^{\,2}$ is a very close estimate of the respective population variance $\sigma_i^2$. Therefore, the sample estimate $\widehat{\sigma}_i^{\,2}$ is most usually used in place of $\sigma_i^2$, so that the estimated weight $\widehat{w}_i = 1/\widehat{\sigma}_i^{\,2}$ is used in (2.2).
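As a concrete illustration of (2.1) and (2.2), here is a short Python sketch (ours, with assumed variable names) that computes the estimated fixed-effect weights $\widehat{w}_i = 1/\widehat{\sigma}_i^{\,2}$, the weighted mean effect of (2.2), and the DerSimonian-Laird moment estimate of the heterogeneity variance in (2.1).

```python
import numpy as np

def dersimonian_laird_tau2(y, sigma2):
    """DerSimonian-Laird moment estimate of tau^2, eq. (2.1), together with the
    weighted mean effect of eq. (2.2), using weights w_i = 1/sigma_i^2."""
    y, sigma2 = np.asarray(y, dtype=float), np.asarray(sigma2, dtype=float)
    w = 1.0 / sigma2                                  # estimated w_i
    mu_hat = np.sum(w * y) / np.sum(w)                # eq. (2.2)
    q = np.sum(w * (y - mu_hat) ** 2)
    k = len(y)
    denom = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2_hat = max(0.0, (q - (k - 1)) / denom)        # eq. (2.1), truncated at zero
    return tau2_hat, mu_hat
```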
Recently, Sidik and Jonkman (2002) proposed a simple confidence interval for meta-analysis. This approach, consisting in the construction of a confidence interval based on the t-distribution, significantly improves the coverage probability compared with the existing, most popular approach of DerSimonian and Laird (1986), as outlined above.
It is worth noting, in the above context, that Brockwell and Gordon (2001) recently presented a comprehensive summary of the existing methods of constructing confidence intervals for meta-analysis and compared them in terms of their coverage probabilities. While the most commonly used method, the DerSimonian and Laird (1986) random effects method, led to coverage probabilities below the nominal level, the profile likelihood interval of Hardy and Thompson (1996) led to higher coverage probabilities.
However, the profile likelihood approach happens to be quite cumbersome computationally and involves an iterative calculation, as does the "simple likelihood method" presented in Brockwell and Gordon (2001). On the other hand, Sidik and Jonkman (2002)'s proposition of a rather simple approach for constructing a 100(1-α) per cent confidence interval for the overall effect in the random effects model, using pivotal inference based on the t-distribution, requires no iterative computation, like the popular method of DerSimonian and Laird (1986).
Moreover, Sidik and Jonkman (2002)'s proposition has a better coverage probability than that of DerSimonian and Laird (1986). Consequently, while DerSimonian and Laird (1986)'s confidence interval for meta-analysis used to be the most popular and commonly used confidence interval, that of Sidik and Jonkman (2002) happens to be the best one in terms of the most important criterion, namely the coverage probability, on which confidence intervals are compared and rated.
In fact, therefore, our motivation is basically to attempt an improvement of these two methods for constructing confidence intervals for the overall mean effect across the k studies, using the panel/meta-data generated by these studies. The improvement is targeted mainly at improved coverage probabilities. It is amply achieved, as revealed by the comparison using a simulation study. The modified "Sidik-Jonkman (2002) Estimator (MSJE)" proposed in this paper turns out to be the best to use.
3. THE PROPOSED CONFIDENCE INTERVAL ESTIMATES.
As noted in the preceding section, the most important and crucial element in the panel-data/meta-analysis is the challenge of developing an efficient estimator of the heterogeneity variance $\tau^2$. DerSimonian and Laird (1986)'s approximate 100(1-α) per cent confidence interval for the general mean effect μ, using the random effects model, is given by

$$\widehat{\mu} \;\pm\; z_{\alpha/2}\,\sqrt{1 \Big/ \sum_{i=1}^{k} \widehat{w}_i}\,, \qquad (3.1)$$

wherein $\widehat{\mu} = \sum_{i=1}^{k} \widehat{w}_i\, y_i \big/ \sum_{i=1}^{k} \widehat{w}_i$ and $\widehat{w}_i = 1\big/\big(\widehat{\sigma}_i^{\,2} + \widehat{\tau}^{\,2}\big)$, with $\widehat{\tau}^{\,2}$ evaluated using (2.1).
It would be in order here to note that $z_{\alpha/2}$ in (3.1) above is the upper α/2 quantile of the standard normal variable.
To construct an alternative simple confidence interval for the general mean effect μ under the random effects model, assuming that

$$y_i \sim N\big(\mu,\; \sigma_i^2 + \tau^2\big),$$

Sidik and Jonkman (2002) recently proposed an improvement. Subject to the assumption that the estimated weights are essentially the correct weights (i.e., that $\widehat{w}_i = w_i$ for all $i = 1, 2, \ldots, k$, these being close estimates), they noted that

$$Z_w \;=\; (\widehat{\mu} - \mu)\Big/\sqrt{1 \Big/ \sum_{i=1}^{k} \widehat{w}_i} \;\sim\; N(0, 1)$$

and

$$Q_w \;=\; \sum_{i=1}^{k} \widehat{w}_i\,(y_i - \widehat{\mu})^2 \;\sim\; \chi^2_{(k-1)}. \qquad (3.2)$$

They showed that $Z_w$ and $Q_w$ are independently distributed. Hence, it follows that

$$\frac{(\widehat{\mu} - \mu)\,\sqrt{\sum_{i=1}^{k} \widehat{w}_i}}{\sqrt{\sum_{i=1}^{k} \widehat{w}_i\,(y_i - \widehat{\mu})^2 \big/ (k - 1)}} \;\sim\; t_{(k-1)}. \qquad (3.3)$$

This, in turn, led to Sidik and Jonkman (2002)'s proposition of an approximate 100(1-α) per cent confidence interval for the general mean effect μ, using the random effects model, given by

$$\widehat{\mu} \;\pm\; t_{k-1,\,\alpha/2}\,\sqrt{\frac{\sum_{i=1}^{k} \widehat{w}_i\,(y_i - \widehat{\mu})^2}{(k - 1)\sum_{i=1}^{k} \widehat{w}_i}}\,. \qquad (3.4)$$

It would be in order here to note that $t_{k-1,\,\alpha/2}$ in (3.4) above is the upper α/2 quantile of the t-distribution with k-1 degrees of freedom. Also, under the assumption of known weights,

$$Q_w \Big/ \Big[(k - 1)\sum_{i=1}^{k} \widehat{w}_i\Big] \;\text{ is an unbiased estimator of the variance of } \widehat{\mu}. \qquad (3.5)$$
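To fix ideas, here is a minimal Python sketch (ours) of the two interval constructions (3.1) and (3.4): it takes the effect sizes $y_i$ and the random-effects weights $\widehat{w}_i = 1/(\widehat{\sigma}_i^{\,2} + \widehat{\tau}^{\,2})$ as inputs and returns the DerSimonian-Laird z-interval and the Sidik-Jonkman t-interval.

```python
import numpy as np
from scipy import stats

def dl_and_sj_intervals(y, w, alpha=0.05):
    """Approximate 100(1-alpha)% CIs for mu: the DerSimonian-Laird z-interval
    (3.1) and the Sidik-Jonkman t-interval (3.4), given y_i and weights
    w_i = 1/(sigma_i^2 + tau^2)."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    k = len(y)
    mu_hat = np.sum(w * y) / np.sum(w)
    # (3.1): z-interval with variance 1 / sum(w_i)
    half_dl = stats.norm.ppf(1 - alpha / 2) * np.sqrt(1.0 / np.sum(w))
    # (3.4): t-interval with variance Q_w / ((k - 1) * sum(w_i)), cf. (3.5)
    q_w = np.sum(w * (y - mu_hat) ** 2)
    half_sj = stats.t.ppf(1 - alpha / 2, df=k - 1) * np.sqrt(q_w / ((k - 1) * np.sum(w)))
    return (mu_hat - half_dl, mu_hat + half_dl), (mu_hat - half_sj, mu_hat + half_sj)
```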
It is a very significant fact at this stage that both DerSimonian and Laird (1986)'s and Sidik and Jonkman (2002)'s 100(1-α) per cent confidence intervals for the general mean effect μ, using the random effects model, are approximate, inasmuch as their validity is subject to the extent of the truth of the underlying assumption about the weights, namely that

$$\widehat{w}_i = w_i \quad \forall\, i = 1, 2, \ldots, k, \qquad \text{and hence that} \qquad \widehat{\sigma}_i^{\,2} = \sigma_i^2, \quad i = 1, 2, \ldots, k.$$

Thus, essentially, it boils down to how efficient our estimate of the inter-study heterogeneity variance $\tau^2$ is. We might as well note here that if the estimate of $\tau^2$, i.e. $\widehat{\tau}^{\,2}$, equals 0, the random effects model boils down to the fixed-effects model. Further, we might mention here that a more efficient estimation of the inter-study heterogeneity variance $\tau^2$ is the key motivating factor for our propositions to possibly improve the coverage probability.
In both of the papers, namely those of DerSimonian and Laird (1986) and Sidik and Jonkman (2002), the estimation of this inter-study heterogeneity variance $\tau^2$, as nicely described in Brockwell and Gordon (2001), proceeds as follows.
The two-stage random effects model in question is

$$y_i = \theta_i + e_i, \quad i = 1, \ldots, k, \qquad \text{wherein } e_i \sim N(0, \sigma_i^2),$$

and

$$\theta_i = \mu + \varepsilon_i, \qquad \text{wherein } \varepsilon_i \sim N(0, \tau^2).$$

This could well be re-written equivalently as

$$y_i = \mu + \varepsilon_i + e_i, \quad i = 1, \ldots, k, \qquad \text{wherein } e_i \sim N(0, \sigma_i^2) \text{ and } \varepsilon_i \sim N(0, \tau^2).$$

As noted earlier, under the assumptions that the estimated weights are essentially the correct weights (i.e., that $\widehat{w}_i = w_i$ for all $i = 1, 2, \ldots, k$, these being close estimates) and that $\varepsilon_i$ and $e_i$ are independent (all of these assumptions being well known to be quite reasonably tenable), we have, to the extent of the approximation due to the tenability of the aforesaid assumptions:

$$\widehat{\mu}(\tau) \;=\; \frac{\sum_{i=1}^{k} w_i(\tau)\, y_i}{\sum_{i=1}^{k} w_i(\tau)}\,, \qquad (3.6)$$

wherein

$$w_i(\tau) \;=\; \big[\,w_i^{-1} + \tau^2\,\big]^{-1} \qquad \text{and} \qquad w_i = 1/\sigma_i^2, \quad i = 1, \ldots, k. \qquad (3.7)$$

Now, assuming that $\tau^2$ is known, we have

$$\widehat{\mu}(\tau) \;\sim\; N\!\left(\mu,\; 1 \Big/ \sum_{i=1}^{k} w_i(\tau)\right), \qquad \text{with variance } \operatorname{var}\!\big(\widehat{\mu}(\tau)\big) = 1 \Big/ \sum_{i=1}^{k} w_i(\tau). \qquad (3.8)$$

It is interesting to note that the random effects model confidence intervals for μ are expected to be wider than those constructed under the fixed effects model, simply because $w_i(\tau) \le w_i$ and hence $\operatorname{var}\!\big(\widehat{\mu}(\tau)\big) \ge \operatorname{var}(\widehat{\mu})$.
As $\tau^2$ is unknown in practice, we ought to estimate it. DerSimonian and Laird (1986) derived an estimate of $\tau^2$, using the method of moments, by equating an estimate of the expected value of $Q_w$ with its observed value, say $q_w$:

$$E(Q_w) \;=\; (k - 1) \;+\; \tau^2\left(\sum_{i=1}^{k} w_i \;-\; \frac{\sum_{i=1}^{k} w_i^2}{\sum_{i=1}^{k} w_i}\right) \;=\; q_w.$$

Therefore, we note that if t is the solution of the above equation, we have

$$t \;=\; \frac{q_w - (k - 1)}{\sum_{i=1}^{k} w_i \;-\; \sum_{i=1}^{k} w_i^2 \Big/ \sum_{i=1}^{k} w_i}\,. \qquad (3.9)$$

So as to ward off the possibility of a negative value of t (which would be an unacceptable value of $\tau^2$, as no variance can be negative), we define

$$\widehat{\tau}^{\,2} = t \;\text{ if } t \ge 0, \qquad \widehat{\tau}^{\,2} = 0 \;\text{ if } t < 0. \qquad (3.10)$$

Using (3.9) in (3.7), we get $w_i(\widehat{\tau}) = \big[\,\widehat{w}_i^{\,-1} + \widehat{\tau}^{\,2}\,\big]^{-1}$ (wherein $\widehat{w}_i = 1/\widehat{\sigma}_i^{\,2}$), $i = 1, \ldots, k$, to be used in (3.6), with the estimated variance of $\widehat{\mu}$ given by

$$\operatorname{var}\!\big(\widehat{\mu}(\widehat{\tau})\big) \;=\; 1 \Big/ \sum_{i=1}^{k} w_i(\widehat{\tau}). \qquad (3.11)$$
Both DerSimonian and Laird (1986)'s and Sidik and Jonkman (2002)'s propositions of an approximate 100(1-α) per cent confidence interval for the general mean effect μ under the random effects model (as in (3.1) and (3.4), respectively) use the $\widehat{\mu}$ generated as above and the value of $\operatorname{var}(\widehat{\mu})$ as in (3.11).
Essentially, our proposition of the improved Confidence Interval (CI) estimates of the general mean effect μ consists solely in a more efficient estimation of $\operatorname{var}(\widehat{\mu})$ in (3.11). For this purpose the following results are importantly relevant.
Lemma 3.1: If an estimate, say s² (the usual unbiased sample variance estimator), of the population variance σ² is based on a random sample X1, X2, ..., Xk from a Normal population N(θ, σ²), we have

$$\frac{(k-1)\,s^2}{\sigma^2} \;\sim\; \chi^2_{(k-1)} \quad (\text{the chi-square distribution on } k-1 \text{ degrees of freedom}).$$

Further, we have:

$$\text{UMVUE of } 1/\sigma^2 \;=\; \frac{1}{k^{*}\, s^2}\,, \qquad \text{wherein } k^{*} = \frac{k-1}{k-3}. \qquad (3.12a)$$

Proof: In the case of a random sample from a normal distribution, it is very well known that the sample variance s² is a complete sufficient statistic for the population variance σ². Therefore, the Uniformly Minimum Variance Unbiased Estimator (UMVUE) of 1/σ² is simply its unbiased estimator based on s². Now, using the fact that $(k-1)s^2/\sigma^2 \sim \chi^2_{(k-1)}$, it can easily be shown that

$$E\!\left[\frac{1}{k^{*}\, s^2}\right] \;=\; \frac{1}{\sigma^2}.$$

That establishes (3.12a) of the above Lemma. Q.E.D.
Lemma 3.2: If an estimate, say s² (the usual unbiased sample variance estimator), of the population variance σ² is based on a random sample X1, X2, ..., Xk from a Normal population N(θ, σ²), we have

$$\frac{(k-1)\,s^2}{\sigma^2} \;\sim\; \chi^2_{(k-1)} \quad (\text{the chi-square distribution on } k-1 \text{ degrees of freedom}).$$

Further, we have:

$$\text{Minimum Mean Square Error Estimator (MMSEE) of } 1/\sigma^2 \;=\; \frac{1}{k^{@}\, s^2}\,, \qquad \text{wherein } k^{@} = \frac{k-1}{k-5}. \qquad (3.12b)$$

Proof: In the case of a random sample from a normal distribution, it is very well known that the sample variance s² is a complete sufficient statistic for the population variance σ². Therefore, the Minimum Mean Square Error Estimator (MMSEE) of 1/σ² is sought within the class of estimators of the form M/s², with M a constant. Now, we use the following expectations, which are easily derived:

$$E\!\left[\frac{1}{s^2}\right] \;=\; \frac{k-1}{k-3}\cdot\frac{1}{\sigma^2} \qquad \text{and} \qquad E\!\left[\frac{1}{s^4}\right] \;=\; \frac{(k-1)^2}{(k-3)(k-5)}\cdot\frac{1}{\sigma^4}.$$

Hence, we can easily establish that the Minimum Mean Square Error Estimator (MMSEE) of 1/σ² is as stated in (3.12b). Q.E.D.
As the MMSE estimator involves a trade-off between variance and bias (it carries a larger relative bias in confidence-interval estimation, while still being superior to the UMVUE in mean-square terms), we propose a compromise "UMVUE-cum-MMSE" estimator through the use of k** = (k* + k@)/2, wherein k* and k@ are as in (3.12a) and (3.12b), respectively.
Hence, we propose the following modified, more efficient CIs, modifying the, say, "Original DerSimonian-Laird (1986) Estimator (ODLE)" and the, say, "Original Sidik-Jonkman (2002) Estimator (OSJE)" defined, respectively, in (3.1) and (3.4) above.
We call our estimators the "Modified DerSimonian-Laird (1986) Estimator (MDLE)" and the "Modified Sidik-Jonkman (2002) Estimator (MSJE)", respectively.
Essentially, the sole difference between ODLE and MDLE, as also between OSJE and MSJE, consists in replacing k in (3.1) and (3.4), respectively, by k** for the modifications under the two approaches, consisting in the "UMVUE-cum-MMSE" estimation of 1/σ², where σ² herein stands for the heterogeneity variance τ², the parameter τ² being essentially a measure of the heterogeneity between the k studies.
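To make the proposed modification concrete, here is a small sketch (ours) of the correction factors of Lemmas 3.1 and 3.2 and of the compromise factor k**; the closing comment merely restates, in hedged form, how the text above says k** enters the modified intervals.

```python
def correction_factors(k):
    """UMVUE factor k* = (k-1)/(k-3) of (3.12a), MMSE factor k@ = (k-1)/(k-5)
    of (3.12b), and the compromise k** = (k* + k@)/2 (requires k > 5)."""
    k_star = (k - 1) / (k - 3)
    k_at = (k - 1) / (k - 5)
    return k_star, k_at, 0.5 * (k_star + k_at)

def reciprocal_variance_estimates(s2, k):
    """UMVUE, MMSE, and compromise estimates of 1/sigma^2 from a sample
    variance s2 on k - 1 degrees of freedom (Lemmas 3.1 and 3.2)."""
    k_star, k_at, k_dstar = correction_factors(k)
    return 1.0 / (k_star * s2), 1.0 / (k_at * s2), 1.0 / (k_dstar * s2)

# Per the text above, MDLE and MSJE are obtained from (3.1) and (3.4) by
# replacing k with k**; e.g. for k = 10 studies, k* = 9/7, k@ = 9/5, and
# k** = (9/7 + 9/5)/2 ~ 1.543.
```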
4. THE SIMULATION STUDY.
The format of the simulation study in our paper, comparing the "Original DerSimonian-Laird (1986) Estimator (ODLE)" and the "Original Sidik-Jonkman (2002) Estimator (OSJE)" with our estimators, the "Modified DerSimonian-Laird (1986) Estimator (MDLE)" and the "Modified Sidik-Jonkman (2002) Estimator (MSJE)", respectively, is the same as that in Sidik and Jonkman (2002).
To compare the simple confidence interval based on the t-distribution with the DerSimonian and Laird interval in terms of coverage probability, we performed a simulation study of meta-analysis for the random effects model.
Throughout the study, the overall mean effect μ is fixed at 0.5 and the error probability of the confidence interval, α, is set at 0.05. We use only one value for μ because the t-distribution interval based on the pivotal quantity in (3.3) and the DerSimonian and Laird interval are both invariant to a location shift. Four different values of τ² are used: 0.03, 0.05, 0.07, and 0.10. For each τ², three different values of k (namely 10, 20, and 60, to keep the comparisons modest) are considered. The number of simulation runs for the meta-analysis of k studies is 51,000. The simulation data for each run are generated in terms of the most popular measure of effect size in meta-analysis, the log of the odds ratio. That is, the generated effect size yi is interpreted as a log odds ratio (it could alternatively be the mean effect of the ith study, as well).
For given k, the within-study variance σi² is generated using the method of Brockwell and Gordon (2001). Specifically, a value is generated from a chi-square distribution with one degree of freedom, which is then scaled by 1/4 and restricted to the interval between 0.009 and 0.6. This results in a bimodal distribution of σi², with the modes at each end of the distribution. As noted by Brockwell and Gordon, values generated in this way are consistent with a typical distribution of σi² for log odds ratios encountered in practice. For binary outcomes, the within-study variance decreases with increasing sample size, so large values of σi² (close to 0.6) represent small trials included in the meta-analysis, and small values of σi² represent large trials.
The effect size yi, for i = 1, ..., k, is generated from a normal distribution with mean μ and variance σi² + τ².
For each simulation of the meta-analysis, the confidence intervals based on the t-distribution and on the DerSimonian and Laird method are calculated, along with those of our proposed estimators, the "Modified DerSimonian-Laird (1986) Estimator (MDLE)" and the "Modified Sidik-Jonkman (2002) Estimator (MSJE)".
The numbers of intervals containing the true μ are recorded for all four methods. The proportion of intervals containing the true μ (out of the 51,000 replications) serves as the simulation estimate of the true coverage probability.
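The following condensed Python sketch (our reconstruction of one simulation setting under the stated design, not the authors' program) generates the within-study variances by the Brockwell-Gordon recipe (a chi-square(1) draw scaled by 1/4 and restricted to [0.009, 0.6]; we clip here, although re-drawing until the value falls in range is an equally valid reading of "restricted"), draws $y_i \sim N(\mu, \sigma_i^2 + \tau^2)$, and estimates the coverage of the DerSimonian-Laird interval over the replications. The three other methods (OSJE, MDLE, MSJE) would be evaluated on the same simulated data within the same loop.

```python
import numpy as np
from scipy import stats

def simulate_dl_coverage(k=10, tau2=0.03, mu=0.5, n_runs=51_000, alpha=0.05, seed=0):
    """Monte Carlo estimate of the coverage probability of the DerSimonian-Laird
    interval (3.1) under the simulation design of Section 4."""
    rng = np.random.default_rng(seed)
    covered = 0
    for _ in range(n_runs):
        # Brockwell-Gordon within-study variances: chi-square(1)/4, kept in [0.009, 0.6].
        sigma2 = np.clip(rng.chisquare(df=1, size=k) / 4.0, 0.009, 0.6)
        y = rng.normal(mu, np.sqrt(sigma2 + tau2), size=k)
        # DerSimonian-Laird estimation (2.1) and z-interval (3.1) with variance (3.11).
        w = 1.0 / sigma2
        mu_fe = np.sum(w * y) / np.sum(w)
        q = np.sum(w * (y - mu_fe) ** 2)
        tau2_hat = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
        w_tau = 1.0 / (sigma2 + tau2_hat)
        mu_hat = np.sum(w_tau * y) / np.sum(w_tau)
        half = stats.norm.ppf(1 - alpha / 2) * np.sqrt(1.0 / np.sum(w_tau))
        covered += (mu_hat - half <= mu <= mu_hat + half)
    return covered / n_runs
```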
The results of the simulation study are presented in Tables A.1 to A.12 in APPENDIX-I. From the tables, it can be seen that the coverage probabilities of the interval based on the t-distribution are larger than the coverage probabilities of the interval using the DerSimonian and Laird method for each τ² and all values of k. Interestingly, our proposed estimator, the "Modified DerSimonian-Laird (1986) Estimator (MDLE)", performs even better than that. Although the coverage probabilities of the confidence interval from the t-distribution, like those of the other methods, are below the nominal level of 95 per cent, they are higher than those of the commonly applied interval based on the DerSimonian and Laird method, particularly when k is small. This suggests that the simple confidence interval based on the t-distribution is an improvement over the existing simple confidence interval based on DerSimonian and Laird's method; incidentally, 'MDLE' does better still. The most remarkable fact is that our proposed estimator, the "Modified Sidik-Jonkman (2002) Estimator (MSJE)", turns out to be the best in terms of the coverage probability, with small relative biases.
REFERENCES
Armitage P. Controversies and achievements in clinical trials. Controlled Clinical Trials 1984; 5: 67-72.
Brockwell SE, Gordon IR. A comparison of statistical methods for meta-analysis. Statistics in Medicine 2001; 20: 825-840.
DerSimonian R, Laird N. Evaluating the effect of coaching on SAT scores: a meta-analysis. Harvard Educational Review 1983; 53: 1-15.
DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clinical Trials 1986; 7: 177-188.
Halvorsen K. Combining results from independent investigations: meta-analysis in medical research. In: Bailar JC, Mosteller F (eds). Medical Uses of Statistics. Boston: New England Journal of Medicine, 1986.
Hardy RJ, Thompson SG. A likelihood approach to meta-analysis with random effects. Statistics in Medicine 1996; 15: 619-629.
Olkin I. Invited commentary: Re: A critical look at some popular meta-analytic methods. American Journal of Epidemiology 1994; 140: 297-299.
Sahai A, Tewarie B. Meta-analysis of spatial panel data for socioeconomic development project(s) in India. Map World Forum, Hyderabad, ICC, India, January 2007.
Sidik K, Jonkman JN. A simple confidence interval for meta-analysis. Statistics in Medicine 2002; 21: 3153-3159.
Villar J, Mackey ME, Guillermo C, Allan D. Meta-analyses in systematic reviews of randomized controlled trials in perinatal medicine: comparison of fixed and random effects models. Statistics in Medicine 2001; 20: 3635-3647.
APPENDIX-I.
Number of Studies Seminal To Panel/Meta-Data Analysis: K = 10
Table # A.1.  K = 10, τ² = 0.03, μ = 0.5, N = 51000, t = 0.033313, St = 2.262157, 1-α = 0.95

Estimator          Cvg. Prob.   Cvg. Err.   Length     Left Bias   Right Bias   Rel. Bias
ODLE Estimator     0.916588     0.033412    0.419378   4.239216    4.101961     0.016455
MMDLE Estimator    0.939118     0.010882    0.497299   3.121569    2.966667     0.025443
OSRE Estimator     0.932961     0.017039    0.474334   3.435294    3.268627     0.024861
MMSRE Estimator    0.946843     0.003157    0.565561   2.733333    2.582353     0.028403
Table # A.2.  K = 10, τ² = 0.05, μ = 0.5, N = 51000, t = 0.152581, St = 2.262157, 1-α = 0.95

Estimator          Cvg. Prob.   Cvg. Err.   Length     Left Bias   Right Bias   Rel. Bias
ODLE Estimator     0.904667     0.045333    0.456035   4.703922    4.829412     0.013163
MMDLE Estimator    0.935176     0.014824    0.549347   3.192157    3.290196     0.015124
OSRE Estimator     0.926078     0.023922    0.518587   3.654902    3.737255     0.011141
MMSRE Estimator    0.944804     0.005196    0.627044   2.707843    2.811765     0.018828
Table # A.3.  K = 10, τ² = 0.07, μ = 0.5, N = 51000, t = -0.071278, St = 2.262157, 1-α = 0.95

Estimator          Cvg. Prob.   Cvg. Err.   Length     Left Bias   Right Bias   Rel. Bias
ODLE Estimator     0.899451     0.050549    0.542661   5.076471    4.978431     0.009750
MMDLE Estimator    0.938706     0.011294    0.665222   3.113725    3.015686     0.015995
OSRE Estimator     0.926529     0.023471    0.621357   3.705882    3.641176     0.008807
MMSRE Estimator    0.952588     0.002588    0.762973   2.413725    2.327451     0.018197
Table # A.4.  K = 10, τ² = 0.10, μ = 0.5, N = 51000, t = 0.111125, St = 2.262157, 1-α = 0.95

Estimator          Cvg. Prob.   Cvg. Err.   Length     Left Bias   Right Bias   Rel. Bias
ODLE Estimator     0.899451     0.050549    0.542661   5.076471    4.978431     0.009750
MMDLE Estimator    0.938706     0.011294    0.665222   3.113725    3.015686     0.015995
OSRE Estimator     0.926529     0.023471    0.621357   3.705882    3.641176     0.008807
MMSRE Estimator    0.952588     0.002588    0.762973   2.413725    2.327451     0.018197
Number of Studies Seminal To Panel/Meta-Data Analysis: K = 20
Table # A.5.  K = 20, τ² = 0.03, μ = 0.5, N = 51000, t = 0.016522, St = 2.093024, 1-α = 0.95

Estimator          Cvg. Prob.   Cvg. Err.   Length     Left Bias   Right Bias   Rel. Bias
ODLE Estimator     0.923686     0.026314    0.290163   3.847059    3.784314     0.008222
MMDLE Estimator    0.938235     0.011765    0.313227   3.107843    3.068627     0.006349
OSRE Estimator     0.934078     0.015922    0.309332   3.343137    3.249020     0.014277
MMSRE Estimator    0.946569     0.003431    0.334157   2.670588    2.672549     0.000367
Table # A.6.  K = 20, τ² = 0.05, μ = 0.5, N = 51000, t = 0.048108, St = 2.093024, 1-α = 0.95

Estimator          Cvg. Prob.   Cvg. Err.   Length     Left Bias   Right Bias   Rel. Bias
ODLE Estimator     0.917353     0.032647    0.322684   4.066667    4.198039     0.015896
MMDLE Estimator    0.936569     0.013431    0.350536   3.137255    3.205882     0.010819
OSRE Estimator     0.930922     0.019078    0.344165   3.409804    3.498039     0.012773
MMSRE Estimator    0.946020     0.003980    0.373990   2.641176    2.756863     0.021431
Table # A.7.  K = 20, τ² = 0.07, μ = 0.5, N = 51000, t = 0.024341, St = 2.093024, 1-α = 0.95

Estimator          Cvg. Prob.   Cvg. Err.   Length     Left Bias   Right Bias   Rel. Bias
ODLE Estimator     0.922941     0.027059    0.351899   3.800000    3.905882     0.013740
MMDLE Estimator    0.941412     0.008588    0.383228   2.888235    2.970588     0.014056
OSRE Estimator     0.936098     0.013902    0.375380   3.172549    3.217647     0.007057
MMSRE Estimator    0.952294     0.002294    0.408866   2.376471    2.394118     0.003699
Table # A.8.  K = 20, τ² = 0.10, μ = 0.5, N = 51000, t = 0.116016, St = 2.093024, 1-α = 0.95

Estimator          Cvg. Prob.   Cvg. Err.   Length     Left Bias   Right Bias   Rel. Bias
ODLE Estimator     0.921765     0.028235    0.389156   3.898039    3.925490     0.003509
MMDLE Estimator    0.941843     0.008157    0.424470   2.839216    2.976471     0.023601
OSRE Estimator     0.936275     0.013725    0.414368   3.121569    3.250980     0.020308
MMSRE Estimator    0.953059     0.003059    0.451998   2.282353    2.411765     0.027569
Number of Studies Seminal To Panel/Meta-Data Analysis: K = 60
Table # A.9.  K = 60, τ² = 0.03, μ = 0.5, N = 51000, t = 0.042380, St = 2.000995, 1-α = 0.95

Estimator          Cvg. Prob.   Cvg. Err.   Length     Left Bias   Right Bias   Rel. Bias
ODLE Estimator     0.937451     0.012549    0.167686   3.235294    3.019608     0.034483
MMDLE Estimator    0.943353     0.006647    0.172048   2.945098    2.719608     0.039806
OSRE Estimator     0.941627     0.008373    0.171539   3.035294    2.801961     0.039973
MMSRE Estimator    0.947118     0.002882    0.176004   2.745098    2.543137     0.038191
Table # A.10.  K = 60, τ² = 0.05, μ = 0.5, N = 51000, t = 0.042201, St = 2.000995, 1-α = 0.95

Estimator          Cvg. Prob.   Cvg. Err.   Length     Left Bias   Right Bias   Rel. Bias
ODLE Estimator     0.939725     0.010275    0.188748   3.021569    3.005882     0.002602
MMDLE Estimator    0.945490     0.004510    0.193752   2.701961    2.749020     0.008633
OSRE Estimator     0.943902     0.006098    0.192751   2.780392    2.829412     0.008738
MMSRE Estimator    0.949098     0.000902    0.197861   2.505882    2.584314     0.015408
Table # A.11.  K = 60, τ² = 0.07, μ = 0.5, N = 51000, t = 0.066456, St = 2.000995, 1-α = 0.95

Estimator          Cvg. Prob.   Cvg. Err.   Length     Left Bias   Right Bias   Rel. Bias
ODLE Estimator     0.941431     0.008569    0.205990   2.898039    2.958824     0.010378
MMDLE Estimator    0.947941     0.002059    0.211467   2.572549    2.633333     0.011676
OSRE Estimator     0.945353     0.004647    0.210215   2.672549    2.792157     0.021887
MMSRE Estimator    0.951275     0.001275    0.215804   2.400000    2.472549     0.014889
Table # A.12.  K = 60, τ² = 0.10, μ = 0.5, N = 51000, t = 0.131044, St = 2.000995, 1-α = 0.95

Estimator          Cvg. Prob.   Cvg. Err.   Length     Left Bias   Right Bias   Rel. Bias
ODLE Estimator     0.940902     0.009098    0.228391   2.935294    2.974510     0.006636
MMDLE Estimator    0.947098     0.002902    0.234466   2.601961    2.688235     0.016308
OSRE Estimator     0.945314     0.004686    0.232905   2.698039    2.770588     0.013266
MMSRE Estimator    0.950431     0.000431    0.239100   2.441176    2.515686     0.015032