Technical Report 2003/GE2
TECHNICAL REPORT 2003/GE2
Meta-Analytical methods for the synthesis of Genetic Studies
using Mendelian Randomisation
Cosetta Minelli*, M.D., M.Sc.; John R Thompson*, Ph.D.; Martin D Tobin, M.Sc.,
M.R.C.P.; Keith R Abrams, Ph.D., C.Stat.
* joint first authors
Genetic Epidemiology Unit,
Centre for Biostatistics,
Department of Health Sciences,
University of Leicester, U.K.
Date: 4 Aug 2003
Revision:

11 Aug 2003 – minor corrections to English and addition of a section on Publication and
Reporting Bias.

19 Aug 2003 – re-numbering of tables and addition to references.
1. Introduction
With the recent growth in knowledge about the human genome there has been a dramatic increase in
the number of genetic epidemiological studies of the association between specific genes and diseases
and between those genes and the risk factors or phenotypes that are thought to be intermediates on
the causal pathway to disease. In many instances these studies have supplemented pre-existing
research on the association between the phenotype and the disease. For instance, many recent studies
have looked at the associations between polymorphisms of the Methylene TetraHydroFolate
Reductase (MTHFR) gene and Coronary Heart Disease (CHD) and between the MTHFR gene and
homocysteine; these have, in part, been motivated by the pre-existing evidence for an association
between homocysteine and CHD. Similarly there have been many studies of APO-E polymorphisms
and CHD or stroke, and many studies of APO-E and lipid levels, stemming from epidemiological
evidence of an association between lipids and CHD or stroke.
As the number of genetic studies has grown, so meta-analyses have been produced to synthesise the
evidence and overcome the limitations of power found in even moderately sized studies1. Two
factors are evident from reviewing these meta-analyses: first, studies of gene and phenotype tend to
be less common than those of gene and disease, and second, the evidence for a genotype-phenotype
association is often obtained as a spin-off from a study primarily aimed at investigating the
genotype-disease relationship and then the information is often only obtained on a subset of the
subjects.
Where there is strong reason to suppose that the phenotype is intermediate on the causal pathway
from gene to disease, it would be sensible to perform meta-analyses that in some way integrate the
evidence for all three relationships; genotype-phenotype, genotype-disease and phenotype-disease.
The logic behind this approach is greatly strengthened by an appeal to Mendelian randomisation, that
is, the fact that one's genes are inherited at birth by a seemingly random process. Accordingly,
epidemiological studies of the genotype-phenotype and genotype-disease associations show strong
parallels with randomised trials and should not be affected by confounding or reverse causation in
the way that makes phenotype-disease studies so difficult to interpret2-4. In theory, by combining the
information from genotype-disease and genotype-phenotype studies, it is possible to derive an
unconfounded estimate of the phenotype-disease association. Integrated meta-analyses may be able
to take advantage of Mendelian randomisation, in order to test whether the phenotype is actually on
the causal pathway and to provide an unconfounded estimate of the effect of phenotype on disease.
2. Methods
2.1 Mendelian Randomisation
In order to use genetic studies to quantify the relationship between the phenotype and disease, the
estimate of the genotype-disease association has to be combined with the estimate of the
genotype-phenotype association (Fig. 1). Suppose that a mutant genotype (GG) causes an increased risk of
disease compared to the wildtype (gg) and that this is measured by the odds ratio $\mathrm{OR}_{GG\,vs\,gg}$.
Further suppose that GG compared to gg causes a mean difference, ΔP, in the level of the
intermediate phenotype. Then, under the assumptions required for Mendelian randomisation and
assuming linearity of the relationship between phenotype increase/decrease and OR for the disease
on a log scale, $\mathrm{OR}_{GG\,vs\,gg}^{1/\Delta P}$ is an unconfounded estimate of the odds ratio of disease resulting from
a unit change in the phenotype.
Figure 1 - Calculation of an unconfounded estimate of the effect of a change in phenotype on a disease based on the
concept of Mendelian randomisation
[Diagram: Genotype G, Phenotype P and Disease D, with three labelled associations.
Genotype-Phenotype: mean(P_GG – P_gg) = ΔP.
Genotype-Disease: pooled OR_GG vs gg = b.
Phenotype-Disease (marked "?"): odds ratio associated with a k unit change in phenotype, OR_PD = b^(k/ΔP).]
2.2
Example: MTHFR, Homocysteine and CHD
A recent non-genetic meta-analysis on individual patient data from epidemiological studies showed a
decrease of 11% in CHD for a 25% decrease of homocysteine levels (OR: 0.89; 95% Confidence
Interval [CI]: 0.83 to 0.96)5. The meta-analysis showed heterogeneity between studies partly
explained by study design. Retrospective studies yielded higher estimates of risk, perhaps due to
unadjusted confounding. In particular, two major confounding factors were suggested; smoking and
blood pressure. These are both strongly correlated with homocysteine and are known risk factors for
CHD. The strong possibility of unadjusted confounding makes it very difficult to be sure that the
relationship between homocysteine and CHD is causal.
A common polymorphism of the gene for the MTHFR enzyme leads to reduced enzyme activity,
lower folate and consequently higher homocysteine levels6. The polymorphism involves a C-to-T
substitution at base 677, so the wildtype homozygous genotype is referred to as CC and the mutant
homozygous genotype as TT. This polymorphism can be used, together with the idea of Mendelian
randomisation, to indirectly assess the effect of homocysteine on CHD.
A recent genetic meta-analysis of individual patient data has shown an increased risk of CHD of
about 16% associated with genotype TT compared to CC (OR: 1.16; 95%CI: 1.05 to 1.28). This
result was similar to that of another meta-analysis published at the same time but carried out on
aggregated data, which showed a pooled odds ratio of 1.21 for TT genotype (95%CI: 1.06 to 1.39)7.
The latter paper also mentioned those studies that evaluated the association between genotype and
phenotype. They found a simple average mean difference of 2.7 μmol/l in homocysteine
concentration (95%CI: 2.1 to 3.4) between TT and CC genotypes.
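As a rough illustration of the calculation in Figure 1, the sketch below plugs the pooled values quoted above (OR 1.16 for TT vs CC, mean homocysteine difference 2.7 μmol/l) into OR_PD = b^(k/ΔP). The function name and the choice of k are ours, and this naive point estimate ignores the uncertainty in both inputs, which is what the integrated models in Section 2.4 are meant to handle.

```python
import math

def phenotype_disease_or(or_gd, delta_p, k=1.0):
    """Odds ratio of disease for a k-unit change in phenotype.

    or_gd   : pooled genotype-disease odds ratio (b in Figure 1)
    delta_p : mean phenotype difference between genotypes (must be non-zero)
    k       : size of the phenotype change of interest
    """
    if delta_p == 0:
        raise ValueError("delta_p must be non-zero")
    # b ** (k / delta_p), computed on the log scale
    return math.exp(k * math.log(or_gd) / delta_p)

# Values quoted in the text: OR 1.16 (TT vs CC), difference 2.7 umol/l
or_per_unit = phenotype_disease_or(1.16, 2.7)
or_per_5 = phenotype_disease_or(1.16, 2.7, k=5.0)
print(round(or_per_unit, 3), round(or_per_5, 3))
```

The estimate becomes unstable as delta_p approaches zero, which is why the precision of the genotype-phenotype difference matters so much for this approach.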
Using the paper by Wald et al.7 a total of 64 genetic studies were identified (Tab. 1). Classifying the
studies that reported both estimate and precision, 31 evaluated only genotype-disease association, 16
only genotype-phenotype association, and 17 both. The definition of CHD used was myocardial
infarction or angiographically confirmed coronary artery occlusion (>50% of the luminal diameter).
Genotype-disease associations were reported in an additional 11 studies, but this information was not
considered because either a different disease definition (prognosis rather than occurrence of CHD8;9,
atherosclerotic vascular disease rather than CHD10-13, other disease outcomes14-17) or a different
study population (diabetic subjects18) was used.
Among the 17 studies evaluating both associations, 7 measured the mean difference in phenotype
level with genotype in both cases and controls (2 reporting only combined means), but 4 studies
measured homocysteine only in cases and 4 only in controls, while two reports were unclear.
Table 1 – Studies included in the meta-analysis (n=64) and evaluating either Genotype-Disease (n=31), Genotype-Phenotype (n=16) or both associations (n=17) (/ = not available)
Study | Genotype-Disease: log OR | Variance logOR | Genotype-Phenotype: Diff. Hcy | Variance Diff. Hcy
Chambers (Asians)19 | -0.82 | 0.42 | 0.4 | 10.00
Chao20 | -0.48 | 0.25 | 2.9 | 2.44
Tsai21 | -0.31 | 0.14 | 1.3 | 0.46
Schmitz22 | -0.22 | 0.10 | -0.8 | 0.48
Meisel23 | -0.2 | 0.03 | 0.9 | 0.19
Ma24 | -0.17 | 0.07 | 2 | 0.36
Schwartz25 | -0.11 | 0.21 | 2.6 | 1.23
Kim26 | 0.07 | 0.15 | 1.3 | 2.63
Kluijtmans (1997)27 | 0.19 | 0.03 | 2.8 | 1.47
Christensen28 | 0.25 | 0.16 | 2.5 | 1.15
Chambers (European)19 | 0.28 | 0.07 | 1.4 | 0.47
Tokgozoglu29 | 0.41 | 0.32 | 8.8 | 14.29
Nakai30 | 0.55 | 0.10 | 3 | 0.57
Morita31 | 0.73 | 0.04 | 3.8 | 1.47
Ou32 | 0.84 | 0.06 | 4.6 | 1.61
Malinow33 | 0.96 | 0.23 | 1.2 | 0.71
Kawashiri34 | 1.14 | 0.26 | 11 | 11.11
Zheng35 | -1.61 | 0.61 | / | /
Fernandez-A. (Females)36 | -0.63 | 0.28 | / | /
Brulhart37 | -0.37 | 0.07 | / | /
Girelli38 | -0.29 | 0.10 | / | /
Brugada39 | -0.26 | 0.21 | / | /
Adams40 | -0.22 | 0.08 | / | /
Ardissino41 | -0.21 | 0.09 | / | /
Dilley42 | -0.17 | 0.77 | / | /
Verhoef (1998)43 | -0.17 | 0.04 | / | /
Hsu44 | -0.11 | 0.16 | / | /
Van Bockxmeer45 | -0.01 | 0.10 | / | /
Abbate46 | 0.03 | 0.16 | / | /
Wilcken47 | 0.04 | 0.08 | / | /
Pinto48 | 0.06 | 0.23 | / | /
Anderson (1997)49 | 0.1 | 0.08 | / | /
Fowkes50 | 0.25 | 0.16 | / | /
Gardemann51 | 0.26 | 0.08 | / | /
Todesco52 | 0.3 | 0.17 | / | /
Reinhardt53 | 0.32 | 0.19 | / | /
Verhoef (1997)54 | 0.35 | 0.26 | / | /
Kihara55 | 0.38 | 0.22 | / | /
Araujo56 | 0.4 | 0.66 | / | /
Malik57 | 0.42 | 0.12 | / | /
Thogersen58 | 0.46 | 0.39 | / | /
Izumi59 | 0.49 | 0.08 | / | /
Fernandez-A. (Males)36 | 0.5 | 0.09 | / | /
Szczeklik60 | 0.77 | 0.14 | / | /
Gallagher61 | 1.18 | 0.24 | / | /
Mager62 | 1.26 | 0.15 | / | /
Ferrer-Antunes63 | 1.3 | 0.30 | / | /
Gulec64 | 1.46 | 0.31 | / | /
Dekou (Females)65 | / | / | 0.9 | 0.33
Voutilainen66 | / | / | 1 | 0.86
Chango (a)67 | / | / | 1.4 | 0.64
Mazza14 | / | / | 1.4 | 0.38
Chango (b)68 | / | / | 1.7 | 2.56
Dekou (Males)65 | / | / | 2.1 | 0.53
Kosokabe8;14 | / | / | 2.1 | 2.08
Gonzalez Ordonez15 | / | / | 2.1 | 0.84
Anderson (2000)9 | / | / | 3.8 | 2.33
Kluijtmans (1996)11 | / | / | 4 | 4.76
Deloughery12 | / | / | 4.2 | 7.69
Fujimara16 | / | / | 4.3 | 3.23
Arai18 | / | / | 4.4 | 4.00
Rassoul10 | / | / | 7.3 | 4.17
Yoo13 | / | / | 8.1 | 0.89
D'Angelo17 | / | / | 9.9 | 3.45
2.3 Publication and Reporting Bias
The studies in Table 1 show that some studies reported on the Genotype-Disease (GD) association,
some on the Genotype-Phenotype (GP) association and some on both. Thus as well as the usual
publication bias we need to be concerned about the possibility of reporting bias due to researchers
choosing not to report information that is inconsistent with some accepted theory. The usual method
of investigation of publication bias is via funnel plots.
Figure 2(a) and 2(b) show the funnel plots for the GD estimates for studies that do or do not report
on GP. Figure 3 combines the information in a single plot.
Figure 2 – Funnel plots of GD in studies (a) that report also GP (b) that do not report GP
[Panels: "Studies Reporting on GD and GP" and "Studies Only Reporting on GD"; x-axis: Log OR of GD; y-axis: Inverse Standard Error]
Figure 3 – Funnel plots of GD in all studies
[Panel: "All Studies Reporting on GD"; x-axis: Log OR of GD; y-axis: Inverse Standard Error; markers distinguish the information reported: GD only, GD & GP]
It certainly appears from figures 2 and 3 that studies reporting a small GD odds ratio are underrepresented in the literature but it is not clear that the reporting of GP is related to that bias.
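For readers who want to reproduce the funnel-plot coordinates, the following sketch converts reported variances into inverse standard errors for the 17 studies in Table 1 that report both associations. The inverse-variance (fixed-effect) pooled value used as the funnel's centre is our own simplification for illustration; it is not the model fitted in this report.

```python
import math

# (log OR, variance of log OR) for the 17 studies reporting both GD and GP,
# as listed in Table 1
gd_both = [
    (-0.82, 0.42), (-0.48, 0.25), (-0.31, 0.14), (-0.22, 0.10), (-0.20, 0.03),
    (-0.17, 0.07), (-0.11, 0.21), (0.07, 0.15), (0.19, 0.03), (0.25, 0.16),
    (0.28, 0.07), (0.41, 0.32), (0.55, 0.10), (0.73, 0.04), (0.84, 0.06),
    (0.96, 0.23), (1.14, 0.26),
]

def funnel_points(estimates):
    """Return (estimate, inverse standard error) pairs for a funnel plot."""
    return [(est, 1.0 / math.sqrt(var)) for est, var in estimates]

def pooled_fixed_effect(estimates):
    """Inverse-variance weighted mean (illustrative centre line only)."""
    weights = [1.0 / var for _, var in estimates]
    return sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)

points = funnel_points(gd_both)
centre = pooled_fixed_effect(gd_both)
# A crude asymmetry check: how many studies fall on each side of the centre
n_below = sum(1 for est, _ in gd_both if est < centre)
n_above = len(gd_both) - n_below
print(round(centre, 3), n_below, n_above)
```

Plotting `points` with the estimate on the horizontal axis and the inverse standard error on the vertical axis gives the layout used in Figures 2 and 3.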
Figure 4(a) and 4(b) show the funnel plots for the GP estimates for studies that do or do not report on
GD. Figure 5 combines the information in a single plot.
Figure 4 – Funnel plots of GP in studies (a) that report also GD (b) that do not report GD
[Panels: "Studies Reporting on GP and GD" and "Studies Only Reporting on GP"; x-axis: Mean Difference in GP; y-axis: Inverse Standard Error]
Figure 5 – Funnel plots of GP in all studies
[Panel: "All Studies Reporting on GP"; x-axis: Mean Difference in GP; y-axis: Inverse Standard Error; markers distinguish the information reported: GP only, GP & GD]
It certainly appears from figures 4 and 5 that studies reporting a negative GP difference are underrepresented in the literature but once again it is not clear that the reporting of GD is related to that
bias.
2.4 Meta-analytical Approaches
In the example of the MTHFR gene, homocysteine and coronary heart disease, as is likely to happen in
most instances, there is a mixture of studies that measure the genotype-phenotype effect (n=16), those that
measure the genotype-disease effect (n=31) and those that measure both (n=17). If the genotype-phenotype
and genotype-disease evidence all come from unrelated sources then separate meta-analyses
will give estimates of the pooled effects that can, by appealing to Mendelian randomisation,
be combined to estimate the size of the phenotype-disease association. However, when some sources
are shared, it becomes important to consider the correlation in the genotype-phenotype and
genotype-disease evidence arising from studies that measure both associations.
In the following paragraphs, meta-analytical methods that allow for between-study correlation are
described, which enable all available evidence to be combined within a single model. In particular,
two models will be presented (Model A and B), which differ in the way that the heterogeneities of
genotype-disease, genotype-phenotype and phenotype-disease associations are modelled (Fig. 6). In
model A we directly model the heterogeneity of the observed associations with the genotype. Under
the assumption of a causal pathway acting solely through the phenotype, these heterogeneities will
be correlated and so they are modelled by a general bivariate normal distribution. In contrast, in
model B we model the heterogeneity of the genotype-phenotype and phenotype-disease stages.
Together these induce the heterogeneity in the genotype-disease association and therefore the
correlation between the genotype-phenotype and genotype-disease stages. Critically we add the
assumption that the heterogeneities on the genotype-phenotype and phenotype-disease stages are
independent. In many situations this assumption will be reasonable because there will be no
biological reason why a study where the effect of gene on phenotype is large will also find that the
phenotype is more closely linked to disease.
Both models will be fitted following a maximum likelihood approach as well as a Bayesian
approach.
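Figure 6 – Two different ways of modelling the heterogeneities of the three associations
[Diagram with two panels, each linking Genotype, Phenotype and Disease.
Model A: heterogeneity is placed on the genotype-phenotype (y) and genotype-disease (x) associations, allowing for their correlation; heterogeneity in phenotype-disease association can be derived.
Model B: heterogeneity is placed on the genotype-phenotype (y) and phenotype-disease associations, with correlation = 0; heterogeneity in gene-disease association can be derived.]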
2.4.1 Model A
Let the Log Odds Ratio of Disease given genotype be x and the mean difference in phenotype be y.
The ith study produces two potential estimates xi and yi although in practice only one or other may be
obtained or reported.
The study estimates have reported variances w_xi and w_yi. The variances are assumed known.
It is possible that the within-study estimates of x and y are correlated (within-study correlation),
although there is good reason to suppose that this correlation is negligible because:
• y (difference in phenotype across genotypes) is often only measured in a subset of the total
study subjects
• x (log OR of disease given genotype) is an aggregate measure obtained from the disease
outcome since we cannot measure the actual increase in risk at an individual level
To study whether this assumption was reasonable we performed the following simulation study.
Each simulated study consisted of n cases and n controls. The probability of disease given the
phenotype, u, was assumed to be controlled by a logistic function with parameters α and β. The
distribution of the phenotype over the population was assumed to be normal, N(μ, σ²), with a fixed
genotype effect on phenotype. From the generated data we estimated the log odds ratio for the
genotype-disease association and the mean difference in phenotype with genotype. M repeated studies
were simulated under identical conditions and the correlation between the estimated log odds ratio
for the genotype-disease association and the mean difference in phenotypes was calculated.
Initial values for the parameters were chosen to reflect the values in studies of the
MTHFR-Homocysteine-CHD pathway. Thus
OR unit change in phenotype on disease = 1.1 [β = log(1.1) = 0.095]
Difference in phenotype due to genotype = 4 units
[Hence OR genotype on disease = 1.46]
Frequency of dangerous genotype = 12%
Between-subject variation in levels of phenotype, u, N(8,2) or N(12,2) depending on genotype
Baseline risk of disease (α) mean –4 st dev 0.2 (reflecting unmeasured covariates)
Hence
\[
P(D = 1 \mid u) = \frac{e^{\alpha + \beta u}}{1 + e^{\alpha + \beta u}}
\]
About 4% of people with the wild type genotype and about 5.4% of those with the mutant genotype
develop the disease.
Studies had 100 cases and 100 controls and, to avoid bias, the effect of genotype on phenotype was
estimated from the controls only.
From 500 simulations the correlation r between the estimated log OR of genotype on disease and the
mean difference in phenotype was calculated as 0.00.
Repeating the estimation under exactly the same conditions gave –0.07, 0.00, 0.02, -0.08. The
average was therefore –0.03. 500 simulations may not be enough to guarantee an accurate estimate
but it is enough to demonstrate that the correlation is within 0.1.
Using 500 simulations the basic simulation was altered by changing one parameter and keeping the
others fixed at the values given above. The results are shown in Table 2.
The program used for the simulations is given in Appendix 1.
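The authors' own program is in Appendix 1 and is not reproduced here. As a hedged illustration only, the sketch below re-creates the simulation design described above in plain Python. Several details are our assumptions: the second argument of N(8,2) is read as a standard deviation of 2 (in line with the "standard deviation in phenotype" rows of Table 2), the baseline α is drawn afresh for every subject, and a 0.5 continuity correction is added to the 2x2 table.

```python
import math
import random

def simulate_study(rng, n=100, freq=0.12, mean_wild=8.0, delta=4.0, sd=2.0,
                   alpha_mean=-4.0, alpha_sd=0.2, beta=math.log(1.1)):
    """Simulate one study with n cases and n controls.

    Returns (log OR of genotype on disease,
             mean phenotype difference between genotypes, controls only).
    """
    while True:
        cases, controls = [], []
        while len(cases) < n or len(controls) < n:
            mutant = rng.random() < freq
            u = rng.gauss(mean_wild + (delta if mutant else 0.0), sd)
            alpha = rng.gauss(alpha_mean, alpha_sd)   # unmeasured covariates
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * u)))
            group = cases if rng.random() < p else controls
            if len(group) < n:
                group.append((mutant, u))
        u_mut = [u for m, u in controls if m]
        u_wild = [u for m, u in controls if not m]
        if u_mut and u_wild:   # both genotypes needed among the controls
            break
    a = sum(1 for m, _ in cases if m)   # mutant cases
    c = len(u_mut)                      # mutant controls
    # 0.5 continuity correction guards against empty cells (our assumption)
    log_or = math.log((a + 0.5) * (n - c + 0.5) / ((n - a + 0.5) * (c + 0.5)))
    diff = sum(u_mut) / len(u_mut) - sum(u_wild) / len(u_wild)
    return log_or, diff

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def within_study_correlation(m=500, seed=1, **kwargs):
    """Correlation between log OR and phenotype difference over m studies."""
    rng = random.Random(seed)
    results = [simulate_study(rng, **kwargs) for _ in range(m)]
    return pearson([r[0] for r in results], [r[1] for r in results])

# The report uses 500 replicates; 50 keeps this example quick to run.
r = within_study_correlation(m=50, seed=1)
print(round(r, 2))
```

Rows of Table 2 correspond to changing one keyword argument at a time, for example `within_study_correlation(freq=0.25)` or `within_study_correlation(delta=2.0)`.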
Table 2 – Simulation studies to investigate the within-study correlation of log OR with mean difference in phenotype
Parameter | Estimated Correlation
BASE ASSUMPTIONS | 0.00, -0.07, 0.00, 0.02, -0.08 (mean = -0.03)
OR unit change in phenotype on disease = 1.2 | 0.04
Difference in phenotype due to genotype = 2 | -0.10
Difference in phenotype due to genotype = 6 | -0.06
Frequency of dangerous genotype = 25% | 0.11
Frequency of dangerous genotype = 6% | 0.03
Standard deviation in phenotype = 3 | 0.00
Standard deviation in phenotype = 1 | 0.02
Although the within-study correlation will be small, between studies there will be considerable
heterogeneity and this heterogeneity will possibly be highly correlated. That is studies that show a
larger than expected difference in phenotype y may well report a larger than expected log odds ratio
of disease, x (between-study correlation).
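Suppose that the heterogeneity in x has variance σ_x, the heterogeneity in y has variance σ_y and the
correlation is ρ.
Assuming bivariate normal distributions (MN2) we have a hierarchical model with
\[
\begin{pmatrix} x_i \\ y_i \end{pmatrix} \sim MN_2\left(
\text{mean} \begin{pmatrix} \mu_{xi} \\ \mu_{yi} \end{pmatrix},\;
\text{variance} \begin{pmatrix} w_{xi} & 0 \\ 0 & w_{yi} \end{pmatrix} \right)
\]
and
\[
\begin{pmatrix} \mu_{xi} \\ \mu_{yi} \end{pmatrix} \sim MN_2\left(
\text{mean} \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix},\;
\text{variance} \begin{pmatrix} \sigma_x & \rho\sqrt{\sigma_x \sigma_y} \\ \rho\sqrt{\sigma_x \sigma_y} & \sigma_y \end{pmatrix} \right)
\]
Integrating over the study means we have
\[
\begin{pmatrix} x_i \\ y_i \end{pmatrix} \sim MN_2\left(
\text{mean} \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix},\;
\text{variance} \begin{pmatrix} w_{xi} + \sigma_x & \rho\sqrt{\sigma_x \sigma_y} \\ \rho\sqrt{\sigma_x \sigma_y} & w_{yi} + \sigma_y \end{pmatrix} \right)
\]
The log odds ratio of phenotype on disease is estimated by the ratio θ = μ_x/μ_y so we could write the
model as
\[
\begin{pmatrix} x_i \\ y_i \end{pmatrix} \sim MN_2\left(
\text{mean} \begin{pmatrix} \theta\mu_y \\ \mu_y \end{pmatrix},\;
\text{variance} \begin{pmatrix} w_{xi} + \sigma_x & \rho\sqrt{\sigma_x \sigma_y} \\ \rho\sqrt{\sigma_x \sigma_y} & w_{yi} + \sigma_y \end{pmatrix} \right)
\]
When only one of the pair is observed we treat them as univariate normal:
\[
x_i \sim N\left(\text{mean } \theta\mu_y,\; \text{variance } w_{xi} + \sigma_x\right)
\]
and
\[
y_i \sim N\left(\text{mean } \mu_y,\; \text{variance } w_{yi} + \sigma_y\right)
\]
This model with five parameters (θ, μ_y, σ_x, σ_y, ρ) can be fitted by either a maximum likelihood
approach or a Bayesian approach.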
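To make the structure of the Model A likelihood concrete, here is a minimal Python sketch of the marginal log-likelihood, assuming the parameterisation just described: a ratio parameter (called `theta` here), the mean phenotype difference `mu_y`, heterogeneity variances `sigma_x` and `sigma_y`, and between-study correlation `rho`. The names and the tuple layout of `studies` are ours; the authors' actual implementation is the Stata code in Appendix 2.

```python
import math

def log_normal_pdf(z, mean, var):
    """Log density of a univariate normal with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (z - mean) ** 2 / var)

def loglik_model_a(theta, mu_y, sigma_x, sigma_y, rho, studies):
    """Marginal log-likelihood of Model A.

    studies: iterable of (x, w_x, y, w_y), where x is the log OR, y the mean
    phenotype difference, and w_x, w_y their known within-study variances.
    Use None for an association that a study did not report.
    """
    mu_x = theta * mu_y
    cov = rho * math.sqrt(sigma_x * sigma_y)
    total = 0.0
    for x, w_x, y, w_y in studies:
        if x is not None and y is not None:
            vx, vy = w_x + sigma_x, w_y + sigma_y
            det = vx * vy - cov ** 2
            dx, dy = x - mu_x, y - mu_y
            quad = (vy * dx * dx - 2.0 * cov * dx * dy + vx * dy * dy) / det
            total += -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
        elif x is not None:
            total += log_normal_pdf(x, mu_x, w_x + sigma_x)
        elif y is not None:
            total += log_normal_pdf(y, mu_y, w_y + sigma_y)
    return total

# Three rows taken from Table 1: one study with both associations,
# one with genotype-disease only, one with genotype-phenotype only.
example = [
    (-0.48, 0.25, 2.9, 2.44),   # Chao
    (-1.61, 0.61, None, None),  # Zheng
    (None, None, 0.9, 0.33),    # Dekou (Females)
]
ll = loglik_model_a(0.086, 2.644, 0.11, 3.14, 0.9, example)
print(round(ll, 3))
```

A numerical optimiser would maximise this function over the transformed parameters; with `rho = 0` the bivariate term reduces to the sum of the two univariate terms, which is a convenient check on any implementation.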
Maximum Likelihood Approach
Numerical maximisation (ml in Stata) is based on transformed parameter values (θ, μ_y, log(σ_x),
log(σ_y), log[(1+ρ)/(1-ρ)]).
To obtain initial values we take
\[
\theta = \bar{x}/\bar{y}, \qquad \mu_y = \bar{y}, \qquad \sigma_x = \mathrm{Var}(x), \qquad \sigma_y = \mathrm{Var}(y), \qquad \rho = \mathrm{Corr}(x, y)
\]
The Stata code used for the maximisation is given in Appendix 2.
To test the algorithm, data were generated from n=100 studies with 1/w_xi ~ U(3,30) and
1/w_yi ~ U(1,5) and parameters (0.05, 3, 0.01, 1, 0.8). Observations were made missing at random to
leave 50 studies in which both were measured, 25 in which only x was measured and 25 in which only
y was measured. This data set was called "test100".
Using data set "test100" the initial values were (0.04, 3.35, 0.09, 1.42, 0.26). The 50 studies with
both measured are shown in Figure 7. The correlation is attenuated by the uncertainties in the
within-study estimates and the variance of y is exaggerated.
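A data set with the same structure as "test100" can be generated along the following lines. This is a sketch under our own assumptions: the parameter vector is read in the order (ratio, mean phenotype difference, heterogeneity variance of x, heterogeneity variance of y, correlation), the true study means are drawn from the bivariate normal of Model A, and the seed and function name are arbitrary. It will not reproduce the authors' actual random draws.

```python
import math
import random

def generate_test_data(n=100, theta=0.05, mu_y=3.0, sigma_x=0.01,
                       sigma_y=1.0, rho=0.8, seed=2003):
    """Return a list of (x, w_x, y, w_y) tuples with None for missing parts."""
    rng = random.Random(seed)
    studies = []
    for _ in range(n):
        # Correlated true study means via a 2x2 Cholesky factor
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        mu_xi = theta * mu_y + math.sqrt(sigma_x) * z1
        mu_yi = mu_y + math.sqrt(sigma_y) * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        # Within-study variances: precisions are uniform, as in the text
        w_x = 1.0 / rng.uniform(3.0, 30.0)
        w_y = 1.0 / rng.uniform(1.0, 5.0)
        x = rng.gauss(mu_xi, math.sqrt(w_x))
        y = rng.gauss(mu_yi, math.sqrt(w_y))
        studies.append([x, w_x, y, w_y])
    # Missing at random: half keep both, a quarter keep x only, a quarter y only
    order = list(range(n))
    rng.shuffle(order)
    for idx in order[n // 2: n // 2 + n // 4]:
        studies[idx][2] = studies[idx][3] = None   # only x measured
    for idx in order[n // 2 + n // 4:]:
        studies[idx][0] = studies[idx][1] = None   # only y measured
    return [tuple(s) for s in studies]

test100 = generate_test_data()
n_both = sum(1 for x, _, y, _ in test100 if x is not None and y is not None)
n_x_only = sum(1 for x, _, y, _ in test100 if x is not None and y is None)
n_y_only = sum(1 for x, _, y, _ in test100 if x is None and y is not None)
print(n_both, n_x_only, n_y_only)
```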
Figure 7 – Correlation between the log OR of the Genotype-Disease association and the difference in Phenotype
associated with the Genotype (Genotype-Phenotype association) for 50 studies measuring both, in the "test100" data set
[Scatter plot; x-axis: Difference in Phenotype; y-axis: Log Odds Ratio GD]
Numerical maximisation (ml in Stata) converges to the values reported in Table 3, which in terms of
the original measurements equate to: θ=0.04, μ_y=3.36, σ_x=0.02, σ_y=0.87, ρ=0.51.
Table 3 – Model A - Results of numerical maximisation in the "test100" data set
Parameter | Estimate | St. Error
θ | 0.043 | 0.010
μ_y | 3.360 | 0.128
log(σ_x) | -3.943 | 0.635
log(σ_y) | -0.144 | 0.242
log[(1+ρ)/(1-ρ)] | 1.116 | 0.818
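The step from the transformed estimates in Table 3 back to the original scale can be checked with a few lines of Python. The function name is ours; the only mathematics involved is the exponential for the log-variances and the inverse of z = log[(1+ρ)/(1-ρ)], which is tanh(z/2).

```python
import math

def back_transform(theta, mu_y, log_sigma_x, log_sigma_y, z_rho):
    """Map (theta, mu_y, log variances, z) back to the original parameters."""
    rho = math.tanh(z_rho / 2.0)   # inverse of z = log[(1 + rho) / (1 - rho)]
    return theta, mu_y, math.exp(log_sigma_x), math.exp(log_sigma_y), rho

# Transformed estimates as reported in Table 3
table3 = back_transform(0.043, 3.360, -3.943, -0.144, 1.116)
print([round(v, 2) for v in table3])
```

Very large values of the transformed correlation, such as the estimates above 16 reported for the other two data sets, map to a correlation that is indistinguishable from 1, which is how a boundary estimate shows up on this scale.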
A second test data set, "test1000", was generated identically but with n=1000 studies, of which 250
measured only x and 250 only y. Maximisation started with θ=0.05, μ_y=3.00, σ_x=0.10, σ_y=1.44, ρ=0.23
and converged to the values reported in Table 4, which in terms of the original measurements
equate to: θ=0.05, μ_y=3.02, σ_x=0.01, σ_y=1.06, ρ=1.00.
Table 4 – Model A - Results of numerical maximisation in the "test1000" data set
Parameter | Estimate | St. Error
θ | 0.051 | 0.003
μ_y | 3.015 | 0.043
log(σ_x) | -4.916 | 0.280
log(σ_y) | 0.057 | 0.069
log[(1+ρ)/(1-ρ)] | 16.164 | 1418.8
Both analyses partially correct the errors in the initial var(y) and the correlation, but the likelihood is
clearly flat and poorly defines the correlation. Notably, the situation is worse with a sample of
1000 studies, which suggests that the problem is not solved by increasing the number of studies in
the meta-analysis.
The MTHFR-Homocysteine-CHD data starts with θ=0.06, μ_y=3.29, σ_x=0.34, σ_y=7.92, ρ=0.60 and
converges to the values reported in Table 5, which in terms of the original measurements equate to:
θ=0.09, μ_y=2.64, σ_x=0.11, σ_y=3.14, ρ=1.00.
Table 5 – Model A - Results of numerical maximisation in the actual MTHFR-Homocysteine-CHD data set
Parameter | Estimate | St. Error
θ | 0.086 | 0.024
μ_y | 2.644 | 0.356
log(σ_x) | -2.192 | 0.445
log(σ_y) | 1.144 | 0.367
log[(1+ρ)/(1-ρ)] | 16.852 | 1306.9
Again the correlation is poorly defined.
The profile log likelihood curves reflect the uncertainty in correlation (Fig. 8). The likelihood for the
actual data increases linearly with ρ, with a maximum at ρ=1. The problem is not present when using
the test data set with 100 studies, where the maximum likelihood is reached for ρ=0.5, but this is
likely to be due to chance since the problem is present again when using the test data set with 1000
studies. Thus, although this issue needs to be addressed by a thorough simulation study, it seems that
the problem of unbounded likelihood does not depend only on the number of studies included in the
meta-analysis, even when considering extreme (and unrealistic) numbers.
The program used for the profile calculations with fixed correlations is given in Appendix 3.
Figure 8 – Profile Log Likelihood for the three different scenarios; (a) actual data;
(b) test data with n. of studies=100; (c) test data with n. of studies=1000
[Three panels, each titled "Profile Log Likelihood"; x-axis: Correlation (0 to 1); y-axis: Log Likelihood]
Unfortunately conclusions about the estimate of θ (log OR of phenotype on disease) and its standard
error also depend on the size of the between-study correlation (Fig. 9). The significance of θ is also
affected (Fig. 10).
Figure 9 – Profile curves for the actual data
[Two panels: "Profile Ratio Estimates" (y-axis: Estimate of Ratio) and "Profile St Error Estimates" (y-axis: St Error of Ratio); x-axis: Correlation (0 to 1)]
Figure 10 – Statistical significance of Profile Ratio to Standard Error for the actual data
[Panel: "Profile Ratio to St Error"; x-axis: Correlation (0 to 1); y-axis: z=estimate/st error]
The estimate of θ is significantly different from 0 if the correlation is known and over about 0.55.
Bayesian Approach
The same model is implemented following a Bayesian approach, using non-informative prior
distributions for all model parameters, which are estimated using Markov Chain Monte Carlo
(MCMC) methods implemented using the WinBUGS software (version 1.3)69.
The estimate for the phenotype-disease association is derived from the random effect meta-analyses
of the genotype-disease and genotype-phenotype associations. The three groups of studies, i.e. those
measuring only genotype-disease, genotype-phenotype or both associations, are modelled with three
separate model specifications, and for those studies evaluating both associations a multivariate
normal distribution is used in order to allow for the between-study correlation.
The code in WinBUGS is:
•For studies with both G-P and G-D associations (n=17)
delta[i,1:2] ~ dmnorm(mu[], T[,])
Diff_GP[i] ~ dnorm(delta[i,1], Weight_GP[i])
LogOR_GD[i] ~ dnorm(delta[i,2], Weight_GD[i])
•For studies with only G-P association (n=16)
Diff_GP[i] ~ dnorm(delta[i,1], Weight_GP[i])
delta[i,1] ~ dnorm(mu[1], tau_GP)
•For studies with only G-D association (n=31)
LogOR_GD[i] ~ dnorm(delta[i,2], Weight_GD[i])
delta[i,2] ~ dnorm(mu[2], tau_GD)
where mu[1] and mu[2] are assigned a normal vague prior distribution (dnorm(0.0,1.0E-6)), and
tau_GP and tau_GD a gamma vague prior distribution (dgamma(0.001,0.001)). For the inverse
covariance matrix, T[1:2,1:2], a Wishart vague prior distribution is used (dwish(R[ , ],2)).
The three models are linked by the fact that the parameters for the true underlying effects (mu[]) are
the same.
The results of the model are very sensitive to the values specified for the matrix R in the Wishart
prior distribution for the precision matrix T.
The results of Model A obtained using the maximum likelihood and the Bayesian approaches,
expressed as the effect of a 5μmol/l increase in homocysteine on CHD, are compared in Table 9.
2.4.2 Model B
An attractive alternative to Model A is to assume that the heterogeneity is on the genotype-phenotype
and phenotype-disease stages of Figure 1 rather than on the measured genotype-phenotype
and genotype-disease associations. The big advantage of this model comes about if we are willing to
assume that heterogeneities on the genotype-phenotype and phenotype-disease stages are
independent. Even under the independence model, correlation will still be induced into the resultant
heterogeneities on genotype-phenotype and genotype-disease. Independence assumes that studies
that find a large effect of genotype on phenotype, perhaps because of local conditions, will not tend
to find relatively larger or smaller effects of a given phenotype on disease. This is probably
reasonable.
The model can be fitted by either a maximum likelihood approach or a Bayesian approach.
Maximum Likelihood Approach
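Assuming bivariate normal distributions (MN2) we have a hierarchical model with
\[
\begin{pmatrix} x_i \\ y_i \end{pmatrix} \sim MN_2\left(
\text{mean} \begin{pmatrix} \mu_{xi} \\ \mu_{yi} \end{pmatrix},\;
\text{variance} \begin{pmatrix} w_{xi} & 0 \\ 0 & w_{yi} \end{pmatrix} \right)
\]
and
\[
\begin{pmatrix} \theta_i \\ \mu_{yi} \end{pmatrix} \sim MN_2\left(
\text{mean} \begin{pmatrix} \theta \\ \mu_y \end{pmatrix},\;
\text{variance} \begin{pmatrix} \sigma_\theta & 0 \\ 0 & \sigma_y \end{pmatrix} \right)
\]
Since μ_xi = θ_i μ_yi we can use Taylor series approximations (delta method) to derive that
\[
\begin{pmatrix} \mu_{xi} \\ \mu_{yi} \end{pmatrix} \sim MN_2\left(
\text{mean} \begin{pmatrix} \theta\mu_y \\ \mu_y \end{pmatrix},\;
\text{variance} \begin{pmatrix} \sigma_\theta \mu_y^2 + \theta^2 \sigma_y & \theta\sigma_y \\ \theta\sigma_y & \sigma_y \end{pmatrix} \right)
\]
so
\[
\begin{pmatrix} x_i \\ y_i \end{pmatrix} \sim MN_2\left(
\text{mean} \begin{pmatrix} \theta\mu_y \\ \mu_y \end{pmatrix},\;
\text{variance} \begin{pmatrix} w_{xi} + \sigma_\theta \mu_y^2 + \theta^2 \sigma_y & \theta\sigma_y \\ \theta\sigma_y & w_{yi} + \sigma_y \end{pmatrix} \right)
\]
When only one of the pair is observed we treat them as univariate normal:
\[
x_i \sim N\left(\text{mean } \theta\mu_y,\; \text{variance } w_{xi} + \sigma_\theta \mu_y^2 + \theta^2 \sigma_y\right)
\]
and
\[
y_i \sim N\left(\text{mean } \mu_y,\; \text{variance } w_{yi} + \sigma_y\right)
\]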
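For completeness, the delta-method step can be written out as follows. This is our own sketch of the standard first-order argument, with θ_i the study-specific log OR per unit of phenotype (mean θ, variance σ_θ), μ_yi the study-specific phenotype difference (mean μ_y, variance σ_y), and the two assumed independent; the final line gives the between-study correlation that the model induces between the genotype-disease and genotype-phenotype means.

```latex
% First-order Taylor expansion of the product about (\theta, \mu_y)
\mu_{xi} = \theta_i \mu_{yi}
  \approx \theta\mu_y + \mu_y(\theta_i - \theta) + \theta(\mu_{yi} - \mu_y)

% Variance and covariance under independence of \theta_i and \mu_{yi}
\mathrm{Var}(\mu_{xi}) \approx \mu_y^2 \sigma_\theta + \theta^2 \sigma_y,
\qquad
\mathrm{Cov}(\mu_{xi}, \mu_{yi}) \approx \theta \sigma_y

% Induced correlation between the true study-specific means
\rho \approx \frac{\theta \sigma_y}
  {\sqrt{\left(\mu_y^2 \sigma_\theta + \theta^2 \sigma_y\right)\sigma_y}}
```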
To test the algorithm, data were generated from n=100 studies with 1/w_xi ~ U(3,30) and
1/w_yi ~ U(1,5) and parameters θ=0.05, μ_y=3, σ_θ=0.01, σ_y=1. Observations were made missing at
random to leave 50 studies in which both were measured, 25 in which only x was measured and 25 in
which only y was measured. A second data set with the same structure but n=1000 was also
generated.
Numerical maximisation (ml in Stata) is based on transformed parameter values θ, μ_y, log(σ_θ),
log(σ_y). The program is given in Appendix 4.
To obtain initial values we take:
\[
\theta = \bar{x}/\bar{y}, \qquad \mu_y = \bar{y}, \qquad \sigma_\theta = \mathrm{Var}\left(\frac{x_i}{y_i}\right), \qquad \sigma_y = \mathrm{Var}(y)
\]
For n=100, numerical maximisation starts at θ=0.038, μ_y=3.23, σ_θ=0.025, σ_y=1.34 and converges to
the values reported in Table 6, which in terms of the original measurements equate to θ=0.032,
μ_y=3.25, σ_θ=0.011, σ_y=0.85.
Table 6 – Model B - Results of numerical maximisation in the "test100" data set
Parameter | Estimate | St. Error
θ | 0.032 | 0.015
μ_y | 3.25 | 0.13
log(σ_θ) | -4.47 | 0.26
log(σ_y) | -0.16 | 0.24
For n=1000, numerical maximisation starts at (0.056, 2.96, 0.041, 1.38) and converges to the values
reported in Table 7, which in terms of the original measurements equate to θ=0.055, μ_y=2.95,
σ_θ=0.012, σ_y=0.97.
Table 7 – Model B - Results of numerical maximisation in the "test1000" data set
Parameter | Estimate | St. Error
θ | 0.055 | 0.005
μ_y | 2.95 | 0.04
log(σ_θ) | -4.45 | 0.09
log(σ_y) | -0.03 | 0.07
For the actual MTHFR-Homocysteine-CHD data maximisation starts at θ=0.06, μ_y=3.29, σ_θ=0.33,
σ_y=7.92 and converges to the values reported in Table 8, which in terms of the original
measurements equate to θ=0.10, μ_y=2.63, σ_θ=0.01, σ_y=3.55.
Table 8 – Model B - Results of numerical maximisation in the actual MTHFR-Homocysteine-CHD data set
Parameter | Estimate | St. Error
θ | 0.095 | 0.028
μ_y | 2.627 | 0.408
log(σ_θ) | -5.144 | 0.940
log(σ_y) | 1.267 | 0.390
These figures lead to an estimate of the induced correlation between true study-specific means of
genotype-phenotype and genotype-disease of ρ=0.58.
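The induced correlation can be reproduced from the delta-method covariance matrix of Model B. The sketch below is ours (the function and argument names are assumptions); it uses the rounded back-transformed values quoted for the MTHFR-Homocysteine-CHD data: ratio 0.10, mean phenotype difference 2.63, and heterogeneity variances 0.01 (phenotype-disease) and 3.55 (genotype-phenotype).

```python
import math

def induced_correlation(theta, mu_y, sigma_theta, sigma_y):
    """Correlation between true genotype-disease and genotype-phenotype means
    implied by independent heterogeneity on the G-P and P-D stages."""
    var_x = sigma_theta * mu_y ** 2 + theta ** 2 * sigma_y   # delta method
    cov_xy = theta * sigma_y
    return cov_xy / math.sqrt(var_x * sigma_y)

# Rounded values quoted in the text for the actual data
rho = induced_correlation(0.10, 2.63, 0.01, 3.55)
print(round(rho, 2))

# Unrounded estimates from Table 8, back-transformed from the log scale
rho_unrounded = induced_correlation(0.095, 2.627, math.exp(-5.144), math.exp(1.267))
print(round(rho_unrounded, 2))
```

With the rounded inputs the formula returns the quoted 0.58. Feeding in the unrounded Table 8 estimates gives a somewhat larger value (about 0.67), so the figure is sensitive to rounding of the small phenotype-disease heterogeneity variance.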
Bayesian Approach
Model B has also been implemented following a Bayesian approach in WinBUGS, using a slightly
different method to deal with those studies measuring only one association (either genotype-disease
or genotype-phenotype association). In these studies, the association which has not been evaluated
is treated as data missing at random, and missing values are sampled and imputed. Thus, all 64
studies are modelled in a single step for both genotype-disease and genotype-phenotype associations
(NA=Not Available):
•Data for ALL studies (n=64):
Diff_GP | Weight_GP | LogOR_GD | Weight_GD
a1 | w_a1 | b1 | w_b1
… | … | … | …
a17 | w_a17 | b17 | w_b17
NA | NA | b18 | w_b18
… | … | … | …
NA | NA | b48 | w_b48
a49 | w_a49 | NA | NA
… | … | … | …
a64 | w_a64 | NA | NA
In a way similar to that used in the maximum likelihood approach, the estimate for the
phenotype-disease association is derived from the random effect meta-analyses of the two associations,
genotype-disease and genotype-phenotype, by modelling the heterogeneity on genotype-phenotype
and phenotype-disease stages of Figure 1 rather than on the measured genotype-phenotype and
genotype-disease.
The code in WinBUGS is:
•For all studies (n=64):
Diff_GP[i] ~ dnorm(delta_GP[i], Weight_GP[i])
LogOR_GD[i] ~ dnorm(delta_GD[i], Weight_GD[i])
delta_GD[i] <- delta_GP[i]*delta_PD[i]/5
delta_GP[i] ~ dnorm(d_GP, tau_GP)
delta_PD[i] ~ dnorm(d_PD, tau_PD)
Weight_GP[i] ~ dlnorm(mu_WGP, psi_GP)
Weight_GD[i] ~ dlnorm(mu_WGD, psi_GD)
where d_GP, d_PD, mu_WGP, and mu_WGD were assigned a normal vague prior distribution
(dnorm(0.0,1.0E-6)), and tau_GP, tau_PD, psi_GP and psi_GD a gamma vague prior distribution
(dgamma(0.001,0.001)).
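For readers less familiar with BUGS syntax, the statements above can be restated as equations. This is only a restatement: the symbols y_i, z_i, w and δ are shorthand introduced here and do not appear in the code, and the second argument of N(·,·) below is a variance, whereas dnorm and dlnorm in WinBUGS take a precision (the reciprocal of the variance).

```latex
\begin{align*}
y_i &\sim N\!\left(\delta^{GP}_i,\; 1/w^{GP}_i\right), &
z_i &\sim N\!\left(\delta^{GD}_i,\; 1/w^{GD}_i\right), \\
\delta^{GD}_i &= \delta^{GP}_i\,\delta^{PD}_i / 5, \\
\delta^{GP}_i &\sim N\!\left(d_{GP},\; 1/\tau_{GP}\right), &
\delta^{PD}_i &\sim N\!\left(d_{PD},\; 1/\tau_{PD}\right), \\
\log w^{GP}_i &\sim N\!\left(\mu_{WGP},\; 1/\psi_{GP}\right), &
\log w^{GD}_i &\sim N\!\left(\mu_{WGD},\; 1/\psi_{GD}\right).
\end{align*}
```

Here y_i stands for Diff_GP[i] and z_i for LogOR_GD[i]. Because of the division by 5, and assuming Diff_GP is measured in μmol/l, delta_PD is a log odds ratio per 5 μmol/l of homocysteine, so exp(d_PD) is on the scale reported in Table 9. Note also that tau_GP and tau_PD are precisions here, unlike the between-study variances tauz and tauy in the Stata likelihood programs of the appendices.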
The results of Model B obtained following the maximum likelihood and the Bayesian approaches are
compared in Table 9.
Table 9 – Estimates of the effect of a 5 μmol/l increase in homocysteine on CHD obtained from studies evaluating G-D,
P-D or both, using a Bayesian and a Maximum Likelihood approach
METHOD                                                                     OR for ΔHcy of 5 μmol/l
                                                                           Mean                   95% CrI/CI
Bayesian approach - All studies (evaluating G-D, P-D or both)
  Deriving the heterogeneity in phenotype-disease association (Model A)    1.56 (median: 1.54)    1.19 to 2.06
  Deriving the heterogeneity in genotype-disease association (Model B)     1.61 (median: 1.59)    1.22 to 2.15
Maximum Likelihood approach - All studies (evaluating G-D, P-D or both)
  Deriving the heterogeneity in phenotype-disease association (Model A)    1.54                   1.21 to 1.95
  Deriving the heterogeneity in genotype-disease association (Model B)     1.60                   1.22 to 2.12
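The maximum likelihood Model B row can be reproduced approximately from Table 8. The sketch below rests on two assumptions that the report does not state explicitly: that the first parameter in Table 8 is the log odds ratio of CHD per 1 μmol/l of homocysteine, and that the interval is a Wald interval formed on the log scale. The function name is illustrative.

```python
import math

def or_per_increment(log_or_per_unit, se, increment=5.0, crit=1.96):
    """Odds ratio and Wald 95% interval for a given phenotype increment."""
    centre = increment * log_or_per_unit
    half_width = increment * crit * se
    return (math.exp(centre),
            math.exp(centre - half_width),
            math.exp(centre + half_width))

# Table 8: estimate 0.095 with standard error 0.028 (assumed to be per 1 umol/l)
odds_ratio, lower, upper = or_per_increment(0.095, 0.028)
print(f"OR {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these rounded inputs the result is an OR of about 1.61 with limits 1.22 and 2.12, close to the 1.60 (1.22 to 2.12) in Table 9. The small difference in the point estimate is presumably due to rounding of the estimate in Table 8.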
3. Discussion
An integrated meta-analytical approach to genetic studies is particularly important when using
Mendelian randomisation. The uncertainty associated with the derived estimate of the
phenotype-disease association can be large, depending on the uncertainty in the estimates of both the
genotype-phenotype and genotype-disease associations [70]. In particular, in a study where the confidence
interval for the difference in phenotype level between genotypes includes the value of 0, the
confidence interval for the derived estimate of the phenotype-disease association may extend to plus and
minus infinity [70]. It is crucial to the use of Mendelian randomisation that both estimates are
sufficiently precise, but this applies particularly to the genotype-phenotype association. Such
precision is only likely to be obtained through a meta-analysis of all the evidence available. In fact,
almost all genetic studies are statistically underpowered to detect the relatively small effects of the
frequent gene variants that underlie common, complex diseases [71].
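One standard way to see why the interval can become unbounded is Fieller's theorem for the ratio of two estimates. The sketch below is an illustration under assumptions not taken from this report: the two estimates are treated as independent and normally distributed, the numerical values are hypothetical, and the method may differ from the one used in reference 70.

```python
import math

def fieller_interval(z, se_z, y, se_y, crit=1.96):
    """Fieller confidence interval for the ratio z / y.

    Returns (lower, upper), or None when the confidence set is unbounded,
    which happens when the interval y +/- crit*se_y includes 0.
    Assumes z and y are independent normal estimates.
    """
    a = y ** 2 - (crit * se_y) ** 2
    if a <= 0:
        return None
    b = -2.0 * z * y
    c = z ** 2 - (crit * se_z) ** 2
    root = math.sqrt(b * b - 4.0 * a * c)  # discriminant is positive whenever a > 0
    return ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))

# Hypothetical values: z is a G-D log odds ratio, y a G-P mean difference
print(fieller_interval(z=0.25, se_z=0.07, y=2.6, se_y=0.4))  # bounded interval
print(fieller_interval(z=0.25, se_z=0.07, y=0.5, se_y=0.4))  # None: unbounded
```

In the second call the interval for the genotype-phenotype difference includes 0, so no finite interval for the ratio exists at that confidence level.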
Two meta-analytical models are presented in this report to combine all evidence regarding
genotype-disease and genotype-phenotype associations in order to derive an estimate of the phenotype-disease
association. The models have been developed with the aim of allowing for the correlation between
the pooled estimates of genotype-disease and genotype-phenotype associations induced by those
studies evaluating both (between-study correlation). A simple approach of carrying out two
independent meta-analyses on genotype-disease and genotype-phenotype data and then estimating the
phenotype-disease association from the two pooled estimates would be equivalent to assuming a
between-study correlation of zero. When adopting such an approach, the result is an OR of CHD for
a 5 μmol/l increase in homocysteine of 1.44, 95% CI from 1.11 to 1.96, where the confidence interval
is calculated based on the uncertainty in the estimates of both genotype-disease and
genotype-phenotype associations [70] (following a Bayesian approach with vague prior distributions the result is a
mean of 1.45 and median of 1.42, 95% CrI from 1.11 to 1.96). In general terms, the importance of a
proper meta-analytical approach which takes between-study correlation into account when some of the
studies contribute evidence on both the genotype-disease and genotype-phenotype
associations will depend on the number and size of studies evaluating both associations relative to
those measuring only one or the other.
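The two-step calculation described above can be written compactly. The notation is introduced here for illustration and assumes that the pooled genotype-phenotype difference is expressed in μmol/l: \hat d_{GD} denotes the pooled log odds ratio of genotype on disease and \hat d_{GP} the pooled mean difference in homocysteine between genotypes. The variance expression is one common first-order approximation that treats the two pooled estimates as independent; it is not necessarily the method of reference 70.

```latex
\[
\hat\theta = \frac{\hat d_{GD}}{\hat d_{GP}}, \qquad
\widehat{\mathrm{OR}}_{5} = \exp\bigl(5\,\hat\theta\bigr), \qquad
\operatorname{Var}(\hat\theta) \approx
\frac{\operatorname{Var}(\hat d_{GD})}{\hat d_{GP}^{2}}
+ \frac{\hat d_{GD}^{2}\,\operatorname{Var}(\hat d_{GP})}{\hat d_{GP}^{4}} .
\]
```

The first relation is the inverse of the WinBUGS statement delta_GD[i] <- delta_GP[i]*delta_PD[i]/5, applied to pooled rather than study-specific quantities.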
The two models presented differ in the way the heterogeneities of the genotype-disease,
genotype-phenotype and phenotype-disease associations are modelled.
In Model A the heterogeneity in the phenotype-disease association is derived from the heterogeneities in the
genotype-disease and genotype-phenotype associations, which are estimated from the observed data.
Although this model is the most straightforward, given that it follows the idea of deriving
phenotype-disease from genotype-disease and genotype-phenotype associations, its robustness is very much
dependent on the amount of information available to estimate the size of the between-study correlation.
In the example of the MTHFR gene, homocysteine and coronary heart disease the data were not
sufficient to provide a good estimate of the between-study correlation. This explains why the
estimate of the correlation tends to the boundary value of 1, and the fact that this also happened with
a randomly generated data set of 1000 studies suggests that the problem is not simply a lack of
studies in the meta-analysis. When adopting a Bayesian approach the same instability appeared as
extreme sensitivity to the prior distribution assumed for the precision matrix in the bivariate model.
Model B was developed in an attempt to solve the problem of insufficient data to estimate all
parameters of the first model, by introducing the assumption of independence between the
genotype-phenotype and phenotype-disease heterogeneities. The heterogeneities of the observed
genotype-phenotype data and the unobserved phenotype-disease data are modelled, and the heterogeneity of the
genotype-disease association is derived from the two. Whilst the first model is limited by the possibility of
estimating the size of the between-study correlation from the data, the validity of the second model
depends on the assumptions we are prepared to make about the interrelationship of the
heterogeneities of the three associations.
When a study does not give an estimate for one of the two associations (either genotype-disease or
genotype-phenotype), the model can deal with this either by treating it as a missing value, as in
Model B in the Bayesian approach, or by considering the marginal distribution for those estimates that
have been observed, as in all the other models. These two methods are equivalent, however, and both
rely on a missing-at-random assumption, which is a strong assumption, since it might well be that
reporting bias is the reason for the absence of the estimate in a published paper. For instance, it
might be that a study reporting only on genotype-disease has also evaluated the genotype-phenotype
association but chosen not to report it because the genotype-phenotype data did not fit with the
underlying hypothesis.
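The marginal-distribution option can be sketched in a few lines. The function below is a Python transcription, written for illustration and not the authors' code, of the per-study terms computed by the Model A likelihood program in Appendix 2. Studies reporting both associations contribute a bivariate normal term, and studies reporting only one contribute the corresponding univariate term. The argument names and the numerical values in the usage lines are hypothetical.

```python
import math

def loglik_term(z, y, wz, wy, mu_z, mu_y, tau_z, tau_y, rho):
    """Log-likelihood contribution of one study (additive constants dropped).

    z, y   : observed G-D log odds ratio and G-P difference (None if not reported)
    wz, wy : inverse within-study variances of z and y
    tau_*  : between-study variances; rho: between-study correlation
    """
    if z is None and y is None:
        raise ValueError("at least one association must be reported")
    if z is None:  # only genotype-phenotype reported: marginal of y
        vy = tau_y + 1.0 / wy
        return -0.5 * (math.log(vy) + (y - mu_y) ** 2 / vy)
    if y is None:  # only genotype-disease reported: marginal of z
        vz = tau_z + 1.0 / wz
        return -0.5 * (math.log(vz) + (z - mu_z) ** 2 / vz)
    vz = tau_z + 1.0 / wz
    vy = tau_y + 1.0 / wy
    r = rho * math.sqrt(tau_z * tau_y) / math.sqrt(vz * vy)  # total correlation
    quad = ((z - mu_z) ** 2 / vz
            - 2.0 * r * (z - mu_z) * (y - mu_y) / math.sqrt(vz * vy)
            + (y - mu_y) ** 2 / vy) / (1.0 - r * r)
    return -0.5 * (math.log(vz * vy * (1.0 - r * r)) + quad)

# Hypothetical study and parameter values
pars = dict(mu_z=0.25, mu_y=2.6, tau_z=0.05, tau_y=3.5)
both = loglik_term(0.2, 2.5, wz=50.0, wy=4.0, rho=0.0, **pars)
only_z = loglik_term(0.2, None, wz=50.0, wy=None, rho=0.0, **pars)
only_y = loglik_term(None, 2.5, wz=None, wy=4.0, rho=0.0, **pars)
print(abs(both - (only_z + only_y)) < 1e-12)  # True when rho is 0
```

When the between-study correlation is zero the bivariate term is simply the sum of the two marginal terms, which is why the two independent meta-analyses mentioned earlier correspond to assuming no between-study correlation.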
Although the choice between the two models might depend on the situation, and in particular on the
amount of data available (number and size of studies) to estimate the between-study correlation, our
preference is for Model B. The independence between genotype-phenotype and phenotype-disease
heterogeneities seems a reasonable assumption, since it is unlikely that the variability in the
estimate of the difference in phenotype associated with a certain genotype should influence the
variability in the estimate of the increase in the risk of the disease associated with a specific increase
in phenotype level, or vice versa. However, this assumption should be considered for any given
application.
Although in the example of the MTHFR gene, homocysteine and CHD the two models gave similar
results under both the maximum likelihood and Bayesian approaches, further work should investigate the
behaviour of the two models under a variety of conditions (e.g. by performing simulation studies
with different database sizes and different values for the three heterogeneities). This would allow a
deeper insight into the most appropriate model for specific situations.
References
1. Attia J, Thakkinstian A, D'Este C. Meta-analyses of molecular association studies: Methodologic lessons for
genetic epidemiology. J Clin Epidemiol 2003;56:297-303.
2. Clayton D, McKeigue PM. Epidemiological methods for studying genes and environmental factors in complex
diseases. Lancet 2001;358:1356-60.
3. Keavney B. Genetic epidemiological studies of coronary heart disease. Int J Epidemiol 2002;31:730-6.
4. Davey Smith G, Ebrahim S. 'Mendelian randomization': can genetic epidemiology contribute to understanding
environmental determinants of disease? Int J Epidemiol 2003;32:1-22.
5. Homocysteine and risk of ischemic heart disease and stroke: a meta-analysis. JAMA 2002;288:2015-22.
6. Bailey LB, Gregory JF 3rd. Polymorphisms of methylenetetrahydrofolate reductase and other enzymes: metabolic
significance, risks and impact on folate requirement. J Nutr 1999;129:919-22.
7. Wald DS, Law M, Morris JK. Homocysteine and cardiovascular disease: evidence on causality from a
meta-analysis. BMJ 2002;325:1202.
8. Kosokabe T, Okumura K, Sone T, Kondo J, Tsuboi H, Mukawa H et al. Relation of a common
methylenetetrahydrofolate reductase mutation and plasma homocysteine with intimal hyperplasia after coronary
stenting. Circulation 2001;103:2048-54.
9. Anderson JL, Muhlestein JB, Horne BD, Carlquist JF, Bair TL, Madsen TE et al. Plasma homocysteine predicts
mortality independently of traditional risk factors and C-reactive protein in patients with angiographically defined
coronary artery disease. Circulation 2000;102:1227-32.
10. Rassoul F, Richter V, Janke C, Purschwitz K, Klotzer B, Geisel J et al. Plasma homocysteine and lipoprotein
profile in patients with peripheral arterial occlusive disease. Angiology 2000;51:189-96.
11. Kluijtmans LA, van den Heuvel LP, Boers GH, Frosst P, Stevens EM, van Oost BA et al. Molecular genetic
analysis in mild hyperhomocysteinemia: a common mutation in the methylenetetrahydrofolate reductase gene is a
genetic risk factor for cardiovascular disease. Am J Hum Genet 1996;58:35-41.
12. Deloughery TG, Evans A, Sadeghi A, McWilliams J, Henner WD, Taylor LM Jr et al. Common mutation in
methylenetetrahydrofolate reductase. Correlation with homocysteine metabolism and late-onset vascular disease.
Circulation 1996;94:3074-8.
13. Yoo J-H. A thermolabile variant of methylenetetrahydrofolate reductase is a determinant of
hyperhomocyst(e)inemia in the elderly. Ann N Y Acad Sci 2001;928:344.
14. Mazza A, Motti C, Nulli A, Marra G, Gnasso A, Pastore A et al. Lack of association between carotid
intima-media thickness and methylenetetrahydrofolate reductase gene polymorphism or serum homocysteine in
non-insulin-dependent diabetes mellitus. Metabolism 2000;49:718-23.
15. Gonzalez Ordonez AJ, Fernandez Alvarez CR, Rodriguez JM, Garcia EC, Alvarez MV. Genetic polymorphism of
methylenetetrahydrofolate reductase and venous thromboembolism: a case-control study. Haematologica
1999;84:190-1.
16. Fujimura H, Kawasaki T, Sakata T, Ariyoshi H, Kato H, Monden M et al. Common C677T polymorphism in the
methylenetetrahydrofolate reductase gene increases the risk for deep vein thrombosis in patients with
predisposition of thrombophilia. Thromb Res 2000;98:1-8.
17. D'Angelo A, Coppola A, Madonna P, Fermo I, Pagano A, Mazzola G et al. The role of vitamin B12 in fasting
hyperhomocysteinemia and its interaction with the homozygous C677T mutation of the methylenetetrahydrofolate
reductase (MTHFR) gene. A case-control study of patients with early-onset thrombotic events. Thromb Haemost
2000;83:563-70.
18. Arai K, Yamasaki Y, Kajimoto Y, Watada H, Umayahara Y, Kodama M et al. Association of
methylenetetrahydrofolate reductase gene polymorphism with carotid arterial wall thickening and myocardial
infarction risk in NIDDM. Diabetes 1997;46:2102-4.
19. Chambers JC, Ireland H, Thompson E, Reilly P, Obeid OA, Refsum H et al. Methylenetetrahydrofolate reductase
677 C-->T mutation and coronary heart disease risk in UK Indian Asians. Arterioscler Thromb Vasc Biol
2000;20:2448-52.
20. Chao CL, Tsai HH, Lee CM, Hsu SM, Kao JT, Chien KL et al. The graded effect of hyperhomocysteinemia on the
severity and extent of coronary atherosclerosis. Atherosclerosis 1999;147:379-86.
21. Tsai MY, Welge BG, Hanson NQ, Bignell MK, Vessey J, Schwichtenberg K et al. Genetic causes of mild
hyperhomocysteinemia in patients with premature occlusive coronary artery diseases. Atherosclerosis
1999;143:163-70.
22. Schmitz C, Lindpaintner K, Verhoef P, Gaziano JM, Buring J. Genetic polymorphism of
methylenetetrahydrofolate reductase and myocardial infarction. A case-control study. Circulation 1996;94:1812-4.
23. Meisel C, Cascorbi I, Gerloff T, Stangl V, Laule M, Muller JM et al. Identification of six
methylenetetrahydrofolate reductase (MTHFR) genotypes resulting from common polymorphisms: impact on
plasma homocysteine levels and development of coronary artery disease. Atherosclerosis 2001;154:651-8.
24. Ma J, Stampfer MJ, Hennekens CH, Frosst P, Selhub J, Horsford J et al. Methylenetetrahydrofolate reductase
polymorphism, plasma folate, homocysteine, and risk of myocardial infarction in US physicians. Circulation
1996;94:2410-6.
25. Schwartz SM, Siscovick DS, Malinow MR, Rosendaal FR, Beverly RK, Hess DL et al. Myocardial infarction in
young women in relation to plasma total homocysteine, folate, and a common variant in the
methylenetetrahydrofolate reductase gene. Circulation 1997;96:412-7.
26. Kim CH, Hwang KY, Choi TM, Shin WY, Hong SY. The methylenetetrahydrofolate reductase gene
polymorphism in Koreans with coronary artery disease. Int J Cardiol 2001;78:13-7.
27. Kluijtmans LA, Kastelein JJ, Lindemans J, Boers GH, Heil SG, Bruschke AV et al. Thermolabile
methylenetetrahydrofolate reductase in coronary artery disease. Circulation 1997;96:2573-7.
28. Christensen B, Frosst P, Lussier-Cacan S, Selhub J, Goyette P, Rosenblatt DS et al. Correlation of a common
mutation in the methylenetetrahydrofolate reductase gene with plasma homocysteine in patients with premature
coronary artery disease. Arterioscler Thromb Vasc Biol 1997;17:569-73.
29. Tokgozoglu SL, Alikasifoglu M, Unsal, Atalar E, Aytemir K, Ozer N et al. Methylene tetrahydrofolate reductase
genotype and the risk and extent of coronary artery disease in a population with low plasma folate. Heart
1999;81:518-22.
30. Nakai K, Fusazaki T, Suzuki T, Ohsawa M, Ogiu N, Kamata J et al. Genetic polymorphism of
5,10-methylenetetrahydrofolate increases risk of myocardial infarction and is correlated to elevated levels of
homocysteine in the Japanese general population. Coron Artery Dis 2000;11:47-51.
31. Morita H, Taguchi J, Kurihara H, Kitaoka M, Kaneda H, Kurihara Y et al. Genetic polymorphism of
5,10-methylenetetrahydrofolate reductase (MTHFR) as a risk factor for coronary artery disease. Circulation
1997;95:2032-6.
32. Ou T, Yamakawa-Kobayashi K, Arinami T, Amemiya H, Fujiwara H, Kawata K et al. Methylenetetrahydrofolate
reductase and apolipoprotein E polymorphisms are independent risk factors for coronary heart disease in Japanese:
a case-control study. Atherosclerosis 1998;137:23-8.
33. Malinow MR, Nieto FJ, Kruger WD, Duell PB, Hess DL, Gluckman RA et al. The effects of folic acid
supplementation on plasma total homocysteine are modulated by multivitamin use and methylenetetrahydrofolate
reductase genotypes. Arterioscler Thromb Vasc Biol 1997;17:1157-62.
34. Kawashiri M, Kajinami K, Nohara A, Yagi K, Inazu A, Koizumi J et al. Effect of common
methylenetetrahydrofolate reductase gene mutation on coronary artery disease in familial hypercholesterolemia.
Am J Cardiol 2000;86:840-5.
35. Zheng YZ, Tong J, Do XP, Pu XQ, Zhou BT. Prevalence of methylenetetrahydrofolate reductase C677T and its
association with arterial and venous thrombosis in the Chinese population. Br J Haematol 2000;109:870-4.
36. Fernandez-Arcas N, Dieguez-Lucena JL, Munoz-Moran E, Ruiz-Galdon M, Espinosa-Caliani S, Aranda-Lara P et
al. The genotype interactions of methylenetetrahydrofolate reductase and renin-angiotensin system genes are
associated with myocardial infarction. Atherosclerosis 1999;145:293-300.
37. Brulhart MC, Dussoix P, Ruiz J, Passa P, Froguel P, James RW. The (Ala-Val) mutation of
methylenetetrahydrofolate reductase as a genetic risk factor for vascular disease in non-insulin-dependent diabetic
patients. Am J Hum Genet 1997;60:228-9.
38. Girelli D, Friso S, Trabetti E, Olivieri O, Russo C, Pessotto R et al. Methylenetetrahydrofolate reductase C677T
mutation, plasma homocysteine, and folate in subjects from northern Italy with or without angiographically
documented severe coronary atherosclerotic disease: evidence for an important genetic-environmental interaction.
Blood 1998;91:4158-63.
39. Brugada R, Marian AJ. A common mutation in methylenetetrahydrofolate reductase gene is not a major risk of
coronary artery disease or myocardial infarction. Atherosclerosis 1997;128:107-12.
40. Adams M, Smith PD, Martin D, Thompson JR, Lodwick D, Samani NJ. Genetic analysis of thermolabile
methylenetetrahydrofolate reductase as a risk factor for myocardial infarction. QJM 1996;89:437-44.
41. Ardissino D, Mannucci PM, Merlini PA, Duca F, Fetiveau R, Tagliabue L et al. Prothrombotic genetic risk factors
in young survivors of myocardial infarction. Blood 1999;94:46-51.
42. Dilley A, Hooper WC, El-Jamil M, Renshaw M, Wenger NK, Evatt BL. Mutations in the genes regulating
methylene tetrahydrofolate reductase (MTHFR C-->T677) and cystathione beta-synthase (CBS G-->A919, CBS
T-->C833) are not associated with myocardial infarction in African Americans. Thromb Res 2001;103:109-15.
43. Verhoef P, Rimm EB, Hunter DJ, Chen J, Willett WC, Kelsey K et al. A common mutation in the
methylenetetrahydrofolate reductase gene and risk of coronary heart disease: results among U.S. men. J Am Coll
Cardiol. 1998;32:353-9.
44. Hsu LA, Ko YL, Wang SM, Chang CJ, Hsu TS, Chiang CW et al. The C677T mutation of the
methylenetetrahydrofolate reductase gene is not associated with the risk of coronary artery disease or venous
thrombosis among Chinese in Taiwan. Hum Hered 2001;51:41-5.
45. van Bockxmeer FM, Mamotte CD, Vasikaran SD, Taylor RR. Methylenetetrahydrofolate reductase gene and
coronary artery disease. Circulation 1997;95:21-3.
46. Abbate R, Sardi I, Pepe G, Marcucci R, Brunelli T, Prisco D et al. The high prevalence of thermolabile 5-10
methylenetetrahydrofolate reductase (MTHFR) in Italians is not associated to an increased risk for coronary artery
disease (CAD). Thromb Haemost 1998;79:727-30.
47. Wilcken DE, Wang XL, Sim AS, McCredie RM. Distribution in healthy and coronary populations of the
methylenetetrahydrofolate reductase (MTHFR) C677T mutation. Arterioscler Thromb Vasc Biol 1996;16:878-82.
48. Pinto X, Vilaseca MA, Garcia-Giralt N, Ferrer I, Pala M, Meco JF et al. Homocysteine and the MTHFR 677C-->T
allele in premature coronary artery disease. Case control and family studies. Eur J Clin Invest 2001;31:24-30.
49. Anderson JL, King GJ, Thomson MJ, Todd M, Bair TL, Muhlestein JB et al. A mutation in the
methylenetetrahydrofolate reductase gene is not associated with increased risk for coronary artery disease or
myocardial infarction. J Am Coll Cardiol 1997;30:1206-11.
50. Fowkes FG, Lee AJ, Hau CM, Cooke A, Connor JM, Lowe GD. Methylene tetrahydrofolate reductase (MTHFR)
and nitric oxide synthase (ecNOS) genes and risks of peripheral arterial disease and coronary heart disease:
Edinburgh Artery Study. Atherosclerosis 2000;150:179-85.
51. Gardemann A, Weidemann H, Philipp M, Katz N, Tillmanns H, Hehrlein FW et al. The TT genotype of the
methylenetetrahydrofolate reductase C677T gene polymorphism is associated with the extent of coronary
atherosclerosis in patients at high risk for coronary artery disease. Eur Heart J 1999;20:584-92.
52. Todesco L, Angst C, Litynski P, Loehrer F, Fowler B, Haefeli WE. Methylenetetrahydrofolate reductase
polymorphism, plasma homocysteine and age. Eur J Clin Invest 1999;29:1003-9.
53. Reinhardt D, Sigusch HH, Vogt SF, Farker K, Muller S, Hoffmann A. Absence of association between a common
mutation in the methylenetetrahydrofolate reductase gene and the risk of coronary artery disease. Eur J Clin Invest
1998;28:20-3.
54. Verhoef P, Kok FJ, Kluijtmans LA, Blom HJ, Refsum H, Ueland PM et al. The 677C-->T mutation in the
methylenetetrahydrofolate reductase gene: associations with plasma total homocysteine levels and risk of coronary
atherosclerotic disease. Atherosclerosis 1997;132:105-13.
55. Kihara T, Abe S, Saigo M, Kaieda H, Obata H, Eto H et al. Methylenetetrahydrofolate reductase gene
polymorphism and premature myocardial infarction. Circulation 1997;96:101-I.
56. Araujo F, Lopes M, Goncalves L, Maciel MJ, Cunha-Ribeiro LM. Hyperhomocysteinemia, MTHFR C677T
genotype and low folate levels: a risk combination for acute coronary disease in a Portuguese population. Thromb
Haemost 2000;83:517-8.
57. Malik NM, Syrris P, Schwartzman R, Kaski JC, Crossman DC, Francis SE et al. Methylenetetrahydrofolate
reductase polymorphism (C-677T) and coronary artery disease. Clin Sci (Lond) 1998;95:311-5.
58. Thogersen AM, Nilsson TK, Dahlen G, Jansson JH, Boman K, Huhtasaari F et al. Homozygosity for the
C677-->T mutation of 5,10-methylenetetrahydrofolate reductase and total plasma homocyst(e)ine are not associated
with greater than normal risk of a first myocardial infarction in northern Sweden. Coron Artery Dis 2001;12:85-90.
59. Izumi M, Iwai N, Ohmichi N, Nakamura Y, Shimoike H, Kinoshita M. Molecular variant of
5,10-methylenetetrahydrofolate reductase is a risk factor of ischemic heart disease in the Japanese population.
Atherosclerosis 1996;121:293-4.
60. Szczeklik A, Sanak M, Jankowski M, Dropinski J, Czachor R, Musial J et al. Mutation A1298C of
methylenetetrahydrofolate reductase: risk for early coronary disease not associated with hyperhomocysteinemia.
Am J Med Genet 2001;101:36-9.
61. Gallagher PM, Meleady R, Shields DC, Tan KS, McMaster D, Rozen R et al. Homocysteine and risk of premature
coronary heart disease. Evidence for a common gene mutation. Circulation 1996;94:2154-8.
62. Mager A, Lalezari S, Shohat T, Birnbaum Y, Adler Y, Magal N et al. Methylenetetrahydrofolate reductase
genotypes and early-onset coronary artery disease. Circulation 1999;100:2406-10.
63. Ferrer-Antunes C, Palmeiro A, Morais J, Lourenco M, Freitas M, Providencia L. The mutation C677T in the
methylene tetrahydrofolate reductase gene as a risk factor for myocardial infarction in the Portuguese population.
Thromb Haemost 1998;80:521-2.
64. Gulec S, Aras O, Akar E, Tutar E, Omurlu K, Avci F et al. Methylenetetrahydrofolate reductase gene
polymorphism and risk of premature myocardial infarction. Clin Cardiol 2001;24:281-4.
65. Dekou V, Whincup P, Papacosta O, Ebrahim S, Lennon L, Ueland PM et al. The effect of the C677T and A1298C
polymorphisms in the methylenetetrahydrofolate reductase gene on homocysteine levels in elderly men and
women from the British regional heart study. Atherosclerosis 2001;154:659-66.
66. Voutilainen S, Lakka TA, Hamelahti P, Lehtimaki T, Poulsen HE, Salonen JT. Plasma total homocysteine
concentration and the risk of acute coronary events: the Kuopio Ischaemic Heart Disease Risk Factor Study. J
Intern Med 2000;248:217-22.
67. Chango A, Potier De Courcy G, Boisson F, Guilland JC, Barbe F, Perrin MO et al.
5,10-methylenetetrahydrofolate reductase common mutations, folate status and plasma homocysteine in healthy French
adults of the Supplementation en Vitamines et Mineraux Antioxydants (SU.VI.MAX) cohort. Br J Nutr
2000;84:891-6.
68. Chango A, Boisson F, Barbe F, Quilliot D, Droesch S, Pfister M et al. The effect of 677C-->T and 1298A-->C
mutations on plasma homocysteine and 5,10-methylenetetrahydrofolate reductase activity in healthy subjects. Br J
Nutr 2000;83:593-6.
69. Spiegelhalter DJ, Thomas A, Best NG. WinBUGS Version 1.3. User Manual. MRC Biostatistics Unit: 1999.
70. Thompson JR, Tobin MD, Minelli C. On the accuracy of estimates of the effect of phenotype on disease derived
from Mendelian randomisation studies. Technical Report 2003_GE1 - available at
http://www.prw.le.ac.uk/research/HCG/getechrep.html.
71. Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K. A comprehensive review of genetic association studies.
Genet Med 2002;4:45-61.
APPENDIX 1
Stata program to estimate the within-study correlation between log OR of genotype on disease and
mean effect of genotype on phenotype
*********************************************************
* PROGRAM TO INVESTIGATE WITHIN-STUDY CORRELATION
* BETWEEN GENOTYPE-PHENOTYPE & GENOTYPE-DISEASE
* WORKS BY SIMULATING A LARGE POPULATION AND SUBSAMPLING
* THE CASES AND CONTROLS
*********************************************************
version 8
clear
set more off
cd "d:\research\genetics\mendelian randomisation\meta-analysis\stata"
local NCASE=100          /* number of cases */
local NCONT=100          /* number of controls */
local ORpd=1.1           /* odds ratio for unit phenotype on disease */
local beta=log(`ORpd')   /* beta in logistic function */
local mn_alpha=-4        /* average alpha in logistic function */
local sd_alpha=0.2       /* between-subject variation in baseline risk due to covariates etc */
local delta=4            /* effect of genotype on phenotype */
local mn_pheno=8         /* mean phenotype */
local sd_pheno=2         /* sd in phenotype */
local pg=0.12            /* genotype frequency eg 12% */
local M=500              /* number of simulations */
local N=`NCASE'+`NCONT'
local NSIM=50*`N'        /* number to simulate */
*-------------------------------
* Post results to file temp
*
postfile pf a b c d m1 sd1 m2 sd2 using "..\data\temp" , replace
*-------------------------------
* Loop through the simulations
*
forvalues i=1/`M' {
    di "." _continue
    *-------------------------------
    * Simulate the population
    *
    quietly {
        set obs `NSIM'
        *-------------------------------
        * generate genotypes
        *
        gen genotype=uniform() < `pg'
        *-------------------------------
        * generate phenotypes
        *
        gen phenotype=`mn_pheno'+`delta'*genotype+`sd_pheno'*invnorm(uniform())
        *-------------------------------
        * generate prob disease
        * (alpha is a local, so one value is drawn per simulated population)
        *
        local alpha=`mn_alpha'+`sd_alpha'*invnorm(uniform())
        gen pd=exp(`alpha'+`beta'*phenotype)/(1+exp(`alpha'+`beta'*phenotype))
        *-------------------------------
        * generate outcome
        *
        gen d= uniform() < pd
        *-------------------------------
        * pick the cases & controls
        * (sorting on d+uniform() puts the controls first in random order,
        *  sorting on (1-d)+uniform() puts the cases first)
        *
        gen u=d+uniform()
        sort u
        gen mark=_n<=`NCONT'
        replace u=(1-d)+uniform()
        sort u
        replace mark=1 if _n<=`NCASE'
        drop if mark==0
        *-------------------------------
        * data for odds ratio GD
        *
        count if genotype==0 & d==0
        local a=r(N)
        count if genotype==1 & d==0
        local b=r(N)
        count if genotype==0 & d==1
        local c=r(N)
        count if genotype==1 & d==1
        local d=r(N)
        *-------------------------------
        * data for phenotype difference (controls only)
        *
        summarize phenotype if genotype==1 & d==0
        local m1=r(mean)
        local sd1=r(sd)
        summarize phenotype if genotype==0 & d==0
        local m2=r(mean)
        local sd2=r(sd)
        post pf (`a') (`b') (`c') (`d') (`m1') (`sd1') (`m2') (`sd2')
        drop u d genotype phenotype pd mark
    }
}
postclose pf
*-------------------------------
* Restore results
*
use "..\data\temp", clear
*-------------------------------
* analyse
*
gen n=a+b+c+d
count if n != `N'
gen OR=(a*d)/(b*c)
gen lnOR=log(OR)
gen se=sqrt(1/a+1/b+1/c+1/d)
gen d1=m1-m2
summarize OR d1
corr lnOR d1
31
Technical Report 2003/GE2
APPENDIX 2
Stata program for fitting model A
************************************************
* LIKELIHOOD ANALYSIS FOR MODEL A
************************************************
version 8
cd "D:\Research\Genetics\Mendelian Randomisation\meta-analysis\Stata"
clear
set more off
program drop _all
*-----------------------------
* Program to evaluate LogL
* type 1: z and y both observed (bivariate normal term)
* type 2: only y observed, type 3: only z observed (marginal terms)
*
program LL
    args lnl ratio muy tz ty r
    quietly {
        scalar tauz=exp(`tz')
        scalar tauy=exp(`ty')
        scalar rho=(exp(`r')-1)/(exp(`r')+1)
        scalar muz=`muy'*`ratio'
        gen double vz=tauz+1/wz
        gen double vy=tauy+1/wy
        scalar cov=rho*sqrt(tauz*tauy)
        gen double r=cov/sqrt(vz*vy)
        gen double L3=log(vz)+(z-muz)^2/vz
        gen double L2=log(vy)+(y-`muy')^2/vy
        gen double L1=log(vz*vy*(1-r*r))+((z-muz)^2/vz- /*
        */ 2*r*(z-muz)*(y-`muy')/sqrt(vz*vy)+(y-`muy')^2/vy)/(1-r*r)
        replace `lnl'=-0.5*L1 if type==1
        replace `lnl'=-0.5*L2 if type==2
        replace `lnl'=-0.5*L3 if type==3
        drop vz vy r L1 L2 L3
    }
end
*-----------------------------
* Read Data
*
*use "..\data\test100",clear
*use "..\data\test1000",clear
use "..\data\mthfr chd.dta",clear
*-----------------------------
* Type denotes the available data
*
gen type=1
replace type=2 if z ==. & y ~=.
replace type=3 if y ==. & z ~=.
*-----------------------------
* Initial Guess
*
summarize z
local tauz=r(Var)
local ratio=r(mean)
summarize y
local mu=r(mean)
local tauy=r(Var)
local ratio=`ratio'/`mu'
corr z y
local r=r(rho)
di "Initial Guess "%7.3f `ratio' %7.3f `mu' %7.3f `tauz' %7.3f `tauy' %7.3f `r'
* transform to the scales used inside LL (log variances, logit-type correlation)
local tauz=log(`tauz')
local tauy=log(`tauy')
local r=log((1+`r')/(1-`r'))
*-----------------------------
* Fit Model A
*
ml model lf LL () () () () ()
ml init eq1:_cons=`ratio' eq2:_cons=`mu' eq3:_cons=`tauz' eq4:_cons=`tauy' /*
*/ eq5:_cons=`r'
ml maximize
APPENDIX 3
Stata program for fitting model A with specified correlations
************************************************
* LIKELIHOOD ANALYSIS
************************************************
version 8
clear
set more off
program drop _all
*-----------------------------
* Program to evaluate LogL
* (the correlation is held fixed at the value in global R)
*
program LL
    args lnl ratio muy tz ty
    quietly {
        scalar tauz=exp(`tz')
        scalar tauy=exp(`ty')
        scalar rho=$R
        scalar muz=`muy'*`ratio'
        gen double vz=tauz+1/wz
        gen double vy=tauy+1/wy
        scalar cov=rho*sqrt(tauz*tauy)
        gen double r=cov/sqrt(vz*vy)
        gen double L3=log(vz)+(z-muz)^2/vz
        gen double L2=log(vy)+(y-`muy')^2/vy
        gen double L1=log(vz*vy*(1-r*r))+((z-muz)^2/vz- /*
        */ 2*r*(z-muz)*(y-`muy')/sqrt(vz*vy)+(y-`muy')^2/vy)/(1-r*r)
        replace `lnl'=-0.5*L1 if type==1
        replace `lnl'=-0.5*L2 if type==2
        replace `lnl'=-0.5*L3 if type==3
        drop vz vy r L1 L2 L3
    }
end
*-----------------------------
* Read Data
*
*use "..\data\test100",clear
*use "..\data\test1000",clear
use "..\data\mthfr chd.dta",clear
*-----------------------------
* Type denotes the available data
*
gen type=1
replace type=2 if z ==. & y ~=.
replace type=3 if y ==. & z ~=.
*-----------------------------
* Initial Guess
*
summarize z
local tauz=r(Var)
local ratio=r(mean)
summarize y
local mu=r(mean)
local tauy=r(Var)
local ratio=`ratio'/`mu'
corr z y
local r=r(rho)
di "Initial Guess "%7.3f `ratio' %7.3f `mu' %7.3f `tauz' %7.3f `tauy' %7.3f `r'
local tauz=log(`tauz')
local tauy=log(`tauy')
*-----------------------------
* File to collect profile LnL
*
postfile pf r LL b se using "..\data\profile", replace
*-----------------------------
* Loop over r (20 fixed correlations: 0.025, 0.075, ..., 0.975)
*
forvalues i=1/20 {
    global R=(`i'-0.5)/20
    ml model lf LL () () () ()
    ml init eq1:_cons=`ratio' eq2:_cons=`mu' eq3:_cons=`tauz' eq4:_cons=`tauy'
    ml max
    matrix cf=e(b)
    local coef=cf[1,1]
    matrix v=e(V)
    local se=sqrt(v[1,1])
    post pf ($R) (e(ll)) (`coef') (`se')
}
postclose pf
*-----------------------------
* Restore data & plot
*
use "..\data\profile" , clear
twoway line LL r, xtitle("Correlation") ytitle("Log Likelihood") /*
*/ title("Profile Log Likelihood") /*
*/ saving("..\plots\profile LnL.gph",replace)
twoway line se r, xtitle("Correlation") ytitle("St Error of Ratio") /*
*/ title("Profile St Error Estimates") /*
*/ saving("..\plots\profile se.gph",replace)
twoway line b r, xtitle("Correlation") ytitle("Estimate of Ratio") /*
*/ title("Profile Ratio Estimates") /*
*/ saving("..\plots\profile ratio.gph",replace)
gen z=b/se
twoway line z r, xtitle("Correlation") ytitle("z=estimate/st error") /*
*/ title("Profile Ratio to St Error") /*
*/ saving("..\plots\profile z.gph",replace)
APPENDIX 4
Stata program for fitting model B
************************************************
* LIKELIHOOD ANALYSIS FOR MODEL B
************************************************
version 8
cd "D:\Research\Genetics\Mendelian Randomisation\meta-analysis\Stata"
clear
set more off
program drop _all
*-----------------------------
* Program to evaluate LogL
* (taut and tauy are the between-study variances of the ratio and of y;
*  the variance of z and its covariance with y are derived from them)
*
program LL
    args lnl ratio muy tt ty
    quietly {
        scalar taut=exp(`tt')
        scalar tauy=exp(`ty')
        scalar muz=`muy'*`ratio'
        gen double vz=`ratio'^2*tauy+`muy'^2*taut+1/wz
        gen double vy=tauy+1/wy
        gen double cov=`ratio'*tauy
        gen double r=cov/sqrt(vz*vy)
        gen double L3=log(vz)+(z-muz)^2/vz
        gen double L2=log(vy)+(y-`muy')^2/vy
        gen double L1=log(vz*vy*(1-r*r))+( (z-muz)^2/vz- /*
        */ 2*r*(z-muz)*(y-`muy')/sqrt(vz*vy)+(y-`muy')^2/vy )/(1-r*r)
        replace `lnl'=-0.5*L1 if type == 1
        replace `lnl'=-0.5*L2 if type == 2
        replace `lnl'=-0.5*L3 if type == 3
        drop vz vy r cov L1 L2 L3
    }
end
*-----------------------------
* Read Data
*
*use "..\data\testB100",clear
*use "..\data\testB1000",clear
use "..\data\mthfr chd.dta",clear
*-----------------------------
* Type denotes the available data
*
gen type=1
replace type=2 if z ==. & y ~=.
replace type=3 if y ==. & z ~=.
*-----------------------------
* Initial Guess
*
summarize z
local ratio=r(mean)
summarize y
local mu=r(mean)
local tauy=r(Var)
local ratio=`ratio'/`mu'
gen theta=z/y
summarize theta
local taut=r(Var)
di "Initial Guess "%7.3f `ratio' %7.3f `mu' %7.3f `taut' %7.3f `tauy'
* eq3 and eq4 are on the log scale inside LL, so both starting values are logged
* (the listing as extracted logged tauz here, leaving taut unlogged)
local taut=log(`taut')
local tauy=log(`tauy')
*-----------------------------
* Fit Model B
*
ml model lf LL () () () ()
ml init eq1:_cons=`ratio' eq2:_cons=`mu' eq3:_cons=`taut' eq4:_cons=`tauy'
ml maximize