Technical Report 2003/GE2

Meta-Analytical methods for the synthesis of Genetic Studies using Mendelian Randomisation

Cosetta Minelli*, M.D., M.Sc.; John R Thompson*, Ph.D.; Martin D Tobin, M.Sc., M.R.C.P.; Keith R Abrams, Ph.D., C.Stat.
* joint first authors

Genetic Epidemiology Unit, Centre for Biostatistics, Department of Health Sciences, University of Leicester, U.K.

Date: 4 Aug 2003
Revision: 11 Aug 2003 – minor corrections to English and addition of a section on Publication and Reporting Bias.
19 Aug 2003 – re-numbering of tables and addition to references.

1. Introduction

With the recent growth in knowledge about the human genome there has been a dramatic increase in the number of genetic epidemiological studies of the association between specific genes and diseases, and between those genes and the risk factors or phenotypes that are thought to be intermediates on the causal pathway to disease. In many instances these studies have supplemented pre-existing research on the association between the phenotype and the disease. For instance, many recent studies have looked at the associations between polymorphisms of the Methylene TetraHydroFolate Reductase (MTHFR) gene and Coronary Heart Disease (CHD), and between the MTHFR gene and homocysteine; these have, in part, been motivated by the pre-existing evidence for an association between homocysteine and CHD. Similarly, there have been many studies of APO-E polymorphisms and CHD or stroke, and many studies of APO-E and lipid levels, stemming from epidemiological evidence of an association between lipids and CHD or stroke. As the number of genetic studies has grown, so meta-analyses have been produced to synthesise the evidence and overcome the limited power of even moderately sized studies [1].
Two points emerge from reviewing these meta-analyses. First, studies of genotype and phenotype tend to be less common than those of genotype and disease. Second, the evidence for a genotype-phenotype association is often obtained as a spin-off from a study primarily aimed at investigating the genotype-disease relationship, in which case the phenotype is often measured on only a subset of the subjects. Where there is strong reason to suppose that the phenotype is intermediate on the causal pathway from gene to disease, it would be sensible to perform meta-analyses that in some way integrate the evidence for all three relationships: genotype-phenotype, genotype-disease and phenotype-disease.

The logic behind this approach is greatly strengthened by an appeal to Mendelian randomisation, that is, the fact that one's genes are inherited at conception by a seemingly random process. Accordingly, epidemiological studies of the genotype-phenotype and genotype-disease associations show strong parallels with randomised trials and should not be affected by confounding or reverse causation in the way that makes phenotype-disease studies so difficult to interpret [2-4]. In theory, by combining the information from genotype-disease and genotype-phenotype studies, it is possible to derive an unconfounded estimate of the phenotype-disease association. Integrated meta-analyses may therefore be able to exploit Mendelian randomisation both to test whether the phenotype is actually on the causal pathway and to provide an unconfounded estimate of the effect of phenotype on disease.

2. Methods

2.1 Mendelian Randomisation

In order to use genetic studies to quantify the relationship between phenotype and disease, the estimate of the genotype-disease association has to be combined with the estimate of the genotype-phenotype association (Fig. 1).
Suppose that a mutant genotype (GG) causes an increased risk of disease compared with the wildtype (gg), and that this is measured by the odds ratio $OR_{GG\ vs\ gg}$. Further suppose that GG compared with gg causes a mean difference, $\Delta P$, in the level of the intermediate phenotype. Then, under the assumptions required for Mendelian randomisation, and assuming that the log odds of disease is linear in the phenotype,

$$OR_{PD} = OR_{GG\ vs\ gg}^{\,1/\Delta P}$$

is an unconfounded estimate of the odds ratio of disease resulting from a unit change in the phenotype. More generally, the odds ratio for a k-unit change in the phenotype is $OR_{GG\ vs\ gg}^{\,k/\Delta P}$.

Figure 1 – Calculation of an unconfounded estimate of the effect of a change in phenotype on a disease based on the concept of Mendelian randomisation. [Diagram: Genotype G → Phenotype P, genotype-phenotype mean difference $\Delta P = P_{GG} - P_{gg}$; Genotype G → Disease D, pooled $OR_{GG\ vs\ gg} = b$; Phenotype P → Disease D (unknown), odds ratio associated with a k-unit change in phenotype $OR_{PD} = b^{k/\Delta P}$.]

2.2 Example: MTHFR, Homocysteine and CHD

A recent non-genetic meta-analysis of individual patient data from epidemiological studies showed an 11% decrease in CHD risk for a 25% decrease in homocysteine levels (OR: 0.89; 95% Confidence Interval [CI]: 0.83 to 0.96) [5]. The meta-analysis showed heterogeneity between studies, partly explained by study design: retrospective studies yielded higher estimates of risk, perhaps due to unadjusted confounding. In particular, two major confounders were suggested, smoking and blood pressure, both of which are strongly correlated with homocysteine and are known risk factors for CHD. The strong possibility of unadjusted confounding makes it very difficult to be sure that the relationship between homocysteine and CHD is causal.

A common polymorphism of the gene for the MTHFR enzyme leads to reduced enzyme activity, lower folate and consequently higher homocysteine levels [6].
The polymorphism involves a C-to-T substitution at base 677, so the wildtype homozygous genotype is referred to as CC and the mutant homozygous genotype as TT. This polymorphism can be used, together with the idea of Mendelian randomisation, to assess indirectly the effect of homocysteine on CHD. A recent genetic meta-analysis of individual patient data has shown an increased risk of CHD of about 16% associated with genotype TT compared with CC (OR: 1.16; 95% CI: 1.05 to 1.28). This result was similar to that of another meta-analysis, published at the same time but carried out on aggregated data, which showed a pooled odds ratio of 1.21 for the TT genotype (95% CI: 1.06 to 1.39) [7]. The latter paper also listed the studies that evaluated the association between genotype and phenotype, and found a simple average mean difference in homocysteine concentration of 2.7 μmol/l (95% CI: 2.1 to 3.4) between TT and CC genotypes.

Using the paper by Wald et al. [7], a total of 64 genetic studies were identified (Table 1). Of the studies that reported both an estimate and its precision, 31 evaluated only the genotype-disease association, 16 only the genotype-phenotype association, and 17 both. CHD was defined as myocardial infarction or angiographically confirmed coronary artery occlusion (>50% of the luminal diameter). Genotype-disease associations were reported in a further 11 studies, but this information was not used because either a different disease definition (prognosis rather than occurrence of CHD [8,9], atherosclerotic vascular disease rather than CHD [10-13], other disease outcomes [14-17]) or a different study population (diabetic subjects [18]) was used. Among the 17 studies evaluating both associations, 7 measured the mean difference in phenotype level by genotype in both cases and controls (2 reporting only combined means), 4 measured homocysteine only in cases and 4 only in controls, while two reports were unclear.
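As a rough illustration of the ratio in Section 2.1 (a sketch, not an analysis carried out in this report), the two aggregate-data figures quoted above from [7], a pooled OR of 1.21 for TT vs CC and a mean homocysteine difference of 2.7 μmol/l, can be plugged into $OR_{PD} = OR_{GG\ vs\ gg}^{k/\Delta P}$. The function name is hypothetical, and the calculation assumes log-linearity while ignoring both the uncertainty in the two inputs and any correlation between them.

```python
import math


def or_per_k_units(or_genotype: float, delta_p: float, k: float = 1.0) -> float:
    """Odds ratio of disease for a k-unit change in phenotype, OR_GD ** (k / delta_p).

    Assumes the log odds of disease is linear in the phenotype and that the
    genotype affects disease only through the phenotype (Mendelian randomisation).
    """
    if or_genotype <= 0:
        raise ValueError("odds ratio must be positive")
    if delta_p == 0:
        raise ValueError("genotype-phenotype difference must be non-zero")
    # Work on the log scale: log OR_PD = k * log(OR_GD) / delta_p
    return math.exp(k * math.log(or_genotype) / delta_p)


# Aggregate-data figures quoted above: OR(TT vs CC) = 1.21, mean Hcy difference 2.7 umol/l
print(round(or_per_k_units(1.21, 2.7, k=5), 2))  # prints 1.42
```

The point estimate alone says nothing about precision; the Discussion explains why the uncertainty in $\Delta P$ is critical.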
Table 1 – Studies included in the meta-analysis (n=64), evaluating either the Genotype-Disease association only (n=31), the Genotype-Phenotype association only (n=16) or both associations (n=17). GD = genotype-disease; GP = genotype-phenotype; Diff. Hcy = mean difference in homocysteine (μmol/l) between genotypes; "/" = not reported or not used.

Study                          GD log OR   GD Var(log OR)   GP Diff. Hcy   GP Var(Diff. Hcy)

Both associations (n=17)
Chambers (Asians) [19]         -0.82       0.42             0.4            10.00
Chao [20]                      -0.48       0.25             2.9            2.44
Tsai [21]                      -0.31       0.14             1.3            0.46
Schmitz [22]                   -0.22       0.10             -0.8           0.48
Meisel [23]                    -0.2        0.03             0.9            0.19
Ma [24]                        -0.17       0.07             2              0.36
Schwartz [25]                  -0.11       0.21             2.6            1.23
Kim [26]                       0.07        0.15             1.3            2.63
Kluijtmans (1997) [27]         0.19        0.03             2.8            1.47
Christensen [28]               0.25        0.16             2.5            1.15
Chambers (European) [19]       0.28        0.07             1.4            0.47
Tokgozoglu [29]                0.41        0.32             8.8            14.29
Nakai [30]                     0.55        0.10             3              0.57
Morita [31]                    0.73        0.04             3.8            1.47
Ou [32]                        0.84        0.06             4.6            1.61
Malinow [33]                   0.96        0.23             1.2            0.71
Kawashiri [34]                 1.14        0.26             11             11.11

Genotype-Disease only (n=31)
Zheng [35]                     -1.61       0.61             /              /
Fernandez-A. (Females) [36]    -0.63       0.28             /              /
Brulhart [37]                  -0.37       0.07             /              /
Girelli [38]                   -0.29       0.10             /              /
Brugada [39]                   -0.26       0.21             /              /
Adams [40]                     -0.22       0.08             /              /
Ardissino [41]                 -0.21       0.09             /              /
Dilley [42]                    -0.17       0.77             /              /
Verhoef (1998) [43]            -0.17       0.04             /              /
Hsu [44]                       -0.11       0.16             /              /
Van Bockxmeer [45]             -0.01       0.10             /              /
Abbate [46]                    0.03        0.16             /              /
Wilcken [47]                   0.04        0.08             /              /
Pinto [48]                     0.06        0.23             /              /
Anderson (1997) [49]           0.1         0.08             /              /
Fowkes [50]                    0.25        0.16             /              /
Gardemann [51]                 0.26        0.08             /              /
Todesco [52]                   0.3         0.17             /              /
Reinhardt [53]                 0.32        0.19             /              /
Verhoef (1997) [54]            0.35        0.26             /              /
Kihara [55]                    0.38        0.22             /              /
Araujo [56]                    0.4         0.66             /              /
Malik [57]                     0.42        0.12             /              /
Thogersen [58]                 0.46        0.39             /              /
Izumi [59]                     0.49        0.08             /              /
Fernandez-A. (Males) [36]      0.5         0.09             /              /
Szczeklik [60]                 0.77        0.14             /              /
Gallagher [61]                 1.18        0.24             /              /
Mager [62]                     1.26        0.15             /              /
Ferrer-Antunes [63]            1.3         0.30             /              /
Gulec [64]                     1.46        0.31             /              /

Genotype-Phenotype only (n=16)
Dekou (Females) [65]           /           /                0.9            0.33
Voutilainen [66]               /           /                1              0.86
Chango (a) [67]                /           /                1.4            0.64
Mazza [14]                     /           /                1.4            0.38
Chango (b) [68]                /           /                1.7            2.56
Dekou (Males) [65]             /           /                2.1            0.53
Kosokabe [8,14]                /           /                2.1            2.08
Gonzalez Ordonez [15]          /           /                2.1            0.84
Anderson (2000) [9]            /           /                3.8            2.33
Kluijtmans (1996) [11]         /           /                4              4.76
Deloughery [12]                /           /                4.2            7.69
Fujimara [16]                  /           /                4.3            3.23
Arai [18]                      /           /                4.4            4.00
Rassoul [10]                   /           /                7.3            4.17
Yoo [13]                       /           /                8.1            0.89
D'Angelo [17]                  /           /                9.9            3.45

2.3 Publication and Reporting Bias

Table 1 shows that some studies reported on the Genotype-Disease (GD) association, some on the Genotype-Phenotype (GP) association and some on both. Thus, as well as the usual publication bias, we need to be concerned about the possibility of reporting bias due to researchers choosing not to report information that is inconsistent with some accepted theory. Publication bias is usually investigated with funnel plots. Figures 2(a) and 2(b) show the funnel plots of the GD estimates for studies that do or do not report on GP; Figure 3 combines the information in a single plot.
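Each point on these funnel plots is simply a study's log OR plotted against its inverse standard error. A minimal sketch, assuming the study-level log ORs and their variances (as in Table 1) are supplied as tuples together with a flag for whether the study also reports GP; the function name and the three example studies are hypothetical.

```python
import math


def funnel_points(studies):
    """Group (log OR, 1/SE) pairs by whether a study also reports GP.

    studies: iterable of (log_or, variance_log_or, reports_gp) tuples.
    """
    groups = {"GD only": [], "GD & GP": []}
    for log_or, variance, reports_gp in studies:
        if variance <= 0:
            raise ValueError("variance must be positive")
        key = "GD & GP" if reports_gp else "GD only"
        # Funnel-plot y-axis: inverse standard error = 1 / sqrt(variance)
        groups[key].append((log_or, 1.0 / math.sqrt(variance)))
    return groups


# Three hypothetical studies, for illustration only
points = funnel_points([(-0.2, 0.25, True), (0.1, 0.04, False), (0.4, 0.16, False)])
print(points["GD & GP"])  # prints [(-0.2, 2.0)]
```

Each group could then be scattered with its own marker, as in the "Information Reported" legend of Figure 3.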
Figure 2 – Funnel plots of GD estimates in studies (a) that also report GP and (b) that do not report GP. [Panels: "Studies Reporting on GD and GP" and "Studies Only Reporting on GD"; x-axis: log OR of GD; y-axis: inverse standard error.]

Figure 3 – Funnel plot of GD estimates in all studies. [Panel: "All Studies Reporting on GD"; x-axis: log OR of GD; y-axis: inverse standard error; markers distinguish the information reported: GD only, GD & GP.]

Figures 2 and 3 suggest that studies reporting a small GD odds ratio are under-represented in the literature, but it is not clear that the reporting of GP is related to that bias. Figures 4(a) and 4(b) show the funnel plots of the GP estimates for studies that do or do not report on GD; Figure 5 combines the information in a single plot.

Figure 4 – Funnel plots of GP estimates in studies (a) that also report GD and (b) that do not report GD. [Panels: "Studies Reporting on GP and GD" and "Studies Only Reporting on GP"; x-axis: mean difference in GP; y-axis: inverse standard error.]

Figure 5 – Funnel plot of GP estimates in all studies. [Panel: "All Studies Reporting on GP"; x-axis: mean difference in GP; y-axis: inverse standard error; markers distinguish the information reported: GP only, GP & GD.]

Figures 4 and 5 likewise suggest that studies reporting a negative GP difference are under-represented in the literature, but once again it is not clear that the reporting of GD is related to that bias.

2.4 Meta-analytical Approaches

In the example of the MTHFR gene, homocysteine and CHD, as is likely to happen in most instances, the evidence comes from a mixture of studies that measure the genotype-phenotype effect (n=16), studies that measure the genotype-disease effect (n=31) and studies that measure both (n=17).
If the genotype-phenotype and genotype-disease evidence all come from unrelated sources, separate meta-analyses will give pooled estimates that can, by appealing to Mendelian randomisation, be combined to estimate the size of the phenotype-disease association. However, when some sources are shared, it becomes important to consider the correlation between the genotype-phenotype and genotype-disease evidence arising from studies that measure both associations. In the following paragraphs we describe meta-analytical methods that allow for this between-study correlation and so enable all available evidence to be combined within a single model. In particular, two models are presented (Model A and Model B), which differ in the way that the heterogeneities of the genotype-disease, genotype-phenotype and phenotype-disease associations are modelled (Fig. 6).

In Model A we directly model the heterogeneity of the observed associations with the genotype. Under the assumption of a causal pathway acting solely through the phenotype, these heterogeneities will be correlated, and so they are modelled by a general bivariate normal distribution. In contrast, in Model B we model the heterogeneity of the genotype-phenotype and phenotype-disease stages. Together these induce the heterogeneity in the genotype-disease association and, therefore, the correlation between the genotype-phenotype and genotype-disease stages. Critically, we add the assumption that the heterogeneities in the genotype-phenotype and phenotype-disease stages are independent. In many situations this assumption will be reasonable, because there will be no biological reason why a study in which the effect of gene on phenotype is large should also find that the phenotype is more closely linked to disease. Both models are fitted using a maximum likelihood approach as well as a Bayesian approach.
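The "separate meta-analyses" route can be sketched as follows. The report does not specify a pooling estimator for this step, so DerSimonian-Laird random-effects pooling is swapped in here purely as an assumption, and the function names are hypothetical. Because the two pooled means are computed independently, the sketch implicitly sets the between-study correlation to zero, which is exactly what Models A and B are designed to avoid when some studies report both associations.

```python
def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate, its variance and tau^2 (DerSimonian-Laird)."""
    k = len(estimates)
    w = [1.0 / v for v in variances]                  # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, 1.0 / sum(w_star), tau2


def separate_ratio(gd_est, gd_var, gp_est, gp_var):
    """Log OR of disease per unit of phenotype from two separate meta-analyses,
    implicitly assuming zero between-study correlation."""
    mu_x, _, _ = dersimonian_laird(gd_est, gd_var)    # pooled genotype-disease log OR
    mu_y, _, _ = dersimonian_laird(gp_est, gp_var)    # pooled genotype-phenotype difference
    return mu_x / mu_y
```

The GD and GP columns of Table 1 would be passed separately, each restricted to the studies that report that association.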
Figure 6 – Two different ways of modelling the heterogeneities of the three associations. [Model A: heterogeneity placed on the genotype-disease (x) and genotype-phenotype (y) associations, with correlation between them; the heterogeneity in the phenotype-disease association can be derived. Model B: heterogeneity placed on the genotype-phenotype (y) and phenotype-disease stages, with correlation = 0; the heterogeneity in the genotype-disease association can be derived.]

2.4.1 Model A

Let the log odds ratio of disease given genotype be x and the mean difference in phenotype between genotypes be y. The ith study produces two potential estimates, $x_i$ and $y_i$, although in practice only one or the other may be obtained or reported. The study estimates have reported variances $w_{xi}$ and $w_{yi}$, which are assumed known. It is possible that the within-study estimates of x and y are correlated (within-study correlation), although there is good reason to suppose that this correlation is negligible because:
• y (difference in phenotype across genotypes) is often measured in only a subset of the study subjects;
• x (log OR of disease given genotype) is an aggregate measure obtained from the disease outcome, since we cannot measure the actual increase in risk at an individual level.

To check whether this assumption was reasonable we performed the following simulation study. Each simulated study consisted of n cases and n controls. The probability of disease given the phenotype, u, was assumed to follow a logistic function with parameters α and β. The distribution of the phenotype over the population was assumed to be normal, N(μ, σ²), with a fixed genotype effect on the phenotype. From the generated data we estimated the log odds ratio for the genotype-disease association and the mean difference in phenotype between genotypes. M repeated studies were simulated under identical conditions, and the correlation between the estimated log odds ratio for the genotype-disease association and the mean difference in phenotype was calculated.
Initial values for the parameters were chosen to reflect the values in studies of the MTHFR-Homocysteine-CHD pathway. Thus:
• OR for a unit change in phenotype on disease = 1.1 [β = log(1.1) = 0.095]
• Difference in phenotype due to genotype = 4 units [hence OR of genotype on disease = 1.1⁴ = 1.46]
• Frequency of dangerous genotype = 12%
• Between-subject variation in levels of phenotype, u: normal with mean 8 or 12 depending on genotype and standard deviation 2
• Baseline risk of disease (α): mean -4, standard deviation 0.2 (reflecting unmeasured covariates)

Hence

$$P(D=1 \mid u) = \frac{e^{\alpha+\beta u}}{1+e^{\alpha+\beta u}}$$

About 4% of people with the wildtype genotype and about 5.4% of those with the mutant genotype develop the disease. Studies had 100 cases and 100 controls and, to avoid bias, the effect of genotype on phenotype was estimated from the controls only. From 500 simulations the correlation r between the estimated log OR of genotype on disease and the mean difference in phenotype was 0.00. Repeating the estimation under exactly the same conditions gave -0.07, 0.00, 0.02 and -0.08, so the average was -0.03. Five hundred simulations may not be enough to guarantee an accurate estimate, but they are enough to show that the correlation lies within ±0.1 of zero. Keeping 500 simulations per run, the base scenario was then altered by changing one parameter at a time while keeping the others fixed at the values given above. The results are shown in Table 2. The program used for the simulations is given in Appendix 1.
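The simulation can be sketched in plain Python (the report's own program is in Appendix 1 and is not reproduced here). Assumptions of this sketch, not taken from the report: the genotype is a binary mutant/wildtype indicator drawn with the stated 12% frequency, the phenotype standard deviation of 2 is read from Table 2, the baseline α is drawn per subject, cases and controls are obtained by sampling the population until each quota is filled, and 0.5 is added to every cell of the 2×2 table. All function names are hypothetical.

```python
import math
import random


def simulate_study(rng, n=100, beta=math.log(1.1), diff=4.0, freq=0.12,
                   mean_wild=8.0, sd=2.0, alpha_mean=-4.0, alpha_sd=0.2):
    """Sample subjects until n cases and n controls are collected.

    Returns (log OR of genotype on disease, mean phenotype difference between
    genotypes in controls), or None if a genotype is absent among controls.
    """
    cases, controls = [], []
    while len(cases) < n or len(controls) < n:
        mutant = rng.random() < freq
        u = rng.gauss(mean_wild + (diff if mutant else 0.0), sd)
        # Logistic model with a subject-specific baseline alpha
        eta = rng.gauss(alpha_mean, alpha_sd) + beta * u
        diseased = rng.random() < 1.0 / (1.0 + math.exp(-eta))
        if diseased and len(cases) < n:
            cases.append(mutant)
        elif not diseased and len(controls) < n:
            controls.append((mutant, u))
    a = sum(cases)                          # mutant cases
    b = n - a                               # wildtype cases
    c = sum(m for m, _ in controls)         # mutant controls
    d = n - c                               # wildtype controls
    # 0.5 added to each cell to avoid log(0): an assumption of this sketch
    log_or = math.log((a + 0.5) * (d + 0.5) / ((b + 0.5) * (c + 0.5)))
    mut = [u for m, u in controls if m]
    wild = [u for m, u in controls if not m]
    if not mut or not wild:
        return None
    return log_or, sum(mut) / len(mut) - sum(wild) / len(wild)


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def within_study_correlation(m=500, seed=1, **params):
    """Correlation of (log OR, phenotype difference) across m simulated studies."""
    rng = random.Random(seed)
    results = [r for r in (simulate_study(rng, **params) for _ in range(m)) if r]
    log_ors, diffs = zip(*results)
    return pearson(log_ors, diffs)
```

For example, `within_study_correlation(m=500, beta=math.log(1.2))` would correspond to the first alternative scenario in Table 2; results will differ from the reported values by Monte Carlo error.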
Table 2 – Simulation studies to investigate the within-study correlation of the log OR with the mean difference in phenotype

Parameter                                              Estimated correlation
Base assumptions                                       0.00, -0.07, 0.00, 0.02, -0.08 (mean = -0.03)
OR for unit change in phenotype on disease = 1.2       0.04
Difference in phenotype due to genotype = 2            -0.10
Difference in phenotype due to genotype = 6            -0.06
Frequency of dangerous genotype = 25%                  0.11
Frequency of dangerous genotype = 6%                   0.03
Standard deviation in phenotype = 3                    0.00
Standard deviation in phenotype = 1                    0.02

Although the within-study correlation will be small, there will be considerable heterogeneity between studies, and this heterogeneity may well be highly correlated: studies that show a larger than expected difference in phenotype, y, may also report a larger than expected log odds ratio of disease, x (between-study correlation). Suppose that the heterogeneity in x has variance $\tau_x$, the heterogeneity in y has variance $\tau_y$, and their correlation is ρ. Assuming bivariate normal distributions ($MN_2$), and writing $\mu_{xi}$ and $\mu_{yi}$ for the true study-specific means, we have the hierarchical model

$$\begin{pmatrix} x_i \\ y_i \end{pmatrix} \sim MN_2\left(\begin{pmatrix} \mu_{xi} \\ \mu_{yi} \end{pmatrix}, \begin{pmatrix} w_{xi} & 0 \\ 0 & w_{yi} \end{pmatrix}\right)$$

and

$$\begin{pmatrix} \mu_{xi} \\ \mu_{yi} \end{pmatrix} \sim MN_2\left(\begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \tau_x & \rho\sqrt{\tau_x\tau_y} \\ \rho\sqrt{\tau_x\tau_y} & \tau_y \end{pmatrix}\right)$$

Integrating over the study means we have

$$\begin{pmatrix} x_i \\ y_i \end{pmatrix} \sim MN_2\left(\begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} w_{xi}+\tau_x & \rho\sqrt{\tau_x\tau_y} \\ \rho\sqrt{\tau_x\tau_y} & w_{yi}+\tau_y \end{pmatrix}\right)$$

The log odds ratio of phenotype on disease is estimated by the ratio $\theta = \mu_x/\mu_y$, so we can write the model as

$$\begin{pmatrix} x_i \\ y_i \end{pmatrix} \sim MN_2\left(\begin{pmatrix} \theta\mu_y \\ \mu_y \end{pmatrix}, \begin{pmatrix} w_{xi}+\tau_x & \rho\sqrt{\tau_x\tau_y} \\ \rho\sqrt{\tau_x\tau_y} & w_{yi}+\tau_y \end{pmatrix}\right)$$

When only one of the pair is observed we treat it as univariate normal:

$$x_i \sim N(\theta\mu_y,\; w_{xi}+\tau_x) \quad\text{and}\quad y_i \sim N(\mu_y,\; w_{yi}+\tau_y)$$

This model, with five parameters (θ, $\mu_y$, $\tau_x$, $\tau_y$, ρ), can be fitted by either a maximum likelihood approach or a Bayesian approach.
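The marginal likelihood maximised for Model A follows directly from the distributions above: each study with both estimates contributes a bivariate normal log density, and each study with only one estimate contributes the corresponding univariate marginal. A minimal sketch, not the report's Stata program (which is in Appendix 2); the function names and the (x, w_x, y, w_y) tuple layout are assumptions.

```python
import math


def normal_logpdf(v, mean, var):
    """Log density of N(mean, var) at v."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)


def model_a_loglik(studies, theta, mu_y, tau_x, tau_y, rho):
    """Marginal log-likelihood of Model A.

    studies: list of (x, w_x, y, w_y) with within-study variances w_x, w_y;
    x or y is None when that association was not reported.
    tau_x, tau_y: between-study variances; rho: between-study correlation.
    """
    mu_x = theta * mu_y                     # theta = mu_x / mu_y
    cov = rho * math.sqrt(tau_x * tau_y)
    total = 0.0
    for x, w_x, y, w_y in studies:
        if x is not None and y is not None:
            s_xx, s_yy = w_x + tau_x, w_y + tau_y
            det = s_xx * s_yy - cov ** 2
            dx, dy = x - mu_x, y - mu_y
            # Quadratic form d' S^{-1} d using the explicit 2x2 inverse
            quad = (s_yy * dx * dx - 2 * cov * dx * dy + s_xx * dy * dy) / det
            total += -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
        elif x is not None:                 # only genotype-disease reported
            total += normal_logpdf(x, mu_x, w_x + tau_x)
        elif y is not None:                 # only genotype-phenotype reported
            total += normal_logpdf(y, mu_y, w_y + tau_y)
    return total
```

With ρ = 0 the bivariate term factorises into the two univariate terms, which gives a quick check of the algebra; in practice the function would be passed to a numerical optimiser on the transformed parameter scale.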
Maximum Likelihood Approach

Numerical maximisation (ml in Stata) is based on the transformed parameters (θ, $\mu_y$, log($\tau_x$), log($\tau_y$), log[(1+ρ)/(1-ρ)]). To obtain initial values we take

$$\theta = \bar{x}/\bar{y}, \quad \mu_y = \bar{y}, \quad \tau_x = \mathrm{Var}(x), \quad \tau_y = \mathrm{Var}(y), \quad \rho = \mathrm{Corr}(x, y)$$

The Stata code used for the maximisation is given in Appendix 2.

To test the algorithm, data were generated from n=100 studies with $1/w_{xi} \sim U(3,30)$ and $1/w_{yi} \sim U(1,5)$ and parameters (θ, $\mu_y$, $\tau_x$, $\tau_y$, ρ) = (0.05, 3, 0.01, 1, 0.8). Observations were made missing at random to leave 50 studies in which both were measured, 25 in which only x was measured and 25 in which only y was measured. This data set was called "test100". For "test100" the initial values were (0.04, 3.35, 0.09, 1.42, 0.26). The 50 studies with both measurements are shown in Figure 7: the correlation is attenuated by the uncertainty in the within-study estimates, and the variance of y is exaggerated.

Figure 7 – Correlation between the log OR of the genotype-disease association and the difference in phenotype associated with the genotype (genotype-phenotype association) for the 50 studies measuring both, in the "test100" data set. [Scatter plot; x-axis: difference in phenotype; y-axis: log odds ratio GD.]

Numerical maximisation (ml in Stata) converges to the values reported in Table 3, which on the original scale equate to θ=0.04, $\mu_y$=3.36, $\tau_x$=0.02, $\tau_y$=0.87, ρ=0.51.

Table 3 – Model A: results of numerical maximisation in the "test100" data set

Parameter            Estimate   St. Error
θ                    0.043      0.010
μ_y                  3.360      0.128
log(τ_x)             -3.943     0.635
log(τ_y)             -0.144     0.242
log[(1+ρ)/(1-ρ)]     1.116      0.818

A second test data set ("test1000") was generated identically but with n=1000, leaving 250 studies with only x and 250 with only y.
This started with θ=0.05, $\mu_y$=3.00, $\tau_x$=0.10, $\tau_y$=1.44, ρ=0.23 and converged to the values reported in Table 4, which on the original scale equate to θ=0.05, $\mu_y$=3.02, $\tau_x$=0.01, $\tau_y$=1.06, ρ=1.00.

Table 4 – Model A: results of numerical maximisation in the "test1000" data set

Parameter            Estimate   St. Error
θ                    0.051      0.003
μ_y                  3.015      0.043
log(τ_x)             -4.916     0.280
log(τ_y)             0.057      0.069
log[(1+ρ)/(1-ρ)]     16.164     1418.8

Both analyses partially correct the errors in the initial var(y) and correlation, but the likelihood is clearly flat and poorly defines the correlation. Notably, the situation is worse with a sample of 1000 studies, which suggests that the problem is not solved by increasing the number of studies in the meta-analysis.

For the MTHFR-Homocysteine-CHD data, maximisation starts at θ=0.06, $\mu_y$=3.29, $\tau_x$=0.34, $\tau_y$=7.92, ρ=0.60 and converges to the values reported in Table 5, which on the original scale equate to θ=0.09, $\mu_y$=2.64, $\tau_x$=0.11, $\tau_y$=3.14, ρ=1.00.

Table 5 – Model A: results of numerical maximisation in the actual MTHFR-Homocysteine-CHD data set

Parameter            Estimate   St. Error
θ                    0.086      0.024
μ_y                  2.644      0.356
log(τ_x)             -2.192     0.445
log(τ_y)             1.144      0.367
log[(1+ρ)/(1-ρ)]     16.852     1306.9

Again the correlation is poorly defined, as the profile log likelihood curves show (Fig. 8). The likelihood for the actual data increases linearly with ρ, with its maximum at ρ=1. The problem does not arise with the 100-study test data set, where the likelihood is maximised at ρ=0.5, but this is likely to be due to chance, since the problem reappears with the 1000-study test data set. Thus, although this issue needs to be addressed by a thorough simulation study, it seems that the problem of a likelihood maximised at the boundary ρ=1 does not depend only on the number of studies included in the meta-analysis, even when considering extreme (and unrealistic) numbers.
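Reading Tables 3 to 5 requires undoing the transformations used in the maximisation. A small sketch (function name hypothetical) that maps the transformed estimates back to the original scale and also reports exp(kθ), the odds ratio for a k-unit change in phenotype (k = 5 μmol/l, as used in Table 9). Applied to Table 5, it shows that the transformed correlation of 16.852 corresponds to ρ essentially equal to 1.

```python
import math


def model_a_natural(theta, mu_y, log_tau_x, log_tau_y, z, k=5.0):
    """Convert Model A's transformed estimates back to the original scale.

    z = log[(1 + rho) / (1 - rho)] = 2 * atanh(rho), so rho = tanh(z / 2).
    Also returns exp(k * theta), the odds ratio for a k-unit phenotype change.
    """
    return {
        "theta": theta,
        "mu_y": mu_y,
        "tau_x": math.exp(log_tau_x),
        "tau_y": math.exp(log_tau_y),
        "rho": math.tanh(z / 2.0),
        "or_k": math.exp(k * theta),
    }


test100 = model_a_natural(0.043, 3.360, -3.943, -0.144, 1.116)   # Table 3
mthfr = model_a_natural(0.086, 2.644, -2.192, 1.144, 16.852)     # Table 5
print(round(test100["rho"], 2), round(mthfr["rho"], 4), round(mthfr["or_k"], 2))
# prints 0.51 1.0 1.54
```

Because tanh flattens out for large arguments, huge changes in the transformed value barely move ρ near 1, which is consistent with the very large standard errors reported for that parameter.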
The program used for the profile calculations with fixed correlations is given in Appendix 3.

Figure 8 – Profile log likelihood for the three scenarios: (a) actual data; (b) test data with 100 studies; (c) test data with 1000 studies. [Each panel, titled "Profile Log Likelihood": x-axis: correlation (0 to 1); y-axis: log likelihood.]

Unfortunately, conclusions about the estimate of θ (the log OR of phenotype on disease) and its standard error also depend on the size of the between-study correlation (Fig. 9), as does the statistical significance of θ (Fig. 10).

Figure 9 – Profile curves for the actual data. [Panels: "Profile Ratio Estimates" (estimate of the ratio against correlation) and "Profile St Error Estimates" (standard error of the ratio against correlation).]

Figure 10 – Statistical significance of the profile ratio to standard error for the actual data. [Panel: "Profile Ratio to St Error"; x-axis: correlation; y-axis: z = estimate/st error.]

The estimate of θ is significantly different from 0 if the correlation is known and is above about 0.55.

Bayesian Approach

The same model is implemented following a Bayesian approach, using non-informative prior distributions for all model parameters, which are estimated by Markov Chain Monte Carlo (MCMC) methods implemented in the WinBUGS software, version 1.3 [69]. The estimate of the phenotype-disease association is derived from the random-effects meta-analyses of the genotype-disease and genotype-phenotype associations. The three groups of studies, i.e.
those measuring only genotype-disease, only genotype-phenotype or both associations, are modelled with three separate model specifications; for the studies evaluating both associations a bivariate normal distribution is used in order to allow for the between-study correlation. The WinBUGS code is given below; WinBUGS parameterises dnorm by its precision, so Weight_GP and Weight_GD are inverse variances.

• For studies with both G-P and G-D associations (n=17):
    delta[i, 1:2] ~ dmnorm(mu[], T[, ])
    Diff_GP[i] ~ dnorm(delta[i, 1], Weight_GP[i])
    LogOR_GD[i] ~ dnorm(delta[i, 2], Weight_GD[i])

• For studies with only G-P association (n=16):
    Diff_GP[i] ~ dnorm(delta[i, 1], Weight_GP[i])
    delta[i, 1] ~ dnorm(mu[1], tau_GP)

• For studies with only G-D association (n=31):
    LogOR_GD[i] ~ dnorm(delta[i, 2], Weight_GD[i])
    delta[i, 2] ~ dnorm(mu[2], tau_GD)

where mu[1] and mu[2] are assigned vague normal prior distributions (dnorm(0.0, 1.0E-6)), and tau_GP and tau_GD vague gamma prior distributions (dgamma(0.001, 0.001)). For the inverse covariance (precision) matrix, T[1:2, 1:2], a vague Wishart prior distribution is used (dwish(R[, ], 2)). The three specifications are linked by sharing the same parameters for the true underlying effects (mu[]). The results of the model are very sensitive to the values specified for the matrix R in the Wishart prior distribution for the precision matrix T.

The results of Model A obtained using the maximum likelihood and the Bayesian approaches, expressed as the effect of a 5 μmol/l increase in homocysteine on CHD, are compared in Table 9.

2.4.2 Model B

An attractive alternative to Model A is to assume that the heterogeneity lies in the genotype-phenotype and phenotype-disease stages of Figure 1, rather than in the measured genotype-phenotype and genotype-disease associations. The big advantage of this model comes about if we are willing to assume that the heterogeneities in the genotype-phenotype and phenotype-disease stages are independent.
Even under the independence model, correlation will still be induced in the resulting heterogeneities of the genotype-phenotype and genotype-disease associations. Independence assumes that studies that find a large effect of genotype on phenotype, perhaps because of local conditions, will not tend to find relatively larger or smaller effects of a given phenotype on disease. This is probably reasonable. The model can be fitted by either a maximum likelihood approach or a Bayesian approach.

Maximum Likelihood Approach

Assuming bivariate normal distributions ($MN_2$), and writing $\theta_i$ for the study-specific log OR of disease per unit of phenotype, we have the hierarchical model

$$\begin{pmatrix} x_i \\ y_i \end{pmatrix} \sim MN_2\left(\begin{pmatrix} \mu_{xi} \\ \mu_{yi} \end{pmatrix}, \begin{pmatrix} w_{xi} & 0 \\ 0 & w_{yi} \end{pmatrix}\right)$$

and

$$\begin{pmatrix} \theta_i \\ \mu_{yi} \end{pmatrix} \sim MN_2\left(\begin{pmatrix} \theta \\ \mu_y \end{pmatrix}, \begin{pmatrix} \tau_\theta & 0 \\ 0 & \tau_y \end{pmatrix}\right)$$

Since $\mu_{xi} = \theta_i\,\mu_{yi}$, Taylor series approximations (the delta method) give, approximately,

$$\begin{pmatrix} \mu_{xi} \\ \mu_{yi} \end{pmatrix} \sim MN_2\left(\begin{pmatrix} \theta\mu_y \\ \mu_y \end{pmatrix}, \begin{pmatrix} \mu_y^2\tau_\theta + \theta^2\tau_y & \theta\tau_y \\ \theta\tau_y & \tau_y \end{pmatrix}\right)$$

so

$$\begin{pmatrix} x_i \\ y_i \end{pmatrix} \sim MN_2\left(\begin{pmatrix} \theta\mu_y \\ \mu_y \end{pmatrix}, \begin{pmatrix} w_{xi} + \mu_y^2\tau_\theta + \theta^2\tau_y & \theta\tau_y \\ \theta\tau_y & w_{yi}+\tau_y \end{pmatrix}\right)$$

When only one of the pair is observed we treat it as univariate normal:

$$x_i \sim N(\theta\mu_y,\; w_{xi}+\mu_y^2\tau_\theta+\theta^2\tau_y) \quad\text{and}\quad y_i \sim N(\mu_y,\; w_{yi}+\tau_y)$$

To test the algorithm, data were generated from n=100 studies with $1/w_{xi} \sim U(3,30)$ and $1/w_{yi} \sim U(1,5)$ and parameters θ=0.05, $\mu_y$=3, $\tau_\theta$=0.01, $\tau_y$=1. Observations were made missing at random to leave 50 studies in which both were measured, 25 in which only x was measured and 25 in which only y was measured. A second data set with the same structure but n=1000 was also generated. Numerical maximisation (ml in Stata) is based on the transformed parameters θ, $\mu_y$, log($\tau_\theta$), log($\tau_y$). The program is given in Appendix 4. To obtain initial values we take

$$\theta = \bar{x}/\bar{y}, \quad \mu_y = \bar{y}, \quad \tau_\theta = \mathrm{Var}(x_i/y_i), \quad \tau_y = \mathrm{Var}(y_i)$$

For n=100, numerical maximisation starts at θ=0.038, $\mu_y$=3.23, $\tau_\theta$=0.025, $\tau_y$=1.34 and converges to the values reported in Table 6, which on the original scale equate to θ=0.032, $\mu_y$=3.25, $\tau_\theta$=0.011, $\tau_y$=0.85.

Table 6 – Model B: results of numerical maximisation in the "test100" data set

Parameter     Estimate   St. Error
θ             0.032      0.015
μ_y           3.25       0.13
log(τ_θ)      -4.47      0.26
log(τ_y)      -0.16      0.24

For n=1000, numerical maximisation starts at (0.056, 2.96, 0.041, 1.38) and converges to the values reported in Table 7, which on the original scale equate to θ=0.055, $\mu_y$=2.95, $\tau_\theta$=0.012, $\tau_y$=0.97.

Table 7 – Model B: results of numerical maximisation in the "test1000" data set

Parameter     Estimate   St. Error
θ             0.055      0.005
μ_y           2.95       0.04
log(τ_θ)      -4.45      0.09
log(τ_y)      -0.03      0.07

For the actual MTHFR-Homocysteine-CHD data, maximisation starts at θ=0.06, $\mu_y$=3.29, $\tau_\theta$=0.33, $\tau_y$=7.92 and converges to the values reported in Table 8, which on the original scale equate to θ=0.10, $\mu_y$=2.63, $\tau_\theta$=0.01, $\tau_y$=3.55.

Table 8 – Model B: results of numerical maximisation in the actual MTHFR-Homocysteine-CHD data set

Parameter     Estimate   St. Error
θ             0.095      0.028
μ_y           2.627      0.408
log(τ_θ)      -5.144     0.940
log(τ_y)      1.267      0.390

These figures lead to an estimate of the induced correlation between the true study-specific means of the genotype-phenotype and genotype-disease associations of ρ=0.58.

Bayesian Approach

Model B has also been implemented following a Bayesian approach in WinBUGS, using a slightly different method to deal with the studies measuring only one association (either genotype-disease or genotype-phenotype). In these studies, the association that has not been evaluated is treated as missing at random, and the missing values are sampled and imputed.
Thus, all 64 studies are modelled in a single step for both the genotype-disease and genotype-phenotype associations (NA = Not Available):

• Data for ALL studies (n=64):

Studies   Diff_GP      Weight_GP        LogOR_GD     Weight_GD
1-17      a1 … a17     w_a1 … w_a17     b1 … b17     w_b1 … w_b17
18-48     NA … NA      NA … NA          b18 … b48    w_b18 … w_b48
49-64     a49 … a64    w_a49 … w_a64    NA … NA      NA … NA

In a way similar to that used in the maximum likelihood approach, the estimate of the phenotype-disease association is derived from the random-effects meta-analyses of the two associations, genotype-disease and genotype-phenotype, by modelling the heterogeneity in the genotype-phenotype and phenotype-disease stages of Figure 1 rather than in the measured genotype-phenotype and genotype-disease associations. The WinBUGS code is:

• For all studies (n=64):
    Diff_GP[i] ~ dnorm(delta_GP[i], Weight_GP[i])
    LogOR_GD[i] ~ dnorm(delta_GD[i], Weight_GD[i])
    delta_GD[i] <- delta_GP[i] * delta_PD[i] / 5
    delta_GP[i] ~ dnorm(d_GP, tau_GP)
    delta_PD[i] ~ dnorm(d_PD, tau_PD)
    Weight_GP[i] ~ dlnorm(mu_WGP, psi_GP)
    Weight_GD[i] ~ dlnorm(mu_WGD, psi_GD)

where dividing by 5 makes delta_PD the log odds ratio for a 5 μmol/l increase in homocysteine; d_GP, d_PD, mu_WGP and mu_WGD are assigned vague normal prior distributions (dnorm(0.0, 1.0E-6)), and tau_GP, tau_PD, psi_GP and psi_GD vague gamma prior distributions (dgamma(0.001, 0.001)). The results of Model B obtained using the maximum likelihood and the Bayesian approaches are compared in Table 9.
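The marginal moments used in the Model B likelihood, and the correlation they induce between the genotype-disease and genotype-phenotype effects even though the two stage heterogeneities are independent, can be sketched as follows. The function names and arguments are hypothetical; the optional `exact` flag is an addition of this sketch, showing the extra term $\tau_\theta\tau_y$ that appears in the exact variance of a product of independent normals and that the first-order (delta-method) approximation drops.

```python
import math


def model_b_moments(theta, mu_y, tau_theta, tau_y, w_x=0.0, w_y=0.0, exact=False):
    """Marginal mean and covariance of (x_i, y_i) under Model B.

    theta_i ~ N(theta, tau_theta) and mu_yi ~ N(mu_y, tau_y) are independent and
    the true genotype-disease effect is theta_i * mu_yi. The delta method gives
    Var = mu_y^2 tau_theta + theta^2 tau_y; exact=True adds tau_theta * tau_y.
    w_x, w_y are within-study variances (0 gives the moments of the true effects).
    """
    var_x = mu_y ** 2 * tau_theta + theta ** 2 * tau_y
    if exact:
        var_x += tau_theta * tau_y
    cov_xy = theta * tau_y                  # Cov(theta_i * mu_yi, mu_yi)
    mean = (theta * mu_y, mu_y)
    cov = ((var_x + w_x, cov_xy), (cov_xy, tau_y + w_y))
    return mean, cov


def induced_correlation(cov):
    """Correlation implied by a 2x2 covariance matrix."""
    return cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])
```

For instance, `induced_correlation(model_b_moments(0.1, 3.0, 0.01, 1.0)[1])` is about 0.32 for these illustrative (not fitted) values, showing that a non-zero correlation arises from the product structure alone.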
Table 9 – Estimates of the effect of a 5 μmol/l increase in homocysteine on CHD obtained from all studies (evaluating G-D, G-P or both), using a Bayesian and a maximum likelihood approach

Method                                                                   OR for ΔHcy of 5 μmol/l (mean)   95% CrI/CI
Bayesian approach
  Model A: deriving the heterogeneity in phenotype-disease association   1.56 (median: 1.54)               1.19 to 2.06
  Model B: deriving the heterogeneity in genotype-disease association    1.61 (median: 1.59)               1.22 to 2.15
Maximum likelihood approach
  Model A: deriving the heterogeneity in phenotype-disease association   1.54                              1.21 to 1.95
  Model B: deriving the heterogeneity in genotype-disease association    1.60                              1.22 to 2.12

3. Discussion

An integrated meta-analytical approach is particularly important when genetic studies are used for Mendelian randomisation. The uncertainty associated with the derived estimate of the phenotype-disease association can be large, since it depends on the uncertainty in both the genotype-phenotype and the genotype-disease estimates [70]. In particular, when the confidence interval for the difference in phenotype level between genotypes includes 0, the confidence interval for the derived estimate of the phenotype-disease association may extend to plus and minus infinity [70]. It is therefore crucial to the use of Mendelian randomisation that both estimates are sufficiently precise, and this applies particularly to the genotype-phenotype association. Such precision is only likely to be obtained through a meta-analysis of all the available evidence; indeed, almost all genetic studies are statistically underpowered to detect the relatively small effects of the frequent gene variants that underlie common, complex diseases [71].
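The role of the precision of the genotype-phenotype estimate can be made concrete with a first-order (delta-method) interval for the ratio θ = μ_x/μ_y. This is an assumed stand-in: the report cites [70] for its interval, which may use a different method, and unlike the interval described in the text, which can extend to plus and minus infinity, a delta-method interval always stays finite, so it understates the problem as μ_y approaches 0. The function name is hypothetical; the covariance argument shows where a non-zero between-study correlation would enter.

```python
import math


def ratio_or_ci(mu_x, var_x, mu_y, var_y, cov_xy=0.0, k=5.0, z=1.96):
    """OR for a k-unit phenotype change, exp(k * mu_x / mu_y), with a
    first-order (delta-method) confidence interval.

    Var(theta) ~= (var_x - 2 * theta * cov_xy + theta**2 * var_y) / mu_y**2
    """
    theta = mu_x / mu_y
    var_theta = (var_x - 2.0 * theta * cov_xy + theta ** 2 * var_y) / mu_y ** 2
    se = math.sqrt(var_theta)
    return (math.exp(k * theta),
            math.exp(k * (theta - z * se)),
            math.exp(k * (theta + z * se)))


# Same genotype-phenotype variance, but a mean difference closer to zero
narrow = ratio_or_ci(0.2, 0.01, 2.0, 0.25, k=1.0)
wide = ratio_or_ci(0.2, 0.01, 0.5, 0.25, k=1.0)
```

Shrinking μ_y with its variance held fixed both moves the point estimate and widens the interval sharply, which is why the precision of the genotype-phenotype association matters so much.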
Two meta-analytical models are presented in this report to combine all the evidence on the genotype-disease and genotype-phenotype associations in order to derive an estimate of the phenotype-disease association. The models have been developed to allow for the correlation between the pooled estimates of the genotype-disease and genotype-phenotype associations that is induced by the studies evaluating both (between-study correlation). A simple approach of carrying out two independent meta-analyses of the genotype-disease and genotype-phenotype data, and then estimating the phenotype-disease association from the two pooled estimates, would be equivalent to assuming a between-study correlation of zero. With such an approach, the OR of CHD for a 5 μmol/l increase in homocysteine is 1.44 (95% CI 1.11 to 1.96), where the confidence interval is calculated from the uncertainty in the estimates of both the genotype-disease and the genotype-phenotype associations70 (following a Bayesian approach with vague prior distributions, the result is a mean of 1.45 and a median of 1.42, 95% CrI 1.11 to 1.96). In general, the importance of a meta-analytical approach that takes the between-study correlation into account, when some of the studies contribute evidence on both the genotype-disease and the genotype-phenotype associations, will depend on the number and size of the studies evaluating both associations relative to those measuring only one or the other.
The two models presented differ in the way the heterogeneities of the genotype-disease, genotype-phenotype and phenotype-disease associations are modelled. In Model A, the heterogeneity in the phenotype-disease association is derived from the heterogeneities in the genotype-disease and genotype-phenotype associations, which are estimated from the observed data.
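The point that two independent meta-analyses implicitly assume zero between-study correlation can be illustrated with a first-order (delta-method) approximation to the variance of a ratio. This is only a sketch: the interval quoted in the text follows reference 70, which may use a different method, and the function name `or_per_5_delta` and the pooled values in the usage lines are hypothetical, not the MTHFR results.

```python
import math

def or_per_5_delta(a, var_a, b, var_b, cov_ab=0.0, z=1.96):
    """Delta-method OR and 95% interval for a 5-unit increase in phenotype.

    a, var_a: pooled genotype-disease log OR and its variance
    b, var_b: pooled genotype-phenotype difference and its variance
    cov_ab:   covariance between the two pooled estimates
              (0 corresponds to two independent meta-analyses)
    """
    theta = a / b                                      # log OR per unit of phenotype
    var_theta = (var_a / b**2 + a**2 * var_b / b**4
                 - 2.0 * a * cov_ab / b**3)            # first-order approximation
    se = math.sqrt(var_theta)
    return (math.exp(5 * theta),
            math.exp(5 * (theta - z * se)),
            math.exp(5 * (theta + z * se)))

# Hypothetical pooled estimates
print(or_per_5_delta(0.10, 0.0025, 2.0, 0.25))               # independent analyses
print(or_per_5_delta(0.10, 0.0025, 2.0, 0.25, cov_ab=0.01))  # positively correlated
```

With a and b of the same sign, a positive covariance narrows the interval and a negative one widens it, so setting the covariance to zero can err in either direction.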
Although this model is the most straightforward, since it follows directly the idea of deriving the phenotype-disease association from the genotype-disease and genotype-phenotype associations, its robustness depends heavily on the amount of information available to estimate the size of the between-study correlation. In the example of the MTHFR gene, homocysteine and coronary heart disease, the data were not sufficient to provide a good estimate of the between-study correlation. This explains why the estimate of the correlation tends to the boundary value of 1; the fact that this also happened with a randomly generated data set of 1000 studies suggests that the problem is not simply a lack of studies in the meta-analysis. When a Bayesian approach was adopted, the same instability appeared as extreme sensitivity to the prior distribution assumed for the precision matrix of the bivariate model.
Model B was developed in an attempt to solve the problem of insufficient data to estimate all the parameters of the first model, by introducing the assumption of independence between the genotype-phenotype and phenotype-disease heterogeneities. Model B models the heterogeneities of the observed genotype-phenotype data and of the unobserved phenotype-disease association, and derives the heterogeneity of the genotype-disease association from the two. Whilst the first model is limited by how well the between-study correlation can be estimated from the data, the validity of the second depends on the assumptions we are prepared to make about the interrelationship of the heterogeneities of the three associations. When a study does not report an estimate for one of the two associations (either genotype-disease or genotype-phenotype), the model can handle this either by treating it as a missing value, as in Model B in the Bayesian approach, or by using the marginal distribution of the estimate that has been observed, as in all the other models.
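Under the independence assumption of Model B, the heterogeneity of the genotype-disease association follows from the moments of a product of two independent normal variables. The sketch below computes these moments exactly and checks them by simulation. The names and parameter values are hypothetical, and here `tau_theta` and `tau_y` denote between-study variances (as in the Stata appendices), not the precisions used in WinBUGS. The exact variance contains a product term `tau_y * tau_theta`; the Appendix 4 program uses only the first two terms, a first-order approximation that is close when both heterogeneities are small.

```python
import random

def model_b_moments(theta, mu_y, tau_theta, tau_y):
    """Exact moments of delta_GD = delta_GP * theta_i under Model B.

    Assumes delta_GP ~ N(mu_y, tau_y) and theta_i ~ N(theta, tau_theta),
    independently; tau_* are variances. Returns the mean and variance of
    delta_GD and its covariance with delta_GP.
    """
    mean_gd = mu_y * theta
    var_gd = theta**2 * tau_y + mu_y**2 * tau_theta + tau_y * tau_theta
    cov_gd_gp = theta * tau_y
    return mean_gd, var_gd, cov_gd_gp

def simulate_moments(theta, mu_y, tau_theta, tau_y, n=200_000, seed=1):
    """Monte Carlo estimates of the same three quantities."""
    rng = random.Random(seed)
    gp = [rng.gauss(mu_y, tau_y ** 0.5) for _ in range(n)]
    gd = [g * rng.gauss(theta, tau_theta ** 0.5) for g in gp]
    m_gp = sum(gp) / n
    m_gd = sum(gd) / n
    var_gd = sum((x - m_gd) ** 2 for x in gd) / (n - 1)
    cov = sum((x - m_gd) * (y - m_gp) for x, y in zip(gd, gp)) / (n - 1)
    return m_gd, var_gd, cov

# Hypothetical values: theta = log OR per umol/l, mu_y = mean homocysteine difference
exact = model_b_moments(0.05, 2.0, 0.0004, 0.25)
approx = simulate_moments(0.05, 2.0, 0.0004, 0.25)
print(exact)
print(approx)
```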
However, these two methods are equivalent and both rely on a missing-at-random assumption. This is a strong assumption, since reporting bias may well be the reason why an estimate is absent from a published paper. For instance, a study reporting only on the genotype-disease association might also have evaluated the genotype-phenotype association but chosen not to report it because the genotype-phenotype data did not fit the underlying hypothesis.
Although the choice between the two models may depend on the situation, and in particular on the amount of data available (number and size of studies) to estimate the between-study correlation, our preference is for Model B. The independence between the genotype-phenotype and phenotype-disease heterogeneities seems a reasonable assumption, since it is unlikely that the variability in the estimated difference in phenotype associated with a given genotype should influence the variability in the estimated increase in disease risk associated with a specific increase in phenotype level, or vice versa. However, this assumption should be considered for each application. Although in the example of the MTHFR gene, homocysteine and CHD the two models gave similar results under both the maximum likelihood and the Bayesian approaches, further work should investigate the behaviour of the two models under a variety of conditions (e.g. simulation studies with different database sizes and different values for the three heterogeneities). This would give a deeper insight into which model is most appropriate in specific situations.

References
1. Attia J, Thakkinstian A, D'Este C. Meta-analyses of molecular association studies: methodologic lessons for genetic epidemiology. J Clin Epidemiol 2003;56:297-303.
2. Clayton D, McKeigue PM. Epidemiological methods for studying genes and environmental factors in complex diseases.
Lancet 2001;358:1356-60.
3. Keavney B. Genetic epidemiological studies of coronary heart disease. Int J Epidemiol 2002;31:730-6.
4. Davey Smith G, Ebrahim S. 'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 2003;32:1-22.
5. Homocysteine and risk of ischemic heart disease and stroke: a meta-analysis. JAMA 2002;288:2015-22.
6. Bailey LB, Gregory JF 3rd. Polymorphisms of methylenetetrahydrofolate reductase and other enzymes: metabolic significance, risks and impact on folate requirement. J Nutr 1999;129:919-22.
7. Wald DS, Law M, Morris JK. Homocysteine and cardiovascular disease: evidence on causality from a meta-analysis. BMJ 2002;325:1202.
8. Kosokabe T, Okumura K, Sone T, Kondo J, Tsuboi H, Mukawa H et al. Relation of a common methylenetetrahydrofolate reductase mutation and plasma homocysteine with intimal hyperplasia after coronary stenting. Circulation 2001;103:2048-54.
9. Anderson JL, Muhlestein JB, Horne BD, Carlquist JF, Bair TL, Madsen TE et al. Plasma homocysteine predicts mortality independently of traditional risk factors and C-reactive protein in patients with angiographically defined coronary artery disease. Circulation 2000;102:1227-32.
10. Rassoul F, Richter V, Janke C, Purschwitz K, Klotzer B, Geisel J et al. Plasma homocysteine and lipoprotein profile in patients with peripheral arterial occlusive disease. Angiology 2000;51:189-96.
11. Kluijtmans LA, van den Heuvel LP, Boers GH, Frosst P, Stevens EM, van Oost BA et al. Molecular genetic analysis in mild hyperhomocysteinemia: a common mutation in the methylenetetrahydrofolate reductase gene is a genetic risk factor for cardiovascular disease. Am J Hum Genet 1996;58:35-41.
12. Deloughery TG, Evans A, Sadeghi A, McWilliams J, Henner WD, Taylor LM Jr et al. Common mutation in methylenetetrahydrofolate reductase. Correlation with homocysteine metabolism and late-onset vascular disease. Circulation 1996;94:3074-8.
13. Yoo J-H. A thermolabile variant of methylenetetrahydrofolate reductase is a determinant of hyperhomocyst(e)inemia in the elderly. Ann N Y Acad Sci 2001;928:344.
14. Mazza A, Motti C, Nulli A, Marra G, Gnasso A, Pastore A et al. Lack of association between carotid intima-media thickness and methylenetetrahydrofolate reductase gene polymorphism or serum homocysteine in non-insulin-dependent diabetes mellitus. Metabolism 2000;49:718-23.
15. Gonzalez Ordonez AJ, Fernandez Alvarez CR, Rodriguez JM, Garcia EC, Alvarez MV. Genetic polymorphism of methylenetetrahydrofolate reductase and venous thromboembolism: a case-control study. Haematologica 1999;84:190-1.
16. Fujimura H, Kawasaki T, Sakata T, Ariyoshi H, Kato H, Monden M et al. Common C677T polymorphism in the methylenetetrahydrofolate reductase gene increases the risk for deep vein thrombosis in patients with predisposition of thrombophilia. Thromb Res 2000;98:1-8.
17. D'Angelo A, Coppola A, Madonna P, Fermo I, Pagano A, Mazzola G et al. The role of vitamin B12 in fasting hyperhomocysteinemia and its interaction with the homozygous C677T mutation of the methylenetetrahydrofolate reductase (MTHFR) gene. A case-control study of patients with early-onset thrombotic events. Thromb Haemost 2000;83:563-70.
18. Arai K, Yamasaki Y, Kajimoto Y, Watada H, Umayahara Y, Kodama M et al. Association of methylenetetrahydrofolate reductase gene polymorphism with carotid arterial wall thickening and myocardial infarction risk in NIDDM. Diabetes 1997;46:2102-4.
19. Chambers JC, Ireland H, Thompson E, Reilly P, Obeid OA, Refsum H et al. Methylenetetrahydrofolate reductase 677 C-->T mutation and coronary heart disease risk in UK Indian Asians. Arterioscler Thromb Vasc Biol 2000;20:2448-52.
20. Chao CL, Tsai HH, Lee CM, Hsu SM, Kao JT, Chien KL et al. The graded effect of hyperhomocysteinemia on the severity and extent of coronary atherosclerosis. Atherosclerosis 1999;147:379-86.
21. Tsai MY, Welge BG, Hanson NQ, Bignell MK, Vessey J, Schwichtenberg K et al. Genetic causes of mild hyperhomocysteinemia in patients with premature occlusive coronary artery diseases. Atherosclerosis 1999;143:163-70.
22. Schmitz C, Lindpaintner K, Verhoef P, Gaziano JM, Buring J. Genetic polymorphism of methylenetetrahydrofolate reductase and myocardial infarction. A case-control study. Circulation 1996;94:1812-4.
23. Meisel C, Cascorbi I, Gerloff T, Stangl V, Laule M, Muller JM et al. Identification of six methylenetetrahydrofolate reductase (MTHFR) genotypes resulting from common polymorphisms: impact on plasma homocysteine levels and development of coronary artery disease. Atherosclerosis 2001;154:651-8.
24. Ma J, Stampfer MJ, Hennekens CH, Frosst P, Selhub J, Horsford J et al. Methylenetetrahydrofolate reductase polymorphism, plasma folate, homocysteine, and risk of myocardial infarction in US physicians. Circulation 1996;94:2410-6.
25. Schwartz SM, Siscovick DS, Malinow MR, Rosendaal FR, Beverly RK, Hess DL et al. Myocardial infarction in young women in relation to plasma total homocysteine, folate, and a common variant in the methylenetetrahydrofolate reductase gene. Circulation 1997;96:412-7.
26. Kim CH, Hwang KY, Choi TM, Shin WY, Hong SY. The methylenetetrahydrofolate reductase gene polymorphism in Koreans with coronary artery disease. Int J Cardiol 2001;78:13-7.
27. Kluijtmans LA, Kastelein JJ, Lindemans J, Boers GH, Heil SG, Bruschke AV et al. Thermolabile methylenetetrahydrofolate reductase in coronary artery disease. Circulation 1997;96:2573-7.
28. Christensen B, Frosst P, Lussier-Cacan S, Selhub J, Goyette P, Rosenblatt DS et al. Correlation of a common mutation in the methylenetetrahydrofolate reductase gene with plasma homocysteine in patients with premature coronary artery disease. Arterioscler Thromb Vasc Biol 1997;17:569-73.
29. Tokgozoglu SL, Alikasifoglu M, Unsal, Atalar E, Aytemir K, Ozer N et al. Methylene tetrahydrofolate reductase genotype and the risk and extent of coronary artery disease in a population with low plasma folate. Heart 1999;81:518-22.
30. Nakai K, Fusazaki T, Suzuki T, Ohsawa M, Ogiu N, Kamata J et al. Genetic polymorphism of 5,10-methylenetetrahydrofolate increases risk of myocardial infarction and is correlated to elevated levels of homocysteine in the Japanese general population. Coron Artery Dis 2000;11:47-51.
31. Morita H, Taguchi J, Kurihara H, Kitaoka M, Kaneda H, Kurihara Y et al. Genetic polymorphism of 5,10-methylenetetrahydrofolate reductase (MTHFR) as a risk factor for coronary artery disease. Circulation 1997;95:2032-6.
32. Ou T, Yamakawa-Kobayashi K, Arinami T, Amemiya H, Fujiwara H, Kawata K et al. Methylenetetrahydrofolate reductase and apolipoprotein E polymorphisms are independent risk factors for coronary heart disease in Japanese: a case-control study. Atherosclerosis 1998;137:23-8.
33. Malinow MR, Nieto FJ, Kruger WD, Duell PB, Hess DL, Gluckman RA et al. The effects of folic acid supplementation on plasma total homocysteine are modulated by multivitamin use and methylenetetrahydrofolate reductase genotypes. Arterioscler Thromb Vasc Biol 1997;17:1157-62.
34. Kawashiri M, Kajinami K, Nohara A, Yagi K, Inazu A, Koizumi J et al. Effect of common methylenetetrahydrofolate reductase gene mutation on coronary artery disease in familial hypercholesterolemia. Am J Cardiol 2000;86:840-5.
35. Zheng YZ, Tong J, Do XP, Pu XQ, Zhou BT. Prevalence of methylenetetrahydrofolate reductase C677T and its association with arterial and venous thrombosis in the Chinese population. Br J Haematol 2000;109:870-4.
36. Fernandez-Arcas N, Dieguez-Lucena JL, Munoz-Moran E, Ruiz-Galdon M, Espinosa-Caliani S, Aranda-Lara P et al. The genotype interactions of methylenetetrahydrofolate reductase and renin-angiotensin system genes are associated with myocardial infarction. Atherosclerosis 1999;145:293-300.
37. Brulhart MC, Dussoix P, Ruiz J, Passa P, Froguel P, James RW. The (Ala-Val) mutation of methylenetetrahydrofolate reductase as a genetic risk factor for vascular disease in non-insulin-dependent diabetic patients. Am J Hum Genet 1997;60:228-9.
38. Girelli D, Friso S, Trabetti E, Olivieri O, Russo C, Pessotto R et al. Methylenetetrahydrofolate reductase C677T mutation, plasma homocysteine, and folate in subjects from northern Italy with or without angiographically documented severe coronary atherosclerotic disease: evidence for an important genetic-environmental interaction. Blood 1998;91:4158-63.
39. Brugada R, Marian AJ. A common mutation in methylenetetrahydrofolate reductase gene is not a major risk of coronary artery disease or myocardial infarction. Atherosclerosis 1997;128:107-12.
40. Adams M, Smith PD, Martin D, Thompson JR, Lodwick D, Samani NJ. Genetic analysis of thermolabile methylenetetrahydrofolate reductase as a risk factor for myocardial infarction. QJM 1996;89:437-44.
41. Ardissino D, Mannucci PM, Merlini PA, Duca F, Fetiveau R, Tagliabue L et al. Prothrombotic genetic risk factors in young survivors of myocardial infarction. Blood 1999;94:46-51.
42. Dilley A, Hooper WC, El-Jamil M, Renshaw M, Wenger NK, Evatt BL. Mutations in the genes regulating methylene tetrahydrofolate reductase (MTHFR C-->T677) and cystathione beta-synthase (CBS G-->A919, CBS T-->C833) are not associated with myocardial infarction in African Americans. Thromb Res 2001;103:109-15.
43. Verhoef P, Rimm EB, Hunter DJ, Chen J, Willett WC, Kelsey K et al. A common mutation in the methylenetetrahydrofolate reductase gene and risk of coronary heart disease: results among U.S. men. J Am Coll Cardiol 1998;32:353-9.
44. Hsu LA, Ko YL, Wang SM, Chang CJ, Hsu TS, Chiang CW et al. The C677T mutation of the methylenetetrahydrofolate reductase gene is not associated with the risk of coronary artery disease or venous thrombosis among Chinese in Taiwan. Hum Hered 2001;51:41-5.
45. van Bockxmeer FM, Mamotte CD, Vasikaran SD, Taylor RR. Methylenetetrahydrofolate reductase gene and coronary artery disease. Circulation 1997;95:21-3.
46. Abbate R, Sardi I, Pepe G, Marcucci R, Brunelli T, Prisco D et al. The high prevalence of thermolabile 5-10 methylenetetrahydrofolate reductase (MTHFR) in Italians is not associated to an increased risk for coronary artery disease (CAD). Thromb Haemost 1998;79:727-30.
47. Wilcken DE, Wang XL, Sim AS, McCredie RM. Distribution in healthy and coronary populations of the methylenetetrahydrofolate reductase (MTHFR) C677T mutation. Arterioscler Thromb Vasc Biol 1996;16:878-82.
48. Pinto X, Vilaseca MA, Garcia-Giralt N, Ferrer I, Pala M, Meco JF et al. Homocysteine and the MTHFR 677C-->T allele in premature coronary artery disease. Case control and family studies. Eur J Clin Invest 2001;31:24-30.
49. Anderson JL, King GJ, Thomson MJ, Todd M, Bair TL, Muhlestein JB et al. A mutation in the methylenetetrahydrofolate reductase gene is not associated with increased risk for coronary artery disease or myocardial infarction. J Am Coll Cardiol 1997;30:1206-11.
50. Fowkes FG, Lee AJ, Hau CM, Cooke A, Connor JM, Lowe GD. Methylene tetrahydrofolate reductase (MTHFR) and nitric oxide synthase (ecNOS) genes and risks of peripheral arterial disease and coronary heart disease: Edinburgh Artery Study. Atherosclerosis 2000;150:179-85.
51. Gardemann A, Weidemann H, Philipp M, Katz N, Tillmanns H, Hehrlein FW et al. The TT genotype of the methylenetetrahydrofolate reductase C677T gene polymorphism is associated with the extent of coronary atherosclerosis in patients at high risk for coronary artery disease. Eur Heart J 1999;20:584-92.
52. Todesco L, Angst C, Litynski P, Loehrer F, Fowler B, Haefeli WE. Methylenetetrahydrofolate reductase polymorphism, plasma homocysteine and age. Eur J Clin Invest 1999;29:1003-9.
53. Reinhardt D, Sigusch HH, Vogt SF, Farker K, Muller S, Hoffmann A. Absence of association between a common mutation in the methylenetetrahydrofolate reductase gene and the risk of coronary artery disease. Eur J Clin Invest 1998;28:20-3.
54. Verhoef P, Kok FJ, Kluijtmans LA, Blom HJ, Refsum H, Ueland PM et al. The 677C-->T mutation in the methylenetetrahydrofolate reductase gene: associations with plasma total homocysteine levels and risk of coronary atherosclerotic disease. Atherosclerosis 1997;132:105-13.
55. Kihara T, Abe S, Saigo M, Kaieda H, Obata H, Eto H et al. Methylenetetrahydrofolate reductase gene polymorphism and premature myocardial infarction. Circulation 1997;96:101-I.
56. Araujo F, Lopes M, Goncalves L, Maciel MJ, Cunha-Ribeiro LM. Hyperhomocysteinemia, MTHFR C677T genotype and low folate levels: a risk combination for acute coronary disease in a Portuguese population. Thromb Haemost 2000;83:517-8.
57. Malik NM, Syrris P, Schwartzman R, Kaski JC, Crossman DC, Francis SE et al. Methylenetetrahydrofolate reductase polymorphism (C-677T) and coronary artery disease. Clin Sci (Lond) 1998;95:311-5.
58. Thogersen AM, Nilsson TK, Dahlen G, Jansson JH, Boman K, Huhtasaari F et al. Homozygosity for the C677-->T mutation of 5,10-methylenetetrahydrofolate reductase and total plasma homocyst(e)ine are not associated with greater than normal risk of a first myocardial infarction in northern Sweden. Coron Artery Dis 2001;12:85-90.
59. Izumi M, Iwai N, Ohmichi N, Nakamura Y, Shimoike H, Kinoshita M. Molecular variant of 5,10-methylenetetrahydrofolate reductase is a risk factor of ischemic heart disease in the Japanese population. Atherosclerosis 1996;121:293-4.
60. Szczeklik A, Sanak M, Jankowski M, Dropinski J, Czachor R, Musial J et al. Mutation A1298C of methylenetetrahydrofolate reductase: risk for early coronary disease not associated with hyperhomocysteinemia. Am J Med Genet 2001;101:36-9.
61. Gallagher PM, Meleady R, Shields DC, Tan KS, McMaster D, Rozen R et al. Homocysteine and risk of premature coronary heart disease. Evidence for a common gene mutation. Circulation 1996;94:2154-8.
62. Mager A, Lalezari S, Shohat T, Birnbaum Y, Adler Y, Magal N et al. Methylenetetrahydrofolate reductase genotypes and early-onset coronary artery disease. Circulation 1999;100:2406-10.
63. Ferrer-Antunes C, Palmeiro A, Morais J, Lourenco M, Freitas M, Providencia L. The mutation C677T in the methylene tetrahydrofolate reductase gene as a risk factor for myocardial infarction in the Portuguese population. Thromb Haemost 1998;80:521-2.
64. Gulec S, Aras O, Akar E, Tutar E, Omurlu K, Avci F et al. Methylenetetrahydrofolate reductase gene polymorphism and risk of premature myocardial infarction. Clin Cardiol 2001;24:281-4.
65. Dekou V, Whincup P, Papacosta O, Ebrahim S, Lennon L, Ueland PM et al. The effect of the C677T and A1298C polymorphisms in the methylenetetrahydrofolate reductase gene on homocysteine levels in elderly men and women from the British regional heart study. Atherosclerosis 2001;154:659-66.
66. Voutilainen S, Lakka TA, Hamelahti P, Lehtimaki T, Poulsen HE, Salonen JT. Plasma total homocysteine concentration and the risk of acute coronary events: the Kuopio Ischaemic Heart Disease Risk Factor Study. J Intern Med 2000;248:217-22.
67. Chango A, Potier De Courcy G, Boisson F, Guilland JC, Barbe F, Perrin MO et al. 5,10-methylenetetrahydrofolate reductase common mutations, folate status and plasma homocysteine in healthy French adults of the Supplementation en Vitamines et Mineraux Antioxydants (SU.VI.MAX) cohort. Br J Nutr 2000;84:891-6.
68. Chango A, Boisson F, Barbe F, Quilliot D, Droesch S, Pfister M et al. The effect of 677C-->T and 1298A-->C mutations on plasma homocysteine and 5,10-methylenetetrahydrofolate reductase activity in healthy subjects. Br J Nutr 2000;83:593-6.
69. Spiegelhalter DJ, Thomas A, Best NG. WinBUGS Version 1.3. User Manual. MRC Biostatistics Unit, 1999.
70. Thompson JR, Tobin MD, Minelli C.
On the accuracy of estimates of the effect of phenotype on disease derived from Mendelian randomisation studies. Technical Report 2003_GE1, available at http://www.prw.le.ac.uk/research/HCG/getechrep.html.
71. Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K. A comprehensive review of genetic association studies. Genet Med 2002;4:45-61.

APPENDIX 1
Stata program to estimate the within-study correlation between the log OR of genotype on disease and the mean effect of genotype on phenotype

*********************************************************
* PROGRAM TO INVESTIGATE WITHIN-STUDY CORRELATION
* BETWEEN GENOTYPE-PHENOTYPE & GENOTYPE-DISEASE
* WORKS BY SIMULATING A LARGE POPULATION AND SUBSAMPLING
* THE CASES AND CONTROLS
*********************************************************
version 8
clear
set more off
cd "d:\research\genetics\mendelian randomisation\meta-analysis\stata"
local NCASE=100          /* number of cases */
local NCONT=100          /* number of controls */
local ORpd=1.1           /* odds ratio for unit phenotype on disease */
local beta=log(`ORpd')   /* beta in logistic function */
local mn_alpha=-4        /* average alpha in logistic function */
local sd_alpha=0.2       /* variation in baseline risk; alpha is drawn once per simulated study */
local delta=4            /* effect of genotype on phenotype */
local mn_pheno=8         /* mean phenotype */
local sd_pheno=2         /* sd in phenotype */
local pg=0.12            /* genotype frequency eg 12% */
local M=500              /* number of simulations */
local N=`NCASE'+`NCONT'
local NSIM=50*`N'        /* number to simulate */
*-------------------------------
* Post results to file temp
*
postfile pf a b c d m1 sd1 m2 sd2 using "..\data\temp", replace
*-------------------------------
* Loop through the simulations
*
forvalues i=1/`M' {
    di "." _continue
    *-------------------------------
    * Simulate the population
    *
    quietly {
        set obs `NSIM'
        *-------------------------------
        * generate genotypes
        *
        gen genotype=uniform() < `pg'
        *-------------------------------
        * generate phenotypes
        *
        gen phenotype=`mn_pheno'+`delta'*genotype+`sd_pheno'*invnorm(uniform())
        *-------------------------------
        * generate prob disease
        *
        local alpha=`mn_alpha'+`sd_alpha'*invnorm(uniform())
        gen pd=exp(`alpha'+`beta'*phenotype)/(1+exp(`alpha'+`beta'*phenotype))
        *-------------------------------
        * generate outcome
        *
        gen d=uniform() < pd
        *-------------------------------
        * pick the cases & controls
        *
        gen u=d+uniform()
        sort u
        gen mark=_n<=`NCONT'
        replace u=(1-d)+uniform()
        sort u
        replace mark=1 if _n<=`NCASE'
        drop if mark==0
        *-------------------------------
        * data for odds ratio GD
        *
        count if genotype==0 & d==0
        local a=r(N)
        count if genotype==1 & d==0
        local b=r(N)
        count if genotype==0 & d==1
        local c=r(N)
        count if genotype==1 & d==1
        local d=r(N)
        *-------------------------------
        * data for phenotype difference
        *
        summarize phenotype if genotype==1 & d==0
        local m1=r(mean)
        local sd1=r(sd)
        summarize phenotype if genotype==0 & d==0
        local m2=r(mean)
        local sd2=r(sd)
        post pf (`a') (`b') (`c') (`d') (`m1') (`sd1') (`m2') (`sd2')
        drop u d genotype phenotype pd mark
    }
}
postclose pf
*-------------------------------
* Restore results
*
use "..\data\temp", clear
*-------------------------------
* analyse
*
gen n=a+b+c+d
count if n != `N'
gen OR=(a*d)/(b*c)
gen lnOR=log(OR)
gen se=sqrt(1/a+1/b+1/c+1/d)
gen d1=m1-m2
summarize OR d1
corr lnOR d1

APPENDIX 2
Stata program for fitting Model A

************************************************
* LIKELIHOOD ANALYSIS FOR MODEL A
************************************************
version 8
cd "D:\Research\Genetics\Mendelian Randomisation\meta-analysis\Stata"
clear
set more off
program drop _all
*-----------------------------
* Program to evaluate LogL
*
program LL
    args lnl ratio muy tz ty r
    quietly {
        scalar tauz=exp(`tz')
        scalar tauy=exp(`ty')
        scalar rho=(exp(`r')-1)/(exp(`r')+1)
        scalar muz=`muy'*`ratio'
        gen double vz=tauz+1/wz
        gen double vy=tauy+1/wy
        scalar cov=rho*sqrt(tauz*tauy)
        gen double r=cov/sqrt(vz*vy)
        gen double L3=log(vz)+(z-muz)^2/vz
        gen double L2=log(vy)+(y-`muy')^2/vy
        gen double L1=log(vz*vy*(1-r*r))+((z-muz)^2/vz- /*
            */ 2*r*(z-muz)*(y-`muy')/sqrt(vz*vy)+(y-`muy')^2/vy)/(1-r*r)
        replace `lnl'=-0.5*L1 if type==1
        replace `lnl'=-0.5*L2 if type==2
        replace `lnl'=-0.5*L3 if type==3
        drop vz vy r L1 L2 L3
    }
end
*-----------------------------
* Read data
*
*use "..\data\test100",clear
*use "..\data\test1000",clear
use "..\data\mthfr chd.dta",clear
*-----------------------------
* Type denotes the available data
*
gen type=1
replace type=2 if z ==. & y ~=.
replace type=3 if y ==. & z ~=.
*-----------------------------
* Initial guess
*
summarize z
local tauz=r(Var)
local ratio=r(mean)
summarize y
local mu=r(mean)
local tauy=r(Var)
local ratio=`ratio'/`mu'
corr z y
local r=r(rho)
di "Initial Guess " %7.3f `ratio' %7.3f `mu' %7.3f `tauz' %7.3f `tauy' %7.3f `r'
local tauz=log(`tauz')
local tauy=log(`tauy')
local r=log((1+`r')/(1-`r'))
*-----------------------------
* Fit Model A
*
ml model lf LL () () () () ()
ml init eq1:_cons=`ratio' eq2:_cons=`mu' eq3:_cons=`tauz' eq4:_cons=`tauy' eq5:_cons=`r'
ml maximize

APPENDIX 3
Stata program for fitting Model A with specified correlations

************************************************
* LIKELIHOOD ANALYSIS
************************************************
version 8
clear
set more off
program drop _all
*-----------------------------
* Program to evaluate LogL
*
program LL
    args lnl ratio muy tz ty
    quietly {
        scalar tauz=exp(`tz')
        scalar tauy=exp(`ty')
        scalar rho=$R
        scalar muz=`muy'*`ratio'
        gen double vz=tauz+1/wz
        gen double vy=tauy+1/wy
        scalar cov=rho*sqrt(tauz*tauy)
        gen double r=cov/sqrt(vz*vy)
        gen double L3=log(vz)+(z-muz)^2/vz
        gen double L2=log(vy)+(y-`muy')^2/vy
        gen double L1=log(vz*vy*(1-r*r))+((z-muz)^2/vz- /*
            */ 2*r*(z-muz)*(y-`muy')/sqrt(vz*vy)+(y-`muy')^2/vy)/(1-r*r)
        replace `lnl'=-0.5*L1 if type==1
        replace `lnl'=-0.5*L2 if type==2
        replace `lnl'=-0.5*L3 if type==3
        drop vz vy r L1 L2 L3
    }
end
*-----------------------------
* Read data
*
*use "..\data\test100",clear
*use "..\data\test1000",clear
use "..\data\mthfr chd.dta",clear
*-----------------------------
* Type denotes the available data
*
gen type=1
replace type=2 if z ==. & y ~=.
replace type=3 if y ==. & z ~=.
*-----------------------------
* Initial guess
*
summarize z
local tauz=r(Var)
local ratio=r(mean)
summarize y
local mu=r(mean)
local tauy=r(Var)
local ratio=`ratio'/`mu'
corr z y
local r=r(rho)
di "Initial Guess " %7.3f `ratio' %7.3f `mu' %7.3f `tauz' %7.3f `tauy' %7.3f `r'
local tauz=log(`tauz')
local tauy=log(`tauy')
*-----------------------------
* File to collect profile LnL
*
postfile pf r LL b se using "..\data\profile", replace
*-----------------------------
* Loop over r
*
forvalues i=1/20 {
    global R=(`i'-0.5)/20
    ml model lf LL () () () ()
    ml init eq1:_cons=`ratio' eq2:_cons=`mu' eq3:_cons=`tauz' eq4:_cons=`tauy'
    ml max
    matrix cf=e(b)
    local coef=cf[1,1]
    matrix v=e(V)
    local se=sqrt(v[1,1])
    post pf ($R) (e(ll)) (`coef') (`se')
}
postclose pf
*-----------------------------
* Restore data & plot
*
use "..\data\profile", clear
twoway line LL r, xtitle("Correlation") ytitle("Log Likelihood") /*
    */ title("Profile Log Likelihood") /*
    */ saving("..\plots\profile LnL.gph",replace)
twoway line se r, xtitle("Correlation") ytitle("St Error of Ratio") /*
    */ title("Profile St Error Estimates") /*
    */ saving("..\plots\profile se.gph",replace)
twoway line b r, xtitle("Correlation") ytitle("Estimate of Ratio") /*
    */ title("Profile Ratio Estimates") /*
    */ saving("..\plots\profile ratio.gph",replace)
gen z=b/se
twoway line z r, xtitle("Correlation") ytitle("z=estimate/st error") /*
    */ title("Profile Ratio to St Error") /*
    */ saving("..\plots\profile z.gph",replace)

APPENDIX 4
Stata program for fitting Model B

************************************************
* LIKELIHOOD ANALYSIS FOR MODEL B
************************************************
version 8
cd "D:\Research\Genetics\Mendelian Randomisation\meta-analysis\Stata"
clear
set more off
program drop _all
*-----------------------------
* Program to evaluate LogL
*
program LL
    args lnl ratio muy tt ty
    quietly {
        scalar taut=exp(`tt')
        scalar tauy=exp(`ty')
        scalar muz=`muy'*`ratio'
        gen double vz=`ratio'^2*tauy+`muy'^2*taut+1/wz
        gen double vy=tauy+1/wy
        gen double cov=`ratio'*tauy
        gen double r=cov/sqrt(vz*vy)
        gen double L3=log(vz)+(z-muz)^2/vz
        gen double L2=log(vy)+(y-`muy')^2/vy
        gen double L1=log(vz*vy*(1-r*r))+( (z-muz)^2/vz- /*
            */ 2*r*(z-muz)*(y-`muy')/sqrt(vz*vy)+(y-`muy')^2/vy )/(1-r*r)
        replace `lnl'=-0.5*L1 if type == 1
        replace `lnl'=-0.5*L2 if type == 2
        replace `lnl'=-0.5*L3 if type == 3
        drop vz vy r cov L1 L2 L3
    }
end
*-----------------------------
* Read data
*
*use "..\data\testB100",clear
*use "..\data\testB1000",clear
use "..\data\mthfr chd.dta",clear
*-----------------------------
* Type denotes the available data
*
gen type=1
replace type=2 if z ==. & y ~=.
replace type=3 if y ==. & z ~=.
*-----------------------------
* Initial guess
*
summarize z
local tauz=r(Var)
local ratio=r(mean)
summarize y
local mu=r(mean)
local tauy=r(Var)
local ratio=`ratio'/`mu'
gen theta=z/y
summarize theta
local taut=r(Var)
di "Initial Guess " %7.3f `ratio' %7.3f `mu' %7.3f `taut' %7.3f `tauy'
local taut=log(`taut')   /* eq3 (tt) is on the log scale: taut=exp(`tt') */
local tauy=log(`tauy')
*-----------------------------
* Fit Model B
*
ml model lf LL () () () ()
ml init eq1:_cons=`ratio' eq2:_cons=`mu' eq3:_cons=`taut' eq4:_cons=`tauy'
ml maximize
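For readers without Stata, the per-study log-likelihood contribution computed by the LL program for Model B can be sketched in Python as below. This is a hypothetical port for illustration, not the authors' code: the names `study_loglik` and `total_loglik` are ours, `None` stands for Stata's missing value, and, like the Stata program, it drops the constant log(2π) terms and uses the same two-term expression for the genotype-disease variance. Types 1, 2 and 3 correspond to the `type` variable of the appendix.

```python
import math

def study_loglik(z, wz, y, wy, ratio, mu_y, tau_t, tau_y):
    """Model B log-likelihood contribution of one study, up to a constant.

    z, wz: genotype-disease log OR and its inverse-variance weight (z=None if absent)
    y, wy: genotype-phenotype difference and its weight (y=None if absent)
    ratio: log OR per unit phenotype; mu_y: mean genotype-phenotype difference
    tau_t, tau_y: between-study variances of ratio and of the genotype-phenotype difference
    """
    mu_z = mu_y * ratio
    if z is not None:
        vz = ratio**2 * tau_y + mu_y**2 * tau_t + 1.0 / wz  # as in Appendix 4
    if y is not None:
        vy = tau_y + 1.0 / wy
    if z is not None and y is not None:            # type 1: both associations reported
        r = ratio * tau_y / math.sqrt(vz * vy)
        dz = (z - mu_z) / math.sqrt(vz)
        dy = (y - mu_y) / math.sqrt(vy)
        quad = (dz**2 - 2.0 * r * dz * dy + dy**2) / (1.0 - r**2)
        return -0.5 * (math.log(vz * vy * (1.0 - r**2)) + quad)
    if y is not None:                              # type 2: genotype-phenotype only
        return -0.5 * (math.log(vy) + (y - mu_y)**2 / vy)
    if z is not None:                              # type 3: genotype-disease only
        return -0.5 * (math.log(vz) + (z - mu_z)**2 / vz)
    return 0.0                                     # nothing reported

def total_loglik(studies, **params):
    """Sum of contributions; each study is a tuple (z, wz, y, wy)."""
    return sum(study_loglik(*s, **params) for s in studies)

# Hypothetical data: one study of each type
studies = [(0.12, 40.0, 2.1, 5.0), (0.08, 30.0, None, None), (None, None, 1.7, 8.0)]
print(total_loglik(studies, ratio=0.05, mu_y=2.0, tau_t=0.0004, tau_y=0.25))
```

Passing `total_loglik` (negated) to a general-purpose optimiser would mirror what `ml maximize` does in the Stata program.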