Enabling Grids for E-sciencE

Genome Wide Haplotype analyses of human complex diseases with the EGEE grid

Tregouet David – [email protected]
INSERM UMRS937 – UPMC – Paris - France

www.eu-egee.org
EGEE-III INFSO-RI-222667
EGEE and gLite are registered trademarks


Genome Wide Association Studies (GWAS)

• Principle
  Testing the association between a large number (~500K) of single nucleotide polymorphisms (SNPs) and a variable of interest (e.g. a disease) in a large cohort of individuals
• How?
  Estimate the SNP allele frequencies in cases and controls and calculate the corresponding statistical test, yielding a p-value
• SNP definition
  Genetic variation in a DNA sequence that occurs when a single nucleotide (base: A, C, G, T) in a genome is altered. Often treated as a binary 0/1 variable


GWAS' main limits

• Only single-SNP associations are tested
• May miss 'haplotypic' interactions between SNPs located in the same gene (or region)
  – Haplotype: combination of alleles on a given chromosome
  – For example, with 2 SNPs (C/T & G/A) → 4 haplotypes: C G, C A, T G, T A
  One may want to test for differences in haplotype frequencies between cases and controls
  It may happen that only one haplotype is at risk


Genome Wide Haplotype Analysis (GWHAS)

• Is it possible?
  2 SNPs: up to 4 haplotypes (i.e. 00|01|10|11)
  3 SNPs: up to 8 haplotypes (i.e. 000|001|010|011|100|101|110|111)
  In a window (e.g. a gene or a region) of n SNPs, up to 2^n haplotypes
• Yes... but a large number of tests/comparisons have to be carried out to identify which combination of SNPs is the best predictor of the disease


Genome Wide Haplotype Analysis (GWHAS)

Example: in a window of 10 adjacent SNPs, restricting haplotypes to at most 4 SNPs leads to 375 combinations to be tested (45 pairs + 120 triplets + 210 quadruplets):

  [SNP1 + SNP2] [SNP1 + SNP3] ... [SNP1 + SNP10]
  [SNP2 + SNP3] ... [SNP2 + SNP10]
  ...
  [SNP9 + SNP10]

  [SNP1 + SNP2 + SNP3] [SNP1 + SNP2 + SNP4] ... [SNP1 + SNP9 + SNP10]
  [SNP2 + SNP3 + SNP4] ... [SNP3 + SNP6 + SNP8] ...
  [SNP8 + SNP9 + SNP10]

  [SNP1 + SNP2 + SNP3 + SNP4] ... [SNP1 + SNP6 + SNP7 + SNP10] ...
  [SNP7 + SNP8 + SNP9 + SNP10]


Genome Wide Haplotype Analysis (GWHAS)

• GWHAS are possible but are extremely computationally demanding!
• Distribution of the haplotypic calculations on EGEE
  – Development of an easy gLite interface
  – Python & Perl scripts for results visualization


GWHAS on Coronary Artery Disease (CAD)

• WTCCC data: 1926 CAD patients & 2938 healthy controls
• 378,000 SNPs
• Sliding windows approach on each chromosome
  Windows of size 10
  Haplotypes composed of up to 4 SNPs
  1 to 10, 2 to 11, 3 to 12, ....., (n-9) to n
• Search for regions where haplotypes are stronger predictors of CAD risk than SNPs alone


GWHAS on Coronary Artery Disease

• 8.1 million combinations tested in less than 45 days (instead of more than 10 years on a single Pentium 4)
• 29 regions where haplotypes could be better predictors than SNPs alone were identified
• To control for false positives, replication was investigated in about 7000 CAD patients and 7000 controls
• One region on chromosome 6 was confirmed


Nature Genetics doi:10.1038/ng.314


Conclusions

• Genome Wide Haplotype Association Studies are now a reality thanks to the use of Grid technology
• Using EGEE, we were able to identify a cluster of 3 genes where haplotypes are strongly associated with CAD risk (Tregouet et al. Nature Genetics March 2009)
• Possibility to apply such a tool to other human diseases (diabetes, cancer, ...)
• Possibility to use EGEE to investigate interactions between SNPs that are not necessarily in the same gene/region


Credits

UMRS 937
François Cambien, Alexandru Munteanu, Laurence Tiret, Claire Perret, Nilesh Samani, Heribert Schunkert, Inke König, Jeannette Erdmann, Andreas Ziegler, ....

UMR 8623 LRI
Cécile Germain
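
Illustration of the single-SNP test from the GWAS slide. The slides only say that allele frequencies in cases and controls are compared with a statistical test that yields a p-value; they do not name the test. The sketch below assumes a Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts, which is one common choice. The function name `allelic_chi2` and the counts in the example are invented for illustration and are not taken from the study.

```python
from math import erfc, sqrt


def allelic_chi2(case_alt, case_total, ctrl_alt, ctrl_total):
    """Pearson chi-square test (1 df) on allele counts for one SNP.

    case_alt / ctrl_alt: number of chromosomes carrying the allele coded 1.
    case_total / ctrl_total: number of chromosomes (2 per individual).
    Returns (chi2, p_value). Assumed test, not confirmed by the slides.
    """
    table = [
        [case_alt, case_total - case_alt],
        [ctrl_alt, ctrl_total - ctrl_alt],
    ]
    rows = [case_total, ctrl_total]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = case_total + ctrl_total

    # A SNP with a single observed allele (or an empty group) carries no signal.
    if 0 in rows or 0 in cols:
        return 0.0, 1.0

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy example: 100 cases and 100 controls, i.e. 200 chromosomes per group.
chi2, p = allelic_chi2(case_alt=60, case_total=200, ctrl_alt=40, ctrl_total=200)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

In a genome-wide scan this test would be repeated for each of the ~500K SNPs, so the resulting p-values need a multiple-testing correction before interpretation.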
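
Quick check of the counts given in the GWHAS slides. This is a minimal sketch, not the authors' code: it enumerates every combination of 2 to 4 SNPs inside a window of 10 adjacent SNPs and lists the sliding windows along a chromosome of n SNPs. The helper names `snp_combinations` and `sliding_windows` are hypothetical.

```python
from itertools import combinations


def snp_combinations(window, min_size=2, max_size=4):
    """Yield every subset of the SNPs in `window` holding min_size..max_size SNPs."""
    for k in range(min_size, max_size + 1):
        yield from combinations(window, k)


def sliding_windows(n_snps, size=10):
    """Yield windows of 1-based SNP indices: 1..size, 2..size+1, ..., (n-size+1)..n."""
    for start in range(1, n_snps - size + 2):
        yield range(start, start + size)


# One window of 10 adjacent SNPs, numbered 1 to 10 as on the slide.
combos = list(snp_combinations(range(1, 11)))
print(len(combos))  # 375 = 45 pairs + 120 triplets + 210 quadruplets

# Windows on a toy chromosome of 12 SNPs: (1, 10), (2, 11), (3, 12).
for w in sliding_windows(12):
    print(w[0], w[-1])
```

Adjacent windows share 9 of their 10 SNPs, so a naive loop over all windows would revisit many of the same combinations. How the original analysis handled this overlap is not stated in the slides.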