Computational tools for disease gene identification
Sonia ABDELHAK, PhD
Molecular Investigation of Genetic Orphan Disorders
Institut Pasteur de Tunis

Summary
• How could we identify genes involved in human disorders?
• Positional cloning in the pre-genomic era.
• Monogenic/multifactorial diseases.
• Computational tools: positional cloning in the post-genomic era.

Monogenic versus Complex Diseases: Genes & Environment
[Figure: environmental effect versus genetic component. Source: S.K. Brahmachari, GENOMED-HEALTH meeting]

What could we learn from disease gene identification?
• Better understanding of the underlying biology of the trait in question
• Serve as direct targets for better treatments
• Pharmacogenetics
• Interventions
• Predictions of susceptibility to the disease
• Predictions of the course of the disease
• Knowledge for treatment or prevention

“SIMPLE” MENDELIAN GENETIC DISEASES
Diseases of Simple Genetic Architecture
• Can tell how the trait is passed in a family: follows a recognizable pattern (Mendelian disease)
• One gene altered per family (exceptions)
• Usually quite rare in the population (exceptions)
• “Causative” gene

Some examples of deleterious mutations
• Stop codon creation: CAG (Gln) → TAG (stop)

Modes of inheritance
• X linked: Duchenne muscular dystrophy
• Autosomal dominant: Huntington disease
• Autosomal recessive: Cystic fibrosis
• Mitochondrial: Leber optic atrophy

Functional cloning versus positional cloning of genes
• Functional cloning: Disease → Function/Protein → Gene → Chromosomal localisation
• Positional cloning: Disease → Chromosomal localisation → Gene → Function/Protein

Position-Independent Methods
Gene-specific oligonucleotides: hemophilia A, Factor VIII gene (most common form of hemophilia, X-linked). The clotting factor was purified from pig, and its N-terminal amino acids were sequenced. This allowed a group of oligonucleotides to be synthesized. These probes were used in colony hybridization against a cDNA library.
Positional cloning of genes
Disease → Chromosomal localisation → Gene → Function/Protein

Steps:
• Identification of informative families
• Genetic mapping
• Physical mapping
• Identification of coding sequences (candidate genes)
• Mutation screening
• Functional analysis

Mutation screening example:
normal:  ... CCT GAG GAG ...   ... Pro Glu Glu ...
mutated: ... CCT GTG GAG ...   ... Pro Val Glu ...

Genetic mapping
What are the markers that are used for genetic mapping?

Polymorphisms used in Gene Mapping
• 1980s: RFLP marker maps
• 1990s: microsatellite marker maps

Identification of microsatellite polymorphisms by sequence analysis

IL-12p35AC F
tggtggcagaaatcattgtctgaaaagtaattgttttacttttattcttttcgtgtgtgtgtgtgt
gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgcatgtgccagatttcttgtttgaaaggcaat
gagcttcatccaagtatcaa
78.57%
IL-12p35AC R

IL-12p40AC F
atttcaggtgtgagccactgtgcctggccagaactttttcaatgaatattcaagataattgtata
cacattttatatatatatatatatatacacacacacacacacacacatatgtatacacaca
ttatatatataatccatgttatatacatctctacattatatatatccactatatatattttacttataca
tatagattttatttttatgaactaggatcaaattgta
69.23%
IL-12p40AC R

[Figure: microsatellite genotyping gel; lanes 1 to 5, allele sizes 174, 170, 166]

SNPs in Genetic Analysis
• Abundance: lots
• Position: throughout the genome
• Haplotype patterns: groups of SNPs may provide exploitable diversity
• Rapid and efficient to genotype
• Increased stability over other types of mutation

Gene mapping: Linkage analysis
Do marker alleles co-segregate with the disease by chance, or are they linked to the underlying gene?

Crossing over and Recombination
Recombination fraction $\theta$:
• $\theta = 1/2$: independent assortment (Mendel)
• $\theta < 1/2$: linked loci
• $\theta = 0$: tightly linked loci (no recombination)

LOD Score Analysis
The likelihood ratio as defined by Morton (1955):

$$\mathrm{L.R.} = \frac{L(\text{pedigree} \mid \theta = x)}{L(\text{pedigree} \mid \theta = 0.50)}$$

where $\theta$ represents the recombination fraction and where $0 \le x \le 0.49$.

When all meioses are “scorable”, the L.R. is constructed as:

$$\mathrm{L.R.} = \frac{\theta^{R}\,(1-\theta)^{NR}}{(0.5)^{N}}$$

where $R$ and $NR$ are the numbers of recombinant and non-recombinant meioses and $N = R + NR$.

The LOD score is $z = \log_{10}(\mathrm{L.R.})$.
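The LOD score for fully scorable meioses can be sketched in a few lines of Python. This is a minimal illustration of the formula above, not code from the presentation: the function names, the grid step and the example counts (2 recombinants, 8 non-recombinants) are assumptions chosen for demonstration.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD score z(theta) = log10( theta^R (1-theta)^NR / 0.5^N ).

    Assumes every meiosis is scorable, as in the slide's simplified formula.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + nonrecombinants

    def log_term(count: int, p: float) -> float:
        # count * log10(p), with the conventions 0 * log(0) = 0 and log(0) = -inf
        if count == 0:
            return 0.0
        if p == 0.0:
            return -math.inf
        return count * math.log10(p)

    return (log_term(recombinants, theta)
            + log_term(nonrecombinants, 1.0 - theta)
            - n * math.log10(0.5))


def max_lod(recombinants: int, nonrecombinants: int, step: float = 0.01):
    """Grid search for the theta in [0, 0.5] that maximises the LOD score."""
    grid = [i * step for i in range(int(round(0.5 / step)) + 1)]
    best = max(grid, key=lambda t: lod_score(recombinants, nonrecombinants, t))
    return best, lod_score(recombinants, nonrecombinants, best)


if __name__ == "__main__":
    # Hypothetical family data: 2 recombinant and 8 non-recombinant meioses.
    theta_hat, z_max = max_lod(2, 8)
    print(f"theta_hat = {theta_hat:.2f}, z_max = {z_max:.3f}")
```

With these hypothetical counts the maximum falls at the observed recombinant proportion, 2/10. Working in log space avoids underflow when the number of meioses is large.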
H1: Linkage
• $z(\theta)$ is the lod score at a particular value of the recombination fraction
• $z(\hat{\theta})$ is the maximum lod score, which occurs at the MLE of the recombination fraction
H0: Exclusion =0

Positional cloning workflow
• Identification of informative families
• Cytogenetic anomalies
• Animal model
• Genetic mapping
• Physical mapping
• Identification of coding sequences (candidate genes)
• Functional candidate genes
• Mutation screening
• Functional analysis

Mutation screening example:
normal:  ... CCT GAG GAG ...   ... Pro Glu Glu ...
mutated: ... CCT GTG GAG ...   ... Pro Val Glu ...

1 to 10 years!

Branchio-oto-renal syndrome
• Clinical features: deafness, renal anomalies, cervical cysts…
• Mapped to 8q13.
• PAC contig [Figure: PAC clones 11083, 9480, 4405, 10910]
• cDNA library screening, cDNA selection and exon trapping

Sequencing of the PAC clones
PAC (P1 derived) → sonication or partial digestion → subcloning in pBCSK+ (T7, T3) → selection of clones → sequencing (T7, T3) → sequence assembly and analysis

The different steps used for sequence analysis
• Quality assessment [Figure: sequence trace, A G C T A T]
• Elimination of contaminating sequences: Blastn against vector, bacteria, yeast… databases
• Assembly using Phred, Phrap, Consed
• Identification of candidate genes by blastx and tblastx
• Gene prediction tool: GRAIL

[Figure: PAC clones 11083, 9480, 4405, 10910]

BLASTX 1.4.7 [19-Dec-94] [Build 07:11:56 Jun 16 1995]
Query= w1g9t7.Seq (743 letters)
Translating both strands of query sequence in all 6 reading frames
Database: ../../databases/fasta/nrprot
          244,544 sequences; 71,258,360 total letters.
Searching..................................................done

                                                                 Smallest Sum
                                                        Reading  High  Probability
Sequences producing High-scoring Segment Pairs:         Frame    Score P(N)     N
pir|S|A45174 eyes absent (eya) protein (alternatively... -2      173   5.6e-15  1

>pir|S|A45174 eyes absent (eya) protein (alternatively spliced) - fruit fly (Drosophila melanogaster) >gp||DRONOEYE_
Length = 760
Minus Strand HSPs:
Score = 173 (79.6 bits), Expect = 5.6e-15, P = 5.6e-15
Identities = 29/36 (80%), Positives = 34/36 (94%), Frame = -2
Query: 169 LCLPXGVRGGVDWMRKLAFRYRRVKEIYNTYKNNVG 62
           LCLP GVRGGVDWMRKLAFRYR++K+IYN+Y+ NVG
Sbjct: 586 LCLPTGVRGGVDWMRKLAFRYRKIKDIYNSYRGNVG 621

EYA1 gene structure
[Figure: EYA1 gene structure; exons -1 to 16, introns -I to XV]

Identification of a new gene family: EYA1, EYA2, EYA3, …

COMPLEX (MULTIFACTORIAL) GENETIC DISEASE
Diseases of Complex Genetic Architecture
• No clear pattern of inheritance
• Moderate to strong evidence of being inherited
• Common in population: cancer, heart disease, dementia etc.
• Involves many genes and environment
• “Susceptibility” genes

Complex disease loci mapping: Study Designs
• Linkage Analysis: Large Families, Small Families
• Association Studies: Family-Based, Case-Control

TDT calculation
[Figure: family trio with genotypes 12, 12, 11]
Transmitted versus non-transmitted alleles (1 and 2) are counted in a 2 x 2 table with cells A, B (first row) and C, D (second row).

$$\mathrm{TDT} = \frac{(B - C)^2}{B + C}$$

With > 5 per cell, this follows a $\chi^2$ distribution with 1 df.

Example: Alzheimer’s disease and ApoE

            E4 present   E4 absent
Patients        58           33
Controls        16           55

The E4 allele appears to be positively associated with Alzheimer’s disease: Odds Ratio = (58/16)/(33/55) ≈ 6

[Figure: human genome sequence milestones: February 2001; « Finished » sequence; April 1953-April 2003]

Positional cloning workflow (recap): identification of informative families, genetic mapping, physical mapping, identification of coding sequences (candidate genes), mutation screening (normal: CCT GAG GAG, Pro Glu Glu; mutated: CCT GTG GAG, Pro Val Glu), functional analysis.

Past and present tools
• Genetic mapping
• Physical mapping
• Cytogenetic abnormalities
• Animal models
• Positional and functional candidates
• Genome databases and genome browsers
• Comparative Genome Hybridization
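The two association statistics from the slides on complex disease mapping, the TDT and the case-control odds ratio, can be reproduced with a short Python sketch. The function names and the TDT counts (B = 30, C = 15) are hypothetical and chosen only for illustration; the ApoE counts are the ones given in the Alzheimer’s example. The p-value uses the identity that, for 1 degree of freedom, the chi-square tail probability equals erfc(sqrt(x/2)).

```python
import math


def tdt_statistic(b: int, c: int) -> float:
    """TDT = (B - C)^2 / (B + C), using the two discordant cells of the 2 x 2 table."""
    if b + c == 0:
        raise ValueError("no informative transmissions (B + C = 0)")
    return (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def odds_ratio(case_exposed: int, case_unexposed: int,
               control_exposed: int, control_unexposed: int) -> float:
    """Odds ratio (a/c) / (b/d), written as the cross-product a*d / (b*c)."""
    return (case_exposed * control_unexposed) / (case_unexposed * control_exposed)


if __name__ == "__main__":
    # Hypothetical discordant transmission counts, not taken from the slides.
    stat = tdt_statistic(30, 15)
    print(f"TDT = {stat:.2f}, p = {chi2_1df_pvalue(stat):.4f}")

    # ApoE E4 counts from the Alzheimer's example: patients 58/33, controls 16/55.
    print(f"odds ratio = {odds_ratio(58, 33, 16, 55):.2f}")
```

The odds ratio computed this way is about 6.04, consistent with the rounded value of 6 on the slide.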
Past and present tools (continued)
• Comparative Genomics
• Microarray analysis

Genome browsers
• NCBI genome browser: visualize all the genes in an interval
• UCSC genome browser
• Ensembl genome browser
[Figure: NCBI genome browser showing candidate region for EV]

How to collect and interpret all the data?
How to choose the best “candidate” gene?
Strategies and adapted tools for gene selection are urgently needed!

Find candidate genes for the trait (time and cost!)
• WHAT genes are there?
• WHAT do they do?
• How could they play a role in the disease?
= Data mining and integration!!

Visualization of the whole picture
• Global view
• Option to zoom into detail
http://www.esat.kuleuven.be/endeavour

Disease Gene Finding (Center for Biological Sequence Analysis)
Combining network theory and phenotype associations in an automated large scale disease gene finding platform
• Networks: deducing functional relationships from network theory
• Phenotype association: grouping disorders based on their phenotype

Phenotype association
Phenotype clustering: word vectors
• Each arrow represents a KEYWORD vector.
• The components in a keyword vector correspond to terms in the document.
• Vectors that point in the same direction are more alike.

Ordering phenotypes in “syndrome families” could tell us about the relationships of the underlying genes (Brunner and van Driel 2004).
• Disease gene identification.
• Clues to gene interactions, pathways and functions.

%608389 BRANCHIOOTIC SYNDROME 3; 14q23.1; SIX1

SIX1 mutations cause branchio-oto-renal syndrome by disruption of EYA1-SIX1-DNA complexes.
Ruf RG, Xu PX, Silvius D, Otto EA, Beekmann F, Muerb UT, Kumar S, Neuhaus TJ, Kemper MJ, Raymond RM Jr, Brophy PD, Berkman J, Gattas M, Hyland V, Ruf EM, Schwartz C, Chang EH, Smith RJ, Stratakis CA, Weil D, Petit C, Hildebrandt F.
Department of Pediatrics, University of Michigan, Ann Arbor, MI 48109, USA.

Urinary tract malformations constitute the most frequent cause of chronic renal failure in the first two decades of life. Branchio-otic (BO) syndrome is an autosomal dominant developmental disorder characterized by hearing loss. In branchio-oto-renal (BOR) syndrome, malformations of the kidney or urinary tract are associated. Haploinsufficiency for the human gene EYA1, a homologue of the Drosophila gene eyes absent (eya), causes BOR and BO syndromes. We recently mapped a locus for BOR/BO syndrome (BOS3) to human chromosome 14q23.1. Within the 33-megabase critical genetic interval, we located the SIX1, SIX4, and SIX6 genes, which act within a genetic network of EYA and PAX genes to regulate organogenesis. These genes, therefore, represented excellent candidate genes for BOS3. By direct sequencing of exons, we identified three different SIX1 mutations in four BOR/BO kindreds, thus identifying SIX1 as a gene causing BOR and BO syndromes. To elucidate how these mutations cause disease, we analyzed the functional role of these SIX1 mutations with respect to protein-protein and protein-DNA interactions. We demonstrate that all three mutations are crucial for Eya1-Six1 interaction, and the two mutations within the homeodomain region are essential for specific Six1-DNA binding. Identification of SIX1 mutations as causing BOR/BO offers insights into the molecular basis of otic and renal developmental diseases in humans.
PMID: 15141091 [PubMed - indexed for MEDLINE]

Computational tools for disease gene identification
Application to EV and T2D
Olfa MESSAOUD and Manel BALI

Tools covered: GENESEEKER, DGP, PROSPECTR, SUSPECTS, G2D, TOM

GeneSeeker
http://www.cmbi.ru.nl/geneseeker/
• Web tool
• Gathers and combines data from several databases (MIMMAP, MGD, GDB etc.)
• Selects positional candidate genes according to their expression and phenotypic data from both human and mouse.
[Figure: a general overview of the GeneSeeker program]
[Figure: output of the GeneSeeker program]

G2D = Genes to Diseases
http://www.ogic.ca/projects/g2d_2/
• Scoring all terms in GO according to their relevance to each disease using MEDLINE and RefSeq.
• Identifying candidate genes by performing BLASTX searches.
[Screenshot: G2D query; values shown include 131244, q13.2, Band(s), 1, 63950000, 73950000, Databases used, 3667, 3630, 3767]

DGP = Disease Gene Prediction
http://cgg.ebi.ac.uk/services/dgp/
• A decision tree-based model built on sequence properties.
• This model is then applied to all the genes in the disease loci analysed in order to obtain a probability score for these proteins to be involved in hereditary disease.
[Screenshot: DGP query; interval 22500000 to 33200000]

PROSPECTR
http://www.genetics.med.ed.ac.uk/prospectr/
• Automatic classifier based on sequence features, using the alternating decision tree algorithm, which ranks genes in order of likelihood of involvement in disease.
• Score > 0.5: likely to be involved
• Score < 0.5: unlikely to be involved

SUSPECTS
http://www.genetics.med.ed.ac.uk/suspects/
• Web-based server.
• Builds on PROSPECTR (sequence features) and combines annotation data (from GO, InterPro and expression libraries).
[Screenshot: SUSPECTS query; q21.1, 1]

TOM = Transcriptomics of OMIM
http://www-micrel.deis.unibo.it/~tom
• An automated pipeline for the extraction of the best candidate genes for a given genetic disease.
[Figure: global description of the process]
• The second option (two loci option) is designed for poorly characterized diseases when no specific gene is known a priori. At least 2 linkage areas need to be present. (Looks for pairs that have similar expression and functional profiles.)
[Figure: the results page (genes and GO annotation)]

Application
• A monogenic disorder: Epidermodysplasia verruciformis
• A multifactorial disorder: Type 2 diabetes

Epidermodysplasia verruciformis (EV)
• Genetic skin disease (genodermatosis)
• Predisposition to skin cancer
• High susceptibility to human papillomavirus (HPV)
[Figure: genomic organisation of the EV1 locus (Ramoz et al., 2002)]
[Figure: haplotypic analysis of microsatellites]

(A) Sources of input data for each method, (B) number of genes in the starting candidate set and number of genes selected by each method

[Table A: input sources per method (GeneSeeker, DGP, Prospectr, Suspects, G2D, TOM): PubMed abstracts, sequence data, GO annotation, protein data, expression libraries, orthologous mouse genes, OMIM; the per-method X marks did not survive extraction]

Table B: number of genes selected

Methods                           GeneSeeker  DGP  Prospectr  Suspects  G2D  TOM
EV   Starting set of candidates       85       85      85        85      85   85
EV   Selected genes                   11       37      40        45      20   54
T2D  Starting set of candidates      260      260     260       260     260    ?
T2D  Selected genes                   24       76      14        26       3    ?

Comparison between results obtained by each method
Columns: Personal annotation, GeneSeeker, DGP, PROSPECTR, SUSPECTS, G2D, TOM (flattened; row and column alignment not preserved)
(SLC30A6) ALK HADHB SPG4 OTOF BFSP2 LBH BIRC6 CARD12 SNX17 HADHA KCNK3 KRT19 KHK SLC5A6 MSH2 NULL OTOF XDH KRT12 FOSL2 GTF3C2 PDE1C LBH CAD SLC5A6 KRT18 KRT18 PREB POMC BIRC6 GALNTM4 CAD GFAP PPP1CB KCNK3 PPM1G SLC5A6 KCNK3 HADHB NEF3 GTF3C2 NRBP1 SDC1 POMC SLC5A6 SPG4 KRT23 HADHA SELI (SLC23A3) HADHA KIF3C NP056477 KRT33B KRTCAP3 RAB10 SMARCAD1 XDH RNF30 HADHA KRT1 FLJ20254 SOS1 SRD5A2 MAPRE3 RAB10 KRT14 XDH SRD5A2 EIF2B4 DPSYSL5 CENPA KRT35 HADHB OTOF ALK EHD3 KRT14 PPP1CB SPG4 XDH GALNT14 KRT15 HIBCH

Conclusion
• Several promising computational tools
• Need for more accurate methods

Thank you!

Some References and H-References
For a good review see: Nucleic Acids Res. 2006 Jun 6;34(10):3067-81.
kc.vanderbilt.edu/quant/Seminar/StatGen02-2006.ppt
http://www.cbs.dtu.dk/
http://www.bios.niu.edu/johns/humgen/Finding_Disease_Genes.ppt