Towards an understanding of Genotype-Phenotype correlations
Paul Fisher et al.

Genotype
The entire genetic identity of an individual that does not show any outward characteristics, e.g. genes, mutations.
[Figure labels: Genes, DNA, Mutations]
ACTGCACTGACTGTACGTATATCT
ACTGCACTGTGTGTACGTATATCT

Phenotype
The observable expression of genes, producing notable characteristics in an individual, e.g. hair or eye colour, body mass, resistance to disease.
[Figure: Brown vs. White and Brown]

Genotype to Phenotype
Current Methods
[Diagram labels: Genotype, 200, ?, Phenotype]

What processes to investigate?
[Diagram labels: Genotype, 200, ?, Phenotype, Metabolic pathways]
• Phenotypic response investigated using microarray, in the form of expressed genes, or evidence provided through QTL mapping
• Genes captured in the microarray experiment and present in the QTL (Quantitative Trait Loci) region
• Microarray + QTL

The Pathway approach
[Diagram labels: Genotype, Pathway(s), Phenotype]
• Obtain a global view of what is happening in the phenotype
• Pathways allow for experimental verification in the lab
• Provides a driving force for functional discovery

[Diagram labels: Genotype, CHR, QTL, Gene A, Gene B, Gene C, Pathway A, Pathway B, Pathway C, literature, Phenotype]
• Pathway linked to phenotype – high priority
• Pathway not linked to phenotype – medium priority
• Pathway not linked to QTL – low priority

Issues with current approaches
• Huge amounts of data
• QTL region on chromosome: 200+ genes
• Microarray: 1000+ genes
• How do I look at ALL the genes systematically?
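The three priority levels from the Pathway approach slides above can be sketched in Python. This is only an illustration under stated assumptions, not the workflow used in the talk: the function name, the argument names, and the reading of the diagram (high = reached from a QTL gene and linked to the phenotype, medium = reached from a QTL gene only, low = not reached from any QTL gene) are all hypothetical.

```python
from typing import Dict, Set


def prioritise_pathways(
    qtl_genes: Set[str],
    expressed_genes: Set[str],
    gene_pathways: Dict[str, Set[str]],
    phenotype_pathways: Set[str],
) -> Dict[str, str]:
    """Label each pathway 'high', 'medium' or 'low' (assumed reading of the slide).

    qtl_genes          -- genes located in the QTL region
    expressed_genes    -- differentially expressed genes from the microarray
    gene_pathways      -- hypothetical gene -> pathways annotation
    phenotype_pathways -- pathways linked to the phenotype (e.g. via literature)
    """
    # Pathways reachable from at least one gene in the QTL region.
    qtl_pathways: Set[str] = set()
    for gene in qtl_genes:
        qtl_pathways |= gene_pathways.get(gene, set())

    # Every pathway touched by either data source.
    all_pathways: Set[str] = set()
    for gene in qtl_genes | expressed_genes:
        all_pathways |= gene_pathways.get(gene, set())

    priorities: Dict[str, str] = {}
    for pathway in sorted(all_pathways):
        if pathway not in qtl_pathways:
            priorities[pathway] = "low"      # not linked to the QTL
        elif pathway in phenotype_pathways:
            priorities[pathway] = "high"     # linked to QTL and phenotype
        else:
            priorities[pathway] = "medium"   # linked to QTL only
    return priorities


# Toy data; the gene and pathway names are placeholders, not real annotations.
example = prioritise_pathways(
    qtl_genes={"GeneA", "GeneB"},
    expressed_genes={"GeneA", "GeneB", "GeneC"},
    gene_pathways={
        "GeneA": {"Pathway A"},
        "GeneB": {"Pathway B"},
        "GeneC": {"Pathway C"},
    },
    phenotype_pathways={"Pathway A"},
)
print(example)
```

Because every gene is handled by the same rule, nothing is filtered by familiarity: a pathway drops to low priority only because no QTL gene maps to it.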
Hypothesis-Driven Analyses
Case: African Sleeping sickness
- parasitic infection
- known immune response

200 QTL genes
Pick the genes involved in immunological process
40 QTL genes
Pick the genes that I am most familiar with
2 QTL genes

Result: African Sleeping sickness
- Immune response
- Cholesterol control
- Cell death
Biased view

Manual Methods of data analysis
• Tedious and repetitive
• No explicit methods
• Human error
• Navigating through hyperlinks
• Implicit methods

Issues with current approaches
• Scale of analysis task
• User bias and premature filtering
• Hypothesis-Driven approach to data analysis
• Constant flux of data - problems with re-analysis of data
• Implicit methodologies (hyper-linking through web pages)
• Error proliferation from any of the listed issues

So what do we want to do?
• Decrease scale of manual analysis task for user
• Limit user bias
• Remove premature filtering
• Data-driven approach to hypothesis generation
• Analyse the data whenever I want or after an update
• Create explicit methodologies that can be re-used
• Reduce the overall errors
Solution – Automate using workflows

PhD - Hypothesis
Utilising the capabilities of workflows and the pathway-driven approach, we are able to provide a more:
- systematic
- explicit
- scalable
- un-biased
The benefit will be that new biology results will be derived, increasing community knowledge of genotype and phenotype interactions.
[Workflow diagram]
• QTL mapping study: identify genes in QTL regions, then annotate genes with biological pathways
• Microarray gene expression study: identify differentially expressed genes, then annotate genes with biological pathways
• Select common biological pathways
• Hypothesis generation and verification
• Other labels: Genomic Resource, Pathway Resource, Statistical analysis, Literature, Wet Lab
Replicated original chain of data analysis

Trypanosomiasis in Africa
Steve Kemp, Andy Brass + many others
http://www.genomics.liv.ac.uk/tryps/trypsindex.html

Results
A strong candidate gene was found
– Daxx gene not found using manual investigation methods
– The gene was identified from analysis of biological pathway information
– Possible candidate identified by Yan et al (2004): Daxx SNP info
– Re-sequencing of the Daxx gene identified mutations
– Mutation was published in scientific literature: effect on the binding of Daxx protein to p53 protein
– p53 plays a direct role in cell death and apoptosis, one of the Trypanosomiasis phenotypes

Shameless Plug!
A Systematic Strategy for Large-Scale Analysis of Genotype-Phenotype Correlations: Identification of candidate genes involved in African Trypanosomiasis
Fisher et al. (2007), Nucleic Acids Research, PubMed ID: 17709344
• Explicitly discusses the methods we used for the Trypanosomiasis use case
• Discusses the results for Daxx and shows the mutation
• Sharing of workflows for re-use, re-purposing

Recycling, Reuse, Repurposing
Here’s the e-Science!
• Trypanosomiasis mouse workflow reused without change in Trichuris muris infection in mice
• Identified biological pathways involved in sex dependence
• Previous manual two-year study of candidate genes had failed to do this
• More to follow with additional data
• Additional workflows constructed for looking at cattle and human
• Used mouse workflows as basis for development
• 1 web service changed in entire workflow (BioMart)
• Exactly the same methods

Recycling, Reuse, Repurposing
• Share
• Search
• Re-use
• Re-purpose
• Execute
• Communicate
• Record
http://www.myexperiment.org/
Prove your methods can be replicated ... and share to get recognition for your work

What next?
• More use cases for QTL and microarray
– African Trypanosomiasis
– Trichuris muris
– Possibly Lung cancer ???
• Text Mining !!!
– Aid biologists in identifying novel links between pathways
– Link pathways to phenotype through literature

[Workflow diagram repeated: QTL mapping study and microarray gene expression study, through pathway annotation and selection of common biological pathways, to hypothesis generation and verification]
[Pathway prioritisation diagram repeated (high, medium, low priority), annotated DONE MANUALLY]

It can’t be that hard, right?
• PubMed contains ~17,787,763 citations to date
• Manually searching is tedious and frustrating
• Can be hard finding the links
Computers can help with data gathering and information extraction – that’s their job !!!

What Does the Text Hold?
• Protein Info
• Related Proteins
• Protein-Protein Interactions
• Pathways
• Biological processes

What Next?
Biological processes
Generate a Profile for Pathway / Phenotype
• Apoptosis
• Cell Death
• Stress response
• ……..
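Generating a term profile for a pathway or phenotype, as listed above, can be illustrated with a small Python sketch. It assumes a profile is simply a count of how often each vocabulary term occurs in a set of abstracts; the slides do not describe the actual text-mining method, so the function name, the vocabulary, and the two toy abstracts are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List


def term_profile(abstracts: Iterable[str], vocabulary: List[str]) -> Counter:
    """Count whole-phrase, case-insensitive occurrences of each term."""
    profile: Counter = Counter()
    for text in abstracts:
        lowered = text.lower()
        for term in vocabulary:
            # \b keeps 'apoptosis' from matching inside a longer word.
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            hits = len(re.findall(pattern, lowered))
            if hits:
                profile[term] += hits
    return profile


# Made-up abstracts standing in for text retrieved for one pathway.
toy_abstracts = [
    "Pathway X regulates apoptosis and cell death.",
    "The stress response can trigger apoptosis.",
]
toy_vocabulary = ["apoptosis", "cell death", "stress response"]

profile = term_profile(toy_abstracts, toy_vocabulary)
print(profile.most_common())
```

Profiles built the same way for a phenotype and for each pathway can then be compared term by term.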
Score and Rank Terms
[Diagram: phenotype terms and pathway terms (Apoptosis, Cell Death, JNK pathway, Cholesterol, Diabetes, Another pathway) compared to find common terms; example scores shown: 13.27, 28.35, 0.15]
Score pathway links based on occurrence of phenotype term in pathway abstracts

The Workflows

Preliminary results – a preview
• Glycolysis, reactive oxygen species, alternatively activated macrophages

Sample of ranked workflow results (Parasite)
Term                        Score
glycolysis                  156.87
ATP                         107.24
antimycin                   102.53
glycolytic enzymes           93.27
apoptosis                    89.17
reactive oxygen              85.02
oxidative stress             80.25
glycolytic intermediates     67.31
H2O2                         64.02

[Diagram labels: Parasite, macrophage, TH1, TH2, IFN-Gamma, Reactive oxygen species (NO), Glycolysis, Alternative macrophage]
N.B. It’s not as linear as this !!!

Text Mining
• A means of assisting the researcher
– Time
– Effort
– Narrow searches
• Hypothesis generation and verification
– Suggested links
– Limited corpus, but it’s specific
NOT A REPLACEMENT FOR DOMAIN EXPERTISE

The Final Result
[Diagram labels: Genotype, Pathway(s), Phenotype]
Tools (workflows) to allow easier transition between genotype and phenotype

Many thanks to (including): Joanne Pennock, EPSRC, OMII, myGrid, and lots more people