Thinking about genetic variation and biomedical research
Peter Little

Overview
• The nature of human differences
  – Embracing variation
• How does genetic variation influence gene expression in individuals?
  – Models
  – How to study and how to define regulatory genetic variation (RGV)
• Mouse models of RGV effects within and between individuals
• Human functional variation
• Positioning RGV in human studies

Cis-acting variations
[Figure: basal and conditional cis-acting variation]

Detecting cis effects by genetics
• Look for statistically significant associations between the amount of mRNA and variations IN THE GENE
• 25,000 genes: roughly 25,000 comparisons
• Achievable: P values of 10⁻³ to 10⁻⁶ are significant

Trans-acting RGV
[Figure: basal and conditional trans-acting variation affecting Gene 1, Gene 2 and Gene 3]

Detecting trans variation
• Look for statistically significant associations between the amount of mRNA and variations IN THE WHOLE GENOME
• 25,000 genes times 1,000,000 variations
• >10¹⁰ comparisons
• NOT achievable: the sample size would have to be huge to reach P values <10⁻⁹ or 10⁻¹⁰
• So how do you do it?

Shared control of gene expression
• All genes share some control components (basal/conditional)
• (Regulation is not necessarily transcriptional)
• Genetic variation in regulators will result in correlated changes in the expression levels of multiple genes

Experimental design
[Figure: DBA and C57 mouse strains]
• Age, sex, feed and light matched to limit environmental variability
• Differences are therefore defined as genetic
Expression differences?
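The comparison counts on the cis and trans detection slides can be turned into per-test significance thresholds. The sketch below assumes a Bonferroni correction at a family-wise error rate of 0.05; the talk does not say which correction was used, so the thresholds are illustrative only.

```python
# Minimal sketch: number of tests and per-test thresholds for cis vs trans scans.
# Assumptions (not stated in the talk): Bonferroni correction, family-wise alpha = 0.05.

N_GENES = 25_000          # expressed genes tested (figure from the slides)
N_VARIANTS = 1_000_000    # genome-wide variations for a trans scan (figure from the slides)
ALPHA = 0.05              # assumed family-wise error rate


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Per-test P value needed to keep the family-wise error rate at alpha."""
    return alpha / n_tests


# Cis scan: roughly one test per gene (variants inside that gene).
cis_tests = N_GENES
# Trans scan: every gene tested against every variant in the genome.
trans_tests = N_GENES * N_VARIANTS

if __name__ == "__main__":
    print(f"cis:   {cis_tests:.1e} tests, threshold {bonferroni_threshold(cis_tests):.1e}")
    print(f"trans: {trans_tests:.1e} tests, threshold {bonferroni_threshold(trans_tests):.1e}")
```

Under these assumptions the cis threshold (about 2 × 10⁻⁶) falls at the stringent end of the achievable range, while the trans threshold (about 2 × 10⁻¹²) is stricter still than the 10⁻⁹ to 10⁻¹⁰ quoted, which only strengthens the case that a direct trans scan needs a very large sample.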
• Compare DBA and C57 mRNA levels by replicated microarrays
  – Brain, kidney, liver
• ~6,000 genes are expressed in all three tissues
• mRNA levels of 755 genes differ
• These differences are, by definition, influenced by regulatory genetic variations
• These genes are the focus of our study

Microarrays on each of 31 BXD mice
• BXD mice carry a mix of DBA- and C57-derived alleles and are homozygous
• 31 BXD lines (genetically typed)
• Brain, kidney and liver mRNA levels measured by microarrays
[Figure: mRNA levels of genes G1 to Gn across recombinant inbred lines RI1–32 in brain, kidney and liver, converted into a Spearman's correlation matrix]
• Compute correlations between all pairwise combinations of genes

Correlations make networks
[Figure: genes G1 to G9 linked into networks]
• Construct networks conditional on |ρ| > threshold

A correlating group of genes ("CGG")
• A group not connected to other genes at the threshold correlation
  – We are not trying to imply any biological meaning for the correlation threshold
• Biologically, we expect a continuum of effects from 0 to 1
• A convenient analytic tool (it defines groups for analysis)
• Threshold based upon optimization of CGG number, size and degree; tested by simulation

Correlation threshold 0.775: network of 212 genes
[Figure: correlation 0.775 network, 212 genes]

Unpredictable shared behaviour
• The same genes exhibit shared behaviour
• The behaviour they share is NOT the same
• In different tissues
• In different individuals

Shared and unique influences upon mRNA
• What proportion of an individual gene's variation can be explained by shared influences?
• Cis-acting variations: 15–40%
• For each CGG, compare individual mRNA levels to the average behaviour of the CGG

• 6 are ribosomal proteins, 2 are ribosomal protein/ubiquitin fusions
• 13 are involved in carbohydrate metabolism, 5 in signalling and 4 in transport
Function is not the main organiser

RGV compared to structural variation?
• RGV is more complex
  – RGV causes changes to GROUPS of molecules
  – The same gene(s) can behave differently in different tissues of the same individual
  – Structural variation is present in all tissues in which the gene(s) is expressed
• Study of (disease) expression
  – Using surrogate tissues for analysis is not feasible
  – Difficult implications for human research

Placing genetic variation into human context
• How common is human variation?
• More common than most realise

Human variation: Craig Venter and James Watson
• Existing SNPs: Venter 2.8 million, Watson 2.72 million
• Novel SNPs: Venter 0.74 million, Watson 0.61 million
• SNPs that code for a changed amino acid: Venter 3,882, Watson 3,766
• 44% of Venter's genes were heterozygous for one or more variants
• SNPs (single nucleotide polymorphisms) are single-base differences

Frequency of amino acid changes
• The HapMap data
• Caucasian, Yoruba and Han population samples
• 46% of genes contain at least one amino acid change in >5% of individuals
• 30% of genes have a variant found in >25%
• 18% of genes have a variant in >40%

Functional? (Boyko et al 2008)
• PolyPhen: likely consequence to protein function of a nonsynonymous SNP (Ramensky et al 2001)
• 27–29% of changes functionally neutral or nearly neutral
• 30–42% moderately deleterious
• 29–43% highly deleterious or lethal
• Venter: ~22% chance of isolating a variant, 16% chance of a mildly or highly deleterious version
Boyko, A.R. et al. (2008) Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 4: e1000083

Functional?
(Chun & Fay 2009)
• PolyPhen, SIFT, LRT: likely consequence to protein function of a non-synonymous SNP
• Compared Watson, Venter and a Chinese individual
• 796–837 deleterious changes per individual
  – (5% cross-validated)
• ~58% have >5% population frequency (in HapMap data)
• ~4% of genes in each individual

Functional variation in humans
• HapMap data are mainly used as anonymous markers in GWAS
• Common functional variants in functional classes of proteins (Rohan Williams, ANU, Australia)
  – Transcription factors
  – Lipid synthesis
  – Glucose homeostasis, etc.
• Use these to re-analyse GWAS based upon "hypotheses"

RGV candidates?
http://www.hapmap.org/
Xie et al 2009
• 440 TF motifs in the genome
• Search ±2 kb of the TSS
• 1–134 different motifs per gene
• Median 18, mean 23
• ~5% contain a SNP
http://motifmap.ics.uci.edu

Does RGV matter?
• Cancer biomarkers
• 9 FDA approved
• 34 more in use
• If RGV is irrelevant, expect biomarkers to be randomly distributed with respect to RGV in the genes that encode them
• CEACAM5, TG, AFP, KIT, CGB, PGR, EGFR, ERBB2, KLK3, CFH, MUC1, CSH1, ALPP, CHGA, KLK6, MSLN, ESR1, PRL, KLK11, POMC, KLK7, KLK8, VIP, INS, GAS, CSF1, KLK10, PTHLH, VEGFA, CALCA, IL2RA, GH2, SST and KLK5

Cancer biomarkers
[Figure: cancer biomarkers against RGV data for 8,000 human genes in 233 individuals (Morley et al 2004)]
Little, Williams, Wilkins. Trends Biotechnol. 2009

What do human phenotypes look like from the standpoint of genetic variation?
• Genome-wide association studies
  – 4 September 2009: 386 publications
  – (http://genome.gov/gwastudies/)
• Multiple variants of low effect
• Very limited or no prognostic/diagnostic value (odds ratios 1.01–1.2)

The differences between individuals
Based upon disease analysis, they:
• Are the product of multiple small effects
• Involve 10s to 100s of influences on each phenotype
• No two people with the same phenotype will have the same set of causative variations
• SNPs with population frequency >0.05: taking 0.05 as the frequency, the chance of carrying one particular set of such variants (assuming independence) is
  – 10 variants: 0.05¹⁰ ≈ 9.8 × 10⁻¹⁴
  – 100 variants: 0.05¹⁰⁰ ≈ 7.9 × 10⁻¹³¹

How do we deal with this?
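The two probabilities on the differences-between-individuals slide come from raising the variant frequency to the number of variants carried. The sketch below assumes every variant has a frequency of exactly 0.05 and that variants are inherited independently; neither assumption is stated in the talk, and linkage between nearby SNPs would break the second one.

```python
import math

# Minimal sketch. Assumptions: each causative variant has a population
# frequency of exactly 0.05, and variants are inherited independently.
FREQ = 0.05


def prob_of_specific_set(n_variants: int, freq: float = FREQ) -> float:
    """Chance that one individual carries one particular combination of n variants."""
    return freq ** n_variants


def log10_prob_of_specific_set(n_variants: int, freq: float = FREQ) -> float:
    """Same quantity on a log10 scale, which avoids float underflow for large n."""
    return n_variants * math.log10(freq)


if __name__ == "__main__":
    for n in (10, 100):
        print(n, prob_of_specific_set(n), round(log10_prob_of_specific_set(n), 2))
```

Working on a log10 scale keeps the calculation usable for the hundreds of influences per phenotype mentioned on the slide, where direct multiplication of floating-point numbers would eventually underflow to zero.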
[Figure: probability of association of SNPs with schizophrenia]
Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. The International Schizophrenia Consortium, Nature 2009, doi:10.1038/nature08185

How extensive is individuality?
• The paradigm that there are (genetic) deviations from "normality" is dead
• Transcription networks
• Protein abundance
• Protein activity?
• Protein/protein interaction networks?
• Metabolism
• How do we develop a framework for probabilistic mechanisms in biology?

Some challenges
• How to treat probabilistic groups as classifiers or in association analysis?
• Combinatorial difficulty in defining "best" groups
• Defining probabilistic networks
  – Display?
  – Analysis?

Some benefits
The behaviour of groups of molecules:
• Is statistically very robust
• Redefines statistical power in association analysis
• Opens the potential for significant studies at the scale of 10–20 samples
• Partially alleviates the multiple testing problem

Human variation in biomedical research
• Routinely check for common functional variation in individual genes/proteins
• Recognize powerful genetic background effects
  – Control by common variations? (common transcription factor variations)
  – Probabilistic controls?

Thinking about Singaporeans
• Molecular group behaviour and ethnic differences

Disease death rate globally
Lopez et al (2006) Lancet 367: 1747

Some cancer predispositions in Asians
• High nasopharyngeal cancer
• Low chronic lymphocytic leukemia
• High esophageal cancer
• High liver cancer
• Low prostate cancer
• …
• Environmental?
• Genetic?
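The correlating groups of genes (CGGs) from the mouse work, whose group behaviour underlies the benefits listed above, can be sketched in three steps: rank-correlate every pair of genes across lines, keep an edge wherever |ρ| exceeds a threshold, and take the connected components of the resulting network as groups. This is a minimal illustration, not the authors' pipeline: the toy expression values and gene names are hypothetical, Spearman's ρ is computed with the no-ties formula, and the 0.775 threshold is simply borrowed from the slides.

```python
from itertools import combinations

# Hypothetical toy data: mRNA levels for five genes across six recombinant inbred lines.
EXPRESSION = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "G2": [2.1, 2.9, 4.2, 5.1, 5.8, 7.0],
    "G3": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "G4": [3.0, 1.0, 4.0, 1.5, 5.0, 9.0],
    "G5": [2.0, 7.0, 1.0, 8.0, 2.5, 8.5],
}
THRESHOLD = 0.775  # correlation threshold quoted on the slides


def ranks(values):
    """Rank values 1..n (assumes no ties, to keep the sketch short)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def spearman(x, y):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def correlating_groups(expr, threshold):
    """Connected components of the graph with an edge wherever |rho| > threshold."""
    genes = list(expr)
    adj = {g: set() for g in genes}
    for a, b in combinations(genes, 2):
        if abs(spearman(expr[a], expr[b])) > threshold:
            adj[a].add(b)
            adj[b].add(a)
    groups, seen = [], set()
    for g in genes:
        if g in seen:
            continue
        stack, comp = [g], set()
        while stack:  # depth-first search over the thresholded network
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        groups.append(sorted(comp))
    return groups


if __name__ == "__main__":
    print(correlating_groups(EXPRESSION, THRESHOLD))
```

Using |ρ| means strongly anti-correlated genes (G3 here) join the same group as positively correlated ones, matching the slide's "|ρ| > threshold" condition; genes with no edge above the threshold end up as single-gene groups.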
Singapore variation project
• Chia Kee Seng and colleagues (Teo et al, 2009 Genome Research)
• 268 individuals from the Chinese, Malay and Indian population groups
  – 1.6 million variations
• Baseline for comparison with Caucasian, African and other samples

DNA variations in populations
• Principal component analysis of Singapore Chinese
Teo et al Genome Research 2009

"Asian-ness" in biomedical research
• For any human sample:
  – Genotype
  – Gene expression
  – Proteomic
  – Metabolomic
  – Lipidomic
  – Glycomic
• Identify molecular group behaviours
• Does shared behaviour classify normal phenotypes?
• Does shared behaviour predict or classify disorder?

Acknowledgements
• Mark Cowley
• Rohan Williams
• Chris Cotsapas
To my colleagues at NUS Life Sciences Institute