The Identification of Human Quantitative Trait Loci
Dr John Blangero
Southwest Foundation for Biomedical Research
ChemGenex Pharmaceuticals

The Goals: Genetic Analysis of Complex Phenotypes
  QTL Localization: Where in the genome is the QTL located?
  QTL Identification: What is (are) the gene(s) involved?
  QTL Allelic Architecture: What are the specific QTNs? How many QTNs? What are their frequencies and effect sizes?

Quantitative Traits
  Usually closer to gene action than disease itself.
  Have superior statistical power.

Quantitative Endophenotypes
  Heritable
  Genetically correlated with disease or other focal phenotype
  Closer to the action of the genes

Liability: The Threshold Model

The process of finding and identifying disease-related genes involves Objective Prioritization.

Different Diseases, Different Designs, Different Methods

Family Studies vs Studies of Unrelateds

Major Study Designs in Human Genetics: Possible Inferences

  Design                  Heritability  Linkage  Association
  Unrelated individuals   No            No       Yes
  Triads                  No            Yes      Yes
  Sibling pairs           Yes           Yes      Yes
  Nuclear families        Yes           Yes      Yes
  Extended pedigrees      Yes           Yes      Yes

You can exploit: Linkage and Association Information Jointly in Family Studies

Relative Per-Subject Power to Localize QTLs

  Population Study   Relative Efficiency  Ped. Size  Pedigree Type
  Jirel (Nepal)      1.00                 2300       Extended (isolate)
  Vermont            0.91                 331        Extended
  SAFHS              0.59                 31         Extended
  GAIT               0.35                 19         Extended
  Framingham         0.24                 5          Extended, nuclear
  Nuclear (4 sibs)   0.17                 6          Nuclear
  Nuclear (3 sibs)   0.11                 5          Nuclear
  Sib-pair           0.04                 2          Relative pair

Linkage Designs vs Association Designs

Power: Linkage vs Association

Example 1: Positional Candidate Genes
  QTL for serum leptin levels in the San Antonio Family Heart Study
  Highly replicated QTL

Chromosome 2 Obesity QTL

Bioinformatic Prioritization: GeneSniffer Results
  [Figure: GeneSniffer hit score (0 to 7000) by Mb coordinate (21.3 to 45.6) across 2p22; labeled genes: POMC, GCKR, UCN]

What Do You Do With A Good Positional Candidate Gene?
  The ALL or NOTHING principle
  Find all of the variation in the gene.
  Preference: Resequence everyone (no bias against rare variants)
  Alternative: Resequence a subset of individuals

POMC: Pattern of LD

POMC QTN Analysis: Marginal Associations

How To Find the Most Likely Functional SNPs
  Bayesian Quantitative Trait Nucleotide (BQTN) analysis has the potential to aid the discovery of the DNA variants that influence risk of common disease.
  Objectively prioritizes SNPs for further functional work.

BQTN Analysis: Bayesian Model Selection/Model Averaging
  Evaluate possible models of gene action. The number of models may be very large: 2^n models of additive gene action for n variants.
  Use Bayesian model selection to choose the best models and average parameters over models.
  Eliminates the problem of multiple testing.
  Yields unbiased estimates of effect size.
  Allows prioritization of polymorphisms for further lab evaluation.

Calculation of Posterior Probability of Effect

Sequential Oligogenic Linkage Analysis Routines (SOLAR)
  All analyses were performed using a parallel version of SOLAR on up to 1,500 processors.
  For more information on SOLAR, follow the 'software' links at: http://www.sfbr.org

BQTN analysis of POMC polymorphisms
  Three variants account for 11% of variation in leptin levels.
  The frequencies of these variants are: 0.005, 0.004 and 0.06.
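The model selection and averaging step described in the BQTN slides can be sketched as follows. This is only an illustration under stated assumptions, not the SOLAR implementation: it uses ordinary least squares on unrelated individuals (SOLAR fits variance-component models in pedigrees), a uniform prior over models, and a BIC approximation to the posterior model probabilities. The function name bqtn_model_average and the simulated data are hypothetical.

```python
import itertools
import numpy as np


def bqtn_model_average(genotypes, phenotype):
    """Enumerate all 2^n additive models and average over them.

    Assumptions (not from the slides): OLS fit, uniform model prior,
    BIC approximation to posterior model probabilities.
    Returns (posterior probability of effect per SNP, model-averaged effects).
    """
    n_obs, n_snps = genotypes.shape
    masks, bics, betas = [], [], []
    for mask in itertools.product([0, 1], repeat=n_snps):
        cols = [j for j in range(n_snps) if mask[j]]
        # Design matrix: intercept plus the SNPs included in this model.
        design = np.column_stack([np.ones(n_obs)] + [genotypes[:, j] for j in cols])
        coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
        resid = phenotype - design @ coef
        rss = float(resid @ resid)
        # Gaussian BIC up to an additive constant shared by all models.
        bics.append(n_obs * np.log(rss / n_obs) + design.shape[1] * np.log(n_obs))
        beta = np.zeros(n_snps)  # effect is zero for SNPs left out of the model
        if cols:
            beta[cols] = coef[1:]
        masks.append(mask)
        betas.append(beta)
    bics = np.array(bics)
    # Posterior model weights; subtracting the minimum avoids underflow.
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    # Posterior probability of effect: total weight of models containing the SNP.
    post_prob = weights @ np.array(masks, dtype=float)
    avg_beta = weights @ np.array(betas)
    return post_prob, avg_beta


# Simulated example (hypothetical data): 4 SNPs, only SNP index 1 is causal.
rng = np.random.default_rng(0)
freqs = np.array([0.2, 0.3, 0.4, 0.1])
genotypes = rng.binomial(2, freqs, size=(500, 4))
phenotype = 1.0 * genotypes[:, 1] + rng.normal(size=500)
post_prob, avg_beta = bqtn_model_average(genotypes, phenotype)
print("posterior probability of effect:", np.round(post_prob, 3))
print("model-averaged effect sizes:   ", np.round(avg_beta, 3))
```

Exhaustive enumeration is only practical for a modest number of variants, since the model count doubles with each SNP; this is consistent with the slides' note that the analyses ran on a parallel version of SOLAR.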
  LD with any other SNPs is very low: 0.075, 0.248 and 0.189.
  It would be VERY HARD to find these by LD.

Linkage Conditional on POMC SNPs
  Marginal LOD = 5.86
  Conditional LOD = 3.05

What Do You Do With A Good Positional Candidate Region?
  The ALL or NOTHING principle
  Find all of the variation in the region, say 5 to 10 Mb.
  Preference: Resequence everyone (no bias against rare variants).
  This can be done NOW! It is the wave of the future.
  Don't waste time with LD. It is your ENEMY.

Example 2: Identifying Human QTLs Quickly
  For cis-regulated expression phenotypes, it should be much easier to quickly identify functional variants and correlate them with disease risk.

Gene Expression Levels as Endophenotypes
  Quantitative variation in gene expression levels explains some proportion of the variation in many phenotypes.
  The amount of mRNA of a specific transcript in a tissue sample is about as "close to gene action" as possible; hence, such phenotypes ought to be dissectible by statistical genetic approaches.
  Array-based technologies make it feasible to quantify the expression levels of many transcripts simultaneously.

Project Description
  San Antonio Family Heart Study (SAFHS), designed in 1991 to investigate the genetics of CVD in Mexican Americans
  Includes 1,431 individuals from 42 families
  2 recalls since 1991
  Extensive phenotypic data: anthropometry, blood pressure, lipids, obesity, diabetes, inflammation, oxidative stress, hormones, osteoporosis, brain structure/function
  Genome scanned

Methodology
  Blood samples collected at the first SAFHS examination, approximately 15 years ago
  Lymphocytes isolated from blood and stored in RPMI-C media in liquid nitrogen
  RNA extracted and expression profiles generated on stored lymphocytes
  47,289 transcripts interrogated using the Illumina platform

Detection Statistics
  1,280 samples analyzed, good data from 1,240 (~97%)
  Of the 47,289 transcripts per array, we significantly detected 20,413 transcripts.

Heritabilities of Autosomal RefSeq Transcripts
  [Figure: probability mass function (pdf) and cdf of heritability estimates, 0 to 1]

Cis-Regulated Expression QTLs

  FDR    Significant Transcripts  Expected Number  P-value cutoff
  0.01   859                      850              0.0004
  0.05   1238                     1177             0.0031
  0.10   1569                     1412             0.0080
  0.25   2620                     1966             0.0333
  0.30   3000                     2102             0.0480

Identifying Novel Candidate Genes for Disease Risk
  After determining cis-regulated QTLs, look for correlations with phenotypes related to disease risk.
  Transcriptomic Epidemiology: using high dimensional endophenotypic search
  For example, 383 cis-regulated transcripts are significantly correlated with BMI (an index of obesity).
  Many of these are novel genes of unknown function.

Expression QTLs: LOD > 3

  Type of QTL  Number of QTLs  Mean LOD
  Cis          693             9.46
  Trans        1,325           3.53

  Approximately 34% of QTLs are Cis.
  Effect size (QTL-specific heritability) is 64% larger for Cis QTLs.

Cis Regulation: UTS2 (urotensin 2 preprotein)

Cis and Trans Regulation: HBG2 (G-gamma globin)

Trans Regulation: LOC389472

Mitochondrial QTLs Influencing Expression

  FDR    Significant Transcripts  Expected Number True  P-value cutoff
  0.01   16                       16                    0.000008
  0.05   35                       33                    0.000076
  0.10   73                       66                    0.00035
  0.25   159                      127                   0.00146
  0.30   251                      176                   0.0037

Identification of Human QTLs: Example 3
  QTL influencing inflammatory response
  A novel positional candidate gene (SEPS1/SELS) found by expression studies in an animal model

SEPS1 Gene Discovery
  SEPS1 (formerly known as Tanis) was first identified by differential gene expression in liver of diabetic P. obesus.
  Putative functions related to ER stress response through processing and removal of misfolded proteins (Ye et al (2004). Nature 429, 841-847)

SEPS1 Gene Discovery
  Human SEPS1 gene is located on 15q26.3
  Mammalian plasma membrane selenoprotein and also a member of the GRP family
  Consists of 6 exons, encodes a 204aa protein
  15q26 region shown to contain QTLs influencing inflammatory disorders:
    Zamani et al (1996). Hum Genet 98, 491-6.
    Field et al (1994). Nat Genet 8, 189-94.
    Blacker et al (2003). Hum Mol Genet 12, 23-32.
    Susi et al (2001). Scand J Gastroenterol 36, 372-4.
    Mahaney et al (2005). Unpublished.

SEPS1 Variant Identification
  Sequenced 9.3kb including putative promoter, exons, introns and conserved regions in 50 individuals from three different ethnic populations
  16 variants genotyped in a cohort of 522 Caucasian individuals from 92 families
  Plasma levels of IL-1, IL-6 and TNF-α measured
  Results analyzed for association using SOLAR

Association Analysis
  [Figure: association results, one panel each for IL-1, IL-6 and TNF-α]

BQTN Analysis
  BQTN analysis strongly supported a model in which the G-105A SNP was responsible for the observed associations, with estimated posterior probabilities of >0.999, 0.95, and 0.79 (for TNF-α, IL-1, and IL-6 respectively).
  Analysis indicates the G-105A SNP is of direct functional consequence (or is highly correlated with a functional variant).
  Analysis performed to test the functionality of this G-105A variant.

Effect of A or G variant on SEPS1 promoter activity under Tunicamycin stress conditions
  [Figure: promoter activity (fold change in luc activity over basal) for the A variant and G variant, basal vs Tunicamycin; P = 0.00006]

Physiological Role of SEPS1
  [Figure: pathway diagram. Labels: PM, cytoplasm, ER, ER lumen, Golgi, nucleus; misfolded protein, chaperone, folded protein, secretion; Derlin-1, p97, SelS, Poly-Ub, 26S proteasome; activation of JNK, caspase12, NFkB; cytokine production, apoptosis; mRNA, cell survival]

Exploring the Effects of the SEPS1 G-105A QTN
  Looked at the in vivo effects of the SEPS1 G-105A QTN on expression levels of SEPS1 and genes in the following Gene Ontology categories:
    Endoplasmic Reticulum Unfolded Protein Response
    Golgi Stack and Protein Transportation
    Oxidative Stress

SEPS1 Expression is Correlated With Disease In Vivo

  Phenotype      P-value
  2 Hr Glucose   0.027
  Diabetes Risk  0.050
  BMI            0.0006
  Relative Fat   0.032
  Triglycerides  0.0023

  [Direction of Correlation column: symbols not recoverable]

SEPS1 G-105A QTN Influences Expression In Vivo
  SEPS1 transcript is cis-regulated (as defined by quantitative trait linkage analysis).
  The rare A variant is associated with decreased expression in lymphocytes (p = 0.032).

SEPS1 G-105A Associated Genes

  Gene     p-value  Function
  CSX2001  0.00096  Transcriptional repressor
  CSX2002  0.0022   Golgi traffic w/ER
  CSX2003  0.0022   Golgi traffic w/ER
  SEPN1    0.0025   ER stress
  GRP94    0.0034   ER unfolded protein response
  GLUT3    0.0055   Oxidative stress
  STX6     0.0068   TNF-α secretion

  [Correlation column: symbols not recoverable]

Acknowledgements
  Southwest Foundation for Biomedical Research: Joanne Curran, Eric Moses, Matt Johnson, Catherine Jett, Tom Dyer, Shelley Cole, Harald Göring, Jean MacCluer, Charles Peterson, Tony Comuzzie, Laura Almasy
  ChemGenex Pharmaceuticals: Jeremy Jowett, Greg Collier
  Special thanks to the Azar family of San Antonio for their financial support of our research