Variation and Functional Genomics

Overview of Talk
• SNPs and InDels
• Larger structural variants (CNVs)
• Phenotype data
• Individual genomes
• HapMap variations and genotypes
• Locus Specific Databases
• LRGs

Genomic Diversity
• SNPs (Single Nucleotide Polymorphisms): base pair substitutions
• InDels: insertions/deletions (can cause frameshifts)
• Occur about once in every 300 bp (human)
• ~3 billion base pairs in mammalian genomes!

Functional Consequences
• SNPs in coding areas that alter the aa sequence: cause of most monogenic disorders, e.g. cystic fibrosis (CFTR), haemophilia (F8)
• SNPs in coding areas that don't alter the aa sequence: may affect splicing
• SNPs in promoter or regulatory regions: may affect the level, location or timing of gene expression
• SNPs in other regions: no direct known impact on phenotype; useful as markers

Effects of Sequence Polymorphisms
• Cause disease (SNP in clotting factor IX codes for a stop codon: haemophilia)
• Increase disease risk (SNP in LDL receptor reduces efficiency: high cholesterol)
• Affect drug response (2 million hospitalized patients suffer serious adverse drug reactions, of which more than 100,000 are fatal*)

Studying variation – why?
• Determine disease risk
• Individualised medicine (pharmacogenomics)
• Forensic studies
• Biological markers
• Hybridisation studies, marker-assisted breeding
• Understanding evolution

Practical Applications

dbSNP
http://www.ncbi.nlm.nih.gov/SNP/
55 organisms covered: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi

Small Scale Sequence Variants
• Most SNPs and InDels are imported from dbSNP (rs……):
  • Imported data: alleles, flanking sequences, population frequencies
  • Calculated data: position, transcript effect
• For human also:
  • HGMD (Human Gene Mutation Database)
  • HGVS (Human Genome Variation Society)
  • Affymetrix and Illumina variations
  • Ensembl-called SNPs (from aligned individual genomes)
• For mouse, rat, dog and chicken also:
  • Sanger- and Ensembl-called SNPs (other strains/breeds)

SNPs and InDels in Ensembl
• Non-synonymous: in coding sequence, resulting in an aa change
• Synonymous: in coding sequence, not resulting in an aa change
• Frameshift: in coding sequence, resulting in a frameshift
• Stop lost: in coding sequence, resulting in the loss of a stop codon
• Stop gained: in coding sequence, resulting in the gain of a stop codon
• Essential splice site: in the first 2 or the last 2 base pairs of an intron
• Splice site: 1-3 bp into an exon or 3-8 bp into an intron
• Upstream: within 5 kb upstream of the 5'-end of a transcript
• Regulatory region: in a regulatory region annotated by Ensembl
• 5' UTR: in the 5' UTR
• Intronic: in an intron
• 3' UTR: in the 3' UTR
• Downstream: within 5 kb downstream of the 3'-end of a transcript
• Intergenic: more than 5 kb away from a transcript

Small Scale Sequence Variants
Ensembl Region in Detail view: colour-coded SNPs and InDels, with legend

Polymorphisms in Ensembl
• Chicken
• Chimp
• Cow
• Dog
• Human
• Mouse
• Rat
• Platypus
• Tetraodon
• Zebrafish
• Plants (Rice, Arabidopsis, Grapevine, Brachypodium)
• Yeast
• Fly
• Mosquito
• Plasmodium falciparum

CNV in human
Structural variants track

Phenotype Data
Genome-wide association data:
• 159 annotations from EGA http://www.ebi.ac.uk/ega
• 2697 from NHGRI http://www.genome.gov/gwastudies/

Somatic Variations: COSMIC

Population Data in Ensembl
http://hapmap.ncbi.nlm.nih.gov/
http://www.1000genomes.org

Population Data
Variation tab: Population genetics

Variation Tab
• Flanking sequence
• Population genetics and LD plots
• Disease relationships (human): EGA, GWAS, HapMap, Clinical/LSDB
• Ancestral alleles

Variation Views
• View variations drawn on the sequence: Gene tab: Sequence link; Transcript tab: Exons, cDNA, protein links
• View a table of variations for each transcript: Gene tab: Variation Table
• View variations drawn along a transcript: Gene tab: Variation Image

Comparison Views
Human, Mouse, Rat, Dog and Cow have individual or strain comparisons: use the Comparison Image link at the left of the Transcript tab.

SNP Effect Calculator
Click on Manage your data at the left of any page. Follow the link to “SNP Effect Predictor”. Paste in variation positions and alleles.

SNP Effect Calculator
The location, the variation name in Ensembl and the consequence on the amino acid sequence are returned.

Ensembl Variation
• SNPs and InDels
• Larger structural variants (CNVs)
• Phenotype data
• Individual genomes (human)
• HapMap variations and genotypes
• Locus Specific Databases
• LRGs

Sequencing Individuals
• Venter and Watson genomes
• 1000 Genomes Project
• HapMap

First diploid genomes for human
Craig Venter:
• Sequence & analysis ongoing since 2003
Jim Watson:
• 454 technology (7.4x)
• 100 million unpaired reads (25 billion bp)
• $1,000,000
“The Diploid Genome Sequence of an Individual Human”, PLoS Biology 5(10): 2113-2144 (2007)
“The Complete Genome of an Individual by Massively Parallel DNA Sequencing”, Nature 452: 872-876 (2008)
“Accurate Whole Human Genome Sequencing Using Reversible Terminator Chemistry”, Nature 456: 53-59 (2008)
“The Diploid Genome Sequence of an Asian Individual”, Nature 456: 60-65 (2008)

Reference Sequence
• The Human Genome Project gave the “average” DNA sequence of a small number of people.
• This helps us find out how a human develops and works
• Does not show us the DNA differences between different humans
• Does not always reflect the major alleles

1000 Genomes Project
www.1000genomes.org
1000 Genomes track in Region in Detail

HapMap
www.hapmap.org
• A multi-country effort to identify and catalogue genetic similarities and differences in people.
• Collaboration among scientists and funding agencies from Japan, the United Kingdom, Canada, China, Nigeria, and the United States.
• All of the information generated by the project is released into the public domain.

HapMap (phase III)
• Genotypes from 1115 individuals from 11 populations:
  • ASW: African ancestry in Southwest USA (71)
  • CEU: Utah residents with Northern and Western European ancestry from the CEPH collection (162)
  • CHB: Han Chinese in Beijing, China (70)
  • CHD: Chinese in Metropolitan Denver, Colorado (70)
  • GIH: Gujarati Indians in Houston, Texas (83)
  • JPT: Japanese in Tokyo, Japan (82)
  • LWK: Luhya in Webuye, Kenya (83)
  • MEX: Mexican ancestry in Los Angeles, California (71)
  • MKK: Maasai in Kinyawa, Kenya (171)
  • TSI: Toscani in Italia (77)
  • YRI: Yoruba in Ibadan, Nigeria (163)

Haplotyping
• A haplotype is a set of SNPs (spanning on average ~25 kb) found to be statistically associated on a single chromatid, which therefore tend to be inherited together over time.
• Haplotyping involves grouping subjects by haplotypes.

Locus Specific Databases (LSDB)
• Databases that focus on one gene or one disease
  • e.g. p53, ABO, collagen
  • e.g. albinism, cystic fibrosis, Alzheimer’s disease
• User communities:
  • Clinicians: driven by genetic testing of patients
  • Research groups: disease- and function-driven

LSDBs
• >1000 listed on the Human Genome Variation Society website

LSDB examples

Why is it difficult to merge these data?
• Historical reasons. LSDBs sometimes:
  • Use sequences which do not start at methionine
  • Use transcript coordinates rather than genomic coordinates
  • Use a different transcript for reporting mutations
• The reference assembly:
  • Regularly changes with new assemblies/gene builds
  • May contain minor or rare alleles
  • May be inaccurate
  • Has missing genes (e.g. no α-haemoglobin: thalassaemia)
  • Is a mixture of sequences from different individuals

Ensembl and LRGs
• Define an exchange format for LRGs with the NCBI
• Create an LRG website
• Create a pipeline for receiving the data and creating an LRG
• Extend e! databases to store LRGs
• Develop an API to query LRGs and associated annotation
• Consult with the LSDBs to develop useful visualisation tools
• Build displays for LRG data and annotation

EGA: repository for genotype data
• www.ebi.ac.uk/ega/

Sequences Differing from the Reference
• Common coordinate system for reporting mutations and variation data (stable sequence)
• Locus Reference Genomic (LRG)
• Ensembl displays LRGs
• Project in collaboration with the NCBI and GEN2PHEN
• Extension of the RefSeq gene project
• View and request LRGs here: http://www.lrg-sequence.org/

Locus Reference Genomic
LRG = genomic sequence for reporting mutations (containing transcript)
* Often differs from the reference assembly

LRGs in the Browser
All LRG transcripts and underlying sequence can be viewed, e.g. LRG_13:
http://www.ensembl.org/Homo_sapiens/LRG/Summary?lrg=LRG_13

Variations Team
Fiona Cunningham, Pontus Larsson, Will McLaren, Graham Ritchie

Functional Genomics (Wikipedia):
Functional genomics is a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic projects (such as genome sequencing projects) to describe gene (and protein) functions and interactions.
In Ensembl:
• Regulatory build using ENCODE project information
• Promoters and enhancers from cisRED and VISTA
• FlyReg features (for Drosophila)

ENCODE (Encyclopedia of DNA Elements)
Nature, 14 June 2007
Where are the promoter, enhancer, and other regulatory regions of the human genome?
The pilot project showed that chromatin accessibility and histone modification analysis can be used to predict TSSs (transcription start sites).

Regulatory Build
• CTCF-binding sites
• DNase I hypersensitive sites
• TF binding sites
These are “core features”. Overlapping methylation sites expand these regions.
http://www.ensembl.org/info/docs/funcgen/index.html

The Regulation Tab

How to get there?

The Location Tab

BioMart

There are other sets…
cisRED: sequence motifs determined by experimental and prediction tools. http://www.cisred.org/
VISTA Enhancer Set: tissue-specific enhancers, tested experimentally. Nucleic Acids Res. 2007 January; 35(Database issue): D88–D92.

Gene Regulation Summary
• Homo sapiens
  • DNase I hypersensitivity, CTCF binding sites, TF binding sites (core features)
  • Histone modification data
  • MeDIP-chip methylation data for 17 human tissues and cell lines
  • VISTA Enhancer Assay (http://enhancer.lbl.gov)
  • cisRED motifs (www.cisred.org)
  • miRanda microRNA target prediction
  • Expression Quantitative Trait Loci (eQTL) from the Sanger Institute
• Mus musculus
  • DNase I hypersensitivity sites (ES cells)
  • Histone modifications for ES, MEF, and NPC cells
  • cisRED motifs (www.cisred.org)
• Danio rerio
  • ZFMODELS enhancers
• Drosophila melanogaster
  • REDfly TFBSs
  • BioTIFFIN
  • REDfly CRMs

Functional Genomics team (eFG and ENCODE): Ian Dunham, Nathan Johnson, Daniel Sobral, Andy Yates, Steven Wilder, Damian Keefe
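The position-based categories on the "SNPs and InDels in Ensembl" slide can be illustrated with a small classifier. These categories are essential splice site, splice site, upstream, downstream, intergenic, the UTRs and intronic. This is a minimal sketch, not Ensembl's implementation. It assumes a single forward-strand transcript with 1-based inclusive coordinates. It leaves out the regulatory-region category, which needs external annotation. It also returns one label per position, whereas real tools may report several consequences for one variant. The `Transcript` class and `classify_position` function are hypothetical names.

```python
from dataclasses import dataclass

FLANK_BP = 5_000  # upstream/downstream window, taken from the slide (5 kb)


@dataclass
class Transcript:
    """Hypothetical forward-strand transcript; coordinates 1-based, inclusive."""
    exons: list[tuple[int, int]]  # sorted, non-overlapping (start, end) pairs
    cds_start: int                # first base of the start codon
    cds_end: int                  # last base of the stop codon


def classify_position(pos: int, tx: Transcript) -> str:
    """Return one consequence label for a genomic position (sketch only)."""
    tx_start, tx_end = tx.exons[0][0], tx.exons[-1][1]
    if pos < tx_start:
        return "upstream" if tx_start - pos <= FLANK_BP else "intergenic"
    if pos > tx_end:
        return "downstream" if pos - tx_end <= FLANK_BP else "intergenic"

    # Each intron lies between two consecutive exons.
    for (_, prev_end), (next_start, _) in zip(tx.exons, tx.exons[1:]):
        if prev_end < pos < next_start:
            from_5 = pos - prev_end     # 1 = first intronic base
            from_3 = next_start - pos   # 1 = last intronic base
            if from_5 <= 2 or from_3 <= 2:
                return "essential_splice_site"
            if 3 <= from_5 <= 8 or 3 <= from_3 <= 8:
                return "splice_site"
            return "intronic"
        # 1-3 bp into the exons flanking this intron
        if prev_end - 2 <= pos <= prev_end or next_start <= pos <= next_start + 2:
            return "splice_site"

    # Remaining positions are exonic: UTR or coding.
    if pos < tx.cds_start:
        return "5_prime_UTR"
    if pos > tx.cds_end:
        return "3_prime_UTR"
    return "coding"  # the alleles are needed to refine this further
```

Consider a transcript with exons at 1000-1100, 1200-1300 and 1400-1500 and a CDS from 1050 to 1450. Position 1101 is the first intronic base, so it is called an essential splice site. Position 1099 lies within the last 3 bp of the first exon, so it is called a splice site.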
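Within coding sequence, the synonymous, non-synonymous, stop gained and stop lost categories come down to translating the reference codon and the alternate codon. An indel causes a frameshift when its length is not a multiple of 3. The sketch below shows this with the standard genetic code. It assumes the CDS is given 5' to 3' on the coding strand and is numbered from the A of the start codon. The function names are hypothetical. In-frame indels are not covered by the slide's list, so they get a separate label here. Start-lost and stop-retained changes, which dedicated tools distinguish, are not modelled separately.

```python
# Standard genetic code; codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_coding_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a single-base substitution at 1-based CDS position `pos`."""
    cds, alt = cds.upper(), alt.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    if not 1 <= pos <= len(cds) or alt not in BASES:
        raise ValueError("position outside CDS or invalid alternate base")
    codon_start = 3 * ((pos - 1) // 3)   # 0-based start of the affected codon
    offset = (pos - 1) % 3               # where the SNV falls within the codon
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "non_synonymous"


def classify_coding_indel(length_change: int) -> str:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return "frameshift" if length_change % 3 != 0 else "inframe_indel"
```

Take the toy CDS `ATGCTGTGGTAA` (Met-Leu-Trp-stop). Changing position 9 from G to A turns TGG (Trp) into TGA, which is a stop gained. This is the same kind of change as the clotting factor IX example earlier in the deck.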
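The Haplotyping slide and the LD plots on the Variation tab both rest on measuring how strongly alleles at two SNPs co-occur on the same chromosome. One common statistic is r² = D² / (p_A(1 − p_A) · p_B(1 − p_B)), where D = p_AB − p_A · p_B. The sketch below computes it from phased haplotypes. Encoding each haplotype as a pair of 0/1 alleles is an assumption made for illustration, and `ld_r2` is a hypothetical name.

```python
from typing import Sequence


def ld_r2(haplotypes: Sequence[tuple[int, int]]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is (allele at SNP 1, allele at SNP 2), coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom
```

Perfectly co-inherited alleles give r² = 1. Alleles that occur independently of each other give r² = 0.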
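The "Why is it difficult to merge these data?" slide gives one reason LSDB data are hard to combine: mutations are often reported in transcript (cDNA) coordinates. Those coordinates only map to the genome through a particular transcript's exon structure. The sketch below converts a coding position c.N into a genomic position, counting from the A of the start codon as HGVS coding numbering does. It assumes a forward-strand transcript, 1-based inclusive coordinates and N ≥ 1. The function names and the example coordinates are hypothetical.

```python
def genomic_to_tx(pos: int, exons: list[tuple[int, int]]) -> int:
    """Position along the spliced transcript (1-based) of an exonic genomic base."""
    offset = 0
    for start, end in exons:
        if start <= pos <= end:
            return offset + (pos - start) + 1
        offset += end - start + 1
    raise ValueError(f"{pos} is not exonic")


def tx_to_genomic(t: int, exons: list[tuple[int, int]]) -> int:
    """Genomic position of the t-th base (1-based) of the spliced transcript."""
    for start, end in exons:
        length = end - start + 1
        if t <= length:
            return start + t - 1
        t -= length
    raise ValueError("position lies beyond the end of the transcript")


def cdna_to_genomic(c_pos: int, exons: list[tuple[int, int]], cds_start: int) -> int:
    """Map coding position c.N (c.1 = A of the ATG) to a genomic coordinate."""
    if c_pos < 1:
        raise ValueError("c.-N (5' UTR) positions are not handled in this sketch")
    t = genomic_to_tx(cds_start, exons) + c_pos - 1
    return tx_to_genomic(t, exons)
```

Consider two transcripts that differ only in where their second exon starts. Running the same c.52 through both gives 1200 for one and 1250 for the other. This is exactly the ambiguity that a stable reference such as an LRG is meant to remove.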
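The Regulatory Build slide describes core features (CTCF-binding sites, DNase I hypersensitive sites, TF binding sites) whose regions are then expanded by overlapping methylation sites. The interval sketch below is a highly simplified illustration of that idea. Overlapping core features are merged first. Each merged region is then widened, in a single pass, by any "extender" interval that overlaps it. This is an illustrative assumption, not the actual Ensembl pipeline. `merge_intervals` and `build_regions` are hypothetical names, and coordinates are treated as 1-based inclusive.

```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping (start, end) intervals; inclusive coordinates."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def build_regions(core_features: list[tuple[int, int]],
                  extenders: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge core features, then widen each region by overlapping extenders."""
    regions = []
    for start, end in merge_intervals(core_features):
        # Single pass; each extender is tested against the region as widened so far.
        for ext_start, ext_end in extenders:
            if ext_start <= end and ext_end >= start:  # the intervals overlap
                start, end = min(start, ext_start), max(end, ext_end)
        regions.append((start, end))
    # Widening can make neighbouring regions overlap, so merge once more.
    return merge_intervals(regions)
```

For example, core features at 100-200, 150-250 and 400-450 merge into two regions. A methylation interval at 240-300 then extends the first region to 100-300.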