Sequence Variation in Ensembl

Outline
• SNPs
• SNPs in Ensembl
• Linkage disequilibrium
• SNPs in BioMart
• DAS sources

Single nucleotide polymorphisms (SNPs)
• Two human genomes differ by ~0.1%
• Polymorphism: a DNA variation in which each possible sequence is present in at least 1% of people
• Most polymorphisms (~90%) take the form of SNPs: variations that involve just one nucleotide
• ~1 out of every 300 bases in the human genome
• ~10 million in the human genome

Functional Consequences
• SNPs in coding regions that alter the amino acid (aa) sequence: cause of most monogenic disorders, e.g.:
  • Hemochromatosis (HFE)
  • Cystic fibrosis (CFTR)
  • Hemophilia (F8)
• SNPs in coding regions that don't alter the aa sequence: may affect splicing
• SNPs in promoter or regulatory regions: may affect the level, location or timing of gene expression
• SNPs in other regions: no known direct impact on phenotype; useful as markers

Practical Applications
• Disease diagnosis
• Association studies
• Pharmacogenomics
• Forensic testing
• Population genetics and evolutionary studies
• Marker-assisted selection

SNPs in Ensembl
• Most SNPs are imported from dbSNP (rs……):
  • Imported data: alleles, flanking sequences, frequencies, …
  • Calculated data: position, synonymous status, peptide shift, …
• For human, also:
  • HGVbase
  • TSC
  • Affy GeneChip 100K and 500K Mapping Arrays
  • Affy Genome-Wide SNP Array 6.0
  • Ensembl-called SNPs (from Celera reads and Jim Watson's and Craig Venter's genomes)
• For mouse, rat, dog and chicken, also:
  • Sanger- and Ensembl-called SNPs (other strains / breeds)

dbSNP
• Central repository for simple genetic polymorphisms:
  • single-base nucleotide substitutions
  • small-scale multi-base deletions or insertions
  • retroposable element insertions and microsatellite repeat variations
• http://www.ncbi.nlm.nih.gov/SNP/index.html
• For human (dbSNP build 128):
  • 34,434,159 submissions (ss#'s)
  • 11,883,685 RefSNP clusters (rs#'s)
  • 6,262,709 validated
  • 737,679 with frequency

SNPs in Ensembl: Types
• Non-synonymous: in coding sequence, resulting in an aa change
• Synonymous: in coding sequence, not resulting in an aa change
• Frameshift: in coding sequence, resulting in a frameshift
• Stop lost: in coding sequence, resulting in the loss of a stop codon
• Stop gained: in coding sequence, resulting in the gain of a stop codon
• Essential splice site: in the first 2 or the last 2 base pairs of an intron
• Splice site: 1-3 bp into an exon or 3-8 bp into an intron
• Upstream: within 5 kb upstream of the 5' end of a transcript
• Regulatory region: in a regulatory region annotated by Ensembl
• 5' UTR: in the 5' UTR
• Intronic: in an intron
• 3' UTR: in the 3' UTR
• Downstream: within 5 kb downstream of the 3' end of a transcript
• Intergenic: more than 5 kb away from a transcript

SNPs in Ensembl: Species
• Human
• Chimp
• Mouse
• Rat
• Dog
• Cow
• Platypus
• Chicken
• Zebrafish
• Tetraodon
• Mosquito

Caveat
For human, mouse and rat, Ensembl defines all SNP alleles relative to the forward (+) strand of the genome assembly!
(This makes it possible to merge dbSNP data with Sanger resequencing data.)
Exceptions: cases where SNPs are shown as part of a sequence.

5 MINUTE EXERCISE
A missense SNP, C1858T, in PTPN22 (tyrosine-protein phosphatase non-receptor type 22) has been identified as a genetic risk factor for rheumatoid arthritis. This SNP is also referred to as R620W.
1. Find the SNPView page for this SNP.
2. Why are the alleles on this page given as A/G?
3. What is the minor allele of this SNP in Caucasians?

SNPs in Ensembl: GeneSNPView (1)
[Screenshot: transcript, InterPro domains, SNP alleles]

SNPs in Ensembl: GeneSNPView (2)
[Screenshot]

SNPs in Ensembl: TranscriptSNPView (1)
Shows SNP alleles in different:
• Individuals (human): Celera HuAA, HuCC, HuDD and HuFF, Craig Venter, Jim Watson
• Strains (mouse, rat)
• Breeds (chicken, dog)

SNPs in Ensembl: TranscriptSNPView (2)
[Screenshot: different individuals, resequencing coverage, SNP alleles, alleles in different individuals]

SNPs in Ensembl: TranscriptSNPView (3)
[Screenshot]

5 MINUTE EXERCISE
1. Find the TranscriptSNPView page for human PTPN22.
2. Do all individuals (HuAA, HuCC, HuDD, HuFF, Venter and Watson) have resequencing coverage at the position of the C1858T (R620W) SNP?
3. Do any of the individuals have a higher risk of rheumatoid arthritis based on their genotype at this position?
4. Is there an individual who is heterozygous at this position?
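The strand caveat is what lies behind the A/G question in the first exercise: an allele change written on a gene's coding strand appears complemented when the gene lies on the reverse strand and alleles are reported on the forward (+) strand. The sketch below is a minimal illustration, assuming PTPN22 is on the reverse strand (as the A/G display suggests); the helper names are hypothetical and not part of any Ensembl API, and the alphabetical ordering of the allele string is a choice made for this sketch only.

```python
# Hypothetical helpers for illustration only (not part of any Ensembl API).
# They convert single-base SNP alleles between a gene's coding strand and
# the forward (+) strand of the genome assembly.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_forward_strand(alleles, gene_strand):
    """Return coding-strand alleles as they read on the forward (+) strand.

    alleles: single-base alleles on the gene's coding strand, e.g. ["C", "T"].
    gene_strand: 1 if the gene lies on the forward strand, -1 if on the reverse.
    """
    if gene_strand == 1:
        return list(alleles)
    if gene_strand == -1:
        # For a single base, the reverse complement is simply the complement.
        return [COMPLEMENT[base] for base in alleles]
    raise ValueError("gene_strand must be 1 or -1")


def allele_string(alleles):
    """Join alleles alphabetically, e.g. "A/G" (ordering chosen for this sketch)."""
    return "/".join(sorted(alleles))


if __name__ == "__main__":
    coding_alleles = ["C", "T"]  # C1858T as written on the PTPN22 coding strand
    forward = to_forward_strand(coding_alleles, -1)  # assumes a reverse-strand gene
    print(allele_string(forward))  # A/G
```

Under this assumption, the coding-strand C and T become G and A on the forward strand, which matches the A/G allele pair asked about in the exercise.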
Haplotypes and Linkage Disequilibrium
• A haplotype is a set of SNPs on a single chromatid that are statistically associated
• Linkage disequilibrium (LD) describes a situation in which some combinations of SNP alleles occur more or less frequently in a population than would be expected from a random formation of haplotypes from alleles based on their frequencies

Measures of LD
• $D = P(AB) - P(A)P(B)$
  • $D$ ranges from $-0.25$ to $+0.25$
  • $D = 0$ indicates linkage equilibrium
  • dependent on allele frequencies, therefore of little use
• $D' = D / D_{\max}$, where $D_{\max}$ is the maximum possible $|D|$ given the allele frequencies (and the sign of $D$)
  • $D' = 1$ indicates perfect LD
  • estimates of $D'$ are strongly inflated in small samples
• $r^2 = D^2 / [P(A)P(B)P(a)P(b)]$
  • $r^2 = 1$ indicates perfect LD
  • measure of choice

Linkage Disequilibrium: LDView
It is also possible to export SNP information for upload into the Haploview software tool.

Linkage Disequilibrium: LDTableView
[Screenshot]

5 MINUTE EXERCISE
Retrieve all non-synonymous SNPs for the human CFTR gene using BioMart and export their ID, genomic position, alleles and peptide shift (hint: which dataset should you start with?).

DAS Sources
For human, data from the following DAS sources can be visualised in ContigView:
• DGV and DGV loci: structural variations from the Database of Genomic Variants (CNVs, InDels, inversions, etc.)
• RedonCNV regions and RedonCNV loci: copy number variations from the Redon et al. paper
• SegDup Washu: segmental duplications, University of Washington

Questions & Answers
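The three measures on the "Measures of LD" slide can be computed directly from the four two-locus haplotype frequencies. This is a minimal sketch, assuming the haplotype frequencies are already known (in practice they are usually estimated from unphased genotypes, which is not shown here). The sign-dependent choice of $D_{\max}$ follows Lewontin's standard normalisation; the function name `ld_measures` is hypothetical.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) for two biallelic SNPs from haplotype frequencies.

    Alleles are A/a at the first SNP and B/b at the second; the four
    haplotype frequencies must sum to 1.
    """
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    # Allele frequencies are marginal sums of the haplotype frequencies.
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB
    p_a = 1.0 - p_A
    p_b = 1.0 - p_B

    # D = P(AB) - P(A)P(B)
    d = p_AB - p_A * p_B

    # D_max is the largest |D| allowed by the allele frequencies; which bound
    # applies depends on the sign of D (Lewontin's normalisation).
    if d >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    # D' keeps the sign of D; it is undefined (NaN) if a SNP is monomorphic.
    d_prime = d / d_max if d_max > 0 else float("nan")

    # r^2 = D^2 / (P(A) P(B) P(a) P(b))
    denom = p_A * p_B * p_a * p_b
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, d_prime, r2


if __name__ == "__main__":
    # Perfect LD: only the AB and ab haplotypes occur.
    print(ld_measures(0.5, 0.0, 0.0, 0.5))
    # One haplotype (aB) is absent.
    print(ld_measures(0.2, 0.3, 0.0, 0.5))
```

In the second example, $D = 0.1$, $D' = 1$ and $r^2 = 0.25$: because the aB haplotype is absent, $D'$ reports complete LD even though one SNP is a weak proxy for the other. The first example also reaches the $D = +0.25$ upper bound quoted on the slide.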