Orthology, Paralogy, Chains, and Nets - CS273a
This Friday 10am, Beckman B-302: Introduction to the UCSC Browser.
HW1 due this Fri 10/15 at noon.
http://cs273a.stanford.edu [Bejerano Fall10/11]

Lecture 7
Genome Evolution
Chromosomal Mutations
Paralogy & Orthology
Chains & Nets

One Cell, One Genome, One Replication
• Every cell holds a copy of all its DNA = its genome.
• The human body is made of ~10^13 cells.
• All originate from a single cell through repeated cell divisions.
[Figure: DNA strings = chromosomes; a cell's genome = all its DNA. Through cell division, chicken (DNA) ≈ 10^13 copies of egg (DNA).]

Mutation Rate per bp
• 10^-9 per base pair per cell division.
• This refers to mutations that are not repaired.
• Thus, there are at least six new mutations in each kid that were not present in either parent.
• Mutations range from the smallest possible (a single base pair change) to the largest (whole genome duplication).
• Selection does not tolerate all of these mutations, but it sure does tolerate some.

Example: Human-Chimp Genomic Differences
[Figure: number of events; labels 1%, 3%, "Open question.."]
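
The "at least six" figure on the Mutation Rate slide can be checked with back-of-the-envelope arithmetic. The sketch below assumes a diploid human genome of about 6 × 10^9 base pairs (roughly 3 × 10^9 per haploid copy); that genome size is not stated on the slide, and the names used are purely illustrative.

```python
# Per-base-pair rate of unrepaired mutations per cell division (from the slide).
MUTATION_RATE_PER_BP = 1e-9

# Assumption (not on the slide): ~3e9 bp per haploid genome, two copies per cell.
HAPLOID_GENOME_BP = 3e9
DIPLOID_GENOME_BP = 2 * HAPLOID_GENOME_BP


def expected_new_mutations(divisions: int = 1) -> float:
    """Expected unrepaired mutations accumulated over `divisions` cell divisions."""
    return MUTATION_RATE_PER_BP * DIPLOID_GENOME_BP * divisions


print(f"{expected_new_mutations():.1f} expected new mutations per cell division")
```

Under these assumptions a single cell division already yields about six new mutations. Since more than one division separates a parent from a child, this reads as a lower bound, consistent with the slide's "at least".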

Chromosomal (i.e. Big) Mutations
• May involve:
  – Changing the structure of a chromosome
  – The loss or gain of part of a chromosome

Chromosome Mutations
• Five types exist:
  – Deletion
  – Inversion
  – Translocation
  – Nondisjunction
  – Duplication

Deletion
• Due to breakage
• A piece of a chromosome is lost

Inversion
• Chromosome segment breaks off
• Segment flips around backwards
• Segment reattaches

Duplication
• Occurs when a genomic region is repeated

Whole Genome Duplication at the Base of the Vertebrate Tree
[Figure labels: Xen. laevis, WGD]

Translocation
• Involves two chromosomes that aren’t homologous
• Part of one chromosome is transferred to another chromosome

Nondisjunction
• Failure of chromosomes to separate during meiosis
• Causes a gamete to have too many or too few chromosomes
• Disorders:
  – Down Syndrome: three copies of chromosome 21
  – Turner Syndrome: a single X chromosome
  – Klinefelter’s Syndrome: XXY chromosomes

Chromosome Mutation Animation

The Species Tree
[Figure labels: S (speciation), sampled genomes, time axis]

A Gene tree evolves with respect to a Species tree
[Figure: gene tree drawn against the species tree; events: speciation, duplication, loss]

Terminology
Orthologs: genes related via speciation (e.g. C, M, H3)
Paralogs: genes related through duplication (e.g. H1, H2, H3)
Homologs: genes that share a common origin (e.g. C, M, H1, H2, H3)
[Figure: gene tree from a single ancestral gene, drawn against the species tree; events: speciation, duplication, loss]

Gene trees and even species trees are figments of our (scientific) imagination
Species trees and gene trees can be wrong. All we really have are extant observations and fossils.
[Figure: the same gene tree and species tree, with observed and inferred parts marked]

Gene Families
[Figure: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/orthologs3.gif]

Gu et al.
Age distribution of human gene families shows significant roles of both large-scale and small-scale duplication in vertebrate evolution (2002) Nature Genetics 31; 205-208

Chaining Alignments
Chaining highlights homologous regions between genomes (it bridges the gulf between syntenic blocks and base-by-base alignments).
Local alignments tend to break at transposon insertions, inversions, duplications, etc.
Global alignments tend to force non-homologous bases to align.
Chaining is a rigorous way of joining together local alignments into larger structures.

“Raw” Blastz track (no longer displayed)
Alignment = homologous regions
[Figure: browser view at Protease Regulatory Subunit 3]

Chains & Nets: How they’re built
• 1: Blastz one genome to another
  – Local alignment algorithm
  – Finds short blocks of similarity

Hg18: AAAAAACCCCCAAAAA
Mm8:  AAAAAAGGGGG

Hg18.1-6   + AAAAAA    Mm8.1-6 + AAAAAA
Hg18.7-11  + CCCCC     Mm8.1-5 - CCCCC
Hg18.12-16 + AAAAA     Mm8.1-5 + AAAAA

Chains & Nets: How they’re built
• 2: “Chain” alignment blocks together
  – Links blocks that preserve order and orientation
  – Not single coverage in either species

Hg18: AAAAAACCCCCAAAAA
Mm8:  AAAAAAGGGGGAAAAA
[Figure: Mm8 chains drawn under the Hg18 sequence; block labels: Mm8.1-6 +, Mm8.12-16 +, Mm8.7-11, Mm8.12-15 +, Mm8.1-5 +]

Another Chain Example
Ancestral Sequence: A B C D E
Human Sequence:     A B C D E
Mouse Sequence:     A B C B’ D E
[Figure: in the Human Browser, Mouse chains are drawn against the implicit Human sequence; in the Mouse Browser, Human chains are drawn against the implicit Mouse sequence]

Chains join together related local alignments
[Figure: chains at Protease Regulatory Subunit 3, annotated "likely ortholog", "likely paralogs", "shared domain?"]

Chains
• a chain is a sequence of gapless aligned blocks, where there must be no overlaps of blocks' target or query coords within the chain.
• Within a chain, target and query coords are monotonically nondecreasing (i.e.
always increasing or flat).
• double-sided gaps are a new capability (blastz can't do that) that allow extremely long chains to be constructed.
• not just orthologs, but paralogs too, can result in good chains. but that's useful!
• chains should be symmetrical -- e.g. swap human-mouse -> mouse-human chains, and you should get approx. the same chains as if you chain swapped mouse-human blastz alignments.
• chained blastz alignments are not single-coverage in either target or query unless some subsequent filtering (like netting) is done.
• chain tracks can contain massive pileups when a piece of the target aligns well to many places in the query. Common causes of this include insufficient masking of repeats and high-copy-number genes (or paralogs).
[Angie Hinrichs, UCSC wiki]

Before and After Chaining

Chaining Algorithm
Input: blocks of gapless alignments from blastz.
Dynamic program based on the recurrence relationship:
  score(B_i) = max_{j<i} ( score(B_j) + match(B_i) - gap(B_i, B_j) )
Uses Miller’s KD-tree algorithm to minimize which parts of the dynamic programming graph to traverse.
Timing is O(N log N), where N is the number of blocks (which is in the hundreds of thousands).

Netting Alignments
Commonly, multiple mouse alignments can be found for a particular human region, particularly for coding regions.
Net finds the best mouse match for each human region.
Highest-scoring chains are used first.
Lower-scoring chains fill in gaps within chains, inducing a natural hierarchy.

Net Focuses on Ortholog

Nets
• a net is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top, and their gaps filled in where possible by lower-scoring chains, for several levels.
• a net is single-coverage for target but not for query.
• because it's single-coverage in the target, it's no longer symmetrical.
• the netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal best process uses that output: the query-referenced (but target-centric / target singlecov) net is turned back into component chains, and then those are netted to get single coverage in the query too; the two outputs of that netting are reciprocal-best in query and target coords. Reciprocal-best nets are symmetrical again.
• nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level.
[Angie Hinrichs, UCSC wiki]

Before and After Netting
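
As a rough illustration of the recurrence on the Chaining Algorithm slide, here is a minimal Python sketch. It uses a plain O(N^2) dynamic program rather than the KD-tree speedup the slide mentions. The `Block` fields, the linear penalty in `gap_cost`, and the example coordinates (0-based, half-open, loosely modeled on the Hg18/Mm8 toy example) are assumptions made for illustration, not the scoring used by the UCSC tools.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """One gapless aligned block, half-open coordinates (hypothetical field names)."""
    t_start: int
    t_end: int
    q_start: int
    q_end: int
    match: float  # match(B_i): score of the block on its own


def gap_cost(prev: Block, cur: Block) -> float:
    """Assumed linear penalty on the target and query gaps between two blocks."""
    return 0.1 * ((cur.t_start - prev.t_end) + (cur.q_start - prev.q_end))


def best_chain(blocks: List[Block]) -> List[Block]:
    """Return the highest-scoring chain via a simple O(N^2) dynamic program."""
    if not blocks:
        return []
    order = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    score = [b.match for b in order]   # best chain score ending at block i
    prev = [-1] * len(order)           # back-pointer to the preceding block
    for i, cur in enumerate(order):
        for j in range(i):
            cand = order[j]
            # A predecessor must end before cur starts in both target and query.
            if cand.t_end <= cur.t_start and cand.q_end <= cur.q_start:
                s = score[j] + cur.match - gap_cost(cand, cur)
                if s > score[i]:
                    score[i], prev[i] = s, j
    # Trace back from the best-scoring end block.
    i = max(range(len(order)), key=score.__getitem__)
    chain = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]


blocks = [
    Block(0, 6, 0, 6, 6.0),      # target 1-6 aligned to query 1-6
    Block(11, 16, 11, 16, 5.0),  # target 12-16 aligned to query 12-16
    Block(11, 16, 0, 5, 5.0),    # target 12-16 aligned to query 1-5 (out of order)
]
print(best_chain(blocks))
```

In this toy input the first two blocks preserve order in both sequences and are joined into one chain, while the third block cannot follow the first because its query coordinates would go backwards. The quadratic inner loop is only for readability; the slide's O(N log N) timing relies on the KD-tree to avoid examining every predecessor.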