Welcome to Comp 665 - UNC Computational Genetics
Effective Population Size
• Real populations don’t satisfy the Wright-Fisher model.
• In particular, real populations exhibit reproductive structure, due to either geography or societal constraints.
• The number of descendants in a generation depends on many factors (health, disease, etc.), as opposed to the implicit Poisson model.
• Population size isn’t fixed, but changes over time.

5/1/2017 Comp 790– Continuous-Time Coalescence

Sanity Check
• When the Wright-Fisher model, or the basic coalescent, is used to model a real population, the size of the population (2N) cannot be taken literally.
• For example, many human genes have an MRCA less than 200,000 years ago.
• If we assume one generation per 20 years, then there have been 200,000/20 = 10,000 generations.
• Recall that the average time to the MRCA is 2 in population-scaled time (one unit is 2N generations), so 10,000 generations = 2 × 2N, which gives 2N = 10,000/2 = 5,000 and an effective population size of N = 2,500.

Effective Population Size
• Without an estimate of an MRCA one can still use coalescence to find the effective population size.
• Recall that for the discrete coalescent, the expected time for two genes to find an MRCA is $E(T_2) = 2N$.
• Thus, $N_e = \frac{E(T_2)}{2}$.
• This equation would be applied after tracing many paths of gene pairs, and $E(T_2)$ would be measured in actual generations rather than in the normalized time of the continuous coalescent (where t = 1.0 corresponds to 2N generations).

Moran Model
• In 1958 Moran proposed an alternative to the Wright-Fisher model in which reproductive generations overlap.
• The central idea is that each epoch represents two events: the loss of one gene and its replacement by another.
• This rules out multiple coalescent events between epochs.

Moran Formulation
• The probability that 2 genes share a common ancestor in the previous generation, $P(T_2 = 1)$, is $P(T_2 = 1) = \frac{1}{N(2N-1)}$, because only one of the $\binom{2N}{2} = N(2N-1)$ pairs has a common ancestor.
• This gives a geometric distribution with parameter $\frac{1}{N(2N-1)}$ and a natural time scale of $N(2N-1)$ (compared to 2N for the Wright-Fisher model).

Moran Use
• When adjusted for differences in time scale, the basic “continuous” coalescent holds for the Moran model as well.
• The Moran model often leads to a more tractable computation than the Wright-Fisher model.
• The basic “continuous coalescent” is robust to the actual population model, whether it is haploid or diploid, Wright-Fisher or Moran. It is therefore commonly used as a first-order approximation for making estimates about population structure, such as how many variations one should expect in a sample of size N and how long such divergences have existed.

Dirty Details
• Thus far, we’ve considered very simple, and admittedly oversimplified, models of biological and genetic processes.
• Next we’ll discuss many of the biological realities that the coalescent model either crudely approximates or entirely ignores.
• We also want to move from our simple gene-centric view to a more complete organism.

Terminology
• Gene: A unit of information transferred from one generation to the next.
• Allele: An alternative form of a gene; information that comes in two or more forms.
• SNP (Single Nucleotide Polymorphism): A position in a DNA sequence that can be found in more than one of the 4 nucleotide states (A, C, G, T). SNPs are one type of allele.
• Haplotype: A subsequence of DNA that includes only positions known to vary (SNPs).

Causes of Genetic Variation
• Mutation: Changes in the genetic material of an organism. Events that actually modify genes, potentially generating new alleles.
• Recombination: A process in which new gene combinations are introduced
  – Crossovers, Gene Conversion, Lateral Gene Transfer
• Structural Rearrangement: Modifications that impact the number of old gene copies and their relative orderings
  – Insertions, Deletions, Inversions

Mutations
• There are many ways of altering a gene, some common and some rare
  – Environmental exposure (radiation, chemical, etc.)
  – Random events (faulty DNA replication, other malfunctions of biochemical machinery)
• Many mutations affect cells of higher organisms without genetic ramifications (mutations of the so-called somatic cells), but they may be important to the organism (e.g., lead to cancer).
• Mutations of the germline (gamete) cells are those of genetic interest because they impact the life of genes, as opposed to the life of their protective organism.

Sequence Organization
• The DNA sequence is broken into several independent segments organized into structures called chromosomes.
• Chromosomes vary between different organisms. The DNA molecule may be circular or linear, and can contain from 10,000 to 1,000,000,000 nucleotides.
• Simple single-cell organisms (prokaryotes, cells without nuclei such as bacteria) generally have smaller circular chromosomes, although there are many exceptions.
• More complicated cells (eukaryotes, with nuclei) have linear DNA molecules that are broken into segments and wound around special proteins. The aggregates are called chromosomes.

Monoploid Number
• The number of fragments that the DNA is broken into determines the number of distinct chromosomes. This number is called the monoploid number.

Organism      Unique Chromosomes
Human         23
Chimpanzee    24
Mouse         20
Dog           39
Horse         32
Donkey        31
Hare          23

Diploidy and Polyploidy
• Having only one copy of DNA is a risky proposition, since the loss of a single functional gene could lead to a bad outcome.
• Evolution has addressed this obvious shortcoming by incorporating a mostly redundant copy of the entire sequence in most cells.
• The haploid number is the number of chromosomes in a gamete of an individual.
• Nearly all mammals are diploid and receive a homologous sequence from each parent.
• Many plants carry more than 2 copies of their sequence; 4 and 8 are typical, and the number can vary between subspecies.
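To make the relationship between monoploid number, ploidy, and chromosome counts concrete, here is a minimal sketch. It assumes the usual relation that a somatic cell carries (ploidy × monoploid number) chromosomes and a gamete carries half of that. The function names and the tetraploid example are illustrative assumptions, not taken from the slides.

```python
# Monoploid numbers for a few organisms from the table above.
MONOPLOID = {"Human": 23, "Chimpanzee": 24, "Mouse": 20}


def somatic_count(monoploid: int, ploidy: int = 2) -> int:
    """Chromosomes in a somatic cell: one full set per genome copy."""
    return monoploid * ploidy


def haploid_number(monoploid: int, ploidy: int = 2) -> int:
    """Chromosomes in a gamete: half of the somatic count."""
    if ploidy % 2 != 0:
        raise ValueError("this sketch only handles even ploidy")
    return somatic_count(monoploid, ploidy) // 2


# Diploid mammals: the haploid number equals the monoploid number.
for organism, x in MONOPLOID.items():
    print(organism, somatic_count(x), haploid_number(x))

# Hypothetical tetraploid plant with monoploid number 7 (an assumed example):
# its gametes carry two chromosome sets, so the haploid number is 14.
print(somatic_count(7, ploidy=4), haploid_number(7, ploidy=4))
```

For diploid organisms the two numbers coincide, which is why the table can be read either way for mammals; they differ only once the ploidy exceeds 2.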
Crossover Recombination
• In the formation of gametes (sperm and ovum), homologous DNA strands are combined in a process called crossover.
• This effectively combines the prefix of one sequence with the suffix of another.

Gene Conversion Recombination
• The DNA sequence is transferred from one copy (which remains unchanged) to another, whose sequence is altered.
• Results from the repair of damaged DNA as described by the Double Strand Break Repair Model.

Lateral Gene Transfer
• Any process in which an organism incorporates genetic material from another organism without being the offspring of that organism.
• Lateral (horizontal) gene transfer is a confounding factor in inferring phylogenetic trees based on sequences.
• One of the most prevalent forms of recombination in “early” evolution.

Structural Rearrangements
• Large scale structural changes (deletions/insertions/inversions) may occur in a population.

Wi’07 Vineet Bafna
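The prefix/suffix description of crossover, and the one-way copy described for gene conversion, can be sketched on strings. This is a simplified illustration, not a biological model: it assumes two homologous sequences of equal length, a single crossover point, and a contiguous conversion tract. The function names and the toy sequences are hypothetical.

```python
def crossover(a: str, b: str, point: int) -> str:
    """Return the recombinant made of a's prefix and b's suffix."""
    if len(a) != len(b):
        raise ValueError("homologous sequences are assumed to have equal length")
    if not 0 <= point <= len(a):
        raise ValueError("crossover point out of range")
    return a[:point] + b[point:]


def gene_conversion(donor: str, recipient: str, start: int, end: int) -> str:
    """Copy donor[start:end] over the same positions of the recipient.

    The donor is left unchanged; only the returned recipient copy is altered.
    """
    if len(donor) != len(recipient):
        raise ValueError("homologous sequences are assumed to have equal length")
    if not 0 <= start <= end <= len(donor):
        raise ValueError("conversion tract out of range")
    return recipient[:start] + donor[start:end] + recipient[end:]


# Toy homologous sequences (assumed for illustration only).
maternal = "AAAAAAAA"
paternal = "CCCCCCCC"

print(crossover(maternal, paternal, 3))           # AAACCCCC
print(gene_conversion(maternal, paternal, 2, 5))  # CCAAACCC
```

Note the difference in outcome: crossover changes everything after the breakpoint, while gene conversion overwrites only a short internal tract and leaves the flanking sequence of the recipient intact.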