Genome evolution: a sequence-centric approach
Lecture 7: Brief evolutionary history of everything
[Slide diagram: course overview]
(Probability, Calculus/Matrix theory, some graph theory, some statistics)
Probabilistic models: Simple Tree Models; HMMs and variants; PhyloHMM, DBN; Context-aware MM; Factor Graphs
Inference: DP; Sampling; Variational approximation; LBP
Parameter estimation: EM; Generalized EM (optimize free energy)
Genome structure; Mutations; Population; Inferring Selection
Genome Structure, Genome Information
Mutation
Genome structure
Genomic information
Selection
Diversity: Brief description of the tree of life
Genome structure: Size, Key features, Mobile elements
Genome information: Proteins/RNA genes, regulatory elements
Today: A lot of terminology, basic overview
[Figure: early history of life. Labels: ?; RNA-based genomes; ribosome, proteins, genetic code; DNA-based genomes]
3.4 – 3.8 BYA – fossils??
3.2 BYA – good fossils
3 BYA – methanogenesis
2.8 BYA – photosynthesis
..
..
1.7-1.5 BYA – eukaryotes
..
0.55 BYA – Cambrian explosion
0.44 BYA – jawed vertebrates
0.4 BYA – land plants
0.14 BYA – flowering plants
0.10 BYA – mammals
?
Membranes
Diversity!
Curated set of universal proteins
Eliminating lateral transfer
Multiple alignment and removal of bad domains
Maximum likelihood inference, with 4 classes of rate and a fixed matrix
Bootstrap validation
Ciccarelli et al. 2005
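The bootstrap validation step can be illustrated with a minimal sketch. It assumes the alignment is held as a list of equal-length strings; the function name `bootstrap_alignment` and the toy sequences are hypothetical and are not taken from Ciccarelli et al. Alignment columns are resampled with replacement to build one pseudo-replicate, which would then be passed to whatever tree-building step is in use.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[c] for c in cols) for row in alignment]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
toy_alignment = ["ACGTACGT", "ACGTTCGT", "ACCTACGA"]  # hypothetical data
replicate = bootstrap_alignment(toy_alignment, rng)
```

Support for a clade is then usually reported as the fraction of replicates whose inferred tree contains that clade.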
EUKARYOTES | PROKARYOTES
Presence of a nuclear membrane | (Also present in the Planctomycetes)
Organelles derived from endosymbionts | (Also in β-proteobacteria)
Cytoskeleton and vesicle transport | Tubulin-related protein, no microtubules
Trans-splicing | -
Introns in protein coding genes, spliceosome | Rare – almost never in coding
Expansion of untranslated regions of transcripts | Short UTRs
Translation initiation by scanning for start | Ribosome binds directly to a Shine-Dalgarno sequence
mRNA surveillance | Nonsense mediated decay pathway is absent
Multiple linear chromosomes, telomeres | Single linear chromosomes in a few eubacteria
Mitosis, Meiosis | Absent
Gene number expansion | -
Expansion of cell size | Some exceptions, but cells are small
Eukaryotes
Unikonts
Bikonts
Eukaryotes
Unikonts – one flagellum at some developmental stage
Fungi
Animals
Animal parasites
Amoebas
Bikonts – ancestrally two flagella
Green plants
Red algae
Ciliates, Plasmodium
Brown algae
More amoebas
Strange biology!
A big bang phylogeny: speciations across a short time span?
Ambiguity – and not much hope for really resolving it
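[Figure: Vertebrates. Fossil-based, large-scale phylogeny and phylogeny of sequenced genomes. Primates panel: Human, Chimp, Gorilla, Orangutan, Gibbon, Baboon, Macaque, Marmoset, with percentage labels 0.5%, 9%, 1.2%, 0.8%, 3%, 1.5%, 0.5% (assignment to branches not recoverable). Further panels: Flies, Yeasts.]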
Genome size
Why larger genomes?
• Selfish DNA
– Larger genomes are a result of the proliferation of selfish DNA
– Proliferation stops only when it becomes too deleterious
• Bulk DNA
– Genome content is a consequence of natural selection
– A larger genome is needed to allow larger cell size, larger nuclear membrane, etc.
Why smaller genomes?
• Metabolic cost: maybe cells lose excess DNA for energetic efficiency
– But DNA is only 2-5% of the dry mass
– No correlation between genome size and replication time in prokaryotes
– Replication is much faster than transcription (10-20 times faster in E. coli)
Mutational balance
• Balance between deletions and insertions
– May be different between species
– Different balances may have evolved
• In flies, yeast laboratory evolution
– 4-fold more 4kb spontaneous insertions
• In mammals
– More small deletions than insertions
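The idea of a balance between insertions and deletions can be made concrete with a small deterministic sketch. Everything here is an assumption for illustration: the function names, the per-generation rates, and the event lengths are hypothetical and not measured values, and selection is ignored entirely. The expected net change per generation is the insertion rate times the mean insertion length, minus the deletion rate times the mean deletion length.

```python
def expected_size_change(ins_rate, ins_len, del_rate, del_len):
    """Expected net change in genome size (bp) per generation."""
    return ins_rate * ins_len - del_rate * del_len

def project_genome_size(size_bp, generations, ins_rate, ins_len, del_rate, del_len):
    """Linear projection of genome size under a fixed mutational balance."""
    delta = expected_size_change(ins_rate, ins_len, del_rate, del_len)
    return max(0.0, size_bp + generations * delta)

# Hypothetical parameter sets: rare long insertions vs. frequent short deletions.
insertion_biased = dict(ins_rate=1e-3, ins_len=4000, del_rate=4e-3, del_len=20)
deletion_biased = dict(ins_rate=1e-3, ins_len=10, del_rate=3e-3, del_len=10)

grown = project_genome_size(1e8, 1e6, **insertion_biased)
shrunk = project_genome_size(1e8, 1e6, **deletion_biased)
```

The point of the sketch is only that the sign of the balance depends on both event rates and event lengths, so fewer but longer insertions can outweigh more frequent small deletions.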
Mutational hazard
Can we model genome size evolution in a quantitative way?
• No loss of function for inert DNA
– But is it truly not functional?
• Gain of function mutations are still possible:
– Transcription
– Regulation
Differences in population size may make DNA purging more effective for prokaryotes and small eukaryotes
Differences in regulatory sophistication may make the mutational hazard of DNA less of a problem for metazoans
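One way to quantify why population size matters for purging is Kimura's diffusion approximation for the fixation probability of a new mutation. This formula is standard population genetics rather than something stated on the slides, and the sketch assumes a diploid population whose effective size equals its census size `n`, with additive selection coefficient `s`; the function name is hypothetical.

```python
import math

def fixation_probability(s, n):
    """Kimura's approximation: P = (1 - exp(-2s)) / (1 - exp(-4ns)).

    s: selection coefficient (negative = deleterious)
    n: diploid population size (effective size assumed equal to n)
    """
    if s == 0:
        return 1.0 / (2 * n)  # neutral limit
    x = 4 * n * s
    if x > 0:
        return -math.expm1(-2 * s) / -math.expm1(-x)
    # For s < 0 the same ratio is rewritten to avoid overflow in exp(-x).
    return math.expm1(-2 * s) * math.exp(x) / -math.expm1(x)

s = -1e-5  # hypothetical, slightly deleterious extra DNA
small_pop = fixation_probability(s, 10**3) * 2 * 10**3   # relative to neutral
large_pop = fixation_probability(s, 10**6) * 2 * 10**6   # relative to neutral
```

With these assumed values the mutation behaves almost neutrally in the small population and is essentially never fixed in the large one, which is the intuition behind more effective purging in large populations.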
Genome Structural features: centromeres/telomeres
Human
Rat – Partly acrocentric
Centromeres are essential and universally important for proper cell division, but are highly divergent among species
Satellites and repeats
Pericentromeric regions – more repeats
Telomeres are critical for genome maintenance
Subtelomeric regions – also repetitive
May be key to nuclear structure?
Genome Structural features: nuclear organization
The nucleus must be organized to allow functional transcription and replication
Incredibly dense mesh of chromosomes, cytoskeleton, membranes
Transcription factories / chromosomal territories
“spacer DNA” may affect physical organization in unexpected ways
Inter- and Intra- chromosomal interactions
Entire genome may participate in regulating interactions
Genomic information: Protein coding genes
Modeling protein coding genes
Modeling protein structure/function
Structure is complex
Dependencies are not confined by the linear gene coding
http://predictioncenter.org/
Genomic information: the gene repertoire is evolving by duplication and loss
Genome information: Introns/Exons
Genome information: RNA genes
mRNA – messenger RNA. Mature gene transcripts after introns have been processed out of the mRNA precursor
miRNA – micro-RNA. 20-30 nt in length, processed from transcribed “hairpin” precursor RNAs. Regulate gene expression by binding nearly perfect matches in the 3’ UTR of transcripts
siRNA – small interfering RNAs. 20-30 nt in length, processed from double-stranded RNA by the RNAi machinery. Used for posttranscriptional silencing
rRNA – ribosomal RNA, part of the ribosome machine (with proteins)
snRNA – small nuclear RNAs. Heterogeneous set with function confined to the nucleus, including RNAs involved in the spliceosome machinery
snoRNA – small nucleolar RNA. Involved in the chemical modifications made in the construction of ribosomes. Often encoded within the introns of ribosomal protein genes
tRNA – transfer RNA. Delivers amino acids to the ribosome
piRNA - ???
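Taking the miRNA description above literally (binding nearly perfect matches in the 3’ UTR), a target search can be sketched as a scan for sites complementary to the miRNA with a bounded number of mismatches. This is a toy illustration, not a real target-prediction method: the function names, the 8-nt toy miRNA, and the UTR string are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Reverse complement of an RNA string (A/C/G/U only)."""
    return rna.translate(COMPLEMENT)[::-1]

def find_target_sites(mirna, utr, max_mismatches=1):
    """Return (start, mismatches) for UTR windows that pair with the miRNA."""
    site = reverse_complement(mirna)  # what a perfect target looks like, 5'->3'
    hits = []
    for start in range(len(utr) - len(site) + 1):
        window = utr[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(site, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits

toy_mirna = "UGAGGUAG"          # hypothetical short miRNA
toy_utr = "AAAACUACCUCAGGGG"    # hypothetical 3' UTR with one perfect site
perfect_hits = find_target_sites(toy_mirna, toy_utr, max_mismatches=0)
```

Raising `max_mismatches` relaxes the search from perfect to nearly perfect complementarity, at the cost of more spurious hits in longer sequences.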
miRNA clusters
snRNA works by binding other RNAs
RNA structure affects function
Computational perspective: finding and understanding RNAs and their evolution
Ultra-high throughput sequencing is transforming all aspects of biology
Genome information: regulatory elements
Specialized proteins can bind DNA in a sequence-specific fashion
Genomes can therefore control the level of affinity of each region to a large set of DNA binding proteins
DNA binding sites are typically short (<20 bp)
Multiple binding sites at different affinities participate in regulation
Computational perspective: finding and understanding TFBSs
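A common computational starting point for finding binding sites of differing affinity is to score every window of a sequence against a position weight matrix. The sketch below assumes a made-up 4-bp motif and a uniform background; the matrix values, names, and test sequence are hypothetical and do not describe any real transcription factor.

```python
import math

# Hypothetical position probability matrix for a 4-bp site favouring "TGAC".
MOTIF = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform base composition

def log_odds(site):
    """Log2 odds of the site under the motif versus the background."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(MOTIF, site))

def scan(sequence):
    """Score every window of motif length along the sequence."""
    k = len(MOTIF)
    return [(i, log_odds(sequence[i:i + k])) for i in range(len(sequence) - k + 1)]

scores = scan("CCTGACCCTGTC")  # one perfect site and one weaker variant
best = max(scores, key=lambda hit: hit[1])
```

Keeping the full score profile, rather than only the top hit, reflects the point that several sites of different affinity may contribute to regulation.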
The regulatory process is likely to be less deterministic and discrete than this beautiful, idealized sea urchin regulatory network
Each regulatory interaction is parameterized, and many additional weak interactions participate in the process
Evolution of regulatory regions involves more than a small set of discrete 20 bp sites
Chromatin immunoprecipitation (ChIP) maps DNA binding sites
Structure meets information: packaging and chromosomal interactions are critical for proper genome function
Structure meets information: HOX clusters as an example
Hox genes are important developmental regulators
Present in linear clusters, preserving order
Their expression is frequently coordinated with the gene order
4 HOX clusters are present in the human genome
Additional gene clusters: Protocadherins, Olfactory receptors, MAGE genes, Zinc fingers
Additional smaller groups of related regulators are co-located
Mapping chromosomal interactions: 4C
Repeats: selfish DNA
Repetitive elements in the human genome
Class | Copies | Genome fraction
LINEs | 868,000 (only ~100 active!!) | 20.4%
SINEs | 1,558,000 (70% Alu) | 13.1%
LTR elements | 443,000 | 8.3%
Transposons | 294,000 | 2.8%
Retrotransposition via RNA
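As a quick worked check of the figures above, the listed fractions can be summed and the Alu share converted into a copy count. The dictionary layout and variable names are just one hypothetical way to hold the numbers; the values themselves are those given in the list.

```python
# (copies, percent of genome) as listed for each repeat class
repeats = {
    "LINEs": (868_000, 20.4),
    "SINEs": (1_558_000, 13.1),
    "LTR elements": (443_000, 8.3),
    "Transposons": (294_000, 2.8),
}

total_fraction = sum(fraction for _, fraction in repeats.values())
total_copies = sum(copies for copies, _ in repeats.values())
alu_copies = round(0.70 * repeats["SINEs"][0])  # 70% of SINEs are Alu
```

The four classes together account for about 44.6% of the genome, and the Alu family alone for roughly 1.09 million copies.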
Repeats: short tandems, satellites
DNA-based transposons do not involve an RNA intermediate, and are quite rare.
Satellite DNA is duplicated by replication slippage, which is enhanced for specific sequences. Satellites are abundant near telomeres and centromeres. Some of these are still a mystery.
Retrotransposition is generally sloppy and noisy, so elements die out quickly.
Element proliferation appears in evolutionary bursts.
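Short tandem repeats of the kind produced by replication slippage can be located with a simple pattern search. The following is a minimal sketch, assuming an uppercase DNA string and exact (uninterrupted) repeats; the function name, the default thresholds, and the toy sequence are hypothetical, and real tandem-repeat finders tolerate mismatches.

```python
import re

def find_tandem_repeats(seq, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for exact tandem arrays in seq."""
    # A lazily matched unit of 1..max_unit bases, then the same unit repeated.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

toy_hits = find_tandem_repeats("TTCACACACACAGG")  # hypothetical sequence
```

Because the unit is matched lazily, the shortest repeating unit is reported, so a run of "CACACA" is described as copies of "CA" rather than of "CACA".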
Pseudogenes
Genes that are becoming inactive due to mutations are called pseudogenes
mRNAs that jump back into the genome are called processed pseudogenes (they therefore lack introns)
Summary
• History/Phylogeny
– Early phylogenetics can be inferred using genome sequences, but conclusions are not always reliable
– Maximum likelihood models sometimes depend on the gene/genomic region analyzed; the genome is highly heterogeneous at all levels
– The major clades, phylogeny of model organisms and sequenced genomes
• Genome structure
– Size and its consequences
– Packaging and nuclear organization
– Mutational effects and differences
– Selfish DNA
• Genome information
– Protein coding genes
– RNA genes
– Transcription factor binding sites
– Chromosomal organization and DNA codes that affect it