Junk DNA and
DNA editing
Saturday night, 13 Iyar
Shai Carmi
Bar-Ilan, BU
Genome structure
• DNA has mostly evolved to store the
code of the proteins its host cell is using.
• Thus, the main functional
units of any genome are
protein-coding genes.
• The central dogma of
molecular biology:
DNA → RNA → Protein
Final product:
Proteins are the
cellular machinery
20 amino acids
Does man have no advantage over the beast?
• In humans, protein-coding sequences are only 2% of the genome.
• All animals have the same order
of magnitude of genes
(few tens of thousands).
• Does non-coding DNA
determine complexity?
• Is everything else junk?
Non-coding DNA
• The rest codes for introns, promoters and enhancers
(regulation of expression), structural sequences (e.g.
telomeres), non-coding RNAs such as rRNA and tRNA
(translation), micro-RNA (silencing), snRNA (splicing).
• But this is not all!
• Almost HALF of the human genome is made of
mobile elements.
• Pieces of ~100-10k base pairs that move around the
genome by cut&paste or copy&paste mechanisms.
DNA transposons
• DNA transposons:
cut&paste using the
enzyme transposase
(3% of the genome).
• Sometimes also transfer
host sequences.
• Increase genome size
only through repeats at the edges
or if transposition happens during S-phase.
• Retrotransposons: copy&paste mechanism through
RNA intermediate.
• Main classes:
▪ LTR (retrovirus-like, 8.7% of the genome).
▪ LINE (long interspersed nuclear elements, 21.3%).
▪ SINE (short interspersed nuclear elements, 13.6%).
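As a quick arithmetic check, the percentages quoted on these slides can be added up. The sketch below only restates the slide figures; the variable names are illustrative.

```python
# Genome fractions (percent of the human genome) as quoted on the slides.
dna_transposons = 3.0
retrotransposons = {"LTR": 8.7, "LINE": 21.3, "SINE": 13.6}

# Retrotransposons alone, then all mobile elements together.
retro_total = sum(retrotransposons.values())
mobile_total = retro_total + dna_transposons

print(f"Retrotransposons: {retro_total:.1f}% of the genome")
print(f"All mobile elements: {mobile_total:.1f}% of the genome")
```

The totals come to 43.6% for retrotransposons and 46.6% for all mobile elements, consistent with "almost half" of the genome.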
• Retrotransposons behave like retroviruses.
• What are retroviruses?
• Retroviruses are pieces of (ss) RNA (DNA in other viruses)
wrapped in a capsid and envelope.
Few thousand bases
• They penetrate into the cell, and use the cell machinery to
replicate, assemble a new virus, and infect another cell.
• Example: HIV.
Retroviral infection
Retroviral proteins (advanced)
• Pol: encodes a polyprotein with:
▪ Protease (cleavage of the retrovirus proteins).
▪ Reverse transcriptase (copies the RNA to DNA).
▪ RNase H (degradation of RNA after reverse transcription).
▪ Integrase (integration of the DNA into the genome).
• Gag: Codes for core and structural proteins of the virus.
• Env: glycoprotein that recognizes membrane receptors
of the host cell and initiates the process of infection.
• Complex splicing pattern, with partial overlap and
• It is commonly believed that ancient retroviral infections
in the germ line are the origin of present-day retrotransposons.
• How did they occupy 40% of the genome?
1. Transcription: genomic DNA→RNA.
2. Translation of viral proteins (if possible).
3. Reverse transcription: RNA → DNA by reverse transcriptase.
RETRO: violating the central dogma!
4. Insertion into new genomic locations, increasing the
number of genomic copies of the sequence.
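The copy&paste cycle above can be sketched as a toy simulation. This is a minimal illustration, not a biological model: the sequences, the function names, and the uniformly random insertion site are all assumptions, and step 2 (translation of viral proteins) is skipped.

```python
import random

def transcribe(dna):
    """Step 1: genomic DNA -> RNA (sense strand, T written as U)."""
    return dna.replace("T", "U")

def reverse_transcribe(rna):
    """Step 3: RNA -> DNA copy (U written back as T)."""
    return rna.replace("U", "T")

def retrotranspose(genome, element, rng):
    """Copy one occurrence of `element` through an RNA intermediate and
    insert the copy at a random position (step 4). The source copy stays."""
    start = genome.find(element)
    if start < 0:
        return genome  # no intact copy left to transcribe
    rna = transcribe(genome[start:start + len(element)])
    dna_copy = reverse_transcribe(rna)
    site = rng.randrange(len(genome) + 1)  # assumed: uniform insertion site
    return genome[:site] + dna_copy + genome[site:]

rng = random.Random(0)
element = "GATTACA"                      # hypothetical element sequence
start_genome = "CCCC" + element + "GGGG"  # hypothetical toy genome
genome = start_genome
for _ in range(3):
    genome = retrotranspose(genome, element, rng)

print(len(start_genome), "->", len(genome), "bases")
```

Each round adds one element length to the genome, which is how a copy&paste element can come to occupy a large share of it. An insertion may land inside an older copy and disrupt it, so the number of intact copies can grow more slowly than the genome does.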
• Mobile elements are like a double-edged sword.
Why are retrotransposons good?
• Serve as a reservoir of sequences
for genetic innovation.
• Retroviral proteins have DNA binding capabilities
which can be exploited by the host cell.
• Regulate expression levels of existing genes.
• Change gene regulation networks:
• By copying a promoter, two sequences are
controlled by the same transcription factors (or in
other cases by RNA binding proteins or miRNA).
Why are retrotransposons bad?
• Retroelements generate mutations,
through direct insertion into genes,
or unequal homologous recombination.
• Responsible for 0.3-0.5% of all genetic disorders (e.g.
• Change the normal transcription of the gene (alter
promoter activity, anti-sense transcription, silencing
via methylation or miRNA binding).
• Alternative splicing and protein isoforms.
How can we stop them?
Inhibition of retroelements
A few mechanisms exist:
• Accumulation of mutations results in non-autonomous
• Methylation and heterochromatin formation attenuate
transcription (LINE).
• RNA interference.
• DNA editing (more to come).
• Did we succeed? Probably we did:
(1) Here we are, more complex than any other organism.
(2) Most elements are inactive: only Alu and L1 are active, with
an insertion once in 100 births.
Basics of DNA editing
• The APOBEC3 family of proteins was found to restrict
retroviral replication. One of its mechanisms of operation
is “cytosine deamination of the (-) DNA strand
after reverse transcription”. Meaning…
• APOBEC catalyzes a chemical modification (C→U) of the
DNA just before it is integrated into the genome,
eventually generating G→A mutations (editing).
• (localization varies nucleus/cytoplasm).
• Inducing tens/hundreds of mutations (uracil excision?).
• Editing itself is not sufficient to stop replication; other
mechanisms are also used.
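Why a C→U change on the (-) strand shows up as G→A on the (+) strand can be sketched in a few lines. This is a toy model under stated assumptions: every C on the (-) strand is edited independently with the same probability, the sequence is hypothetical, and the function names are illustrative. Real APOBEC enzymes have sequence-context preferences that are not modeled here.

```python
import random

# U pairs with A, so a deaminated C (now U) templates an A on the other strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}

def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def edit_minus_strand(minus, rate, rng):
    """Deaminate each C on the (-) strand to U with probability `rate`."""
    return "".join(
        "U" if base == "C" and rng.random() < rate else base
        for base in minus
    )

plus = "ATGGCTGGAGGCTAA"          # hypothetical (+) strand sequence
minus = reverse_complement(plus)  # (-) strand made by reverse transcription
edited_minus = edit_minus_strand(minus, rate=0.5, rng=random.Random(1))
edited_plus = reverse_complement(edited_minus)  # (+) strand copied from edited (-)

print(plus)
print(edited_plus)
```

Every position where the two printed strands differ is a G in the original and an A in the edited copy, because only a C on the (-) strand, which pairs with a G on the (+) strand, can be deaminated.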
Evolution of APOBEC
• APOBEC3G is one of the most positively selected
genes (=changes the fastest).
• Ongoing arms race with HIV.
• In response to APOBEC,
HIV developed the Vif protein, which targets APOBEC for
ubiquitination (=sends it to “recycling” in the proteasome).
• Different APOBECs restrict retroviruses/transposons
by different mechanisms (e.g., binding to RNA and
blocking reverse transcription).
DNA editing in the genome
• Some retrotransposons were edited by APOBEC, yet
still integrated into the genome.
• New mechanism of mutagenesis.
• So far, almost neglected by geneticists.
• Together with Erez Levanon, HMS.
• Analyzed retroelements in
mouse, human and chimp,
applying a new statistical approach.
Main results
• Editing has fingerprints in
thousands of mouse IAP/MusD
retroelements, with distinguished
• Predicting hundreds of thousands of editing sites.
• Edited IAPs are transcribed more than non-edited.
• Some edited IAPs overlap with introns and exons.
• The phylogenetic tree can change when editing
information is considered.
• Editing also in non-LTR, LINE mouse elements.
• Editing in human and chimp HERV retroelements.
DNA editing demonstration
• Comparing two mouse
• One cluster of 68
consecutive G→A!
• Total 176/202
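Counting how many mismatches between two aligned elements are G→A can be sketched as below. The two sequences are short hypothetical stand-ins, not the mouse data from the slide, and the function name is illustrative; the sketch also assumes the sequences are already aligned and of equal length.

```python
from collections import Counter

def mismatch_spectrum(reference, candidate):
    """Count each (reference base, candidate base) mismatch type
    between two aligned, equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must be aligned to the same length")
    return Counter((r, c) for r, c in zip(reference, candidate) if r != c)

unedited = "GGATGGCAGTGGA"  # hypothetical unedited element
edited = "AAATAACAGTAGC"    # hypothetical edited copy

spectrum = mismatch_spectrum(unedited, edited)
g_to_a = spectrum[("G", "A")]
total = sum(spectrum.values())
print(f"G->A mismatches: {g_to_a}/{total}")
```

A mismatch spectrum dominated by a single type, as in the toy pair here (5 of 6), is the kind of signature that points to editing rather than random mutation.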
Easily available raw
material for the
generation of new
(for example: any
editing in TGG creates
a premature stop codon).
accelerate evolution?
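The TGG example can be checked by enumeration. The sketch below assumes the standard genetic code, in which TGG encodes tryptophan and TAA, TAG and TGA are the stop codons; the function name is illustrative.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet

def edited_variants(codon):
    """All codons reachable from `codon` by changing one or more G to A."""
    g_positions = [i for i, base in enumerate(codon) if base == "G"]
    variants = set()
    for choice in product([False, True], repeat=len(g_positions)):
        if not any(choice):
            continue  # skip the unedited codon itself
        bases = list(codon)
        for pos, is_edited in zip(g_positions, choice):
            if is_edited:
                bases[pos] = "A"
        variants.add("".join(bases))
    return variants

variants = edited_variants("TGG")
print(sorted(variants))         # ['TAA', 'TAG', 'TGA']
print(variants <= STOP_CODONS)  # True
```

Whichever of the two G's is edited, or both, the tryptophan codon becomes a stop codon.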
DNA editing phylogenetics
If two sequences are the same except
for G→A mutations, the sequence with
‘G’ must precede the one with ‘A’.
Thus we can build the tree of elements.
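The ordering rule can be written as a small pairwise test. This is a sketch of the rule as stated on the slide, not the statistical method used in the study; the function names and toy sequences are hypothetical, and the sequences are assumed to be aligned.

```python
def differs_only_by_g_to_a(ancestor, descendant):
    """True if `descendant` equals `ancestor` except for at least one
    position where a G in `ancestor` is an A in `descendant`."""
    if len(ancestor) != len(descendant):
        return False
    diffs = [(a, d) for a, d in zip(ancestor, descendant) if a != d]
    return bool(diffs) and all(pair == ("G", "A") for pair in diffs)

def editing_order(seq_x, seq_y):
    """Return (earlier, later) if the rule orders the pair, else None."""
    if differs_only_by_g_to_a(seq_x, seq_y):
        return (seq_x, seq_y)
    if differs_only_by_g_to_a(seq_y, seq_x):
        return (seq_y, seq_x)
    return None  # identical, or differing by something other than G->A

print(editing_order("TAGCA", "TGGCG"))  # the all-G version comes first
print(editing_order("TGGC", "TGGT"))    # not ordered by this rule
```

Applying the test to every pair of elements gives directed edges from which a tree of elements could be assembled.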
genetic tree.
Same tree,
masking the
Editing affects phylogenetics!
• A significant fraction of the DNA originates from infection by
ancient RNA viruses, spreading through the genome by
reverse transcription and replication.
• Some of them were ‘domesticated’ to benefit the host cell (not
really junk!), but some induce deleterious mutations.
• One of the mechanisms to restrict retrotransposition is
editing them before integration into the genome.
• Many genomic sites are ‘edited’
due to this restriction activity.
• New mechanism of mutagenesis,
potentially leading to the evolution of
new molecules or functions
(for example, HIV drug resistance).