Download Document

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Molecular cloning wikipedia , lookup

Histone acetylation and deacetylation wikipedia , lookup

Ridge (biology) wikipedia , lookup

List of types of proteins wikipedia , lookup

Genomic imprinting wikipedia , lookup

Deoxyribozyme wikipedia , lookup

RNA polymerase II holoenzyme wikipedia , lookup

Gene desert wikipedia , lookup

Cre-Lox recombination wikipedia , lookup

Eukaryotic transcription wikipedia , lookup

Non-coding RNA wikipedia , lookup

Gene expression profiling wikipedia , lookup

Point mutation wikipedia , lookup

Community fingerprinting wikipedia , lookup

Gene regulatory network wikipedia , lookup

Gene expression wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

RNA-Seq wikipedia , lookup

Gene wikipedia , lookup

Molecular evolution wikipedia , lookup

Promoter (genetics) wikipedia , lookup

Non-coding DNA wikipedia , lookup

Genome evolution wikipedia , lookup

Transcriptional regulation wikipedia , lookup

Silencer (genetics) wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Transcript
Genome organization
Eukaryotic genomes are complex and
DNA amounts and organization vary
widely between species.
Genome Organization
G
C value paradox:
The amount of DNA in the haploid
cell of an organism is not related
to its evolutionary complexity or
number of genes.
Highly Repeated Sequences
There are different classes of eukaryotic DNA based on
sequence complexity.
Amount of DNA in a Genome Does
Not Correlate with Complexity
105
106
107
108
109
basepairs
1010
1011
1012
How many genes do humans have?
Original estimate was between 50,000 to 100,000 genes
We now think humans have ~ 20,000 genes
How does this compare to other organisms?
Mice have ~30,000 genes
Pufferfish have ~35,000
Nematodes (C. elegans), have ~19,000
Yeast (S. cerevisiae) has ~6,000
The microbe responsible for tuberculosis has ~4,000
Single Copy Sequences
Exome
Even the Amount of DNA a Gene Spans
Differs Among Species
Problems?
• Some gene products are RNA (tRNA,
rRNA, others) instead of protein
• Some nucleic acid sequences that do not
encode gene products (noncoding
regions) are necessary for production of
the gene product (protein or RNA).
• Eukaryotic genes are complex!
Gene Identification
• Open reading frames
• Sequence conservation
– Database searches
– Synteny
• Sequence features
– CpG islands
• Evidence for transcription
– ESTs, microarrays
• Gene inactivation
– Transformation, RNAi
Unique genes
Noncoding regions
• Regulatory regions
– RNA polymerase binding site
– Transcription factor binding sites
• Introns
• Polyadenylation [poly(A)] sites
Splice Sites
• Eukaryotes only
• Removal of internal parts of the newly
transcribed RNA.
• Takes place in the cell nucleus
• Splice sites difficult to predict
One gene, many proteins
via alternative splicing , 3’ cleavage and polyadenlyation
Exon Shuffling
Trans-Splicing in Higher
Eukaryotes
Gingeras, Nature (2009) 461, 206-211
Non-contiguous Transcription Generates
An Enormous Number of Possible
Transcripts
• Trans-splicing exists in higher eukaryotes as well as in lower ones like Trypanosomes
Six 2-exon co-linear
combinations from four
exons
325 combinations of 3exons, non-colinear
Blue: only co-linear
Red: all combinations
• Reassortment of exons coding for ncRNA or protein domains could dramatically
increase number of functional products beyond the number of ‘genes’
Gingeras, Nature (2009) 461, 206-211
Why genome size isn’t the only concern
(size doesn’t matter?)
• More sophisticated regulation of expression?
• Proteome vastly larger than genome?
– Alternate splicing
– RNA editing
• Postranslational modifications?
• Cellular location?
• Moonlighting
Gene families
• E.g. globins, actin, myosin
• Clustered or dispersed
• Pseudogenes
Pseudogenes
• Nonfunctional copies of genes
• Formed by duplication of ancestral gene,
or reverse transcription (and integration)
• Not expressed due to mutations that
produce a stop codon (nonsense or
frameshift) or prevent mRNA processing,
or due to lack of regulatory sequences
Duplicated genes
• Encode closely related (homologous) proteins
• Formed by duplication of an ancestral gene
followed by mutation
 Five functional genes and two pseudogenes
Paralogs vs Orthologs
• Different members of the globin gene family are paralogs,
having evolved one from another through gene duplication.
Paralogs are separated by a gene duplication event.
• Each specific gene family member (e.g. a specific gene in
human) is an ortholog of the same family member in another
species (e.g. mouse). Both evolved from an ancestral globin
gene. Orthologs are separated by a speciation event.
• It is not always easy to distinguish true orthologs from
paralogs , especially in polyploid organisms!
Protein - coding sequences less than
1.5% of the genome in humans!
Noncoding RNAs (ncRNA)
• Do not have translated ORFs
• Small
• Not polyadenylated
Functions of Known lncRNAs
• Transcriptional interference
-lncRNA transcription turns off transcription of nearby gene
• Initiation of chromatin remodeling
- lncRNA transcription turns on transcription of nearby gene
• Promoter inactivation
- lncRNA binds to TFIIB and to promoter DNA
• Activation of an accessory protein
- lncRNA binds to allosteric effector protein TLS and inhibits
histone acetyltransferase, decreasing transcription
Ponting et al, Cell (2009) 136, 629-641
Functions of Known lncRNAs
• Activation of transcription factors
- binding of lncRNA to Dlx2 activates Dlx5/6 activity
• Oligomerization of an accessory protein
- lncRNA induces heat shock factor trimerization
• Transport of transcription factors
-lnRNA NRON keeps NFAT out of nucleus
• Epigenetic silencing of gene clusters
-Xist RNA inactivates X chromosome
• Epigenetic repression of genes in trans
-HOTAIR binds PRC2, leading to methylation
and silencing of several genes in HOXD locus
Ponting et al, Cell (2009) 136, 629-641
ncRNA
• ~97-98% of the transcriptional output of the
human genome is ncRNA
– Introns
– Transfer RNAs (tRNA)
• ~ 500 tRNA genes in human genome
– Ribosomal RNAs
• Tandem arrays on several
chromosomes
• 150-200 copies of 28S – 5.8S – 18S
cluster
• 200-300 copies of 5S cluster
Genome Organization - ncRNA
• The level of transcription from human
chromosomes 21 and 22 is an order of
magnitude higher than can be accounted for
by known or predicted exons
• Almost half of all transcripts from wellconstructed mouse cDNA libraries are
ncRNAs (identified because they do not code
for an open reading frame of larger than 100
codons)
Repeat sequences – 50% or
more of the genome
Repetitive DNA
• Moderately repeated DNA
– Tandemly repeated rRNA, tRNA and histone genes
(gene products needed in high amounts)
– Large duplicated gene families
– Mobile DNA
• Simple-sequence DNA
– Tandemly repeated short sequences
– Found in centromeres and telomeres (and others)
– Used in DNA fingerprinting to identify individuals
Segmental duplications
• Found especially around centromeres and
telomeres
• Often come from nonhomologous
chromosomes
• Many can come from the same source
• Tend to be large (10 to 50 kb)
• Unique to humans?
Repetitive DNA - Segmental duplications
Mobile DNA
• Moves within genomes
• Most of the moderately repeated DNA
sequences found throughout higher
eukaryotic genomes
– L1 LINE is ~5% of human DNA (~50,000
copies)
– Alu is ~5% of human DNA (>500,000 copies)
• Some encode enzymes that catalyze
movement
Repetitive DNA – Highly repetitive
satellite DNA