Isochores and Genes: Who’s in the Driver’s Seat?

Genes and Isochores:
A Game of Chicken?
William H. Press
UT Austin
Harlan Robins
Fred Hutchinson Cancer Research Center
our convention
is to plot AT up
What are “Isochores”?
• Megabase regions in the genome of
dominant A+T or G+C
• Discovered chemically, before
sequencing! (Bernardi 1980s)
• In all mammals, birds, some
crocodiles, turtles (~300 Ma)
• (Other organisms are in effect 100%
AT isochore.)
• Probably not still forming:
decreasing in mammals last ~200
Ma (Belle et al.)
• No obvious correlations between
gene function and isochores.
Fish: no isochores
Human isochore map has evident regularities.
But are they significant or only anecdotal?
Pioneer 10: first representation of
humans to leave the solar system
• chromosome ends almost
always CG isochores
– much larger than telomeres
– exceptions chr 9, 21
• whole chromosomes can be
CG (16, 17, 19)
• interior CG isochores
seemingly random
Genes also divide into two populations
by AT vs. CG richness
• Clearest separation for
3’UTRs
– by A+T vs. C+G
• But also find “cigar-like”
distributions for:
– 3rd codon usage
– 5’ UTR
– intron
– upstream and downstream regions
• As if a gene “likes” to be
either AT or CG
• Effect seen only in
species with isochores
Robins & Press (2005)
Nucleotide fraction maps to the interior
of a tetrahedron (since A+C+G+T=1)
• Vertices are 100% a
single base
• Edges linear
combination of two
bases, etc., etc.
• Shown here are the 3’
UTRs for 3000 random
human genes.
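A minimal sketch of the tetrahedron mapping, assuming a regular tetrahedron with an arbitrary assignment of bases to vertices (the slide does not say which base sits at which vertex). The function names `base_fractions` and `tetrahedron_point` are hypothetical. Because the four fractions sum to 1, they act as barycentric weights on the four vertices.

```python
# Vertices of a regular tetrahedron centered on the origin.
# ASSUMPTION: the base-to-vertex assignment is arbitrary, not taken from the talk.
VERTICES = {
    "A": (1.0, 1.0, 1.0),
    "C": (1.0, -1.0, -1.0),
    "G": (-1.0, 1.0, -1.0),
    "T": (-1.0, -1.0, 1.0),
}


def base_fractions(seq):
    """Return the fractions of A, C, G, T in a sequence (other symbols are ignored)."""
    seq = seq.upper()
    counts = {b: seq.count(b) for b in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {b: counts[b] / total for b in "ACGT"}


def tetrahedron_point(fractions):
    """Map fractions (summing to 1) to a 3-D point: the weighted average of the vertices."""
    return tuple(
        sum(fractions[b] * VERTICES[b][k] for b in "ACGT") for k in range(3)
    )
```

A sequence made of a single base lands on that base's vertex, a sequence with only two bases lands on the edge joining them, and equal fractions of all four bases land at the center.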
To good approximation there are 3, not 4,
populations of genes in isochores:
iAT/AT, iCG/AT, and iCG/CG
In other words: few, if any, CG-rich genes in AT isochores!
We found significant functional differences
between AT and CG genes in GO word counts
Robins & Press (2005)
• AT rich genes are preferentially “early” processes: transcription,
translation, mRNA
• CG rich genes are preferentially “late” processes: signaling cascade,
receptor, membrane
We do not find significant functional differences
between iAT and iCG isochores, but we did
between AT and CG genes! How can this be?
iCG isochores contain a mixture
of CG and AT genes, so the GO
signal averages out.
iAT isochores contain predominantly
AT genes, but we find no functional
signal in AT isochores
Hypothesis: In CG isochores, some
genes resisted isochore formation for
functional reasons. Hence strong GO
signal. In AT isochores, the genes were
“never challenged”, and we see a “pre-isochore” mixture of the two functional
populations.
We can look for correlation between gene AT
richness and that of the flanking intergenic region
genes “lead
the charge”
genes “resist”
Results for human genes
CG genes strongly leading (not
lagging) in CG isochores
AT genes (in both AT and CG
isochores) weakly leading
(slope), but barely differ from
surround
Offset from zero due to AT
leading, or to slight functional
AT richness bias?
Amino Acid frequencies show that CG pressure in
the genome affected the proteome, too
• AA’s are overrepresented in CG rich genes (measured by synonymous 3rd
codon) iff their genetic code has C and/or G rather than A and/or T
• Essentially 100% correlation if you break ties by A>G>C>T
[Figure axis: 3rd codon decile, from AT rich to CG rich]
The effect on the proteome is even clearer if we
look at fish orthologs to human-iCG/CG genes
• Indicates 1st and 2nd
position codon changes
(we already know 3rd
codons changed)
• Expect to see small
number of near-neutral
substitutions
• Instead, see large
numbers of substitutions
that make little
biochemical sense
• E.g., Pro +20%, Ala +19%, Asn -21%, Ile -18%
underlying diagram after Betts and Russell (2003)
But they make sense if the pressure is to increase CG in
the genome despite any fitness cost in the proteome
27 changes turn
A,T to C,G
5 (italics)
are neutral
zero turn C,G
to A,T
Chickens have isochores, and
two gene populations, too
[Figure, two slides to be blinked: each dot is a gene, height is AT fraction, color is isochore type.
Slide 1: color: human, height: chicken. Slide 2: color: human, height: human.]
What you should have seen in the
blinking of the two previous slides
• Almost all short chicken chr’s are iCG in both human and chicken
– sometimes AT/iCG are same genes (chr 13) but more often independent
• Almost all human iCG in long chicken chr’s are chicken iAT (chr 1,2,3)
• When chicken chr ends are human iAT, they are often chicken iCG
(chr 2R, 4L, 8R, 9R)
[Figure panels: color: human, height: chicken; color: human, height: human]
Compare human isochore map to painting some
chicken chromosomes onto human genome
paint blue chr 5, 10-32
paint red all other chr
paint blue AT isochores
paint red CG isochores
• many interior isochores “explained”
• many end isochores not “explained”
• whole chromosomes 16, 17 “explained”, 19 ambiguous
Human iCG that are chicken iAT are at human, but
not chicken, chromosome ends
human chr ends
shown green
color: human
height: chicken
Can also ask whether same or different genes
became CG in chicken and human
use 4-component Gaussian mixture model
to find the proportions:
plug into a model for correlated and
independent fractions:
solving for the model parameters gives:
• So, of all orthologous genes, 40% were “marked” in the common ancestor
to become CG rich in both species
• Of the remaining 60%, 15% independently became CG rich in human, 15%
independently in chicken (agreement of values coincidental)
• Hypothesis is that this reflects new chromosome ends exposed after
divergence
• “Marked” (e.g., by ends) not necessarily same as “realized” (e.g., by
mutations)
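The equations for the correlated/independent model did not survive in this transcript, so what follows is only a sketch of one model consistent with the description, not necessarily the one used in the talk. Assume a fraction m of orthologs is marked in the common ancestor and becomes CG rich in both species, and that each unmarked gene independently becomes CG rich with probability h in human and c in chicken. The four observed proportions (for instance, from the 4-component mixture model) then determine m, h, and c in closed form. All function and parameter names are hypothetical.

```python
def predicted_proportions(m, h, c):
    """Forward model (ASSUMED form): proportions of (both CG, human-only CG,
    chicken-only CG, neither) given marked fraction m and independent rates h, c."""
    unmarked = 1.0 - m
    return (
        m + unmarked * h * c,            # CG rich in both species
        unmarked * h * (1.0 - c),        # CG rich in human only
        unmarked * (1.0 - h) * c,        # CG rich in chicken only
        unmarked * (1.0 - h) * (1.0 - c) # AT rich in both
    )


def solve_marking_model(p_both, p_human_only, p_chicken_only, p_neither):
    """Invert the forward model above: return (m, h, c) from the four proportions."""
    total = p_both + p_human_only + p_chicken_only + p_neither
    # Normalize so the inputs can be counts or fractions.
    p_human_only /= total
    p_chicken_only /= total
    p_neither /= total
    if p_neither <= 0:
        raise ValueError("model needs a nonzero AT/AT proportion")
    h = p_human_only / (p_human_only + p_neither)
    c = p_chicken_only / (p_chicken_only + p_neither)
    unmarked = (p_human_only + p_neither) * (p_chicken_only + p_neither) / p_neither
    return 1.0 - unmarked, h, c
```

Under the reading that the 15% figures are fractions of the unmarked 60%, `predicted_proportions(0.4, 0.15, 0.15)` followed by `solve_marking_model` recovers the same three parameters. If the 15% figures are instead fractions of all genes, the same inversion applies with different inputs.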
Simplest hypothesis, marking and realizing in same epoch,
has problems with phylogeny
[Phylogeny figure. Branch labels for “isochores?”: yes, yes, yes, yes, NO!
Annotation: 40% of genes marked and realized here?
Legend: “marking” = chromosome ends; “realizing” = start biased mutation]
Also (preliminary), we can rule this out by a Markov model on aligned 3rd
codons by proving independent realization, even of ortholog genes.
Possibly, the onset of biased mutation occurred
coincidentally on two branches, but this seems unlikely!
[Phylogeny figure. Branch labels for “isochores?”: yes, yes, yes, yes, NO!
Annotations: 40% marked here; biased mutation starts independently here and here.
Legend: “marking” = chromosome ends; “realizing” = start biased mutation]
Most plausible may be an independent event blocking
squamates (or do they have unobvious isochores?)
[Phylogeny figure. Branch labels for “isochores?”: yes, yes, yes, yes, NO!
Annotations: mark here, and turn on biased mutation at chromosome ends;
biased mutation continues here, but is stopped here (X).
Legend: “marking” = chromosome ends; “realizing” = start biased mutation]
Will be easy to sort this out when
we have full genomes of these
Where does this project need to go?
• The basic hypothesis is that exposure to chromosome ends
“causes” most (but not all) genes to become CG rich
– maybe (1st approximation) in proportion to the exposure time?
– can weakly predict, by function, the genes that don’t
• It’s easy to tell whether an ortholog gene became CG rich in a
common ancestor vs. independently on two branches
– so, we get time information on when the exposure occurred
• More relevant genomes are now available
– though mapping to chromosomes (from scaffolds) is about the last thing
done
• So, how can we use phylogenetic methods to estimate a gene’s end
exposure and test the hypothesis?
– how best to do “phylogeny of chromosome ends”
• And, of course, what is the biological significance of isochores?
– defense against chromosome breakage? why all of a sudden?
• or did something change in biased gene conversion (BGC)?
– defense against LINEs or another class of transposons?
Thank you for listening!
(backup slides follow)
Summary (I): What do we know?
• AT-richness is the ancestral state
– fish, frog, etc.
• CG isochores formed at chromosome ends
– human, chicken, and ancestors
– short chromosomes count as “all ends”
• Within CG isochores all base positions see evolutionary
pressure to become CG-rich
– introns, exons, and intergenic regions
– some genes resist and remain AT-rich
• can be understood functionally as “older” or more conserved
processes involving DNA, RNA processing
• associated with miRNA targets
– but most “led the charge”, more extreme than surroundings
• especially functionally associated with signaling, membrane
processes
• but “lagged the retreat” still possible as isochores “unform”
Summary (II): What can we guess?
• A process “turned on” in the amniote CA that strongly
favored AT CG at chromosome ends
– the “minimal hypothesis”
– many amino acid changes
– but was it a positive selection or just a strong mutation bias?
• Most of its active time was after the mammal/reptile split
– preliminary: we have aligned orthologs to verify this
• It probably “turned off” (independently)
– early in the squamate ancestor
• unless they have occult isochores in some way
– more recently in mammals
• while nearly all iCGs are going away (comparing human, chimp,
monkey), a few iCG human chr ends have the opposite trend
Summary (III): What we don’t know
• What was the process, exactly?
– biased gene conversion?
– mutational hot spots with bias?
– positive selection for “stronger” chromosome ends?
• Why did it turn on?
• Why did it turn off?
– Or did it?
– Could it be episodic, with occasional large “advances”
balanced (or not balanced) by slow “retreats”
• Human chromosomes 3, 4, 6, 8, 15, and 20 may have iCGs
continuing to form at one or both chromosome ends
How to look for something seemingly as
vague as “challenged” and “resisted”?
iCG
hmm. a lot of blue dots
(CG genes) seem to be
at extrema
maybe also red
dots (AT genes),
but not as much?
iAT
can we measure
this objectively?
Since we are comparing variances, we must
be careful to use identical window functions
• Use intron as surrogate for “count where gene is”
– might expect small offset due to functional sequences in intron
• We require a gap > 2 x size of gene and center the intergene
window in the gap
• Background has fluctuations on all different scales
• Different windows differ on “regression to the mean”
• Safest to use congruent windows and pairwise comparison
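The gap rule above can be sketched as a small helper. This assumes half-open integer coordinates and that "congruent" means the intergenic window has the same length as the gene window; the function name and argument layout are hypothetical, not taken from the talk.

```python
def intergenic_window(gene_start, gene_end, gap_start, gap_end):
    """Return (start, end) of a window as long as the gene, centered in the gap.

    Returns None when the gap is not more than twice the gene length,
    so that such gene/gap pairs are skipped in the pairwise comparison.
    ASSUMPTION: coordinates are half-open [start, end) integers.
    """
    gene_len = gene_end - gene_start
    gap_len = gap_end - gap_start
    if gene_len <= 0 or gap_len <= 2 * gene_len:
        return None
    center = gap_start + gap_len // 2
    start = center - gene_len // 2
    return (start, start + gene_len)
```

For example, a 100-base gene next to a gap spanning 1000 to 1300 gets the window (1100, 1200), while a gap of exactly 200 bases is rejected.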
Search for functional differences:
the Gene Ontology database can be mined by
word count statistics
Word-count statistic, annotated terms:
– prob. gene i in + group
– 1 if word j describes gene i, else 0
– stat. sig. with which word j describes a difference between the + and - groups
• Why not just use the
categories? Too many,
too sparse, too
inhomogeneous!
• But functional words
occur in (and thus link)
multiple category
definitions
• Also, word counts give
useable measurements of
statistical error
the software is available at
www.nr.com/bio/gowordcount
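The statistic itself is not preserved in this transcript, so the following is a stand-in rather than the method in the gowordcount software. It uses the two quantities named above (the probability that gene i is in the + group, and a 0/1 indicator that word j describes gene i) and computes, for each word, a z-score of the summed probabilities against what a random set of the same number of genes would give. The function name and input layout are assumptions.

```python
import math


def word_scores(prob_plus, words_per_gene):
    """Return {word: z-score}; positive means the word favors the + group.

    prob_plus[i]      : probability that gene i is in the + group
    words_per_gene[i] : iterable of words describing gene i
    ASSUMED null model: the genes carrying a word are a random subset
    (drawn without replacement) of all genes.
    """
    n_genes = len(prob_plus)
    if n_genes < 2:
        raise ValueError("need at least two genes")
    mean = sum(prob_plus) / n_genes
    var = sum((p - mean) ** 2 for p in prob_plus) / n_genes
    sums, counts = {}, {}
    for p, words in zip(prob_plus, words_per_gene):
        for w in set(words):  # count each word once per gene
            sums[w] = sums.get(w, 0.0) + p
            counts[w] = counts.get(w, 0) + 1
    scores = {}
    for w, n in counts.items():
        if n == n_genes or var == 0.0:
            continue  # word is uninformative
        # Variance of a sum of n values drawn without replacement from N.
        sd = math.sqrt(n * var * (n_genes - n) / (n_genes - 1))
        scores[w] = (sums[w] - n * mean) / sd
    return scores
```

Because every word gets its own standard deviation, the scores come with the kind of usable error estimate that raw category membership lacks.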
Can locate (objective, computable)
isochores by a simple Markov model
• AT isochore (iAT) “emits” 90%
AT (genes)
• CG isochore (iCG) emits AT
and CG 50/50
• switch state “by chance” only
every 1000 emissions (easily
over-ridden by posterior)
• can apply to either genes or
window counts
• results are insensitive to the
parameters chosen
• much better than smoothing
or filtering methods
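The two-state model in the bullets above can be sketched with a standard Viterbi decoder. The emission and switching probabilities are the ones quoted on the slide (90% AT in iAT, 50/50 in iCG, a switch about once per 1000 emissions). The equal starting probabilities, the function name, and the use of the single most likely path are assumptions; the slide mentions a posterior, so the talk may have used posterior decoding instead.

```python
import math


def viterbi_isochores(observations, p_at_in_iat=0.9, p_at_in_icg=0.5, p_switch=0.001):
    """Label each observation ("AT" or "CG") with its most likely state, "iAT" or "iCG"."""
    states = ("iAT", "iCG")
    emit = {
        "iAT": {"AT": p_at_in_iat, "CG": 1.0 - p_at_in_iat},
        "iCG": {"AT": p_at_in_icg, "CG": 1.0 - p_at_in_icg},
    }
    log_stay, log_switch = math.log(1.0 - p_switch), math.log(p_switch)
    if not observations:
        return []

    def step(prev, cur):
        # Log transition probability from state prev to state cur.
        return log_stay if prev == cur else log_switch

    # ASSUMPTION: both states are equally likely at the start.
    score = {s: math.log(0.5) + math.log(emit[s][observations[0]]) for s in states}
    back = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: score[r] + step(r, s))
            pointers[s] = best_prev
            new_score[s] = score[best_prev] + step(best_prev, s) + math.log(emit[s][obs])
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With these numbers a run of two CG genes inside a long AT stretch is not enough to pay for two state switches, while a run of twenty is. That is the sense in which the small transition probability favors large isochores yet is easily overridden by the data.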
We find the same isochores by the gene mixture
model as by genomic window counts
3’ UTR
window
3rd codon
• Markov model as described
• Genes: 50-50 mixture in CG
isochore, 90-10 mixture in AT
isochore
• Window counts: 80-20 or 20-80 mixture of being
above/below median
• Small transition probability
encourages finding large
isochores; results very
insensitive to this value.
• Get 92% agreement between
isochores based on gene
3’UTRs and based on window
counts.
each dot is a gene
height is AT fraction
color is isochore type
Almost all microRNA targets
are genes with AT-rich 3’UTRs
• because evolutionarily old regulatory process? or because
richer conformation space of AT-rich mRNA?
• subject of another talk
But GO wordcounts demonstrate that miRNA
targets are regulators of both AT and CG
characteristic processes
The plotted ellipses are separable components
found by an unsupervised Gaussian mixture model
two in human
(AT-rich, CG-rich)
(model can assign a probability
to each gene of being in one
component vs. the other)
but only one
(AT-rich) in worm
and fly
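As a rough illustration of the mixture-model idea, here is a one-dimensional, two-component Gaussian mixture fitted by EM to per-gene AT fractions. This is a simplification: the ellipses in the figure imply a fit in more than one dimension, and the initialization (splitting the sorted data in half) and all names are assumptions.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(values, n_iter=100, min_var=1e-6):
    """EM for a 2-component 1-D Gaussian mixture; returns (weights, means, variances).

    Component 0 starts from the lower half of the data (CG-rich genes when the
    values are AT fractions), component 1 from the upper half (AT-rich genes).
    """
    x = sorted(values)
    n = len(x)
    halves = (x[: n // 2], x[n // 2:])
    mean = [sum(h) / len(h) for h in halves]
    var = [max(sum((v - m) ** 2 for v in h) / len(h), min_var)
           for h, m in zip(halves, mean)]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each gene.
        resp = []
        for v in x:
            dens = [weight[k] * normal_pdf(v, mean[k], var[k]) for k in range(2)]
            total = dens[0] + dens[1]
            resp.append([d / total for d in dens] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / n
            mean[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var[k] = max(sum(r[k] * (v - mean[k]) ** 2 for r, v in zip(resp, x)) / nk,
                         min_var)
    return weight, mean, var


def prob_at_rich(v, weight, mean, var):
    """Posterior probability that a gene with AT fraction v belongs to component 1."""
    dens = [weight[k] * normal_pdf(v, mean[k], var[k]) for k in range(2)]
    return dens[1] / (dens[0] + dens[1])
```

`prob_at_rich` is the per-gene probability mentioned on the slide. A genome with a single population (as described for worm and fly) would be better described by one component, which this sketch does not test for.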
Isochore formation was quite a train wreck!
There are at least 7 things to explain.
• (Role of genes) What made genes “lead the charge”
(i.e., be at special locations)?
• (Strength) Why was selection pressure so strong as
to re-engineer vast numbers of proteins?
• (Scale) What made it correlate over 10s of Mbases?
• (Gene-gene correlation) Why is it larger than intergene-intergene on large scales?
• (Asymmetry) Why do iCG’s contain AT genes, but not
vice versa?
• (Relation to function) What made iCGs stratify according
to GO function?
• (Spatial broken symmetry) How did any specific region
decide to become an iCG?
Fish, with no isochores, shows pattern similar to
human, but much more moderate
Genes tend to be more
extreme than surround, both
for AT rich(er) and CG rich(er)
Offset of otherwise
symmetrical distribution
supports its being due to
small amount of functional
AT rich sequence in introns
Fish genome looks a lot
like human AT isochores.
But fish genes have little or no long-distance
correlation in AT, while human genes do
• Shown is structure
function (increase in
variance with distance)
• Human gene-gene is
stronger than human
gene-intergene or
intergene-intergene
• Again see that genes
are special, not just
passively carried along
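A minimal sketch of a structure function, taken here as the mean squared difference in AT fraction between items separated by a given lag. Measuring distance in index steps along the chromosome (rather than in bases) and the function name are assumptions; the slide gives only the verbal definition.

```python
def structure_function(values, max_lag):
    """Return {lag: mean of (values[i + lag] - values[i]) ** 2} for lag = 1..max_lag.

    values: AT fractions of genes (or intergenic windows) in chromosomal order.
    ASSUMPTION: lag counts positions in the list, not base pairs.
    """
    result = {}
    for lag in range(1, max_lag + 1):
        diffs = [(values[i + lag] - values[i]) ** 2
                 for i in range(len(values) - lag)]
        if diffs:  # skip lags longer than the data
            result[lag] = sum(diffs) / len(diffs)
    return result
```

For uncorrelated values the curve is flat from the first lag. A curve that keeps rising with lag indicates long-distance correlation of the kind described for human genes.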