Comparative Genomics: Overview
Shrish Tiwari
CCMB, Hyderabad
Introduction
• Sequences of 340 species available
(274 bacterial, 25 archaeal and 41
eukaryotic)
• An additional 848 prokaryotic and
560 eukaryotic genome projects are
ongoing
• Comparison of genomes can provide
insights into the functional regions
as well as genome dynamics
Sequence Comparison
• Let us look at a simple example
A A T T G A - A T C G C C A
A - A T C A C A G - G A T C
5 matches, 6 mismatches, 3 indels
A A T T G A - A T C G C - C A
A A T - C A C A - G G A T C -
7 matches, 3 mismatches, 5 indels
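The counts quoted for the two alignments above can be checked column by column. The following is a minimal sketch; the function name `alignment_stats` and the use of "-" as the gap symbol are assumptions made for illustration, not part of the original slides.

```python
def alignment_stats(top: str, bottom: str, gap: str = "-") -> tuple[int, int, int]:
    """Return (matches, mismatches, indels) for two gapped, equal-length rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = indels = 0
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            indels += 1          # a residue aligned against a gap
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, indels


# The two alignments from the slide, with spaces removed.
first = alignment_stats("AATTGA-ATCGCCA", "A-ATCACAG-GATC")
second = alignment_stats("AATTGA-ATCGC-CA", "AAT-CACA-GGATC-")
print(first)   # (5, 6, 3)
print(second)  # (7, 3, 5)
```

Which of the two alignments is "better" depends on how matches, mismatches and indels are weighted, which is why a scoring scheme is required.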
Sequence Comparison
• Requirements for sequence comparison:
– A scoring scheme or scoring matrix
– A search algorithm to identify the optimal
alignment
• Scoring matrices available: PAM,
BLOSUM
• Search algorithm used: Dynamic
programming
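As a rough illustration of how dynamic programming finds the optimal alignment score, here is a sketch in the style of the Needleman-Wunsch global alignment algorithm. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for simplicity; they stand in for a real scoring matrix such as PAM or BLOSUM, which would replace the match/mismatch lookup.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b under a simple linear gap penalty."""
    # prev[j] = best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GCATGCG"))  # 0
```

Only the score is returned here; recovering the alignment itself would need the full matrix and a traceback step, which this sketch omits to stay short.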
Applications
• Tracing our origins and history
• Assessing the diversity of a species
• Finding virulence genes
• Designing primers for novel species
• Identifying disease-causing mutations
• Predicting mutations in viral genomes
and designing vaccines
Comparative Genomics
• Of distantly related species: look for
similarities/conserved regions to infer
functional regions of the genome;
example mouse and man
• Of closely related species: look for
differences, identify subtle mutations
that make one species different from
the other, understand how genomes
evolve; examples chimp and man,
virulent E. coli and benign E. coli
Comparative Genomics
• Comparison of the 73 kbp region of
human β-globin with the mouse and chimp
genomes shows: 1) small stretches,
covering the first two exons and the
intervening intron, match at ~73%
identity between human and mouse; 2)
almost the complete 73 kbp region
matches at ~97% identity between human
and chimp
How different are we?
• Physical similarity is striking
How different are we?
• Socially, we have similar behaviour,
including cooperation, warfare, politics
and even bribery
Ape the toolmaker
Chimp Genome: Statistics
• Sequence of a single male captive-born
chimpanzee of the West African subspecies
Pan troglodytes verus, obtained
using a whole genome shotgun
approach
• Assembly of the genome was done with
PCAP and ARACHNE programs
• PCAP is a de novo assembly method;
ARACHNE uses the human genome build
34 to facilitate and confirm contig linking
and has more continuity
Chimp Genome: Statistics
• 3.6-fold redundancy of autosomes
and 1.8-fold for sex chromosomes;
covers 94% of chimp genome with
>98% of the sequence in high-quality
bases (quality score >40, error rate
<10⁻⁴)
• 50% of the sequence (N50) in contigs
of length >15.7 kbp and supercontigs
of length >8.6 Mbp
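Two of the statistics above follow simple formulas. A Phred-style quality score Q corresponds to an error probability of 10^(-Q/10), so Q = 40 gives 10⁻⁴. N50, under one common definition, is the length L such that contigs of length L or more hold at least half of the assembled bases. The sketch below illustrates both; the function names and the toy contig lengths are hypothetical and are not data from the chimp assembly.

```python
def phred_error_probability(q: float) -> float:
    """Error probability implied by a Phred-scaled quality score."""
    return 10 ** (-q / 10)


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L cover at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


print(phred_error_probability(40))  # about 1e-4
print(n50([2, 3, 4, 5, 6]))         # 5 (toy lengths: 6 + 5 = 11 of 20 bases)
```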
Chimp Genome Sequence
• Chimp genomes are polymorphic
within and between subspecies
• 1.66 million high-quality SNPs
identified, of which 1.01 million are
heterozygous in the primary donor
• Diversity rate among West African
chimps is 8×10⁻⁴ (roughly the same as
human diversity) and 17.6×10⁻⁴ among
Central African chimps
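Assuming the diversity rate quoted above means the average number of differences per site between two sequences drawn from the population (nucleotide diversity), it can be computed from aligned sequences as in this sketch. The function name and the toy sequences are hypothetical and only illustrate the calculation.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise differences per site among equal-length aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    # Count differing sites for every unordered pair of sequences.
    pair_diffs = [sum(x != y for x, y in zip(s, t))
                  for s, t in combinations(seqs, 2)]
    return sum(pair_diffs) / (len(pair_diffs) * length)


toy = ["AAAA", "AAAT", "AATT"]    # hypothetical 4-site alignment
print(nucleotide_diversity(toy))  # (1 + 2 + 1) / (3 pairs * 4 sites), about 0.333
```

A value of 8×10⁻⁴ on this scale corresponds to roughly one difference every 1,250 sites between two chromosomes.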
Genome Comparison
• Genome comparisons can help to
reveal the molecular basis of these
traits as well as the evolutionary
mechanisms that have moulded our
species
• Reciprocal nucleotide-level alignment
of the chimp and human genomes
covers ~2.4 Gbp of high-quality
sequence
Genome Comparison
• An observed difference nearly always
reflects a single event in time, not multiple
independent changes over time
• Most differences reflect random drift
and hold extensive information about
mutational processes
• A minority of functionally important
changes underlie our phenotypic
differences
Segmental Duplication
• Has had a larger impact (~2.7%) in
altering the genomic landscape than
single nucleotide substitutions
(~1.2%)
• They are responsible for the
emergence of new genes and
adaptation of humans to their
environment
• Human genome particularly enriched
in genes resulting from recent
duplications
Segmental Duplication
• 33% of human duplications (>94%
identity) are not duplicated in
chimpanzee
• An estimated duplication rate of 4 to 5 Mbp per million years
• These have resulted in differences in
gene expression, disease-causing
duplications and change in the
genomic landscape in general
Segmental Duplication
• Chimp-only duplications: 11 out of 17
were found only in chimp and not in
man or other great apes in a cross-species
comparison, whereas 6 were
found also in gorilla
• De novo duplication followed by
deletion of older duplications is the
most likely scenario for the excess of
segmental duplications observed in
human-ape genomes
Gene Evolution
• 13,454 pairs of human and chimp
genes with unambiguous 1:1
orthology were used
• Rate of evolution of a gene assessed
using the non-synonymous
substitution rate KA
Gene Evolution
• The background rate is estimated as
the synonymous substitution rate Ks
• KA/Ks is a measure of evolutionary
constraint on a gene
• KA/Ks > 1 implies adaptive or positive
selection, under the assumption that
synonymous changes are neutral
Gene Evolution
• KA/Ks = 0.23 for the human-chimpanzee
lineage => 77% of amino acid
substitutions are removed by natural
selection
• CpG and non-CpG substitutions at
synonymous sites show lower
divergence, ~50% and ~30% lower
respectively, than in introns, implying
evolutionary constraint on
synonymous substitutions
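The 77% figure follows from the ratio itself: if synonymous changes are treated as neutral, KA/Ks is the fraction of amino acid changes that survive selection, so 1 - KA/Ks is the fraction removed. The sketch below spells out that arithmetic. The function names are hypothetical, and the neutrality assumption is exactly the one the second bullet qualifies.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ratio of non-synonymous to synonymous substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    return ka / ks


def fraction_removed_by_selection(ratio: float) -> float:
    """1 - KA/Ks, assuming synonymous changes are neutral and 0 <= KA/Ks <= 1."""
    if not 0 <= ratio <= 1:
        raise ValueError("only meaningful for ratios between 0 and 1")
    return 1 - ratio


# KA/Ks = 0.23 for the human-chimpanzee lineage (value from the slide)
print(round(fraction_removed_by_selection(0.23), 2))  # 0.77
```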
Gene Evolution
• 585 of the 13,454 human-chimp
orthologues have KA/KI > 1
• Given the low divergence between
the human and chimp genomes, the KA/KI
statistic has a large variance
• Simulations show that KA/KI > 1
would be expected to occur by
chance in 263 cases, if purifying
selection acts non-uniformly on
genes
Gene Evolution
• The extreme outliers are:
– glycophorin C, mediates P. falciparum
invasion pathways in human
erythrocytes
– granulysin, mediates antimicrobial
activity against intracellular pathogens
– protamines & semenogelins involved in
reproduction
– Mas-related gene family involved in
nociception
Conclusions
• Mean rate of single-nucleotide changes
is 1.23%, of which less than 1.06%
corresponds to fixed divergence
• Regional variation is the same in hominid
and murid genomes, except at
subtelomeric regions
• 25% of changes are at CpG sites, and these
occur at similar rates in male and female
germ lines
• Indels are fewer, but leave 1.5% of the
euchromatic sequence lineage-specific
Conclusions
• SINEs have been more active in human
while chimp has acquired two new
retroviral elements
• Orthologous proteins typically differ by
2 amino acids, and ~29% are identical
• Amino acid altering changes are more
frequent in hominids than in
murids, but close to the frequency seen
in human polymorphisms
• Substitution rate at silent sites is lower
than at intronic sites => purifying
selection
Is Y going extinct?
• X and Y chromosomes evolved
from an autosomal pair in an ancient
mammal nearly 300 million years ago
• Most Y genes lie in the X-degenerate
regions
• X-degenerate region of Y does not
recombine, which may lead to rapid
gene loss
• Rate of gene loss estimated at 5 genes
every million years
Is Y going extinct?
• Assuming gene loss occurs randomly
and that human and chimp separated
nearly 6 million years ago, many chimp
Y genes are expected to have no
functional orthologues in human
• Orthologues of all human X-degenerate
genes and pseudogenes were searched
• Chimpanzee orthologues of 16 genes
and 11 pseudogenes were identified
Is Y going extinct?
• All 11 chimp orthologues of the
human pseudogenes were
pseudogenes in the chimp as well, with
the majority of inactivating mutations
shared
• This indicates that these genes were
already inactive before the human-chimp
split, so none of them was lost in the
last 6 million years
• GenScan and BLAST analysis of the
chimp X-degenerate Y transcripts
revealed that none were chimp specific
Is Y going extinct?
• Divergence of X-degenerate exons was
compared with those of introns for
genes as well as pseudogenes
• The divergence was found to be less in
the exons than introns for genes, but
same or more in pseudogenes
• These results suggest that purifying
selection has been more effective
during human evolution than
previously assumed
J.F. Hughes et al. (2005) Nature 437, 101-104
Summary
• While we can learn a lot from a
comparison of the human-chimp
genomes, they are too much alike to
get meaningful answers to many
questions, e.g. a DNA sequence
found in humans but missing in
chimps: was it added in humans or
lost in chimps?
Summary
• A difference found could be
significant or just a variant within one
species
• Sequences of other primates will be
needed to establish the uniqueness
of changes seen in human and
chimps
• Genomes of primates like the orangutan and rhesus macaque are
expected soon
Origin of Clothing
• Humans infested with head and body
lice
• Head louse lives and feeds on the
scalp
• Body louse lives in clothing and feeds
on body
• Chimp louse used as outgroup
Origin of Clothing
• 2 sequences from mtDNA (ND4 and
CYTB) and 2 from nuclear DNA (EF-1
and RPII) from 40 lice (26 head lice
and 14 body lice) from 12 different
geographic regions were used for
analysis along with one chimpanzee
louse
• Trees built using ND4 and CYTB
nearly identical
Origin of Clothing
• Results:
– Greater diversity seen in African lice
than in non-African lice => African origin
for body lice
– Body louse originated ~72,000 years ago
(assuming human and chimp lice
diverged ~5.5 million years ago)
– Demographic expansion of body lice
correlates with the spread of modern
humans out of Africa
Origin of Clothing
• Results indicate a recent origin of
clothing, ~72,000 years ago
R. Kittler, M. Kayser and M. Stoneking
(2003) “Molecular evolution of Pediculus
humanus and the origin of clothing”
Current Biology 13, 1414-1417
Conclusions
• Genomes of human and model
organisms were sequenced in order to
understand ourselves at the molecular
level
• Comparative genomics studies have
revealed interesting features of genome
evolution so far
• This is just the tip of the iceberg!!