ID _PANTR
HPI meeting
The chimp and us
25 April 2006
ID _PANPA
Complete chimp (PANTR) genome publication: Nature, Sept 2005
- Genome derived from one individual ‘Clint’ (male from West Africa)
- Inter vs intra (polymorphism) species differences !!!
- Individual human genome variation: 1 bp/1000
- Individual chimp genome variation: 1 bp/250 (estimate: Varki (2000))
Cheeta has been recognized by the Guinness Book of World Records as the world's oldest chimp.
Chimps rarely live past the age of 40 in the wild, but can reach 60 in captivity.
- The chimpanzee genome was sequenced to approximately four-fold coverage (error
rate < 10^-4)
- WGS sequencing approach (-> problem for the assembly of regions with segmental
duplications): ~22.5 million sequence reads to assemble.
-> 2 assembly approaches (PCAP* and ARACHNE)
- In one* of the 2 approaches, contigs were assembled using the human genome as a
guide -> "humanized" in their construction.
-> some sequences, such as insertions, deletions, and gene duplications, may not be
accurately represented by the current chimpanzee assembly.
chromosomal fusion in the human lineage ?
-NCBI has adopted the NEW chimpanzee
chromosome naming system as proposed by
McConkey, 2004
- The UCSC-Genome browser currently uses the
original chimpanzee chromosome naming
system.
Humanness
- Bipedalism
- Large cranial capacity (Brain size)
- Advanced brain development (language capability)
- A long generation time
- and some other ‘biomedical’ differences….
Chimps express the
apoE4 allele
Chimps: no acne, rhinitis but no asthma, no rheumatoid arthritis
Olson et al., (2002)
- The last common ancestor of humans and
chimps is believed to have walked on 4 legs.
- The oldest fossils that resemble bipedal
humans are 6 to 7 million years old.
- DNA sequence analyses suggest the 2
lineages separated about 5.4 million years
ago.
Short time since human-chimp split:
it is likely that a few mutations of large effects are
responsible for part of the differences.
Comparative genomic analysis
Human vs mouse, chick…: focus on similarities
Human vs chimp: focus on differences
Hypothesis to account for the evolution of humanness traits
Quantifying the sequence divergence:
Single nucleotide substitutions: 1.23 % (1.78 % for chromosome Y)
(0.8 % in protein coding regions)
Indels: ~1.5 %
Transposable elements: 3 %
Recent duplication of DNA segments: 2.7 %
~ 35 million nucleotide differences
~ 5 million indels
Many chromosomal rearrangements
Human: 3.4 × 10^9 bp; Chimp: 3.6 × 10^9 bp
~ 35 million nucleotide differences
‘Since we apparently diverged from a common ancestor 6 million
years ago, that is roughly 6 mutations per year that get fixed within
the genome (or 3 per year if you divide them equally amongst the 2
branching species). Given a conservative estimate of average
generational time of 10 years, this means that 30 new mutations had
to be fixed within the population every generation.
The current human mutation rate is around 3 or 4 mutations per
organism.’
http://www.uncommondescent.com/index.php/archives/875
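The arithmetic in the quoted passage can be reproduced with a short check. This is only a sketch that reuses the quote's own round figures (~35 million differences, 6 million years, 10-year generations); the variable names are illustrative and the inputs are the quote's assumptions, not independent estimates.

```python
# Figures taken from the quoted passage; variable names are illustrative.
differences = 35_000_000        # ~35 million nucleotide differences
years_since_split = 6_000_000   # divergence time assumed in the quote
generation_time = 10            # years per generation, as assumed in the quote

per_year_total = differences / years_since_split   # both lineages combined
per_year_lineage = per_year_total / 2               # shared equally between the 2 lineages
per_generation = per_year_lineage * generation_time

print(f"{per_year_total:.1f} per year in total, "
      f"{per_year_lineage:.1f} per year per lineage, "
      f"{per_generation:.1f} per generation per lineage")
```

The results are close to the quote's rounded values of 6, 3 and 30.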
At the genome level
1) Structural variations
chromosomal fusion in the human lineage ?
A genome-wide survey of structural variation between human and chimpanzee
(Newman et al., (2005))
- Approach: Mapping chimp fosmids against the human reference sequence and
identifying regions that are discordant in size or orientation
- Limitations:
The human genome is not complete
The chimp genome = 1 individual (! Inter/intraspecific differences !)
- Identification of 651 regions of putative structural variation between the
human genome assembly and the single chimp individual (293 chimp
deletions, 184 chimp insertions and 174 inversions/duplicative transpositions).
- Chromosome Y is the most rearranged chromosome between human and
chimp (! Repetitive regions !)
- They have identified 245 (RefSeq) genes that may be affected by the structural
differences between chimp and human (drug detoxification, receptors,
reproduction)
(Newman et al., (2005))
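The idea of flagging fosmids that are discordant by size or orientation can be sketched as follows. This is a minimal illustration, not the pipeline used by Newman et al.: the `EndPair` type, the `classify` function, the assumed ~40 kb insert size and the 32-48 kb tolerance window are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds: fosmid inserts are assumed to be ~40 kb, and a
# mapped span outside 32-48 kb on the human reference is called discordant.
MIN_SPAN = 32_000
MAX_SPAN = 48_000


@dataclass
class EndPair:
    """Placement of the two ends of one chimp fosmid on the human reference."""
    left_pos: int      # human coordinate of the left end
    right_pos: int     # human coordinate of the right end
    left_strand: str   # '+' or '-'
    right_strand: str  # '+' or '-'


def classify(pair: EndPair) -> str:
    """Label an end pair as concordant or as a putative structural variant."""
    # In a colinear region the two ends face each other: '+' then '-'.
    if (pair.left_strand, pair.right_strand) != ("+", "-"):
        return "inversion candidate"
    span = pair.right_pos - pair.left_pos
    if span > MAX_SPAN:
        # Human has extra sequence between the ends: absent from chimp.
        return "chimp deletion"
    if span < MIN_SPAN:
        # Human lacks sequence that the chimp clone carries.
        return "chimp insertion"
    return "concordant"


examples = [
    EndPair(100_000, 140_000, "+", "-"),
    EndPair(100_000, 175_000, "+", "-"),
    EndPair(100_000, 120_000, "+", "-"),
    EndPair(100_000, 140_000, "+", "+"),
]
for pair in examples:
    print(classify(pair))
```

A real analysis would also need to handle repeats, ends that map to different chromosomes and gaps in the human assembly, which is why the slide lists those limitations.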
At the genome level
1) Structural variations
2) Segmental duplication
Segmental duplication (impact: 2.7 %)
Longer than 20 kilobases (-> 300 kb), greater than 94 %
sequence identity
- 33 % of human duplicated segments are human specific
- 17 % of chimp duplications are chimp specific.
Half of the genes in the human specific duplicated regions
exhibit significant differences in gene expression relative
to chimp and are most often upregulated.
Cheng et al., Nature (2005)
About 300 regions were identified where the human genome
showed significant increase in copy number when compared to
chimp.
‘Only’ 92 regions where the chimp genome showed an increase in
copy number compared to human (but with higher rate of
duplication)
Cheng et al., Nature (2005)
Example: 4 human regions represented ~ 400 x in chimp genome
(99.2% identity)
Cheng et al., Nature (2005)
At the genome level
1) Structural variations
2) Segmental duplication
3) Interspersed/Transposable repeats
- The human genome is composed of ~45 % interspersed elements
Including:
Long interspersed elements (LINEs); these encode a reverse transcriptase
Short interspersed elements (SINEs); these include Alu repeats
- The human genome contains about 1,000,000 Alu elements.
- Found only in primates.
Interspersed/Transposable element insertions (impact 3 %)
- endogenous mutagens which can alter genes, promote genomic
rearrangements…
- may help to drive the speciation of organisms
- Particular interest in recently mobilized transposons
- The transposons that inserted into the human or chimp genome during the past 6
million years would be expected to be present in only one of the 2 genomes.
~11’000 ‘recent’ transposon copies that are differentially present in
human/chimp:
73 % found in human and 27 % found in chimp
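As a rough check, the percentages above can be turned into approximate copy numbers. This sketch assumes the rounded total of ~11'000 insertions, so the results are only indicative; the variable names are illustrative.

```python
# Rounded figures from the slide; the total is approximate.
total_insertions = 11_000
human_specific = total_insertions * 73 // 100   # 73 % found in human
chimp_specific = total_insertions * 27 // 100   # 27 % found in chimp
excess_in_human = human_specific - chimp_specific

print(human_specific, chimp_specific, excess_in_human)
```

With these rounded inputs the human excess comes out at about 5'000 insertions, the same order of magnitude as the "at least 4’800" quoted in the conclusions.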
Interspersed/Transposable element insertions
Endogenous retrovirus
Mills et al., Am. J. Hum. Genet., 78:671-679, 2006
Interspersed /Transposable element insertions
- Alu, L1 and SVA
insertions accounted for
> 95% of the insertion in
both species
SVA: composite element (1.5-2.5
kb) (2 Alu, a tandem repeat and a
region derived from HERV-K)
- Human and chimp
have amplified
different subfamilies
of these elements.
Humans have supported higher
levels of transposition than
chimps during the past several
million years
(but… not the case for the baboon,
which shows an activity 1.6 fold
higher than human -> general
decline in Alu activity in chimp)
Blat: human DNA vs chimp DNA (AJ271736, Xq pseudoautosomal)
Interspersed /Transposable element insertions
- 34 % of the insertions were located within known genes during the
evolution of human and chimp
Interspersed /Transposable element insertions - conclusions
- The original set of transposons in the common ancestor of
human and chimp behaved differently during the subsequent
evolution of the 2 organisms
- Human received at least 4’800 additional transposon insertions
compared to chimp -> impact of transposon mutagenesis is likely
to be greatest in human during the past several million years.
- Human and chimp have amplified different subfamilies of these
elements.
- Factors such as differences in population size may also have
influenced the pattern of transposon insertion.
At the sequence level
(coding sequence level)
Nucleotide divergence: 1.23 %
14-22 % of these differences are due to polymorphism
-> fixed divergence rate = ~1.06 %
Chromosome X: ~0.94 %
Chromosome Y: ~1.9 %
Higher mutation rate in the male compared with female germ line
(higher number of cell divisions (5 to 6 fold))
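The fixed divergence figure above follows from removing the polymorphic share from the observed divergence. A minimal sketch of that calculation, using only the slide's numbers (the variable names are illustrative):

```python
observed_divergence = 1.23           # % single-nucleotide divergence
polymorphic_fraction = (0.14, 0.22)  # 14-22 % of differences are polymorphic

# Subtract the polymorphic share at each end of the quoted range.
fixed_high = observed_divergence * (1 - polymorphic_fraction[0])
fixed_low = observed_divergence * (1 - polymorphic_fraction[1])

print(f"fixed divergence: {fixed_low:.2f} % to {fixed_high:.2f} %")
```

The ~1.06 % on the slide corresponds to the lower polymorphism estimate (14 %), so it is best read as an upper bound on the fixed divergence.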
At the gene level:
13’454 pairs of orthologous genes
(507 Swiss-Prot, 1134 TrEMBL: 1641) (NCBI: 3111)
- 29 % are 100 % identical
- 5% with in-frame indel (mainly in repetitive region)
A classical measure of the overall evolutionary constraint on a gene
KA: non-synonymous substitution rate in coding sequence
KB (or KS): synonymous substitution rate in coding sequence
KI: substitution rate in non-coding sequence
KA/KB << 1: typical of most proteins, where change is detrimental (negative selection)
KA/KB > 1: for the rare proteins that are under positive selection
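The ratio defined above can be illustrated with a small calculation. This is a simplified sketch: it uses raw proportions of substituted sites without any correction for multiple substitutions, and the function names and the example counts are hypothetical.

```python
def substitution_rate(substitutions: int, sites: int) -> float:
    """Uncorrected proportion of sites that differ between the two species."""
    return substitutions / sites


def ka_kb(nonsyn_subs: int, nonsyn_sites: int,
          syn_subs: int, syn_sites: int) -> float:
    """Ratio of the non-synonymous rate (KA) to the synonymous rate (KB)."""
    ka = substitution_rate(nonsyn_subs, nonsyn_sites)
    kb = substitution_rate(syn_subs, syn_sites)
    if kb == 0:
        raise ValueError("no synonymous substitutions: ratio is undefined")
    return ka / kb


def interpret(ratio: float) -> str:
    """Map the ratio onto the categories used on the slide."""
    if ratio < 1:
        return "negative selection"
    if ratio > 1:
        return "positive selection"
    return "neutral"


# Hypothetical gene: 2 changes over 1000 non-synonymous sites,
# 10 changes over 400 synonymous sites.
ratio = ka_kb(2, 1000, 10, 400)
print(f"KA/KB = {ratio:.2f} -> {interpret(ratio)}")
```

Between species as close as human and chimp, many genes carry very few substitutions, so ratios estimated for single genes are noisy and are usually backed by a statistical test.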
About 500 genes with a KA/KB > 1
Most of the genes with a KA/KB > 1 are not involved in processes
related to supposed humanness.
Genes with the highest KA/KB ratios are mostly related to host-pathogen interaction, immunity and reproduction
(pattern also found in other mammals (cf Valeria’s work on
human/mouse orthologs))
In fact genes related to brain function and neuronal
activities show lower-than-average KA/KB ratio
- Neural genes, as a group, have much lower average of
KA/KB ratio than genes expressed outside of the brain.
Hypothesis: only a small subset of genes may be the
target of positive selection: not visible in such type of
studies.
(Hill, Walsh (2005))
Example 1: FOXP2
- gene relevant for the human ability to develop language
- among the 5 % most conserved proteins
CC   -!- DISEASE: Defects in FOXP2 are the cause of speech-language
CC       disorder 1 (SPCH1) [MIM:602081]; also known as autosomal dominant
CC       speech and language disorder with orofacial dyspraxia. Affected
CC       individuals have a severe impairment in the selection and
CC       sequencing of fine orofacial movements, which are necessary for
CC       articulation. They also show deficits in several facets of
CC       language processing (such as the ability to break up words into
CC       their constituent phonemes) and grammatical skills.
- Extremely conserved among mammals
- Acquired 2 aa changes in the human lineage (T303N and N325S),
including one potential/functional phosphorylation site (N325S)
- Estimation: fixation of these mutations occurred during the last
200’000 years of human history, concomitant with or subsequent to
the emergence of anatomically modern humans.
Enard et al., Nature (2002)
BUT:
- no aa substitutions are shared between song-learning birds, vocal
learning whales, dolphins and bats, and human, …
AND…
- during times of song plasticity, FoxP2 is upregulated in a striatal
region essential for song learning.
- selection acted on large non-coding regulatory regions of FoxP2 ???
- duplication of the chromosomal region (27 genes including FoxP2)
may be another cause of speech and language disturbance ???
Less-is-more hypothesis
Loss of function changes (lack of body hair, preservation of
juvenile traits, expansion of the cranium) could be caused by
non-synonymous substitutions, indels, loss of coding regions
and deletions of entire genes.
-> 53 human genes with disruptive indels in the coding
regions (compared to chimp)
Well documented examples of human specific pseudogenization
- MYH16, CMAH, CASP12, ELN, T2R62P (bitter taste receptor),
MBL1
- Microcephalin (MCPH1)
Challenge: dating the event !
MYH16
Myosin gene mutation (MYH16) correlates with anatomical
changes in the human lineage: the gene was inactivated by a
frameshifting mutation after the lineages leading to humans and
chimpanzees diverged (~2.4 Myr ago).
The gene is transcribed (-> the coding sequence deletion was
not preceded by a mutation in a transcriptional control
domain). Expressed only in masticatory muscles in other
mammals.
Loss of this protein isoform is associated with marked size
reductions in individual muscle fibres and entire masticatory
muscles.
Nature 428, 415-418 (2004)
Phylogenetic reconstruction for all human sarcomeric myosin genes (heavy chain), showing
early divergence of MYH16 from others.
Nature 428, 415-418 (2004)
Aligned DNA sequences for MYH16 exon 18 representing seven non-human primate species and
six geographically dispersed human populations, revealing the effect of frameshift on reading
frame and deduced amino acid sequence. Note stop codon at position 72−74.
Nature 428, 415-418 (2004)
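The effect of a frameshifting deletion on the reading frame, as shown in the alignment, can be illustrated with a toy example. The sequence below is invented for the demonstration and is not the MYH16 exon 18 sequence; the 2-bp deletion and the `translate` helper are likewise only assumptions of this sketch.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy coding sequence (hypothetical, not MYH16).
wild_type = "ATGGCAACTGATCCGAAA"
# Delete 2 bases after the first codon: every downstream codon is shifted.
mutant = wild_type[:3] + wild_type[5:]

print("wild type:", translate(wild_type))
print("mutant:   ", translate(mutant))
```

In the toy mutant the shifted frame runs into a TGA stop codon after two residues, which mirrors how a short deletion can truncate the deduced amino acid sequence.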
The findings on the age of the inactivating mutation in the MYH16 gene raise the
intriguing possibility that the decrement in masticatory muscle size removed an
evolutionary constraint on encephalization, as suggested by the anatomy of the
muscle attachments relative to the sutures -> marked increase in cranial capacity.
Nature 428, 415-418 (2004)
But:
Human encephalization -> obstetric constraints associated with
pelvic dimensions for bipedality
Importance of the genes that control the development of brain
size in mammals: ASPM, MCPH1, CASP3 …
Have undergone accelerated rates of protein evolution
Strong positive selection at several loci
McCollum et al., (2006), J. of Human Evolution, 50, 232-236
MCPH locus (microcephalin)
MCPH1 -> MCPH6
MCPH5/ASPM locus (abnormal spindle microcephaly)
- High KA/KB ratio
- Patients with loss-of-function mutations in microcephalin have cranial
capacities about 4 SD below the mean at birth and ~1/3 of the normal size
as adults.
- May control the proliferation and/or differentiation of neuroblasts
during neurogenesis.
- Continues its trend to adaptive evolution
- Ex: ASPM acquires an advantageous aa change every 350’000 years.
Evans et al., Nature 2005
Pseudogenization
CMAH
Alu-mediated sequence replacement -> inactivation of the enzyme
CMP-N-acetylneuraminic acid hydroxylase in human
This mutation occurred after our last common ancestor with bonobos
and chimpanzees, and before the origin of present-day humans (~2.8
mya)
-> susceptibility or resistance to certain microbial pathogens (host
receptors).
Chou et al., 2005
Pseudogenization
CASP12
Functional gene in all mammals except human.
Mediator of apoptosis in response to perturbed calcium homeostasis
-> loss of this gene in mice increases resistance to amyloid-induced
neuronal apoptosis (-> Alzheimer in human ?)
-> loss of this gene seems also to confer resistance to severe sepsis
Last but not least…
Nature (2005)
Comparative analysis of cancer genes in the human and
chimpanzee genomes
- The incidence of cancer in non-human primates is very low.
- All examined human cancer genes (n=333) are present in
chimpanzee, contain intact open reading frames and show a high
degree of conservation between both species (99.38%)
Blat of P53_Human vs chimp
Ex: Pro-72 is polymorphic only in human
- Sequencing of the BRCA1 gene has shown an 8 kb deletion in the
chimpanzee sequence that prematurely truncates the co-regulated
NBR2 gene.
Puente et al., (2006)
Transcriptome evolution (and epigenetic events)
Changes in gene usage may be a primary contributor to the
differences in chimp and human brains.
10% of all genes expressed in the brain differ in their expression
levels between humans and chimps…but no causative connection
found…
Several studies, but none with convincing results
Heissig et al., 2005
Khaitovich et al., 2005
Swiss-Prot annotation
Existence of orthologs
DEF7_PANTR
MISCELLANEOUS: The human orthologous protein seems not
to exist, its coding region does not have a start codon.
Pseudogenization (when documented)
T2R64_PANPA
MISCELLANEOUS: The human and chimpanzee orthologous
proteins do not exist, their genes are pseudogenes.
That’s all folks!