The DNA sequence of chimpanzee chromosome 22 and comparative analysis with its human ortholog, chromosome 21
Bioinformatics
Dae-Soo Kim
MPL
Comparative analysis of the human and chimpanzee genomes
• Human-chimpanzee comparative genome research is essential for narrowing down the genetic changes involved in the acquisition of uniquely human features.
• We report the high-quality DNA sequence of 33.3 Mb of chimpanzee chromosome 22.
• Single-base substitutions account for 1.44% of the chromosome, in addition to nearly 68,000 INDELs.
• 83% of the 231 coding sequences show differences at the amino acid sequence level.

Introduction
• Estimates of nucleotide substitution rates in aligned sequences have ranged widely, from 1.23% by BAC end sequencing to about 2% by molecular analysis.
• Molecular analysis of HSA21 and its genes is of central medical interest because of trisomy 21, the most common genetic cause of mental retardation in the human population.
Mapping, sequencing and global view of chimpanzee chromosome 22
• Genomic DNA originated from three male chimpanzees.
• Sequence coverage of the euchromatic portion of the long arm of chromosome 22 is 98.6%.
• Accuracy was calculated as 99.99% from overlapping clone sequences.
Overall differences
• The overall structural features of PTR22 are almost the same as those of HSA21.
• The two differ in size by about 400 kb, or 1.2%, with HSA21 being larger than PTR22 (ISRs: 53.7%; simple repeats: 9.54%).
• The pericentromeric copy of a 200-kb region found duplicated in HSA21 is missing in PTR22.
• We also detected apparently human-specific sequences (first intron of PFKL, HSA21a).

• Two large INDEL hot spots were found around 9.5~11.5 Mb and 16.5~17.5 Mb from the centromere.
• We found large human insertions/chimpanzee deletions in the first introns of NCAM2 (~10 kb) and GRIK1 (~4 kb), both genes with neural functions.
• One of the largest structural changes identified here is a 54-kb region located 11.4 Mb from the centromere in HSA21 but absent in PTR22. It is flanked by HSAT5 satellite repeats and consists of 164 fragments from 64 different LTRs.

                                       HSA21q        PTR22q
Size (bp) *1                       33,127,944    32,799,845
Unaligned sites *2                     25,242       101,709
# of sequencing gaps                       14            22
# of clone gaps *3                          3             2
Estimated total clone gap size         73,108        74,311
G+C%                                   40.94%        41.01%
CG dinucleotide                       361,259       358,450
CpG islands                               950           885
Nucleotide diversity                   0.072%         0.14%

Repeats                  HSA21q bp   HSA21q #    PTR22q bp   PTR22q #
SINEs                    3,649,153     15,137    3,614,825     15,048
  Young Alus *4             21,557         75        2,606         10
LINEs                    5,853,821      8,737    5,736,911      8,673
  Young L1s *5              82,493         48       78,657         55
LTRs                     3,621,501      7,282    3,550,807      7,180
Transposons                949,215      3,363      945,129      3,350
RNAs *6                      8,830        100        8,722         99
Satellite                   19,327         21       14,773         18
Others                      30,452         38       34,776         43
Total                   14,132,299     34,678   13,905,943     34,411
Total (% of sequence)        42.7%                   42.4%

*1 Size of the contig data after the site where the first base of the PTR22q contig is aligned
*2 Regions extended into HSA21q clone gaps and subtelomeric unmatched regions
*3 Excluding pericentromeric and subtelomeric gaps
*4 AluYa5, AluYa8, AluYb8 and AluYb9
*5 L1HS and L1PA2
*6 snRNA, scRNA, 5S rRNA, tRNA, 7SL RNA and other small RNA genes
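As a quick consistency check on the table, the repeat totals and percentages can be recomputed from the per-class rows. The sketch below is illustrative only: the dictionary layout and the function name `repeat_summary` are assumptions, and the figures are copied from the table. Young Alus and Young L1s are left out because the totals only add up when they are treated as subsets of SINEs and LINEs.

```python
# Sizes and repeat content copied from the table; the data layout is an
# illustrative assumption. Each class maps to (base pairs, element count).
SIZES = {"HSA21q": 33_127_944, "PTR22q": 32_799_845}

REPEATS = {
    "HSA21q": {
        "SINEs": (3_649_153, 15_137),
        "LINEs": (5_853_821, 8_737),
        "LTRs": (3_621_501, 7_282),
        "Transposons": (949_215, 3_363),
        "RNAs": (8_830, 100),
        "Satellite": (19_327, 21),
        "Others": (30_452, 38),
    },
    "PTR22q": {
        "SINEs": (3_614_825, 15_048),
        "LINEs": (5_736_911, 8_673),
        "LTRs": (3_550_807, 7_180),
        "Transposons": (945_129, 3_350),
        "RNAs": (8_722, 99),
        "Satellite": (14_773, 18),
        "Others": (34_776, 43),
    },
}


def repeat_summary(classes, chromosome_size):
    """Return (total bp, total element count, percent of sequence in repeats)."""
    total_bp = sum(bp for bp, _ in classes.values())
    total_count = sum(count for _, count in classes.values())
    return total_bp, total_count, 100.0 * total_bp / chromosome_size


for name, classes in REPEATS.items():
    bp, count, percent = repeat_summary(classes, SIZES[name])
    print(f"{name}: {bp:,} bp in {count:,} elements ({percent:.1f}%)")
```

Both chromosome arms reproduce the Total row, which supports reading the flattened columns in the order HSA21q bp, HSA21q #, PTR22q bp, PTR22q #.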
Base substitutions
• The overall nucleotide substitution level in aligned regions between PTR22 and HSA21 is about 1.44% (excluding INDELs).
• The most conserved region was around 12.5 Mb, corresponding to the distal boundary region of the gene desert.
Repetitive elements
• HSA21 is about 1.2% longer than PTR22.
• Five LTR subfamilies are more abundant in HSA21.
• All MER4A1-int and MER83B-int elements are specific to HSA21.
• All seven AluYb9 elements found in HSA21 and the one in PTR22 are lineage specific.
• Although the AluYa8 subfamily is thought to be a recent derivative of AluYa5
Lineage-specific insertions and deletions
• We identified about 68,000 INDELs in total.
• More than 99% of the INDELs were shorter than 300 bp.
• These sites should have been produced either through human insertions/chimpanzee deletions (h-ins/p-dels) or chimpanzee insertions/human deletions (p-ins/h-dels).
• We tested 567 INDELs larger than 300 bp using DNA samples from 5 humans, 5 chimpanzees, 1 gorilla and 2 orangutans.
• Insertions were mostly produced by the integration of Alu and L1 elements.
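The human-chimpanzee alignment alone cannot say whether an INDEL is an insertion in one lineage or a deletion in the other; the outgroup samples (gorilla, orangutan) are what resolve the direction. A minimal parsimony sketch of that reasoning follows. The function name and the simple present/absent scoring are hypothetical and are not taken from the study.

```python
def classify_indel(in_human: bool, in_chimp: bool, outgroups: list[bool]) -> str:
    """Infer which lineage changed from presence/absence of a segment.

    in_human, in_chimp: whether the segment is present in each species.
    outgroups: presence of the segment in each outgroup sample
               (for example gorilla and orangutan). Hypothetical scoring.
    """
    if in_human == in_chimp:
        return "not an INDEL between human and chimpanzee"
    # Without outgroups, or with outgroups that disagree, the ancestral
    # state cannot be inferred by simple parsimony.
    if not outgroups or (any(outgroups) and not all(outgroups)):
        return "ambiguous"
    if all(outgroups):
        # Segment is ancestral, so the species lacking it carries a deletion.
        return "chimpanzee deletion" if in_human else "human deletion"
    # Segment is absent from every outgroup, so the species carrying it
    # gained it by insertion.
    return "human insertion" if in_human else "chimpanzee insertion"


# Segment present in human only and absent from all three outgroup samples.
print(classify_indel(True, False, [False, False, False]))
```

In this scheme, a human insertion and a chimpanzee deletion look identical in the pairwise alignment (both are h-ins/p-dels sites) and differ only in the outgroup state.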
[Figure: counts of lineage-specific insertions and lineage-specific deletions]
[Figure: distribution of HSA21q insertions, PTR22q insertions, HSA21q deletions and PTR22q deletions; x-axis on a log10 scale from 2.4 (251) to 3.6 (3981)]
• Deletions were not related to particular repetitive structures, except in a few cases.
• Most of the insertions 300-350 bp in length were members of the AluY family in both chromosomes.
• Between 370 and 1000 bp there was only a smaller number of insertions, mostly L1 and LTR elements.
• The distribution of newly integrated Alu elements is quite different between HSA21 and PTR22 (HSA21: 56% high G+C; PTR22: 70% low G+C).
• Unlike insertions, deletions do not correspond exactly to any ISR elements, indicating that deletion events are independent of ISRs.
• The deletion of these elements may also have been generated by homologous recombination between relatively short identical or similar flanking segments.
• HSA21 gained 32 kb but lost 39 kb, while PTR22 gained 25 kb and lost 53 kb (INDELs of 300~5000 bp).
• PTR22 has suffered more losses than HSA21 since speciation.
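The net effect of these gains and losses is simple arithmetic, sketched below with the figures quoted above. The variable and function names are illustrative assumptions.

```python
# Gains and losses in kb for INDELs of 300-5000 bp, as quoted in the slide.
CHANGES_KB = {
    "HSA21": {"gained": 32, "lost": 39},
    "PTR22": {"gained": 25, "lost": 53},
}


def net_change_kb(gained: int, lost: int) -> int:
    """Net size change in kb; negative values mean a net loss of sequence."""
    return gained - lost


for chromosome, change in CHANGES_KB.items():
    print(chromosome, net_change_kb(**change), "kb")
```

Both chromosomes come out with a net loss in this size class (7 kb for HSA21 and 28 kb for PTR22), and the larger loss is on the chimpanzee side.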

• A neighbor-joining analysis shows that such AluY elements can be largely separated into chimpanzee and human groups, as expected (AluY was inserted after speciation).
• Humans seem to have experienced such expansions more frequently and more recently than chimpanzees.
[Figure: neighbor-joining tree of lineage-specific AluY elements from HSA21 and PTR22; leaves are labelled by chromosome, element number and subfamily (AluY, AluYa5, AluYb8, AluYb9), with node values ranging from 52 to 96]
Gene catalogue and structural characterization of coding sequences
• We annotated 284 protein-coding genes and 98 pseudogenes for HSA21, and 272 genes and 89 pseudogenes for PTR22.
• All the conserved pseudogenes showed the same size except for KRTAP21P1, which is non-processed in HSA21 but processed in PTR22.
• Six HSA21 genes showing hallmarks of retrogenes were not found in PTR22 and are likely to have been inserted during human evolution (H2BFS: histone family S; 5 keratin-associated proteins).
• The minimum nucleotide sequence identity is 83% (KRTAP6-3) and the maximum is 100%.
• We compared the human and chimpanzee coding sequences in 231 genes (41 omitted).
• Among the 231 genes associated with a canonical ORF, 179 show a coding sequence of identical length in human and chimpanzee and exhibit similar intron-exon boundaries.
• 39 genes show identical amino acid and nucleotide sequences between human and chimpanzee (biological process 5, metabolic enzymes 5, signal transduction 8, protein folding 2).
• One hundred and forty of these 179 genes show amino acid replacements but no gross structural changes, as expected.
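The gene counts above are consistent with the 83% figure quoted at the start of the talk, as the short check below shows. It assumes that every compared gene other than the 39 identical ones differs at the amino acid level; the function name is illustrative.

```python
COMPARED_GENES = 231       # genes with a canonical ORF that were compared
SAME_LENGTH_GENES = 179    # coding sequence of identical length in both species
IDENTICAL_GENES = 39       # identical amino acid and nucleotide sequence


def percent_differing(compared: int, identical: int) -> float:
    """Percentage of compared genes that are not identical between species."""
    return 100.0 * (compared - identical) / compared


# Same-length genes that still carry amino acid replacements.
with_replacements = SAME_LENGTH_GENES - IDENTICAL_GENES

print(with_replacements)
print(f"{percent_differing(COMPARED_GENES, IDENTICAL_GENES):.1f}%")
```

The subtraction gives the 140 genes with replacements, and 192 of 231 genes is about 83%.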

Ka/Ks analysis
• 10% of the genes had Ka/Ks ratios > 1, with the highest value being 3.37 for the human hair keratin-associated protein.
• Relatively rapidly evolving genes may be identified from Ka, Ka+Ks or just nucleotide divergence values (3 KRTAP genes; KCNE1, potassium channel protein; TCP10L, complex protein; B3GALT5, galactosyltransferase; IGSF5, immunoglobulin).
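Ka/Ks is the ratio of the nonsynonymous substitution rate (Ka) to the synonymous rate (Ks). A ratio above 1 is conventionally read as consistent with positive selection, and a ratio well below 1 as consistent with purifying selection. The sketch below only illustrates that reading. The function names, the tolerance and the example rates are hypothetical; the per-gene Ka and Ks values from the study are not given in these slides.

```python
import math


def ka_ks_ratio(ka: float, ks: float) -> float:
    """Return Ka/Ks, or NaN when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return math.nan
    return ka / ks


def interpret(ratio: float, tolerance: float = 0.05) -> str:
    """Conventional reading of a Ka/Ks ratio; the tolerance is an assumption."""
    if math.isnan(ratio):
        return "undefined (no synonymous substitutions)"
    if ratio > 1 + tolerance:
        return "Ka/Ks > 1: consistent with positive selection"
    if ratio < 1 - tolerance:
        return "Ka/Ks < 1: consistent with purifying selection"
    return "Ka/Ks near 1: consistent with neutral evolution"


# Hypothetical rates chosen only so that the ratio is close to 3.37.
print(interpret(ka_ks_ratio(0.0337, 0.01)))
```

The NaN branch matters for a comparison this close: genes with no synonymous differences between human and chimpanzee have Ks = 0, so the ratio cannot be computed for them.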
Promoter analysis
• Computational analysis of transcription factor binding sites (TFBSs) within the 1-kb upstream region of each gene.
• All of the specific TFBSs were caused by base substitutions in either human or chimpanzee.
• These may not clearly account for the expression changes observed in this study.
Red: TF binding sites found only in human
Blue: TF binding sites found only in chimpanzee
Yellow: TF binding sites common to human, chimpanzee and mouse
Grey: TF binding sites common to human and mouse
Position 1 is located 1000 bases upstream of the coding sequence of each gene
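The colour legend amounts to a set comparison of the predicted sites in the three species. A minimal sketch follows, assuming each site is represented as a (factor, position) pair that can be matched exactly across species. That representation, the function names and the example sites are hypothetical, and combinations the legend does not mention are left uncoloured.

```python
from typing import Optional


def legend_colour(in_human: bool, in_chimp: bool, in_mouse: bool) -> Optional[str]:
    """Map presence of a binding site in each species to the legend colour."""
    if in_human and in_chimp and in_mouse:
        return "yellow"  # common to human, chimpanzee and mouse
    if in_human and in_mouse and not in_chimp:
        return "grey"    # common to human and mouse
    if in_human and not in_chimp and not in_mouse:
        return "red"     # found only in human
    if in_chimp and not in_human and not in_mouse:
        return "blue"    # found only in chimpanzee
    return None          # combination not covered by the legend


def colour_sites(human: set, chimp: set, mouse: set) -> dict:
    """Assign a legend colour to every site seen in human or chimpanzee."""
    return {
        site: legend_colour(site in human, site in chimp, site in mouse)
        for site in human | chimp
    }


# Hypothetical sites written as (factor name, position in the 1-kb region).
human = {("SP1", 120), ("AP1", 300), ("GATA1", 640)}
chimp = {("SP1", 120), ("NFKB", 455)}
mouse = {("SP1", 120), ("GATA1", 640)}
for site, colour in sorted(colour_sites(human, chimp, mouse).items()):
    print(site, colour)
```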
Conclusion
• This study presents, for the first time, a chromosome-wide comparison between human and chimpanzee using high-quality sequence.