LAZARUS AND DOPPELGANGER GENES
Creation Research Society Conference
Dr. Matthew Cserhati and Bendeguz Levente Szuk
Ann Arbor, MI
July 28-30, 2016
Introduction
• There are 10^10-10^12 different gene families (Choi, 2006)
• If an average protein is 267 amino acids long (801 bp) (Brocchieri, 2005), then there are 4^801 ≈ 10^482 different possible gene sequences
• Ratio of possible gene sequences to gene families: 10^482/10^12 = 10^470
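The order of magnitude of the sequence space can be checked with logarithms. This is a minimal sketch, assuming 4 possible nucleotides at each of 801 positions and the upper estimate of 10^12 gene families; the function and constant names are illustrative only.

```python
import math

NUCLEOTIDES = 4            # A, C, G, T
GENE_LENGTH_BP = 801       # 267 codons x 3 bp, as quoted on the slide
GENE_FAMILIES_LOG10 = 12   # upper estimate of gene families, as a power of ten


def log10_sequence_space(alphabet_size: int, length: int) -> float:
    """Return log10(alphabet_size ** length) without building the huge integer."""
    return length * math.log10(alphabet_size)


log10_sequences = log10_sequence_space(NUCLEOTIDES, GENE_LENGTH_BP)
log10_ratio = log10_sequences - GENE_FAMILIES_LOG10

print(f"possible sequences: about 10^{log10_sequences:.0f}")
print(f"sequences per gene family: about 10^{log10_ratio:.0f}")
```

Working in log10 keeps the calculation in ordinary floating point, so the same helper can be reused for other lengths or alphabets (for example 20 amino acids at 267 positions).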
Introduction
• According to François Jacob, no two evolutionary
trajectories are the same
• Meaning the same species doesn’t evolve twice
• Thus, in the world of genes, the same gene should also
not evolve twice
• Very unlikely, since, as we saw (Cserhati, 2015, CRS conference), the probability of a random gene sequence being functional is 10^-342
Lazarus species
Lazarus genes
Examples of Lazarus genes
Gene name | Species distribution | Function | Reference(s)
Aristaless | N. vectensis, vertebrates | Homeobox gene, craniofacial development | McGonnell, 2010
Wnt-2,3,4,8,11 | N. vectensis, vertebrates | Secreted signalling molecules in development | Kusserow, 2005; Miller, 2005
Dickkopf | C. capillata, vertebrates | Cellular differentiation regulator | Yang, 2003
BMP | Suberites (sponge), vertebrates | Determination of dorsoventral axis of mesoderm | Müller, 2003
OR genes | H. magnipapillata, cephalochordates, vertebrates | Olfaction, cellular migration | Churcher, 2011
cytovec | N. vectensis, vertebrates | Intermediate filament | Zimek, 2012
How many Lazarus genes are there?
• Putnam et al. (2007) discovered that there are 13,830
human genes and 12,319 genes from Nematostella
vectensis which are present in the eumetazoan gene set
• 7,309 present in Drosophila (fruit fly)
• 7,261 in Caenorhabditis elegans (worm)
• This means that around 5,000 (12.3k – 7.3k) genes found in the sea anemone are missing from fly and worm and reappear in human
Nematostella vectensis – starlet sea anemone
In summary
• Some evolutionists may say that genes could have been
deactivated throughout long evolutionary time periods and
resurfaced later on
• Survival as pseudogenes
• But why keep a functionless gene for so long?
• Sequence erosion would occur rapidly
Doppelganger genes - introduction
The CaMKII gene
Arabidopsis thaliana – thale cress
CaMKII
• Average similarity between thale cress, yeast, human, fly,
and worm is 54%
• Boundary for sequence homology is 40%
• Twilight zone of sequence homology
• Plays a role in long-term memory in vertebrates
• Obviously plays a role in invertebrates like worms
• Also yeast, plants
• The obvious question is: why does a protein with a similar sequence occur in both plants and animals?
Other Doppelganger genes
Gene name | Species distribution | Function | Reference(s)
CaMKII | Vertebrates, planarian, yeast, thale cress | Neural signal transmission | Mineta, 2003
APX1 | Plants, Hydra, trypanosomes | Ascorbate peroxidase, plays role in oogenesis | Habetha, 2005
cADPR | Euglena, sponges | ADP-ribosyl cyclase | Puce, 2004
Other Doppelganger genes
• Technau et al. (2007) discovered 56 Nematostella and 44 Acropora* proteins which are shared uniquely with non-metazoan organisms
• plants, fungi, protists, and prokaryotes
• e-score of <1e-10
• At an e-score of 1e-4, 2.7% of Nematostella and 2.5% of Acropora proteins were shared uniquely with non-metazoans
• *a reef-building coral genus
Horizontal gene transfer and
Lazarus/doppelganger genes
• Evolutionists could say that Lazarus and doppelganger
genes are due to horizontal gene transfer (HGT)
• Highest proportion of HGT genes are in bdelloid rotifers
• 10% of transcripts
• In more complex organisms HGT is less likely
• More cellular boundaries to pass through
• Cell wall, plasma membrane, nuclear membrane
Horizontal gene transfer and
Lazarus/doppelganger genes
• HGT between bacteria and eukaryotes would have to
introduce introns into the gene
• No biological mechanism is known which does this
• Whatever foreign gene is found within an organism not
caused by HGT may be a Lazarus/doppelganger gene
• Whatever gene is not a Lazarus or doppelganger gene
could be horizontally transferred
How many Lazarus/doppelganger genes are there?
• Filter 1: N. vectensis genes were BLASTed against
sequences from 10 different large taxonomic categories
• humans, rodents, mammals, vertebrates, invertebrates, plants,
Fungi, Bacteria, Archaea, viruses
• e-score: 1e-20 (strict criterion)
• Filter 2: When BLASTed against invertebrates
• e-score: 1e-4 (lax criterion)
• This filters out all possible hits with any invertebrate species and ensures that the gene occurs only in N. vectensis and the other category (gray area in figure)
• Genes were also filtered out if they were found in bacteria or viruses, which may serve as vectors
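The filtering logic described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the `best_evalue` dictionary (best BLAST e-value per taxonomic category for one gene), the function name, and the example genes are hypothetical, and the threshold used for the bacterium/virus vector filter is an assumption because the slide does not state one.

```python
STRICT_E = 1e-20   # Filter 1 threshold quoted on the slide
LAX_E = 1e-4       # Filter 2 threshold quoted on the slide
VECTOR_GROUPS = ("bacteria", "viruses")  # groups treated as possible HGT vectors
NO_HIT = float("inf")  # stand-in e-value when BLAST reported no hit


def passes_filters(best_evalue, category):
    """Decide whether one N. vectensis gene is kept for a target category.

    best_evalue: hypothetical dict mapping a taxonomic category to the best
    BLAST e-value found for this gene; category: the group of interest.
    """
    # Filter 1: the gene needs a strict hit in the target category.
    if best_evalue.get(category, NO_HIT) >= STRICT_E:
        return False
    # Filter 2: any invertebrate hit, even a weak one, disqualifies the gene.
    if best_evalue.get("invertebrates", NO_HIT) < LAX_E:
        return False
    # Vector filter: drop genes also found in bacteria or viruses.
    # Reusing LAX_E here is an assumption, not a value taken from the slides.
    return all(best_evalue.get(group, NO_HIT) >= LAX_E
               for group in VECTOR_GROUPS if group != category)


# Made-up example genes, for illustration only.
genes = {
    "gene_A": {"humans": 1e-50},
    "gene_B": {"humans": 1e-50, "invertebrates": 1e-8},
    "gene_C": {"humans": 1e-30, "bacteria": 1e-40},
    "gene_D": {"humans": 1e-10},
}
kept = [name for name, ev in genes.items() if passes_filters(ev, "humans")]
print(kept)
```

Only `gene_A` survives here: `gene_B` has a lax invertebrate hit, `gene_C` also occurs in bacteria, and `gene_D` misses the strict threshold.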
[Figure: Animals => Disjunct groups]
Filtering
Taxonomic group | Number of sequences | Number of Nematostella hits (E<1e-20) | Non-invertebrate Nematostella hits (E<1e-5) | Non-HGT genes (after bacterium+virus double filter)
Vertebrates | 17,862 | 8,220 | 995 | 959 (5.4%)
Mammals | 19,846 | 9,128 | 1,161 | 1,103 (5.6%)
Rodents | 26,339 | 12,193 | 2,083 | 2,002 (7.6%)
Human | 20,204 | 12,372 | 2,196 | 2,117 (10.5%)
Invertebrates | 26,050 | 14,785 [e-score: 1e-4] | 0 | 0
Plants | 39,177 | 4,285 | 325 | 302 (0.8%)
Fungi | 31,585 | 4,447 | 387 | 369 (1.2%)
Bacteria | 332,193 | 2,281 | 411 | 397 (0.1%) [single filter]
Archaea | 19,357 | 927 | 79 | 76 (0.4%) [single filter]
Viruses | 16,602 | 2,405 | 127 | 0
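The percentages in the final column appear to be the number of genes remaining after filtering divided by the number of sequences searched in each taxonomic group. That reading is inferred from the numbers, not stated on the slide; the short check below recomputes them under that assumption (names are illustrative).

```python
# (number of sequences, genes remaining after filtering, reported percentage)
filtering_rows = {
    "Vertebrates": (17862, 959, 5.4),
    "Mammals": (19846, 1103, 5.6),
    "Rodents": (26339, 2002, 7.6),
    "Human": (20204, 2117, 10.5),
    "Plants": (39177, 302, 0.8),
    "Fungi": (31585, 369, 1.2),
    "Bacteria": (332193, 397, 0.1),
    "Archaea": (19357, 76, 0.4),
}


def percent_of_sequences(n_sequences: int, n_genes: int) -> float:
    """Share of searched sequences that survive filtering, in percent."""
    return round(100 * n_genes / n_sequences, 1)


recomputed = {
    group: percent_of_sequences(n_seq, n_genes)
    for group, (n_seq, n_genes, _reported) in filtering_rows.items()
}
for group, (_n_seq, _n_genes, reported) in filtering_rows.items():
    print(f"{group}: recomputed {recomputed[group]}%, reported {reported}%")
```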
Common Lazarus/doppelganger genes
Lazarus genes in human
• The Human Genome Sequencing Consortium discovered
223 genes which were possible HGT genes between
bacteria and humans
• No similarity to any other non-vertebrate eukaryote
• Lander et al. (2001) found 113 genes which had no
homologs in non-vertebrate eukaryotes
• Thought to be result of HGT but had introns
• Salzberg et al. (2001) found only 40 of them to be
genuine HGT
• Stanhope (2001) found that only 28 of the 113 genes
studied by Lander were genuine HGT
Lazarus genes in human
• Crisp et al. (2015) re-analyzed the data from these study groups and found 363 genes
• Of the 365 genes rejected as HGT by Stanhope, Salzberg and Crisp, 94 were rejected as HGT by all groups
• Also found members of 12 gene families with at least 3
genes which were hypothetically transferred from
prokaryotes to humans
• Is it possible to transfer a large number of members of a
gene family between bacteria and humans by random
chance during HGT?
Note
• P-values were calculated from the hypergeometric distribution: p = Hypergeometric(N, M, n, k), where
• N = the population size, 60,620 human ENSG identifiers present in the Ensembl database,
• M = the 363 human non-HGT genes found by Crisp et al.,
• n = the size of the gene family under consideration, and
• k = the number of genes from the specific gene family which were
found in human.
HGT genes between bacteria and human with multiple members
Gene family | Number of genes transferred | Number of genes in family | p-value
acyl-CoA synthetase | 7 | 28 | 2.83e-10
arylsulfatase | 3 | 14 | 7.38e-5
cytochrome P450, family 26 | 3 | 3 | 2.17e-7
GTPase, IMAP family member | 6 | 10 | 9.38e-12
hyaluronan synthase | 3 | 4 | 8.61e-7
monoamine oxidase | 4 | 4 | 1.29e-9
Na/glucose (and other solutes) cotransporter | 3 | 4 | 8.61e-7
N-acetyltransferase (arylamine N-acetyltransferase) | 3 | 4 | 8.61e-7
peptidyl arginine deiminase | 4 | 5 | 6.43e-9
PRAME protein | 12 | 41 | 1.26e-17
retinol binding protein | 3 | 7 | 7.4e-6
solute carrier family 37 (glucose-6-phosphate transporter) | 5 | 5 | 7.7e-12
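The p-values in the table can be approximated with a plain hypergeometric point probability. This is a sketch under assumptions rather than the authors' script: it takes p as the probability of drawing exactly k marked genes in n draws, and it uses M = 365 (the count of rejected-HGT genes quoted on an earlier slide), because that value reproduces the tabulated figures for families such as cytochrome P450 family 26 and monoamine oxidase, whereas M = 363 from the Note gives slightly smaller values. The function name is illustrative.

```python
from math import comb

N_POPULATION = 60620   # human ENSG identifiers, from the Note
M_MARKED = 365         # assumption: 365 rather than the 363 given in the Note


def hypergeom_pmf(N: int, M: int, n: int, k: int) -> float:
    """Probability of exactly k marked genes among n drawn without replacement
    from a population of N genes that contains M marked genes."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)


# (genes transferred k, family size n), copied from the table above
families = {
    "acyl-CoA synthetase": (7, 28),
    "cytochrome P450, family 26": (3, 3),
    "hyaluronan synthase": (3, 4),
    "monoamine oxidase": (4, 4),
    "solute carrier family 37": (5, 5),
}

p_values = {
    name: hypergeom_pmf(N_POPULATION, M_MARKED, n, k)
    for name, (k, n) in families.items()
}
for name, p in p_values.items():
    print(f"{name}: p = {p:.3g}")
```

Using exact integer binomial coefficients from `math.comb` avoids overflow or underflow in the intermediate terms; only the final division is done in floating point.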
Conclusion
• The appearance, disappearance, and then subsequent
re-appearance of a gene is highly unlikely according to
evolution, given the huge size of the gene sequence
universe
• Yet Lazarus and doppelganger genes seem to appear
abundantly in nature
• Just like living fossil species
• There are at least 94 which occur between bacteria and
human
• Unique or ORFan genes are specific to species, but Lazarus/doppelganger genes may also be common
• Also a case of mosaicism
Thanks for your attention!