LAZARUS AND DOPPELGANGER GENES
Creation Research Society Conference
Dr. Matthew Cserhati and Bendeguz Levente Szuk
Ann Arbor, MI
July 28-30, 2016

Introduction
• There are 10^10-10^12 different gene families (Choi, 2006)
• If an average protein is 267 amino acids (801 bp) (Brocchieri, 2005), then there are 4^801 ~ 10^482 different possible gene sequences
• Ratio of possible gene sequences to gene families: 10^482/10^12 = 10^470

Introduction
• According to François Jacob, no two evolutionary trajectories are the same
• Meaning the same species doesn't evolve twice
• Thus, in the world of genes, the same gene should also not evolve twice
• Very unlikely, since as we saw (Cserhati, 2015, CRS conference), the probability of a random gene sequence being functional is 10^-342

Lazarus species

Lazarus genes

Examples of Lazarus genes
Gene name | Species distribution | Function | Reference(s)
Aristaless | N. vectensis, vertebrates | Homeobox gene, craniofacial development | McGonnell, 2010
Wnt-2,3,4,8,11 | N. vectensis, vertebrates | Secreted signalling molecules in development | Kusserow, 2005; Miller, 2005
Dickkopf | C. capillata, vertebrates | Cellular differentiation regulator | Yang, 2003
BMP | Suberites (sponge), vertebrates | Determination of dorsoventral axis of mesoderm | Müller, 2003
OR genes | H. magnipapillata, cephalochordates, vertebrates | Olfaction, cellular migration | Churcher, 2011
cytovec | N. vectensis, vertebrates | Intermediate filament | Zimek, 2012

How many Lazarus genes are there?
• Putnam et al.
(2007) discovered that 13,830 human genes and 12,319 genes from Nematostella vectensis are present in the eumetazoan gene set
• 7,309 present in Drosophila (fruit fly)
• 7,261 in Caenorhabditis elegans (worm)
• This means that around 5,000 (12.3k - 7.3k) genes present in the sea anemone are missing from the fly and the worm but reappear in human

Nematostella vectensis (starlet sea anemone)

In summary
• Some evolutionists may say that genes could have been deactivated throughout long evolutionary time periods and resurfaced later on
• Survival as pseudogenes
• But why keep a functionless gene for so long?
• Sequence erosion would occur rapidly

Doppelganger genes - introduction

The CaMKII gene

Arabidopsis thaliana (thale cress)

CaMKII
• Average similarity between thale cress, yeast, human, fly, and worm is 54%
• Boundary for sequence homology is 40%
• Twilight zone of sequence homology
• Plays a role in long-term memory in vertebrates
• Obviously plays a role in invertebrates like worms
• Also yeast, plants
• The obvious question is: why does a protein occur in both plants and animals with a similar sequence?

Other Doppelganger genes
Doppelganger genes
CaMKII | Vertebrates, planarian, yeast, thale cress | Neural signal transmission | Mineta, 2003
APX1 | Plants, Hydra, trypanosomes | Ascorbate peroxidase, plays role in oogenesis | Habetha, 2005
cADPR | Euglena, sponges | ADP-ribosyl cyclase | Puce, 2004

Other Doppelganger genes
• Technau et al.
(2007) discovered 56 Nematostella and 44 Acropora* proteins which are shared uniquely with nonmetazoan organisms
• plants, fungi, protists, and prokaryotes
• e-score of <1e-10
• At an e-score of 1e-4, 2.7% of Nematostella and 2.5% of Acropora proteins were shared uniquely with nonmetazoans
• *a stony coral genus

Horizontal gene transfer and Lazarus/doppelganger genes
• Evolutionists could say that Lazarus and doppelganger genes are due to horizontal gene transfer (HGT)
• The highest proportion of HGT genes is found in bdelloid rotifers
• 10% of transcripts
• In more complex organisms HGT is less likely
• More cellular boundaries to pass through
• Cell wall, plasma membrane, nuclear membrane

Horizontal gene transfer and Lazarus/doppelganger genes
• HGT between bacteria and eukaryotes would have to introduce introns into the gene
• No biological mechanism is known which does this
• Any foreign gene found within an organism that was not caused by HGT may be a Lazarus/doppelganger gene
• Any gene that is not a Lazarus or doppelganger gene could be horizontally transferred

How many Lazarus/doppelganger genes are there?
• Filter 1: N. vectensis genes were BLASTed against sequences from 10 different large taxonomic categories
• humans, rodents, mammals, vertebrates, invertebrates, plants, Fungi, Bacteria, Archaea, viruses
• e-score: 1e-20 (strict criterion)
• Filter 2: When BLASTed against invertebrates
• e-score: 1e-4 (lax criterion)
• This is to filter out all possible hits with any invertebrate species and to ensure that the gene occurs only in N. vectensis
and the other category (gray area in figure)
• Genes were also filtered out if they were found in bacteria or viruses, which may serve as vectors

Filtering
Animals => Disjunct groups
Taxonomic group | Number of sequences | Number of Nematostella hits (E<1e-20) | Non-invertebrate Nematostella hits (E<1e-5) | Non-HGT genes (after bacterium+virus double filter)
Vertebrates | 17,862 | 8,220 | 995 | 959 (5.4%)
Mammals | 19,846 | 9,128 | 1,161 | 1,103 (5.6%)
Rodents | 26,339 | 12,193 | 2,083 | 2,002 (7.6%)
Human | 20,204 | 12,372 | 2,196 | 2,117 (10.5%)
Invertebrates | 26,050 | 14,785 [e-score: 1e-4] | 0 | 0
Plants | 39,177 | 4,285 | 325 | 302 (0.8%)
Fungi | 31,585 | 4,447 | 387 | 369 (1.2%)
Bacteria | 332,193 | 2,281 | 411 | 397 (0.1%) [single filter]
Archaea | 19,357 | 927 | 79 | 76 (0.4%) [single filter]
Viruses | 16,602 | 2,405 | 127 | 0

Common Lazarus/doppelganger genes

Lazarus genes in human
• The Human Genome Sequencing Consortium discovered 223 genes which were possible HGT genes between bacteria and humans
• No similarity to any other nonvertebrate eukaryote
• Lander et al. (2001) found 113 genes which had no homologs in non-vertebrate eukaryotes
• Thought to be the result of HGT but had introns
• Salzberg et al. (2001) found only 40 of them to be genuine HGT
• Stanhope (2001) found that only 28 of the 113 genes studied by Lander were genuine HGT

Lazarus genes in human
• Crisp et al. (2015) re-analyzed the data from these study groups and found 363 genes
• Of the 365 genes rejected as HGT by Stanhope, Salzberg and Crisp, 94 genes were rejected as HGT by all groups
• Also found members of 12 gene families with at least 3 genes which were hypothetically transferred from prokaryotes to humans
• Is it possible to transfer a large number of members of a gene family between bacteria and humans by random chance during HGT?
Note
• P-values were calculated by the hypergeometric distribution: p = Hypergeometric(N,M,n,k), where
• N = the population size, 60,620 human ENSG identifiers present in the Ensembl database,
• M = the 363 human non-HGT genes found by Crisp et al.,
• n = the size of the gene family under consideration, and
• k = the number of genes from the specific gene family which were found in human.

HGT genes between bacteria and human with multiple members
Gene family | Number of genes transferred | Number of genes in family | p-value
acyl-CoA synthetase | 7 | 28 | 2.83e-10
arylsulfatase | 3 | 14 | 7.38e-5
cytochrome P450, family 26 | 3 | 3 | 2.17e-7
GTPase, IMAP family member | 6 | 10 | 9.38e-12
hyaluronan synthase | 3 | 4 | 8.61e-7
monoamine oxidase | 4 | 4 | 1.29e-9
Na/glucose (and other solutes) cotransporter | 3 | 4 | 8.61e-7
N-acetyltransferase (arylamine N-acetyltransferase) | 3 | 4 | 8.61e-7
peptidyl arginine deiminase | 4 | 5 | 6.43e-9
PRAME protein | 12 | 41 | 1.26e-17
retinol binding protein | 3 | 7 | 7.4e-6
solute carrier family 37 (glucose-6-phosphate transporter) | 5 | 5 | 7.7e-12

Conclusion
• The appearance, disappearance, and then subsequent re-appearance of a gene is highly unlikely according to evolution, given the huge size of the gene sequence universe
• Yet Lazarus and doppelganger genes seem to appear abundantly in nature
• Just like living fossil species
• There are at least 94 which occur between bacteria and human
• Unique or ORFan genes are specific to species, but Lazarus/doppelganger genes may also be common
• Also a case of mosaicism

Thanks for your attention!
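As a minimal sketch of the calculation described in the Note, the following Python computes a hypergeometric probability from N, M, n and k as defined there. The slides do not state whether the reported p-values are point probabilities P(X = k) or tail probabilities P(X >= k), nor which software produced them, so both are shown. The function names are hypothetical, only three rows of the gene family table are used, and the output is not guaranteed to match the table exactly.

```python
from math import comb

N = 60_620  # population size: human ENSG identifiers (value from the Note)
M = 363     # genes reported by Crisp et al. (value from the Note)

def hypergeom_pmf(N, M, n, k):
    """P(X = k): exactly k of the n family members fall among the M genes."""
    # Outside the support of the distribution the probability is zero.
    if k < max(0, n - (N - M)) or k > min(n, M):
        return 0.0
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

def hypergeom_tail(N, M, n, k):
    """P(X >= k): at least k of the n family members fall among the M genes."""
    return sum(hypergeom_pmf(N, M, n, i) for i in range(k, min(n, M) + 1))

# (gene family, k = genes found, n = family size), copied from the table
families = [
    ("acyl-CoA synthetase", 7, 28),
    ("monoamine oxidase", 4, 4),
    ("PRAME protein", 12, 41),
]

for name, k, n in families:
    pmf = hypergeom_pmf(N, M, n, k)
    tail = hypergeom_tail(N, M, n, k)
    print(f"{name}: P(X = k) = {pmf:.3g}, P(X >= k) = {tail:.3g}")
```

Exact integer binomial coefficients from math.comb are used here instead of a statistics library so that the sketch has no dependencies; a library routine such as a hypergeometric survival function would be an equally reasonable choice.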