Arrowsmith extensions to bio-informatics
Vetle I. Torvik

Discovering new gene sequences
- Start with a novel DNA sequence
- Find overlapping sequences within the expressed sequence tag (EST) database
- Find others that overlap with those, until an entire new full-length gene has been identified

ATGATAGGAGA
      GGAGAGCTGAGA
             TGAGATGCGCTG
                    CGCTGATACTAGA
                            CTAGATGATAGAGATGCC
ATGATAGGAGAGCTGAGATGCGCTGATACTAGATGATAGAGATGCC

The Arrowsmith approach applied to nucleotide or protein sequences
- Begin with two different sets, A and C, of sequences that do not overlap
- Search the database for sequences B that overlap with one or more sequences in both A and C

AB1
ATGCTCTCGCGCTACGACTAGCATACTG
CCTGATCGCTACTACTAGCTGA
CTCGATGAGCGATGATCGCTAGCTATGGG
GTGAGGATCGCGATGATGATG
B1
ACTGATCGCTAGCTATGA
BC1
ATCGACAAGCTATGTGCAACTG
TCTCGCTACTAGATCACTAGCTTA
ATCTGATACTAGCTACGACTAGC

Linking to microarray experimental data
- A = set of microarray experiments that measured reelin
- C = set of microarray experiments that measured tooth development
- A and C might be in the same or different databases
- B-terms = genes whose expression was correlated with reelin in some system on the one hand, and that were expressed during tooth development on the other

If reelin regulates certain genes that have roles during tooth development, one may hypothesize a role for reelin in tooth development as well, even if none of the tooth microarray studies had examined reelin explicitly.

This might stimulate someone to test...
- if reelin is expressed at specific times and places within the developing tooth bud
- if reelin actively regulates the genes on the B-list
- if tooth development is abnormal in the reeler mouse, which genetically lacks reelin

Linking PubMed to bioinformatics databases
(Diagram labels: PubMed A-literature, gene A, Microarray, B-gene list, Microarray, gene C, PubMed C-literature. Other databases: Genomic, Quantitative trait loci (QTL), Atlases, Images, etc.)

Using the literature to link genes
- If gene A strongly co-occurs with gene B in the literature due to a biologically significant relationship, and genes B and C similarly co-occur, then genes A and C are likely to be biologically related as well
- When A and C do not co-occur above the chance level, the relation between A and C may not be previously known or documented
- Special case of the Arrowsmith 1-node search

(Diagram: Gene B linked to Gene A and to Gene C with weight 0.9 each; Gene A linked to Gene C with weight 0.2)
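
The literature-linking rule can be illustrated with a minimal Python sketch. It is not the Arrowsmith implementation. The slides do not say how co-occurrence strength is computed, so the pairwise scores are supplied directly: the 0.9 / 0.9 / 0.2 values come from the diagram, while gene "D", the function names, and the `strong` and `chance` thresholds are hypothetical choices made for illustration.

```python
def link_strength(scores, x, y):
    """Look up a symmetric pairwise score; unseen pairs count as 0.0."""
    return scores.get(frozenset((x, y)), 0.0)


def find_b_genes(scores, gene_a, gene_c, strong=0.8):
    """Return genes strongly linked to both gene_a and gene_c, sorted by name.

    The `strong` threshold is an assumed cut-off, not a value from the slides.
    """
    # Every gene mentioned in any scored pair, except the two endpoints.
    genes = set().union(*scores.keys()) - {gene_a, gene_c}
    return sorted(
        b for b in genes
        if link_strength(scores, gene_a, b) >= strong
        and link_strength(scores, b, gene_c) >= strong
    )


def is_undocumented(scores, gene_a, gene_c, chance=0.3):
    """True when A and C co-occur no more than the assumed chance level."""
    return link_strength(scores, gene_a, gene_c) <= chance


# Scores for A, B, C follow the diagram; gene "D" is an invented distractor.
scores = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.9,
    frozenset(("A", "C")): 0.2,
    frozenset(("A", "D")): 0.9,
    frozenset(("D", "C")): 0.1,
}

if is_undocumented(scores, "A", "C"):
    # A and C are weakly linked directly, so any shared B gene is a lead.
    print(find_b_genes(scores, "A", "C"))
```

With these inputs only gene B qualifies: D is strongly linked to A but not to C. In practice the scores would have to be derived from literature co-occurrence counts, and the thresholds chosen relative to what chance co-occurrence would predict.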