Arrowsmith extensions to bioinformatics
Vetle I. Torvik
Discovering new gene sequences
• Start with a novel DNA sequence
• Find overlapping sequences within the expressed sequence tag (EST) database
• Find others that overlap with those, until one has identified an entire new full-length gene
ATGATAGGAGA
      GGAGAGCTGAGA
             TGAGATGCGCTG
                    CGCTGATACTAGA
                            CTAGATGATAGAGATGCC
ATGATAGGAGAGCTGAGATGCGCTGATACTAGATGATAGAGATGCC
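The extension procedure on this slide can be sketched as a greedy loop: repeatedly pick the fragment whose prefix best matches the end of the growing sequence and append the non-overlapping remainder. This is only an illustrative sketch, not the method of any particular EST tool. The function names `overlap` and `extend` and the minimum overlap of 5 bases are assumptions made for this example; the fragments are the ones shown on the slide.

```python
def overlap(left, right, min_len):
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def extend(seed, fragments, min_len=5):
    """Greedily extend `seed` to the right using overlapping fragments."""
    contig = seed
    pool = list(fragments)
    while pool:
        # Choose the fragment with the longest overlap against the contig end.
        best = max(pool, key=lambda frag: overlap(contig, frag, min_len))
        k = overlap(contig, best, min_len)
        if k == 0:
            break  # nothing left that overlaps; stop extending
        contig += best[k:]
        pool.remove(best)
    return contig


# Fragments from the slide; the first one plays the role of the novel sequence.
seed = "ATGATAGGAGA"
est_fragments = [
    "CTAGATGATAGAGATGCC",
    "TGAGATGCGCTG",
    "GGAGAGCTGAGA",
    "CGCTGATACTAGA",
]
full_length = extend(seed, est_fragments)
print(full_length)
```

A real assembly would also need to handle sequencing errors, reverse complements, and extension to the left, none of which this sketch attempts.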
The Arrowsmith approach applied to
nucleotide or protein sequences
• Begin with two different sets A and C of sequences that do not overlap
• Search for sequences B in the database that overlap with one or more sequences in both A and C
[Figure: example sequences, grouped under the labels shown on the slide]
AB1:
  ATGCTCTCGCGCTACGACTAGCATACTG
  CCTGATCGCTACTACTAGCTGA
  CTCGATGAGCGATGATCGCTAGCTATGGG
  GTGAGGATCGCGATGATGATG
B1:
  ACTGATCGCTAGCTATGA
BC1:
  ATCGACAAGCTATGTGCAACTG
  TCTCGCTACTAGATCACTAGCTTA
  ATCTGATACTAGCTACGACTAGC
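The search for B sequences can be sketched as a filter over a database: keep each entry that overlaps at least one sequence in A and at least one in C. The slide does not define "overlap", so this sketch assumes two sequences overlap when they share an exact substring of at least k bases (k = 7 here, an arbitrary choice). It also assumes that the first group of example sequences is set A and the last group is set C. The database entry named `decoy` is hypothetical and added only so the filter has something to reject.

```python
def kmers(seq, k):
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overlaps(x, y, k):
    """Assumed overlap criterion: x and y share an exact k-mer."""
    return not kmers(x, k).isdisjoint(kmers(y, k))


def find_b_sequences(set_a, set_c, database, k=7):
    """Return (name, a_hits, c_hits) for database entries linked to both sets."""
    results = []
    for name, seq in database.items():
        a_hits = [a for a in set_a if overlaps(a, seq, k)]
        c_hits = [c for c in set_c if overlaps(c, seq, k)]
        if a_hits and c_hits:
            results.append((name, a_hits, c_hits))
    return results


set_a = [
    "ATGCTCTCGCGCTACGACTAGCATACTG",
    "CCTGATCGCTACTACTAGCTGA",
    "CTCGATGAGCGATGATCGCTAGCTATGGG",
    "GTGAGGATCGCGATGATGATG",
]
set_c = [
    "ATCGACAAGCTATGTGCAACTG",
    "TCTCGCTACTAGATCACTAGCTTA",
    "ATCTGATACTAGCTACGACTAGC",
]
database = {
    "B1": "ACTGATCGCTAGCTATGA",
    "decoy": "TTTTTTTTTTTTTTTT",  # hypothetical entry, not from the slide
}

b_hits = find_b_sequences(set_a, set_c, database)
for name, a_hits, c_hits in b_hits:
    print(name, len(a_hits), len(c_hits))
```

In practice the overlap test would more likely be an alignment search with a score or E-value cutoff; exact k-mer sharing is used here only to keep the example self-contained.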
Linking to microarray experimental data
• A = set of microarray experiments that measured reelin
• C = set of microarray experiments that measured tooth development
• A and C might be in the same or different databases
• B-terms = genes whose expression was correlated with reelin in some system on the one hand, and that were expressed during tooth development on the other
• If reelin regulates certain genes that have roles during tooth development, one may hypothesize a role for reelin in tooth development as well, even if none of the tooth microarray studies had examined reelin explicitly
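Under these definitions, the B-list is an intersection: genes correlated with reelin in the A experiments that are also expressed in the C experiments. A minimal sketch follows. The gene names, correlation values, and the 0.7 cutoff are all hypothetical placeholders, not results from any dataset, and `b_list` is a name invented for this example.

```python
def b_list(reelin_correlations, tooth_expressed, min_abs_r=0.7):
    """Genes correlated with reelin (set A) and expressed in tooth data (set C).

    reelin_correlations: gene -> correlation with reelin expression
    tooth_expressed: genes detected in tooth-development experiments
    min_abs_r: assumed cutoff on |correlation|; positive and negative
               correlations both count as "correlated"
    """
    correlated = {
        gene for gene, r in reelin_correlations.items() if abs(r) >= min_abs_r
    }
    return sorted(correlated & set(tooth_expressed))


# Hypothetical placeholder data for illustration only.
reelin_correlations = {"gene1": 0.92, "gene2": -0.81, "gene3": 0.15, "gene4": 0.77}
tooth_expressed = {"gene2", "gene3", "gene4", "gene5"}

candidates = b_list(reelin_correlations, tooth_expressed)
print(candidates)
```

Here `gene1` is dropped because it does not appear in the tooth data, and `gene3` is dropped because its correlation with reelin is below the cutoff.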
This might stimulate someone to test...
• if reelin is expressed at specific times and places within the developing tooth bud
• if reelin actively regulates the genes on the B-list
• if tooth development is abnormal in the reeler mouse that genetically lacks reelin
Linking PubMed to bioinformatics databases
[Diagram. Labels: B-gene list; Microarray gene A; Microarray gene C; PubMed A-literature; PubMed C-literature]
Other databases
• Genomic
• Quantitative trait loci (QTL)
• Atlases
• Images
• etc.
Using the literature to link genes
• If gene A strongly co-occurs with gene B in the literature due to a biologically significant relationship, and
• gene B and gene C similarly co-occur,
• then genes A and C are likely to be biologically related as well
• When A and C do not co-occur above the chance level, the relation between A and C may not be previously known or documented
• Special case of the Arrowsmith 1-node search
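[Diagram: Gene A and Gene B linked with weight 0.9; Gene B and Gene C linked with weight 0.9; Gene A and Gene C linked with weight 0.2]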
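The pattern in the diagram can be searched for mechanically: look for a gene B that is strongly linked to both A and C while A and C themselves are only weakly linked. The sketch below treats the values 0.9 and 0.2 as co-occurrence strengths on a 0 to 1 scale, which is an assumption, since the slide does not say how they are computed. The thresholds `strong` and `weak` and the function names are likewise invented for this example.

```python
from itertools import combinations


def pair_score(scores, x, y):
    """Symmetric lookup of a pairwise score; unknown pairs count as 0."""
    return scores.get(frozenset((x, y)), 0.0)


def indirect_links(scores, strong=0.8, weak=0.3):
    """Find (A, B, C) where A-B and B-C are strong but A-C is weak."""
    genes = sorted({gene for pair in scores for gene in pair})
    found = []
    for b in genes:
        others = [g for g in genes if g != b]
        for a, c in combinations(others, 2):
            if (pair_score(scores, a, b) >= strong
                    and pair_score(scores, b, c) >= strong
                    and pair_score(scores, a, c) < weak):
                found.append((a, b, c))
    return found


# Weights taken from the diagram; their interpretation is assumed.
scores = {
    frozenset(("Gene A", "Gene B")): 0.9,
    frozenset(("Gene B", "Gene C")): 0.9,
    frozenset(("Gene A", "Gene C")): 0.2,
}

links = indirect_links(scores)
print(links)
```

Each reported triple names a candidate A-C relationship that the literature supports only indirectly, through B.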