Analysis Environments For Functional Genomics
Bruce R. Schatz
Institute for Genomic Biology
University of Illinois at Urbana-Champaign
[email protected], www.beespace.uiuc.edu
Informatics Research
First Annual BeeSpace Workshop, June 6, 2005

What are Analysis Environments
Functional Analysis
  Find the underlying Mechanisms of Genes, Behaviors, Diseases
Comparative Analysis
  Top-down data mining (vs Bottom-up)
  Multiple Sources, especially literature

Building Analysis Environments
Manual by Humans
  Interaction        Classification
  user navigation    collection indexing
Automatic by Computers
  Federation         Integration
  search bridges     results links

Needles and Haystacks
Genes
  Honey Bees have 13K genes
  Perhaps 100 have known functions
Paths
  Perhaps 30K protein families exist
  KEGG has 200 known pathways
Statistical Clustering for Interactive Discovery Across Two Orders of Magnitude!

Trends in Analysis Environments
Central versus Distributed Viewpoints
The 90s, Pre-Genome: Entrez (NIH NCBI) versus WCS (NSF Arizona)
The 00s, Post-Genome: GO (NIH curators) versus BeeSpace (NSF Illinois)

Pre-Genome Environments
Focused on Syntax, pre-Web
WCS (Worm Community System)
  Search words across sources
  Follow links across sources
  Words automatic, Links manual
Towards Uniform Searching

Post-Genome Environments
Focused on Semantics, post-Web
BeeSpace (Honey Bee Inter Space)
  Navigate concepts across sources
  Integrate data across sources
  Concepts automatic, Links automatic
Towards Question Answering

Worm Community System
WCS Information:
  Literature: BIOSIS, MEDLINE, newsletters, meetings
  Data: Genes, Maps, Sequences, strains, cells
WCS Functionality:
  Browsing: search, navigation
  Filtering: selection, analysis
  Sharing: linking, publishing
WCS: 250 users at 50 labs across Internet (1991)

WCS Molecular

WCS Cellular

WCS invokes gm

WCS vis-à-vis acedb

Towards the Interspace
from Objects to Concepts
from Syntax to Semantics
Infrastructure is Interaction with Abstraction
Internet is packet transmission across computers
Interspace is concept navigation across repositories

THE THIRD WAVE OF NET EVOLUTION
CONCEPTS
OBJECTS
PACKETS

LEVELS OF INDEXES
FORMAL (manual): Technology, Engineering, Electrical, IEEE
INFORMAL (automatic): communities, groups, individuals

Navigation in MEDSPACE
For a patient with Rheumatoid Arthritis, find a drug that reduces the pain (analgesic) but does not cause stomach (gastrointestinal) bleeding.
  Choose Domain
  Concept Search
  Concept Navigation
  Retrieve Document
  Navigate Document

Post-Genome Informatics I
Comparative Analysis within the Dry Lab of Biological Knowledge
Classical Organisms have Genetic Descriptions.
There will be NO more classical organisms beyond Mice and Men, Worms and Flies, Yeasts and Weeds.
Must use comparative genomics on classical organisms, via sequence homologies and literature analysis.

Post-Genome Informatics II
Functional Analysis within the Dry Lab of Biological Knowledge
Automatic annotation of genes to standard classifications, e.g. Gene Ontology via homology on computed protein sequences.
Automatic analysis of functions to scientific literature, e.g. concept spaces via text extractions.
Thus must use functions in literature descriptions.

Conceptual Navigation in BeeSpace
[Diagram labels: Behavioral Biologist; Bee Literature; Molecular Biology Literature; Brain Gene Expression Profiles; Brain Region Localization; Neuroscience Literature; Neuroscientist; Molecular Biologist; Bee Genome; Flybase, WormBase]

BeeSpace Analysis Environment
Build Concept Space of Biomedical Literature for Functional Analysis of Bee Genes
-Partition Literature into Community Collections
-Extract and Index Concepts within Collections
-Navigate Concepts within Documents
-Follow Links from Documents into Databases
Locate Candidate Genes in Related Literatures, then follow links into Genome Databases

Question Answering
Behaviour | Organism | Gene | Molecular Function | Reference
Rover vs sitter phenotype | Drosophila melanogaster | for | Protein kinase G | 8
Roamer vs dweller phenotype | C. elegans | egl-4 | Protein kinase G | 16
Division of labour: age at onset of foraging | Apis mellifera | for | Protein kinase G | 9
Division of labour: age at onset of foraging | Apis mellifera | mvl | Mn transporter | 19
Division of labour: foraging-related? | Apis mellifera | per | Transcription cofactor | 68
Division of labour: foraging-related? | Apis mellifera | ache | Acetylcholine esterase | 69
Division of labour: foraging-related? | Apis mellifera | IP(3)K | Inositol signaling | 70
Foraging specialization: nectar vs. pollen | Apis mellifera | pkc | Protein kinase C | 71
Social feeding | Drosophila melanogaster | dpnf | Neuropeptide Y (NPY) homolog | 21
Social feeding (aggregation) | C. elegans | npr-1 | Receptor for NPY | 22, 23
Foraging

Functional Phrases
<gene> encodes <chemical>
  Sokolowski and colleagues demonstrated in Drosophila melanogaster that the foraging gene (for) encodes a cGMP-dependent protein kinase (PKG).
  The dg2 gene encodes a cyclic guanosine monophosphate (cGMP)-dependent protein kinase (PKG).
<chemical> affects/causes <behavior>
  Thus, PKG levels affected food-search behavior.
  cGMP treatment elevated PKG activity and caused foraging behavior.
<gene> regulates <behavior>
  Amfor, an ortholog of the Drosophila for gene, is involved in the regulation of age at onset of foraging in honey bees.
  This idea is supported by results for malvolio (mvl), which encodes a manganese transporter and is involved in regulating Drosophila feeding and age at onset of foraging in honey bees.

Data Integration (FlyBase Gene)
D. melanogaster gene foraging, abbreviated as for, is reported here. It has also been known in FlyBase as BcDNA:GM08338, CG10033 and l(2)06860. It encodes a product with cGMP-dependent protein kinase activity (EC:2.7.1.-) involved in protein amino acid phosphorylation which is a component of the cellular_component unknown.
It has been sequenced and its amino acid sequence contains a eukaryotic protein kinase, a protein kinase C-terminal domain, a tyrosine kinase catalytic domain, a serine/threonine protein kinase family active site, a cAMP-dependent protein kinase and a cGMP-dependent protein kinase. It has been mapped by recombination to 2-10 and cytologically to 24A2--4. It interacts genetically with Csr. There are 27 recorded alleles: 1 in vitro construct (not available from the public stock centers), 25 classical mutants (3 available from the public stock centers) and 1 wild-type. Mutations have been isolated which affect the larval nerve terminal and are behavioral, pupal recessive lethal, hyperactive, larval neurophysiology defective and larval neuroanatomy defective. for is discussed in 80 references (excluding sequence accessions), dated between 1988 and 2003. These include at least 6 studies of mutant phenotypes, 2 studies of wild-type function, 3 studies of natural polymorphisms and 7 molecular studies. Among findings on for function, for activity levels influence adult olfactory trap response to a food medium attractant. Among findings on for polymorphisms, the frequencies of for^R and for^s strains in three natural populations are studied to determine the contribution of the local parasitoid community to the differences in for^R and for^s frequencies.
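The Functional Phrases templates shown earlier (<gene> encodes <chemical>, <chemical> affects/causes <behavior>, <gene> regulates <behavior>) can be illustrated with a small pattern matcher. This is only a minimal sketch, not the BeeSpace parser: the tiny lexicons, the verb lists, and the names `template` and `extract` are assumptions made up for the example, and a real system would rely on entity recognition rather than fixed word lists.

```python
import re
from typing import List, Tuple

# Hypothetical mini-lexicons, for illustration only. The gene symbol "for" is
# deliberately left out: it collides with the English preposition, which is
# exactly the kind of ambiguity real entity recognition has to resolve.
GENES = ["Amfor", "dg2", "malvolio", "mvl"]
CHEMICALS = ["manganese transporter", "PKG"]
BEHAVIORS = ["age at onset of foraging", "food-search behavior", "foraging behavior"]


def alternation(terms: List[str]) -> str:
    """Build a regex alternation, longest term first so longer names win."""
    ordered = sorted(terms, key=len, reverse=True)
    return "|".join(re.escape(t) for t in ordered)


def template(left: List[str], verbs: str, right: List[str]) -> re.Pattern:
    """Compile '<left> ... verb ... <right>', confined to one sentence ([^.])."""
    return re.compile(
        rf"\b({alternation(left)})\b[^.]*?\b(?:{verbs})\b[^.]*?\b({alternation(right)})\b"
    )


# One compiled pattern per functional-phrase template from the slide.
TEMPLATES = [
    ("encodes", template(GENES, "encodes", CHEMICALS)),
    ("affects/causes", template(CHEMICALS, "affects|affected|causes|caused", BEHAVIORS)),
    ("regulates", template(GENES, "regulates|regulating|regulation", BEHAVIORS)),
]


def extract(sentence: str) -> List[Tuple[str, str, str]]:
    """Return (subject, relation, object) triples found in the sentence."""
    triples = []
    for relation, pattern in TEMPLATES:
        for match in pattern.finditer(sentence):
            triples.append((match.group(1), relation, match.group(2)))
    return triples


if __name__ == "__main__":
    for text in [
        "The dg2 gene encodes a cyclic guanosine monophosphate (cGMP)-dependent protein kinase (PKG).",
        "Thus, PKG levels affected food-search behavior.",
        "Amfor, an ortholog of the Drosophila for gene, is involved in the regulation of age at onset of foraging in honey bees.",
    ]:
        print(extract(text))
```

With these lexicons, the malvolio sentence from the slide yields two triples, one for "encodes" and one for "regulates". Chaining such triples (gene to chemical, chemical to behavior) is one plausible way to suggest candidate gene-to-behavior links for follow-up in a genome database.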
BeeSpace Information Sources
Biomedical Literature
-Medline (medicine)
-Biosis (biology)
-Agricola, CAB Abstracts, Agris (agriculture)
Model Organisms (heredity)
-Gene Descriptions (FlyBase, WormBase)
Natural Histories (environment)
-BeeKeeping Books (Cornell, Harvard)

Medical Concept Spaces (1998)
Medical Literature (Medline, 10M abstracts)
  Partition with Medical Subject Headings (MeSH)
  Community is all abstracts classified by core term
  40M abstracts containing 280M concepts
  Computation is 2 days on NCSA Origin 2000
Simulating World of Medical Communities
  10K repositories with > 1K abstracts (1K with > 10K)

Biological Concept Spaces (2006)
Compute concept spaces for All of Biology
  BioSpace across entire biomedical literature
  50M abstracts across 50K repositories
Use Gene Ontology to partition literature into biological communities for functional analysis
  GO same scale as MeSH but adequate coverage?
  GO light on social behavior (biological process)

Concept Switching
In the Interspace…
  each Community maintains its own repository
Switching is navigating across repositories
  use your specialty vocabulary to search another specialty

CONCEPT SWITCHING
“Concept” versus “Term”
  set of “semantically” equivalent terms
Concept switching
  region to region (set to set) match
[Diagram labels: Semantic region; term; Concept Space; Concept Space]

Biomedical Session

Categories and Concepts

Concept Switching

Document Retrieval

Interactive Functional Analysis
BeeSpace will enable users to navigate a uniform space of diverse databases and literature sources for hypothesis development and testing, with a software system beyond a searchable database, using literature analyses to discover functional relationships between genes and behavior.
  Genes to Behaviors
  Behaviors to Genes
  Concepts to Concepts
  Clusters to Clusters
  Navigation across Sources

BeeSpace Information Sources
General for All Spaces:
Scientific Literature
-Medline, Biosis, Agricola, Agris, CAB Abstracts
-partitioned by organisms and by functions
Model Organisms
-Gene Descriptions (FlyBase, WormBase, MGI, OMIM, SCD, TAIR)
Special Sources for BeeSpace:
-Natural History Books (Cornell Library, Harvard Press)

XSpace Information Sources
Organize Genome Databases (XBase)
Compute Gene Descriptions from Model Organisms
Partition Scientific Literature for Organism X
Compute XSpace using Semantic Indexing
Boost the Functional Analysis from Special Sources
  Collecting Useful Data about Natural Histories
  e.g. CowSpace Leverage in AIPL Databases

Towards the Interspace
The Analysis Environment technology is GENERAL!
BirdSpace? BeeSpace? PigSpace? CowSpace?
BehaviorSpace? BrainSpace?
BioSpace … Interspace

Prototype System
Overall Architecture and Interface -- Todd Littell
Language Parsing and Entity Recognition -- Jing Jiang
Normalization and Theme Clustering -- Qiaozhu Mei
Concept Navigation and Switching -- Azadeh Shakery
Gene Summarization and Linking -- Xu Ling
Collection Development and Navigation -- Xin He
Specialty Systems
  Question Answering -- Eugene Grois
  Annotation Pipeline -- Pouya Kheradpour
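
Concept switching, as described in the Concept Switching slides (each community keeps its own repository, a concept is a set of related terms, and switching matches a region in one space to a region in another), can be sketched in a few lines. The sketch below assumes a plain term co-occurrence count stands in for a concept space. That is a simplification chosen for illustration, not the Interspace implementation, and the function names and the two toy collections are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set

ConceptSpace = Dict[str, Counter]


def build_concept_space(docs: Iterable[List[str]]) -> ConceptSpace:
    """Count, for each term, how often every other term shares a document with it."""
    space: ConceptSpace = defaultdict(Counter)
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            space[a][b] += 1
            space[b][a] += 1
    return space


def region(space: ConceptSpace, term: str, k: int = 2) -> Set[str]:
    """The term plus its k strongest co-occurring neighbours (ties broken by name)."""
    neighbours = space.get(term, Counter())
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term} | {t for t, _ in ranked[:k]}


def switch(source: ConceptSpace, target: ConceptSpace, term: str, k: int = 2) -> List[str]:
    """Map a source-space region onto the target space (set-to-set match).

    Terms of the source region that also exist in the target space act as
    bridges; target terms are ranked by co-occurrence with those bridges.
    """
    src_region = region(source, term, k)
    scores: Counter = Counter()
    for bridge in src_region:
        if bridge in target:
            for other, weight in target[bridge].items():
                if other not in src_region:
                    scores[other] += weight
    return sorted(scores, key=lambda t: (-scores[t], t))


# Toy collections (hypothetical): each document is reduced to a list of terms.
bee_behavior = build_concept_space([
    ["foraging", "division of labour", "PKG"],
    ["foraging", "PKG", "nurse"],
])
fly_molecular = build_concept_space([
    ["PKG", "cGMP", "for gene"],
    ["PKG", "for gene"],
    ["cGMP", "kinase"],
])

if __name__ == "__main__":
    print(switch(bee_behavior, fly_molecular, "foraging"))
```

Here a behavioral biologist's term ("foraging") is carried into the molecular collection through the shared term "PKG", which surfaces "for gene" and "cGMP" as related terms there. Exact-string bridging is the weakest part of the sketch: the slides' notion of "semantically" equivalent terms implies some normalization step that is not shown.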