Navigating the Neuroscience Data Landscape
Maryann Martone, Ph.D.
University of California, San Diego

“Neural Choreography”
“A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling “neural choreography” -- the integrated functioning of neurons into brain circuits--their spatial organization, local and long-distance connections, their temporal orchestration, and their dynamic features. Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze and mine information from each level of analysis, and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior.... However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.”
Akil et al., Science, Feb 11, 2011

NIF is an initiative of the NIH Blueprint consortium of institutes

What types of resources (data, tools, materials, services) are available to the neuroscience community?
• How many are there?
• What domains do they cover? What domains do they not cover?
• Where are they?
  • Web sites
  • Databases
  • Literature
  • PDF files
  • Desk drawers
  • Supplementary material
• Who uses them?
• Who creates them?
• How can we find them?
• How can we make them better in the future?
http://neuinfo.org

How many resources are there?
• NIF Registry: A catalog of neuroscience-relevant resources
• > 4800 currently listed
• > 2000 databases
• And we are finding more every day

The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience
UCSD, Yale, Cal Tech, George Mason, Washington Univ
• A portal for finding and using neuroscience resources
• A consistent framework for describing resources
• Provides simultaneous search of multiple types of information, organized by category
• Supported by an expansive ontology for neuroscience
• Utilizes advanced technologies to search the “hidden web”
[Figure labels: Literature, Database Federation, Registry]
Supported by NIH Blueprint
http://neuinfo.org

What are the connections of the hippocampus?
Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
• Data sources categorized by “data type” and level of nervous system
• Link back to record in original source
• Common views across multiple sources
• Query expansion: Synonyms and related concepts
• Boolean queries
• Tutorials for using full resource when getting there from NIF

Results are organized within a common framework
[Figure labels: Target site, Synapsed by, innervates, Connects to, Input region, Synapsed with, Cellular contact, Projects to, Axon innervates, Subcellular contact, Source site]
Each resource implements a different, though related, model; in many cases the systems are complex and difficult to learn

The scourge of neuroanatomical nomenclature
• NIF Connectivity: 6 databases containing connectivity primary data or claims
  • Brain Architecture Management System (rodent)
  • Connectome Wiki (human)
  • Brain Maps (various)
  • CoCoMac (primate cortex)
  • UCLA Multimodal database (Human fMRI)
  • Avian Brain Connectivity Database (Bird)
• Total: 1800 unique brain terms (excluding Avian)
• Number of exact terms used in > 1 database: 42
• Number of synonym matches: 99
• Number of partonomy matches: 385
The INCF is working with NIF to develop semantic and spatial strategies for translating anatomy across information systems

What is an ontology?
Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain that expresses human knowledge in a machine-readable form
• Branch of philosophy: a theory of what is
• e.g., Gene ontologies
• Provide universals for navigating across different data sources
• Semantic “index”
• Provide the basis for concept-based queries to probe and mine data
• Perform reasoning
• Link data through relationships, not just one-to-one mappings
[Figure: Brain has a Cerebellum, which has a Purkinje Cell Layer, which has a Purkinje cell, which is a neuron]

PONS program
Structural Lexicon Taskforce
• Concentrate on Human, Non-human Primate, Rat and Mouse
• Define structural concepts from level of organ to macromolecular complexes
• Provide a set of criteria by which structures can be identified
Neuronal Registry Taskforce
• Establish conventions for naming new types of neurons
• Establish a standard set of properties to define neurons
• Create a Neuron Registry for registering new types of neurons
Deployment and representation (Alan Ruttenberg)
• Brought together ontologists working across scales
Courtesy of Chris Mungall, Lawrence Berkeley Labs
***Not about imposing a single view of anatomy; about making concepts computable and being able to translate among views

NeuroLex Wiki
• Provide a simple framework for defining the concepts required
  • Cell, Part of brain, subcellular structure, molecule
• Community based:
  • Avian neuroanatomy
  • Fly neurons (England)
  • Neuroimaging terms
  • Brain regions identified by text mining
• Creating a computable index for neuroscience data
• INCF working to coordinate Wiki efforts underway at Allen Institute, Blue Brain and Neurolex
Demo D03
http://neurolex.org
Stephen Larson

Comparison of traffic to NIF Portal vs Neurolex
Wiki is readily indexed by search engines
[Figure labels: 5000 hits, 15000 hits]

Neurons in Neurolex
• INCF building a knowledge base of neurons and their properties via the Neurolex Wiki
• Led by Dr. Gordon Shepherd
• Consistent and parseable naming scheme
• Knowledge is readily accessible, editable and computable
Stephen Larson

NIF data federation
Percentage of data records per data type
[Figure labels: Brain activation foci, Animals, Images, Pathways, Drugs, connectivity, Antibodies, Microarray, 98%, Grants]
Recently added: BioNOT literature mining tool; Retraction Watch blog
Primary data, secondary data, claims, repositories

What do you mean by data?
Databases come in many shapes and sizes
• Primary data: Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
• Secondary data: Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
• Tertiary data: Claims and assertions about the meaning of data, e.g., gene upregulation/downregulation, brain activation as a function
• Registries: Metadata; pointers to data sets or materials stored elsewhere
• Data aggregators: Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
• Single source: Data acquired within a single context, e.g., Allen Brain Atlas

NIF landscape analysis
[Figure labels: Data source; Brain region: Brain, Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex]
Vadim Astakhov, Kepler Workflow Engine

How much of the landscape do we have?
Query for “reference” brain structures and their parts in NIF Connectivity database

Gender bias
NIF can start to answer interesting questions about neuroscience research, not just about neuroscience
NIF Reports: Male vs Female

Embracing duplication: Data Mash ups
• ~300 PMIDs were common between Brede and SUMSdb
• Same information; value added

Same data; different aspects
Same data: different analysis
• Drug Related Gene database (DRG): extracted statements from figures, tables and supplementary data from published article
• Gemma: Reanalyzed microarray results from GEO using different algorithms
• Both provide results of increased or decreased expression as a function of experimental paradigm
  • 4 strains of mice
  • 3 conditions: chronic morphine, acute morphine, saline
http://www.chibi.ubc.ca/Gemma/home.html

Chronic vs acute morphine in striatum
Mined NIF for all references to GEO IDs: found a small number where the same dataset was represented in two or more databases

How easy was it to compare?
• Identifiers
  • Gemma: Gene ID + Gene Symbol
  • DRG: Gene name + Probe ID
• NIF annotation standard
  • Gemma: Increased expression/decreased expression
  • DRG: Increased expression/decreased expression
• But... Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases

Analysis:
• 1370 statements from Gemma regarding gene expression as a function of chronic morphine
• 617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis
• Results for 1 gene were opposite in DRG and Gemma
• 45 did not have enough information provided in the paper to make a judgment

Grabbing the long tail of small data
• Analysis of NIF shows multiple databases with similar scope and content
• Many contain partially overlapping data
• Data “flows” from one resource to the next
• Data is reinterpreted, reanalyzed or added to
• When does it become something else?
Is duplication good or bad?
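
The Gemma versus DRG comparison above hinges on one step: before two sources can be checked against each other, their claims have to be expressed against the same baseline. The following is a minimal sketch of that step, not the procedure NIF actually used. It assumes each claim is a simple two-condition contrast (chronic morphine versus saline), so that switching the baseline reverses the direction of change, and it assumes the gene identifiers of both sources have already been mapped to a shared key. The `Claim` class, the `rebase` and `compare` helpers, and the gene names are all hypothetical placeholders.

```python
from dataclasses import dataclass

# Swapping the baseline of a two-condition contrast reverses the direction.
FLIP = {"increased": "decreased", "decreased": "increased"}


@dataclass(frozen=True)
class Claim:
    gene: str       # shared gene key (assumed already mapped across sources)
    direction: str  # "increased" or "decreased"
    baseline: str   # condition the change is measured against


def rebase(claim: Claim, baseline: str) -> Claim:
    """Re-express a two-condition claim relative to `baseline`."""
    if claim.baseline == baseline:
        return claim
    return Claim(claim.gene, FLIP[claim.direction], baseline)


def compare(source_a, source_b, baseline):
    """Sort the genes of source_a into consistent / opposite / unmatched."""
    b_by_gene = {c.gene: rebase(c, baseline) for c in source_b}
    result = {"consistent": [], "opposite": [], "unmatched": []}
    for claim in source_a:
        a = rebase(claim, baseline)
        b = b_by_gene.get(a.gene)
        if b is None:
            result["unmatched"].append(a.gene)
        elif a.direction == b.direction:
            result["consistent"].append(a.gene)
        else:
            result["opposite"].append(a.gene)
    return result


# Made-up placeholder claims, not records from Gemma or DRG.
gemma = [
    Claim("GeneA", "increased", "chronic morphine"),
    Claim("GeneB", "decreased", "chronic morphine"),
    Claim("GeneC", "increased", "chronic morphine"),
]
drg = [
    Claim("GeneA", "decreased", "saline"),
    Claim("GeneB", "decreased", "saline"),
]

report = compare(gemma, drg, baseline="saline")
print(report)
```

Without the `rebase` step, GeneA would be counted as a contradiction and GeneB as an agreement, which is the reverse of what the two sources actually say once both are read relative to saline.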
Phases of NIF
• 2006-2008: A survey of what was out there
• 2008-2009: Strategy for resource discovery
  • NIF Registry vs NIF data federation
  • Ingestion of data contained within different technology platforms, e.g., XML vs relational vs RDF
  • Effective search across semantically diverse sources
  • NIFSTD ontologies
• 2009-2011: Strategy for data integration
  • Unified views across common sources
  • Mapping of content to NIF vocabularies
• 2011-present: Data analytics
  • Uniform external data references

Data, not just stories about them!
47/50 major preclinical published cancer studies could not be replicated
“The scientific community assumes that the claims in a preclinical study can be taken at face value-that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”
Begley and Ellis, 29 MARCH 2012 | VOL 483 | NATURE | 531
“There are no guidelines that require all data sets to be reported in a paper; often, original data are removed during the peer review and publication process.”
Getting data out sooner in a form where they can be exposed to many eyes and many analyses, and easily compared, may allow us to expose errors and develop better metrics to evaluate the validity of data

A global view of data
• You (and the machine) have to be able to find it
  • Accessible through the web
  • Annotations
• You have to be able to use it
  • Data type specified and in a usable form
• You have to know what the data mean
  • Some semantics
  • Context: Experimental metadata
  • Provenance: Where did the data come from?
Reporting neuroscience data within a consistent framework helps enormously

NIF team (past and present)
Jeff Grethe, UCSD, Co Investigator, Interim PI
Amarnath Gupta, UCSD, Co Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Fahim Imam, NIF Ontology Engineer
Larry Lui
Andrea Arnaud Stagg
Jonathan Cachat
Jennifer Lawrence
Lee Hornbrook
Binh Ngo
Vadim Astakhov
Xufei Qian
Chris Condit
Mark Ellisman
Stephen Larson
Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Karen Skinner, NIH, Program Officer

Concept-based search: search by meaning
• Search Google: GABAergic neuron
• Search NIF: GABAergic neuron
• NIF automatically searches for types of GABAergic neurons
[Figure label: Types of GABAergic neurons]
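
Concept-based search of this kind, together with the synonym expansion shown earlier for the hippocampus, can be sketched as a walk over a small ontology. The example below is only an illustration under stated assumptions: the `SYNONYMS` and `SUBTYPES` tables are hand-made toy data, not the contents of NIFSTD, and `expand` and `to_boolean_query` are hypothetical helpers rather than part of any NIF API. The idea is that a search term is replaced by itself, its synonyms, and every more specific term beneath it, joined into one Boolean OR query.

```python
from collections import deque

# Toy ontology fragments for illustration only (not NIFSTD content).
SYNONYMS = {
    "Hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}
SUBTYPES = {  # parent term -> more specific terms ("is a" children)
    "GABAergic neuron": ["Purkinje cell", "medium spiny neuron"],
}


def expand(term):
    """Return the term, its synonyms and all of its subtypes, without repeats."""
    seen = []
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        queue.extend(SYNONYMS.get(current, []))
        queue.extend(SUBTYPES.get(current, []))
    return seen


def to_boolean_query(term):
    """Join the expanded terms with OR, quoting multi-word terms."""
    parts = [f'"{t}"' if " " in t else t for t in expand(term)]
    return " OR ".join(parts)


print(to_boolean_query("Hippocampus"))
print(to_boolean_query("GABAergic neuron"))
```

A plain keyword search would match only the literal string "GABAergic neuron"; the expanded query also retrieves records that mention a specific cell type without ever using the word GABAergic.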