Navigating the Neuroscience Data
Landscape
Maryann Martone, Ph.D.
University of California, San Diego
“Neural Choreography”
“A grand challenge in neuroscience is to elucidate brain function in relation
to its multiple layers of organization that operate at different spatial
and temporal scales. Central to this effort is tackling “neural
choreography” -- the integrated functioning of neurons into brain
circuits--their spatial organization, local and long-distance connections,
their temporal orchestration, and their dynamic features. Neural
choreography cannot be understood via a purely reductionist approach.
Rather, it entails the convergent use of analytical and synthetic tools to
gather, analyze and mine information from each level of analysis, and
capture the emergence of new layers of function (or dysfunction) as we
move from studying genes and proteins, to cells, circuits, thought, and
behavior....
However, the neuroscience community is not yet fully engaged in exploiting
the rich array of data currently available, nor is it adequately poised to
capitalize on the forthcoming data explosion.”
Akil et al., Science, Feb 11, 2011
• NIF is an initiative of the NIH Blueprint consortium of institutes
• What types of resources (data, tools, materials, services) are available to the neuroscience community?
• How many are there?
• What domains do they cover? What domains do they not cover?
• Where are they?
  • Web sites
  • Databases
  • Literature
  • PDF files
  • Desk drawers
  • Supplementary material
• Who uses them?
• Who creates them?
• How can we find them?
• How can we make them better in the future?
http://neuinfo.org
How many resources are
there?
•NIF Registry: A
catalog of
neuroscience-relevant
resources
•> 4800 currently
listed
•> 2000 databases
•And we are finding
more every day
The Neuroscience Information Framework: Discovery and
utilization of web-based resources for neuroscience
Literature
UCSD, Yale, Cal Tech, George
Mason, Washington Univ
Database
Federation

A portal for finding and
using neuroscience
resources

A consistent framework for
describing resources

Provides simultaneous
search of multiple types of
information, organized by
category

Supported by an expansive
ontology for neuroscience
Utilizes advanced
technologies to search the
“hidden web”

Registry
Supported by NIH Blueprint
http://neuinfo.org
What are the connections of the
hippocampus?
Hippocampus OR “Cornu Ammonis” OR
“Ammon’s horn”
Data sources
categorized by
“data type” and
level of nervous
system
Link back to
record in
original
source
Common views
across multiple
sources
Query expansion: Synonyms
and related concepts
Boolean queries
Tutorials for using
full resource when
getting there from
NIF
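The query expansion shown for the hippocampus search can be sketched as follows. This is a minimal illustration, not NIF's actual implementation: the `SYNONYMS` table and the function names are assumptions, and a real system would draw synonyms from an ontology rather than a hard-coded dictionary.

```python
# Hypothetical synonym table; a real system would look these up in an ontology.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}

def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term

def expand_query(term):
    """Return a Boolean OR query covering the term and its known synonyms."""
    variants = [term] + SYNONYMS.get(term.lower(), [])
    return " OR ".join(quote(v) for v in variants)

print(expand_query("Hippocampus"))
# Hippocampus OR "Cornu Ammonis" OR "Ammon's horn"
```

A term with no entry in the table is returned unchanged, so the expansion never narrows a search.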
Results are organized within a common
framework
Target site
Synapsed by
innervates
Connects to
Input region
Synapsed with
Cellular contact
Projects to
Axon innervates
Subcellular contact
Source site
Each resource implements a different, though related model;
systems are complex and difficult to learn, in many cases
The scourge of neuroanatomical
nomenclature
•NIF Connectivity: 6 databases containing connectivity primary data or claims
•Brain Architecture Management System (rodent)
•Connectome Wiki (human)
•Brain Maps (various)
•CoCoMac (primate cortex)
•UCLA Multimodal database (Human fMRI)
•Avian Brain Connectivity Database (Bird)
•Total: 1800 unique brain terms (excluding Avian)
•Number of exact terms used in > 1 database: 42
•Number of synonym matches: 99
•Number of partonomy matches: 385
The INCF is working with NIF to develop semantic and spatial strategies for translating
anatomy across information systems
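The exact-match and synonym-match counts above come from comparing term lists across databases. A rough sketch of that kind of comparison is shown below; the two term lists, the synonym group, and the function names are made up for illustration and are not the data or method behind the slide's numbers.

```python
def normalize(term):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())

def match_terms(terms_a, terms_b, synonym_groups):
    """Return (exact matches, synonym matches) between two term lists."""
    norm_a = {normalize(t) for t in terms_a}
    norm_b = {normalize(t) for t in terms_b}
    exact = norm_a & norm_b

    # Map every term in a synonym group to one shared representative.
    canonical = {}
    for group in synonym_groups:
        members = {normalize(t) for t in group}
        representative = min(members)
        for member in members:
            canonical[member] = representative

    synonyms = set()
    for a in norm_a - exact:
        for b in norm_b - exact:
            if a in canonical and canonical[a] == canonical.get(b):
                synonyms.add((a, b))
    return exact, synonyms

# Hypothetical term lists from two databases.
db_a = ["Hippocampus", "Cornu Ammonis", "Striatum"]
db_b = ["hippocampus", "Ammon's horn", "Cerebral cortex"]
groups = [["Hippocampus", "Cornu Ammonis", "Ammon's horn"]]

exact, synonyms = match_terms(db_a, db_b, groups)
print(len(exact), len(synonyms))  # 1 1
```

Partonomy matches would need an additional part-of hierarchy, which this sketch leaves out.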
What is an ontology?
• Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain, expressing human knowledge in a machine-readable form
  • Branch of philosophy: a theory of what is
  • e.g., Gene ontologies
• Provide universals for navigating across different data sources
  • Semantic “index”
• Provide the basis for concept-based queries to probe and mine data
  • Perform reasoning
  • Link data through relationships, not just one-to-one mappings
Example hierarchy (from the slide diagram):
Brain has a Cerebellum
Cerebellum has a Purkinje Cell Layer
Purkinje Cell Layer has a Purkinje cell
Purkinje cell is a neuron
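The Brain / Cerebellum / Purkinje Cell Layer / Purkinje cell / neuron example can be written as relationship triples, which is enough to show simple reasoning over an ontology. The triple representation and the `reachable` helper are assumptions made for this sketch; real ontologies are usually expressed in a language such as OWL and queried with a reasoner.

```python
# Relationships from the slide diagram, as (subject, relation, object) triples.
TRIPLES = [
    ("Brain", "has_a", "Cerebellum"),
    ("Cerebellum", "has_a", "Purkinje Cell Layer"),
    ("Purkinje Cell Layer", "has_a", "Purkinje cell"),
    ("Purkinje cell", "is_a", "neuron"),
]

def reachable(start, relation, triples):
    """Everything reachable from `start` by following `relation` transitively."""
    found, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for subject, rel, obj in triples:
            if subject == node and rel == relation and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found

# Which parts of the brain are neurons? None of the triples says so directly;
# the answer follows from chaining has_a and is_a relationships.
parts = reachable("Brain", "has_a", TRIPLES)
neurons_in_brain = {p for p in parts if "neuron" in reachable(p, "is_a", TRIPLES)}
print(neurons_in_brain)  # {'Purkinje cell'}
```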
PONS program
• Structural Lexicon Taskforce
  • Concentrate on Human, Non-human Primate, Rat and Mouse
  • Define structural concepts from the level of organ to macromolecular complexes
  • Provide a set of criteria by which structures can be identified
• Neuronal Registry Taskforce
  • Establish conventions for naming new types of neurons
  • Establish a standard set of properties to define neurons
  • Create a Neuron Registry for registering new types of neurons
• Deployment and representation (Alan Ruttenberg)
  • Brought together ontologists working across scales
Courtesy of Chris Mungall, Lawrence
Berkeley Labs
***Not about imposing a
single view of anatomy;
about making concepts
computable and being
able to translate among
views
NeuroLex Wiki
•Provide a simple framework
for defining the concepts
required
•Cell, Part of brain,
subcellular structure,
molecule
•Community based:
•Avian neuroanatomy
•Fly neurons (England)
•Neuroimaging terms
•Brain regions identified
by text mining
•Creating a computable
index for neuroscience data
•INCF working to coordinate
Wiki efforts underway at
Allen Institute, Blue Brain
and Neurolex
Demo D03
http://neurolex.org
Stephen Larson
Comparison of traffic to NIF Portal vs
Neurolex
Wiki is readily indexed by search engines
5000 hits
15000 hits
Neurons in Neurolex
• INCF building a knowledge base of neurons and their properties via the Neurolex Wiki
• Led by Dr. Gordon Shepherd
• Consistent and parseable naming scheme
• Knowledge is readily accessible, editable and computable
Stephen Larson
NIF data federation
Percentage of data records per data type (chart labels: Brain activation foci, Animals, Images, Pathways, Drugs, Connectivity, Antibodies, Microarray; one value shown: 98%)
Recently added: BioNOT literature
mining tool; Retraction Watch blog
Grants
Primary data, secondary data, claims,
repositories
What do you mean by data?
Databases come in many shapes and sizes
• Primary data:
  • Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
• Secondary data
  • Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
• Tertiary data
  • Claims and assertions about the meaning of data
    • E.g., gene upregulation/downregulation, brain activation as a function
• Registries:
  • Metadata
  • Pointers to data sets or materials stored elsewhere
• Data aggregators
  • Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
• Single source
  • Data acquired within a single context, e.g., Allen Brain Atlas
NIF landscape analysis
Data source
Brain region
Brain
Striatum
Hypothalamus
Olfactory bulb
Cerebral cortex
Vadim Astakhov, Kepler Workflow Engine
How much of the landscape do we have?
Query for “reference” brain structures and their parts in NIF Connectivity database
Gender bias
NIF can start to
answer interesting
questions about
neuroscience
research, not just
about neuroscience
NIF Reports:
Male vs Female
Embracing duplication: Data Mash ups
•~300 PMIDs were common between Brede and SUMSdb
•Same information; value added
Same data; different aspects
Same data: different analysis
• Drug Related Gene database: extracted statements from figures, tables and supplementary data from published articles
• Gemma: Reanalyzed microarray results from GEO using different algorithms
• Both provide results of increased or decreased expression as a function of experimental paradigm
  • 4 strains of mice
  • 3 conditions: chronic morphine, acute morphine, saline
http://www.chibi.ubc.ca/Gemma/home.html
Chronic vs acute
morphine in striatum
Mined NIF for all references to GEO
IDs: found a small number where the
same dataset was represented in two
or more databases
How easy was it to compare?
• Gemma: Gene ID + Gene Symbol
• DRG: Gene name + Probe ID
• NIF annotation standard:
  • Gemma: Increased expression/decreased expression
  • DRG: Increased expression/decreased expression
• But... Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases
• Analysis:
  • 1370 statements from Gemma regarding gene expression as a function of chronic morphine
  • 617 were consistent with DRG → over half of the claims of the paper were not confirmed in this analysis
  • Results for 1 gene were opposite in DRG and Gemma
  • 45 did not have enough information provided in the paper to make a judgment
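Because Gemma reports changes relative to chronic morphine and DRG relative to saline, the directions must be put on a common baseline before they can be compared. The sketch below shows one way to do that for a two-condition comparison; the gene names and statements are placeholders, not records from either database.

```python
# Reversing the baseline of a two-condition comparison flips the direction.
FLIP = {"increased": "decreased", "decreased": "increased"}

def relative_to(direction, baseline, reference="saline"):
    """Re-express a direction of change relative to the reference baseline."""
    return direction if baseline == reference else FLIP[direction]

# Hypothetical statements: gene -> (direction, baseline used by the database).
gemma = {"geneA": ("increased", "chronic morphine"),
         "geneB": ("decreased", "chronic morphine")}
drg = {"geneA": ("decreased", "saline"),
       "geneB": ("decreased", "saline")}

def compare(a, b):
    """Label each shared gene as consistent or opposite after normalization."""
    verdicts = {}
    for gene in a.keys() & b.keys():
        same = relative_to(*a[gene]) == relative_to(*b[gene])
        verdicts[gene] = "consistent" if same else "opposite"
    return verdicts

verdicts = compare(gemma, drg)
print(sorted(verdicts.items()))
# [('geneA', 'consistent'), ('geneB', 'opposite')]

# Share of Gemma statements consistent with DRG, using the counts on the slide.
consistent_fraction = 617 / 1370
print(f"{consistent_fraction:.0%}")  # 45%
```

The 45% figure is what lies behind the statement that over half of the claims were not confirmed.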
Grabbing the long tail of small data
• Analysis of NIF shows multiple databases with similar scope and content
• Many contain partially overlapping data
• Data “flows” from one resource to the next
• Data is reinterpreted, reanalyzed or added to
• When does it become something else?
• Is duplication good or bad?
Phases of NIF
• 2006-2008: A survey of what was out there
• 2008-2009: Strategy for resource discovery
  • NIF Registry vs NIF data federation
  • Ingestion of data contained within different technology platforms, e.g., XML vs relational vs RDF
  • Effective search across semantically diverse sources
    • NIFSTD ontologies
• 2009-2011: Strategy for data integration
  • Unified views across common sources
  • Mapping of content to NIF vocabularies
• 2011-present: Data analytics
  • Uniform external data references
Data, not just stories about them!
47/50 major preclinical published cancer studies could not be replicated
• “The scientific community assumes that the claims in a preclinical study can be taken at face value - that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”
Begley and Ellis, Nature, vol. 483, p. 531, 29 March 2012
• “There are no guidelines that require all data sets to be reported in a paper; often, original data are removed during the peer review and publication process.”
• Getting data out sooner in a form where they can be exposed to many eyes and many analyses, and easily compared, may allow us to expose errors and develop better metrics to evaluate the validity of data
A global view of data
• You (and the machine) have to be able to find it
  • Accessible through the web
  • Annotations
• You have to be able to use it
  • Data type specified and in a usable form
• You have to know what the data mean
  • Some semantics
  • Context: Experimental metadata
  • Provenance: Where did the data come from?
Reporting neuroscience data within a consistent framework helps enormously
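One way to picture these requirements is as a minimal record that a data set would have to fill in before it counts as findable, usable and interpretable. The field names below are assumptions chosen to mirror the list above; they are not a NIF schema.

```python
from dataclasses import dataclass, field

@dataclass
class DataRecord:
    """Hypothetical minimal description of a shared data set."""
    url: str = ""                                   # accessible through the web
    annotations: list = field(default_factory=list)  # terms that make it findable
    data_type: str = ""                             # specified, in a usable form
    experimental_metadata: dict = field(default_factory=dict)  # context
    provenance: str = ""                            # where the data came from

def missing_fields(record):
    """List the fields that are still empty, in declaration order."""
    return [name for name, value in vars(record).items() if not value]

record = DataRecord(url="http://example.org/dataset/1", data_type="image")
print(missing_fields(record))
# ['annotations', 'experimental_metadata', 'provenance']
```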
NIF team (past and present)
Jeff Grethe, UCSD, Co Investigator, Interim PI
Amarnath Gupta, UCSD, Co Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Fahim Imam, NIF Ontology Engineer
Larry Lui
Andrea Arnaud Stagg
Jonathan Cachat
Jennifer Lawrence
Lee Hornbrook
Binh Ngo
Vadim Astakhov
Xufei Qian
Chris Condit
Mark Ellisman
Stephen Larson
Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Karen Skinner, NIH, Program Officer
Concept-based search: search by meaning
• Search Google: GABAergic neuron
• Search NIF: GABAergic neuron
  • NIF automatically searches for types of GABAergic neurons
Types of GABAergic neurons
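The difference between keyword search and concept-based search can be sketched with a small subclass table. The table, the sample records and the function names are illustrative assumptions; NIF's own vocabulary and search engine are not shown here.

```python
# Hypothetical subclass table: concept -> more specific types.
SUBCLASSES = {
    "GABAergic neuron": ["Purkinje cell", "Medium spiny neuron", "Basket cell"],
    "Basket cell": ["Cerebellar basket cell"],
}

def descendants(concept, table):
    """All subtypes of a concept, following the table recursively."""
    result = []
    for child in table.get(concept, []):
        result.append(child)
        result.extend(descendants(child, table))
    return result

def keyword_search(query, records):
    """Return ids of records whose text contains the query string itself."""
    return [r["id"] for r in records if query.lower() in r["text"].lower()]

def concept_search(query, records, table):
    """Return ids of records mentioning the concept or any of its subtypes."""
    terms = [query] + descendants(query, table)
    return [r["id"] for r in records
            if any(t.lower() in r["text"].lower() for t in terms)]

# Hypothetical records from a federated data source.
records = [
    {"id": 1, "text": "Recordings from Purkinje cell dendrites"},
    {"id": 2, "text": "Markers of the GABAergic neuron population"},
    {"id": 3, "text": "Pyramidal cell morphology"},
]

print(keyword_search("GABAergic neuron", records))              # [2]
print(concept_search("GABAergic neuron", records, SUBCLASSES))  # [1, 2]
```

The keyword search misses record 1 because its text never uses the phrase "GABAergic neuron", while the concept search finds it through the subtype "Purkinje cell".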