Neuroinformatics Providing the Tools for a
Systems Biology Approach to Neuroscience
Sogol Azarmgin, M.A.Sc. Candidate, University of Toronto, IBBME

Abstract—The new and emerging field of systems biology
enables researchers and scientists to enhance their understanding
of biological systems by examining the overall structure and
dynamics of a system. Applying systems biology to neuroscience
for better understanding of brain functions and mechanisms
requires integration of information from the level of gene to the
level of behavior. The enormous amount of information at each
of the many levels involved, along with the complexity of these
data due to their diverse and distributed nature, necessitates
the development of new databases and their associated
informatics tools. This document presents an overview of the
challenges and the issues facing the development, implementation,
and maintenance of new neuroinformatics tools.
Index Terms—neuroinformatics, systems biology, databases,
data heterogeneity, integrative neuroscience.
I. INTRODUCTION
Systems biology, a new and emerging field, aims to explore
the behavior and relationships of all of the elements in a
particular biological system while it is functioning [1]. This
approach would inherently require collective efforts from
multiple research areas such as genetics and molecular
biology, high precision measurement, computer science and
other scientific and engineering fields [2]. As a result, data
sharing and development of data repositories with their
associated tools are of crucial importance to make the best use
of the massive data sets available.
Prior to the prevalence of electronic data sharing, the traditional
mode of experimental science involved little data sharing
where each laboratory generated its data to solve a problem,
published a fraction of it, and moved on. The published data
remained in the hard copy volume or reprint, where it was
inaccessible by electronic means; the unpublished data were
lost forever. This classical approach was altered during the
1980’s by the gene-sequencing community. Genetic and
protein sequences were submitted, in parallel with journal
publications, to central databases where they could be archived
and made accessible to others for further analyses such as
determining homologies across different species [3]. Such
databases and their associated tools have formed the
foundations for the development of bioinformatics, a key
discipline, as essential to molecular biology as are its
experimental methods.
In direct analogy to the indispensable role of bioinformatics
to molecular biology, neuroinformatics is about to gain
considerable importance in helping to overcome the daunting
challenge of understanding the human brain. To ensure the
greatest possible benefit from the pool of data collected over
the past decades, the neuroscientific community has started
moving in a similar direction and has begun its efforts (e.g.
the Human Brain Project) to develop new information
management systems in the form of interoperable databases
with associated data management tools. These tools would
include algorithms and software packages to perform querying,
data mining, image processing, modeling, simulation and
electronic collaboration.
Neuroinformatics offers a promising future to the field of
neuroscience, but there are numerous challenges that need to
be resolved in order to make advancements. These challenges
include issues pertaining to development, implementation and
maintenance of databases in addition to technical,
standardization, legal and ethical concerns. This paper focuses
its attention on the challenges faced by the neuroscientific
community in the three phases of development,
implementation, and maintenance of databases and their
associated tools. It also provides a perspective into the future
of neuroinformatics.
II. DEVELOPMENT OF DATABASES & ASSOCIATED TOOLS
Unlike the linear genetic and protein sequences, the
neuroscientific data are very complex and heterogeneous in
nature spanning multiple scales and dimensions of analysis.
The complexity of dealing with these multiple dimensions and
scales is further complicated by the specific connectivity of
neuronal pathways and the dynamic temporal dimensions of
the brain [4]. Fig. 1 is an illustration of the complexity
involved in neuroscience research. As depicted in this figure,
the multiple dimensions of analysis include, but are not
limited to, imaging, neurophysiology, neurochemistry, cellular
signaling, morphology, molecular biology and genetics. The
multiple scales of analysis include intracellular, extracellular,
single neurons, networks of neurons and systems of neurons
that extend to the whole brain studies. This overwhelming
complexity requires innovative ways for development of
databases and their associated tools so that they can serve as
effective channels of communication and collaboration
between neuroscientists worldwide and provide them with the
capability to analyze the brain's mechanisms and functional
interactions in greater depth. Achieving this goal would in turn
require a clear understanding and agreement on the scope of
these databases and the techniques for data acquisition and
representation.
Fig. 1: The complexity of neuroscience research is due to
the multiple scales (top) and dimensions (bottom) of
analysis across the lifetime and in different situations [4].
A. Scope of Databases and Associated Software Tools
According to Kötter [5], the current scope of neuroscience
databases and their associated software tools ranges from data
inventories for personal use, to specialized data collections by
a group or community of collaborating neuroscientists, to large
multi-scale projects of general interest. Individual and
specialized data collections and their associated software tools
usually arise when individual research groups pursue their own
specific requirements. Many of these efforts are not published,
and at times similar work may be recreated in different places.
Nonetheless,
some of these efforts have grown and have made considerable
impact on modeling and analysis capabilities. For example, the
GENeral NEural SImulation System (GENESIS) is a model-based database system that holds data ranging from the
characteristics of ion channels in cell membranes to the details
of the connections between neurons. But these data only
become usefully organized when a user creates a model of a
neuron or networks of neurons [6]. Other examples of isolated
efforts by different groups include the BrainMap Project and
NEURON [5].
Large-scale projects, typically funded and carried out by a
consortium of research groups, are aimed at supporting a variety
of research approaches to develop databases and tools that can
efficiently deal with the complexity of brain data. The
International Consortium for Brain Mapping (ICBM) and the
American Human Brain Project were both initiated in 1993 to
support research into databases and related tools for
neuroscientists [7].
Much like the preceding Human Genome Project, which
completely changed the way research was carried out in molecular
biology, the Human Brain Project has thus far shown that the
scope of informatics approaches is of international scale and
significance and requires a highly organized collaboration
between neuroscientists worldwide.
B. Data Acquisition
Acquiring data is perhaps the first prerequisite for any
database development effort. In other words, without
availability of data, there is not much incentive to initiate
database development efforts; as has been the case in most
situations, it is the growing amount of available data
that calls for more efficient ways of handling them.
Neuroscience is no exception as there has been an enormous
explosion of experimental data over the past few decades
ranging from multielectrode recordings in behaving animals
to functional imaging experiments in humans. While the
availability of data poses no problems, it is the lack of
compatibility between different data formats that can hinder
data exchange and integration. The incompatibility and lack
of standardization is the direct result of various independent
development efforts in the past.
A second problem in data acquisition is incompleteness
of datasets and their accompanying information. Typically
what is published is a small subset of data produced in the
course of an experiment. The pressure for publication space
and the bias for publishing only the positive findings limit
data presentations to the most significant data whereas
seemingly minor, poorly understood or merely confirmatory
data are suppressed. Furthermore, the accompanying
descriptions of experimental details are often insufficient to
repeat or evaluate all aspects of the experiment without
additional information. As a result, some believe that the
scope of databases should be extended to include raw and
even unpublished data, which itself has given rise to
controversy among neuroscientists [5], [3].
The success of genomic and proteomic databases is partly
due to the active participation of data producers that have an
interest in seeing their data presented in a database, often as
a requirement for having their work published. In
neuroscience the situation has been different thus far. Many
researchers are reluctant to take the time and trouble to
share their data. They argue that the complexity of primary
data makes understanding them too difficult for anyone
other than the original producers. Others express their
concern over copyright issues and the loss of control over
their data. Gordon Shepherd at Yale has taken the initiative
to address this issue. The olfactory receptor database within
Shepherd’s SenseLab allows users to enter unpublished
sequences that are kept hidden from others. Instead, when a
search finds matches with unpublished data, the database
provides the searcher with the contact information for the
researcher who submitted the data. Researchers can thus
avoid revealing their primary data and yet benefit from
identifying potential collaborators [8], [9]. There are also
other arguments against the sharing of primary data,
which Koslow [7] addresses by stating that the scientific goal
of discovery outweighs the arguments for not sharing data.
He argues that the primary data will gain in value if it is put
into the public domain once it has been analyzed and
published. The combination of this new data with other data
and further analysis will lead to increased value, new
knowledge and understanding (Fig. 2).
Fig. 2: The value of data increases if shared in public
domain, yielding new knowledge and understanding [4].
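The SenseLab-style matching mechanism described above can be sketched in a few lines. The record structure and function names below are hypothetical, invented purely to illustrate the idea of finding matches without disclosing unpublished primary data.

```python
# Hypothetical sketch of SenseLab-style matching: unpublished entries are
# searchable, but a query against them returns only the submitter's
# contact details, never the hidden sequence itself.

RECORDS = [
    {"sequence": "MAYDRYVAIC", "published": True,
     "contact": "lab-a@example.edu"},
    {"sequence": "MAYDRYVAIC", "published": False,
     "contact": "lab-b@example.edu"},
]

def search(query):
    """Return published matches in full; for unpublished matches,
    reveal only the contact information of the submitting researcher."""
    results = []
    for rec in RECORDS:
        if query in rec["sequence"]:
            if rec["published"]:
                results.append({"sequence": rec["sequence"],
                                "contact": rec["contact"]})
            else:
                # Primary data stays hidden; a collaboration lead is returned.
                results.append({"sequence": None, "contact": rec["contact"]})
    return results
```

A searcher who hits the unpublished entry learns that a match exists and whom to contact, which preserves the data producer's control while still identifying potential collaborators.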
The controversial questions of what level of data sharing is
desirable and what types of data belong in these cohesive,
interoperable databases are also reflected in two opposing
views in relation to development
efforts. Some scientists believe it is more efficient to fund and
develop a wide variety of small databasing projects and then
build a link between those that prove to be most successful.
Others, on the other hand, believe that the problem of how to
form database federations has to be tackled by experts
working in dedicated informatics centers [6]. There are also
scientists and database architects who propose designing filters
to let users decide how to weigh the data. NeuroScholar, a
database on neural connectivity in the rat brain is an example
of this case. The users of this system can search for studies
based on attributes such as the journal in which they appeared or the
techniques the investigators used. They can then combine the
weighted data from several studies to assess the strength of
hypotheses they might wish to test [10].
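The weighting idea behind such filters can be made concrete with a small sketch. This is not NeuroScholar's actual interface; the attribute names and scoring scheme are illustrative assumptions only.

```python
# Illustrative sketch (not NeuroScholar's real API): a user assigns trust
# weights to study attributes, and weighted evidence from several studies
# is combined into one support score for a hypothesis.

def support_score(studies, weights):
    """Weighted average of each study's reported finding (+1 supports the
    hypothesis, -1 contradicts it), weighted by user-assigned trust."""
    total, norm = 0.0, 0.0
    for s in studies:
        # Multiply the weights of all attributes the user cares about;
        # unlisted attributes default to a neutral weight of 1.0.
        w = weights.get(s["technique"], 1.0) * weights.get(s["journal"], 1.0)
        total += w * s["finding"]
        norm += w
    return total / norm if norm else 0.0

studies = [
    {"journal": "J. Comp. Neurol.", "technique": "tract-tracing", "finding": +1},
    {"journal": "Brain Res.", "technique": "lesion", "finding": -1},
]
# This user trusts tract-tracing evidence three times as much as the default.
weights = {"tract-tracing": 3.0}
score = support_score(studies, weights)  # (3*1 + 1*(-1)) / 4 = 0.5
```

The same two studies yield a different score under a different weighting, which is exactly the point: the database stores the evidence, and each user decides how to weigh it.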
C. Data Representation
Data representation poses a central problem to most
databasing approaches. In contrast to gene and protein
sequences that can be represented linearly, the complexity of
neuroscientific data prevents the possibility of codifying all
data into one standard format. There are no universal and
simple codes for fundamental objects of neuroscience, such as
neuronal cell type and their activity patterns, cortical columns
and layers. As a result, the complexity of data representation in
neuroscience can range from simple viewing of text or tables
to dynamically-created graphical interfaces. Therefore,
analysis and processing steps are required to make data of
variable sources comparable. The lack of standardization is
particularly noticeable in neuroanatomy where the
nomenclature can easily create confusion [11]. For example,
the same brain region can go by the name of caudate nucleus
or nucleus caudatus, or be referred to as part of a larger
structure called the basal ganglia. This confusion leaves only the experts to
sort the data. Therefore, ontological systems that specify the
relationship between words and the operationally defined
concepts they represent are essential for databases. The US
National Library of Medicine's Unified Medical Language
System (UMLS) includes neuroanatomical nomenclature systems
designed to address this issue [11], [12]. The lack of
standardization in data representation, and thus incompatible
data formats, are roadblocks when transforming from one data
format to another, which imposes the requirement to develop
algorithms that can perform this transformation. The Objective
Relational Transformation (ORT) algorithm, as an example, is
used to transform neuroanatomical data from one mapping
scheme to another [13].
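The role such ontological systems play can be sketched minimally: synonymous terms resolve to one canonical concept, and part-of relations let queries climb to enclosing structures. The term lists below are illustrative assumptions, not drawn from UMLS itself.

```python
# Minimal sketch of ontology-based name normalization. The synonym and
# part-of tables here are toy examples, not actual UMLS content.

SYNONYMS = {
    "caudate nucleus": "caudate_nucleus",
    "nucleus caudatus": "caudate_nucleus",
}
PART_OF = {
    "caudate_nucleus": "basal_ganglia",
}

def canonical(term):
    """Resolve a free-text anatomical term to a canonical concept id."""
    return SYNONYMS.get(term.strip().lower())

def enclosing_structure(concept):
    """Return the larger structure a concept is part of, if recorded."""
    return PART_OF.get(concept)
```

With such a mapping in place, a query for "nucleus caudatus" and a query for "caudate nucleus" retrieve the same records, and a query scoped to the basal ganglia can be expanded to include its parts.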
D. Tools for Data Analysis
Having the capability to merely store and retrieve data may
be of little use if the user does not have access to analytical
tools to allow for interactive and comparative analyses
between the data. Therefore, it is essential to develop
sophisticated tools that extend and refine sorting, analysis, and
data integration. The heterogeneous nature of neuroscientific
data would require software packages for image processing,
compression, graphical interfaces, visualization, geometric
warping (spatial transformation so that subjects are
geometrically comparable), modeling, simulation, computation,
querying, and data mining, to name a few.
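The geometric warping mentioned above can be illustrated in its simplest form: a 2-D affine transformation that brings landmark coordinates from one subject into another subject's frame. Real warping tools use nonlinear 3-D deformations; this sketch only conveys the idea.

```python
import math

# Toy sketch of geometric warping: rotate, scale, and translate 2-D
# landmark coordinates so subjects become geometrically comparable.
# Real neuroimaging warps are nonlinear and 3-D.

def affine_warp(points, angle_deg, scale, tx, ty):
    """Rotate each (x, y) by angle_deg, scale uniformly, translate by (tx, ty)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    warped = []
    for x, y in points:
        xr = scale * (cos_a * x - sin_a * y) + tx
        yr = scale * (sin_a * x + cos_a * y) + ty
        warped.append((xr, yr))
    return warped

landmarks = [(1.0, 0.0), (0.0, 1.0)]
# A 90-degree rotation, no scaling, and a shift of 2 units along x.
aligned = affine_warp(landmarks, 90.0, 1.0, 2.0, 0.0)
```

Once landmarks from different subjects sit in a common coordinate frame, cross-subject comparison and averaging become meaningful.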
III. SYSTEM IMPLEMENTATION
Notwithstanding the complexity and rapid development of
neuroscientific data, the existing database technologies can
still be used to carry out the goals of neuroinformatics.
Examples of standard database management products available
include Microsoft SQL Server, MySQL, and PostgreSQL on the
UNIX platform, as well as commercial systems that can be
licensed from Oracle, Sybase, Informix, IBM, and others. In
spite of the relative ease in obtaining the required technology,
several issues need to be addressed prior to implementation of
database projects in neuroscience. A potential limiting factor
in some database projects comes from the requirement for
efficient storage and processing of large datasets (e.g. image
files), flexible representation of heterogeneous data types,
and object-oriented rather than relational data representation [5].
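One common relational workaround for flexibly representing heterogeneous records is an entity-attribute-value layout. The schema below is a hypothetical sketch, using Python's built-in sqlite3 module, of how records as different as an imaging scan and an electrode recording can share one set of tables.

```python
import sqlite3

# Hypothetical entity-attribute-value sketch: rigid per-type columns are
# traded for a generic attribute table, so heterogeneous neuroscience
# records fit one relational schema. Table and field names are invented.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, kind TEXT)")
conn.execute("""CREATE TABLE attribute (
    record_id INTEGER REFERENCES record(id),
    name TEXT, value TEXT)""")

# An imaging record and an electrophysiology record share the schema.
conn.execute("INSERT INTO record VALUES (1, 'fMRI')")
conn.execute("INSERT INTO attribute VALUES (1, 'voxel_size_mm', '3.0')")
conn.execute("INSERT INTO record VALUES (2, 'multielectrode')")
conn.execute("INSERT INTO attribute VALUES (2, 'sampling_rate_hz', '20000')")

rows = conn.execute("""SELECT r.kind, a.name, a.value
                       FROM record r JOIN attribute a ON a.record_id = r.id
                       ORDER BY r.id""").fetchall()
```

The trade-off is real: queries become joins over a generic table, which is part of why some in the field argue for object-oriented rather than purely relational representations.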
The exchange of data across the Internet would be facilitated if
the data were already in an Internet-suitable format (e.g. the
XML standard). There also needs to be a standardized approach to
facilitate data conversions and communications between
databases. Some scientists recommend that a universal markup
language for neuroscience could efficiently address this
problem [14]. Other issues to be addressed include performance
and data processing. Due to bandwidth limitations, an increase
in the size of data and the number of database users can
dramatically affect performance. Therefore, strategies for
efficient data transfer and presentation are crucial. There are
also the response time and the server-client interaction that
may affect the general acceptance of databases. The tools
associated with databases may be required to perform
complicated analyses and intensive fail-safe computations;
thus, efficient data representation and optimization techniques
are a necessity [5].
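The idea of an Internet-suitable exchange format can be sketched with Python's standard xml.etree.ElementTree module. The element and attribute names below are invented for illustration; a shared neuroscience markup language of the kind proposed in [14] would standardize them.

```python
import xml.etree.ElementTree as ET

# Sketch of XML-based data exchange: a recording session serialized to
# text that any receiving database can parse back. All element names
# here are hypothetical, not part of any real standard.

session = ET.Element("session", species="rat", technique="multielectrode")
trial = ET.SubElement(session, "trial", id="1")
ET.SubElement(trial, "spike_count").text = "42"

xml_text = ET.tostring(session, encoding="unicode")

# The receiving side reconstructs the same structure from the text.
parsed = ET.fromstring(xml_text)
count = int(parsed.find("trial/spike_count").text)
```

Because the document is self-describing text, two databases that agree only on the markup vocabulary can exchange records without sharing an internal storage format.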
IV. MAINTENANCE & QUALITY CONTROL
As a database project grows, it involves more and more
people, leading to differentiation between database developers,
administrators, data collectors, and general users. It therefore
becomes necessary to keep track of data and to verify the
accuracy and completeness of entries and the appropriateness
of data representations. Data maintenance is especially
important to avoid the potentially costly reclassification of
data, particularly when it is desired to keep old entries from
becoming obsolete. Proper maintenance and suitable quality
control measures may also help in addressing some of the legal
and ethical issues involved in reposting published data. Much
of quality control involves active communication between
database administrators and the authors of a study. Common
practice involves authors entering most of the information
concerning their own study themselves, thereby reducing the
possibility of misinterpretation [15]. There are also other
processes, such as peer review, that may be helpful in quality
control if tailored to the special needs of databases. Some
argue that an independent review system needs to be
implemented if databases are to include preliminary and
unpublished data.
V. FUTURE DIRECTIONS
Despite the challenges ahead, neuroinformatics offers a
promising future for neuroscience. The future efforts of
experts in neuroinformatics and other collaborating fields will
likely incorporate ways to:
--Enhance the awareness of neuroscientists, computer
scientists, IT specialists, and other collaborating researchers
of the need to integrate the enormous amount of data into
cohesive, interoperable databases.
--Develop strategies and systematic ways of data
collection, representation, and analysis.
--Establish effective and suitable measures for
maintenance and quality control.
--Build a variety of integrated tools to complement the
databases.
VI. CONCLUSION
Neuroscience research is generating increasing amounts
of data that go far beyond the traditional means of analysis.
The complexity and variety of neuroscientific data, ranging
from physiological recordings to cellular and molecular
interactions to brain images, have driven scientists and
researchers toward developing and adopting new electronic
methods that can make more efficient use of the kinds of data
available. These methods take the form of neuroinformatics
tools. Once developed and proven effective, these tools will
become as essential to neuroscience research as they are to
genomic and proteomic research. There remain, however,
significant problems, both technical and sociological. The
challenge for the neuroscientific community is to remain
cooperative and to link its parallel efforts such that they work
in concert, each complementing the others.
REFERENCES
[1] T. Ideker, T. Galitski, L. Hood, "A New Approach to Decoding Life: Systems Biology", Annual Review of Genomics and Human Genetics, vol. 2, pp. 342-372, Sept. 2001.
[2] H. Kitano, "Looking beyond the details: a rise in system-oriented approaches in genetics and molecular biology", Current Genetics, vol. 41, no. 1, pp. 1-10, Apr. 2002.
[3] G.M. Shepherd, et al., "The Human Brain Project: neuroinformatics tools for integrating, searching and modeling multidisciplinary neuroscience data", Trends in Neurosciences, vol. 21, no. 11, pp. 460-468, Nov. 1998.
[4] S. Koslow, "Sharing primary data: a threat or asset to discovery?", Nature Reviews Neuroscience, vol. 3, pp. 311-313, Apr. 2002.
[5] R. Kötter, "Neuroscience databases: tools for exploring brain structure-function relationships", Philosophical Transactions: Biological Sciences, vol. 356, no. 1412, pp. 1111-1120, Aug. 2001.
[6] M. Chicurel, "Databasing the brain", Nature, vol. 406, pp. 822-825, Aug. 2000.
[7] S. Koslow, "Should the neuroscience community make a paradigm shift to sharing primary data?", Nature Neuroscience, vol. 3, no. 9, pp. 863-865, Sept. 2000.
[8] P.L. Miller, et al., "Integration of Multidisciplinary Sensory Data: A Pilot Model of the Human Brain Project Approach", Journal of the American Medical Informatics Association, vol. 8, no. 1, Feb. 2001.
[9] M. Hines, "ModelDB: A Database to Support Computational Neuroscience", Journal of Computational Neuroscience, vol. 17, pp. 7-11, Aug. 2004.
[10] G.A. Burns, et al., "Tools and approaches for the construction of knowledge models from the neuroscientific literature", Neuroinformatics, vol. 1, no. 1, pp. 81-110, Spring 2003.
[11] A.W. Toga, "Neuroimage databases: the good, the bad and the ugly", Nature Reviews Neuroscience, vol. 3, pp. 302-309, Apr. 2002.
[12] B.L. Humphreys, et al., "The Unified Medical Language System: An Informatics Research Collaboration", Journal of the American Medical Informatics Association, vol. 5, no. 1, pp. 1-11, Feb. 1998.
[13] K.E. Stephan, K. Zilles, R. Kötter, "Coordinate-independent mapping of structural and functional data by objective relational transformation (ORT)", Philosophical Transactions: Biological Sciences, vol. 355, pp. 37-54, Jan. 2000.
[14] M. Martone, A. Gupta, M.H. Ellisman, "e-Neuroscience: Challenges and triumphs in integrating distributed data from molecules to brains", Nature Neuroscience, vol. 7, no. 5, pp. 467-472, May 2004.
[15] J.D. Van Horn, M.S. Gazzaniga, "Databasing fMRI studies - towards a 'discovery science' of brain function", Nature Reviews Neuroscience, vol. 3, pp. 314-318, Apr. 2002.