Georgetown MRI Reading Center (GMRC)
1. Overview
The Georgetown MRI Reading Center (GMRC) brings together a group of researchers and computer specialists
uniquely capable of carrying out the responsibilities of the MRI Reading Center part of the SPRINT-MIND
(Systolic Blood Pressure Intervention Trial - Memory and Cognition IN Decreased Hypertension) multi-site
study. Dr. VanMeter has extensive experience in three areas critical for the successful implementation of the
MRI Reading Center: 1) large volume multi-site MRI data collection and QC procedures; 2) management of
large-scale software and networking projects; and 3) image processing and segmentation expertise. Dr. Fricke is
an MRI physicist with over 20 years of experience in the MRI field and has special expertise in MRI protocol
development and optimization of QC procedures. Vlad Staroselskiy is a senior systems and network engineer
with 16 years of experience in developing and implementing secure network transfer systems including those used
at the Pskov Savings Bank of the Russian Federation. These individuals have a long track record of working
together on a number of different projects going back 6 years. This core group of experts will be assisted by
additional personnel with the requisite skills needed to fulfill the mission of a project of this size and scope.
The remainder of this proposal describes the GMRC team members’ expertise and their role on the project, MRI
segmentation methodologies, the MRI protocol development, quality control procedures, the database design
and development plan, and the infrastructure needed to implement this project.
2. Personnel
John VanMeter, Ph.D., is Assistant Professor in the Department of Neurology at Georgetown University
Medical Center (GUMC) and Interim Director of the 3T MRI facility at Center for Functional and Molecular
Imaging. Dr. VanMeter received his Ph.D. in computer science from Dartmouth College. He has over 15 years
of experience in the design and implementation of neuroimaging experiments as well as their data analysis. Dr.
VanMeter’s experience includes two years as a staff fellow in the Laboratory of Neuroscience in the National
Institute of Aging, where he co-authored the first paper to use fMRI to investigate dyslexia. Dr. VanMeter has
led the development of a number of major software programs as the Director of Research and Development at
Sensor Systems, Inc. This includes a commercial software package that is utilized at over 300 research
institutions worldwide in the analysis of multi-modal datasets including structural MRI and fMRI as well as the
subsequent development of one of the first FDA-cleared fMRI clinical analysis packages. As PI of an NIH
funded STAART (Studies to Advance Autism Research and Treatment) center grant project, Dr. VanMeter is
applying a number of MRI based techniques including fMRI, Diffusion Tensor Imaging (DTI), and MR
spectroscopy to investigate the neurobiological basis of autism. He also designed and developed the database
and data transfer systems used for the MRI data collected in the NIH Pediatric Brain Development project
(http://www.brain-child.org), which was a longitudinal study that included MRI scanning and
neuropsychological evaluation of 500 children of various ages at seven sites across the country. Dr. VanMeter
developed the secure data transfer mechanisms and tracking tools, a suite of data retrieval and viewing software
used in the quality assurance of the imaging data, and the database used for the imaging data collected. The
volume of data collected, stored, and databased at the end of this project was over 20TB. He has also developed
a database system for Georgetown’s Center for Functional and Molecular Imaging (CFMI) that integrates
and manages the neuropsychological and imaging data collected in the 3.0Tesla MRI facility at Georgetown.
Dr. VanMeter will be the Director of the GMRC and be responsible for managing the overall efforts of the
GMRC throughout the SPRINT-MIND project period. In addition, he will be responsible for establishing and
coordinating with the CC the submission of the GMRC’s data to the CC and incorporating the MRI data with
the CC’s database systems. His responsibilities in the GMRC will include designing the database and MRI scan
transfer mechanisms including automated removal of PHI (personal health information) from the MRI scan
headers. He will work closely with Mr. Staroselskiy in the development efforts and oversee the project
programmer. Dr. VanMeter will work with Mr. Staroselskiy on the implementation of the data flow strategies to
coordinate the data collected at the MRI sites. Dr. VanMeter will work with Dr. Fricke on the development and
implementation of both the MRI scanning protocol and the QC (quality control) procedures.
Stan Fricke, Ph.D., is currently actively working on five NIH funded grants of which two involve brain
imaging and one involves MRI equipment development for ultra high-speed imaging. Dr. Fricke has a Nuclear
Engineer's degree (Nucl. Eng.) in radiological sciences from the Massachusetts Institute of Technology, as well
as a degree in Statistical Physics from the University of Torino (Turin, Italy). These academic qualifications
make him well suited to field almost any question in the field of in-vivo imaging. He has extensive experience
in in-vivo magnetic resonance spectroscopy and in magnetic resonance imaging.
Dr. Fricke's first experience in human imaging was in Tulsa, Oklahoma (1985-1986). As a summer research
student Dr. Fricke worked at Oral Roberts City of Faith Hospital, imaging cancer patients and assisting in
planning radiation therapy based on the image data. This experience led to his first publication in 1987
(a meeting abstract that can be found on the "Web of Science"), which examined the effects of manganese chloride
and nickel chloride as contrast agents for magnetic resonance imaging at various magnetic field strengths.
In just the last few years manganese has been used as a neuronal tracer/contrast agent for MRI with phase one
clinical trials. During 1991-1999, Dr. Fricke worked with the University of Florence on various projects
involving multiplatform imaging for the diagnosis of vision disorders due to cerebral oncological complications.
Later in 1994-1999, Dr. Fricke worked at Italy's Scientific Institute for Tumor Research (IST, Genoa, Italy). Dr.
Fricke was a faculty member in the Department of Psychiatry at Wayne State University for three years and an
Associate Professor in the department of Neuroscience at Georgetown University Medical Center for five years.
He has lectured at Georgetown University on the topic of imaging of CNS trauma and he is well published in
the field of brain trauma and rare disease linked neurodegeneration. During his time at Georgetown University
Medical Center, he helped to set up the quality control procedures that are still in use today and defined a set of
protocols, most notably the MR spectroscopic protocol used to reliably and reproducibly acquire spectra from a
number of different brain regions, which has been used in two large-scale imaging projects: the STAART (Studies
to Advance Autism Research and Treatment) and the UCRDRC (Urea Cycle Rare-Diseases Research Center)
center grants. Currently Dr. Fricke is the MRI Physicist at Children's National Medical Center. Dr. Fricke has
worked on General Electric, Siemens, and Philips MRI platforms and has worked with neuroradiologists,
radiologists, and researchers to develop MRI protocols that are uniquely suited for the specific study.
Dr. Fricke will have primary responsibility in the development of the MRI scan protocol and the QC
procedures. He will work with Dr. VanMeter to ensure the scanning protocol that is used meets the needs of the
project in terms of contrast required for the various segmentation procedures and acquisition parameters with an
eye towards balancing the trade-off between total scan time and MRI scan quality. Dr.
Fricke’s long history in the MRI field makes him ideally suited to design the QC procedures for this project. He
will work with Dr. VanMeter on the development of these procedures and their implementation to ensure that
the data are truly comparable across all sites. In addition, he will make the annual onsite visits to each MRI
center to ensure proper staff training and conduct a full and rigorous QC protocol.
Vlad Staroselskiy, M.S., has excellent expertise in systems and network protection. His experience has
included designing and implementing the networking infrastructure for several banks, which require a
high level of data security. He has successfully planned, designed, installed, and configured a number of systems
around the world. Between 2002 and 2004 he also took part in the development of the NetBait product, which acts as
a trap for network intruders trying to obtain unauthorized access to protected systems. More recently he has
worked for a number of academic institutions including Wayne State University and Georgetown University
Medical Center. Through these various positions he has been thoroughly trained in the requirements for HIPAA
(Health Insurance Portability and Accountability Act) compliance and the need to protect PHI (Personal Health
Information). As a Senior Systems Administrator at Georgetown University Medical Center, Mr. Staroselskiy
has been involved in the maintenance of multi-platform computer systems, providing day-to-day support for
research faculty and staff. This has included designing the websites for both the Center for Functional and
Molecular Imaging (CFMI) and the Center for the Study of Learning (CSL). The website for CFMI included
developing the calendar system for scheduling MRI scanner and EEG time. In addition, this system includes a
database that is used to record laboratory notes regarding the scanning sessions. Specific projects have included
planning, installing, and configuring Cisco PIX firewalls for CFMI, CSL, and
the Small Animals Imaging Lab with network monitoring using MIDAS, Nagios, Cacti, and MRTG software.
He also implemented secure site-to-site VPN connections between these labs with secure off-site access. Mr.
Staroselskiy has also set up and administers CFMI’s 19-node Linux cluster using a diskless LM/MPI
computational design with a dedicated management node. He also has extensive experience installing,
configuring, and maintaining various MRI image analysis and spectroscopy applications, such as MedX,
LCModel, FSL, Matlab/SPM, AFNI, Paravision, and others.
Mr. Staroselskiy will be primarily responsible for the development of the networking protocols and procedures.
He will implement the necessary procedures to ensure the appropriate level of security across the network of
MRI facilities and the GMRC. He will also be the main developer of the web-based database system used in this
project leveraging his experience in the development of the CFMI database system. He and Dr. VanMeter will
oversee a programmer dedicated to this project who will implement the database system, the MRI data transfer
system, and the various notification systems. He will also oversee a system manager for the GMRC whose
responsibility it will be to troubleshoot network and computer systems problems.
3. MRI Segmentation and Volume Measurement
MRI provides a method for examining tissues in-vivo and with the use of high-resolution MR imaging it is
possible to produce very detailed images of the brain. It has long been possible to quantify the volume of various
complicated tissues such as gray matter, white matter, and CSF from MR images of the brain (VanMeter and
Sandon 1992). A number of morphometric techniques have relied on human raters to manually trace the
boundaries of the tissue of interest; however this method inherently introduces subjectivity. To reduce this
subjectivity, the raters are trained and their ability to trace the same boundary must be compared within rater
and across raters.
Image segmentation techniques are a class of computer algorithms designed to automatically extract the
boundaries of a given class of tissue or an organ. By their nature these algorithms provide an objective way to
identify and measure particular parts of the image separate from the rest of the image. There are several basic
techniques available that can be applied to a given image segmentation problem. These include 1) thresholding,
which relies strictly on the intensity (pixel brightness) to isolate a given tissue; 2) region growing, which starts
from a seed point in the tissue of interest and iteratively grows outward until the boundaries of the tissue are
reached, typically based on a threshold; and 3) clustering, which iteratively assigns pixels to one of k classes
based on how close each pixel is to the intensities of a given class (Duda and Hart 2000). These image segmentation
techniques provide a basic toolbox and are often combined to improve the overall result.
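The clustering approach listed above can be illustrated with a minimal one-dimensional k-means sketch. This is an illustration only, not the method used by the GMRC: the function name `kmeans_1d`, the sample intensities, and the starting class means are all assumptions made for the example.

```python
def kmeans_1d(intensities, centers, iterations=20):
    """Cluster pixel intensities into len(centers) classes (1-D k-means sketch)."""
    centers = list(centers)
    labels = [0] * len(intensities)
    for _ in range(iterations):
        # Assignment step: each pixel joins the class with the nearest mean.
        labels = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                  for v in intensities]
        # Update step: recompute each class mean from its current members.
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, lab in zip(intensities, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:  # converged: the class means stopped moving
            break
        centers = new_centers
    return labels, centers


# Hypothetical intensities loosely standing in for CSF, gray matter, and white matter.
pixels = [10, 12, 11, 50, 52, 49, 90, 91, 88]
labels, centers = kmeans_1d(pixels, centers=[0, 45, 100])
```

In practice the class means found this way could seed a thresholding or region-growing step, which is one way the basic techniques are combined.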
An example of the output of a computer algorithm used to identify gray matter in a T1-weighted coronal slice of
a human brain (VanMeter and Sandon 1992) is shown in Figure 1. This particular method uses a mathematical model
of the distribution of MR image intensities called a material mixture model. In this model the
distribution of intensities for each tissue class (gray matter, white matter, and CSF) is modeled with Gaussian
distributions (Figure 1b). This type of model is required because, even though a given tissue class such as gray
matter might be darker than white matter, gray matter will not have the exact same range of intensities
across a series of images. Thus, this model provides a mechanism for identifying the range of intensities
(thresholds) that most likely correspond to a given tissue class. In the material mixture model the thresholds
correspond to where the distributions of two neighboring tissue classes cross. The complete algorithm
combines the thresholds obtained from the material mixture model with a neural network trained to recognize
the overall morphology of the different tissues (VanMeter 1993).
Figure 1. Example of volumetric measurement of gray matter. In the MR image pixels classified as gray matter
are shown in red. The histogram shows the distribution of intensities regardless of tissue type (blue) and
the Gaussian distribution models for the expected range of CSF (green), gray matter (red), and white matter
(yellow).
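The threshold between two neighboring tissue classes, taken as the intensity at which their Gaussian models cross, can be computed in closed form. The sketch below is a generic derivation and not the published algorithm: it equates the two weighted log-densities and solves the resulting quadratic. The function names, the optional class weights, and the example parameters are assumptions.

```python
import math


def gaussian(x, mean, sd, weight=1.0):
    """Weighted Gaussian density used to model one tissue class."""
    return weight * math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def crossing_threshold(m1, s1, m2, s2, w1=1.0, w2=1.0):
    """Intensity between the two class means where the weighted Gaussians are equal."""
    # Equating the two log-densities gives a*x^2 + b*x + c = 0.
    a = 1.0 / (2 * s2 ** 2) - 1.0 / (2 * s1 ** 2)
    b = m1 / s1 ** 2 - m2 / s2 ** 2
    c = (m2 ** 2 / (2 * s2 ** 2) - m1 ** 2 / (2 * s1 ** 2)
         + math.log((w1 * s2) / (w2 * s1)))
    if abs(a) < 1e-12:
        # Equal widths: the quadratic term vanishes and the equation is linear.
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            raise ValueError("the two distributions do not cross")
        roots = [(-b + math.sqrt(disc)) / (2 * a), (-b - math.sqrt(disc)) / (2 * a)]
    low, high = min(m1, m2), max(m1, m2)
    inside = [r for r in roots if low <= r <= high]
    if not inside:
        raise ValueError("no crossing between the two class means")
    return inside[0]


# Hypothetical gray matter (mean 40, sd 5) and white matter (mean 80, sd 10) models.
threshold = crossing_threshold(40, 5, 80, 10)
```

When the two classes have equal widths and weights, the threshold falls at the midpoint of the two means; unequal widths shift it toward the narrower class.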
Measurement of total cerebral, left- and right-hemisphere, and cerebellar volume can be calculated from
high-resolution T1-weighted images such as the Siemens MPRAGE or the GE SPGR. Pre-processing of the images
included: rating of the individual scans for quality; removal of non-cortical structures, such as the scalp;
intra-subject registration of the stripped MPRAGEs; calculation of the mean MPRAGE; and bias-field estimation
and correction. The scalp stripping and bias field correction were performed using BET (Smith 2002) and FAST
(Zhang, Brady et al. 2001), respectively, from the FSL software library (http://www.fmrib.ox.ac.uk/fsl/). The
images were registered using a rigid-body transformation with AIR (Woods, Grafton et al. 1998).
Figure 2. Results of automatic segmentation of the cerebellum (red) and left (yellow) and right (green)
hemispheres.
Using a program called Graph-Cuts, each subject’s mean MPRAGE was subdivided into three compartments:
the first two consisted of the left- and right-cerebral hemispheres and the third included both the cerebellum and
brain stem (Liang, Rehm et al. 2005). This program uses a 12 degree-of-freedom transformation derived from
registering a template volume to the subject’s mean image to reslice a volume with the three compartments
previously labeled. From this initial labeling, three connected graphs representing the two cerebral
hemispheres and the cerebellum are formed and then expanded providing a preliminary labeling of the subject’s
brain. This labeling is refined by determining the best locations to cut the three graphs from one another using a
standard graph-cuts method (Liang, Rehm et al. 2005). The volume of each of these compartments was
computed from the resulting labeled image. Total cerebral volume was computed by summing the volumes of
the two hemispheres with the cerebellum and brain stem. The volume of CSF was removed in all volumetric
calculations. In addition, a left-right symmetry index (SI) was computed as follows:
SI = 100 x (Lv - Rv) / [½ (Lv + Rv)]
where Lv and Rv are equal to volume of the left and right hemispheres respectively (Galaburda, Rosen et al.
1987).
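As a minimal sketch of these volumetric calculations, the code below counts labeled voxels per compartment and then evaluates the symmetry index. The label codes (1 = left hemisphere, 2 = right hemisphere, 3 = cerebellum and brain stem, 0 = background or CSF), the function names, and the toy data are assumptions for illustration.

```python
from collections import Counter


def compartment_volumes(labels, voxel_volume_mm3):
    """Volume in mm^3 of each nonzero label in a flattened labeled image."""
    counts = Counter(lab for lab in labels if lab != 0)
    return {lab: n * voxel_volume_mm3 for lab, n in counts.items()}


def symmetry_index(left_volume, right_volume):
    """Left-right symmetry index: SI = 100 x (Lv - Rv) / [1/2 (Lv + Rv)]."""
    return 100.0 * (left_volume - right_volume) / (0.5 * (left_volume + right_volume))


# Toy labeled image; a real image would hold millions of voxels.
labels = [1, 1, 1, 2, 2, 3, 0, 0]
volumes = compartment_volumes(labels, voxel_volume_mm3=1.2)  # 1.0 x 1.0 x 1.2 mm voxels
total_volume = sum(volumes.values())  # hemispheres plus cerebellum and brain stem
si = symmetry_index(volumes[1], volumes[2])
```

A positive SI indicates a larger left hemisphere, and a value of zero indicates equal hemispheric volumes.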
In the STAART (Studies to Advance Autism Research and Treatment) project, corpus callosum (CC) area
measurements have been calculated using the seven subdivisions defined by Witelson: rostrum, genu, rostral
body, anterior midbody, posterior midbody, isthmus, and splenium (Witelson 1989). The brain is first oriented
with the Talairach atlas using a 6 degree-of-freedom transformation based on manual rotation of the midline and
identification of the AC and PC using tools in MEDx. The outline of the corpus callosum is manually traced in
the mid-sagittal plane of the transformed brain. The exact boundaries of the CC are identified using an intensity
threshold inside the manually traced region. In addition, the anterior most point of the inner convexity is
identified. Witelson’s method subdivides the length of the CC between its anterior and posterior extents into
areas using specific geometric rules. The area of each subdivision is calculated from the overlap of the
subdivided regions and the hand-traced outline of the CC. In addition, the cerebellar vermal areas have been
computed by manual tracing of lobules I-V, lobules VI-VII, and lobules VIII-X in the mid-sagittal plane of the
Talairach-oriented brain. Statistical comparisons of all the subregions of the CC and the cerebellar vermis are
examined with and without correcting for total brain volume. A one-way multivariate analysis of variance
(MANOVA) was conducted using SPSS 14.0 (SPSS, Inc, Chicago, IL) to test the null hypothesis that the area of
these regions was not different between the two populations (autistic and typically developing children).
4. MRI Protocol Development
Drs. VanMeter and Fricke will work together to develop the MRI scanning protocol. Development of this
protocol will build on the experience Dr. VanMeter has gained by participating in the ADNI (Alzheimer’s
Disease Neuroimaging Initiative), the Valproate Neuroprotection study, and the NIH Pediatric Brain
Development project. These projects have developed standard acquisition protocols that are designed to
collect MRI data with contrast characteristics and quality that are as uniform as possible across all three of the major
scanner manufacturers: GE, Philips, and Siemens. In particular, the ADNI protocol is quite attractive as each of
the scanner manufacturers have worked with the ADNI participating sites to distribute the sequences
appropriate for that site’s particular scanner model. In addition, this protocol has been developed to work with
both 1.5T and 3.0T MRI scanners. This protocol includes the collection of two high-resolution
(1.0x1.0x1.2mm3 voxels) T1-weighted images and a high-resolution (0.9x0.9x3.0mm3 voxels) double-echo
sequence, which acquires both the T2 and Proton-Density weighted images in the same scan. Additional
calibration scans that characterize the B0 field are also acquired. The entire protocol can fit into a 30-minute
scan slot. The NIH Pediatric Brain Development project also has a standard protocol optimized to provide
cross-site consistency in contrast and quality. The particular needs of the SPRINT-MIND study will be taken
into consideration when developing the protocol to ensure the scans collected provide the optimal data for
measuring the different volumes of interest. For example, if the study requirements call for the measurement of
hippocampal volume, one of the scans in the protocol will be collected perpendicular to the long axis of the
hippocampus as is used in the Valproate Neuroprotection study.
5. Quality Control Procedures
The quality control procedures will be developed based on the best practices and accumulated knowledge that
Drs. VanMeter and Fricke have gained through their participation in other multi-site neuroimaging studies and
their own experiences in the operation of MRI systems. The procedures will include four levels of quality
control: site qualification, monthly QC on a phantom, acquisition of QC scans within 24 hours of each subject’s
scan, and an annual onsite QC visit. The Magphan® Quantitative Imaging Phantom will be purchased for each MRI
center from the Phantom Lab, Inc. (Salem, NY). This phantom contains 165 polycarbonate spheres mounted on a
series of polycarbonate plates and posts filled with copper sulfate and water solutions. Measurements of the
phantom are compared with the known positions of the spheres to give an accurate measurement of the distortion
of the scanner. In addition, 4 of the spheres produce unique contrast on T1, T2, and PD-weighted scans and can
be used to track changes in contrast over time. This phantom has the advantage of having been tested and used
in the ADNI project. All of the QC scanning data will be uploaded to the GMRC using the same methods used for
the subject data. The GMRC will develop automated procedures to track the quality at each site over time,
measuring both spatial fidelity and contrast consistency. The annual onsite QC visit will be used by the GMRC
to perform a more rigorous set of quality control tests as well as a living phantom scan using the MRI protocol
from the study. This onsite visit will also be used to train and re-test the staff at each site in the
appropriate procedures for the collection of the subject and QC data.
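The spatial fidelity measurement described above, in which measured sphere positions are compared with their known positions, could be summarized along the following lines. This is a sketch under assumptions: the sphere centers are taken to be already detected and paired, coordinates are in millimeters, and the function name and sample coordinates are hypothetical.

```python
import math


def distortion_stats(known, measured):
    """Mean and maximum displacement (mm) between paired sphere centers."""
    if len(known) != len(measured) or not known:
        raise ValueError("known and measured must be non-empty and the same length")
    # Euclidean distance between each known position and its measured counterpart.
    errors = [math.dist(k, m) for k, m in zip(known, measured)]
    return sum(errors) / len(errors), max(errors)


# Two hypothetical spheres: one measured exactly, one displaced by 5 mm.
known = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
measured = [(0.0, 0.0, 0.0), (10.0, 3.0, 4.0)]
mean_error, max_error = distortion_stats(known, measured)
```

Tracking these two numbers for each monthly phantom scan would give a simple per-site time series of spatial fidelity.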
6. Database design and development
One of the database systems Dr. VanMeter and Mr. Staroselskiy have developed is in use in Georgetown’s
Center for Functional and Molecular Imaging (CFMI). This system is used to store, manage, and integrate the
neuropsychological and imaging data collected in the various projects ongoing in the 3.0Tesla MRI facility.
One of the advantages of this system is that it employs a web-based interface. This allows researchers to enter
data, sort, and retrieve results from their own desktop computers.
This system is designed to permit the rapid addition of new forms through the use of an underlying architecture
that is based on a set of core database tables and a combination of HTML, JSP, and JavaScript web pages.
Once the design elements have been determined, all forms are laid out and designed by a research assistant
using Microsoft FrontPage, which is available through a site license at GUMC. This eliminates the need for a
senior developer to be involved in the development of the forms.
One of the goals of the CFMI database project was to attempt to reproduce in the web interface the design of the
original paper version of each instrument. This greatly simplifies the training of users on data entry procedures
using the web-based forms. These web-based data entry screens are structured nearly identically to the paper
forms from which the data are entered. Sample input forms from this system are shown for a part of the WAIS3
instrument (Figure 3, below).
Figure 3. Web-based data entry form for the WAIS3 Digit Span subtest
The administration of the handedness instrument involves checking one or more boxes on the paper form. This
is replicated using check boxes in the same locations in the web-based form. As an example
of automatic scoring, the web-based data entry form for the digit span test (Figure 3) computes the item and
total scores from the entered trial scores.
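The automatic scoring step can be sketched as follows. The scoring rule here is an assumption made for illustration (two trials per item, each scored 0 or 1, with the item score equal to their sum); the actual form logic and any discontinue rule are not modeled.

```python
def score_digit_span(trial_scores):
    """Compute item scores and the total score from entered trial scores.

    trial_scores is a list of (trial_1, trial_2) pairs, each value 0 or 1.
    """
    for pair in trial_scores:
        if len(pair) != 2 or any(score not in (0, 1) for score in pair):
            raise ValueError(f"invalid trial scores: {pair!r}")
    item_scores = [trial_1 + trial_2 for trial_1, trial_2 in trial_scores]
    return item_scores, sum(item_scores)


# Hypothetical entry: both trials passed, one passed, none passed.
item_scores, total = score_digit_span([(1, 1), (1, 0), (0, 0)])
```

Validating the entered values before scoring mirrors the input checks that a web form would apply at data entry.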
Data Submission Tracking
Dr. VanMeter’s role in another database system was specifically with regards to MRI data submission. The NIH
Pediatric Database (NIHPD) project collected both longitudinal imaging and neuropsychological data over 6
years on 500 children ranging in age from several months to 18 years old at baseline. The purpose of the project
is the development of a number of age specific imaging templates and to examine normal brain development.
Data were collected at seven research centers across the country and transmitted to a central coordinating center
at the Montreal Neurological Institute (MNI). All of the data collected is made available to both the center
collecting the data and the central coordinating center. Dr. VanMeter developed the image transfer database and
protocols for this project. This database architecture was designed to be scalable in nature.
Transmission and storage of the image data collected at each of the sites participating in the NIHPD project
required the development of an imaging database, specialized data transfer software, and quality control tools.
Of particular concern is the proper tracking of the data with respect to accurate subject and MRI sequence
identification. Data collected in this project are transferred from the MRI scanner to an onsite workstation that
has a fully functional version of the database and image retrieval server. The data are reviewed locally for
image quality using a set of tools designed and developed by Dr. VanMeter. All imaging studies are then
transferred to a central data integrity center managed at Georgetown University by Dr. VanMeter. The data are
passed through automated integrity checks that identify missing slices, violations of the established imaging
protocols, and mislabeled data before being sent onto the central coordinating center.
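A minimal sketch of such integrity checks is shown below. The dictionary layout, the field names (`sequence_label`, `slice_numbers`, `TR_ms`), and the protocol ranges are hypothetical stand-ins for values that would be read from the image headers and the study protocol.

```python
def check_series(series, protocol, known_subjects):
    """Return a list of problems found in one image series (empty if none)."""
    problems = []
    # Mislabeled data: the subject ID must be one the study knows about.
    if series["subject_id"] not in known_subjects:
        problems.append("unrecognized subject ID")
    spec = protocol.get(series["sequence_label"])
    if spec is None:
        problems.append("sequence label not in protocol")
        return problems
    # Missing slices: every slice number from 1 to n_slices must be present.
    expected = set(range(1, spec["n_slices"] + 1))
    missing = sorted(expected - set(series["slice_numbers"]))
    if missing:
        problems.append(f"missing slices: {missing}")
    # Protocol violations: each acquisition parameter must fall in its allowed range.
    for name, (low, high) in spec["ranges"].items():
        value = series["parameters"].get(name)
        if value is None or not (low <= value <= high):
            problems.append(f"protocol violation: {name}={value}")
    return problems


protocol = {"T1": {"n_slices": 4, "ranges": {"TR_ms": (2000, 2500)}}}
bad_series = {"subject_id": "S001", "sequence_label": "T1",
              "slice_numbers": [1, 2, 4], "parameters": {"TR_ms": 3000}}
report = check_series(bad_series, protocol, known_subjects={"S001"})
```

Returning a list of problems rather than stopping at the first one lets a single notification describe everything wrong with a submission.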
The data transfer system developed for this project uses the international image transfer and storage protocol
DICOM, Digital Imaging and Communications in Medicine (ACR/NEMA 2001). The DICOM standard
includes complete specification of attributes of the data including fields for subject ID, imaging parameters, and
sequence labels. The DICOM standard also defines a rigorous protocol for the transfer and storage of medical
image data from a variety of sources including MRI. An advantage of the DICOM image transfer protocol over
simpler methods such as FTP is that the receiving application must communicate any type of transfer failure to
the sender. Furthermore, when such failures are detected, the sender is configured to automatically attempt to
re-send the images that failed.
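The re-send behavior can be sketched independently of any particular DICOM toolkit. In the sketch below, `send` is a hypothetical callable standing in for a DICOM storage request that returns True when the receiver reports success; no real DICOM library call is used, and the attempt limit is an assumption.

```python
def send_with_retry(images, send, max_attempts=3):
    """Send each image, re-sending failures; return images that never succeeded."""
    pending = list(images)
    for _ in range(max_attempts):
        # Keep only the images whose transfer the receiver reported as failed.
        pending = [image for image in pending if not send(image)]
        if not pending:
            break
    return pending


# Hypothetical transport that fails the first attempt for image "b" only.
attempts = {}

def flaky_send(image):
    attempts[image] = attempts.get(image, 0) + 1
    return not (image == "b" and attempts[image] == 1)

unsent = send_with_retry(["a", "b"], flaky_send)
```

Images still listed after the final attempt would be the cases flagged for manual intervention.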
This system has been in place since 2002 and has been used to successfully transfer over 10,000 image
acquisitions to date with only 150 cases requiring manual intervention. The stability and reliability of this
system have exceeded the original requirements. In addition, laborious hand identification and manual transfer of
the data have been eliminated, increasing the integrity and accuracy of the data submitted through this process.
Development of new databases, systems of data entry/flow, and data submission procedures
A comprehensive data management system will be developed to support the activities of the GMRC. This
development process will leverage the work put into the database system already in place in the Center for
Functional and Molecular Imaging (CFMI). The data entry system will employ a web-based system allowing
multiple users to enter and access data at the same time. The GMRC will ensure that data access will be
provided in a secured manner and limited to GMRC, CC, and the appropriate MRI center personnel.
1. Database Management System Development
a. Database development plan
The development of a new database system for the GMRC will utilize a spiral model of software development
(Boehm 1988). This type of development process includes all the components of any development project
including the gathering of requirements based upon an assessment of the needs of the GMRC, development of a
design document, implementation, testing and deployment. In contrast to a waterfall development process, all of
these components are performed in multiple phases of the life of the project (Hogan, Smith et al. 2002). This
type of development process incorporates a mixture of top-down and bottom-up design philosophies. The
iterative development process of the spiral model has also been shown to lead to a flexible design that provides
early and frequent feedback to the GMRC through rapid prototyping (Thayer and Dorfman 1997).
b. Design
The design process of the new database system will follow an Object Oriented Design (OOD) approach, which
encapsulates the data of the system in terms of objects, the relationships between objects, and the operations
that can be performed on them (Coad and Yourdon 1991). The advantages of the OOD method include modularity,
data abstraction, and encapsulation. Given the high degree of overlap between the requirements of this database
system and the one developed by Dr. VanMeter for the CFMI, a significant portion of the design of this system
can be reused for the GMRC’s system.
Figure 4. Model-View-Controller architecture (from http://java.sun.com)
To provide access to users across multiple sites (GUMC, CC, and the appropriate CCN), a web-based system built
on the Model-View-Controller (MVC) architecture will be used. An MVC architecture
segregates the functionality of the system between objects used to maintain data and those used to present data
(Cavaness 2002). The advantages of this system are a) that it can be used to minimize the degree of interaction
between objects and b) its usefulness in multi-tier enterprise web-based applications. The model captures the
logic of the processes the system implements and manages access to the data of the system. In addition, the
functionality of the system is encapsulated in the model. The view presents data from the model to the user
while sending keyboard and mouse inputs to the controller. Finally, the controller defines the application’s
behavior by interpreting the user’s inputs, sending requests to the model, and selecting between views for
presentation. An overview of this architecture is shown in Figure 4.
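The division of responsibilities among the three parts can be shown with a small language-neutral sketch. It is written in Python purely for illustration and is not the planned Java Beans and JSP implementation; the class names, the request format, and the sample field are assumptions.

```python
class Model:
    """Holds the data and the logic for reading and updating it."""
    def __init__(self):
        self._records = {}

    def save(self, subject_id, field, value):
        self._records.setdefault(subject_id, {})[field] = value

    def get(self, subject_id):
        return dict(self._records.get(subject_id, {}))


class View:
    """Presents data from the model; contains no business logic."""
    def render(self, subject_id, record):
        lines = [f"Subject {subject_id}"]
        lines += [f"{field}: {value}" for field, value in sorted(record.items())]
        return "\n".join(lines)


class Controller:
    """Interprets user input, sends requests to the model, and selects the view."""
    def __init__(self, model, view):
        self.model = model
        self.view = view

    def handle(self, request):
        if request["action"] == "update":
            self.model.save(request["subject_id"], request["field"], request["value"])
        record = self.model.get(request["subject_id"])
        return self.view.render(request["subject_id"], record)


controller = Controller(Model(), View())
page = controller.handle({"action": "update", "subject_id": "S001",
                          "field": "age", "value": 67})
```

Because the view only receives a record to display, the presentation can be changed without touching how the data are stored.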
The model and controller parts of the system will be implemented using Java Beans (reusable software
components) running on a Tomcat web-server. Tomcat is an open-source server that provides an
implementation of Java Servlets and JavaServer Pages (JSP) (Harrison and McFarland 2002). JSP technology is
a mechanism for creating dynamic web-based applications that are platform-independent. One of the major
features of JSP technology is the ability of the programmer to encapsulate chunks of Java code in HTML-style
tags that can be used by a web-page developer in the creation of the views. The JSP custom-tag libraries
communicate with the model’s Java Beans, which provide encapsulated access to the database in objects that
abstract the details of the organization of the database from the rest of the application. The controller’s Java
Beans act much like a ‘traffic cop’, directing user requests to the appropriate handlers. The system has handlers
to process requests from the user such as updating the data displayed in a form, getting data from the database,
and generating reports.
The view portion of the system will be implemented using JSP-based web pages to present the various
neuropsychological tests, input forms, and menus to the user. These JSP pages are initially developed using
Microsoft FrontPage as a simple means of laying out the HTML portions of the web page. The JSP custom-tag
libraries greatly simplify the task of implementing the interface by allowing the web-page developer to add
commands in the form of HTML-style tags to validate inputs, get data from the database, and insert data into the
database. This simplification in the development of input forms provides a low-cost and efficient means of
adding new forms to the system since in many cases a research assistant can handle this work.
To ensure secure access to the data over the internet, the system will use 128-bit SSL (Secure Sockets Layer)
encryption. In addition, a system of access control lists will be implemented. Users of the system will be
categorized with respect to the rights they will have to read data from the database, enter data into the
database, and modify data in the database. These control lists will be used to limit data access based on need.
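A minimal sketch of how such access control lists might be checked is given below, in Python for brevity. The three rights mirror the read, enter, and modify categories described above; the role names and their assignments are hypothetical.

```python
from enum import Flag, auto


class Right(Flag):
    READ = auto()
    ENTER = auto()
    MODIFY = auto()


# Hypothetical role names and assignments, for illustration only.
ACCESS_CONTROL = {
    "researcher": Right.READ,
    "data_entry_clerk": Right.READ | Right.ENTER,
    "gmrc_staff": Right.READ | Right.ENTER | Right.MODIFY,
}


def is_allowed(role: str, needed: Right) -> bool:
    """Return True only if the role holds every right in `needed`."""
    granted = ACCESS_CONTROL.get(role, Right(0))  # unknown roles get no rights
    return (granted & needed) == needed
```

Defaulting unknown roles to no rights keeps the check fail-closed, which fits the stated goal of limiting access based on need.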
c. Database Implementation
The database will be implemented using an open-source SQL database engine called MySQL
(http://www.mysql.com). This database engine is a low cost (free) implementation of ANSI standard SQL-99
and is used by corporations such as Google and Cisco Systems as well as governmental institutions such as
NASA. In addition, there are a number of connectors for MySQL including two for Java.
The database structure itself is built around two major types of data: subjects and forms (see Figure 5). The
subject data is the basic demographic data related to the subjects. For security reasons the subject data is split
into two parts: public and private. The “public” data, which includes the subject ID, age, gender, and ethnicity,
is accessible (under password protection) by all users of the system. The “private” data, which includes
sensitive data such as name, telephone number, and address, is only accessible when users access this separate
database directly on the server housed at GUMC. When a subject is administered a test, the values are recorded
in the TestInstance and ItemValue tables. These tables can be thought of as instantiations of the Form and Item
tables. In addition, the test information is linked to the subject through the Session table, which is used to track
results longitudinally.
Form: FormID, ParentFormID, Name, Class, JSPView, JSPScope, Description
TestInstance: TestInstanceID, SessionID, FormID, TestDate, Comment, LastModified, ModifiedBy, Description
Session: SessionID, SubjectID, SessionDateTime, SessionNumber, LastModified, ModifiedBy
Item: ItemID, DisplayText, DisplayImageURL, RangeMin, RangeMax, DefaultValue, Type, ValidInputSet, HTMLFormat
FormItem: FormID, ItemID, SequenceNum
ItemValue: TestInstanceID, ItemID, ItemStrValue, ItemIntValue, ItemDblValue, LastModified, ModifiedBy
SubjectPublic: SubjectID, Gender, Ethnicity, AgeInYears, LastModified, ModifiedBy
SubjectPrivate: SubjectID, FirstName, MiddleName, LastName, DOB, Gender, Ethnicity, Address1, Address2, City, State, Zip, Country, …, LastModified, ModifiedBy
Figure 5. GMRC database architecture.
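The way the Session table ties test results to a subject over time can be sketched as follows. The example uses Python's built-in sqlite3 module purely so that it is self-contained (the database itself is planned for MySQL); the column types, the reduced column lists, and all inserted values are assumptions made for illustration.

```python
import sqlite3

# Reduced versions of the Figure 5 tables; column types are assumptions.
SCHEMA = """
CREATE TABLE SubjectPublic (SubjectID INTEGER PRIMARY KEY, Gender TEXT,
                            Ethnicity TEXT, AgeInYears INTEGER);
CREATE TABLE Session (SessionID INTEGER PRIMARY KEY,
                      SubjectID INTEGER REFERENCES SubjectPublic(SubjectID),
                      SessionDateTime TEXT, SessionNumber INTEGER);
CREATE TABLE Form (FormID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, DisplayText TEXT,
                   RangeMin REAL, RangeMax REAL);
CREATE TABLE FormItem (FormID INTEGER REFERENCES Form(FormID),
                       ItemID INTEGER REFERENCES Item(ItemID), SequenceNum INTEGER);
CREATE TABLE TestInstance (TestInstanceID INTEGER PRIMARY KEY,
                           SessionID INTEGER REFERENCES Session(SessionID),
                           FormID INTEGER REFERENCES Form(FormID), TestDate TEXT);
CREATE TABLE ItemValue (TestInstanceID INTEGER REFERENCES TestInstance(TestInstanceID),
                        ItemID INTEGER REFERENCES Item(ItemID),
                        ItemStrValue TEXT, ItemIntValue INTEGER, ItemDblValue REAL);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up example data: one subject, one form with one item, two sessions.
conn.execute("INSERT INTO SubjectPublic VALUES (1, 'F', 'Not reported', 70)")
conn.execute("INSERT INTO Form VALUES (10, 'Example form')")
conn.execute("INSERT INTO Item VALUES (100, 'Example score', 0, 30)")
conn.execute("INSERT INTO FormItem VALUES (10, 100, 1)")
for session_no, score in [(1, 28), (2, 26)]:
    day = f"2010-0{session_no}-01"
    conn.execute("INSERT INTO Session VALUES (?, 1, ?, ?)", (session_no, day, session_no))
    conn.execute("INSERT INTO TestInstance VALUES (?, ?, 10, ?)", (session_no, session_no, day))
    conn.execute("INSERT INTO ItemValue VALUES (?, 100, NULL, ?, NULL)", (session_no, score))

# Longitudinal view: one item's values for one subject, ordered by session.
rows = conn.execute(
    """
    SELECT s.SessionNumber, v.ItemIntValue
    FROM ItemValue v
    JOIN TestInstance t ON t.TestInstanceID = v.TestInstanceID
    JOIN Session s ON s.SessionID = t.SessionID
    WHERE s.SubjectID = ? AND v.ItemID = ?
    ORDER BY s.SessionNumber
    """,
    (1, 100),
).fetchall()
```

Note that only SubjectPublic appears here; in keeping with the public/private split, the identifying fields would live in a separate database.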
The forms portion of the database includes the various neuropsychological instruments, neuropathology reports,
and the clinical visit data. Rather than developing a unique database table for each specific form, the CFMI
database uses a generic form table that includes references to any number of generic question/score item table
entries and information needed to display the form on the screen. The item table entries are designed to allow
for multiple data types including a string/text input, integers, and double-precision real numbers. Another
feature of the item table entries is the ability to include data for automatic range checking as well as enumerated
input set checking. By abstracting the various forms into a handful of general tables, it is possible to very
quickly add new forms to the system. In the CFMI database project, this greatly reduced the amount of time that
was required of a senior database programmer and led to the swift development of the data entry form.
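The range and enumerated-set checks attached to each item can be sketched as below, in Python for brevity. The field names loosely follow the Item table (Type, RangeMin, RangeMax, ValidInputSet); the class, the type codes "str", "int", and "dbl", and the example items are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ItemDefinition:
    """Hypothetical in-memory mirror of one row of the Item table."""
    item_id: int
    type: str  # assumed codes: "str", "int", or "dbl"
    range_min: Optional[float] = None
    range_max: Optional[float] = None
    valid_input_set: Optional[Tuple[str, ...]] = None


def check_input(item: ItemDefinition, raw: str) -> List[str]:
    """Return the problems found in `raw`; an empty list means the input is valid."""
    problems: List[str] = []
    value = None
    if item.type == "int":
        try:
            value = int(raw)
        except ValueError:
            problems.append("not an integer")
    elif item.type == "dbl":
        try:
            value = float(raw)
        except ValueError:
            problems.append("not a number")
    if value is not None:
        # Range checks apply only to inputs that parsed as numbers.
        if item.range_min is not None and value < item.range_min:
            problems.append("below RangeMin")
        if item.range_max is not None and value > item.range_max:
            problems.append("above RangeMax")
    if item.valid_input_set is not None and raw not in item.valid_input_set:
        problems.append("not in ValidInputSet")
    return problems


score = ItemDefinition(item_id=1, type="int", range_min=0, range_max=30)
handedness = ItemDefinition(item_id=2, type="str", valid_input_set=("L", "R"))
```

Because the rules are data rather than code, a new form only needs new item rows; the same checking routine serves every form.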
d. Development timeline
It will be possible to significantly compress the development cycle of the GMRC’s database system by
leveraging Dr. VanMeter’s efforts put into development and testing of the database system used in CFMI.
While the CFMI database structure will serve as a model for the development of the GMRC databases, it will still
be necessary to develop GMRC-specific requirements and design documents as well as new data collection
forms (DCFs) needed by the GMRC. In addition, it will be necessary to add a mechanism for importing the data
in the existing MDP Access database into the new database system. The various phases of the development
process for the new database system are shown in Figure 6.
PHASE 1 (1 Month)
   Requirements refinement
   Install hardware on GMRC server
   Install Linux on GMRC server
   Develop Prototype 1
   Port CFMI database to GMRC server
   Add GMRC Clinical forms to new DB
   Develop SQL and Tcl scripts to convert existing MDP DB into new DB
   Deliver revised requirements document
PHASE 2 (1 Month)
   Review requirements document
   Deploy Prototype v0.1
   Beta testing of Prototype v0.1
   Revise CFMI DB system design to satisfy GMRC requirements
PHASE 3 (1 Month)
   Review requirements and results of beta testing
   Deliver GMRC DB system design
   Develop Prototype v0.2
   Modify interface based on beta testing
   Incorporate all of the CC’s MDS
   Incorporate Neuropathology BRC report
PHASE 4 (2 Months)
   Review requirements and design documents and results of beta testing
   Deploy Prototype v0.2
   Beta testing of Prototype v0.2
   Begin development of training manuals
   Develop Prototype 3
   Modify interface based on beta testing
   Add SQL procedures extracting data for CC
   Incorporate remaining testing instruments
PHASE 5 (2 Months)
   Deliver revised requirements and design documents
   Deploy Prototype v0.3
   Beta testing of Prototype v0.3
   Beta test CC extraction procedures
   Develop Production version of system
   Incorporate changes arising from beta testing
   Update training manuals to include final set of testing instruments and changes in interface
PHASE 6 (1 Month)
   Deliver final version of requirements and design documents
   Finalize training manuals
   Deploy Production version of system
   Train RAs on data entry procedures
   Train researchers on data access methods
Figure 6. GMRC database system development plan
2. Data Flow
a. Collection of raw data
The flow of data to the GMRC is shown in Figure 7. The following standard operating procedure will be used
for data collection:
1) The DCFs (data collection forms) filled out by the MRI technician at the time of each scan will be
entered in a secure web data entry form hosted at the GMRC.
2) Data in the DCFs will be checked for completeness at the time of entry. The computer will generate
a list of corrections identified during data entry, including missing, out of range or unclear values.
3) An automated tracking system will begin a process of checking for all of the scans in the study.
Incomplete or missing scans will trigger an email notification to the MRI site’s point of contact if the
scan data has not been received within three business days of submission of the DCF.
4) Data missing or incomplete after 5 business days will trigger a review by the GMRC to track down
the source of the problems.
5) An automated notification system will be developed to provide email reminders to the appropriate
clinical site to schedule the individual’s follow-up scan. Notification will be based on the
date of their previous MRI visit.
6) MRI data will be sent to the GMRC using a Java-based program that will first remove all PHI
from the DICOM headers of the scans. The person sending the data will be prompted to
provide the appropriate subject ID. Standardized MRI protocol scan names will be used at each site
to identify the different parts of each study.
7) Once the scan data has been successfully uploaded to the GMRC a trained research assistant will
check the quality of each scan. The results of the QC will be entered into the GMRC database.
8) Scans not passing QC will trigger a notification to the MRI site that the subject should be rescanned.
9) Automated segmentation analyses will be run on each complete dataset and the results will be
automatically compared to expected norms.
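The escalation timing in steps 3 and 4 above can be sketched as follows, in Python for brevity. The function and status names are hypothetical, and the sketch assumes that business days are Monday through Friday with holidays ignored.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (holidays are ignored)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def scan_status(dcf_submitted: date, today: date, scan_received: bool) -> str:
    """Classify a scan as complete, waiting, notify_site, or gmrc_review."""
    if scan_received:
        return "complete"
    if today > add_business_days(dcf_submitted, 5):
        return "gmrc_review"   # step 4: still missing after 5 business days
    if today > add_business_days(dcf_submitted, 3):
        return "notify_site"   # step 3: not received within 3 business days
    return "waiting"
```

For a DCF submitted on Monday, 1 March 2010, the site would be notified from Friday, 5 March, and a GMRC review would begin from Tuesday, 9 March.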
[Figure 7 elements: MRI 1, MRI 2, and MRI 3 sites; fax forms; web-based DCF and MRI scan submission; web-based data entry and QC; GMRC (form archival, data QC and validation, QC report generation); GMRC Database System; status report; CC.]
Figure 7. Data flow diagram
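The PHI-removal step of the submission program (step 6 of the procedure above) can be illustrated with a simplified sketch, in Python for brevity although the program itself is planned in Java. The header is modeled as a plain dictionary keyed by DICOM attribute keywords, and the short keyword list is only an illustrative subset; a real de-identification pass would need to cover many more attributes.

```python
from typing import Dict

# Illustrative subset of identifying DICOM attributes; not a complete list.
PHI_KEYWORDS = (
    "PatientName",
    "PatientBirthDate",
    "PatientAddress",
    "PatientTelephoneNumbers",
)


def deidentify(header: Dict[str, str], subject_id: str) -> Dict[str, str]:
    """Return a copy of `header` without PHI fields and with the study subject ID set."""
    cleaned = {key: value for key, value in header.items() if key not in PHI_KEYWORDS}
    cleaned["PatientID"] = subject_id  # ID supplied by the person sending the data
    return cleaned


# Made-up header values for illustration.
original = {
    "PatientName": "DOE^JANE",
    "PatientBirthDate": "19400101",
    "PatientID": "HOSP-000123",
    "SeriesDescription": "T1_MPRAGE",
}
cleaned = deidentify(original, "SUBJ-0042")
```

Returning a copy rather than editing in place leaves the site's original header untouched.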
b. Data entry and validation
All data collected by the GMRC will pass through a QC procedure to ensure it can be used in the volumetric
analyses. Automated segmentation analyses will be run on each complete dataset and the results will be
automatically compared to expected norms. Scans whose results fall more than 2 standard deviations from the norm will be
flagged and checked by the GMRC.
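The comparison against expected norms amounts to a z-score test for each segmented structure. A minimal sketch in Python is shown below; the structure names and norm values are made up for illustration, in arbitrary units.

```python
from typing import Dict, List, Tuple


def flag_outliers(
    volumes: Dict[str, float],
    norms: Dict[str, Tuple[float, float]],
    threshold: float = 2.0,
) -> List[str]:
    """Return the structures whose volume lies more than `threshold` SDs from the norm mean."""
    flagged = []
    for structure, volume in volumes.items():
        norm_mean, norm_sd = norms[structure]
        z = (volume - norm_mean) / norm_sd
        if abs(z) > threshold:
            flagged.append(structure)
    return flagged


# Hypothetical norms as (mean, standard deviation) in arbitrary units.
norms = {"hippocampus": (4.0, 0.5), "ventricles": (30.0, 10.0)}
scan = {"hippocampus": 2.8, "ventricles": 45.0}
flagged = flag_outliers(scan, norms)
```

Here the hippocampus value has z = -2.4 and is flagged, while the ventricle value has z = 1.5 and passes.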
Data Management
The central mission of the GMRC is to ensure the integrity of the data collected and stored in the GMRC’s
database system. This includes ensuring the quality of the data being entered into the system, preventing
unauthorized access, tracking data modification, handling missing values, as well as providing regular data
backups.
1. Data Integrity
a. Automated data validation
As data are entered into the database, range checks and missing value identification will be performed, providing
the data entry clerks with immediate feedback. Missing data will be coded in the database by filling the field
with all 9s as prescribed by the CC’s data submission manual. When missing, out-of-range, unreadable, or
logically inconsistent values are identified, the data entry clerk will flag the items on the paper form. The
co-directors of the GMRC will conduct regular data validation meetings to resolve any issues so identified.
In addition to the feedback mechanism built into the data entry system, the GMRC will develop computerized
error checking routines that will routinely validate the accumulated data. These routines will be designed to
detect logical errors and outliers in the data; they will run automatically and generate reports that will be
emailed to the co-directors.
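The all-9s missing-value convention can be sketched as follows, in Python for brevity. The fixed field width passed to each function is an assumption of this sketch; the actual widths would come from the CC’s data submission manual.

```python
from typing import Optional


def missing_code(field_width: int) -> str:
    """Return the missing-value code for a field: a string of 9s as wide as the field."""
    return "9" * field_width


def encode_field(value: Optional[str], field_width: int) -> str:
    """Return the text to store: the trimmed value, or all 9s if the value is missing."""
    if value is None or value.strip() == "":
        return missing_code(field_width)
    text = value.strip()
    if len(text) > field_width:
        raise ValueError(f"value {text!r} does not fit in {field_width} characters")
    return text


def is_missing(stored: str, field_width: int) -> bool:
    """Return True if the stored text is the missing-value code for this field."""
    return stored == missing_code(field_width)
```

One consequence of this convention is that a field must be wide enough that its all-9s code can never be a legitimate value; for example, a two-character age field could not distinguish a missing age from an age of 99.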
b. Privacy/Security
The GMRC is extremely security conscious and will limit access to the GMRC’s database system through a
system of access controls and data encryption. All data submission and access to the GMRC databases will be
conducted through a secure web-based system. To ensure that the data is not compromised, 128-bit SSL (Secure Sockets
Layer) encryption will be used to encrypt all web sessions. In addition, each GMRC member will be
provided with a login and password. Users’ access to data will be limited based on their role in the SPRINT-MIND
project. In particular, members of the GMRC will have data entry and modification privileges. Data
access and retrieval will be determined by the policies set up by the CC.
c. Data modification tracking
Any change to database records after the initial entry will automatically be recorded in audit trail fields. Each
audit trail field will include the name of the field being changed, the date of the change and the name of the
person making the change. This system of tracking changes to data will enable us to construct a change history
for any record.
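A minimal sketch of this kind of change tracking is shown below, in Python for brevity. Each audit entry carries the field name, the time of the change, and the person making it, as described above; storing the old and new values as well is an assumption added here so that a change history can be reconstructed. The class names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class AuditEntry:
    field: str
    old_value: Any      # assumption: values are kept so history can be rebuilt
    new_value: Any
    changed_at: datetime
    changed_by: str


class AuditedRecord:
    """A record whose changes after initial entry are logged automatically."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        self._values = dict(initial)
        self.history: List[AuditEntry] = []

    def get(self, field: str) -> Any:
        return self._values.get(field)

    def set(self, field: str, new_value: Any, user: str) -> None:
        old_value = self._values.get(field)
        if old_value == new_value:
            return  # nothing changed, so nothing is logged
        self._values[field] = new_value
        self.history.append(
            AuditEntry(field, old_value, new_value, datetime.now(timezone.utc), user)
        )

    def history_for(self, field: str) -> List[AuditEntry]:
        return [entry for entry in self.history if entry.field == field]


record = AuditedRecord({"AgeInYears": 70})
record.set("AgeInYears", 71, user="jdoe")
record.set("AgeInYears", 71, user="jdoe")  # no-op: value is unchanged
```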
d. Missing values
Missing data will initially be flagged by the data entry clerks on the original DCF and submitted to the
co-directors of the GMRC. The flagged DCFs will be verified and stored in the database by filling the field with all
9s as prescribed by the CC’s data submission manual. In addition, one of the tables in the database will be used
to track the reasons for missing data, linking each record and field that has missing data to its missing-data code. A
routine will be developed to extract the information for the CC’s “Missing Data Alert for Documentation” form.
e. Data backups
While the use of RAID level-5 provides a high level of fault tolerance since the data is redundantly stored and
striped across multiple disk drives, all of the data in the database system will be backed up on a regular basis.
The current backup system will be expanded with additional tape drives.
The following backup regime will be implemented:
• Daily incremental backups will be used to back up any data changed since the last full backup.
• Weekly full backups with a set of 4 tapes used on a rotating schedule.
• Monthly full backups with a set of 3 tapes used on a rotating schedule.
• Quarterly full backups with a set of 4 tapes used on a rotating schedule. These tapes will be stored
off-site to protect the data against theft or against loss from events such as fire, flood, or earthquake.
• Yearly full backups will be stored offsite. The tapes used for these backups will not be rotated.
With this backup system in place it will be possible to recover data going up to four months back in time using
the onsite tapes. In addition, the offsite tapes will provide the ability to recover data even further back in time.
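One way the backup regime could be scheduled is sketched below in Python. The tape-set sizes come from the list above, but the calendar rules are assumptions made for this sketch: yearly backups on 1 January, quarterly backups on the first day of April, July, and October, monthly backups on the first day of the remaining months, weekly backups on Sundays, and incremental backups on all other days.

```python
from datetime import date
from typing import Optional, Tuple

QUARTERLY_MONTHS = (4, 7, 10)                    # assumed; January is the yearly backup
MONTHLY_MONTHS = (2, 3, 5, 6, 8, 9, 11, 12)      # assumed; the remaining months


def backup_for(day: date) -> Tuple[str, Optional[int]]:
    """Return (backup type, tape slot); the slot is None when no tape rotation applies."""
    if day.day == 1:
        if day.month == 1:
            return ("yearly_full", None)         # yearly tapes are not rotated
        if day.month in QUARTERLY_MONTHS:
            count = day.year * len(QUARTERLY_MONTHS) + QUARTERLY_MONTHS.index(day.month)
            return ("quarterly_full", count % 4)  # set of 4 tapes
        count = day.year * len(MONTHLY_MONTHS) + MONTHLY_MONTHS.index(day.month)
        return ("monthly_full", count % 3)        # set of 3 tapes
    if day.weekday() == 6:                        # Sunday
        return ("weekly_full", (day.toordinal() // 7) % 4)  # set of 4 tapes
    return ("daily_incremental", None)
```

The running counts increase by one at each backup of a given type, including across year boundaries, so each tape set is reused in strict rotation.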
f. Quality assurance
Quality assurance is a major focus of the GMRC. To that end, the GMRC will conduct regular data quality
reviews to determine if corrections are required. These meetings will be used to resolve issues identified from
the double-entry discrepancy check procedures, DCFs flagged for missing values, and other problems identified
by the automated data checking routines. Minutes of these meetings will be used to document the resolution of
each of the identified problems. In addition, the automated data modification tracking system built into the
database system provides an audit trail of who made the changes to the data and when.
Additionally, the GMRC plans to institute an audit procedure to identify potential inconsistencies between the
hardcopy DCFs and the database records. Quarterly audits of the database will be implemented by randomly
extracting 5% of the records in the database for "manual" comparisons. Detected inconsistencies will be
recorded and investigated in a systematic manner.
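Drawing the 5% audit sample can be sketched as follows, in Python for brevity. The function name and the optional seed (which lets a given quarter's sample be reproduced later) are assumptions of this sketch.

```python
import random
from typing import Iterable, List, Optional


def draw_audit_sample(
    record_ids: Iterable[int], percent: int = 5, seed: Optional[int] = None
) -> List[int]:
    """Return a random sample of about `percent` percent of the record IDs, rounded up."""
    ids = list(record_ids)
    if not ids:
        return []
    size = (len(ids) * percent + 99) // 100   # integer ceiling of len * percent / 100
    rng = random.Random(seed)
    return sorted(rng.sample(ids, size))


sample = draw_audit_sample(range(1, 201), seed=2010)
```

Rounding up guarantees that at least one record is audited even when the database is still small.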
1. Procedures Manual
Because the GMRC will design and produce the DCFs, the GMRC will also be responsible for ensuring that all
MRI site personnel that will be working on the SPRINT-MIND project are fully versed with the entry forms,
the MRI data submission program, and the scan tracking system. We will develop procedure manuals on the
use of the DCFs, scan data submission, and scan tracking procedures. Additionally, procedures for tracking MRI
scanner quality over time will be set up. The manual will include detailed, step-by-step instructions for filling in
all relevant DCFs, what constitutes an acceptable scan, and the scan tracking and notification system. This
procedures manual will also include instructions on personnel requirements, training procedures for new
personnel, and QC procedures. A virtual training program will also be developed to train new users with a set of
quizzes to test their knowledge.
2. Training and Certification of Morphometric Raters
The following procedures will be used to train new raters and certify all personnel involved in the
morphometric rating process. A standard set of images will be used for this purpose. With the help of a
neuroradiologist, gold-standard boundaries will be identified for all of the structures being measured in the
SPRINT-MIND study. A minimum of 10 separate scans will be included in the gold-standard dataset.
The GMRC director will validate each rater by examining their inter- and intra-rater reliability in comparison to
the established gold-standard. A minimum of 0.95 and 0.97 will be required for inter-rater and intra-rater
reliability respectively. All raters will be retested annually. A list of both GUMC and HUH personnel and their
certification status on all registry instruments will be maintained in the GMRC regulatory binder at each clinical
site.
7. MRI Reading Center Infrastructure
The GMRC computer and networking systems will be built upon the existing resources in CFMI’s 3.0T MRI
facility. These resources include a 750-square-foot equipment room with dedicated Liebert Challenger 3000
temperature and humidity control; a new IBM TS3310 tape library (3576-L5B) with 2
LTO4 drives with a total backup capacity of 24 TB; Arkeia 7.0 Network Backup data backup/retrieval software
that includes a federated data protection architecture; and three 72” x 29” x 36” rack-mount computer cabinets fully
protected by 5 APC Smart UPS 5000 uninterruptible power supplies that connect to a master computer to initiate
an orderly shutdown of all equipment in the event of a sustained power failure.
Other major equipment in CFMI includes a 40-node Linux compute cluster, which has 20 TB of attached disk
storage. Every node is equipped with standard software for statistical analysis of fMRI and structural MRI data
as well as visualization utilizing software packages such as SPM, FSL, and MEDx. All computers are linked via
a 1000BASE-T Ethernet Local Area Network (LAN). All of the CFMI computer equipment is connected to
the internet via Georgetown University’s Internet2 connection. The CFMI LAN is protected by a Cisco PIX 525E
firewall, which has two 1 Gbit ports for LAN and WAN traffic and a 100 Mbit interface for the "demilitarized" zone
(DMZ) that hosts the CFMI web server. It provides security from outside threats and supports constant virtual
private network (VPN) connectivity for remote access by CFMI users.
Expansion of the CFMI Infrastructure for SPRINT-MIND
The CFMI infrastructure provides an excellent base upon which to build the infrastructure that will be needed
for the MRI Reading Center at a relatively low cost. To meet these needs and to maintain a strict separation
between the CFMI data operations and those of the GMRC the following additional equipment will need to be
purchased: 5 IBM x3350 compute node servers, 1 IBM x3350 backup server, a 3Com network switch, an IBM
LTO TS3100 Tape Library, APC Smart UPS, and software licenses for Red Hat Enterprise Linux and Arkeia
network backup software. These additional hardware components will be housed in the CFMI equipment room
in a separate computer rack. Access to these systems will be strictly limited to those conducting GMRC-related
work.
GMRC Training Facilities
Other resources available at Georgetown University that will be useful for conducting training include a Video
Teleconferencing facility. This service is made possible via the University’s phone system, a conventional TV,
and a PolyCom View Station. There are two rooms on campus that have been specially wired for
teleconferencing.
Georgetown University has several conference rooms available for departmental functions. The Research
Auditorium located in the Research Building can be reserved to accommodate larger workshops and seminars.
The Research Auditorium houses state of the art equipment and technical experts to assist with functions. In
addition to individual conference rooms located throughout the University campus, the Leavey Conference
Center houses an on-campus hotel for out of the area participants along with catering services and several
interconnected conference rooms.
References:
ACR/NEMA (2001). Digital Imaging and Communications in Medicine (DICOM). Rosslyn (VA), National
Electrical Manufacturers Association (NEMA).
Boehm, B. W. (1988). "A Spiral Model of Software Development and Enhancement." IEEE Computer 21(5):
61 - 72.
Cavaness, C. (2002). Programming Jakarta Struts. New York, O'Reilly & Associates.
Coad, P. and E. Yourdon (1991). Object Oriented Design. New York, Prentice Hall.
Duda, R. O. and P. E. Hart (2000). Pattern Classification, Wiley-Interscience.
Galaburda, A. M., G. D. Rosen, et al. (1987). "Individual variability in cortical organization: its relationship to
brain laterality and implications for function." Neuropsychologia 28: 529–46.
Harrison, P. and I. McFarland (2002). Mastering Tomcat Development. New York, John Wiley & Sons.
Hogan, J. M., G. Smith, et al. (2002). The Real World Software Process. Ninth Asia-Pacific Software
Engineering Conference, Gold Coast, Australia, IEEE Computer Society.
Liang, L.-C., K. Rehm, et al. (2005). Automatic Extraction of Left and Right Hemispheres from MRI Brain
Volumes Using the Graph-cuts Algorithm. Organization of Human Brain Mapping, Toronto, Canada.
Smith, S. M. (2002). "Fast robust automated brain extraction." Human Brain Mapping 17(3): 143-55.
Thayer, R. H. and M. Dorfman (1997). Software Requirements Engineering, Wiley-IEEE Computer Society
Press.
VanMeter, J. W. (1993). Segmentation and restoration of magnetic resonance images using material mixture
models. Computer Science. Hanover, NH, Dartmouth College. Ph.D.: 166.
VanMeter, J. W. and P. A. Sandon (1992). Identification of cortex in magnetic resonance images. SPIE
Conference on Biomedical Image Processing and 3-D Microscopy. San Jose, CA.
Witelson, S. F. (1989). "Hand and sex differences in the isthmus and genu of the human corpus callosum. A
postmortem morphological study." Brain 112 ( Pt 3): 799-835.
Woods, R. P., S. T. Grafton, et al. (1998). "Automated image registration: I. General methods and intrasubject,
intramodality validation." J Comput Assist Tomogr 22(1): 139-52.
Zhang, Y., M. Brady, et al. (2001). "Segmentation of brain MR images through a hidden Markov random field
model and the expectation-maximization algorithm." IEEE Trans Med Imaging 20(1): 45-57.