The Integrated Data Repository (IDR)
Data Discovery and Data Request Lifecycle
Authors: Maggie Massary (1), Ketty Mobed (2), Mark Weiner (1), Marco Casale (3), Prakash Lakshminarayanan (2),
John Holmes (1), Kevin Haynes (1), Hillari Allen (2), Paul Norris (2), Davera Gabriel (4), Rob Wynden (2)
(1) University of Pennsylvania – Health System
(2) University of California, San Francisco – Academic Research Systems
(3) University of Rochester - Medical Center
(4) University of California, Davis - Clinical & Translational Science Center
August 12, 2008
5/6/2017
Page 1 of 19
1. Introduction
In an effort to integrate disparate data within the Health System, an initiative is underway to build
an Integrated Data Repository (IDR) to collectively hold and integrate clinical and biomedical
research data, including economic, administrative, and public health information. This
integrated data store will serve the needs of both the Health System and Clinical/Translational
Research. The integrated data can then be transformed into useful information that can be
structured and managed with ease. The IDR will be an active environment that will
allow continuous access to the latest detailed data and built-in analysis to drive optimal clinical
practice and biomedical research activities. This product will be a compilation of internal and
external data fields drawn from protected medical electronic systems and publicly available
data files, aggregated into one single secure environment.
1.1 Purpose
This document outlines the interaction between the biomedical researcher or clinician and the
IDR system. Our primary focus is to summarize, and present in detail, the different types of
biomedical research pathways in which the IDR can serve as the primary source of data, as well as
the utility of the IDR for clinical bedside practice. The IDR environment may also serve as a secure
storage site for the investigator's own data, which can then be joined with data already residing
within the IDR.
The purpose of this document is to give the reader a detailed understanding of the goals of the
project, the customer requirements, and underlying expectations.
2. Overview
2.1 Project Perspective
Since the plan for this repository is to be the primary source for clinical and biomedical data, it is
imperative to understand how data is currently being extracted, interpreted, and used by
biomedical researchers and clinicians.
It is important to define these interactions for the following reasons:
a. To properly vet the functional requirements of the IDR based on the business process of
our research customers.
b. To better identify areas of improvement in IDR technology. This may be required in order
to most effectively serve the needs of the research community.
c. To identify all data and document artifacts that are typically generated as a result of
interaction with the IDR. These artifacts must be identified, and a determination must be
made regarding which of them may contain Protected Health Information (PHI)
and therefore require additional security and compliance measures for their proper handling.
d. To identify the publication process as it relates to the IDR. This is necessary when sharing
information with collaborators (for instance, on patient recruitment), to provide
oversight when necessary, and so that data may be submitted for publication on websites and in
journals. These activities must be properly defined so that the
security and confidentiality of PHI are maintained. This is also critical for
protecting the institutions’ rights with regard to any new Intellectual Property generated
and the possible patenting of that information.
2.2 Project Properties and Objectives
With the introduction of IDR technology, we expect the typical business process followed by
a researcher to be fundamentally altered.
Specifically,
a. At the very beginning of the research enterprise, it is common for a researcher to set out
to validate certain hypothesized correlations between data elements collected from
disparate sources (see Figure 1). Until now, it has been very difficult to quickly and
effectively look for the possible existence of these ad-hoc correlations. However, with the
advent of the IDR, it will be possible for a biomedical investigator to gain access to this
comprehensive data and very quickly establish the relevance of a hypothesis or a
biomedical research idea.[1]
b. Quick interaction with the IDR will help the researcher refine inclusion/exclusion criteria
during cohort discovery.
c. Correlations between data elements do not by themselves prove anything. However, they can be
used to bolster the case that a specific line of inquiry may have merit. That support will
typically be used to argue for grant funding and IRB approval.
d. During this process, access to custom user interfaces (UIs) for Time Domain Query,
Genetic Epidemiology, etc. (depending on the researcher’s domain) to refine these
speculative hypotheses would be extremely useful.
e. Prior to the conduct of a study, and during the writing of the clinical protocol (a
standard operating procedure (SOP)), a study design phase is typically conducted.
As a general-purpose data mining tool, the IDR is extremely helpful for running large numbers
of ad-hoc queries to aid in study design. During the study design phase the
researcher may also require access to a statistical software package such as SAS,
Stata or SPSS; secure access can therefore also be provided for data extraction and analysis
in the pre-research phase.
f. Once the study has been designed, a cohort of patients must be identified. The IDR would be
used to generate that list. This functionality is expected to save a large
amount of money and time during the conduct of a clinical trial or other biomedical
research. However, these de-identified cohorts will typically be considered a form
of Intellectual Property (IP), which should not be shared with commercial entities without
prior consent from the Technology Transfer (Data Warehouse) Office.[1]
g. The researcher provides the documentation issued by the IRB to the Business Analyst
associated with the IDR as proof of sufficient access privileges. With IRB approval for
access to PHI, an identified set of patients will be obtained and the process of
recruiting patients into the trial will begin. The IDR will provide the researcher with
a more accurate list of patients.[2] By supplying the researcher with PHI, we are
assuming some of the responsibility for maintaining a secure environment in which that
researcher may safely conduct patient recruitment.
h. Physicians who contact their patients about study participation must be willing to follow the
prescribed clinical protocol and must sign an Investigator Approval form (a contract
that usually contains a non-disclosure agreement, patent protection and a release of
liability), and they must also be supplied with the necessary training, documentation and
forms (such as a case report form).
i. All signed Investigator Approval contracts must be securely stored (as they may be
considered a form of PHI).
[1] Please note that even access to de-identified data by the investigator may require a waiver from the IRB. Some
sites have allowed unrestricted access to the IDR by all faculty; however, such practices may not typically be
acceptable to the IRB for such a large and powerful warehouse of information.
[2] Although the process of recruitment is not expected to be altered by the IDR, the fact that the PHI is
derived from the IDR will have a significant impact on the researcher’s interaction with these systems. For
example, a medical center will provide the information requested under IRB approval, but the establishment of
such a large warehouse of information will require very secure handling of that PHI, both within the IDR itself
and, most importantly, by the researcher to whom the IDR supplies that data.
Figure 1. The Current State of Biomedical Research Flow
Medical Research Idea; Medical Knowledge
Data Discovery Phase: Chart Review; clinical disparate electronic systems; other data sources
Data exists? NO: END. YES: Study/Cohort Design; IRB Consent Form
IRB / Other Approval? NO: END. YES: continue
Data Extraction Phase: other data sources; case report; patient interview; electronic clinical system; chart review
Analysis (Study Development) Phase: data analysis; data integration; presentation
IRB Closeout; Study ends
3. Current and Projected Data Collection and Processing Stages & Anticipated Challenges
There are five main methods of collecting clinical practice and biomedical research
data in common use today.
3.1 Clinical Data Collection Methods:
a. Data entry of the information from a paper source
b. Data extracted from individual electronic clinical systems such as electronic medical
records (EMR) or computerized physician order entry (CPOE)
c. Data from a 3rd party such as a lab service
d. Capture of data from electronic equipment present at the investigator site
e. Direct electronic data entry by a patient (a patient diary).
3.2 Types of data used in clinical and biomedical research:
a. Case Report Form Data – data entered manually by the patient or by the
clinician/investigator’s staff into a predefined form
1. Blank – a “blank” is a case report form (CRF) with all of its associated instructions
and field constraints.
2. eCRF – a portable document format (PDF) transform of the data entry screens
used to enter the CRF
3. Archival eCRF – a PDF and an associated computer audit trail that shows who has
modified the data, when, and what the old values were before they were altered.
4. Sub-CRF – a CRF that was sent to the FDA because the patient died, had a serious
adverse event or withdrew from the study.
5. CRF Annotation – a document that describes the association between the fields in
a tabular report and the fields on the CRF where the data were first entered.
b. Patient Recorded Diaries (a kind of ePRO)
c. Electronic Data Sources
1. Data collected or compiled in clinical / research / public health systems.
2. Information collected by the patient personally (patient portals)
3. Data that is entered directly into the computer system, such as EMR without first
writing it down onto paper
4. Data that is first entered on paper and then electronically converted to a computer
format, either through scanning or manual entry.
d. Source Data Employed Frequently in Biomedical Research
1. Documents
2. Hospital records
3. Clinical patient charts
4. Lab notes
5. Informal memoranda
6. Patient diaries
7. Evaluation checklists
8. Pharmacy dispensing info
9. Data from automated systems
10. Electronic representations of paper records (that have been verified against the
source paper and later certified)
11. Photographs
12. X-rays
13. Subject records within the pharmacy
14. Medical department notes from clinical environments
15. Hospital specimen banks, e.g., pathology department, blood banks
16. Local, state or federal public health reports and data banks, e.g., cancer registry,
death index
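The audit trail that distinguishes an archival eCRF (who modified a field, when, and what the old value was) can be pictured as an append-only log. The Python sketch below illustrates that idea only; the class and field names (AuditEntry, AuditedCRF, systolic_bp) are hypothetical and do not describe any particular electronic data capture system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class AuditEntry:
    """One immutable record in a hypothetical eCRF audit trail."""
    field: str
    old_value: Optional[Any]
    new_value: Any
    modified_by: str
    modified_at: datetime


class AuditedCRF:
    """Hypothetical CRF whose every field change is appended to an audit trail."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._trail: List[AuditEntry] = []

    def set_field(self, field: str, value: Any, user: str) -> None:
        # Record who changed the field, when, and what the previous value was.
        entry = AuditEntry(
            field=field,
            old_value=self._values.get(field),
            new_value=value,
            modified_by=user,
            modified_at=datetime.now(timezone.utc),
        )
        self._trail.append(entry)  # append-only: existing entries are never edited
        self._values[field] = value

    def value(self, field: str) -> Any:
        return self._values.get(field)

    def history(self, field: str) -> List[AuditEntry]:
        return [e for e in self._trail if e.field == field]


crf = AuditedCRF()
crf.set_field("systolic_bp", 150, user="coordinator_a")
crf.set_field("systolic_bp", 145, user="investigator_b")  # a later correction
for e in crf.history("systolic_bp"):
    print(e.modified_by, e.old_value, "->", e.new_value)
```

The current value and the full change history are kept separately, so a PDF rendering of the form can always be paired with the trail that explains how it reached its final state.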
3.3 Type of Research, Business Processes and the Vision for IDR
a. Biomedical Research
Biomedical research encompasses four distinct research arenas: 1) clinical
research, 2) population research, 3) animal research and 4) bench research. Research in all
four arenas is fundamentally necessary to reach the common goal of improving individual
and population health. A researcher may draw on more than one research arena for a specific
research project. Because the IDR brings these data together in one environment, it can help
the researcher streamline the data gathering and analysis process.
Figure 2 illustrates that the work and business flows in the four research arenas outlined above
are very similar.
Figure 3 presents a simplified, generic flow of the transactions a biomedical researcher
needs to undertake, from the pre-research phase to the post-research phase. The researcher’s
possible interactions with the IDR are also illustrated.
Use Case Examples for Biomedical Research
The use case examples below illustrate typical requests and needs of biomedical researchers in
the four research arenas described above, and the required interactions with the IDR.
Use Cases for Clinical Research (CR)
CR-Study 1:
Title: Specific Toxicities, Adverse Events, and Hypoglycemia of Various Diabetes Therapies
Description: The purpose of this retrospective clinical study is to investigate how well various
diabetes therapies control blood sugar and to quantify blood sugar levels for each drug. The study will
further explore specific clinical outcomes and adverse events associated with diabetes therapies,
such as heart failure, renal disease, liver dysfunction, polyps, and weight gain.
Assumption: A sufficient amount of patient information is available in the clinical database(s) to
perform a power analysis and come up with conclusive evidence to support the hypotheses.
Data Plan: To aggregate patients with diabetes, based on one unique patient identifier (i.e. MRN),
into various defined cohorts, including aggregations based on combinations of number, frequency
and duration of diagnosis codes, relevant lab parameters, and the intensity and duration of
medications.
Research Requirements: Sufficiently complete data are available in the IDR to perform the primary and
associated secondary analyses. These data should include not only effect modifiers and outcomes but
also elements such as co-morbidities and demographics that could be used as stratifiers to adjust for
confounding.
Conclusion: The extracted and aggregated dataset is suitable for the proposed analysis. The dataset
needs to be exported into a structured format that can be loaded into the analytical
software.*
Alternative Conclusion: The aggregated dataset is not adequate and suitable for the proposed
analysis. The principal investigator may need to redefine the data search before a new dataset can
be pulled and aggregated.
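The CR-Study 1 data plan (aggregating patients by MRN into cohorts defined by the number and duration of diagnosis codes) can be sketched in Python as below. This is a minimal illustration, not the IDR's actual query logic: the row layout, the cohort labels and both thresholds (at least 2 diagnoses spanning at least 30 days) are assumptions chosen for the example; only the ICD-9-CM category 250 (diabetes mellitus) is a real code.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical extract rows: (MRN, ICD-9-CM code, encounter date).
Diagnosis = Tuple[str, str, date]

DIABETES_PREFIX = "250"  # ICD-9-CM 250.xx = diabetes mellitus
MIN_DIAGNOSES = 2        # assumed cohort threshold
MIN_SPAN_DAYS = 30       # assumed minimum days between first and last diagnosis


@dataclass
class PatientSummary:
    mrn: str
    n_diagnoses: int
    span_days: int


def summarize(rows: List[Diagnosis]) -> Dict[str, PatientSummary]:
    """Aggregate diabetes diagnosis rows per MRN: count and duration."""
    dates_by_mrn: Dict[str, List[date]] = defaultdict(list)
    for mrn, code, when in rows:
        if code.startswith(DIABETES_PREFIX):
            dates_by_mrn[mrn].append(when)
    return {
        mrn: PatientSummary(mrn, len(ds), (max(ds) - min(ds)).days)
        for mrn, ds in dates_by_mrn.items()
    }


def assign_cohort(s: PatientSummary) -> str:
    """Place a patient in a hypothetical cohort by number and duration of codes."""
    if s.n_diagnoses >= MIN_DIAGNOSES and s.span_days >= MIN_SPAN_DAYS:
        return "established"
    return "single_or_recent"


rows = [
    ("MRN001", "250.00", date(2007, 1, 10)),
    ("MRN001", "250.02", date(2007, 6, 2)),
    ("MRN002", "250.00", date(2008, 3, 5)),
    ("MRN003", "401.9", date(2008, 4, 1)),  # hypertension: not counted
]
cohorts = {mrn: assign_cohort(s) for mrn, s in summarize(rows).items()}
print(cohorts)
```

Lab parameters and medication intensity could be added as further per-MRN aggregates in the same way before cohort assignment.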
CR-Study 2:
Title: The Impact of Avandia® and Other Similar Glycemic Controlling Medications on Cases of
Myocardial Infarction (MI)
Description: This is a comparative study to determine whether the use of different types of
glycemic-controlling drugs, such as Avandia, which is frequently used to control diabetes, has an
impact on cases of MI. A clinical investigator would like to search the available clinical data and
compare the effect of Avandia and similar drugs on the occurrence of myocardial infarction (MI).
Assumption: Electronic health record data, clinical trials data and medication data are available in
the IDR.
Data Plan: The Clinical Investigator submits the request specifying all drug and disease outcome
codes under investigation, plus any additional data points needed for the planned aggregated data
analyses.
Research Requirements: All approvals are in place. Sufficiently complete medication data and
accurate disease status data are available in the IDR to perform the comparative analyses, including the
availability of de-identified demographic information and co-morbidities for stratification and
adjustment purposes. The extracted data need to be converted and delivered in a format specified
by the investigator.*
Conclusion: The Clinical Investigator obtains an aggregated, de-identified data set, drawn from clinical
trials and any patient encounters, on the use of Avandia and similar medications and any associated
occurrence of MI. The data set is composed of variables from the IDR which are derived from electronic
health record data, clinical trials data and medication data.
Alternate Conclusion: Some of the requested therapeutic data are either unavailable or incomplete.
Based on the specific output results, the investigator re-assesses and re-defines the original scientific
research plan.
CR-Study 3:
Title: A Randomized Clinical Trial to Compare the Therapeutic Effects of Combination Drugs and
Singly Dispensed Drugs in Post-Coronary Angioplasty Patients
Description: The intent of this study is to compare differences in therapeutic delivery and effects among
post-coronary angioplasty patients with high cholesterol and high blood pressure who are randomly
allocated to 1) the regularly prescribed combination drug Caduet®, 2) Norvasc® and Lipitor® dispensed
separately, or 3) generic amlodipine and atorvastatin.
Assumption: Hospital data for patients who have just undergone coronary angioplasty are readily
identifiable through the IDR.
Data Plan: Over a 12-month period, patients who are either scheduled for or have just undergone
coronary angioplasty will be identified through the IDR, and the requested patient information will be
forwarded to the investigator. Data to be received should include date and time of surgery, patient
identifiers such as DOB, contact information, surgeon name and other specified demographic data.
Research Requirements: Date and surgery information is readily identifiable and extractable from the
IDR prior to, or immediately after, the coronary angioplasty procedure. The clinical researcher
submits the specified patient data request with proof of all required approvals. The data will be
delivered to the investigator in the pre-specified form.*
Conclusion: The specified data are extracted and merged, and can be made available to the researcher.
Alternate Conclusion 1: The researcher was not able to provide all the required documentation. The
data extraction will be put on hold until the researcher has been able to provide required
information.
Alternate Conclusion 2: The requested information cannot be pulled within the prescribed time frame.
Refer this problem back to the investigator.
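The first alternate conclusion for CR-Study 3 (extraction on hold until the researcher supplies the required documentation) amounts to a simple gate on each data request. A minimal Python sketch, assuming hypothetical document names; the documents actually required are set by the IRB and institutional policy, not by this code.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical document names; real requirements come from the IRB and local policy.
REQUIRED_FOR_DEIDENTIFIED = {"irb_determination"}
REQUIRED_FOR_PHI = {"irb_approval", "phi_access_authorization"}


@dataclass
class DataRequest:
    request_id: str
    requests_phi: bool
    documents: Set[str] = field(default_factory=set)

    def missing_documents(self) -> Set[str]:
        """Required documents not yet supplied by the researcher."""
        required = REQUIRED_FOR_PHI if self.requests_phi else REQUIRED_FOR_DEIDENTIFIED
        return required - self.documents

    def status(self) -> str:
        # Extraction stays on hold until every required document is on file.
        return "on_hold" if self.missing_documents() else "ready_for_extraction"


request = DataRequest("CR-Study-3", requests_phi=True, documents={"irb_approval"})
print(request.status(), sorted(request.missing_documents()))
request.documents.add("phi_access_authorization")
print(request.status())
```

Reporting the missing documents, rather than only a yes/no status, gives the Business Analyst something concrete to send back to the researcher.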
Use Cases for Population Research (PR)
PR-Study 1:
Title: Prenatal Care, Delivery Procedure, Length of Post-Partum Stay, Health Insurance, and
Demographic Characteristics at Time of Delivery in Three In-system Hospitals
Description: This de-identified retrospective observational study will compare prenatal care
frequency, type of delivery procedure, duration of post-partum hospital stay, demographic and
health insurance characteristics for all women that have delivered in three in-system hospitals in the
past 5 years.
Assumption: Information on prenatal care frequency, complete delivery information, health insurance
and demographics is available in the IDR for all women who have delivered at the three in-system
hospital sites.
Data Plan: De-identified but individual-level information on prenatal care frequency, delivery method,
duration of post-partum hospital stay, health insurance, and demographic characteristics needs to be
extracted and merged from clinical databases available in the system’s IDR. The merged and
de-identified data will be analyzed by the investigator, controlling for influential confounders.
Research Requirements: Prenatal visit frequencies and complete delivery (discharge) and insurance
information for patients are available from clinic charts in IDR for the three in-system hospitals for
the last 5 years. The data can be extracted and merged into a specified dataset type.*
Conclusion: All requested 5-year data for the 3 in-system hospitals exist and are extractable into
one data file to be analyzed by the investigator.
Alternate Conclusion: Prenatal visit history is not available for many of the women and only partial
5-year data exists for one of the in-system hospitals, but complete 3-year data exists for all
hospitals under study. Refer back to the investigator and wait for further directives.
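The feasibility check behind the two PR-Study 1 conclusions (is 5-year data complete for all three hospitals, and if not, what shorter window is complete for every hospital?) can be sketched as follows. The hospital names, years and per-year coverage sets are hypothetical placeholders; a real check would derive coverage from the IDR extract itself.

```python
from typing import Dict, List, Set


def complete_hospitals(coverage: Dict[str, Set[int]], years: List[int]) -> List[str]:
    """Hospitals whose extract has delivery records for every requested year."""
    wanted = set(years)
    return sorted(h for h, ys in coverage.items() if wanted <= ys)


def common_recent_window(coverage: Dict[str, Set[int]], last_year: int, max_years: int) -> int:
    """Largest k <= max_years such that every hospital has data for each of
    the k years ending in last_year."""
    k = 0
    while k < max_years and all((last_year - k) in ys for ys in coverage.values()):
        k += 1
    return k


# Hypothetical per-hospital coverage: years with delivery records in the IDR.
coverage = {
    "Hospital A": {2003, 2004, 2005, 2006, 2007},
    "Hospital B": {2003, 2004, 2005, 2006, 2007},
    "Hospital C": {2005, 2006, 2007},
}
years = list(range(2003, 2008))  # the "past 5 years"
print(complete_hospitals(coverage, years))
print(common_recent_window(coverage, 2007, 5))
```

Here only two hospitals have full 5-year data while all three share a complete 3-year window, which is the situation described in the alternate conclusion.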
PR-Study 2:
Title: Do Age and Race/Ethnicity Matter in 3-Yr. Outcomes of Different Types of Implemented
Prostate Cancer Therapy on Quality of Life?
Description: This is a prospective observational study to investigate how prostate cancer
therapeutics impact quality of life in regard to age and race/ethnicity. Over a 12-month period, 250
men newly diagnosed with and treated for prostate cancer will be recruited into this quality of life
study and followed up for 36 months after the initial treatment procedure. Required data for initial
recruitment will include Early Case Ascertainment (ECA) data from the regional or hospital cancer
registry, which will include relevant physician and patient information needed to recruit prostate
cancer patients.
Assumption: ECA of prostate cancer and relevant physician and pathology data is captured and
available on IDR.
Data Plan: Physician and patient information and pathology confirmation for newly diagnosed prostate
cancer patients located in the IDR are merged and forwarded to the investigator on a continuous basis for
the 12-month recruitment period, or for a shorter period if the target recruitment number is
reached sooner.
Research Requirements: The investigator has full IRB study approval; early identification of prostate
cancer cases, contact information on diagnosing physician and patient, and definitive laboratory
results are available and can be extracted from IDR and merged.
Conclusion: The required MD, patient and laboratory information can be extracted in a timely
manner and merged and forwarded on a regular basis to the investigator for the next 12 months.
Alternate Conclusion: ECA is not readily available at the research hospital, but only through the
State’s Cancer Registry. Whether access to the state cancer registry is permissible, and which data
elements are available if use is permitted, will need to be explored. Forward this information
to the investigator and await further directives.
Use Cases for Animal Research (AR)
AR-Study 1:
Title: Propagation of Human Brain Tumor Cell Lines in Mice and Other Rodents
Description: The intent of this de-identified research is to find the best rodent model in
which glioblastoma cells (brain tumor cells) can be propagated and maintained for further in vitro
research purposes. Freshly autopsied human glioblastoma cells will be identified through the
clinical pathology database available in the IDR. The available and appropriate pathology tissues will
then be retrieved and extracted. Glioblastoma cells will then be injected into different rodent
species (e.g., mice, rats, hamsters) and monitored, measured and evaluated over time for normal
growth, pathology, and animal and cell survival for future harvesting purposes.
Assumption: The clinical database available in IDR contains detailed reports of the type, date of
biopsy and location of banked pathological tissues, including biopsied pathology specimen from brain
tumor patients.
Data Plan: The investigator submits a detailed data request to locate newly autopsied, unaltered
pathological glioblastoma tissues using the IDR. The deliverable data should also include the date of
biopsy and the specific brain location biopsied.
Research Requirements: The investigator has all the required approvals. The extracted information
specifies where the requested pathological tissues are banked and also contains contact information
for the banking location.
Conclusion: The IDR can link specific pathology and tissue bank requests to existing collected
pathology and tissue samples and the location where the samples have been processed and stored.
Alternate Conclusion: The pathology department does not receive and store fresh biopsied cancer
tissue samples. However, newly obtained pathology tissue samples are recorded and kept in another
investigator’s laboratory. The IDR sends the original investigator this information so that the research
protocol can be modified according to any resulting agreement between the investigators.
AR-Study 2:
Title: Using a Mouse Model for the Evaluation of Pathogenesis and Immunity to Specific Influenza
Virus Strains Isolated from Humans
Description: The intent of this study is to evaluate growth, pathogenesis and immunity of specific
influenza virus strains isolated from humans in mouse models. Identified human viral isolates from
influenza infected patients will be inoculated intra-nasally into groups of homogeneous laboratory
mice and their course of illness and herd immunity to the illness will be documented and evaluated.
Assumption: Immediate (time-sensitive) information of human infectious disease occurrences and
laboratory documentation on infectious disease isolates are captured in the clinical database(s) and
available in IDR.
Data Plan: The IDR notifies the investigator when and where newly available influenza-infected tissues
or isolates matching the specification can be obtained, and who the diagnosing clinician was.
Research Requirements: The study is time-sensitive. The sampled virally infected tissues or isolates
must still be viable and intact to be prepared for inoculation into mice. The investigator (with all
approvals in place) will need to be notified through the IDR of who the diagnosing clinician was, and of
when and where the isolated and confirmed viral specimens are being stored.
Conclusion: The requested information is available in IDR within the set time frame and can be
forwarded to the investigator, who in turn receives the infectious tissue samples to prepare and
inoculate the laboratory mice.
Alternate Conclusion: The available information in IDR does not meet the time constraints necessary
for viable viral tissues to be harvested and prepared. This limitation needs to be explained to the
investigator and further research directives need to be awaited.
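The time-sensitive notification in AR-Study 2 can be pictured as a filter over newly recorded isolates. This Python sketch assumes a hypothetical record layout and a 48-hour viability window chosen only for illustration; the real window depends on specimen handling and laboratory protocol, and an operational system would run such a check repeatedly as new data arrive rather than once.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Assumed viability limit; the real value depends on specimen handling and lab protocol.
VIABILITY_WINDOW = timedelta(hours=48)


@dataclass
class Isolate:
    specimen_id: str
    organism: str
    collected_at: datetime
    diagnosing_clinician: str
    storage_location: str


def notifiable(isolates: List[Isolate], organism: str, now: datetime) -> List[Isolate]:
    """Isolates of the requested organism collected within the viability window."""
    return [
        i for i in isolates
        if i.organism == organism and timedelta(0) <= now - i.collected_at <= VIABILITY_WINDOW
    ]


def notification(i: Isolate) -> str:
    """Who, when and where: the details the investigator needs to act quickly."""
    return (f"{i.specimen_id}: collected {i.collected_at:%Y-%m-%d %H:%M}, "
            f"diagnosed by {i.diagnosing_clinician}, stored at {i.storage_location}")


now = datetime(2008, 1, 15, 12, 0)
isolates = [
    Isolate("V-101", "influenza A", datetime(2008, 1, 15, 2, 0), "Dr. A", "Virology freezer 2"),
    Isolate("V-102", "influenza A", datetime(2008, 1, 12, 9, 0), "Dr. B", "Virology freezer 1"),
    Isolate("V-103", "RSV", datetime(2008, 1, 15, 8, 0), "Dr. C", "Virology freezer 2"),
]
for i in notifiable(isolates, "influenza A", now):
    print(notification(i))
```

If no isolate passes the filter within the window, the situation matches the alternate conclusion and the investigator is told so instead.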
Use Cases for Bench Research (BR)
BR-Study 1:
Title: In vitro Development of a New and Powerful Multi-Drug Regimen to Treat Multidrug-Resistant
Tuberculosis (MDR-TB)
Description: The purpose of this multi-phased study is, first, to develop a powerful new multi-drug
regimen in vitro to treat MDR-TB. If successful, the next step will be to test this new drug regimen in
vivo (animal models). For this study, MDR Mycobacterium tuberculosis (mt) bacterial colonies
harvested from infected human biologic samples (e.g., sputum) are required. The investigator needs
specimen storage and access information, as well as detailed information on de-identified patient
treatment and specific bacterial drug susceptibility properties for the collected and banked biologic
samples, available in either clinical charts or State Health Department documentation.
Assumption: IDR has the capacity to store and make accessible this type of required information to
the investigator.
Data Plan: A clinic- or institution-wide record search for banked and available MDR-mt colonies
within the IDR will generate a list of labs (or locations) where the specified organisms are housed.
Specific treatment outcomes for each banked specimen are available as well. This information is
compiled and forwarded on to the investigator.*
Research Requirements: The investigator has all the appropriate approvals. The IDR contains all the
requested information, which can be shared with the investigator. The investigator has the appropriate
means to receive, store and maintain bacterial samples.
Conclusion: The requested search items are known and available in the IDR. The information is compiled
and sent to the investigator.
Alternate Conclusion: Not all of the required information is available through the IDR. However, a list of
local and state investigators who have studied, stored or maintained mt colonies, either in the past or
currently, is available through the IDR. Refer these results back to the initial investigator
and wait for further directives.
BR-Study 2:
Title: Comparison of Genetic Ancestry and Genetic Markers of Small Cell and Non-Small Cell Lung
Cancer Patients
Description: The scope of the study involves using de-identified frozen blood samples, collected
during another investigator’s population-based lung cancer study and banked at that investigator’s
laboratory. Since the specific lung cancer diagnoses of the study patients have been histologically
identified and documented, the present investigator requests only those blood samples that have
histologically been identified as either small-cell carcinoma or non-small cell carcinoma. Genetic
ancestry and marker studies will be carried out on the DNA components of the stored blood samples
and compared between the two different histologies.
Assumption: Diagnostic and histological information for the stored blood samples and laboratory
contact information is available on IDR and study and lab id numbers can be linked.
Data Plan: In the IDR, where the previous investigator’s data may be available, the study id number
and histology code of those patients who have been diagnosed and coded as either small cell or non-small
cell lung carcinoma will be extracted and linked to the lab database to identify the lab numbers
of the stored blood samples. The aggregated data set contains patient study id number, lab
number and histology code. Based on the lab id number, the correct frozen blood specimen can be
pulled and genetically analyzed.
Research Requirements: The investigator has all the appropriate approvals. The IDR contains the
previous investigator’s study outcome and laboratory information.
Conclusion: Study id and lab id are linkable and the histology code can be extracted from the study
data file. An aggregated list with three data points is procured for the investigator.
Alternate Conclusion: The study information, which contains patient study ids, lung cancer
diagnoses and histology codes, is available in the IDR. However, the laboratory data are not available in
the IDR. Study id numbers and histology codes for the patients under study can be pulled and
transmitted to the investigator. A short message informs the investigator that the lab data are not
available in the IDR. Await further directives.
* A data dictionary will be provided to define the variables for analysis.
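The linkage step in the BR-Study 2 data plan (select eligible histologies, join study ids to lab ids, and report any patients without a lab record, as in its alternate conclusion) can be sketched in Python as below. The histology codes SCLC and NSCLC and all identifiers are hypothetical placeholders, not the coding scheme of any actual registry.

```python
from typing import Dict, List, Tuple

# Hypothetical histology codes; a real extract would use the registry's coding scheme.
WANTED_HISTOLOGIES = {"SCLC", "NSCLC"}


def link_samples(
    study_rows: List[Tuple[str, str]], lab_links: Dict[str, str]
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Join (study id, histology) rows to lab ids.

    Returns the three-column aggregated list (study id, lab id, histology)
    and the study ids of eligible patients that have no linked lab id.
    """
    linked: List[Tuple[str, str, str]] = []
    unlinked: List[str] = []
    for study_id, histology in study_rows:
        if histology not in WANTED_HISTOLOGIES:
            continue  # only small cell / non-small cell cases are requested
        lab_id = lab_links.get(study_id)
        if lab_id is None:
            unlinked.append(study_id)
        else:
            linked.append((study_id, lab_id, histology))
    return linked, unlinked


study_rows = [("S01", "SCLC"), ("S02", "NSCLC"), ("S03", "OTHER"), ("S04", "NSCLC")]
lab_links = {"S01": "L9001", "S02": "L9002", "S03": "L9003"}
linked, unlinked = link_samples(study_rows, lab_links)
print(linked)
print(unlinked)
```

Returning the unlinked ids alongside the linked rows lets the same query support both the conclusion and the alternate conclusion without a second pass.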
b. Clinical Patient Care – “Bedside” Practice
The IDR is also expected to be used for another purpose. Clinicians who are administering
patient care at the bedside would be able to use the IDR in real time to pull up relevant
information regarding general patient characteristics and treatment profiles for specific
diseases and conditions.
Use Case for patient care-bedside practice:
Title: Local Real-time Antibiogram
Description: A clinician is examining a new inpatient with fever and other signs of infection localizing
to a specific body system. Cultures are drawn, but the clinician wants to initiate an empiric antibiotic
regimen that is consistent with the patient’s age, allergies and renal function, as well as with the typical
organisms associated with that system and the recent antibiotic resistance patterns specific to that part
of the hospital.
Cohort Requirement: Patients who are clinically and demographically similar to the current patient.
Data Requirement: To access current patient characteristics and basic decision support to avoid
presentation of regimens to which the patient is allergic, or at doses that are inconsistent with the
patient’s age and renal function; decision support to suggest possible infectious etiologies given
patient’s symptoms; access to comparable microbiology culture data on prior patients who had
similar presentations.
Conclusion: The generated dataset shows the final diagnoses of patients whose clinical presentations
were similar to the current patient’s, as well as the summarized results of those prior patients’
cultures (including organisms isolated and resistance patterns), the antibiotic regimens chosen for
them, and their overall clinical course. These data should be integrated with the current patient’s
clinical status so that antibiotic regimens that would be inappropriate for the current patient,
based on allergies or renal function, are not highlighted.
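The summarization step in this use case can be sketched in a few lines of Python. This is a minimal sketch, not the IDR's design: it assumes prior patients with similar presentations have already been selected and that their susceptibility results arrive as records of a hypothetical `CultureResult` type. It computes the percent of isolates susceptible per organism/antibiotic pair and drops antibiotics the current patient is allergic to; renal dosing checks are not modeled, and the `min_isolates` cutoff is an illustrative assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CultureResult:
    """One susceptibility result from a prior, similar patient's culture (hypothetical schema)."""
    organism: str
    antibiotic: str
    susceptible: bool


def local_antibiogram(results, allergies, min_isolates=1):
    """Percent of isolates susceptible, keyed by (organism, antibiotic).

    Antibiotics in `allergies` are dropped so they are never presented for the
    current patient; pairs tested fewer than `min_isolates` times are omitted.
    """
    tested = defaultdict(int)
    susceptible = defaultdict(int)
    for r in results:
        if r.antibiotic in allergies:
            continue
        key = (r.organism, r.antibiotic)
        tested[key] += 1
        susceptible[key] += r.susceptible  # True counts as 1, False as 0
    return {
        key: round(100.0 * susceptible[key] / n, 1)
        for key, n in tested.items()
        if n >= min_isolates
    }


prior = [
    CultureResult("E. coli", "ciprofloxacin", True),
    CultureResult("E. coli", "ciprofloxacin", False),
    CultureResult("E. coli", "ceftriaxone", True),
    CultureResult("E. coli", "penicillin", False),
]
print(local_antibiogram(prior, allergies={"penicillin"}))
# {('E. coli', 'ciprofloxacin'): 50.0, ('E. coli', 'ceftriaxone'): 100.0}
```

In practice a local antibiogram would use a much larger minimum isolate count than the default of 1 shown here, so that percentages are not reported from a handful of cultures.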
3.4 Potential Challenges of IDR Utilization to the Investigator
Several challenges and gaps in using the IDR are apparent to the investigator:
a. Data Accessibility – The investigator may know of other data sources pertinent to the
anticipated research that are not housed in the IDR.
b. Duplicate Data with Differing Values – Some data, such as date of birth or date of death,
may be captured in more than one database housed in the IDR, and the values may differ
between databases.
c. Data Collection Protocols – To write up their own protocol, an investigator may need to
know exactly how other IDR-housed data were collected.
d. Data Quality – The cleanliness, reliability, and interoperability of the data and data sources
included in the IDR need to be addressed and ascertained.
e. IRB Issues – All data residing in the IDR need to be backed by institution-specific IRB
approvals.
f. Data Ownership / Intellectual Property Issues – Data ownership issues need to be resolved
and documented for all IDR-available data.
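The duplicate-data challenge (b) can at least be surfaced mechanically before analysis. The Python sketch below is illustrative only: it assumes records from each source have already been linked to a common patient identifier (for example through the EMPI listed in the appendix), and the source names and field names are hypothetical. It reports every patient/field pair whose values disagree across sources instead of silently choosing one.

```python
from collections import defaultdict


def find_conflicts(records, fields=("date_of_birth", "date_of_death")):
    """Return {(patient_id, field): {source: value}} where sources disagree.

    Each record is a dict with 'patient_id', 'source', and the fields to compare.
    Assumes one record per patient per source; missing values (None) are ignored.
    """
    seen = defaultdict(dict)  # (patient_id, field) -> {source: value}
    for rec in records:
        for field in fields:
            value = rec.get(field)
            if value is not None:
                seen[(rec["patient_id"], field)][rec["source"]] = value
    return {
        key: by_source
        for key, by_source in seen.items()
        if len(set(by_source.values())) > 1
    }


records = [
    {"patient_id": "P1", "source": "registration", "date_of_birth": "1950-03-01"},
    {"patient_id": "P1", "source": "lab", "date_of_birth": "1950-01-03"},
    {"patient_id": "P2", "source": "registration", "date_of_birth": "1962-07-15"},
    {"patient_id": "P2", "source": "lab", "date_of_birth": "1962-07-15"},
]
print(find_conflicts(records))
# {('P1', 'date_of_birth'): {'registration': '1950-03-01', 'lab': '1950-01-03'}}
```

Reporting conflicts rather than resolving them leaves the choice of authoritative source to data governance, which is where challenges (b) and (d) would be addressed.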
4. Capturing User Requests
Figure 4. Research Data Request Lifecycle. Stages and annotations:
• Data Discovery Interface – Self Service User Interface (a web-based researchers’ portal)
• Request form – includes medical knowledge, a detailed description of the requestor’s data needs,
detailed specifications, all underlying assumptions, and end-user expectations
• Study Development – refine question; data request
• Identify Sources – what’s available?
• Preliminary Specification Document
• Data Extraction Phase – determine the format of the extract
o Includes: list of available data; list of data currently unavailable; data source(s) (system,
instrument, study procedure); Description of Study (DOS) document; vocabulary / data
dictionary maps
o Each report needs to contain a header page listing sources, query parameters, cautions,
disclaimers, etc.
o Report delivery option to the user (on demand, scheduled, self service, auto email, etc.)
• Query – refine query; review data
• Data Governance – document any data/system issues; what did we learn? what more do we need
to know?
• Post-Development Document – add data extraction & discovery notes (what we learned, what we
changed), an updated data request document (if needed), and display designs that employed
“reusable bits”
• Upon data request completion – report SQL code, reusable code bits, development notes, and any
lessons learned
• Study Conclusion Report – includes data used, data source(s), and study classification (a la
Kravtiz taxonomy)
Figure 4 illustrates the general architecture and flow of information in the IDR, from data discovery to
the conclusion of a study.
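The lifecycle in Figure 4 notes that each report needs a header page listing its sources, query parameters, cautions, and disclaimers, along with a delivery option. The Python sketch below shows one way such a header could be rendered as plain text. The `ReportHeader` structure, its field names, and the layout are hypothetical illustrations, not an IDR specification.

```python
from dataclasses import dataclass, field


@dataclass
class ReportHeader:
    """Header page fields named in the Figure 4 lifecycle (hypothetical structure)."""
    request_id: str
    sources: list[str]
    query_parameters: dict[str, str]
    cautions: list[str] = field(default_factory=list)
    disclaimers: list[str] = field(default_factory=list)
    delivery: str = "on demand"  # e.g. on demand, scheduled, self service, auto email


def render_header(h: ReportHeader) -> str:
    """Render the header page as plain text, one section per field."""
    lines = [f"IDR Report - Request {h.request_id}", f"Delivery: {h.delivery}", "Sources:"]
    lines += [f"  - {s}" for s in h.sources]
    lines.append("Query parameters:")
    lines += [f"  {k} = {v}" for k, v in sorted(h.query_parameters.items())]
    for title, items in (("Cautions", h.cautions), ("Disclaimers", h.disclaimers)):
        lines.append(f"{title}:")
        lines += [f"  - {item}" for item in items] or ["  (none)"]
    return "\n".join(lines)


header = ReportHeader(
    request_id="R-001",
    sources=["Lab results system"],
    query_parameters={"date_range": "2007-01-01..2007-12-31", "age_min": "18"},
    cautions=["Lab units may differ between sources"],
)
print(render_header(header))
```

Printing an explicit "(none)" for empty sections makes it clear that no cautions or disclaimers apply, rather than leaving the reader to wonder whether a section was omitted.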
4.1 The Data Discovery User Interface (UI)
The Data Discovery UI (Figure 5) will provide detailed views of the IDR-available data and allow the researcher to self-select the specific variables needed.
This screen will provide the following features:
a. Prompt the principal investigator (PI) to name a data discovery query and save it, or to recall a
saved query
b. Prompt the PI to search available databases, select specific data criteria, and find more
explanations or characteristics for specified data points
c. Prompt the PI to indicate which of the data sources are to be used
d. Give the PI an overview of the selected variable list
Figure 5. Data Discovery UI
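The Data Discovery UI features (a), (c), and (d) can be pictured as a small data model: a named query that records the selected variables and their sources, plus a store for saving and recalling queries by name. The Python sketch below is a hypothetical in-memory stand-in; `DiscoveryQuery`, `QueryStore`, and the variable names are assumptions for illustration, and database search (feature b) is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryQuery:
    """A named data discovery query (hypothetical structure)."""
    name: str
    sources: set[str] = field(default_factory=set)
    variables: dict[str, str] = field(default_factory=dict)  # variable -> source

    def add_variable(self, variable: str, source: str) -> None:
        """Select a variable and record which data source it comes from (feature c)."""
        self.sources.add(source)
        self.variables[variable] = source

    def overview(self) -> list[str]:
        """Summarize the selected variable list (feature d)."""
        return [f"{v} ({s})" for v, s in sorted(self.variables.items())]


class QueryStore:
    """In-memory stand-in for saving and recalling named queries (feature a)."""

    def __init__(self) -> None:
        self._saved: dict[str, DiscoveryQuery] = {}

    def save(self, query: DiscoveryQuery) -> None:
        self._saved[query.name] = query

    def recall(self, name: str) -> DiscoveryQuery:
        return self._saved[name]


store = QueryStore()
query = DiscoveryQuery("diabetes cohort")
query.add_variable("hba1c", "lab")
query.add_variable("age", "registration")
store.save(query)
print(store.recall("diabetes cohort").overview())  # ['age (registration)', 'hba1c (lab)']
```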
4.2 IDR Request User Interface
This screen will log user requests, capturing a new request or allowing the PI to access a saved query or
previous data request. It will record who is requesting IDR access and for what purpose.
The ‘IDR Request’ interface (Fig. 6) will provide the following features:
a. To prompt the PI to mark the request as ‘new’ with a relevant project title or to choose from the list
logged in previous encounters.
b. To prompt for the name and contact information of the PI or PI-designated contact person
(imported from User Background UI where applicable).
c. To prompt for a brief description of the proposed research.
d. To prompt for whether the study has already been IRB approved, including uploading of relevant
documents.
Figure 6. IDR Request UI
4.3 Data Request User Interface
The Data Request UI (Fig. 7) will summarize the anticipated research and specify the data criteria and
parameters in detail. This screen will:
a. Prompt the PI to select the research type and to select and upload the required IRB / IACUC (for animal
research) information and documentation
b. Prompt for the name and contact information of the PI or PI-designated requester (imported from User
Background UI where applicable).
c. Prompt for a detailed research plan or the research abstract to be uploaded
d. Prompt the PI to narrow the criteria search and designate variable parameters
e. Prompt the requester to choose the report format.
Figure 7. Data Request UI
4.4 Other Projected User Interfaces
Several other user interfaces will also be incorporated into the working schema of the IDR.
a. The Background User Information UI is where the PI initially enters his/her
professional/institutional affiliation and contact information.
b. The Mapping UI – here the PI can request a specified mapping of a requested or extraneous data
source.
c. The Formatting UI – here the PI can request specific available formatting.
d. The Data Status Request UI – here the PI can check the status of his/her specific data request
(based on Req. ID #) and retrieve the processed data set/query (from a defined drop-down list).
e. The BA-only UI is available only to the Business Analyst; it is where all relevant queries are
performed and the status of each request is kept up to date. Analysts may also enter comments
and lessons learned, as well as edit or look up existing SQL code.
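The Data Status Request UI and the BA-only UI both revolve around a request whose status is looked up by its Req. ID and kept up to date by the analyst. The Python sketch below models that as a small state machine. The status names are drawn loosely from the lifecycle in Figure 4, and both they and the allowed transitions are assumptions for illustration, not part of the IDR design.

```python
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    IN_REVIEW = "in review"
    EXTRACTING = "extracting"
    DELIVERED = "delivered"
    CLOSED = "closed"


# Hypothetical allowed transitions; review may send a request back to the PI.
TRANSITIONS = {
    Status.SUBMITTED: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.EXTRACTING, Status.SUBMITTED},
    Status.EXTRACTING: {Status.DELIVERED},
    Status.DELIVERED: {Status.CLOSED, Status.EXTRACTING},
    Status.CLOSED: set(),
}


class RequestTracker:
    """Tracks request status by Req. ID, with analyst comments (hypothetical)."""

    def __init__(self) -> None:
        self._status: dict[str, Status] = {}
        self._notes: dict[str, list[str]] = {}

    def submit(self, req_id: str) -> None:
        self._status[req_id] = Status.SUBMITTED
        self._notes[req_id] = []

    def advance(self, req_id: str, new: Status, note: str = "") -> None:
        """BA-only action: move a request to a new status, rejecting invalid transitions."""
        current = self._status[req_id]
        if new not in TRANSITIONS[current]:
            raise ValueError(f"{req_id}: cannot go from {current.value} to {new.value}")
        self._status[req_id] = new
        if note:
            self._notes[req_id].append(note)

    def status(self, req_id: str) -> str:
        """PI-facing lookup, as in the Data Status Request UI."""
        return self._status[req_id].value


tracker = RequestTracker()
tracker.submit("R-17")
tracker.advance("R-17", Status.IN_REVIEW, note="IRB approval on file")
print(tracker.status("R-17"))  # in review
```

Rejecting invalid transitions keeps the status shown to the PI consistent with what the analyst has actually done.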
5. Benefits of IDR
1. To allow multiple custom changes to the exclusion criteria within cohort discovery
2. Expedited cohort discovery process
3. Integrated data from all clinical systems and research data stores
4. other…
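Benefit 1, making multiple custom changes to exclusion criteria, amounts to keeping each criterion as a named, independently editable predicate and recomputing the cohort after every change. The Python sketch below shows this in memory; the patient fields (`age`, `egfr`, `dx`) and criterion names are hypothetical, and an actual IDR would presumably translate such criteria into warehouse queries rather than filter Python lists.

```python
from typing import Callable

Patient = dict
Criterion = Callable[[Patient], bool]


def discover_cohort(patients, inclusion, exclusion):
    """Return patients meeting every inclusion criterion and no exclusion criterion.

    `inclusion` and `exclusion` map a criterion name to a predicate, so a single
    criterion can be added, removed, or replaced without touching the others.
    """
    return [
        p for p in patients
        if all(rule(p) for rule in inclusion.values())
        and not any(rule(p) for rule in exclusion.values())
    ]


patients = [
    {"id": 1, "age": 70, "egfr": 25, "dx": {"pneumonia"}},
    {"id": 2, "age": 45, "egfr": 90, "dx": {"pneumonia"}},
    {"id": 3, "age": 60, "egfr": 80, "dx": {"uti"}},
]
inclusion: dict[str, Criterion] = {"pneumonia": lambda p: "pneumonia" in p["dx"]}
exclusion: dict[str, Criterion] = {"renal impairment": lambda p: p["egfr"] < 30}


def cohort_ids():
    return [p["id"] for p in discover_cohort(patients, inclusion, exclusion)]


print(cohort_ids())  # [2]
del exclusion["renal impairment"]  # relax one exclusion criterion
print(cohort_ids())  # [1, 2]
exclusion["under 50"] = lambda p: p["age"] < 50  # add a different one
print(cohort_ids())  # [1]
```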
6. Rules Based Ontology Mapping
Once a researcher has determined which data elements they require, a request for access to those data
must be approved by the IRB. Once access has been approved, the researcher will be given access to a
view of the requested IDR data.
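The "view of the IDR data requested" can be illustrated with an ordinary SQL view that exposes only the approved columns. The Python sketch below uses the standard-library `sqlite3` module and assumes the approved data elements map one-to-one to columns of a single table; the table, view, and column names are hypothetical. SQLite has no user permissions, so this shows only the column restriction; a production warehouse would additionally grant the researcher access to the view but not to the underlying table.

```python
import sqlite3


def create_approved_view(conn, view_name, table, approved_columns):
    """Create a view exposing only the approved columns of `table`.

    SQLite cannot bind identifiers as query parameters, so every name is
    checked to be a plain identifier and each column must exist in `table`.
    """
    for name in (view_name, table, *approved_columns):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    missing = set(approved_columns) - existing
    if missing:
        raise ValueError(f"columns not in {table}: {sorted(missing)}")
    cols = ", ".join(approved_columns)
    conn.execute(f"CREATE VIEW {view_name} AS SELECT {cols} FROM {table}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE encounters (mrn TEXT, patient_name TEXT, age INTEGER, dx TEXT)")
conn.execute("INSERT INTO encounters VALUES ('123', 'Doe, Jane', 54, 'pneumonia')")
create_approved_view(conn, "study42_view", "encounters", ["age", "dx"])
print(conn.execute("SELECT * FROM study42_view").fetchall())  # [(54, 'pneumonia')]
```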
The researcher will have access both to the direct source data and, when necessary, to translated
(ontology-mapped) data housed in the IDR. The translated data will have been produced by a rules-based
system and will be housed within “harvest tables” in the data warehouse.
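The rules-based translation into harvest tables can be sketched as an ordered list of mapping rules applied to source rows. This Python sketch is an assumption-laden illustration, not the IDR's mapping engine: the `MappingRule` type, prefix matching, the first-match-wins policy, and the concept name `serum_glucose` are all hypothetical. Each output row keeps the original source code beside the mapped concept, matching the point that researchers retain access to the direct source data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    """Map codes from one source system to a target ontology concept (hypothetical)."""
    source_system: str
    code_prefix: str
    target_concept: str


def harvest(rows, rules):
    """Translate (source_system, source_code) rows into harvest-table rows.

    Rules are tried in order and the first match wins. The source code is kept
    beside the mapped concept, so the direct source data stays visible; rows no
    rule matches get target_concept None for later review.
    """
    out = []
    for system, code in rows:
        target = next(
            (r.target_concept for r in rules
             if r.source_system == system and code.startswith(r.code_prefix)),
            None,
        )
        out.append({"source_system": system, "source_code": code, "target_concept": target})
    return out


rules = [
    MappingRule("lab_a", "GLU", "serum_glucose"),
    MappingRule("lab_b", "GLUC", "serum_glucose"),
]
rows = [("lab_a", "GLU-FASTING"), ("lab_b", "GLUC"), ("lab_b", "NA")]
for row in harvest(rows, rules):
    print(row)
```

Leaving unmatched rows in the output with an empty concept, rather than dropping them, makes gaps in the rule set visible to whoever maintains the mappings.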
6.1 Types of electronic clinical data that will typically be captured:
a. Data derived from other sources within the IDR.
b. Data entered by the researcher’s staff that may need to be joined with data within the
IDR.
c. Data that may typically flow into the IDR from a source CTMS environment.
d. Data obtained from a CRO (Contract Research Organization)
6.2 Some of the Roles
a. Sponsor – the organization that is sponsoring the researcher (The Researcher)
b. EDC System Owner – the organization (or company) that owns the electronic data
capture system
c. CRO – Contract Research Organization
d. Clinical Research Associate or Project Coordinator/Manager – oversees the clinical
research project and verifies the data collected against source documents.
e. Principal Investigator – the person who has IRB permission to conduct the research and
has overall responsibility for the study.
References:
Grimes D, Schulz K. An overview of clinical research: the lay of the land. The Lancet. 2002;359:57-61.
Clinical Trial Electronic Data Capture Task Group, PhRMA Biostatistics and Data Management Technical
Group. Electronic Clinical Data Capture. EDS Position Paper, Revision 1. May 2005.
Ash J, Anderson N, Tarczy-Hornoch P. People and Organizational Issues in Research Systems
Implementation. JAMIA. 2008;15.
APPENDIX
• IDR – Integrated Data Repository
• IS – Information Services
• ETL – Extract Transform and Load
• EMPI – Enterprise Master Patient Index
• HIPAA – The Health Insurance Portability and Accountability Act
o Enacted by the US Congress to establish, amongst other items, a
national standard for protecting the security and privacy of health
information. (http://www.hhs.gov/ocr/hipaa/)
• HL7 – Health Level 7
o Standard used for exchanging information among disparate IT
systems.
o “HL7 is an international community of healthcare subject matter
experts and information scientists collaborating to create standards for
the exchange, management and integration of electronic healthcare
information. HL7 promotes the use of such standards within and
among healthcare organizations to increase the effectiveness and
efficiency of healthcare delivery for the benefit of all.”
(http://www.hl7.org/)
• ODBC - Open Database Connectivity. (Reference: http://en.wikipedia.org/wiki/ODBC)
• VPN - Virtual Private Network (Reference: http://en.wikipedia.org/wiki/Virtual_private_network)
• i2b2 – Informatics for Integrating Biology and the Bedside
• CTMS - Clinical Trial Management System