Vineetha Appidi et al
Int J Engg Techsci Vol 5(3) 2014, 22 - 26
Development of a Data Warehouse for Cancer
Diagnosis and Treatment Decision Support
Vineetha Appidi#1, Dr Syed Umar#2, Sushma Vallamkonda#3
#1 Student, Department of Electronics and Computer Engineering, K L University, India.
#2 Assoc. Professor, ECM Dept., K L University, Vaddeswaram, Guntur, AP
#3 Student, Department of Electronics and Computer Engineering, K L University, India
[email protected],[email protected]
Abstract—A data warehouse is among the most reliable
technologies a company can use for planning, forecasting and
management. Critical business management data are often
held in several unrelated and disconnected databases, both
internally managed and from external sources, so clients are
unable to view the data from an integrated viewpoint. Data
warehousing is one of the best techniques for integrating such
data. This paper presents a Lymphoma (cancer) specific data
warehouse architecture for health care. It could be used by
the database administrator or executive manager, doctors,
nurses and other health care staff. A health care data
warehouse is especially important for integrating different
data formats from different data sources. All information
about patients, including their medical test reports, is stored
in the database; the executive manager accesses those data
and produces a report, and the doctor acts on that report.
Keywords—Data warehouse, cancer, health support
I. INTRODUCTION
There are relatively few institutions that have developed
clinical data warehouses containing patient data from the
point of care. Because of the variety of care practices, data
types and definitions, and the perceived incompleteness of
clinical information systems, the development of a clinical
data warehouse is a challenge [1]. The health care industry
is one of the world's largest and fastest growing sources of
information. In the health care industry, recorded data may
include doctors' names, patients' names, patient records,
individual patient pathology reports, physician order entries,
doctors' decision support data and medicines. Most health
care centers are still stand-alone: they do not communicate
with other health care centers and do not share their
documents with others. With our health care data warehouse
design, doctors can share patient records with others and
draw on the input of others when making decisions.
Most health care centers organize their patient records in
individual groups, but in our design groups are created based
on disease, i.e. patients with the same disease are in the same
group. This paper is based on Lymphoma (cancer), covering
the cost of treatment, treatment using drugs and vaccines,
risk factors and death rate.
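As a minimal sketch of the disease-based grouping described above, the following Python fragment groups patient records by diagnosed disease. The record fields (`patient_id`, `disease`) and the sample values are hypothetical placeholders, not data or structures from this project.

```python
from collections import defaultdict

# Hypothetical patient records; field names and values are illustrative only.
records = [
    {"patient_id": "P001", "disease": "Hodgkin's Lymphoma"},
    {"patient_id": "P002", "disease": "Non-Hodgkin's Lymphoma"},
    {"patient_id": "P003", "disease": "Hodgkin's Lymphoma"},
]


def group_by_disease(rows):
    """Return a dict mapping each disease to the patient ids that have it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["disease"]].append(row["patient_id"])
    return dict(groups)


groups = group_by_disease(records)
```

Keying the groups on disease rather than on the individual patient is what would let records from different health care centers land in the same group.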
IJETS | www.techsciencepub.com/ijets
Lymphoma is one of the diseases for which Clinical DSS
have been used for diagnosis and treatment recommendation,
with significant results. The Image Guided Decision
Support System developed by Siemens Corporate Research
is one such Clinical DSS; it assists pathologists in
discriminating among malignant lymphomas and chronic
lymphocytic leukemia directly from microscopic specimens
[1]. There are 30 to 40 types of Lymphoma that have
been discovered so far. The disease was first discovered
in the 1960s; however, its clinical management and support
have yet to be established completely, for the reasons
below:
1. It is not easy to do early detection since the causes of the
majority of Lymphomas are still unknown.
2. It is sometimes hard to diagnose Lymphoma because the
disease has imprecise symptoms and syndromes similar
to those of other medical problems (for example infections
and "Cat Scratch Fever") [2].
3. It is difficult to determine an optimal treatment for the
patient, because there are many uncertainties associated
with the selection of a treatment for Lymphoma.
Figure 1: Cancers that are increasing: 1975 – 2006
Source: SEER program, National Cancer Institute
Since the early 1970s, incidence rates for Lymphoma,
especially Non-Hodgkin's Lymphoma, have nearly doubled.
As shown in Figure 1, according to the “Cancer Trends
Progress Report – 2007” [3] published by the U.S. National
Cancer Institute, Lymphoma is one of the cancers on the rise
that require greater efforts at control. The report also
estimated that 71,380 new cases of Lymphoma would occur
in the year 2007, including 8,190 cases of Hodgkin’s
Lymphoma and 63,190 cases of Non-Hodgkin's Lymphoma.
Hence, the development of a clinical DSS and data
warehouse to make the diagnosis and treatment decision
making process for Lymphoma more precise and accurate
is very important and useful.
II. REVIEW
The concept of "data warehousing" arose in the mid-1980s
with the intention of supporting large-scale information
analysis and management reporting. A data warehouse was
defined as a “subject-oriented, integrated, time-variant and
non-volatile collection of data in support of management’s
decision making process” by W. H. Inmon, father of the data
warehouse, in 1990 [4]. Today, data warehouses are not
only deployed extensively in banking and finance, consumer
goods and retail distribution, and demand-based
manufacturing; they have also become a hot topic in the
non-commercial sector, mainly in medical fields,
government, military services, education and the research
community.
The growing application of clinical information systems
and electronic medical records (EMR) in the medical field
over the past few years has led to the evolution of clinical
data warehouses. Quite a few clinical data warehouses
currently exceed 150 terabytes in size. In research done by
Deloitte Healthcare College in 2006, as shown in Figure 2,
it was estimated, based on the yearly rate of clinical data
growth, that the first petabyte database (i.e. 1000 terabytes)
might appear by the year 2010 [5].
Figure 2: Rate of clinical data growth every year
Source: Deloitte Healthcare College 2006
A clinical data warehouse is normally built to validate
assumptions and to discover trends in large amounts of
patient data [6]. It contains not only alphanumeric
administrative data, but also images or signals such as
X-ray pictures, echography, electrocardiograms, etc. [7].
Torben Bach Pedersen and Christian S. Jensen also
identified in their “Research Issues in Clinical Data
Warehousing” that a clinical data warehouse needs to
support “complex-data modeling features, advanced
temporal support, advanced classification structures,
continuously valued data, dimensionally reduced data, and
the integration of very complex data” [8]. Hence, a clinical
data warehouse requires more advanced data modeling than
conventional multidimensional data warehousing
approaches.
III. DATA WAREHOUSE DEVELOPMENT
Like all other information system developments, the design
and implementation of a data warehouse follow
methodologies. There are numerous data warehouse
development frameworks and best practice methodologies;
the appropriate approach varies depending on the objectives
the warehouse is meant to achieve and the organization it
supports. Some adopt MIDEA, a multidimensional data
warehouse development methodology based on a
multidimensional data model [10]; some follow the
conventional Software Development Life Cycle model;
others apply a tailor-made, business-process-oriented
development strategy to specific data warehouse projects
[11]. This paper proposes a clear-cut, sequential five-stage
approach for the development of the Lymphoma-specific
data warehouse, as illustrated in Figure 4.
Figure 4: Data warehouse development methodology
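To make the strictly sequential nature of the five stages concrete, the following Python fragment is a minimal sketch that encodes the stage names used in this paper as an ordered enumeration. The helper functions (`next_stage`, `run_in_order`) and the handlers are hypothetical and only illustrate the ordering; they are not part of the proposed methodology.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The five stages of the methodology, in the order they are carried out."""
    BUSINESS_ANALYSIS = 1
    ARCHITECTURE_DESIGN = 2
    PHYSICAL_DEVELOPMENT = 3
    IMPLEMENTATION_AND_DEPLOYMENT = 4
    EVALUATION = 5


def next_stage(current):
    """Return the stage that follows `current`, or None after the last stage."""
    if current is Stage.EVALUATION:
        return None
    return Stage(current + 1)


def run_in_order(handlers):
    """Call one handler per stage in order; refuse to start if a stage lacks one."""
    missing = [stage.name for stage in Stage if stage not in handlers]
    if missing:
        raise ValueError(f"no handler for: {missing}")
    return [handlers[stage]() for stage in Stage]


# Hypothetical handlers that simply report which stage ran.
log = run_in_order({stage: (lambda stage=stage: stage.name) for stage in Stage})
```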
4.1 Business Analysis
In the business analysis stage, the following two aspects are
studied in detail to produce a high-level view of the desired
data warehouse, together with its goals and acceptance
criteria:
* Business process analysis – the existing process is studied
and analyzed.
* Business requirement analysis – to collect and understand
business requirements which state the business value of the
data warehouse and drive the architecture of the data
warehouse.
4.1.1 Business process analysis
The use case diagram shown in Figure 5 depicts a high-level
overview of the system functionality provided by a typical
Clinical DSS that uses a clinical data warehouse for
Lymphoma diagnosis [12].
Figure 5: Clinical DSS use case diagram
There are 4 actors in the process: Patient, Doctor,
Pathologist and Oncologist. The interactions between the
actors and activities are described in the following
paragraphs.
Activity 1: Seek consultation
The patient seeks consultation from a doctor when certain
prolonged symptoms are noticed, e.g. lumps in the neck,
armpits or groin, weight loss, fever, loss of appetite,
itchiness all over the body, excessive sweating, etc.
(Figure 6).
Figure 6: Activity 1 – Seek consultation
Activity 2: Perform diagnosis
Upon confirming the warning signs and symptoms of
Lymphoma, the doctor, together with the pathologist, will
perform a series of tests on the patient to determine the
type, stage and prognosis of the disease (Figure 7).
Figure 7: Activity 2 – Perform diagnosis
Activity 2.1: Determine the type of Lymphoma
First, the doctor will perform an excisional biopsy, taking a
tissue sample from the affected organs for further
examination. If other organs, e.g. skin, brain or stomach, are
affected, a biopsy from these organs is also required. Next,
the pathologist will inspect the physical appearance of the
biopsy sample under a microscope, or identify the special
molecules on the cancer cells using markers, in order to
determine the type of Lymphoma (Figure 8).
Figure 8: Activity 2.1: Determine type of Lymphoma
Activity 2.2: Determine the stage of Lymphoma
Once the exact type of Lymphoma is determined, the doctor
will proceed with a number of tests to see how advanced the
cancer is and how far it has spread. These tests include
blood tests, chest X-ray, bone marrow aspiration, PET scan,
CT scan, MRI scan of the chest/abdomen/pelvis, lumbar
puncture, etc. (Figure 8).
Activity 2.3: Determine the prognosis of Lymphoma
Next, the doctor will look into other factors to determine the
prognosis of the disease.
Activity 3: Propose treatment
When the above investigations are completed, the doctor
and oncologist will counsel the patient regarding the best
treatment options available, based on the type and the stage
of the disease and some prognostic factors (Figure 9).
There are four main types of treatment normally used to
treat Lymphomas:
* Chemotherapy – using drugs given as infusions into the
patient’s veins.
* Radiotherapy – using high energy rays over the affected
areas.
* Biological therapy or antibody therapy – using drugs like
Rituximab to target special molecules on the cancer
cells.
* Bone marrow or stem cell transplant – using high doses of
chemotherapy or radiation to kill all cancer cells while
saving the bone marrow with transplantation of marrow
or stem cells.
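One way to picture how the outputs of these activities could be held together is a single record per patient. The following Python fragment is a minimal sketch under that assumption; the class `DiagnosisRecord`, its field names and the rule that a proposal needs both type and stage are hypothetical simplifications, not the schema used in this project.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Treatment(Enum):
    """The four main treatment types listed in the text."""
    CHEMOTHERAPY = "chemotherapy"
    RADIOTHERAPY = "radiotherapy"
    BIOLOGICAL_THERAPY = "biological therapy"
    TRANSPLANT = "bone marrow or stem cell transplant"


@dataclass
class DiagnosisRecord:
    """One patient's pass through Activities 1 to 3 (all fields illustrative)."""
    patient_id: str
    symptoms: List[str]                                            # Activity 1
    lymphoma_type: Optional[str] = None                            # Activity 2.1
    stage: Optional[str] = None                                    # Activity 2.2
    prognostic_factors: List[str] = field(default_factory=list)   # Activity 2.3
    proposed: List[Treatment] = field(default_factory=list)       # Activity 3
    endorsed_by_patient: bool = False

    def ready_for_proposal(self):
        """Assumed rule: type and stage must both be known before Activity 3."""
        return self.lymphoma_type is not None and self.stage is not None

    def propose(self, options):
        """Record the proposed treatments, refusing if the diagnosis is incomplete."""
        if not self.ready_for_proposal():
            raise ValueError("diagnosis incomplete: type and stage are required")
        self.proposed = list(options)


# Placeholder values only; no clinical meaning is intended.
record = DiagnosisRecord("P001", ["lump in neck", "fever"])
ready_before = record.ready_for_proposal()
record.lymphoma_type = "placeholder type"
record.stage = "placeholder stage"
record.propose([Treatment.CHEMOTHERAPY])
```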
Besides proposing the treatment options to the patient, the
doctor also needs to explain the risks of the particular
treatment and the chance of recovery. Upon endorsement by
the patient, the doctor will schedule the treatments for the
patient.
Figure 9: Activity 3: Propose treatment of Lymphoma
4.2 Architecture Design
Figure 5 illustrates the overall architecture of the clinical
data warehouse. Data is imported from several sources and
transformed within a staging database before it is integrated
and stored in the production data warehouse for further
analysis.
4.3 Physical Development
This stage involves:
* Data warehouse creation – create the data warehouse using
the MySQL RDBMS based on the architecture designed.
* Data provisioning – extracting and loading data from
source into the data warehouse.
* Data cleansing – cleansing and transforming raw data to
ensure data quality and integrity.
4.4 Implementation and Deployment
Next, the cleansed and transformed data are migrated onto
an operational data warehouse.
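The flow from sources through a staging database into the production warehouse can be sketched as follows. This is a minimal illustration only: it uses Python's built-in `sqlite3` module as a stand-in for the MySQL RDBMS named in this paper, and the table names, columns, sample rows and the `cleanse` rule are all hypothetical.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from two unrelated sources.
source_a = [("P001", " Hodgkin's Lymphoma ", "2007-03-01")]
source_b = [
    ("P002", "Non-Hodgkin's Lymphoma", "2007-04-15"),
    ("P002", "Non-Hodgkin's Lymphoma", "2007-04-15"),  # duplicate row
    ("P003", "", "2007-05-02"),                        # missing diagnosis
]


def cleanse(row):
    """Trim and lower-case one raw row; return None if it cannot be used."""
    patient_id, diagnosis, visit_date = (value.strip() for value in row)
    if not patient_id or not diagnosis:
        return None
    return (patient_id, diagnosis.lower(), visit_date)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (patient_id TEXT, diagnosis TEXT, visit_date TEXT)")
conn.execute(
    "CREATE TABLE production (patient_id TEXT, diagnosis TEXT, visit_date TEXT, "
    "PRIMARY KEY (patient_id, visit_date))"
)

# 1. Import: load every raw row into staging unchanged.
conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", source_a + source_b)

# 2. Transform: read the staged rows back and cleanse them, dropping unusable rows.
raw = conn.execute("SELECT patient_id, diagnosis, visit_date FROM staging").fetchall()
cleaned = [row for row in map(cleanse, raw) if row is not None]

# 3. Integrate: INSERT OR IGNORE lets the primary key absorb duplicates.
conn.executemany("INSERT OR IGNORE INTO production VALUES (?, ?, ?)", cleaned)
conn.commit()

production_rows = conn.execute(
    "SELECT patient_id, diagnosis FROM production ORDER BY patient_id"
).fetchall()
```

Keeping the raw rows in staging untouched means the cleansing rules can be revised and re-run without going back to the sources.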
4.4.1 Data Warehouse creation
The clinical data warehouse is created in 4 steps:
1. Migration of database schema
A new schema called “cdw_production” is created. The
CREATE statements of all the finalized tables in the
staging schema are captured, modified and executed
in the production schema.
2. Verification of database schema
Next, the production data warehouse is refreshed and all
table structures are verified against the staging database
to ensure that the correct schema has been migrated.
3. Migration of production data
All data in the staging database are exported to
tab-delimited text files (.txt) and subsequently loaded
into the production data warehouse using the LOAD
DATA INFILE command.
4. Verification of production data
Finally, all migrated data are verified to ensure that they
meet certain “qualities” (i.e. Accuracy, Completeness,
Consistency, Timeliness and Validity) for data mining.
Reports are generated to compare production data
against the staging database.
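The four steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the project's actual scripts: Python's built-in `sqlite3` stands in for MySQL, a `csv` reader stands in for the LOAD DATA INFILE command, and the `patient` table, its columns and its rows are mocked up. The verification in step 4 only compares rows and does not cover the other listed qualities.

```python
import csv
import os
import sqlite3
import tempfile

staging = sqlite3.connect(":memory:")
production = sqlite3.connect(":memory:")

# Hypothetical staging table with two rows of mocked-up data.
staging.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, diagnosis TEXT)")
staging.executemany(
    "INSERT INTO patient VALUES (?, ?)",
    [("P001", "Hodgkin's Lymphoma"), ("P002", "Non-Hodgkin's Lymphoma")],
)


def table_ddl(conn):
    """Map each table name to its CREATE statement, read from SQLite's catalogue."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return dict(rows.fetchall())


# Step 1: capture the CREATE statements in staging and execute them in production.
for ddl in table_ddl(staging).values():
    production.execute(ddl)

# Step 2: verify that both schemas now hold the same table definitions.
schema_ok = table_ddl(staging) == table_ddl(production)

# Step 3: export to a tab-delimited text file, then load it into production.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "patient.txt")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter="\t").writerows(
            staging.execute("SELECT patient_id, diagnosis FROM patient")
        )
    with open(path, newline="", encoding="utf-8") as handle:
        production.executemany(
            "INSERT INTO patient VALUES (?, ?)", csv.reader(handle, delimiter="\t")
        )
production.commit()

# Step 4: compare production against staging row by row.
query = "SELECT patient_id, diagnosis FROM patient ORDER BY patient_id"
data_ok = staging.execute(query).fetchall() == production.execute(query).fetchall()
```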
4.4.2 Data Query
The ultimate objective of this stage is to put the data
warehouse into operation, whereby data can be queried
and fed into data mining software for processing and
reporting. The data should be able to be queried and
presented in a multidimensional view.
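A multidimensional view is commonly obtained by joining a fact table to its dimension tables and aggregating along several dimensions at once. The following Python fragment is a minimal sketch of that idea using `sqlite3` in place of MySQL; the star schema (`fact_diagnosis`, `dim_type`, `dim_stage`) and all values are hypothetical and are not the schema of the warehouse described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_type  (type_id  INTEGER PRIMARY KEY, type_name  TEXT);
CREATE TABLE dim_stage (stage_id INTEGER PRIMARY KEY, stage_name TEXT);
CREATE TABLE fact_diagnosis (
    patient_id TEXT,
    type_id    INTEGER REFERENCES dim_type(type_id),
    stage_id   INTEGER REFERENCES dim_stage(stage_id)
);
INSERT INTO dim_type  VALUES (1, 'Hodgkin'), (2, 'Non-Hodgkin');
INSERT INTO dim_stage VALUES (1, 'Stage I'), (2, 'Stage II');
INSERT INTO fact_diagnosis VALUES
    ('P001', 1, 1), ('P002', 2, 1), ('P003', 2, 2), ('P004', 2, 2);
""")

# Count diagnoses along two dimensions at once (type x stage).
rows = conn.execute("""
    SELECT t.type_name, s.stage_name, COUNT(*) AS patients
    FROM fact_diagnosis AS f
    JOIN dim_type  AS t ON t.type_id  = f.type_id
    JOIN dim_stage AS s ON s.stage_id = f.stage_id
    GROUP BY t.type_name, s.stage_name
    ORDER BY t.type_name, s.stage_name
""").fetchall()

# Pivot the flat result into a {type: {stage: count}} cube-like view.
cube = {}
for type_name, stage_name, patients in rows:
    cube.setdefault(type_name, {})[stage_name] = patients
```

The nested dictionary is only one possible presentation; a data mining or reporting tool would typically consume the flat `rows` result directly.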
4.5 Evaluation
The finalized production data will be queried and fed into
data mining software for benchmarking, reporting and
analysis. The data will be evaluated against acceptance
criteria such as applicability, novelty, understandability,
representativeness, provability, validity, etc. The evaluation
of the developed data warehouse, and how it would support
the data mining process, will be discussed further in a
separate paper.
Figure 5: Clinical data warehouse architecture
V. CONCLUSIONS
Several challenges were encountered during the data
warehouse development:
1. Data warehouse development requires specialized skills
that are very different from those of typical database
development.
2. Data cleansing plays the most critical role in data
warehouse development. It is time-consuming, yet it must be
handled thoroughly and with full effort to avoid garbage
in, garbage out.
3. In an ideal solution, data should be extracted from
physical medical files, such as patient medical records,
blood tests, urine test results, X-ray results, CT scan results,
etc., or retrieved directly from the operational medical
system. However, due to data security issues, real-world
medical data cannot be obtained easily. Hence, this project
uses data from the Internet or mocked-up data.
However, the proposed methodology and architecture
framework can be applied to an operational clinical data
warehouse development. Additionally, the structure of the
data warehouse is designed in such a flexible way that it can
be extended to other diseases.
Having said that, this paper has reached its goal of proposing
and developing a methodology and architecture for a
disease-specific clinical data warehouse. The potential
benefits of the developed data warehouse are:
1. The data warehouse can be used as a fundamental
building block of a Clinical DSS to improve the
quality of Lymphoma diagnosis and treatment
decision support.
2. The data warehouse can also be used with data mining
tools such as Integral Solutions’ Clementine, Thinking
Machines’ Darwin, Cognos’ Scenario, IBM’s Intelligent
Miner, SAS’ Enterprise Miner, Data Mind’s
DataCruncher, and open source tools such as MIDAS,
Tyson software, etc.
3. The structure of the data warehouse is designed in such a
flexible way that it can be extended to other diseases.
REFERENCES
[1] Dorin Comaniciu, Peter Meer, David J. Foran, Image-guided
decision support system for pathology. Machine Vision and
Applications, Report No. 11, August 16, 1999.
[2] Harvey Simon and David Zieve, Hodgkin's Lymphoma, In-Depth
From A.D.A.M. The New York Times. 25 June 2008.
[3] The “Cancer Trends Progress Report – 2007”, U.S. National Cancer
Institute. 2007.
[4] W. H. Inmon, R. D. Hackathorn, Using the Data Warehouse. New
York: John Wiley & Sons (1994).
[5] Sami Benmechiche, Carol Chouinard, Ross Christen, Richard
Kupcunas, Deepak Goyal, Ajit Kumar, “Clinical data is gold. Data
warehouse are Ft Knox”, Using IT to turn your data into a strategic
information asset, Deloitte, 2006.
[6] Anne Tchounikine, Maryvonne Miquel, André Flory, Information
Warehouse for Medical Research, Data Warehousing and
Knowledge Discovery, Volume 2114/2001, Springer Berlin/
Heidelberg, January 01, 2001.
[7] Anne-Muriel Arigon, Maryvonne Miquel, Anne Tchounikine,
Multimedia data warehouses: a multiversion model and a medical
application, Multimedia Tools and Applications, Springer
Netherlands, Volume 35, Number 1 / October, 2007.
[8] Torben Bach Pedersen, Christian S. Jensen, Research Issues in
Clinical Data Warehousing, Proceedings of SSDBM’98, July 1-3 1998.
[9] Laura Hadley, Developing a Data Warehouse Architecture, 2002.
[10] Jose Maria Cavero, Mario Piattini, Esperanza Marcos,
Multidimensional modeling using MIDEA, 5th WSES CSCC, 3rd
WSES MCP, 3rd WSES MCME International Conferences, 2001.
[11] Maris Klimavicus, Data warehouse development with EPC,
Proceedings of the 5th WSEAS Int. Conf. on DATA NETWORKS,
COMMUNICATIONS & COMPUTERS, Bucharest, Romania,
October 16-17, 2006.
[12] Detailed Guide: Lymphoma, Non-Hodgkin Type, How is
Non-Hodgkin's Lymphoma staged, American Cancer Society Inc.,
August 29, 2007. Retrieved from http://www.cancer.org/.
[13] Teh Ying Wah, Ng Hooi Peng, Ching Sue Hok, Development of
Specific Disease Data Warehouse for Developing Content from
General Guide for Hypertension Screening, Referral and Follow
Up, 7th WSEAS International Conference on APPLIED COMPUTER
SCIENCE, Venice, Italy, November 21-23, 2007.