Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Vineetha Appidi et al Int J Engg Techsci Vol 5(3) 2014, 22 - 26 Development of a Data Warehouse for Cancer Diagnosis andTreatment Decision Support Vineetha Appidi#1, Dr Syed Umar#2, Sushma Vallamkonda#3 #1 Student, Department of Electronics and computer Engineering, K L University, India. #2 Assc.Professor, ECM dept., K L University, Vaddeswaram, Guntur, AP #2 Student , Department of Electronics and computer Engineering, K L University, India [email protected],[email protected] Abstract—Data Warehouse is the most reliable technology used by the company for planning, forecastingand management. Critical business management data was contained in several unrelated anddisconnected databases, both internally managed and fromexternal sources. Client was unableto view the data from an integrated viewpoint. The data warehousing is one of the besttechnique to integrate data. This paper presents the Influenza (Flu) diseases specific datawarehouse architecture for health care. This could be used by the database administrator orexecutive manager, doctors, nurses, other staff members of the health care. Health care datawarehouse is mostly important to integrate different data format from different data source. Allinformation about patient including their medical test reports are store in the database, theexecutive manager needs to access those data and make a report. By seeing the report, thedoctor takes action. Keywords—Data warehouse, cancer, health support I. INTRODUCTION There are relatively few institutions that have developed clinical data warehouses, containingpatient data from the point of care. Because of the various care practices, data types anddefinitions, and the perceived incompleteness of clinical information systems, the development ofa clinical data warehouse is a challenge [1]. The world largest, fast growing and mostinformation’s are available in the health care industry. In health care industry data may berecorded as doctor’s name, patient’s name, patient’s record, individual patient pathology report,physician order entry, doctor’s decision support system, medicine. Most of the health care centersare still stand along, they are not communicating with other health care center, and they don’tshare their documents with others. But now in our design health care data warehouse, doctors can also share patient record to others, they can take decision from others. Most of the health carecenter design their patient record in individual group but in my design group are created based ondisease i.e. same disease parent are in the same group. This paper is based on Lymphoma (Cancer) disease, the cost of treatment, treatment using drug andvaccine, risk factor, death rate. Lymphoma is one of the diseases that uses ClinicalDSS fordiagnosis and treatment recommendation,and has achieved significant results. The ImageGuided Decision Support System developed bySiemens Corporate Research is one of those ClinicalDSS to assist pathologists in IJETS | www.techsciencepub.com/ijets discriminating amongmalignant lymphomas and chronic lymphocyticleukemia directly from microscopic specimens [1].There are 30 to 40 types of Lymphoma that have beendiscovered so far. The disease was first discovered inthe 1960s, however, its clinical management andsupport has yet to be established completely due tothe reasons below: 1. It is not easy to do early detection since thecauses of the majority of Lymphoma are stillunknown. 2. It is sometimes hard to diagnose Lymphomabecause the disease has imprecise symptoms andsimilar syndrome as other medical problems (forexample infections and "Cat Scratch Fever") [2]. 3. It is difficult to determine an optimal treatmentfor the patient, because there are manyuncertainties associated with the selection of atreatment for Lymphoma. Figure 1: Cancers that are increasing: 1975 – 2006Source: SEER program, National Cancer Institute Since the early 1970s, incident rates for Lymphomacancer, especially Non-Hodgkin's lymphoma, havenearly doubled. As shown in Figure 1, according tothe “Cancer Trends Progress Report – 2007” [3]published by U.S. National Cancer Institute,Lymphoma is one of the cancers on the rise andrequire greater efforts at control. The report had alsoestimated 71,380 new cases of Lymphoma will occur in 22 Vineetha Appidi et al the year 2007, including 8,190 cases of Hodgkin’sLymphoma and 63,190 cases of NonHodgkin'sLymphoma.Hence, the development of a clinical DSS and datawarehouse to improve the diagnosis and treatmentdecision making process of Lymphoma, so as make itto be more precise and accurate, is very important anduseful. II. REVIEW The concept of "data warehousing" arose in mid1980s with the intention to support huge informationanalysis and management reporting. Data warehousewas defined as a “subject-oriented, integrated, timevariantand non-volatile collection of data in supportof management’s decision making process” by W. HInmon, father of the data warehouse, in year 1990 [4].Today, data warehouses are not only deployedextensively in banking and finance, consumer goodsand retail distribution and demandbasedmanufacturing, it has also became a hot topic in noncommercial sector, mainly in medical fields,government, military services, education and researchcommunity etc. The growingly application of clinical informationsystem and electronic medical records (EMR) inmedical field in the past few years, has led to theevolution of clinical data warehouses.There are quite a few clinical data warehousescurrently exceed 150 terabytes in size. In a researchdone by Deloitte Healthcare College in year 2006, asshown in Figure 2, based on the rate of clinical datagrowth every year, It is estimated that the firstpetabyte database (i.e. 1000 terabytes) might appearby the year 2010 [5]. Figure 2: Rate of clinical data growth every yearSource: Deloitte Healthcare College 2006 Clinical data warehouse is normally built to validateassumptions and to discover trends on large amount ofpatient data [6]. It contains not only alphanumericadministrative data, but also images or signals such asX-ray pictures, echography, electrocardiogram, etc…[7]. Torben Bach Pedersen and Christian S. Jensen alsoidentified in their “Research Issues in Clinical DataWarehousing” that clinical data warehouse needs tosupport for “complex-data modeling features,advanced temporal support, advanced classificationstructures, IJETS | www.techsciencepub.com/ijets Int J Engg Techsci Vol 5(3) 2014, 22 - 26 continuously valued data, dimensionallyreduced data, and the integration of very complexdata” [8]. Hence, clinical data warehouse requiresadvanced data modeling than conventionalmultidimensional data warehousing approaches. III. DATA WAREHOUSE DEVELOPMENT Like all other Information system developments, thedesign and implementation of a data warehousedevelopment use methodologies. There are numerousdata warehouse development frameworks and bestpractice methodologies, the appropriate approach to adata warehouse development varies depending on theobjectives it desires to achieve and organization itsupports.Some adopt MIDEA, a multidimensional datawarehouse development methodology based on amultidimensional data model [10]; some followconventional Software Development Life Cyclemodel; others apply tailor-made business processoriented development strategy to specific datawarehouse projects [11]. This paper proposes a clearcutsequential 5 stages approach for the developmentof the Lymphoma specific data warehouse, asillustrated in Figure 4. Figure 4: Data warehouse development methodology 4.1 Business Analysis In the business analysis stage, the following twoaspects are studied in detail to produce a high-levelview of the desired data warehouse, together with itsgoals and acceptance criteria: * Business process analysis –the existing processis studied and analyzed. * Business requirement analysis – to collect andunderstand business requirements which state thebusiness value of the data warehouse and drivethe architecture of the data warehouse. 23 Vineetha Appidi et al 4.1.1 Business process analysis The use case diagram as shown in Figure 5 depicts ahigh level overview of system functionality providedby a typical Clinical DSS that uses a clinical datawarehouse for Lymphoma diagnosis [12]. Int J Engg Techsci Vol 5(3) 2014, 22 - 26 Activity 2.1: Determine the type of Lymphoma First, the doctor will perform an excisional biopsy, bytaking tissue sample from the affected organs forfurther examination. If other organs e.g. skin, brain,stomach are affected, a biopsy from these organs isalso required. Next, pathologist will inspect thephysical appearance of the biopsy sample under amicroscope, or identify the special molecules on thecancer cells using markets, that result to determiningthe type of Lymphoma (Figure 8). Figure5: Clinical DSS use case diagram There are 4 actors in the process; they are Patient,Doctor, Pathologist, and Oncologist. The interactionsbetween the actors and activities are demonstrated inthe following paragraphs. Activity 1: Seek consultation Patient seeks consultation from doctor when certainprolonged symptoms are noticed, e.g. lumps in neck,armpits, groin,weight loss, fever, loss of appetite,itchiness all over body,excessive sweating etc(Figure 6). Figure 8: Activity 2.1: Determine type of Lymphoma Figure 6: Activity 1 – Seek consultation Activity 2: Perform diagnosis Upon confirming the warning signs and symptoms ofLymphoma, the doctor together with pathologist willperform a series of tests on the patient to determinethe type, stage and prognosis of the disease (Figure 7). Activity 2.2: Determine the stage of LymphomaOnce the exact type of lymphoma is determined, thedoctor will proceed with a number of tests to see howadvanced the cancer is and how far it has spread.These tests include blood tests, chest x-ray, bonemarrow aspiration, PET scan, CT scan, MRI scan ofthe chest/abdomen/pelvis, lumbar puncture etc(Figure 8). Activity 2.3: Determine the prognosis of LymphomaNext, the doctor will look into other factors todetermine the prognosis of the disease. Activity 3: Propose treatment When the above investigations are completed, thedoctor and oncologist will counsel the patientregarding the best treatment options available, basedon the type and the stage of the disease and someprognostic factors (Figure 9). There are four main types of treatment normally usedto cure Lymphomas: * Chemotherapy – Using drugs as infusions intothe patient’s veins. * Radiotherapy – Using high energy rays over theaffected areas. * Biological therapy or antibiotic therapy – Usingdrugs like Rituximab to target special moleculeson the cancer cells. * Bone marrow or stem cell transplant – Usinghigh doses of chemotherapy or radiation to killall cancer cells while saving the bone marrowwith transplantation of marrow or stem cells. Figure 7: Activity 2 – Perform diagnosis IJETS | www.techsciencepub.com/ijets 24 Vineetha Appidi et al Besides proposing the treatment options to patient,the doctor also need to explain to the patient about therisks of taking the particular treatment and chance ofrecovery. Upon endorsement by the patient, thedoctor will schedule the treatments for the patient. Int J Engg Techsci Vol 5(3) 2014, 22 - 26 4.3 Physical Development This stage involves: * Data warehouse creation – create the datawarehouse using the MySQL RDBMS based onthe architecture designed. * Data provisioning – extracting and loading datafrom source into data warehouse. * Data cleansing – cleansing and transforming rawdata to ensure data quality and integrity. 4.4 Implementation and Deployment Next, the cleansed and transformed data are migratedonto an operational data warehouse. Figure 9: Activity 3: Propose treatment of Lymphoma 4.2 Architecture Design Figure 5 illustrates the overall architecture of theclinical data warehouse. Data is imported fromseveral sources and transformed within a stagingdatabase before it is integrated and stored in theproduction data warehouse for further analysis. 4.4.1 Data Warehouse creation The clinical data warehouse is created in 4 steps: 1. Migration of database schema A new schema called “cdw*production” is created.The CREATE statements of all the finalized tables inthe staging schema will be captured, modified andexecuted in the production schema. 2. Verification of database schema Next, the production data warehouse is refreshed andall table structures are verified against the stagingdatabase to ensure that the correct schema has beenmigrated. 3. Migration of production data All data in the staging database are exported to tabdelimited text files (.txt) and subsequently loaded toproduction data warehouse using the LOAD DATAINFILE command. 4. Verification of production data Finally, all migrated data are verified to ensure theyare of certain “qualities” (i.e. Accuracy,Completeness, Consistency, Timeliness, and Validity)for data mining. Reports are generated to compareproduction data against the staging database. 4.4.2 Data Query The ultimate objective of this stage is to put the datawarehouse into operation, whereby data can bequeried and feed into a data mining software forprocessing and reporting.The data should be able to be queried and presentedin a multidimensional view. 4.4 Evaluation The finalized production data will be queried andfeed into a data mining software for benchmarking,reporting and analysis. The data will be evaluatedagainst some acceptance criteria, such as applicability,novelty, understandability, representative, provability,validity etc. Research on how the evaluation of thedeveloped data warehouse and how it would supportthe data mining process will be further discussed in aseparate paper. V. CONCLUSIONS Several challenges were encountered during the datawarehouse development: 1. Data warehouse development requires specializedskills that are very different from a typicaldatabase development. 2. Data cleansing plays the most critical role in adata warehouse development. It is timeconsumingyet must be Figure 5: Clinical data warehouse architecture IJETS | www.techsciencepub.com/ijets 25 Vineetha Appidi et al handled thoroughly andwith full effort, to avoid no garbage in garbageout. 3. In an ideal solution, data should be extractedfrom physical medical files, such as patientmedical records, blood tests, urine test results, xray results, CT scan results etc, or retrieved directly from the operational medical system.However, due to data security issue, real worldmedical data cannot be seized easily. Hence, thisproject uses data from the Internet or mocked-updata. However, the proposed methodology andarchitecture framework can be applied to anoperational Clinical data warehouse development.Additionally, the structure of the data warehouseis designed in such a flexible way that it can beextended to other diseases. Having said that, this paper has reached it goals topropose and develop a methodology and architecturefor a specific disease clinical data warehouse. Thepotential benefits of the developed data warehouseare: 1. The data warehouse can be used as a fundamentalbuilding block of a Clinical DSS to improve thequality of Lymphoma diagnosis and treatmentdecision support. 2. The data warehouse can also be used with datamining tools such as Integral Solution’sClementine, Thinking Machines’ Darwin,Cognos’ Scenario, IBM’s Intelligent Miner, SAS’Enterprise Miner, Data Mind’s DataCrunche,open source tools such as MIDAS, Tysonsoftware etc. 3. The structure of the data warehouse is designedin such a flexible way that it can be extended toother diseases. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] Int J Engg Techsci Vol 5(3) 2014, 22 - 26 [13] Teh Ying Wah, Ng Hooi Peng, Ching Sue Hok,Development of Specific Disease DataWarehouse for Developing Content fromGeneral Guide for Hypertension Screening,Referral and Follow Up, 7th WSEASInternational Conference on APPLIEDCOMPUTER SCIENCE, Venice, Italy,November 21-23, 2007. REFERENCES Dorin Comaniciu1, Peter Meer, David J. Foran,Image-guided decision support system forpathology. Machine Vision and Applications,Report No. 11, August 16, 1999. Harvey Simon and David Zieve, Hodgkin'sLymphoma, In-Depth From A.D.A.M. The NewYork Time. 25 June 2008. The “Cancer Trends Progress Report – 2007”,U.S. National Cancer Institute. 2007. W. H Inmon, R. D.Hackethorn, Using the DataWarehouse. New York: John. Wiley & Sons(1994) Sami Benmechiche, CarolChouinard, Ross Christen, Richard Kupcunas,Deepak Goyal, Ajit Kumar, “Clinical data isgold. Data warehouse are Ft Knox”, Using IT toturn your data into a strategic information asset,Deloitte, 2006. Anne Tchounikine, MaryvonneMiquel, AndréFlory, Information Warehouse for MedicalResearch, Data Warehousing and KnowledgeDiscovery, Volume 2114/2001, Springer Berlin/ Heidelberg, January 01, 2001. Anne-Muriel Arigon1 , MaryvonneMiquel,Anne Tchounikine, Multimedia datawarehouses: a multiversion model and amedical application, Multimedia Tools andApplications, Springer Netherlands, Volume 35,Number 1 / October, 2007. Torben Bach Pedersen, Christian S. Jense,Research Issues in Clinical Data Warehousing,Proceedings of SSDBM’98, July 1-3 1998. Laura Hadley, Developing a Data WarehouseArchitecture, 2002. Jose Maria Cavero, Mario Piattini, EsperanzaMarcos, Multidimensional modeling usingMIDEA, 5th WSES CSCC, 3rd WSES MCP,3rd WSES MCME International Conferences,2001. Maris Klimavicus, Data warehousedevelopment with EPC, Proceedings of the 5thWSEAS Int. Conf. on DATA NETWORKS,COMMUNICATIONS & COMPUTERS,Bucharest, Romania, October 16-17, 2006. Detailed Guide: Lymphoma, Non-HodgkinType, How is NonHodgkins Lymphoma staged,American Cancer Society Inc., August 29, 2007Retrieved from http://www.cancer.org/. IJETS | www.techsciencepub.com/ijets 26