Knowledge Discovery from Data as a framework to decision support in medical domains
K. Gibert (1) and L. Salvador-Carulla (2)
(1) Department of Statistics and Operation Research, Knowledge Engineering and Machine Learning group, Universitat Politècnica de Catalunya, Barcelona
(2) PSICOST Scientific Research Association
Bridging Knowledge in long term care and support, Barcelona, March 5th, 2009
K. Gibert

Outline
1.- Introduction
2.- KDD
3.- Our Research
    – Ill-structured domains
    – Artificial Intelligence and Statistics
    – Development platform KLASS
4.- Methodological overview (in parallel)
    – Clustering based on rules (by States)
    – Class panel graph
    – Traffic lights panel
5.- Some real applications
    – DEFDEP project
    – Response to neurorehabilitation
    – Long-term quality of life perception
6.- Conclusions

Introduction
21st century: the Knowledge Society
– Great need of getting knowledge from data
    – Organizations
    – Natural, industrial or artificial phenomena
    – Support for complex decision-making processes
– Enormous quantities of data to analyze
    – Internet boom
    – New technologies
– Classical data analysis falls short
    – Too much data
    – Phenomena too complex
    – New approaches required

Data Mining and Knowledge Discovery
– Interdisciplinary problem
– “Non trivial identifying of valid, novel, potentially useful, ultimately understandable patterns in data” [Fayyad 96]
– Starting:
    – 1989: First Int’l Workshop on KDD in IJCAI
    – 1994: First proceedings
    – August 1995: First Int’l Conference on KDD
    – 1996: First state of the art (Fayyad et al.)

Data Mining and Knowledge Discovery
Knowledge Discovery System [Fayy96]:
– Problem definition
– Data collection
– Data cleaning and preprocessing
– Dimensionality reduction
– DM technique choice
– Data mining
– Interpretation and discovered knowledge production
Note: terminological ambiguity, Data Mining vs KDD

Data Mining and Knowledge Discovery
Knowledge Discovery System [Fayy96]: very ambitious goals

Data Mining and Knowledge Discovery
Banca d’Italia [1995]: built a KDD system for
– Daily update of the whole set of movements
– Deciding what and how to analyze
– Selecting relevant results
– Producing a daily 2-page synthesis (natural language)
– Daily support to the decision making of the main boss
Technological problems
– Millions of movements per day
– Time to transmit to the central server?
– Time to update the database?
– How to select and retrieve proper data to analyze from the DB?
– How to validate results and verify technical assumptions?
Methodological problems
– What is important to analyze today?
– Which is the proper Data Mining technique?
– Which are the relevant results?
– How to express results for the main boss?

Data Mining and Knowledge Discovery
Big supermarket chains
– Daily update of the data warehouse with the contents of customers' bills
– Decide what and how to analyze
– Select relevant results
    – What is bought most
    – Main associations between products: “Buying nappies and beer in supermarket on Saturday evening”
– Support decision making of
    – Buying department
    – Marketing department
– Important economic implications

Data Mining and Knowledge Discovery
Knowledge Discovery System [Fayy96]: very ambitious goals
– No complete system yet
– Connection to data warehouses
– Tools to assist preprocessing
– Collection of data mining techniques (AMD, NN, IR, AssR, Reg…)
– Some help on the reporting phase
– Manual process management and knowledge production

Data Mining and Knowledge Discovery
New paradigm proposed by Fayyad
– “Most previous work on KDD has focussed on [...] data mining step. However, the other steps are of considerable importance for the successful application of KDD in practice” [Fayyad 96]
– Include prior and posterior analysis in KDD
– Requires great efforts in real applications
– Especially in medical systems (uncertain, imprecise, multi-scaled, ...)
– Time consuming, difficult (no standard methodology established)
– Expert interaction required
– Domain-dependent?
– After good prior analysis, proper data mining is easy
Note: include interaction with the expert as part of the methodology itself

Data Mining and Knowledge Discovery
Knowledge Discovery System [Fayy96]: wide scope approach
– Also interesting to better know very complex small datasets
– Multidisciplinarity
– Combination or hybridization of techniques

Data Mining and Knowledge Discovery
[Figure: KDD applied to a complex domain. Data analysis, inductive learning, databases and visualisation feed a discovered knowledge base and a domain model. Uses: prediction, description, summary, overview. Properties of the discovered knowledge: understandability, validity, utility, simplicity/complexity, novelty.]

Our Research
– Applied approach (real domains)
– Ill-structured domains (ISD) [AIComm 94]

Ill-structured domains [AIComm94]
[Figure: a domain D described by heterogeneous data and partial knowledge. Example record: John, Weight 85, Height 1.85, Sex M, Eyes blue. Variables are numerical and categorical; additional knowledge on the domain structure is available.]

Our Research
– Applied approach (real domains)
– Ill-structured domains (ISD) [AIComm 94]
– Solving problems of knowledge discovery on ISD to support complex DECISION-MAKING
Note: multidisciplinary approach

Artificial Intelligence and Statistics
Interdisciplinary research field
– Starting:
    – 1985: Douglas Fisher and Bill Gale (AI&Stats Society)
    – 1986: First Int’l Conference on AI & Stats
– Main goals:
    – Promote communication between the AI and Statistics communities: “We feel that there is great potential for development at the intersection of Artificial Intelligence, Computational Science and Statistics” (Cheeseman and Oldford 94)
    – Improve research in problems common to both (Data Mining and Knowledge Discovery, ...)

Our Research
– Applied approach (real domains)
– Ill-structured domains (ISD) [Gibert 94]
– Solving problems of knowledge discovery on ISD
– Design of hybrid (mixed) methodologies in the AI & Stats field
– Building of hybrid systems mainly oriented to KDD
    – Using clustering as the main Data Mining tool: “a number of real applications in KDD either require a clustering process or can be reduced to it” [Nakhaeizadeh 98]
    – Focus on prior knowledge exploitation
    – Support for implicit knowledge elicitation
    – Focus on interpretation support tools
    – Post-processing of discovered knowledge
– Development platform KLASS [Gib91]
    – Integrates AI and Statistics methods, and pre- and post-processing tools, for KDD on ISD
[Overlay note, partly illegible: objects (patients) to homogeneous, distinguishable groups]

Methodological overview
– Data cleaning
– Relevant variable selection
– Prior knowledge acquisition
– Clustering based on rules (by States)
– Interpretation
    – Select number of classes
    – Class panel graph
    – Traffic lights panel
    – Experts' conceptualization
    – [More frequent trajectories diagram]
– Identification of profiles and labelling
– Description of profile characteristics
– Validation
Note: involve interaction with the experts

Some real applications
– DEFDEP project
– Response to neurorehabilitation
– Long-term quality of life perception

DEFDEP project
– New Spanish “Dependency Law” (LPAD 39/2006, 14th Dec): law for the promotion of personal autonomy and care for persons with dependency
– Spain: first Mediterranean country adopting dependency policies which include severe mental illness
– PRODEP (specially created in Catalunya to develop the model and support system for dependency)
    – Conselleria de Salut + ICATSS (Institut Català de Serveis Socials)
    – Particularities of the mentally disabled population regarding dependency
– DEFDEP project (led by Dr. L. Salvador Carulla)
    – Goal: propose a dependency model and a care system suited to the mentally disabled population (Severe Mental Disorders and Intellectual Disability)
    – Groups of experts, relevant institutions, family associations and a knowledge engineer
    – + data from 306 patients with SMI

Prior knowledge acquisition
R4 = {
r1: If the patient is institutionalized (INSTITUC = {EVOINST or INSTI}) then mark him as an institutionalized patient (i)
r2: If the patient has poor levels of functioning ((GAFCLA2 < 40) or (GAFSOA2 < 40)) and high need of family support in daily activities (MAXIMOA > 15) and recurrent behavioral problems (MAXIMOB = Every_Day) then the patient is in ill condition (m)
r3: If ECFOS was not evaluated because the patient is autonomous then mark him as autonomous (a)
r4: If the patient was not evaluated under ECFOS for lack of a carer and is not institutionalized then mark him as living alone (s)
r5: If the patient is able to work (INGRESE2 = TRABAJO) and is not functionally impaired ((GAFCLA2 > 70) or (GAFSOA2 > 70)) then mark him as a patient in good condition (b)
}

Clustering based on rules [AIComm 96]
1.- Use the KB to find the rules-induced partition of the initial data set (Class P, Class S, Class T, ..., residual class)
2.- Hierarchically cluster every rules-induced class and find the rules-induced prototypes
3.- Hierarchically cluster the new data set
4.- Retrieve the hierarchical structures of the rules-induced prototypes to obtain the new hierarchical tree
5.- Determine the final classification from the new hierarchical tree

ClBR dendrogram
[Figure: ClBR dendrogram of the 306 HSJD patients, with the rules-induced groups marked: Autonomous (u), Good (b), Single (o), Bad (m), Residential (i).]

ClBR dendrogram
[Figure: the same dendrogram cut into four classes (TMG-K): Autonomous-K (93), Alone-K (87), Dependent-K (105), Residential-K (9); 306 HSJD patients.]
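
The clustering-based-on-rules steps above can be illustrated with a minimal Python sketch. It is not the KLASS implementation. Only rules r1, r2 and r5 are coded, because the slides do not give the variable coding for r3 and r4. The patient records are invented, class centroids stand in for the roots of the per-class hierarchies, the new data set is assumed to be the prototypes plus the residual objects, and the aggregation criterion (distance between mass-weighted centroids) is an assumption of this sketch.

```python
from math import dist
from statistics import fmean

# Rules r1, r2 and r5 as quoted on the slide (r3 and r4 are omitted:
# their variable coding is not given in the slides).
RULES = {
    "i": lambda p: p["INSTITUC"] in {"EVOINST", "INSTI"},
    "m": lambda p: (p["GAFCLA2"] < 40 or p["GAFSOA2"] < 40)
    and p["MAXIMOA"] > 15 and p["MAXIMOB"] == "Every_Day",
    "b": lambda p: p["INGRESE2"] == "TRABAJO"
    and (p["GAFCLA2"] > 70 or p["GAFSOA2"] > 70),
}
NUMERIC = ("GAFCLA2", "GAFSOA2", "MAXIMOA")  # variables used for distances


def make(pid, inst, gafcla, gafsoa, maxa, maxb, income):
    return {"id": pid, "INSTITUC": inst, "GAFCLA2": gafcla, "GAFSOA2": gafsoa,
            "MAXIMOA": maxa, "MAXIMOB": maxb, "INGRESE2": income}


# Hypothetical records, invented for illustration only.
PATIENTS = [
    make(1, "INSTI", 30, 35, 10, "Never", "PENSION"),
    make(2, "NO", 35, 38, 20, "Every_Day", "PENSION"),
    make(3, "NO", 30, 32, 18, "Every_Day", "PENSION"),
    make(4, "NO", 80, 75, 2, "Never", "TRABAJO"),
    make(5, "NO", 75, 78, 3, "Never", "TRABAJO"),
    make(6, "NO", 55, 50, 8, "Sometimes", "PENSION"),
    make(7, "NO", 60, 58, 12, "Sometimes", "PENSION"),
]


def rules_induced_partition(patients):
    """Step 1: objects firing exactly one rule go to that class, the rest
    (no rule or several rules, an assumption here) go to the residual class."""
    partition = {}
    for p in patients:
        fired = [label for label, rule in RULES.items() if rule(p)]
        label = fired[0] if len(fired) == 1 else "residual"
        partition.setdefault(label, []).append(p)
    return partition


def build_new_dataset(partition):
    """Steps 2-3: one prototype (centroid, with its mass) per rules-induced
    class, plus every residual object as a singleton."""
    items = []
    for label, members in partition.items():
        if label == "residual":
            items += [((f"p{m['id']}",), tuple(float(m[v]) for v in NUMERIC), 1)
                      for m in members]
        else:
            centroid = tuple(fmean(m[v] for m in members) for v in NUMERIC)
            items.append(((f"class_{label}",), centroid, len(members)))
    return items


def agglomerate(items, k):
    """Steps 3-5: merge the two closest clusters until k clusters remain."""
    clusters = list(items)
    while len(clusters) > k:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        i, j = min(pairs, key=lambda ab: dist(clusters[ab[0]][1], clusters[ab[1]][1]))
        (na, ca, ma), (nb, cb, mb) = clusters[i], clusters[j]
        centroid = tuple((ma * x + mb * y) / (ma + mb) for x, y in zip(ca, cb))
        rest = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters = rest + [(na + nb, centroid, ma + mb)]
    return clusters


partition = rules_induced_partition(PATIENTS)
final_classes = agglomerate(build_new_dataset(partition), k=3)
for names, centroid, mass in final_classes:
    print(names, mass)
```

In this toy run the prior knowledge fixes three classes before any distance is computed, and the clustering only decides how those prototypes and the unlabelled patients are grouped afterwards.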

Profiles of Dependency in Schizophrenia
Autonomous (Auto-K) (93 patients) (Cr292):
– They work
– Secondary school
– Better scores in all scales
– Do not need family help
– Almost do not use health services
Alone (Alone-K) (87 patients) (C300):
– Do not have a caregiver
– Intermediate scores; they are not well
– Autonomous, but require help in domestic activities
– Exaggerated and chaotic use of health services
– Require supervision (do not attend doctor appointments…)
Dependents (Dep-K) (105 patients) (C297):
– Cannot work
– Without primary school
– Worse scores in all scales
– Require the highest quantity of help from caregivers
– Use a lot of health services: long hospital stays
Institutionalized (Resid-K) (9 patients) (Ci7):
– Longest disease (23 years on average)
– Suicide attempts
– Important negative symptoms
– No help from family or health services, but from the institution

Final results [COMPSTAT2008, Phy-Verlag]
– Five types of dependency in Severe Mental Illness: Autonomous, Living alone, Dependent in the community, Institutionalized, Incomplete
– Elicitation of implicitly known profiles (the living-alone profile requires special attention)
– In SMI the impaired dimensions of DLA are different from those of elderly persons
    – Motivation and volition, rather than execution
– Correct assessment of dependency in SMI cannot be restricted to
    – mobility
    – self-care
    – domestic tasks
– Only between 4 and 6% of the population with SMI have problems there
– 49.28% of dependent patients with SMI would not be identified
– 27% of the persons getting the best scores are in fact dependent
– FICE is not reliable to assess dependency in SMI
– Proposal of an operational definition of dependency for SMI (including specific items for SMI, not originally in FICE)
    – 39.3% of persons with schizophrenia are dependent

Response to neurorehabilitation in TBI
– Brain damage, cognitive deficit, neuropsychological rehabilitation
– No type I scientific evidence yet
– Collaboration with Institut Guttmann, Spain
– Goal: better know which rehabilitation programmes are more effective according to the deficit characteristics
– Data on neuropsychological functions (Attention; Memory; Executive functions; Language) of 47 patients before and after treatment + prior expert knowledge

Results [MedArch 62(3)] [FAIA184]
[Figure: response profiles, with the labels recoverable from the slide. Language: severe impairment, speech impaired. Resistant: severe impairment, no response. DisExecutive: severe impairment, older age, more damage, impaired executive functions. Global Improvement: severe impairment, up to normality. Assessable: starting better, up to normality.]

Interpretation tools: CPGs
Memory tests [NNW05]
[Figure: class panel graph. Rows are the classes C1 to C4, columns are the variables V1, V2, …, V10, V11, …; each cell shows the histogram of one variable conditional to one class, e.g. V1|C1.]
Note: show class particularities

CPGs: Memory
[Figure: class panel graph of the memory tests, annotated: non-assessable at the beginning; the best improvement; still non-assessable after rehabilitation.]

From CPG to Traffic Lights Panel
[Figure: each class-conditional distribution in the CPG is summarized as LOW, MEDIUM or HIGH and then coloured.]

Traffic lights panel for Memory assessment [AIM08]
[Figure: traffic lights panel with legend Good, Normal, Bad.]

Global TLP for the whole neuropsychological assessment

TLP supports expert conceptualization
[Figure: global TLP with the classes labelled by the experts: AssessableImprove, GlobalImprovement, +MemoDisExe, Resistant.]
KNOWLEDGE DISCOVERY: new domain model

Currently in progress
– Assessable: mild to moderate neuropsychological impairment; they usually improve after treatment (up to normality).
  They could be assessed at the beginning of treatment
– Global Improvement: initial severe impairment; global satisfactory improvement (up to normality). The group with the greatest improvement regarding the initial conditions
– Disexecutive: initial severe impairment; generally satisfactory improvement but persisting executive functions disorder; they remain unable to develop complex routines and planning; older and with more damage than the previous group
– Resistant: initial severe impairment; mild improvement in tasks requiring minimal attention; good improvement in memory and learning skills; attention deficit remains for complex and executive tasks; they cannot perform daily life activities alone. Low response to treatment
– Language: language problems and very severe global cognitive impairment. They can only improve as they recover language. Logopedic therapy
Ongoing work:
– Identify useful predictors of the patient's profile (including lesion characteristics)
– Cross the profile with the rehabilitation programme done
– Proposal of a successful standard rehabilitation programme for every profile

Long-term Quality of Life perception. Spinal Cord Injury
– Maintain and improve quality of life in chronic patients
– Patient-centered approach
– QoL: multidimensional construct (emotional wellness, functional autonomy, social inclusion…)
– Collaboration with Institut Guttmann, Spain
– Goal: how does a patient with SCI perceive his/her quality of life over time
– Data on IBP, CIF, ESIG of 109 patients at 3 consecutive annual follow-ups after clinical discharge (2002-2008) + prior expert knowledge
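
The step from class panel graphs to the traffic lights panel, described in the neurorehabilitation slides above, can be sketched for one variable as follows. This is only an illustration under stated assumptions: the slides do not say how LOW, MEDIUM and HIGH are decided, so the sketch compares each class median with the terciles of the pooled data; the function name `traffic_lights_row`, the class names and the scores are hypothetical.

```python
from statistics import median, quantiles

# Colour coding when high values are good, and when high values are bad.
COLOURS_HIGH_GOOD = {"low": "red", "medium": "yellow", "high": "green"}
COLOURS_HIGH_BAD = {"low": "green", "medium": "yellow", "high": "red"}


def level(values, low_cut, high_cut):
    """Summarize one class-conditional distribution as low/medium/high.
    Assumption: the class median is compared with the pooled terciles."""
    m = median(values)
    if m < low_cut:
        return "low"
    if m > high_cut:
        return "high"
    return "medium"


def traffic_lights_row(values_by_class, higher_is_better=True):
    """Return one row of the panel: a colour per class for one variable."""
    pooled = [v for values in values_by_class.values() for v in values]
    low_cut, high_cut = quantiles(pooled, n=3)  # the two tercile cut points
    colours = COLOURS_HIGH_GOOD if higher_is_better else COLOURS_HIGH_BAD
    return {cls: colours[level(values, low_cut, high_cut)]
            for cls, values in values_by_class.items()}


# Hypothetical memory test scores per class (invented values).
memory_scores = {
    "C1": [2, 3, 3, 4],
    "C2": [5, 6, 6, 7],
    "C3": [8, 9, 9, 10],
}
memory_row = traffic_lights_row(memory_scores)
print(memory_row)
```

Repeating the call for every variable gives the full panel, one row per variable and one column per class. The polarity flag matters because for some scales a high score means more impairment.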

Clustering based on rules by States
[Figure: for the individuals i1, …, in, the variables measured at each state e1, e2, …, eE form one data block per state (X^e_1, …, X^e_Ke). Using the knowledge base in KLASS, each block is clustered separately, giving a hierarchy τ^e1, τ^e2, …, τ^eE and a partition P^e1, P^e2, …, P^eE per state.]

Results: more typical patterns (γ ≥ 0.05)
[Figure: trajectories diagram across the 1st, 2nd and 3rd assessments. Class codes shown: C59, C63, C55, C62, C49, C57, C54, C56, C46, C64, C52. Class labels shown: IndepPos (IndepPositius), IndepModerat, IndepModAnt, IndepMod, SemidepHetero, SemiDepNeg, Dependents, DepEstoics. Trajectories shown: T4, T6, T7, T12.]

Interpretation of patterns [Stud. Health Tech Inform 09 (in press)]
[Figure: the same trajectories diagram, annotated. The labels VIP2 and VIP3 also appear.]
– Younger. Recent lesion. Physical autonomy. Some distress. They keep stable
– Physical autonomy and psychological wellness maintained over time
– Beginning: functional autonomy, some distress. Health problems appear with time and they lose functionality. Different coping strategies. Old people, old lesion
– Starting with different impairment. Long-term adaptation to moderate distress, no anxiety. High coping strategies

Methodological review
– Data cleaning
– Relevant variable selection
– Prior knowledge acquisition
– Clustering based on rules (by States)
– Interpretation
    – Select number of classes
    – Class panel graph
    – Traffic lights panel
    – Experts' conceptualization
    – [More frequent trajectories diagram]
– Identification of profiles and labelling
– Description of profile characteristics
– Validation
Note: involve interaction with the experts
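
The "more frequent trajectories" step listed above can be sketched as a simple count over the per-state partitions. The sketch assumes that γ is the share of patients following a trajectory, which the slides do not state explicitly; the function name `typical_trajectories`, the class labels and the patient counts are invented for illustration and are not the study's results.

```python
from collections import Counter


def typical_trajectories(classes_by_patient, gamma=0.05):
    """Keep the class sequences followed by at least a share gamma of patients.

    classes_by_patient maps a patient id to the classes assigned to that
    patient at each consecutive assessment (one class per state).
    """
    n = len(classes_by_patient)
    counts = Counter(tuple(seq) for seq in classes_by_patient.values())
    return {traj: count / n
            for traj, count in counts.most_common()
            if count / n >= gamma}


# Hypothetical follow-up data: 40 patients, 3 assessments each.
patterns = [
    (("A1", "A2", "A3"), 20),
    (("B1", "B2", "B3"), 12),
    (("B1", "A2", "A3"), 7),
    (("A1", "B2", "B3"), 1),  # 1/40 = 0.025, below the threshold
]
followups = {}
for trajectory, how_many in patterns:
    for _ in range(how_many):
        followups[f"patient_{len(followups) + 1}"] = trajectory

typical = typical_trajectories(followups, gamma=0.05)
for trajectory, share in typical.items():
    print(" -> ".join(trajectory), f"{share:.3f}")
```

Only the trajectories that pass the threshold would then be drawn in the diagram and handed to the experts for interpretation.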

Conclusions
– KDD is a useful complement to partial prior expert knowledge
– Hybrid AI&Stats methodologies allow KDD in complex medical domains
– ClBR and ClBRxE proved useful tools for KDD
– Interpretation-oriented tools are crucial for understandable results (CPG and TLP are good interpretation support tools)
– The expert should be integrated as part of the methodology itself
– KDD helps the elicitation of implicit expert knowledge

Knowledge Discovery from Data as a framework to decision support in medical domains
Karina Gibert, Luis Salvador Carulla
Dpt. Statistics and Operation Research, Knowledge Engineering and Machine Learning Research group, Universitat Politècnica de Catalunya, Barcelona (Spain)
PSICOST Scientific Association
Are there any questions?
Bridging 2009, La Pedrera, Barcelona, 4-7th March 2009