In the name of God

Data Mining and Its Application in Medicine
Student ID: 85233510
Student: Babak Razzaghi
Supervisor: Dr. Tohidkhah
(Seminar for the course "Application of Information Technology in Medicine")

WHY DATA MINING?
Necessity is the mother of invention.
- Huge amounts of data
- Electronic records of our decisions:
  - choices in the supermarket
  - financial records
  - our comings and goings
- We swipe our way through the world; every swipe is a record in a database.
- Data rich, but information poor
- Lying hidden in all this data is information!

WHAT IS DATA MINING?
- Extracting, or "mining", knowledge from large amounts of data
- Data-driven discovery and modeling of hidden patterns in large volumes of data
- Extraction of implicit, previously unknown, unexpected and potentially extremely useful information from data

DATA VISUALIZATION
[Figure: large database, data mining, data visualization]
- Ways of seeing patterns in large data sets
- Uses the efficiency of human pattern recognition

TERMINOLOGY
- Gold mining (by analogy)
- Knowledge mining from databases
- Knowledge extraction
- Data/pattern analysis
- Knowledge Discovery in Databases (KDD)

KNOWLEDGE DISCOVERY PROCESS
[Figure: Raw Data → Integration → Data Warehouse → Target Data → Transformed Data → Patterns and Rules → Interpretation & Evaluation → Knowledge / Understanding]

DATA MINING CENTRAL QUEST
Find true patterns and avoid overfitting (false patterns that arise from randomness).

MAJOR DATA MINING TASKS
- Classification: predicting an item's class
- Clustering: finding clusters in data
- Associations: e.g.
  A & B & C occur together frequently
- Visualization: facilitating human discovery
- Summarization: describing a group
- Estimation: predicting a continuous value
- Deviation detection: finding changes
- Link analysis: finding relationships

DATA MINING CHALLENGES
- Investigating all possibilities is computationally expensive
- Dealing with noise, missing information and errors in the data
- Choosing appropriate attributes and input representation
- Finding the minimal attribute space
- Finding adequate evaluation function(s)
- Extracting meaningful information
- Not overfitting

DATA MINING SOFTWARE
- Insightful Miner
- Angoss Knowledge ACCESS
- ARMiner
- Eudaptics Viscovery
- Goal TV MDR
- Viscovery SOMine
- SPSS

DATA MINING APPLICATIONS
- Science: chemistry, physics
- Bioscience:
  - sequence-based analysis
  - protein structure and function prediction
  - protein family classification
  - microarray gene expression
- Financial industry: banks, businesses, e-commerce
  - stock and investment analysis
- Pharmaceutical companies
- Health care
- Sports and entertainment

CLINICAL DATA MINING PROCESSES
- Digital format for all pertinent data
- Create structure
- Obtain coded information
- Natural language understanding
- Create a widely accessible repository

CLASSIFICATION EXAMPLE FOR MEDICAL DIAGNOSIS AND PROGNOSIS: HEART DISEASE
Minimum systolic blood pressure over the 24-hour period following admission to the hospital
  <= 91 → Class 2 (early death)
  > 91 → Age of patient
    <= 62.5 → Class 1 (survivors)
    > 62.5 → Was there sinus tachycardia?
      YES → Class 2 (early death)
      NO → Class 1 (survivors)

GENOME, DNA AND GENE EXPRESSION
- An organism's genome is the "program" for making the organism, encoded in DNA.
- Human DNA has roughly 20,000 protein-coding genes (early estimates were 30,000 to 35,000).
- A gene is a segment of DNA that specifies how to make a protein.
- Cells differ because of differential gene expression.
- About 40% of human genes are expressed at any one time.
- Microarray devices measure gene expression.

MICROARRAY RAW IMAGE
[Figure: scanner, enlarged section of raw image, raw data]
Raw data:
Gene              Value
D26528_at           193
D26561_cds1_at      -70
D26561_cds2_at      144
D26561_cds3_at       33
D26579_at           318
D26598_at          1764
D26599_at          1537
D26600_at          1204
D28114_at           707

MICROARRAY POTENTIAL APPLICATIONS
- New and better molecular diagnostics
- New molecular targets for therapy
  - few new drugs, large pipeline, …
- Outcome depends on genetic signature
  - best treatment?
- Fundamental biological discovery
  - finding and refining biological pathways
- Personalized medicine?!

MICROARRAY DATA MINING CHALLENGES
- Avoiding false positives, due to:
  - too few records (samples), usually < 100
  - too many columns (genes), usually > 1,000
- The model needs to be robust in the presence of noise
- For reliability, large gene sets are needed; for diagnostics or drug targets, small gene sets are needed
- Estimating class probability
- The model needs to be explainable to biologists

INITIAL QUERY PAGE

CLUSTERS MATCHING QUERY RESULTS

DISPLAY OF CLUSTER

DATA MINING SOFTWARE GUIDE

CONCLUSION
- Discover useful relationships in data
- Discover information that would otherwise be overlooked
- Provide intelligence to improve various phases
- Intellectual property
- Competitive advantages:
  - getting more out of your data
  - finding other relevant information faster
- Exploratory, hypothesis-generating analyses
- Increased productivity: less time and money spent

Thank You All
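The association task listed under MAJOR DATA MINING TASKS ("A & B & C occur frequently") comes down to counting how often itemsets appear together. The following is a minimal Python sketch, assuming toy shopping baskets and a hypothetical `frequent_itemsets` helper. It enumerates every itemset by brute force rather than using a pruning algorithm such as Apriori, so it only suits small examples.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute-force support counting (hypothetical helper, no Apriori pruning).

    Counts every non-empty itemset contained in each transaction and keeps
    those that occur in at least `min_count` transactions.
    """
    counts = Counter()
    for basket in transactions:
        items = sorted(basket)
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[frozenset(itemset)] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_count}


# Toy data (an assumption): five baskets over items A, B, C and D.
baskets = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B"},
    {"B", "C", "D"},
    {"A", "B", "C"},
]

frequent = frequent_itemsets(baskets, min_count=3)
# Print by itemset size, then alphabetically, with support = count / baskets.
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), count, count / len(baskets))
```

With `min_count=3` the itemsets kept are A, B, C, AB, AC, BC and ABC; {A, B, C} appears in 3 of the 5 baskets, a support of 0.6.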
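Clustering ("finding clusters in data"), also listed under MAJOR DATA MINING TASKS, can be illustrated with k-means (Lloyd's algorithm). This is one common clustering method among many; the slides do not say which method their software uses. The Python sketch below assumes 2-D points and a simple deterministic initialization. The `kmeans` function and the sample points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on 2-D points (illustrative sketch)."""
    # Deterministic initialisation (an assumption): spread the initial
    # centroids across the input order instead of choosing them at random.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids


# Two visibly separate groups of hypothetical 2-D points.
pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(pts, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```

Random initialization with several restarts is the more usual choice in practice; the fixed initialization here simply keeps the example reproducible.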
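The heart-disease tree on the CLASSIFICATION EXAMPLE FOR MEDICAL DIAGNOSIS AND PROGNOSIS slide can be read as a chain of nested decisions. The Python sketch below is an illustration only, not a clinical tool. The function name `classify_patient` and its parameters are hypothetical; only the split order and the thresholds (91 for minimum systolic blood pressure, 62.5 years for age) come from the slide.

```python
# Leaf labels taken from the slide.
EARLY_DEATH = "Class 2: Early death"
SURVIVOR = "Class 1: Survivors"


def classify_patient(min_systolic_bp: float, age: float, sinus_tachycardia: bool) -> str:
    """Walk the slide's decision tree from the root to a leaf (hypothetical helper)."""
    # Root split: minimum systolic blood pressure in the 24 hours after admission.
    if min_systolic_bp <= 91:
        return EARLY_DEATH
    # Second split: age of patient.
    if age <= 62.5:
        return SURVIVOR
    # Third split: was there sinus tachycardia?
    return EARLY_DEATH if sinus_tachycardia else SURVIVOR


if __name__ == "__main__":
    print(classify_patient(85, 50, False))   # Class 2: Early death
    print(classify_patient(120, 55, True))   # Class 1: Survivors
    print(classify_patient(120, 70, True))   # Class 2: Early death
    print(classify_patient(120, 70, False))  # Class 1: Survivors
```

A classifier such as CART learns these splits and thresholds from labelled patient records. Once learned, applying the tree is just this sequence of comparisons, which is part of why trees are easy to explain.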
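The overfitting risk named under DATA MINING CENTRAL QUEST and MICROARRAY DATA MINING CHALLENGES (fewer than 100 samples, more than 1,000 genes) can be demonstrated on pure noise. In this minimal Python sketch the "expression values" and class labels are random, so there is no true pattern to find. A 1-nearest-neighbour classifier, chosen here only as an example of a model flexible enough to memorize its data, still scores perfectly on the samples it was fitted to, while its accuracy on fresh samples stays near chance. The sample sizes, the seed and all function names are hypothetical.

```python
import random


def nearest_neighbor_predict(train_x, train_y, x):
    """1-nearest-neighbour: return the label of the closest training sample."""
    best_label, best_dist = None, float("inf")
    for xi, yi in zip(train_x, train_y):
        dist = sum((a - b) ** 2 for a, b in zip(xi, x))  # squared Euclidean distance
        if dist < best_dist:
            best_label, best_dist = yi, dist
    return best_label


def accuracy(train_x, train_y, xs, ys):
    """Fraction of samples in (xs, ys) that the 1-NN model labels correctly."""
    hits = sum(nearest_neighbor_predict(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(ys)


def random_dataset(rng, n_samples, n_genes):
    """Pure noise: random 'expression' values and random 0/1 class labels."""
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    ys = [rng.randint(0, 1) for _ in range(n_samples)]
    return xs, ys


rng = random.Random(0)
N_SAMPLES, N_GENES = 40, 1000  # few records, many columns (hypothetical sizes)
train_x, train_y = random_dataset(rng, N_SAMPLES, N_GENES)
test_x, test_y = random_dataset(rng, 100, N_GENES)

train_acc = accuracy(train_x, train_y, train_x, train_y)
test_acc = accuracy(train_x, train_y, test_x, test_y)
print(f"training accuracy: {train_acc:.2f}")  # 1.00: each sample is its own nearest neighbour
print(f"held-out accuracy: {test_acc:.2f}")   # typically close to chance (0.5)
```

The gap between the two numbers is the "false pattern due to randomness": judging a model only on the data it was fitted to would report a perfect classifier where none exists.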