DATA WAREHOUSE FOR BIO-GEO HEALTH CARE INFORMATICS
Ranjit Ganta, Raj Acharya, Shruthi Prabhakara
Department of Computer Science and Engineering, Penn State University

INTRODUCTION
Health-care records: ‘Bio-geo’ informatics
• Patient identification information
• Geographical information
• Clinical information:
  Organ/cellular level: tumor, pathology.
  Molecular level: DNA sequence, microarray.
  Laboratory data: blood tests, diagnosis, prognosis.

Integration of health-care records:
• Privacy violation
• Distributed integration of health-care records

Integration within health-care records:
• Information fusion: combine multiple disparate sources of information such that the whole is more than the sum of its parts.

[Figure: Information Fusion Based Attack]

Prototype for Bio-geo Data Research Warehouse
[Figure labels: Cancer Grid; Patient Information; Clinical and Pathology; Gene Expression; Geographical Information; Public Data (literature etc.); Cancer Analysis Applications; Global Statistics; Information Fusion based Clustering; Result Visualization]

CORRESPONDENCE ANALYSIS
For the patient demographic data set, this helps answer questions such as:
• Which age/race profile(s), if any, define a typical prostate cancer patient?
• Are middle-aged Caucasian males more prone to prostate cancer than Caucasians of other age groups?
• Is there a close association between age and race groups?

Sample Result: [figure]

Example Data: Dhanasekaran et al., "Delineation of prognostic biomarkers in prostate cancer", Letters to Nature, Vol. 412, August 2001, pages 822-826.
Supplementary data (Fig 1C, p. 823, Commercial Pool): gene expression (microarray data) in four clinical states of prostate-derived tissues.

CLINICAL STATES
Benign states:
  BPH: Benign Prostatic Hyperplasia
  NAP: Normal Adjacent Prostate
Malignant states:
  PCA: Localized prostate cancer
  MET: Metastatic sample

Sample Result: [figure]

KL-CLUSTERING
The Kullback-Leibler (KL) divergence measures the relative dissimilarity of the shapes of two gene profiles:

$$D(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)} = \left\langle \log \frac{p(x)}{q(x)} \right\rangle_{p}$$

1-D SOM algorithm + KL: minimize D(Gene || SOM weight for each node) at each iteration step.

[Figure: Input Profiles to Clusters. Genes g1 to g6 are grouped into co-regulated genes. Up-regulated: {g2, g4}; {g3}; {g5}. No change: {g6}. Down-regulated: {g1}.]

Common Motifs
Motif: a short segment of DNA that acts as a binding site for a specific transcription factor.
• Typically 6-25 bp in length
• Statistically different in composition compared to the background
• Often repeated within a sequence
[Bioinformatics, Vol. 19, No. 4, 2003, 449-458]

Frequency of occurrence:
            Motif 1   Motif 2   …   Motif k
  Gene 1    0         1         …   0
  Gene 2    0         1         …   2
  …
  Gene n    3         0         …   0

COMBINED CLUSTERING
Clustering using more than one data source aims to identify clusters of genes with similar properties across all data. The goal of combined clustering is to answer the following:
1. If genes have similar expression profile patterns, do they also share common motifs?
2. If genes have a set of motifs in common, do they also exhibit similar expression profile patterns?
3. Which genes share BOTH, that is, have similar expression profile patterns AND share a set of common motifs?

Alpha Factor Experiments
[Figure panels: Cluster on Gene expression; Cluster on Motif vectors; Combined clustering. All genes in the cluster share the Transcription Factor MCBa.]

CONCLUSION
We have demonstrated the significance of information fusion based tools for bio-geo health care informatics.
• As a data warehouse for various data sets involved in bio-geo health care informatics studies.
• To provide and demonstrate a set of information fusion tools for disease research.
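
As an illustration of the KL-CLUSTERING step, the following is a minimal Python sketch of a 1-D SOM that assigns each gene profile to the node whose weight vector minimizes D(Gene || SOM weight). It is not the authors' implementation. The function names (`normalize`, `kl_divergence`, `best_node`, `kl_som_1d`), the learning-rate schedule, the neighbourhood factor, the deterministic initialisation, and the toy profiles are all assumptions made for this example. It also assumes that profiles are non-negative intensities that can be rescaled to sum to 1.

```python
import math


def normalize(profile, eps=1e-9):
    """Scale a non-negative profile so it sums to 1 (eps avoids log(0))."""
    shifted = [value + eps for value in profile]
    total = sum(shifted)
    return [value / total for value in shifted]


def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) * log(p(x) / q(x)) for strictly positive p, q."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q))


def best_node(profile, weights):
    """Index of the node whose weight vector minimizes D(profile || weight)."""
    return min(range(len(weights)),
               key=lambda j: kl_divergence(profile, weights[j]))


def kl_som_1d(profiles, n_nodes=2, n_epochs=30, rate=0.5, neighbour=0.25):
    """Hypothetical 1-D SOM using KL divergence as the matching criterion."""
    data = [normalize(p) for p in profiles]
    # Assumed initialisation: evenly spaced profiles become the node weights.
    weights = [list(data[(j * len(data)) // n_nodes]) for j in range(n_nodes)]
    for epoch in range(n_epochs):
        step = rate * (1.0 - epoch / n_epochs)  # learning rate decays linearly
        for p in data:
            winner = best_node(p, weights)
            for j in range(n_nodes):
                distance = abs(j - winner)
                if distance > 1:
                    continue  # only the winner and its direct neighbours move
                h = step if distance == 0 else step * neighbour
                # Convex combination keeps each weight vector a distribution.
                weights[j] = [(1.0 - h) * w + h * x
                              for w, x in zip(weights[j], p)]
    labels = [best_node(p, weights) for p in data]
    return labels, weights


# Toy input: two rising and two falling expression profiles (made-up values).
toy_profiles = [[1, 2, 4, 8], [1, 2, 4, 9], [8, 4, 2, 1], [9, 4, 2, 1]]
labels, weights = kl_som_1d(toy_profiles)
print(labels)
```

Because each profile is normalized before comparison, the divergence depends on the shape of a profile rather than its overall magnitude, which matches the poster's description of KL as a measure of shape dissimilarity. On the toy input, the rising and falling profiles are expected to land on different nodes.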