DATA WAREHOUSE FOR BIO-GEO HEALTH CARE INFORMATICS
Ranjit Ganta, Raj Acharya, Shruthi Prabhakara
Department of Computer Science and Engineering, Penn State University
INTRODUCTION
Health-care records: 'Bio-geo' Informatics
• Patient identification information
• Geographical information
• Clinical information:
   - Organ/cellular level: tumor, pathology.
   - Molecular level: DNA sequence, microarray.
   - Laboratory data: blood tests, diagnosis, prognosis.
Prototype for Bio-geo Data Warehouse
Cancer Research Grid
Integration of health-care records:
• Privacy violation
• Distributed integration of health care records.
Integration within health-care records:
• Information fusion: combine multiple disparate sources of information such that the whole is more than the sum of its parts.
Figure: Information Fusion Based Attack
Figure (diagram labels): Patient Information; Clinical and Pathology; Gene Expression; Geographical Information; Public Data (Literature etc.); Information Fusion based Clustering; Global Statistics; Cancer Analysis Applications; Result Visualization.
CORRESPONDENCE ANALYSIS
For the patient demographic data set, this helps answer questions such as:
• Which age/race profile(s), if any, define a typical profile of a prostate cancer patient?
• Are middle-aged Caucasian males more prone to prostate cancer than Caucasians of other age groups?
• Is there a close association between age and race groups?
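As an illustration of how such questions can be explored, the following is a minimal sketch of correspondence analysis on an age-by-race contingency table, computed through the singular value decomposition of the standardized residuals. The counts, the group labels, and the function name `correspondence_analysis` are hypothetical and are not taken from the poster's data set or tools.

```python
import numpy as np

def correspondence_analysis(table):
    """Return row coordinates, column coordinates, and principal inertias."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                  # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)                # row masses
    c = P.sum(axis=0)                # column masses
    expected = np.outer(r, c)        # frequencies expected under independence
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: scale singular vectors by singular values and masses.
    row_coords = (U * sigma) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sigma) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sigma ** 2

# Hypothetical patient counts: rows are age groups, columns are race groups.
age_groups = ["<50", "50-65", ">65"]
race_groups = ["A", "B", "C"]
counts = np.array([[12, 5, 3],
                   [30, 14, 6],
                   [25, 20, 9]])

rows, cols, inertia = correspondence_analysis(counts)
for name, xy in zip(age_groups, rows[:, :2]):
    print(f"age {name}: {xy}")
for name, xy in zip(race_groups, cols[:, :2]):
    print(f"race {name}: {xy}")
```

Age and race categories that land close together on the first two axes are associated more strongly than independence would predict. The total inertia equals the chi-square statistic of the table divided by the number of patients.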
Sample Result:
Example data: Dhanasekaran et al., "Delineation of prognostic biomarkers in prostate cancer", Letters to Nature, Vol. 412, August 2001, pages 822-826. Supplementary data (Fig. 1C, pg. 823, Commercial Pool).
Gene expression (microarray data) in four clinical states of prostate-derived tissues.
Figure labels: Benign states; Malignant states.
Sample Result:
CLINICAL STATES
BPH : Benign Prostatic Hyperplasia
NAP : Normal Adjacent Prostate
PCA : Localized prostate cancer
MET : Metastatic sample
KL-CLUSTERING
Genes to co-regulated genes
Input profiles: g1, g2, g3, g4, g5, g6
Clusters:
• Down-regulated: {g1}
• No change: {g6}
• Up-regulated: {g2, g4}; {g3}; {g5}
The Kullback-Leibler (KL) divergence measures the relative dissimilarity of the shapes of two gene profiles:
$$D(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)} = E_p\left[\log \frac{p(x)}{q(x)}\right]$$
1-D SOM algorithm + KL: minimize D(Gene || SOM weight for each node) at each iteration step.
[Bioinformatics, Vol. 19, No. 4, 2003, 449-458]
Common Motifs
Motif: a short segment of DNA that acts as a binding site for a specific transcription factor.
• Typically 6-25 bp in length
• Statistically different in composition compared to the background
• Often repeated within a sequence
Frequency of occurrence:
          Motif 1   Motif 2   …   Motif k
Gene 1    0         1         …   0
Gene 2    0         1         …   2
…
Gene n    3         0         …   0
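The KL-based clustering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of the cited Bioinformatics paper: the normalization in `to_distribution`, the learning-rate and neighbourhood schedules, and the names `kl_divergence` and `som_1d_kl` are all hypothetical. KL divergence is used to pick the best-matching node, and each update moves the node weights along the straight line towards the profile, which cannot increase D(profile || weight) because the divergence is convex in its second argument.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) = sum_x p(x) log(p(x) / q(x)) for discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def to_distribution(profile):
    """Assumed normalization: shift to positive values, then scale to sum to 1."""
    v = np.asarray(profile, dtype=float)
    v = v - v.min() + 1e-6
    return v / v.sum()

def som_1d_kl(profiles, n_nodes=3, n_iter=50, lr=0.5, seed=0):
    """Cluster profiles on a 1-D chain of nodes, matching nodes by KL divergence."""
    rng = np.random.default_rng(seed)
    data = np.array([to_distribution(p) for p in profiles])
    # Initialize node weights from randomly chosen profiles.
    picks = rng.choice(len(data), size=n_nodes, replace=len(data) < n_nodes)
    weights = data[picks].copy()
    positions = np.arange(n_nodes)
    for t in range(n_iter):
        frac = 1.0 - t / n_iter                    # decays towards 0
        width = max(frac * n_nodes / 2.0, 0.3)     # neighbourhood width
        for x in data[rng.permutation(len(data))]:
            # Best-matching unit: node whose weight minimizes D(x || weight).
            bmu = min(range(n_nodes), key=lambda j: kl_divergence(x, weights[j]))
            h = np.exp(-((positions - bmu) ** 2) / (2.0 * width ** 2))
            # Convex step towards x keeps every weight a valid distribution.
            weights += (lr * frac) * h[:, None] * (x - weights)
            weights /= weights.sum(axis=1, keepdims=True)
    labels = [min(range(n_nodes), key=lambda j: kl_divergence(x, weights[j]))
              for x in data]
    return labels, weights

# Hypothetical expression profiles over four conditions.
profiles = [[1.0, 2.0, 3.0, 4.0],
            [1.0, 2.1, 3.2, 4.1],
            [4.0, 3.0, 2.0, 1.0],
            [4.2, 3.1, 2.0, 1.0]]
labels, weights = som_1d_kl(profiles, n_nodes=2)
print(labels)
```

Because each profile is normalized to a distribution first, the comparison depends on the shape of the profile rather than its absolute expression level, which matches the stated purpose of the KL divergence here.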
COMBINED CLUSTERING
Clustering using more than one data source
aims at identifying clusters of genes with
similar properties among all data.
Goal of combined clustering is to answer the
following:
1. If genes have similar expression profile
patterns, do they also share common
motifs?
2. If genes have a set of motifs in common,
do they also exhibit similar expression
profile patterns?
3. Which genes share BOTH - that is, they
have similar expression profile patterns
AND share a set of common motifs?
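Question 3 above can be illustrated with a small sketch, assuming each gene has already been assigned one cluster label from expression clustering and one from motif clustering. The gene names, the labels, and the function `combined_clusters` are hypothetical; the poster does not specify how its combined clustering is computed, so this only shows the simplest intersection-based reading.

```python
from collections import defaultdict

def combined_clusters(expr_labels, motif_labels):
    """Group genes that share BOTH an expression cluster and a motif cluster."""
    groups = defaultdict(list)
    # Only genes present in both data sources can be compared.
    for gene in sorted(expr_labels.keys() & motif_labels.keys()):
        groups[(expr_labels[gene], motif_labels[gene])].append(gene)
    # Keep only groups in which at least two genes agree on both labels.
    return {key: genes for key, genes in groups.items() if len(genes) > 1}

# Hypothetical cluster assignments from the two separate clusterings.
expr_labels = {"g1": 0, "g2": 1, "g3": 1, "g4": 1, "g5": 2}
motif_labels = {"g1": 0, "g2": 0, "g3": 1, "g4": 0, "g5": 1}

shared = combined_clusters(expr_labels, motif_labels)
print(shared)  # {(1, 0): ['g2', 'g4']}
```

Questions 1 and 2 can be read off the same grouping: for a fixed expression label, the spread of motif labels shows whether co-expressed genes also share motifs, and the reverse holds for a fixed motif label.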
Alpha Factor Experiments
Figure panels: Cluster on Gene expression; Cluster on Motif vectors; Combined clustering.
Combined clustering: all genes in the cluster share the Transcription Factor MCBa.
CONCLUSION
We have demonstrated the significance of information-fusion-based tools for bio-geo health care informatics. The prototype serves two purposes:
• To act as a data warehouse for the various data sets involved in bio-geo health care informatics studies.
• To provide and demonstrate a set of information fusion tools for disease research.