Max Planck Institute for Molecular Genetics
A pipeline based on multivariate correspondence analysis with supplementary variables for cancer genomics
Christine Steinhoff
Max Planck Institute for Molecular Genetics, Berlin, Germany
Data Sources
Biological level: DNA/Genome, RNA, Protein, Phenotype

Information source (technology): examples
• Literature/database: EST library; physical parameters of DNA, RNA, proteins, etc.; DNA sequence; data mining; literature mining; ...
• In silico: methylation prediction; TFBS prediction; functional annotations (repetitive elements, functional categories, ...)
• Experimental, profiling/characterizing: splicing; epigenetics; SNP arrays; arrayCGH; sequencing; expression arrays; ...
• Experimental, interaction: ChIP-chip; protein interaction; MASS of complexes; ...
• Experimental, phenomics: imaging; RNAi techniques; MASS; medical observations
PROBLEMS
• R(n1, m): after appropriate normalization approx. lognormal, symmetric
• R(n2, m): not symmetric, skew
• Cat(m, c): discrete categories, for example:

grade   stage   died
2       1       yes
4       3       no
2       2       yes

Scale and distribution differ!
Procedure
Data INPUT
Discretization
Filtering
Indicator coding
Multiple Correspondence Analysis
Step 1: Discretization
Input data:
• Expression: $\mathbb{R}^{n_E \times p}$
• arrayCGH: $\mathbb{R}^{n_A \times p}$
• Patients' covariates: categorical, e.g. staging, grading, smoking, mutation, ...
Step 1: Discretization
• Expression, $\mathbb{R}^{n_E \times p}$: probability of expression; fold change criterion
• arrayCGH, $\mathbb{R}^{n_A \times p}$: segmentation and discretization of arrayCGH data (R package: DNAcopy)
Step 1: Discretization
• Expression: $\mathbb{R}^{n_E \times p} \rightarrow \{-1, 0, 1\}^{n_E \times p}$
• arrayCGH: $\mathbb{R}^{n_A \times p} \rightarrow \{-1, 0, 1\}^{n_A \times p}$
• Patients' covariates: $\mathrm{Cat}^{p \times m}$
Typically n ~ 23,000 genes -> reduce the number
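The fold change criterion named above can be illustrated with a minimal sketch. It assumes that the input values are log2 ratios and that a two-fold change (threshold 1.0) separates "down", "normal" and "up"; the threshold, the function names and the toy numbers are assumptions made for illustration and are not taken from the slides.

```python
def discretize_log_ratio(value, threshold=1.0):
    """Map a log2 ratio to -1 (down), 0 (normal) or 1 (up).

    The default threshold of 1.0 (a two-fold change) is an assumption.
    """
    if value >= threshold:
        return 1
    if value <= -threshold:
        return -1
    return 0


def discretize_matrix(log_ratios, threshold=1.0):
    """Discretize a genes x patients matrix given as a list of rows."""
    return [[discretize_log_ratio(v, threshold) for v in row] for row in log_ratios]


# Toy data: 3 genes x 4 patients of log2 ratios (made-up numbers).
log_ratios = [
    [1.8, -0.2, 0.1, -1.4],
    [0.3, 0.0, -0.5, 0.4],
    [-2.1, 1.2, 0.9, 0.0],
]
states = discretize_matrix(log_ratios)
print(states)
```

The same three-state coding can be applied to segmented arrayCGH values (loss, normal, gain), with thresholds chosen for that platform.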
Step 2: Filtering (optional)
Possibilities:
- Neglect all genes with no change in any patient
- Choose genes with the highest variance across patients
- Select for high correlation between arrayCGH and expression
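The first two filtering options can be sketched as follows. This is only an illustration under stated assumptions: genes are stored in a dictionary of discretized states per patient, the variance is computed on those discretized states, and the helper names `drop_unchanged` and `top_variance` as well as the toy data are hypothetical.

```python
from statistics import pvariance


def drop_unchanged(states):
    """Keep only genes that are non-zero (changed) in at least one patient."""
    return {gene: row for gene, row in states.items() if any(v != 0 for v in row)}


def top_variance(values, k):
    """Return the names of the k genes with the highest variance across patients."""
    ranked = sorted(values, key=lambda gene: pvariance(values[gene]), reverse=True)
    return ranked[:k]


# Toy data: discretized states (-1, 0, 1) for three genes in four patients.
states = {
    "geneA": [1, 0, 0, -1],
    "geneB": [0, 0, 0, 0],
    "geneC": [-1, 1, 1, 0],
}
changed = drop_unchanged(states)
top = top_variance(changed, 1)
print(list(changed), top)
```

Whether the variance should be taken on the discretized or on the original continuous values is a design choice that the slides leave open.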
Step 3: Indicator Matrix - Binary Coding

Original matrix with categories:

        Gene 1
pat 1   down
pat 2   up
pat 3   normal
pat 4   normal

Indicator matrix with binary coding (Gene 1):

        down   normal   up
pat 1   1      0        0
pat 2   0      0        1
pat 3   0      1        0
pat 4   0      1        0
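The binary coding of the Gene 1 example can be reproduced with a short sketch. The category order (down, normal, up) follows the slide; the function name `indicator_rows` is a hypothetical helper introduced here.

```python
CATEGORIES = ("down", "normal", "up")


def indicator_rows(states, categories=CATEGORIES):
    """Turn one categorical column into one binary column per category."""
    return [[1 if s.lower() == c else 0 for c in categories] for s in states]


# Gene 1 for patients 1 to 4, as on the slide.
gene1 = ["Down", "Up", "Normal", "normal"]
indicator = indicator_rows(gene1)
for patient, row in enumerate(indicator, start=1):
    print(f"pat {patient}", row)
```

Each row contains exactly one 1, so concatenating the blocks of all genes yields an indicator matrix whose row sums equal the number of genes.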
From: Multiple Correspondence Analysis and Related Methods
$I_p^{t}$   $[\, I_E \;\; I_A \,]$   $B$
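As a rough illustration of the analysis step, the sketch below runs a correspondence analysis on an indicator matrix via the singular value decomposition and then projects supplementary categories (for example patient covariates) onto the resulting axes. It assumes the standard textbook formulation and NumPy, it assumes that every row and every category is observed at least once, and it is not necessarily the implementation used in this pipeline; the function names and the toy data are hypothetical.

```python
import numpy as np


def mca(indicator, n_axes=2):
    """Correspondence analysis of an indicator matrix (rows: patients, columns: categories)."""
    Z = np.asarray(indicator, dtype=float)
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    U, sv, Vt = U[:, :n_axes], sv[:n_axes], Vt[:n_axes]
    row_std = U / np.sqrt(r)[:, None]    # standard row coordinates
    rows = row_std * sv                  # principal row coordinates
    cols = (Vt.T / np.sqrt(c)[:, None]) * sv  # principal column coordinates
    return rows, cols, sv ** 2, row_std


def supplementary_columns(sup_indicator, row_std):
    """Project supplementary categories: column profiles times standard row coordinates."""
    Zs = np.asarray(sup_indicator, dtype=float)
    profiles = Zs / Zs.sum(axis=0)       # each column sums to 1
    return profiles.T @ row_std


# Toy data: 4 patients, 2 genes with 3 categories each (down, normal, up).
Z = [
    [1, 0, 0, 0, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
]
# Hypothetical supplementary covariate with two states, one column per state.
covariate = [[1, 0], [0, 1], [1, 0], [0, 1]]

rows, cols, inertia, row_std = mca(Z)
cov_coords = supplementary_columns(covariate, row_std)
print(cols.round(3))
print(cov_coords.round(3))
```

Supplementary categories do not influence the axes; they are only positioned in the space spanned by the active gene categories, which is what allows covariate states and gene states to be shown in one display.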
EXAMPLE: PUBLISHED DATA
Covariate States' Display
Explore ERBB2 and MYC
[Plot labels: ERBB2 amplified in arrayCGH; ERBB2 overexpression; ERBB2 normal in arrayCGH]
[Plot labels: ERBB2 underexpression; ERBB2 loss in arrayCGH]
[Plot labels: MYC amplification; MYC overexpression]
[Plot labels: MYC underexpression; MYC normal in arrayCGH]
Enrichment of GO Categories
Thank you for your attention!
ACKNOWLEDGEMENT
Max Planck Institute for Molecular Genetics: Martin Vingron
Sensor Lab, CNR-INFM: Matteo Pardo