Multidimensional Analysis
If you are comparing more than two conditions (for example, 10 types of
cancer) or looking at a time series (cell cycle or progression of cancer),
you are looking at a multidimensional problem.
Example: 6000 genes in 10 patients
• 6000 points in 10-dimensional space (gene view)
• 10 points in 6000-dimensional space (patient view)
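The two views are the same data matrix read along different axes. A minimal sketch, assuming the expression values are held in a NumPy array with one row per gene and one column per patient (the array name and the simulated values are illustrative, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expression matrix: one row per gene, one column per patient.
n_genes, n_patients = 6000, 10
expression = rng.normal(size=(n_genes, n_patients))

# Gene view: each gene is a point with one coordinate per patient.
gene_view = expression        # shape (6000, 10)

# Patient view: each patient is a point with one coordinate per gene.
patient_view = expression.T   # shape (10, 6000)

print(gene_view.shape, patient_view.shape)
```

Transposing does not copy or change any measurement; it only changes which objects are treated as points and which as coordinates.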
Reduction of dimensions:
• Principal Component Analysis (PCA)
• Clustering
• Correspondence Analysis
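As an illustration of dimension reduction in the patient view, the sketch below projects 10 simulated patients from 6000 dimensions onto two principal components using a singular value decomposition. The function name `pca_project`, the simulated data, and the planted group difference are assumptions made for the example.

```python
import numpy as np

def pca_project(points, n_components=2):
    """Project the rows of `points` onto their first principal components."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

rng = np.random.default_rng(1)
patients = rng.normal(size=(10, 6000))   # simulated patient view
patients[:5, :200] += 3.0                # planted difference: 200 genes, first 5 patients

scores, explained = pca_project(patients, n_components=2)
print(scores.shape)        # one 2-D point per patient
print(scores[:, 0])        # coordinate of each patient on the first component
```

In this simulation the planted difference is large, so the first component is expected to place the first five patients on one side and the remaining five on the other. Real data will rarely separate this cleanly.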
Patient view
1: patients surviving 5 years after breast cancer surgery
2: patients who died within 5 years of breast cancer surgery
Other classifiers
• Neural Networks
• Support Vector Machines
• Other classifiers from statistical literature
Issues in building a classifier
• Feature selection: a selected subset of genes may classify better than
the full gene set (genes can be ranked with, for example, a t-test)
• Independent validation: you must test the classifier on samples that
were not used for feature selection or for building the classifier
(training set / test set split, or leave-one-out cross-validation)
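The two points above interact: if genes are selected using all samples and only the classifier is cross-validated, the held-out sample has already influenced the gene list. The sketch below repeats the t-test ranking inside every leave-one-out fold. The nearest-centroid classifier, the function names, and the simulated data are assumptions chosen to keep the example short; they are not the method used in the slides.

```python
import numpy as np

def t_statistics(x, y):
    """Welch t-statistic per gene for two classes (labels in y are 0 or 1)."""
    a, b = x[y == 0], x[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se

def loocv_accuracy(x, y, n_features=20):
    """Leave-one-out cross-validation with gene selection redone in every fold."""
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        x_train, y_train = x[train], y[train]
        # Rank genes using the training samples only.
        top = np.argsort(-np.abs(t_statistics(x_train, y_train)))[:n_features]
        # Nearest-centroid classifier restricted to the selected genes.
        centroids = np.stack(
            [x_train[y_train == c][:, top].mean(axis=0) for c in (0, 1)]
        )
        distances = np.linalg.norm(centroids - x[i, top], axis=1)
        correct += int(np.argmin(distances) == y[i])
    return correct / len(y)

rng = np.random.default_rng(2)
x = rng.normal(size=(20, 1000))      # simulated: 20 patients, 1000 genes
y = np.array([0] * 10 + [1] * 10)    # simulated outcome labels
x[y == 1, :30] += 2.0                # 30 genes made informative on purpose

accuracy = loocv_accuracy(x, y)
print(accuracy)
```

The held-out patient never contributes to the t-statistics or to the centroids, so the reported accuracy is an estimate for unseen samples under the assumptions of the simulation.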
Promoter Analysis
• Genes that pass the significance test are clustered, and their
corresponding promoter regions are retrieved
• The regions are searched for potential transcription factor binding
sites that they have in common
• Saco-patterns looks for exactly identical patterns
• The Gibbs sampler allows for degeneracy of patterns, described by a
weight matrix
• TRANSFAC is a database of known transcription factor binding sites
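To make the weight matrix description concrete, the sketch below scores every window of a sequence against a small log-odds matrix. It shows only the scoring step, not the Gibbs sampling procedure that would estimate the matrix. The count matrix, the pseudocount, the uniform background, and the promoter fragment are all made up for illustration.

```python
import numpy as np

BASES = "ACGT"

# Hypothetical counts for a 4-bp motif: rows are positions, columns are A, C, G, T.
counts = np.array([
    [8, 0, 1, 1],   # mostly A
    [0, 9, 0, 1],   # mostly C
    [1, 0, 9, 0],   # mostly G
    [2, 1, 1, 6],   # T preferred, other bases tolerated (degenerate position)
], dtype=float)

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2 odds against a uniform background."""
    smoothed = counts + pseudocount
    freqs = smoothed / smoothed.sum(axis=1, keepdims=True)
    return np.log2(freqs / background)

def scan(sequence, pwm):
    """Return the log-odds score of every window of the sequence."""
    width = pwm.shape[0]
    index = [BASES.index(base) for base in sequence]
    return [
        sum(pwm[pos, index[start + pos]] for pos in range(width))
        for start in range(len(sequence) - width + 1)
    ]

pwm = log_odds_matrix(counts)
promoter = "TTACGTGGACGACC"   # made-up promoter fragment
scores = scan(promoter, pwm)
best = int(np.argmax(scores))
print(best, promoter[best:best + pwm.shape[0]])
```

An exact-match search would report only the window ACGT. With the weight matrix, the near match ACGA later in the fragment also receives a positive score, which is what allowing for degeneracy means in practice.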
Patterns can be assessed based on their overrepresentation in the cluster
relative to a background set.
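One common way to quantify overrepresentation is a hypergeometric tail probability; the slides do not say which test was used, so this choice is an assumption. The sketch treats the cluster as a sample drawn without replacement from the background set, and the counts in the usage lines are invented.

```python
from math import comb

def overrepresentation_p_value(cluster_hits, cluster_size,
                               background_hits, background_size):
    """P(X >= cluster_hits) when cluster_size promoters are drawn without
    replacement from a background in which background_hits promoters
    contain the pattern. Assumes the cluster is a subset of the background."""
    upper = min(cluster_size, background_hits)
    total = comb(background_size, cluster_size)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, cluster_size - k)
        for k in range(cluster_hits, upper + 1)
    )
    return tail / total

# Invented counts: the pattern occurs in 12 of 40 cluster promoters
# and in 300 of 6000 background promoters.
p = overrepresentation_p_value(12, 40, 300, 6000)
print(p)
```

A small value indicates that the pattern occurs in the cluster more often than expected from the background frequency. When many candidate patterns are tested, the resulting p-values need a multiple-testing correction.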