Multidimensional Analysis

If you are comparing more than two conditions (for example, 10 types of cancer) or looking at a time series (cell cycle or progression of cancer), you are looking at a multidimensional problem.

Example: 6000 genes in 10 patients
• 6000 points in 10-dimensional space (gene view)
• 10 points in 6000-dimensional space (patient view)

Reduction of dimensions:
• Principal Component Analysis (PCA)
• Clustering
• Correspondence Analysis

Patient view

Classification
1: patients surviving 5 years after breast cancer surgery
2: patients dead within 5 years of breast cancer surgery

Other classifiers
• Neural Networks
• Support Vector Machines
• Other classifiers from statistical literature

Issues in building a classifier
• Feature selection: a selected group of genes (chosen, for example, by t-test) may be optimal
• Independent validation: you must test the classifier on samples that were not used for feature selection or for building the classifier (training set/test set split or leave-one-out cross-validation)

Promoter Analysis
• Genes that pass the significance test are clustered, and their corresponding promoter regions are extracted.
• The regions are searched for potential transcription factor binding sites that they have in common.
• Saco-patterns looks for exactly identical patterns.
• The Gibbs sampler allows for degeneracy of patterns by describing them with a weight matrix.
• Transfac is a database of known transcription factor binding sites.
• Patterns can be assessed based on their overrepresentation in the cluster relative to a background set.
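The overrepresentation of a pattern in a cluster relative to a background set can be scored in several ways; the slides do not say which statistic is used. The sketch below assumes a one-sided hypergeometric test, a common choice for this kind of enrichment question, and also assumes that the cluster is a subset of the background set. The function name and its parameters are hypothetical and chosen only for illustration.

```python
from math import comb


def overrepresentation_p_value(cluster_hits, cluster_size,
                               background_hits, background_size):
    """One-sided hypergeometric p-value (an assumed scoring method).

    Probability of seeing at least `cluster_hits` promoters containing the
    pattern when `cluster_size` promoters are drawn without replacement from
    `background_size` promoters, of which `background_hits` contain the
    pattern. Assumes the cluster is a subset of the background set.
    """
    total = comb(background_size, cluster_size)
    # The number of hits in the cluster cannot exceed either bound.
    upper = min(cluster_size, background_hits)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, cluster_size - k)
        for k in range(cluster_hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: the pattern occurs in 8 of 20 cluster promoters
# and in 300 of 6000 background promoters.
p = overrepresentation_p_value(8, 20, 300, 6000)
print(f"p-value: {p:.3g}")
```

A small p-value suggests the pattern occurs in the cluster more often than chance alone would explain. When many candidate patterns are tested, the p-values would also need a multiple-testing correction.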