Analyzing Microarray Data with Methods from Statistics and Machine Learning
B-IT IPEC Winter School 2008
Prof. Dr. A. B. Cremers
Jörg Zimmermann

DNA Microarray Data
• Genome chips containing a collection of microscopic DNA spots
• Simultaneous determination of > 10^5 gene expression levels
• Dramatic acceleration of data acquisition
• New possibilities for disease diagnosis, treatment studies, network analysis, …

DNA Microarray Data
The resulting data have the form:

\[
\begin{array}{cccc|c}
x_{11} & x_{12} & \dots & x_{1n} & (L_1) \\
\vdots & \vdots &       & \vdots & \vdots \\
x_{p1} & x_{p2} & \dots & x_{pn} & (L_p)
\end{array}
\]

n = number of measured cell states (e.g. gene expression levels)
p = number of samples
x_{ij} = real number, e.g. representing the expression level of gene j in sample i
L_i = label of sample i

Challenges for Data Analysis
• Normalization (removing systematic measurement effects)
• Variable selection (identification of relevant variables)
• Large sample effects: Type I and Type II errors (false positives / false negatives)
• Dimensionality reduction
• Identification of new disease classes
• Classification of data into known disease classes

Cluster Analysis
Finding structure in data without labels (unsupervised learning).
Does a cluster characterize a (new) disease type?
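To make the cluster-analysis idea concrete, here is a minimal sketch that groups the rows of a small p × n expression matrix without using any labels. It uses k-means as one common choice of clustering method. The toy values, the function names, and the initialisation from the first k samples are assumptions made for illustration (the initialisation keeps the run deterministic; random restarts are more usual in practice).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(samples, k, n_iter=100):
    """Cluster the rows of a p x n matrix into k groups.

    Assumption: centroids start at the first k samples, which is a
    simplification chosen here to keep the example deterministic.
    """
    centroids = [list(s) for s in samples[:k]]
    assignment = [None] * len(samples)
    for _ in range(n_iter):
        # Assignment step: each sample goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(s, centroids[c]))
            for s in samples
        ]
        if new_assignment == assignment:
            break  # no sample changed cluster, so the procedure has converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [s for s, a in zip(samples, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical toy data: p = 6 samples, n = 3 genes, two obvious groups.
X = [
    [0.1, 0.2, 0.0],
    [5.0, 5.1, 4.9],
    [0.0, 0.1, 0.2],
    [5.1, 4.9, 5.0],
    [0.2, 0.0, 0.1],
    [4.9, 5.0, 5.1],
]

labels, centroids = k_means(X, k=2)
print(labels)
```

Whether a cluster found this way corresponds to a (new) disease type is not something the algorithm can decide; that still requires biological validation.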
Prediction Problem
• Classify data into known disease classes: supervised learning
• Split data into training and test set
• Learn a model on the training set
• Evaluate the model on the test set

Prediction Problem
[Figure: under- and overlearning]

Data Analysis Methods
Dimension Reduction
• PCA (Principal Component Analysis)
• ICA (Independent Component Analysis)
• Multidimensional Scaling
Unsupervised Learning
• K-Means / K-Medoid
• Hierarchical Clustering Algorithms
Supervised Learning
• Linear Discriminant Analysis
• Maximum Likelihood Discrimination
• Nearest Neighbor Methods
• Decision Trees
• Random Forests

Organisation
Schedule: 31.3.2008 – 4.4.2008, B-IT Building
Language: German and English (slides in English)
Talk: 45 min + 15 min discussion
Documentation: 10 – 15 pages (German or English)
Area (Bereich, DPO Bonn): B
Background literature: Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer, 2001
Contact: [email protected]
Summer Course: Gene Mining and Network Analysis
Summer School: Programming Data Analysis Algorithms with R
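The four steps of the prediction problem (split, learn, evaluate) can be sketched in a few lines. The example below uses a 1-nearest-neighbor classifier as one of the supervised methods listed above. The toy data, the class names "healthy" and "tumor", the helper names, and the per-class (stratified) split are assumptions made for illustration and are not taken from the course material.

```python
import random
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def stratified_split(labels, test_fraction=0.5, seed=0):
    """Return (train_indices, test_indices), splitting each class separately
    so that every class appears in the training set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = int(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


def predict_1nn(train_x, train_y, x):
    """Predict the label of x as the label of its nearest training sample."""
    nearest = min(range(len(train_x)),
                  key=lambda i: squared_distance(train_x[i], x))
    return train_y[nearest]


# Hypothetical toy data: 8 samples, 3 genes, two well-separated classes.
X = [
    [0.1, 0.2, 0.0], [0.0, 0.1, 0.2], [0.2, 0.0, 0.1], [0.1, 0.1, 0.1],
    [5.0, 5.1, 4.9], [5.1, 4.9, 5.0], [4.9, 5.0, 5.1], [5.0, 5.0, 5.0],
]
y = ["healthy"] * 4 + ["tumor"] * 4

# 1. Split the data into training and test set.
train_idx, test_idx = stratified_split(y, test_fraction=0.5)
train_x = [X[i] for i in train_idx]
train_y = [y[i] for i in train_idx]

# 2. "Learning" for 1-NN just means storing the training set.
# 3. Evaluate on the held-out test set only.
correct = sum(predict_1nn(train_x, train_y, X[i]) == y[i] for i in test_idx)
accuracy = correct / len(test_idx)
print(f"test accuracy: {accuracy:.2f}")
```

Evaluating on samples that were not used for training is what allows overlearning to be detected: a model that merely memorises the training set would score perfectly there but poorly on the test set.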