Analyzing Microarray Data with Methods from Statistics and Machine Learning
B-IT IPEC Winter School 2008
Prof. Dr. A. B. Cremers
Jörg Zimmermann
DNA Microarray Data
• Genome chips containing a collection of microscopic DNA spots
• Simultaneous determination of more than 10⁵ gene expression levels
• Dramatic acceleration of data acquisition
• New possibilities for disease diagnosis, treatment studies, network analysis, …
DNA Microarray Data
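The resulting data have the form:

$$
\begin{array}{cccc|c}
x_{11} & x_{12} & \cdots & x_{1n} & L_1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
x_{p1} & x_{p2} & \cdots & x_{pn} & L_p
\end{array}
$$

$n$ = number of measured cell states (e.g. gene expression levels)
$p$ = number of samples
$x_{ij}$ = real number, e.g. representing the expression level of gene $j$ in sample $i$
$L_i$ = label of sample $i$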
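As a minimal sketch of this layout (one row per sample, one column per measured variable, one label per row), the container below checks the shapes and extracts a single gene's column. The class name `ExpressionData` and the toy values are hypothetical and for illustration only; indices are 0-based, unlike the 1-based $x_{ij}$ above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExpressionData:
    """Hypothetical container: p samples (rows) x n variables (columns)."""
    x: List[List[float]]  # x[i][j]: value of variable j in sample i (0-based)
    labels: List[str]     # labels[i]: label L_i of sample i

    def __post_init__(self) -> None:
        if len(self.x) != len(self.labels):
            raise ValueError("need exactly one label per sample (row)")
        if len({len(row) for row in self.x}) > 1:
            raise ValueError("all samples must measure the same n variables")

    @property
    def p(self) -> int:
        """Number of samples (rows)."""
        return len(self.x)

    @property
    def n(self) -> int:
        """Number of measured variables (columns)."""
        return len(self.x[0]) if self.x else 0

    def variable(self, j: int) -> List[float]:
        """Column j: one variable (e.g. one gene) across all p samples."""
        return [row[j] for row in self.x]


# Toy values for illustration only: p = 3 samples, n = 4 genes.
data = ExpressionData(
    x=[[2.1, 0.4, 5.0, 1.2],
       [1.9, 0.7, 4.6, 1.0],
       [3.3, 0.2, 5.8, 0.9]],
    labels=["tumor", "normal", "tumor"],
)
print(data.p, data.n)    # 3 4
print(data.variable(2))  # [5.0, 4.6, 5.8]
```

Note that for real chips $n$ is typically far larger than $p$, which is what makes several of the challenges listed next difficult.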
Challenges for Data Analysis
• Normalization (removing systematic measurement effects)
• Variable selection (identification of relevant variables)
• Large sample effects:
Type I and Type II errors (false positives / false negatives)
• Dimensionality Reduction
• Identification of new disease classes
• Classification of data into known disease classes
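When thousands of variables are tested one at a time, Type I errors accumulate even if no variable is truly relevant. The arithmetic below is a sketch assuming $m$ independent tests of true null hypotheses, each at level $\alpha$; the Bonferroni correction it shows is one standard (conservative) remedy and is not named on the slide. The values $m = 10{,}000$ and $\alpha = 0.05$ are hypothetical.

```python
def expected_false_positives(m: int, alpha: float) -> float:
    """Expected number of Type I errors among m tests of true null
    hypotheses, each at level alpha (p-values uniform under H0)."""
    return m * alpha


def prob_at_least_one_false_positive(m: int, alpha: float) -> float:
    """Family-wise error rate for m *independent* tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(m: int, alpha: float) -> float:
    """Per-test level that keeps the family-wise error rate <= alpha."""
    return alpha / m


m, alpha = 10_000, 0.05  # hypothetical: 10,000 genes, each tested at 5 %
print(expected_false_positives(m, alpha))          # 500.0
print(prob_at_least_one_false_positive(m, alpha))  # practically 1
print(bonferroni_threshold(m, alpha))              # about 5e-06
```

The stricter per-test threshold lowers false positives at the price of more false negatives (Type II errors), which is exactly the trade-off named on the slide.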
Cluster Analysis
Finding structure in data without labels (unsupervised learning)
Does a cluster characterize a (new) disease type?
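A minimal sketch of cluster analysis on the samples (rows of the data matrix), using k-means via Lloyd's iteration, one of the methods listed later in the slides. The random initialisation, the `seed` parameter and the empty-cluster handling are assumptions of this sketch; the resulting clusters would then be compared with known labels to ask whether one of them suggests a new disease type.

```python
import random
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two samples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroid(points: List[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of samples."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def k_means(points: List[Sequence[float]], k: int, n_iter: int = 100,
            seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's iteration: assign each sample to its nearest centroid, then
    move each centroid to the mean of its samples, until nothing changes."""
    rng = random.Random(seed)
    # Assumed initialisation: k samples drawn without replacement.
    centroids = [list(c) for c in rng.sample(points, k)]
    assignment: List[int] = []
    for _ in range(n_iter):
        new_assignment = [min(range(k), key=lambda c: sq_dist(x, centroids[c]))
                          for x in points]
        if new_assignment == assignment:
            break  # converged: no sample changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [x for x, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = centroid(members)
    return assignment, centroids


# Toy samples with two well-separated groups (illustration only).
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
cluster_ids, centres = k_means(samples, k=2)
```

k-means only finds a local optimum that depends on the starting centroids, so in practice it is usually run from several initialisations.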
Prediction Problem
• Classify data into known disease classes (supervised learning)
• Split data into a training set and a test set
• Learn a model on the training set
• Evaluate the model on the test set
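The three steps (split, learn, evaluate) can be sketched as follows. The model is a nearest-centroid classifier, a stand-in chosen for brevity rather than one of the methods on the slides; `test_fraction`, `seed` and all function names are assumptions of this sketch, and the data are toy values.

```python
import random
from typing import Dict, List, Sequence, Tuple


def train_test_split(x: List[Sequence[float]], y: List[str],
                     test_fraction: float = 0.3, seed: int = 0) -> Tuple:
    """Randomly partition the samples into disjoint training and test sets."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(test_fraction * len(x)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([x[i] for i in train_idx], [y[i] for i in train_idx],
            [x[i] for i in test_idx], [y[i] for i in test_idx])


def fit_nearest_centroid(x_train, y_train) -> Dict[str, List[float]]:
    """Learn one mean vector per class, using the training set only."""
    groups: Dict[str, List[Sequence[float]]] = {}
    for xi, yi in zip(x_train, y_train):
        groups.setdefault(yi, []).append(xi)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(model: Dict[str, List[float]], xi: Sequence[float]) -> str:
    """Return the class whose centroid is closest (squared Euclidean)."""
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(xi, model[label])))


def accuracy(model, x_test, y_test) -> float:
    """Fraction of test samples whose predicted label is correct."""
    hits = sum(predict(model, xi) == yi for xi, yi in zip(x_test, y_test))
    return hits / len(y_test)


# Toy data (illustration only): two well-separated classes.
x = [[0.1, 0.2], [0.3, 0.0], [0.0, 0.4], [0.2, 0.1], [0.4, 0.3],
     [5.0, 5.2], [5.3, 4.9], [4.8, 5.1], [5.1, 5.0], [4.9, 4.8]]
y = ["A"] * 5 + ["B"] * 5
x_tr, y_tr, x_te, y_te = train_test_split(x, y)
model = fit_nearest_centroid(x_tr, y_tr)
print(accuracy(model, x_te, y_te))  # 1.0 on this separable toy data
```

The key design point is that the test samples never influence `fit_nearest_centroid`; otherwise the reported accuracy would be optimistically biased.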
Prediction Problem
Under- and overlearning (underfitting and overfitting):
Data Analysis Methods
Dimension Reduction
• PCA (Principal Component Analysis)
• ICA (Independent Component Analysis)
• Multidimensional Scaling
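A minimal PCA sketch: centre each variable, form the sample covariance matrix and find its leading eigenvector (the first principal component) by power iteration. This is an illustrative assumption, not the course's implementation; for real microarray data with $n \gg p$ one would not build the $n \times n$ covariance matrix but use a singular value decomposition of the centred $p \times n$ matrix instead.

```python
import math
from typing import List, Sequence, Tuple


def first_principal_component(x: List[Sequence[float]], n_iter: int = 1000,
                              tol: float = 1e-12
                              ) -> Tuple[List[float], float, List[float]]:
    """Return (PC1 direction, its variance, per-sample scores)."""
    p, n = len(x), len(x[0])
    means = [sum(col) / p for col in zip(*x)]
    xc = [[v - m for v, m in zip(row, means)] for row in x]  # centre variables
    # Sample covariance matrix (n x n) of the variables.
    cov = [[sum(row[a] * row[b] for row in xc) / (p - 1) for b in range(n)]
           for a in range(n)]
    v = [1.0 / math.sqrt(n)] * n  # start vector, assumed not orthogonal to PC1
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            break  # no variance along v
        w = [wi / norm for wi in w]
        converged = max(abs(wi - vi) for wi, vi in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient v^T C v: variance of the data along v.
    eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(n))
                 for a in range(n))
    scores = [sum(r * c for r, c in zip(row, v)) for row in xc]  # projections
    return v, eigval, scores


# Toy data (illustration only): points on the line x2 = x1.
pc1, var1, scores = first_principal_component([[1, 1], [2, 2], [3, 3], [4, 4]])
```

Here `pc1` points along the diagonal and `var1` equals the total variance 10/3, since all variation lies on that line; the `scores` are the one-dimensional coordinates used for plotting or further analysis.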
Unsupervised Learning
• K-Means / K-Medoid
• Hierarchical Clustering Algorithms
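As one example of a hierarchical clustering algorithm, the sketch below performs naive agglomerative clustering with single linkage (cluster distance = smallest pairwise distance) and records the merge history, i.e. the information a dendrogram displays. The choice of single linkage, Euclidean distance and the `cut` helper are assumptions of this sketch; complete or average linkage are equally common.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple

Merge = Tuple[FrozenSet[int], FrozenSet[int], float]


def single_linkage(points: List[Sequence[float]]) -> List[Merge]:
    """Naive agglomerative clustering (roughly O(p^3) distance evaluations).

    Returns the merges in order as (cluster_a, cluster_b, distance), where
    clusters are frozensets of sample indices."""
    clusters = [frozenset([i]) for i in range(len(points))]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


def cut(merges: List[Merge], n_points: int, threshold: float) -> List[List[int]]:
    """Clusters obtained by applying only merges at distance <= threshold
    (valid because single-linkage merge distances never decrease)."""
    clusters = {frozenset([i]) for i in range(n_points)}
    for a, b, d in merges:
        if d > threshold:
            break
        clusters -= {a, b}
        clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


# Toy one-dimensional samples (illustration only).
toy = [[0.0], [1.0], [5.0], [6.0], [20.0]]
history = single_linkage(toy)
```

Cutting the tree at different heights yields different numbers of clusters, which is why hierarchical methods do not require fixing k in advance.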
Supervised Learning
• Linear Discriminant Analysis
• Maximum Likelihood Discrimination
• Nearest Neighbor Methods
• Decision Trees
• Random Forests
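Of the supervised methods listed, nearest neighbour classification is the simplest to sketch: a new sample receives the majority label among its k closest training samples. Euclidean distance and k = 3 are assumptions of this sketch, and the training data are toy values; with two classes and an odd k the vote cannot tie.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(x_train: List[Sequence[float]], y_train: List[str],
                query: Sequence[float], k: int = 3) -> str:
    """Majority label among the k training samples closest to `query`."""
    # Sort (distance, label) pairs so the nearest neighbours come first.
    neighbours = sorted((math.dist(xi, query), yi)
                        for xi, yi in zip(x_train, y_train))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Toy training data (illustration only).
x_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y_train = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(x_train, y_train, [0.5, 0.5]))  # A
print(knn_predict(x_train, y_train, [5.5, 5.5]))  # B
```

The method has no training phase beyond storing the data; k controls the trade-off between under- and overlearning (k = 1 memorises the training set).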
Organisation
Schedule:
31.3.2008 – 4.4.2008, B-IT Building
Language: German and English (slides in English)
Talk: 45 min + 15 min discussion
Documentation: 10–15 pages (German or English)
Area (DPO Bonn): B
Background Literature:
Hastie, Tibshirani, Friedman:
The Elements of Statistical Learning, Springer, 2001
Contact: [email protected]
Summer Course: Gene Mining and Network Analysis
Summer School: Programming Data Analysis Algorithms with R