Development and Differentiation: The Princeton Stem Cell Data Set
Chris Stoeckert
Center for Bioinformatics, University of Pennsylvania
GPBA Workshop, October 13, 2003

Science 298:601-604, 2002

Relevant Analysis Points
• Affymetrix
  – Single channel
  – Probe sets & MAS 4.0
• Biology-based classification
  – Organize genes into clusters
• Comparison of gene lists
  – Mostly looking at overlap
  – Significance of overlap
• RT-PCR validation
  – Independent corroboration

Affymetrix Probe Sets & MAS 4
• Probe sets
  – Targeted region of gene covered with 16 overlapping 25 nt oligos. (PM)
  – Each is matched with an oligo with a mismatch in the middle. Used to subtract out non-specific signal. (MM)
• MAS 4: P/A calls (Present/Absent/Marginal)
  – Proprietary heuristic
  – Replaced by a statistical test in MAS 5.
• MAS 4: AvgDiff (expression level estimator)
  – Average difference of PM-MM. Can be negative!
  – Replaced in MAS 5.
• MAS 4: logFC (log base 2 fold change)
  – Ratio of a sample to a reference
  – Normalize by global scaling (multiplying by a constant based on trimmed mean intensity)

How Affymetrix data was used in this study
• P/A calls: use replicates to identify expressed genes
  – Eliminate from consideration genes that are not expressed in any sample
  – Compare results with EST-based methods
• AvgDiff: means of replicates used for fold change determinations.
• logFC: eliminate from consideration genes that don't vary. Use for classification.
  – logFC relative to bone marrow mature blood cells
  – Must be > 2X difference between highest and lowest signals
  – Fold change must be greater than the noise

Reality Check
From about 36,000 probe sets on mouse U74 v2 set (A, B, C), ended up with ~4,290 for fetal and adult HSCs.

Biologically-based classification
• Created virtual genes (vectors) for bone marrow (~50) and fetal (14) with ratios relative to mature blood cells expected for different hematopoietic stages.
• Associated real genes with these based on closest correlation (Pearson).
• Combined these to create 7 clusters

Generating Clusters
[Figure: vector for an HSC gene. Y-axis: logFC, from 2 to -2. X-axis: LT-HSC, ST-HSC, LMP, MBC.]

Vectors from Affy data

Adult and fetal HSC comparisons

Mouse and human HSC common genes

Likelihood of seeing mouse-human overlap
From Mathematica 4.1: "The hypergeometric distribution HypergeometricDistribution[n, n_succ, n_tot] is used in place of the binomial distribution for experiments in which the n trials correspond to sampling without replacement from a population of size n_tot with n_succ potential successes."

Comparison of different stem cell types
Science 298:597-600, 2002

How many of the stem cell genes were common between the two studies?
• What was in common experimentally?
• What was different:
  – Experimentally?
  – Computationally?
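
Worked sketch: significance of gene-list overlap
The hypergeometric distribution quoted above for the mouse-human overlap can be turned into a tail probability. The sketch below assumes the setup in the Mathematica quote: n draws without replacement from n_tot items, of which n_succ count as successes. The chance of seeing at least k_obs shared genes is then the upper tail of that distribution. The function name overlap_p_value and the numbers in the usage line are hypothetical, chosen only so the arithmetic can be checked by hand. They are not values from either study, and this is not necessarily how the authors computed their significance.

```python
from math import comb


def overlap_p_value(n_tot, n_succ, n, k_obs):
    """Upper-tail hypergeometric probability P(X >= k_obs).

    n_tot  : size of the population (e.g. genes that could appear on either list)
    n_succ : number of "successes" in the population (e.g. genes on list A)
    n      : number of draws without replacement (e.g. size of list B)
    k_obs  : observed overlap between the two lists

    Argument names follow the Mathematica quote; mapping them onto
    gene lists is an assumption made for illustration.
    """
    lowest = max(k_obs, 0)
    highest = min(n_succ, n)  # the overlap cannot exceed either list
    total = comb(n_tot, n)    # all ways to draw n items from the population
    tail = sum(
        comb(n_succ, k) * comb(n_tot - n_succ, n - k)
        for k in range(lowest, highest + 1)
    )
    return tail / total


# Hypothetical toy numbers: population of 10, 4 successes, 3 draws,
# at least 2 successes observed. By hand: (6*6 + 4*1) / 120 = 40/120.
print(overlap_p_value(10, 4, 3, 2))  # one third
```

Exact integer binomials keep the sum free of rounding error until the final division, which matters when the overlap is far out in the tail. The result depends heavily on the choice of n_tot (all probe sets, or only those with an ortholog on both arrays), so that choice should be stated alongside any p-value.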