Advanced Statistical Methods: Beyond Linear Regression
John R. Stevens, Utah State University
Mathematics Educators Workshop, 28 March 2009
Notes 1. Case Study Data Sets
http://www.stat.usu.edu/~jrstevens/pcmi

Why this workshop?
  Me …
    Outreach mission of USU
    Recruitment – undergraduate & graduate
    Too much fun
  You …

Outline
  Notes 1: Case Study Data Sets
    1. Challenger Explosion
    2. Beetle Fumigation
    3. T-cell Cancer
  Notes 2: Statistical Methods I
    Logistic Regression – incl. Separation of Points
    EM Algorithm
  Notes 3: Statistical Methods II
    Tests for Differential Expression
    Multiple hypothesis testing
    Visualization
    Machine Learning
  Notes 4: Computer Implementation
  (Notes 5): Bonus Material

Case Study 1: Challenger
  The January 28, 1986 explosion prompted the Presidential Commission on the Space Shuttle Challenger Accident
  The Commission's 1986 report attributed the explosion to a burn-through of an O-ring seal at a field joint in one of the solid-fuel rocket boosters
  After each of the previous 24 launches, the solid rocket boosters were inspected, and the presence or absence of damage to the field joint was noted

Challenger Data
  Motivating question: What was so different on the 25th launch?
  Obs  Flight  Temp (°F)  Damage
    1  STS1       66      NO
    2  STS9       70      NO
    3  STS51B     75      NO
    4  STS2       70      YES
    5  STS41B     57      YES
    6  STS51G     70      NO
    7  STS3       69      NO
    8  STS41C     63      YES
    9  STS51F     81      NO
   10  STS4       80      (not listed)
   11  STS41D     70      YES
   12  STS51I     76      NO
   13  STS5       68      NO
   14  STS41G     78      NO
   15  STS51J     79      NO
   16  STS6       67      NO
   17  STS51A     67      NO
   18  STS61A     75      YES
   19  STS7       72      NO
   20  STS51C     53      YES
   21  STS61B     76      NO
   22  STS8       73      NO
   23  STS51D     67      NO
   24  STS61C     58      YES

Case Study 2: Beetle Fumigation – Rhyzopertha dominica
  (Image courtesy Clemson University – USDA Cooperative Extension Slide Series, www.insectimages.org)

Motivation
  Beetle: lesser grain borer
    A primary pest of stored grain
    A year-round problem in moderate climates
  Australian grain industry: $6–8 billion
    Zero tolerance for insect-infested grain
  Phosphine fumigant for control
    Some beetles have developed resistance levels more than 235 times greater than normal
  (UQ News Online, 18 Oct. 1999)

Experimental Background
  Two DNA markers linked to resistance
    rp6.79: two genotypes: –, +
    rp5.11: three genotypes: B, H, A
  Motivating question: What contributes to the degree of resistance?
  Mixture of six beetle genotypes
    exposure to various concentrations of fumigant (48 hours)

Experimental Data
  Phosphine   Total       Total    Total       Survivors Observed at Genotype
  Dosage      Receiving   Deaths   Survivors   -/B  -/H  -/A  +/B  +/H  +/A
  (mg/L)      Dosage
  0               98          0       98        31   27   10    6   20    4
  0.003          100         16       84        18   26   10    6   20    4
  0.004          100         68       32        10    4    3    5    7    4
  0.005          100         78       22         1    4    7    2    6    2
  0.01           100         77       23         0    1    9    8    5    0
  0.05           300        270       30         0    0    0    5   20    5
  0.1            400        383       17         0    0    0    0   10    7
  0.2            750        740       10         0    0    0    0    0   10
  0.3            500        490       10         0    0    0    0    0   10
  0.4            500        492        8         0    0    0    0    0    8
  1.0          7,850      7,806       44         0    0    0    0    0   44
  Total       10,798     10,420      378

Practical Considerations in Choosing Dosage
  Clearly a high dosage would kill all beetles, regardless of genotype
  Time: more important than concentration
  Expense: more time with lower dose
  Technical limitations: maintain concentration in silos
  Safety: spontaneous combustion at high conc.
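
As a first look at the fumigation question, the counts in the Experimental Data table can be summarized by dose. The sketch below is only an illustration, not the workshop's own analysis: it computes the observed mortality proportion at each dose and, for each genotype, the highest dose at which any survivor was observed. The numbers are transcribed from the table; the variable and function names are assumptions made for this example.

```python
# Values transcribed from the Experimental Data slide; all names are illustrative.
DOSES = [0, 0.003, 0.004, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 1.0]  # mg/L
EXPOSED = [98, 100, 100, 100, 100, 300, 400, 750, 500, 500, 7850]
DEATHS = [0, 16, 68, 78, 77, 270, 383, 740, 490, 492, 7806]

GENOTYPES = ["-/B", "-/H", "-/A", "+/B", "+/H", "+/A"]
SURVIVORS = [  # one row per dose, one column per genotype
    [31, 27, 10, 6, 20, 4],
    [18, 26, 10, 6, 20, 4],
    [10, 4, 3, 5, 7, 4],
    [1, 4, 7, 2, 6, 2],
    [0, 1, 9, 8, 5, 0],
    [0, 0, 0, 5, 20, 5],
    [0, 0, 0, 0, 10, 7],
    [0, 0, 0, 0, 0, 10],
    [0, 0, 0, 0, 0, 10],
    [0, 0, 0, 0, 0, 8],
    [0, 0, 0, 0, 0, 44],
]


def mortality_rates(exposed, deaths):
    """Proportion of exposed beetles that died at each dose."""
    return [d / n for d, n in zip(deaths, exposed)]


def highest_dose_survived(doses, survivors, genotypes):
    """For each genotype, the largest dose at which a survivor was observed."""
    result = {}
    for j, genotype in enumerate(genotypes):
        survived_at = [dose for dose, row in zip(doses, survivors) if row[j] > 0]
        result[genotype] = max(survived_at)
    return result


RATES = mortality_rates(EXPOSED, DEATHS)
MAX_DOSE = highest_dose_survived(DOSES, SURVIVORS, GENOTYPES)

for dose, rate in zip(DOSES, RATES):
    print(f"{dose:>6} mg/L  mortality = {rate:.3f}")
for genotype in GENOTYPES:
    print(f"{genotype}: survivors observed up to {MAX_DOSE[genotype]} mg/L")
```

In these data every survivor at 0.2 mg/L and above has genotype +/A, which is the kind of pattern the motivating question asks about.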
Case Study 3: T-cell Cancer
  Acute lymphoblastic leukemia (ALL)
    leukemia – cancer of white blood cells
    ALL – excess of lymphoblasts (immature cells that become white blood cells)
  Two types of interest here:
    T-cell – manage cell-mediated immune response (activation of cells, release of cytokines)
    B-cell – manage humoral immune response (secretion of antibodies)
  Researchers used gene expression technology

Central Dogma of Molecular Biology

General assumption of microarray technology
  Use mRNA transcript abundance level as a measure of the level of “expression” for the corresponding gene
    Proportional to degree of gene expression

How to measure mRNA abundance?
  Several different approaches with similar themes:
    Affymetrix GeneChip
    Nimblegen array
    Two-color cDNA array
    more
  Representation of genes on slide
    Small portion of gene
    Larger sequence of gene

Affymetrix Probes
  oligonucleotide arrays
  25 bp
  (Images courtesy Affymetrix, www.affymetrix.com)

Affymetrix Technology – GeneChip
  Each spot on array represents a single probe sequence (with millions of copies)
    Perfect match
    Mismatch
  Each gene is represented by a unique set of probe pairs (usually 12-20 probe pairs per probe set)
  These probes are fixed to the array
  (Image courtesy Affymetrix, www.affymetrix.com)

Affymetrix Technology – Expression
  A tissue sample is prepared so that its mRNA has fluorescent tags; wait for hybridization
  (Images courtesy Affymetrix, www.affymetrix.com)

Affymetrix GeneChip
  (Image courtesy Affymetrix, www.affymetrix.com)

Cartoon Representations
  Animation 1: GeneChip structure (1 min.)
  Animation 2: Measuring gene expression (2.5 min.)

Data: Spot Intensities
  Full Array Image
  Close-up of Array Image
  (Images courtesy Affymetrix, www.affymetrix.com)

Basic goal of microarray technology
  “Observe” gene expression in different conditions – healthy vs. diseased, e.g.
  Decide which genes’ expression levels are changing significantly between conditions
  Target those genes – to halt disease, e.g.
  Study those genes – to better understand differences at the genetic level

ALL Data
  “Preprocessed” gene expression data
    12625 genes (hgu95av2 Affymetrix GeneChip)
    128 samples (arrays)
    a matrix of “expression values” – 128 cols, 12625 rows
  phenotypic data on all 128 patients, including:
    95 B-cell cancer
    33 T-cell cancer
  Motivating question: Which genes are changing expression values systematically between B-cell and T-cell groups?

Next …
  Analysis for these case studies
    Build on known statistical methods
    Notice huge potential for additional methods
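
To make the ALL motivating question concrete, one known method that could be applied gene by gene is a two-sample t-statistic comparing the B-cell and T-cell columns of the expression matrix. The sketch below is a hedged illustration only. It uses a small simulated matrix rather than the real ALL data, keeps the 95/33 group sizes from the slide, and plants one shifted gene so the idea is visible. The Welch form of the statistic, the function names, the 20-gene size, and the shift are all assumptions made for this example.

```python
import random
import statistics


def welch_t(x, y):
    """Welch two-sample t-statistic for two lists of expression values."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se


def simulate_expression(n_genes, n_b, n_t, shifted_gene, shift, seed=0):
    """Simulated (not real) matrix: rows are genes, columns are samples.

    The first n_b columns play the role of B-cell samples and the last n_t
    columns the T-cell samples. Only `shifted_gene` differs between groups.
    """
    rng = random.Random(seed)
    rows = []
    for g in range(n_genes):
        offset = shift if g == shifted_gene else 0.0
        b_values = [rng.gauss(0, 1) for _ in range(n_b)]
        t_values = [rng.gauss(offset, 1) for _ in range(n_t)]
        rows.append(b_values + t_values)
    return rows


N_B, N_T = 95, 33  # group sizes from the ALL Data slide
MATRIX = simulate_expression(n_genes=20, n_b=N_B, n_t=N_T, shifted_gene=7, shift=3.0)

# One statistic per gene (row): B-cell columns versus T-cell columns.
T_STATS = [welch_t(row[:N_B], row[N_B:]) for row in MATRIX]
TOP_GENE = max(range(len(T_STATS)), key=lambda g: abs(T_STATS[g]))

print("gene with largest |t|:", TOP_GENE)
```

With 12625 genes instead of 20, the same loop yields 12625 statistics, which is why multiple hypothesis testing appears in the outline for Notes 3.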