Good Statistics Can Help You Discover Biological Process Using Modern Genomic Assays
Tom Downey
President, Partek Inc.
Copyright © 2009, All Rights Reserved. Partek Inc.

GoToWebinar Attendee Interface
1. Viewer Window
2. Control Panel

Who is Partek?
• Founded in 1993
• Based in St. Louis, MO USA
• Focused on Genomics
• Thousands of customers worldwide
• Building tools for both biologists and bioinformaticians

Partek® GS™ - Software for Integrated Genomics
• Multi-Platform Support
• Multi-Assay Support
  • Gene Expression
  • Alternative Splicing
  • miRNA
  • Copy Number Variation
  • ChIP-Chip, ChIP-Seq
  • RNA-Seq

Partek® GS™ for Integrated Genomics
Microarray & Next Generation Sequencing
Genome
• Copy Number
• Total & Allele Specific
• Association
• Loss of Heterozygosity
Transcriptome
• Gene Expression
• Exon/Alternative Splicing
• DGE & mRNA-Seq
Regulation
• ChIP-Chip
• ChIP-Seq
• microRNA

What is Statistics?
• Statistics (Source: www.m-w.com): a branch of mathematics dealing with collection, analysis, interpretation, and presentation of masses of numerical data

The Role of Experimental Design
• The goal of statistics is to find signals in a sea of noise
• The goal of experimental design is to reduce that noise so true biological signals can be found with as small a sample size as possible

Example 1: ER+ Breast Cancer Study
Evaluate expression of ER+ breast cancer cell lines
• 5 treatments
• 3 time points
• 2 replicates

Number of samples per treatment and time point:

        Control  Estrogen (E2)  E2 + ICI  E2 + Raloxifene  E2 + Tamoxifen
0 hr    2        0              0         0                0
8 hr    0        2              2         2                2
48 hr   0        2              2         2                2

As Seen Using PCA
• Samples clustered by batches “A” and “B”
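To illustrate why samples can cluster by batch in a PCA plot, here is a minimal Python sketch. It is not the Partek implementation, and every number in it is a synthetic assumption: 9 samples per batch, 5 hypothetical genes, a batch shift of 1.2 log2 units, and no treatment effect at all. It finds the first principal component by power iteration on the covariance matrix and shows that the two batches fall on opposite sides of it.

```python
import random
from math import sqrt


def simulate_expression(seed=1, n_genes=5, batch_shift=1.2, noise_sd=0.2):
    """Return (batch_label, expression_vector) pairs; all values are synthetic."""
    rng = random.Random(seed)
    samples = []
    for batch, shift in (("A", 0.0), ("B", batch_shift)):
        for _ in range(9):  # 9 hypothetical samples per batch
            profile = [8.0 + shift + rng.gauss(0, noise_sd) for _ in range(n_genes)]
            samples.append((batch, profile))
    return samples


def first_pc_scores(matrix, n_iter=200):
    """Project each row onto the first principal component (power iteration)."""
    n, p = len(matrix), len(matrix[0])
    col_means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - col_means[j] for j in range(p)] for row in matrix]
    # Sample covariance matrix of the genes (p x p).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # starting vector; repeated multiplication converges to PC1
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]


samples = simulate_expression()
scores = first_pc_scores([profile for _, profile in samples])
for (batch, _), score in zip(samples, scores):
    print(batch, round(score, 2))
```

In this sketch the batch shift is the largest source of variance, so the first component separates batch “A” from batch “B” even though no biological signal was simulated.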
As Seen Using Hierarchical Clustering

Good News!
• Since the treatments were perfectly balanced with the batches, the batch effect can be completely removed from the data.
• With a simple 2-way ANOVA, this gene was #228 on the gene list and would not pass multiple test correction for significance. With a 3-way ANOVA including batch, it was #2 on the gene list.

p-values for this gene:

Factor            2-way ANOVA   3-way ANOVA
Treatment         0.00391497    3.43275e-007
Time              0.396031      0.00964938
Treatment*Time    0.100862      3.56752e-005

Balanced with respect to Batch

Monday
        Control  Estrogen (E2)  E2 + ICI  E2 + Raloxifene  E2 + Tamoxifen
0 hr    1
8 hr             1              1         1                1
48 hr            1              1         1                1

Tuesday
        Control  Estrogen (E2)  E2 + ICI  E2 + Raloxifene  E2 + Tamoxifen
0 hr    1
8 hr             1              1         1                1
48 hr            1              1         1                1

#2 Most Significant Gene
• MedianA=8.5, MedianB=9.7
• Tue/Mon more than 2-fold
• Monday on top, Tuesday on bottom

ER+ B.C. Unadjusted Signal

ER+ B.C. Batch-Adjusted Signal

Experiment Design: Consider Batch Effects

How NOT to Ruin Your Next Experiment!
• Samples are frequently “organized” by treatment groups.
• Samples are then processed in batches corresponding to treatment groups.
• E.g. Control samples processed on Monday, Treated samples processed on Tuesday.

Statistical Analysis

A brief history of statistics
• Pearson 1900-1901: Regression, Correlation (r), Chi-Square
• Student (William Gosset) 1908: T-test
• Fisher 1923: ANOVA
• 1934: Statisticians began to realize that these were all re-inventions of the “linear model”…
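The effect of adding batch to the model, as in the ER+ example above, can be sketched in a few lines of Python. This is a simplified illustration on synthetic data, not the analysis from the slides. It assumes one sample per condition per day, a batch shift of 1.2 log2 units, and small invented condition effects. It also collapses Treatment, Time and Treatment*Time into a single "condition" factor, which spans the same fitted values as the full factorial model for the nine occupied cells. Because the design is balanced, the sums of squares for condition and batch are orthogonal and can be subtracted from the total directly.

```python
import random
from statistics import mean

# Nine occupied cells of the design: control at 0 hr, four treatments at 8 and 48 hr.
CONDITIONS = ["Control_0h"] + [
    f"{trt}_{hr}h"
    for trt in ("E2", "E2+ICI", "E2+Raloxifene", "E2+Tamoxifen")
    for hr in (8, 48)
]


def simulate_gene(seed=0, batch_shift=1.2, noise_sd=0.1):
    """One synthetic gene: (condition, batch, log2 signal) for 18 samples."""
    rng = random.Random(seed)
    rows = []
    for k, cond in enumerate(CONDITIONS):
        for batch, shift in (("Mon", 0.0), ("Tue", batch_shift)):
            signal = 8.0 + 0.15 * k + shift + rng.gauss(0, noise_sd)
            rows.append((cond, batch, signal))
    return rows


def factor_ss(rows, idx, grand):
    """Between-group sum of squares for the factor stored at position idx."""
    groups = {}
    for row in rows:
        groups.setdefault(row[idx], []).append(row[2])
    return sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())


def compare_models(rows):
    """R^2 and condition F statistic with and without batch in the model."""
    ys = [y for _, _, y in rows]
    grand = mean(ys)
    ss_total = sum((y - grand) ** 2 for y in ys)
    ss_cond = factor_ss(rows, 0, grand)
    ss_batch = factor_ss(rows, 1, grand)
    df_cond = len({r[0] for r in rows}) - 1
    df_batch = len({r[1] for r in rows}) - 1
    df_total = len(rows) - 1
    resid_without = ss_total - ss_cond           # batch variation counted as noise
    resid_with = ss_total - ss_cond - ss_batch   # batch variation modelled
    return {
        "r2_without_batch": ss_cond / ss_total,
        "r2_with_batch": (ss_cond + ss_batch) / ss_total,
        "f_without_batch": (ss_cond / df_cond)
        / (resid_without / (df_total - df_cond)),
        "f_with_batch": (ss_cond / df_cond)
        / (resid_with / (df_total - df_cond - df_batch)),
    }


for name, value in compare_models(simulate_gene()).items():
    print(f"{name}: {value:.3f}")
```

When batch is left out, its variation lands in the error term and deflates the F statistic for the biological factors. Modelling it shrinks the error term, which is the mechanism by which a gene can move up a ranked gene list.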
Analysis of Variance
• Analysis (m-w.com): separation of a whole into its component parts
• “Analysis of Variance” (ANOVA) measures the effect of multiple experimental factors (or phenotypes) on expression levels.
[Pie chart of variance components. Legend: Treatment, Time, Treatment * Time, Batch, Noise. Slice labels: 1.15%, 17.49%, 17.40%, 58.36%, 1.64%]

2-way ANOVA
Total variance is partitioned into variability due to influencing factors and the rest is assumed to be due to random error (noise).
[Pie chart of variance components. Legend: Treatment, Time, Treatment * Time, Noise. Slice labels: 18.64%, 17.40%, 58.36%, 1.64%]
R2=81% for 2-way ANOVA

ANOVA Partitions Variability
Total variance is partitioned into variability due to influencing factors and the rest is assumed to be due to random error (noise).
[Pie chart of variance components. Legend: Treatment, Time, Treatment * Time, Batch, Noise. Slice labels: 1.15%, 17.49%, 17.40%, 58.36%, 1.64%]
R2=99% when Batch included

Monitor Biological Trends with GO Enrichment

Upregulation/Downregulation by Biological Process

Correlation is a form of ANOVA
Correlating a continuous variable with a binary variable is mathematically equivalent to a t-test, or 1-way ANOVA

Gene A   Gene B   …   AML/ALL
8.3      5.6      …   0 (AML)
7.1      7.3      …   1 (ALL)
6.8      8.2      …   1 (ALL)
8.9      6.1      …   0 (AML)
7.3      6.9      …   1 (ALL)
…        …        …   …

Example 2: Aging Study
Aging Study: Examine how gene expression profiles change with age (GEO GSE8479)
• Total of 49 Adult Subjects
• N=24 Young (Age 18-29)
• N=25 Old (Age 65-84)
• 26 Female, 23 Male
• ~6 Beadchips (6*8=48)

Simple Correlation of Age w/ Genes
• R2=81% for correlation (1-factor ANOVA)
• Noise = 19%
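The equivalence stated on the “Correlation is a form of ANOVA” slide can be checked numerically. The Python sketch below uses a handful of illustrative expression values and 0/1 class labels (hypothetical numbers, not data from the study). It computes the t statistic three ways: from the Pearson correlation via t = r·sqrt(n−2)/sqrt(1−r²), from the pooled-variance two-sample t-test, and as the square root of the 1-way ANOVA F statistic.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for testing r = 0 with n observations."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def pooled_t(y0, y1):
    """Two-sample t statistic (group 1 minus group 0), equal variances assumed."""
    n0, n1 = len(y0), len(y1)
    m0, m1 = mean(y0), mean(y1)
    ss_within = sum((v - m0) ** 2 for v in y0) + sum((v - m1) ** 2 for v in y1)
    pooled_var = ss_within / (n0 + n1 - 2)
    return (m1 - m0) / sqrt(pooled_var * (1 / n0 + 1 / n1))


def one_way_f(y0, y1):
    """1-way ANOVA F statistic for two groups."""
    grand = mean(y0 + y1)
    m0, m1 = mean(y0), mean(y1)
    ss_between = len(y0) * (m0 - grand) ** 2 + len(y1) * (m1 - grand) ** 2
    ss_within = sum((v - m0) ** 2 for v in y0) + sum((v - m1) ** 2 for v in y1)
    return ss_between / (ss_within / (len(y0) + len(y1) - 2))


# Illustrative values only: log2 expression and class label (0 = AML, 1 = ALL).
expression = [8.3, 7.1, 6.8, 8.9, 7.3, 8.6, 7.0, 8.1]
label = [0, 1, 1, 0, 1, 0, 1, 0]

group0 = [e for e, g in zip(expression, label) if g == 0]
group1 = [e for e, g in zip(expression, label) if g == 1]

t_corr = t_from_r(pearson_r(label, expression), len(expression))
t_test = pooled_t(group0, group1)
f_anova = one_way_f(group0, group1)

print("t from correlation:", t_corr)
print("t from t-test:     ", t_test)
print("sqrt(F) from ANOVA:", sqrt(f_anova))
```

The three printed values agree in magnitude up to floating-point rounding, which is the sense in which all three tests are the same linear model.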
Correlation of Age w/ Genes, considering other effects
• R2=90% using 3-factor ANOVA
• Noise = 10%

Partek ANOVA
• Balanced & Unbalanced & Incomplete
• Random & Fixed Effects (mixed model)
• Numeric & Categorical Variables
• Nested/Nesting (Hierarchical) designs
• Any number of factors
• Flexible Linear Contrasts

Sample preparation → High-Throughput Sequencing → Alignment → Downstream Analysis → Publication

Partek Genome Browser – RNA-Seq Data

Next Generation Sequencing Data Along with Exon Microarray Data

Partek® Genomics Suite™
• Powerful Statistics with Interactive Visualization
• Fast, Memory-efficient
• Easy to Use
• Support for a Wide Variety of Genomic Technologies
  • 3’ Gene Expression
  • Whole Transcript Exon & Gene
  • Copy Number, AsCN, LOH
  • microRNA Expression
  • ChIP-chip
  • ChIP-Seq and RNA-Seq
  • Taqman Real-time PCR
• Genomic Integration
• Integrated with Public Genomic Resources: NCBI GEO, UCSC, Ensembl, Gene Ontology, KEGG etc.

miRNA negatively correlated with its mRNA target
[Scatter plot. Axes: Micro RNA - hsa_miR_25; mRNA expression - 208860_s_at]

Questions & Answers
• Questions may be sent via the question box
• FREE Data Analysis Webinars: www.partek.com/webinars
• Get your FREE trial today! Email [email protected]