Microarrays

Tzu Lip Phang, Ph.D.
Associate Professor of Bioinformatics
Division of Pulmonary Sciences and Critical Care Medicine
University of Colorado School of Medicine

Lawrence Hunter, Ph.D.
Director, Computational Bioscience Program
University of Colorado School of Medicine
http://compbio.uchsc.edu/Hunter

The Central Dogma
[Figure: Genome, Transcriptome]

Microarrays in the Literature
[Figure: number of papers per year]
Barrett, T., Wilhite, S. E., Ledoux, P., Evangelista, C., Kim, I. F., Tomashevsky, M., et al. (2012). NCBI GEO: archive for functional genomics data sets--update. Nucleic acids research, 41(D1)

Public Data Usages
• Preliminary data/results, hypothesis generation
• Test algorithm
• Power analysis (sample size calculation)
• Enhance sample size

Array technology
• Basic idea: Genomic material (DNA/RNA) hybridizes best to exactly complementary sequences.
• Method:
  – Probes are attached to a substrate in a known location
  – DNA/RNA in one or more samples is fluorescently labelled
  – Samples are hybridized to the probe array, excess is washed off, and fluorescence readings are taken for each position

Microarray: Primer

Array synthesis
• Photolithography for oligonucleotides
• Cost proportional to length of oligo, not number of features (genes) per chip!
• Many layers compared to computer chips.

Affymetrix Probe Sets
http://intermedin.stanford-edu/Arrays.ppt
[Figure: probe set of 25mer probes (11 to 16), PM and MM]

Gene Expression
• Still the most common use for microarrays
• Aim to determine differential expression between groups of samples, e.g. disease and control
• Generate hypotheses about the mechanisms underlying the disease of interest

Basic Statistical Analysis

Experimental Design
• Biological replication is essential
  – Technical replication is not essential except for quality control studies
• Pooling biological samples to reduce array variability
  – Increases sample size without running more chips
  – BUT, if individual variation is important, pooling washes out the effect
• Power analysis is essential

Power Analysis
• How many biological replicates?
• My experience: at least 3, preferably 5, even 7
• Bioconductor: SSPA

Preprocessing
• Includes image analysis, normalization, and data transformation
• Data normalization:
  – Remove systematic errors introduced in labeling, hybridization and scanning procedures
  – Correct these errors while preserving biological variability / information

Why normalization?
[Figure: technical replicate difference]

A different look …
[Figure: average intensity values]

To normalize or not to …

AffyComp
Rafael Irizarry, Dept BioStat, Johns Hopkins University

Statistical Testing
• Hypothesis testing: Are the means of two groups different from each other?
  – Fold change
  – Student's t-test

Microarray Scatter Plot

Student's t-test

What is Multiple Comparison Testing??!
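
As a rough numerical illustration of this question (not part of the original slides), the sketch below computes how many false positives to expect when every gene is tested at the same alpha level, and how likely at least one false positive is. It assumes that every null hypothesis is true and that the tests are independent; the function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives when all null hypotheses are true."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive.

    Assumes the tests are independent and all null hypotheses are true.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# Compare a single test, a small gene list, and a chip-sized gene list.
for n in (1, 10, 1000):
    print(n, expected_false_positives(n), familywise_error_rate(n))
```

Under these assumptions, 1000 tests at alpha = 0.05 give an expected 50 false positives, so a per-gene threshold on its own is not enough at microarray scale.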
Alpha level = 0.05

Genes      P-value         Critical level   Ho
Gene 1     0.0001    <=    0.05             1
Gene 2     0.0002    <=    0.05             1
Gene 3     0.008     <=    0.05             1
Gene 4     0.009     <=    0.05             1
Gene 5     0.005     <=    0.05             1
Gene 6     0.09      <=    0.05             0
Gene 7     0.05      <=    0.05             0
Gene 8     0.09      <=    0.05             0
Gene 9     0.2       <=    0.05             0
Gene 10    0.3       <=    0.05             0

With a large number of tests …
Alpha level = 0.05
50 wrong genes …

Genes        P-value         Critical level   Ho
Gene 1       0.0001    <=    0.05             1
Gene 2       0.0002    <=    0.05             1
Gene 3       0.008     <=    0.05             1
Gene 4       0.009     <=    0.05             1
Gene 5       0.005     <=    0.05             1
Gene 6       0.09      <=    0.05             0
…            …         …     …                …
…            …         …     …                …
Gene 999     0.2       <=    0.05             0
Gene 1000    0.3       <=    0.05             0

Correction … Bonferroni
Alpha level = 0.05 / 1000 = 0.00005

Genes        P-value         Critical level   Ho
Gene 1       0.0001    <=    0.00005          0
Gene 2       0.0002    <=    0.00005          0
Gene 3       0.008     <=    0.00005          0
Gene 4       0.009     <=    0.00005          0
Gene 5       0.005     <=    0.00005          0
Gene 6       0.09      <=    0.00005          0
…            …         …     0.00005          …
…            …         …     0.00005          …
Gene 999     0.2       <=    0.00005          0
Gene 1000    0.3       <=    0.00005          0

Strike the balance …
From most conservative to most lenient:
• Bonferroni (most conservative)
• False Discovery Rate
• No correction (most lenient)

The False Discovery Rate (FDR) of a set of predictions is the expected proportion of false predictions in that set.
Example: If the algorithm returns 100 genes with a false discovery rate of 0.3, then we should expect 70 of them to be correct.

Put them together

Result Validation
• RT-PCR: most common method
• Gene levels at the borderline of differential expression
  – Their measurability is reduced by random error
• For highly differentially expressed genes, having sufficient replicates would serve as validation.

Biological Interpretation
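
Returning to the multiple-testing corrections compared above, the sketch below applies a Bonferroni threshold and an FDR-controlling step-up rule to the ten example p-values from the slides. The slides do not say which FDR procedure is meant; the Benjamini-Hochberg procedure is assumed here, and the function names are hypothetical.

```python
def bonferroni_reject(pvalues, alpha=0.05, n_tests=None):
    """Reject when p <= alpha / n_tests (n_tests defaults to len(pvalues))."""
    m = len(pvalues) if n_tests is None else n_tests
    threshold = alpha / m
    return [p <= threshold for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up rule (assumed FDR procedure).

    Finds the largest rank k with p_(k) <= (k / m) * alpha and rejects
    the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_k = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            reject[idx] = True
    return reject


# The ten example p-values shown on the slides (Gene 1 to Gene 10).
slide_pvalues = [0.0001, 0.0002, 0.008, 0.009, 0.005, 0.09, 0.05, 0.09, 0.2, 0.3]

# Bonferroni as on the slide: 1000 tests, so the threshold is 0.05 / 1000.
print(bonferroni_reject(slide_pvalues, alpha=0.05, n_tests=1000))

# Benjamini-Hochberg, treating these ten genes as the whole family of tests.
print(benjamini_hochberg_reject(slide_pvalues, alpha=0.05))
```

With the 1000-test Bonferroni threshold, none of the ten genes passes, consistent with the all-zero Ho column on the Bonferroni slide. Benjamini-Hochberg applied to the ten genes alone rejects the five smallest p-values, which shows how an FDR-based rule sits between Bonferroni and no correction.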