Microarray data analysis
Annamaria Carissimo
[email protected]

Outline
Microarray analysis pipeline
Practicals:
ArrayExpress
Gene Ontology with the DAVID tool
Gene Set Enrichment Analysis (GSEA)

What is a DNA microarray?
A grid of DNA spots on a substrate (chip) used to detect complementary sequences.
It monitors the expression of several thousand genes at the same time.

Hybridization on a chip
Probe array
Hybridized array
Detection
Labeled cDNA/RNA
Fluorescent stain (for data acquisition)
Intensity -> how much hybridization occurred for each probe

Zoom in...

What does it look like?

Data flow
Chip scanning
Image processing
Intensity files: .CEL (Affymetrix), .txt (Illumina, Agilent)
Data analysis using our pipeline

Microarray analysis pipeline
http://microarrayanalysis.tigem.it/index_i.html

Platforms supported
3’ Expression array
Mouse -> MOE430A, Mouse430_2, MG_U74Av2
Human -> HG-U133A, HG-U133A_2, HG-U133_Plus_2
Whole Transcript Expression and Exon array
Mouse -> Mouse Gene 1.0 ST, Mouse Exon 1.0 ST
Human -> Human Gene 1.0 ST, Human Exon 1.0 ST
Agilent GE 4x44
Human and Mouse -> two color and one color
Illumina BeadChip
Human and Mouse -> WG-6, Ref-8 and HT-12

Affymetrix 3’ microarray
A chip consists of a number of probesets.
Each probeset is intended to measure the expression of a specific mRNA.
Each probeset is complementary to a target sequence, which is derived from one or more mRNA sequences.
Probesets consist of 25-mer probe pairs selected from the target sequence: one Perfect Match (PM) and one Mismatch (MM) probe for each chosen target position.
Each chip has a corresponding Chip Description File (CDF), which (among other things) describes probe locations and probeset groupings on the chip.
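The PM/MM probe-pair design described above can be illustrated with a short Python sketch. It assumes the usual Affymetrix convention that the MM probe is the PM probe with its middle (13th) base replaced by the complementary base. The slides do not state this rule explicitly, although the example probe pair shown later for 1415771_at is consistent with it. The function name `mismatch_probe` is hypothetical and used only for illustration.

```python
# Complement table for DNA bases (lower case, as written on the slides).
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def mismatch_probe(pm: str, position: int = 13) -> str:
    """Build an MM probe from a 25-mer PM probe.

    Assumption (not stated on the slides): the MM probe equals the PM probe
    except at one base, by default the 13th (1-based), which is complemented.
    """
    if len(pm) != 25:
        raise ValueError("expected a 25-mer probe sequence")
    i = position - 1  # convert the 1-based position to a 0-based index
    return pm[:i] + COMPLEMENT[pm[i]] + pm[i + 1:]


# PM probe taken from the 1415771_at example in these slides.
pm = "ctgtctgaggataccactgaagaga"
mm = mismatch_probe(pm)
print("PM:", pm)
print("MM:", mm)
```

Under that assumption, the MM probe acts as a control for non-specific hybridization at the same target position as its PM partner.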
Target sequences and probes
Example: 1415771_at
Description: Mus musculus nucleolin mRNA, complete cds
LocusLink: AF318184.1 (NT sequence is 2412 bp long)
Target sequence is 129 bp long
11 probe pairs tiling the target sequence:
gagaagtcaaccatccaaaactctgtttgtcaaaggtctgtctgaggataccactgaagagaccttaaaagaatcatttgagggctctgttcgtgcaagaatagtcactgatcgggaaactggttctt

Affymetrix probeset
Probe pair:
Perfect match: ctgtctgaggataccactgaagaga
Mismatch:      ctgtctgaggattccactgaagaga
Probe pair values -> summarization -> ONE probeset value

Background correction and normalization
These steps make it possible to compare different samples hybridized on different microarray chips.
Example:
Control: sample 1, sample 2, sample 3 (replicates)
Treatment: sample 1, sample 2, sample 3 (replicates)
Normalize all together.

Differential expression
We want to compare two biologically different conditions by identifying differentially expressed genes.
Example:
Control: sample 1, sample 2, sample 3 (replicates)
Treatment: sample 1, sample 2, sample 3 (replicates)
t-test for each gene

Processing microarray data (from .CEL files to gene expression)
Background correction
Normalization
Expression summary
Methods:
Microarray Analysis Suite (MAS5) (Affymetrix proprietary method)
Robust Multi-array Average (RMA) (Irizarry, 2003)

Identifying significantly expressed genes in treatment versus control
Bayesian t-test (Cyber-T tool)
Multiple testing correction -> False discovery rate (FDR)
Paired or unpaired design?
Output is a text file (Excel) with the resulting analysis.

Microarray pipeline - step 1: upload your .CEL files
On Mac:

Microarray pipeline - step 1: upload your .CEL files
On Windows: