An Introduction to Affymetrix Microarrays
Naomi Altman
Dept. of Statistics and Bioinformatics Consulting Centre, PSU
Oct. 14, 2004

DNA 100: A Statistician’s Simplification
[Figure: a gene, showing exons (the coding region of the gene), introns (excised) and regulatory elements]
[Figure: cDNA]

What is a microarray probe?
A probe is a spot on an array representing a gene or part of a gene.
On “cDNA” arrays, the probes are actual pieces of cDNA originally extracted from a cell. We may not know the genetic sequence of a cDNA.
If we know the genetic sequence of the cDNA, we can artificially synthesize a strand of DNA with the same sequence. This is called an oligo(nucleotide). Oligos may be “spotted” on the array like cDNA, or they may be synthesized directly on the array.

cDNA versus Oligos
cDNAs have different hybridization properties due to their biochemistry. Oligos may be chosen to have similar hybridization properties, and to represent either maximally unique parts of genes or common domains.
cDNAs are kept in cDNA libraries, which are expensive to maintain and may be mislabeled or contaminated. Oligos are synthesized from genomic sequence information, which can be subject to error.

Format of an Affymetrix Array
•Each gene is represented by a “probe set”
•Each “probe set” is 16-20 pairs of oligos
•Each oligo is 25 nucleotides
•A PM (perfect match) probe matches a strand of cDNA
•The corresponding MM (mismatch) probe differs from the PM by a change in the central nucleotide
•The probe pairs are spatially dispersed on the array
•Control probes are also printed
[Figure: Printing the Slide]

Heuristics for “Probe Sets”
The MM probe is supposed to control for:
•Variation in chemical composition
•Abundance of cross-hybridizing fragments from other genes
By combining PM and MM information from many probes, gene-to-gene differences should be minimized.
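To make the PM/MM pairing concrete, here is a minimal Python sketch that builds the MM partner of a 25-nucleotide PM probe. It assumes that the “change in the central nucleotide” is a substitution by the complementary base at position 13 (the middle of 25); the function name `mismatch_probe` and the example sequence are hypothetical.

```python
# Watson-Crick complements used for the central-base substitution (assumption).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

PROBE_LENGTH = 25  # each Affymetrix oligo is 25 nucleotides


def mismatch_probe(pm: str) -> str:
    """Return a hypothetical MM probe: the PM sequence with its central base complemented."""
    pm = pm.upper()
    if len(pm) != PROBE_LENGTH:
        raise ValueError(f"expected a {PROBE_LENGTH}-nt oligo, got {len(pm)}")
    if any(base not in COMPLEMENT for base in pm):
        raise ValueError("probe must contain only A, C, G, T")
    mid = PROBE_LENGTH // 2  # index 12, i.e. the 13th (central) nucleotide
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


pm = "ACGTACGTACGTAGCTACGTACGTA"  # hypothetical 25-mer PM probe
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

Because only the central base changes, the MM probe keeps nearly the same chemical composition as its PM partner, which is why it was expected to track non-specific binding.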
Initially it was thought that this approach would make it possible to treat the measured intensities as a proxy for actual mRNA concentration.

Gene Expression
The objective is to combine information from the PM and MM probes in a probe set to quantify gene expression.
Comparability for a single gene across arrays is essential.
Comparability among genes on a single array is desirable.

Steps in Data Extraction
1. An intensity is read for each probe.
2. The intensities are corrected for background noise.
3. The intensities are normalized.
4. The intensities are combined into gene expression values.

Why Normalize?
Normalization is supposed to remove systematic measurement errors, both within and between arrays. This should improve our power to detect differential expression.

Affymetrix Expression Values (MAS 5.0)
MAS 5.0 uses a weighted average of (PM-MM).
If PM-MM is negative, it is truncated to a small positive value.
Weights are small if (PM-MM) is an outlier for the gene, and bigger for central values.
Based on the 12 PM-MM pairs, a Wilcoxon test is done to determine whether the gene is expressed in the sample (the “Affy p-value”).

Objections to Affymetrix Normalization
The Affymetrix approach is under attack because between 15% and 30% of the MM intensities are greater than the corresponding PM intensities.
The word from the grapevine is that the newest arrays either have PM probes only or have had the “poor” MMs replaced.
Do we need to normalize across arrays?
[Figure: 2 Biological Samples]

Other Expression Extraction Methods
RMA (robust multi-array analysis)
After background correction, probes are normalized using the quantile method. Only the PM probes are combined into gene expression values, using the “median polish” method, which recognizes outliers for the probe across arrays and within the probe set on one array.
dChip (Li and Wong)
Uses a robust regression of PM on MM for each probe set.
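The MAS 5.0 weighting described above (truncate negative PM-MM, then down-weight outliers) can be sketched with a one-step Tukey biweight mean. This is a minimal illustration, not Affymetrix's implementation, which differs in details: working on the log2 scale, the tuning constants `c=5` and `eps=1e-4`, and the truncation floor of 0.5 are all assumptions, and `mas5_style_signal` is a hypothetical name.

```python
import math
import statistics


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight mean: central values get weight near 1, outliers weight 0."""
    m = statistics.median(values)
    mad = statistics.median([abs(v - m) for v in values])  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def mas5_style_signal(pm, mm, floor=0.5):
    """Hypothetical MAS 5.0-style signal for one probe set.

    Negative PM-MM differences are truncated to `floor` (an assumed small
    positive value), then a biweight mean is taken on the log2 scale.
    """
    diffs = [max(p - q, floor) for p, q in zip(pm, mm)]
    logs = [math.log2(d) for d in diffs]
    return 2 ** tukey_biweight(logs)


# One probe pair has a wildly outlying PM; it receives weight 0.
print(mas5_style_signal([600, 610, 590, 605, 595, 1000600], [100] * 6))
```

The biweight keeps the estimate near the typical difference (about 500 here) instead of the arithmetic mean, which the single outlying probe would drag above 160,000.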
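The “Affy p-value” on the MAS 5.0 slide comes from a one-sided Wilcoxon signed-rank test over the probe pairs. The sketch below assumes, as we understand the Affymetrix description, that the test is applied to the discrimination scores R = (PM-MM)/(PM+MM) against a small threshold tau (0.015 here, an assumption). It computes the exact p-value by counting sign assignments, so it needs no external library; `detection_p_value` is a hypothetical name.

```python
def average_ranks(values):
    """Ranks 1..n of `values`, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank_greater(diffs):
    """Exact one-sided p-value P(W+ >= observed) under symmetry about 0."""
    d = [x for x in diffs if x != 0]  # zero differences are dropped
    n = len(d)
    if n == 0:
        return 1.0
    # Doubling makes tied (averaged) ranks integers, so sums can be counted exactly.
    ranks2 = [int(round(2 * r)) for r in average_ranks([abs(x) for x in d])]
    observed = sum(r for r, x in zip(ranks2, d) if x > 0)
    counts = {0: 1}  # counts[s]: sign assignments whose doubled positive-rank sum is s
    for r in ranks2:
        nxt = dict(counts)  # rank r gets a negative sign
        for s, c in counts.items():
            nxt[s + r] = nxt.get(s + r, 0) + c  # rank r gets a positive sign
        counts = nxt
    tail = sum(c for s, c in counts.items() if s >= observed)
    return tail / 2 ** n


def detection_p_value(pm, mm, tau=0.015):
    """Hypothetical detection p-value: is R = (PM-MM)/(PM+MM) shifted above tau?"""
    r = [(p - q) / (p + q) for p, q in zip(pm, mm)]
    return wilcoxon_signed_rank_greater([x - tau for x in r])
```

A small p-value suggests the gene is expressed; when every PM clearly exceeds its MM the p-value reaches its floor of 1/2^n for n probe pairs, so the number of pairs limits how small it can get.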
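The quantile method used by RMA forces every array to share one intensity distribution. A minimal pure-Python sketch, assuming each array is a list of probe intensities in the same probe order: the k-th smallest value on every array is replaced by the mean of the k-th smallest values across arrays. Ties are broken by position here, a simplification; `quantile_normalize` is a hypothetical name.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length arrays (one list of intensities per chip)."""
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Probe indices of each array, sorted from lowest to highest intensity.
    orders = [sorted(range(n_probes), key=lambda i: a[i]) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across all arrays.
    reference = [
        sum(a[order[k]] for a, order in zip(arrays, orders)) / len(arrays)
        for k in range(n_probes)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n_probes
        for k, i in enumerate(order):
            out[i] = reference[k]  # each probe keeps its rank, takes the shared value
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6], [3, 4, 8]]))
```

Because the reference distribution is averaged over every array, adding a new chip changes the normalized values of all existing chips.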
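The RMA summarization step can be sketched with Tukey's median polish on a probes-by-arrays table of (log2) PM intensities for one probe set. The model fitted is value = overall + probe effect + array effect + residual, and the expression value for array j is taken as overall + array effect j. The sweep order follows the usual median polish algorithm; the convergence rule and the names `median_polish` and `rma_style_expression` are assumptions for illustration.

```python
import statistics


def median_polish(table, max_iter=10, tol=1e-6):
    """Tukey median polish of a probes x arrays table.

    Returns (overall, row_effects, col_effects, residuals).
    """
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep row (probe) medians into the row effects.
        row_med = [statistics.median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [x - row_med[i] for x in resid[i]]
        delta = statistics.median(col_eff)
        col_eff = [c - delta for c in col_eff]
        overall += delta
        # Sweep column (array) medians into the column effects.
        col_med = [statistics.median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        delta = statistics.median(row_eff)
        row_eff = [r - delta for r in row_eff]
        overall += delta
        # Stop once a full sweep moves nothing.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid


def rma_style_expression(log2_pm_table):
    """Hypothetical RMA-style expression values, one per array (column)."""
    overall, _, col_eff, _ = median_polish(log2_pm_table)
    return [overall + c for c in col_eff]
```

Because every sweep uses medians, a single aberrant probe on one array ends up in the residuals rather than in that array's expression value.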
GCRMA
Assumes that the probe intensity is a function of optical noise, non-specific binding, specific binding, whether the probe is PM or MM, and the probe sequence. Probe intensities are adjusted using this model, and the signal is then estimated using median polish.

Does the Normalization Matter for Assessing Differential Expression?
[Figure: p-values]

Message of Figure 4
The lines represent the probe set for a single gene.
X-axis: concentration of the spiked sample.
Y-axis: observed reflectance.
Probes do not hybridize at the same rate.
MM hybridizes at about the same rate as PM for this gene.

The problem with equalizing methods
You cannot add a new array without renormalizing all the arrays.

The problem with baseline methods
The choice of baseline array affects the normalization.

What do we recommend at BCC?
Of the microarrays we have seen, Affy arrays are the most consistent across samples. But we have seen failures.
We always do quality control plots:
1. Array image
2. Pairwise probe plots of replicates
We use RMA.

Software
We use the Affy package in R: it is free, and state-of-the-art normalization is easy.
Other R packages are available for assessing differential expression and clustering (next week: an Intro to R).

Conclusions
•Affy arrays do need normalization.
•Probe-wise normalization is recommended.
•Equalization methods are best, but baseline matching is OK with a careful choice of baseline array.
•After normalization, PM probes alone or PM/MM pairs can be combined to estimate expression.
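To show why the choice of baseline matters, here is a deliberately simple, hypothetical baseline method: each array's k-th smallest intensity is replaced by the baseline array's k-th smallest intensity. This is not the specific method used at BCC; it only illustrates the trade-off. Each array is normalized on its own, so a new array can be added without touching the others, but switching the baseline changes the apparent fold change. `match_to_baseline` and `baseline_normalize` are hypothetical names.

```python
def match_to_baseline(array, baseline):
    """Replace the k-th smallest value of `array` with the k-th smallest value of `baseline`."""
    if len(array) != len(baseline):
        raise ValueError("array and baseline must have the same number of probes")
    target = sorted(baseline)
    order = sorted(range(len(array)), key=lambda i: array[i])
    out = [0.0] * len(array)
    for k, i in enumerate(order):
        out[i] = target[k]
    return out


def baseline_normalize(arrays, baseline_index):
    """Normalize every array against the array chosen as baseline."""
    baseline = arrays[baseline_index]
    return [match_to_baseline(a, baseline) for a in arrays]


arrays = [[1, 4, 9], [3, 2, 10], [5, 6, 7]]
for b in (0, 2):
    norm = baseline_normalize(arrays, b)
    # Fold change of probe 0, array 1 versus array 0, under each baseline choice.
    print(b, norm[1][0] / norm[0][0])
```

With array 0 as the baseline the probe shows a 4-fold change; with array 2 it shows only 1.2-fold, although the raw data are the same.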
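The pairwise probe plots of replicates in the BCC quality-control list are often drawn on the “MA” scale: M = log2(x) - log2(y) against A = (log2(x) + log2(y)) / 2 for each probe. Good replicates scatter around M = 0; a shifted center or a trend in M against A suggests a normalization problem or a failed array. The MA form and the summary statistics below are assumptions for illustration, and `ma_values` and `replicate_summary` are hypothetical names.

```python
import math
import statistics


def ma_values(x, y):
    """M (log2 ratio) and A (mean log2 intensity) for each probe of two replicate arrays."""
    if len(x) != len(y):
        raise ValueError("replicates must have the same number of probes")
    m = [math.log2(a) - math.log2(b) for a, b in zip(x, y)]
    a_vals = [(math.log2(a) + math.log2(b)) / 2 for a, b in zip(x, y)]
    return m, a_vals


def replicate_summary(x, y):
    """Median of M (should be near 0) and median absolute deviation of M (spread)."""
    m, _ = ma_values(x, y)
    center = statistics.median(m)
    spread = statistics.median(abs(v - center) for v in m)
    return center, spread


# Second replicate is uniformly half as bright: center 1.0 flags a scale difference.
print(replicate_summary([2, 4, 8, 16], [1, 2, 4, 8]))
```

A nonzero center with small spread, as here, is exactly the kind of systematic difference that normalization is meant to remove; a large spread points to a noisy or failed array instead.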