* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Powerpoint slides - Berkeley Statistics
Gene therapy of the human retina wikipedia , lookup
Metagenomics wikipedia , lookup
DNA damage theory of aging wikipedia , lookup
RNA silencing wikipedia , lookup
Minimal genome wikipedia , lookup
Oncogenomics wikipedia , lookup
Bisulfite sequencing wikipedia , lookup
Polycomb Group Proteins and Cancer wikipedia , lookup
History of RNA biology wikipedia , lookup
DNA supercoil wikipedia , lookup
Nucleic acid double helix wikipedia , lookup
Cancer epigenetics wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Molecular cloning wikipedia , lookup
Genome (book) wikipedia , lookup
Non-coding RNA wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Epitranscriptome wikipedia , lookup
Genomic library wikipedia , lookup
Human genome wikipedia , lookup
DNA vaccination wikipedia , lookup
Gene expression profiling wikipedia , lookup
Epigenomics wikipedia , lookup
Cre-Lox recombination wikipedia , lookup
No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup
Comparative genomic hybridization wikipedia , lookup
Cell-free fetal DNA wikipedia , lookup
Extrachromosomal DNA wikipedia , lookup
Genetic engineering wikipedia , lookup
Point mutation wikipedia , lookup
Genome evolution wikipedia , lookup
Designer baby wikipedia , lookup
Mir-92 microRNA precursor family wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Non-coding DNA wikipedia , lookup
Helitron (biology) wikipedia , lookup
Microevolution wikipedia , lookup
Genome editing wikipedia , lookup
Deoxyribozyme wikipedia , lookup
Nucleic acid analogue wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Primary transcript wikipedia , lookup
Introduction to genome biology and microarrays experiment Statistics for Microarray Data Analysis – Lecture 1 The Fields Institute for Research in Mathematical Sciences May 25, 2002 1 The human genome • The cell is the fundamental working unit of every living organism. • Humans: trillions of cells (metazoa); other organisms like yeast: one cell (protozoa). • Cells are of many different types (e.g. blood, skin, nerve cells), but all can be traced back to a single cell, the fertilized egg. 2 Genes • The human genome is distributed along 23 pairs of chromosomes. - 22 autosomal pairs; - the sex chromosome pair, XX for females and XY for males. • In each pair, one chromosome is paternally inherited, the other maternally inherited. • Chromosomes are made of compressed and entwined DNA. • A (protein-coding) gene is a segment of chromosomal DNA that directs the synthesis of a protein. 3 Chromosomes and DNA 4 DNA • A deoxyribonucleic acid or DNA molecule is a double-stranded polymer composed of four basic molecular units called nucleotides. • Each nucleotide comprises a phosphate group, a deoxyribose sugar, and one of four nitrogen bases: adenine (A), guanine (G), cytosine (C), and thymine (T). • The two chains are held together by hydrogen bonds between nitrogen bases. • Base-pairing occurs according to the following rule: G pairs with C, and A pairs with T. 5 6 Genetic and physical maps 7 Genetic and physical maps • Physical distance: number of base pairs (bp). • Genetic distance: expected number of crossovers between two loci, per chromatid, per meiosis. • Measured in Morgans (M) or centiMorgans (cM). • 1cM ~ 1 million bp (1Mb) in humans 8 Central dogma • The expression of the genetic information stored in the DNA molecule occurs in two stages: (i) transcription, during which DNA is transcribed into mRNA; (ii) translation, during which mRNA is translated to produce a protein. DNA mRNA protein • Other important aspects of regulation: methylation, alternative splicing, etc. • The correspondence between DNA's four-letter alphabet and a protein's twenty-letter alphabet is specified by the genetic code, which relates nucleotide triplets to amino acids. 9 Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell. Measuring protein might be better, but is currently harder. 10 Differential expression • Each cell contains a complete copy of the organism's genome. • Cells are of many different types and states E.g. blood, nerve, and skin cells, dividing cells, cancerous cells, etc. • What makes the cells different? • Differential gene expression, i.e., when, where, and in what quantity each gene is expressed. • On average, 40% of our genes are expressed at any given time. 11 RNA • A ribonucleic acid or RNA molecule is a nucleic acid similar to DNA, but - single-stranded; - ribose sugar rather than deoxyribose sugar; - uracil (U) replaces thymine (T) as one of the bases. • RNA plays an important role in protein synthesis and other chemical activities of the cell. • Several classes of RNA molecules, including messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), and other small RNAs. 12 Exons and introns • Genes comprise only about 2% of the human genome; the rest consists of non-coding regions, whose functions may include providing chromosomal structural integrity and regulating when, where, and in what quantity proteins are made (regulatory regions). • The terms exon and intron refer to coding (translated into a protein) and non-coding DNA, respectively. 13 Functional genomics • The various genome projects have yielded the complete DNA sequences of many organisms. E.g. human, mouse, yeast, fruitfly, etc. • Human: 3 billion base-pairs, 30-40 thousand genes. • Challenge: go from sequence to function, i.e., define the role of each gene and understand how the genome functions as a whole. 14 Introduction to microarrays 15 Gene expression assays • DNA microarrays rely on the hybridization properties of nucleic acids to monitor DNA or RNA abundance on a genomic scale in different types of cells. • The main types of gene expression assays: - High-density nylon membrane arrays; Serial analysis of gene expression (SAGE); Short oligonucleotide arrays (Affymetrix); Long oligonucleotide arrays (Agilent); Fibre optic arrays (Illumina); cDNA arrays (Brown/Botstein)*. 16 Applications of microarrays • • • • • • • Measuring transcript abundance; Genotyping; Estimating DNA copy number; Determining identity by descent; Measuring mRNA decay rates; Identifying protein binding sites; Determining sub-cellular localization of gene products; • …….. 17 18 The process Building the chip: MASSIVE PCR PCR PURIFICATION and PREPARATION PREPARING SLIDES RNA preparation: CELL CULTURE AND HARVEST PRINTING Hybing the chip: POST PROCESSING ARRAY HYBRIDIZATION RNA ISOLATION DATA ANALYSIS cDNA PRODUCTION PROBE LABELING 19 Taken from Schena & Davis Building the chip Arrayed Library (96 or 384-well plates of bacterial glycerol stocks) PCR amplification Directly from colonies with SP6-T7 primers in 96-well plates Consolidate into 384-well plates Spot as microarray on glass slides 20 Ngai Lab, UC Berkeley) Pins collect cDNA from wells 384 well plate Print-tip group 1 cDNA clones Contains cDNA probes Glass Slide Array of bound cDNA probes 4x4 blocks = 16 print-tip groups Spotted in duplicate Print-tip group 6 21 Sample preparation 22 Hybridization Binding of cDNA target samples to cDNA probes on the slide Hybridize for 5-12 hours 23 Hybridization chamber 3XSSC HYB CHAMBER ARRAY LIFTERSLIP SLIDE LABEL SLIDE LABEL • Humidity • Temperature • Formamide (Lowers the Tm) 24 Expression profiling with DNA microarrays cDNA “B” Cy3 labeled cDNA “A” Cy5 labeled Laser 1 Hybridization Laser 2 Scanning + Analysis Image Capture 25 RGB overlay of Cy3 and Cy5 images 26 Raw data • Human cDNA arrays - ~ 43K spots; 16–bit TIFFs: ~ 20Mb per channel; ~ 2,000 x 5,500 pixels per image; Spot separation: ~ 136um; For a “typical” array: Mean = 43, med = 32, SD = 26 pixels per spots 27 Image analysis • The raw data from a cDNA microarray experiment consist of pairs of image files, 16-bit TIFFs, one for each of the dyes. • Image analysis is required to extract measures of the red and green fluorescence intensities for each spot on the array. 28 Image analysis 29 GenePix Image analysis 1. Addressing. Estimate location of spot centers. 2. Segmentation. Classify pixels as foreground (signal) or background. 3. Information extraction. For each spot on the array and each dye • signal intensities; • background intensities; • quality measures. R and G for each spot on the array. 30 Spot • Batch automatic addressing. • Segmentation. Seeded region growing (Adams & Bischof 1994): adaptive segmentation method, no restriction on the size or shape of the spots. • Intensity extraction - Foreground. Mean of pixel intensities within a spot. - Background. Morphological opening: non-linear filter which generates an image of the estimated background intensity for the entire slide. • Spot quality measures. • Software package. Spot, built on the freely available software package R. 31 Quality measures • Spot - One channel, R or G - Signal/noise ratio; - Variation in pixel intensities; - Identification of “bad spots” (no signal), etc. - Two channels, R/G - Brightness: foreground/background ratio; - Uniformity: variation in pixel intensities and ratios of intensities; - Morphology: area, perimeter, circularity. • Array (slide) - Percentage of spots with no signal; - Range of intensities; - Distribution of spot signal area, etc. • How to use quality measures in subsequent analyses? 32 Terminology • Probe: DNA spotted on the array, aka. spot, immobile substrate. • Target: DNA hybridized to the array, mobile substrate. • Sector: collection of spots printed using the same print-tip (or pin), aka. print-tip-group, pin-group, spot matrix, grid. • Batch: collection of slides with the same probe layout. • The terms slide or array are often used to refer to the printed microarray. 33 Biological Question Data Analysis & Modelling Microarray Life Cycle Sample Preparation Microarray Detection Microarray Reaction Taken from Schena & Davis 34 WWW resources • Complete guide to “microarraying” http://cmgm.stanford.edu/pbrown/mguide/ • http://www.microarrays.org - Parts and assembly instructions for printer and scanner; - Protocols for sample prep; - Software; - Forum, etc. • Animation: http://www.bio.davidson.edu/courses/genomics/chip/chip.html 35 Acknowledgments Introduction to biology: based on Bioconductor short course lecture 1 (http://www.bioconductor.org/) and Temple short course lecture 1 with pictures from http://www.accessexcellence.org/ Introduction to microarray: based on IPAM, UCLA http://www.ipam.ucla.edu/programs/fg2000/tutorials.html#terry_speed Thanks also to: Natalie Thorne (WEHI) Ingrid Lönnstedt (Uppsala) 36