Functional Genomics Introduction
Julie A Dickerson
Electrical and Computer Engineering
Iowa State University

Module Structure: Day 1
Introduction to Functional Genomics
Transcriptomics
Analysis and Experiment Design for Microarray Data (Dr. Peng Liu)
RNA-Seq Data (Mr. Kun Liang)
LAB: Using R for normalizing and processing microarray data, and clustering analysis of ‘omics data (John Van Hemert)

Module Structure: Day 2
Metabolomics (Dr. Ann Perera)
Proteomics (Dr. Young-Jin Lee)
Pathways and data integration methods (Dr. Julie Dickerson and Erin Boggess)
LAB: Analyzing integrated sets of microarray, proteomics and metabolomics data (Erin Boggess)

BBSI - 2010, June 15, 2010

F1: Outline
Module Structure
What is Functional Genomics?
Data Types Available
Transcriptomics
Basic biology behind microarrays
What can you learn from microarrays?
Types of arrays
Limitations of microarrays

Functional Genomics Definition
Functional genomics is a field of molecular biology that attempts to make use of the data produced by genomic projects to describe gene (and protein) functions and interactions.
Functional genomics focuses on the dynamic aspects such as gene transcription, translation, and protein-protein interactions, as opposed to the static aspects such as DNA sequence or structures.
From Wikipedia, the free encyclopedia

Genome Wide View of Metabolism
Streptococcus pneumoniae
Explore capabilities of global network
How do we go from a pretty picture to a model we can manipulate?

Metabolic Pathways
Metabolites: glucose
Enzymes: hexokinase, phosphoglucoisomerase, phosphofructokinase, aldolase, triosephosphate isomerase, G3P dehydrogenase, phosphoglycerate kinase, phosphoglycerate mutase, enolase, pyruvate kinase
Reactions & Stoichiometry: 1 F6P => 1 FBP
Kinetics
Regulation: gene regulation, metabolite regulation

Metabolic Modeling: The Dream

Data Types Available for Determining Function
Genomes: Sequence
Genes: Microarrays, next-gen sequencing
Proteins: Proteomics
Metabolites: Metabolomics
Phenotypes: Phenomics
June 11, 2009, BBSI - 2009

A VERY Simplified Eukaryotic Cell
[Figure labels: chromosome, nucleus, DNA strands, cytoplasm]
DNA contains thousands of genes.
Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

Posttranscriptional Modifications to Primary Transcript
Primary transcript: 5’ UTR, 3’ UTR, and intervening sequences corresponding to introns that are removed through splicing
Primary transcript after modification, messenger RNA (mRNA): G 5’ cap, 5’ UTR, coding portions of RNA sequence corresponding to exons, 3’ UTR, AAAAAA...AAAA poly-A tail

Transcription takes place inside the nucleus.
Translation takes place outside the nucleus.
[Figure labels: chromosome, nucleus, DNA strands, cytoplasm]

Translation
[Figure labels: ribosome, mRNA]
The amino acid sequence folds to become a protein.

During translation, transfer RNA (tRNA) translates the genetic code.
[Figure: mRNA codons ... UUA ACG ... paired with tRNA anticodons AAU UGC ...]
[Figure labels: tRNA, anticodon, amino acids leu and thr]

The Genetic Code
Each entry gives an mRNA codon and its amino acid. Rows are grouped by first base; columns are ordered by second base (U, C, A, G).

First base U
UUU phe    UCU ser    UAU tyr    UGU cys
UUC phe    UCC ser    UAC tyr    UGC cys
UUA leu    UCA ser    UAA STOP   UGA STOP
UUG leu    UCG ser    UAG STOP   UGG trp

First base C
CUU leu    CCU pro    CAU his    CGU arg
CUC leu    CCC pro    CAC his    CGC arg
CUA leu    CCA pro    CAA gln    CGA arg
CUG leu    CCG pro    CAG gln    CGG arg

First base A
AUU ile    ACU thr    AAU asn    AGU ser
AUC ile    ACC thr    AAC asn    AGC ser
AUA ile    ACA thr    AAA lys    AGA arg
AUG met    ACG thr    AAG lys    AGG arg

First base G
GUU val    GCU ala    GAU asp    GGU gly
GUC val    GCC ala    GAC asp    GGC gly
GUA val    GCA ala    GAA glu    GGA gly
GUG val    GCG ala    GAG glu    GGG gly

Miscellaneous Comments
The biology is more complicated than I described.
Humans have somewhere around 30,000 genes. (The exact number is a subject for debate.)
Regulation of these genes seems to be more important than number!
Much of the variation is created by differences in how cells use the genes they have.
Microarrays are a tool that can help us understand how cells of various types use their genes in response to varying conditions.

Microarrays (5/23/2017)
With only a few exceptions, every cell of the body contains a full set of chromosomes and identical genes.
Only a fraction of these genes are turned on, however, and it is the subset that is "expressed" that confers unique properties to each cell type.
"Gene expression" is the term used to describe the transcription of the information contained within the DNA, the repository of genetic information, into messenger RNA (mRNA) molecules that are then translated into the proteins that perform most of the critical functions of cells.
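
As an aside on the genetic code table above, the lookup it describes can be sketched in a few lines of Python. This is only an illustration and not part of the original slides: the names BASES, CODON_TABLE and translate are made up for this sketch, and the table is built from the standard genetic code in the same U, C, A, G order used by the slide.

```python
from itertools import product

BASES = "UCAG"  # same base order as the rows and columns of the table above
# One-letter amino acids for the 64 codons, ordered by first, second, third base.
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "F": "phe", "L": "leu", "S": "ser", "Y": "tyr", "*": "STOP", "C": "cys",
    "W": "trp", "P": "pro", "H": "his", "Q": "gln", "R": "arg", "I": "ile",
    "M": "met", "T": "thr", "N": "asn", "K": "lys", "V": "val", "A": "ala",
    "D": "asp", "E": "glu", "G": "gly",
}
# Hypothetical helper table: codon string -> three-letter amino acid name.
CODON_TABLE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), ONE_LETTER)
}


def translate(mrna):
    """Read an mRNA string codon by codon until a STOP codon or the end."""
    amino_acids = []
    for start in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[start:start + 3]]
        if residue == "STOP":
            break
        amino_acids.append(residue)
    return amino_acids


# The two codons from the tRNA slide, UUA and ACG.
print(translate("UUAACG"))  # ['leu', 'thr']
```

The sketch ignores reading-frame selection and start codons; it simply reads from the first base, which is enough to reproduce the leu, thr example on the tRNA slide.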
BCB570 Gene Expression Data Analysis

Microarrays
Microarrays work by exploiting the ability of a given mRNA molecule (target) to bind specifically to, or hybridize to, the DNA template (probe) from which it originated.
Regulation of gene expression acts as both an "on/off" switch to control which genes are expressed in a cell and a "volume control" that increases or decreases the level of expression of particular genes as necessary.
Source: The Genetic Science Learning Center, University of Utah

DNA Microarrays
Small, solid supports onto which the sequences from thousands of different genes are immobilized, or attached, at fixed locations.
The DNA is printed, spotted, or actually synthesized directly onto the support.
The spots themselves can be DNA, complementary DNA (cDNA, DNA synthesized from an mRNA template), or oligonucleotides (oligos, short fragments of single-stranded DNA that are typically 5 to 50 nucleotides long).

Why do microarray experiments?
Compare two conditions to find differentially expressed genes
  Control/treatment
  Disease/normal
Compare more than two conditions, some of which may interact
  Different treatments, different strains
Exploratory analysis
  What genes are expressed under drought stress?

Why use microarrays (cont)?
What happens over time?
  Developmental stages
Predicting certain conditions (cancer vs.
normal)
Patterns of gene expression that characterize a patient’s or organism’s response

Differentially Expressed Genes
Find genes that show a large difference in expression between groups and are similar within a group.
Statistical tests ask whether the groups have different means (t-test) or different variances (chi-squared, F-statistics).
Adapted from “Practical Microarray Analysis”, presentation by Benedikt Brors, German Cancer Research Center

Multiple Conditions
[Figure labels: Mutant 1 (Inoculated, Control); Mutant 2 (Inoculated, Control)]
Are there differences in expression level between the k conditions?
Analysis of Variance (ANOVA)

Some Example Microarray Experiments from Iowa State University
Jim Reecy from Animal Science: muscle undergoing hypertrophy vs. normal muscle
David Putthoff, Steve Rodermel, Thomas Baum from Plant Pathology: roots infected with soybean cyst nematodes vs. uninfected roots
Anne Bronikowski in Genetics: wheel-running mice vs. non-runners
Roger Wise, Rico Caldo in Plant Pathology: interaction between multiple isolates of powdery mildew and multiple genotypes of barley

Wild-type vs. Myostatin Knockout Mice
Belgian Blue cattle have a mutation in the myostatin gene.

Identifying Genes Involved in Pathways That Distinguish Compatible from Incompatible Interactions
              Barley Genotype
Bgh Isolate   Mla6           Mla13          Mla1
5874          Incompatible   Compatible     Incompatible
K1            Compatible     Incompatible   Incompatible
Caldo, Nettleton, Wise (2004). The Plant Cell. 16, 2514-2528.
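
The two tests named in the slides above (a t-test for two groups, ANOVA for k conditions) can be sketched for a single gene as follows. This is a minimal illustration using only the Python standard library, not the analysis used in any of the studies listed: the function names welch_t and anova_f and the expression values are hypothetical, and the unequal-variance (Welch) form of the t statistic is an assumption made here. Turning either statistic into a p-value needs the matching t or F distribution, which this sketch leaves out.

```python
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in mean expression of one gene.

    Each group needs at least two replicates and some within-group spread,
    otherwise the standard error is undefined or zero.
    """
    standard_error = (
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    ) ** 0.5
    return (mean(group_a) - mean(group_b)) / standard_error


def anova_f(groups):
    """One-way ANOVA F statistic for one gene measured under k conditions."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical log-expression values for one gene (three replicates each).
control = [7.1, 6.9, 7.0]
inoculated = [9.0, 9.2, 8.8]
print(welch_t(inoculated, control))
print(anova_f([control, inoculated, [8.0, 8.1, 7.9]]))
```

A large absolute t or a large F marks a gene whose between-group differences are big relative to its within-group variation, which is the pattern the Differentially Expressed Genes slide asks for. In a real experiment the test is repeated for thousands of genes, so some correction for multiple testing would also be needed.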
An Example Gene of Interest
[Figure: log expression vs. hours after inoculation, for incompatible and compatible interactions]

Exploratory Analysis
Find patterns in data to see what genes are expressed under different conditions
Analysis includes clustering methods
Used when little or no prior knowledge exists about the problem

Fig. 5 (see Supplemental data at www.pnas.org for the full cluster diagram with all gene names)
Perou, Charles M. et al. (1999) Proc. Natl. Acad. Sci. USA 96, 9212-9217
Copyright ©1999 by the National Academy of Sciences

Time Series
Time points: 0 hours, 4 hours, 12 hours, 24 hours
Goal: find patterns of co-expressed genes over time or partial time
Typical length is 3-10 time points
Cluster to find similar patterns (k-means, self-organizing maps)
Correlations to find genes that behave like a given gene of interest

Classification
Learn characteristic patterns from a training set and evaluate with a test set.
Classify tumor types based on expression patterns
Predict disease susceptibility, stages, etc.
Source: “Practical Microarray Analysis”, presentation by Benedikt Brors, German Cancer Research Center

Some Commonly Used Tools for Microarray Analysis
Oligonucleotide arrays
  Affymetrix GeneChips
  Nimblegen
  Agilent

Oligonucleotides
An oligonucleotide is a short sequence of nucleotides (oligonucleotide = oligo for short).
An oligonucleotide microarray is a microarray whose probes consist of synthetically created DNA oligonucleotides.
Probe sequences are chosen to have good and relatively uniform hybridization characteristics.
A probe is chosen to match a portion of its target mRNA transcript that is unique to that sequence.
Oligo probes can distinguish among multiple mRNA transcripts with similar sequences.

Simplified Example
... gene 1 ...   oligo probe for gene 1: ATTACTAAGCATAGATTGCCGTATA
... gene 2 ...   oligo probe for gene 2: GCGTATGGCATGCCCGGTAAACTGG
Shared green regions indicate a high degree of sequence similarity throughout much of the transcript.
Source: Dan Nettleton Course Notes Statistics 416/516X

Oligo Microarray Fabrication
Oligos can be synthesized and stored in solution.
Oligo sequences can be synthesized on a slide or chip using various commercial technologies.
The company Affymetrix uses a photolithographic approach which we will describe briefly.

Affymetrix GeneChips
Affymetrix (www.affymetrix.com) manufactures GeneChips.
GeneChips are oligonucleotide arrays.
Each gene (more accurately sequence of interest or feature) is represented by multiple short (25-nucleotide) oligo probes.
Some GeneChips include probes for around 120,000 genes and gene variants.
mRNA that has been extracted from a biological sample can be labeled (dyed) and hybridized to a GeneChip.
Only one sample is hybridized to each GeneChip.

Different Probe Pairs Represent Different Parts of the Same Gene
[Figure label: gene sequence]
Probes are selected to be specific to the target gene and have good hybridization characteristics.

Affymetrix Probe Sets
A probe set is used to measure mRNA levels of a single gene.
Each probe set consists of multiple probe cells.
Each probe cell contains millions of copies of one oligo.
Each oligo is intended to be 25 nucleotides in length.
Probe cells in a probe set are arranged in probe pairs.
Each probe pair contains a perfect match (PM) probe cell and a mismatch (MM) probe cell.
A PM oligo perfectly matches part of a gene sequence.
An MM oligo is identical to a PM oligo except that the middle nucleotide (13th of 25) is replaced by its complementary nucleotide.

A Probe Set for Measuring Expression Level of a Particular Gene
gene sequence:            ...TGCAATGGGTCAGAAGGACTCCTATGTGCCT...
perfect match sequence:   AATGGGTCAGAAGGACTCCTATGTG
mismatch sequence:        AATGGGTCAGAACGACTCCTATGTG
[Figure labels: probe pair, probe cell, probe set]

Affymetrix’s Photolithographic Approach
[Figure: successive photolithographic masks applied while nucleotides are added to the GeneChip. Source: www.affymetrix.com]

Image from Hybridized GeneChip
[Figure. Source: www.affymetrix.com]

Image Processing for Affymetrix GeneChips
Image processing for Affymetrix GeneChips is typically done using proprietary Affymetrix software.
The entire surface of a GeneChip is covered with square-shaped cells containing probes.
Probes are synthesized on the chip in precise locations.
Thus spot finding and image segmentation are not major issues.
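
The relationship between the perfect match and mismatch oligos described above is mechanical enough to sketch in code. The function name mismatch_probe is made up for this illustration; the sequences are the ones shown on the probe set slide.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(perfect_match):
    """Return the MM oligo for a 25-nucleotide PM oligo.

    Only the middle nucleotide (13th of 25) is replaced by its complement;
    all other positions are copied unchanged.
    """
    if len(perfect_match) != 25:
        raise ValueError("expected a 25-nucleotide probe")
    middle = 12  # zero-based index of the 13th nucleotide
    return (
        perfect_match[:middle]
        + COMPLEMENT[perfect_match[middle]]
        + perfect_match[middle + 1:]
    )


pm = "AATGGGTCAGAAGGACTCCTATGTG"
print(mismatch_probe(pm))  # AATGGGTCAGAACGACTCCTATGTG
```

Applied to the PM sequence on the slide, the G at position 13 becomes C, which reproduces the MM sequence shown there.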

Probe Cell
Each probe cell is 8 x 8 = 64 pixels; the border pixels are excluded.
The 75th percentile of the intensities of the 36 center pixels is used to quantify fluorescence intensity for each probe cell.
These values are called PM values for perfect-match probe cells and MM values for mismatch probe cells.
The PM and MM values are used to compute expression measures for each probe set.

Normalization
Outputs from each individual probe pair are statistically combined to give an expression level for the gene represented by the probe set.
Normalization accounts for background noise on the chip, levels of control probes, etc.
Key methods are MAS5.0, RMA, GCRMA

Summary of Microarrays
Positives: commercial chips are accurate and repeatable in experienced hands, and the statistics and modeling have been well explored.
Negatives: cost; you can only see what is on the chip, and chips are difficult to update to new knowledge.
June 11, 2007, BBSI - 2007

Short Read Sequencing
Sequencing technology has evolved in the last 15 years.
Eventual goal is to be able to sequence a genome for $1000 (NIH).
Why not just sequence the transcriptome directly and see what is there?

Sequencing by synthesis (454)
Takes a single strand of DNA and synthesizes its complementary strand enzymatically one base pair at a time, detecting which base was actually added at each step.
Pyrosequencing detects the activity of DNA polymerase with a chemiluminescent enzyme.
Reads are about 400-500 bp

Other Technologies
Illumina Solexa: 40-100 bp, tag DNA or RNA at both ends
ABI SOLiD: around 50 bp

Digital Gene Expression
Sequence census methods for functional genomics, Barbara Wold & Richard M Myers
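
Returning to the Probe Cell slide above, the quantification rule it states (drop the border pixels of the 8 x 8 cell, then take the 75th percentile of the 36 center pixels) can be sketched as follows. The function name probe_cell_intensity is hypothetical, and the percentile convention is an assumption: the slides do not say how the Affymetrix software interpolates, so this sketch simply uses the default method of Python's statistics.quantiles.

```python
from statistics import quantiles


def probe_cell_intensity(cell):
    """Summarize one probe cell given as an 8 x 8 grid of pixel intensities.

    The outer ring of border pixels is discarded and the 75th percentile
    of the remaining 6 x 6 = 36 center pixels is returned.
    """
    if len(cell) != 8 or any(len(row) != 8 for row in cell):
        raise ValueError("expected an 8 x 8 grid of pixel intensities")
    center = [value for row in cell[1:7] for value in row[1:7]]
    # quantiles(..., n=4) returns the three quartile cut points;
    # the last one is the 75th percentile.
    return quantiles(center, n=4)[2]


# Hypothetical cell: bright border pixels (1000) around a uniform center (10).
cell = [[1000] * 8 for _ in range(8)]
for r in range(1, 7):
    for c in range(1, 7):
        cell[r][c] = 10
print(probe_cell_intensity(cell))  # 10.0
```

In the hypothetical cell the bright border has no effect on the result, which is the point of excluding border pixels before summarizing.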