PLPTH 890 Introduction to Genomic Bioinformatics
Lecture 26: Microarray Data Analysis - III
Liangjiang (LJ) Wang
[email protected]
April 19, 2005

Outline
• Statistical tests and clustering (review).
• Use of Gene Ontology (GO) in microarray data analysis.

[Workflow diagram]
• Sample acquisition (RNA: purification, labeling)
• Data acquisition (Microarray: hybridization, washing, image analysis)
• Data analysis (Data: preprocessing, statistical inference, clustering analysis, ...) (Hypothesis generation)
• Hypothesis testing
• Biological insight

Finding Significant Genes
• Fold change: uses a single fold change threshold to select genes; does not take into account the variability inherent in the microarray data.
• Student’s t test: tests whether a difference is significant by comparing gene expression measurements between two conditions.
• ANOVA (ANalysis Of VAriance): used to find significant genes in more than two conditions.
• Correction for multiple testing (N tests, p values sorted so that $p_{(i)}$ is the i-th smallest):
  – Bonferroni correction: $p \le \alpha / N$
  – False Discovery Rate (FDR): $p_{(i)} \le \frac{i}{N} q$

Clustering Analysis
• Unsupervised methods for discovering gene expression patterns and data structures.
• There is no single method that is best for every dataset.
• Commonly used clustering methods:
  – Hierarchical clustering: good for visualizing patterns, but often misused to partition data.
  – k-means: a simple method to partition data into a fixed number (k) of clusters.
  – Self-Organizing Map (SOM): a neural network-based clustering approach.

Self-Organizing Map (SOM)
• The user defines an initial geometry of nodes (reference vectors) for the partitions, such as a 3 x 2 rectangular grid.
• During the iterative “training” process, the nodes migrate to fit the gene expression data.
• Each gene is mapped to the most similar reference vector.

Genes in a Cluster May Be Co-Regulated
Microarrays measure steady-state levels of mRNAs.
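A minimal sketch of the two multiple-testing corrections from the Finding Significant Genes slide, assuming the FDR rule is the Benjamini-Hochberg step-up procedure (the slide does not name it). The function names, the default thresholds of 0.05, and the p values are hypothetical and chosen only for illustration.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag p values that pass the Bonferroni threshold alpha / N."""
    n_tests = len(p_values)
    return [p <= alpha / n_tests for p in p_values]


def benjamini_hochberg_significant(p_values, q=0.05):
    """Flag p values that pass the step-up rule p_(i) <= (i / N) * q."""
    n_tests = len(p_values)
    # Indices of the p values, from smallest to largest p.
    order = sorted(range(n_tests), key=lambda idx: p_values[idx])
    # Find the largest rank i whose p value is under its own threshold.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / n_tests * q:
            largest_rank = rank
    # Everything ranked at or below that position is called significant.
    significant = [False] * n_tests
    for idx in order[:largest_rank]:
        significant[idx] = True
    return significant


# Hypothetical p values for ten genes (not taken from the lecture).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonferroni_calls = bonferroni_significant(example_p)
fdr_calls = benjamini_hochberg_significant(example_p)
print(sum(bonferroni_calls), "gene(s) pass Bonferroni")
print(sum(fdr_calls), "gene(s) pass the FDR rule")
```

On these made-up values the FDR rule keeps more genes than Bonferroni, which is the usual reason for preferring it when thousands of genes are tested at once.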
Multi-level regulation
• Transcriptional regulation: RNA synthesis
• Posttranscriptional regulation: RNA processing, RNA turnover
• Translational regulation: protein synthesis
• Posttranslational regulation: protein modification and degradation

Beyond Clustering Analysis
• Using GO to understand significant functional associations of a gene cluster.
• Mapping gene expression data onto biochemical pathways.
• Mapping gene expression data onto protein-protein interaction networks.
• Discovering regulatory elements shared by the promoters of co-expressed genes.
• Inferring gene regulatory networks.

What Is an Ontology?
• An ontology is a set of terms, relationships and definitions that capture the knowledge of a certain domain.
• Terms represent a controlled vocabulary, and define the concepts of a domain.
• Terms are linked by relationships, which constitute a semantic network.
• Ontologies augment natural language annotations and can be more easily processed computationally.

The Gene Ontology (GO)
• Provides structured vocabularies for describing gene products in the domain of molecular biology.
• Enables a common understanding across model organisms and between databases.
• Consists of three structurally unlinked hierarchies (molecular function, biological process and cellular component).
• Two types of relationships between GO terms:
  – is-a: subclass.
  – part-of: physical part of, or subprocess of.
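The is-a and part-of relationships above can be pictured as labeled edges in a directed graph. The sketch below is a toy illustration only: the term names and edges are assumptions picked for readability, not copied from the GO database, and `GO_PARENTS` and `ancestors` are hypothetical names.

```python
# Each term maps to a list of (relationship, parent term) pairs.
# Toy edges for illustration; the real GO files define the actual graph.
GO_PARENTS = {
    # A fragment of a biological process hierarchy.
    "apoptosis": [("is-a", "programmed cell death")],
    "programmed cell death": [("is-a", "cell death")],
    "cell death": [("is-a", "cellular process")],
    "cellular process": [("is-a", "biological process")],
    "biological process": [],
    # A fragment of a cellular component hierarchy (kept unlinked from the one above).
    "nucleolus": [("part-of", "nucleus")],
    "nucleus": [("part-of", "cell")],
    "cell": [("is-a", "cellular component")],
    "cellular component": [],
}


def ancestors(term, parents=GO_PARENTS):
    """Return every term reachable from `term` by following is-a or part-of edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("apoptosis")))
print(sorted(ancestors("nucleolus")))
```

Because the hierarchies are structurally unlinked, walking up from a biological process term never reaches a cellular component term in this sketch.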
Three GO Hierarchies
• Molecular function: elemental activity/task (what) (e.g., DNA-binding, polymerase, transcription factor)
• Biological process: goal or objective (why) (e.g., mitosis, DNA replication, cell cycle control)
• Cellular component: location or complex (where) (e.g., nucleus, ribosome, pre-replication complex)
(Gene Ontology information can be accessed at http://www.geneontology.org/)

Example: Gene Ontology Hierarchy
[Figure: part of the biological process hierarchy, with each edge labeled i (is a) or P (part of). Terms shown: Biological process (GO:0008150), Development (GO:0007275), Behavior (GO:0007610), Physiological (GO:0007582), Cellular process (GO:0009987), Cell growth (GO:0008151), Communication (GO:0007154), Cell death (GO:0008219), Cell aging (GO:0007569), Programmed (GO:0012501), Induction (GO:0012502), Apoptosis (GO:0006915), HS response (GO:0009626), Autophagic cell death (GO:0048102).]

Gene Annotation Using GO Terms
• Association of GO terms with gene products based on evidence from literature reference or computational analysis.
• The creation of GO and the association of GO terms with gene products (gene annotation) are two independent operations.
• A gene can be associated with one or more GO terms (gene categories), and one category normally has many genes (many-to-many relationship between genes and GO terms).

[Figure: Example of Molecular Function (The Gene Ontology Consortium, 2000)]
[Figure: Example of Biological Process (The Gene Ontology Consortium, 2000)]
[Figure: Example of Cellular Component (The Gene Ontology Consortium, 2000)]

Genes from the Same Biological Process Tend to Be Co-Expressed
[Figure: expression data labeled by Gene Names and Bio Process (The Gene Ontology Consortium, 2000)]

How to Assess Overrepresentation of a GO Term?
Genes on an array:
  – Total number of genes (N): 2,285
  – Number of genes – cell cycle (R): 161
Genes in a cluster:
  – Number of genes in the cluster (n): 147
  – Number of genes – cell cycle (r): 25
Is the GO term (i.e., cell cycle) significantly overrepresented in the cluster?
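The four counts N, R, n and r come from combining a gene-to-GO annotation table with a cluster's gene list. A minimal sketch of that bookkeeping follows, assuming annotations are held as a dictionary from gene to a set of GO terms; the gene names, the annotations and the function name `overrepresentation_counts` are all hypothetical.

```python
def overrepresentation_counts(annotations, array_genes, cluster_genes, go_term):
    """Count N, R, n and r for one GO term and one cluster."""
    array_set = set(array_genes)
    # Only cluster members that are actually on the array are counted.
    cluster_set = set(cluster_genes) & array_set
    # Genes on the array annotated with the GO term of interest.
    in_class = {g for g in array_set if go_term in annotations.get(g, set())}
    return {
        "N": len(array_set),               # total genes on the array
        "R": len(in_class),                # array genes in the GO class
        "n": len(cluster_set),             # genes in the cluster
        "r": len(cluster_set & in_class),  # cluster genes in the GO class
    }


# Toy many-to-many annotation: a gene may carry several terms,
# and a term may be attached to several genes.
toy_annotations = {
    "geneA": {"cell cycle", "DNA replication"},
    "geneB": {"cell cycle"},
    "geneC": {"mitosis", "cell cycle"},
    "geneD": {"DNA replication"},
    "geneE": set(),
    "geneF": {"mitosis"},
}
toy_array = list(toy_annotations)
toy_cluster = ["geneA", "geneB", "geneD"]

counts = overrepresentation_counts(toy_annotations, toy_array, toy_cluster, "cell cycle")
print(counts)
```

With real data the same counting would produce the slide's values (N = 2,285, R = 161, n = 147, r = 25) for the cell cycle term.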
Using the Z-Statistic
• Assume the hypergeometric distribution.
• The z-score:
  $$z = \frac{\text{observed} - \text{expected}}{\text{stdev(observed)}} = \frac{r - n\,\frac{R}{N}}{\sqrt{n\,\frac{R}{N}\left(1 - \frac{R}{N}\right)\left(1 - \frac{n-1}{N-1}\right)}}$$
• For the example:
  $$z = \frac{25 - 147 \cdot \frac{161}{2285}}{\sqrt{147 \cdot \frac{161}{2285}\left(1 - \frac{161}{2285}\right)\left(1 - \frac{147-1}{2285-1}\right)}} = 4.88$$

Using the Fisher Exact Test
• Contingency table:

                     In cluster    Out of cluster
  In GO class            a               b
  Out of GO class        c               d

  a = r, b = R - r, c = n - r, d = N - R - n + r
• Probability of finding a genes of the GO class in the cluster:
  $$p_a = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{N!\,a!\,b!\,c!\,d!}$$
• The p value:
  $$p = \sum_{i=a}^{a+b} p_i$$

MAPPFinder
• A tool for mapping gene expression data to the GO hierarchies.
• Part of the free software package GenMAPP.
• Available at http://www.genmapp.org/.
(Doniger et al., 2003)

[Figure: MAPPFinder Sample Output (Doniger et al., 2003)]

GoMiner
• A client-server application using Java (data on the server side).
• Available at http://discover.nci.nih.gov/gominer/.
(Zeeberg et al., 2003)

Onto-Express
• A web application for GO-based microarray data analysis (http://vortex.cs.wayne.edu/Projects.html).
• The input to Onto-Express is a list of Affymetrix probe IDs, GenBank sequence accessions or UniGene cluster IDs.
• Part of the integrated Onto-Tools, including:
  – Onto-Compare: compare commercial arrays.
  – Onto-Design: help array design (probe selection).
  – Onto-Translate: provide mapping of different IDs.

[Figure: output table with columns GO, # genes and p (Genes linked to poor breast cancer outcome)]

Summary
• “Statistical significance is fine, but biological significance is better” (Baxevanis and Ouellette, 2005).
• Gene Ontology (GO) can be used to assess significant functional associations of a gene cluster or a list of significant genes.
• Several tools are available to assist the GO-based analysis of microarray data.
• Next: pathways and regulatory networks.
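As a numerical check on the z-statistic and Fisher exact test slides, the sketch below evaluates both for the cell cycle example (N = 2285, R = 161, n = 147, r = 25). It assumes Python 3.8 or later for `math.comb`; the function names are hypothetical. The Fisher term is written with binomial coefficients, which is algebraically the same as the factorial form on the slide, and the sum stops at min(R, n) because larger counts have probability zero.

```python
from math import comb, sqrt


def z_score(N, R, n, r):
    """z = (observed - expected) / stdev, under the hypergeometric distribution."""
    expected = n * R / N
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / sqrt(variance)


def fisher_upper_tail(N, R, n, r):
    """One-sided p value: probability of r or more GO-class genes in the cluster."""
    # p_i = C(R, i) * C(N - R, n - i) / C(N, n), the same quantity as
    # (a+b)!(c+d)!(a+c)!(b+d)! / (N! a! b! c! d!) with a = i.
    total = comb(N, n)
    upper = min(R, n)
    numerator = sum(comb(R, i) * comb(N - R, n - i) for i in range(r, upper + 1))
    # Exact integer arithmetic until this final division.
    return numerator / total


z = z_score(2285, 161, 147, 25)
p = fisher_upper_tail(2285, 161, 147, 25)
print(f"z = {z:.2f}")  # about 4.88, matching the slide
print(f"one-sided Fisher p = {p:.3g}")
```

Keeping the sum in integers avoids the overflow and rounding problems that come from evaluating large factorials in floating point.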