Making Sense of Public Domain Expression Data: GeneVestigator
Metsada Pasmanik-Chor, TAU Bioinformatics Unit, 19/3/09

On the Agenda
• Microarray databases: characteristics, pros and cons
• Examples:
  • GEO and ArrayExpress
  • GeneVestigator: a meta-analytical approach

Meta-data in Microarray Experiments
Gene expression studies generate large amounts of data!
Source: http://titan.biotec.uiuc.edu/cs491jh/slides/cs491jh-Yong.ppt#268,6 ("Capturing Data and Meta-data in Microarray Experiments")

Properties of High-throughput Data
Microarray databases can accept, store and export (share) large quantities of data.
The stored data contain:
• many genes
• many samples
• various organisms/tissues
• a variety of biological phenomena
• time courses
• replicates
• different technologies, and therefore various data formats
Data retrieval:
• user-friendly web-based interfaces
• links to analysis tools

Gene Expression Matrix
[Figure: images → spot/image quantitations → gene expression matrix (rows: genes; columns: samples; values: gene expression levels)]
The final gene expression matrix (on the right) is needed for higher-level analysis and mining.
Source: http://titan.biotec.uiuc.edu/cs491jh/slides/cs491jh-Yong.ppt#271,8 ("Gene Expression Matrix")

Microarray Data Precision and Loss
• Electron microscopy: only provided in 0.1% of public experiments.
• Processed data loses precision!
• 90% of CEL files generated from microarray experiments have never been deposited in any repository.
Stokes et al., BMC Bioinformatics 2008, 9(Suppl 6):S18; http://www.bio-miblab.org/arraywiki

Microarray Data Formats
A. Raw image data: the signal intensity at each spot is proportional to the expression level of the gene under test. Image intensities are quantified using image analysis software.
B. Raw numerical data (signal intensities).
C. Processed data.

A complete description of complex experiments is desired:
• We don't always know what's important: "noise" probes could end up being informative (e.g., detection of a splice variant).
• Different labs have different needs, so a central system is needed!

The Future
• Better (more accurate) summarization algorithms will emerge.
• New uses for raw data may emerge.
• Challenge: store the raw data in an accessible form.

Complexity and Categories of Data, and the 6 Parts of MIAME
The MIAME (Minimum Information About a Microarray Experiment) guidelines define standards for publishing microarray experiment information (Brazma et al. (2001), Nature Genetics 29(4), 365-71).
Data categories: publication; experimental design; source (e.g., taxonomy); sample (source & treatment, preparation & labelling); normalization; hybridisation; array design; gene (e.g., EMBL); data measurements.
Source: http://www.ict.ox.ac.uk/odit/projects/digitalrepository/docs/workshop/Helen_Parkinson-RDMW0608.ppt#429,18 (Slide 18)

Microarray Database Repositories are Biased
[Figure: one pie chart per repository; the relative size of each pie corresponds to the number of experiments in that repository. Pie labels: all human data; mostly custom arrays; mostly human data; mostly old data; mainly Affy chips.]
Stokes et al., BMC Bioinformatics 2008, 9(Suppl 6):S18; http://www.biomedcentral.com/1471-2105/9/S6/S18

Overlaps of Data Between Repositories
Total experiments: 2376 (August 2005 to June 2006).
Stokes et al., BMC Bioinformatics 2008, 9(Suppl 6):S18; http://www.biomedcentral.com/1471-2105/9/S6/S18

User-Friendly Microarray Databases
Many gene expression databases exist, both commercial and non-commercial. Most focus on a particular technology, a particular organism, or both.
We will discuss the most promising ones:
• ArrayExpress (AE; EBI)
• The Gene Expression Omnibus (GEO; NCBI)
• GeneVestigator

The Gene Expression Omnibus (GEO)
http://www.ncbi.nlm.nih.gov/geo/
The Gene Expression Omnibus is a public repository of high-throughput gene expression data, searchable through NCBI's Entrez system and hosted at the National Library of Medicine (NIH). GEO was designed to accommodate diverse types of data.

Gene Expression Omnibus: Experiment-Centered View (GDS)

Gene Expression Omnibus: Gene-Centered View
Example: GDS563, the expression profile of the dystrophin gene in a DataSet examining skeletal muscle biopsies from 12 Duchenne muscular dystrophy patients and 12 normal subjects. The profile chart groups the Samples by experimental design (Normal vs. Duchenne).
• Red bars: level of abundance of an individual transcript across the Samples that make up a DataSet. Values are presented in arbitrary units. Single channel: submitted values are normalized signal count data. Dual channel: submitted values are normalized log ratios.
• Faded bars/squares: Affymetrix 'Detection call' = Absent.
• Blue squares: rank order, indicating where the expression of that gene falls relative to all other genes on that array (enrichment).

ArrayExpress
http://www.ebi.ac.uk/microarray-as/ae/

Query ArrayExpress
[Screenshot labels: Gene name, Species, Condition, Click, Annotations, Experiments and description]
Results: a list of all experiments, ordered by p-value. For each experiment: a short description, the experimental factors and the gene expression.

Query ArrayExpress: Similarly Expressed Genes
Select the 'find 3 closest genes' option: IER2, FOS and JUN have expression similar to nfkbia.
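The slide does not say which similarity measure ArrayExpress uses for 'find 3 closest genes'. As a minimal sketch, assuming Pearson correlation across samples on a genes × samples expression matrix, a nearest-neighbour lookup could look like the following. The function `closest_genes`, the gene names g1 to g5 and the numbers are hypothetical and for illustration only; this is not the ArrayExpress implementation.

```python
import numpy as np


def closest_genes(expr, genes, query, k=3):
    """Return the k genes whose expression profiles best match `query`.

    Hypothetical helper (not an ArrayExpress API). Similarity is assumed to
    be the Pearson correlation between gene rows across all samples.

    expr  : genes x samples matrix (rows = genes, columns = samples)
    genes : gene names, one per row of `expr`
    query : name of the gene to compare against
    k     : number of most similar genes to return
    """
    expr = np.asarray(expr, dtype=float)
    corr = np.corrcoef(expr)              # gene-by-gene Pearson correlation matrix
    q = genes.index(query)
    # Constant (flat) profiles give NaN correlations; rank them last.
    scores = np.nan_to_num(corr[q], nan=-np.inf)
    scores[q] = -np.inf                   # never report the query gene itself
    order = np.argsort(scores)[::-1][:k]  # highest correlation first
    return [(genes[i], float(scores[i])) for i in order]


if __name__ == "__main__":
    # Toy values for illustration only, not real microarray data.
    genes = ["g1", "g2", "g3", "g4", "g5"]
    expr = [
        [1, 2, 3, 4, 5],   # g1: query gene
        [2, 4, 6, 8, 10],  # g2: same shape as g1, twice the level
        [5, 4, 3, 2, 1],   # g3: opposite trend
        [1, 2, 3, 5, 4],   # g4: similar trend
        [3, 1, 4, 1, 5],   # g5: weakly related
    ]
    for name, r in closest_genes(expr, genes, "g1", k=2):
        print(f"{name}\t{r:.2f}")
```

With this toy matrix, g2 (r = 1.00) and g4 (r = 0.90) are reported. Pearson correlation compares the shape of the profiles rather than absolute levels, which is why g2 ranks first despite its doubled values; a real service may use a different metric or normalization.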
HeatMap Atlas Output
[Screenshot: heat map of experimental conditions against the number of up/down-regulated genes]
http://www.ebi.ac.uk/microarrayas/atlas/qr?q_gene=saa4&q_updn=updn&q_orgn=MUS+MUSCULUS&q_expt=%28all+conditions%29&view=heatmap&view=

GeneVestigator: a reference expression database and meta-analysis system
https://www.genevestigator.com/gv/index.jsp

Genevestigator: a system for the meta-analysis of microarray data
• A database and web-browser data-mining interface for Affymetrix GeneChip data, based on the new concept of "Meta-Profiles", which relies on reference expression databases.
• Allows biologists to study the expression and regulation of genes in a broad variety of contexts by summarizing information from hundreds of manually curated microarray experiments.
• Workspaces and views can be saved to files and reopened in a later analysis session (*.gvw, which stands for Genevestigator Workspace).
[Figure: system overview, application server → Java application → analysis output]
Source: http://bar.utoronto.ca/ICAR19/ICAR19_BioinfoWorkshop%20-%20Genevestigator.ppt#257,2 ("Overview of the Genevestigator system")

Database Content and Quality
The database consists of a large and varied collection of manually curated, quality-controlled Affymetrix chips. Genevestigator curators perform quality control of EACH experiment manually, using a pipeline of Bioconductor packages that performs normalization and probe-level analysis.
Low-quality arrays are those that:
• fall out of range relative to the other arrays from the same experiment,
• exhibit higher RNA degradation,
• are particularly noisy, or
• do not correlate with replicate samples.

User Hardware Requirements
Genevestigator is a web-based application running in Java.
A Java applet provides several advantages:
• users don't have to install any software;
• users always work with the latest software release;
• Java is more powerful than HTML/JavaScript for data manipulation.
To run the application, client machines must have the Java Runtime Environment (JRE, version 1.4.2 or higher) installed (usually available by default on PCs). The JRE is freely available for download from Sun Microsystems (http://www.Java.com).
For optimal work with the Genevestigator application, we recommend:
• screen resolution: 1024 x 768 or higher
• memory: preferably 512 MB RAM or more

GeneVestigator Species Availability
• Mammals: Human, Mouse, Rat
• Plants: Arabidopsis, Barley, Rice, Soybean
Table columns: Species, Arrays, Number of arrays. Remaining cell values: Human 133_2 & Human Genome 10k, 20k, 47k; Mouse Genome; 1109, 3786, 2782; 3071, 1967; Arabidopsis Genome 22k, 12k; Rat Genome 40k; 8k, 31k; 2146, 858; Barley Genome 22k; Rice Genome 22k; 706; -; 3110.

Data Sources and Referencing
The Genevestigator analysis platform comprises a large database of manually curated microarray experiments collected from the public domain or from individual contributors. The array annotations needed for data analysis were retrieved from public repositories or, where these were insufficient, from the authors themselves.
Genevestigator contains data from the following repositories and databases:
• Gene Expression Omnibus (GEO): http://www.ncbi.nlm.nih.gov/geo/
• ArrayExpress: http://www.ebi.ac.uk/arrayexpress/
• ChipperDB: http://chipperdb.chip.org/adb/adb-home
• The Arabidopsis Information Resource (TAIR): http://www.arabidopsis.org/
• MUSC Microarray Database: http://proteogenomics.musc.edu/ma
• Public Expression Profiling Resource (PEPR): http://pepr.cnmcresearch.org
• NASC Microarray Database (NASCArrays): http://affymetrix.arabidopsis.info/narrays/experimentbrowse.pl
• NIH Neuroscience Microarray Consortium: http://arrayconsortium.tgen.org/np2/home.do
• Gene Expression Open Source System (GEOSS): https://genes.med.virginia.edu/intro to geoss.html
• RNA Abundance Database (RAD): http://www.cbil.upenn.edu/RAD/php/index.php

GeneVestigator focuses on gene expression in the context of:
1. Time (gene expression during stages of development/the life cycle).
2. Space (tissue-specific expression).
3. Response (expression caused by stimuli: biotic stress, abiotic stress, chemicals, hormones, light, drug treatment, disease).
Users can query the database to retrieve the expression patterns of individual genes across chosen environmental conditions, growth stages or organs. Conversely, mining tools allow users to identify genes specifically expressed during selected stresses or growth stages, or in particular organs.
Access: free / by license.

http://sbw.kgi.edu/

Dr. Metsada Pasmanik-Chor
Bioinformatics Unit, Life Science, TAU
Tel: x 6992
Bioinfo. Unit webpage: http://bioinfo.tau.ac.il
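The GeneVestigator slide says mining tools can identify genes specifically expressed in a chosen stress, growth stage or organ, but it does not say how specificity is scored. As an illustration only, this "reverse query" can be sketched with the tau specificity index (Yanai et al., 2005), a technique swapped in here and not necessarily GeneVestigator's own. The sketch assumes a genes × conditions matrix of non-negative mean expression values; the helpers `tau_specificity` and `specific_genes`, the 0.8 threshold, the gene names and the data are all hypothetical.

```python
import numpy as np


def tau_specificity(profile):
    """Tau index for one gene: 0 = evenly expressed, 1 = expressed in one condition only.

    Assumes non-negative expression values (e.g. linear-scale signals).
    """
    x = np.asarray(profile, dtype=float)
    if x.size < 2 or x.max() <= 0:
        return 0.0
    x_hat = x / x.max()                      # scale so the top condition = 1
    return float(np.sum(1.0 - x_hat) / (x.size - 1))


def specific_genes(expr, genes, conditions, target, min_tau=0.8):
    """Hypothetical 'reverse query': genes peaking in `target` with tau >= min_tau.

    expr       : genes x conditions matrix of mean expression values
    genes      : gene names, one per row of `expr`
    conditions : condition (organ, stage, stress) names, one per column
    target     : the condition of interest
    """
    expr = np.asarray(expr, dtype=float)
    t = conditions.index(target)
    hits = []
    for name, row in zip(genes, expr):
        tau = tau_specificity(row)
        # Keep genes whose highest expression is in the target condition
        # and whose profile is concentrated enough there.
        if int(np.argmax(row)) == t and tau >= min_tau:
            hits.append((name, tau))
    return sorted(hits, key=lambda h: h[1], reverse=True)  # most specific first


if __name__ == "__main__":
    # Toy values for illustration only.
    conditions = ["root", "leaf", "flower", "seed"]
    genes = ["gA", "gB", "gC"]
    expr = [
        [10, 0, 0, 0],  # gA: root only
        [5, 5, 5, 5],   # gB: same level everywhere
        [8, 2, 1, 1],   # gC: strongly root-enriched
    ]
    for name, tau in specific_genes(expr, genes, conditions, "root"):
        print(f"{name}\ttau={tau:.2f}")
```

For "root", the toy data return gA (tau = 1.00) and gC (tau = 0.83), while the evenly expressed gB (tau = 0) is filtered out. Tau only measures how concentrated a profile is, so the extra argmax check is what ties each hit to the requested condition.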