Public Microarray Databases
Using Internet Databases to Collect Information on Bladder Cancer
Riham Soliman
Research assistant, Bioinformatics group

Objectives of talk
1. Outline the importance of web-based public databases in the medical field
2. Explain the necessity of a biomedical research portal containing information collected from experiments on Egyptian samples
3. Outline our research goals
4. Explain our research and its benefits to the Egyptian community

Introduction: the basics
[Figure: DNA double helix in the cell; labels: cytoplasm, nucleus]
From: iGenetics CD-ROM (Animation Chapter 1: Genetics: An Introduction)

Molecular genetics
• 3 billion nucleotides
• Nucleotides are the molecules that make up the DNA double helix
• All our traits are encoded in DNA
• Genes are specific sequences of nucleotides that characterize our traits and are passed on from parents
[Figure: paired nucleotide strands (C, A, G, T) and their complements]
Modified from: iGenetics CD-ROM (Animation Chapter 2: DNA as Genetic Material: The Hershey-Chase Experiment)

Gene expression
• How is DNA transformed into functional output for the survival of the cell and, consequently, the organism?
• Central dogma: DNA → RNA → protein (transcription, then translation)
• Gene expression analysis can be performed by studying:
  • RNA level: the transcriptome
  • Protein level: the proteome

Genetic mutations
• Changes in the genetic sequence
• Required for genetic diversity among individuals
• Disease-causing mutations: deletions, insertions, duplications
http://www.genome.gov//Pages/Hyperion/DIR/VIP/Glossary/Illustration/mutation.cfm

What is cancer?
• Normally, cells grow and divide until the organism has completed development
• Some cells retain the ability to grow and divide long after development has ended
• Carcinogenesis: uncontrolled cell division arises
• The cell only cares about making more copies of itself rather than undergoing proper division

Cancer-causing mutations
• Tumor suppressor genes (TSG): mutations might cause underexpression of TSG
• Proto-oncogenes: mutations cause them to become overexpressed and oncogenic (cancer-causing)

Carcinogenesis is a multi-step process
• A single mutation is not enough
• Accumulation of more than one mutation is necessary
Mutagenesis: multi-step
http://www.cancervic.org.au/about-cancer/what_is_cancer

Bioinformatics: a history
• An interdisciplinary field combining medicine, biology, computer science and mathematics
• Serves the biological and medical community
• Based on computational power
• Dates back to the 1960s
• Discovery of the DNA double helix
• Discovery of genes, which contain the information guiding the building of all cellular components
• Human Genome Project
  • Completed in 2003
  • Sequencing of the entire human genome
• Today
  • Challenge of amalgamating large amounts of data from biomedical research: genetic research, molecular research
  • Databases and the information stored within them

Why are databases necessary?
• Data provided is tailored to the scientist’s requirements
• Offers a variety of information on genes, RNA, proteins, diagrams, images, etc.
• Databases foster collaborations between scientists
  • Improved research
  • Data sharing
• Interoperability
• Ease of access to stored data
  • Takes into account that molecular scientists might not be computer proficient

Information provided on databases
• Literature
  • NCBI (National Center for Biotechnology Information)
  • General databases: Google Search, Google Scholar
  • Academic databases: EBSCOhost
• Sequence data
• Protein
  • Level of expression
  • Sequence
  • 3D structure
• Different experimental conditions comparable to the physiological environment
• Time-course experimentation
• Protein-protein and protein-DNA interactions

KEGG
Kyoto Encyclopedia of Genes and Genomes
[Figure labels: cytoplasm, nucleus, nuclear membrane]

KEGG: bladder cancer
MAPK pathway from KEGG

WikiPathways
• MAPK pathway on WikiPathways: downloaded using GenMAPP
• GenMAPP is an open-source bioinformatics application to visualize metabolic pathways

BioCarta: MAPK pathway

Data extraction from NCBI
• National Center for Biotechnology Information
• Run and maintained by the collaborative efforts of computer scientists, molecular biologists, biochemists, research physicians and structural biologists
• Provides information on diseases, genes, gene sequences, gene transcripts, proteins, protein interactions, function and additional resources

Types of services offered by NCBI
• PubMed
• BLAST (Basic Local Alignment Search Tool)
  • Most famous tool on NCBI
  • Used for pair-wise sequence comparison
  • Identification of novel sequences and/or determination of their properties
• Entrez
  • Literature search service of the National Library of Medicine
  • Access to over 16 million citations linked to participating online journals
  • Fast, efficient and easy to use
  • One of the most popular search engines in NCBI
  • The search query can be the name of the gene or protein (if different), or the accession number for the gene, RNA or protein
  • A plethora of relevant information is produced

OMIM (Online Mendelian Inheritance in Man)
• Used mostly by physicians and medical investigators interested in genetic disorders

Cancer-specific databases: caBIG
• Cancer Biomedical Informatics Grid
• An information network connecting the cancer research community
• Provided by the National Cancer Institute (NCI) in the USA
• Integrative cancer research extending from bench to bedside and back again
• Accelerates the discovery of new detection, diagnostic and treatment techniques to improve outcomes
• Shares information on clinical research, imaging, pathology and molecular biology

caBIG services and resources
Domain workspaces constitute areas of interest to the cancer research and medical community:
1. Integrative cancer research (ICR) workspaces
2. Clinical trial management systems
3. In vivo imaging workspace
4. Tissue banks and pathology tools workspace

caBIG tools
1. Bioconductor: established open-source collection of software packages for high-throughput genome analysis
2. caArray: open-source, web- and programmatically accessible array data management system
3. caIMAGE: database of cancer images
4. caMATCH: system that identifies patients who are potentially eligible for clinical trials

Profiling of bladder cancer data from public databases
Objectives of research
1. Collecting information on genes involved in bladder cancer
2. Assembling an interaction network for these genes
3. Identifying biomarkers
4. Collecting expression-level data, e.g., microarray data
5. Automatic management, processing and visualization of this data
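As a starting point for objective 1, a literature query could be scripted rather than typed into the Entrez web page. The sketch below only builds a search URL for NCBI's E-utilities esearch endpoint and does not send it. The helper name build_esearch_url, its defaults, and the choice of KRT16 as the example gene are illustrative assumptions, not the procedure used in this study.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(disease, gene, db="pubmed", retmax=20):
    """Build an esearch URL for the Boolean query '"disease" AND gene'.

    Hypothetical helper: it only assembles the URL, no request is sent.
    """
    term = f'"{disease}" AND {gene}'
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# Example: one query per gene of interest.
url = build_esearch_url("bladder cancer", "KRT16")
print(url)
```

Looping this helper over a gene list would give one query per gene; fetching and parsing the results is left out on purpose.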
Figure 1.3: Age-standardised (World) incidence rates for bladder cancer, by sex, world regions, 2002 estimates
[Bar chart: rate per 100,000 population (axis 0 to 40), males and females, for Egypt and the world regions; Egypt is listed first]
Source: http://info.cancerresearchuk.org/cancerstats/types/bladder/incidence/

Bladder cancer stages
From: Oxford, G. and Theodorescu, D. Review Article: The Role of Ras Superfamily Proteins in Bladder Cancer Progression. The Journal of Urology, 170: 1987-1993, 2003.

Bladder cancer types
• Squamous cell carcinoma
• Carcinoma in situ
• Transitional cell carcinoma
• Superficial (low grade)
• Metastatic transitional cell carcinoma
• Invasive (high grade)
From: http://cornellurology.com/bladder/gi/types.shtml

Aetiology of bladder cancer in Egypt
• Cigarette smoking (3-7 fold risk) (Samanic et al. 2006)
• Aromatic amines
• Occupational hazard
• Schistosomiasis (Michaud, 2007)
  • Bathing in infested waters
  • Working in fields
• SCC was more common than TCC during times of high schistosomiasis

Genes involved in bladder cancer
To identify genes involved in bladder carcinogenesis and progression, we searched the internet for information about these genes.

Sources
• Publicly available databases, e.g.:
  • NCBI: www.ncbi.nlm.nih.gov/
  • KEGG: http://www.genome.jp/
  • BioGRID: http://www.thebiogrid.org/
  • GeneOntology: http://amigo.geneontology.org/
  • Ensembl: www.ensembl.org/
• Literature search using PubMed (NCBI) and Google

Data collection
• Genes were collected using Boolean queries, e.g., “Bladder cancer, name of gene”
• We identified 261 genes related to bladder cancer
• Data was summarized in a list containing gene information and interacting genes.
• Gene name, NCBI accession number, URLs
• Chromosome locus
• Protein-protein interactions
• Function in normal cell
• Function in bladder cancer cell
• Diagnostic/prognostic potential or use
• Literature

Data annotation

Biomarker identification
• The main target in cancer research is to predict tumor behavior
  • Early diagnosis
  • Prevention of delayed treatment
• We need to distinguish harmless early lesions from those that will progress into cancer
• This depends on good tests and tools
• Current diagnosis of bladder cancer: cystoscopy
• The research community is developing good biomarkers for this purpose
• Biomarkers are molecules that could be targeted in therapy

Biomarkers in use
Each entry gives: marker; sensitivity %; specificity %; method of detection; manufacturer.
• NMP22; 47-87; 58-91; enzyme immunoanalysis; Matritech
• BTA STAT; 57-82; 61-82; antigen-antibody colorimetric; Bard Diagnostics
• BTA TRAK; 55-80; 38-98; enzyme immunoanalysis; Bard Diagnostics
• FDP; 41-93; 77-94; antigen-antibody colorimetric; Intracel Corp
• Telomerase; 53-91; 46-99; polymerase chain reaction; Oncor
• Immunocyt; 86-95; 79-90; immunofluorescence immunoassay/cytology; Diagnocure
• Quanticyt; 45-59; 70-93; morphometry; Gentian Scientific Software
• UBC; 59-79; 84-96; enzyme immunoanalysis; IDL
• CYFRA 21-1; 74-99; 57-78; electrochemiluminescence assay; Roche Diagnostics
• BLCA4; 85-96; 85-100; enzyme immunoanalysis; Eichrom Technologies
• Hyaluronic acid/hyaluronidase; 82-92; 83-96; enzyme immunoanalysis

Markers inserted onto KEGG’s bladder cancer network
GPSM2, NMP22, hyaluronidase, hyaluronic acid, CD44

Microarray technology: measuring gene expression
• Gene expression analysis: transcriptomics
• Microarray technology: the study of mRNA levels in cells (the transcriptome)
• Looks at the abundance of the transcript for thousands of genes
• High throughput
http://en.wikipedia.org/wiki/DNA_microarray

cDNA microarray
• Custom-made

Oligonucleotide microarray
• Ready-made
• Revolutionized by the Affymetrix company

Affymetrix array
[Figure labels: cancer, control, up-regulation, down-regulation, differential expression]
From:
http://www.fastol.com/~renkwitz/microarray_chips.htm

Output of microarray
• The raw image is usually a 16-bit TIFF file
• A microarray image processor converts color intensities into raw quantitative data (probe-level data)
• No immediate observations about gene expression can be made from the raw data
• Statistical analysis applications are used to interrogate the data for information on gene expression patterns

Raw data storage: modes of data storage
As files
• Data is stored directly on the institution’s or lab’s computer
• Does not require special software
• Difficult to track and query the data if larger experiments are performed
In local databases
• Commercial or academic
• Allows local storage of data
• Good tracking and management of experimental data and integration with public microarray (MA) databases
• Requires purchase, installation and maintenance of complex software

Public and commercial microarray databases
PUBLIC
• GEO (Gene Expression Omnibus), NCBI
• ArrayExpress (EBI-EMBL)
• caBIG
• SMD (Stanford Microarray Database)
• Yale Microarray Database
• RED (Rice Expression Database)
• Oncomine
COMMERCIAL
• Oncomine
• Array Informatics
• Limas
• GeNet (Russian website)
OTHER
• CleanEx (SIB)
• GenMAPP

Our bladder cancer microarray data collection
• Queried “Bladder cancer” in all public databases identified
• Collected 14 data sets on bladder cancer from ArrayExpress, GEO and Oncomine
• Based on the literature, there are unpublished data sets
• Gender, disease state, disease staging

Precomputational analyses
• Some databases provide information from preliminary analysis of the data
• This makes data exploration much easier and quicker for the user
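To make the statistical analysis step on probe-level data concrete, here is a minimal sketch of one common first calculation: the log2 fold change of a gene's mean intensity in cancer samples versus control samples, labeled as up- or down-regulated. The intensity values, the gene labels and the cutoff of 1.0 are made-up illustrations, not data from any of the databases above; real pipelines also normalize the data and test for statistical significance.

```python
import math
from statistics import mean


def log2_fold_change(cancer, control):
    """log2 of (mean cancer intensity / mean control intensity)."""
    return math.log2(mean(cancer) / mean(control))


def classify(lfc, cutoff=1.0):
    """Label a log2 fold change; the cutoff is an illustrative assumption."""
    if lfc >= cutoff:
        return "up-regulated"
    if lfc <= -cutoff:
        return "down-regulated"
    return "unchanged"


# Hypothetical probe-level intensities: gene -> (cancer samples, control samples).
intensities = {
    "GENE_A": ([800.0, 820.0, 780.0], [200.0, 210.0, 190.0]),
    "GENE_B": ([100.0, 90.0, 110.0], [400.0, 380.0, 420.0]),
    "GENE_C": ([300.0, 310.0, 290.0], [295.0, 305.0, 300.0]),
}

for gene, (cancer, control) in intensities.items():
    lfc = log2_fold_change(cancer, control)
    print(f"{gene}: log2FC = {lfc:+.2f} ({classify(lfc)})")
```

A log2 scale is used so that a doubling and a halving are symmetric (+1 and -1).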
Oncomine (ONCOMINE™ RESEARCH)
• ONCOMINE performs pre-computations on data to make data exploration much easier and quicker
• Oncomine is made up of 3 layers
  • Data input
  • Data analysis
  • Data visualization

Single and multiple experiment analyses
Single-experiment analysis
[Box plot legend: outlier, largest value, upper quartile, median, lower quartile, smallest value, outlier]
Multiple-experiment analysis

SIB (Swiss Institute of Bioinformatics)
• Research groups based in different European countries
• The main goal is to provide a bioinformatics platform that brings together and analyzes different data sets

CleanEx microarray database
• Data is analyzed within their portal for easier access and interpretation
• Provided through the Swiss Institute of Bioinformatics (SIB)
• Service similar to ONCOMINE but gathers data sets only from GEO
• Does not allow profile visualization

Collecting information on bladder cancer in Egypt specifically
Published article on bladder cancer in Egypt
• Ewis et al. (2007) studied bilharzia-associated SCC (squamous cell carcinoma)
• Analysis performed using microarrays
• 17 patients diagnosed at the Egyptian National Cancer Institute
RESULT
• Showed differential expression in 82 genes
  • 38 genes up-regulated
  • 44 genes down-regulated

Our own data analysis on Ewis et al. data
1. Annotated information gathered on each of the 82 genes
2. Compared the expression pattern for each gene with other data sets from free public databases
3. Identified 7 genes from the Ewis study whose expression opposes that in all other data sets collected
4. Identified 3 genes from the Ewis study whose expression correlates with other studies from the databases
5. Gathered more detailed information on the 7 genes
  • Where do they lie in our KEGG pathway network?
  • How vital are they to cell function?
  • Do the Ewis data make sense (based on the known function)?
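Steps 2 to 4 above amount to a direction-of-change comparison between one reference study and the other data sets. A minimal sketch, assuming each data set has already been reduced to "up" or "down" per gene; the function name, gene labels, data set names and values are placeholders, not results from Ewis et al. or from any public database:

```python
def compare_direction(reference, others):
    """Classify each reference gene against the other data sets.

    Returns gene -> "concordant" (every data set reporting the gene agrees),
    "opposed" (every one disagrees), "mixed", or "no data".
    """
    result = {}
    for gene, direction in reference.items():
        # Directions reported for this gene by the data sets that contain it.
        seen = [data[gene] for data in others.values() if gene in data]
        if not seen:
            result[gene] = "no data"
        elif all(s == direction for s in seen):
            result[gene] = "concordant"
        elif all(s != direction for s in seen):
            result[gene] = "opposed"
        else:
            result[gene] = "mixed"
    return result


# Placeholder inputs (hypothetical genes and data sets).
reference = {"GENE_A": "up", "GENE_B": "down", "GENE_C": "up", "GENE_D": "down"}
others = {
    "dataset_1": {"GENE_A": "down", "GENE_B": "down", "GENE_C": "up"},
    "dataset_2": {"GENE_A": "down", "GENE_B": "down", "GENE_C": "down"},
}

for gene, status in compare_direction(reference, others).items():
    print(gene, status)
```

Genes flagged "opposed" would be the candidates for the closer look described in step 5.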
Discrepancies found in results
• Keratin 16 (KRT16)
KEGG BC pathway with all significant markers for research
[Pathway figure labels: KRT16, TGFBR, SMAD4, SMAD2/3, TGFβ, ACVR1B, JNK, KRT7]
• Not much data provided on the remaining proteins
• WE NEED TO UNDERSTAND THEIR FUNCTION
Modified from the KEGG database

CONCLUSION
Follow-up of Ewis et al. study
PROS
• Offers good preliminary information on bilharzia-associated bladder cancer in the Egyptian population
CONS
• Several mistakes detected in annotation
• Pooled samples
• Only SCC studied
• Does not explain the present discrepancies in the results, e.g. Keratin 16
FOLLOW-UP STUDY IS NECESSARY TO UNDERSTAND DISCREPANCIES AND GENETIC DIFFERENCES BETWEEN WESTERN AND EGYPTIAN PATIENTS

Problems with data collection
1. Information in databases is expanding as more research is carried out.
2. No single public database has a complete representation of all molecules, so it is time-consuming to look through several databases.
3. There is no bladder cancer-specific database.
4. Automated methods are needed to update the data.

Long-term objectives of our study
1. Determine the genetic and molecular profile of Egyptian bladder cancer patients
  • Based on histology
  • Based on the bilharzial status
2. Identify biomarkers to use as drug targets in a clinical setting
3. Improve treatment modalities
  • Tailored to the Egyptian profile

Thank you