Resources

Database links

We have been teaching workshops for several years with a resource page found here (it might not be totally up to date, but it is still useful). Here is another list of bioinformatics resources for biologists that I like (http://molbiol-tools.ca). Below is a list that we try to keep up to date and that includes many of the sites we use in class.

General databases
EBI: http://www.ebi.ac.uk
NCBI Gquery: http://www.ncbi.nlm.nih.gov/gquery/
Entrez help fields: http://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html
NCBI resources: http://www.ncbi.nlm.nih.gov/guide/all/#howto
NAR database: http://www.oxfordjournals.org/nar/database/c/
UniProt SwissProt: http://www.uniprot.org/
BioCyc: http://biocyc.org (note: subscription needed for some features)

Organism-specific databases
SGD: http://www.yeastgenome.org/
EcoGene: http://www.ecogene.org
TAIR: http://www.arabidopsis.org/
Pseudomonas database GB: http://www.pseudomonas.com
EcoCyc: http://ecocyc.org

Databases related to enzymes and pathways
Enzyme nomenclature database: http://www.expasy.org/enzyme/
BRENDA: http://www.brenda-enzymes.org/
KEGG: http://www.kegg.jp/kegg/pathway.html
Structures, PDB: http://www.rcsb.org/

Visualizing pathways, mapping data to pathways, metabolic reconstruction
MetaCyc: http://metacyc.org/
iPath: http://pathways.embl.de/
ModelSEED: http://seed-viewer.theseed.org/seedviewer.cgi?page=ModelView
Reactome (eukaryotes only): http://www.reactome.org/

Practical tools
Gene Infinity: http://geneinfinity.org/
Omics Tools: https://omictools.com
ReadSeq 1, change format tool: http://www-bimas.cit.nih.gov/molbio/readseq/
Seqret, change format tool: http://www.ebi.ac.uk/Tools/sfc/emboss_seqret/
SeqMassager, sequence clean-up tool: http://www.attotron.com/cybertory/analysis/seqMassager.htm
Reverse complement DNA: http://www.bioinformatics.org/sms/rev_comp.html
Bionumbers: http://www.bionumbers.hms.harvard.edu/ (all types of numbers related to biology)
VENNY: http://bioinfogp.cnb.csic.es/tools/venny/index.html (Venn diagram tool)

Alignment tools

Aligning two sequences
Align two sequences of DNA: http://www.bioinformatics.org/sms2/pairwise_align_dna.html
Align two sequences of proteins: http://www.bioinformatics.org/sms2/pairwise_align_protein.html
Align 2 proteins with Blast: http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&BLAST_SPEC=blast2seq&LINK_LOC=align2seq
Dotplot site 1, DotLet: http://myhits.isb-sib.ch/cgi-bin/dotlet
Dotlet help: http://myhits.isb-sib.ch/util/dotlet/doc/dotlet_help.html
Dotplot site 2: http://www.genebee.msu.su/services/dhm/advanced.html
Compare two genomes: https://genomevolution.org/CoGe/SynMap.pl
Smith-Waterman full-length alignments between two sequences: http://pir.georgetown.edu/pirwww/search/pairwise.shtml
Pearson alignment method between two proteins, LALIGN: http://www.ch.embnet.org/software/LALIGN_form.html
PRSS3, evaluates the significance of a protein sequence alignment: http://www.ch.embnet.org/software/PRSS_form.html

Database searches
Blast@NCBI: https://blast.ncbi.nlm.nih.gov/Blast.cgi
ConciseBlast, blasts a reduced database from protein clusters: http://www.ncbi.nlm.nih.gov/genomes/prokhits.cgi
Useful tool to find similar DNA regions in the human genome, BLAT: http://genome.ucsc.edu/cgi-bin/hgBlat?hgsid=412713987_NrFiBsC7HAfLzfiAeKNgr11VBglP&command=start
Sequence similarity searches@EBI, includes FASTA: http://www.ebi.ac.uk/Tools/sss/
Exhaustive database search, SCANPS: http://www.compbio.dundee.ac.uk/www-scanps
Blast Pubmed 1, Seq2ref: http://prodata.swmed.edu/seq2ref/
Blast Pubmed 2, PubServer: http://pubserver.burnham.org/
Blast@KEGG: http://www.genome.jp/tools/blast/

Multiple sequence alignments
Clustal Omega: http://www.ebi.ac.uk/Tools/msa/clustalo/
T-coffee: http://tcoffee.crg.cat/apps/tcoffee/index.html
MultAlin: http://multalin.toulouse.inra.fr/multalin/
Visualize alignments: http://www.ebi.ac.uk/Tools/msa/mview/
Compilation of MSA programs @EBI: http://www.ebi.ac.uk/Tools/msa/
Platform to run different alignment programs (signing up lets you save your work): http://mobyle.pasteur.fr/cgi-bin/portal.py#welcome

Motifs and protein domains

Motifs
Prosite: http://www.expasy.ch/prosite/
MEME: http://meme-suite.org
Logo builder, WebLogo: http://weblogo.threeplusone.com
Compare logos, Two Sample Logo: http://www.twosamplelogo.org/cgi-bin/tsl/tsl.cgi

Protein domains
Pfam: http://pfam.xfam.org
NCBI CDD/CDART: http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
CDD Search: http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
InterPro: http://www.ebi.ac.uk/interpro/
ProDom: http://prodom.prabi.fr/prodom/current/html/home.php
Genome ProtMap, maps protein families in contigs: http://www.ncbi.nlm.nih.gov/sutils/protmap.cgi
Good compilation of protein MSA, domain and signature sites: https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_server.html
Structure-based domains @SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
Structure-based domains @SCOP2: http://scop2.mrc-lmb.cam.ac.uk
Structure-based domains @CATH: http://www.cathdb.info/

Genome browsers within integrated platforms
SEED viewer: http://pubseed.theseed.org/
IMG-JGI: https://img.jgi.doe.gov/cgi-bin/w/main.cgi
UCSC Archaeal GB: http://archaea.ucsc.edu/
MicroScope: https://www.genoscope.cns.fr/agc/microscope/home/index.php
Pseudomonas database GB: http://www.pseudomonas.com/gbrowse_index.jsp
UCSC Human GB: http://genome.ucsc.edu/cgi-bin/hgGateway
Mapviewer, advanced browsing for a subset of genomes @NCBI: http://www.ncbi.nlm.nih.gov/projects/mapview/
Yeast SGD GB: go to Sequence/browser @SGD
Eukaryotic GB at e!Ensembl: http://useast.ensembl.org/index.html
Prokaryotic GB at b!Ensembl: http://bacteria.ensembl.org/index.html
Island Viewer: http://www.pathogenomics.sfu.ca/islandviewer/browse/
Genome Projector: http://www.g-language.org/GenomeProjector/

Gene finders
tRNAscan-SE: http://lowelab.ucsc.edu/tRNAscan-SE/
rRNA finder: http://www.cbs.dtu.dk/services/RNAmmer/
NCBI ORF Finder: http://www.ncbi.nlm.nih.gov/gorf/orfig.cgi
GeneMark, a family of programs to predict genes: http://exon.biology.gatech.edu/GeneMark/
tRNA database: http://lowelab.ucsc.edu/GtRNAdb/

Plant genome databases
Arabidopsis, TAIR: https://www.arabidopsis.org/
Maize, Gramene: http://ensembl.gramene.org/Zea_mays/Info/Index
Maize, GDB: http://www.maizegdb.org
All plants: http://www.gramene.org/
The Viridiplantae branch @ http://genome.jgi-psf.org/
PlantSEED: http://bioseed.mcs.anl.gov/~seaver/FIG/seedviewer.cgi?page=PlantSEED
PlantGDB: http://www.plantgdb.org/

DNA motif finders or databases

Bacterial resources
Bacterial sRNA target predictor: http://rna.tbi.univie.ac.at/RNApredator2/target_search.cgi
Prokaryotic regulatory site compilation, RegPrecise: http://regprecise.lbl.gov/RegPrecise/
RegulonDB: http://regulondb.ccg.unam.mx/
CollecTF: http://collectf.umbc.edu/browse/home/
Prokaryotic gene regulation database: http://www.prodoric.de/
Platform to predict regulatory sites in bacteria, RegPredict: http://regpredict.lbl.gov/regpredict/
Motif finding platform, MEME: http://meme-suite.org
Riboswitch Finder: http://132.248.32.45/cgi-bin/ribex.cgi
Rfam (contains riboswitches): http://rfam.xfam.org/
Regulatory overview of E. coli: http://www.ecocyc.org/overviewsWeb/regOv.shtml
General gene finding and promoter finding tools in prokaryotes and eukaryotes: http://linux1.softberry.com/berry.phtml

Eukaryotic resources
Plant promoter database: http://ppdb.agr.gifu-u.ac.jp/ppdb/cgi-bin/index.cgi
Plant transcription factor databases: http://planttfdb.cbi.pku.edu.cn/ and http://plntfdb.bio.uni-potsdam.de/v3.0/
Human Transcriptional Regulatory Element Database: https://cb.utdallas.edu/cgi-bin/TRED/tred.cgi?process=home
Coexpression database: http://coxpresdb.jp/

Cloning tools platforms
Restriction mapping: http://nc2.neb.com/NEBcutter2/
Primer design, Primer3: http://frodo.wi.mit.edu/primer3/
Primer Blast: http://www.ncbi.nlm.nih.gov/tools/primer-blast/
Primer design, Primer3Plus: http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi
Free in silico cloning platform, ApE: http://biologylabs.utah.edu/jorgensen/wayned/ape/
Free in silico cloning platform, Serial Cloner: http://serialbasics.free.fr/Serial_Cloner.html
Physical and chemical properties of a protein @Expasy: http://web.expasy.org/protparam/
Electronic notebook and plasmid depository: https://benchling.com/academic
Plasmid depository: https://www.addgene.org/

Protein analysis tools
Compilation of protein analysis tools @Expasy: http://www.expasy.org/proteomics
Compilation of protein analysis tools @CBS: http://www.cbs.dtu.dk/services/
Compute pI/MW @Expasy: http://web.expasy.org/compute_pi/
Find a protein sequence based on the AA composition @Expasy: http://web.expasy.org/aacompident/
Calculate % identity and similarity between proteins: http://imed.med.ucm.es/Tools/sias.html
Prediction of transmembrane helices, TMHMM: http://www.cbs.dtu.dk/services/TMHMM/
Signal sequence analysis, SignalP: http://www.cbs.dtu.dk/services/SignalP/
Signal sequence analysis for bacteria, PSORT: http://www.psort.org/psortb/
Targeting prediction in eukaryotes, TargetP: http://www.cbs.dtu.dk/services/TargetP/
Combined transmembrane and signal sequence prediction, Phobius: http://phobius.sbc.su.se/
Plant Proteome database: http://ppdb.tc.cornell.edu/introduction.aspx
Plastid protein database: http://www.plprot.ethz.ch/
MitoMiner: http://mitominer.mrc-mbu.cam.ac.uk/release-3.1/begin.do

Phylogeny tools
General bioinformatics platform with many phylogeny tools and pipelines: http://mobyle.pasteur.fr/cgi-bin/portal.py#welcome
Phylogeny platform, Phylogeny.fr: http://www.phylogeny.fr
EBI Clustal phylogeny tool: http://www.ebi.ac.uk/Tools/phylogeny/clustalw2_phylogeny/
SeaView platform: http://doua.prabi.fr/software/seaview
MEGA platform: http://www.megasoftware.net/
Phylodendron, phylogenetic tree printer: http://iubio.bio.indiana.edu/treeapp/treeprint-form.html
iTOL (species trees and adding data to trees): http://itol.embl.de/
Evolgenius: http://evolgenius.info/evolview/#login
Environment for Tree Exploration (ETE): http://etetoolkit.org/
Display and annotation of trees, iTOL: http://itol.embl.de

Structure visualization platforms

To download on your computer
RasMol: http://www.umass.edu/microbio/rasmol/
Molscript: http://www.avatar.se/molscript/
Swiss-PdbViewer: http://spdbv.vital-it.ch/
GRASP: http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:GRASP2
Chimera: http://www.cgl.ucsf.edu/chimera/
PyMOL Educational (free but more limited version of PyMOL; cannot be used for research): http://pymol.org/edu/?q=educational/

Structure visualization on the web
Note: These can be viewed on the web but require Java, so you will almost certainly need to add them to your security settings.
NCBI Cn3D: http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml (you will need to download Cn3D, but it is very easy)
PDB Jmol: http://jmol.sourceforge.net/
General protein sequence analysis server with many 2D structure prediction tools: http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_server.html
Jpred4: http://www.compbio.dundee.ac.uk/jpred4

Web-based 3D structure prediction platforms
Phyre2: http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index
SwissModel: https://swissmodel.expasy.org/interactive
I-TASSER: http://zhanglab.ccmb.med.umich.edu/I-TASSER/

Structural genomics
Structural genomics Knowledge base: http://kb.psi-structuralgenomics.org/
Structural genomics Target database: http://sbkb.org/tt/

UNKNOWN GENE/ENZYME DATABASES
ORENZA: http://www.orenza.u-psud.fr/
  ORphan ENZyme Activities database (lists 1,200 orphan enzymes).
ADOMETA: http://vitkuplab.cu-genome.org/html/adometa/adometa.html
  ADoption of Orphan METabolic Activities (orphan enzyme activities in E. coli, B. subtilis, and S. cerevisiae).
ORPHAN ENZYME PROJECT: http://www.orphanenzymes.org/
GREP: http://bisscat.org/GREP/
  Generator of Reaction Equations & Pathways. Looks for reported and putative enzyme reaction equations; especially designed for finding metabolic pathways on orphan metabolites (compounds known to be present in at least one living organism, but whose synthetic/degradation pathways are unknown).

General integration platforms (genome browsers, genome comparisons (pathways or synteny), phylogenetic distribution queries, physical clustering, etc.)
SEED: http://www.theseed.org/wiki/Main_Page
  Database containing hundreds of genomes and many valuable tools.
SEED Subsystems: http://pubseed.theseed.org//SubsysEditor.cgi
  Advanced comparative genomic tools; see http://www.hos.ufl.edu/meteng/HansonWebpagecontents/workshop/NSF%20Maize%20Workshop.html for training.
Patric: http://www.patricbrc.org/
  Emphasis on pathogenic bacteria (but contains all sequenced bacterial genomes); multiple tools.
MicroScope: https://www.genoscope.cns.fr/agc/microscope/home/index.php
  Microbial genome annotation platform (strong on metabolism).
MicrobesOnline: http://microbesonline.org
IMG: http://img.jgi.doe.gov/
  Integrated Microbial Genomes data analysis system.
IMG-JGI: https://img.jgi.doe.gov/cgi-bin/w/main.cgi
MBGD: http://mbgd.genome.ad.jp/
  Microbial Genome DataBase for comparative genomics.
MG-RAST: http://metagenomics.anl.gov/
  Depository and analysis of metagenomics data.
EFI: http://enzymefunction.org/
  The Enzyme Function Initiative (EFI) is developing a robust sequence/structure-based strategy for facilitating discovery of in vitro enzymatic and in vivo metabolic/physiological functions of unknown enzymes discovered in genome projects.
Fusions: http://modelseed.org/projects/fusions/
  Search for fusions.

Multiple associations platforms
STRING: http://string.embl.de/
  Database of known and predicted protein-protein relationships, derived from genomic context (fusions, conserved gene clusters, co-occurrence), high-throughput experiments (co-expression), and the literature. STRING quantitatively integrates data from bacteria and other organisms.
STITCH: http://stitch.embl.de/
  STRING incorporating small molecules.
eNet: http://ecoli.med.utoronto.ca/
  E. coli gene function prediction database; integration of microarray and protein interaction data.
EcoliNet: http://www.inetbio.org/ecolinet/
  Another database that integrates different types of evidence for E. coli.
GeneMania: http://genemania.org
  Very visual data integration for a handful of model organisms.
AraNet: http://www.functionalnet.org/aranet/search.html
  Gene associations in Arabidopsis.
BioPixie: http://imp.princeton.edu
  Integration of data for a handful of eukaryotes; nice graphics.
DIP: http://dip.doe-mbi.ucla.edu/dip/Main.cgi
  Database of interacting proteins from different organisms.
List of pathway and interaction databases: http://pathguide.org/
ConsensusPathDB: http://cpdb.molgen.mpg.de/YCPDB (for yeast/human/mouse)

Phylogenetic distribution tools
JGI Phylogenetic Profiler: http://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=PhylogenProfiler&page=phyloProfileForm
  Phylogenetic Profiler for Single Genes.
MicroScope Phyloprofile Exploration: https://www.genoscope.cns.fr/agc/microscope/compgenomics/phyloprofil.php?
MicrobesOnline Phyletic Pattern: http://www.microbesonline.org/cgi-bin/matchphyloprofile.cgi
OrthoMCL: http://orthomcl.org/orthomcl/ (great for eukaryotes)

Phenotype data
Yeast phenotype data: http://fitdb.stanford.edu/
E. coli phenotype data: http://ecoliwiki.net/tools/chemgen/
E. coli Biolog data: http://ecoli.naist.jp/GB/index.php/biolog
RAPID: http://rarge.gsc.riken.jp/phenome/
  RIKEN Arabidopsis Phenome Information Database; phenotypic data in transposon-insertional mutants.
SeedGenes: http://www.seedgenes.org/
  Genes that give a seed phenotype when disrupted by mutation.
GOLM Metabolome: http://gmd.mpimp-golm.mpg.de
MicrobesOnline: http://www.microbesonline.org/
  A comprehensive database that includes correlated TnSeq data for four organisms.
TnSeq Fitness Browser: http://fit.genomics.lbl.gov/cgi-bin/myFrontPage.cgi

ESSENTIAL GENES DATABASES (Pro- and Eukaryote)
OGEE: http://ogeedb.embl.de/#summary
  Online GEne Essentiality database.
DEG: http://tubic.tju.edu.cn/deg/ or http://www.essentialgene.org/
  Database of Essential Genes.

MICROARRAY & RNASeq DATABASES AND ANALYSIS RESOURCES

General depositories
GEO: http://www.ncbi.nlm.nih.gov/geo/
  Gene Expression Omnibus.
ArrayExpress: http://www.ebi.ac.uk/arrayexpress/

Web-based analysis of your own data
GenePattern: http://genepattern.broadinstitute.org/gp/pages/index.jsf
Patric: RNA-Seq analysis

Precomputed analysis of expression data
ATTED: http://atted.jp/
  A simple site to use to look for co-expression patterns in Arabidopsis; it shows gene networks, not just lists of correlated genes.
COXPRESdb: http://coxpresdb.jp
  Co-expression in yeasts and animals. COXPRESdb provides data for both S. cerevisiae and S. pombe, while Golm covers S. cerevisiae only. COXPRESdb displays co-expression data for orthologs, when they exist, in invertebrates and vertebrates. Such similar patterns of co-expression across species can generate very strong predictions.
FungiDB: http://fungidb.org/fungidb/
Diurnal: http://diurnal.mocklerlab.org/
  Circadian/diurnal gene expression data for an individual gene or a set of Arabidopsis, rice, or poplar genes.
Translatome eFP: http://efp.ucr.edu/
  Transcriptome profiling of 13 discrete Arabidopsis cell populations.
PLEXdb: http://www.plexdb.org/
  Plant Expression Database.
Botany Array Resource: http://bbc.botany.utoronto.ca/
  Electronic Northerns.
CyanoExpress: http://cyanoexpress.sysbiolab.eu
  Co-expression database for Synechocystis sp. PCC 6803. Tools for finding co-responses. Instructions:
  1) On the homepage, click 'gene expression'.
  2) Click 'all perturbations'.
  3) In the blank 'search for' field on the upper left part of the page, paste your gene id (e.g. sll0635) and click go.
  4) On the new page, click the microarray picture (a single band). A larger microarray image will appear; a column on the right lists your top 20 co-regulated genes (most of them functionally annotated from the poorly maintained and obsolete Cyanobase).
MetaOmGraph: http://metnetdb.org/MetNet_MetaOmGraph.htm
  Tool to plot and analyze large datasets.
qTeller: http://qteller.com/
  RNAseq data for maize, sorghum, rice. Simple tools for expression in various organs and correlation of expression of two genes.

Bacteria
MicrobesOnline: http://www.microbesonline.org/
  A comprehensive database that includes correlated gene expression in E. coli and other bacteria.
EcoGene: http://ecogene.org/
  A rich resource on E. coli that includes microarray data on the major changes in gene expression observed in various experiments.
GenExpDB: https://genexpdb.okstate.edu/?query=b0145
  E. coli Community Gene Expression DataBase.
COLOMBOS platform: http://www.colombos.net (expression data analysis for prokaryotes)
PortEco: http://www.porteco.org/
  E. coli microarray analysis; they also have analysis of the phenotype data.
Patric: http://www.patricbrc.org/ (has integrated expression data for many bacteria)

Yeast
SPELL: http://spell.yeastgenome.org/
  Co-response search tool for yeast.

Mammals
BioGPS: http://biogps.org/#goto=welcome
A compilation: http://omictools.com/gene-expression-c878-p1.html

Painting data on pathways
BioCyc: http://biocyc.org
Paintomics: http://bioinfo.cipf.es/paintomics/
PRIMe: http://prime.psc.riken.jp/
  Server with tools for metabolomics, transcriptomics, and integrated analysis of different omics data.

Heatmaps and PCA tools
ClustVis: http://biit.cs.ut.ee/clustvis/
WebMeV (Multiple Experiment Viewer): http://www.tm4.org/#/welcome

Comparative genomics tutorials
Methods Paper
Lecture 1: Predicting Gene Function by Comparative Genomics
Lecture 2: Using Comparative Genomics Resources
Lecture 3: Phylogenetic Distribution Tools