Biological Databases
Pharmamatrix Workshop 2010
Philip Winter, Ishwar V. Hosamani

Some databases in the field of molecular biology…
AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb, BBDB, BCGD, Beanref, BioImage, BioMagResBank, BIOMDB, BLOCKS, BovGBASE, BOVMAP, BSORF, BTKbase, CANSITE, CarbBank, CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP, ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG, CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb, Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC, ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db, ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView, GCRDB, GDB, GENATLAS, Genbank, GeneCards, Genline, GenLink, GENOTK, GenProtEC, GIFTS, GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB, HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD, HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB, HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat, KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB, Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP, Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us, MPDB, MRR, MutBase, MycDB, NDB, NRSub, O-GlycBase, OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB, PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD, PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE, PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE, SCOP, SeqAnalRef, SGD, SGP, SheepMap, Soybase, SPAD, SRNA db, SRPDB, STACK, StyGene, Sub2D, SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE, SWISSMODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB, TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE, VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD, YPM, etc.

What we expect from a database
• Sequence, functional, and structural information, with related bibliography
• Well structured and indexed
• Well cross-referenced (with other databases)
• Periodically updated
• Tools for analysis and visualization

Biological Databases
• Sequence databases
• Structure databases

Sequence databases
• Nucleotide databases
• Protein databases

Sequence databases: Nucleotide databases
• International Nucleotide Sequence Database Collaboration (INSDC)
  – NCBI
  – EMBL
  – DDBJ

Standard contents of a sequence database
• Sequences
• Accession number
• References
• Taxonomic data
• Annotation/curation
• Keywords
• Cross-references
• Documentation

NCBI
• Very comprehensive biological database
• GenBank: the nucleotide sequence database
• Provides 42 different resources
• Provides a simple, easy-to-use web interface: http://www.ncbi.nlm.nih.gov/
• Sequence submission: done using BankIt or Sequin
• Search engine for data retrieval: Entrez
• Entrez retrieves information across all the resources under NCBI, for example PubMed, Taxonomy, SNP, PubChem, etc.

Tools for analysis
• BLAST
• Primer-BLAST
• BLink
• ORF Finder
• Genome Workbench

Protein sequence databases
• UniProt
• Pfam
• Gene Index project

UniProt
• Universal Protein Resource
• Formed through the merger of:
  – Swiss-Prot (SIB and EBI)
  – TrEMBL
  – PIR-PSD
• Entry names are often the name of the gene followed by the species, e.g. PAX6_HUMAN
• Accession numbers are of the following format: e.g. P26367 (the accession for PAX6_HUMAN)

UniProt features
• BLAST
• Align
• Retrieve
• ID mapping

Pfam
• Proteins contain conserved regions
• Based on the conserved regions, proteins are classified into families
• Provides links to external databases like PDB, SCOP, CATH, etc.

Pfam: Features
• Sequence search
• View a Pfam family
• View a clan
• View a sequence
• View a structure
• Keyword search

Gene Indices
• Project aimed at indexing genes and their variants in the various genome sequences
• Creating a catalogue of genes in a wide range of organisms
• Reducing redundancy

Gene Indices Software Tools
• TGI Clustering tools
• Clview
• SeqClean
• Cdbfasta/cdbyank

Structural databases
• PDB – Protein Data Bank
• CATH
• SCOP – Structural Classification of Proteins

wwPDB
• Contains information about experimentally determined structures of proteins, nucleic acids, and complex assemblies
• RCSB-PDB, PDBe, PDBj, BMRB – repositories of protein structure data
• Files in PDB, mmCIF, and PDBML/XML formats
• Advanced search – provides comprehensive information about a protein
• Sequence info, domain info, sequence similarity, and literature, apart from the details of the structure
• Cross-referenced to SCOP and CATH

CATH
• Classification of proteins based on domain structures
• Each protein is chopped into individual domains, which are assigned to homologous superfamilies
• Hierarchical domain classification of PDB entries

CATH hierarchy
• Class – derived from secondary structure content; assigned automatically
• Architecture – describes the gross orientation of secondary structures, independent of connectivity
• Topology – clusters structures according to their topological connections and numbers of secondary structures
• Homologous superfamily – groups together protein domains that are thought to share a common ancestor and can therefore be described as homologous

SCOP
• Description of structural and evolutionary relationships between all the proteins with known structures
• Uses the PDB entries
• Search using keywords or PDB identifiers

Hierarchy in SCOP
• Class
• Fold
• Superfamily
• Family
• Species

Thank you
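
Supplementary example: the four CATH levels described in the CATH hierarchy slide (Class, Architecture, Topology, Homologous superfamily) are commonly written as a dotted four-number identifier, one number per level, for example 3.40.50.720. The Python below is a minimal sketch, assuming that notation, of how such an identifier can be split into the node for each level. The names CathNode and split_cath_id are hypothetical helpers introduced here for illustration; they are not part of any CATH software.

```python
from typing import NamedTuple


class CathNode(NamedTuple):
    """One dotted identifier per CATH level (hypothetical helper type)."""
    cath_class: str
    architecture: str
    topology: str
    homologous_superfamily: str


def split_cath_id(cath_id: str) -> CathNode:
    """Split a four-number CATH identifier into its four hierarchy levels.

    Assumes the dotted C.A.T.H notation, e.g. "3.40.50.720".
    """
    parts = cath_id.strip().split(".")
    # Exactly four numeric fields are expected, one per level.
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a four-level CATH identifier: {cath_id!r}")
    # Each level is the prefix of the identifier up to that depth.
    return CathNode(*(".".join(parts[:depth]) for depth in range(1, 5)))


if __name__ == "__main__":
    node = split_cath_id("3.40.50.720")
    for level, value in node._asdict().items():
        print(f"{level}: {value}")
```

Because each level is a prefix of the full identifier, two domains share a Topology exactly when their first three numbers match, which mirrors how the hierarchy nests superfamilies inside topologies, architectures, and classes.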