PSI Structural Genomics Knowledgebase
Helen M. Berman
Bottlenecks Workshop
April 14, 2008
PSI SG Knowledgebase
Knowledgebase Vision
- The PSI Structural Genomics Knowledgebase (PSI SG KB) will turn the products of the PSI effort into major advances in knowledge that can be used to understand living systems and human disease
- The PSI SG KB will be a key resource for the advancement of biology, biochemistry, functional genomics, pharmacology, bioinformatics, chemistry, education, and clinical medicine
Knowledgebase Goals
To provide a “marketplace of ideas” that
- connects protein sequence information to 3D structures and homology models
- enhances functional annotations
- provides access to new experimental protocols and materials
To kick-start and enable advancements in structural genomics
- by communicating and providing visibility and accessibility of information and technology advances of the PSI
- through presentation and discussion of the most provocative challenges with the general community
- by fostering community collaborations
Scope
To capture, make accessible, and highlight elements of the high-throughput
pipelines for general use in the community, and to leverage such information
through the generation of hundreds of thousands of molecular models and
functional annotations. Standard metrics will be used to measure progress.
[Figure: structure determination pipeline. Stages: Genomic Based Target Selection → Isolation, Expression, Purification, Crystallization → Data Collection → Structure Determination → PDB Deposition & Release. Knowledgebase elements shown alongside: Target Selection, Experimental Tracking, Materials, Models, Annotations, Publications, Technology, Metrics.]
Knowledgebase Users
- Biologists
- Biochemists
- Functional Genomicists
- Pharmacologists
- Bioinformaticians
- Chemists
- Clinical Researchers and Physicians
- Teachers and Students
KB Site Features
- Search by sequence, keyword, or PDB ID
- Featured Structure
- News and Events
- Technology Feature
- Molecules of Unknown Function
- Link to Functional Sleuth Gallery
- Link to Technology Module
PSI SG KB Portal
- Collects sequences, common features, and common identifiers
- Maintains correspondences in local database
- Delivers aggregate reports, inventories, and e-publications which contain links to PSI projects, modules, and external resources
- Delivers featured articles describing PSI news and events, featured molecules and technologies, and molecules of unknown function
- Provides collaborative environments for discussion, annotation, and target suggestions
PSI SG KB Portal Databases
[Diagram labels: Queries (Sequence, Keyword, PDB ID); Portal; Keyword Database; Resource Database; PSI Modules; PSI Centers; Models Portal; PSI Info Site; Related Biological Resources; Archival Sequence Databases; Domain Databases (Pfam); TargetDB; PepcDB; PDB; Literature (PubMed); TargetDB Sequences; PDB Sequences.]
Modules
Modules derived from PSI information and external resources
- Target Selection & Experimental Data Tracking
- Materials Repository
- Models
- Annotation
- Metrics
- Technology
- Outreach
Target Selection & Experimental Data Tracking
- Target Selection – PSI-2 BIG4
  - Family definitions and target management
- TargetDB
  - Search by sequence, Target ID, project site, status, update date, protein name, and source organism
  - Links to other sequence databases, domain databases, other structural genomics centers, and PDB
  - Download target data
  - Target statistics summary
- PepcDB
  - All the functionality of TargetDB, plus
    – Experimental protocols
    – Detailed status history of experimental trials
    – Information on failed experiments
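The TargetDB search options listed above can be pictured as filters over target records. The following is a minimal sketch only, assuming a hypothetical in-memory record layout: the `Target` class, its field names, the `search_targets` helper, and the sample values are all illustrative assumptions, not the TargetDB schema or interface. Sequence matching here is a plain substring test, whereas a real sequence search would more likely use similarity searching.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Target:
    # Hypothetical record; field names mirror the search options on the slide.
    target_id: str
    project_site: str
    status: str
    updated: date
    protein_name: str
    source_organism: str
    sequence: str


def search_targets(targets, **criteria):
    """Return the targets whose fields match every given criterion.

    String fields match case-insensitively by substring; other fields
    (such as the update date) must be equal to the requested value.
    """
    def matches(target):
        for field, wanted in criteria.items():
            value = getattr(target, field)
            if isinstance(value, str):
                if wanted.lower() not in value.lower():
                    return False
            elif value != wanted:
                return False
        return True

    return [t for t in targets if matches(t)]


# Made-up sample records, for illustration only.
SAMPLE_TARGETS = [
    Target("T0001", "Center A", "Crystallized", date(2008, 3, 1),
           "hypothetical protein", "Escherichia coli", "MKTAYIAKQR"),
    Target("T0002", "Center B", "In PDB", date(2008, 4, 1),
           "putative kinase", "Homo sapiens", "MSLLTEVETP"),
    Target("T0003", "Center A", "Selected", date(2008, 4, 1),
           "putative hydrolase", "Homo sapiens", "MAHHHHHHGS"),
]

if __name__ == "__main__":
    for hit in search_targets(SAMPLE_TARGETS, source_organism="homo sapiens"):
        print(hit.target_id, hit.status)
```

Combining criteria narrows the result, for example `search_targets(SAMPLE_TARGETS, project_site="Center A", status="selected")`.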
Experimental Tracking
PepcDB Search Form
Protocol Keywords Search
Experimental Tracking Module
Materials Repository
PSI Materials Repository Module
Modeling Portal
Current Phase 1 Model Portal contains
- Models from 4 PSI centers and 2 public model databases (SwissModel and ModBase) integrated on a common UniProt reference system.
- Current release consists of 5.8 million comparative protein models for 1.97 million distinct UniProt entries.
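As a quick reading of the release figures above, the two numbers imply just under three comparative models per UniProt entry on average. A minimal sketch of that arithmetic follows; the constant and function names are illustrative, and the inputs are the rounded figures quoted on the slide, so the result is approximate.

```python
# Rounded figures quoted on the slide for the current Model Portal release.
TOTAL_MODELS = 5_800_000      # comparative protein models
DISTINCT_ENTRIES = 1_970_000  # distinct UniProt entries


def models_per_entry(total_models: int, distinct_entries: int) -> float:
    """Average number of comparative models per UniProt entry."""
    if distinct_entries <= 0:
        raise ValueError("distinct_entries must be positive")
    return total_models / distinct_entries


if __name__ == "__main__":
    ratio = models_per_entry(TOTAL_MODELS, DISTINCT_ENTRIES)
    print(f"{ratio:.2f} models per entry")  # prints "2.94 models per entry"
```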
Modeling Portal
Metrics Module
- Provides objective measures of the progress and output of the PSI project
- Centered around “Goals and Milestones” document
PSI-2 Summary Statistics
Updated April 1, 2008

ID     Value   Metric
I.1.A  1031    Number of novel experimental PSI-2 structures
I.1.B  1428    Number of distinct experimental PSI-2 structures (nonredundant sequences)
I.1.D  1628    Total number of experimental PSI-2 structures
I.1.E  319977  Numbers of experimentally determined distinct residues
I.1.E  225518  Numbers of experimentally determined novel residues
I.2.J  61      Number of experimental structures of human proteins
I.2.K  186     Number of experimental structures of eukaryotic proteins
I.2.M  1       Number of experimental structures of membrane proteins
I.2.N  1484    Number of experimental structures determined at the atomic level using x-ray crystallography
I.2.N  144     Number of experimental structures determined at the atomic level using NMR methods
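The figures above can be cross-checked against each other: the x-ray and NMR counts (1484 + 144) add up to the reported total of 1628 structures. The sketch below encodes that check and derives a few shares from the same numbers; the dictionary keys and helper names are illustrative assumptions, and the derived fractions are simple ratios rather than official PSI metrics.

```python
# Values copied from the PSI-2 summary statistics (updated April 1, 2008).
STATS = {
    "novel_structures": 1031,     # I.1.A
    "distinct_structures": 1428,  # I.1.B
    "total_structures": 1628,     # I.1.D
    "distinct_residues": 319977,  # I.1.E
    "novel_residues": 225518,     # I.1.E
    "xray_structures": 1484,      # I.2.N
    "nmr_structures": 144,        # I.2.N
}


def method_total(stats: dict) -> int:
    """Sum of structures broken down by experimental method."""
    return stats["xray_structures"] + stats["nmr_structures"]


def fraction(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a value between 0 and 1."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


if __name__ == "__main__":
    # The per-method counts should account for every reported structure.
    assert method_total(STATS) == STATS["total_structures"]
    novel = fraction(STATS["novel_structures"], STATS["total_structures"])
    xray = fraction(STATS["xray_structures"], STATS["total_structures"])
    print(f"novel structures: {novel:.1%}")
    print(f"x-ray structures: {xray:.1%}")
```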
PSI-2 Summary Statistics for Domain and Modeling Leverage

Updated January 15, 2008
ID     Value   Metric
I.1.C  474     Number and Size of BIG Domain Families for which PSI-2 provides the first Experimental Structure Representative
I.1.C  399     Number and Size of MEGA Domain Families for which PSI-2 provides the first Experimental Structure Representative
I.1.E  76579   Numbers of Experimentally Determined Distinct BIG Family Residues
I.1.E  76121   Numbers of Experimentally Determined Distinct MEGA Family Residues

Updated February 21, 2008
ID     Value   Metric
I.3.A  583735  Total Modeling Leverage
I.3.B  114407  Novel Modeling Leverage
Technology Module
PSI Centers are actively developing technologies and
methodologies for all aspects of the structure
determination pipeline
[Figure: pipeline stages: Genomic Based Target Selection → Isolation, Expression, Purification, Crystallization → Data Collection → Structure Determination → PDB Deposition & Release → Publication → Functional Annotation]
Technology Module Progress
- Phase 1 Technology Portal in place
- Summary Information from all PSI Centers
- Keyword search from KB portal
Outreach Module
Provides information to the public about the products and accomplishments of the PSI
- Media reports
- Publications
- Community activities
- Plans for a Nature Gateway
Current Annotation Module
Provides paths to unravel sequence, structure, function relationships
- 10 PSI Interactive Services for Sequence, Structure and Functional Annotations
- 11 PSI Galleries and Summaries of Sequence, Structure and Functional Annotations
- 35 other resources for annotation
Annotation Module
Biological Annotation of Novel Proteins
March 7-8, 2008, Calit2, UCSD
- Participants
  - PSI groups
  - Annotation system authors
  - General biological community
- Outcome
  - Recommendations for standard annotations
  - Processes for community input
Standard Annotations
Genomic features: gene identifier, name and synonyms, operon/regulon mappings.
Protein sequence features: amino acid sequence, taxonomy & phylogeny, sequence database accession, isoform, SNPs, PTMs, sequence families, residue conservation.
Structure features: oligomeric state, structural and functional domains, DNA binding motifs, nests & clefts, sites of interaction, residue regions of protein-protein and ligand-protein interaction, catalytic sites, secondary structure, structural neighbors and comparison of groups of structures with common features, properties/features mapped to 3D and their similarities (e.g. electrostatics, cavities, conserved residues, quality assessment).
Ligands: chemical structure, interactions, functional role.
Functional classification: GO, FunCat, EC, epitope mapping, cellular location, organ location, substrate specificity, disease involvement.
Mapping to Biological Systems: mapping to networks and pathways (e.g. Reactome, KEGG, HPRD, BioCyc, NetPath, MINT, MIPS, DIP, STRING, STITCH, PROLINKS).
Literature: synonyms for protein names, links to PubMed by database identifier and related text and authors.
Future Improvements
Experimental Data Tracking
- Standardization of the protocols in PepcDB
- PepcDB data deposition tool
- Integration with the Materials Repository
Materials Repository
- Searchable database of clones
- Ordering system
- Integration with PepcDB and PSI SGKB
Models Module
- Public web service interface
- Additional quality assessment
- Interactive homology modeling
Future Improvements
Technology Module
- Improved navigation over technology topic areas
- Keyword search option of descriptions and publications
PSI SGKB
- Integration with Nature Gateway
- Simple presentation and search of standard annotations
- Incorporation of data about ligands and modified residues
- Molecular visualization tool
Access Information
http://kb.psi-structuralgenomics.org/KB/
Acknowledgements
KB Team
Wendy Tao
Raship Shah
James Chun
John Westbrook
Modules
Torsten Schwede (Models)
Andrei Kouranov (Exp. Data Tracking)
Paul Adams (Technology)
Wladek Minor (Publications)
Josh LaBaer (Materials)
Rajesh Nair (Metrics)