CCPN at Göteborg: Day 1

The CCPN Project
Tim Stevens and Wayne Boucher
October 2005
CCPN at Göteborg: Day 1
■ Introduction to CCPN
■ The CcpNmr applications
■ Analysis basics
■ Future developments
■ Analysis advanced
CCPN at Göteborg: Day 2
■ An overview of the data model
■ API Tutorial
■ Analysis Macros
■ Widgets and Popups
CCPN Overview
The CCPN Project
■ Collaborative Computing Project for NMR
● Started in 1999
● Collaborators in several countries
● Developers at University of Cambridge and EBI
■ Unifying platform for NMR software
● Similar to CCP4 (X-ray)
■ Main goals:
● Data standards and data exchange
● Software development and distribution
● Meetings to determine and disseminate best practice
● Open source access
People
■ Cambridge
● Ernest Laue
● Rasmus Fogh
● Dan O’Donovan
■ EBI, Hinxton
● Kim Henrick
● John Ionides
● Wim Vranken
● Anne Pajon
History
■ Workshops:
● EBI (2000, 2001)
● Washington (2000)
■ Funding:
● BBSRC (2000-2003, 2003-2006)
● NMRQUAL (2001-2004)
● TEMBLOR (2002-2005)
● NMR-EXTEND (2005-2008)
NMR Software
■ Problem - Heterogeneous development
● Lots of proprietary data formats
● Lots of stand-alone programs
● Data is ‘lost’ along the way
● Dedicated converters needed
● Not acceptable for structural genomics projects
■ Solution - Unity
● Data standards
■ Ease of transfer between programs
■ Completeness, integrity, deposition, data mining
● Libraries
Data Format vs. Data Model
■ Data format - How data is stored
● STAR
● XML
● SQL
● Tab-separated ascii
■ Data model - What data means
● RCSB (PDB) mmCIF
● XML DTD or schemas
● SQL schema
CCPN Approach
■ Data model rather than data format
● Format independent
● Language independent
● Scientifically descriptive (NMR)
■ Library (API): in memory manipulation
● Create, update, delete & query objects
● One for each language
● Error checking
■ I/O modules: load/store data from/to disk
● One for each (storage format, language)
● Bookkeeping
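To make the split between the in-memory API and the I/O modules concrete, here is a minimal Python sketch. The `Peak` class and the `toXml`/`toTsv` writers are hypothetical illustrations, not CCPN code: the object checks its own data when it is created, and each storage format has its own separate writer, so the same in-memory objects can be stored as XML or as tab-separated text.

```python
import csv
import io
import xml.etree.ElementTree as ET


class Peak:
    """Hypothetical model object: the 'API' layer checks data in memory."""

    def __init__(self, serial, position, height):
        # Error checking happens when the object is built, not at I/O time
        if not isinstance(serial, int) or serial < 1:
            raise ValueError('serial must be a positive integer')
        self.serial = serial
        self.position = float(position)  # chemical shift in ppm
        self.height = float(height)


def toXml(peaks):
    """I/O module 1: write the objects as XML."""
    root = ET.Element('PeakList')
    for peak in peaks:
        ET.SubElement(root, 'Peak', serial=str(peak.serial),
                      position=repr(peak.position), height=repr(peak.height))
    return ET.tostring(root, encoding='unicode')


def toTsv(peaks):
    """I/O module 2: write the same objects as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
    writer.writerow(['serial', 'position', 'height'])
    for peak in peaks:
        writer.writerow([peak.serial, peak.position, peak.height])
    return buffer.getvalue()


peaks = [Peak(1, 8.25, 1.5e6), Peak(2, 7.91, 9.0e5)]
print(toXml(peaks))
print(toTsv(peaks))
```

Adding a third format would mean writing one more writer; the `Peak` class itself would not change.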
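Application View
[Diagram: the user works through a GUI with several applications (Application1, Application2, Application3), which share one in-memory representation via the API (Python, Java, C++, Perl); I/O modules move data to and from the data store (XML, SQL).]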
Model-Driven Architecture
■ UML: Unified Modelling Language
● Abstract representation of semantics
● Pictorial
■ Mapping from UML to anything
● Multi-language
● Multi-format
● Architecture neutral (e.g. distributed or not)
■ Power: good and bad
■ CCPN uses Object Domain as its UML tool
● Python as scripting language
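The model-driven idea, that classes are generated from a model description rather than written by hand, can be shown in miniature. In this hypothetical Python sketch a plain dictionary stands in for the UML model, and `generateClass` emits source for a class with a type-checked getter and setter per attribute. It illustrates the principle only and is not the MEMOPS generator.

```python
# Hypothetical model description standing in for UML: class -> {attribute: type}
MODEL = {
    'Peak': {'serial': int, 'height': float},
    'Resonance': {'name': str},
}


def generateClass(className, attributes):
    """Return Python source for a class with checked getters and setters."""
    lines = ['class %s:' % className, '', '    def __init__(self):']
    for attr in attributes:
        lines.append('        self._%s = None' % attr)
    if not attributes:
        lines.append('        pass')
    for attr, attrType in attributes.items():
        suffix = attr[0].upper() + attr[1:]
        lines += [
            '',
            '    def get%s(self):' % suffix,
            '        return self._%s' % attr,
            '',
            '    def set%s(self, value):' % suffix,
            '        # generated constraint: enforce the modelled type',
            '        if not isinstance(value, %s):' % attrType.__name__,
            "            raise TypeError('%s must be %s')" % (attr, attrType.__name__),
            '        self._%s = value' % attr,
        ]
    return '\n'.join(lines) + '\n'


# "Compile" the whole model into one namespace of generated classes
namespace = {}
for name, attrs in MODEL.items():
    exec(generateClass(name, attrs), namespace)

Peak = namespace['Peak']
peak = Peak()
peak.setSerial(3)
print(peak.getSerial())  # prints 3
```

Changing the model dictionary and regenerating is the only step needed to change the classes, which is the appeal (and, as the slide says, the double-edged power) of the approach.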
[Diagram: domain experts define the UML model (Package 1, Package 2, Package 3); the MEMOPS framework autogenerates documentation, APIs (Python, Java, C, Perl) and storage code (SQL, XML), with about 1% hand-coded; the results serve developers, user applications and deposition programs.]
Data Model Packages
[Diagram: data model packages; labels include Reference, Citations, CcpNmr Programs, Experimental, Laboratory Protocols, NMR, Samples, Nuclei and Isotopes, Molecule, Molecule Structure, Targets, Sequence, Compound, Structure and Coordinates, Compound Source, Molecular Preparation, Residue, System, Project Tracking, Organisms, Taxonomy, Template, X-ray Crystallography and Crystallisation.]
UML Example
CCPN API
■ Classes for developers
● Mainly getters and setters
● More than just code stubs
● Constraints (e.g. cardinality) enforced
● Links are the hard part
■ Mostly (> 99%) autogenerated from UML
● Some helper functions and constraints hand-coded
■ Currently around 360k lines in Python and 650k lines in Java
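Why links are "the hard part": a link has two ends, and setting one end must update the other while respecting cardinality. Below is a hypothetical Python sketch of a one-to-many link in which a peak belongs to at most one peak list; the class and method names are illustrative and not taken from the generated CCPN API.

```python
class PeakList:
    def __init__(self, name):
        self.name = name
        self.peaks = []  # the "many" end of the link


class Peak:
    def __init__(self, serial):
        self.serial = serial
        self.peakList = None  # the "0..1" end: at most one parent

    def setPeakList(self, peakList):
        """Move this peak to peakList, keeping both link ends consistent."""
        if peakList is self.peakList:
            return
        if self.peakList is not None:
            self.peakList.peaks.remove(self)  # unlink from the old parent
        self.peakList = peakList
        if peakList is not None:
            peakList.peaks.append(self)  # link into the new parent


listA, listB = PeakList('A'), PeakList('B')
peak = Peak(1)
peak.setPeakList(listA)
peak.setPeakList(listB)  # the peak cannot end up in both lists
print([p.serial for p in listA.peaks], [p.serial for p in listB.peaks])  # [] [1]
```

Writing this bookkeeping by hand for every association is error-prone, which is why generating it from the model pays off.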
Developer Benefits
■ Specified data model and API
■ No I/O code
■ Concentrate on science, not bookkeeping
■ Extensible
● Application data can be assigned to any object
● UML model can be extended (packages)
■ Notification system
● Register interest when a specified attribute changes (class, not object, level)
■ Undo/Redo (in future)
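The class-level notification system can be sketched as a registry keyed by (class name, attribute name): every instance of the class triggers the registered callbacks when that attribute changes. The helper names `addClassNotifier` and `Notifying` below are hypothetical, not the CCPN notifier API.

```python
from collections import defaultdict

# (class name, attribute name) -> list of callbacks
_notifiers = defaultdict(list)


def addClassNotifier(func, className, attrName):
    """Call func(obj) whenever attrName changes on ANY instance of className."""
    _notifiers[(className, attrName)].append(func)


def removeClassNotifier(func, className, attrName):
    _notifiers[(className, attrName)].remove(func)


class Notifying:
    """Mixin that fires registered callbacks after an attribute is set."""

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        for func in list(_notifiers.get((type(self).__name__, name), ())):
            func(self)


class Peak(Notifying):
    def __init__(self, serial):
        self.serial = serial
        self.height = 0.0


# Create the peaks first, so their initial values do not trigger callbacks
peak1, peak2 = Peak(1), Peak(2)
changed = []
addClassNotifier(lambda peak: changed.append(peak.serial), 'Peak', 'height')
peak1.height = 5.0
peak2.height = 7.0  # a different object of the same class: also notified
print(changed)  # [1, 2]
```

Registering once per class rather than per object means a popup does not need to track which individual peaks currently exist.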
Current Status of API
■ Stable and released:
● Python and XML code generation
● NMR, molecule description and structure data model
■ In testing stages:
● Java and SQL database code generation
● Protein production data model
■ Preliminary:
● X-ray crystallography data model
CcpNmr Applications
Structural Biology Pipeline
[Diagram: NMR machine → data processing → spectrum analysis → structure calculation → databases.]
NMR Applications
[Diagram: the CCPN Data Model at the centre, connected to CcpNmr Processing, CcpNmr Analysis, CcpNmr FormatConverter, ARIA 2.0, reference data, validation software, NMRStar 3.0 and other formats (NmrView, XEasy, …).]
Main CcpNmr Applications
■ Format Converter
● Conversion to and from legacy formats
■ Analysis
● Graphical analysis (e.g. assignment) program
■ Processing (coming soon)
● Azara “process” wrapped in data model
CcpNmr Format Converter
■ Import/export of data formats to the Data Model
● For harvesting/deposition purposes
● Allow people to use or try out the data model
● Interaction with existing programs
■ Fully or partially handles:
● Ansig, Auremol, Autoassign, Azara, Bruker, Charmm,
CNS/XPLOR/ARIA, Concoord, Diana/Dyana/Cyana,
Discover, Fasta, Felix, Module, .mol, Molmol, Monte,
NmrDraw, NMRPipe, NMR-STAR (v2.1.1, v3.0),
NmrView, Pdb, Pipp, Pistachio, Pronto, Sparky, Talos,
Varian, XEasy
● Sequences, chemical compounds, coordinates, NMR
measurements, constraints and peak lists, processing
and acquisition parameters.
Format Converter - The NMR Translator
[Diagram: format-specific readers (e.g. XEasy and NmrView for peaks and chemical shifts; Bruker and Varian for acquisition parameters) feed generic peak, chemical shift and acquisition parameter converters that load the CCPN Data Model; format-specific writers export peaks and chemical shifts (e.g. XEasy, NmrView) and processing parameters (e.g. Azara, NMRPipe).]
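Routing every format through one common representation avoids a dedicated converter for every pair of formats: N readers plus M writers instead of N × M converters. Below is a hypothetical Python sketch with two made-up formats; the real FormatConverter reads into the CCPN data model, not into the plain dictionaries used here.

```python
READERS = {}
WRITERS = {}


def reader(fmt):
    """Register a function that parses text into generic peak records."""
    def register(func):
        READERS[fmt] = func
        return func
    return register


def writer(fmt):
    """Register a function that formats generic peak records as text."""
    def register(func):
        WRITERS[fmt] = func
        return func
    return register


@reader('formatA')
def readFormatA(text):
    # Hypothetical format A: "serial ppm" on each line
    peaks = []
    for line in text.splitlines():
        if line.strip():
            serial, ppm = line.split()
            peaks.append({'serial': int(serial), 'ppm': float(ppm)})
    return peaks


@writer('formatA')
def writeFormatA(peaks):
    return '\n'.join('%d %.3f' % (p['serial'], p['ppm']) for p in peaks)


@writer('formatB')
def writeFormatB(peaks):
    # Hypothetical format B: "ppm,serial" on each line
    return '\n'.join('%.3f,%d' % (p['ppm'], p['serial']) for p in peaks)


def convert(text, source, target):
    """Any registered reader can feed any registered writer."""
    return WRITERS[target](READERS[source](text))


print(convert('1 8.25\n2 7.91\n', 'formatA', 'formatB'))
```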
Format Converter Design
■ Wim Vranken (EBI)
■ Set of Python scripts
■ Accessed via:
● Tkinter (Tcl/Tk)
● custom Python scripts
■ http://www.ebi.ac.uk/msdsrv/docs/NMR/NMRtoolkit/main.html
CcpNmr Analysis
■ Requirements
● Cross platform
● Scalable
● Extensible
● Open and easy scripting language
● Modern graphical user interface
● Uses CCPN data model and API
■ Software
● Python, Tcl/Tk, C, OpenGL
● (Java, X, Motif)
■ OS
● Linux, Sun, SGI, OSX (Windows)
Spectrum Windows
■ N-dimensional windows
■ Multiple spectra
■ Automatic mapping
■ Contours on the fly
■ Aliasing
■ Strips & cells
■ Mouse and key
■ Blocked data
● Azara
● Felix
● NMRPipe
● UCSF
Graphical Interface
■ Menus and popup dialogues
● CcpNmr widgets
■ Main objects
● Spectra
● Windows
● Peaks
● Resonances
● Molecules
● Structures
Assignment
■ Peak finding and fitting
■ Rich assignment model
■ Mainly mouse-driven
■ Can assign to atoms
■ Ambiguous contributions
■ Existing structure
■ Short resonance list
■ Multiple peaks easily
■ Navigation
Navigation
The CLOUDS Protocol
■ Automated assignment & structure determination
● Miguel Llinas, Alex Grishaev, et al.
● Spatial distribution of anonymous resonances generated with NOEs
■ Integrated within CCPN
● An Analysis module
● Data Model glues modules
● Functional platform
● Distribution network
[Diagram: CLOUDS protocol flowchart. Steps shown: Spectra; Pick Peaks, Link Shifts & Combine; Pick Peaks & Normalise; Spin Systems; NOE intensities; Relaxation Matrix Optimisation; Distance Constraints; Hydrogen Atom Molecular Dynamics; Proton Clouds; Chain Fitting & Molecular Replacement; Chain Assignment; Full Structure Calculation; Protein Structure.]
The CLOUDS Protocol
[Figures: a family of clouds; a fitted protein backbone.]
Other Features
■ Works with FormatConverter
■ Chemical compounds database
■ NMR reference information
■ Hard copy
● PostScript
● PDF
■ Table export
■ Rate analysis
■ Macros
■ Structures
CcpNmr Analysis Tutorial Part I
CCPN Future
Extend-NMR
■ EU STREP application funded to fully integrate software from:
● Bruker (TOPSPIN, acquisition)
● Billeter, Orekhov (Garant, Munin, MDD)
● Kalbitzer (Auremol)
● Llinas (CLOUDS)
● Nilges (Inferential Structure Determination)
● Bonvin (Haddock, RECOORD)
● Vriend, Vuister (Queen, What-Check)
● Henrick, Vranken (NMR database)
■ Focus on complexes and development of better software methodology
LIMS Collaborations
■ PIMS project collaboration
● Protein production LIMS (with EBI, Sport Consortia, OPPF and Poupon)
■ EU STREP application (SFGLIMS) to work with:
● Poupon (Protein Production)
● Perrakis (Biophysical methods, crystallisation)
● Bricogne (X-ray data collection and structure generation)
● Prilusky, Sussman (Bioinformatics, data mining)
Data Model Extensions
■ EXTEND-NMR
● New NMR applications
■ Solid state NMR
■ PIMS
● LIMS for protein production
■ SFGLIMS
● LIMS for NMR and X-ray structure determination
■ X-ray
■ Chemoinformatics
■ (Metabolomics?)
Code Generation Plans
■ C++/C/FORTRAN code
● Needed for Extend-NMR and for CcpNmr Processing
● Needed for interface to CYANA, NMRPIPE, AUTOPSY, etc.
■ Java/Database code
● Extend for LIMS, high-throughput projects, NMRVIEW
■ Basic Machinery
● Upgrades for long-term extensibility/maintainability and performance
API Languages and Formats
[Diagram: API languages (Python, Java, C++, Perl) against storage formats (XML, SQL), annotated with applications and projects: Analysis, FormatConverter, Bruker TopSpin, NMRVIEW, MSD NMR database, PIMS, SFGLIMS, Azara, Extend-NMR, NMRPIPE and AUTOPSY; in parentheses: Varian, CYANA, bioinformatics and SFGLIMS.]
For all languages:
• Metamodel
• Documentation
For all formats:
• Schemas
• I/O mappings
New Core API technology
■ Reduce burden of adding new languages, formats
● Languages (Python, Java, C++, Perl)
● Storage formats (XML, SQL)
[Diagram: layers of API code. Most of the logic is language- and format-independent; the other layers are format-dependent only, language-dependent only, or both language- and format-dependent. The diagram marks which layers need new code for a new format and for a new language.]
Core API technology, cont.
■ Remodelling of implementation details
● Storages, collection types, root objects, etc.
■ Complex data types
● e.g. rotation matrix
■ Client/Server architecture
● For PIMS and SFGLIMS
Analysis Development
■ Beyond CLOUDS
● Large proteins, homologues
■ Processing linked in
■ Couplings (RDCs, TROSY), dihedral constraints
■ Titrations (Ka, Kd)
■ Chain states (alternate conformations)
■ Solid State NMR
■ Organic chemistry NMR (1D)
■ Publication-ready diagrams and tables
■ Windows version
Developments in Extend-NMR
■ Integrated Bayesian, maximum entropy, … methods for data-processing, analysis and structure calculation
■ ‘Molecular replacement’ for NMR
■ Further RECOORD development
■ Databank for Experimental NMR spectra (DEN)
■ MSD database analysis
Licenses
■ GPL
● Data model
● Scripts which produce APIs
■ LGPL
● Generic libraries
● Widget libraries
● Format Converter
■ CCPN
● Analysis
Resources, 1
■ SourceForge:
● CVS repository for code
● API and FormatConverter releases
● http://sourceforge.net/projects/ccpn
■ CCPN:
● Meetings, workshops
● API, FormatConverter and Analysis releases
● http://www.ccpn.ac.uk
Resources, 2
■ EBI:
● Format Converter
● Databases (MSD group)
● http://www.ebi.ac.uk/msdsrv/docs/NMR/NMRtoolkit/main.html
■ JISCMAIL:
● Email list
● http://www.jiscmail.ac.uk/lists/ccpnmr.html
● (http://www.jiscmail.ac.uk/lists/nmrgen.html)
CcpNmr Analysis Tutorial Part II
CCPN at Göteborg: Day 2
■ An overview of the data model
■ API Tutorial
■ Analysis Macros
■ Widgets and Popups
Major Data Model Packages
[Diagram: major data model packages, including Molecule, ChemComp, People, MolSystem, Nucleus, Sample, Coordinates, Nmr, ChemElement, Analysis and Implementation.]
CCPN Packages
■ Groupings of related data
● e.g. NMR, X-ray, Molecular description
■ Connections between packages
● e.g. NMR loads Nucleus (isotope) information
■ Allows lazy loading
● Only load relevant data
● Only load when a link is queried
■ Save only modified
■ Reference packages
● Chemical compound, Reference chemical shifts
[Further diagrams: ChemElement package details; Coordinates package.]
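Lazy loading and saving only modified packages can be sketched with a wrapper that reads its contents on first access and remembers whether it has been changed. `LazyPackage`, `saveProject` and `loadFromDisk` are hypothetical names; the loader and saver stand in for the real XML I/O.

```python
class LazyPackage:
    """Hypothetical package wrapper: load on first use, track modifications."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader  # callable: package name -> contents
        self._contents = None
        self.isLoaded = False
        self.isModified = False

    @property
    def contents(self):
        if not self.isLoaded:  # only touch the disk when the data is queried
            self._contents = self._loader(self.name)
            self.isLoaded = True
        return self._contents

    def setValue(self, key, value):
        self.contents[key] = value
        self.isModified = True


def saveProject(packages, saver):
    """Write only the packages that were modified; return their names."""
    saved = []
    for package in packages:
        if package.isModified:
            saver(package.name, package.contents)
            package.isModified = False
            saved.append(package.name)
    return saved


loadCalls = []


def loadFromDisk(name):
    loadCalls.append(name)  # record which packages were actually read
    return {}


packages = [LazyPackage('Nmr', loadFromDisk), LazyPackage('ChemComp', loadFromDisk)]
packages[0].setValue('shiftList', 'default')
print(loadCalls, saveProject(packages, lambda name, contents: None))
```

Here the ChemComp package is never read or written, because nothing asked for it.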
Implementation
Molecules and MolSystems
■ Molecules
● Templates for specifying molecular connectivity.
● Sequences, chemical components, protonation state etc.
● A kind of reference, e.g. “Lysozyme”
■ MolSystems
● Contain chains, which contain residues, which contain atoms.
● The objects you assign to.
● Built using molecule templates, e.g. a homo-oligomer is built using the same template to make different chains.
■ Stored in different packages
● Molecule.xml, MolSystem.xml
[Diagram: MolSystem, Molecule and ChemComp packages.]
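The template/instance split can be illustrated with a hypothetical homodimer: one `Molecule` template supplies the sequence, and the `MolSystem` builds two chains from it, each with its own residue objects to assign to. These dataclasses are simplified stand-ins, not the CCPN classes, and the sequence is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Template: a sequence of residue types (hypothetical, simplified)."""
    name: str
    sequence: list


@dataclass
class Residue:
    seqCode: int
    residueType: str


@dataclass
class Chain:
    code: str
    molecule: Molecule  # the template this chain was built from
    residues: list = field(default_factory=list)


@dataclass
class MolSystem:
    name: str
    chains: list = field(default_factory=list)

    def newChain(self, code, molecule):
        # Fresh Residue objects per chain, so each chain can be assigned separately
        residues = [Residue(i + 1, resType)
                    for i, resType in enumerate(molecule.sequence)]
        chain = Chain(code, molecule, residues)
        self.chains.append(chain)
        return chain


template = Molecule('Example peptide', ['Gly', 'Ala', 'Ser'])
dimer = MolSystem('Homodimer')
chainA = dimer.newChain('A', template)
chainB = dimer.newChain('B', template)
print(chainA.molecule is chainB.molecule, chainA.residues[0] is chainB.residues[0])
```

The shared template is reported as the same object, while the residues are distinct, which is what allows the two chains of a homodimer to carry different assignments.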
Experiment, Spectrum & Shift List Objects
■ Experiment
● The set-up under particular conditions at a particular time, not a class of experiment.
■ Spectrum
● Known as Data Source in the data model. A pointer to a chunk of data that results from an experiment. Several spectra may result from the same experiment if they are processed differently.
■ Peak List
● A set of crosspeaks that have been picked for a spectrum. A spectrum can have several peak lists. The user can separate peaks into classes, e.g. picked in different ways.
■ Shift List
● A set of chemical shifts, which are derived from peaks and may be linked to atoms. Valid for a set of experiments with similar conditions that give similar chemical shifts. Using different shift lists doesn’t change assignments, but it does change which peaks are used in the calculation of a shift value.
[Diagram: Nmr package, Nmr.Peak.]
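The statement that switching shift lists changes which peaks contribute to a shift value, but not the assignments, can be sketched as follows. Each record is a hypothetical (resonance, experiment, ppm) triple taken from assigned peak dimensions, with made-up values, and the shift list is reduced to the set of experiments it covers. A plain mean is used for simplicity, which is not necessarily how Analysis weights its contributions.

```python
from statistics import mean

# Hypothetical assigned peak dimensions: (resonance, experiment, position in ppm)
PEAK_DIMS = [
    ('H1', 'HSQC', 8.20),
    ('H1', 'HNCA', 8.30),
    ('H1', 'HSQC_hot', 8.00),  # same resonance, different conditions
]


def shiftValue(resonance, shiftListExperiments, peakDims):
    """Mean position of the resonance over peaks from the shift list's experiments."""
    values = [ppm for res, expt, ppm in peakDims
              if res == resonance and expt in shiftListExperiments]
    return mean(values) if values else None


# The assignments (PEAK_DIMS) are identical; only the shift list differs
print(shiftValue('H1', {'HSQC', 'HNCA'}, PEAK_DIMS))  # 8.25
print(shiftValue('H1', {'HSQC_hot'}, PEAK_DIMS))      # 8.0
```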
Resonances and Assignment
■ Resonances
● The centre of the NMR data model
■ Connect to peaks
● Different peaks may be caused by the same thing.
■ Connect to atoms
● A connection to NMR-equivalent atoms. Need not be set if anonymous.
■ Have chemical shifts
● May have different shifts under different conditions.
[Diagram: the resonance at the centre of the NMR data model, linked to experiments, spectra and conditions; measurements (chemical shift, relaxation, coupling); peaks (dimensions, annotation); spin systems, connectivity and residue type; constraints (distance, dihedral); structures (coordinates); and molecules (atoms, residues, chains). Classes shown include Nmr.Resonance and NmrConstraints.]
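The resonance-centred design can be sketched with two small hypothetical dataclasses: a resonance may be linked to several peaks, may stay anonymous (no atoms) until it is assigned, and can hold a different chemical shift for each shift list. The shift values are made up, and these classes are illustrations, not the `Nmr.Resonance` and `Nmr.Peak` classes themselves.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison: two resonances are never "equal"
class Resonance:
    serial: int
    atomNames: set = field(default_factory=set)  # NMR-equivalent atoms; empty = anonymous
    shifts: dict = field(default_factory=dict)   # shift list name -> ppm

    @property
    def isAssigned(self):
        return bool(self.atomNames)


@dataclass(eq=False)
class Peak:
    serial: int
    resonances: list  # resonances assigned to this peak's dimensions


def peaksCausedBy(resonance, peaks):
    """Serials of all peaks that share the given resonance."""
    return [peak.serial for peak in peaks if resonance in peak.resonances]


amideH = Resonance(1, {'H'}, {'native': 8.21, 'denatured': 8.05})
anonymous = Resonance(2)  # linked to peaks but not yet to any atom
peaks = [Peak(1, [amideH, anonymous]), Peak(2, [amideH]), Peak(3, [anonymous])]

print(peaksCausedBy(amideH, peaks))  # [1, 2]
print(anonymous.isAssigned)          # False
anonymous.atomNames.add('HA')
print(anonymous.isAssigned)          # True
```

Because peaks point at resonances rather than directly at atoms, assigning the anonymous resonance later updates every peak that shares it at once.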
Python API coding tutorial
Development in the CCPN framework
■ CcpNmr Macros
● Small home-use Python functions
■ Additions to function library
● Functions incorporated in software release
● Community sharing
■ Embedded options
● Extension to CcpNmr application
■ Stand-alone applications
● Built on CCPN libraries and API
■ CcpNmr Clouds has examples of all of these
The Python interface to the CCPN Data Model
■ Find the number of assigned peaks in a spectrum
count = 0
for peakList in spectrum.peakLists:
    for peak in peakList.peaks:
        for peakDim in peak.peakDims:
            if peakDim.peakDimContribs:  # this dimension carries an assignment
                count += 1
                break  # count each peak only once
■ Find all H-C partners in a residue
pairs = []
for atom in residue.atoms:
    chemAtom = atom.chemAtom
    if chemAtom.elementSymbol == 'C':
        for bond in chemAtom.chemBonds:
            # the bonded partner is the other ChemAtom in the bond
            chemAtoms = list(bond.chemAtoms)
            chemAtoms.remove(chemAtom)
            chemAtom2 = chemAtoms[0]
            if chemAtom2.elementSymbol == 'H':
                pairs.append([atom, residue.findFirstAtom(chemAtom=chemAtom2)])
CcpNmr Analysis Macros
■ Python scripts/functions
■ Accessible from Analysis and embeddable
■ Argument server
● An interface to the Analysis program
● Access to objects
■ Selected peaks
■ Cursor position
■ Spectra
■ Windows
■ Etc…
■ High-level function library
● Windows, Assignment, Molecules, Constraints
● Documented
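Since a macro reaches Analysis only through the argument server, it can be exercised outside the program by passing in any object with the same methods. Below is a hypothetical Python stub that implements only calls used in the example macros on these slides (`getCurrentPeaks`, `askYesNo`, `showWarning`, `showInfo`), together with an illustrative macro; neither is part of the CCPN distribution.

```python
class StubArgumentServer:
    """Hypothetical offline stand-in for the Analysis argument server."""

    def __init__(self, currentPeaks=None, answer=False):
        self.currentPeaks = list(currentPeaks or [])
        self.answer = answer  # canned reply for askYesNo
        self.messages = []    # everything the macro tried to show

    def getCurrentPeaks(self):
        return self.currentPeaks

    def askYesNo(self, question):
        self.messages.append(('ask', question))
        return self.answer

    def showWarning(self, message):
        self.messages.append(('warning', message))

    def showInfo(self, message):
        self.messages.append(('info', message))


def countSelectedPeaks(argServer):
    """Illustrative macro in the same style: report how many peaks are selected."""
    peaks = argServer.getCurrentPeaks()
    if not peaks:
        argServer.showWarning('No peaks selected')
        return 0
    argServer.showInfo('%d peaks selected' % len(peaks))
    return len(peaks)


server = StubArgumentServer(currentPeaks=['peak1', 'peak2'])
print(countSelectedPeaks(server), server.messages)
```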
Macro 1 - Simple stuff
• Python language
• Function anatomy
• Import library functions
• ArgumentServer
• Simple program
def addMarksToPeaks(argServer, peaks=None):
    """Descrn: Adds position line markers to the selected peaks.
       Inputs: ArgumentServer, List of Nmr.Peaks
       Output: None
    """
    from ccpnmr.analysis.MarkBasic import createPeakMark

    if not peaks:
        peaks = argServer.getCurrentPeaks()

    # no peaks - nothing happens
    for peak in peaks:
        createPeakMark(peak, remove=0)
Macro 2 - Ask the user
def calcAveragePeakListIntensity(argServer, peakList=None, intensityType='height'):
    """Descrn: Find the average height or volume of peaks in a peak list.
       Inputs: ArgumentServer, Nmr.PeakList
       Output: Float
    """
    from ccpnmr.analysis.ConstraintBasic import getMeanPeakIntensity

    if not peakList:
        peakList = argServer.getPeakList()

    if not peakList:
        argServer.showWarning('No peak list selected')
        return

    answer = argServer.askYesNo('Use peak volumes? Height will be used otherwise.')
    if answer:  # the user chose volumes
        intensityType = 'volume'

    spec = peakList.dataSource
    expt = spec.experiment
    intensity = getMeanPeakIntensity(peakList.peaks, intensityType=intensityType)

    data = (intensityType, expt.name, spec.name, peakList.serial, intensity)
    argServer.showInfo('Mean peak %s for %s %s peak list %d is %e' % data)

    return intensity
Macro 3 - Popup loader
from memops.gui.BasePopup import BasePopup
from memops.gui.ButtonList import ButtonList
from memops.gui.ScrolledGraph import ScrolledGraph
from ccpnmr.analysis.PeakBasic import getPeakHeight, getPeakVolume

def openMyPopup(argServer):
    """Descrn: Opens an example popup.
       Inputs: ArgumentServer
       Output: None
    """
    peakList = argServer.getPeakList()
    popup = MyPopup(argServer.parent, peakList)
Macro 3 - The popup
class MyPopup(BasePopup):

    def __init__(self, parent, peakList, *args, **kw):
        # Local variables first, then the base-class initialisation
        # that builds the window
        self.peakList = peakList
        self.colours = ['red', 'green']
        self.dataSets = []
        BasePopup.__init__(self, parent=parent, title='Test Popup', **kw)

    def body(self, guiParent):
        row = 0
        self.graph = ScrolledGraph(guiParent)
        self.graph.grid(row=row, column=0, sticky='NSEW')

        row += 1
        texts = ['Draw graph', 'Goodbye']
        commands = [self.draw, self.destroy]
        buttons = ButtonList(guiParent, texts=texts, commands=commands)
        buttons.grid(row=row, column=0, sticky='NSEW')

    def draw(self):
        self.dataSets = self.getData()
        self.graph.update(self.dataSets, self.colours)

    def getData(self):
        # Pair each peak with its volume (missing values count as 0.0),
        # then order the peaks by volume
        peakData = [(getPeakVolume(peak) or 0.0, peak) for peak in self.peakList.peaks]
        peakData.sort(key=lambda item: item[0])

        heights = []
        volumes = []
        for i, (volume, peak) in enumerate(peakData):
            heights.append([i, getPeakHeight(peak) or 0.0])
            volumes.append([i, volume])

        # Two data sets, one for each entry in self.colours
        return [heights, volumes]
CcpNmr Graphical Widgets
■ A library for any developer to use
[Screenshot: example widgets labelled ColorList, PulldownMenu, ScrolledMatrix, LabelFrame, CheckButton, Button, Label, Entry and ButtonList.]
CcpNmr Mega Widgets
■ Build them into your own code!
● ScrolledMatrix
● ScrolledGraph
● StructureFrame
Ccp Stand-Alone AppTemplate
■ Menu System
■ Project handling
● New
● Load
● Save
● Backup
■ Popup template
● Widgets
● Geometry
● Plumbing
Popup Constructors and Notifiers
■ Init
● Set up local variables
● Subclass popup window initialisation
■ Body
● Arrange Graphical elements
● Set up Data Model notifiers
● Set initial state
■ Update
● Process updated values
● Redraw widgets based on status
■ Widget callback
● From entry, buttons etc
● User functions
● Data Model change
[Diagram: flow linking user influence, widgets, Body, notifiers, the update filter, Update, external influence and the Data Model.]
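The Init, Body, Update and notifier cycle can be outlined without any GUI toolkit. This hypothetical Python sketch replaces the Tk widgets and the data model with plain objects: the constructor sets local variables before building the body, the body registers a notifier and sets the initial state, and an update filter decides whether a data model change concerns this popup before calling update.

```python
class SketchModel:
    """Hypothetical stand-in for the data model's notification mechanism."""

    def __init__(self):
        self.notifiers = []

    def change(self, obj):
        # External influence: something modified obj; tell every registered notifier
        for func in list(self.notifiers):
            func(obj)


class SketchPopup:
    """Hypothetical, toolkit-free outline of the popup life cycle."""

    def __init__(self, model, watched):
        # Init: local variables first...
        self.model = model
        self.watched = watched
        self.log = []
        # ...then the base-class style initialisation, which builds the body
        self.body()

    def body(self):
        self.log.append('body')  # arrange graphical elements (omitted here)
        self.model.notifiers.append(self.updateFilter)  # set up notifiers
        self.update()  # set initial state

    def updateFilter(self, obj):
        if obj == self.watched:  # ignore changes that do not concern this popup
            self.update()

    def update(self):
        self.log.append('update')  # process values and redraw widgets

    def destroy(self):
        self.model.notifiers.remove(self.updateFilter)  # leave no dangling notifiers


model = SketchModel()
popup = SketchPopup(model, watched='peakList1')
model.change('peakList2')  # filtered out
model.change('peakList1')  # triggers an update
print(popup.log)  # ['body', 'update', 'update']
```

Unregistering in `destroy` matters in the real application too: a notifier left behind would keep calling into a popup that no longer exists.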
Aftercare
■ www.ccpn.ac.uk
● Downloads
● Data Model documentation
● Analysis documentation
● Tutorials
■ Mailing List
● http://www.jiscmail.ac.uk/lists/CCPNMR.html
● Quick response
● Bugs
● Requests