Protein families, domains and motifs
in functional prediction
May 31, 2016
Outline
• Usefulness of protein domain analysis
• Types of protein domain databases
• InterPro integrated protein domain database
• SMART database
• Predicting post-translational modifications
Protein families
• Groups of homologous sequences (within and across
species) that share similar functions and domains
• Examples:
– Carbonic anhydrases (14 in humans)
– Chitin synthases (8 in C. neoformans)
– Ser/Thr kinases
Protein domains
• Conserved part of protein sequence that can evolve,
function and exist independent of the rest of the
protein chain
• Often independently stable and folded
• Can recombine or evolve from gene duplications into
proteins with different combinations of domains
Protein motifs
• Short linear peptide sequences that serve a specific
function for the protein, but will not be stable or fold
independent of the rest of chain
• Protein-protein interaction, ligand interactions,
cleavage sites, targeting
• Examples:
– 14-3-3: Interaction with kinases
– KELCH: ubiquitin targeting
– SUMO: site recognized for modification by SUMO
Predicting function for unknown proteins
• Do they belong (by sequence homology) to a protein
family?
• Do they contain known protein domains?
• Do they have motifs that suggest a specific function?
When annotation is NOT enough
• You’ve got a list of genes, most of which have been
annotated with gene ontology and a potential
protein function
• Why would you want to go on and look more
specifically at the protein domains?
Limitations of annotation
• Even in a model organism with a large amount of
resources, most genes are still annotated by
similarity
• Often, the name given is based on the BEST match to
a particular domain or known protein
• But…
Limitations of BLAST
• Likelihood of finding a homolog to a sequence:
– >80% bacteria
– >70% yeast
– ~60% animal
• Rest are truly novel sequences
• ~900/6500 proteins in yeast without a known
function
• A name such as "Similar to yeast protein YAL7400" is
not very informative
Limitations of similarity
• Proteins with more than one domain cause
problems.
– Numerous matches to one domain can mask matches to
other domains.
• Increased size of protein databases
– As the number of related sequences rises, hits to less
closely related sequences may be lost
• Low-complexity regions can mask domain matches
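One way to see why low-complexity regions cause trouble is to measure how few distinct residues a stretch of sequence uses. The sketch below flags windows with low Shannon entropy. It is a simplification for illustration, not the algorithm used by any particular masking tool, and the window size, threshold and example sequence are all assumptions.

```python
import math
from collections import Counter

def window_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def low_complexity_windows(sequence, window=12, threshold=2.2):
    """Return 0-based start positions of windows whose entropy is below threshold.

    window and threshold are illustrative defaults, not published values.
    """
    return [
        i
        for i in range(len(sequence) - window + 1)
        if window_entropy(sequence[i:i + window]) < threshold
    ]

# Hypothetical sequence with a poly-Q stretch in the middle.
example = "MKTAYIAK" + "Q" * 14 + "DEGHLVR"
print(low_complexity_windows(example))
```

Windows flagged this way are candidates for masking before a similarity search, so that compositionally biased matches do not crowd out genuine domain hits.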
Proteins are modular
• Individual domains can and often do fold
independently of other domains within the same
protein
• Domains can function as an independent unit (or
truncation experiments would never work)
• Thus the identity of ALL protein domains within a
sequence can provide further clues about the protein's
function
Proteins can have >1 domain
The name: protein kinase receptor UFO doesn’t
necessarily tell you that this protein also contains IgG and
fibronectin domains or that it has a transmembrane
domain
Domains are not always functional
• If a critical residue is missing
in an active site, it’s not likely
to be functional
• A similarity score won’t pick
that up
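The point can be illustrated with a small sketch: a query domain can be almost identical to a reference overall and still lack one critical residue. The code below assumes you already have a pairwise alignment of the reference and query domains (gaps as "-") and a list of critical positions in the reference. The sequences and positions used here are hypothetical, chosen only to show the idea.

```python
def percent_identity(ref_aln, query_aln):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(r, q) for r, q in zip(ref_aln, query_aln) if r != "-" and q != "-"]
    return 100.0 * sum(r == q for r, q in pairs) / len(pairs)

def check_critical_residues(ref_aln, query_aln, critical_positions):
    """Compare query residues at critical reference positions.

    critical_positions are 1-based positions in the ungapped reference.
    Returns {position: (reference residue, query residue, conserved?)}.
    """
    results = {}
    ref_pos = 0
    for r, q in zip(ref_aln, query_aln):
        if r == "-":
            continue  # gap in the reference: no reference position consumed
        ref_pos += 1
        if ref_pos in critical_positions:
            results[ref_pos] = (r, q, r == q)
    return results

# Hypothetical aligned domain fragments; position 7 is changed in the query.
ref = "VAIKHRDLKPENDFG"
query = "VAIKHRNLKPENDFG"
critical = {4, 7, 13}  # hypothetical critical positions in the reference

identity = percent_identity(ref, query)
report = check_critical_residues(ref, query, critical)
print(round(identity, 1), report)
```

Here the overall identity stays high even though one of the flagged positions is not conserved, which is exactly the case a similarity score alone would miss.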
Protein signature databases
• Identify domains or classify proteins into families to
allow inference of function
• Approaches include:
– regular expressions and profiles
– position-specific scoring matrix-based fingerprints
– automated sequence clustering
– Hidden Markov Models (HMMs)
PROSITE
• Regular expression patterns describing functional
motifs
M-x-G-x(3)-[IV](2)-x(2)-{FWY}
– Enzyme catalytic sites
– Prosthetic group attachment sites
– Ligand or metal binding sites
• Either matches or not
• Some families/domains defined by co-occurrence
Citrate synthase
G-[FYAV]-[GA]-H-x-[IV]-x(1,2)-[RKTQ]-x(2)-[DV]-[PS]-R
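Because a PROSITE pattern either matches or not, it can be translated mechanically into a regular expression. The sketch below handles only the elements that appear in the patterns above: x for any residue, [..] for allowed residues, {..} for forbidden residues, and (n) or (n,m) repeat counts. The function name prosite_to_regex and the test peptide are illustrative assumptions, not part of PROSITE.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression.

    Supports x, [ABC], {ABC}, and (n) / (n,m) repeats only.
    Anchors and other PROSITE syntax are not handled in this sketch.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split off an optional repeat suffix such as "(3)" or "(1,2)".
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"  # any residue except these
        else:
            regex = core                     # single residue or [..] class
        if repeat:
            regex += "{" + repeat + "}"
        parts.append(regex)
    return "".join(parts)

citrate_synthase = "G-[FYAV]-[GA]-H-x-[IV]-x(1,2)-[RKTQ]-x(2)-[DV]-[PS]-R"
citrate_regex = prosite_to_regex(citrate_synthase)

# Hypothetical peptide constructed to satisfy the pattern.
peptide = "MSGFGHAVLRKTDPRLE"
match = re.search(citrate_regex, peptide)
print(citrate_regex, match.group(0) if match else None)
```

A real scan would run each pattern over every sequence of interest and report the match positions.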
PRINTS
• Similar to PROSITE patterns
• Multiple-motif approach using either identity or
weight-matrix as basis
• Groups of conserved motifs provide diagnostic
protein family signatures
• Can be created at super-family, family and sub-family
level
http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php
Profile-HMMs
• Models are generated from alignments of many homologues by
counting the frequency of occurrence of each amino acid in each
column of the alignment (the profile).
• Profile-HMMs are used to create probabilities of occurrence against
a background evolutionary model that accounts for possible
substitutions.
• Provides convenient and powerful way of identifying homology
between sequences.
• Find domains in sequences that would never be found by BLAST
alone
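The column-counting step can be sketched in a few lines. The example below builds a simple log-odds profile from an ungapped toy alignment and slides it along a query sequence. It is a position-specific scoring matrix, not a full profile-HMM: there are no insert or delete states. The uniform background, the pseudocount of 1 and the sequences are all assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Per-column log2-odds scores from an ungapped alignment of equal-length sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    profile = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        profile.append(scores)
    return profile

def score(profile, segment):
    """Sum of column scores for a segment the same length as the profile."""
    return sum(col[aa] for col, aa in zip(profile, segment))

def best_hit(profile, sequence):
    """Return (score, 0-based start) of the best-scoring window in sequence."""
    width = len(profile)
    hits = [
        (score(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits)

# Hypothetical toy alignment of four homologous segments.
toy_alignment = ["GDSGG", "GDSGS", "GNSGG", "GDSAG"]
profile = build_profile(toy_alignment)
top_score, top_start = best_hit(profile, "MKTAGDSGGPLV")
print(top_score, top_start)
```

Residues that are common in a column score positively and rare ones negatively, so a window can score well even when it is not identical to any single sequence in the alignment.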
HMM domain databases
• PFAM
– Classify novel sequences into protein domain profiles
– Most comprehensive; >16,000 protein families (v29)
• SMART
– Signaling, extracellular and chromatin proteins
– Identification of catalytic site conservation for enzymes
• TIGRFAMs
– Families of proteins from prokaryotes
• PANTHER
– Classification based on function using literature evidence
PFAM
• Manually curated profiles
• E-value: a statistical measure of the likelihood that an
alignment occurred by chance alone
• Does not indicate functionality
PFAM Summary
PFAM Domain Organization
SMART database
• SMART: Simple Modular Architecture Research Tool
– Focus on signaling, extracellular and chromatin-associated
proteins
– Curated models for >1200 domains
• Use?
– I have several kinase domains in my protein list and want to
know which ones are functional.
– What other domains are found in signaling proteins?
SMART: Search interface
UniProt or Ensembl
Protein Accession number
Add other searches
SMART Output
InterPro Scan
• Combines search methods from several protein
databases
• Uses tools provided by member databases
– Uses threshold scores for profiles & motifs
• InterPro is a convenient means of deriving a consensus
among signature methods
• InterPro records are integrated with UniProt. If you have a
UniProt accession number, you can access the InterPro
information from UniProt
MAPK14 – InterPro record
MAPK14 – UniProt record
Function from sequence
• Membrane bound or secreted?
• GPI anchored?
• Cellular localization?
• Post-translational modification sites?
CBS prediction services
• Protein sorting
– SignalP, TargetP, others
• Post-translational modification
– Acetylation, phosphorylation, glycosylation
• Immunological features
– Epitopes, MHC allele binding, etc.
• Protein function & structure
– Transmembrane domains, co-evolving positions
Transmembrane domain prediction
Phosphorylation prediction
O-glycosylation
EMBOSS
Open source software for molecular biology
• Predict antigenic sites
– Useful if you want to design a peptide antibody
• Look for specific motifs, even degenerate
– Known phosphorylation motifs
– Find motifs in multiple sequences with one submission
• Get stats on proteins/nucleic acid sequences
• Sequence manipulation of all kinds
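Searching several sequences for a degenerate motif in one pass can be sketched in plain Python. This is an illustration of the idea only, not EMBOSS itself. The motif R-R-x-[ST], a commonly cited consensus for protein kinase A phosphorylation sites, is used purely as an example, and the FASTA records are hypothetical.

```python
import re

def parse_fasta(text):
    """Parse FASTA-formatted text into {identifier: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # identifier is the first word of the header
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records

def find_motif(records, motif_regex):
    """Return {identifier: [(1-based start, matched text), ...]} for each sequence.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + motif_regex + "))")
    return {
        name: [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]
        for name, seq in records.items()
    }

# Hypothetical sequences for illustration only.
fasta_text = """>protA hypothetical
MKRRASLPEG
>protB hypothetical
MSTNPKPQRK
>protC hypothetical
AARRLTRRGSV
"""

motif_hits = find_motif(parse_fasta(fasta_text), "RR.[ST]")
for name, found in motif_hits.items():
    print(name, found)
```

A motif match on its own only suggests a candidate site. It does not show that the site is actually modified.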
Today in lab
• Tutorial on protein information sites
• From a sublist generated using DAVID, generate a list
of protein IDs and obtain the sequences
• Obtain protein accession numbers for the cluster
• Submit to SMART database to characterize/analyze
the domains
• Pick 2 proteins to do additional predictions