Protein Structure & Analysis
Biology 224
Dr. Tom Peavy
Sept 27 & 29
(Images from Bioinformatics and Functional Genomics by Jonathan Pevsner)
[Figure: a protein described by its protein family, localization, function, Gene Ontology (GO) terms (cellular component, biological process, molecular function), and physical properties]
The Human Proteome Organisation (HUPO)
Proteomics Standards Initiative (PSI)
Work groups
• Protein Separation
• Mass Spectrometry
• Molecular Interactions
• Protein Modifications
• Proteomics Informatics
Themes
• Controlled vocabularies
• MIAPE: Minimum information about a proteomics experiment
Protein domains, motifs & signatures
Definitions
Signature:
• a protein category such as a domain or motif
(a defining property of the protein or family)
Domain:
• a region of a protein that can adopt a 3D structure
• a fold
• a family is a group of proteins that share a domain
• examples:
zinc finger domain
immunoglobulin domain
Motif (or fingerprint):
• a short, conserved region of a protein
• typically 10 to 20 contiguous amino acid residues
Definition of a domain
According to InterPro at EBI (http://www.ebi.ac.uk/interpro/):
A domain is an independent structural unit, found alone
or in conjunction with other domains or repeats.
Domains are evolutionarily related.
According to SMART (http://smart.embl-heidelberg.de):
A domain is a conserved structural entity with distinctive
secondary structure content and a hydrophobic core.
Homologous domains with common functions usually
show sequence similarities.
15 most common domains in human proteins (number of proteins containing each domain)
Zn finger, C2H2 type: 1093
Immunoglobulin: 1032
EGF-like: 471
Zn-finger, RING: 458
Homeobox: 417
Pleckstrin-like: 405
RNA-binding region RNP-1: 400
SH3: 394
Calcium-binding EF-hand: 392
Fibronectin, type III: 300
PDZ/DHR/GLGF: 280
Small GTP-binding protein: 261
BTB/POZ: 236
bHLH: 226
Cadherin: 226
Varieties of protein domains
• Extending along the length of a protein
• Occupying a subset of a protein sequence
• Occurring one or more times
Example of a protein with domains:
Methyl CpG binding protein 2 (MeCP2)
[Figure: MeCP2 domain diagram showing the MBD and TRD]
The protein includes a methylated DNA binding domain
(MBD) and a transcriptional repression domain (TRD).
MeCP2 is a transcriptional repressor.
Mutations in the gene encoding MeCP2 cause Rett
syndrome, a neurological disorder that primarily
affects girls.
Result of an MeCP2 blastp search:
A methyl-binding domain shared by several proteins
Are proteins that share only a domain homologous?
Proteins can have both domains and patterns (motifs)
[Figure: a protein containing two patterns (several residues each), an aspartyl protease domain, and a reverse transcriptase domain]
The UniProt accession number can be found within the GenBank entry,
e.g. for human hemoglobin subunit beta (NP_000509).
The Swiss-Prot entry for any protein provides
highly useful information…
Definition of a motif
A motif (or fingerprint) is a short, conserved region
of a protein. Its size is often 10 to 20 amino acids.
Simple motifs include transmembrane domains and
phosphorylation sites. These do not imply homology
when found in a group of proteins.
PROSITE (www.expasy.org/prosite) is a dictionary of
motifs (about 1600 entries at the time of these notes). In PROSITE,
a pattern is a qualitative motif description (a protein
either matches a pattern, or not). In contrast, a profile
is a quantitative motif description. Profiles are found
in Pfam, ProDom, SMART, and other databases.
See pages 231-233.
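To make the pattern/profile distinction concrete, here is a minimal Python sketch. The pattern check can only answer yes or no, while the profile (here a simple position-specific scoring matrix, one common form of profile) returns a numerical score. The three-residue motif, the regular expression, and every score value are hypothetical, chosen only for illustration; real profile databases use much richer models (Pfam, for example, uses hidden Markov models).

```python
import re

# Hypothetical 3-residue motif, for illustration only.
# Pattern form: C, then any residue, then H or Y (PROSITE style: C-x-[HY]).
PATTERN = re.compile(r"C.[HY]")

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Profile form of the same motif: one score per residue at each position.
# All score values are made up to show the idea.
PROFILE = [
    {"C": 3.0, "S": 1.0},               # position 1 strongly prefers C, tolerates S
    {aa: 0.0 for aa in AMINO_ACIDS},    # position 2: no preference
    {"H": 2.5, "Y": 2.0, "F": 1.0},     # position 3 prefers H or Y, tolerates F
]
DEFAULT_SCORE = -1.0  # score for any residue not listed in a column


def pattern_matches(window: str) -> bool:
    """Qualitative: the window either matches the pattern or it does not."""
    return PATTERN.fullmatch(window) is not None


def profile_score(window: str) -> float:
    """Quantitative: add up the per-position scores for the window."""
    if len(window) != len(PROFILE):
        raise ValueError("window length must equal profile length")
    return sum(column.get(residue, DEFAULT_SCORE)
               for column, residue in zip(PROFILE, window))


for window in ["CAH", "SAH", "CAF", "GAG"]:
    print(window, pattern_matches(window), profile_score(window))
# CAH True 5.5
# SAH False 3.5
# CAF False 4.0
# GAG False -2.0
```

The pattern rejects SAH and CAF outright, whereas the profile ranks them between the perfect match (CAH) and the unrelated GAG, which illustrates why a profile can pick up more divergent members of a family.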
Pattern syntax
The symbol 'x' is used for a position where any amino acid is accepted.
Ambiguities are indicated by listing the acceptable amino acids for a given position between
square brackets '[ ]'. For example: [ALT] stands for Ala or Leu or Thr.
Ambiguities are also indicated by listing between a pair of curly brackets '{ }' the amino acids
that are not accepted at a given position. For example: {AM} stands for any amino acid except
Ala and Met.
Each element in a pattern is separated from its neighbor by a '-'.
Repetition of an element of the pattern can be indicated by following that element with a
numerical value or, if it is a gap ('x'), by a numerical range between parentheses.
Examples:
x(3) corresponds to x-x-x
x(2,4) corresponds to x-x or x-x-x or x-x-x-x
A(3) corresponds to A-A-A
Note: a range can only be used with 'x'; for example, A(2,4) is not a valid pattern element.
When a pattern is restricted to either the N- or C-terminus of a sequence, the pattern
starts with a '<' symbol or ends with a '>' symbol, respectively.
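The syntax rules above translate almost one-to-one into regular expressions: 'x' becomes '.', [ALT] stays [ALT], {AM} becomes [^AM], a repeat such as x(2,4) becomes .{2,4}, and '<' / '>' become the anchors ^ and $. The Python sketch below shows this translation. prosite_to_regex is a hypothetical helper written for these notes, not part of PROSITE or any library, and it covers only the rules listed above. It also strips the period that PROSITE database entries place at the end of each pattern; that handling is an addition not described in the rules here.

```python
import re

# Matches one pattern element: x, a single residue, [..], or {..},
# optionally followed by a repeat count (n) or a range (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into a Python regular expression.

    Covers only the syntax rules listed above (a sketch, not a full parser).
    """
    pattern = pattern.strip()
    if pattern.endswith("."):          # PROSITE entries end patterns with a period
        pattern = pattern[:-1]
    prefix = suffix = ""
    if pattern.startswith("<"):        # restricted to the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):          # restricted to the C-terminus
        suffix, pattern = "$", pattern[:-1]

    parts = []
    for element in pattern.split("-"):  # elements are separated by '-'
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if high is not None and residue != "x":
            raise ValueError(f"a range is only allowed with 'x': {element!r}")
        if residue == "x":
            regex = "."                             # any amino acid
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"      # any amino acid except these
        else:
            regex = residue                         # single residue or [..] set
        if high is not None:
            regex += "{" + low + "," + high + "}"   # x(2,4) -> .{2,4}
        elif low is not None:
            regex += "{" + low + "}"                # A(3) -> A{3}
        parts.append(regex)
    return prefix + "".join(parts) + suffix


# Usage: the PROSITE N-glycosylation site pattern (PS00001).
glyco = prosite_to_regex("N-{P}-[ST]-{P}.")
print(glyco)                          # N[^P][ST][^P]
sequence = "MKNGSAANPTL"              # hypothetical protein sequence
for hit in re.finditer(glyco, sequence):
    print(hit.start(), hit.group())   # prints: 2 NGSA (0-based start)
```

Note that re.finditer reports only non-overlapping matches, so two sites that overlap in a real sequence would need a lookahead-based search to be reported separately.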