Predicting protein structure and function using InterPro
Alex Mitchell & Hsin-Yu Chang (V9, Oct. 2013)
www.ebi.ac.uk
Contents
Course Information
Course learning objectives
An introduction to InterPro
What are protein signatures?
InterPro entry types
InterPro entry relationships
The InterPro home page
Searching InterPro
I. Searching InterPro using a text search
Learning objectives
What information can be found in the InterPro entry page?
How do I interpret an InterPro protein view?
Protein view: Overview page
Protein view: Similar Proteins page
Protein view: Structures page
Summary
Exercises
Exercise 1 – Searching InterPro using a UniProt identifier
Exercise 2 – Exploring InterPro entries
General annotation
Relationships
GO (Gene Ontology) terms
Proteins matched
Domain organisation
Taxonomy
Structures
II. Querying sequences using InterProScan
Learning objectives
Summary
Exercises
Exercise 3 – Analysing sequences using InterProScan
Course summary
Further reading
Where to find out more
Course Information
Course description
This tutorial provides an introduction to
InterPro, its web interface and content. You
will learn how to search InterPro to obtain
predicted information about protein function
and sequence and structural features.
Course level
Suitable for graduate-level scientists and
above who have not used InterPro before.
Regular users may benefit from going
through the tutorial to familiarise themselves
with the new InterPro website.
Pre-requisites
Basic knowledge of biology and basic
computer skills
Subject area
Proteins and Proteomes, Genes and Genomes, Sequence Analysis
Target audience
Scientists interested in Sequence Analysis
Resources required
Internet access (a current browser such as
the latest Firefox or Internet Explorer)
Approximate time needed
2 hours
Course learning objectives
After completing this course, you should:
• Understand what InterPro is and how it can be useful in your research
• Know the different ways you can query InterPro
• Understand the information on the InterPro entry pages
• Be able to interpret the information on the InterPro protein analysis pages
An introduction to InterPro
InterPro provides functional analysis of proteins by classifying them into families
and predicting the presence of important domains and sites. It does this by
combining predictive models known as protein signatures from a number of
different databases (referred to as member databases) into a single searchable
resource. InterPro integrates the signatures, providing names and descriptive
abstracts and, whenever possible, adding Gene Ontology (GO) terms, structural
links and external database links. Together, the protein signatures combined
within InterPro provide matches to ~ 80% of proteins in the UniProt database.
What are protein signatures?
Protein signatures are obtained by modelling the conservation of amino acids at
specific positions within a group of related proteins (i.e., a protein family), or
within the domains/sites shared by a group of proteins. InterPro’s different
member databases use different computational methods to produce protein
signatures, and they each have their own particular focus of interest: structural
and/or functional domains, protein families, or protein features such as active
sites or binding sites (see Figure 1).
Figure 1. InterPro member databases grouped by the methods used to construct their
signatures. Their focus of interest is shown in blue text.
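The idea of "modelling the conservation of amino acids at specific positions" can be illustrated with a deliberately simplified sketch. It is not how any member database actually builds its signatures; it only counts, for each column of a made-up toy alignment, how often the most common residue occurs. The function name and the alignment are hypothetical.

```python
from collections import Counter


def column_conservation(aligned_sequences):
    """Return (most common residue, its frequency) for each alignment column."""
    if not aligned_sequences:
        return []
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*aligned_sequences):
        residue, count = Counter(column).most_common(1)[0]
        profile.append((residue, count / len(column)))
    return profile


# Made-up toy alignment, for illustration only (not real protein data).
toy_alignment = ["MKVLA", "MRVLA", "MKILG", "MKVLA"]
profile = column_conservation(toy_alignment)

# 1-based positions where every sequence carries the same residue.
conserved_positions = [i + 1 for i, (_, freq) in enumerate(profile) if freq == 1.0]
```

Positions that are fully conserved across a family are the kind of feature a signature is built to recognise; real methods weight and score positions far more carefully than this count does.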
The protein signatures in InterPro are regularly run against the UniProt database
of protein sequences, and all significant matches are reported on the InterPro
web site, allowing users to check which proteins match a particular signature.
This information is also used to aid UniProt curators in their annotation of
Swiss-Prot proteins and by the automatic systems that add annotation to TrEMBL (the
non-reviewed protein sequences of UniProtKB; see http://www.uniprot.org/).
InterPro entry types
The signatures provided by the member databases are integrated into InterPro
entries. Each InterPro entry is assigned a type (see Figure 2): family, domain,
repeat or site (which can be a conserved site, active site, binding
site or post-translational modification site).
Figure 2. InterPro entry types and definitions. More information can be found in
the FAQ section of the InterPro web site
(http://www.ebi.ac.uk/interpro/user_manual.html#faqs_04).
InterPro entry relationships
InterPro entries often share relationships with other entries in the database. For
example, where one entry might represent a protein family (such as the sugar
transporter family), another entry might represent a specific subtype of that family
(such as the glucose transporter type I subfamily). These relationships are stored
in InterPro as hierarchies. When classifying proteins, the level at which we can
place a protein in a hierarchy is important, since it determines the level of specific
functional information that we can infer.
Figure 3. Examples of hierarchical relationships between InterPro entries. Both family
entries and domain entries are able to form hierarchical relationships, although family and
domain hierarchies are kept separate in the database and do not overlap (for example, a
subclass of a particular domain cannot be a subtype of a protein family).
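A hierarchy of this kind can be sketched as a simple child-to-parent mapping. The example below is a minimal illustration, not InterPro's internal representation. The only link it contains is one stated later in this tutorial (IPR015652 is a child of IPR028309); the function names are hypothetical.

```python
# Child -> parent links. The single link below is the one stated in this
# tutorial; a real InterPro hierarchy contains many more entries.
PARENT_OF = {"IPR015652": "IPR028309"}


def lineage(entry, parent_of):
    """Return the path from the root of the hierarchy down to `entry`."""
    path = [entry]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return list(reversed(path))


def most_specific(entries, parent_of):
    """Pick the entry that sits deepest in its hierarchy."""
    return max(entries, key=lambda entry: len(lineage(entry, parent_of)))
```

The deeper an entry sits, the narrower the group of proteins it describes, which is why the deepest match for a protein gives the most specific functional information.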
The InterPro home page
You can find the InterPro home page at http://www.ebi.ac.uk/interpro/ (see Fig. 4
below). The home page provides:
• search tools
• documentation, such as user manuals, release notes and recent
  publications
• links to tools for visualisation, complex data queries, etc.
Figure 4. The InterPro web site home page.
Searching InterPro
InterPro can be searched in a number of different ways:
• Text search
  ◦ Via the text box on the main page or the search bar at the top
    right of all other InterPro web pages
  ◦ Search using: UniProtKB accessions or identifiers; InterPro entry
    names or identifiers; SCOP, CATH or PDB structural identifiers;
    GO identifiers; plain text
• InterProScan
  ◦ Search using an amino acid sequence
  ◦ Use the sequence search box on the InterPro home page or follow
    the link to InterProScan for more search options
• BioMart
  ◦ Allows more flexible and powerful querying; retrieve results in
    HTML, plain text or Microsoft Excel spreadsheet
  ◦ To access the InterPro BioMart, follow the link on the InterPro
    home page
I. Searching InterPro using a text search
Learning objectives
In this section, you will learn how to:
• query the InterPro web site using the text search option
• interpret the information in the InterPro protein view
• understand the information in the InterPro entry page
The text search bar can be found in the text box on the main page and at the top right
of all InterPro web pages. It can be used to search with plain text, a UniProtKB
protein identifier or accession, an InterPro identifier or accession, a Gene
Ontology (GO) identifier, or a protein structure code.
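As a rough illustration of the kinds of query the text search accepts, the sketch below guesses the type of a query string from its shape. The regular expressions are assumptions made for this example (they follow the commonly documented identifier formats) and are not the rules the InterPro web site itself applies. SCOP and CATH codes are not handled.

```python
import re

# Patterns are assumptions for illustration; the InterPro site may use other rules.
QUERY_PATTERNS = [
    ("InterPro accession", re.compile(r"IPR\d{6}")),
    ("GO identifier", re.compile(r"GO:\d{7}")),
    ("UniProtKB accession", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}")),
    ("UniProtKB identifier", re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")),
    ("PDB identifier", re.compile(r"[0-9][A-Za-z0-9]{3}")),
]


def classify_query(query):
    """Return a label for the first pattern that matches the whole query."""
    text = query.strip()
    for label, pattern in QUERY_PATTERNS:
        if pattern.fullmatch(text):
            return label
    return "plain text"
```

For example, `classify_query("O15075")` falls into the UniProtKB accession branch, while a word such as `kinase` is treated as plain text.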
Figure 5. The InterPro search bar can be found at the top right of InterPro web pages.
To examine the pre-calculated analysis results that InterPro has stored for a
protein in UniProtKB, perform a text search using its UniProtKB accession
number or identifier (e.g., O15075). This will bring you to the protein view, which
provides both the signature hits and structural information for the protein (see
section “How do I interpret an InterPro protein view?”). If you have an accession
number from GenBank, Xref, EMBL or Ensembl, you can convert it to a
UniProtKB accession number using the EBI’s PICR service
(http://www.ebi.ac.uk/Tools/picr/).
Searching with an InterPro identifier (e.g., IPR020405 or 20405) will provide the
entry page with all the information corresponding to that entry (see section “What
information can be found in the InterPro entry page?”). You can also use a
member database signature accession (such as PR01445), which will return the
InterPro entry that signature is integrated into.
A simple text search query (with a word, GO term, etc) will give you all the
InterPro entries associated with that term.
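The two identifier forms mentioned above (IPR020405 or 20405) suggest that a bare number is expanded to the full accession. A minimal sketch of that expansion follows; the zero-padding rule is inferred from the example in the text rather than taken from InterPro documentation, and the function name is hypothetical.

```python
import re


def normalise_interpro_id(query):
    """Expand 'IPR020405' or a bare number such as '20405' to the full accession."""
    match = re.fullmatch(r"(?:IPR)?(\d{1,6})", query.strip().upper())
    if match is None:
        raise ValueError(f"not an InterPro identifier: {query!r}")
    # Pad the numeric part to six digits, as in IPR020405.
    return f"IPR{int(match.group(1)):06d}"
```

A member database accession such as PR01445 is rejected by this function, since it is a signature accession rather than an InterPro entry identifier.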
What information can be found in the InterPro entry page?
The InterPro entry page consists of the following features (see Fig. 6):
A. Entry type and name
B. Contributing signatures
C. Hierarchical relationships to other InterPro entries
D. Description of the entry, with links to references
E. GO terms that have been associated with that entry. GO terms are
divided in three categories: biological process, molecular function and
cellular component.
Following the links in the side menu (on the left hand side of the page),
information can be found about proteins matched by that entry, their domain
organisation, pathways & interactions in which they are involved, and their
taxonomic coverage (i.e., the species in which the proteins are found).
Figure 6. The InterPro entry page for the endothelin receptor family (IPR000499)
How do I interpret an InterPro protein view?
The InterPro protein view is broken down into 3 pages: ‘Overview’, ‘Similar
proteins’ and ‘Structures’ (where available). These can be accessed by selecting
the appropriate link in the left hand side menu found on each of the pages.
Protein view: Overview page
This is the default protein view and is an interactive page that contains a large
amount of information (see Fig. 7 below). It is broken down into the following
sections:
Figure 7. The InterPro protein overview page
Top Section (A)
• First of all we summarise basic UniProtKB information about the protein:
  its UniProt name, short name and accession (with a link to UniProtKB), its
  length and taxonomic information.
Protein family membership (B)
• The family that InterPro predicts the protein belongs to is presented in a
  hierarchy, where appropriate. Clicking on the links takes you to the
  InterPro entry pages for each level of the hierarchy matched.
Summary View (C)
• The protein is represented as a grey bar, with its length in amino acids
  displayed along the bottom.
• A graphical representation is provided, summarising the domain and
  repeat information predicted by InterPro for the protein.
• Domains and repeats are represented by coloured bars. Hovering over
  these bars with your mouse gives detailed information on their match
  locations and the different InterPro entries that they are summarising.
Sequence features (D)
• This section contains detailed information showing all the InterPro
  member database signatures that match the protein. This is the raw
  information used to create the summary view above. It also contains site
  information (active sites, binding sites, etc.), where available.
• The type and amount of information displayed in this section can be
  controlled using the ‘Filter view’ floating menu on the left hand side of the
  page.
• Family, domain, repeat and/or site matches (in any combination) can be
  selected for display.
• Matches to member database signatures not yet integrated into InterPro
  can also be displayed by clicking the ‘Unintegrated’ checkbox in the
  ‘Status’ section.
• The colour scheme can be changed using the ‘Colour by’ option in the
  menu on the left hand side of the page. It is possible to colour by entry
  relationship (bars of the same colour are matches to entries in the same
  hierarchy) or by member database (bars of the same colour are from the
  same member database).
• Hovering your mouse over the coloured bars will display the signature
  name, accession and the member database that the signature comes
  from. The amino acid coordinates that the signature matches are also
  shown. Clicking on the linked accession will take you to the member
  database page for that signature.
GO term prediction (E)
• GO terms predicted for the protein, based on the InterPro entries that it
  matches, are found at the bottom of the page. More information about GO
  terms can be found at http://geneontology.org/.
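The relationship between the raw signature matches (section D) and the summary view (section C) can be pictured as merging overlapping match regions into a single bar. The sketch below shows one simple way to do that; it is an assumption for illustration, and InterPro's own summarisation may work differently. The coordinates are hypothetical.

```python
def merge_matches(matches):
    """Merge overlapping or adjacent (start, end) matches into summary regions."""
    merged = []
    for start, end in sorted(matches):
        if merged and start <= merged[-1][1] + 1:
            # Extend the current region if this match overlaps or touches it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical signature matches for one entry (1-based, inclusive coordinates).
raw_matches = [(10, 120), (15, 118), (300, 420), (410, 450)]
summary = merge_matches(raw_matches)
```

Here four raw matches collapse into two summary regions, which mirrors how several member database signatures can be represented by a single coloured bar.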
Protein view: Similar Proteins page
The Similar proteins page (illustrated in Figure 8) shows all the proteins in
UniProt that have the same predicted domain organisation as the query protein.
A cartoon summarising the domain organisation is displayed, followed by a list of
matching proteins, with their UniProt accession numbers, names and species,
the families to which InterPro predicts they belong, as well as their length and an
indication as to whether structural information is available. Context sensitive links
to UniProt pages and InterPro entry and protein pages are available.
Figure 8. The InterPro Similar proteins page, showing all the proteins in UniProt that
have the same predicted domain organisation as the query protein.
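The notion of "same predicted domain organisation" can be sketched by treating each protein's architecture as an ordered list of InterPro domain accessions and comparing those lists. This is a minimal sketch under that assumption; the protein names are hypothetical, and only the domain accessions come from this tutorial.

```python
def similar_proteins(query, proteins):
    """Return the other proteins whose ordered domain list equals the query's."""
    architecture = tuple(proteins[query])
    return [
        accession
        for accession, domains in proteins.items()
        if accession != query and tuple(domains) == architecture
    ]


# Hypothetical proteins; the domain accessions are those used in Exercise 1.
proteins = {
    "query": ["IPR024599", "IPR002720", "IPR002719", "IPR015030"],
    "hypothetical_1": ["IPR024599", "IPR002720", "IPR002719", "IPR015030"],
    "hypothetical_2": ["IPR024599", "IPR002720", "IPR002719"],
}
matches = similar_proteins("query", proteins)
```

Only `hypothetical_1` is returned, because `hypothetical_2` lacks the final domain and therefore has a different architecture.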
Protein view: Structures page
The Structures page (illustrated in Figure 9) shows all of the experimentally
solved structures and predicted structural information for the query protein. Like
the Overview page, the Structures page displays UniProt information about the
protein, followed by a graphical summary of all of the domain and repeat
information predicted by InterPro, so that this can be correlated with structural
information. It then contains (where available):
Structural Features
• Representative matches from PDB, SCOP and CATH are given. PDB
  contains information about experimentally-determined structures and
  provides structures that can cover part or the whole protein, while CATH
  and SCOP break proteins into structural domains.
Structural Predictions
• Two matches may be presented here, one to ModBase and the other to
  Swiss-Model. These are homology databases that predict protein
  structure based on the closest homologue.
Solved structures for this protein
• Links are provided to PDB structural information, along with a graphical
  representation of the solved structure.
Figure 9. The Structures page showing the sequence features predicted by InterPro and
PDB structural information.
Summary
• The InterPro text search can be queried with UniProtKB accessions and
  identifiers, InterPro entry names and identifiers, GO terms, structural
  identifiers and plain text.
• To return pre-calculated InterPro analysis results for a protein in
  UniProtKB, perform a text search using its UniProtKB accession number
  or identifier.
• InterPro entry pages provide information related to that entry (including
  the contributing signatures) and links to the proteins matched by that
  entry.
• The InterPro protein pages provide information about predicted protein
  family membership, sequence features, structural features and
  predictions, and GO terms for a particular protein.
Exercises
Exercise 1 – Searching InterPro using a UniProt identifier
The retinoblastoma-associated protein (abbreviated pRb, RB or RB1) is a tumour
suppressor protein that is dysfunctional in several major cancers. Let’s find out
more about it using InterPro.
Find the accession number for human RB using UniProtKB
(http://www.uniprot.org/uniprot/). Type “RB” in the Query box. You can find human
RB in the results returned.
Question 1: What is the UniProtKB accession number for RB_HUMAN?
Open the InterPro homepage in a web browser (http://www.ebi.ac.uk/interpro/).
Using the “Text” Search box in the top right-hand corner of the page, type in
the UniProtKB accession for human RB. Click on the purple “Search” button.
You should now have an Overview page describing the signature matches for
human RB.
Question 2: Looking at the InterPro protein view for this protein, how many
InterPro entries (not individual signatures) match the query protein sequence?
(Remember to include the protein family matches)
Domains
Now look at the “Domains and repeats” information in the page.
Question 3: How many domains does InterPro predict the protein to have?
You might notice that the cyclin-like domain (IPR013763) covers both the A-box
(IPR002720) and B-box (IPR002719) domains. This is because some of the
member databases classify this region as being a larger, more general domain. If
you click on IPR013763, you will find that this domain can also be found in
transcription factor IIB (TFIIB).
Click on the domain entries linking to IPR024599, IPR002720, IPR002719
and IPR015030 one by one. Examine these entries and fill in the table in
Question 4.
Question 4:
InterPro accession   Entry name                                       Contributing signatures (accession)   Proteins matched (numbers)
IPR024599            Retinoblastoma-associated protein, N-terminal    ______                                ______
IPR002720            Retinoblastoma-associated protein, A-box         ______                                ______
IPR002719            Retinoblastoma-associated protein, B-box         ______                                ______
IPR015030            Retinoblastoma-associated protein, C-terminal    ______                                ______
Hint: You can find the “Contributing signatures” box on the right-hand side of
each entry page. You can find the “Proteins matched” link on the left-hand side of each
entry page under the “Overview” box.
Read the “Description” of each entry.
Question 5: Which domain is required in RB for high-affinity binding to
E2F-DP complexes and for maximal repression of E2F-responsive promoters?
Navigate back to the protein page or search InterPro again with the
RB_HUMAN UniProt accession number. In the upper left corner, you can find the
“Overview” menu. Click on ‘Similar proteins’. This will take you to a page listing
all the sequences in UniProt with the same predicted domain organisation as
your query sequence.
Question 6: Approximately how many proteins in UniProt share the same
predicted domain architecture as the query protein?
You can download the FASTA file for all the sequences in UniProt with the
same predicted domain architecture by clicking the “Export FASTA” button in the
upper right.
In the upper left corner of the same page, click on ‘Structures’. This will take
you to the Structures page. In the “Structural Features and predictions” section,
you will find the PDB structure. Its length indicates the region of the protein for
which the structure is known. You will also see bars representing a CATH
database match and a SCOP database match, both of which are structural
classification databases that break down the PDB structures into their constituent
domains.
Hover your mouse over the domain predicted to be located in the N terminus
of the protein in PDB. Click the link 2qdj. This will bring you to the PDB website.
Question 7: From the 2qdj entry, what method was used to determine this
protein structure?
From exercise 1, you have learned:
1. How to find the UniProtKB accession number for human RB.
2. RB is predicted to contain four domains, including an N-terminal, A-box,
B-box and C-terminal domain.
3. The C-terminal domain of RB is required for high-affinity binding to E2F-DP
complexes and for maximal repression of E2F-responsive promoters.
4. The structure of RB has been determined; you can find more structural
information about RB from the PDB website.
Exercise 2 – Exploring InterPro family entries
GO terms
Go back to the human RB protein page or search InterPro again with the
RB_HUMAN UniProt accession number. Scroll down the page until you reach the
GO term prediction section.
Question 8: What are the Biological Process GO terms predicted for this
entry?
To find out more about the GO terms, including their exact definitions, click
on the terms themselves, which will take you to the relevant QuickGO page.
Relationships
InterPro links related signatures through parent/child relationships which indicate
domain/family hierarchies. Child entries subdivide IPR028309 into more closely
related subgroups.
From the protein page, RB_HUMAN is a member of the IPR028309 and
IPR015652 families/subfamilies.
Question 9: Which entry is more specific and has the fewest
proteins as family members?
Click on the IPR028309 entry.
Question 10: How many “child” entries does IPR028309 have?
Proteins matched
Now look at the proteins matched information in the menu on the left hand
side of the page.
Question 11: Approximately how many proteins are matched by
IPR028309?
Domain organisation
Click on the ‘Domain organisation’ link on the left hand side menu on the
entry page for IPR028309. This will take you to a page showing domain
organisations of all proteins in this entry.
Question 12: Approximately how many proteins are predicted to possess a
retinoblastoma-associated protein N-terminal domain, an A-box domain and a
B-box domain (without the C-terminal domain)?
Hint: you may check the colour key or mouse over the domain cartoons to
identify them.
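The condition in Question 12 (three domains present, one absent) can be expressed as a set test over a protein's domain accessions. The sketch below is only an illustration of that logic and does not query InterPro. The accessions are those listed in Exercise 1; the function name is hypothetical.

```python
# Domain accessions from Exercise 1 of this tutorial.
REQUIRED = {"IPR024599", "IPR002720", "IPR002719"}  # N-terminal, A-box, B-box
EXCLUDED = {"IPR015030"}                            # C-terminal


def matches_architecture(domains, required=REQUIRED, excluded=EXCLUDED):
    """True if every required domain is present and no excluded domain is."""
    found = set(domains)
    return required <= found and not (excluded & found)
```

Applied to a list of proteins with their domain accessions, this filter would keep only those that have the N-terminal, A-box and B-box domains but lack the C-terminal domain.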
Taxonomy
Click on the ‘Species’ link on the left hand side of the InterPro entry page for
IPR028309. You may explore the taxonomic spread of the sequences matching
this entry by expanding the table in the Taxa section. InterPro divides all the
protein hits in an entry by their taxonomy.
Question 13: What kind of taxonomic groups are retinoblastoma protein
family members found in?
Click the “Number of proteins” (in this case “24”) under human. This will bring
you to a page showing all the sequences matched from human. You can see
some proteins have been reviewed in UniProtKB/Swiss-Prot.
Question 14: How many characterised human proteins can be found in this
entry and what are they?
Structures
Now click on the "Structures" link on the left hand side menu.
InterPro provides a list of all the PDB entries associated with an entry. There are
also structural links to SCOP and CATH at the bottom of the page, which provide
structural classifications of the proteins that match this entry.
Scroll to the bottom of the page and follow the “SCOP a.74.1.3” link to the
SCOP database to find out the structural classification of this domain.
Question 15: What type of structure do the retinoblastoma protein family
members consist of?
Hint: look at the information under “Fold” in the “Lineage” section.
From this exercise, you have learned:
1. RB is a member of two related families: IPR028309 (Retinoblastoma protein
family) and IPR015652 (Retinoblastoma-associated protein). IPR015652
identifies a more specific group of proteins and is a child of IPR028309.
2. There are three characterised human protein members of IPR028309
(Retinoblastoma protein family): retinoblastoma-associated protein,
retinoblastoma-like protein 1 and retinoblastoma-like protein 2.
You can also use the child entries of the Retinoblastoma protein family
(IPR028309) and the species classification to find the RB, RBL1 and RBL2
homologues from other species.
II. Querying sequences using InterProScan
Learning objectives
In this section, you will learn:

how to perform sequence-based queries using InterProScan

how to interpret the results of these queries
If you are using a novel protein sequence to query InterPro (i.e., a protein
sequence that is not in UniProt), the simplest way is to copy and paste the amino
acid sequence into the large box on the home page and click on the search
button immediately to the right. This will run your sequence through InterProScan
with the default parameters selected. For more advanced search options use the
InterProScan link, which allows query sequences to be either entered directly or
uploaded from a file in different formats (GCG, FASTA, EMBL, GenBank, PIR,
etc).
InterProScan incorporates all the analysis algorithms and result post-processing
steps from the member databases. InterProScan outputs the resulting matches
for a sequence in a graphical format. The matches can also be viewed as a table,
which lists the signature match positions.
In addition to the online version of InterProScan, a stand-alone version can be
downloaded from the ftp server (ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/) and
installed locally. Unlike the online version of InterProScan, the stand-alone version
can accept multiple sequences as input. It can also process nucleic acid
sequences as well as amino acid ones.
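Because the online version takes one sequence at a time, a file holding several sequences has to be split before each can be pasted into the search box. The following is a minimal sketch of a FASTA reader and writer for that purpose; it handles only plain FASTA text, and the function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
        else:
            raise ValueError("sequence data found before the first '>' header")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_single_fasta(header, sequence, width=60):
    """Format one record as FASTA text, wrapping the sequence at `width`."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"
```

Each `(header, sequence)` pair returned by `read_fasta` can be passed to `to_single_fasta` to produce a single-sequence entry ready for submission.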
Summary
• InterProScan can be used with sequence queries as a predictive
  tool for protein sequence classification.
Exercises
Exercise 3 – Analysing sequences using InterProScan
For the next exercise we will use the following sequence from Xenopus tropicalis:
>sequence1
MEAVIEKECSALGGLFQNIISDMKGSYPVWEDFINKAGKLQSQLRTTVVAAAAFLDAFQK
VADMATNTRGATREVGSALTRMCMRHRSIESKLRQFSSALIDCLINPLQEQMEEWKKVAN
QLDKDHAKEYKKARQEIKKKSSDTLKLQKKAKKGRGDIQPQLDSAMQDVNDKYLLLEETE
KQAVRKALIEERSRFCSFISMLRPVIEEEISMLGEITHLQTISEDLKMLTMDPHKLPSAS
EQVIHDLKGSDYNWSYQTPPSSPSTTMSRKSSVCSSLNSVNSSDSRSSGSHSHSPTSHYR
YRSSNLPQQAPMRLSSVSSHDSGFISQDAFQSKSPSPMPPEAVNQNSSSSASSEASETCQ
SVSECSSPTSVSSGSTMGAWASTDKLSNGYCHYSLPSDMAPVSAGPYSHFPLPASRPWTR
SSSALPECHHYYTIGPGMLPSSKVPSWKDWAKPGPYDQPMVNTLQRRKREADPSRGALIT
TPGHIDEPPVPRNTPVPAALRQGEEAEATNEELALALSRGLQLDTQRSSRDSLQCSSGYS
TQTTTPCCSEDTIPSQVSDYDYFSVSGDQEGEQQDFDKSSTIPRNSDISQSYRRMFQAKR
PASTAGLPAAFGPVIVTPGVATIRRTPSTKPSVRRGTIGGGPIPIKTPVIPVKTPTVPDL
PGGLPSGLSEDYGEQSPDSPCATESPTLISDMPSSLWSGQASVNPPVPGTQGFVLEDQRQ
TRAENDEEEGEQVYSPTFDPMVQTDPAVAEPGEAVQGENMLYAIRRGVKLKKTTTNDRSA
PRFS
Get the sequence from http://tinyurl.com/interpro-seq1.
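Before pasting a sequence into the search box, it can be worth checking that nothing was lost or added when copying it. The sketch below reports the sequence length and any characters outside the 20 standard amino acid letters. Whether InterProScan accepts other symbols (such as X for an unknown residue) is not stated in this tutorial, so the check is deliberately strict; the function name is hypothetical.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_protein_sequence(fasta_text):
    """Return (length, sorted unexpected characters) for one FASTA entry."""
    lines = [line.strip() for line in fasta_text.strip().splitlines()]
    # Drop the '>' header line and join the remaining sequence lines.
    sequence = "".join(line for line in lines if not line.startswith(">")).upper()
    unexpected = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    return len(sequence), unexpected


# First ten residues of the tutorial sequence, used here as a short example.
length, unexpected = check_protein_sequence(">sequence1\nMEAVIEKECS\n")
```

An empty `unexpected` list means every character is a standard amino acid letter; anything else points to a copying error or a non-protein sequence.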
First, we will analyse the sequence using InterProScan. Navigate to the
InterPro homepage at http://www.ebi.ac.uk/interpro/ and paste the sequence into
the text box labelled ‘FASTA Sequence’ located midway down the page. Press
‘Search’.
Question 16: Based on the InterPro matches, what domains is the protein
predicted to possess? What family is it predicted to belong to?
Question 17: What are the Biological Process GO terms predicted for this
protein?
Read the “Description” of the domain entries.
Question 18: Which domain is predicted to act as a conserved F-actin
bundling domain involved in filopodium formation and regulated by Rac1?
Question 19: Which domain is predicted to bind actin monomers and can
facilitate the assembly of actin monomers into newly forming actin filaments?
Click on the domain entries linking to IPR027682. Click on the “Proteins
matched” in the Overview menu. You can see two entries with “
” symbol.
Click on the symbol of the human protein. This will bring you to the UniProtKB
website. You can find more specific information about this protein.
Question 20: How many isoforms does this human protein have?
Hint: You can find the isoform information under the “Alternative
products”.
From this exercise, you have learned:
1. You can find more information about an uncharacterised protein using
InterProScan.
2. Using family classification in InterPro, you can find protein homologues of the
unknown protein. This helps to predict the protein function.
3. More information about individual proteins can be found in UniProtKB through
the links in InterPro.
Course summary
InterPro is a classification resource for protein families, domains and functional
sites, which integrates the following protein signature databases: PROSITE,
PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D
and PANTHER. By uniting the member databases, InterPro capitalises on their
individual strengths, producing a powerful diagnostic tool and integrated
resource.
InterPro can help if you have a sequence or set of sequences and want to know:
• what they are, the protein family to which they belong
• what their function is and how it may be explained in structural
  terms
Further reading
InterPro in 2011: new developments in the family and domain prediction
database. Hunter S, et al. Nucleic Acids Res. (2011) doi: 10.1093/nar/gkr948
InterPro: the integrative protein signature database. Hunter S, et al. Nucleic Acids
Res. (2009) 37:D211-5.
InterPro and InterProScan: tools for protein sequence classification and
comparison. Mulder NJ, Apweiler R. Methods in Molecular Biology (2007)
396:59-70.
Where to find out more
You can find links to documentation about InterPro (user manual, release notes)
on our main web page, and also download information from our ftp server
(ftp://ftp.ebi.ac.uk/pub/databases/interpro/). More information on protein
signatures and their use in protein classification can be found on the EBI’s online
training portal (see
http://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi).