Predicting protein structure and function using InterPro
Alex Mitchell & Hsin-Yu Chang (V9, Oct. 2013)
www.ebi.ac.uk

Contents
Course Information
Course learning objectives
An introduction to InterPro
    What are protein signatures?
    InterPro entry types
    InterPro entry relationships
    The InterPro home page
    Searching InterPro
I. Searching InterPro using a text search
    Learning objectives
    What information can be found in the InterPro entry page?
    How do I interpret an InterPro protein view?
        Protein view: Overview page
        Protein view: Similar Proteins page
        Protein view: Structures page
    Summary
    Exercises
        Exercise 1 – Searching InterPro using a UniProt identifier
        Exercise 2 – Exploring InterPro entries
            General annotation
            Relationships
            GO (Gene Ontology) terms
            Proteins matched
            Domain organisation
            Taxonomy
            Structures
II. Querying sequences using InterProScan
    Learning objectives
    Summary
    Exercises
        Exercise 3 – Analysing sequences using InterProScan
Course summary
Further reading
Where to find out more

Course Information

Course description
This tutorial provides an introduction to InterPro, its web interface and content.
You will learn how to search InterPro to obtain predicted information about protein function and sequence and structural features.

Course level
Suitable for graduate-level scientists and above who have not used InterPro before. Regular users may benefit from going through the tutorial to familiarise themselves with the new InterPro website.

Pre-requisites
Basic knowledge of biology and basic computer skills

Subject area
Proteins and Proteomes, Genes and Genomes, Sequence Analysis

Target audience
Scientists interested in Sequence Analysis

Resources required
Internet access (a current browser such as the latest Firefox or Internet Explorer)

Approximate time needed
2 hours and

Course learning objectives
After completing this course, you should:
• Understand what InterPro is and how it can be useful in your research
• Know the different ways you can query InterPro
• Understand the information on the InterPro entry pages
• Be able to interpret the information on the InterPro protein analysis pages

An introduction to InterPro
InterPro provides functional analysis of proteins by classifying them into families and predicting the presence of important domains and sites. It does this by combining predictive models known as protein signatures from a number of different databases (referred to as member databases) into a single searchable resource. InterPro integrates the signatures, providing names and descriptive abstracts and, whenever possible, adding Gene Ontology (GO) terms, structural links and external database links. Together, the protein signatures combined within InterPro provide matches to ~80% of proteins in the UniProt database.

What are protein signatures?
Protein signatures are obtained by modelling the conservation of amino acids at specific positions within a group of related proteins (i.e., a protein family), or within the domains/sites shared by a group of proteins.
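
As a rough illustration of what "modelling conservation at specific positions" can mean, the sketch below builds a per-column residue frequency table from a toy alignment and slides it along a query sequence. The alignment, the function names and the scoring scheme are all assumptions made for this example; the member databases use their own, more sophisticated methods.

```python
from collections import Counter


def build_profile(aligned_seqs):
    """Return one {residue: frequency} dict per alignment column."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()})
    return profile


def score(profile, segment):
    """Mean frequency of the segment's residues at each profile position."""
    if len(segment) != len(profile):
        raise ValueError("segment length must match profile length")
    hits = (col.get(res, 0.0) for col, res in zip(profile, segment))
    return sum(hits) / len(profile)


def best_match(profile, sequence):
    """Slide the profile along the sequence; return (start, score) of the best window."""
    width = len(profile)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the profile")
    windows = (
        (start, score(profile, sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    )
    return max(windows, key=lambda pair: pair[1])


# Toy, made-up alignment of a short conserved region (not real data).
family = ["GKSTL", "GKTTL", "GKSSL", "GRSTL"]
profile = build_profile(family)
start, match_score = best_match(profile, "MAAGKSTLQ")
print(start, round(match_score, 2))
```

Positions that are identical in every family member (here the first and last columns) contribute most to the score, which is the intuition behind a signature matching a conserved region.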
InterPro’s different member databases use different computational methods to produce protein signatures, and they each have their own particular focus of interest: structural and/or functional domains, protein families, or protein features such as active sites or binding sites (see Figure 1).

Figure 1. InterPro member databases grouped by the methods used to construct their signatures. Their focus of interest is shown in blue text.

The protein signatures in InterPro are regularly run against the UniProt database of protein sequences, and all significant matches are reported on the InterPro web site, allowing users to check which proteins match a particular signature. This information is also used to aid UniProt curators in their annotation of SwissProt proteins and by the automatic systems that add annotation to TrEMBL (the non-reviewed protein sequences of UniProtKB; see http://www.uniprot.org/).

InterPro entry types
The signatures provided by the member databases are integrated into InterPro entries. Each InterPro entry is assigned a type (see Figure 2): family, domain, repeat, or site (which can be a conserved site, active site, binding site or post-translational modification site).

Figure 2. InterPro entry types and definitions. More information can be found in the FAQ section of the InterPro web site (http://www.ebi.ac.uk/interpro/user_manual.html#faqs_04).

InterPro entry relationships
InterPro entries often share relationships with other entries in the database. For example, where one entry might represent a protein family (such as the sugar transporter family), another entry might represent a specific subtype of that family (such as the glucose transporter type I subfamily). These relationships are stored in InterPro as hierarchies.
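
One way to picture such a hierarchy is as a mapping from each child entry to its parent. The sketch below is illustrative only: the entry names come from the example above, but the data structure and the function name are assumptions for this example and do not describe how InterPro stores its data.

```python
# Hypothetical child -> parent mapping, using the example entries named above.
PARENT = {
    "glucose transporter type I subfamily": "sugar transporter family",
}


def lineage(entry, parent=PARENT):
    """Return the path from the most general ancestor down to `entry`."""
    path = [entry]
    seen = {entry}
    while path[-1] in parent:
        ancestor = parent[path[-1]]
        if ancestor in seen:
            raise ValueError("cycle detected in hierarchy")
        path.append(ancestor)
        seen.add(ancestor)
    return list(reversed(path))


print(" > ".join(lineage("glucose transporter type I subfamily")))
```

The deeper an entry sits in the returned path, the more specific the group of proteins it describes.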
When classifying proteins, the level at which we can place a protein in a hierarchy is important, since it determines the level of specific functional information that we can infer.

Figure 3. Examples of hierarchical relationships between InterPro entries.

Both family entries and domain entries are able to form hierarchical relationships, although family and domain hierarchies are kept separate in the database and do not overlap (for example, a subclass of a particular domain cannot be a subtype of a protein family).

The InterPro home page
You can find the InterPro home page at http://www.ebi.ac.uk/interpro/ (see Fig. 4 below). The home page provides:
• search tools
• documentation, such as user manuals, release notes and recent publications
• links to tools for visualisation, complex data queries, etc.

Figure 4. The InterPro web site home page.

Searching InterPro
InterPro can be searched in a number of different ways:

Text search
• Via the text box on the main page or the search bar at the top right of all other InterPro web pages
• Search using: UniProtKB accessions or identifiers; InterPro entry names or identifiers; SCOP, CATH or PDB structural identifiers; GO identifiers; plain text

InterProScan
• Search using an amino acid sequence
• Use the sequence search box on the InterPro home page or follow the link to InterProScan for more search options

BioMart
• Allows more flexible and powerful querying; retrieve results in HTML, plain text or Microsoft Excel spreadsheet
• To access the InterPro BioMart, follow the link on the InterPro home page

I. Searching InterPro using a text search

Learning objectives
In this section, you will learn how to:
• query the InterPro web site using the text search option
• interpret the information in the InterPro protein view
• understand the information in the InterPro entry page

The text search bar can be found in the text box on the main page and at the top right of all InterPro web pages. It can be used to search with plain text, a UniProtKB protein identifier or accession, an InterPro identifier or accession, a Gene Ontology (GO) identifier, or a protein structure code.

Figure 5. The InterPro search bar can be found at the top right of InterPro web pages.

To examine the pre-calculated analysis results that InterPro has stored for a protein in UniProtKB, perform a text search using its UniProtKB accession number or identifier (e.g., O15075). This will bring you to the protein view, which provides both the signature hits and structural information for the protein (see section “How do I interpret an InterPro protein view?”). If you have an accession number from GenBank, Xref, EMBL or Ensembl, you can convert it to a UniProtKB accession number using the EBI’s PICR service (http://www.ebi.ac.uk/Tools/picr/).

Searching with an InterPro identifier (e.g., IPR020405 or 20405) will provide the entry page with all the information corresponding to that entry (see section “What information can be found in the InterPro entry page?”). You can also use a member database signature accession (such as PR01445), which will return the InterPro entry that signature is integrated into.

A simple text search query (with a word, GO term, etc.) will give you all the InterPro entries associated with that term.

What information can be found in the InterPro entry page?
The InterPro entry page consists of the following features (see Fig. 6):
A. Entry type and name
B. Contributing signatures
C. Hierarchical relationships to other InterPro entries
D. Description of the entry, with links to references
E. GO terms that have been associated with that entry. GO terms are divided into three categories: biological process, molecular function and cellular component.

Following the links in the side menu (on the left-hand side of the page), information can be found about proteins matched by that entry, their domain organisation, pathways & interactions in which they are involved, and their taxonomic coverage (i.e., the species in which the proteins are found).

Figure 6. The InterPro entry page for the endothelin receptor family (IPR000499).

How do I interpret an InterPro protein view?
The InterPro protein view is broken down into 3 pages: ‘Overview’, ‘Similar proteins’ and ‘Structures’ (where available). These can be accessed by selecting the appropriate link in the left-hand side menu found on each of the pages.

Protein view: Overview page
This is the default protein view and is an interactive page that contains a large amount of information (see Fig. 7 below). It is broken down into the following sections:

Figure 7. The InterPro protein overview page.

Top Section (A)
First of all we summarise basic UniProtKB information about the protein: its UniProt name, short name and accession (with a link to UniProtKB), its length and taxonomic information.

Protein family membership (B)
The family that InterPro predicts the protein belongs to is presented in a hierarchy, where appropriate. Clicking on the links takes you to the InterPro entry pages for each level of the hierarchy matched.

Summary View (C)
The protein is represented as a grey bar, with its length in amino acids displayed along the bottom. A graphical representation is provided, summarising the domain and repeat information predicted by InterPro for the protein.
Domains and repeats are represented by coloured bars. Hovering over these bars with your mouse gives detailed information on their match locations and the different InterPro entries that they are summarising.

Sequence features (D)
This section contains detailed information showing all the InterPro member database signatures that match the protein. This is the raw information used to create the summary view above. It also contains site information (active sites, binding sites, etc.), where available.

The type and amount of information displayed in this section can be controlled using the ‘Filter view’ floating menu on the left-hand side of the page. Family, domain, repeat and/or site matches (in any combination) can be selected for display. Matches to member database signatures not yet integrated into InterPro can also be displayed by clicking the ‘Unintegrated’ checkbox in the ‘Status’ section.

The colour scheme can be changed using the ‘Colour by’ option in the menu on the left-hand side of the page. It is possible to colour by entry relationship (bars of the same colour are matches to entries in the same hierarchy) or by member database (bars of the same colour are from the same member database).

Hovering your mouse over the coloured bars will display the signature name, accession and the member database that the signature comes from. The amino acid coordinates that the signature matches are also shown. Clicking on the linked accession will take you to the member database page for that signature.

GO term prediction (E)
GO terms predicted for the protein, based on the InterPro entries that it matches, are found at the bottom of the page. More information about GO terms can be found at http://geneontology.org/.
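
Returning to the Sequence features section: the relationship between raw signature matches and a per-entry summary can be pictured as grouping matches by the InterPro entry they are integrated into and merging overlapping coordinates. The sketch below shows that idea only. The signature names, entry names, coordinates and the merge rule are all invented for illustration and are not a description of how InterPro computes its summary view.

```python
from collections import defaultdict

# Each match: (hypothetical signature, hypothetical entry or None if
# the signature is unintegrated, start, end). All values are made up.
matches = [
    ("sig_A1", "entry_A", 373, 573),
    ("sig_A2", "entry_A", 380, 580),
    ("sig_B1", "entry_B", 645, 766),
    ("sig_U1", None, 10, 50),
]


def summarise(matches):
    """Group integrated matches by entry and merge overlapping ranges."""
    by_entry = defaultdict(list)
    for _signature, entry, start, end in matches:
        if entry is not None:  # skip unintegrated signatures
            by_entry[entry].append((start, end))

    summary = {}
    for entry, spans in by_entry.items():
        merged = []
        for start, end in sorted(spans):
            if merged and start <= merged[-1][1]:
                # Overlaps the previous range: extend it.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        summary[entry] = merged
    return summary


summary = summarise(matches)
for entry, spans in summary.items():
    print(entry, spans)
```

In this toy input, two signatures integrated into the same entry collapse into a single range, while the unintegrated match is left out, mirroring the default display in which unintegrated matches are hidden unless selected.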
Protein view: Similar Proteins page
The Similar proteins page (illustrated in Figure 8) shows all the proteins in UniProt that have the same predicted domain organisation as the query protein. A cartoon summarising the domain organisation is displayed, followed by a list of matching proteins, with their UniProt accession numbers, names and species, the families to which InterPro predicts they belong, as well as their length and an indication as to whether structural information is available. Context-sensitive links to UniProt pages and InterPro entry and protein pages are available.

Figure 8. The InterPro Similar proteins page, showing all the proteins in UniProt that have the same predicted domain organisation as the query protein.

Protein view: Structures page
The Structures page (illustrated in Figure 9) shows all of the experimentally solved structures and predicted structural information for the query protein. Like the Overview page, the Structures page displays UniProt information about the protein, followed by a graphical summary of all of the domain and repeat information predicted by InterPro, so that this can be correlated with structural information. It then contains (where available):

Structural Features
Representative matches from PDB, SCOP and CATH are given. PDB contains information about experimentally determined structures and provides structures that can cover part or the whole protein, while CATH and SCOP break proteins into structural domains.

Structural Predictions
Two matches may be presented here, one to ModBase and the other to Swiss-Model. These are homology modelling databases that predict protein structure based on the closest homologue.

Solved structures for this protein
Links are provided to PDB structural information, along with a graphical representation of the solved structure.

Figure 9. The Structures page showing the sequence features predicted by InterPro and PDB structural information.

Summary
• The InterPro text search can be queried with UniProtKB accessions and identifiers, InterPro entry names and identifiers, GO terms, structural identifiers and plain text
• To return pre-calculated InterPro analysis results for a protein in UniProtKB, perform a text search using its UniProtKB accession number or identifier
• InterPro entry pages provide information related to that entry (including the contributing signatures) and links to the proteins matched by that entry
• The InterPro protein pages provide information about predicted protein family membership, sequence features, structural features and predictions, and GO terms for a particular protein.

Exercises

Exercise 1 – Searching InterPro using a UniProt identifier
The retinoblastoma-associated protein (abbreviated pRb, RB or RB1) is a tumour suppressor protein that is dysfunctional in several major cancers. Let’s find out more about it using InterPro.

Find the accession number for human RB using UniProtKB (http://www.uniprot.org/uniprot/). Type “RB” in the Query box. You can find human RB in the returned results.

Question 1: What is the UniProtKB accession number for RB_HUMAN?

Open the InterPro homepage in a web browser (http://www.ebi.ac.uk/interpro/). Using the “Text” Search box in the top right-hand corner of the page, type in the UniProtKB accession for human RB. Click on the purple “Search” button. You should now have an Overview page describing the signature matches for human RB.

Question 2: Looking at the InterPro protein view for this protein, how many InterPro entries (not individual signatures) match the query protein sequence? (Remember to include the protein family matches.)

Domains
Now look at the “Domains and repeats” information in the page.

Question 3: How many domains does InterPro predict the protein to have?
You might notice that the cyclin-like domain (IPR013763) covers both the A-box (IPR002720) and B-box (IPR002719) domains. This is because some of the member databases classify this region as being a larger, more general domain. If you click on IPR013763, you will find that this domain can also be found in transcription factor IIB (TFIIB).

Click on the domain entries linking to IPR024599, IPR002720, IPR002719 and IPR015030 one by one. Examine these entries and fill in the table in Question 4.

Question 4:

InterPro accession   Entry name                                      Contributing signatures (accession)   Proteins matched (numbers)
IPR024599            Retinoblastoma-associated protein, N-terminal
IPR002720            Retinoblastoma-associated protein, A-box
IPR002719            Retinoblastoma-associated protein, B-box
IPR015030            Retinoblastoma-associated protein, C-terminal

Hint: You can find the “Contributing signatures” box on the right-hand side of each entry page. You can find the “Proteins matched” link on the left-hand side of each entry page under the “Overview” box.

Read the “Description” of each entry.

Question 5: Which domain is required in RB for high-affinity binding to E2F-DP complexes and for maximal repression of E2F-responsive promoters?

Navigate back to the protein page or search InterPro again with the RB_HUMAN UniProt accession number. In the upper left corner, you can find the “Overview” menu. Click on ‘Similar proteins’. This will take you to a page listing all the sequences in UniProt with the same predicted domain organisation as your query sequence.

Question 6: Approximately how many proteins in UniProt share the same predicted domain architecture as the query protein?

You can download the FASTA file for all the sequences in UniProt with the same predicted domain architecture by clicking the “Export FASTA” button in the upper right.
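
Once exported, a FASTA file can be read with a few lines of code. The sketch below is a minimal parser, assuming the standard FASTA layout (a header line starting with ">" followed by one or more sequence lines). The function name and the sample text are made up for illustration and are not real export output.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records = {}
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up sample standing in for an exported file's contents.
sample = (
    ">protein_1 hypothetical header\n"
    "MEAVIEKECS\n"
    "ALGGLFQNII\n"
    ">protein_2\n"
    "MKT\n"
)
records = parse_fasta(sample)
for name, seq in records.items():
    print(name, len(seq))
```

To use it on a downloaded file, read the file's contents into a string first and pass that string to `parse_fasta`.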
In the upper left corner of the same page, click on ‘Structures’. This will take you to the Structures page. In the “Structural Features and predictions” section, you will find the PDB structure. Its length indicates the region of the protein for which the structure is known. You will also see bars representing a CATH database match and a SCOP database match, both of which are structural classification databases that break down the PDB structures into their constituent domains.

Hover your mouse over the domain predicted to be located in the N terminus of the protein in PDB. Click the link 2qdj. This will bring you to the PDB website.

Question 7: From the 2qdj entry, what method was used to determine this protein structure?

From exercise 1, you have learned:
1. How to find the UniProtKB identifier for human RB.
2. RB is predicted to contain four domains, including an N-terminal, A-box, B-box and C-terminal domain.
3. The C-terminal domain of RB is required for high-affinity binding to E2F-DP complexes and for maximal repression of E2F-responsive promoters.
4. The structure of RB has been determined; you can find more structural information about RB from the PDB website.

Exercise 2 – Exploring InterPro family entries

GO terms
Go back to the human RB protein page or search InterPro again with the RB_HUMAN UniProt accession number. Scroll down the page until you reach the GO term prediction section.

Question 8: What are the Biological Process GO terms predicted for this entry?

To find out more about the GO terms, including their exact definitions, click on the terms themselves, which will take you to the relevant QuickGO page.

Relationships
InterPro links related signatures through parent/child relationships which indicate domain/family hierarchies. Child entries subdivide IPR028309 into more closely related subgroups.
From the protein page, RB_HUMAN is a member of the IPR028309 and IPR015652 families/subfamilies.

Question 9: Which entry is more specific and has the fewest proteins as family members?

Click on the IPR028309 entry.

Question 10: How many “child” entries does IPR028309 have?

Proteins matched
Now look at the proteins matched information in the menu on the left-hand side of the page.

Question 11: Approximately how many proteins are matched by IPR028309?

Domain organisation
Click on the ‘Domain organisation’ link on the left-hand side menu on the entry page for IPR028309. This will take you to a page showing domain organisations of all proteins in this entry.

Question 12: Approximately how many proteins are predicted to possess a retinoblastoma-associated protein N-terminal domain, an A-box domain and a B-box domain (without the C-terminal domain)?

Hint: you may check the colour key or mouse over the domain cartoons to identify them.

Taxonomy
Click on the ‘Species’ link on the left-hand side of the InterPro entry page for IPR028309. You may explore the taxonomic spread of the sequences matching this entry by expanding the table in the Taxa section. InterPro divides all the protein hits in an entry by their taxonomy.

Question 13: What kind of taxonomic groups are retinoblastoma protein family members found in?

Click the “Number of proteins” (in this case “24”) under human. This will bring you to a page showing all the sequences matched from human. You can see some proteins have been reviewed in UniProtKB/SwissProt.

Question 14: How many characterised human proteins can be found in this entry and what are they?

Structures
Now click on the "Structures" link on the left-hand side menu. InterPro provides a list of all the PDB entries associated with an entry.
There are also structural links to SCOP and CATH at the bottom of the page, which provide structural classifications of the proteins that match this entry. Scroll to the bottom of the page and follow the “SCOP a.74.1.3” link to the SCOP database to find out the structural classification of this domain.

Question 15: What type of structure do the retinoblastoma protein family members consist of?

Hint: look at the information under “Fold” in the “Lineage” section.

From this exercise, you have learned:
1. RB is a member of two related families: IPR028309 (Retinoblastoma protein family) and IPR015652 (Retinoblastoma-associated protein). IPR015652 identifies a more specific group of proteins and is a child of IPR028309.
2. There are three characterised human protein members of IPR028309 (Retinoblastoma protein family): retinoblastoma-associated protein, retinoblastoma-like protein 1 and retinoblastoma-like protein 2. You can also use the child entries of the Retinoblastoma protein family (IPR028309) and the species classification to find the RB, RBL1 and RBL2 homologues from other species.

II. Querying sequences using InterProScan

Learning objectives
In this section, you will learn:
• how to perform sequence-based queries using InterProScan
• how to interpret the results of these queries

If you are using a novel protein sequence to query InterPro (i.e., a protein sequence that is not in UniProt), the simplest way is to copy and paste the amino acid sequence into the large box on the home page and click on the search button immediately to the right. This will run your sequence through InterProScan with the default parameters selected. For more advanced search options use the InterProScan link, which allows query sequences to be either entered directly or uploaded from a file in different formats (GCG, FASTA, EMBL, GenBank, PIR, etc.).
InterProScan incorporates all the analysis algorithms and result post-processing steps from the member databases. InterProScan outputs the resulting matches for a sequence in a graphical format. The matches can also be viewed as a table, which lists the signature match positions.

In addition to the online version of InterProScan, a stand-alone version can be downloaded from the ftp server (ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/) and installed locally. Unlike the online version of InterProScan, the standalone version can accept multiple sequences as input. It can also process nucleic acid sequences as well as amino acid ones.

Summary
InterProScan can be used with sequence queries as a predictive tool for protein sequence classification.

Exercises

Exercise 3 – Analysing sequences using InterProScan
For the next exercise we will use the following sequence from Xenopus tropicalis:

>sequence1
MEAVIEKECSALGGLFQNIISDMKGSYPVWEDFINKAGKLQSQLRTTVVAAAAFLDAFQK
VADMATNTRGATREVGSALTRMCMRHRSIESKLRQFSSALIDCLINPLQEQMEEWKKVAN
QLDKDHAKEYKKARQEIKKKSSDTLKLQKKAKKGRGDIQPQLDSAMQDVNDKYLLLEETE
KQAVRKALIEERSRFCSFISMLRPVIEEEISMLGEITHLQTISEDLKMLTMDPHKLPSAS
EQVIHDLKGSDYNWSYQTPPSSPSTTMSRKSSVCSSLNSVNSSDSRSSGSHSHSPTSHYR
YRSSNLPQQAPMRLSSVSSHDSGFISQDAFQSKSPSPMPPEAVNQNSSSSASSEASETCQ
SVSECSSPTSVSSGSTMGAWASTDKLSNGYCHYSLPSDMAPVSAGPYSHFPLPASRPWTR
SSSALPECHHYYTIGPGMLPSSKVPSWKDWAKPGPYDQPMVNTLQRRKREADPSRGALIT
TPGHIDEPPVPRNTPVPAALRQGEEAEATNEELALALSRGLQLDTQRSSRDSLQCSSGYS
TQTTTPCCSEDTIPSQVSDYDYFSVSGDQEGEQQDFDKSSTIPRNSDISQSYRRMFQAKR
PASTAGLPAAFGPVIVTPGVATIRRTPSTKPSVRRGTIGGGPIPIKTPVIPVKTPTVPDL
PGGLPSGLSEDYGEQSPDSPCATESPTLISDMPSSLWSGQASVNPPVPGTQGFVLEDQRQ
TRAENDEEEGEQVYSPTFDPMVQTDPAVAEPGEAVQGENMLYAIRRGVKLKKTTTNDRSA
PRFS

Get the sequence from http://tinyurl.com/interpro-seq1.

First, we will analyse the sequence using InterProScan.
Navigate to the InterPro homepage at http://www.ebi.ac.uk/interpro/ and paste the sequence into the text box labelled ‘FASTA Sequence’ located midway down the page. Press ‘Search’.

Question 16: Based on the InterPro matches, what domains is the protein predicted to possess? What family is it predicted to belong to?

Question 17: What are the Biological Process GO terms predicted for this protein?

Read the “Description” of the domain entries.

Question 18: Which domain is predicted to act as a conserved F-actin bundling domain involved in filopodium formation and regulated by Rac1?

Question 19: Which domain is predicted to bind actin monomers and can facilitate the assembly of actin monomers into newly forming actin filaments?

Click on the domain entries linking to IPR027682. Click on “Proteins matched” in the Overview menu. You can see two entries marked with a symbol. Click on the symbol of the human protein. This will bring you to the UniProtKB website, where you can find more specific information about this protein.

Question 20: How many isoforms does this human protein have?

Hint: You can find the isoform information under “Alternative products”.

From this exercise, you have learned:
1. You can find more information about an uncharacterised protein using InterProScan.
2. Using family classification in InterPro, you can find protein homologues of the unknown protein. This helps to predict the protein function.
3. More information about individual proteins can be found in UniProtKB through the links in InterPro.

Course summary
InterPro is a classification resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER.
By uniting the member databases, InterPro capitalises on their individual strengths, producing a powerful diagnostic tool and integrated resource. InterPro can help if you have a sequence or set of sequences and want to know:
• what they are and the protein family to which they belong
• what their function is and how it may be explained in structural terms

Further reading
InterPro in 2011: new developments in the family and domain prediction database. Hunter S, et al. Nucleic Acids Res. (2011) doi: 10.1093/nar/gkr948
InterPro: the integrative protein signature database. Hunter S, et al. Nucleic Acids Res. (2009) 37:D211-5.
InterPro and InterProScan: tools for protein sequence classification and comparison. Mulder NJ, Apweiler R. Methods in Molecular Biology (2007) 396:59-70.

Where to find out more
You can find links to documentation about InterPro (user manual, release notes) on our main web page, and also download information from our ftp server (ftp://ftp.ebi.ac.uk/pub/databases/interpro/). More information on protein signatures and their use in protein classification can be found on the EBI’s online training portal (see http://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi).