1. Use sequence 1 from the multiple alignment file in a BLAST search and comment on the results.
The BLAST search (BLASTP 2.2.12 with conserved domain search; nr database) first reveals the presence of a conserved domain that is common to the trypsin-like serine protease family. The RPS-BLAST 2.2.11 results show large scores (88.2 to 243) and E-values that are all below 0.1, suggesting that this domain is likely to be present in our query sequence (i.e. the query sequence could be a serine protease). The remaining BLAST hits are all significant (scores from 288 to 550; E-values from 9e-77 to 2e-155). The top hit is P06870, the accession number for KLK1 (kallikrein 1 precursor), which is 100% identical to our query sequence with no gaps. The first seven hits are all human kallikrein 1 and match with a high level of identity, which suggests that our query sequence is human kallikrein 1. You will also notice that the other hits include kallikrein sequences from a variety of other species, including mouse, rat and chimpanzee. This indicates that the kallikreins are a protein family that is well conserved between species. This may help when identifying the functional regions of the protein, since regions conserved across species are more likely to be functionally important.
2. Using your results from the exercises in section 1, check the alignment of the Vega and Ensembl sequences for SerpinA3 and identify where they differ. How do they align to the UCSC sequence?
CLUSTALW 1.8 was used to produce a multiple alignment of the three sequences, and the regions where the sequences differ were identified using Boxshade version 3.21; these are shown in the file “Boxshade results”. It can be seen that the Vega sequence has an extra 22 amino acids at the N terminus.
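Boxshade shades the alignment columns where the sequences disagree. The same check can be sketched in a few lines of Python, assuming the aligned sequences have already been exported from CLUSTALW as equal-length strings with "-" marking gaps. The function names `differing_blocks` and `residue_range` are hypothetical helpers, and the three short sequences are toy data, not the real SerpinA3 entries.

```python
# Minimal sketch: find alignment columns where sequences differ (what
# Boxshade highlights). The sequences below are toy data, not SerpinA3.

def differing_blocks(aligned):
    """Return (start, end) column ranges, 1-based and inclusive, where the
    aligned sequences are not all identical (gap columns count as differences)."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    length = lengths.pop()
    blocks = []
    start = None  # 0-based column where the current differing block began
    for col in range(length):
        differs = len({seq[col] for seq in aligned.values()}) > 1
        if differs and start is None:
            start = col
        elif not differs and start is not None:
            blocks.append((start + 1, col))  # block ended at the previous column
            start = None
    if start is not None:  # block runs to the end of the alignment
        blocks.append((start + 1, length))
    return blocks


def residue_range(seq, start_col, end_col):
    """Map a 1-based alignment column range to residue numbers in one
    sequence by skipping gaps; returns None if that range is all gaps."""
    positions = []
    residue = 0
    for col, ch in enumerate(seq, start=1):
        if ch != "-":
            residue += 1
            if start_col <= col <= end_col:
                positions.append(residue)
    return (positions[0], positions[-1]) if positions else None


# Hypothetical toy alignment: "Vega" has a 5-residue N-terminal extension
# and "Ensembl" differs at two internal positions.
aligned = {
    "Vega":    "MKLAGTSVALIEDSRGK",
    "Ensembl": "-----TSVALIQQSRGK",
    "UCSC":    "-----TSVALIEDSRGK",
}

for start, end in differing_blocks(aligned):
    print(f"columns {start}-{end}:",
          {name: residue_range(seq, start, end) for name, seq in aligned.items()})
```

On the toy data this reports columns 1-5 (present only in "Vega", as its residues 1-5) and columns 12-13 (residues 7-8 of "Ensembl" and "UCSC"). Converting columns back to residue numbers matters because an N-terminal extension shifts the numbering of one entry relative to the others.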
Since the UCSC entry for SerpinA3 indicates that the protein is extracellular, these 22 amino acids may be the signal sequence, which directs the secretion of the protein out of the cell. You will also notice that positions 102 to 114 of the Ensembl sequence differ from the other two sequences. This is likely to be an error: Ensembl genes are predicted by automated methods, whereas Vega entries are checked manually, so the Vega and UCSC sequences are more likely to be correct at this point. The same applies to the C-terminal end of the protein, where the Ensembl sequence has extra amino acids compared with the Vega and UCSC entries.
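The reasoning above, that where two independently produced entries agree the odd one out is more likely to be wrong, can be sketched as a column-by-column majority check. This is a minimal sketch assuming at least three equal-length aligned strings with "-" for gaps; `odd_one_out_runs` is a hypothetical helper and the sequences are toy data.

```python
from collections import Counter

# Minimal sketch: attribute each column where exactly one sequence disagrees
# with all the others to that "odd" sequence, and merge adjacent columns.
# The sequences below are toy data, not the real Vega/Ensembl/UCSC entries.

def odd_one_out_runs(aligned):
    """Return {name: [(start_col, end_col), ...]} with 1-based inclusive
    column ranges where that sequence alone differs from the majority."""
    names = list(aligned)
    if len(names) < 3:
        raise ValueError("need at least three sequences for a majority")
    length = len(aligned[names[0]])
    if any(len(aligned[name]) != length for name in names):
        raise ValueError("aligned sequences must all have the same length")
    runs = {name: [] for name in names}
    for col in range(length):
        chars = [aligned[name][col] for name in names]
        majority, count = Counter(chars).most_common(1)[0]
        if count != len(names) - 1:
            continue  # all agree, or there is no single odd sequence
        odd = names[next(i for i, c in enumerate(chars) if c != majority)]
        col1 = col + 1  # 1-based column number
        if runs[odd] and runs[odd][-1][1] == col1 - 1:
            runs[odd][-1] = (runs[odd][-1][0], col1)  # extend the current run
        else:
            runs[odd].append((col1, col1))  # start a new run
    return runs


# Hypothetical toy alignment with an N-terminal extension in "Vega", an
# internal mismatch in "Ensembl" and a C-terminal extension in "Ensembl".
aligned = {
    "Vega":    "MKLAGTSVALIEDSRGK---",
    "Ensembl": "-----TSVALIQQSRGKAPW",
    "UCSC":    "-----TSVALIEDSRGK---",
}

print(odd_one_out_runs(aligned))
```

On the toy data, "Ensembl" is flagged at columns 12-13 and 18-20, mirroring its internal and C-terminal differences. Note that the "Vega" N-terminal extension (columns 1-5) is also flagged, even though the text argues that the corresponding real residues may be a genuine signal sequence; majority agreement only points to regions worth checking and does not by itself decide which entry is correct.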