Computational Molecular Biology 2014
Lab Assignment 1: Sequence Alignment
Due Tuesday March 4th 2014, 23.59h

Write down your answers for this lab assignment in a .pdf file named “<your student number><your last name>_lab01.pdf”, e.g., “012345jansen_lab01.pdf”, and send it as an attachment to an e-mail with subject “CMB2014_lab01” to [email protected].

Lab Assignment Sequence alignment

1) Use the pairwise alignment tool available at http://www.ebi.ac.uk/Tools/psa/emboss_needle/ to align the sequences AwwwAF48002.2 and EFN89594.1 (hint: use ENTREZ at NCBI to retrieve the amino acid sequences of these two proteins).
   a) What is the sequence identity?
   b) What is the sequence similarity?
   c) What do sequence similarity and identity mean? Why are they not the same?
2) Consider the structure that has PDB ID 1GAX (see http://www.rcsb.org/pdb/), and answer the following questions:
   a) Find the UniProtKB ID and use this ID as input for BLASTp (with default settings, available under the Blast tab at www.uniprot.org) to find the most similar sequence in the UniProtKB database.
   b) What is the UniProtKB ID of the found sequence?
   c) What is the identity of the found sequence to the original?
   d) What is the name and strain of the species the found sequence is in?
3) For the Aux/IAA mRNA sequence JN043333.1:
   a) What is the translated nucleotide sequence (i.e., the amino acid sequence)? (Hint: use NCBI to search for gene-centered (protein) information on JN043333.1.)
   b) Use BLASTx at the Blast section of NCBI with the accession number JN043333 to find proteins which have a similar sequence.
   c) What are the accession numbers of the three top returned results (in terms of the E-value)?
4) Use a protein-protein program (blastp at NCBI) to determine the (non-human) species having the most similar homolog of human interferon-gamma (NP_000610). Give the accession number of this homolog and the number of amino acid differences as compared to the human interferon-gamma.
5) The following DNA sequence fragment, containing some mutation, was isolated from a patient:
   tttgctccccgcgcgctgtttttctcagtgactttcagcgggcggaaaag
   a) In which gene is the mutation located? On which chromosome? How many nucleotides are changed? (Hint: use Blast with the sequence as query.)
   b) Using the annotation given for the corresponding sequence database entry, could you indicate possible diseases caused by mutations in this gene?
6) What is the most efficient strategy to determine the difference (number of amino acid substitutions) between homologous proteins from two strains of an influenza virus?
   a) Determine the number of substitutions in the polymerase PB1 from the strain that caused the death of a veterinarian during the outbreak of bird flu in 2003 in The Netherlands (hint: use UniProt: A/Netherlands/219/2003(H7N7), and scroll down to get the EMBL accession code) as compared to the homologous protein from the strain isolated from the 1918 pandemic victim who had been interred in Alaska permafrost since November 1918 (strain A/Brevig Mission/1/1918).
   b) How many of these substitutions are conservative ones according to the default substitution matrix (BLOSUM62) used in BLAST programs for proteins?
7) One of the 8 RNA fragments of the influenza A genome codes for a polymerase called PB1 of about 750 amino acids. It has been recently determined that the 5'-proximal part of this RNA fragment contains an overlapping open reading frame (ORF) coding for another protein, PB1-F2, of about 90 amino acids. However, for many influenza A virus strains the information about this protein is still missing in GenBank.
   a) Use the tool ORF Finder (see www.ncbi.nlm.nih.gov/Tools/) to determine the size of the PB1-F2 protein encoded by the PB1 segment from the strain A/Netherlands/219/03 (accession number AY340083).
   b) Use one of the BLAST versions provided by the ORF Finder to determine the strain that has the most similar putative PB1-F2 to that from A/Netherlands/219/03.
8) Use the BLAST options and the amino acid sequence of the protein Dicer from Arabidopsis thaliana (accession number Q9SP32) to retrieve the putative (partial) plant Dicer mRNAs from the database of expressed sequence tags (EST). Which organisms have the putative Dicer proteins with the highest sequence similarities to that from A. thaliana (give three names and accession numbers of your BLAST hits)? (Please note: the EST database contains "raw" nucleotide sequences, and its entries do not include features like coding sequences.)
9) Recently a so-called minor spliceosome, which catalyses the splicing of atypical introns, has been identified in a number of organisms. In order to establish the evolutionary history of the minor spliceosome, BLAST searches for minor spliceosome-specific proteins were used. Explore the usefulness of the PSI-BLAST program for the search of (distant) homologs of one of the human minor spliceosome-specific proteins (accession number NP_078847):
   a) How many hits are yielded by PSI-BLAST Iteration 1? How many of them have E-values better than the threshold?
   b) How many hits are yielded by PSI-BLAST Iteration 2? How many new hits with E-values better than the threshold are found? Give the accession number and organism name for the best of these new hits.
   c) Does PSI-BLAST Iteration 3 yield new hits with E-values better than the threshold? Explain the result.
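The identity and similarity figures asked for in questions 1 and 6 can also be checked by hand from an alignment. The Python sketch below is a minimal illustration, not the method used by EMBOSS Needle or BLAST. It assumes the two sequences are already aligned (equal length, with "-" for gaps) and that percentages are taken over the full alignment length, gap columns included. The function name and the SIMILAR_PAIRS set are hypothetical: the set is a small hand-picked subset of residue pairs that score positively in BLOSUM62, not the full matrix.

```python
# Illustrative subset of residue pairs with a positive BLOSUM62 score.
# Assumption: this is NOT the full matrix; real tools use the complete table.
SIMILAR_PAIRS = {
    frozenset("IV"), frozenset("IL"), frozenset("LM"),
    frozenset("KR"), frozenset("DE"), frozenset("FY"), frozenset("ST"),
}


def count_identity_similarity(aligned_a, aligned_b, gap="-"):
    """Return (identical, similar, length) for two pre-aligned sequences.

    Identical columns also count as similar, so similar >= identical.
    Columns containing a gap count toward length only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    identical = 0
    similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if gap in (x, y):
            continue  # gap column: neither identical nor similar
        if x == y:
            identical += 1
            similar += 1
        elif frozenset((x, y)) in SIMILAR_PAIRS:
            similar += 1  # different residues, but a conservative pair
    return identical, similar, len(aligned_a)


if __name__ == "__main__":
    # Toy alignment made up for illustration; not taken from the assignment.
    identical, similar, length = count_identity_similarity("MKVL-DES", "MRILADDT")
    print(f"Identity:   {identical}/{length} ({100 * identical / length:.1f}%)")
    print(f"Similarity: {similar}/{length} ({100 * similar / length:.1f}%)")
```

For the toy alignment, three columns are identical and four more are conservative pairs, so similarity (7/8) exceeds identity (3/8). Swapping in the full BLOSUM62 table, or dividing by the number of ungapped columns instead, would change the figures, so state the convention with any reported percentage.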