Computational Molecular Biology 2014
Lab Assignment 1: Sequence Alignment
Due Tuesday March 4th 2014, 23.59h
Write your answers for this lab assignment in a .pdf file named “<your student
number><your last name>_lab01.pdf”, e.g., “012345jansen_lab01.pdf”, and send it as
an attachment to an e-mail with the subject “CMB2014_lab01” to [email protected].
Lab Assignment Sequence alignment
1) Use the pairwise alignment tool available at http://www.ebi.ac.uk/Tools/psa/emboss_needle/
to align the sequences AAF48002.2 and EFN89594.1 (hint: use ENTREZ at NCBI to
retrieve the amino acid sequences of these two proteins).
a) What is the sequence identity?
b) What is the sequence similarity?
c) What do sequence similarity and sequence identity mean? Why are they not the same?
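To make the distinction asked about in 1c concrete, here is a minimal Python sketch that computes percent identity and percent similarity from a pairwise alignment that has already been produced. It assumes that a position counts as "similar" when the two residues are identical or score above zero in the substitution matrix, and that percentages are taken over the full alignment length including gaps. `POSITIVE_PAIRS` is a hand-entered list of the off-diagonal residue pairs believed to score above zero in BLOSUM62; check it against the full matrix before relying on it. The function name and the toy alignment are hypothetical and are not the sequences from this question.

```python
# Off-diagonal residue pairs assumed to score > 0 in BLOSUM62.
# Hand-entered for illustration; verify against the full matrix.
POSITIVE_PAIRS = {frozenset(pair) for pair in (
    "AS", "ST", "NS", "DN", "HN", "DE", "EQ", "KQ", "QR", "EK", "KR",
    "HY", "IL", "IV", "IM", "LM", "LV", "MV", "FY", "FW", "WY",
)}


def identity_and_similarity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, percent similarity) for two aligned strings.

    Both strings must have the same length; gap columns count toward the
    alignment length but never toward identity or similarity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")

    identical = 0
    similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if gap in (x, y):
            continue  # a gap column is neither identical nor similar
        if x == y:
            identical += 1
            similar += 1  # identical positions are also similar
        elif frozenset((x, y)) in POSITIVE_PAIRS:
            similar += 1

    length = len(aligned_a)
    return 100.0 * identical / length, 100.0 * similar / length


# Toy alignment (hypothetical data)
identity, similarity = identity_and_similarity("MKV-LI", "MRVALL")
print(f"identity = {identity:.1f}%, similarity = {similarity:.1f}%")
```

Because every identical position is also counted as similar, similarity can never be lower than identity under these assumptions.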
2) Consider the structure that has PDB ID 1GAX (see http://www.rcsb.org/pdb/), and answer the
following questions:
a) Find the UniProtKB ID and use this ID as input for BLASTp (with default settings,
available under the BLAST tab at www.uniprot.org) to find the most similar sequence in
the UniProtKB database.
b) What is the UniProtKB ID of the sequence found?
c) What is the sequence identity between the sequence found and the original?
d) What are the species name and strain of the organism the sequence found comes from?
3) For the Aux/IAA mRNA sequence JN043333.1:
a) What is the translated nucleotide sequence (i.e., the amino acid sequence)? (Hint: use NCBI to
search for gene-centered (protein) information on JN043333.1.)
b) Use BLASTx at the Blast-section of NCBI with the accession number JN043333 to find
proteins which have a similar sequence.
c) What are the accession numbers of the top three returned results (ranked by E-value)?
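As background for question 3, the sketch below shows how a coding nucleotide sequence is translated into amino acids using the standard genetic code (NCBI translation table 1), which is also what BLASTx does conceptually in all six reading frames before comparing against protein databases. The function name, the `to_stop` parameter, and the toy input are hypothetical; the input is not the JN043333.1 sequence, and the sketch translates a single reading frame starting at the first base.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides, to_stop=True):
    """Translate a DNA or RNA string in frame 0.

    Unknown codons (e.g. containing N) become 'X'. If to_stop is True,
    translation ends before the first stop codon; otherwise stops are
    written as '*'.
    """
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


# Toy example (hypothetical sequence)
print(translate("ATGGCCTAAGGG"))
```

For a real mRNA entry the coding region usually does not start at the first base, so the start position annotated in the database record (the CDS feature) has to be applied before translating.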
4) Use a protein-protein program (blastp at NCBI) to determine the (non-human) species having
the most similar homolog of human interferon-gamma (NP_000610). Give the accession
number of this homolog and the number of amino acid differences as compared to the human
interferon-gamma.
5) The following DNA sequence fragment, containing one or more mutations, was isolated from a
patient: tttgctccccgcgcgctgtttttctcagtgactttcagcgggcggaaaag
a) In which gene is the mutation located? On which chromosome? How many nucleotides are
changed? (Hint: use BLAST with the sequence as query.)
b) Using the annotation given for the corresponding sequence database entry, which
diseases can be caused by mutations in this gene?
6) What is the most efficient strategy to determine the difference (number of amino acid
substitutions) between homologous proteins from two strains of an influenza virus?
a) Determine the number of substitutions in the polymerase PB1 from the strain that caused
the death of a veterinarian during the 2003 bird flu outbreak in The Netherlands
(Hint: search UniProt for A/Netherlands/219/2003(H7N7) and scroll down to get the EMBL
accession code) as compared to the homologous protein from the strain isolated from a
1918 pandemic victim who had been interred in Alaska permafrost since November 1918
(strain A/Brevig Mission/1/1918).
b) How many of these substitutions are conservative ones according to the default
substitution matrix (BLOSUM62) used in BLAST programs for proteins?
7) One of the 8 RNA fragments of the influenza A genome codes for a polymerase called PB1
of about 750 amino acids. It has been recently determined that the 5'-proximal part of this
RNA fragment contains an overlapping open reading frame (ORF) coding for another protein
PB1-F2 of about 90 amino acids. However, for many influenza A virus strains the
information about this protein is still missing in GenBank.
a) Use the tool ORF Finder (see www.ncbi.nlm.nih.gov/Tools/) to determine the size of the
PB1-F2 protein encoded by the PB1 segment from the strain A/Netherlands/219/03
(accession number AY340083).
b) Use one of the BLAST versions provided by the ORF Finder to determine the strain that
has the most similar putative PB1-F2 to that from A/Netherlands/219/03.
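To illustrate what an ORF search does, the following Python sketch scans all six reading frames of a nucleotide sequence for stretches that begin with ATG and end at the first in-frame stop codon. It is a simplified illustration and not the algorithm used by NCBI's ORF Finder: it assumes ATG as the only start codon, the standard stop codons, and a hypothetical `min_codons` cutoff. Since PB1-F2 is only about 90 amino acids long, any length cutoff would need to be set below that. The toy input is not the AY340083 sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(sequence, min_codons=30):
    """Find ATG...stop ORFs in all six frames.

    Returns tuples (strand, frame, start, end, codons), longest first.
    start/end are 0-based, end-exclusive offsets on the scanned strand;
    end includes the stop codon, while codons does not count it.
    """
    seq = sequence.upper().replace("U", "T")
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    codons = (i - start) // 3
                    if codons >= min_codons:
                        orfs.append((strand, frame, start, i + 3, codons))
                    start = None  # look for the next ORF in this frame
    return sorted(orfs, key=lambda orf: orf[4], reverse=True)


# Toy example (hypothetical sequence): one 4-codon ORF on the + strand
print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))
```

Overlapping ORFs such as PB1 and PB1-F2 show up as separate entries in different frames of the same strand, which is why scanning each frame independently matters.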
8) Use the BLAST options and the amino acid sequence of the protein Dicer from Arabidopsis
thaliana (accession number Q9SP32) to retrieve the putative (partial) plant Dicer mRNAs
from the database of expressed sequence tags (EST). Which organisms have the putative Dicer
proteins with the highest sequence similarities to that from A. thaliana (give three names and
accession numbers of your BLAST hits)? (Please note that the EST database contains "raw"
nucleotide sequences, and its entries do not include features like coding sequences.)
9) Recently a so-called minor spliceosome, which catalyses the splicing of atypical introns, has
been identified in a number of organisms. In order to establish the evolutionary history of the
minor spliceosome, BLAST searches for minor spliceosome-specific proteins were used.
Explore the usefulness of the PSI-BLAST program for the search of (distant) homologs of
one of the human minor spliceosome-specific proteins (accession number NP_078847):
a) How many hits does PSI-BLAST iteration 1 yield? How many of them have E-values
better than the threshold?
b) How many hits does PSI-BLAST iteration 2 yield? How many new hits with E-values
better than the threshold are found? Give the accession number and organism name
for the best of these new hits.
c) Does PSI-BLAST iteration 3 yield new hits with E-values better than the threshold? Explain
the result.
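The bookkeeping asked for in question 9 (total hits, hits better than the threshold, and hits that are newly better than the threshold compared with the previous iteration) can be sketched in Python as below. This is only an illustration of the counting logic, not part of PSI-BLAST itself. The function name, the default `threshold` of 0.005, and all accessions and E-values in the example are hypothetical; use the inclusion threshold actually shown on your search page and the hit tables you obtain.

```python
def summarize_iteration(hits, previous_hits=None, threshold=0.005):
    """Summarize one PSI-BLAST iteration.

    hits and previous_hits map accession -> E-value. A hit is treated as
    "better than the threshold" when its E-value is at or below threshold
    (an assumption made for this sketch).
    """
    previous_hits = previous_hits or {}
    significant = {acc: e for acc, e in hits.items() if e <= threshold}
    previously_significant = {
        acc for acc, e in previous_hits.items() if e <= threshold
    }
    # New hits: significant now, but not significant in the previous round
    new_significant = {
        acc: e for acc, e in significant.items()
        if acc not in previously_significant
    }
    ranked_new = sorted(new_significant, key=new_significant.get)
    return {
        "total_hits": len(hits),
        "significant": len(significant),
        "new_significant": ranked_new,
        "best_new": ranked_new[0] if ranked_new else None,
        # No new significant hits after at least one earlier round
        "converged": bool(previous_hits) and not new_significant,
    }


# Hypothetical hit tables for two iterations
iteration1 = {"hit_A": 1e-50, "hit_B": 1e-10, "hit_C": 0.5}
iteration2 = {"hit_A": 1e-60, "hit_B": 1e-20, "hit_C": 1e-4,
              "hit_D": 1e-8, "hit_E": 2.0}
print(summarize_iteration(iteration1))
print(summarize_iteration(iteration2, iteration1))
```

When an iteration reports no new hits better than the threshold, the set of sequences used to build the profile no longer changes, so further iterations would be expected to return the same result.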