Bioe 190
HW 6: Ortholog identification
(Extra credit and/or substitution for Data Science report)
5 points
Estimated time to complete: 2-5 hours.
Summary: Examine the evidence for/against the SwissProt assignment of KCNA1_ONCMY (Q9I829) to the
KCNA1 subfamily. Learn how to use Reciprocal Best BLAST (RBB) search to identify candidate orthologs.
In particular, ask the following questions:
Question 1.
What is the most likely ortholog in the Oncorhynchus mykiss (rainbow trout) genome to
KCNA1_HUMAN (Q09470)?
Question 2.
What is the most likely ortholog in the human genome to KCNA1_ONCMY (Q9I829)?
RBB criterion: Two proteins P1 and P2 (or equivalently, the genes encoding the proteins) in respective
genomes G1 and G2 satisfy the RBB criterion if the top BLAST hit using P1 as a query to score (the proteins
encoded in) G2 is P2, and the top BLAST hit using P2 as a query to score (the proteins encoded in) G1 is P1.
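The RBB criterion can be written down as a small check. The following is only a minimal sketch, assuming each BLAST search has already been reduced to a list of (subject ID, score) pairs; the function names and the example IDs and scores are hypothetical and are not real BLAST results.

```python
def best_hit(hits):
    """Return the subject ID with the highest score, or None if there are no hits.

    `hits` is a list of (subject_id, score) pairs from one BLAST search.
    """
    if not hits:
        return None
    return max(hits, key=lambda h: h[1])[0]


def satisfies_rbb(p1, p2, hits_p1_vs_g2, hits_p2_vs_g1):
    """True if P2 is the top hit for P1 in G2 and P1 is the top hit for P2 in G1."""
    return best_hit(hits_p1_vs_g2) == p2 and best_hit(hits_p2_vs_g1) == p1


# Hypothetical IDs and scores, for illustration only.
forward = [("troutA", 850.0), ("troutB", 620.0)]   # humanX searched against G2
reverse = [("humanX", 845.0), ("humanY", 600.0)]   # troutA searched against G1
print(satisfies_rbb("humanX", "troutA", forward, reverse))
```

Note that this sketch treats each hit as a single protein; it does not account for duplicate database entries that share the top score.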
Background and motivation: Based on the gene name assigned by SwissProt, we would assume that
KCNA1_ONCMY is a member of the same functional subfamily as KCNA1_HUMAN. Normally, but not
invariably, “the same functional subfamily” implies orthology (and vice-versa). In fact, two proteins can be
orthologs and not have the same function (especially if they are not super-orthologs, aka 1-1 orthologs). But if
they are 1-1 orthologs, they will most commonly have the same function (unless there are mutations at key sites
or the species are very distant so that the specific function of the protein is different).
Based on the phylogenetic placement of KCNA1_ONCMY from the SwissProt clustering in HW5, I would
assume that KCNA1_ONCMY has been incorrectly assigned to the KCNA1 functional subfamily, and that the
two proteins are not each other’s orthologs.
In this homework/lab, you will evaluate the evidence for/against the orthology between KCNA1_ONCMY and
KCNA1_HUMAN using a reciprocal best BLAST (RBB) approach.
The first challenge you’ll have is that the NR database includes many duplicate entries (proteins) corresponding
to the same gene; it’s not unusual for protein sequences to be 100% identical (exact matches along their entire
lengths) but have different identifiers/accessions. You can tell that there are duplicate entries when you see
matches in BLAST results that have exactly identical scores. (Note: different very large scores can all give E-values of 0, so look at the scores, not the E-values.) Some near-exact matches (with only 1 or 2 amino acid
differences) also show up; these result in almost identical scores. These are also most commonly either artefacts
of sequencing ambiguities (e.g., base-calling errors) or (possibly) allelic variants or isoforms. (If they
corresponded to different genes, we would expect to see more sequence differences accumulating following a
duplication event, unless the duplication was very recent (in which case they would presumably be ultra-paralogs, using the nomenclature of Zmasek and Eddy).)
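One way to spot duplicate or near-duplicate entries is to group the ranked hits by score. This is a rough sketch only: the `tolerance` parameter is my own assumption for what counts as an "almost identical" score (the handout does not define a cutoff), and the accessions and scores are made up.

```python
def cluster_by_score(hits, tolerance=0.0):
    """Group (accession, score) hits into clusters of similar score.

    Hits are ranked by decreasing score. A hit joins the current cluster if its
    score is within `tolerance` of that cluster's top score; otherwise it starts
    a new cluster. With tolerance=0.0 only exactly identical scores are grouped.
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    clusters = []
    for accession, score in ranked:
        if clusters and clusters[-1]["top_score"] - score <= tolerance:
            clusters[-1]["accessions"].append(accession)
        else:
            clusters.append({"top_score": score, "accessions": [accession]})
    return clusters


# Hypothetical accessions and scores, for illustration only.
hits = [("acc1", 1010.0), ("acc2", 1010.0), ("acc3", 1008.0), ("acc4", 700.0)]
for cluster in cluster_by_score(hits, tolerance=5.0):
    print(cluster["top_score"], cluster["accessions"])
```

Identical scores are only a hint. You still need to check the pairwise alignments to confirm that the sequences in a cluster really are exact or near-exact matches.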
Note: if you need to confirm that apparently different human proteins in the NR database correspond to the same
gene, you can use the UCSC Genome Browser BLAT server https://genome.ucsc.edu/cgi-bin/hgBlat .
There is also a BLAT server to search the rainbow trout genome at
https://www.genoscope.cns.fr/trout/cgi-bin/gbrowse/truite/
Getting started: find the sequence accessions in the NR database for KCNA1_HUMAN and
KCNA1_ONCMY.
Because the NR database includes entries from SwissProt and both KCNA1_ONCMY and KCNA1_HUMAN
are in SwissProt, you’ll find these proteins in the results returned by BLAST against NR. But if you were trying
to find the corresponding (identical) sequences in NR for a protein that was not in one of the databases merged
in NR, you could look for an exact match. To show you how to do this, I’m making that Steps 1 and 2
of this lab. (Also: NCBI commonly displays accessions from other databases, not UniProt accessions, and
you have to dig into the results to find all the accessions that correspond to a sequence.)
Note: Additional accessions are listed near the top of the GenPept record, just under the accession/ID, as “See
[k] more title(s)” (where k is the number of accessions for the same sequence).
Step 1: Identify the sequence accession(s) in the NR database corresponding to (potentially duplicate
entries for) KCNA1_HUMAN.
How to do this: run BLAST vs NR using the sequence for KCNA1_HUMAN as a query, restricting results to
human. (Alternatively, don’t restrict results to human, but when the results return, click on the Taxonomy
Report and view the results in the human genome. You’ll need to scroll down the page to see the human
matches with their pairwise alignments to the query.) You’ll find a cluster of different proteins at the top with E-values of 0, but with different scores (see screenshot).
The first 5 have identical scores; one of these has an accession in the UniProt format (Q09470). If you click on
that accession, you’ll bring up the GenPept page, and you’ll be able to confirm that it is, in fact, the record for
KCNA1_HUMAN. The 6th sequence from the top has a slightly weaker score: there is a single amino acid
change from KCNA1_HUMAN. I assume (but cannot confirm) that this represents an allelic variant or the result
of a sequencing error.
Step 2: figure out the corresponding sequence accession(s) in the NR database for KCNA1_ONCMY.
Repeat the analyses in Step 1, using KCNA1_ONCMY as a query. Record the sequence accessions representing
exact matches.
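If you copy the alignment details for each hit into a small table, the "exact match" test (100% identical along the entire lengths of both sequences) can be sketched as below. The field names (`identities`, `align_len`, `query_len`, `subject_len`) are my own labels for numbers you would read off the pairwise alignment, and the accessions and values shown are invented.

```python
def is_exact_match(hit):
    """True if query and subject are identical along their entire lengths.

    Every aligned position must be identical, and the alignment must cover
    the full query and the full subject (which also rules out gaps).
    """
    return (
        hit["identities"]
        == hit["align_len"]
        == hit["query_len"]
        == hit["subject_len"]
    )


# Hypothetical hits, for illustration only.
hits = [
    {"accession": "acc1", "identities": 300, "align_len": 300, "query_len": 300, "subject_len": 300},
    {"accession": "acc2", "identities": 299, "align_len": 300, "query_len": 300, "subject_len": 300},
    {"accession": "acc3", "identities": 300, "align_len": 300, "query_len": 300, "subject_len": 350},
]
exact = [h["accession"] for h in hits if is_exact_match(h)]
print(exact)
```

In this example, acc2 has a single mismatch and acc3 is identical over the aligned region but is a longer protein, so neither counts as an exact match.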
Question 1: Is there a protein in the Oncorhynchus mykiss genome that satisfies the RBB criterion for
KCNA1_HUMAN?
Step 1: Search the Oncorhynchus mykiss genome with KCNA1_HUMAN as a query.
Record the sequence accession(s) of the top-ranking cluster of hits (clustering hits that have the same alignment
score or are each other’s near-exact matches). Note both the accession provided by default (which is unlikely to
be the UniProt accession) and any UniProt accession(s) for the top-scoring cluster.
Q: Is KCNA1_ONCMY (Q9I829) in the top-ranking cluster? If not, how far down is it in the ranked list?
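To answer "how far down is it", remember that a hit may be displayed under a non-UniProt accession. A minimal sketch, assuming you have recorded for each ranked hit the full set of accessions listed for that sequence (the `acc_*` names are placeholders, not real accessions):

```python
def rank_of(target, ranked_hits):
    """Return the 1-based rank of the first hit whose accession set contains
    `target`, or None if the target is not among the hits.

    `ranked_hits` is a list of sets of accessions, ordered from best to worst.
    """
    for rank, accessions in enumerate(ranked_hits, start=1):
        if target in accessions:
            return rank
    return None


# Placeholder accessions, for illustration only (not real BLAST results).
ranked_hits = [
    {"acc_A"},
    {"acc_B", "acc_C"},
    {"acc_D", "Q9I829"},
]
print(rank_of("Q9I829", ranked_hits))
```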
Note: You can restrict the results to a target genome in the Choose Search Set section of the input form by
typing in the organism name (Oncorhynchus mykiss) or the taxonomic ID/taxid (8022). See screen shot below.
Step 2: Using the top Oncorhynchus mykiss hit as a query, search the human genome.
Record the sequence accession(s) of the top-ranking cluster of hits (clustering hits that have the same alignment
score or are each other’s near-exact matches). Note both the accession provided by default (which is unlikely to
be the UniProt accession) and any UniProt accession(s) for the top-scoring cluster.
From the results of Steps 1 and 2, answer the question: Is there a protein in the Oncorhynchus mykiss genome
that satisfies the RBB criterion for KCNA1_HUMAN?
Question 2: Is there a protein in the human genome that is the RBB match to KCNA1_ONCMY?
Repeat the same analyses as in Question 1, but starting with KCNA1_ONCMY as a query.
From the results of Steps 1 and 2, answer the question: Is there a protein in the human genome that satisfies the
RBB criterion for KCNA1_ONCMY?
Question 3: Try to use one of the orthology prediction webservers (or orthology database) to identify
orthologs in the human genome to KCNA1_ONCMY.
You can start by looking at webservers and databases listed here:
http://questfororthologs.org/orthology_databases.
Combine the findings from these various analyses to answer the question: Is there an unambiguous 1-1 ortholog
in the human genome?