Download Bioe 190 HW6 - Ortholog identification - b

Bioe 190 HW 6: Ortholog identification
(Extra credit and/or substitution for Data Science report)
5 points
Estimated time to complete: 2-5 hours.

Summary: Examine the evidence for/against the SwissProt assignment of KCNA1_ONCMY (Q9I829) to the KCNA1 subfamily. Learn how to use Reciprocal Best BLAST (RBB) search to identify candidate orthologs. In particular, ask the following questions:

Question 1. What is the most likely ortholog in the Oncorhynchus mykiss (rainbow trout) genome to KCNA1_HUMAN (Q09470)?

Question 2. What is the most likely ortholog in the human genome to KCNA1_ONCMY (Q9I829)?

RBB criterion: Two proteins P1 and P2 (or equivalently, the genes encoding the proteins) in respective genomes G1 and G2 satisfy the RBB criterion if the top BLAST hit using P1 as a query to score (the proteins encoded in) G2 is P2, and the top BLAST hit using P2 as a query to score (the proteins encoded in) G1 is P1.

Background and motivation: Based on the gene name assigned by SwissProt, we would assume that KCNA1_ONCMY is a member of the same functional subfamily as KCNA1_HUMAN. Normally, but not invariably, “the same functional subfamily” implies orthology (and vice versa). In fact, two proteins can be orthologs and not have the same function (especially if they are not super-orthologs, aka 1-1 orthologs). But if they are 1-1 orthologs, they will most commonly have the same function (unless there are mutations at key sites or the species are very distant so that the specific function of the protein is different). Based on the phylogenetic placement of KCNA1_ONCMY from the SwissProt clustering in HW5, I would assume that KCNA1_ONCMY has been incorrectly assigned to the KCNA1 functional subfamily, and that the two proteins are not each other’s orthologs. In this homework/lab, you will evaluate the evidence for/against the orthology between KCNA1_ONCMY and KCNA1_HUMAN using a reciprocal best BLAST (RBB) approach.
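The RBB criterion defined above can be written out as a short Python check. This is only a minimal sketch, not part of the assignment: it assumes each BLAST search has already been reduced to a list of (accession, bit score) pairs, and the function names `top_hit` and `satisfies_rbb`, as well as the placeholder accessions and scores, are invented for illustration.

```python
def top_hit(hits):
    """Return the accession with the highest bit score, or None if there are no hits."""
    if not hits:
        return None
    return max(hits, key=lambda hit: hit[1])[0]


def satisfies_rbb(p1, p2, hits_p1_vs_g2, hits_p2_vs_g1):
    """True if P2 is the top hit for P1 in G2 and P1 is the top hit for P2 in G1."""
    return top_hit(hits_p1_vs_g2) == p2 and top_hit(hits_p2_vs_g1) == p1


# Placeholder accessions and scores, made up for illustration only.
p1_vs_g2 = [("G2_protB", 640.0), ("G2_protA", 810.0)]  # query: G1_protX
p2_vs_g1 = [("G1_protX", 795.0), ("G1_protY", 520.0)]  # query: G2_protA

print(satisfies_rbb("G1_protX", "G2_protA", p1_vs_g2, p2_vs_g1))
```

Note that this sketch compares single accessions. With a database such as NR, where one gene can appear under several accessions, the comparison is better made between clusters of equivalent hits than between individual identifiers.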
The first challenge you’ll have is that the NR database includes many duplicate entries (proteins) corresponding to the same gene; it’s not unusual for protein sequences to be 100% identical (exact matches along their entire lengths) but have different identifiers/accessions. You can tell that there are duplicate entries when you see matches in BLAST results that have exactly identical scores. (Note: different very large scores can all give E-values of 0, so look at the scores, not the E-values.) Some near-exact matches (with only 1 or 2 amino acid differences) also show up; these result in almost identical scores. These are most commonly either artefacts of sequencing ambiguities (e.g., base-calling errors) or (possibly) allelic variants or isoforms. (If they corresponded to different genes, we would expect to see more sequence differences accumulating following a duplication event, unless the duplication was very recent, in which case they would presumably be ultraparalogs, using the nomenclature of Zmasek and Eddy.)

Note: if you need to confirm that apparently different human proteins in the NR database correspond to the same gene, you can use the UCSC Genome Browser BLAT server (https://genome.ucsc.edu/cgi-bin/hgBlat). There is also a BLAT server to search the rainbow trout genome at https://www.genoscope.cns.fr/trout/cgi-bin/gbrowse/truite/

Getting started: find the sequence accessions in the NR database for KCNA1_HUMAN and KCNA1_ONCMY. Because the NR database includes entries from SwissProt and both KCNA1_ONCMY and KCNA1_HUMAN are in SwissProt, you’ll find these proteins in the results returned by BLAST against NR. But if you were trying to find the corresponding (identical) sequences in NR for a protein that was not in one of the databases merged in NR, you could look for a hit that matches the query exactly. To show you how to do this, I’m making that Steps 1 and 2 in this lab.
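The advice to look at scores rather than E-values can be turned into a small grouping step. The sketch below is only an illustration under stated assumptions: hits are given as (accession, bit score) pairs, the function name `cluster_by_score` and its `tolerance` parameter are hypothetical, and the accessions and scores are placeholders. An identical score is a hint that two entries are duplicates, not proof, so the pairwise alignments should still be checked.

```python
def cluster_by_score(hits, tolerance=0.0):
    """Group hits whose bit score is within `tolerance` of the cluster's best score.

    Returns a list of (best_score, [accessions]) tuples, best cluster first.
    With tolerance=0.0 only exactly identical scores are grouped.
    """
    ordered = sorted(hits, key=lambda hit: hit[1], reverse=True)
    clusters = []
    for accession, score in ordered:
        if clusters and clusters[-1][0] - score <= tolerance:
            clusters[-1][1].append(accession)
        else:
            clusters.append((score, [accession]))
    return clusters


# Placeholder hits, made up for illustration only.
example_hits = [
    ("ACC_1", 1000.0),
    ("ACC_2", 1000.0),
    ("ACC_3", 998.0),
    ("ACC_4", 500.0),
]

for score, members in cluster_by_score(example_hits):
    print(score, members)
```

Raising `tolerance` slightly (for example, `cluster_by_score(example_hits, tolerance=5.0)`) would also pull near-exact matches into the top cluster; what value is appropriate is a judgment call that depends on the protein length and scoring scheme.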
Also note that NCBI commonly displays accessions from other databases, not UniProt accessions, and you have to dig into the results to find all the accessions that correspond to a sequence. Additional accessions are listed near the top of the GenPept record, just under the accession/ID, as “See [k] more title(s)” (where k is the number of accessions for the same sequence).

Step 1: Identify the sequence accession(s) in the NR database corresponding to (potentially duplicate entries for) KCNA1_HUMAN.

How to do this: run BLAST vs NR using the sequence for KCNA1_HUMAN as a query, restricting results to human. (Alternatively, don’t restrict results to human, but when the results return, click on the Taxonomy Report and view the results in the human genome. You’ll need to scroll down the page to see the human matches with their pairwise alignments to the query.) You’ll find a cluster of different proteins at the top with E-values of 0, but with different scores (see screenshot). The first 5 have identical scores; one of these has an accession in the UniProt format (Q09470). If you click on that accession, you’ll bring up the GenPept page, and you’ll be able to confirm that it is, in fact, the record for KCNA1_HUMAN. The 6th sequence from the top has a slightly weaker score: there is a single amino acid change from KCNA1_HUMAN. I assume (but cannot confirm) that this represents an allelic variant or the result of a sequencing error.

Step 2: Figure out the corresponding sequence accession(s) in the NR database for KCNA1_ONCMY. Repeat the analyses in Step 1, using KCNA1_ONCMY as a query. Record the sequence accessions representing exact matches.

Question 1: Is there a protein in the Oncorhynchus mykiss genome that satisfies the RBB criterion for KCNA1_HUMAN?

Step 1: Search the Oncorhynchus mykiss genome with KCNA1_HUMAN as a query.
Record the sequence accession(s) of the top-ranking cluster of hits (clustering hits that have the same alignment score or are each other’s near-exact matches). Note both the accession provided by default (which is unlikely to be the UniProt accession) and any UniProt accession(s) for the top-scoring cluster.

Q: Is KCNA1_ONCMY (Q9I829) in the top-ranking cluster? If not, how far down is it in the ranked list?

Note: You can restrict the results to a target genome in the Choose Search Set section of the input form by typing in the organism name (Oncorhynchus mykiss) or the taxonomic ID/taxid (8022). See screenshot below.

Step 2: Using the top Oncorhynchus mykiss hit as a query, search the human genome. Record the sequence accession(s) of the top-ranking cluster of hits (clustering hits that have the same alignment score or are each other’s near-exact matches). Note both the accession provided by default (which is unlikely to be the UniProt accession) and any UniProt accession(s) for the top-scoring cluster.

From the results of Steps 1 and 2, answer the question: Is there a protein in the Oncorhynchus mykiss genome that satisfies the RBB criterion for KCNA1_HUMAN?

Question 2: Is there a protein in the human genome that is the RBB match to KCNA1_ONCMY?

Repeat the same analyses as in Question 1, but starting with KCNA1_ONCMY as a query. From the results of Steps 1 and 2, answer the question: Is there a protein in the human genome that satisfies the RBB criterion for KCNA1_ONCMY?

Question 3: Try to use one of the orthology prediction webservers (or orthology database) to identify orthologs in the human genome to KCNA1_ONCMY. You can start by looking at webservers and databases listed here: http://questfororthologs.org/orthology_databases.

Combine the findings from these various analyses to answer the question: Is there an unambiguous 1-1 ortholog in the human genome?
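For the "how far down is it in the ranked list?" question, one option is to work from a downloaded hit table instead of reading ranks off the results page. The sketch below is an illustration, not a required part of the lab. It assumes the hits were saved as a tab-separated table with the standard 12 BLAST tabular columns (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The function names `read_hits` and `cluster_rank` are hypothetical, and every ID and number in the example table is a made-up placeholder.

```python
import csv
import io


def read_hits(table_text):
    """Parse 12-column tabular BLAST output into (subject_id, bit_score) pairs."""
    hits = []
    for row in csv.reader(io.StringIO(table_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hits.append((row[1], float(row[11])))
    return hits


def cluster_rank(hits, accession):
    """Return the 1-based rank of the score cluster containing `accession`.

    Hits with identical bit scores share a rank. Subject IDs are matched by
    substring, which is an assumption about how accessions appear in the table.
    """
    distinct_scores = sorted({score for _, score in hits}, reverse=True)
    for subject, score in hits:
        if accession in subject:
            return distinct_scores.index(score) + 1
    return None


# Placeholder rows, made up for illustration only.
rows = [
    ("QUERY", "HIT_A", "90.0", "495", "49", "0", "1", "495", "1", "495", "0.0", "900"),
    ("QUERY", "HIT_B", "90.0", "495", "49", "0", "1", "495", "1", "495", "0.0", "900"),
    ("QUERY", "HIT_C", "80.0", "495", "99", "0", "1", "495", "1", "495", "0.0", "750"),
]
table = "\n".join("\t".join(row) for row in rows)

hits = read_hits(table)
print(cluster_rank(hits, "HIT_C"))
```

Ranking by distinct score means that five duplicate entries with the same score count as one position in the list, which matches the handout's instruction to treat identically scoring hits as a single cluster.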
