Using the Artemis sequence viewer and annotation tool

About this document

This document is very much “work in progress”, so if you have any comments or suggestions, do not hesitate to contact me.

Installing the software

Artemis can be downloaded from http://www.sanger.ac.uk/Software/Artemis/ . This site also contains installation instructions. The Artemis software can also be downloaded from http://mycor.nancy.inra.fr/IMGC/LaccariaGenome/Annotation/download.html, where you can also download the Artemis documentation.

Getting the correct scaffolds

The available scaffolds are based on the 15 March 2005 assembly. Sequences are split into segments of 1 Mb or smaller, with an overlap of 10 kb. Sequences are named scaffold_XXX_YYY-ZZZ, where XXX is the number of the scaffold in the 20050315 assembly and YYY-ZZZ is the range of the sequence in the original scaffold. To determine which scaffolds contain your gene of interest, you can use the BLAST server at http://mycor.nancy.inra.fr/IMGC/LaccariaGenome/Annotation/blastlaccaria.php

You can download the scaffolds you want to work with from http://mycor.nancy.inra.fr/IMGC/LaccariaGenome/Annotation/scaffold.php?start=0&search=

Getting additional data

I have run tBLASTx of the Laccaria scaffolds against both Coprinus and Cryptococcus. The results are formatted so that you can load them into Artemis to help you visualize the hits. Of course you can expect these files to contain a lot of false positives, but I have found them very helpful nonetheless. For each genome I have made 2 sets available: one filtered to a BLAST e-value of 10^-10 or less and one filtered to an e-value of 10^-50 or less. You can download these files from http:// .

The mapping of the EST data is formatted in the same way. These files can also be downloaded from http:// .

Example

Let’s try to annotate the NADP-dependent glutamate dehydrogenase 2 in Laccaria using the yeast sequence from SwissProt.
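The segment naming scheme described under “Getting the correct scaffolds” is easy to handle programmatically. For example, you may want to translate a position in a downloaded segment back to the original scaffold, or find which segments cover a given scaffold position. The following is a minimal Python sketch and is not part of the tutorial’s tools. The function names are illustrative, and the second segment name in the usage lines (scaffold_4_990001-1990000) is hypothetical. It is included only to show how a 10 kb overlap can produce two hits.

```python
import re

# Segment names follow scaffold_XXX_YYY-ZZZ: XXX is the scaffold number in the
# 20050315 assembly, YYY-ZZZ the range of the segment on that scaffold.
SEGMENT_RE = re.compile(r"^scaffold_(\d+)_(\d+)-(\d+)$")


def parse_segment(name: str) -> tuple[int, int, int]:
    """Return (scaffold number, first base, last base) for a segment name."""
    match = SEGMENT_RE.match(name)
    if match is None:
        raise ValueError(f"not a segment name: {name!r}")
    scaffold, first, last = (int(group) for group in match.groups())
    return scaffold, first, last


def to_scaffold_position(name: str, local_pos: int) -> tuple[int, int]:
    """Convert a 1-based position inside a segment file into
    (scaffold number, position on the original scaffold)."""
    scaffold, first, last = parse_segment(name)
    pos = first + local_pos - 1
    if not first <= pos <= last:
        raise ValueError(f"position {local_pos} lies outside {name}")
    return scaffold, pos


def segments_containing(names: list[str], scaffold: int, pos: int) -> list[str]:
    """Segments covering a scaffold position; because of the overlap this can be 2."""
    hits = []
    for name in names:
        number, first, last = parse_segment(name)
        if number == scaffold and first <= pos <= last:
            hits.append(name)
    return hits


# "scaffold_4_990001-1990000" is a hypothetical neighbouring segment,
# used only to illustrate the overlap.
names = ["scaffold_4_1-1000000", "scaffold_4_990001-1990000"]
print(to_scaffold_position("scaffold_4_1-1000000", 194095))  # (4, 194095)
print(segments_containing(names, 4, 995000))
```

Because scaffold_4_1-1000000 starts at base 1, positions in that file and on scaffold 4 coincide. For any other segment, the offset YYY - 1 has to be added.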
The yeast sequence looks like this:

>DHE5_YEAST (P39708) NADP-specific glutamate dehydrogenase 2 (EC 1.4.1.4) (NADP-GDH 2) (NADP-dependent glutamate dehydrogenase 2)
MTSEPEFQQAYDEIVSSVEDSKIFEKFPQYKKVLPIVSVPERIIQFRVTWENDNGEQEVA
QGYRVQFNSAKGPYKGGLRFHPSVNLSILKFLGFEQIFKNALTGLDMGGGKGGLCVDLKG
KSDNEIRRICYAFMRELSRHIGKDTDVPAGDIGVGGREIGYLFGAYRSYKNSWEGVLTGK
GLNWGGSLIRPEATGFGLVYYTQAMIDYATNGKESFEGKRVTISGSGNVAQYAALKVIEL
GGIVVSLSDSKGCIISETGITSEQIHDIASAKIRFKSLEEIVDEYSTFSESKMKYVAGAR
PWTHVSNVDIALPCATQNEVSGDEAKALVASGVKFVAEGANMGSTPEAISVFETARSTAT
NAKDAVWFGPPKAANLGGVAVSGLEMAQNSQKVTWTAERVDQELKKIMINCFNDCIQAAQ
EYSTEKNTNTLPSLVKGANIASFVMVADAMLDQGDVF

Running ungapped tBLASTn against the Laccaria assembly with the BLOSUM62 matrix and an expect cutoff of 0.0001 gives us this BLAST report:

TBLASTN 2.2.8 [Jan-05-2004]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
Query= DHE5_YEAST (P39708) NADP-specific glutamate dehydrogenase 2 (EC 1.4.1.4) (NADP-GDH 2) (NADP-dependent glutamate dehydrogenase 2)
         (457 letters)

Database: laccaria_genome
           686 sequences; 65,096,429 total letters

Searching..done

                                            Score    E
Sequences producing significant alignments: (bits) Value  N

scaffold_4_1-1000000                           104 e-126  8

>scaffold_4_1-1000000
          Length = 1000000

Score = 104 bits (224), Expect(8) = e-126
Identities = 39/57 (68%), Positives = 47/57 (82%)
Frame = +1

Query: 147    VPAGDIGVGGREIGYLFGAYRSYKNSWEGVLTGKGLNWGGSLIRPEATGFGLVYYTQ 203
              + AGDIG G REIGYLFGAY+  +N + G+LTGKGL WGGS IRPEATG+GL+YY +
Sbjct: 194707 IVAGDIGTGAREIGYLFGAYKKLQNEFVGMLTGKGLAWGGSFIRPEATGYGLIYYVE 194877

Score = 95.7 bits (204), Expect(8) = e-126
Identities = 36/61 (59%), Positives = 48/61 (78%)
Frame = +2

Query: 366    VWFGPPKAANLGGVAVSGLEMAQNSQKVTWTAERVDQELKKIMINCFNDCIQAAQEYSTE 425
              VW+ P KA+N GGVAVSGLEMAQNSQ++ WT ++VDQ+LKKIM  C+  C+ A  ++S E
Sbjct: 195500 VWYAPGKASNCGGVAVSGLEMAQNSQRLAWTTDQVDQKLKKIMAECYEICLSAGTKWSGE 195679

Query: 426    K 426
              +
Sbjct: 195680 E 195682

Score = 83.4 bits (177), Expect(8) = e-126
Identities = 34/38 (89%), Positives = 35/38 (92%)
Frame = +3

Query: 66     QFNSAKGPYKGGLRFHPSVNLSILKFLGFEQIFKNALT 103
              Q+NSA GPYKGGLR HPSVNLSILKFLGFEQ FKNALT
Sbjct: 194409 QYNSALGPYKGGLRLHPSVNLSILKFLGFEQTFKNALT 194522

Score = 79.8 bits (169), Expect(8) = e-126
Identities = 36/61 (59%), Positives = 45/61 (73%)
Frame = +3

Query: 221    VTISGSGNVAQYAALKVIELGGIVVSLSDSKGCIISETGITSEQIHDIASAKIRFKSLEE 280
              V ISGSGNVAQ+ ALKVIELG  V+SLSDSKG +I+E G T E I +I   K++  +LE
Sbjct: 194988 VAISGSGNVAQFTALKVIELGATVLSLSDSKGSLIAEKGYTKEFIKEIGQLKLKGGALES 195167

Query: 281    I 281
              +
Sbjct: 195168 L 195170

Score = 63.4 bits (133), Expect(8) = e-126
Identities = 27/46 (58%), Positives = 35/46 (76%)
Frame = +3

Query: 297    AGARPWTHVSNVDIALPCATQNEVSGDEAKALVASGVKFVAEGANM 342
              AG RPW+ +  V +ALP ATQNEVS  EA+ L+ +GV+ VAEG+NM
Sbjct: 195249 AGKRPWSLLPVVHVALPGATQNEVSKTEAEDLIKAGVRIVAEGSNM 195386

Score = 63.4 bits (133), Expect(8) = e-126
Identities = 22/39 (56%), Positives = 31/39 (79%)
Frame = +3

Query: 28     PQYKKVLPIVSVPERIIQFRVTWENDNGEQEVAQGYRVQ 66
              P Y+K L IV +PER++QFRV WE+D G+ +V +G+RVQ
Sbjct: 194232 PDYEKALEIVQIPERVLQFRVVWEDDQGKAQVNRGFRVQ 194348

Score = 51.1 bits (106), Expect(8) = e-126
Identities = 20/27 (74%), Positives = 22/27 (81%)
Frame = +3

Query: 122    SDNEIRRICYAFMRELSRHIGKDTDVP 148
              SD EIRR C +FM EL RHIG+DTDVP
Sbjct: 194577 SDGEIRRFCTSFMSELFRHIGQDTDVP 194657

Score = 42.5 bits (87), Expect(8) = e-126
Identities = 17/31 (54%), Positives = 21/31 (67%)
Frame = +2

Query: 425    EKNTNTLPSLVKGANIASFVMVADAMLDQGD 455
              E     LPSL+ GAN+A F+ VADAM + GD
Sbjct: 195680 EIKDGVLPSLLSGANVAGFIKVADAMREHGD 195772

Database: laccaria_genome
Posted date: May 4, 2005 3:15 PM
Number of letters in database: 65,096,429
Number of sequences in database: 686

Lambda   K       H
0.315    0.133   0.380

Matrix: BLOSUM62
Number of Hits to DB: 28,044,928
Number of Sequences: 686
Number of extensions: 351513
Number of successful extensions: 43955
Number of sequences better than 1.0e-04: 2
length of query: 457
length of database: 21,698,809
effective HSP length: 55
effective length of query: 402
effective length of database: 21,661,079
effective search space: 8707753758
effective search space used: 8707753758
frameshift window, decay const: 50, 0.5
T: 13
A: 40
X1: 16 ( 7.3 bits)
X2: 32 (14.6 bits)
S1: 41 (21.6 bits)
S2: 96 (46.6 bits)

The first section of the BLAST report tells us two things. Our query sequence is 457 amino acids long (“457 letters”). There is 1 hit with a very good score, in scaffold_4_1-1000000. We download this sequence and save it in the file scaffold_4_1-1000000.embl.

Doing the manual annotation

Before we annotate the gene structure in Artemis, it helps to know what introns we can expect in this genome. This information is available in the file Laccaria bicolor introns.doc, which should be in the same location as the document you are reading. Start Artemis and load the file you just downloaded.
This gives you an Artemis window that looks like this:

The Artemis window is composed of 3 main parts.

The first (top) part gives an overview of the entire sequence. The slider at the bottom lets you scroll through the sequence, and the slider on the right lets you zoom in and out. In the middle of this section you see the nucleotide numbering, with one dark-grey bar above it and one below it. These bars will later contain a graphical representation of our annotation. Above and below these are 3 light-grey bars each, which represent the translation of the sequence in the 6 reading frames. The black lines in these bars indicate stop codons at the corresponding positions.

The second (middle) part of the Artemis window shows a maximally zoomed-in view of the sequence. The sequence and its reverse complement are shown in the middle, and the six-frame translation is given on the 6 bars above and below them. Stop codons are represented by the symbols +, * and #.

The third part of the window shows the currently annotated features. For now it is empty.

To start annotating our gene, we look at the second part of the BLAST report. The text “Expect(8) = e-126” tells us that BLAST combined 8 consistently ordered hits (HSPs) on the genome sequence. Taken together, these hits have an e-value of 10^-126. If the BLAST hits cover the whole gene, it can therefore have at most 8 exons. It may have fewer, because we used ungapped BLAST: a real gap in the alignment is split into 2 HSPs. We browse through all HSPs in this group (all those marked “Expect(8) = e-126”; there is only 1 group in this example) and find the 2 outermost HSPs:
Score = 63.4 bits (133), Expect(8) = e-126
Identities = 22/39 (56%), Positives = 31/39 (79%)
Frame = +3

Query: 28     PQYKKVLPIVSVPERIIQFRVTWENDNGEQEVAQGYRVQ 66
              P Y+K L IV +PER++QFRV WE+D G+ +V +G+RVQ
Sbjct: 194232 PDYEKALEIVQIPERVLQFRVVWEDDQGKAQVNRGFRVQ 194348

and

Score = 42.5 bits (87), Expect(8) = e-126
Identities = 17/31 (54%), Positives = 21/31 (67%)
Frame = +2

Query: 425    EKNTNTLPSLVKGANIASFVMVADAMLDQGD 455
              E     LPSL+ GAN+A F+ VADAM + GD
Sbjct: 195680 EIKDGVLPSLLSGANVAGFIKVADAMREHGD 195772

We noted earlier that our query (yeast) sequence is 457 amino acids long. This last HSP ends at position 455, so it seems to correspond to the end of the last exon. The first HSP only starts at position 28 of the yeast sequence, so we might be missing the first exon.

We’ll first add all the BLAST HSPs to the sequence. From the BLAST report we note the location of each HSP on the assembly:

194707..194877
195500..195682
194409..194522
194988..195170
195249..195386
194232..194348
194577..194657
195680..195772

Enter each of these into Artemis with the menu “Create -> New feature”. In the new window, select BLASTCDS as “key” and fill in the correct coordinates as shown in this figure. Do the same for the remaining 7 features. Once you are a bit more familiar with this process you will want to skip this first step, but for this tutorial I think it’s a good idea.

Now double-click one of the created features in the bottom part of the window, and Artemis will centre the view on this feature. Your screen should look something like this:

To make the discussion easier, I have numbered the 8 HSPs from 1 to 8 in order of their position on the scaffold. There are no stop codons between HSPs 1 and 3, so HSPs 1, 2 and 3 could form 1 exon. From HSP 3 to HSP 4 we move to another reading frame, so there is probably an intron between these HSPs. The same goes for HSPs 4 and 5 and for HSPs 6 and 7. There are stop codons between HSPs 5 and 6, so there will be an intron between these as well. HSPs 7 and 8 overlap, so they will be merged into 1 exon.
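The frame reasoning above can be checked with a few lines of Python. The BLAST report already gives each HSP’s frame. For forward-strand hits, the frame can also be recomputed from the Sbjct start coordinate as (start - 1) mod 3 + 1. This minimal sketch numbers the HSPs 1 to 8 in order of position on the scaffold, as in the discussion. It then flags consecutive HSPs whose frames differ:

```python
# Scaffold coordinates (Sbjct start, end) of the 8 HSPs, in BLAST report order.
HSPS = [
    (194707, 194877), (195500, 195682), (194409, 194522), (194988, 195170),
    (195249, 195386), (194232, 194348), (194577, 194657), (195680, 195772),
]


def forward_frame(start: int) -> int:
    """BLAST-style forward frame (+1, +2 or +3) of an HSP starting at `start` (1-based)."""
    return (start - 1) % 3 + 1


def frame_changes(hsps):
    """Sort HSPs by position, number them from 1, and return the sorted HSPs, their
    frames, and the pairs of consecutive HSPs whose frames differ."""
    ordered = sorted(hsps)
    frames = [forward_frame(start) for start, _ in ordered]
    changes = [(i + 1, i + 2) for i in range(len(ordered) - 1)
               if frames[i] != frames[i + 1]]
    return ordered, frames, changes


ordered, frames, changes = frame_changes(HSPS)
for number, ((start, end), frame) in enumerate(zip(ordered, frames), start=1):
    print(f"HSP {number}: {start}..{end} frame +{frame}")
print("frame changes between HSPs:", changes)  # [(3, 4), (4, 5), (6, 7)]
```

The recomputed frames agree with the Frame lines of the report. The sketch finds the frame changes between HSPs 3 and 4, 4 and 5, and 6 and 7. It cannot detect the intron between HSPs 5 and 6, which share frame +3. For that pair, the in-frame stop codons are what reveal the intron.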
First we check whether one or more ESTs are available for this gene. If so, our job will be a lot easier. To check this, download the GFF file with the Laccaria EST data for this sequence, scaffold_4_1-1000000.lbEST.gff. Load it into Artemis using the “File -> Read an entry...” menu. There are no ESTs matching this gene, so we’ll have to do everything by hand. To see an example of a gene matched by an EST, scroll to position 326000 (or look at the next screenshot). There you can clearly see the intron/exon boundaries.

We can’t use the EST data for this gene, so we can unload the EST GFF file from Artemis by clicking “Entries -> Remove An Entry -> scaffold_4_1-1000000.lbEST.gff”. Go back to our glutamate dehydrogenase by double-clicking one of the BLASTCDS entries in the lower part of the Artemis window.

For the purpose of this tutorial we will annotate this gene in the 3’ -> 5’ direction, contrary to the natural 5’ -> 3’ direction. For most genes, however, we suggest that you start from the 5’ end. This gene makes a nice tutorial example because it contains both “easy” and “hard” parts. Unfortunately, the easy parts, with which we’ll start, are at the 3’ end.

Let’s start with the last intron (between HSPs 6 and 7). The BLAST report shows that HSP 6 stops at position 342 of the yeast sequence, while HSP 7 only starts at position 366. There is therefore probably still some coding sequence between these 2 HSPs that BLAST did not detect. Zoom in on the sequence and try to find the intron. HSP 6 is in frame +3 and HSP 7 is in frame +2, so we need to select a GT...AG pair that respects this. (GT...AG introns seem to be the most common in Laccaria, but there is also a much smaller number of GC...AG introns.) Position 195387 looks like a very good splice donor site.
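Spotting candidate splice sites by eye in the zoomed-in Artemis view amounts to scanning for the intron boundary dinucleotides. The Python sketch below is a hypothetical helper, not an Artemis feature. It lists GT/GC donors and AG acceptors on the forward strand, using the tutorial’s convention that a donor is the first base of the intron and an acceptor the last. It checks only the dinucleotide itself, not the surrounding bases that make a site look “clean”. The example sequence is made up.

```python
def candidate_splice_sites(seq: str, offset: int = 1):
    """Scan the forward strand of `seq` for candidate splice-site dinucleotides.

    `offset` is the scaffold coordinate of seq[0]. A donor is reported as the
    first intron base (the G of GT/GC), an acceptor as the last intron base
    (the G of AG).
    """
    seq = seq.upper()
    donors: list[tuple[int, str]] = []
    acceptors: list[int] = []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in ("GT", "GC"):  # GT donors are common, GC donors much rarer
            donors.append((offset + i, pair))
        elif pair == "AG":
            acceptors.append(offset + i + 1)  # coordinate of the G
    return donors, acceptors


# Made-up toy sequence whose first base is at scaffold position 100.
donors, acceptors = candidate_splice_sites("CAGGCAAGTTTCAGG", offset=100)
print(donors)     # [(103, 'GC'), (107, 'GT')]
print(acceptors)  # [102, 107, 113]
```

In a real region such a scan returns many candidates, especially GC pairs. These candidates still have to pass the intron length and reading-frame checks described in this tutorial.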
Most Laccaria introns are 40-70 nt long, so only 2 splice acceptors are possible: 195439 and 195452. The potential splice acceptors at 195473, 195485 and 195521 would give unusually long introns. The intron must also respect the reading frames of the 2 flanking exons. Acceptor 195439 gives an intron of 53 nt and 195452 one of 66 nt. Going from frame +3 to frame +2 requires an intron length that is 1 less than a multiple of 3. This disqualifies all potential splice acceptors except 195439.

If you find 2 AGs within a few nucleotides of each other, select the most 5’ one, because of the way splicing works. The spliceosomal machinery binds to the branch point and from there scans the sequence in the 5’->3’ direction for the AG splice acceptor. Because of this scanning mechanism, the first AG is usually the one selected.

We select these nucleotides: click on nucleotide 195387 and drag the mouse to nucleotide 195439. Now turn the selection into an intron by clicking “Create -> Create Feature From Base Range”. Change the key to “intron” and click OK. Your Artemis window should now look something like this:

Let’s include the last exon right away. The BLAST report suggested that the last HSP lies close to the end of the gene. Indeed, there is a stop codon 2 codons beyond the end of this HSP. Select the nucleotides from 195440 to 195781 (click on 195440, scroll to the right and shift-click on 195781). Make this an exon by clicking “Create -> Create Feature From Base Range”. Change the key to “exon” and click OK.

Now let’s look at the intron between HSPs 5 and 6. The BLAST report shows that HSP 5 stops at position 281 of the yeast sequence and HSP 6 starts at 297. Again we expect to include a few amino acids that BLAST did not detect. There is a stop codon before HSP 6, so we expect to find most of these missing amino acids by extending HSP 5. This stop codon also means there is only 1 valid splice acceptor near HSP 6: 195252.
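The two criteria used to pick 195439 can be written as a small filter: an intron length of roughly 40-70 nt, and a reading frame that stays intact across the splice. In this Python sketch (function names are illustrative), the frames are the BLAST forward frames of the flanking exons. Removing an intron of length L shifts the downstream codons back by L bases. The frames therefore line up only if L = frame_down - frame_up (mod 3). The 40-70 nt window is the rule of thumb from the text, not a hard limit.

```python
def intron_fits(donor: int, acceptor: int, frame_up: int, frame_down: int,
                min_len: int = 40, max_len: int = 70) -> bool:
    """True if the intron donor..acceptor (1-based, inclusive) passes both checks.

    frame_up / frame_down are the forward frames (1, 2 or 3) of the exons
    before and after the intron, as reported by BLAST.
    """
    length = acceptor - donor + 1
    # Rule of thumb: most Laccaria introns are 40-70 nt long.
    if not min_len <= length <= max_len:
        return False
    # Downstream codon starts move back by `length` bases; they must land in
    # the upstream frame: length == frame_down - frame_up (mod 3).
    return (length - (frame_down - frame_up)) % 3 == 0


def filter_acceptors(donor, acceptors, frame_up, frame_down):
    """Keep the acceptors that give an acceptable intron with a fixed donor."""
    return [a for a in acceptors if intron_fits(donor, a, frame_up, frame_down)]


def filter_donors(donors, acceptor, frame_up, frame_down):
    """Keep the donors that give an acceptable intron with a fixed acceptor."""
    return [d for d in donors if intron_fits(d, acceptor, frame_up, frame_down)]


# Intron between HSP 6 (frame +3) and HSP 7 (frame +2), donor at 195387.
print(filter_acceptors(195387, [195439, 195452, 195473, 195485, 195521], 3, 2))  # [195439]
```

The same functions can be applied to the donors of the intron between HSPs 5 and 6, where the acceptor is 195252 and both exons are in frame +3. `filter_donors([195167, 195198, 195202, 195214], 195252, 3, 3)` returns `[195202]`. Here 195167 is rejected by the length window (an 86-nt intron) rather than because it lies inside HSP 5.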
For splice donors we can choose from 195167, 195198 (a GC splice donor), 195202 and 195214. We think we should extend the HSP, so we don’t choose 195167, which lies inside HSP 5. Combined with our splice acceptor, 195198 would not put us in the correct reading frame. 195214 would give an intron of only 39 nucleotides (195214..195252). We therefore choose 195202. Select the intron (195202..195252) and annotate it as before. You can now also select and annotate the second-to-last exon (195253..195386).

Next is the intron between HSPs 4 and 5. Zoom in on this area. Your Artemis window now looks like this:

The BLAST report tells us that HSP 4 stops at position 203 of the yeast sequence while HSP 5 only starts at 221, so again we expect to have to extend the HSPs. Position 194881 has the only clean splice donor. There are 4 splice acceptors that put us in the right reading frame: 194936, 194960, 194975 and 194981. With no other information, we simply select the first one and check our protein later by aligning it to known homologs. This is also the only one that gives an intron in the usual 40-70 nt range (194881..194936, 56 nt).

Let’s move on to the intron between HSPs 3 and 4. The BLAST report shows that HSP 3 stops at position 148 of the yeast sequence while HSP 4 starts at 147, so the two HSPs overlap by 2 residues. This means we’ll probably have to remove a few amino acids from the HSPs. The only valid combination here seems to be 194659..194713, so we select this as our intron.

HSPs 1, 2 and 3 could form 1 large exon because there are no stop codons between them. To make sure this is correct, we look at our BLAST report again. The locations of these HSPs on the yeast sequence are:

28..(HSP 1)..66
66..(HSP 2)..103
122..(HSP 3)..148

This leads us to think that HSPs 2 and 3 indeed form one exon. The 18 yeast residues missing between them (104-121) correspond exactly to the 54 nt that separate them on the scaffold. HSPs 1 and 2, however, meet on the yeast sequence (both include residue 66) but are 60 nt apart on the scaffold, so there is probably an intron between them. If we stay close to the HSP boundaries, we see only one valid donor/acceptor combination: 194349..194411. After annotating this intron and the exons, your Artemis window should look like this:

One HSP to go.
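The comparison between HSP positions on the yeast and scaffold sequences can be made explicit. The Python sketch below is an illustrative heuristic, not a tool used elsewhere in this tutorial. It counts the scaffold bases between two consecutive forward-strand HSPs that are not accounted for by the yeast residues between them, at 3 bases per residue. Values near 0 suggest the HSPs lie in one exon, while a few dozen bases suggest an intron. The Laccaria and yeast proteins need not have the same length, so the value is only a rough estimate of the intron length.

```python
from typing import NamedTuple


class Hsp(NamedTuple):
    q_start: int  # first query (yeast) residue
    q_end: int    # last query residue
    s_start: int  # first scaffold base (forward strand)
    s_end: int    # last scaffold base


def unexplained_bases(left: Hsp, right: Hsp) -> int:
    """Scaffold bases between two consecutive HSPs that are not accounted for
    by the query residues between them (3 bases per residue)."""
    query_gap = right.q_start - left.q_end - 1     # negative if the HSPs overlap on the query
    scaffold_gap = right.s_start - left.s_end - 1  # bases between the HSPs on the scaffold
    return scaffold_gap - 3 * query_gap


# HSPs 1, 2 and 3, taken from the BLAST report.
hsp1 = Hsp(28, 66, 194232, 194348)
hsp2 = Hsp(66, 103, 194409, 194522)
hsp3 = Hsp(122, 148, 194577, 194657)
print(unexplained_bases(hsp1, hsp2))  # 63
print(unexplained_bases(hsp2, hsp3))  # 0
```

For HSPs 1 and 2 this gives 63 bases, the same length as the intron 194349..194411 chosen above. For HSPs 2 and 3 it gives 0, consistent with a single exon.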
The BLAST report showed us that this is probably not the real start of the gene. In the sequence before this HSP there are no methionine codons in the same reading frame, so we are probably missing an intron and an exon. In the upstream sequence we find a methionine at position 194130 and another one at 194095. There are multiple splice acceptors, and for both methionine codons we can find a splice donor, but never a very clean one. Clearly this is going to be our most difficult intron, so we’ll need a little help.

We’ll load the Coprinus tBLASTx data to help find the first exon. We use the file filtered to an e-value of 10^-50, since it contains fewer false positives. To load this file, select “File -> Read An Entry...” and choose the file scaffold_4_1-1000000.coprE-50.gff. Your Artemis window will now look like this:

You will notice at the top of the window that a new entry has been loaded. You can turn the display of each individual entry on or off by clicking the check box in front of its name. With this information it’s clear which methionine we should select as the start site (the one at 194095). It even seems we can simply use the borders of these 2 tBLASTx hits as our intron borders. The splice donor site does not look perfect, but we can’t find a better one in this area. We annotate 194095..194136 as our first exon and 194137..194183 as our first intron.

Now have another look at the tBLASTx data. Uncheck the first entry and then check it again. This trick makes sure the first entry is drawn on top in the Artemis window. Your window now looks like this:

Our annotation agrees very well with the tBLASTx data. It looks like we were right that HSPs 2 and 3 belong to the same exon. We also seem to have been right to extend HSP 5 in both directions and HSP 7 at its 5’ end. Of course, we’ll need to check this more carefully later.
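The EST and tBLASTx files are loaded into Artemis as GFF entries. Outside Artemis, a few lines of Python are enough to check whether any of their features fall within a region, such as the 194095..195781 span of this gene. This sketch assumes the files are standard tab-separated GFF with 9 columns (seqid, source, feature, start, end, score, strand, frame, attributes). The exact content of the tutorial’s files is not described here, and the two demo lines are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class GffFeature:
    seqid: str
    source: str
    feature: str
    start: int
    end: int
    strand: str
    attributes: str


def read_gff(lines: Iterable[str]) -> Iterator[GffFeature]:
    """Yield features from tab-separated 9-column GFF lines, skipping comments and blanks."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature line
        yield GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                         cols[6], cols[8])


def overlapping(features: Iterable[GffFeature], start: int, end: int) -> list[GffFeature]:
    """Features whose start..end range overlaps the region start..end (inclusive)."""
    return [f for f in features if f.start <= end and f.end >= start]


# Reading one of the tutorial's files would look like this (if it is standard GFF):
# with open("scaffold_4_1-1000000.coprE-50.gff") as fh:
#     hits = overlapping(read_gff(fh), 194095, 195781)

# Two made-up lines, for illustration only.
demo = [
    "##gff-version 2",
    "scaffold_4_1-1000000\tdemo\tsimilarity\t194090\t194140\t.\t+\t.\tnote made-up",
    "scaffold_4_1-1000000\tdemo\tsimilarity\t300000\t300300\t.\t+\t.\tnote made-up",
]
for f in overlapping(read_gff(demo), 194095, 195781):
    print(f.start, f.end, f.feature)  # 194090 194140 similarity
```

The same check works for the EST file (scaffold_4_1-1000000.lbEST.gff). An empty result corresponds to the situation described for this gene, where no ESTs match.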
That’s it for the tBLASTx data, so you can switch it off or remove the entry with “Entries -> Remove An Entry -> scaffold_4_1-1000000.coprE-50.gff”.

Now that all our exons are annotated, we can combine them into a CDS. Select all the exons: click on the first one and then, while holding Shift, click on the others. Combine them by clicking “Edit -> Merge Selected Features”. Confirm that you want to do this, and don’t let Artemis remove the old features (for now). In the lower part of the Artemis window you will notice a new exon feature spanning 194095 to 195781. We need to change it into a CDS. Select this feature (if it isn’t already selected) and click “Edit -> Edit Selected Features”. Change the key to CDS and click OK. Your Artemis window now looks like this:

Quality Control

Now it’s time to check whether we really annotated this gene correctly. Select the CDS we just annotated and show its sequence by selecting “View -> View Amino Acids Of Selection As FASTA”. Select the sequence and copy it to the clipboard. Go to http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_server.html, choose Blast search, and paste your sequence in the window. Choose BlastP and the non-redundant protein sequence database (you may choose Uniprot-SwissProt instead, especially if your protein is well known). You may give your sequence a name (optional); otherwise it will appear as “unknown_XXXX”. Click on submit and wait a bit. A page with a graphic display and a list of homologous sequences appears. Most of the time there are too many of them. Keep only the top of the list by typing an appropriate E-value threshold in the box below the graph and clicking the “select” button. Scroll to the bottom of the page, click on “Add query sequence to created database”, then click on the Extract button. It may take a while before the next page (“work with protein sequence databank”) appears. Click on Align.
On the next page (“ClustalW”), click on the submit button. You may first want to choose a larger window, e.g. 100.

The multiple alignment shows that our annotation agrees very well with the other sequences except in 1 area: the “CRA” in our annotation seems to be an insertion compared to the other basidiomycetes. Let’s go back to our Artemis window and find this sequence. It lies on the border of the first and second exon. We can’t remove the CRA sequence itself from the protein, since we don’t have a suitable splice donor and acceptor for this. We can, however, remove “ACR” from the end of the first exon. This gives exactly the same protein sequence and lets us keep our splice acceptor site. We also find an acceptable splice donor at position 194128, which is one of the rather rare GC splice donors! Modify the annotation of the first exon by clicking it and selecting “Edit -> Edit Selected Features”, and change its end point to 194127. Now modify the intron so that it starts at position 194128.

We have now defined the correct intron/exon structure of this gene. We still need to remove the old CDS: select it and click “Edit -> Remove Selected Features”. Then make a new CDS by combining the exons as explained earlier. This time we won’t be modifying the exons any more, so you can remove the old features when Artemis asks. We also won’t need our original BLAST HSPs any more, so you can remove these as well.

Another common problem is finding the correct start codon. With this alignment method it will usually be very obvious if you selected the wrong ATG.

Now we can finish the annotation of this gene by adding some information to the CDS feature. Select the CDS feature and click “Edit -> Edit Selected Features”. The first thing we’ll add is the name of the gene. Next to the “Add Qualifier” button, select the qualifier “gene” and then push the “Add Qualifier” button. The text /gene="" will appear in the text area below. Enter the gene name (gdhA) between the quotes.
Please use the accepted name(s) for this gene. If there is more than one, select the one used for yeast/fungi. This name usually consists of 3 letters plus a capital letter or a number (e.g. GlnA, Nia1) and can usually be found in the entries returned by your BLAST search. Likewise, we add the “product” qualifier (NADP-dependent glutamate dehydrogenase). We also include the best results from our BLASTp against SwissProt by adding the “blastp_match” qualifier:

gi|1706405|sp|P54388|DHE4_LACBI NADP-specific glutamate deh... 834 0.0
gi|41017051|sp|Q96UJ9|DHE4_HEBCY NADP-specific glutamate de... 735 0.0
gi|1706404|sp|P54387|DHE4_AGABI NADP-specific glutamate deh... 714 0.0

Finally, don’t forget to add your name to the annotation so we can keep track of who did what. Use the “curation” qualifier for this. Have a look at the other available qualifiers as well. You can find a partial description at http://www.ncbi.nlm.nih.gov/collab/FT/#7.4 .

1. Keep as close as possible to the conventions:
- Give the biochemical function when there is evidence for it, using SwissProt-style nomenclature (the EC nomenclature for enzymes).
- If only a cellular function is known, check whether it could apply to Laccaria! Otherwise use terminology indicating where the function applies, or append “like” to the description that is valid for the other organism.
- Don’t overdo it! BLAST will often point to a very specific function, such as cadmium transporter, uridine diphosphate-N-acetylglucosamine transporter or 6-phosphogluconolactonase. Check whether it really applies, or whether it should be broadened to a more general description: metal transporter, nucleotide-sugar transporter, etc.
In our example, we are fairly sure that this is indeed an NADP-dependent glutamate dehydrogenase, because our BLAST search found many hits to other NADP-dependent glutamate dehydrogenases. Otherwise we could have used “glutamate dehydrogenase”.
2. Only leave the name as it is if the gene has been proven to exist (cognate cDNA and ESTs).
Otherwise add “putative” or “probable”, depending on how likely it is that the gene is the one you found, e.g. “glutamate dehydrogenase, probable”.

The Artemis feature edit window should now look something like this:

Finally, I suggest you keep as much information as possible in a Word file on your own computer. This information should include:
- the location and annotation of the gene, which you can get from the Artemis Feature Edit window (see the previous screenshot);
- the DNA and protein sequences of your gene (select the CDS and choose “View -> View Bases Of Selection As Fasta” or “View Amino Acids Of Selection As Fasta”);
- relevant information from the BLAST reports;
- the alignment you made in the quality control step;
- basically anything else you think might be important.
All this information will be important if we update the genome and there are problems remapping previously annotated genes.

Click OK and don’t forget to save your file (File -> Save An Entry). You can upload your annotated file to http://
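Artemis builds the CDS location itself when the exons are merged. As an independent consistency check on the final structure, the arithmetic can be reproduced in Python. This sketch uses the intron coordinates chosen in this tutorial, with the first intron corrected to start at 194128, and the CDS span 194095..195781. It derives the exons, writes them as a feature-table `join(...)` location and checks that the spliced length is a multiple of 3. Passing this check is necessary but not sufficient for a correct gene model. The function names are illustrative.

```python
def exons_from_introns(cds_start: int, cds_end: int,
                       introns: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Exons (1-based, inclusive) of a forward-strand CDS spanning cds_start..cds_end
    after removing the given introns."""
    exons = []
    pos = cds_start
    for intron_start, intron_end in sorted(introns):
        exons.append((pos, intron_start - 1))
        pos = intron_end + 1
    exons.append((pos, cds_end))
    return exons


def join_location(exons: list[tuple[int, int]]) -> str:
    """Feature-table location string, e.g. join(1..10,20..30), for a forward-strand feature."""
    parts = ",".join(f"{start}..{end}" for start, end in exons)
    return f"join({parts})" if len(exons) > 1 else parts


# Intron coordinates chosen in this tutorial (first intron after the correction).
INTRONS = [(194128, 194183), (194349, 194411), (194659, 194713),
           (194881, 194936), (195202, 195252), (195387, 195439)]

exons = exons_from_introns(194095, 195781, INTRONS)
cds_length = sum(end - start + 1 for start, end in exons)
print(join_location(exons))
print(cds_length, "nt; multiple of 3:", cds_length % 3 == 0)  # 1353 nt; multiple of 3: True
```

The 1353 nt correspond to 451 codons, including the stop codon at the end of the last exon.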