Basic sequence analyses and submission

BIT150 - Lab 1 Sequence Analysis

Remember:
Door code: 93175
Log in: 150student
Password: $Jorge1
Log on to: Bioinfolab

Introduction

Using Triticum monococcum L. genomic DNA, we subcloned a 4,116-bp Hind III fragment into a pBluescript II SK vector. This vector has M13 forward (F) and reverse (R) priming sites, one on each side of the Hind III cloning site. Commercial primers M13F and M13R were used to start sequencing the cloned fragment (chromatograms M13_F and M13_R). However, the fragment is too long to be sequenced in a single reaction, so primer walking was used to complete the sequence. Using the sequence obtained with the M13 primers, new primers were designed (F1 and R1) and used to extend the sequence. The new sequences were then used to design primers F2 and R2. Finally, primer F3 was used to close the last gap (Figure 1).

[Figure 1. Primer-walking strategy: positions of the forward reads (M13_F, F1, F2, F3) and reverse reads (M13_R, R1, R2) relative to the vector and the cloned fragment.]

Sequences are available on the class website (BIT150), and also in the Z: drive/10_Lab1 directory. You can read but not write on the ‘Z:’ drive, but you can both read and write on the ‘C:’ drive. On the C: drive, create a directory with your last name inside the class directory (BIT150), and copy the directory Lab1 from Z: into it. Take a copy with you for the homework (Hwk1).

Objective: Manually prepare a full-length integrated sequence of the T. monococcum fragment, without vector, without sequencing errors, and without duplicated overlapping sequences.

Activities:

1. Use Chromas to open and inspect the chromatograms

Chromas is a chromatogram-viewing program that can display chromatograms in both forward and reverse complement orientation. Like all the software used in this class, Chromas is pre-installed on your lab computers, and a copy of the freeware version is included on the course CD that was distributed in the first class. You can start this and the other programs by clicking on START -> Programs -> Bioinformatics (you can also create shortcuts on the Desktop).
In the Chromas window click on Open. Chromas will open SCF and Applied Biosystems sequencing files. Here we will use Applied Biosystems files with the extension .ab1. Click on the .ab1 file you want to open, and then click on Open.

Once the chromatogram is open, browse through the length of the sequence and check its quality. Compare the quality at the beginning and the end of the sequence with the quality in the middle.

o The chromatogram files are the files with the extension .ab1.
o Sharp, equally spaced peaks in the chromatogram indicate a high-quality read.
o Determining sequence quality through visual inspection is highly subjective. However, quality-scoring software sometimes performs poorly on otherwise good sequence because of sequencing artifacts, irregular spacing of the peaks, etc. Visual inspection is therefore still considered the most reliable method for deciding on sequence quality. You can use the examples below as guidelines for your own decisions:

[Example chromatograms: High quality, Low quality, Low quality]

2. Use Chromas to convert the chromatograms into FASTA format and copy them to the MBCS Add-In Word document in your Lab1 directory.

In Chromas, Chromatogram Editor, click on Edit -> Copy Sequence -> FASTA format.

Open the MBCS Add-In Word document. A Security Warning should appear at the top of the document. Click the Options button and select Enable this Content. Paste the sequence into the MBCS Add-In Word document.

The M13_R, R1 and R2 sequences are in reverse complement orientation relative to the forward sequences (they are sequences from the other DNA strand). To put all the sequences in the same orientation, you need to reverse complement them (e.g., AGCTT becomes AAGCT). It might be easier to first align M13_R, R1 and R2 with each other and then, at the end, reverse complement the corrected contig.

To reverse complement a sequence, select the sequence in the MBCS Add-In document.
Select Add-Ins -> MBCS1.2 -> Sequence Manipulation -> Antisense DNA/RNA Sequence. (Two other sequence tools listed here are similar but not the same. Reverse lists the bases backwards. Complement lists the complementary bases. Antisense both reverses and complements the sequence. It is important that you select the correct option.) In the sequence manipulation window you can choose to copy the new sequence to the clipboard or to insert it at the end of your sequence. Be sure to indicate that the sequence has been reverse complemented by adding ‘RC’ to its name. The Courier New font is useful for aligning sequences because every character has the same width.

An alternative way to reverse complement the sequence is to use Chromas to reverse complement the chromatogram before exporting the sequence. If you use Chromas to reverse the sequence, it is doubly important to note the orientation after pasting it into the Word document, because Chromas does not change the name of the sequence when it is reversed. This approach is useful, however, for viewing the chromatogram in the same orientation as the reversed sequence.

3. Identify restriction sites in your sequence using the web tool NEBcutter by New England Biolabs. http://tools.neb.com/NEBcutter2/index.php

Paste your sequence into the NEBcutter window. Select All Commercially available specificities and click Submit.

To identify the sites of a specific enzyme, click Custom Digest under Main options in the results window. Check the boxes of the enzymes you are interested in and click Digest.

To transfer figures from the screen to your Word documents: press the Prnt Scrn key on your keyboard to copy the screen image to the clipboard. Open Start/Programs/Accessories/Paint and select Edit/Paste (or Ctrl V). Select the cut tool on the left, mark the region you want to cut, copy it (Ctrl C), and paste it into your Word document.

A custom digest for BamHI on the assembled 4,116-bp sequence is shown below.

[Figure: NEBcutter custom digest for BamHI]

4. Trim the vector sequence from the M13_F and M13_R sequences using NCBI's BLAST-VecScreen (http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html).

Paste in the M13_F sequence and click on RunVecScreen. Find the region corresponding to vector. Identify the Hind III cloning site, AAGCTT, and eliminate the vector sequence but not the cloning site. Highlight the cloning site. Repeat the same process with the M13_R sequence. The other sequences (F1, F2, F3, R1 and R2) do not contain vector because their primers were designed within the cloned segment.

5. Use BLAST 2 sequences to align M13_F with F1 http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi (create a bookmark for this site)

Copy the M13_F sequence into the Sequence 1 window and the F1 sequence into the Sequence 2 window. Deselect Filter, and click on Align.

Identify the overlapping sequence between M13_F and F1, and highlight it in both sequences in your Word document (remember to use the Courier New font). The two sequences come from the same molecule, so they should be identical in the overlap. Any differences between them are sequencing errors. Examine the chromatograms to decide which base is correct at each difference observed in the overlapping sequence. Eliminate the duplicated region generated by the overlap and create a combined clean sequence (without vector and without sequencing errors).

6. Repeat the process with the other chromatograms until you assemble a complete clean sequence.

Sequence Submission to NCBI

Introduction

An important part of working with genomic or protein sequences is the submission of the final sequence to the central databases (GenBank for the US). The software tool developed by NCBI for submitting and updating entries in the GenBank, EMBL, or DDBJ sequence databases is called Sequin, and it is freely available at http://www.ncbi.nlm.nih.gov/Sequin/. For a Sequin tutorial, go to SEQUIN at NCBI.

Objective: Prepare a sequence submission to GenBank using Sequin.

Activities:

1. Use Sequin to prepare a sequence submission to GenBank

For this assignment you have the genomic DNA sequence (.txt) from barley (Hordeum vulgare L.), the protein translation (.txt), and the annotated genomic sequence (.doc) for the Acyl Co-A Synthetase. They are in the subdirectory named ‘Sequin Acyl Co-A Synthetase’ within the Lab1 directory. Copy the files into the directory you created on the C: drive.

1.1. Start Sequin.

1.2. Enter your personal information as submitter. Request that the sequence be released 1 year from today. Move to the next form.

1.3. Load the ‘proper’ Co-A data file(s) into Sequin and move to the next form. The FASTA genomic DNA sequence is in the file Co-A_DNA.txt and the protein sequence is in Co-A_Protein.txt (note that the protein sequence should NOT end with the asterisk that represents the stop codon). For both .txt files, note that the SeqID after the ‘>’ symbol in the definition line must not contain any spaces. The final annotation of the sequence is in the Word document called ‘Final annotation.doc’. Sequin will format and annotate the sequence using automated programs (called macros) to determine exon locations, etc. Check whether the exon coordinates are correct using the Tools/Word Count option in your annotated Word document.

1.4. To get the taxonomic information, go to NCBI, select the Taxonomy database and search for Hordeum vulgare. In Sequin, click on the line Unclassified in the Organism field. Select the Lineage tab and paste the lineage into the Taxonomic Lineage box:

Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliophyta; Liliopsida; commelinids; Poales; Poaceae; BEP clade; Pooideae; Triticeae; Hordeum

1.5. Click on Search -> Validate to check the correctness of the automated annotation. If the submission is invalid, you can correct it manually by clicking on the error shown and completing the requested information.

1.6. Once you have fixed the error, click on Revalidate.

1.7. To save your document, click on File -> Export GenBank (you will then be able to open this file as a Word document). The complete and validated GenBank file should be submitted as part of Homework 1.
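
The two formatting rules mentioned in step 1.3 (no space in the SeqID after the ‘>’ symbol, and no asterisk at the end of the protein sequence) can be checked before the files are loaded into Sequin. The following is a minimal Python sketch and is not part of Sequin or of the course software. The function name check_fasta is hypothetical, and the sketch assumes that each .txt file holds a single FASTA record.

```python
def check_fasta(text, is_protein=False):
    """Check one FASTA record against the two rules from step 1.3.

    Returns (seq_id, problems). seq_id is the text between '>' and the
    first whitespace, or None if no usable SeqID was found. problems is
    a list of human-readable messages (empty if nothing was found).
    Hypothetical helper, written for illustration only.
    """
    lines = [line.rstrip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        return None, ["first line is not a definition line starting with '>'"]

    problems = []
    header = lines[0][1:]
    if not header or header[0].isspace():
        # A space right after '>' leaves the record without a clean SeqID.
        problems.append("SeqID must follow '>' directly, with no space")
        seq_id = None
    else:
        # Everything after the first space is read as description, not as
        # part of the SeqID, so confirm this is the identifier you intended.
        seq_id = header.split()[0]

    sequence = "".join(lines[1:]).replace(" ", "")
    if not sequence:
        problems.append("no sequence found after the definition line")
    if is_protein and sequence.endswith("*"):
        problems.append("protein sequence ends with '*' (stop codon); remove it")
    return seq_id, problems


if __name__ == "__main__":
    # Toy record, not the real Co-A protein sequence.
    print(check_fasta(">Co-A_Protein\nMKTAYIAK*", is_protein=True))
```

To check the real files, read each one and pass its contents to the function, for example with open("Co-A_DNA.txt").read() for the DNA file and is_protein=True for Co-A_Protein.txt. Always compare the returned SeqID with the identifier you meant to use, because a space inside the intended name silently shortens it.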

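The reverse-complement step (Activity 2) and the removal of duplicated overlaps (Activity 5) from the sequence analysis part of this lab can also be sketched in Python. This is an illustration under simplifying assumptions, not the method used in class. The function names are hypothetical, and merge_overlap only joins reads whose overlap matches exactly, so it assumes that the sequencing errors have already been corrected against the chromatograms.

```python
# Translation table for the four bases plus N, in upper and lower case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_overlap(left, right, min_overlap=20):
    """Join two reads from the same strand into one sequence.

    Looks for the longest suffix of `left` that is identical to a prefix
    of `right` and keeps that overlap only once. Returns None when no
    exact overlap of at least `min_overlap` bases exists.
    Hypothetical helper: real reads usually need an alignment (as done
    with BLAST 2 sequences) because of sequencing errors.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


if __name__ == "__main__":
    # Toy reads, not the real M13_F / F1 data.
    forward_read = "AAGCTTGGATCCAT"
    next_read = "GGATCCATTTGACA"
    print(reverse_complement("AGCTT"))
    print(merge_overlap(forward_read, next_read, min_overlap=6))
```

A read from the other strand (M13_R, R1 or R2) would first be passed through reverse_complement and then merged in the same way. The default min_overlap of 20 is an arbitrary choice for this sketch; a short minimum makes chance matches between unrelated reads more likely.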