Perl exercise 4 (Due on 07/12/2010)

Don't forget to write well-organized scripts, to use meaningful variable names, and to write comments about what you are doing in each part of the code. Always test your script on several examples. If you need more biological sequences, search for appropriate examples in GenBank.

Note that the obligatory questions are 1, 2, 3a and 5.

Pattern Matching

1. Write the following regular expressions (test them in a script, but send us just the regular expressions):
   a. Match a number which is composed only of even digits, including 0 (don't allow 0 to be the first digit!).
      For example, it should match: 248, 4200, 6
      Should not match: 100, 020, 5
   b. Match a number which may be negative or positive and may have a decimal point, but must be smaller than 1000 and larger than -1000.
      For example, it should match: -132, 3.1415, 0
      Should not match: 1000, -2001
   c. Match a number like in (b), but if there is a decimal point, the digits after it must be 0 or 5 (any number of '0' or '5' is allowed).
      For example, it should match: -132, 3.5555, 22.0505
      Should not match: 1000, 2.27, -2.5050502
   d. Match an RNA sequence that begins with "AUG" and ends with one of "UAA", "UAG" or "UGA". Both upper case and lower case letters are allowed.

2. Write a script that reads a phone book file: name, address and phone number, separated by semicolons (;). Here are two examples:
      Yael Ginsburg;19 Herzel St. Dimona;08-3792999
      Rahel Levi;36 Yefet St. Tel Aviv-Yafo;03-6447338
   Print out all family names in the 03 region.

3. Write a script that reads a FASTA file of DNA sequences and finds the first open reading frame in each one, if one exists.
   Note: a reading frame starts with "ATG", contains any number of codons (nucleotide triplets) and ends with one of "TAA", "TAG" or "TGA".
   a. Print the coding sequence you found.
   b*. Print the positions of the beginning of the methionine codon and of the last nucleotide of the stop codon.

4*. Now also search for an open reading frame on the opposite strand.
   Can you find ALL possible reading frames on both strands?
   Note: the opposite strand is the "complement" (obtained by matching nucleotides A-T and G-C) in reverse order.

5. Write a script to read and parse a GenBank genomic record (use the adenovirus GenBank record available from the course site). Find the lines of coding sequence annotation (CDS), then extract and print the separate coordinates (get each number into a separate variable). Try to extract them correctly for as many of the CDS lines as you can!
   Note: there can be several coordinates in a CDS line, e.g. CDS join(503..1070,1145..1377), where 503..1070 and 1145..1377 are 2 sets of coding sequence coordinates.

6*. For each CDS, extract and print the coding sequence of the gene from the FASTA file of the genome sequence (available on the course website).
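
The note in question 4 describes the opposite strand as the complement in reverse order. As an illustration only, here is a minimal sketch of that operation; the subroutine name reverse_complement is an assumption made for this example and is not part of the exercise.

```perl
use strict;
use warnings;

# Hypothetical helper: return the opposite strand of a DNA sequence.
# Step 1: complement each nucleotide (A<->T, G<->C), keeping the case.
# Step 2: reverse the order of the nucleotides.
sub reverse_complement {
    my ($dna) = @_;
    (my $complement = $dna) =~ tr/ACGTacgt/TGCAtgca/;
    return scalar reverse $complement;
}

print reverse_complement("ATGCCGTAA"), "\n";   # prints TTACGGCAT
```

Any character other than A, C, G or T (for example N) is left unchanged by this sketch.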
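
The note in question 5 shows a CDS line with several coordinate pairs. One possible way to pull the numbers out of a single such line is sketched below. The subroutine name cds_coordinates is hypothetical, and the sketch assumes that the whole location fits on one line and consists only of plain start..end ranges; ranges marked with < or >, and locations that continue on the following line, are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [start, end] pairs found in one
# CDS feature line. Lines that are not CDS lines give an empty list.
sub cds_coordinates {
    my ($line) = @_;
    return () unless $line =~ /^\s*CDS\s+(\S+)/;
    my $location = $1;
    my @pairs;
    # Collect every "number..number" range in the location string.
    while ($location =~ /(\d+)\.\.(\d+)/g) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

my $example = "     CDS             join(503..1070,1145..1377)";
for my $pair (cds_coordinates($example)) {
    my ($start, $end) = @$pair;   # each number in its own variable
    print "start=$start end=$end\n";
}
```

For the example line, this prints one line for 503..1070 and one for 1145..1377. A location wrapped in complement(...) would also yield its numbers, but the sketch does not record the strand.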