4.1 Revision

4.2 if, elsif, else
It’s convenient to test several conditions in one if structure:

print "Please enter your grade average:\n";
my $number = <STDIN>;
if ($number < 0 or $number > 100) {
    print "ERROR: The average must be between 0 and 100.\n";
} elsif ($number > 90) {
    print "wow!\n";
} elsif ($number > 80) {
    print "well done.\n";
} else {
    print "oh well...\n";
}

Note the indentation: every line of a new block is indented by a single tab, and the '}' that ends the block is at the same indentation as the line where the block started.

4.3 Comparison operators

Comparison                  Numeric   String
Equal                       ==        eq
Not equal                   !=        ne
Less than                   <         lt
Greater than                >         gt
Less than or equal to       <=        le
Greater than or equal to    >=        ge

if ($age == 18) { ... }
if ($name eq "Yossi") ...
if ($name ne "Yossi") ...
if ($name lt "n") ...

Common mistakes, and the warnings they produce:
if ($age = 18) ...
    Found = in conditional, should be == at ...
if ($name == "Yossi") ...
    Argument "Yossi" isn't numeric in numeric eq (==) at ...

4.4 Loops
Commands inside a loop are executed repeatedly (iteratively):

my $num = 0;
print "Guess a number.\n";
while ($num != 31) {
    $num = <STDIN>;
}
print "correct!\n";

my @names = <STDIN>;
chomp(@names);
my $name;
foreach $name (@names) {
    print "Hello $name!\n";
}

4.5 Loops: foreach
The foreach loop passes through all the elements of an array:

my @numArr = (1,1,2,3,5);
foreach my $number (@numArr) {
    $number++;
}

Note: the array is actually changed, because the loop variable is an alias to each element in turn. After the loop, @numArr is (2,2,3,4,6).

4.6 Fasta format
A FASTA-format sequence begins with a single-line description that starts with '>', followed by lines of sequence data, broken by a newline after a fixed number of characters:

>gi|229608964|ref|NM_014600.2| Homo sapiens EH-domain
AAACATGGCGGCGCCCTGCGCGGCTTCCCGTCGCCGCAACCGTGGGGCCGGCCCTGCCTT
GGAGCGGAGCCGAAGCATCCCTTGCTGCACGCAGGGCAGAGCAGGCGAGGGCTGGGGGCC
GTATAACTTATTTTATATCCATATTCAGACTATATAGAGAATATTCTATGCATCTATGAC
GTGCTTAC
>gi|197099147|ref|NM_001131576.1| Pongo abelii EH-domain
AGAGCTGAGCGCCTGCCCACAAACATGGCGGCGCCCTGCGCGGCTTCCCTTCGCCGGGAC
CGCCTGGGGCTGCAGGATGCTGCTGCGGATGCTGAGCTGTCCGCGGGTTGGGCAGCGTCG
CTGCGCGGCTTCCCTT
>gi|55742034|ref|NM_001006733.1| Xenopus tropicalis EH-domain
CGGGCAAGACCACCTTCATCCGCCACCTCATAGAGCAGGACTTCCCCGGCATGAGGATCG
GGCCCGAACCGGGGACTTCCTCTGCGCGCCGGCTTCCTGCCCAGCTGGCATTTAAACCAC
ACATGGCGGCGCCCTGCGCGGCTTCCCGTCGCCGCAACCGTGGGGCCGGCC

4.7 Breaking out of loops
next: skip to the next iteration
last: exit the loop

my @lines = <STDIN>;
foreach my $line (@lines) {
    if (substr($line,0,1) eq ">") {
        next;
    }
    if (substr($line,0,8) eq "**stop**") {
        last;
    }
    print($line);
}

4.8 More loops

4.9 Scope of variable declaration
If you declare a variable inside a loop, it exists only inside that loop. This is true for every {block}:

my $name = "";
while ($name ne "Yossi") {
    chomp($name = <STDIN>);
    print "Hello $name, what is your age?\n";
    my $age;
    $age = <STDIN>;
}
print $name;
print $age;

Under use strict, the last line does not compile:
Global symbol "$age" requires explicit package name

4.10 Never declare the same variable name twice
If you declare a variable name twice, once outside and once inside a block, you are creating two distinct variables… don’t do it!

my $name = "Ruti";
print "Hello $name!\n";
my $number;
foreach $number (1,2,3) {
    my $name = "Nimrod";
    print "Hello $name!\n";
}
print "Hello $name!\n";

Output:
Hello Ruti!
Hello Nimrod!
Hello Nimrod!
Hello Nimrod!
Hello Ruti!

4.11 Never declare the same variable name twice
Compare: without the inner my, the loop assigns to the outer $name, so the change is still visible after the loop:

my $name = "Ruti";
print "Hello $name!\n";
my $number;
foreach $number (1,2,3) {
    $name = "Nimrod";
    print "Hello $name!\n";
}
print "Hello $name!\n";

Output:
Hello Ruti!
Hello Nimrod!
Hello Nimrod!
Hello Nimrod!
Hello Nimrod!

4.12 Reminder: Uninitialized (undefined) variables
If a variable is used before it has been assigned a value, a warning is issued:

my $a;
print($a+3);
    Use of uninitialized value in addition (+)
    3

my $line;
print("line is $line");
    Use of uninitialized value in concatenation (.) or string
    line is

4.13 Is this variable defined?
defined checks whether a variable holds a defined value (that is, anything other than undef):
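defined matters most in input loops: <STDIN> (or any filehandle read) returns undef at end of input, while a line such as "0" is false but still defined. The following is a minimal sketch, assuming an in-memory filehandle (open on a reference to a string) as a stand-in for keyboard input so it runs without typing anything; read_all_lines is a hypothetical helper name, not part of the course material.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every line from a filehandle, stopping only
# when the read returns undef (end of input).
sub read_all_lines {
    my ($fh) = @_;
    my @lines;
    my $line = <$fh>;
    while (defined $line) {      # not "while ($line)": a last line "0" is false
        chomp $line;
        push @lines, $line;
        $line = <$fh>;
    }
    return @lines;
}

# In-memory text stands in for keyboard input; its last line is "0"
# with no trailing newline, which is false but still defined.
my $input = "first\n\n0";
open(my $fh, '<', \$input) or die "Cannot open in-memory input: $!";
my @collected = read_all_lines($fh);
close($fh);
print scalar(@collected), " lines read\n";    # prints "3 lines read"
```

With while ($line) instead of while (defined $line), the loop would stop before the final "0" line and report only 2 lines.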
my $a;
if (defined $a) {
    print($a+3);
}

Reading until end of input (type Ctrl-Z on Windows, or Ctrl-D on Unix/macOS, to indicate end of input):

my $line = <STDIN>;
while (defined $line) {
    print "line is $line";
    $line = <STDIN>;
}
print "done!!!\n";

4.14 Is this variable defined?
For example, print only the header lines (those starting with '>') of a FASTA input:

my $line = <STDIN>;
while (defined $line) {
    if (substr($line,0,1) eq ">") {
        print "$line";
    }
    $line = <STDIN>;
}

4.15 FASTA: Analyzing complex input
Assignment: Write a script that reads several DNA sequences in FASTA format and prints, for each sequence, its header and its G+C content.
Obtain from the assignment: the input, the required output, and the required processes (functions).

4.16 FASTA: Analyzing complex input
Overall design: read the FASTA file (several sequences). For each sequence:
1. Read the FASTA sequence
    1.1. Read the FASTA header
    1.2. Read each line until the next FASTA header
2. For each sequence: do something
    2.1. Compute G+C content
    2.2. Print header and G+C content

[Flowchart: Start → read line → save header → read line → concatenate to sequence → read line, repeating until a header or end of input → do something → end of input? Yes: End; No: continue with the next sequence.]

Let’s see how it’s done…

4.17
# 1. Read FASTA sequence
my $fastaLine = <STDIN>;
while (defined $fastaLine) {
    # 1.1. Read FASTA header (without the '>' and the newline)
    chomp $fastaLine;
    my $header = substr($fastaLine,1);
    my $seq = "";    # start a new, empty sequence for every record
    $fastaLine = <STDIN>;
    # 1.2. Read sequence until next FASTA header or end of input
    while ((defined $fastaLine) and (substr($fastaLine,0,1) ne ">")) {
        chomp $fastaLine;
        $seq .= $fastaLine;
        $fastaLine = <STDIN>;
    }
    # 2. Do something
    # 2.1. Compute $gcContent from $seq
    my $gcContent;
    ...;
    # 2.2. Print header and G+C content
    print "$header: $gcContent\n";
}

4.18 Class exercise 4a
1. Write a script that reads lines of names and expenses:
Yossi 6.10,16.50,5.00
Dana 21.00,6.00
Refael 6.10,24.00,7.00,8.00
END
Output:
Yossi 27.6
Dana 27
Refael 45.1
For each line, print the name and the sum. Stop when you reach "END".

2. Change your script to read names and expenses on separate lines. Lines with numbers are identified by a "+" sign as the first character in the string:
Yossi
+6.10
+16.50
+5.00
Dana
+21.00
+6.00
Refael
+6.10
+24.00
+7.00
+8.00
END
Keep summing the numbers as long as there is a '+' sign before them.

4.19 Class exercise 4a (Home Ex. 2 Q. 5)
3. Write a script that reads several protein sequences in FASTA format and prints the name and length of each sequence. Start with the example code from the last lesson.
4*. Write a script that reads several DNA sequences in FASTA format and prints, in FASTA format, the sequences whose header starts with 'Chr07'.
5*. As in Q4, but now concatenate all the sequences whose header starts with 'Chr07'.

4.20 A bit about reading and writing files

4.21 Reading files
Open a file for reading, and link it to a filehandle:
open(IN, "<EHD.fasta") or die "Can't open EHD.fasta: $!";
Then read lines from the filehandle, exactly as you would from <STDIN>:
my $line = <IN>;
my @inputLines = <IN>;
foreach $line (@inputLines) ...
Every filehandle that is opened should be closed:
close(IN);

4.22 Writing to files
Open a file for writing, and link it to a filehandle:
open(OUT, ">EHD.analysis") or die "Can't open EHD.analysis: $!";
NOTE: If a file by that name already exists, it will be overwritten!
Print to the file (note: no comma between the filehandle and the text):
print OUT "The mutation is in exon $exonNumber\n";
And don't forget to close it:
close(OUT);

4.23 Class exercise 4b
1. Change the script for class exercise 4a.1 to read the lines from an input file (instead of reading lines from the keyboard).
2. Now, in addition, write the output of the previous question to a file named "class.ex.4a1.out" (instead of printing to the screen).
3*. Change the script for class exercise 4a.3 to receive two strings from the user: 1) the name of a FASTA file, and 2) the name of an output file. Then read from the FASTA file given by the user, and write to the output file also supplied by the user.
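The FASTA design from 4.16 and 4.17 combines naturally with reading from a filehandle as in 4.21. Below is a minimal sketch of the G+C assignment from 4.15, under stated assumptions: G+C content is taken to be the fraction of G and C letters in the sequence (one possible definition; ambiguous bases such as N simply count toward the length), the first input line is assumed to be a header, and an in-memory string stands in for a real file such as EHD.fasta so the example runs as is. The names gc_content and read_fasta are hypothetical helpers, not the course's official solution.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of G and C bases (case-insensitive).
# Returns 0 for an empty sequence to avoid dividing by zero.
sub gc_content {
    my ($seq) = @_;
    return 0 if length($seq) == 0;
    my $gc = ($seq =~ tr/GCgc//);    # tr/// returns the number of matches
    return $gc / length($seq);
}

# Hypothetical helper following the design in 4.16-4.17: save the header,
# then concatenate sequence lines until the next '>' or end of input.
# Returns a list of [header, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    my $fastaLine = <$fh>;
    while (defined $fastaLine) {
        chomp $fastaLine;
        my $header = substr($fastaLine, 1);    # drop the leading '>'
        my $seq = "";                          # reset for every record
        $fastaLine = <$fh>;
        while (defined $fastaLine and substr($fastaLine, 0, 1) ne ">") {
            chomp $fastaLine;
            $seq .= $fastaLine;
            $fastaLine = <$fh>;
        }
        push @records, [$header, $seq];
    }
    return @records;
}

# In-memory stand-in for a FASTA file; with a real file you would write
# open(my $in, '<', 'EHD.fasta') or die "Cannot open EHD.fasta: $!";
my $fasta = ">seq1 example\nGGCC\nATAT\n>seq2 example\nGCGCGC\n";
open(my $in, '<', \$fasta) or die "Cannot open in-memory file: $!";
foreach my $record (read_fasta($in)) {
    my ($header, $seq) = @$record;
    printf "%s: %.2f\n", $header, gc_content($seq);
}
close($in);
```

For the sample input this prints "seq1 example: 0.50" and "seq2 example: 1.00". Separating reading from computing keeps step 2 ("do something") easy to swap, for example for the sequence-length exercise in 4.19.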