Bioinformatics: Theory and Practice
Jijun Tang (唐继军)
[email protected]
13928761660

More Conditions
• Use ==, <, <=, >, >=, != to compare numbers
• Use eq, lt, le, gt, ge, ne to compare strings
• Combine conditions with ||, && or with the lower-precedence or, and

More Arithmetic
• +, -, *, **, /, %
• +=, -=, *=, **=, /=, %=
• ++, --

    $x = 28;
    $x = $x/2;              # $x is now 14
    print $x/=2, "\n";      # prints 7
    print $x--, "\n";       # prints 7, then $x becomes 6
    print $x, "\n";         # prints 6
    print --$x, "\n";       # $x becomes 5, then prints 5
    print $x, "\n";         # prints 5
    print $x % 3, "\n";     # prints 2
    print $x**2, "\n";      # prints 25

    #!/usr/bin/perl -w
    # CG content, counted base by base with a while loop
    print "Please type the filename of the DNA sequence data: ";
    $dna_filename = <STDIN>;
    chomp $dna_filename;
    open(DNAFILE, $dna_filename);
    $name = <DNAFILE>;      # the first line of a FASTA file is the header
    @DNA = <DNAFILE>;       # the remaining lines hold the sequence
    close DNAFILE;
    $DNA = join('', @DNA);
    $DNA =~ s/\s//g;        # remove newlines and other whitespace
    $count_of_CG = 0;
    $position = 0;
    while ( $position < length $DNA) {
        $base = substr($DNA, $position, 1);
        if ( $base eq 'C' or $base eq 'G') {
            ++$count_of_CG;
        }
        $position++;
    }
    print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

    #!/usr/bin/perl -w
    # The same count with a for loop
    print "Please type the filename of the DNA sequence data: ";
    $dna_filename = <STDIN>;
    chomp $dna_filename;
    open(DNAFILE, $dna_filename);
    $name = <DNAFILE>;
    @DNA = <DNAFILE>;
    close DNAFILE;
    $DNA = join('', @DNA);
    $DNA =~ s/\s//g;
    $count_of_CG = 0;
    for ( $position = 0 ; $position < length $DNA ; ++$position ) {
        $base = substr($DNA, $position, 1);
        if ( $base eq 'C' or $base eq 'G') {
            ++$count_of_CG;
        }
    }
    print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

    #!/usr/bin/perl -w
    # The same count with global, case-insensitive pattern matches
    print "Please type the filename of the DNA sequence data: ";
    $dna_filename = <STDIN>;
    chomp $dna_filename;
    open(DNAFILE, $dna_filename);
    $name = <DNAFILE>;
    @DNA = <DNAFILE>;
    close DNAFILE;
    $DNA = join('', @DNA);
    $DNA =~ s/\s//g;
    $count_of_CG = 0;
    while($DNA =~ /c/ig) {$count_of_CG++;}
    while($DNA =~ /g/ig) {$count_of_CG++;}
    print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

    $DNA = "ACCTAAACCCGGGAGAATTCCCACCAATTCTACGTAAC";
    $s = "";
    for ($i = 0, $j = 5; $i < $j; $i+=2, $j++) {
        $s .= substr($DNA, $i, $j);
    }
    print $s, "\n";

    $DNA = "ACCTAAACCCGGGAGAATTCCCACCAATTCTACGTAAC";
    $s = "";
    for ($i = 0, $j = 5; $i < $j; $i+=2, $j++) {
        $s .= substr $DNA, $i, $j;
    }
    print ($s, "\n");

Call functions/subroutines
• Name p1, p2, p3;
• Name(p1, p2, p3);
• print $DNA1, $DNA2, "\n";
• print ($DNA1, $DNA2, "\n");

Exercise 1
• Ask for a protein file in FASTA format
• Ask for an amino acid
• Count the frequency of that amino acid
• Sample sequence:
    TKFHSNAHFYDCWRMLQYQLDMRCMRAISTF
    SPHCGMEHMPDQTHNQGEMCKPRMWQVS
    MNQSCNHTPPFRKTYVEWDYMAKALIAPYTL
    GWLASTCFIW

Exercise 2
• Ask for a DNA file in FASTA format
• Convert it to RNA
• Ask for a codon
• Count the frequency of that codon
• Sample sequence:
    TCGTACTTAGAAATGAGGGTCCGCTTTTGCCC
    ACGCACCTGATCGCTCCTCGTTTGCTTTTAAG
    AACCGGACGAACCACAGAGCATAAGGAGAA
    CCTCTAGCTGCTTTACAAAGTACTGGTTCCCT
    TTCCAGCGGGATGCTTTATCTAAACGCAATGA

Subroutine
• Some code needs to be reused
• A good way to organize code
• Called "function" in some languages
• Name
• Return
• Parameters (@_)

    #!/usr/bin/perl -w
    print "Please type the filename: ";
    $dna_filename = <STDIN>;
    chomp $dna_filename;
    open(DNAFILE, $dna_filename);
    $name = <DNAFILE>;
    @DNA = <DNAFILE>;
    close DNAFILE;
    $DNA = join('', @DNA);
    $DNA =~ s/\s//g;
    $count_of_G = countG($DNA);
    print $count_of_G;

    sub countG {
        my($dna) = @_;
        my($count) = 0;
        $count = ( $dna =~ tr/Gg//);    # tr returns the number of characters matched
        return $count;
    }

    #!/usr/bin/perl -w
    print "Please type the filename: ";
    $dna_filename = <STDIN>;
    chomp $dna_filename;
    open(DNAFILE, $dna_filename);
    $name = <DNAFILE>;
    @DNA = <DNAFILE>;
    close DNAFILE;
    $DNA = join('', @DNA);
    $DNA =~ s/\s//g;
    $count_of_G = count($DNA, 'Gg');
    print $count_of_G;

    sub count {
        my($dna, $pattern) = @_;
        my($count) = 0;
        # tr does not interpolate variables, so the statement is built as a
        # string and passed to eval. \$dna is escaped so that the variable
        # itself, not the sequence text, appears in the evaluated code.
        $count = ( eval("\$dna =~ tr/$pattern//") );
        return $count;
    }

Codon

    sub codon2aa {
        my($codon) = @_;
        if    ( $codon =~ /TCA/i ) { return 'S' } # Serine
        elsif ( $codon =~ /TCC/i ) { return 'S' } # Serine
        elsif ( $codon =~ /TCG/i ) { return 'S' } # Serine
        elsif ( $codon =~ /TCT/i ) { return 'S' } # Serine
        elsif ( $codon =~ /TTC/i ) { return 'F' } # Phenylalanine
        elsif ( $codon =~ /TTT/i ) { return 'F' } # Phenylalanine
        elsif ( $codon =~ /TTA/i ) { return 'L' } # Leucine
        elsif ( $codon =~ /TTG/i ) { return 'L' } # Leucine
        elsif ( $codon =~ /TAC/i ) { return 'Y' } # Tyrosine
        elsif ( $codon =~ /TAT/i ) { return 'Y' } # Tyrosine
        elsif ( $codon =~ /TAA/i ) { return '_' } # Stop
        elsif ( $codon =~ /TAG/i ) { return '_' } # Stop
        elsif ( $codon =~ /TGC/i ) { return 'C' } # Cysteine
        elsif ( $codon =~ /TGT/i ) { return 'C' } # Cysteine
        elsif ( $codon =~ /TGA/i ) { return '_' } # Stop
        elsif ( $codon =~ /TGG/i ) { return 'W' } # Tryptophan
        elsif ( $codon =~ /CTA/i ) { return 'L' } # Leucine
        elsif ( $codon =~ /CTC/i ) { return 'L' } # Leucine
        elsif ( $codon =~ /CTG/i ) { return 'L' } # Leucine
        elsif ( $codon =~ /CTT/i ) { return 'L' } # Leucine
        elsif ( $codon =~ /CCA/i ) { return 'P' } # Proline
        elsif ( $codon =~ /CCC/i ) { return 'P' } # Proline
        elsif ( $codon =~ /CCG/i ) { return 'P' } # Proline
        elsif ( $codon =~ /CCT/i ) { return 'P' } # Proline
        elsif ( $codon =~ /CAC/i ) { return 'H' } # Histidine
        # ... the listing is cut off here; one elsif per remaining codon
    }

    sub codon2aa {
        my($codon) = @_;
        if    ( $codon =~ /GC./i)        { return 'A' } # Alanine
        elsif ( $codon =~ /TG[TC]/i)     { return 'C' } # Cysteine
        elsif ( $codon =~ /GA[TC]/i)     { return 'D' } # Aspartic Acid
        elsif ( $codon =~ /GA[AG]/i)     { return 'E' } # Glutamic Acid
        elsif ( $codon =~ /TT[TC]/i)     { return 'F' } # Phenylalanine
        elsif ( $codon =~ /GG./i)        { return 'G' } # Glycine
        elsif ( $codon =~ /CA[TC]/i)     { return 'H' } # Histidine
        elsif ( $codon =~ /AT[TCA]/i)    { return 'I' } # Isoleucine
        elsif ( $codon =~ /AA[AG]/i)     { return 'K' } # Lysine
        elsif ( $codon =~ /TT[AG]|CT./i) { return 'L' } # Leucine
        elsif ( $codon =~ /ATG/i)        { return 'M' } # Methionine
        elsif ( $codon =~ /AA[TC]/i)     { return 'N' } # Asparagine
        elsif ( $codon =~ /CC./i)        { return 'P' } # Proline
        elsif ( $codon =~ /CA[AG]/i)     { return 'Q' } # Glutamine
        elsif ( $codon =~ /CG.|AG[AG]/i) { return 'R' } # Arginine
        elsif ( $codon =~ /TC.|AG[TC]/i) { return 'S' } # Serine
        elsif ( $codon =~ /AC./i)        { return 'T' } # Threonine
        elsif ( $codon =~ /GT./i)        { return 'V' } # Valine
        elsif ( $codon =~ /TGG/i)        { return 'W' } # Tryptophan
        elsif ( $codon =~ /TA[TC]/i)     { return 'Y' } # Tyrosine
        elsif ( $codon =~ /TA[AG]|TGA/i) { return '_' } # Stop
        else {
            print STDERR "Bad codon \"$codon\"!!\n";
            exit;
        }
    }

Exercise
• Write the subroutine that converts a codon to an amino acid
• Read in a DNA FASTA file and print out the amino acid sequence

    #!/usr/bin/perl -w
    $dna = 'CGACGTCTTCGTACGGGACTAGCTCGTGTCGGTCGC';
    $protein = '';
    # Step through the sequence one codon (three bases) at a time
    for(my $i=0; $i < (length($dna) - 2) ; $i += 3) {
        $codon = substr($dna,$i,3);
        $protein .= codon2aa($codon);
    }
    print "I translated the DNA\n\n$dna\n\n  into the protein\n\n$protein\n\n";

    sub codon2aa {
        #...
    }

Reading Frame

    5' atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa 3'

    1  atg ccc aag ctg aat agc gta gag ggg ttt tca tca ttt gag gac gat gta taa
        M   P   K   L   N   S   V   E   G   F   S   S   F   E   D   D   V   *
    2  tgc cca agc tga ata gcg tag agg ggt ttt cat cat ttg agg acg atg tat
        C   P   S   *   I   A   *   R   G   F   H   H   L   R   T   M   Y
    3  gcc caa gct gaa tag cgt aga ggg gtt ttc atc att tga gga cga tgt ata
        A   Q   A   E   *   R   R   G   V   F   I   I   *   G   R   C   I

Three reading frames in the forward reading, three in the reverse complement reading.

Exercise 3
• Write the subroutine that converts a codon to an amino acid
• Read in a DNA FASTA file and print out the amino acid sequence
• There are 6 reading frames; can you print all 6 versions?
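One possible starting point for the six reading frames in Exercise 3 is sketched below. It only splits each strand into codons for the three offsets and leaves translation to codon2aa. The helper names revcom and codons_in_frame are hypothetical and do not come from the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reverse complement of a DNA string.
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;            # scalar context reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # swap each base for its complement
    return $rc;
}

# Hypothetical helper: complete codons of $dna starting at $offset (0, 1 or 2).
sub codons_in_frame {
    my ($dna, $offset) = @_;
    my @codons;
    for (my $i = $offset; $i <= length($dna) - 3; $i += 3) {
        push @codons, substr($dna, $i, 3);
    }
    return @codons;
}

# Sequence taken from the Reading Frame slide.
my $dna = 'atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa';

# Forward strand first, then its reverse complement: six frames in total.
for my $strand ($dna, revcom($dna)) {
    for my $offset (0 .. 2) {
        print join(' ', codons_in_frame($strand, $offset)), "\n";
    }
}
```

The last three lines of output are the reverse-complement frames. Passing each codon to codon2aa instead of printing it would give the six peptides.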
    #!/usr/bin/perl -w
    print "Please type the filename: ";
    $dna_filename = <STDIN>;
    chomp $dna_filename;
    open(DNAFILE, $dna_filename);
    $name = <DNAFILE>;
    @DNA = <DNAFILE>;
    close DNAFILE;
    $DNA = join( '', @DNA);
    $DNA =~ s/\s//g;

    print "First ", dna2peptide($DNA), "\n";
    print "Second ", dna2peptide(substr($DNA, 1)), "\n";
    print "Third ", dna2peptide(substr($DNA, 2)), "\n";

    $DNA = reverse $DNA;
    $DNA =~ tr/ACGTacgt/TGCAtgca/;    # complement the reversed strand
    print "Fourth ", dna2peptide($DNA), "\n";
    print "Fifth ", dna2peptide(substr($DNA, 1)), "\n";
    print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

    sub dna2peptide {
        my ($dna) = @_;
        my $protein = "";
        for(my $i=0; $i < (length($dna) - 2) ; $i += 3) {
            my $codon = substr($dna,$i,3);
            $protein .= codon2aa($codon);
        }
        return $protein;
    }

    sub codon2aa {
        #...
    }

Modules
• A Perl module is a self-contained piece of Perl code that can be used by a Perl program later
• Like a library
• The file name ends with the extension .pm
• The file must end with a true value, conventionally 1;

    #Bio.pm
    sub codon2aa {
        #....
        #....
    }
    sub dna2peptide {
        #....
        #....
    }
    1;

    #!/usr/bin/perl -w
    use Bio;
    print "Please type the filename: ";
    $dna_filename = <STDIN>;
    chomp $dna_filename;
    open(DNAFILE, $dna_filename);
    $name = <DNAFILE>;
    @DNA = <DNAFILE>;
    close DNAFILE;
    $DNA = join( '', @DNA);
    $DNA =~ s/\s//g;

    print "First ", dna2peptide($DNA), "\n";
    print "Second ", dna2peptide(substr($DNA, 1)), "\n";
    print "Third ", dna2peptide(substr($DNA, 2)), "\n";

    $DNA = reverse $DNA;
    $DNA =~ tr/ACGTacgt/TGCAtgca/;
    print "Fourth ", dna2peptide($DNA), "\n";
    print "Fifth ", dna2peptide(substr($DNA, 1)), "\n";
    print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

    #Bio.pm
    sub codon2aa {
        #....
        #....
    }
    sub dna2peptide {
        #....
        #....
    }
    sub fasta_read {
        print "Please type the filename: ";
        my $dna_filename = <STDIN>;
        chomp $dna_filename;
        unless (open(DNAFILE, $dna_filename)) {
            print "Cannot open file ", $dna_filename, "\n";
            exit;    # stop here: without the file there is nothing to read
        }
        $name = <DNAFILE>;
        @DNA = <DNAFILE>;
        close DNAFILE;
        $DNA = join( '', @DNA);
        $DNA =~ s/\s//g;
        return $DNA;
    }
    1;

    #!/usr/bin/perl -w
    use Bio;
    $DNA = fasta_read();

    print "First ", dna2peptide($DNA), "\n";
    print "Second ", dna2peptide(substr($DNA, 1)), "\n";
    print "Third ", dna2peptide(substr($DNA, 2)), "\n";

    $DNA = reverse $DNA;
    $DNA =~ tr/ACGTacgt/TGCAtgca/;
    print "Fourth ", dna2peptide($DNA), "\n";
    print "Fifth ", dna2peptide(substr($DNA, 1)), "\n";
    print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

Scope
• my provides lexical scoping; a variable declared with my is visible only within the block in which it is declared.
• A block is the code within a pair of curly braces {}; a file is also a block.
• Use use vars qw([list of var names]) or our ([var_names]) to create package globals.

    #!/usr/bin/perl -w
    use Bio;
    use strict;
    use warnings;

    $DNA = fasta_read();

    print "First ", dna2peptide($DNA), "\n";
    print "Second ", dna2peptide(substr($DNA, 1)), "\n";
    print "Third ", dna2peptide(substr($DNA, 2)), "\n";

    $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/;
    print "Fourth ", dna2peptide($DNA), "\n";
    print "Fifth ", dna2peptide(substr($DNA, 1)), "\n";
    print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

With use strict in place, the script (saved as frame2.pl) no longer compiles, because $DNA is never declared:

    Variable "$DNA" is not imported at frame2.pl line 6.
    Variable "$DNA" is not imported at frame2.pl line 8.
    Variable "$DNA" is not imported at frame2.pl line 9.
    Variable "$DNA" is not imported at frame2.pl line 10.
    Variable "$DNA" is not imported at frame2.pl line 12.
    Variable "$DNA" is not imported at frame2.pl line 12.
    Variable "$DNA" is not imported at frame2.pl line 13.
    Variable "$DNA" is not imported at frame2.pl line 14.
    Variable "$DNA" is not imported at frame2.pl line 15.
    Global symbol "$DNA" requires explicit package name at frame2.pl line 6.
    Global symbol "$DNA" requires explicit package name at frame2.pl line 8.
    Global symbol "$DNA" requires explicit package name at frame2.pl line 9.
    Global symbol "$DNA" requires explicit package name at frame2.pl line 10.
    Global symbol "$DNA" requires explicit package name at frame2.pl line 12.
    Global symbol "$DNA" requires explicit package name at frame2.pl line 12.
    Global symbol "$DNA" requires explicit package name at frame2.pl line 13.
    Global symbol "$DNA" requires explicit package name at frame2.pl line 14.
    Global symbol "$DNA" requires explicit package name at frame2.pl line 15.
    Execution of frame2.pl aborted due to compilation errors.

    #!/usr/bin/perl -w
    use Bio;
    use strict;
    use warnings;

    my $DNA = fasta_read();

    print "First ", dna2peptide($DNA), "\n";
    print "Second ", dna2peptide(substr($DNA, 1)), "\n";
    print "Third ", dna2peptide(substr($DNA, 2)), "\n";

    $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/;
    print "Fourth ", dna2peptide($DNA), "\n";
    print "Fifth ", dna2peptide(substr($DNA, 1)), "\n";
    print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";
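
The rules from the Scope slide can also be tried in isolation. The sketch below is illustrative only: the names $counter, bump and $label are made up for this example and are not part of Bio.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Package global, declared with our so that strict accepts it.
our $counter = 0;

# Hypothetical helper: $step is lexical and exists only inside bump.
sub bump {
    my ($step) = @_;
    $counter += $step;
    return $counter;
}

my $label = 'outer';
{
    # A new lexical that hides the outer $label until the block ends.
    my $label = 'inner';
    print "inside the block: $label\n";
}
print "after the block: $label\n";

bump(2);
bump(3);
print "counter: $counter\n";
```

Declaring a variable with my inside the braces leaves the outer variable of the same name untouched, which is why the value printed after the block is still the outer one.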