Bioinformatics
Bioinformatics: Theory and Practice
唐继军
13928761660
More Conditions
• Use ==, <, <=, >, >=, != for numeric comparisons
• Use eq, lt, le, gt, ge, ne for string comparisons
• Combine conditions with ||, && (or the lower-precedence or, and); these work with both kinds of comparison
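The difference between the two operator families is easiest to see when the same values are compared both ways. The following is a minimal sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Numeric operators compare the values as numbers.
my $same_number = ("10" == "10.0") ? "yes" : "no";
# String operators compare the values character by character.
my $same_string = ("10" eq "10.0") ? "yes" : "no";

# Numerically 9 is less than 10, but as strings "9" sorts after "10".
my $num_less = (9 < 10)      ? "yes" : "no";
my $str_less = ("9" lt "10") ? "yes" : "no";

# Logical operators combine conditions of either kind.
my $base  = 'G';
my $is_cg = ($base eq 'C' || $base eq 'G') ? "yes" : "no";

print "10 == 10.0 : $same_number\n";
print "10 eq 10.0 : $same_string\n";
print "9 <  10    : $num_less\n";
print "9 lt 10    : $str_less\n";
print "C or G     : $is_cg\n";
```

Using a numeric operator on sequence strings such as 'C' and 'G' would treat both as 0, so string operators are the safe choice for bases and amino acids.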
More Arithmetic
• +, -, *, **, /, %
• +=, -=, *=, **=, /=, %=
• ++, --
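$x = 28;
$x = $x/2;           # $x is 14
print $x/=2, "\n";   # prints 7; $x is now 7
print $x--, "\n";    # prints 7, then $x becomes 6
print $x, "\n";      # prints 6
print --$x, "\n";    # $x becomes 5, then prints 5
print $x, "\n";      # prints 5
print $x % 3, "\n";  # prints 2
print $x**2, "\n";   # prints 25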
#!/usr/bin/perl -w
# Count C and G bases with a while loop
print "Please type the filename of the DNA sequence data: ";
$dna_filename = <STDIN>;
chomp $dna_filename;
open(DNAFILE, $dna_filename) or die "Cannot open file $dna_filename: $!\n";
$name = <DNAFILE>;     # the first line of a fasta file is the header
@DNA = <DNAFILE>;      # the remaining lines hold the sequence
close DNAFILE;
$DNA = join('', @DNA);
$DNA =~ s/\s//g;       # remove newlines and other whitespace
$count_of_CG = 0;
$position = 0;
while ( $position < length $DNA) {
    $base = substr($DNA, $position, 1);
    if ( $base eq 'C' or $base eq 'G') {
        ++$count_of_CG;
    }
    $position++;
}
print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";
#!/usr/bin/perl -w
print "Please type the filename of the DNA sequence data: ";
$dna_filename = <STDIN>;
chomp $dna_filename;
open(DNAFILE, $dna_filename);
$name = <DNAFILE>;
@DNA = <DNAFILE>;
close DNAFILE;
$DNA = join('', @DNA);
$DNA =~ s/\s//g;
$count_of_CG = 0;
for ( $position = 0 ; $position < length $DNA ; ++$position ) {
    $base = substr($DNA, $position, 1);
    if ( $base eq 'C' or $base eq 'G') {
        ++$count_of_CG;
    }
}
print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";
#!/usr/bin/perl -w
print "Please type the filename of the DNA sequence data: ";
$dna_filename = <STDIN>;
chomp $dna_filename;
open(DNAFILE, $dna_filename);
$name = <DNAFILE>;
@DNA = <DNAFILE>;
close DNAFILE;
$DNA = join('', @DNA);
$DNA =~ s/\s//g;
$count_of_CG = 0;
while($DNA =~ /c/ig) {$count_of_CG++;}
while($DNA =~ /g/ig) {$count_of_CG++;}
print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";
$DNA = "ACCTAAACCCGGGAGAATTCCCACCAATTCTACGTAAC";
$s = "";
for ($i = 0, $j = 5; $i < $j; $i+=2, $j++) {
    $s .= substr($DNA, $i, $j);
}
print $s, "\n";
$DNA = "ACCTAAACCCGGGAGAATTCCCACCAATTCTACGTAAC";
$s = "";
for ($i = 0, $j = 5; $i < $j; $i+=2, $j++) {
    $s .= substr $DNA, $i, $j;
}
print ($s, "\n");
Call functions/subroutines
• Name p1, p2, p3;
• Name(p1, p2, p3);
• print $DNA1, $DNA2, "\n";
• print ($DNA1, $DNA2, "\n");
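Both calling styles also work for subroutines you write yourself, with one catch: the form without parentheses only parses if Perl has already seen the subroutine. A minimal sketch, where join_dna is a hypothetical helper invented for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper used only for this illustration.
sub join_dna {
    my @parts = @_;
    return join('', @parts);
}

my ($DNA1, $DNA2) = ('ACGT', 'TTAA');

# With parentheses: works whether the sub is defined before or after the call.
my $with_parens = join_dna($DNA1, $DNA2);

# Without parentheses: works here because join_dna is already defined above.
my $without_parens = join_dna $DNA1, $DNA2;

print $with_parens, "\n";
print($without_parens, "\n");
```

When in doubt, use the parenthesized form; it is unambiguous to both Perl and the reader.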
Exercise 1
• Ask for a protein file in fasta format
• Ask for an amino acid
• Count the frequency of that amino acid
• TKFHSNAHFYDCWRMLQYQLDMRCMRAISTF
  SPHCGMEHMPDQTHNQGEMCKPRMWQVS
  MNQSCNHTPPFRKTYVEWDYMAKALIAPYTL
  GWLASTCFIW
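One possible way to approach the counting step is sketched below. It is not the official solution; count_residue is a hypothetical helper name, and the sequence is hard-coded (the first line of the sample above) instead of being read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often one amino acid letter occurs,
# ignoring upper/lower case.
sub count_residue {
    my ($protein, $residue) = @_;
    my $count   = 0;
    my $pattern = quotemeta $residue;   # treat user input literally
    $count++ while $protein =~ /$pattern/ig;
    return $count;
}

my $protein = 'TKFHSNAHFYDCWRMLQYQLDMRCMRAISTF';
my $residue = 'M';
my $count   = count_residue($protein, $residue);

printf "%s occurs %d times in %d residues (%.1f%%)\n",
    $residue, $count, length($protein), 100 * $count / length($protein);
```

In the full exercise, the sequence would come from the fasta file and the amino acid from <STDIN>, as in the DNA scripts earlier.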
Exercise 2
• Ask for a DNA file in fasta format
• Convert it to RNA
• Ask for a codon
• Count the frequency of that codon
• TCGTACTTAGAAATGAGGGTCCGCTTTTGCCC
  ACGCACCTGATCGCTCCTCGTTTGCTTTTAAG
  AACCGGACGAACCACAGAGCATAAGGAGAA
  CCTCTAGCTGCTTTACAAAGTACTGGTTCCCT
  TTCCAGCGGGATGCTTTATCTAAACGCAATGA
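The two core steps of this exercise can be sketched as follows. The helper names dna2rna and count_codon are hypothetical, and the sketch assumes that codons are counted in the first reading frame only (non-overlapping triplets starting at position 0); the exercise does not say which interpretation is intended.

```perl
use strict;
use warnings;

# Hypothetical helper: transcribe DNA to RNA by replacing T with U.
sub dna2rna {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/Tt/Uu/;
    return $rna;
}

# Hypothetical helper: count a codon among the non-overlapping triplets
# that start at position 0 (an assumption, see the text above).
sub count_codon {
    my ($rna, $codon) = @_;
    my $count = 0;
    for (my $i = 0; $i + 3 <= length($rna); $i += 3) {
        $count++ if uc(substr($rna, $i, 3)) eq uc($codon);
    }
    return $count;
}

my $rna = dna2rna('TCGTACTTAGAAATGAGGGTCCGCTTTTGCCC');
print "RNA: $rna\n";
print "UCG count: ", count_codon($rna, 'UCG'), "\n";
```

Counting every occurrence regardless of frame would instead use a global regular expression match, as in the earlier CG-content script.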
Subroutine
• Some code needs to be reused
• A good way to organize code
• Called “function” in some languages
• Name
• Return
• Parameters (@_)
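The three parts listed above (name, parameters, return value) can be seen in a minimal sketch. The subroutine gc_percent is a hypothetical example, not part of the lecture code.

```perl
use strict;
use warnings;

# Name: gc_percent (hypothetical).
# Parameters: arrive in the special array @_.
# Return: the value handed back to the caller.
sub gc_percent {
    my ($dna) = @_;                      # copy the first parameter
    return 0 if length($dna) == 0;       # avoid dividing by zero
    my $gc = ($dna =~ tr/GCgc//);        # tr returns the number of matches
    return 100 * $gc / length($dna);
}

print gc_percent('ACGT'), "\n";
```

Copying @_ into my variables at the top of the subroutine keeps the caller's data from being changed by accident.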
#!/usr/bin/perl -w
print "Please type the filename: ";
$dna_filename = <STDIN>;
chomp $dna_filename;
open(DNAFILE, $dna_filename);
$name = <DNAFILE>;
@DNA = <DNAFILE>;
close DNAFILE;
$DNA = join('', @DNA);
$DNA =~ s/\s//g;
$count_of_G = countG($DNA);
print $count_of_G;
sub countG {
    my($dna) = @_;
    my($count) = 0;
    $count = ( $dna =~ tr/Gg//);
    return $count;
}
#!/usr/bin/perl -w
print "Please type the filename: ";
$dna_filename = <STDIN>;
chomp $dna_filename;
open(DNAFILE, $dna_filename);
$name = <DNAFILE>;
@DNA = <DNAFILE>;
close DNAFILE;
$DNA = join('', @DNA);
$DNA =~ s/\s//g;
$count_of_G = count($DNA, 'Gg');
print $count_of_G;
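sub count {
    my($dna, $pattern) = @_;
    my($count) = 0;
    # tr does not interpolate variables, so build the statement with eval.
    # \$dna is escaped so that eval sees the variable, not its contents.
    $count = ( eval("\$dna =~ tr/$pattern//") );
    return $count;
}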
Codon
sub codon2aa {
    my($codon) = @_;
    if    ( $codon =~ /TCA/i ) { return 'S' } # Serine
    elsif ( $codon =~ /TCC/i ) { return 'S' } # Serine
    elsif ( $codon =~ /TCG/i ) { return 'S' } # Serine
    elsif ( $codon =~ /TCT/i ) { return 'S' } # Serine
    elsif ( $codon =~ /TTC/i ) { return 'F' } # Phenylalanine
    elsif ( $codon =~ /TTT/i ) { return 'F' } # Phenylalanine
    elsif ( $codon =~ /TTA/i ) { return 'L' } # Leucine
    elsif ( $codon =~ /TTG/i ) { return 'L' } # Leucine
    elsif ( $codon =~ /TAC/i ) { return 'Y' } # Tyrosine
    elsif ( $codon =~ /TAT/i ) { return 'Y' } # Tyrosine
    elsif ( $codon =~ /TAA/i ) { return '_' } # Stop
    elsif ( $codon =~ /TAG/i ) { return '_' } # Stop
    elsif ( $codon =~ /TGC/i ) { return 'C' } # Cysteine
    elsif ( $codon =~ /TGT/i ) { return 'C' } # Cysteine
    elsif ( $codon =~ /TGA/i ) { return '_' } # Stop
    elsif ( $codon =~ /TGG/i ) { return 'W' } # Tryptophan
    elsif ( $codon =~ /CTA/i ) { return 'L' } # Leucine
    elsif ( $codon =~ /CTC/i ) { return 'L' } # Leucine
    elsif ( $codon =~ /CTG/i ) { return 'L' } # Leucine
    elsif ( $codon =~ /CTT/i ) { return 'L' } # Leucine
    elsif ( $codon =~ /CCA/i ) { return 'P' } # Proline
    elsif ( $codon =~ /CCC/i ) { return 'P' } # Proline
    elsif ( $codon =~ /CCG/i ) { return 'P' } # Proline
    elsif ( $codon =~ /CCT/i ) { return 'P' } # Proline
    elsif ( $codon =~ /CAC/i ) { return 'H' } # Histidine
    #...
}
sub codon2aa {
    my($codon) = @_;
    if    ( $codon =~ /GC./i)        { return 'A' } # Alanine
    elsif ( $codon =~ /TG[TC]/i)     { return 'C' } # Cysteine
    elsif ( $codon =~ /GA[TC]/i)     { return 'D' } # Aspartic Acid
    elsif ( $codon =~ /GA[AG]/i)     { return 'E' } # Glutamic Acid
    elsif ( $codon =~ /TT[TC]/i)     { return 'F' } # Phenylalanine
    elsif ( $codon =~ /GG./i)        { return 'G' } # Glycine
    elsif ( $codon =~ /CA[TC]/i)     { return 'H' } # Histidine
    elsif ( $codon =~ /AT[TCA]/i)    { return 'I' } # Isoleucine
    elsif ( $codon =~ /AA[AG]/i)     { return 'K' } # Lysine
    elsif ( $codon =~ /TT[AG]|CT./i) { return 'L' } # Leucine
    elsif ( $codon =~ /ATG/i)        { return 'M' } # Methionine
    elsif ( $codon =~ /AA[TC]/i)     { return 'N' } # Asparagine
    elsif ( $codon =~ /CC./i)        { return 'P' } # Proline
    elsif ( $codon =~ /CA[AG]/i)     { return 'Q' } # Glutamine
    elsif ( $codon =~ /CG.|AG[AG]/i) { return 'R' } # Arginine
    elsif ( $codon =~ /TC.|AG[TC]/i) { return 'S' } # Serine
    elsif ( $codon =~ /AC./i)        { return 'T' } # Threonine
    elsif ( $codon =~ /GT./i)        { return 'V' } # Valine
    elsif ( $codon =~ /TGG/i)        { return 'W' } # Tryptophan
    elsif ( $codon =~ /TA[TC]/i)     { return 'Y' } # Tyrosine
    elsif ( $codon =~ /TA[AG]|TGA/i) { return '_' } # Stop
    else { print STDERR "Bad codon \"$codon\"!!\n"; exit; }
}
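As an alternative to a chain of regular expressions, the same mapping can be stored in a hash and looked up directly. This is a different technique from the one on the slides, shown only as a sketch. The name codon2aa_lookup is hypothetical; the 64-letter string is the standard genetic code with bases ordered T, C, A, G, and '_' marks a stop codon to match the subroutine above.

```perl
use strict;
use warnings;

# Standard genetic code, one letter per codon, with the three codon
# positions each cycling through T, C, A, G (third position fastest).
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

my %codon_table;
my $k = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($aas, $k++, 1);
        }
    }
}

# Hypothetical hash-based counterpart of codon2aa.
sub codon2aa_lookup {
    my ($codon) = @_;
    my $aa = $codon_table{ uc($codon) };
    die "Bad codon \"$codon\"!!\n" unless defined $aa;
    return $aa;
}

print codon2aa_lookup('atg'), codon2aa_lookup('TGG'), codon2aa_lookup('TAA'), "\n";
```

A hash lookup takes the same time for every codon, whereas the if/elsif chain tests patterns one by one until a match is found.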
Exercise
• Write a subroutine that converts a codon to an amino acid
• Read in a DNA fasta file and print out the amino acid sequence
#!/usr/bin/perl -w
$dna = 'CGACGTCTTCGTACGGGACTAGCTCGTGTCGGTCGC';
$protein = '';
for(my $i=0; $i < (length($dna) - 2) ; $i += 3) {
    $codon = substr($dna,$i,3);
    $protein .= codon2aa($codon);
}
print "I translated the DNA\n\n$dna\n\n  into the protein\n\n$protein\n\n";
sub codon2aa {
    #...
}
Reading Frame
5' atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa 3'
1 atg ccc aag ctg aat agc gta gag ggg ttt tca tca ttt gag gac gat gta taa
  M   P   K   L   N   S   V   E   G   F   S   S   F   E   D   D   V   *
2 tgc cca agc tga ata gcg tag agg ggt ttt cat cat ttg agg acg atg tat
  C   P   S   *   I   A   *   R   G   F   H   H   L   R   T   M   Y
3 gcc caa gct gaa tag cgt aga ggg gtt ttc atc att tga gga cga tgt ata
  A   Q   A   E   *   R   R   G   V   F   I   I   *   G   R   C   I
There are six reading frames: three on the forward strand and three on the reverse complement.
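The six frames can be produced by shifting the start position by 0, 1, and 2 bases on each strand. A minimal sketch, with hypothetical helper names revcom and six_frames, and assuming the sequence is at least two bases long:

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the sequence, then complement each base.
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical helper: return the six strings to be translated,
# frames 1-3 on the forward strand, then 4-6 on the reverse complement.
sub six_frames {
    my ($dna) = @_;
    my @frames;
    for my $strand ($dna, revcom($dna)) {
        for my $offset (0, 1, 2) {
            push @frames, substr($strand, $offset);
        }
    }
    return @frames;
}

my @frames = six_frames('atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa');
for my $n (0 .. $#frames) {
    print "Frame ", $n + 1, ": $frames[$n]\n";
}
```

Each returned string can then be cut into triplets and translated codon by codon.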
Exercise 3
• Write a subroutine that converts a codon to an amino acid
• Read in a DNA fasta file and print out the amino acid sequence
• There are 6 reading frames; can you print the translation for all 6?
#!/usr/bin/perl -w
print "Please type the filename: ";
$dna_filename = <STDIN>; chomp $dna_filename;
open(DNAFILE, $dna_filename);
$name = <DNAFILE>; @DNA = <DNAFILE>; close DNAFILE;
$DNA = join( '', @DNA); $DNA =~ s/\s//g;
print "First  ", dna2peptide($DNA), "\n";
print "Second ", dna2peptide(substr($DNA, 1)), "\n";
print "Third  ", dna2peptide(substr($DNA, 2)), "\n";
$DNA = reverse $DNA;
$DNA =~ tr/ACGTacgt/TGCAtgca/;   # complement the bases to get the reverse complement
print "Fourth ", dna2peptide($DNA), "\n";
print "Fifth  ", dna2peptide(substr($DNA, 1)), "\n";
print "Sixth  ", dna2peptide(substr($DNA, 2)), "\n";
sub dna2peptide {
    my ($dna) = @_;
    my $protein = "";
    for(my $i=0; $i < (length($dna) - 2) ; $i += 3) {
        $codon = substr($dna,$i,3);
        $protein .= codon2aa($codon);
    }
    return $protein;
}
sub codon2aa {
    #...
}
Modules
• A Perl module is a self-contained piece of Perl code that other Perl programs can use later
• Like a library
• The file name ends with the extension .pm
• The file must end with a true value, conventionally the line 1;
#Bio.pm
sub codon2aa {
#....
#....
}
sub dna2peptide {
#....
#....
}
1;
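To try the mechanics in a single runnable file, the sketch below writes a tiny module to a temporary directory and then loads it. Everything here is an assumption made for illustration: the module name MiniBio, its one subroutine revcom, and the use of a package declaration (the Bio.pm on the slides has none, so its subroutines land in the main program's namespace).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write a small module file; normally you would create it in an editor.
my $dir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$dir/MiniBio.pm") or die "Cannot write module: $!";
print $fh <<'END_MODULE';
package MiniBio;
use strict;
use warnings;

sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

1;
END_MODULE
close $fh;

# Tell Perl where to look, then load the module at run time.
unshift @INC, $dir;
require MiniBio;

print MiniBio::revcom('ATGC'), "\n";
```

In everyday use the .pm file sits next to the script or in a library directory, and is loaded with use at the top of the program.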
#!/usr/bin/perl -w
use Bio;
print "Please type the filename: ";
$dna_filename = <STDIN>; chomp $dna_filename;
open(DNAFILE, $dna_filename);
$name = <DNAFILE>; @DNA = <DNAFILE>; close DNAFILE;
$DNA = join( '', @DNA); $DNA =~ s/\s//g;
print "First  ", dna2peptide($DNA), "\n";
print "Second ", dna2peptide(substr($DNA, 1)), "\n";
print "Third  ", dna2peptide(substr($DNA, 2)), "\n";
$DNA = reverse $DNA;
$DNA =~ tr/ACGTacgt/TGCAtgca/;
print "Fourth ", dna2peptide($DNA), "\n";
print "Fifth  ", dna2peptide(substr($DNA, 1)), "\n";
print "Sixth  ", dna2peptide(substr($DNA, 2)), "\n";
#Bio.pm
sub codon2aa {
#....
#....
}
sub dna2peptide {
#....
#....
}
sub fasta_read {
    print "Please type the filename: ";
    my $dna_filename = <STDIN>; chomp $dna_filename;
    unless (open(DNAFILE, $dna_filename)) {
        print STDERR "Cannot open file ", $dna_filename, "\n";
        exit;   # stop here instead of reading from an unopened filehandle
    }
    my $name = <DNAFILE>; my @DNA = <DNAFILE>; close DNAFILE;
    my $DNA = join( '', @DNA); $DNA =~ s/\s//g;
    return $DNA;
}
1;
#!/usr/bin/perl -w
use Bio;
$DNA = fasta_read();
print "First  ", dna2peptide($DNA), "\n";
print "Second ", dna2peptide(substr($DNA, 1)), "\n";
print "Third  ", dna2peptide(substr($DNA, 2)), "\n";
$DNA = reverse $DNA;
$DNA =~ tr/ACGTacgt/TGCAtgca/;
print "Fourth ", dna2peptide($DNA), "\n";
print "Fifth  ", dna2peptide(substr($DNA, 1)), "\n";
print "Sixth  ", dna2peptide(substr($DNA, 2)), "\n";
Scope
• my provides lexical scoping; a variable declared with my is visible only within the block in which it is declared.
• Blocks of code are hunks within curly braces {}; files are blocks.
• Use use vars qw([list of var names]) or our ([var_names]) to create package globals.
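The scoping rules above can be observed in a small sketch. All names here ($label, $dna, show_label, inner_value) are made up for illustration.

```perl
use strict;
use warnings;

our $label = 'package global';   # package global, visible by name in this package
my  $dna   = 'ACGT';             # lexical, visible in the rest of this file

{
    my $dna = 'TTTT';            # a new variable that hides the outer $dna in this block
    print "inside block:  $dna\n";
}
print "outside block: $dna\n";   # the outer $dna is unchanged

# A subroutine body is a block too, so its my variables stay private.
sub inner_value {
    my $dna = 'GGGG';
    return $dna;
}

sub show_label {
    return $label;               # package globals are reachable from any sub
}

print "from sub:      ", inner_value(), "\n";
print "label:         ", show_label(), "\n";
```

Under use strict, every variable has to be declared with my or our (or be fully qualified), which is what catches misspelled variable names at compile time.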
#!/usr/bin/perl -w
use Bio;
use strict;
use warnings;

$DNA = fasta_read();

print "First  ", dna2peptide($DNA), "\n";
print "Second ", dna2peptide(substr($DNA, 1)), "\n";
print "Third  ", dna2peptide(substr($DNA, 2)), "\n";

$DNA = reverse $DNA;
$DNA =~ tr/ACGTacgt/TGCAtgca/;
print "Fourth ", dna2peptide($DNA), "\n";
print "Fifth  ", dna2peptide(substr($DNA, 1)), "\n";
print "Sixth  ", dna2peptide(substr($DNA, 2)), "\n";
Variable "$DNA" is not imported at frame2.pl line 6.
Variable "$DNA" is not imported at frame2.pl line 8.
Variable "$DNA" is not imported at frame2.pl line 9.
Variable "$DNA" is not imported at frame2.pl line 10.
Variable "$DNA" is not imported at frame2.pl line 12.
Variable "$DNA" is not imported at frame2.pl line 12.
Variable "$DNA" is not imported at frame2.pl line 13.
Variable "$DNA" is not imported at frame2.pl line 14.
Variable "$DNA" is not imported at frame2.pl line 15.
Global symbol "$DNA" requires explicit package name at frame2.pl line 6.
Global symbol "$DNA" requires explicit package name at frame2.pl line 8.
Global symbol "$DNA" requires explicit package name at frame2.pl line 9.
Global symbol "$DNA" requires explicit package name at frame2.pl line 10.
Global symbol "$DNA" requires explicit package name at frame2.pl line 12.
Global symbol "$DNA" requires explicit package name at frame2.pl line 12.
Global symbol "$DNA" requires explicit package name at frame2.pl line 13.
Global symbol "$DNA" requires explicit package name at frame2.pl line 14.
Global symbol "$DNA" requires explicit package name at frame2.pl line 15.
Execution of frame2.pl aborted due to compilation errors.
#!/usr/bin/perl -w
use Bio;
use strict;
use warnings;
my $DNA = fasta_read();
print "First  ", dna2peptide($DNA), "\n";
print "Second ", dna2peptide(substr($DNA, 1)), "\n";
print "Third  ", dna2peptide(substr($DNA, 2)), "\n";
$DNA = reverse $DNA;
$DNA =~ tr/ACGTacgt/TGCAtgca/;
print "Fourth ", dna2peptide($DNA), "\n";
print "Fifth  ", dna2peptide(substr($DNA, 1)), "\n";
print "Sixth  ", dna2peptide(substr($DNA, 2)), "\n";