Computer Programming for Biologists
Class 7
Nov 27th, 2014
Karsten Hokamp
http://bioinf.gen.tcd.ie/GE3M25/programming
Hash Variables
Description
- associative arrays
- list of key/value pairs
- values and keys are scalars
- access values by key names
- Great for look-ups!
Hash Variables
Look-up Table
Look-up table in real life for translation (genetic code):
AAA  K
AAC  N
AAG  K
AAU  N
…    …
UUG  L
UUU  F
In Perl use hash variable:
%genetic_code = (
'AAA' => 'K',
'AAC' => 'N',
'AAG' => 'K',
'AAU' => 'N',
…
'UUG' => 'L',
'UUU' => 'F'
);
Keys are unique!
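A minimal sketch of how such a hash could be used for translation. It contains only the six codons shown above, not the full genetic code; the sequence in `$rna`, the `X` placeholder for unknown codons and the `%demo` hash are assumptions made for illustration.

```perl
use strict;
use warnings;

# Partial table only: the six codons shown on the slide.
my %genetic_code = (
    'AAA' => 'K',
    'AAC' => 'N',
    'AAG' => 'K',
    'AAU' => 'N',
    'UUG' => 'L',
    'UUU' => 'F',
);

# Hypothetical example sequence that uses only the codons above.
my $rna = 'AACUUGUUU';

my $protein = '';
# Step through the sequence three letters at a time.
for (my $i = 0; $i + 3 <= length($rna); $i += 3) {
    my $codon = substr($rna, $i, 3);
    # Codons missing from the table are marked 'X' (arbitrary choice).
    $protein .= exists($genetic_code{$codon}) ? $genetic_code{$codon} : 'X';
}
print "$protein\n";    # prints NLF

# Keys are unique: a repeated key keeps only the last value assigned.
my %demo = ('AAA' => 'wrong', 'AAA' => 'K');
print scalar(keys %demo), " key, value $demo{'AAA'}\n";    # prints 1 key, value K
```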
Hash Variables
Examples
%bases = ('a', 'purine',
          'c', 'pyrimidine',
          'g', 'purine',
          't', 'pyrimidine');
%complement = ('a' => 't',
               'c' => 'g',
               'g' => 'c',
               't' => 'a');
%letters = (1, 'a', 2, 'b', 3, 'c', 4, 'd');
Hashes: Lists with special relationship between each pair of elements!
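To illustrate the pairing, the sketch below writes the same hash once as a plain list and once with `=>`, then turns it back into a list. The variable names `%letters_list`, `%letters_arrow` and `@flat` are made up for this example.

```perl
use strict;
use warnings;

# The same hash written as a plain list and with the "fat comma" (=>).
my %letters_list  = (1, 'a', 2, 'b', 3, 'c', 4, 'd');
my %letters_arrow = (1 => 'a', 2 => 'b', 3 => 'c', 4 => 'd');

# The 1st, 3rd, 5th ... elements become keys; each following element is its value.
print "$letters_list{3}\n";     # prints c
print "$letters_arrow{3}\n";    # prints c

# In list context a hash turns back into a list of key/value pairs.
# The order of the pairs is not predictable, but each key stays next to its value.
my @flat = %letters_arrow;
print scalar(@flat), " elements\n";    # prints 8 elements
```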
Hash Variables
Storing Data
# count frequency of nucleotides:
my $As = 0; my $Cs = 0; my $Gs = 0; my $Ts = 0;
foreach my $nuc (split //, $dna) {
    if ($nuc eq 'A') {
        $As++;
    } elsif ($nuc eq 'C') {
        $Cs++;
    } elsif ($nuc eq 'G') {
        $Gs++;
    } elsif ($nuc eq 'T') {
        $Ts++;
    }
}
Hash Variables
Storing Data
# count frequency of nucleotides:
my %freq = ();
foreach my $nuc (split //, $dna) {
$freq{$nuc}++;
}
Hash Variables
Storing Data
# count frequency of nucleotides:
my %freq = ();
foreach my $nuc (split //, 'ACTTGGGT') {
$freq{$nuc}++;
}
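Auto-initialisation: the value of a new key starts out as '' or 0 (depending on use), so $freq{$nuc}++ needs no set-up.
Resulting hash:
key  value
A    1
C    1
G    3
T    3
Keys are stored in no specific order.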
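Put together as a complete script, the counting loop might look like the sketch below. The report loop with `sort keys` is an addition for this example so that the output order is fixed.

```perl
use strict;
use warnings;

my $dna  = 'ACTTGGGT';    # example sequence from the slide
my %freq = ();

# split // breaks the string into single characters.
foreach my $nuc (split //, $dna) {
    # A key seen for the first time has no value yet; ++ treats that as 0.
    $freq{$nuc}++;
}

# Sort the keys so the report comes out in a fixed order.
foreach my $nuc (sort keys %freq) {
    print "$nuc: $freq{$nuc}\n";
}
# prints:
# A: 1
# C: 1
# G: 3
# T: 3
```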
Hash Variables
Scalar vs Hash
$As = 0;
$Cs = 0;
$Gs = 0;
$Ts = 0;
Hash Variables
Scalar vs Hash
$As = 0;
$As++;
$Cs = 0;
$Cs++;
$Gs = 0;
$Gs++;
$Ts = 0;
$Ts++;
Hash Variables
Scalar vs Hash
Scalars:
$As = 0; $As++;
$Cs = 0; $Cs++;
$Gs = 0; $Gs++;
$Ts = 0; $Ts++;
Hash %freq:
%freq = ();
$freq{'Gs'}++;
Computer Programming for Biologists
Exercises
Practical:
http://bioinf.gen.tcd.ie/GE3M25/programming/class7
Hash Variables
Accessing Elements
General:
$value = $hash{$key};
Special functions: keys and values
# get complement of a base
my $new_base = $complement{$base};
# get amino acid for a codon
my $aa = $genetic_code{$codon};
# list all the aa's that occurred
foreach my $aa (keys %list) {
print "$aa was found!\n";
}
(loop through all keys)
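A short sketch of both special functions in use. The contents of `%freq` here are assumed for illustration; `values` returns all values of a hash, just as `keys` returns all keys.

```perl
use strict;
use warnings;

# Hypothetical counts for this example.
my %freq = ('A' => 1, 'C' => 1, 'G' => 3, 'T' => 3);

# values returns all values; summing them gives the total count.
my $total = 0;
foreach my $count (values %freq) {
    $total += $count;
}
print "total: $total\n";    # prints total: 8

# keys returns all keys; in scalar context it gives the number of keys.
my $num_keys = keys %freq;
print "distinct nucleotides: $num_keys\n";    # prints distinct nucleotides: 4
```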
Hash Variables
Retrieving a key/value pair
%freq
$freq = $freq{'Gs'};
print "Gs: $freq\n";
Output: Gs: 3
Hash Variables
Retrieving a key/value pair
%freq
$nuc = 'Gs';
print "$nuc: $freq{$nuc}\n";
Output: Gs: 3
Hash Variables
Retrieving a key/value pair
%freq
foreach my $nuc (keys %freq) {
print "$nuc: $freq{$nuc}\n";
}
Output (in no specific order):
Cs: 1
Ts: 3
Gs: 3
As: 1
Hash Variables
Retrieving a key/value pair
%freq
foreach my $nuc (sort keys %freq) {
print "$nuc: $freq{$nuc}\n";
}
Output:
As: 1
Cs: 1
Gs: 3
Ts: 3
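Sorting is not limited to alphabetical key order. As a sketch that goes slightly beyond the slide, a custom comparison block can order the keys by their values instead; the array name `@by_count` and the tie-breaking rule are choices made for this example.

```perl
use strict;
use warnings;

my %freq = ('As' => 1, 'Cs' => 1, 'Gs' => 3, 'Ts' => 3);

# Highest count first; keys with equal counts are ordered alphabetically.
my @by_count = sort { $freq{$b} <=> $freq{$a} or $a cmp $b } keys %freq;

foreach my $nuc (@by_count) {
    print "$nuc: $freq{$nuc}\n";
}
# prints:
# Gs: 3
# Ts: 3
# As: 1
# Cs: 1
```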
Hash Variables
Checking for keys/values
# does the key exist?
if (exists $hash{$key}) {
}
# does the key have a defined value?
if (defined $hash{$key}) {
}
# does the key have a true value (not undef, 0, '0' or '')?
if ($hash{$key}) {
}
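The three checks give different answers when a value is 0 or undefined. The sketch below runs all three on a small hash made up for this purpose; the key 'T' is deliberately missing.

```perl
use strict;
use warnings;

# Hypothetical hash with one zero count and one undefined value.
my %hash = ('A' => 2, 'C' => 0, 'G' => undef);

my @report;
foreach my $key ('A', 'C', 'G', 'T') {
    my $exists  = exists($hash{$key})  ? 'yes' : 'no';
    my $defined = defined($hash{$key}) ? 'yes' : 'no';
    my $true    = $hash{$key}          ? 'yes' : 'no';
    push @report, "$key: exists=$exists defined=$defined true=$true";
}
print "$_\n" foreach @report;
# prints:
# A: exists=yes defined=yes true=yes
# C: exists=yes defined=yes true=no
# G: exists=yes defined=no true=no
# T: exists=no defined=no true=no
```

For counting, this means a plain `if ($hash{$key})` skips keys whose count is 0, while `exists` still finds them.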
Computer Programming for Biologists
Exercises
Use hashes in your sequence analysis tool for:
- reporting frequencies of nucleotides or amino acids
- reporting the GC content