Computer Programming for Biologists
Class 7
Nov 27th, 2014
Karsten Hokamp
http://bioinf.gen.tcd.ie/GE3M25/programming
Hash Variables
Description
associative arrays
list of key/value pairs
keys and values are scalars
access values by key names
Great for look-ups!
Hash Variables
Look-up Table
Look-up table in real life, for translation (genetic code):
Codon   Amino acid
AAA     K
AAC     N
AAG     K
AAU     N
…       …
UUG     L
UUU     F
In Perl use hash variable:
%genetic_code = (
'AAA' => 'K',
'AAC' => 'N',
'AAG' => 'K',
'AAU' => 'N',
…
'UUG' => 'L',
'UUU' => 'F'
);
Keys are unique!
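A minimal sketch of how such a look-up hash might be used for translation, and of what "keys are unique" means in practice. Only the six codons shown on the slide are included; the sequence in `$rna` and the hash `%demo` are made-up examples, not part of the course material.

```perl
use strict;
use warnings;

# Excerpt of the genetic code: only the codons shown on the slide.
my %genetic_code = (
    'AAA' => 'K',
    'AAC' => 'N',
    'AAG' => 'K',
    'AAU' => 'N',
    'UUG' => 'L',
    'UUU' => 'F',
);

# Translate a short, hypothetical RNA sequence codon by codon.
my $rna     = 'AAAUUUAAC';
my $protein = '';
foreach my $codon ($rna =~ /(...)/g) {    # split into triplets
    $protein .= $genetic_code{$codon};    # look up the amino acid
}
print "$protein\n";    # prints KFN

# Keys are unique: a repeated key overwrites the earlier value.
my %demo = ('AAA' => 'K', 'AAA' => 'X');
my $count = keys %demo;                   # number of keys
print "$count key, value $demo{'AAA'}\n"; # prints: 1 key, value X
```

Several codons can map to the same amino acid (values may repeat), but each codon appears only once as a key.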
Hash Variables
Examples
%bases = ('a', 'purine',
          'c', 'pyrimidine',
          'g', 'purine',
          't', 'pyrimidine');
%complement = ('a' => 't',
               'c' => 'g',
               'g' => 'c',
               't' => 'a');
%letters = (1, 'a', 2, 'b', 3, 'c', 4, 'd');
Hashes: Lists with special relationship between each pair of elements!
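The point that a hash is a list of pairs can be checked directly. The sketch below builds the same hash once with plain commas and once with `=>`, then flattens it back into a list; the variable names are illustrative only.

```perl
use strict;
use warnings;

# The same key/value pairs, written with commas and with '=>'.
my %with_commas = ('a', 't', 'c', 'g', 'g', 'c', 't', 'a');
my %with_arrows = ('a' => 't', 'c' => 'g', 'g' => 'c', 't' => 'a');

# Compare the two hashes key by key.
my $same = 1;
foreach my $base (keys %with_arrows) {
    $same = 0 if $with_commas{$base} ne $with_arrows{$base};
}
print "Both forms give the same hash\n" if $same;

# In list context a hash turns back into a flat list of keys and values.
my @flat = %with_arrows;
print scalar(@flat), " elements\n";    # prints: 8 elements
```

The `=>` form is usually easier to read because it makes the pairing of key and value visible.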
Hash Variables
Storing Data
# count frequency of nucleotides:
my $As = 0; my $Cs = 0; my $Gs = 0; my $Ts = 0;
foreach my $nuc (split //, $dna) {
    if ($nuc eq 'A') {
        $As++;
    } elsif ($nuc eq 'C') {
        $Cs++;
    } elsif ($nuc eq 'G') {
        $Gs++;
    } elsif ($nuc eq 'T') {
        $Ts++;
    }
}
Hash Variables
Storing Data
# count frequency of nucleotides:
my %freq = ();
foreach my $nuc (split //, $dna) {
$freq{$nuc}++;
}
Hash Variables
Storing Data
# count frequency of nucleotides:
my %freq = ();
foreach my $nuc (split //, 'ACTTGGGT') {
$freq{$nuc}++;
}
auto-initialisation with '' or 0
key   value
A     1
C     1
G     3
T     3
keys are stored in no specific order
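Both kinds of auto-initialisation can be seen side by side in a small sketch: `++` treats a missing value as 0, while `.=` treats it as the empty string. The second hash, `%positions`, is a hypothetical addition for illustration.

```perl
use strict;
use warnings;

my $dna = 'ACTTGGGT';
my %freq;         # nucleotide => count
my %positions;    # nucleotide => string of positions (illustrative)

my $pos = 0;
foreach my $nuc (split //, $dna) {
    $pos++;
    $freq{$nuc}++;                  # missing value starts as 0
    $positions{$nuc} .= "$pos ";    # missing value starts as ''
}

# Sort the keys, because a hash has no specific order of its own.
foreach my $nuc (sort keys %freq) {
    print "$nuc: $freq{$nuc} (positions: $positions{$nuc})\n";
}
```

No key has to be declared in advance; each one is created the first time it is used.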
Hash Variables
Scalar vs Hash
$As = 0;
$Cs = 0;
$Gs = 0;
$Ts = 0;
Hash Variables
Scalar vs Hash
$As = 0;
$As++;
$Cs = 0;
$Cs++;
$Gs = 0;
$Gs++;
$Ts = 0;
$Ts++;
Hash Variables
Scalar vs Hash
# four separate scalars:
$As = 0;  $As++;
$Cs = 0;  $Cs++;
$Gs = 0;  $Gs++;
$Ts = 0;  $Ts++;
# one hash (freq) holding all counts:
%freq = ();
$freq{'Gs'}++;
Computer Programming for Biologists
Exercises
Practical:
http://bioinf.gen.tcd.ie/GE3M25/programming/class7
Hash Variables
Accessing Elements
General:
$value = $hash{$key};
Special functions: keys and values
# get complement of a base
my $new_base = $complement{$base};
# get amino acid for a codon
my $aa = $genetic_code{$codon};
# list all the aa's that occurred
foreach my $aa (keys %list) {
print "$aa was found!\n";
}
loop through all keys
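As a rough illustration of look-ups plus the `keys` and `values` functions, the sketch below builds a reverse complement with `%complement`. The input in `$seq` is a made-up example.

```perl
use strict;
use warnings;

my %complement = ('a' => 't', 'c' => 'g', 'g' => 'c', 't' => 'a');

# keys and values each return a list; sort gives a stable order.
my @bases    = sort keys %complement;
my @partners = sort values %complement;
print "keys:   @bases\n";       # prints: keys:   a c g t
print "values: @partners\n";    # prints: values: a c g t

# Reverse complement: walk the sequence backwards, look up each base.
my $seq     = 'aacgt';          # hypothetical input
my $revcomp = '';
foreach my $base (reverse split //, $seq) {
    $revcomp .= $complement{$base};
}
print "$revcomp\n";             # prints acgtt
```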
Hash Variables
Retrieving a key/value pair
%freq
$freq = $freq{'Gs'};
print "Gs: $freq\n";
Gs: 3
Hash Variables
Retrieving a key/value pair
%freq
$nuc = 'Gs';
print "$nuc: $freq{$nuc}\n";
Gs: 3
Hash Variables
Retrieving a key/value pair
%freq
foreach my $nuc (keys %freq) {
print "$nuc: $freq{$nuc}\n";
}
Cs: 1
Ts: 3
Gs: 3
As: 1
Hash Variables
Retrieving a key/value pair
%freq
foreach my $nuc (sort keys %freq) {
print "$nuc: $freq{$nuc}\n";
}
As: 1
Cs: 1
Gs: 3
Ts: 3
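Sorting by key is one option; sorting by value is another common need when reporting frequencies. A possible sketch, using the counts from the slides and falling back to key order when counts are tied:

```perl
use strict;
use warnings;

my %freq = ('As' => 1, 'Cs' => 1, 'Gs' => 3, 'Ts' => 3);

# Highest count first; equal counts are ordered alphabetically by key.
my @by_count = sort { $freq{$b} <=> $freq{$a} or $a cmp $b } keys %freq;

foreach my $nuc (@by_count) {
    print "$nuc: $freq{$nuc}\n";
}
# prints Gs, Ts, As, Cs (one per line, each with its count)
```

Inside the sort block, `$a` and `$b` are the two keys being compared, and `<=>` compares their values numerically.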
Hash Variables
Checking for keys/values
# does the key exist?
if (exists $hash{$key}) {
}
# does the key have a defined value?
if (defined $hash{$key}) {
}
# does the key have a true value (not undef, 0, '0' or '')?
if ($hash{$key}) {
}
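The three checks give different answers for values such as 0, the empty string and undef. The sketch below runs all three on a small hypothetical hash; the keys and the `%report` hash exist only for this demonstration.

```perl
use strict;
use warnings;

# Hypothetical test hash with one value of each interesting kind.
my %hash = ('one' => 1, 'zero' => 0, 'empty' => '', 'undefined' => undef);

my %report;    # key => "exists/defined/true"
foreach my $key ('one', 'zero', 'empty', 'undefined', 'missing') {
    my $exists  = exists($hash{$key})  ? 'yes' : 'no';
    my $defined = defined($hash{$key}) ? 'yes' : 'no';
    my $true    = $hash{$key}          ? 'yes' : 'no';
    $report{$key} = "$exists/$defined/$true";
    printf "%-10s exists: %-3s  defined: %-3s  true: %s\n",
        $key, $exists, $defined, $true;
}
```

When counting, a stored 0 is still a real entry, so `exists` is usually the right test for "have I seen this key before?".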
Computer Programming for Biologists
Exercises
Use hashes in your sequence analysis tool for:
- reporting frequencies of nucleotides or amino acids
- reporting the GC content
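One possible starting point for the GC content, assuming the sequence is held in `$dna` as an upper-case string; this is only a sketch and not the expected solution.

```perl
use strict;
use warnings;

my $dna = 'ACTTGGGT';    # example sequence from the earlier slides

# Count every nucleotide with a hash.
my %freq;
foreach my $nuc (split //, $dna) {
    $freq{$nuc}++;
}

# Use 0 for G or C if that nucleotide never occurred.
my $gc         = ($freq{'G'} || 0) + ($freq{'C'} || 0);
my $gc_content = 100 * $gc / length($dna);

printf "GC content: %.1f%%\n", $gc_content;    # prints: GC content: 50.0%
```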