References and Data Structures
References
• Just as in C, you can create a variable that is a reference (or
pointer) to another variable. That is, it contains the address in
memory where the other variable is stored.
• In Perl, the backslash is used to create a reference:
my $var = 5;
my $var_ref = \$var;
• To dereference a simple reference, put it inside curly braces with
another $ in front of it. Thus, ${$var_ref} is the same as $var, that is,
the value “5”.
• The curly braces de-reference what is inside them. I like to say
“{$var_ref} ‘generates’ the scalar variable”.
• In many cases you can leave the curly braces out: $$var_ref works
just as well as ${$var_ref}. But, in complicated expressions this can
cause havoc due to precedence problems.
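A minimal sketch of the points above, using only core Perl. The names @refs and $first are made up for this illustration. It shows that assigning through a reference changes the original variable, and one case where the curly braces decide what gets dereferenced.

```perl
use strict;
use warnings;

my $var     = 5;
my $var_ref = \$var;

# Both forms dereference the same scalar.
print "With braces: ", ${$var_ref}, ", without: ", $$var_ref, "\n";

# Assigning through the reference changes the original variable.
${$var_ref} = 6;
print "\$var is now $var\n";

# Precedence: with an array of references, the braces decide what is dereferenced.
my @refs  = (\$var);          # hypothetical array holding one scalar reference
my $first = ${ $refs[0] };    # dereference the element $refs[0]
# Writing $$refs[0] instead would mean ${$refs}[0], that is, element 0 of an
# array referenced by a scalar named $refs, which does not exist here.
print "First referenced value: $first\n";
```

When in doubt, keeping the braces costs a few keystrokes and removes the ambiguity.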
More References
• This same trick works for arrays and hashes too.
my @arr = qw(cow horse pig chicken);
my $arr_ref = \@arr;
print "Farm animals include @{$arr_ref}\n";
# can leave out {} here: @$arr_ref
my %hash = ("red" => "stop", "yellow" => "caution", "green" => "go");
my $hash_ref = \%hash;
foreach my $key (keys %{$hash_ref} ) {
    print "$key means ${$hash_ref}{$key}\n";
}
# The {} can be left out of the "foreach" line (keys %$hash_ref) and also of
# the print line ($$hash_ref{$key}), but the braces make the intent easier to read.
Arrow Notation
• Perl provides an alternative notation for use with array
and hash references. The small arrow (hyphen followed
by greater-than: ->) de-references. To access individual
array or hash elements, follow the arrow with [] or {}.
• For example:
my @arr = (1, 3, 5, 7);
my $arr_ref = \@arr;
for (my $i = 0; $i <= $#{$arr_ref}; $i++) {
    print "Element $i is $arr_ref->[$i]\n";
}
• Similarly, hash keys would be placed inside curly braces
to access hash values from a hash reference.
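The hash case mentioned in the last bullet can be sketched as follows. This reuses the traffic-light hash from the earlier slide; the variable $same is an assumption added only to compare the two notations.

```perl
use strict;
use warnings;

my %hash = ("red" => "stop", "yellow" => "caution", "green" => "go");
my $hash_ref = \%hash;

# Arrow notation: the key goes inside curly braces after the arrow.
foreach my $key (sort keys %{$hash_ref}) {
    print "$key means $hash_ref->{$key}\n";
}

# The arrow form and the brace form reach the same element.
my $same = ($hash_ref->{red} eq ${$hash_ref}{red}) ? "yes" : "no";
print "Same value either way? $same\n";
```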
Passing Arrays In and Out of Subroutines
• One important use of references is passing arrays, hashes, and very long strings into and out of
subroutines.
• If you pass in a variable, it gets copied to a new location for use by the subroutine. If this is a very
long string, such as the DNA sequence of a chromosome, you will use a large amount of memory.
• However, if you pass a reference to that string to the subroutine, the string itself is not copied.
• Recall that variables are passed into a subroutine by the @_ array. For example:
process($var1, $var2, @arr);
sub process {
    my ($x, $y, @z) = @_;
    ...
}
• If you try to pass in 2 arrays, they both end up together in the first array inside the subroutine. That
is, Perl “flattens” multiple arrays into the single @_ array.
• The way around the problem of passing multiple arrays in or out of subroutines is to pass in
references, which are just scalar variables.
process($var1, @arr2, @arr3); # DOESN'T WORK: both arrays are flattened into @_
process($var1, \@arr2, \@arr3); # GOOD
sub process {
    my ($x, $arr_ref2, $arr_ref3) = @_;
    ...
}
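To make the flattening concrete, here is a small runnable sketch. The subroutine names count_both and count_flat are hypothetical, invented for this comparison. One receives two array references, the other receives the arrays directly.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lengths of two arrays passed by reference.
sub count_both {
    my ($label, $arr_ref2, $arr_ref3) = @_;
    return (scalar(@{$arr_ref2}), scalar(@{$arr_ref3}));
}

# For contrast: arrays passed directly are flattened into one list.
sub count_flat {
    my ($label, @everything) = @_;
    return scalar(@everything);
}

my @arr2 = (1, 2, 3);
my @arr3 = qw(a b);

my ($n2, $n3) = count_both("demo", \@arr2, \@arr3);
my $flat      = count_flat("demo", @arr2, @arr3);

print "By reference: $n2 and $n3 elements\n";
print "Flattened: $flat elements in a single array\n";
```

With references the subroutine can still tell the two arrays apart. With flattening, the boundary between them is lost.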
More on Subroutines
• Similarly, arrays are generally returned from subroutines
in the form of array references.
• Note in this example that the array @arr is created within
the subroutine, but returned as a reference. The name
“@arr” doesn’t exist outside the subroutine.
sub add_to {
    my @arr;
    for (my $i = 0; $i < 10; $i++) {
        $arr[$i] = $i + 2;
    }
    return \@arr;
}
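A short usage sketch for the subroutine above. The caller-side names $result_ref and $count are assumptions, not part of the original slide. The caller only ever sees the reference and dereferences it to get at the values.

```perl
use strict;
use warnings;

sub add_to {
    my @arr;
    for (my $i = 0; $i < 10; $i++) {
        $arr[$i] = $i + 2;
    }
    return \@arr;    # the array outlives the subroutine through this reference
}

my $result_ref = add_to();
my $count      = scalar(@{$result_ref});

print "Got $count values: @{$result_ref}\n";
print "Last value is $result_ref->[-1]\n";
```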
Multidimensional Arrays
• Arrays are one-dimensional: a linear set of elements.
• Suppose you want a two dimensional array, to keep
track of positions on a grid, for instance. Say, a tic-tac-toe game.
• Each row can be represented as a single array:
my @row1 = qw(X O O);
my @row2 = qw(O X O);
my @row3 = qw(X O X);
• Since the elements of an array are scalars, you can’t just
put the row arrays together in a big array to represent
the whole game board.
• However, array references are scalars, so the game
board could be represented by an array of references to
the sub-arrays:
my @game = (\@row1, \@row2, \@row3);
More on Multidimensional Arrays
• To access a row, you need to de-reference it:
print "Row 2 is @{$game[1]}\n";
• Note the position of the curly braces which do the de-referencing:
they surround $game[1], which is an array reference, \@row2.
• To access an individual element, say the first square in row 2:
print "${$game[1]}[0]\n";
• You see that the index value [0] for the individual element is
OUTSIDE the curly braces. The array reference is inside; once they
return the array, the $ at the beginning of the expression and the [0]
at the end of it access the individual element of that row.
Arrow Notation with Multidimensional Arrays
• You could also use arrow notation:
print "$game[1]->[0] ";
• Here, the arrow causes $game[1] to be dereferenced, at which point
you can access the individual element [0].
• Perl, in its helpful fashion, allows you to not use arrows between
indices. Thus, this also works:
print "$game[1][0]";
• In this case, @game is an actual array. If you instead used a
reference to an array here:
my $game_ref = \@game;
you would need to use the arrow between the variable name and the
first index value:
print "$game_ref->[1][0]";
• You can leave the arrows out between the indexes, but not between
the initial array reference and the first index.
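The notations from the last few slides can be compared side by side. This is a sketch that reuses the tic-tac-toe rows. The four result variables are names chosen for this example only.

```perl
use strict;
use warnings;

my @row1 = qw(X O O);
my @row2 = qw(O X O);
my @row3 = qw(X O X);
my @game = (\@row1, \@row2, \@row3);
my $game_ref = \@game;

# Four ways to read the first square of row 2; all reach the same element.
my $with_braces = ${$game[1]}[0];
my $with_arrow  = $game[1]->[0];
my $no_arrow    = $game[1][0];
my $via_ref     = $game_ref->[1][0];

print "Row 2, square 1: $with_braces $with_arrow $no_arrow $via_ref\n";

# Print the whole board, one row per line.
foreach my $row_ref (@game) {
    print join(" ", @{$row_ref}), "\n";
}
```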
Anonymous Arrays
• We have been creating an array such as @arr = (1, 3, 5, 7), then
creating a reference to that array: $arr_ref = \@arr.
• It isn’t necessary to do this in 2 steps. If we only want to use the
array reference, we can create an anonymous array and create an
array reference variable to refer to it. The anonymous array never
gets its own name; it is always referred to by its reference.
• Recall that to construct an array you put the array values within
parentheses:
@arr = (1, 3, 5, 7);
• The anonymous array constructor is square brackets: [].
$arr_ref = [1, 3, 5, 7];
• Using square brackets instead of parentheses generates a
reference to an anonymous array, which you assign to a variable. In
contrast, the parentheses generate the array itself, which must be
given an array designation starting with @.
More Anonymous Arrays
• We could create the tic-tac-toe game thus:
my @game = ( [ "X", "O", "O" ],
             [ "O", "X", "O" ],
             [ "X", "O", "X" ] );
• That is, we generate 3 anonymous arrays inside the parentheses
that create the top level array @game.
• Or, we could generate an anonymous array containing 3 references
to other anonymous arrays, and assign the whole mess to an array
reference scalar:
my $game_ref = [ [ "X", "O", "O" ],
                 [ "O", "X", "O" ],
                 [ "X", "O", "X" ] ];
• Here we use nested sets of anonymous array generators (square
brackets) to produce the array references we need.
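As an illustration of working with the nested anonymous arrays, the sketch below walks the main diagonal of the board through $game_ref. The diagonal check is an example added here, not part of the original slides. The names $size, @diagonal and $x_count are assumptions.

```perl
use strict;
use warnings;

my $game_ref = [ [ "X", "O", "O" ],
                 [ "O", "X", "O" ],
                 [ "X", "O", "X" ] ];

my $size     = scalar(@{$game_ref});                          # number of rows
my @diagonal = map { $game_ref->[$_][$_] } 0 .. $size - 1;    # squares (0,0), (1,1), (2,2)
my $x_count  = grep { $_ eq "X" } @diagonal;                  # grep in scalar context counts matches

print "Diagonal: @diagonal\n";
print "X holds the whole diagonal\n" if $x_count == $size;
```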
Using Temporary Arrays in a Loop
• Another way to create a 2 dimensional array is to create each row
as a temporary named array, then convert it to an anonymous array
reference and push it onto a larger array.
my @big_arr;
for (my $i = 0; $i <= 3; $i++) {
    my @temp_arr = ($i, $i*2, $i*$i);
    push @big_arr, [ @temp_arr ];
}
• The name @temp_arr gets used on every pass, but the values put into it are
placed in separate locations when it gets copied into an
anonymous array with [ @temp_arr ].
• There is a temptation to rewrite the “push” line as:
push @big_arr, \@temp_arr; # RISKY
• This goes wrong if @temp_arr is declared once outside the loop: \@temp_arr
then always refers to the same place in memory, so every row of @big_arr
points at the same array and shows only the last set of values. When
@temp_arr is declared with “my” inside the loop, as it is here, each pass gets
a fresh array and \@temp_arr is safe. In either case, [ @temp_arr ] copies the
values in @temp_arr to a new location with each pass through the loop.
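The difference between copying and sharing can be seen in a small experiment. This sketch assumes @temp_arr is declared once, outside the loop, which is the case where pushing \@temp_arr misbehaves. The array names @copied and @shared are made up for the comparison.

```perl
use strict;
use warnings;

my (@copied, @shared);
my @temp_arr;    # declared ONCE, outside the loop (the risky case)

for (my $i = 0; $i <= 3; $i++) {
    @temp_arr = ($i, $i*2, $i*$i);
    push @copied, [ @temp_arr ];    # new anonymous array holding a copy of the values
    push @shared, \@temp_arr;       # the same reference on every pass
}

print "copied row 1: @{$copied[1]}\n";
print "shared row 1: @{$shared[1]}\n";
```

Row 1 of @copied keeps the values from the second pass, while every row of @shared shows whatever was stored in @temp_arr last.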
Auto-vivification
• You don’t need to pre-declare anything about a
multidimensional array. Perl takes care of this
by creating all needed structures as soon as
they are needed. Thus, you could say
something like:
my @arr;
$arr[5][0][1][4] = 17;
• This would cause a 4-dimensional array to come
into being: Perl creates each nested array needed to reach the
element you specified, and every other element is left as “undef”.
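A sketch for inspecting what auto-vivification actually built. The variable names on the left-hand sides are assumptions for this example. Note that the checks only look at elements that already exist, since reading through a missing level would itself create it.

```perl
use strict;
use warnings;

my @arr;
$arr[5][0][1][4] = 17;

my $top_count  = scalar(@arr);                           # elements 0..5 now exist
my $untouched  = defined($arr[0]) ? "defined" : "undef"; # nothing was stored at index 0
my $inner_type = ref($arr[5]);                           # Perl created an array reference here
my $innermost  = scalar(@{ $arr[5][0][1] });             # elements 0..4 at the deepest level

print "Top level has $top_count elements; element 0 is $untouched\n";
print "Element 5 is a reference of type $inner_type\n";
print "Innermost array has $innermost elements; the last is $arr[5][0][1][4]\n";
```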
Hash of Arrays
• A hash stores a value that is indexed by its key. Sometimes you want to
store an array of values indexed by the same key. This can be done using
the anonymous array composer to create an array for each individual hash
key.
• For example, various data about students could be stored in a single hash
whose keys are the student ID numbers.
my %students = (
    "z12345" => ["Schmoe", "Joe", "freshman", "F"],
    "z67890" => ["Smith", "Harold", "sophomore", "C"],
    "z13579" => ["Vicious", "Nancy", "senior", "A"] );
• To access a student’s info:
print "@{$students{z12345}}\n";
• To access an individual piece of information, any of these will work:
print "${$students{z12345}}[3] ";
print "$students{z12345}->[3] ";
print "$students{z12345}[3] ";
• Note that $students{z12345} is a reference to an anonymous array.
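Building on the student example, the sketch below loops over the whole hash and then extends one student's array through its reference. The extra "probation" field and the variable $field_count are hypothetical additions for illustration.

```perl
use strict;
use warnings;

my %students = (
    "z12345" => ["Schmoe",  "Joe",    "freshman",  "F"],
    "z67890" => ["Smith",   "Harold", "sophomore", "C"],
    "z13579" => ["Vicious", "Nancy",  "senior",    "A"],
);

# Unpack each anonymous array into named scalars.
foreach my $id (sort keys %students) {
    my ($last, $first, $year, $grade) = @{ $students{$id} };
    print "$id: $first $last ($year), grade $grade\n";
}

# Add a value to one student's array through the reference.
push @{ $students{z12345} }, "probation";    # hypothetical extra field
my $field_count = scalar(@{ $students{z12345} });
print "z12345 now has $field_count fields\n";
```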
Anonymous Hashes
• The anonymous hash generator is the curly braces {}.
When used instead of parentheses, they generate a
scalar reference to an anonymous hash.
• For example:
my %hash = ("green" => "go", "yellow" => "caution",
            "red" => "stop");
my $hash_ref = {"green" => "go", "yellow" => "caution",
                "red" => "stop"};
• Hash references are de-referenced just like array
references:
print "A red light means $hash_ref->{red}\n";
print "A red light means ${$hash_ref}{red}\n";
Array of Hashes
• The anonymous hash composer can be used to create various data
structures. An array that contains a set of hash references is an example.
• An example: an array of genes on a chromosome, where the position of the
gene in the array corresponds to its relative position on the chromosome.
Information about each gene is stored in a hash.
• For example, assume that INFILE contains information about genes, one
gene per line, in a “key = value” format, with each attribute separated by
commas.
my @gene_arr;
while (<INFILE>) {
    chomp;    # strip the newline so it does not end up in the last value
    my @attributes = split /\s*,\s*/;
    my %temp_hash;
    foreach my $pair (@attributes) {
        # allow optional spaces around "=", as in "key = value"
        my ($key, $value) = split /\s*=\s*/, $pair;
        $temp_hash{$key} = $value;
    }
    push @gene_arr, { %temp_hash };
}
Printing from Array of Hashes
• To print an individual element, say the length of gene 1.
print "$gene_arr[1]{length}\n";
• To print the whole thing:
foreach my $i (0 .. $#gene_arr) {
    foreach my $key (sort keys %{$gene_arr[$i]} ) {
        print "$key = $gene_arr[$i]{$key}\n";
    }
}
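For trying this out without an input file, here is a self-contained sketch that builds the array of hashes from a few in-memory lines and then prints from it. The sample lines and the gene attributes in them (name, length, strand) are invented test data, not real genes. @lines stands in for INFILE.

```perl
use strict;
use warnings;

# Made-up sample input in "key = value" format, one gene per line.
my @lines = (
    "name = abc1, length = 1200, strand = +",
    "name = xyz2, length = 850, strand = -",
);

my @gene_arr;
foreach my $line (@lines) {
    my %temp_hash;
    foreach my $pair (split /\s*,\s*/, $line) {
        my ($key, $value) = split /\s*=\s*/, $pair;
        $temp_hash{$key} = $value;
    }
    push @gene_arr, { %temp_hash };    # copy into a new anonymous hash
}

# One element, then the whole structure.
print "Length of gene 1: $gene_arr[1]{length}\n";
foreach my $i (0 .. $#gene_arr) {
    foreach my $key (sort keys %{ $gene_arr[$i] }) {
        print "gene $i: $key = $gene_arr[$i]{$key}\n";
    }
}
```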
Hash of Hashes
• Here’s a hash of hashes example, based on the previous example of genes
on the chromosome. Here we are using a top level hash whose keys are
the gene names.
• The input file has the gene name followed by a colon, followed by a
comma-separated list of key=value pairs.
my %gene_hash;
while (<INFILE>) {
    chomp;    # strip the newline so it does not end up in the last value
    # limit of 2: split only at the first colon
    my ($gene, $rest) = split /\s*:\s*/, $_, 2;
    my @pairs = split /,/, $rest;
    my %temp_hash;
    foreach my $pair (@pairs) {
        my ($key, $value) = split /=/, $pair;
        $temp_hash{$key} = $value;
    }
    $gene_hash{$gene} = { %temp_hash };
}
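The same structure can be built and traversed without a file. This sketch uses invented sample lines in place of INFILE (the gene names and attributes are made up) and prints every attribute of every gene with two nested loops.

```perl
use strict;
use warnings;

# Made-up sample input: gene name, colon, comma-separated key=value pairs.
my @lines = (
    "abc1: length=1200,strand=+",
    "xyz2: length=850,strand=-",
);

my %gene_hash;
foreach my $line (@lines) {
    my ($gene, $rest) = split /\s*:\s*/, $line, 2;
    my %temp_hash;
    foreach my $pair (split /,/, $rest) {
        my ($key, $value) = split /=/, $pair;
        $temp_hash{$key} = $value;
    }
    $gene_hash{$gene} = { %temp_hash };
}

# Outer loop over gene names, inner loop over each gene's attributes.
foreach my $gene (sort keys %gene_hash) {
    foreach my $key (sort keys %{ $gene_hash{$gene} }) {
        print "$gene: $key = $gene_hash{$gene}{$key}\n";
    }
}
```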
Further
• All kinds of data structures are possible,
with as many levels as you like, mixing
arrays and hashes freely. All you have to
do is not get yourself confused by your
own cleverness.
• Also, remember that someone else will
probably have to read your code someday,
so document the structures and avoid
needless complications.