8ex.1 References and complex data structures

8ex.2 Hash: an associative array

An associative array (or simply a hash) is an unordered set of key=>value pairs. Each key is associated with a value. A hash variable name always starts with a “%”:

    my %h = ("a"=>5, "bob"=>"zzz", 50=>"Johnny");

You can access a value by its key:

    print $h{50}.$h{"a"};      # prints: Johnny5
    $h{"bob"} = "aaa";         # modifying an existing value
    $h{555} = "z";             # adding a new key-value pair

8ex.3 Iterating over hash elements

To iterate over the keys in %h:

    foreach my $key (keys(%h))...

For example:

    foreach my $key (keys(%h)) {
        print "The key is $key\n";
        print "The value is $h{$key}\n";
    }

The elements are given in an arbitrary order, so if you want a certain order use sort:

    foreach my $key (sort(keys(%h)))...

8ex.4 Why do we need complex data structures?

So far, we know two types of data structures:

An Array is an ordered list of scalar values:

    my @names = ("Shmuel", "Moti", "Rahel");

A Hash is an unordered set of pairs of scalar values:

    my %phoneBook = ("Shmuel"=>5820, "Moti"=>2745);

However, in many situations we may need to store more complex data records. For example: how do we keep the phone number, address and list of grades for each student in a course?
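A first attempt might simply nest one list inside another, but Perl flattens nested lists, so inner parentheses do not create an inner structure. The following is a minimal sketch of that behaviour; the variable names @record and %flat are hypothetical and chosen only for this illustration.

```perl
use strict;
use warnings;

# Inner parentheses do not nest: Perl flattens everything into one list.
my @record = (5820, "34 HaShalom St.", (85, 91, 67));
my $count  = scalar(@record);
print "Number of elements: $count\n";      # 5 elements, not 3
print "Last element: $record[4]\n";        # 67

# The same happens in a hash: the flattened list is read as key/value pairs,
# so the address and one of the grades end up as keys.
my %flat = ("Shmuel" => (5820, "34 HaShalom St.", (85, 91, 67)));
my $numKeys = scalar(keys %flat);
print "Keys in the hash: ", join(", ", sort keys %flat), "\n";
```

Here the hash ends up with three unrelated keys instead of one student record, which is why a scalar that can stand for a whole array or hash is needed.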
We would like a data record that looks like this:

    "Shmuel" => (5820, "34 HaShalom St.", (85,91,67))

For this to work we’re going to need references…

8ex.5 Variable types in Perl

    Scalar:  $number      -3.54
             $string      "hi\n"
             $reference   0x225d14
    Array:   @array
    Hash:    %hash

[Diagram: a hash %hash whose values point to the arrays @array1, @array2 and @array3]

8ex.6 References

A reference to a variable is a scalar value that “points” to the variable:

    $nameRef = \$name;               # $nameRef points to $name
    @grades = (85,91,67);
    $gradesRef = \@grades;           # $gradesRef points to @grades
    $phoneBookRef = \%phoneBook;     # $phoneBookRef points to %phoneBook

8ex.7 References

We can also make an anonymous reference, without creating a variable with a name: [ITEMS] creates a new, anonymous array and returns a reference to it; {ITEMS} does the same for a hash:

    $arrayRef = [85,91,67];
    $hashRef = {85=>4,91=>3};

(The array and the hash created here have no variable name of their own.)

8ex.8 De-referencing

    $nameRef = \$name;
    $gradesRef = \@grades;
    $phoneBookRef = \%phoneBook;
    print $gradesRef;                # prints something like: ARRAY(0x225d14)

To access the data from a reference we need to dereference it:

    print $$nameRef;                 # prints: Yossi
    print "@$gradesRef";             # prints: 85 91 67
    $$gradesRef[3] = 100;
    print "@grades";                 # prints: 85 91 67 100
    $phoneNumber = $$phoneBookRef{"Yossi"};

100 was added to the original array @grades!
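The reference and dereference steps above can be put together in one runnable sketch. The initial values of $name and %phoneBook are assumptions made for this example (the slides do not show their declarations), and ref() is Perl's built-in function that reports what kind of thing a reference points to.

```perl
use strict;
use warnings;

# Assumed initial values; the slides only show the references being taken.
my $name      = "Yossi";
my @grades    = (85, 91, 67);
my %phoneBook = ("Yossi" => 3744);

my $nameRef      = \$name;
my $gradesRef    = \@grades;
my $phoneBookRef = \%phoneBook;

# ref() tells us what kind of variable each reference points to.
my $kind = ref($gradesRef);
print "$kind\n";                         # ARRAY

# Dereferencing gives access to the original data.
print "$$nameRef\n";                     # Yossi
print "@$gradesRef\n";                   # 85 91 67

# Writing through the reference changes the original array.
$$gradesRef[3] = 100;
print "@grades\n";                       # 85 91 67 100

my $phoneNumber = $$phoneBookRef{"Yossi"};
print "$phoneNumber\n";                  # 3744

# Anonymous array and hash: only the references have names.
my $arrayRef = [85, 91, 67];
my $hashRef  = {85 => 4, 91 => 3};
my $numItems = scalar(@$arrayRef);
my @hashKeys = sort keys %$hashRef;
print "$numItems items; keys: @hashKeys\n";
```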
8ex.9 De-referencing

    $gradesRef = \@grades;
    $phoneBookRef = \%phoneBook;
    print "@$gradesRef";             # prints: 85 91 67
    $$gradesRef[3] = 100;
    $phoneNumber = $$phoneBookRef{"Yossi"};

The following notation is equivalent, and is sometimes more readable:

    $gradesRef->[3] = 100;
    $phoneNumber = $phoneBookRef->{"Yossi"};

8ex.10 References allow complex structures

Because a reference is a scalar value, we can store a reference to an array or hash as an element of another array or hash:

    %students:  NAME => [GRADES]

    @grades = (85,91,67);
    %students = ("Yossi" => \@grades);
    $students{"Yossi"} = \@grades;       # equivalent to the line above
    $students{"Shmuel"} = [83,76];

Now the key “Yossi” is paired with a reference value:

    print $students{"Yossi"};            # prints something like: ARRAY(0x22e714)
    print "@{$students{'Yossi'}}";       # prints: 85 91 67
    print ${$students{"Yossi"}}[1];      # prints: 91
    print $students{"Yossi"}->[1];       # prints: 91

The last form is more readable, and we strongly recommend it…

8ex.11 References allow complex structures

Now we can do it: “how to keep the phone number, address and list of grades for each student in a course?”

    %students:  NAME => {"phone"   => PHONE,
                         "address" => ADDRESS,
                         "grades"  => [GRADES]}

    $students{"Yossi"} =
        {"phone"=>3744,
         "address"=>"34 HaShalom St.",
         "grades"=>[93,72,87]};
    $students{"Rahel"} =
        {"phone"=>5732,
         "address"=>"5 Bazel St.",
         "grades"=>[91,86,88]};

8ex.12 References allow complex structures

With the structure above in place, a single grade can be reached by following the references:

    print $students{"Yossi"}->{"grades"}->[2];     # prints: 87

It is more convenient to use a shorthand notation:

    print $students{"Yossi"}{"grades"}[2];

But remember that there are references in there!
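As a reminder that the shorthand still goes through references, the sketch below rebuilds the two student records from the slides and inspects the intermediate values with the built-in ref(). The extra grade 95 pushed onto Rahel's list is made up for the illustration.

```perl
use strict;
use warnings;

my %students;
$students{"Yossi"} = {"phone"   => 3744,
                      "address" => "34 HaShalom St.",
                      "grades"  => [93, 72, 87]};
$students{"Rahel"} = {"phone"   => 5732,
                      "address" => "5 Bazel St.",
                      "grades"  => [91, 86, 88]};

# Each level of the structure is a reference.
my $outer = ref($students{"Yossi"});             # HASH
my $inner = ref($students{"Yossi"}{"grades"});   # ARRAY
print "$outer $inner\n";

# Shorthand and arrow notation reach the same element.
my $viaArrows    = $students{"Yossi"}->{"grades"}->[2];
my $viaShorthand = $students{"Yossi"}{"grades"}[2];
print "$viaArrows $viaShorthand\n";              # 87 87

# To treat the inner list as a whole array, dereference it with @{ ... }.
push @{ $students{"Rahel"}{"grades"} }, 95;      # 95 is a made-up grade
my $numGrades = scalar @{ $students{"Rahel"}{"grades"} };
my $last      = $students{"Rahel"}{"grades"}[-1];
print "Rahel has $numGrades grades, the last is $last\n";
```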
8ex.13 References allow complex structures

The following code is an example of iterating over two levels of the structure: the top hash (each student) and the internal arrays (lists of grades):

    %students:  NAME => {"phone"   => PHONE,
                         "address" => ADDRESS,
                         "grades"  => [GRADES]}

    foreach my $name (keys(%students)) {
        foreach my $grade (@{$students{$name}->{"grades"}}) {
            print $grade;
        }
    }

8ex.14 The REUSED_ADDRESS problem

When building a complex data structure in a loop, you may come across a problem if you insert a non-anonymous array or hash into the data structure:

    my ($line, $id, @grades, %students);
    while ($line = <IN>) {
        ...
        @grades = ...
        $students{$id} = \@grades;
    }

Let’s see what happens when we enter the lines:

    a 86 73 89
    b 79 90 87
    c 100 90 93

8ex.15 The REUSED_ADDRESS problem

The debugger will show you that there is a problem.

[Figure: debugger screenshot]

8ex.16 The REUSED_ADDRESS problem

The problem is that for every student we store a reference to the same array. We have to create a new array in every iteration:

1. By using an anonymous array reference:

    $students{$id} = {GRADES=>[...], ...

2. Or by declaring the array (with my) inside the loop, so that a new one is created in every iteration:

    while ($line = <IN>) {
        my @grades = ...
        $students{$id} = \@grades;
    }

(You may have this problem with the multiple #RP fields in ex5.5)

8ex.17 Class exercise 10

    %genes:  PRODUCT => {"protein_id" => PROTEIN_ID,
                         "strand"     => STRAND,
                         "CDS"        => [START, END]}

1. Read the adenovirus genome file and build a hash of genes, where the key is the "product" name. For each gene store a hash with the protein ID. Print all keys (names) in the hash.
2. Add to the hash the strand of the gene on the genome: “+” for the sense strand and “-” for the antisense strand. Print all antisense genes.
3. Add to the hash an array of two coordinates: the start and end of the CDS. Print genes shorter than 500bp.
4. Print the product name of all genes on the sense strand whose CDS spans more than 1kbp, and all genes on the antisense strand whose CDS spans less than 500bp.

8ex.18 Two dimensional arrays

Now we can also create a 2-dimensional array (a table or a matrix):

    @table = ([1,2,3],[4,5,6],[7,8,9]);

    @table:  1 2 3
             4 5 6
             7 8 9

    print $table[1]->[0];        # prints: 4

Or:

    print $table[1][0];          # prints: 4
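A table like this is usually processed row by row, since each row is itself an array reference. The following is a minimal sketch that prints the table, sums its diagonal, and builds a transposed copy. The names $trace, @transposed and @column are hypothetical. Note that @column is declared with my inside the loop, so each new row gets its own array (the fix described on slide 8ex.16).

```perl
use strict;
use warnings;

my @table = ([1, 2, 3], [4, 5, 6], [7, 8, 9]);

# Each element of @table is a reference to one row.
foreach my $row (@table) {
    print "@$row\n";
}

# Sum of the diagonal: element [i][i] of every row.
my $trace = 0;
foreach my $i (0 .. $#table) {
    $trace += $table[$i][$i];
}
print "Diagonal sum: $trace\n";          # 15

# Build a transposed copy. Declaring @column inside the loop creates
# a fresh array in every iteration, so the rows do not share storage.
my @transposed;
foreach my $j (0 .. $#{ $table[0] }) {
    my @column = map { $_->[$j] } @table;
    push @transposed, \@column;
}
print "@{ $transposed[0] }\n";           # 1 4 7
```

Had @column been declared once outside the loop, every element of @transposed would refer to the same array, and all rows would show the values of the last column.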