Welcome to Introduction to Bioinformatics
Wednesday, 19 September 2014

Scenario 2: Regulatory protein / Simulations
- Array indexing
- List notation
- Push/Pop Shift/Unshift
- foreach / for
- Array-based DiceRoll.pl

Implementing a Simulation
General Strategy
Problem: How frequent are NtcA binding sites in a random DNA sequence?
Which random sequence?

Differentiation in cyanobacteria
Find primers to PCR out hetC

ttgtcagttgtcagacgtagtagcgcgtctagtctaatgtgttgttatat
tatttgctactagaaatgaggagagggttatttttctcactgcttcccaa
ttctatgagaatataaaattttccttaagtttctcatggcaataatggaa
aaaaccgaccattctgatgaataagtccggttttttccaaaaaatatttt
tgctttttcgctttatttatctatatttccaagttttagtacatcggtga
ggggtgacaactatcttgccaatattgtcgttattgttaggttgctatcg
gaaaaaatcTGTAacatgagaTACAcaatagcatttatatttgctttagt
atctctctcttgggtgggattctgcctgcaatttaaaaaccagtgttaac
aattttcggctttattttccgggagttaaatcaaccaagggaaaatgtaa
ctaatgtttaaatatcttcggatacacacaaagtaaaaccaatttttaca
gatgtcgatgttgctcacattttttagaaatattactaaattaaaaatgt
tattaaatttatgttcatagagaaccttttccaaataaaaaaataatttt
cctgatgttttaagaaaattactgttgttataaattaaaggtgattcaac
aaaatatagatagttctttcaataactatctacttttaccattaagtgaa
cttactcatgaataatcaacaggaattaaaaataaagttcatgaatactg
gttaaagattcagtaaagtttgaggaaataccggaataaatttccaccca
aatatgattttttaaaagatacattggcagtacattaaaatgccgatgtt

Pattern highlighted in the sequence: GTA…(8)…TAC

Differentiation in cyanobacteria

ttctatgagaatataaaattttccttaagtttct
aaaaccgaccattctgatgaataagtccggtttt
tgctttttcgctttatttatctatatttccaagt
ggggtgacaactatcttgccaatattgtcgttat
gaaaaaatctGTAacatgagaTACacaatagcat
ttatatttgcttTAgtaTctctctcttgggtggg

Pattern: GTA…(8)…TAC…(20-24)…TAnnnT
Labels: NtcA binding site (GTA…(8)…TAC), Promoter (TAnnnT)

Implementing a Simulation
General Strategy
Student question: "I still don't entirely understand why we only need to create 847 bp."

Implementing a Simulation
General Strategy
Problem: How frequent are NtcA binding sites in a random DNA sequence?
Strategy: Modify DiceRoll.pl
- (change to use arrays)
- Modify Make_random_sequence (SQ. 1-3)
- Change Random_integer to Random nucleotide (SQ. 4)
- Modify Any_matches, test for exact match (SQ. 5)
- Modify Any_matches, allow inexact matches (SQ. 7-11)

The Simulation

The Alternative: Straight Math
SQ1. Probability of getting at least one matched pair in a roll of five dice.
Student comment: "I don't remember combination and permutation math very well."

Probability (0 dice matching) =
Probability (1 dice matching) =
Probability (2 dice matching) =
Probability (3 dice matching) =
Probability (4 dice matching) =
Probability (5 dice matching) =

[Figure labels: Roll that doesn’t work / Roll that works]

SQ3: push / pop / shift / unshift
Student comment: "I am just learning about push, pop, shift, and unshift in 600. A quick review of all of these would greatly help."

SQ3: push / pop / shift / unshift
[Diagram: YGRP]

Arrays: Assignment and Access

    @codons =  ATG   GAT   GCT   TAT   TTT   CAA   ...   TAA
    Index:     0     1     2     3     4     5     ...   n
    Memory:    3200  3203  3206                    ...   ????

Which $codons[ ] is GCT?
Where is $codons[1]? 3200 + 3
Where is $codons[2]? 3200 + 6
Where is $codons[n]? 3200 + 3*n

Arrays: Assignment and Access
Scalar assignment of array values:

    my @days;
    $days[0] = "Sun";
    $days[1] = "Mon";
    ...

Array assignment of array values:

    my @days = ("Sun", "Mon", ...);
    my @numbers = (1 .. 47);
    print @numbers;

Arrays: Assignment and Access
SQ2. @letters contains all uppercase letters. How to print the letter "J"?

    my @letters = ______
    print ______

SQ3: push / pop / shift / unshift
SQ3. Predict output of:

    @protein = ("cytochrome oxidase","hexokinase","glutamine synthetase");
    push @protein, "phosphofructokinase", "albumin";
    $protein[1] = "deleted";
    unshift @protein, "globin";
    $name1 = pop @protein;
    $name2 = shift @protein;
    $name3 = shift @protein;
    print "name1 = $name1 name2 = $name2 name3 = $name3", $LF;
    print "current protein[2] = $protein[2]", $LF;
    print "remaining names: ", join(", ", @protein);

SQ4: DiceRoll if with arrays
SQ4. Rewrite these lines to use an array

    if ($number_of_ones >= $matches_wanted) { return $true }
    if ($number_of_twos >= $matches_wanted) { return $true }
    ...
    if ($number_of_sixes >= $matches_wanted) { return $true }

for loops
Problem: Add up the numbers from 1 to 100
- Where to begin?
- Where to end?
- How to get from here to there?
- What to do in between?

    for (my $number = 1; $number <= 100; $number = $number + 1) {
        $sum = $sum + $number;
    }

foreach loops
Problem: Add up the numbers from 1 to 100
- Where to begin?
- Where to end?
- How to get from here to there?
- What to do in between?

    foreach my $number (1 .. 100) {
        $sum = $sum + $number;
    }

foreach loops
SQ5. Write a loop that prints out a table of numbers from 1 to 20 and their squares.

SQ6: Rewrite DiceRoll
SQ6. Replace $number_of_ones and similar variables with an array.
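
One possible approach to SQ4 and SQ6 is sketched below. It is not taken from DiceRoll.pl itself. The helper name count_faces, the lowercase any_matches, and the fixed example roll are assumptions made for illustration; only $true and $matches_wanted come from the slides. The idea is to keep one counter per die face in an array, so a single foreach loop can replace the six separate if statements.

```perl
use strict;
use warnings;

my $true  = 1;
my $false = 0;

# Hypothetical helper: tally how many times each face (1..6) appears.
# $count[0] is left unused so that the array index equals the face value.
sub count_faces {
    my @roll  = @_;
    my @count = (0) x 7;
    foreach my $die (@roll) {
        $count[$die] = $count[$die] + 1;
    }
    return @count;
}

# Return true if any face appears at least $matches_wanted times.
# One loop over the faces replaces the six separate if statements.
sub any_matches {
    my ($matches_wanted, @roll) = @_;
    my @count = count_faces(@roll);
    foreach my $face (1 .. 6) {
        if ($count[$face] >= $matches_wanted) { return $true }
    }
    return $false;
}

my @roll = (3, 5, 3, 1, 6);    # fixed example roll (an assumption), not random
print "pair found\n" if any_matches(2, @roll);
```

With this example roll the face 3 appears twice, so the script prints "pair found". Leaving index 0 unused costs one array slot but avoids subtracting 1 from every face value.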