Download Lecture II - Baylor School of Engineering & Computer Science

Perl, Part I: A Biology Primer

Conceptual Biology

• H. sapiens did not create the genetic code, but they did invent the transistor
• Biological life is not optimized: the modern synthesis
• Nature vs. nurture
• What are the best ways to understand the important differences that make the difference?

A Molecular Primer

• Hierarchy of the eukaryote
    • Organism > System > Organ > Tissue > Cell > Organelle > Protein > RNA > DNA
• Put simply: DNA → RNA → Protein

The Building Blocks

• DNA is composed of four building blocks
    • Nucleic acids, nucleotides, bases
    • Adenine, Cytosine, Guanine, Thymine
• RNA also has four building blocks
    • Adenine, Cytosine, Guanine, Uracil
• Proteins are composed of 20 building blocks
    • Amino acids, residues
    • Fragments of proteins are called peptides
• DNA, RNA and proteins are polymers

Code   Nucleic Acid   w/ Sugar    w/ P
A      Adenine        Adenosine   Adenylic acid
C      Cytosine       Cytidine    Cytidylic acid
G      Guanine        Guanosine   Guanylic acid
T      Thymine        Thymidine   Thymidylic acid
U      Uracil         Uridine     Uridylic acid

Code   Nucleic Acid(s)
M      A or C (amino)
R      A or G (purine)
W      A or T (weak)
S      C or G (strong)
Y      C or T (pyrimidine)
K      G or T (keto)
V      A or C or G
H      A or C or T
D      A or G or T
B      C or G or T
N      A, G, C, T (any)

DNA RNA

• One Dimensional
• Two Dimensional
• Three Dimensional

Each column below is one position: the DNA base, the base it pairs with (=), and the RNA base it is transcribed to (→).

DNA strand        A C G C T T M W N C C T Y B N K S T T
Paired base (=)   T G C G A A K W N G G A R V N M S A A
RNA (→)           A C G C U U M ? N C C U ? ? N ? S U U

One-Letter Code   Amino Acid                     Three-Letter Code
A                 Alanine                        Ala
B                 Aspartic acid or Asparagine    Asx
C                 Cysteine                       Cys
D                 Aspartic acid                  Asp
E                 Glutamic acid                  Glu
F                 Phenylalanine                  Phe
G                 Glycine                        Gly
H                 Histidine                      His
I                 Isoleucine                     Ile
K                 Lysine                         Lys
L                 Leucine                        Leu
M                 Methionine                     Met
N                 Asparagine                     Asn
P                 Proline                        Pro
Q                 Glutamine                      Gln
R                 Arginine                       Arg
S                 Serine                         Ser
T                 Threonine                      Thr
V                 Valine                         Val
W                 Tryptophan                     Trp
X                 Unknown                        Xxx
Y                 Tyrosine                       Tyr
Z                 Glutamic acid or Glutamine     Glx

DNA RNA Protein

The same sequence, now beginning with ATG:

DNA strand        A T G C T T M W N C C T Y B N K S T T
Paired base (=)   T A C G A A K W N G G A R V N M S A A
RNA (→)           A U G C U U M ? N C C U ? ? N ? S U U

Reading codons from the first base:

    ATG -> Met (Start)
    CTT -> Leu
    MWN -> AA?, AU?, CA?, CU? -> Asn, Lys, Ile, Met, His, Gln, Leu
    CCT -> Pro
    YBN -> UU?, UG?, UC?, CU?, CG?, CC? -> Phe, Leu, Cys, Stop, Trp, Ser, Leu, Arg, Pro
    KST -> UCU, UGU, GCU, GGU -> Ser, Cys, Ala, Gly

Reading codons from the second base:

    TGC -> Cys
    TTM -> Phe, Leu
    WNC -> A?C, U?C -> Ile, Thr, Asn, Ser, Phe, Ser, Tyr, Cys
    CTY -> Leu
    BNK -> U?U, U?G, C?U, C?G -> Phe, Ser, Tyr, Cys, Leu, Stop, Trp, Leu, Pro, His, Arg, Gln
    STT -> GUU, CUU -> Val, Leu

Lecture II, Part II: One-Dimensional Strings

Hello World…

• A few perls of wisdom
• Concatenating sequences
• Making a reverse complement
• Reading sequences from data files

Every journey starts with a first 10bp

    #!/usr/bin/perl -w
    #Storing DNA in a variable, and printing it out

    #First, store the DNA in a variable called $DNA
    $DNA = 'CGGGCTATTC';

    #Next, print the DNA onto the screen
    print $DNA;

    #Finally, explicitly tell the program to end
    exit;

Concatenating DNA Fragments

    #!/usr/bin/perl -w
    #Store DNA in 2 variables
    $DNA1 = 'AGTGCGTCGCTAG';
    $DNA2 = 'ACCGCATGCATTG';

    #Using string interpolation
    $DNA3 = "$DNA1$DNA2";
    print "$DNA3\n\n";

    #Using the dot operator
    $DNA3 = $DNA1 . $DNA2;
    print "$DNA3\n\n";

    #Passing both fragments to print as a list
    print $DNA1, $DNA2, "\n";
    exit;

Transcription: DNA to RNA

    #!/usr/bin/perl -w
    $DNA = 'ACGACTGCACGATCGTACG';

    #Print the DNA onto the screen
    print "$DNA\n\n";

    #Transcribe the DNA->RNA by substituting all T's with U's
    $RNA = $DNA;
    $RNA =~ s/T/U/g;

    #Print the result to the screen
    print "Here is the result of DNA->RNA:\t$RNA\n\n";
    exit;

Anatomy of the substitution

    $RNA =~ s/T/U/g;

    $RNA   Variable
    =~     Binding operator
    s      Substitute operator
    /      Delimiters that separate the parts of the operator
    T      Pattern to be replaced
    U      Replacement text
    g      Pattern modifier

Pattern modifiers:

    g = globally
    i = case insensitive
    m = multiline
    s = single line
    x = permit comments
    o = compile only once for speed
    e = treat replacement as Perl code

Calculating the Reverse Complement

    #!/usr/bin/perl -w
    $DNA = 'ACGTCAGTCGAGCT';

    #Print the starting DNA onto the screen
    print "Here is the starting DNA:\t$DNA\n\n";

    #Calculate the reverse complement, first copying the reversed DNA
    #into a new variable called $revcom
    $revcom = reverse $DNA;

    #Substitute all bases by their complement
    #(This first attempt is wrong: s/T/A/g also converts the T's that
    #s/A/T/g just created, and s/G/C/g does the same to the new G's,
    #so the result contains only A's and C's.)
    $revcom =~ s/A/T/g;
    $revcom =~ s/T/A/g;
    $revcom =~ s/C/G/g;
    $revcom =~ s/G/C/g;
    print "$revcom\n";

Calculating the Reverse Complement

    #!/usr/bin/perl -w
    $DNA = 'ACGTCAGTCGAGCT';

    #Print the starting DNA onto the screen
    print "Here is the starting DNA:\t$DNA\n\n";

    #Calculate the reverse complement, first copying the reversed DNA
    #into a new variable called $revcom
    $revcom = reverse $DNA;

    #Substitute all bases by their complement in a single pass,
    #so that no base is translated twice
    $revcom =~ tr/ACGTacgt/TGCAtgca/;
    print "$revcom\n";

Reading Data from Files

    #### Sample Data in FASTA Format ####
    >NM_012345 | Sample Data | Muppet Stuffing Protein
    MNIDDKLEFGDEMGOSSRTMV
    FGDLVRSMPHOEILAADEVLISHEE
    GLOYAKLEFGDEMGOGHDDEFGVY

Reading Files

    #!/usr/bin/perl -w
    #The filename of the file containing the sequence data
    $proteinFilename = 'NM_012345.pep';

    #Open the file, and associate a 'filehandle' with it
    open(PROTEINFILE, $proteinFilename) or die "Cannot open $proteinFilename: $!";

    #Read from the file with the input operator
    #(assigned to a scalar, this reads only one line)
    $muppetProtein = <PROTEINFILE>;

    #Print what was read
    print "Here is the protein:\t$muppetProtein\n\n";
    close PROTEINFILE;
    exit;

Let's try this again …

    #!/usr/bin/perl -w
    $proteinFilename = 'NM_012345.pep';
    open(PROTEINFILE, $proteinFilename) or die "Cannot open $proteinFilename: $!";

    #Each read from the filehandle returns the next line
    $muppetProtein = <PROTEINFILE>;
    print "Here is the first line:\t$muppetProtein\n\n";
    $muppetProtein = <PROTEINFILE>;
    print "Here is the second line:\t$muppetProtein\n\n";
    $muppetProtein = <PROTEINFILE>;
    print "Here is the third line:\t$muppetProtein\n\n";
    close PROTEINFILE;
    exit;

Using Arrays to Read Files

    #!/usr/bin/perl -w
    $proteinFilename = 'NM_012345.pep';

    #Open the file
    open(PROTEINFILE, $proteinFilename) or die "Cannot open $proteinFilename: $!";

    #Read the sequence data from the file, and store it in the array
    #variable @protein (one line per element)
    @protein = <PROTEINFILE>;

    #Print the protein onto the screen
    print @protein;
    close PROTEINFILE;
    exit;

Arrays

    #Here's one way to declare an array
    @bases = ('A','C','G','T');
    #Now print each element of the array
    print "\nFirst element: " , $bases[0];
    print "\nSecond element: " , $bases[1];
    print "\nThird element: " , $bases[2];
    print "\nFourth element: " , $bases[3];

    #Here's one way to declare an array
    @bases = ('A','C','G','T');
    #Now print each element of the array in a row
    print "\nHere are all of the bases: " , @bases;
    #This prints out: 'Here are all of the bases: ACGT'
    #But you can print them out with spaces in between
    print "\nHere they are with spaces: " , "@bases";

    #Here's one way to declare an array
    @bases = ('A','C','G','T');
    #Here's how to take an element off of the end
    $base1 = pop @bases;
    print "Here's the last element: ", $base1, "\n\n";
    #The other elements still remain
    print "\nHere are the remaining elements: " , "@bases";

    #Here's one way to declare an array
    @bases = ('A','C','G','T');
    #Here's how to take an element off of the front
    $base2 = shift @bases;
    print "Here's the first element: ", $base2, "\n\n";
    #The other elements still remain
    print "\nHere are the remaining elements: " , "@bases";

    #Here's one way to declare an array
    @bases = ('A','C','G','T');
    #Here's how you put an element at the beginning of an array
    #Our example will put the last element at the beginning
    $base1 = pop @bases;
    unshift (@bases, $base1);
    print "Here's the last element put first: " , "@bases\n\n";

    #Here's one way to declare an array
    @bases = ('A','C','G','T');
    #Here's how you put an element at the end of an array
    #Our example will put the first element at the end
    $base1 = shift @bases;
    push (@bases, $base1);
    print "Here's the first element put last: " , "@bases\n\n";

    #Here's one way to declare an array
    @bases = ('A','C','G','T');
    #Here's how to reverse an array
    @reverse = reverse @bases;
    #Here's how to get the length
    print scalar(@bases), "\n\n";
    #Here's how to insert an element at an arbitrary place
    splice (@bases, 2, 0, 'X');

    #Arrays can be evaluated as lists and scalars
    @bases = ('A','C','G','T');
    #Here's how to print the array
    print "@bases\n";
    #Here's how to assign it to a scalar (this gives the number of elements)
    $a = @bases;
    print $a;
    #Here's how to assign an array to a list (this gives the first element)
    ($a) = @bases;
    print $a;
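
The difference between evaluating an array as a scalar and as a list can be easy to miss, so here is a minimal sketch that puts the two side by side. It is an illustration rather than part of the original slides: the variable names ($count, $first, $b1, $b2, @rest) are made up for this example, and it adds `use strict` and `my`, which the slide code does not use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# Scalar context: assigning an array to a scalar gives its element count.
my $count = @bases;

# List context: the parentheses make this a list assignment,
# so $first receives the first element of the array.
my ($first) = @bases;

# print supplies list context, so scalar() is needed to force a count here.
print "Number of bases: ", scalar(@bases), "\n";
print "Count via scalar assignment: $count\n";
print "First base via list assignment: $first\n";

# A list assignment can also unpack several elements at once;
# the trailing array collects whatever is left over.
my ($b1, $b2, @rest) = @bases;
print "Unpacked: $b1 $b2 | rest: @rest\n";
```

Only the parentheses on the left-hand side distinguish `$count = @bases` from `($first) = @bases`, which is why the two assignments in the last slide print different values.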
