String ($var) and array (@array) conversion and substring extraction
Lecture 6

Split strings
• The split function divides a string into pieces:
  – A string into an array.
  – A string into a list of scalars ($variables).
  – A string into its individual characters, by using an empty string ("") as the delimiter.
• Example input line:
  >192a8, the lactose gene, e. coli, cambridge university, january 1981
  – chomp($line = <>); # read the line into $line
  – @fields = split ',', $line; # split a string into an array
  – ($clone, $laboratory, $left_oligo, $right_oligo) = split ',', $line; # split into a list of scalars
• See SplitExample.pl

Join: elements of an array
• The join function is the reverse of split: it converts an array (list) into a string.
• # initialize an array
• @seq = ("aaaaaa", "tttttt", "cccccc", "ggggggg");
• $CombinedSeq = join '', @seq;
• Result of the join is:
• aaaaaattttttccccccggggggg
• See JoinExample.pl

Concatenation
• To append one string to another, use the .= operator (the plain . operator joins two strings without assigning the result).
  – Start with an empty string: $seq = "";
  – Add (concatenate) a sequence to it with:
  – $seq .= $input_seq2;
  – This can be used to read in sequences and join them together so that they form one string.

Extracting substrings
• substr: a function for extracting a substring from a string.
• Assume the string is: AAAAGGGGCCCCTTTT
• To extract the sequence AGG (a codon) from the string we need to:
  – Move to the 4th character of the string (offset 3, because offsets are zero-based).
  – Extract 3 characters, i.e. a 3-character substring.
• The syntax of the Perl substr (substring) function:
  – $sub = substr($string, $offset, $length);
  – $offset is the position at which extraction begins; $length is the size of the substring.
  – The offset is zero-based.
• More details on substrings can be found at: http://perlmeme.org/howtos/perlfunc/substr.html
• Extract words from a sentence: Substring.pl
• Extract a codon from a DNA sequence: substring.pl

Perl functions for determining the ORF of DNA sequences
• The unpack function: a Perl built-in that extracts sets of characters from a string and assigns them to an array. It can therefore be used to extract groups of 3 bases (e.g. the codons of an open reading frame) from a DNA sequence and assign each group to an element of an array.
  – @triplets = unpack("a3" x (length($line)/3), $line);
• To determine all possible open reading frames (ORFs) for a DNA sequence (reading frame 1, reading frame 2 and reading frame 3), shift one base to the right when going from reading frame 1 to reading frame 2, and one more base when going from reading frame 2 to reading frame 3.
• Frame shift (1 position to the right):
  – @triplets = unpack('a1' . "a3" x (length($line)/3), $line);
  – The first element of @triplets is then the single skipped base.
• Remember: if only 2 characters remain at the end or beginning of a sequence, unpack will still assign them to an element of the array, so the array can contain an incomplete codon. If codons are looked up in a hash table, an exists check may be required.
• See Unpack_codons.pl (run it to show the output)

Sample Exercise
• Write a script to read in the contents of a FASTA file (without the descriptor line) and print it out as a single string containing all the DNA bases or amino acids.
• Modify the unpack example to use substrings (substr) instead of unpack.
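
The split, join, concatenation and substr slides earlier in the lecture can be pulled together in one short script. This is a minimal sketch, not the course's SplitExample.pl or JoinExample.pl. The variable names and the /,\s*/ pattern (which also trims the space after each comma) are assumptions added for illustration.

```perl
use strict;
use warnings;

# Example record from the slides (comma-separated fields).
my $line = ">192a8, the lactose gene, e. coli, cambridge university, january 1981";

# split: string -> array of fields.
# Assumption: /,\s*/ is used instead of ',' so the space after each comma is dropped.
my @fields = split /,\s*/, $line;

# split with an empty pattern: string -> individual characters.
my @bases = split //, "ACGT";

# join: array -> string, here with an empty separator.
my @seq      = ("aaaaaa", "tttttt", "cccccc", "ggggggg");
my $combined = join '', @seq;

# .= appends each piece to a string that starts out empty.
my $dna = "";
$dna .= $_ for @seq;

# substr: offsets are zero-based, so the 4th character is at offset 3.
my $string = "AAAAGGGGCCCCTTTT";
my $codon  = substr($string, 3, 3);

print scalar(@fields), " fields; first is $fields[0]\n";
print "bases: @bases\n";
print "joined: $combined\n";
print "codon: $codon\n";
```

With offset 4 instead of 3, substr would return GGG rather than AGG, which is why the zero-based offset matters here.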
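
The reading-frame idea from the unpack slide can be sketched as a small helper. The function name codons_in_frame and the 11-base test sequence are hypothetical and are not taken from Unpack_codons.pl. Unlike the template on the slide, this version drops the skipped bases and any incomplete trailing codon instead of keeping them as array elements, and it assumes the sequence is at least 2 bases long.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): return the complete codons of
# $dna in reading frame 1, 2 or 3.
sub codons_in_frame {
    my ($dna, $frame) = @_;
    my $skip = $frame - 1;               # bases skipped before the first codon
    my $rest = substr($dna, $skip);      # sequence starting at the frame offset
    my $n    = int(length($rest) / 3);   # count complete codons only
    return unpack("a3" x $n, $rest);     # one list element per triplet
}

my $dna = "ATGGCCTAAGT";   # made-up 11-base sequence for illustration

for my $frame (1 .. 3) {
    my @codons = codons_in_frame($dna, $frame);
    print "frame $frame: @codons\n";
}
```

Because only complete codons are returned, every element can be used directly as a hash key in a codon table; a table lookup would still need an exists check for codons the table does not contain (for example, ones with ambiguous bases).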