Download Lecture 4

Lecture 4
BNFO 235
Usman Roshan

IUPAC Nucleic Acid symbols

IUPAC Amino Acid symbols

Genetic code

Splitting and joining strings
• split: splits a string by a regular expression and returns an array
  – @s = split(/,/);
  – @s = split(/\s+/);
• join: joins the elements of an array and returns a string (the opposite of split)
  – $seq = join("", @pieces);
  – $seq = join("X", @pieces);

Searching and substitution
• $x =~ /$y/ --- true if the expression $y is found in $x
• $x =~ /ATG/ --- true if the start codon ATG is found in $x
• $x !~ /GC/ --- true if GC is not found in $x
• $x =~ s/T/U/g --- replace all T's with U's
• $x =~ s/g/G/g --- convert all lower case g to upper case G

DNA regular expressions
(Taken from Jagota's Perl for Bioinformatics)

DNA Sequence Evolution
[Figure: DNA sequences evolving along a time axis marked -3 mil yrs, -2 mil yrs, -1 mil yrs, today; underscores mark gaps.
Sequences shown without a species label: AAGACTT, AAGGCTT, _GGGCTT, TAGACCTT, T_GACTT, A_CACTT.
Sequences labeled with a species: _G_GCTT (Mouse), TAGGCCTT (Human), TAGCCCTTA (Monkey), A_CACTTC (Lion), A_C_CTT (Cat).]

Comparative Bioinformatics
• Fundamental notion of biology: all life is related by an unknown evolutionary Tree of Life.
• Therefore, if we know something about one species, we can make inferences about other ones.
• Also, by comparing multiple species we can make inferences about sets of species.
• How do we compare DNA or protein sequences of two different species?

Comparative Bioinformatics
• We need to know how often mutations such as A to T or A to C occur.
• To determine this, we manually create a set of "true" alignments and estimate the likelihood of, for example, A changing to C by counting the number of times A changes to C and computing related statistics.
• Now we have a realistic "scoring matrix" that can be used to evaluate how closely related two species are based on their DNA.

Problems
• Write a Perl subroutine called readmatrix that reads a DNA substitution scoring matrix from a file called "dna.txt" and stores it in a two-dimensional array. The format of the scoring matrix in the file is

       A   C   G   T
   A  10   3   1   4
   C   3  12   3   5
   G   1   3  15   2
   T   4   5   2  11

• Write a Perl subroutine called translate that takes an mRNA sequence, converts it into a protein sequence, and returns the protein sequence.

Problems
• Write a Perl program that reads in a substitution scoring matrix from a file called "matrix.txt", reads in a pair of DNA sequences of equal length from a file called "dna.txt", and returns the total substitution score between the two sequences.
• Write a Perl program that reads pairs of DNA sequences from a file called "DNApairs.txt" and estimates the frequency of nucleotide substitutions.
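
As a starting point for the scoring problems above, here is a minimal Perl sketch that uses split to parse a matrix in the format shown and then sums the substitution scores for two sequences of equal length. The subroutine names parse_matrix and score_pair are hypothetical, not the names the problems ask for. The sketch also makes two assumptions of its own: it stores the matrix as a hash of hashes rather than a two-dimensional array, and it reads from an in-memory filehandle instead of "matrix.txt" so that it runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a scoring matrix laid out as a header row of column symbols
# followed by one row per symbol. Takes an open filehandle and returns
# a reference to a hash of hashes, so $m->{A}{C} is the score for A vs C.
sub parse_matrix {
    my ($fh) = @_;
    my (@cols, %m);
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;          # trim leading/trailing whitespace
        next if $line eq '';              # skip blank lines
        my @f = split(/\s+/, $line);      # split on runs of whitespace
        if (!@cols) {                     # first non-blank line is the header
            @cols = @f;
            next;
        }
        my $row = shift @f;               # first field is the row symbol
        for my $i (0 .. $#cols) {
            $m{$row}{ $cols[$i] } = $f[$i];
        }
    }
    return \%m;
}

# Sum the substitution scores position by position.
# Dies if the sequences differ in length or contain an unknown symbol.
sub score_pair {
    my ($m, $s1, $s2) = @_;
    die "sequences differ in length\n" unless length($s1) == length($s2);
    my @x = split(//, uc $s1);            # one character per array element
    my @y = split(//, uc $s2);
    my $total = 0;
    for my $i (0 .. $#x) {
        my $score = $m->{ $x[$i] }{ $y[$i] };
        die "no score for $x[$i]/$y[$i]\n" unless defined $score;
        $total += $score;
    }
    return $total;
}

# Assumption: the matrix text is embedded here instead of read from a file.
my $text = <<'END';
    A   C   G   T
A  10   3   1   4
C   3  12   3   5
G   1   3  15   2
T   4   5   2  11
END

open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my $matrix = parse_matrix($fh);
close($fh);

# Two sequences from the DNA Sequence Evolution slide.
print score_pair($matrix, 'AAGACTT', 'AAGGCTT'), "\n";   # prints 70
```

The two sequences differ only at the fourth position (A versus G, score 1), so the total is the six matching scores plus 1. To read from a real file, the in-memory open could be replaced with open(my $fh, '<', 'matrix.txt').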
