Lecture 4
BNFO 235
Usman Roshan

IUPAC Nucleic Acid symbols

IUPAC Amino Acid symbols

Genetic code

Splitting and joining strings
• split: splits a string by a regular expression and returns an array
– @s = split(/,/);
– @s = split(/\s+/);
• join: joins the elements of an array and returns a string (the opposite of split)
– $seq = join("", @pieces);
– $seq = join("X", @pieces);

Searching and substitution
• $x =~ /$y/ --- true if expression $y is found in $x
• $x =~ /ATG/ --- true if the start codon ATG (which opens a reading frame) is found in $x
• $x !~ /GC/ --- true if GC is not found in $x
• $x =~ s/T/U/g --- replace all T's with U's
• $x =~ s/g/G/g --- convert all lower case g to upper case G

DNA regular expressions
Taken from Jagota's Perl for Bioinformatics

DNA Sequence Evolution
[Figure: evolutionary tree drawn against a time axis (-3 mil yrs, -2 mil yrs, -1 mil yrs, today), shown once with unaligned and once with gap-aligned sequences. Ancestral sequences: AAGACTT, AAGGCTT, T_GACTT, _GGGCTT, TAGACCTT, A_CACTT. Present-day sequences: _G_GCTT (Mouse), TAGGCCTT (Human), TAGCCCTTA (Monkey), A_CACTTC (Lion), A_C_CTT (Cat).]

Comparative Bioinformatics
• Fundamental notion of biology: all life is related by an unknown evolutionary Tree of Life.
• Therefore, if we know something about one species, we can make inferences about other ones.
• Also, by comparing multiple species we can make inferences about sets of species.
• How do we compare DNA or protein sequences of two different species?

Comparative Bioinformatics
• We need to know how often mutations from A to T, or from A to C, occur.
• To determine this we manually create a set of "true" alignments and estimate the likelihood of A changing to C, for example, by counting the number of times A changes to C and computing related statistics.
• Now we have a realistic "scoring matrix" which can be used to evaluate how closely related two species are based on their DNA.

Problems
• Write a Perl subroutine called readmatrix that reads a DNA substitution scoring matrix from a file called "dna.txt" and stores it in a two-dimensional array. The format of the scoring matrix in the file is

     A   C   G   T
A   10   3   1   4
C    3  12   3   5
G    1   3  15   2
T    4   5   2  11

• Write a Perl subroutine called translate that takes an mRNA sequence, converts it into a protein sequence, and returns the protein sequence.

Problems
• Write a Perl program that reads in a substitution scoring matrix from a file called "matrix.txt", reads in a pair of DNA sequences of equal length from a file called "dna.txt", and returns the total substitution score between the two sequences.
• Write a Perl program that reads pairs of DNA sequences from a file called "DNApairs.txt" and estimates the frequency of nucleotide substitutions.
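
The operators from the "Splitting and joining strings" and "Searching and substitution" slides can be tried together in one short script. This is only a minimal sketch: the comma-separated input string and the variable names ($record, $dna, $rna) are made up for illustration and do not come from the lecture.

```perl
use strict;
use warnings;

# Hypothetical input: three DNA fragments separated by commas.
my $record = "ATGGC,TTGCA,GGCAT";

# split: break the record on commas into an array of fragments.
my @pieces = split(/,/, $record);

# join: glue the fragments back together with no separator.
my $dna = join("", @pieces);

# Searching: =~ is true when the pattern occurs, !~ when it does not.
my $has_start = $dna =~ /ATG/ ? 1 : 0;
my $no_gc     = $dna !~ /GC/  ? 1 : 0;

# Substitution: copy the DNA, then replace every T with U in the copy.
(my $rna = $dna) =~ s/T/U/g;

print "DNA: $dna\n";
print "RNA: $rna\n";
print "contains ATG: $has_start, lacks GC: $no_gc\n";
```

Copying the string before applying s/T/U/g keeps the original DNA sequence available, since the substitution operator modifies its target in place.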
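
To see how a scoring matrix turns into a total substitution score, here is a small sketch that hard-codes the matrix from the Problems slide instead of reading it from a file. The subroutine name score_pair and the %index lookup hash are assumptions made for this example, and the two sequences are borrowed from the DNA Sequence Evolution slide.

```perl
use strict;
use warnings;

# Scoring matrix from the slide; rows and columns are in the order A, C, G, T.
my @matrix = (
    [10,  3,  1,  4],   # A
    [ 3, 12,  3,  5],   # C
    [ 1,  3, 15,  2],   # G
    [ 4,  5,  2, 11],   # T
);

# Maps each nucleotide to its row/column position in @matrix.
my %index = (A => 0, C => 1, G => 2, T => 3);

# score_pair (hypothetical name): sums the matrix entry for every
# aligned position of two equal-length DNA sequences.
sub score_pair {
    my ($s1, $s2) = @_;
    die "sequences must have equal length\n"
        unless length($s1) == length($s2);
    my $total = 0;
    for my $i (0 .. length($s1) - 1) {
        my $x = substr($s1, $i, 1);
        my $y = substr($s2, $i, 1);
        die "unexpected symbol at position $i\n"
            unless exists $index{$x} && exists $index{$y};
        $total += $matrix[ $index{$x} ][ $index{$y} ];
    }
    return $total;
}

print score_pair("AAGACTT", "AAGGCTT"), "\n";   # prints 70
```

The two sequences differ only at the fourth position (A versus G, score 1), so the total is 10 + 10 + 15 + 1 + 12 + 11 + 11 = 70.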
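
The counting idea from the second Comparative Bioinformatics slide can be sketched in the same way. The version below assumes the aligned pairs are already in memory, treats "_" as a gap symbol to skip, and reports each ordered pair of nucleotides as a fraction of all compared columns. That is one simple estimate chosen for illustration, not necessarily the statistic intended in the lecture, and the two example pairs are invented.

```perl
use strict;
use warnings;

# Hypothetical aligned sequence pairs (equal length within each pair).
my @pairs = (
    ["AAGACTT",  "AAGGCTT"],
    ["TAGACCTT", "TAGGCCTT"],
);

my %count;        # e.g. $count{AG} = number of columns with A over G
my $total = 0;    # number of columns compared (gaps excluded)

for my $pair (@pairs) {
    my ($s1, $s2) = @$pair;
    for my $i (0 .. length($s1) - 1) {
        my $x = substr($s1, $i, 1);
        my $y = substr($s2, $i, 1);
        next if $x eq '_' or $y eq '_';   # skip gapped columns
        $count{"$x$y"}++;
        $total++;
    }
}

# Relative frequency of each observed pair of nucleotides.
my %freq;
if ($total > 0) {
    %freq = map { $_ => $count{$_} / $total } keys %count;
}

for my $key (sort keys %freq) {
    printf "%s -> %s: %d of %d (%.3f)\n",
        substr($key, 0, 1), substr($key, 1, 1),
        $count{$key}, $total, $freq{$key};
}
```

With these two pairs there are 15 compared columns, of which 2 show A aligned with G; all other columns are matches.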