Lecture 4
BNFO 235
Usman Roshan
IUPAC Nucleic Acid symbols
IUPAC Amino Acid symbols
Genetic code
Splitting and joining strings
• split: splits a string by a regular
expression and returns an array
(with no string argument, split works on $_)
– @s = split(/,/);
– @s = split(/\s+/);
• join: joins the elements of an array and returns
a string (opposite of split)
– $seq = join("", @pieces);
– $seq = join("X", @pieces);
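As a minimal sketch of the two functions together, the example below splits a comma-separated line and joins the pieces back. The input strings and the variable names ($line, $marked, @words) are made up for illustration and are not from the lecture.

```perl
use strict;
use warnings;

# Hypothetical input: a comma-separated line of sequence fragments.
my $line = "ATG,GCC,TAA";

# split takes a regular expression and the string to split.
my @pieces = split(/,/, $line);          # ("ATG", "GCC", "TAA")

# join with an empty separator glues the pieces back together.
my $seq = join("", @pieces);             # "ATGGCCTAA"

# join with "X" puts an X between consecutive pieces.
my $marked = join("X", @pieces);         # "ATGXGCCXTAA"

# split on runs of whitespace (spaces or tabs).
my @words = split(/\s+/, "A 10   3\t1 4");   # ("A", "10", "3", "1", "4")

print "$seq\n$marked\n@words\n";
```

Passing the string explicitly as the second argument to split avoids relying on the default variable $_.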
Searching and substitution
• $x =~ /$y/ --- true if the expression $y is
found in $x
• $x =~ /ATG/ --- true if ATG (the start codon that
begins an open reading frame) is found in $x
• $x !~ /GC/ --- true if GC is not found in $x
• $x =~ s/T/U/g --- replace all T's with U's
• $x =~ s/g/G/g --- convert all lower-case
g's to upper-case G's
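The operators above can be tried on a short string. This is a small sketch with a made-up sequence; the names $has_start, $no_gc, and $rna are illustrative only.

```perl
use strict;
use warnings;

my $x = "CCATGgCTTAG";   # hypothetical DNA string with one lower-case g

# Convert every lower-case g to upper-case G.
$x =~ s/g/G/g;           # $x is now "CCATGGCTTAG"

# Match test: is the start codon ATG present anywhere in $x?
my $has_start = ($x =~ /ATG/) ? 1 : 0;

# Negated match: true only if GC does not occur in $x.
my $no_gc = ($x !~ /GC/) ? 1 : 0;

# Copy $x, then replace every T with U in the copy (DNA to RNA alphabet).
(my $rna = $x) =~ s/T/U/g;

print "start codon found: $has_start\n";
print "GC absent: $no_gc\n";
print "RNA: $rna\n";
```

Copying into $rna before substituting leaves the original DNA string in $x unchanged.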
DNA regular expressions
Taken from Jagota’s Perl for Bioinformatics
DNA Sequence Evolution
[Figure: evolutionary tree of DNA sequences drawn on a timeline labeled -3 mil yrs, -2 mil yrs, -1 mil yrs, today. The root sequence at -3 mil yrs is AAGACTT. Other sequences in the tree: AAGGCTT, _GGGCTT, TAGACCTT, T_GACTT, A_CACTT. Labeled sequences: GGCTT (Mouse), TAGGCCTT (Human), TAGCCCTTA (Monkey), ACACTTC (Lion), ACCTT (Cat). Gapped forms shown for three of them: _G_GCTT (Mouse), A_CACTTC (Lion), A_C_CTT (Cat).]
Comparative Bioinformatics
• Fundamental notion of biology: all life is
related by an unknown evolutionary Tree of
Life.
• Therefore, if we know something about one
species we can make inferences about other
ones.
• Also, by comparing multiple species we can
make inferences about sets of species.
• How do we compare DNA or protein
sequences of two different species?
Comparative Bioinformatics
• We need to know how often mutations
from A to T, or from A to C, occur.
• To determine this, we manually create a set of
“true” alignments and estimate the likelihood
of A changing to C, for example, by counting
the number of times A changes to C and
computing related statistics.
• Now we have a realistic “scoring matrix”
that can be used to evaluate how related
two species are based on their DNA.
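The counting step can be sketched as follows. This is an illustration under stated assumptions, not the lecture's own code: the two toy alignments, the subroutine names count_substitutions and substitution_frequencies, and the choice to skip gap columns are all hypothetical. Real scoring matrices are usually derived from such frequencies by further statistics (for example log-odds scores), which this sketch omits.

```perl
use strict;
use warnings;

# Hypothetical toy data: each entry is one trusted pairwise alignment
# (two sequences of equal length; '_' marks a gap).
my @alignments = (
    [ "AAGACTT",  "AAGGCTT"  ],
    [ "TAGACCTT", "TAGGCCTT" ],
);

# Count how often nucleotide $from in the first sequence is aligned
# with nucleotide $to in the second. Columns containing a gap are skipped.
sub count_substitutions {
    my @pairs = @_;
    my %count;
    for my $pair (@pairs) {
        my ($s, $t) = @$pair;
        die "aligned sequences must have equal length\n"
            unless length($s) == length($t);
        for my $i (0 .. length($s) - 1) {
            my $from = substr($s, $i, 1);
            my $to   = substr($t, $i, 1);
            next if $from eq '_' || $to eq '_';
            $count{$from}{$to}++;
        }
    }
    return \%count;
}

# Turn counts into relative frequencies: of all aligned columns that
# start with $from, what fraction end in $to?
sub substitution_frequencies {
    my ($count) = @_;
    my %freq;
    for my $from (keys %$count) {
        my $total = 0;
        $total += $_ for values %{ $count->{$from} };
        for my $to (keys %{ $count->{$from} }) {
            $freq{$from}{$to} = $count->{$from}{$to} / $total;
        }
    }
    return \%freq;
}

my $count = count_substitutions(@alignments);
my $freq  = substitution_frequencies($count);

for my $from (sort keys %$freq) {
    for my $to (sort keys %{ $freq->{$from} }) {
        printf "%s -> %s  %.2f\n", $from, $to, $freq->{$from}{$to};
    }
}
```

With the toy data, A is aligned with A three times and with G twice, so the estimated frequency of A changing to G is 2/5.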
Problems
• Write a Perl subroutine called readmatrix that reads a
DNA substitution scoring matrix from a file called
“dna.txt” and stores it in a two dimensional array. The
format of the scoring matrix in the file is
    A   C   G   T
A  10   3   1   4
C   3  12   3   5
G   1   3  15   2
T   4   5   2  11
• Write a Perl subroutine called translate that takes an
mRNA sequence, converts it into a protein
sequence, and returns the protein sequence.
Problems
• Write a Perl program that reads in a
substitution scoring matrix from a file called
“matrix.txt”, reads in a pair of DNA sequences
of equal length from a file called “dna.txt”, and
returns the total substitution score between
the two sequences.
• Write a Perl program that reads pairs of DNA
sequences from a file called “DNApairs.txt”
and estimates the frequency of nucleotide
substitutions.