Lecture II - Baylor School of Engineering & Computer Science

Perl
Part I: A Biology Primer
Conceptual Biology
• H. sapiens did not create the genetic code, but they did invent the transistor
• Biological life is not optimized: the modern synthesis
• Nature vs. Nurture
• What are the best ways to understand the important differences that make the difference?
A Molecular Primer

Hierarchy of the eukaryote
• Organism > System > Organ > Tissue > Cell > Organelle > Protein > RNA > DNA

Put simply: DNA → RNA → Protein
The Building Blocks
DNA is composed of four building blocks
• Nucleic acids, nucleotides, bases
• Adenine, Cytosine, Guanine, Thymine
RNA also has four building blocks
• Adenine, Cytosine, Guanine, Uracil
Proteins are composed of 20 building blocks
• Amino acids, residues
• Fragments of proteins are called peptides
DNA, RNA and Proteins are polymers
Code  Nucleic Acid(s)  w/ Sugar    w/P
A     Adenine          Adenosine   Adenylic Acid
C     Cytosine         Cytidine    Cytidylic Acid
G     Guanine          Guanosine   Guanylic Acid
T     Thymine          Thymidine   Thymidylic Acid
U     Uracil           Uridine     Uridylic Acid

Code  Nucleic Acid
M     A or C (amino)
R     A or G (purine)
W     A or T (weak)
S     C or G (strong)
Y     C or T (pyrimidine)
K     G or T (keto)
V     A or C or G
H     A or C or T
D     A or G or T
B     C or G or T
N     A, G, C, T (any)
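The ambiguity codes above map naturally onto regular-expression character classes. The following is a minimal sketch, not part of the original slides: the helper name `iupac_to_regex` and the example motif `GAYR` are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each code from the table, mapped to the bases it stands for.
my %iupac = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    M => 'AC',  R => 'AG',  W => 'AT',  S => 'CG',
    Y => 'CT',  K => 'GT',  V => 'ACG', H => 'ACT',
    D => 'AGT', B => 'CGT', N => 'ACGT',
);

# Hypothetical helper: turn a motif written with ambiguity codes
# into a regular expression built from character classes.
sub iupac_to_regex {
    my ($motif) = @_;
    my $regex = '';
    foreach my $code (split //, uc $motif) {
        die "Unknown code '$code'\n" unless exists $iupac{$code};
        $regex .= '[' . $iupac{$code} . ']';
    }
    return $regex;
}

# GAYR reads: G, A, (C or T), (A or G)
my $regex = iupac_to_regex('GAYR');
print "Pattern: $regex\n";
print "GATA matches\n"        if     'GATA' =~ /^$regex$/;
print "GAGA does not match\n" unless 'GAGA' =~ /^$regex$/;
```

Building the pattern one code at a time keeps the table and the matching logic in one place, so adding a code only means adding a hash entry.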
DNA → RNA ("=" joins each DNA base to its partner on the paired strand, "→" gives the RNA base)
DNA strand:     A C G C T T M W N C C T Y B N K S T T
Paired strand:  T G C G A A K W N G G A R V N M S A A
RNA:            A C G C U U M ? N C C U ? ? N ? S U U
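The pairing shown above, including the ambiguity codes (M pairs with K, Y with R, B with V), can be sketched with Perl's `tr` operator. This is an illustrative sketch only; the subroutine names `complement` and `transcribe` are assumptions, and where the slide leaves a "?" in the RNA row this sketch simply keeps the ambiguity code unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Complement every base, ambiguity codes included, in a single pass.
sub complement {
    my ($dna) = @_;
    (my $paired = $dna) =~ tr/ACGTMRWSYKVHDBN/TGCAKYWSRMBDHVN/;
    return $paired;
}

# RNA carries the same letters as the DNA strand, with U in place of T.
sub transcribe {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/T/U/;
    return $rna;
}

my $dna = 'ACGCTTMWNCCTYBNKSTT';
print "DNA strand:    $dna\n";
print "Paired strand: ", complement($dna), "\n";
print "RNA:           ", transcribe($dna), "\n";
```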
• One Dimensional
• Two Dimensional
• Three Dimensional
DNA → RNA ("=" joins each DNA base to its partner on the paired strand, "→" gives the RNA base)
DNA strand:     A T G C T T M W N C C T Y B N K S T T
Paired strand:  T A C G A A K W N G G A R V N M S A A
RNA:            A U G C U U M ? N C C U ? ? N ? S U U
One-Letter Code  Amino Acid                    Three-Letter Code
A                Alanine                       Ala
B                Aspartic acid or Asparagine   Asx
C                Cysteine                      Cys
D                Aspartic acid                 Asp
E                Glutamic acid                 Glu
F                Phenylalanine                 Phe
G                Glycine                       Gly
H                Histidine                     His
I                Isoleucine                    Ile
K                Lysine                        Lys
L                Leucine                       Leu
M                Methionine                    Met
N                Asparagine                    Asn
P                Proline                       Pro
Q                Glutamine                     Gln
R                Arginine                      Arg
S                Serine                        Ser
T                Threonine                     Thr
V                Valine                        Val
W                Tryptophan                    Trp
X                Unknown                       Xxx
Y                Tyrosine                      Tyr
Z                Glutamic acid or Glutamine    Glx
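A lookup table like the one above is a natural fit for a Perl hash. The sketch below is an illustration rather than part of the lecture; the subroutine name `to_three_letter` and the choice of "-" as a separator are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One-letter code => three-letter code, as in the table.
my %three_letter = (
    A => 'Ala', B => 'Asx', C => 'Cys', D => 'Asp', E => 'Glu',
    F => 'Phe', G => 'Gly', H => 'His', I => 'Ile', K => 'Lys',
    L => 'Leu', M => 'Met', N => 'Asn', P => 'Pro', Q => 'Gln',
    R => 'Arg', S => 'Ser', T => 'Thr', V => 'Val', W => 'Trp',
    X => 'Xxx', Y => 'Tyr', Z => 'Glx',
);

# Hypothetical helper: spell out a peptide in three-letter codes.
# Any letter missing from the table is reported as Xxx (unknown).
sub to_three_letter {
    my ($peptide) = @_;
    my @residues = map { exists $three_letter{$_} ? $three_letter{$_} : 'Xxx' }
                   split //, uc $peptide;
    return join '-', @residues;
}

print to_three_letter('MLP'), "\n";
```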
Reading frame 1 (AUG CUU MWN CCU YBN KSU):
AUG -> Met (Start)
CUU -> Leu
MWN: AA?, AU?, CA?, CU? -> Asn, Lys, Ile, Met, His, Gln, Leu
CCU -> Pro
YBN: UU?, UG?, UC?, CU?, CG?, CC? -> Phe, Leu, Cys, Stop, Trp, Ser, Leu, Arg, Pro
KSU: UCU, UGU, GCU, GGU -> Ser, Cys, Ala, Gly
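Reading frame 2, shifted by one base (UGC UUM WNC CUY BNK SUU):
UGC -> Cys
UUM: UUC, UUA -> Phe, Leu
WNC: A?C, U?C -> Ile, Thr, Asn, Ser, Phe, Ser, Tyr, Cys
CUY -> Leu
BNK: U?U, U?G, C?U, C?G -> Phe, Ser, Tyr, Cys, Leu, Stop, Trp, Leu, Pro, His, Arg, Gln
     G?U, G?G -> Val, Ala, Asp, Glu, Gly
SUU: GUU, CUU -> Val, Leu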
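The codon lookups above can be sketched in Perl with a hash built from the standard genetic code. This is a minimal illustration under stated assumptions: the subroutine name `translate` is hypothetical, stop codons are written as "*", and any codon containing an ambiguity code is reported as "X" rather than expanded into all of its possibilities.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code as one 64-letter string. Codons are enumerated
# with the bases in the order T, C, A, G at each of the three positions.
my @bases = qw(T C A G);
my $amino = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($amino, $i++, 1);
        }
    }
}

# Translate a DNA string starting at a 0-based offset (the reading frame).
sub translate {
    my ($dna, $offset) = @_;
    $offset = 0 unless defined $offset;
    my $protein = '';
    for (my $pos = $offset; $pos + 3 <= length($dna); $pos += 3) {
        my $codon = uc substr($dna, $pos, 3);
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

my $dna = 'ATGCTTMWNCCT';
print "Frame 1: ", translate($dna, 0), "\n";   # ATG CTT MWN CCT
print "Frame 2: ", translate($dna, 1), "\n";   # TGC TTM WNC
```

Working on the DNA letters directly (T rather than U) avoids a separate transcription step; the codon assignments are the same either way.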
Protein
DNA
RNA
Lecture II
Part II: One-Dimensional Strings
Hello World…
• A few perls of wisdom
• Concatenating Sequences
• Making a reverse complement
• Read sequences from data files
Every journey starts with a first 10bp
#!/usr/bin/perl -w
#storing DNA in a variable, and printing it out
#First, storing DNA in a variable called $DNA
$DNA = 'CGGGCTATTC';
#Next, print the DNA onto the screen
print $DNA;
#Finally, specifically tell the program to end
exit;
Concatenating DNA Fragments
#!/usr/bin/perl -w
#Store DNA in 2 variables
$DNA1 = 'AGTGCGTCGCTAG';
$DNA2 = 'ACCGCATGCATTG';
#using string interpolation
$DNA3 = "$DNA1$DNA2";
print "$DNA3\n\n";
#dot operator
$DNA3 = $DNA1 . $DNA2;
print "$DNA3\n\n";
#print also accepts a comma-separated list
print $DNA1, $DNA2, "\n";
exit;
Transcription: DNA to RNA
#!/usr/bin/perl -w
$DNA = 'ACGACTGCACGATCGTACG';
#print the DNA onto the screen
print "$DNA\n\n";
#Transcribe the DNA->RNA by substituting all T's with U's
$RNA = $DNA;
$RNA =~ s/T/U/g;
#print the result to the screen
print "Here is the result of DNA->RNA:\t$RNA\n\n";
exit;
Anatomy of the substitution: $RNA =~ s/T/U/g;
$RNA   Variable
=~     Binding operator
s      Substitute operator
/      Delimiters to separate the parts of the operator
T      Pattern to be replaced
U      Replacement text for the pattern
g      Pattern modifier (g = globally)
Other pattern modifiers:
i = case insensitive
m = multiline
s = single line
x = permit comments
o = compile only once for speed
e = treat replacement as Perl code
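To see a few of these modifiers in action, here is a small sketch. It is not from the original slides; the subroutine names and the mixed-case test sequence are assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# g = replace every match, i = ignore case, so both T and t become U.
sub transcribe {
    my ($dna) = @_;
    (my $rna = $dna) =~ s/t/U/gi;
    return $rna;
}

# s///g returns the number of substitutions made, which counts the Ts.
# $dna is a private copy, so the caller's string is left untouched.
sub count_t {
    my ($dna) = @_;
    my $count = ($dna =~ s/t/U/gi);
    return $count || 0;    # nothing matched means a false value, report 0
}

# e = treat the replacement as Perl code: upper-case each lowercase run.
sub to_upper {
    my ($dna) = @_;
    (my $upper = $dna) =~ s/([a-z]+)/uc($1)/ge;
    return $upper;
}

my $dna = 'acgactGCACGATCGTACG';
print "RNA:        ", transcribe($dna), "\n";
print "T count:    ", count_t($dna), "\n";
print "Upper case: ", to_upper($dna), "\n";
```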
Calculating the Reverse Complement
#!/usr/bin/perl -w
$DNA = 'ACGTCAGTCGAGCT';
#print the starting DNA onto the screen
print "Here is the starting DNA:\t$DNA\n\n";
#Calculate the reverse complement, first copying the reversed DNA into
#a new variable called $revcom
$revcom = reverse $DNA;
#substitute all bases by their complement
#(careful: each substitution also changes bases already swapped by an
#earlier one, so this first attempt gives the wrong answer)
$revcom =~ s/A/T/g;
$revcom =~ s/T/A/g;
$revcom =~ s/C/G/g;
$revcom =~ s/G/C/g;
print "$revcom\n";
Calculating the Reverse Complement
#!/usr/bin/perl -w
$DNA = 'ACGTCAGTCGAGCT';
#print the starting DNA onto the screen
print "Here is the starting DNA:\t$DNA\n\n";
#Calculate the reverse complement, first copying the reversed DNA into
#a new variable called $revcom
$revcom = reverse $DNA;
#translate all bases to their complement in a single pass with tr
$revcom =~ tr/ACGTacgt/TGCAtgca/;
print "$revcom\n";
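Running the two approaches side by side shows why the choice of operator matters. The sketch below wraps each in a subroutine; the names `revcom_substitute` and `revcom_translate` are assumptions for illustration. With chained `s///g`, the A-to-T step is undone by the T-to-A step (and likewise for C and G), whereas `tr` maps every character exactly once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Four substitutions in a row: later steps rewrite the output of earlier ones.
sub revcom_substitute {
    my ($dna) = @_;
    my $revcom = reverse $dna;
    $revcom =~ s/A/T/g;
    $revcom =~ s/T/A/g;    # also turns the Ts just created back into As
    $revcom =~ s/C/G/g;
    $revcom =~ s/G/C/g;    # also turns the Gs just created back into Cs
    return $revcom;
}

# tr translates each character once, so no base is changed twice.
sub revcom_translate {
    my ($dna) = @_;
    my $revcom = reverse $dna;
    $revcom =~ tr/ACGTacgt/TGCAtgca/;
    return $revcom;
}

my $dna = 'ACGTCAGTCGAGCT';
print "Starting DNA:     $dna\n";
print "With s///g steps: ", revcom_substitute($dna), "\n";
print "With tr:          ", revcom_translate($dna), "\n";
```

The substitution version ends up containing only A and C, which is a quick way to spot that something went wrong.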
Reading Data from Files
#### Sample Data in FASTA Format ####
>NM_012345 | Sample Data | Muppet Stuffing Protein
MNIDDKLEFGDEMGOSSRTMV
FGDLVRSMPHOEILAADEVLISHEE
GLOYAKLEFGDEMGOGHDDEFGVY
Reading Files
#!/usr/bin/perl -w
#The filename of the file containing the sequence data
$proteinFilename = 'NM_012345.pep';
#open the file for reading, and associate a 'filehandle' with it
open(PROTEINFILE, '<', $proteinFilename) or die "Cannot open $proteinFilename: $!";
#read from the file with the input operator <>
#(assigned to a scalar, this reads only the first line)
$muppetProtein = <PROTEINFILE>;
#print what was read
print "Here is the protein:\t$muppetProtein\n\n";
close PROTEINFILE;
exit;
Let's try this again …
#!/usr/bin/perl -w
$proteinFilename = 'NM_012345.pep';
open(PROTEINFILE, '<', $proteinFilename) or die "Cannot open $proteinFilename: $!";
#each read from the filehandle returns the next line
$muppetProtein = <PROTEINFILE>;
print "Here is the first line:\t$muppetProtein\n\n";
$muppetProtein = <PROTEINFILE>;
print "Here is the second line:\t$muppetProtein\n\n";
$muppetProtein = <PROTEINFILE>;
print "Here is the third line:\t$muppetProtein\n\n";
close PROTEINFILE;
exit;
Using Arrays to Read Files
#!/usr/bin/perl -w
$proteinFilename = 'NM_012345.pep';
#open the file
open(PROTEINFILE, '<', $proteinFilename) or die "Cannot open $proteinFilename: $!";
#Read the sequence data from the file, and store it in the array
#variable @protein (one line per element)
@protein = <PROTEINFILE>;
#print the protein onto the screen
print @protein;
close PROTEINFILE;
exit;
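Reading the file line by line still leaves the header mixed in with the sequence and a newline at the end of every line. The following sketch separates the two; it is an illustration only, the subroutine name `read_fasta` is an assumption, and it reads from an in-memory copy of the sample data so that it runs without the file `NM_012345.pep`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the header and the joined sequence
# of a single-record FASTA file read from an open filehandle.
sub read_fasta {
    my ($fh) = @_;
    my ($header, $sequence) = ('', '');
    while (my $line = <$fh>) {
        chomp $line;                       # drop the trailing newline
        next if $line =~ /^\s*$/;          # skip blank lines
        next if $line =~ /^#/;             # skip comment lines
        if ($line =~ /^>(.*)/) {
            $header = $1;                  # text after the '>' marker
        }
        else {
            $sequence .= $line;            # append sequence lines
        }
    }
    return ($header, $sequence);
}

my $fasta = <<'END';
>NM_012345 | Sample Data | Muppet Stuffing Protein
MNIDDKLEFGDEMGOSSRTMV
FGDLVRSMPHOEILAADEVLISHEE
GLOYAKLEFGDEMGOGHDDEFGVY
END

# Open the string as if it were a file.
open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";
my ($header, $protein) = read_fasta($fh);
close $fh;

print "Header:  $header\n";
print "Protein: $protein\n";
```

To read a real file instead, open it by name in the same way as the slides do and pass the filehandle to the subroutine.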
Arrays
#Here's one way to declare an array
@bases = ('A','C','G','T');
#Now print each element of the array
print "\nFirst element: " , $bases[0];
print "\nSecond Element: " , $bases[1];
print "\nThird Element: " , $bases[2];
print "\nFourth Element: " , $bases[3];
Arrays
#Here's one way to declare an array
@bases = ('A','C','G','T');
#Now print each element of the array in a row
print "\nHere are all of the bases: " , @bases;
#This prints out: 'Here are all of the bases: ACGT'
#But, you can print them out with spaces in between
print "\nHere they are with spaces: " , "@bases";
Arrays
#Here's one way to declare an array
@bases = ('A','C','G','T');
#Here's how to take an element off of the end
$base1 = pop @bases;
print "Here's the last element: ", $base1, "\n\n";
#The other elements still remain
print "\nHere are the remaining elements: " , "@bases";
Arrays
#Here's one way to declare an array
@bases = ('A','C','G','T');
#Here's how to take an element off of the front
$base2 = shift @bases;
print "Here's the first element: ", $base2, "\n\n";
#The other elements still remain
print "\nHere are the remaining elements: " , "@bases";
Arrays
#Here's one way to declare an array
@bases = ('A','C','G','T');
#Here's how you put an element at the beginning of an array
#Our example will put the last element at the beginning
$base1 = pop @bases;
unshift (@bases, $base1);
print "Here's the last element put first: " , "@bases\n\n";
Arrays
#Here's one way to declare an array
@bases = ('A','C','G','T');
#Here's how you put an element at the end of an array
#Our example will put the first element at the end
$base1 = shift @bases;
push (@bases, $base1);
print "Here's the first element put last: " , "@bases\n\n";
Arrays
#Here's one way to declare an array
@bases = ('A','C','G','T');
#Here's how to reverse an array
@reverse = reverse @bases;
#Here's how to get the length
print scalar(@bases), "\n\n";
#Here's how to insert an element at an arbitrary place
#(at index 2, removing 0 elements)
splice (@bases, 2, 0, 'X');
Arrays
#Arrays can be evaluated as lists and scalars
@bases = ('A','C','G','T');
#Here's how to print the array
print "@bases\n";
#Here's how to assign it to a scalar
#(this gives the number of elements)
$a = @bases; print $a;
#Here's how to assign an array to a list
#(this gives the first element)
($a) = @bases; print $a;
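The difference between scalar and list context, and the arguments to `splice`, are easy to mix up, so here is a short sketch that puts them together. It is an illustration only; the variable names are assumptions, and it uses `my` declarations with `use strict`, which the slides do not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# Scalar context: the array evaluates to its number of elements.
my $count = @bases;

# List context: the first element is assigned, the rest are discarded.
my ($first) = @bases;

# $#bases is the index of the last element (one less than the count).
my $last_index = $#bases;

# splice(ARRAY, OFFSET, LENGTH, LIST):
# insert 'X' at index 2 without removing anything ...
splice(@bases, 2, 0, 'X');
my $with_x = "@bases";

# ... then remove one element at index 2, which returns the removed 'X'.
my @removed = splice(@bases, 2, 1);

print "Count:        $count\n";
print "First:        $first\n";
print "Last index:   $last_index\n";
print "After insert: $with_x\n";
print "Removed:      @removed\n";
print "Final array:  @bases\n";
```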