Doug Raiford
Lesson 7

RNA World Hypothesis
 RNA world evolved into the DNA
and protein world
 DNA advantage: greater chemical
stability
 Protein advantage: more flexible
and efficient enzymes
(biomolecules that catalyze)
▪ 20 amino acids vs. 4 nucleotides
▪ Chemically, more diverse
 Remnants remain in ribosomes,
nucleases, polymerases, and
splicing molecules


Primary: sequence
Secondary: double
stranded regions
>tRNA. Carries amino acid for Isoleucine
AGGCUUGUAGCUCAGGUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGGUGGUUCA
AGUCCACUCAGGCCUACCA
 Reverse
complements

Tertiary: three-dimensional
structure
T arm
CCA Tail
Acceptor Stem
D arm
Anticodon arm
Anticodon

How do we find regions of reverse
complementarity?

What do we have?
 Sequence
 A’s like pairing with U’s and
G’s like pairing with C’s
 Stronger bond (3 hydrogen
bonds) between G’s and C’s
 Should result in lowest free energy (most energy released by base-pairing)
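One way to look for reverse-complementary regions, sketched in Python. This is only a brute-force illustration that assumes strict A-U and G-C pairing and a fixed stem length; the function names are made up for this example.

```python
# Brute-force search for stretches whose reverse complement occurs downstream.
# Assumes only Watson-Crick pairs (A-U, G-C); wobble pairs are ignored.

COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]

def find_reverse_complement_regions(seq, length):
    """Return (left_start, right_start) for every window of the given length
    whose reverse complement appears later in the sequence without overlap."""
    hits = []
    for i in range(len(seq) - length + 1):
        target = reverse_complement(seq[i:i + length])
        j = seq.find(target, i + length)
        while j != -1:
            hits.append((i, j))
            j = seq.find(target, j + 1)
    return hits

print(reverse_complement("GGGA"))                       # UCCC
print(find_reverse_complement_regions("GGGAAAUCC", 3))  # [(1, 6)]
```

Here GGA at position 1 can pair with UCC at position 6 (0-based), which is one candidate stem.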

tRNA
 Transports amino acid
to the ribosome
T arm
CCA Tail
Acceptor Stem
D arm
Anticodon arm
Anticodon

Visualization





Good at finding longer basepairings (stacked base-pairs)
Need to find the conformation
that provides the minimal total
free energy
RNA often has many alternate
conformations at different
temperatures
Stacked base-pairs add
stability
Loops/bulges introduce
positive free energy and are
destabilizing


Recurrence relations
First nucleotide base-pairs with last
  Recurse on rest
First nucleotide base-pairs with some other nucleotide (other than last, including none)
  Recurse on every possible set of two strings

\[
E(S_{i,j}) = \min
\begin{cases}
\alpha(r_i, r_j) + E(S_{i+1,\,j-1}) \\
\min_{i < k \le j} \left[ E(S_{i,\,k-1}) + E(S_{k,\,j}) \right]
\end{cases}
\]


As luck would have it…
Zuker came up with a
dynamic programming
solution
[Figure: empty 9 x 9 matrix for GGGAAAUCC, rows indexed by i and columns by j, with zeros on the main diagonal]


Start with zeros on
diagonal
Populate diagonally
[Figure: the same matrix with zeros on the main diagonal and on the adjacent diagonal]


Will look at last
value to illustrate
Match first and last
character, recurse
on rest
\[
\alpha(r_i, r_j) + E(S_{i+1,\,j-1}) = -1 + (-2)
\]
Filled matrix for GGGAAAUCC (rows i, columns j):

        G   G   G   A   A   A   U   C   C
    G   0   0   0   0   0   0  -1  -2  -3
    G   0   0   0   0   0   0  -1  -2  -3
    G       0   0   0   0   0  -1  -2  -2
    A           0   0   0   0  -1  -1  -1
    A               0   0   0  -1  -1  -1
    A                   0   0  -1  -1  -1
    U                       0   0   0   0
    C                           0   0   0
    C                               0   0
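A minimal Python sketch of the diagonal fill. It assumes the simple scoring from the example (-1 for an A-U or G-C pair, 0 otherwise) and no loop penalties. The names alpha and fill_energy_matrix are illustrative, not from the lecture.

```python
# Fill E(S_ij) diagonal by diagonal using the two-case recurrence:
#   pair first with last and recurse on the rest, or
#   split into two substrings and add their energies.

def alpha(a, b):
    """Pairing energy: -1 for A-U or G-C, 0 otherwise (assumed scoring)."""
    return -1 if {a, b} in ({"A", "U"}, {"G", "C"}) else 0

def fill_energy_matrix(seq):
    n = len(seq)
    E = [[0] * n for _ in range(n)]          # zeros on the diagonal
    for span in range(1, n):                 # one diagonal at a time
        for i in range(n - span):
            j = i + span
            # Case 1: r_i pairs with r_j, recurse on the inner substring
            inner = E[i + 1][j - 1] if i + 1 <= j - 1 else 0
            best = alpha(seq[i], seq[j]) + inner
            # Case 2: every split into S_{i,k-1} and S_{k,j}, i < k <= j
            for k in range(i + 1, j + 1):
                best = min(best, E[i][k - 1] + E[k][j])
            E[i][j] = best
    return E

E = fill_energy_matrix("GGGAAAUCC")
print(E[0][8])   # -3
```

The top-right cell holds the energy of the whole sequence. For GGGAAAUCC it is -3, the same value as -1 + (-2) in the illustration.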
Pairing energies α:

    α     A    C    U    G
    A     0    0   -1    0
    C     0    0    0   -1
    U    -1    0    0    0
    G     0   -1    0    0

Min of all pairs of
substrings
[Figure: two candidate hairpin foldings of GGGAAAUCC, with the filled matrix repeated]


n² cells, with up to 2n lookups for each visited cell
So O(n³)
Populate matrix plus
traverse row/column
for each cell

Any prediction method
must account for these






Now O(n⁴)
Interior loops
most expensive
Can exploit the
fact that along
diagonals, loops
have same size
Can calculate once
Limits search
space
Back to O(n³)

\[
E(S_{i,j}) = \min
\begin{cases}
E(S_{i+1,\,j}) \\
E(S_{i,\,j-1}) \\
\min_{i < k < j} \left[ E(S_{i,\,k}) + E(S_{k+1,\,j}) \right] \\
E(L_{i,j})
\end{cases}
\]

\[
E(L_{i,j}) =
\begin{cases}
\alpha(r_i, r_j) + \xi(j - i - 1), & \text{if } L_{i,j} \text{ is a hairpin loop} \\
\alpha(r_i, r_j) + \eta + E(S_{i+1,\,j-1}), & \text{if } L_{i,j} \text{ is a helical region} \\
\min_{k \ge 1} \left[ \alpha(r_i, r_j) + \beta(k) + E(S_{i+k+1,\,j-1}) \right], & \text{if } L_{i,j} \text{ is a bulge on } i \\
\min_{k \ge 1} \left[ \alpha(r_i, r_j) + \beta(k) + E(S_{i+1,\,j-k-1}) \right], & \text{if } L_{i,j} \text{ is a bulge on } j \\
\min_{k_1, k_2 \ge 1} \left[ \alpha(r_i, r_j) + \gamma(k_1 + k_2) + E(S_{i+1+k_1,\,j-1-k_2}) \right], & \text{if } L_{i,j} \text{ is an interior loop}
\end{cases}
\]

\xi(k) = destabilizing free energy of a hairpin loop with size k
\eta = stabilizing free energy of adjacent base pairs
\beta(k) = destabilizing free energy of a bulge of size k
\gamma(k) = destabilizing free energy of an interior loop of size k



Zuker’s site
T arm
CCA Tail
Acceptor Stem
D arm
Anticodon arm
Anticodon
Codon: uua
Anti-codon: aat
tRNA for Leucine in E. coli, a prototypical organism
1 gccgaggtgg tggaattggt agacacgcta ccttgaggtg gtagtgccca atagggctta
61 cgggttcaag tcccgtcctc ggtacca


Just like proteins:
conformation
What if a U-A base-pair
mutates to a G-C?
 Still same function

What would this do to a
search or sequence
alignment?
GCAGGACCAUAUA
|||||||||||||
CGUCCUGGUAUAU
GCAGGACCAGAUA
|||||||||||||
CGUCCUGGUCUAU

Phenomenon known as
covariance
(not to be confused with
statistical covariance)
GCAGGACCAUAUA
|||||||||||||
CGUCCUGGUAUAU
GCAGGACCAGAUA
|||||||||||||
CGUCCUGGUCUAU



How might we locate
covariant pairs?
MSA then compare all pairwise combinations of
columns
High degree of agreement in
two columns (G’s match with
C’s, A’s match with U’s) an
indication of base-pairing
χ2 test
Compare to expected
number of pairings given
sequence composition
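A rough Python sketch of the column comparison. It uses a plain agreement score (fraction of sequences with a Watson-Crick pair in the two columns) rather than the χ2 test, and the small alignment is hypothetical, made up for this example.

```python
# Scan all pairs of MSA columns for covariation.
# A column pair is reported when every sequence has a complementary pair
# there AND at least two different pair types occur (so the pair changed
# but stayed complementary). Thresholds are illustrative assumptions.
from itertools import combinations

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def column(msa, i):
    """Return column i of the alignment as a list of characters."""
    return [seq[i] for seq in msa]

def find_covariant_columns(msa, min_fraction=1.0, min_distinct_pairs=2):
    length = len(msa[0])
    hits = []
    for i, j in combinations(range(length), 2):
        pairs = list(zip(column(msa, i), column(msa, j)))
        complementary = [p for p in pairs if p in WATSON_CRICK]
        fraction = len(complementary) / len(pairs)
        if fraction >= min_fraction and len(set(complementary)) >= min_distinct_pairs:
            hits.append((i, j))
    return hits

# Hypothetical alignment: columns 0/7 and 1/6 covary
msa = ["GCAAAAGC",
       "ACAAAAGU",
       "CGAAAACG",
       "UGAAAACA"]
print(find_covariant_columns(msa))   # [(0, 7), (1, 6)]
```

A real analysis would compare the observed counts with the number of pairings expected from sequence composition, as the slide suggests.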

Pairing depicted with nested parentheses
AAGACUUCGGUCUGGCGACAUUC
(((
))) (( (
)))
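Nested parentheses can be turned back into base pairs with a stack. The sketch below assumes the common dot-bracket convention, where a dot marks an unpaired base; the example strings are hypothetical and are not the structure shown on the slide.

```python
# Recover base pairs from a nested-parentheses (dot-bracket) string.
# Each "(" is pushed on a stack; each ")" pairs with the most recent "(".

def parse_dot_bracket(structure):
    """Return a sorted list of 0-based (i, j) base pairs."""
    stack = []
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)

print(parse_dot_bracket("((...))"))    # [(0, 6), (1, 5)]
print(parse_dot_bracket("(..)(..)"))   # [(0, 3), (4, 7)]
```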

Mountain plots
A mountain plot represents a secondary
structure in a plot of height versus position,
where the height m(k) is given by the number
of base pairs enclosing the base at position k.
I.e. loops correspond to plateaus (hairpin loops
are peaks), helices to slopes.
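The heights m(k) can be computed in one pass over a dot-bracket string. This sketch assumes a pair counts as enclosing position k only when k lies strictly between its two ends; other tools may use a slightly different convention.

```python
# Mountain-plot heights from a dot-bracket string.
# m(k) = number of base pairs (i, j) with i < k < j.

def mountain_heights(structure):
    heights = []
    depth = 0
    for symbol in structure:
        if symbol == "(":
            heights.append(depth)   # opening base sits outside its own pair
            depth += 1
        elif symbol == ")":
            depth -= 1
            heights.append(depth)   # closing base sits outside its own pair
        else:
            heights.append(depth)   # unpaired base
    return heights

print(mountain_heights("((...))"))   # [0, 1, 2, 2, 2, 1, 0]
```

The helix gives the rising and falling slopes, and the hairpin loop gives the plateau at the peak.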

Circle plot


Data structure
capable of capturing
secondary structure
Ordered Binary Tree

Productions
S → aSu | uSa | cSg | gSc
S → aS | cS | gS | uS
S → Sa | Sc | Sg | Su
S → SS
S → ε

Derivation
S → aS
S → aSc
S → aScc
S → acSgcc
S → acgScgcc
S → acggSccgcc
S → acgggScccgcc
S → acggggSccccgcc
S → acgggguSccccgcc
S → acgggguuSccccgcc
S → acgggguucSccccgcc
S → acgggguucgSccccgcc
S → acgggguucgaSccccgcc
S → acgggguucgaaSccccgcc
S → acgggguucgaauSccccgcc
S → acgggguucgaauccccgcc
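The derivation can be replayed mechanically. In this Python sketch each step is written as (left, right), meaning S → left S right. The branching rule S → SS is left out because the derivation above never uses it. The function names are illustrative.

```python
# Replay a derivation in the RNA grammar, one production per step.

NUCLEOTIDES = set("acgu")
PAIRS = {("a", "u"), ("u", "a"), ("c", "g"), ("g", "c")}

def is_production(left, right):
    """True if S -> left S right is one of the grammar's productions."""
    if (left, right) in PAIRS:                      # S -> aSu | uSa | cSg | gSc
        return True
    if left in NUCLEOTIDES and right == "":         # S -> aS | cS | gS | uS
        return True
    if left == "" and right in NUCLEOTIDES:         # S -> Sa | Sc | Sg | Su
        return True
    return False

def derive(steps):
    """Apply the steps to S, then finish with S -> (empty string)."""
    form = "S"
    forms = [form]
    for left, right in steps:
        if not is_production(left, right):
            raise ValueError("not a production: %r S %r" % (left, right))
        form = form.replace("S", left + "S" + right, 1)
        forms.append(form)
    forms.append(form.replace("S", "", 1))
    return forms

steps = [("a", ""), ("", "c"), ("", "c"), ("c", "g"),
         ("g", "c"), ("g", "c"), ("g", "c"), ("g", "c")]
steps += [(x, "") for x in "uucgaau"]
forms = derive(steps)
print(forms[-1])   # acgggguucgaauccccgcc
```

The paired productions (cSg, gSc) build the stem, and the one-sided productions build the unpaired ends and the loop.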

Parse tree
a←S
|
S→c
|
S→c
|
c←S→g
|
g←S→c
|
g←S→c
|
g←S→c
|
g←S→c
S→u
|
|
u←S
S→a
\
/
u←S
S→a
\
/
c←S—S→g



Conformation of RNA dictates function
Determining secondary structure can help
determine tertiary structure
Dynamic programming approach to
identifying minimum energy conformations
 Zuker MFOLD


View using dot plots, nested parens,
mountain or circular plots
Covariance: base-pairs mutate but still form
pairs, exploit to find pairings