STBC2023 – Introduction to Bioinformatics
Analyses & Predictive Methods
Using Nucleotide Sequences
M. Firdaus Raih
Room 1166, Bangunan Sains Biologi
Office Hours: Wednesdays
Phone: 0389215961 Email: [email protected]
Ver. 23-01-09-1
Guide
• This is an electronic self-study and self-assessment module based
on the lectures covering Topic 4 – Analysis at Nucleotide Level of the
STBC2023 – Introduction to Bioinformatics course.
• To navigate this module, use the buttons provided mostly on the bottom
right hand corner of the page or in some slides, the bottom left hand
corner. The Home icon button will automatically set the slide back to the
key questions which we are trying to answer with this course material.
Several pages have hyperlinks which navigate immediately to either
specific slides OR navigate away from this module via the default web
browser. To return, simply click back to this file. If the buttons are not
clicked directly, the slides will advance in normal PowerPoint slideshow
order instead of jumping to the linked pages.
• Practicals and self assessment questions to gauge your comprehension of
a given practical session are also provided throughout. Please attempt the
practicals and the questions on your own before resorting to the solutions
or answers provided.
Pre-session Questions
• What are nucleic acids?
• What types of nucleic acids are there?
• What functions do nucleic acids have?
• What sort of information do nucleotide sequences carry?
• What can be done with DNA sequences?
• What can be done with RNA sequences?
• Is molecular structure important for RNA sequences?
• What is a sequence alignment?
• What is the relationship of an alignment with regard to
biological function?
• Is extracting the encoded information for protein synthesis
the only sequence analysis which can be done?
Learning objectives
• Know the basic chemistry of nucleic acids and understand their
diverse functions.
• Be able to list potential analyses for nucleic acid sequence data,
and the applications of those analyses, based on an understanding
of the functions of nucleic acids.
• Be able to formulate a strategy and present the processes involved
in the analysis of nucleic acid sequence data.
• Be able to comprehend the basic concepts of sequence alignment
in general and of aligning nucleic acids specifically, as well as the
relationship between an alignment and a sequence’s biological function.
Nucleic Acids: Chemistry and Molecular Structure
What are nucleic acids?
• Nucleic acids = polymers of nucleotides; 2 types:
• DNA – deoxyribonucleic acid
• RNA – ribonucleic acid
What is a nucleotide?
• Nucleotide = nucleoside + one or more phosphate groups
• Nucleoside = nitrogenous base + sugar (ribose in RNA, deoxyribose in DNA)
Nucleic Acids: Chemistry and Molecular Structure
What is the basic difference between RNA and DNA (in terms of chemistry)?
[Figure: chemical structures of RNA and DNA]
Nucleic Acids: Chemistry and Molecular Structure
How can the nucleotide polymer be represented?
5’ A C T G 3’
3’ T G A C 5’
Hydrogen bonded base
interactions and base
stacking interactions result
in stable structures of DNA /
RNA.
Seq 1. ACTG
Seq 2. TGAC
What can be done with such sequence data?
How is the analysis related to biological function?
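The string representation above can be handled directly in code. The following is a minimal sketch in Python, assuming standard A-T and C-G pairing; the function names are hypothetical and chosen only for illustration.

```python
# Standard Watson-Crick pairing for DNA; helper names are illustrative only.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(seq: str) -> str:
    """Return the base-by-base complement, keeping the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq.upper())

def reverse_complement(seq: str) -> str:
    """Return the partner strand written in the conventional 5'->3' direction."""
    return complement(seq)[::-1]

seq1 = "ACTG"                          # Seq 1, written 5'->3'
print(complement(seq1))                # partner strand read 3'->5' (Seq 2 above)
print(reverse_complement(seq1))        # the same partner strand read 5'->3'
```

Note that Seq 2 (TGAC) on the slide is the partner strand read 3’ to 5’; sequence databases normally store each strand 5’ to 3’.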
Nucleic Acids: Biological Functions
• What is/are the function(s) of DNA?
• What is/are the function(s) of RNA?
Nucleic Acids: Biological Functions
What is the function of DNA?
– Storage of genetic information
– Proteins such as transcription factors also interact directly
with DNA as part of regulatory pathways
– Total genetic content of an organism = genome
– Genes are part of genomes
– So… what is a gene?
Nucleic Acids: Biological Functions
What is the function of DNA?
– Storage of hereditary information in genes.
– What is a gene?
While sequencing of the human genome surprised us with how many protein-coding genes there
are, it did not fundamentally change our perspective on what a gene is. In contrast, the complex
patterns of dispersed regulation and pervasive transcription uncovered by the ENCODE project,
together with non-genic conservation and the abundance of noncoding RNA genes, have
challenged the notion of the gene. To illustrate this, we review the evolution of operational
definitions of a gene over the past century--from the abstract elements of heredity of Mendel and
Morgan to the present-day ORFs enumerated in the sequence databanks. We then summarize the
current ENCODE findings and provide a computational metaphor for the complexity. Finally, we
propose a tentative update to the definition of a gene: A gene is a union of genomic
sequences encoding a coherent set of potentially overlapping functional
products. Our definition side-steps the complexities of regulation and transcription by removing
the former altogether from the definition and arguing that final, functional gene products (rather
than intermediate transcripts) should be used to group together entities associated with a single
gene. It also manifests how integral the concept of biological function is in defining genes.
Nucleic Acids: Biological Functions
What are the functions of RNA?
– Information storage and transfer
• Genomes of RNA viruses
• mRNA
– Protein synthesis
• tRNA
• Peptidyl transferase
– Catalysis
• ribozymes
– Regulatory
• Small ncRNAs / microRNAs
• Riboswitches
Also see: The RNA World hypothesis – first coined by Walter Gilbert 1986, Nature
DNA (Genes): From Sequence to Function
How does a gene sequence correlate to biological function?
Let’s first look at:
Information about the amino acid sequence is contained
within the nucleic acids sequence.
Is that the only analysis that can be done for DNA sequences?
What other analyses, if any, can be done for DNA sequences?
Potential Analyses for DNA Sequences
What can be done with DNA sequences?
– Genome projects: DNA sequencing data need to be
assembled into complete genomes.
– Genes need to be identified / predicted.
– Comparisons of specific nucleotide level variations.
– Identification and analysis of specific nucleotide sequence
level motifs and patterns.
– Identification and analysis of polymorphisms.
Potential Analyses for DNA Sequences
What can be done with DNA sequences?
Genome projects: DNA sequencing data need to be
assembled into complete genomes.
– Genome sequencing generates fragments of sequence.
– These fragments need to be assembled into genes, chromosomes and
finally the complete genome.
– Assembly is done by analyzing for contiguous sequences (contigs).
– Contigs are basically found by aligning the short DNA sequences to
one another and finding where there are overlaps.
– More on this topic will be covered in the Genomics course in Year 3.
– After the genome is assembled, the genes need to be identified.
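The idea of finding overlaps between fragments can be sketched as follows. This is a toy illustration only, assuming error-free reads that are already in the right order and orientation; the reads are made up and the function names are hypothetical. Real assemblers are far more elaborate.

```python
from typing import Optional

def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads into one contig if they overlap, otherwise return None."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]

# Made-up reads, listed in the order in which they overlap.
reads = ["ATGCGTAC", "GTACCTGA", "CTGATTAG"]
contig = reads[0]
for read in reads[1:]:
    merged = merge_reads(contig, read)
    if merged is not None:
        contig = merged
print(contig)
```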
Potential Analyses for DNA Sequences
What can be done with DNA sequences?
From sequence data, genes need to be predicted.
– Several methods for gene prediction:
• Searching by signal – analysis of sequence signals which specify a
gene.
• Searching by content – analysis of regions showing compositional
bias that has been correlated to coding regions.
• Homology-based prediction – comparison against known gene
sequences. [involves sequence alignments]
• Comparative gene prediction – comparing sequences of interest
against anonymous genomic sequences. [involves sequence
alignments]
– The prediction of eukaryotic genes from genomic DNA
data is appreciably more difficult than that of prokaryotic genes.
Why?
– For this session, we will focus on methods which involve
sequence alignments.
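As a rough illustration of the first two approaches (signal and content), the sketch below scans the forward strand for open reading frames that run from an ATG start codon to a stop codon, and computes GC content as a simple compositional measure. It is a simplification under stated assumptions: the example sequence is made up, the function names are hypothetical, and the reverse strand, introns and promoter signals are all ignored, so it is closer to a prokaryotic toy case than to a real gene finder.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 3):
    """Return (start, end, frame) for forward-strand ORFs of the form ATG ... stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                      # signal: start codon
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None                     # look for the next ORF
    return orfs

def gc_content(seq: str) -> float:
    """Fraction of G and C bases, a crude 'content' measure."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

dna = "CCATGAAATTTGGGTAACC"                      # made-up example sequence
print(find_orfs(dna))
print(round(gc_content(dna), 2))
```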
Potential Analyses for DNA Sequences
What can be done with DNA sequences?
– Comparisons of specific nucleotide level variations.
• Enables differentiation at the individual level or between close
relatives, i.e. between strains of the same species.
– Phylogenetic analysis (discussed by Dr. Khairina).
Potential Analyses for DNA Sequences
What can be done with DNA sequences?
– Identification and analysis of specific nucleotide sequence
level motifs, patterns.
– This will be discussed further in the following lecture.
– Examples:
• PCR Primer design
• Searching / mapping restriction sites
Go to the corresponding BLAST exercise NOW
or proceed to the next slide.
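Searching for a restriction site is a simple example of motif searching. The sketch below looks for the EcoRI recognition site (GAATTC) in a made-up sequence; the function name is hypothetical. Because GAATTC is palindromic, searching one strand is sufficient for this enzyme.

```python
def find_sites(seq: str, site: str) -> list:
    """Return the 0-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)   # step by 1 so overlapping sites are found
    return positions

ECORI_SITE = "GAATTC"
example = "AAGAATTCGGGAATTCTT"              # made-up sequence
print(find_sites(example, ECORI_SITE))
```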
Potential Analyses for DNA Sequences
What can be done with DNA sequences?
– Identification and analysis of polymorphisms.
– This will be discussed further in the following lecture.
– Examples:
• SNPs – single nucleotide polymorphisms (more on SNPs)
Go to the corresponding BLAST exercise NOW
or proceed to the next slide.
Sequence Alignments
What is a sequence alignment?
A way of arranging sequences so that their regions of similarity are ‘aligned’ with one another.
Examples:
Gaps (-) are inserted to optimize alignments.
They represent ‘indel’ mutations.
Easy to align short sequences manually. But what about longer
sequences? How can those be aligned? In order to understand this
further, let’s look at a method which we can visualize and track the
alignment. This method is called a dot plot.
Sequence Alignments
What is a dot plot?
A plot where two sequences are written
along the top row and leftmost column of a
two-dimensional matrix and a dot is placed at
any point where the characters in the
corresponding row and column match.
Parts of the two sequences where the match
is continuous can be traced as a diagonal line
→ a region where the sequences are aligned.
A sequence can be plotted against itself, and
regions that share significant similarities will
appear as lines off the main diagonal; this can
occur when a protein consists of multiple
similar structural domains.
A simple dot plot only marks identical
characters, so it does not score or account for
the divergence (substitutions/mutations) which
we know can occur.
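A dot plot as described above can be drawn in text form with a few lines of code. This is a minimal sketch; the function name and the two short example sequences are made up for illustration.

```python
def dot_plot(top: str, left: str) -> list:
    """Return text rows: a header row, then one row per character of `left`,
    with '*' wherever the row character matches the column character."""
    rows = ["  " + " ".join(top)]
    for row_char in left:
        cells = ["*" if row_char == col_char else "." for col_char in top]
        rows.append(row_char + " " + " ".join(cells))
    return rows

for line in dot_plot("ACTGAC", "ACTGTC"):
    print(line)
```

Runs of '*' along a diagonal correspond to stretches where the two sequences match without gaps.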
Sequence Alignments
Dot plot for two DNA sequences
Complete the dot plot for the two DNA sequences provided below. An example can be seen below.
Seq1: CGATCGCGTAATCGGTGATCGGC
Seq2: CGGTATCGGTGATCGATCGCA
Questions:
1. Which stretch of these sequences can best be aligned to each other? (Answer)
2. Can this alignment be extended? (Answer)
3. Can you identify a repetitive sequence of 4 bases which keeps occurring in both sequences? (Answer)
4. What can you attribute all the other plotted dots to? (Answer)
Sequence Alignments
Dot plot for two DNA sequences
Questions:
1. Which stretch of these sequences can best be aligned to each other?
Answer: Longest continuous diagonal line from your dot plot. ATCGGTGATCG
2. Can this alignment be extended?
Answer: Yes, it can be extended as shown below. 2 nucleotides are not aligned and may possibly be
substitutions.
3. Can you identify a repetitive sequence of 4 bases which keeps occurring in both sequences?
Answer: ATCG, this can be deduced from the repeating short diagonal lines.
4. What can you attribute all the other plotted dots to?
Answer: They are the result of random sequence similarities.
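The answer to question 1 can be checked computationally: the longest continuous diagonal in a dot plot is the longest stretch shared by both sequences without gaps or mismatches. A minimal sketch follows; the function name is hypothetical.

```python
def longest_diagonal(a: str, b: str) -> str:
    """Longest run of consecutive matches along any diagonal of the dot plot
    (i.e. the longest common substring of the two sequences)."""
    best_len, best_end = 0, 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # extend the diagonal run that ended at the previous cell
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len, best_end = current[j], i
        previous = current
    return a[best_end - best_len:best_end]

seq1 = "CGATCGCGTAATCGGTGATCGGC"
seq2 = "CGGTATCGGTGATCGATCGCA"
print(longest_diagonal(seq1, seq2))
```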
Computational Sequence Alignments
• We’ve looked at manual alignments for short sequences and the dot
plot… However, manual alignments cannot be done for lengthy and
highly variable sequences. Therefore for long variable sequences,
computer aided alignments need to be done.
• How can computer aided alignments be done?
• To enable computer aided alignment, algorithms called dynamic
programming algorithms are used.
• Two common dynamic programming algorithms approach alignment
differently, via:
1. Local alignments: Smith-Waterman algorithm
2. Global alignments: Needleman-Wunsch algorithm
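To give a feel for what a dynamic programming algorithm does, here is a minimal sketch of the score calculation for a Needleman-Wunsch style global alignment. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions for illustration, and the traceback step that recovers the alignment itself is omitted.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of `a` and `b` with a per-position gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap        # gap inserted in b
            left = score[i][j - 1] + gap      # gap inserted in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]

print(needleman_wunsch_score("ACTG", "ACTG"))
print(needleman_wunsch_score("ACTG", "ACG"))
```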
Computational Sequence Alignments
What is the difference between global and local alignments?
• Local alignments: Smith-Waterman algorithm
• Global alignments: Needleman-Wunsch algorithm
• The Smith-Waterman algorithm is currently the most used because real biological
sequences are usually similar in localized portions and not over entire lengths.
– Examples:
• Genes from different organisms with similar exons but different intron structures
• Proteins that share only certain domains
• Alignments can have gaps which represent mutations. The ability to add gaps is
required as sequences diverge.
So how do we know that an alignment is meaningful?
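The local approach can be sketched in the same style as a global alignment. The key difference in a Smith-Waterman style calculation is that a cell score is never allowed to drop below zero, so an alignment can start and end anywhere. The scoring values below are assumptions for illustration, and only the best score is returned, not the alignment.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between any substring of `a` and any substring of `b`."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets a new local alignment start at this cell.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best

# Made-up sequences sharing only the central region ACGTACGT.
print(smith_waterman_score("TTTTACGTACGTCCCC", "GGGGACGTACGTAAAA"))
```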
Computational Sequence Alignments
How do we know that an alignment is meaningful?
• Insertions and deletions are relatively rare evolutionary events; therefore the
addition of gaps MUST be controlled so that a large proportion of matches cannot
be obtained simply by inserting large numbers of gaps.
• Gap penalties are given to control the addition of gaps. The penalty system can be
constant or proportional.
• Scores are given for matches, while penalties are given for the addition of gaps.
• The alignment algorithm then carries out alignments in order to get the best
score.
• Like the dot plot, a simple system as above does not seem to fully consider
divergence (i.e. point mutations) – only deletions and insertions seem to be
considered.
• How can we get around this problem?
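The difference between a constant and a proportional gap penalty can be seen by scoring the same gapped alignment both ways. In this sketch a constant penalty is charged once per run of gaps and a proportional penalty is charged for every gap position; the score values and the example alignment are assumptions for illustration.

```python
def score_alignment(row_a: str, row_b: str, match: int = 1, mismatch: int = 0,
                    gap_penalty: int = 2, proportional: bool = True) -> int:
    """Score two aligned rows of equal length, where '-' marks a gap.

    proportional=True  : every gap position costs `gap_penalty`.
    proportional=False : each run of consecutive gap columns costs `gap_penalty` once.
    (A run is not split by which row holds the gap; that is a simplification.)
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    in_gap = False
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            if proportional or not in_gap:
                score -= gap_penalty
            in_gap = True
        else:
            score += match if x == y else mismatch
            in_gap = False
    return score

row_a = "ACGTTTAC"
row_b = "ACG---AC"
print(score_alignment(row_a, row_b, proportional=True))
print(score_alignment(row_a, row_b, proportional=False))
```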
Computational Sequence Alignments
How do we know that an alignment is meaningful? (…cont.)
• Point mutations can result in a change of residue as opposed to a deletion or insertion.
• A matrix called a substitution matrix can be used to model the possible changes and
provide quantitative values for changes arising from point mutations.
• The values for substitution can take into consideration similarity, such as
physico-chemical properties for amino acids or transition mutations for nucleic
acids.
• But there is still a probability that a search result is random, especially for large
databases. How can we be certain the alignment achieved is the expected result?
[Figure: amino acid substitution matrix example]
[Figure: nucleic acid substitution matrix example]
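A nucleic acid substitution matrix that treats transitions (purine to purine, or pyrimidine to pyrimidine) more leniently than transversions could be built as follows. The numerical values are illustrative assumptions, not those of any particular published matrix.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_score(x: str, y: str, match: int = 1,
                       transition: int = -1, transversion: int = -2) -> int:
    """Score a pair of aligned bases, penalizing transversions more than transitions."""
    if x == y:
        return match
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return transition if same_class else transversion

# Full 4 x 4 matrix as a dictionary keyed by base pair.
matrix = {(x, y): substitution_score(x, y) for x in "ACGT" for y in "ACGT"}

print("   " + "  ".join("ACGT"))
for x in "ACGT":
    print(x, " ".join(f"{matrix[(x, y)]:2d}" for y in "ACGT"))
```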
Computational Sequence Alignments
How do we know that an alignment is meaningful? (…cont.)
• How can we be certain the alignment achieved is the expected result?
• The alignments produced are statistically evaluated.
• As an example, for the BLAST program, a value called the Expectation (E) value is
given.
• The E value is the number of different alignments with scores equivalent to or better
than S that are expected to occur in a database search by chance.
• The lower the E value, the more significant the score.
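For reference, the E value is commonly written in the Karlin-Altschul form shown below. The symbols m, n, K and λ are not defined in these slides and are introduced here only for illustration: m is the query length, n is the total length of the database, and K and λ are constants that depend on the scoring system used.

```latex
E = K \, m \, n \, e^{-\lambda S}
```

Because E falls exponentially as the score S rises, and grows with the size of the database, the same score is less significant in a larger database.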
Sequence Alignments
What is the rationale in doing an alignment?
• Proteins perform most cellular functions.
• The structure of a protein is an important determinant of its
function.
• If proteins share a similar structure, then they may also share a
similar function.
• As a rule of thumb, protein sequences with about 30% identity or
more usually share a similar fold (Chothia & Lesk 1986).
Sequence Alignments
What is the rationale of doing an alignment?
• If proteins share a similar function, then they may also share a
similar structure.
Heme site
Sequence Alignments
What is the rationale in doing an alignment?
• If proteins share a similar structure, then they may also
share a similar sequence.
• But our interest here is in NUCLEIC ACID sequences…
• So what is the relevance?
Sequence Database Searching
What is a sequence database?
– A collection of biological macromolecular sequences
– Can be sequences organized into organisms, protein families,
sources etc.
– Has been covered in Topic 2. Example: NCBI GenBank
What are we searching for, and how do we search for
something in sequence databases?
– We are searching for sequence similarity.
– We can search for sequence similarity by comparing an input
(query) sequence against sequences in the database.
– This comparison is done by aligning the query sequence to the
database sequences → one tool we can use is BLAST.
– How is this alignment relevant biologically?
Sequence Database Searching
What is BLAST?
– Basic Local Alignment Search Tool
– Implements heuristics to approximate the Smith-Waterman
algorithm and search for high scoring alignments.
– The alignment scores are then statistically evaluated – one
example is the E value discussed previously.
– BLAST is actually a family of programs.
BLAST
How do we use BLAST?
(1) Select the BLAST program
(2) Input the sequence (query)
(3) Choose the database to search
(4) Choose optional parameters
Then click “BLAST”
BLAST
Is that it?... YES and NO…. Let’s look at some
considerations and strategies for BLAST searching.
BLAST
Some considerations and strategies:
– Input sequence and search database – what is it that you’re really
interested in? Finding similarity alone or identifying homologs?
Finding homologs only or perhaps trying to find out if genes with
similar sequences encode for proteins with available structures?
The answers to these types of questions influence the type of
search program you should use and the database to search in.
Protein vs. Nucleotide?
BLAST
Some considerations and strategies:
– Are you interested in something quite specific?
BLAST
Some considerations and strategies:
– Did you forget to turn something on/off?
• Sequence filters – Low-complexity regions have fewer sequence characters
in them because of repeats of the same sequence character or pattern.
These sequences produce artificially high-scoring alignments that do not
accurately convey sequence relationships in sequence similarity searches.
Regions of low complexity or repetitive sequences may be readily visualized
in a dot matrix analysis of a sequence against itself. Low-complexity regions
with a repeat occurrence of the same residue can appear on the matrix as
horizontal and vertical rows of dots representing repeated matches of one
residue position in one copy of the sequence against a series of the same
residue in the second copy. Repeats of a sequence pattern appear in the
same matrix as short diagonals of identity that are offset from the main
diagonal. Such sequences should be excluded from sequence similarity
searches.
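One simple way to see why such regions are called low-complexity is to measure the Shannon entropy of the characters in a sliding window: a window made of one repeated base has an entropy of 0 bits, while a window using all four bases equally has 2 bits. The sketch below is only an illustration of the idea; it is not the DUST or SEG filter that BLAST itself applies, and the window size and threshold are arbitrary assumptions.

```python
import math
from collections import Counter

def shannon_entropy(window: str) -> float:
    """Entropy in bits of the character composition of `window`."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def low_complexity_windows(seq: str, size: int = 12, threshold: float = 1.0) -> list:
    """Start positions of windows whose entropy falls below `threshold` bits."""
    seq = seq.upper()
    return [start for start in range(len(seq) - size + 1)
            if shannon_entropy(seq[start:start + size]) < threshold]

example = "A" * 12 + "ACGTACGTACGT"     # made-up: a poly-A run followed by mixed sequence
print(low_complexity_windows(example))
```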
BLAST
Some considerations and strategies:
– Did you forget to turn something on/off?
• Options and parameter settings
Output of BLAST Searches
• What are the components of a BLAST search output?
• Example: blastn vs blastx (GenBank AF390557)
[Figure: blastn and blastx output. This section: overview of the output alignments]
[Figure: blastn and blastx output. This section: list of hits (alignments)]
Read more about interpreting the output.
[Figure: blastn and blastx output. This section: the alignments]
Output of BLAST Searches
• To be a significant match, a database sequence that is listed in the program output should have a small E
(expect value) and a reasonable alignment with the query sequence (or translations of protein-encoding
DNA sequences should have these same features).
• The E of the alignment score between the sequences estimates how often an unrelated
sequence in the database or a random sequence would be expected to achieve such a score with the query
sequence by chance, given as many sequences as there are in the database. The smaller the E, the more significant
the alignment. A cutoff value in the range of 0.01-0.05 may be used (Pearson 1996). In genome
comparisons, a more stringent cutoff score (10^-100 to 10^-20) may be used to find sequences that align
very well with the query sequence. However, the alignment should also be examined for absence of
repeats of the same residue or residue pattern because these patterns tend to give false high alignment
scores.
• Filtering of low-complexity regions from the query sequence in a database search helps to reduce the
number of false positives. The alignment should also be examined for reasonable amino acid
substitutions and for the appearance of a believable alignment.
• To gain further confidence that the alignment between the query and database sequences is significant,
either the query sequence or the matched database sequence may be shuffled many times, and each
random sequence may be realigned with the other unshuffled sequence to obtain a score distribution
for a set of unrelated sequences. This distribution may then be used to evaluate the significance of the
true alignment score.
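The shuffling procedure described in the last point can be sketched as below. This is an illustration under stated assumptions: the scoring function is a simple Smith-Waterman style local scorer with arbitrary values (match +2, mismatch -1, gap -2), the sequences are made up, and the number of shuffles is kept small. The function names are hypothetical.

```python
import random

def local_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman style, score only)."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best

def shuffle_test(query: str, subject: str, trials: int = 200, seed: int = 0):
    """Return the true score and the fraction of shuffled subjects scoring at least as well."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    true_score = local_score(query, subject)
    letters = list(subject)
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(letters)            # same composition, random order
        if local_score(query, "".join(letters)) >= true_score:
            at_least_as_good += 1
    # Add 1 to numerator and denominator so the estimate is never exactly zero.
    return true_score, (at_least_as_good + 1) / (trials + 1)

score, fraction = shuffle_test("ACGTACGTAGCTAGCTAGGA", "ACGTACGTAGCTAGCTAGGA", trials=50)
print(score, fraction)
```

A small fraction suggests that the true score is unlikely to arise from sequences of the same composition by chance.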
BLAST
Carrying out a BLAST search:
1. Select and copy the sequence from the GenBank database here.
2. Go to the BLAST page and carry out database searches using the above sequence.
– First carry out a search against a nucleotide database.
• Which BLAST programs can you use? Name two possibilities. (Answer)
– Next carry out a search against a protein database.
• Which BLAST program should you use? (Answer)
• (i) Can you further narrow down the search? (ii) Also take for example if you were to search
for genes which code for proteins which have representative 3D structures; how would you
conduct such a search? (Answer)
BLAST
Answers to questions on carrying out a BLAST search:
• First carry out a search against a nucleotide database.
– Which BLAST programs can you use? Name two possibilities.
Answer: blastn and tblastx. tblastn is not a correct answer because it uses a protein query, although the database
searched is a nucleotide database; the input sequence AF390557 is a DNA sequence.
• Next carry out a search against a protein database.
– Which BLAST program should you use?
Answer: blastx
– (i) Can you further narrow down the search? (ii) Also take for example if you were to search for genes which
code for proteins which have representative 3D structures; how would you conduct such a search?
Answer: (i) Yes, searches returning a very large number of hits can still be narrowed down. A carefully annotated
protein sequence database (e.g., PIR, SwissProt) will provide a more manageable output list of matched
sequences, and these proteins have probably been observed in the laboratory; i.e., the genes do produce a
protein product in cells. However, investigators may also wish to expand the search to include predicted genes
from gene annotations of genomic sequences that are frequently entered into the DNA sequence translation
databases (e.g., DNA sequences in the GenBank DNA sequence databases automatically translated into protein
sequences and placed in the GenPept protein sequence database). To compare a protein or predicted protein
sequence to EST sequences, the ESTs should be translated into all six possible reading frames. (ii) Such a search
can be carried out by choosing PDB as the database option. This will limit the blastx search to only protein
sequences which have known 3D structures in the PDB.
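Translation into all six reading frames, as mentioned in the answer above and as done internally by programs such as blastx, can be sketched as follows. The sketch assumes the standard genetic code; the function names are hypothetical and the input sequence is made up.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq: str) -> str:
    """Translate complete codons from the start of `seq`; '*' is a stop, 'X' is unknown."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frames(seq: str) -> dict:
    """Return the translations of frames +1, +2, +3, -1, -2 and -3."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames

for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```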
BLAST
Carrying out a BLAST search:
Retrieve the sequence provided and use it for your BLAST search.
See the GenBank page here for the sequence. Change the format of the view to
FASTA by selecting FASTA from the dropdown menu marked ‘Display’ (see here).
Use this sequence for a BLAST search.
Questions
- Identify the sequence which is used. What is this DNA usually used for? (Answer)
- Search for suitable primers to use for PCR. Which program can you use? (Answer)
- Identify restriction sites which can be found on this DNA. How many fragments will a
digestion with the restriction enzyme BsaI generate? In order to answer this question, you
will need to draw on any general web skills you already have to find the appropriate
resources. BLAST is not the tool to use in such a case. (Answer)
BLAST
Carrying out a BLAST search:
Questions
- Identify the sequence which is used. What is this DNA usually used for?
Answer: pBR322 plasmid. It is used as a cloning vector for protein (IG-lambda) expression.
- Search for suitable primers to use for PCR. Which program can you use? What is the largest
product size from a possible primer pair found using a default search?
Answer: The Primer-BLAST program can be used. The largest possible product is 986bp.
- Identify restriction sites which can be found on this DNA. How many fragments will a
digestion with the restriction enzyme BsaI generate?
Answer: One such tool which can be used is NEBcutter. Cutting the pBR322 sequence with
BsaI will generate 3 fragments of DNA due to cleavage at 2 sites in the sequence.
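When counting fragments, it matters whether the molecule is treated as linear or circular: cutting a linear sequence at 2 sites gives 3 fragments, whereas cutting an intact circular plasmid at 2 sites gives 2 fragments. The sketch below counts BsaI recognition sites (GGTCTC) on both strands, since the site is not palindromic, and reports the fragment count for both cases. The example sequence is made up, the function names are hypothetical, and a site that spans the start/end junction of a circular sequence would be missed.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_sites(seq: str, site: str) -> int:
    """Count occurrences of `site` on either strand of `seq`."""
    seq, site = seq.upper(), site.upper()
    patterns = {site, reverse_complement(site)}   # a set, so palindromes count once
    count = 0
    for pattern in patterns:
        start = seq.find(pattern)
        while start != -1:
            count += 1
            start = seq.find(pattern, start + 1)
    return count

def fragment_count(n_sites: int, circular: bool) -> int:
    """Number of fragments produced by cutting at `n_sites` distinct sites."""
    if n_sites == 0:
        return 1
    return n_sites if circular else n_sites + 1

BSAI_SITE = "GGTCTC"
example = "AAGGTCTCAAAGAGACCTT"                    # made-up: one site on each strand
sites = count_sites(example, BSAI_SITE)
print(sites, fragment_count(sites, circular=False), fragment_count(sites, circular=True))
```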
SNPs
• A SNP (pronounced ‘snip’) is a DNA sequence variation which occurs when a single nucleotide (A, T, C,
or G) in the genome (or other shared sequence) differs between members of a species (or between
paired chromosomes in an individual). SNPs comprise the largest known class of human genetic
variation.
• SNPs may occur:
– within coding sequences of genes,
– in non-coding regions of genes, or
– in the intergenic regions between genes.
• SNPs within a coding sequence will not necessarily change the amino acid sequence of the protein that
is produced, due to the degeneracy of the genetic code (refer to the codon table discussed earlier). Such
changes result in silent (synonymous) mutations.
• Non-synonymous changes can result in:
– Missense change → different amino acid coded
– Nonsense change → premature STOP codon
• Why are SNPs important? If the changes result in non-functional gene products or no gene products, a
diseased state may be a possible end result.
• How can we find SNPs? Methods of discovering SNPs in sequence data: the easiest and most used
method is to align two sequences from the DNA of two individuals and look for high quality sequence
differences.
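The comparison described above, and the classification of a coding change as synonymous, missense or nonsense, can be sketched as follows. The sketch assumes two sequences that are already aligned and of equal length, and uses the standard genetic code; the function names and example sequences are made up for illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def find_snps(ref: str, alt: str) -> list:
    """Return (position, ref_base, alt_base) for single-base differences between
    two aligned sequences; columns containing a gap ('-') are skipped."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    return [(pos, r, a) for pos, (r, a) in enumerate(zip(ref, alt))
            if r != a and "-" not in (r, a)]

def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify the effect of a codon change on the encoded amino acid."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-lost"
    return "missense"

print(find_snps("ATGGAATAC", "ATGGAGTAA"))
print(classify_codon_change("GAA", "GAG"))   # Glu -> Glu
print(classify_codon_change("GAA", "AAA"))   # Glu -> Lys
print(classify_codon_change("TAC", "TAA"))   # Tyr -> stop
```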
BLAST
Carrying out a BLAST search:
1. Select and copy the sequence from this link.
2. Go to the BLAST page and carry out a search for SNPs on the above sequence.
– Observe the output. How is it different from previous BLAST searches you have carried out?
Correlate the output to what you know about SNPs.
Ribonucleic Acids
• RNA molecules play crucial roles in
molecular biology.
• Known functions include:
– Information storage
– Catalysis
– Regulatory roles
– Protein synthesis
• Diversity of functions associated with the ‘RNA
World’ hypothesis
• Potential applications
– Molecular scaffolding (nanotechnology)
– Drug targets (riboswitches/ribosomes)
– RNA interference (RNAi)
The Economist, June 16th-22nd 2007
RNA : From Sequence to Function
What is a crucial determinant of functionality for functional RNAs?
For functional RNAs, like for proteins, the 3D structure is crucial for
biological function.
RNA Structure
What are the major factors involved in
stabilizing the structure of RNA?
– Base stacking and hydrogen bonding contribute
to the stabilization of nucleic acid structure/ RNA
structure.
– RNA bases can form hydrogen bonds with each
other resulting in interactions between:
• complementary pairings in the canonical Watson
Crick interactions
• non-canonical interactions
– Hydrogen bonded base interactions are therefore
crucial elements of a nucleic acid’s 3D
structure.
RNA Base Interactions
[Figure: RNA base interactions, 32 pairs, e.g. purine-pyrimidine base pairs (10);
after I. Tinoco, Jr., in Appendix 1 of “The RNA World” (R. F.
Gesteland, J. F. Atkins, Eds.), Cold Spring Harbor Laboratory
Press, 1993, pp. 603-607.]
RNA Structure
• Base stacking and hydrogen bonding
contribute to the stabilization of nucleic acid
structure/ RNA structure.
• RNA bases can form hydrogen bonds with each
other resulting in interactions between:
– complementary pairings in the canonical*
Watson Crick interactions
– non-canonical interactions
• Hydrogen bonded base interactions are
therefore crucial elements of a nucleic
acid’s 3D structure
• 3 levels of RNA structure:
– Primary sequence, secondary structure, tertiary
structure.
*… from the Arabic word Qanun which in context here is better suited as the word ‘rule’
as opposed to the literal meaning of ‘law’.
RNA Structure
How do we get from sequence to structure?
How can we predict the structure of RNA?
RNA Structure
How do we get from sequence to structure?
• Complex (non-helical) RNA structures are not easy to
predict. Reliable structural information is sourced from
X-ray crystal structures.
• Commonly, only the secondary structure level
interactions are predicted to give some insights into
what the functional structure may look like.
• However, such methods lack the detail which an actual
structure model is able to give, such as the exact
orientation of bases and specific atomic interactions
which are occurring.
• Such interaction data are important because we know that
RNA bases can be involved in non-canonical interactions
which are different from the canonical Watson-Crick
interactions.
How can we predict the secondary structure of RNA?
• Several programs which calculate the thermodynamics
of folding (energies of the base interactions) can be
used.
• One such program is mfold by Michael Zuker.
• Assessment of reliability can be done using multiple
alignments and comparisons to other predictions and
known structures.
RNA Secondary Structure Prediction – the mfold program
Predicting the secondary structure of non-coding RNA
• Copy the sequence here as input for the mfold program. All other parameters can
be left at default settings.
• Questions:
– How many paired bases are you able to observe in the predicted structure? (Answer)
– How many bases are unpaired? (Answer)
– Name the two types of structures where these unpaired bases can be found. What type
of secondary structure do you think can be observed for regions with canonical Watson-Crick base pairing? (Answer)
– Are you able to observe any base pairings which are non-canonical (non Watson-Crick)?
If yes, how many? (Answer)
– Having answered the previous two questions, are you really able to differentiate a
canonical vs a non-canonical pairing from the secondary structure diagram alone?
(Answer)
RNA Secondary Structure Prediction – the mfold program
Predicting the secondary structure of non-coding RNA
– How many paired bases are you able to observe in the predicted structure?
Answer: 29 pairs, 58 paired bases.
– How many bases are unpaired?
Answer: 27
– Name the two types of structures where these unpaired bases can be found. What type of
secondary structure do you think can be observed for regions with canonical Watson-Crick
base pairing?
Answer: Unpaired bases are found in bulges and loops. Regions with canonical pairings as in
Watson-Crick are most likely helical.
– Are you able to observe any base pairings which are non-canonical (non Watson-Crick)? If
yes, how many?
Answer: 4
– Having answered the previous two questions, are you really able to differentiate a canonical
vs a non-canonical pairing from the secondary structure diagram alone?
Answer: No, not really. Although a GU base pair is obviously non-canonical, GC and AU base
pairs which may possibly be non-canonical cannot be determined from the secondary
structure alone.
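Counts like those in the answers above can also be obtained programmatically if the predicted structure is available in dot-bracket notation, where '(' and ')' mark paired bases and '.' marks unpaired bases. That notation is an assumption here (it is a common text format for secondary structures, but not necessarily the output you will see from mfold), and the example sequence and function names are made up.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def base_pairs(structure: str) -> list:
    """Return (i, j) index pairs for matching brackets in a dot-bracket string."""
    stack, pairs = [], []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), index))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs

def summarize(seq: str, structure: str) -> dict:
    """Count pairs, paired bases, unpaired bases and pairs that are not A-U or G-C."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    pairs = base_pairs(structure)
    non_wc = [(i, j) for i, j in pairs if (seq[i], seq[j]) not in WATSON_CRICK]
    return {
        "pairs": len(pairs),
        "paired_bases": 2 * len(pairs),
        "unpaired_bases": len(seq) - 2 * len(pairs),
        "non_watson_crick": len(non_wc),
    }

# Made-up hairpin: two G-C pairs, one G-U pair and a three-base loop.
print(summarize("GGGAAAUCC", "(((...)))"))
```

As the last answer points out, this kind of count only identifies pairs such as G-U by base identity; it cannot tell whether a G-C or A-U pair adopts a non-canonical geometry.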
Analyses for RNA sequence data
Is predicting the secondary structure the only analysis we can do for RNA sequence
data?
– NO.
– Genomic data can be analysed for the presence of the numerous types of known non-coding or functional RNA as well as possibly novel or yet to be discovered functional RNA
sequences.
– This is appreciably more difficult than the problem of predicting genes. Why?
– Currently there are no widely used or general use methods.
– Such investigations are still highly exploratory and currently remain in the domain of
experts in the field.
Post-session Questions
• What are nucleic acids?
• What types of nucleic acids are there?
• What functions do nucleic acids have?
• What sort of information do nucleotide sequences carry?
• What can be done with DNA sequences?
• What can be done with RNA sequences?
• Is molecular structure important for RNA sequences?
• What is a sequence alignment?
• What is the relationship of an alignment with regard to
biological function?
• Is extracting the encoded information for protein synthesis
the only sequence analysis which can be done?
Self Study and Self Assessment
• The self-study module for this series of lectures on analyses of nucleotide
sequences is available for download from SPIN. The format of the file (this
file) is PowerPoint show (.pps).
• The self-assessment quiz is accessible from within the SPIN interface.
• Both these materials are for self-assessment and self-study use and
DO NOT contribute to your final grades for this course.
• Also explore the references and texts listed in the course information file
and reading list.
• Explore resources made available via the self-study material.
Further Reading
Recommended Textbook (Lesk, 2nd Ed.)
• Basics – Chapter 1
– Pages 1-59
• Sequence alignments – Chapter 5, Chapter 1
– Pages 242-270
– Pages 21-59
Other Textbooks
• Baxevanis & Ouellette, 3rd edition
– Chapters 5-7
• Pevsner