Methods to analyze RNA
expression
Is a specific RNA transcribed?
In which cells, and under which conditions, is it
expressed?
How much is there?
Is the amount different from other cells/times?
RNA analysis techniques
1.  Low-to-mid-plex techniques:
–  Northern blot
–  Fluorescent in situ hybridization
–  Reverse transcription PCR (RT-PCR)
2.  Higher-plex techniques:
–  DNA microarray
–  Tiling array
–  RNA-Seq
2
RNA sequencing analysis
The newest technology for RNA expression analysis
•  Provides data for all the genes expressed in a
particular sample (tissues, conditions, stages, etc.)
•  Coupled with high-throughput sequencing
•  Quantitative
•  Highly technical and expensive
•  Relies heavily on computational and statistical analysis
3
Applications of RNA-Seq
From raw RNA-seq data to transcriptome and
differential expression analysis:
1.  Abundance estimation
2.  Alternative splicing
3.  RNA editing
4.  Finding novel transcripts
And many more…
4
Experimental design
•  Control and experimental conditions
•  Every sample in at least duplicate (triplicate better)
•  Uniformity of the tissue used can be critical for detecting
small but significant differences in transcription levels.
5
Steps in RNA-seq
6
Overall flow chart for RNA-seq
7
Library construction
First, the RNA needs to be converted to cDNA, because
sequencers are designed to sequence DNA, not RNA.
This is done using a special RNA-dependent DNA
polymerase known as reverse transcriptase (RT). The
product is known as cDNA (complementary DNA).
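The base-pairing logic behind this step can be illustrated with a minimal Python sketch. It is a toy model only: the function name `first_strand_cdna` and the example sequence are made up for illustration, and the code ignores primers, enzyme kinetics, and second-strand synthesis.

```python
# Toy model of first-strand cDNA synthesis (not a simulation of the enzyme).
# Each RNA base pairs with its complementary DNA base.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """Return the first-strand cDNA, written 5'->3', for an RNA written 5'->3'."""
    rna = rna.upper()
    # Walk the RNA from its 3' end and complement each base, so that the
    # resulting DNA strand is also written 5'->3'.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna))


# Hypothetical mRNA fragment ending in a short poly(A) tail.
mrna = "AUGGCCAAAAAA"
cdna = first_strand_cdna(mrna)
print(cdna)
```

The poly(A) tail at the 3' end of the mRNA becomes a run of T at the 5' end of the cDNA, which is why oligo(dT) primers are commonly used to start reverse transcription.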
8
Library construction
RNA is prepared from each sample
(tissue, time point, etc.).
Each RNA preparation
is then converted into a
library of cDNA fragments.
The libraries from several
different samples will be
sequenced together, so each
library has to receive an
individual tag (also known as an index).
9
Library construction
The double-stranded cDNA fragments are attached
(ligated) to small double-stranded (ds) nucleotide sequences.
The SP (sequencing primer site) will be used for sequencing later.
The index is a short DNA sequence that is specific to
each library.
10
Library construction
After size selection and limited
amplification by PCR, the library
representing short fragments of
all the RNAs present in your
initial tissues/embryos/cells is
ready for sequencing.
When this is done correctly, the
number of DNA fragments
corresponding to one mRNA is
proportional to the initial amount
of that specific mRNA.
11
Sequencing
•  The sequencer
•  The “flow cell”, where the sequencing takes
place. Many libraries can be loaded together
on one flow cell. The data from each library
will be distinguished later because each library
has a different “index”.
12
Data quality
Several quality checks are done by the sequencer software
to ensure that the sequencing reactions worked correctly.
Do not worry about these.
You will receive individual files in which the reads from each
of your libraries have been sorted by their index.
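The sorting of reads by index (demultiplexing) can be sketched in a few lines of Python. This is only an illustration of the idea, not the sequencer's actual software: the index sequences, library names, and reads below are invented, and real tools also tolerate sequencing errors in the index.

```python
from collections import defaultdict


def demultiplex(reads, index_to_library):
    """Group (index, sequence) pairs by library name.

    Reads whose index is not in the table are collected under "undetermined".
    """
    by_library = defaultdict(list)
    for index, sequence in reads:
        library = index_to_library.get(index, "undetermined")
        by_library[library].append(sequence)
    return dict(by_library)


# Hypothetical indexes and library names, for illustration only.
index_to_library = {"ATCACG": "control_rep1", "CGATGT": "treated_rep1"}
reads = [
    ("ATCACG", "GGCTA"),
    ("CGATGT", "TTAGC"),
    ("ATCACG", "CCGAT"),
    ("NNNNNN", "ACGTA"),  # index could not be read
]

sorted_reads = demultiplex(reads, index_to_library)
for library, sequences in sorted_reads.items():
    print(library, len(sequences))
```

Each key of `sorted_reads` plays the role of one of the individual files you receive, one per library.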
13
Initial data analysis
Initially, each library is analyzed separately; if paired-end
sequencing was done, the two files of each library are analyzed together.
1.  The short reads are aligned to the reference genome, if
available.
2.  The transcript(s) from each gene are reconstructed.
At that point, all the libraries are analyzed together:
3.  Differential expression and statistical significance are assessed.
14
Work flow for data analysis
15
Work flow for data analysis
This flow chart also
shows when each
sample is processed
separately and when
samples are compared.
16
Work flow for data analysis
From RNA-seq reads
to differential
expression results:
Oshlack et al. Genome
Biology 2010, 11:220
17
The Tuxedo suite
A series of software tools can be used in
succession to perform the
steps needed for alignment, assembly
and expression analysis.
It is known as the Tuxedo protocol.
The software and the
Tuxedo workflow were first described
in: Differential gene and transcript expression
analysis of RNA-seq experiments with TopHat and
Cufflinks.
Trapnell C, Roberts A, Goff L, Pertea G, Kim D,
Kelley DR, Pimentel H, Salzberg SL, Rinn JL, &
Pachter L (2012) Nature Protocols 7, 562–578
18
The Tuxedo suite
19
BOWTIE
20
BOWTIE
21
Bowtie and TopHat
Bowtie and TopHat align
each short RNA
sequence read to the
reference genome.
Bowtie does the initial
fast alignment. TopHat
handles reads spanning
two exons, including
alternative splicing.
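To make the idea of a read spanning two exons concrete, here is a small Python sketch. It is a deliberately naive illustration and not the algorithm used by Bowtie or TopHat: it uses exact string matching, the function name `align_read` and the `min_anchor` parameter are hypothetical, and the tiny "genome" is invented.

```python
def align_read(read, genome, min_anchor=3):
    """Return a list of (genome_start, segment) pieces, or None if nothing fits.

    One piece means an unspliced match; two pieces mean the read was split
    across a gap in the genome (a candidate intron).
    """
    # 1) Fast path: try to place the whole read without any gap.
    pos = genome.find(read)
    if pos != -1:
        return [(pos, read)]
    # 2) Otherwise try each split point and look for the two parts in order
    #    on the genome, separated by at least one base.
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        while left_pos != -1:
            right_pos = genome.find(right, left_pos + len(left) + 1)
            if right_pos != -1:
                return [(left_pos, left), (right_pos, right)]
            left_pos = genome.find(left, left_pos + 1)
    return None


# Invented example: two exons separated by an intron (GT...AG).
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "TTCGAA"
genome = exon1 + intron + exon2

unspliced = align_read("ATGGCC", genome)   # lies entirely within exon 1
spliced = align_read("GGCCTTCG", genome)   # spans the exon 1 / exon 2 junction
print(unspliced)
print(spliced)
```

The read that spans the junction cannot be placed in one piece, but its two halves map on either side of the intron, which is the kind of evidence a splice-aware aligner uses to report a splice site.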
22
TopHat
23
TopHat step 1
24
TopHat step 2
25
TopHat step 3
26
TopHat step 4
27
TopHat overall
28
Bowtie and TopHat
29
Visualizing TopHat data with IGV
30
The Tuxedo suite
31
Cufflinks
Cufflinks identifies splice
sites based on TopHat
placement of split reads
and will merge the short
reads to create a gene structure
based on RNA-seq data.
It will also evaluate the proportion of each
alternatively spliced product.
The expression of each isoform will be
expressed in RPKM: Reads (for that
transcript) Per Kilobase (of transcript) per
Million mapped reads.
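The RPKM definition translates directly into a short calculation. The Python sketch below assumes the usual reading of the formula (read count divided by transcript length in kilobases and by the number of mapped reads in millions); the function name and the example numbers are made up for illustration.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    # Dividing by (length / 1e3) and by (total / 1e6) is the same as
    # multiplying by 1e9 and dividing by (length * total).
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical isoform: 500 reads on a 2,000-base transcript,
# in a library with 10 million mapped reads.
value = rpkm(read_count=500, transcript_length_bp=2_000, total_mapped_reads=10_000_000)
print(value)  # 500 reads / 2 kb / 10 million reads = 25 RPKM
```

Normalizing by length lets long and short transcripts be compared, and normalizing by the number of mapped reads lets libraries sequenced to different depths be compared.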
32
Cufflinks
For figure on previous slide
33
The Tuxedo suite
34
The Tuxedo suite
35
Differential expression
DEG = Differentially
Expressed Genes
36
Cuffdiff
37
Cuffdiff
•  Cuffdiff takes all the reads that
map to overlapping transcripts
(like isoforms) and infers which
reads belong to each variant
transcript from one gene.
•  It starts with the reads that map
uniquely to each isoform, then
iterates to find the most likely
distribution of the remaining
reads.
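The iterative idea in the second bullet can be sketched as follows. This Python example is a simplified illustration, not Cuffdiff's statistical model: it ignores transcript length and fragment bias, assumes every ambiguous read is compatible with all isoforms of the gene, and uses a hypothetical function name and invented counts.

```python
def assign_reads(unique_counts, n_shared, n_iterations=50):
    """Distribute reads shared by several isoforms.

    unique_counts: reads that map uniquely to each isoform.
    n_shared: reads compatible with every isoform (assumes at least one read overall).
    Returns the estimated total number of reads per isoform.
    """
    isoforms = list(unique_counts)
    # Start by splitting the shared reads evenly.
    shared = {iso: n_shared / len(isoforms) for iso in isoforms}
    for _ in range(n_iterations):
        # Current estimate of each isoform's abundance.
        totals = {iso: unique_counts[iso] + shared[iso] for iso in isoforms}
        grand_total = sum(totals.values())
        # Re-split the shared reads in proportion to those abundances.
        shared = {iso: n_shared * totals[iso] / grand_total for iso in isoforms}
    return {iso: unique_counts[iso] + shared[iso] for iso in isoforms}


# Hypothetical gene with two isoforms: 30 and 10 unique reads, 60 ambiguous reads.
estimates = assign_reads({"isoform_A": 30, "isoform_B": 10}, n_shared=60)
print(estimates)
```

In this example the estimates settle at about 75 reads for isoform A and 25 for isoform B, the same 3:1 ratio as the uniquely mapped reads.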
38
CummeRbund
•  Visualization of data generated by Cuffdiff
•  Graphical representation of over- or under-
expression
•  Classification of genes by molecular pathways
•  Rather complicated, but some easy-to-use tools are available
in DNA Subway
39
CummeRbund
40
Differential gene expression
41
Volcano plot
42
Expression Profile of Specific
Pathways: Heat Map
43