Download f - PARNEC

Gene expression estimation from RNA-Seq data
刘学军 (Liu Xuejun)
2011.3.10
Outline
• Background
• RPKM
• Poisson model
• N-URD model
• Improved Poisson model
The Cycle of Forward Genetics
[Figure: a cycle connecting phenotype, observation, thinking, hypothesis, testing the hypothesis by genetic manipulation (gene deletion/replacement, recombinant technology), genotype, and sequencing]
Central Dogma
DNA → (transcription) → mRNA → (translation) → Protein
RNA-Seq protocol
• RNA is isolated from a sample.
• The RNA is converted to cDNA fragments.
• The fragments are read by high-throughput sequencing.
• The reads are mapped to a reference genome, giving counts of reads ('digital' data).
• Gene expression is estimated from these counts.
An example
Reference: ACGTCCCC
Reads: 12 × ACGTC, 8 × CGTCC, 9 × GTCCC, 5 × TCCCC
Each read type starts one base further along the reference, so this gene can be summarized by the sequence of counts 12, 8, 9, 5 (one count per start position).
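The counting step in this example can be sketched in a few lines. This is a toy that assumes error-free reads of equal length, each matching the reference exactly once; `start_position_counts` is a hypothetical helper, and a real pipeline would use a read aligner rather than `str.find`.

```python
def start_position_counts(reference, reads):
    """Count reads by the 0-based position where they start in `reference`.

    Toy assumptions: all reads have the same length, contain no errors,
    and occur at most once in the reference (str.find returns the first hit).
    """
    read_len = len(reads[0])
    counts = [0] * (len(reference) - read_len + 1)
    for read in reads:
        pos = reference.find(read)
        if pos >= 0:  # reads that do not match anywhere are skipped
            counts[pos] += 1
    return counts


reads = ["ACGTC"] * 12 + ["CGTCC"] * 8 + ["GTCCC"] * 9 + ["TCCCC"] * 5
print(start_position_counts("ACGTCCCC", reads))  # [12, 8, 9, 5]
```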
Advantages of RNA-Seq
• Large dynamic range
• Low background noise
• Requires less sample RNA
• Ability to detect novel transcripts
Challenges of RNA-Seq
• Sequencing non-uniformity
• Read mapping uncertainty
• Paired-end sequencing data
Sequencing non-uniformity
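Sources of read-mapping uncertainty
• Paralogous gene families
• Low-complexity sequences
• Alternatively spliced isoforms of the same gene
• Uncertainty in read alignment
These give rise to gene multireads (reads that map to more than one gene) and isoform multireads (reads compatible with more than one isoform of the same gene).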
Alternatively spliced isoforms
Read mapping uncertainty
[Figure: a gene with isoforms 1 … n built from exons 1 … m; the reads falling in the exons are summarized as read counts 1 … k]
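A read is an isoform multiread when it is compatible with more than one isoform. A minimal sketch, assuming each isoform is described only by its ordered list of exon numbers and each read by the ordered exons it covers (the gene below and the helper `compatible_isoforms` are hypothetical). A read is taken to be compatible with an isoform when its exons occur there as a contiguous run, so junction reads are handled correctly.

```python
def compatible_isoforms(read_exons, isoforms):
    """Isoforms in which the read's exons appear as a contiguous run.

    read_exons: exons covered by the read in 5'->3' order, e.g. (1,) for a
    read inside exon 1 or (1, 3) for a read spanning an exon1-exon3 junction.
    """
    k = len(read_exons)
    hits = []
    for name, exons in isoforms.items():
        runs = {tuple(exons[i:i + k]) for i in range(len(exons) - k + 1)}
        if tuple(read_exons) in runs:
            hits.append(name)
    return sorted(hits)


# Hypothetical gene: isoform 2 skips exon 2.
isoforms = {"isoform1": [1, 2, 3], "isoform2": [1, 3]}

print(compatible_isoforms((1,), isoforms))    # ['isoform1', 'isoform2'] (isoform multiread)
print(compatible_isoforms((1, 2), isoforms))  # ['isoform1']
print(compatible_isoforms((1, 3), isoforms))  # ['isoform2']
```

Reads inside shared exons are ambiguous, while reads in isoform-specific exons or junctions pin down a single isoform. This is why isoform-level estimates need a model rather than direct read assignment.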
Paired-end sequencing
RPKM
• Reads per kilobase of transcript per million reads mapped to the transcriptome
-- gene expression level
-- isoform expression level? (reads from exons shared by several isoforms cannot be assigned to one isoform directly)
Mortazavi et al. (2008) Nature Methods.
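As a worked example of the definition, RPKM = (reads mapped to the gene) / (gene length in kb × total mapped reads in millions). The counts below are hypothetical.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = gene_length_bp / 1e3
    millions_mapped = total_mapped_reads / 1e6
    return gene_reads / kilobases / millions_mapped


# Hypothetical: 500 reads on a 2,000 bp transcript, 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

For an isoform-level RPKM, `gene_reads` would have to count reads from one isoform only, and isoform multireads make exactly that count ambiguous.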
Jiang et al. (2009) Bioinformatics
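Notation:
$f_{g,i}$: the $i$th isoform of gene $g$ (written $f$ below; $F$ is the set of all isoforms)
$l_f$: length of isoform $f$
$k_f$: number of transcript copies of isoform $f$ in the sample
The total length of all transcripts is $\sum_{f \in F} k_f l_f$.
The probability that a read comes from isoform $f$ is
$$p_f = \frac{k_f l_f}{\sum_{f' \in F} k_{f'} l_{f'}}.$$
Define
$$\theta_f = \frac{k_f}{\sum_{f' \in F} k_{f'} l_{f'}}$$
as the expression index of isoform $f$, so that $p_f = \theta_f l_f$.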
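Model assumption
w: the total number of mapped reads
Given a region of length $l$ in isoform $f$, and assuming reads start uniformly along the isoform, each mapped read comes from that region with probability $p_f \cdot l / l_f = \theta_f l$. The number of reads $X$ coming from that region therefore satisfies
$$X \sim \mathrm{Binomial}(w, \theta_f l),$$
which, because $w$ is large and $\theta_f l$ is small, can be approximated by
$$X \sim \mathrm{Poisson}(w \theta_f l).$$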
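The Poisson approximation works well because $w$ is large and $\theta_f l$ is small. The following is a quick numerical check with hypothetical values $w = 10^6$ and $\theta_f l = 2 \times 10^{-5}$, which give a Poisson mean of 20. Both probability mass functions are computed in log space with the standard library to avoid overflow.

```python
import math


def binom_pmf(x, n, p):
    """Binomial(n, p) probability of x, computed via log-gamma to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
               + x * math.log(p) + (n - x) * math.log1p(-p))
    return math.exp(log_pmf)


def pois_pmf(x, lam):
    """Poisson(lam) probability of x."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


w = 1_000_000   # hypothetical number of mapped reads
rate = 2e-5     # hypothetical theta_f * l for one region
for x in (10, 20, 30):
    print(x, round(binom_pmf(x, w, rate), 6), round(pois_pmf(x, w * rate), 6))
```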
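Poisson model
Consider a gene with $m$ exons of lengths $l_1, \dots, l_m$ and $n$ isoforms with expression indices $\theta_1, \dots, \theta_n$.
Observations
$X_j$: number of reads mapped to exon $j$ ($j = 1, \dots, m$)
For each $X_j$, the Poisson parameter is
$$\lambda_j = w\, l_j \sum_{i=1}^{n} c_{ij}\, \theta_i,$$
where $c_{ij}$ is 1 if isoform $i$ contains exon $j$ and 0 otherwise.
Data likelihood:
$$L(\theta) = \prod_{j=1}^{m} \frac{e^{-\lambda_j} \lambda_j^{X_j}}{X_j!}.$$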
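This likelihood can be evaluated and maximized numerically. The sketch below uses the standard multiplicative EM update for Poisson linear models, $\theta_i \leftarrow \theta_i \sum_j A_{ij} X_j / \lambda_j \,/\, \sum_j A_{ij}$ with $A_{ij} = w\, l_j\, c_{ij}$ and $\lambda_j = w\, l_j \sum_i c_{ij} \theta_i$. It is one common way to compute the MLE and is not claimed to be the fitting procedure of Jiang et al. The gene, $w$, and the counts are hypothetical.

```python
import math


def poisson_rates(theta, C, exon_lengths, w):
    """lambda_j = w * l_j * sum_i C[i][j] * theta_i."""
    return [w * l_j * sum(C[i][j] * theta[i] for i in range(len(theta)))
            for j, l_j in enumerate(exon_lengths)]


def log_likelihood(theta, C, exon_lengths, w, X):
    """sum_j (X_j log lambda_j - lambda_j - log X_j!); requires every lambda_j > 0."""
    lam = poisson_rates(theta, C, exon_lengths, w)
    return sum(x * math.log(lam_j) - lam_j - math.lgamma(x + 1)
               for x, lam_j in zip(X, lam))


def fit_em(C, exon_lengths, w, X, n_iter=1000, start=1e-6):
    """Multiplicative EM for theta >= 0 with A_ij = w * l_j * C[i][j].

    Assumes every exon belongs to at least one isoform, so lambda_j > 0
    whenever all theta_i > 0.
    """
    theta = [start] * len(C)
    for _ in range(n_iter):
        lam = poisson_rates(theta, C, exon_lengths, w)
        theta = [
            theta[i]
            * sum(w * l_j * C[i][j] * X[j] / lam[j] for j, l_j in enumerate(exon_lengths))
            / sum(w * l_j * C[i][j] for j, l_j in enumerate(exon_lengths))
            for i in range(len(C))
        ]
    return theta


# Hypothetical gene: 3 exons, isoform 2 skips exon 2.
C = [[1, 1, 1],
     [1, 0, 1]]
exon_lengths = [100, 50, 100]
w = 1_000_000
X = [300, 100, 300]  # hypothetical exon read counts

theta_hat = fit_em(C, exon_lengths, w, X)
print(theta_hat)  # approximately [2e-06, 1e-06]
print(log_likelihood(theta_hat, C, exon_lengths, w, X))
```

With these counts the model can fit exactly. Exon 2 is unique to isoform 1 and gives $\theta_1 = 2 \times 10^{-6}$, and exons 1 and 3 then give $\theta_1 + \theta_2 = 3 \times 10^{-6}$.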
Wu et al. (2011) Bioinformatics
URD (uniform read distribution) model → N-URD (non-uniform read distribution) model
Global bias curve (GBC)
Local bias curve (LBC)
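One way to picture a global bias curve is as a summary, pooled over many genes, of how read density varies with relative position along a transcript. The minimal sketch below works under that assumption and pools per-gene read-start histograms over relative-position bins. The binning, the per-gene normalization, and the helper name `global_bias_curve` are assumptions for illustration and are not necessarily the procedure of Wu et al.

```python
def global_bias_curve(genes, n_bins=10):
    """Average normalized read-start density over relative-position bins.

    genes: list of (transcript_length, read_start_positions), positions 0-based.
    Each gene's histogram is scaled to mean 1 before averaging, so highly
    expressed genes do not dominate the curve (an assumption of this sketch).
    """
    curve = [0.0] * n_bins
    used = 0
    for length, starts in genes:
        if not starts:
            continue  # genes without reads carry no shape information
        hist = [0] * n_bins
        for s in starts:
            # integer arithmetic maps position s to its relative-position bin
            hist[min(s * n_bins // length, n_bins - 1)] += 1
        mean = len(starts) / n_bins
        for b in range(n_bins):
            curve[b] += hist[b] / mean
        used += 1
    return [c / used for c in curve] if used else curve


# Uniform coverage gives a flat curve of 1.0 in every bin.
print(global_bias_curve([(100, list(range(100)))]))
```

A 5'-biased library would instead produce values above 1 in the first bins and below 1 toward the 3' end.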
Global bias curve
Local bias curve
Usage of the bias curve
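The N-URD models
GN-URD: $c_{ij} \to G_{ij}$ (from the global bias curve)
LN-URD: $c_{ij} \to L_{ij}$ (from the local bias curve)
MN-URD: $c_{ij} \to a G_{ij} + (1-a) L_{ij}$ (mixture of the two)
1-M: one iteration for the LBC calculation
5-M: five iterations for the LBC calculation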
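In each variant, the 0/1 indicator $c_{ij}$ in the Poisson rate $\lambda_j = w\, l_j \sum_i c_{ij} \theta_i$ is replaced by a bias-adjusted weight. Below is a minimal sketch of the MN-URD substitution. The G and L values are hypothetical placeholders for weights derived from the global and local bias curves, and the helper names are illustrative.

```python
def mn_urd_weights(G, L, a):
    """Element-wise a*G_ij + (1 - a)*L_ij, the MN-URD replacement for c_ij."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("mixing weight a must lie in [0, 1]")
    return [[a * g + (1 - a) * l for g, l in zip(g_row, l_row)]
            for g_row, l_row in zip(G, L)]


def poisson_rates(theta, weights, exon_lengths, w):
    """lambda_j = w * l_j * sum_i weights[i][j] * theta_i."""
    return [w * l_j * sum(weights[i][j] * theta[i] for i in range(len(theta)))
            for j, l_j in enumerate(exon_lengths)]


# Hypothetical bias-adjusted weights for one isoform over two exons.
G = [[1.0, 0.5]]
L = [[0.5, 1.0]]
M = mn_urd_weights(G, L, a=0.5)
print(M)  # [[0.75, 0.75]]
print(poisson_rates([2e-6], M, [100, 100], 1_000_000))
```

Setting a = 1 recovers GN-URD and a = 0 recovers LN-URD. Passing the 0/1 matrix C in place of both G and L gives back the uniform model.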
Li et al. (2010) Genome Biology
• Use variable rates for different positions.
• Poisson linear model,
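With position-specific rates, the count at position $j$ of a gene can be modeled as $n_j \sim \mathrm{Poisson}(\theta a_j)$, where $a_j$ is a sequencing preference that depends on the surrounding bases. Setting the derivative of $\sum_j (n_j \log(\theta a_j) - \theta a_j)$ to zero gives $\hat\theta = \sum_j n_j / \sum_j a_j$. The sketch below assumes a log-linear preference $a_j = \exp(\sum_k \beta_{k,b_k})$ over the bases $b_k$ of a short context window. The coefficients in `BETA`, the windows, and the counts are made up for illustration and are not values fitted by Li et al.

```python
import math

# Hypothetical coefficients beta[(offset, base)]; unlisted pairs contribute 0.
BETA = {(0, "G"): 0.4, (0, "T"): -0.3, (1, "C"): 0.2}


def preference(context):
    """a_j = exp(sum_k beta[k, b_k]) over the bases b_k of the context window."""
    return math.exp(sum(BETA.get((k, base), 0.0) for k, base in enumerate(context)))


def fit_expression(counts, contexts):
    """MLE of theta when counts[j] ~ Poisson(theta * a_j): sum(counts) / sum(a)."""
    a = [preference(c) for c in contexts]
    return sum(counts) / sum(a)


contexts = ["GC", "AA", "TA", "GA"]  # hypothetical 2-base windows at four positions
counts = [7, 4, 3, 6]                # hypothetical read counts at those positions
print(fit_expression(counts, contexts))
```

When all preferences equal 1, the estimate reduces to the mean count per position, which matches the uniform-rate model.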
Non-linear model
• Use empirical data to obtain the non-linear relationship between the sequencing preference ($a_i$) and the surrounding sequences.
• Gene expression level with length L,