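Comparison and analysis of gene expression and protein abundance
Comparative analysis of different label-free mass spectrometry based protein abundance estimates and their correlation with RNA-Seq gene expression data
Kang Ning
[email protected]
Computational Biology Research Group
Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences (QIBEBT-CAS)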
http://www.qibebt.ac.cn/
http://www.bioenergychina.org/
http://www.computationalbioenergy.org/
11/15/2012
Outline
 Background
 General analysis scheme
 Transcriptome analysis
 Proteome analysis
 Associated analysis
 Explanation of the correlations
 Technical issues
 Biological issues
 The best techniques…
The important biological questions
Everything goes high-throughput…
What is the underlying process in transcription and translation?
Transcriptome: gene expression
Proteome: protein abundance
But the correlation is not very high…
The techniques for these questions
 On the proteomic side
LC–MS/MS, or shotgun proteomics, is the method of choice for large-scale protein identification
Label-free methods and labeled methods
(Figure panels: Labeled; Label-free)
The techniques for these questions
 On the proteomic side
 Label-free: MS-1 based “peak intensity” or MS-2 based “spectrum counting”
(Figure panels: Spectrum counting; Peak intensity)
J. Proteome Res, 2012, 11(4), 2261-2271
The techniques for these questions
 On the transcriptomic side
Next-generation sequencing has recently emerged as a promising alternative to established microarray-based methods
(Figure panels: Microarray; RNA-Seq)
The main objectives
 Comparative analysis of different label-free protein quantification methods using several software tools on the proteomic side
 Correlation analysis of gene expression data derived using microarray and RNA-Seq methods on the transcriptomic side
 Better understanding of the correlation between gene and protein expression
The overall scheme
J. Proteome Res, 2012, 11(4), 2261-2271
The datasets
Comprehensively analyzed mouse mitochondrial genes and proteins in various mouse tissues
MitoCarta database (http://www.broadinstitute.org/pubs/MitoCarta/)
GNF1M tissue atlas
RNA-Seq profiling (http://woldlab.caltech.edu/rnaseq/)
The analysis procedure
mzXML
X!Tandem
PeptideProphet
ProteinProphet
Spectra count (NSAF)
msInspect
msBID
SpectrumMill
RPKM values
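The two abundance measures named in the procedure, spectral-count NSAF on the protein side and RPKM on the transcript side, can be computed as below. This is a minimal sketch using the standard definitions; the function names and dict-based inputs are illustrative assumptions, not the internals of any of the tools listed, and total mapped reads are approximated by the sum over the genes passed in.

```python
"""Sketch of NSAF (protein side) and RPKM (transcript side)."""


def nsaf(spectral_counts: dict[str, int], protein_lengths: dict[str, int]) -> dict[str, float]:
    """Normalized spectral abundance factor.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), where SpC is the spectral
    count of a protein and L its length in amino acids.
    """
    # Length-normalized spectral abundance factor for each protein.
    saf = {p: spectral_counts[p] / protein_lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    # Normalize so the values sum to 1 within the sample.
    return {p: value / total for p, value in saf.items()}


def rpkm(read_counts: dict[str, int], transcript_lengths_bp: dict[str, int]) -> dict[str, float]:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM_i = 1e9 * C_i / (N * L_i), with C_i reads mapped to gene i, L_i its
    transcript length in bp, and N the total mapped reads. Assumption: N is
    taken as the sum over the genes supplied, not the whole library.
    """
    total_reads = sum(read_counts.values())
    return {
        g: 1e9 * read_counts[g] / (total_reads * transcript_lengths_bp[g])
        for g in read_counts
    }
```

With these definitions, two proteins with the same count per residue get equal NSAF, and two genes with the same reads per base get equal RPKM.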
The number of comparable genes
 Mitochondrial (all) genes that could be compared at the proteomic level
Method           Brainstem     Liver
msInspect        611 (1457)    596 (1102)
msBID            566 (1197)    586 (996)
Spectral count   650 (1693)    641 (1267)
SpectrumMill     679           700
J. Proteome Res, 2012, 11(4), 2261-2271
Different techniques for expression measurements
 At gene expression level
 At protein abundance level
Correlation between Gene Expression and Protein Abundances
              SpectrumMill  msInspect    msBID        NSAF         RPKM
msInspect     0.91 (0.92)
msBID         0.91 (0.91)   0.89 (0.91)
NSAF          0.90 (0.90)   0.87 (0.88)  0.84 (0.89)
RPKM          0.49 (0.51)   0.51 (0.53)  0.54 (0.54)  0.51 (0.53)
Microarray    0.36 (0.40)   0.40 (0.44)  0.41 (0.42)  0.42 (0.44)  0.62 (0.61)
Correlation between Gene Expression and Protein Abundances
 MS-1 based “peak intensity”
 MS-2 based “spectrum count”
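A pairwise correlation matrix like the one above could be produced along these lines. The sketch assumes Pearson correlation on log2-transformed abundances over the genes that both methods quantify with a positive value; the slides do not state the exact correlation measure, transform or filtering, so treat all three as assumptions. `correlation_matrix` is a hypothetical helper.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(measures):
    """Pairwise correlations between quantification methods.

    measures: {method_name: {gene_id: abundance}}. Each pair of methods is
    compared on the genes both report with a positive value, after a log2
    transform (assumption). Returns {(method_a, method_b): r}.
    """
    result = {}
    for m1, m2 in combinations(measures, 2):
        shared = sorted(
            g for g in measures[m1].keys() & measures[m2].keys()
            if measures[m1][g] > 0 and measures[m2][g] > 0
        )
        x = [math.log2(measures[m1][g]) for g in shared]
        y = [math.log2(measures[m2][g]) for g in shared]
        result[(m1, m2)] = pearson(x, y)
    return result
```

A call such as `correlation_matrix({"RPKM": rpkm_values, "NSAF": nsaf_values, "msBID": msbid_values})` would fill one triangle of the table; genes missing from either method simply drop out of that pair.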
Changes in expression across different tissues
mRNA vs. protein
 Direction of changes
 The majority of genes exhibited the same direction of change in gene expression (mRNA-Seq) and in protein abundance (msInspect) for brainstem against liver
(Figure axes: Gene expression; Protein abundance; Brainstem vs. Liver)
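The direction-of-change comparison can be sketched as the fraction of genes whose log2 fold change (brainstem vs. liver) has the same sign at the mRNA and protein level. The function name and the choice to skip genes with exactly zero change are assumptions of this sketch.

```python
import math


def direction_agreement(mrna_a, mrna_b, prot_a, prot_b):
    """Fraction of genes whose log2 fold change (tissue A vs. tissue B) has
    the same sign for mRNA and protein.

    All four arguments map gene_id -> positive abundance.
    """
    genes = mrna_a.keys() & mrna_b.keys() & prot_a.keys() & prot_b.keys()
    same = counted = 0
    for g in genes:
        lfc_mrna = math.log2(mrna_a[g] / mrna_b[g])
        lfc_prot = math.log2(prot_a[g] / prot_b[g])
        if lfc_mrna == 0 or lfc_prot == 0:
            continue  # no change in one layer, so direction is undefined
        counted += 1
        if (lfc_mrna > 0) == (lfc_prot > 0):
            same += 1
    return same / counted if counted else float("nan")
```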
Technical Factors Affecting the Correlation
 The lengths of genes
The gene length affects both gene expression and protein abundance values
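Because length enters both measures, one way to ask how much of the mRNA-protein correlation is shared dependence on length is a first-order partial correlation that controls for (log) gene length. This is a swapped-in technique for illustration, not necessarily how the original analysis handled length; the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y with the linear effect of z removed.

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)),
    e.g. x = log RPKM, y = log NSAF, z = log gene length.
    """
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

If the partial correlation is much lower than the raw one, much of the agreement is attributable to length alone.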
Technical Factors Affecting the Correlation
 The low-abundance genes
 The inclusion of lower intensity genes and proteins
does not significantly affect the overall correlation.
Technical Factors Affecting the Correlation
 Does number matter?
 As the gene set size decreased, the standard deviation of the correlation coefficients gradually increased, with a noticeable shift in the correlation coefficients toward lower values…
Correlation coefficients by set size, mean (standard deviation):
Set size     RPKM-NSAF     RPKM-msBID    NSAF-msBID
all (527)    0.54          0.54          0.89
200          0.53 (0.03)   0.54 (0.03)   0.89 (0.01)
100          0.55 (0.06)   0.56 (0.06)   0.88 (0.02)
50           0.57 (0.10)   0.58 (0.11)   0.89 (0.03)
20           0.50 (0.12)   0.50 (0.11)   0.88 (0.05)
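The set-size experiment can be sketched as repeated random draws of a fixed number of genes, recording the mean and standard deviation of the correlation across draws. Pearson correlation, the number of rounds and the seed are assumptions; the slide does not say how many random sets were drawn.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def subsample_correlations(x, y, set_size, n_rounds=1000, seed=0):
    """Mean and standard deviation of Pearson r over random gene subsets.

    x, y: paired abundance values for the same genes (e.g. RPKM and NSAF).
    Each round draws set_size genes without replacement.
    """
    rng = random.Random(seed)
    rs = []
    for _ in range(n_rounds):
        chosen = rng.sample(range(len(x)), set_size)
        rs.append(pearson([x[i] for i in chosen], [y[i] for i in chosen]))
    return statistics.mean(rs), statistics.stdev(rs)
```

Smaller subsets give noisier estimates of the same underlying correlation, which is why the spread grows as the set size shrinks.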
Technical Factors Affecting the Correlation
 Does gene structure matter?
 Restricting the analysis to genes whose transcripts are dominated by the coding region (termed “coding region dominant” genes) improved the correlation slightly
(Figure: “coding region dominant” genes)
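A gene filter of this kind could look like the sketch below. It assumes, hypothetically, that a gene counts as coding region dominant when its CDS covers at least a chosen fraction of its transcript; the 0.5 default is a placeholder, not the threshold used in the study.

```python
def coding_region_dominant(genes, min_cds_fraction=0.5):
    """Return sorted ids of genes whose CDS makes up at least
    min_cds_fraction of the transcript (hypothetical criterion).

    genes: {gene_id: (cds_length_bp, transcript_length_bp)}
    """
    return sorted(
        g for g, (cds_bp, transcript_bp) in genes.items()
        if transcript_bp > 0 and cds_bp / transcript_bp >= min_cds_fraction
    )
```

The correlation would then be recomputed on the returned subset only.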
Biological Factors Affecting the Correlation
 The effect of functional annotations
 Correlation between gene and protein abundances for selected GO categories
Term                                                Count  RPKM-NSAF   RPKM-msBID  Mean RPKM  Mean NSAF  Mean msBID
CC:organelle inner membrane                         189    0.67 (183)  0.64 (165)  0.44       0.52       0.49
CC:mitochondrial lumen                              120    0.51 (113)  0.52 (102)  –0.21      0.01       –0.05
BP:generation of precursor metabolites and energy   94     0.60 (90)   0.62 (84)   0.92       1.06       1.02
BP:cofactor metabolic process                       52     0.67 (48)   0.70 (43)   0.07       0.12       0.25
CC:ribosome                                         68     0.00 (65)   0.22 (51)   –0.07      –0.4       –0.61
CC:mitochondrial outer membrane                     33     0.32 (27)   0.35 (25)   0.16       0.15       0.17
CC:respiratory chain                                45     0.13 (42)   0.27 (42)   1.22       1.21       1.14
MF:hydrogen ion transmembrane transporter activity  29     0.57 (26)   0.61 (21)   1.41       1.36       0.96
BP:organic acid catabolic process                   22     0.48 (18)   0.64 (15)   –0.41      –0.2       0.1
MF:iron ion binding                                 34     0.58 (26)   0.45 (23)   0.53       0.39       0.51
MF:nucleotide binding                               136    0.57 (125)  0.57 (108)  –0.1       –0.08      0
BP:nitrogen compound biosynthetic process           38     0.74 (35)   0.73 (23)   0.43       0.33       0.48
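Per-category correlations like those in the table can be sketched by grouping genes under each annotation term and correlating mRNA and protein values within the group. Pearson correlation, the input layout and the `min_genes` cutoff are assumptions; the GO annotation source is not specified on the slide.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_category(annotations, rna, protein, min_genes=3):
    """Correlation of mRNA and protein values within each annotation term.

    annotations: {gene_id: set of GO terms}; rna, protein: {gene_id: value}.
    Only genes quantified in both layers are used; terms with fewer than
    min_genes such genes are skipped. Returns {term: (r, genes_used)}.
    """
    members = defaultdict(list)
    for gene, terms in annotations.items():
        if gene in rna and gene in protein:
            for term in terms:
                members[term].append(gene)
    result = {}
    for term, genes in members.items():
        if len(genes) >= min_genes:
            x = [rna[g] for g in genes]
            y = [protein[g] for g in genes]
            result[term] = (pearson(x, y), len(genes))
    return result
```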
Biological Factors Affecting the Correlation
 The sub-location annotation issue
 Correlation based on the organelle inner membrane genes is better than that based on all mouse mitochondrial genes
• Among the top 5 most-read articles in the journal in April 2012 (publication month)
Biological Factors Affecting the Correlation
 The RNA/protein stability issue
 mRNA and protein half-lives in the mouse
 Protein and mRNA stability are among the most significant factors governing the correlation between gene and protein abundances
Quantitative model of gene expression in growing cells
Chen, et al., Nature, 2011
Next step: from analysis to prediction
mRNA expression, translation rate and degradation rate → mathematical model → predicted protein expression
 Issues:
1. The translation and protein degradation rates are difficult to measure
2. The model assumes that the cell is in a steady state.
3. ……
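A common form of such a model is dP/dt = k_s·m − k_d·P, where m is the mRNA level, k_s the translation rate per mRNA and k_d the protein degradation rate, with k_d = ln 2 / t½. At steady state dP/dt = 0, so P = k_s·m / k_d. The sketch below assumes this form and ignores dilution by cell growth; function and parameter names are hypothetical.

```python
import math


def degradation_rate(half_life_h):
    """First-order degradation constant (per hour) from a half-life in hours."""
    return math.log(2) / half_life_h


def steady_state_protein(mrna, translation_rate, protein_half_life_h):
    """P* = k_s * m / k_d, the fixed point of dP/dt = k_s*m - k_d*P."""
    return translation_rate * mrna / degradation_rate(protein_half_life_h)


def simulate_protein(mrna, translation_rate, protein_half_life_h,
                     p0=0.0, hours=500.0, dt=0.01):
    """Forward-Euler integration of dP/dt = k_s*m - k_d*P from P(0) = p0."""
    k_d = degradation_rate(protein_half_life_h)
    p = p0
    for _ in range(int(round(hours / dt))):
        p += dt * (translation_rate * mrna - k_d * p)
    return p
```

Running the simulation long enough converges to `steady_state_protein`, which illustrates issue 2: the closed-form prediction only holds once the cell has reached steady state.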
Divide-and-conquer based on biclustering?
Quantification and explanations
Bi-clustering of gene expression / protein abundance
Bi-clustering of expressions…
Cluster   mRNA half-life   Protein half-life   Description
1         short            short               unstable mRNAs and unstable proteins
2         short            long                unstable mRNAs and stable proteins
3         long             long                stable mRNAs and stable proteins
4         long             short               stable mRNAs and unstable proteins
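The four clusters in the table can be approximated by splitting mRNA and protein half-lives at their medians. Note this is a simple quadrant split swapped in for illustration, not biclustering itself, and the median cutoffs are an assumption.

```python
import statistics


def assign_stability_clusters(mrna_half_life, protein_half_life):
    """Map each gene to clusters 1-4 from the table using median splits.

    mrna_half_life, protein_half_life: {gene_id: half-life}. A half-life
    above the median counts as "long", otherwise "short".
    """
    genes = mrna_half_life.keys() & protein_half_life.keys()
    m_cut = statistics.median(mrna_half_life[g] for g in genes)
    p_cut = statistics.median(protein_half_life[g] for g in genes)
    labels = {
        (False, False): 1,  # short mRNA, short protein half-life
        (False, True): 2,   # short mRNA, long protein half-life
        (True, True): 3,    # long mRNA, long protein half-life
        (True, False): 4,   # long mRNA, short protein half-life
    }
    return {
        g: labels[(mrna_half_life[g] > m_cut, protein_half_life[g] > p_cut)]
        for g in genes
    }
```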
Factors for protein degradation
• Enzyme activities (weights from high through medium to low): Enzyme (energy metabolism), Ribosomal protein, Dehydrogenase, Regulatory factor, Carrier
• Other factors:
• Amino acids W, C, T, F, Y and V are enriched in labile proteins, whereas E, D, K, N, R and Q are enriched in stable proteins.
• Short-half-life proteins are enriched for membrane proteins and signal transduction proteins, whereas long-half-life proteins are enriched for cytoskeleton proteins and nuclear proteins with housekeeping functions
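The amino-acid observation could be turned into a single per-protein feature, for example the fraction of stable-enriched residues minus the fraction of labile-enriched ones. This score is hypothetical and only illustrates how such a factor might be encoded; the residue groups are the ones listed above.

```python
# Residue groups as listed on the slide.
LABILE_ENRICHED = frozenset("WCTFYV")
STABLE_ENRICHED = frozenset("EDKNRQ")


def composition_score(sequence):
    """Hypothetical stability feature in [-1, 1]: fraction of
    stable-enriched residues minus fraction of labile-enriched residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    stable = sum(aa in STABLE_ENRICHED for aa in seq)
    labile = sum(aa in LABILE_ENRICHED for aa in seq)
    return (stable - labile) / len(seq)
```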
Preliminary results
Hierarchical clustering (mouse liver tissue)
Preliminary results
Bi-clustering
(Figure: linear fits y = 0.7847x + 5.8634, R² = 0.7497; y = 0.8396x + 5.9941, R² = 0.8004)
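The fitted lines in the figure are ordinary least-squares fits reported with R². A self-contained sketch of that computation follows; which quantities sit on the x and y axes is not stated on the slide, so the inputs here are generic.

```python
def linear_fit(x, y):
    """Ordinary least squares y = slope * x + intercept, plus R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot
```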
Preliminary results
Clusters of interests
Preliminary results
Bi-clustering result analysis
Preliminary results
1. Stable mRNA and protein
(1) Enzymes (citric acid cycle, energy metabolism)
(2) Reductases
We reason that many housekeeping genes tend to have stable mRNAs and proteins.
2. Stable mRNA and labile protein
(1) Products of regulated genes
(2) Dehydrogenases
(3) Oxidases
Preliminary results
Mathematical modeling
Use SVM (support vector machine) to combine multiple features?
(Figure: Proteins 1 to 4 assigned to Clusters 1 to 4; the effect of a single factor (enzyme activity) versus SVM modeling with additional features such as 3D structure, enzyme activity, etc.)
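To make the SVM idea concrete, here is a minimal linear SVM trained with Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss, in plain Python so it is self-contained. This is a swapped-in illustration, not the authors' model: in practice a library implementation would be used, and the features (for example a hypothetical enzyme-activity weight or amino-acid composition score per protein) and the +1/−1 labels (say, stable vs. labile) are assumptions.

```python
import random


def train_linear_svm(features, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos-style training of a linear SVM.

    features: list of equal-length numeric lists; labels: +1 or -1.
    A constant 1.0 is appended to each vector so the last weight acts as a
    bias (it is regularized too, a common simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(features, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink step from the L2 regularizer.
            w = [(1 - eta * lam) * wi for wi in w]
            # Hinge-loss sub-gradient step for margin violations.
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A linear kernel keeps the learned weights interpretable, which fits the goal of explaining which factors drive each cluster; a kernelized SVM could capture interactions at the cost of that transparency.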
Summary
 Spectral counts are a good basis for a more comprehensive strategy of evaluating protein abundance trends
 Using the top 3 normalized peptide area intensities from MS1 for protein abundance correlated best with gene expression data collected through RNA-Seq
 Both technical and biological factors affect the correlation between gene expression and protein abundance
 A divide-and-conquer method for designing a robust computational model for extracting gene and label-free protein abundance information
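The “top 3” MS1 measure in the summary can be sketched as the mean of a protein's three most intense normalized peptide areas. Averaging raw rather than log intensities, and using all peptides when fewer than three are observed, are assumptions of this sketch; variants that sum the top peptides or average their log intensities also exist.

```python
def top3_abundance(peptide_intensities, top_n=3):
    """Protein abundance as the mean of its top_n most intense peptides.

    peptide_intensities: {protein_id: [normalized peptide area intensities]}.
    Proteins with fewer than top_n peptides use all available peptides;
    proteins with no peptides are omitted.
    """
    result = {}
    for protein, values in peptide_intensities.items():
        if not values:
            continue
        top = sorted(values, reverse=True)[:top_n]
        result[protein] = sum(top) / len(top)
    return result
```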
http://www.computationalbioenergy.org/
Genotype, Phenotype, Enterotype
Big data (genomics, proteomics, Raman profiling, etc.)
Community (metagenomic method), pure strain (genomic method), single cell (single-cell method)
Bioenergy, Agriculture, Medicine, Cell biology, Healthcare, Environmental monitoring, Synthetic biology, Ecology, Food screening, Microbial community, Bionics, Fermentation, Bioresources, Molecular biology, Biomaterial, Biodefense, ……
Metagenomic technology
Single-cell technology
Single-cell data analysis platform
 Single-cell manipulation / sorting
 Automatic phenotyping
Acknowledgements

Members:

Staff: Q Zhou, XQ Su, LH Ren, JY Wang, AH Wang, XZ Chang, YH Qiao

Students: RR Huang, XJ Wang, BX Song, W Fang, JQ Hu, M Gabriel (visiting), XW Cheng, J Wang

Collaborators:
JIANG Tao (UC Riverside, USA; ACM Fellow) (on metagenomics)
WONG Limsoon (NUS, Singapore) (on network)
CUI Xingping (UC Riverside, USA) (on SNP detection and metagenomics)
Yiu SM, Li SC (Hong Kong) (on network)
Jan Baumbach (MPI, Germany) (on network)
WEI Chaochun (SJTU, China) (on metagenomics)
Alexey Nesvizhskii (U of Michigan, USA) (on proteomics)
Ansgar Poetsch (RUB, Germany) (on proteomics)
Thank you!
http://ComputationalBioenergy.org
Research areas
Released software
Example software
Hardware platform
Thank you!
Qingdao / Tsingdao