Gene Expression Deconvolution with Single-cell Data
JAMES LINDSAY¹
CAROLINE JAKUBA²
ION MANDOIU¹
CRAIG NELSON²
UNIVERSITY OF CONNECTICUT
¹DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
²DEPARTMENT OF MOLECULAR AND CELL BIOLOGY
Mouse Embryo
[Figure: mouse embryo diagram, labeled from anterior (head) to posterior (tail): somites, node, primitive streak]
Unknown Mesoderm Progenitor
• What is the expression profile of the progenitor cell type?
Legend: NSB = node-streak border; PSM = presomitic mesoderm; S = somite; NT = neural tube/neurectoderm; EN = endoderm
Characterizing Cell-types
• Goal: whole-transcriptome expression profiles of individual cell types
• Whole-transcriptome expression is technically challenging to measure from single cells
• Approach: computational deconvolution of cell mixtures
• Assisted by single-cell qPCR expression data for a small number of genes
Modeling Cell Mixtures
Mixtures (X) are a linear combination of the signature matrix (S) and the concentration matrix (C):
$X_{m \times n} = S_{m \times k} \cdot C_{k \times n}$
• X: genes × mixtures
• S: genes × cell types
• C: cell types × mixtures
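As a small illustration of the model X = S · C, the sketch below builds a toy signature matrix and concentration matrix and multiplies them. All numbers are made up for illustration (4 genes, 3 cell types, 2 mixtures) and are not taken from the study.

```python
import numpy as np

# Hypothetical toy values: m = 4 genes, k = 3 cell types, n = 2 mixtures.
S = np.array([[10.0, 1.0, 1.0],
              [1.0, 10.0, 1.0],
              [1.0, 1.0, 10.0],
              [5.0, 5.0, 1.0]])   # genes x cell types
C = np.array([[0.2, 0.7],
              [0.3, 0.2],
              [0.5, 0.1]])        # cell types x mixtures, each column sums to 1

X = S @ C                         # genes x mixtures
print(X.shape)
print(X)
```

Each column of X is the expression of one mixture: a weighted sum of the cell-type signatures, with weights given by the matching column of C.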
Previous Work
1. Coupled Deconvolution
Given: X, Infer: S, C
• NMF, minimum polytope
• Repsilber, BMC Bioinformatics, 2010
• Schwartz, BMC Bioinformatics, 2010
2. Estimation of Mixing Proportions
Given: X, S Infer: C
• Quadratic programming, LDA
• Gong, PLoS One, 2012
• Qiao, PLoS Comp Bio, 2012
3. Estimation of Expression Signatures
Given: X, C Infer: S
• csSAM
• Shen-Orr, Nature Brief Com, 2010
Single-cell Assisted Deconvolution
Given: X and single-cell qPCR data
Infer: S, C
Approach:
1. Identify cell types and estimate the reduced signature matrix 𝑆 using single-cell qPCR data
• Outlier removal
• K-means clustering followed by averaging
2. Estimate mixing proportions C using 𝑆
• Quadratic programming, one mixture at a time
3. Estimate the full expression signature matrix S using C
• Quadratic programming, one gene at a time
Step 1: Outlier Removal + Clustering
Remove cells whose maximum Pearson correlation to any other cell is below 0.95
[Figure panels: unfiltered, filtered]
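A minimal sketch of this step, assuming single-cell profiles are stored as a cells × genes array. The function names, the farthest-first initialization, and the toy data are all illustrative assumptions and are not taken from the slides. The clustering uses SciPy's `kmeans2`.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def remove_outliers(profiles, threshold=0.95):
    """Keep cells whose best Pearson correlation to another cell is >= threshold."""
    corr = np.corrcoef(profiles)        # cell-by-cell Pearson correlation
    np.fill_diagonal(corr, -np.inf)     # ignore each cell's correlation with itself
    keep = corr.max(axis=1) >= threshold
    return profiles[keep], keep

def farthest_first(profiles, k):
    """Deterministic initial centroids (an assumption, not from the slides)."""
    chosen = [0]
    for _ in range(k - 1):
        dist = np.linalg.norm(profiles[:, None, :] - profiles[chosen][None, :, :], axis=2)
        chosen.append(int(dist.min(axis=1).argmax()))
    return profiles[chosen]

def reduced_signatures(profiles, k):
    """Cluster cells with k-means, then average each cluster into one column."""
    _, labels = kmeans2(profiles, farthest_first(profiles, k), minit="matrix")
    S = np.column_stack([profiles[labels == j].mean(axis=0) for j in range(k)])
    return S, labels                    # S is genes x cell types

# Toy data: three hypothetical cell types, four noisy cells each, plus one outlier.
a = np.array([10.0, 1, 1, 5, 1, 1])
b = np.array([1.0, 10, 1, 1, 5, 1])
c = np.array([1.0, 1, 10, 1, 1, 5])
noise = 0.1 * np.eye(6)[:4]
cells = np.vstack([base + noise for base in (a, b, c)] + [[5.0, 5, 5, 1, 1, 1]])

filtered, keep = remove_outliers(cells)
S_reduced, labels = reduced_signatures(filtered, k=3)
print(keep.sum(), S_reduced.shape)
```

In this toy set the last cell correlates poorly with every other cell and is dropped before clustering. The cluster order returned by k-means is arbitrary, so columns of the reduced signature matrix still need to be matched to cell types.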
Step 2: Estimate Mixture Proportions
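For a given mixture i:
$\min_c \lVert Sc - x \rVert^2$, s.t.
$\sum_{l=1}^{k} c_l = 1$
$c_l \ge 0 \;\; \forall\, l = 1 \dots k$
where
$x = X_{j,i} \;\; \forall\, j = 1 \dots m$
$c = C_{l,i} \;\; \forall\, l = 1 \dots k$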
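The per-mixture problem can be sketched with a general-purpose constrained solver. This is an illustration, not the authors' implementation. It assumes the constraints are non-negativity and proportions summing to 1, and uses SciPy's SLSQP method as a stand-in for a dedicated quadratic programming solver. The function names and toy matrix are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def estimate_proportions(S, x):
    """Solve min ||S c - x||^2 subject to c >= 0 and sum(c) = 1 for one mixture."""
    k = S.shape[1]
    result = minimize(
        lambda c: np.sum((S @ c - x) ** 2),
        np.full(k, 1.0 / k),                            # start from equal proportions
        jac=lambda c: 2.0 * S.T @ (S @ c - x),          # gradient of the objective
        method="SLSQP",
        bounds=[(0.0, None)] * k,                       # c_l >= 0
        constraints=[{"type": "eq", "fun": lambda c: c.sum() - 1.0}],
        options={"ftol": 1e-12, "maxiter": 200},
    )
    return result.x

def estimate_all_proportions(S, X):
    """Apply the per-mixture solve to every column of X (genes x mixtures)."""
    return np.column_stack([estimate_proportions(S, X[:, i]) for i in range(X.shape[1])])

# Toy check with a hypothetical reduced signature matrix (4 genes x 3 cell types).
S = np.array([[10.0, 1.0, 1.0],
              [1.0, 10.0, 1.0],
              [1.0, 1.0, 10.0],
              [5.0, 5.0, 1.0]])
c_true = np.array([0.2, 0.3, 0.5])
c_hat = estimate_proportions(S, S @ c_true)
print(np.round(c_hat, 3))
```

Solving one mixture at a time keeps each problem small (k unknowns), and the columns of C can be estimated independently.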
Step 3: Estimating Full Expression Signatures
[Figure: matrix diagram with dimensions labeled genes, cell types, and mixtures]
C: known from step 2
x: observed signals of the new gene across the mixtures
s: signature of the new gene across cell types, to be estimated
Now solve:
$\min_s \lVert sC - x \rVert^2$
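A sketch of the per-gene solve, assuming the only constraint is that signature values are non-negative. That constraint is an assumption, since the slide shows only the objective. Under it, the problem min ||sC - x||² is a non-negative least squares problem in s, solved here with SciPy's `nnls`. Function names and toy values are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_signature_row(C, x):
    """Solve min ||s C - x||^2 with s >= 0 for one gene.

    C: cell types x mixtures (known), x: that gene's signal in each mixture.
    s C = x is the same system as C^T s^T = x^T, which is the form nnls expects.
    """
    s, _residual_norm = nnls(C.T, x)
    return s

def estimate_signatures(C, X):
    """Estimate the full signature matrix (genes x cell types), one gene at a time."""
    return np.vstack([estimate_signature_row(C, x) for x in X])

# Toy check: 3 cell types, 5 mixtures, 2 genes with known signatures.
C = np.array([[0.6, 0.2, 0.2, 0.1, 0.3],
              [0.3, 0.5, 0.2, 0.3, 0.4],
              [0.1, 0.3, 0.6, 0.6, 0.3]])
S_true = np.array([[4.0, 0.0, 7.0],
                   [1.0, 2.0, 3.0]])
S_hat = estimate_signatures(C, S_true @ C)
print(np.round(S_hat, 3))
```

Because each gene is solved independently, this step can cover genes that were never measured by single-cell qPCR, as long as they were measured in the mixtures.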
Experimental Design
Single Cell Profiles
• 92 profiles
• 31 genes
Simulated Concentrations
• Sample uniformly at random from [0, 1]
• Scale each column to sum to 1
Actual Mixtures
• 12 mixtures
• 31 genes
Dimensions
• k = 3
• m = 31
• n = 92, 12
• # mixtures = {10…300}
Simulated Mixtures
• Choose single cells randomly with replacement from each cluster
• Sum to generate mixture
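The simulation setup can be sketched as follows. The slides do not say how many cells go into each simulated mixture or how the concentrations set the number of cells drawn per cluster. This sketch assumes a hypothetical `cells_per_mixture` parameter and draws a count from each cluster proportional to its concentration.

```python
import numpy as np

def simulate_concentrations(k, n, rng):
    """Sample a k x n matrix uniformly from [0, 1], then scale each column to sum to 1."""
    C = rng.uniform(0.0, 1.0, size=(k, n))
    return C / C.sum(axis=0, keepdims=True)

def simulate_mixtures(clusters, C, cells_per_mixture, rng):
    """Build genes x mixtures data by summing single cells drawn with replacement.

    clusters: list of k arrays (cells x genes), one per cell type.
    cells_per_mixture: hypothetical total number of cells per mixture (assumption).
    """
    m = clusters[0].shape[1]
    X = np.zeros((m, C.shape[1]))
    for i in range(C.shape[1]):
        counts = np.round(C[:, i] * cells_per_mixture).astype(int)
        for cells, count in zip(clusters, counts):
            picks = rng.integers(0, len(cells), size=count)   # sampling with replacement
            X[:, i] += cells[picks].sum(axis=0)
    return X

rng = np.random.default_rng(0)
C_sim = simulate_concentrations(k=3, n=10, rng=rng)
# Hypothetical clusters: 5 cells x 4 genes each, random non-negative expression.
toy_clusters = [rng.uniform(0.0, 10.0, size=(5, 4)) for _ in range(3)]
X_sim = simulate_mixtures(toy_clusters, C_sim, cells_per_mixture=100, rng=rng)
print(C_sim.sum(axis=0))
print(X_sim.shape)
```

Since the true concentrations are known for simulated mixtures, the inferred matrix from step 2 can be compared against them directly.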
Data Processing
RT-qPCR
• CT values are the cycle at which a gene was detected
• Relative normalization to housekeeping genes
• Housekeeping genes: gapdh, bactin1
• Reference value: geometric mean of the housekeeping genes (Vandesompele, 2002)
• dCT(x) = geometric mean - CT(x)
• expression(x) = 2^dCT(x)
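The normalization formulas translate directly into a few lines of code. The CT values in this sketch are invented for illustration, and the function names are hypothetical.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def relative_expression(ct, housekeeping_cts):
    """expression(x) = 2^dCT(x), with dCT(x) = geometric mean - CT(x)."""
    d_ct = geometric_mean(housekeeping_cts) - ct
    return 2.0 ** d_ct

# Hypothetical CT values for the two housekeeping genes and one target gene.
housekeeping = [16.0, 25.0]      # geometric mean is 20
print(relative_expression(22.0, housekeeping))   # dCT = -2
print(relative_expression(18.0, housekeeping))   # dCT = +2
```

A gene detected two cycles later than the housekeeping reference gets a relative expression of about 0.25, and one detected two cycles earlier gets about 4.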
Accuracy of Inferred Mixing Proportions
[Figure: Concentration Matrix: Concordance (axis label: predicted)]
Leave-one-out Accuracy of Inferred Gene Expression Signatures
Future Work
• Apply gene signature estimation technique using more genes in mixed samples
• Identify PSM-Pr signature
• Confirm the anatomical location of the putative PSM-Pr cell population through exhaustive ISH
Conclusion
Special Thanks to:
• Ion Mandoiu
• Craig Nelson
• Caroline Jakuba
• Mathew Gajdosik
[email protected]