Towards Whole-Transcriptome Deconvolution with Single-cell Data
James Lindsay (1), Ion Mandoiu (1), Craig Nelson (2)
University of Connecticut
(1) Department of Computer Science and Engineering
(2) Department of Molecular and Cell Biology

Mouse Embryo
[Figure: mouse embryo, labeled anterior/head, somites, node, primitive streak, posterior/tail]

Unknown Mesoderm Progenitor
• What is the expression profile of the progenitor cell type?
NSB = node-streak border; PSM = presomitic mesoderm; S = somite; NT = neural tube/neurectoderm; EN = endoderm

Characterizing Cell Types
• Goal: whole-transcriptome expression profiles of individual cell types
• Technically challenging to measure whole-transcriptome expression from single cells
• Approach: computational deconvolution of cell mixtures
• Assisted by single-cell qPCR expression data for a small number of genes

Modeling Cell Mixtures
Mixtures (X) are a linear combination of the signature matrix (S) and the concentration matrix (C):
$X_{m \times n} = S_{m \times k} \cdot C_{k \times n}$
where X is genes × mixtures, S is genes × cell types, and C is cell types × mixtures.

Previous Work
1. Coupled Deconvolution
Given: X. Infer: S, C
• Methods: NMF, minimum polytope
• Repsilber, BMC Bioinformatics, 2010
• Schwartz, BMC Bioinformatics, 2010
2. Estimation of Mixing Proportions
Given: X, S. Infer: C
• Methods: quadratic programming, LDA
• Gong, PLoS One, 2012
• Qiao, PLoS Comp Bio, 2012
3. Estimation of Expression Signatures
Given: X, C. Infer: S
• Method: csSAM
• Shen-Orr, Nature Brief Com, 2010

Single-cell Assisted Deconvolution
Given: X and single-cell qPCR data. Infer: S, C
Approach:
1. Identify cell types and estimate the reduced signature matrix S using single-cell qPCR data
• Outlier removal
• K-means clustering followed by averaging
2. Estimate mixing proportions C using the reduced S
• Quadratic programming, 1 mixture at a time
3. Estimate the full expression signature matrix S using C
• Quadratic programming, 1 gene at a time

Step 1: Outlier Removal + Clustering
Remove cells whose maximum Pearson correlation to any other cell is below 0.95
[Figure: unfiltered vs. filtered cells]

Step 1: PCA of Clustering
[Figure: PCA of the clustered single cells]

Step 2: Estimate Mixture Proportions
For a given mixture i:
$\min\left( \lVert S c - x \rVert^2 \right) \quad \text{s.t.} \quad \sum_{l=1}^{k} c_l = 1, \quad c_l \ge 0 \;\; \forall l = 1 \dots k$
• $x = X_{j,i} \;\; \forall j = 1 \dots m$
• S: reduced signature matrix (centroids of the k-means clusters)
• $c = C_{l,i} \;\; \forall l = 1 \dots k$

Step 3: Estimating Full Expression Signatures
[Figure: X (genes × mixtures) = S (genes × cell types) · C (cell types × mixtures)]
• C: known from step 2
• x: observed signals from the new gene
• s: signatures of the new gene, to be estimated
Now solve: $\min\left( \lVert s C - x \rVert^2 \right)$

Experimental Design
Single-cell Profiles
• 92 profiles
• 31 genes
Simulated Concentrations
• Sample uniformly at random from [0, 1]
• Scale each column to sum to 1
Simulated Mixtures
• Choose single cells randomly with replacement from each cluster
• Sum to generate the mixture

Data: RT-qPCR
• CT values are the cycle in which the gene was detected
• Relative normalization to housekeeping genes
• Housekeeping genes
  • gapdh, bactin1
  • Geometric mean
  • Vandesompele, 2002
• dCT(x) = geometric mean − CT(x)
• expression(x) = 2^dCT(x)

Accuracy of Inferred Mixing Proportions
[Figure]

Concentration Matrix: Concordance
[Figure]

Concentration by # Genes: Random
[Figure]

Concentration by # Genes: Ranked
[Figure]

Leave-one-out: Concentration: 50 mix
[Figure: RMSE (2^dCT) by missing gene]

Leave-one-out: Signature: 10 mix
[Figure: RMSE (2^dCT) by missing gene]

Leave-one-out: Signature: 50 mix
[Figure: RMSE (2^dCT) by missing gene]

Future Work
• Bootstrapping to report a confidence interval for each estimated concentration and signature
  • Show correlation between large CI and poor accuracy
• Mixing of heterogeneous technologies
  • qPCR for single cells, RNA-seq for mixtures
  • Normalization (needs to be linear)
• Whole-genome scale
  • # genes to estimate: 10,000+ signatures
  • Data!

Conclusion
Special Thanks to:
• Ion Mandoiu
• Craig Nelson
• Caroline Jakuba
• Mathew Gajdosik
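
Worked sketch of Steps 2 and 3
The two estimation steps described in the slides (mixing proportions one mixture at a time, then signatures one gene at a time) can be illustrated with a minimal, dependency-free Python sketch. Several things here are assumptions rather than the authors' method: the slides name quadratic programming but no solver, so projected gradient descent is swapped in as a simple stand-in; the non-negativity constraint on the signature in Step 3 is assumed; all function names are hypothetical; and the small matrices are invented toy values, not the qPCR data.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {c : c_l >= 0, sum(c) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:  # holds exactly for the leading entries
            theta = t
    return [max(value - theta, 0.0) for value in v]


def clip_nonnegative(v):
    """Projection onto the non-negative orthant (assumed Step 3 constraint)."""
    return [max(value, 0.0) for value in v]


def constrained_least_squares(A, b, project, iterations=5000):
    """Minimise ||A z - b||^2 over the set defined by `project`.

    Projected gradient descent; the step 1 / ||A||_F^2 is a safe
    lower bound on the inverse Lipschitz constant of the gradient.
    """
    rows, cols = len(A), len(A[0])
    step = 1.0 / sum(a * a for row in A for a in row)
    z = project([1.0 / cols] * cols)
    for _ in range(iterations):
        residual = [sum(A[i][j] * z[j] for j in range(cols)) - b[i]
                    for i in range(rows)]
        gradient = [sum(A[i][j] * residual[i] for i in range(rows))
                    for j in range(cols)]
        z = project([z[j] - step * gradient[j] for j in range(cols)])
    return z


def estimate_concentrations(S_reduced, X_markers):
    """Step 2: fit each mixture (column of X) on the simplex; returns k x n."""
    n = len(X_markers[0])
    columns = [
        constrained_least_squares(
            S_reduced, [row[i] for row in X_markers], project_to_simplex)
        for i in range(n)
    ]
    return [list(row) for row in zip(*columns)]


def estimate_signature(C, x_gene):
    """Step 3: min ||s C - x||^2 for one gene, solved as least squares on C^T."""
    C_transposed = [list(column) for column in zip(*C)]
    return constrained_least_squares(C_transposed, x_gene, clip_nonnegative)


# Toy example (invented values): 4 marker genes, 3 cell types, 4 mixtures.
S_reduced = [[10.0, 1.0, 1.0],
             [1.0, 8.0, 2.0],
             [2.0, 1.0, 9.0],
             [5.0, 5.0, 5.0]]
C_true = [[0.7, 0.1, 0.2, 0.3],
          [0.2, 0.8, 0.1, 0.3],
          [0.1, 0.1, 0.7, 0.4]]
# Noise-free mixtures X = S * C on the marker genes.
X_markers = [[sum(S_reduced[j][l] * C_true[l][i] for l in range(3))
              for i in range(4)] for j in range(4)]
# A gene outside the marker panel, observed only in the mixtures.
s_true = [3.0, 0.5, 6.0]
x_gene = [sum(s_true[l] * C_true[l][i] for l in range(3)) for i in range(4)]

C_estimated = estimate_concentrations(S_reduced, X_markers)
s_estimated = estimate_signature(C_estimated, x_gene)
print([[round(value, 3) for value in row] for row in C_estimated])
print([round(value, 3) for value in s_estimated])
```

Both steps reuse the same routine because each is a small least-squares problem that differs only in its feasible set: Step 2 is restricted to the simplex so the proportions of a mixture sum to 1, while Step 3 treats the transposed concentration matrix as the design matrix. With noisy data or many genes, a dedicated quadratic programming or non-negative least squares solver would likely be a better choice than this fixed-iteration loop.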