Part 1

HOW SIMILAR ARE PHENOTYPICALLY IDENTICAL CELLS AT THE TRANSCRIPTIONAL LEVEL?

Subkhankulova T.*, Livesey F.J.
Gurdon Institute and Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK
* Corresponding author: e-mail: [email protected]

Key words: single neuronal stem cell, microarray expression profile, global polyA PCR-based amplification, model distribution, statistical analysis

SUMMARY

Motivation: Expression profiling of single cells using microarray technology requires careful interpretation of the data obtained. It is crucial to distinguish between real differences in mRNA levels, sampling effects and random technical noise.
Results: Simple mathematical models of expression data, based on the sampling effect, were developed and compared with real two-channel microarray data. We demonstrate that the real distribution of gene expression ratios for pairs of neuronal stem cells is much wider than predicted by the sampling model.
Conclusions: These findings confirm that there is a significant difference in expression levels between individual phenotypically identical stem cells.

INTRODUCTION

Improvements in microarray technology provide a tool to analyze cellular heterogeneity at the level of single-cell gene expression profiling. Amplification of the starting mRNA population is a crucial step required to generate labeled microarray targets from limiting amounts of RNA. It has been shown that a global polyadenylated PCR-based amplification technique generates reliable data from picogram amounts of RNA (Subkhankulova, Livesey, 2006). However, high variability has been reported for two-channel microarray analysis at the single-cell level. This variability may be caused by the sampling effect (the random picking of low-abundance transcripts) and therefore depends on the abundance of each mRNA species and on the efficiency of the amplification technique.
Alternatively, the tested single cells may not be identical at the transcriptional level, even if they show high morphological and functional similarity. Here we analyze single-cell expression data using an estimate of the efficiency of the amplification technique and computational models fitted to the real distributions. We demonstrate that the real distribution of gene expression ratios for pairs of neuronal stem cells is much wider than predicted by the sampling model. These findings confirm that there is a significant difference in expression levels between single phenotypically identical stem cells.

BGRS’2006 Computational structural and functional genomics and transcriptomics

MATERIAL AND METHODS

Global polyadenylated PCR amplification. Neuronal stem cells were obtained by dissection of mouse embryonic neocortex at day 11.5. Tissue was dissociated with the papain dissociation system (Worthington Biochemical Corporation). Single cells were picked with a thin capillary, washed in PBS and placed in PCR tubes with cell lysis buffer, followed by global polyadenylated PCR amplification, first suggested by Hiro Matsunami (Subkhankulova, Livesey, 2006). PCR products were purified with the CyScribe GFX Purification kit (Amersham Bioscience) and labeled with Cy3/Cy5 dCTP using Klenow DNA polymerase (BD Bioscience).

Microarray hybridization. Expression microarrays containing 23232 65-mer oligonucleotides (Sigma-Genosys) were printed on Codelink slides (Amersham).

Statistical methods. All statistical analysis was conducted using the R environment and the R package ‘Statistics for Microarray Analysis’. Log intensity ratios for each spot were obtained with background subtraction. Data were normalized by scaled loess normalization using the Limma package.

MODEL

The difference in gene expression between two cells measured by microarray analysis may generally include several components:
1. The real difference in gene expression profiles of the two cells;
2. The difference caused by random picking of mRNA species from each cell's RNA pool (sampling effect);
3. Technical noise arising from amplification, hybridization, washing procedures, uneven array printing, etc.

We have previously shown that technical noise is relatively low for microarray data obtained from hybridizations on oligonucleotide arrays (Subkhankulova, Livesey, 2006), and therefore we ignored it in subsequent calculations. However, it is impossible to distinguish the sampling effect from the real difference between two single-cell samples unless the samples are known to be completely identical. We therefore chose a single cell divided into two parts as a model of identical samples (model A). The only source of diversity in the expression profiles of these two parts would be uneven picking of low-abundance mRNA copies. This diversity depends strongly on the number of mRNA copies (abundance) of a particular gene and on the efficiency of the amplification technique.

We then calculated the distributions for each number of mRNA copies of a particular gene from 1 to 170, assuming that if a transcript's abundance is more than 170 copies the microarray data reflect only technical noise:

$$p_i(x) = \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}},$$

where p_i is the probability for the i-th gene to be selected x times from the mRNA pool when the cell is divided into two halves, N is the total number of transcripts in a single cell, n is the number of transcripts picked from N, a is the number of mRNA transcripts of the i-th gene, and x is the number of transcripts of the i-th gene selected from a.

To estimate the total probability distribution for transcripts with abundance from 1 to 12,000, we introduced the weight vector W = {w1, w2, …, w170, w}, which represents the percentage of genes with the corresponding transcript abundance, where w is the weight for genes with abundance of more than 170. We fitted the model distribution to real microarray data obtained from hybridizations of one half of a single cell's content against the other half by optimizing the weight vector.
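The sampling calculation for model A can be illustrated with a minimal Python sketch. This is not the authors' code (the analysis was done in R). It assumes the probability above is the standard hypergeometric probability of drawing x of the a copies of a gene when n of the N transcripts in the cell are picked. The function names and the value N = 2000 in the usage note are hypothetical, since the source does not state N.

```python
from math import comb, sqrt

def hypergeom_pmf(x, N, a, n):
    """Probability that exactly x of the a copies of a gene are among
    the n transcripts picked from a pool of N transcripts."""
    if x < 0 or x > a or n - x < 0 or n - x > N - a:
        return 0.0
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

def split_distribution(a, N):
    """Distribution of x (copies ending up in one half) when a cell with
    N transcripts is divided into two equal halves (n = N // 2)."""
    n = N // 2
    return [hypergeom_pmf(x, N, a, n) for x in range(a + 1)]

def fraction_sd(a, N):
    """Standard deviation of the fraction x / a that lands in one half."""
    dist = split_distribution(a, N)
    mean = sum(p * x / a for x, p in enumerate(dist))
    var = sum(p * (x / a - mean) ** 2 for x, p in enumerate(dist))
    return sqrt(var)
```

For example, `fraction_sd(1, 2000)` is much larger than `fraction_sd(100, 2000)`, which is the sampling effect described in the text: the fewer copies a gene has, the more unevenly they tend to be divided between the two halves.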
After optimization, the weight vector was fixed for subsequent computations. We then repeated the calculations for a model of two single cells (model B). We hypothesized that the expression profiles of any two neuronal progenitor cells are completely the same. Based on the estimated efficiency of the amplification technique (90 %) and the fixed weight vector (W), we calculated the probability function (P) for genes to show a given expression log ratio (M) using the algorithm described above. This distribution was compared with real microarray data for targets from 12 neuronal progenitor cells co-hybridized in pairs on oligonucleotide arrays.

RESULTS AND DISCUSSION

Efficiency of global polyadenylated PCR amplification. The sampling effect in the generation of microarray targets depends on two factors: the absolute number of mRNA copies of a given gene and the efficiency of the first few steps of the amplification technique. The higher the efficiency of the first steps of the amplification (including cDNA synthesis, polyadenylation and the first cycles of the PCR reaction), the fewer mRNA transcripts are lost and the better the precision of expression profiling of the target mRNA. With each PCR cycle the efficiency of the reaction becomes less important, as the total number of copies of the original cDNA grows and the loss of 1–3 % of the total number of copies is less crucial. We estimated that the first two steps (cDNA synthesis followed by polyadenylation) produced 94 % of the maximally expected amount of polyadenylated ss cDNA. The PCR was 97–98 % efficient for each exponential cycle. Therefore, the most crucial steps of the amplification would reproduce the original mRNA profile with approximately 90 % efficiency.

The fitting of a model distribution to real microarray data. The probability (P) for genes to show a given expression log ratio (M) was calculated based on the weight vector (W) as described above (model A).
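A minimal Python sketch, not the authors' R implementation, of how a weight vector can combine per-abundance sampling probabilities into a distribution of M-values for the half-versus-half case. Several points are assumptions made only for illustration: the weights and N are hypothetical, outcomes in which one half receives no copies are dropped because they have no finite log ratio, genes above the abundance cutoff are placed at M = 0 because technical noise is ignored, and the M-values are binned with an arbitrary bin width.

```python
from collections import defaultdict
from math import comb, log2

def hypergeom_pmf(x, N, a, n):
    """Probability of picking x of the a gene copies among n of N transcripts."""
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

def m_distribution(weights, high_weight, N, bin_width=0.25):
    """Binned P(M) for a cell split into two halves.

    weights     -- dict mapping abundance a (copies per cell) to the
                   fraction of genes with that abundance (hypothetical values)
    high_weight -- fraction of genes above the abundance cutoff
    """
    n = N // 2
    hist = defaultdict(float)
    for a, w in weights.items():
        # x = 0 and x = a are skipped: the log ratio is not finite there.
        for x in range(1, a):
            m = log2(x / (a - x))
            hist[round(m / bin_width) * bin_width] += w * hypergeom_pmf(x, N, a, n)
    # Highly abundant genes are assumed to show no sampling effect (M = 0).
    hist[0.0] += high_weight
    total = sum(hist.values())
    return {m: p / total for m, p in sorted(hist.items())}

# Hypothetical weight vector, for illustration only.
example = m_distribution({2: 0.3, 10: 0.4, 100: 0.2}, high_weight=0.1, N=2000)
```

Fitting in the sense of the text would then mean adjusting the weights until the resulting distribution matches the observed M-values. The optimization procedure itself is not described in the source and is not sketched here.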
The model distribution fitted best to the real distribution of M-values obtained from hybridizations of half cell vs. half cell (Fig. 1a) when the vector W corresponded to a distribution of mRNA species in which very few genes (6.5 %) show relatively high abundance (more than 170 copies) and the majority of genes (63 %) are represented in the total mRNA pool by low numbers of transcripts (fewer than 50).

Comparison with expression data. We assumed that if the tested single neuronal progenitor cells are absolutely identical, the diversity in the microarray expression data will be entirely due to the sampling effect, which arises from the high proportion of low-abundance genes and the loss of transcripts during the amplification procedure. From the experiments described above we estimated both of these parameters: the abundance of gene transcripts (W-vector) and the efficiency of the amplification technique (90 %). We then used these parameters to simulate the distribution of log(base2) expression ratios between two identical cells as described above (model B). The distributions of real expression data are much wider than predicted by our model B, in which the diversity between two samples is due only to the sampling effect (Fig. 1b). This means that any pair of tested single cells differ from each other in expression, which also contributes to the wide distribution of M-values. Therefore our results disprove the hypothesis that progenitor cells are identical in expression. The variability of transcript levels in neuronal progenitor cells, despite their high morphological and functional similarity, may be a result of stochastic fluctuations intrinsic to normal living cells (Levsky, Singer, 2003).

Figure 1. The model distribution of M-values (log(base2) expression ratios) fitted to real microarray data (a).
Dashed line: model distribution based on the optimized weight vector (model A); black line: real distribution of M-values obtained for half vs. half of a single cell; dotted line: theoretical Gaussian distribution (sd = 0.42). The distributions of log(base2) expression ratios (M) for pairs of real cells are wider than predicted for two identical samples (b). Black solid line: predicted distribution of M-values for two identical samples; gray lines: real distributions of expression ratios for pairs of 12 neuronal progenitor cells. Dashed lines: theoretical Gaussian distributions with sd = 0.5 (approximation of average microarray expression data) and sd = 0.11 (approximation of the model distribution).

CONCLUSIONS

1. We developed statistical models that can be used to validate single-cell microarray expression data.
2. Our results show that both sampling effects and different expression levels contribute to the wide distribution of log(base2) ratios obtained in two-channel microarray analysis of phenotypically similar cells.
3. Neuronal stem cells demonstrate high heterogeneity, which is possibly a result of stochastic fluctuations in mRNA transcript levels intrinsic to cycling cells.

REFERENCES

Subkhankulova T., Livesey F.J. (2006) Comparative evaluation of linear and exponential amplification techniques for expression profiling at the single cell level. Genome Biol., 7(3).
Levsky J.M., Singer R.H. (2003) Gene expression and the myth of the average cell. Trends in Cell Biology, 13(1), 4–6.