Part 1
HOW SIMILAR ARE PHENOTYPICALLY IDENTICAL
CELLS AT THE TRANSCRIPTIONAL LEVEL?
Subkhankulova T.*, Livesey F.J.
Gurdon Institute and Department of Biochemistry, University of Cambridge, Tennis Court Road,
Cambridge, CB2 1QN, UK
* Corresponding author: e-mail: [email protected]
Key words:
single neuronal stem cell, microarray expression profile, global polyA PCR-based
amplification, model distribution, statistical analysis
SUMMARY
Motivation: Expression profiling of single cells using microarray technology
requires careful interpretation of the obtained data. It is crucial to distinguish between real
differences in mRNA levels, sampling effects and random technical noise.
Results: Simple mathematical models of expression data, based on the sampling effect,
were developed and compared with real two-channel microarray data. We demonstrate
that the real distribution of gene expression ratios for pairs of neuronal stem cells is much
wider than predicted by the sampling model.
Conclusions: These findings confirm that there are significant differences in expression
levels between individual phenotypically identical stem cells.
INTRODUCTION
Improvements in microarray technology provide a tool to analyze cellular
heterogeneity at the level of single-cell gene expression profiling. Amplification of the
starting mRNA population is a crucial step required to generate labeled microarray targets
from limiting amounts of RNA. It has been shown that the global polyadenylated PCR-based
amplification technique generates reliable data from picogram amounts of RNA
(Subkhankulova, Livesey, 2006). However, high variability has been reported for
two-channel microarray analysis at the single-cell level. This variability may be caused by
the sampling effect (the random picking of low-abundance transcripts) and therefore
depends on the abundance of each mRNA species and the efficiency of the amplification technique.
Alternatively, the tested single cells may not be identical at the transcriptional level even if they
are highly similar morphologically and functionally.
Here we provide an analysis of single-cell expression data, based on an estimate of the
efficiency of the amplification technique and on computational models fitted to the real
distributions. We demonstrate that the real distribution of gene expression ratios for pairs
of neuronal stem cells is much wider than predicted by the sampling model. These findings
confirm that there are significant differences in expression levels between single
phenotypically identical stem cells.
BGRS’2006
Computational structural and functional genomics and transcriptomics
MATERIAL AND METHODS
Global polyadenylated PCR amplification. Neuronal stem cells were obtained from
dissections of mouse embryo neocortex at day 11.5. Tissue was dissociated with the
papain dissociation system (Worthington Biochemical Corporation), and single cells were
picked with a thin capillary, washed in PBS and placed in PCR tubes with cell lysis buffer,
followed by global polyadenylated PCR amplification, first suggested by Hiro
Matsunami (Subkhankulova, Livesey, 2006). PCR products were purified with the
CyScribe GFX Purification kit (Amersham Bioscience) and labeled with Cy3/Cy5 dCTP
using Klenow DNA polymerase (BD Bioscience).
Microarray hybridization. Expression microarrays containing 23232 65-mer
oligonucleotides (Sigma-Genosys) were printed on Codelink slides (Amersham).
Statistical methods. All statistical analysis was conducted using the R environment
and the R package ‘Statistics for Microarray Analysis’. Log intensity ratios for each spot
were obtained with background subtraction. Data were normalized by
scaled loess normalization using the Limma package.
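The log intensity ratio described above can be sketched as follows. This is a minimal Python illustration, not the R workflow used in the study: the function name, the spot intensities and the choice to flag spots with non-positive background-corrected intensity are all assumptions, and loess normalization is not reproduced here.

```python
import math

def log_ratio(red_fg, red_bg, green_fg, green_bg):
    """Return (M, A) for one spot after background subtraction.

    M is the log2 ratio of the two channels and A the mean log2 intensity.
    Returns None when a corrected intensity is not positive (assumed policy).
    """
    red = red_fg - red_bg
    green = green_fg - green_bg
    if red <= 0 or green <= 0:
        return None
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)
    return m, a

# Hypothetical spots: (red foreground, red background, green foreground, green background)
spots = [(1200, 200, 700, 200), (450, 200, 1200, 200), (190, 200, 900, 250)]
values = [log_ratio(*spot) for spot in spots]
```

The third hypothetical spot has a red signal below background, so it is flagged rather than given a log ratio.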
MODEL
The gene expression difference between two cells obtained in microarray analysis
may generally include several components:
1. The real difference in gene expression profiles of the two cells;
2. The difference caused by random picking of mRNA species from each cell's RNA pool
(sampling effect);
3. Technical noise arising from amplification, hybridization, washing procedures, uneven
array printing, etc.
Previously we have shown that technical noise is relatively low for microarray data
obtained from hybridizations on oligonucleotide arrays (Subkhankulova, Livesey,
2006); therefore we ignored it in subsequent calculations. However, it is impossible to
distinguish the sampling effect from the real difference between two single-cell samples unless
the samples are known to be completely identical. We therefore chose a single cell divided into two
parts as a model of identical samples (model A). The only source of diversity in the
expression profiles of these two parts would be uneven picking of low-abundance mRNA
copies. This diversity will depend strongly on the number of mRNA copies (abundance)
of a particular gene and on the efficiency of the amplification technique. We then calculated the
distributions for a given number of mRNA copies of a particular gene from 1 to 170,
assuming that if a transcript's abundance is more than 170 the microarray data would reflect
only technical noise:
$$p_i = \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}},$$
where $p_i$ is the probability for the i-th gene to be selected $x$ times from the mRNA pool
when the cell is divided into two halves; $N$ is the total number of transcripts in a single cell;
$n$ is the number of transcripts picked from $N$; $a$ is the number of mRNA transcripts of the i-th gene;
$x$ is the number of transcripts of the i-th gene selected from $a$.
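Reading the formula above as the hypergeometric probability (an interpretation based on the symbol definitions), the split of a single cell into two halves can be sketched in Python. The total transcript count `N_TOTAL` is a hypothetical value chosen only to keep the example fast; the source does not state the value used.

```python
from math import comb

N_TOTAL = 10_000  # hypothetical total number of transcripts in one cell

def split_probability(a, x, N, n):
    """P(x of a gene's a transcripts are among the n picked out of N)."""
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

def half_split_distribution(a, N=N_TOTAL):
    """Probabilities for x = 0..a when the cell is divided into two halves."""
    n = N // 2
    return [split_probability(a, x, N, n) for x in range(a + 1)]

# A gene with 4 transcripts: how likely is each count in one half?
dist = half_split_distribution(4)
```

For low-abundance genes a substantial share of the probability falls on uneven splits, which is the sampling effect the model is meant to capture.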
To estimate the total probability distribution for transcripts with abundance from 1 to
12,000, we introduced the weight vector W = {w1, w2, …, w170, w}, which represents the
percentage of genes with the corresponding transcript abundance, where w is the weight for
genes with abundance of more than 170. We fitted the model distribution to real microarray
data obtained from hybridizations of one half of a single cell's content against the other half by optimizing the
weight vector. After optimization, the weight vector was fixed for subsequent computations.
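How a weight vector combines the per-abundance distributions can be sketched as a simple mixture. Everything specific here is an assumption made for illustration: the weights are hypothetical and not the fitted values, `N_TOTAL` is hypothetical, M-values are binned to one decimal place, genes above the abundance cutoff are placed at M = 0 (technical noise being ignored), and outcomes where one half receives no copies are skipped because the log ratio is undefined.

```python
from collections import defaultdict
from math import comb, log2

N_TOTAL = 10_000  # hypothetical total number of transcripts in one cell

def m_distribution(a, N=N_TOTAL):
    """Binned distribution of M = log2(x / (a - x)) for a gene with a copies."""
    n = N // 2
    denom = comb(N, n)
    dist = defaultdict(float)
    for x in range(1, a):  # x = 0 and x = a give an undefined ratio and are skipped
        p = comb(a, x) * comb(N - a, n - x) / denom
        dist[round(log2(x / (a - x)), 1)] += p
    return dist

def mixture(weights, w_high, N=N_TOTAL):
    """Weighted sum of per-abundance distributions; high-abundance genes sit at M = 0."""
    total = defaultdict(float)
    total[0.0] += w_high
    for a, w in weights.items():
        for m, p in m_distribution(a, N).items():
            total[m] += w * p
    return dict(total)

# Hypothetical weights for three abundance classes plus the high-abundance class
weights = {5: 0.4, 20: 0.3, 100: 0.2}
model = mixture(weights, w_high=0.1)
```

Fitting would then adjust the weights until this mixture matches the observed half versus half M-value histogram; the optimizer itself is not shown because the source does not describe it.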
We then repeated the calculations for a model of two single cells (model B). We
hypothesized that the expression profiles of any two neuronal progenitor cells are completely
the same. Based on the estimated efficiency of the amplification technique (90 %) and the
fixed weight vector (W), we calculated the probability function (P) for genes to have a
given expression log ratio (M) using the algorithm described above. This distribution was
compared with real microarray data for targets from 12 neuronal progenitor cells
co-hybridized in pairs on oligonucleotide arrays.
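One way to picture model B is to let each of two identical cells start with the same number of copies of a gene and retain each copy independently with the amplification efficiency. This binomial retention scheme is an assumption made for this sketch; the source does not spell out the exact procedure. The code computes the standard deviation of M = log2(k1 / k2) exactly, conditional on both cells retaining at least one copy.

```python
from math import comb, log2, sqrt

def retained_pmf(a, efficiency):
    """Binomial probabilities that k of a copies survive amplification."""
    return [comb(a, k) * efficiency**k * (1 - efficiency)**(a - k)
            for k in range(a + 1)]

def m_sd_two_identical_cells(a, efficiency=0.9):
    """SD of M = log2(k1 / k2) for two identical cells with a copies each.

    The mean of M is zero by symmetry, so the SD is sqrt(E[M^2]).
    Outcomes where either cell retains no copies are excluded.
    """
    pmf = retained_pmf(a, efficiency)
    mass = 0.0
    second_moment = 0.0
    for k1 in range(1, a + 1):
        for k2 in range(1, a + 1):
            p = pmf[k1] * pmf[k2]
            mass += p
            second_moment += p * log2(k1 / k2) ** 2
    return sqrt(second_moment / mass)

sd_low_abundance = m_sd_two_identical_cells(5)
sd_high_abundance = m_sd_two_identical_cells(50)
```

Under these assumptions the spread of M shrinks as abundance grows, so low-abundance genes dominate the predicted width.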
RESULTS AND DISCUSSION
Efficiency of global polyadenylated PCR amplification. The sampling effect in the
generation of microarray targets depends on two factors: the absolute number of mRNA
copies of a given gene and the efficiency of the first few steps of the amplification technique.
The higher the efficiency of the first steps of amplification (including cDNA synthesis,
polyadenylation, and the first cycles of PCR), the fewer mRNA transcripts are
lost and the better the precision of expression profiling of the target mRNA. With each cycle
of PCR the efficiency of the reaction becomes less important, because the total number of
cDNA copies grows and the loss of 1–3 % of the total number of copies is less crucial. We
estimated that the first two steps (cDNA synthesis followed by polyadenylation)
produced 94 % of the maximally expected amount of polyadenylated ss cDNA. The PCR
was 97–98 % efficient for each exponential cycle. Therefore, the most crucial steps
of amplification would reproduce the original mRNA profile with
approximately 90 % efficiency.
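A figure close to 90 % can be illustrated with simple arithmetic, assuming the step efficiencies combine multiplicatively and that only the first exponential cycle is counted. The source does not state how the estimate was derived, so this is only one plausible reading.

```python
def combined_efficiency(first_steps, pcr_cycle, cycles=1):
    """Product of the early-step efficiency and the per-cycle PCR efficiency."""
    return first_steps * pcr_cycle ** cycles

# Values reported in the text: 94 % for cDNA synthesis plus polyadenylation,
# 97-98 % per exponential PCR cycle.
low = combined_efficiency(0.94, 0.97)
high = combined_efficiency(0.94, 0.98)
```

Both bounds fall slightly above 0.9, which is consistent with the approximate value quoted above.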
Fitting of the model distribution to real microarray data. The probability (P) for
genes to have a given expression log ratio (M) was calculated based on the weight vector (W)
as described above (model A). The model distribution fitted best to the real distribution of
M-values obtained from hybridizations of half cell vs. half cell (Fig. 1a) when vector W
corresponded to a distribution of mRNA species in which only a few genes (6.5 %)
show relatively high abundance (more than 170 copies) and the majority of genes
(63 %) are represented in the total mRNA pool by low numbers of transcripts (fewer than 50).
Comparison with real expression data. We assumed that if the tested single neuronal progenitor
cells are absolutely identical, then the diversity in microarray expression data will be
entirely due to the sampling effect, which arises from the high proportion of low-abundance genes
and the loss of transcripts during the amplification procedure. From the experiments described
above we estimated both of these parameters: the abundance of gene transcripts (W-vector) and
the efficiency of the amplification technique (90 %). We then used these parameters to
simulate the distribution of log(base2) expression ratios between two identical cells as
described above (model B).
The distributions of real expression data are much wider than predicted by
our model B, in which the diversity between two samples is due only to the sampling effect
(Fig. 1b). This means that any pair of tested single cells has real expression differences
that also contribute to the wide distribution of M-values. Therefore our
results disprove the hypothesis that progenitor cells are identical in expression.
The variability of transcript levels in neuronal progenitor cells, despite their
high morphological and functional similarity, may be a result of stochastic fluctuations
intrinsic to normal living cells (Levsky, Singer, 2003).
Figure 1. The model distribution of M-values (log(base2) expression ratios) fitted to real microarray
data (a). Dashed line – model distribution based on the optimized weight vector (model A); black line – real
distribution of M-values obtained for half vs. half of a single cell; dotted line – theoretical Gaussian
distribution (sd = 0.42). The distributions of log(base2) expression ratios (M) for pairs of real cells are
wider than predicted for two identical samples (b). Black solid line – predicted distribution of M-values
for two identical samples; gray lines – real distributions of expression ratios for pairs of 12 neuronal
progenitor cells. Dashed lines – theoretical Gaussian distributions with sd = 0.5 (approximation of the
average microarray expression data) and sd = 0.11 (approximation of the model distribution).
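To get a feel for the gap between the two Gaussian approximations in the caption, one can compare the fraction of genes expected beyond a twofold change (|M| > 1) under each standard deviation. The zero-mean Gaussian form and the twofold threshold are assumptions chosen for illustration.

```python
from math import erfc, sqrt

def fraction_beyond(threshold, sd):
    """Two-sided tail probability P(|M| > threshold) for a zero-mean Gaussian."""
    return erfc(threshold / (sd * sqrt(2)))

# sd values taken from the figure caption
real_cells = fraction_beyond(1.0, 0.5)        # approximation of the real data
identical_model = fraction_beyond(1.0, 0.11)  # approximation of the model
```

With sd = 0.5 the threshold lies two standard deviations from the mean, so a few percent of genes exceed it, whereas with sd = 0.11 the threshold is about nine standard deviations away and essentially no genes are expected to exceed it.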
CONCLUSIONS
1. We developed statistical models that can be used to validate single-cell
microarray expression data.
2. Our results show that both sampling effects and different expression levels contribute
to the wide distribution of log(base2) ratios obtained in two-channel microarray
analysis of phenotypically similar cells.
3. Neuronal stem cells demonstrate high heterogeneity, which is possibly a result of
stochastic fluctuations in mRNA transcript levels intrinsic to cycling cells.
REFERENCES
Subkhankulova T., Livesey F.J. (2006) Comparative evaluation of linear and exponential amplification
techniques for expression profiling at the single cell level. Genome Biol., 7(3).
Levsky J.M., Singer R.H. (2003) Gene expression and the myth of the average cell. Trends in Cell
Biology, 13(1), 4–6.