Methods in Imaging Chromosomes
Aalok Shah
1 Introduction
Experimental evidence indicates that an understanding of the physical organization and positioning of chromosomes may provide insight into gene expression. Specifically, it is believed that certain locations in the nucleus have a higher propensity for active genes [8]. Moreover, it is believed that the relative positioning of genes within the chromosome may also play a role in gene expression [8]. Thus, an exploration of the geometry of chromosomes should lead to a deeper understanding of how genes are activated.
To achieve this understanding, we consider a simpler system than human chromosomes: the polytene chromosomes of the fruit fly, Drosophila. These are ideal because of the availability of data and the prior work done on Drosophila. In order to understand these polytene chromosomes, a geometrical representation must be developed, and we must be able to simulate an image in order to compare the representation with available image data. The latter is the primary focus of this paper.
2 Background
2.1 The Fourier Transform
By the projection-slice theorem, the Fourier Transform of a projection of an object is equal to the corresponding slice of the object's Fourier Transform. In particular, the Fourier Transform of a model projection can be compared directly with the Fourier Transform of the experimental image. For instance, consider a 3-dimensional image given by the function ρ(x, y, z). This image, or function, can be reconstructed by assembling the Fourier Transforms of slices along the z-direction and then back-transforming the result.
As described in [4] and [8], we shall use a filamentary model to represent chromosomes in the image. This means that the distribution ρ(x, y, z) is concentrated along a space curve r(s), parameterized by arc length s. Mathematically, this implies that
\[
\rho(\mathbf{x}) = \int_0^L \lambda(s)\, \delta[\mathbf{x} - \mathbf{r}(s)]\, ds, \tag{1}
\]
Figure 1: Sample X-ray image slice of Drosophila. The band pattern along the chromosome
represents the allele pattern.
where x is the standard position vector and λ(s) is the filament density as a function of arc length. Using this, the Fourier Transform becomes:
\[
F(\mathbf{k}) = \frac{1}{2\pi} \int_{\mathbb{R}^3} \rho(\mathbf{x}) \exp(i\mathbf{k}\cdot\mathbf{x})\, d\mathbf{x}
= \frac{1}{2\pi} \int_0^L \lambda(s) \exp[i\mathbf{k}\cdot\mathbf{r}(s)]\, ds. \tag{2}
\]
In [4], Hausrath and Goriely denote this as the Frenet-Fourier Transform. If the curvature
and torsion parameters (denoted by κ(s) and τ (s)) are known for the space curve, [4]
shows that we can simultaneously calculate the Frenet-Fourier Transform and the space
curve r(s) by coupling this equation with the Frenet-Serret formulas:
\[
\begin{aligned}
F'[\mathbf{k}](s) &= \lambda(s) \exp(i\mathbf{k}\cdot\mathbf{r}) \\
\mathbf{r}'(s) &= \mathbf{t} \\
\mathbf{t}'(s) &= \kappa(s)\,\mathbf{n} \\
\mathbf{n}'(s) &= -\kappa(s)\,\mathbf{t} + \tau(s)\,\mathbf{b} \\
\mathbf{b}'(s) &= -\tau(s)\,\mathbf{n}.
\end{aligned} \tag{3}
\]
Here, {t, n, b} are respectively the tangent, normal, and binormal vectors to the curve.
Initial values for {r, t, n, b} must be given, and the initial value for the Fourier Transform
can be taken to be zero. Thus, we now have both an integral formulation and an ODE
formulation for computing the Fourier Transform of the image using the filamentary model.
Note that for both formulations, computations must be done for a grid that spans values
of k in all directions.
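To make the ODE formulation concrete, the following is a minimal sketch (assuming NumPy and SciPy, not the implementation used in this work) of integrating the coupled system (3) for a single wave vector k with an adaptive Runge-Kutta solver. The curvature, torsion, and density profiles passed in are placeholder callables standing in for the profiles estimated in Section 2.3.
\begin{verbatim}
import numpy as np
from scipy.integrate import solve_ivp

def frenet_fourier(k, kappa, tau, lam, L):
    """Integrate the coupled Frenet-Serret/Fourier system (3) for one wave vector k.

    kappa, tau, lam are callables of arc length s; L is the total curve length.
    Returns the space curve samples r(s) and the complex transform F(k).
    """
    def rhs(s, y):
        r, t, n, b = y[0:3], y[3:6], y[6:9], y[9:12]
        g = lam(s) * np.exp(1j * np.dot(k, r))          # F'(s) = lam(s) exp(i k . r)
        return np.concatenate([
            t,                                          # r'(s) = t
            kappa(s) * n,                               # t'(s) = kappa(s) n
            -kappa(s) * t + tau(s) * b,                 # n'(s) = -kappa(s) t + tau(s) b
            -tau(s) * n,                                # b'(s) = -tau(s) n
            [g.real, g.imag],                           # real/imaginary parts of F'
        ])

    # Initial frame: curve starts at the origin with the standard basis; F(0) = 0.
    y0 = np.concatenate([np.zeros(3), [1, 0, 0], [0, 1, 0], [0, 0, 1], [0.0, 0.0]])
    sol = solve_ivp(rhs, (0.0, L), y0, rtol=1e-8, atol=1e-10,
                    t_eval=np.linspace(0.0, L, 200))
    return sol.y[0:3], sol.y[12, -1] + 1j * sol.y[13, -1]

# Example: a helix (constant curvature and torsion) with uniform density.
curve, F = frenet_fourier(np.array([0.0, 0.0, 1.0]),
                          kappa=lambda s: 1.0, tau=lambda s: 0.2,
                          lam=lambda s: 1.0, L=10.0)
\end{verbatim}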
2.2 Filamentary Model
Figure 2: Filamentary tube model for polytene chromosomes. The figure on the left is the space curve through the chromosomes, and the figure in the middle is a radially symmetric tube centered at that space curve. The last image is the same tube, sliced to show the radial symmetry.
In this model, chromosomes are tubes centered at the space curve r(s). In [4], this is modeled with a Gaussian filament density; the density function λ(s) is radially symmetric about the curve, as shown in Figure 2. Looking closely at Figure 1, one notes the banded structure of the chromosome alleles. The pattern changes as a function of arc length, which supports the filamentary model. Let the function describing the band pattern be α(s), and let the intra-tube variance be β(s). In [4], Hausrath and Goriely derive the density function to be:
\[
\lambda(s) = \frac{\alpha(s)}{2\beta(s)} \exp\!\left[-\frac{(\mathbf{k}\cdot\mathbf{n})^2 + (\mathbf{k}\cdot\mathbf{b})^2}{4\beta(s)}\right]. \tag{4}
\]
Note that the exponent in this density function vanishes if and only if k is parallel to t, the direction of the space curve; in this direction, λ(s) is maximized.
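As a small illustration, the Gaussian filament weight in (4) can be evaluated directly once the local Frenet frame is known. The sketch below is a hedged example: the frame vectors and the band and variance profiles are supplied by the caller, and it is not tied to any particular data set.
\begin{verbatim}
import numpy as np

def gaussian_filament_weight(k, n, b, alpha_s, beta_s):
    """Evaluate the Gaussian filament density (4) at one point of the curve.

    k is the wave vector; n and b are the local normal and binormal vectors;
    alpha_s and beta_s are the band intensity and intra-tube variance at this
    arc length. The weight is maximal when k is parallel to the tangent t.
    """
    transverse = np.dot(k, n) ** 2 + np.dot(k, b) ** 2
    return alpha_s / (2.0 * beta_s) * np.exp(-transverse / (4.0 * beta_s))
\end{verbatim}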
Because the function α(s) describes the band pattern in the chromosome image, it is especially valuable for further applications of this problem. It allows us to correlate specific alleles with physical location, since it is parameterized by arc length and the position vector r(s) is also known. In 1935, C.B. Bridges drew by hand the equivalent of α(s), and we would like to have a more accurate representation on a larger scale.
Figure 3: Bridges’ map of a piece of a polytene chromosome. Although hand-drawn, the map is remarkably accurate, and his drawings are still used today.
2.3 Parameter Estimation
Previous work by Hausrath et al. [8] used algorithms to compute κ(s), τ(s), and α(s). Essentially, ridge lines are computed by interpolating through the intensity of the image, and points along the space curve are determined so as to ensure connectivity of the ridge lines. α(s) is determined from the intensity profile using an interpolation scheme based on its neighbors. Curvature and torsion profiles are estimated by scanning through a range of values and choosing the helical arc that best fits each small segment of the chromosome data while satisfying smoothness constraints. β(s) is taken to be constant along helical arcs. The curve is taken to have piecewise constant curvature and torsion. In this case, there exists an exact solution to the Frenet-Serret formulas, and the space curve is a combination of helical arcs. Since the Frenet frame can be computed exactly, one only needs to evaluate the integral in (2) to compute the Fourier Transform. Furthermore, because the transform is linear, one can compute the Fourier Transform for each arc separately and then sum the results, so this computation is embarrassingly parallel.
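This per-arc decomposition can be sketched as follows (a simplified illustration, not the original implementation): each helical arc carries constant curvature, torsion, and density, its transform is evaluated independently, and the results are summed. The helical parameterization used here assumes a standard helix translated to the arc's starting point; a full implementation would also rotate it into the arc's initial Frenet frame.
\begin{verbatim}
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def helix_arc_transform(arc, k, n_samples=256):
    """Transform (2) of one constant-curvature/torsion arc at wave vector k.

    arc is a plain dict of numbers: curvature 'kappa', torsion 'tau', constant
    density 'lam', length 'L', and starting point 'r0' (a 3-vector). Assumes a
    genuinely helical arc (kappa or tau nonzero).
    """
    kappa, tau, lam, L = arc["kappa"], arc["tau"], arc["lam"], arc["L"]
    w = np.hypot(kappa, tau)                    # helix angular rate (arc-length units)
    a, c = kappa / w**2, tau / w**2             # helix radius and pitch parameters
    s = np.linspace(0.0, L, n_samples)
    r = np.asarray(arc["r0"], float)[:, None] + np.stack(
        [a * np.cos(w * s) - a, a * np.sin(w * s), c * w * s])
    integrand = lam * np.exp(1j * (np.asarray(k) @ r))
    return np.trapz(integrand, s) / (2.0 * np.pi)

def total_transform(arcs, k):
    """Sum the per-arc transforms; linearity of (2) makes the arcs independent,
    so they can be farmed out to separate worker processes."""
    with ProcessPoolExecutor() as pool:
        return sum(pool.map(helix_arc_transform, arcs, [k] * len(arcs)))
\end{verbatim}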
3 Numerical Integration
3.1 Integral Formulation
As noted previously, there are two ways to formulate the problem of solving for the Frenet-Fourier transform of the image, denoted by (2). The first method is to compute the Frenet frame for the desired curve and then numerically integrate (2). This method is especially useful when the curve has piecewise constant curvature, because the Frenet frame can be computed exactly; in this case, the only numerical error in computing the Fourier Transform comes from the numerical integration. We shall consider three approaches to numerically integrating (2).

Figure 4: Sample helical arcs taken from a chromosome. Segments are kept small to ensure accuracy.
The first approach is a simple quadrature rule. In this method, we first split the integral into n subintervals of a predetermined length, denoted by h. Within each subinterval Δ_i, the integrand can be fitted by a polynomial, denoted by p_i(s), and (2) is then approximated by Σ_i ∫_{Δ_i} p_i(s) ds. That is, if the polynomial is of degree q, then
\[
F[\mathbf{k}] = \sum_i \int_{\Delta_i} \lambda(s)\, e^{i\mathbf{k}\cdot\mathbf{r}(s)}\, ds
= \sum_i \int_{\Delta_i} \bigl(p_i(s) + O(h^q)\bigr)\, ds, \tag{5}
\]
where O(h^q) is "big O" notation for the error in the polynomial fit. The error in the numerical integration is then of the same order, since we are summing over 1/h intervals and the integration over each interval adds a factor of h. However, this also assumes that the integrand can be sampled anywhere; in practice, λ(s) is sampled at fixed points determined from the image. Therefore, the error increases if the step size becomes too small in this method, due to the error arising from linearly interpolating λ(s).
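A hedged sketch of this first approach is given below: Simpson's rule (a degree-2 polynomial fit) is applied on each subinterval of length roughly h, and λ(s) is linearly interpolated from the fixed image samples, which is exactly where the interpolation error discussed above enters. The curve function r_of_s is assumed to be available, e.g. from the exact helical arcs.
\begin{verbatim}
import numpy as np

def quadrature_transform(k, r_of_s, s_samples, lam_samples, h):
    """Piecewise polynomial quadrature for (2) on subintervals of length ~h.

    lam is known only at the fixed image samples (s_samples, lam_samples) and
    is linearly interpolated; r_of_s maps arc length to a 3-vector on the curve.
    """
    L = s_samples[-1]
    edges = np.linspace(0.0, L, int(np.ceil(L / h)) + 1)
    total = 0.0 + 0.0j
    for a, b in zip(edges[:-1], edges[1:]):
        s = np.array([a, 0.5 * (a + b), b])                     # Simpson nodes
        lam = np.interp(s, s_samples, lam_samples)              # sample-limited lambda
        g = lam * np.exp(1j * np.array([np.dot(k, r_of_s(si)) for si in s]))
        total += (b - a) / 6.0 * (g[0] + 4.0 * g[1] + g[2])     # Simpson's rule
    return total / (2.0 * np.pi)
\end{verbatim}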
The second approach considered involves treating the integral as an ordinary differential
equation (ODE). If the integrand is denoted by g(s), then this amounts to solving the ODE
\[
\frac{dF}{ds} = g(s). \tag{6}
\]
Since this ODE is only a function of the independent variable s, any solver can easily be implemented. Fourier integrals can be highly oscillatory, and so we used a simple stiff ODE solver to account for this. Any other method can be used; in fact, various Runge-Kutta methods would be equivalent to the quadrature rules described above. However, since stiff ODE solvers incorporate future information in their numerical estimation, they are especially suited to ODEs that can change rapidly. We used a second-order linear multistep BDF method as our solver. For more information, see [1].
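A minimal sketch of this approach with SciPy's BDF implementation is shown below (our own solver differs in its details; the callables r_of_s and lam_of_s are assumed to be given). The complex equation is split into real and imaginary parts, since the stiff solver works with real-valued systems.
\begin{verbatim}
import numpy as np
from scipy.integrate import solve_ivp

def transform_via_ode(k, r_of_s, lam_of_s, L):
    """Solve dF/ds = lam(s) exp(i k . r(s)) with a stiff BDF solver."""
    def rhs(s, y):
        g = lam_of_s(s) * np.exp(1j * np.dot(k, r_of_s(s)))
        return [g.real, g.imag]

    sol = solve_ivp(rhs, (0.0, L), [0.0, 0.0], method="BDF", rtol=1e-8, atol=1e-10)
    return (sol.y[0, -1] + 1j * sol.y[1, -1]) / (2.0 * np.pi)
\end{verbatim}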
For transforms that are relatively simple and do not vary on multiple scales, the previous approaches are suitable. However, this is not the case for most image data, and it is therefore prudent to try another approach. Since we are computing a Fourier-type integral, it is more efficient to consider a method that takes this into account. For this reason, the Clenshaw-Curtis method is the last approach considered, and it is our method of choice.
From a simple change of variables, we know that
\[
\int_{-1}^{1} g(s)\, ds = \int_0^{\pi} g(\cos\theta)\, \sin\theta\, d\theta. \tag{7}
\]
Equation (7) motivates the Clenshaw-Curtis method. This is because, using a cosine expansion, we can write g(cos θ) = Σ_k a_k cos(kθ) (since g(cos θ) is an even function of θ, the sine terms vanish). The Discrete Cosine Transform (DCT) can then be used to calculate the coefficients a_k, which implies that
\[
\int_0^{\pi} g(\cos\theta)\, \sin\theta\, d\theta = a_0 + \sum_{k=1}^{\infty} \frac{2a_{2k}}{1 - 4k^2}. \tag{8}
\]
Therefore, it remains to solve for the coefficients a_{2k}. Using symmetry and the DCT, these can be calculated as follows, where N (taken to be even) is the number of terms used to approximate the infinite sum:
\[
a_{2k} = \frac{2}{N}\left[\frac{g(1) + g(-1)}{2} + g(0)(-1)^k
+ \sum_{j=1}^{N/2-1} \Bigl[g\bigl(\cos\tfrac{j\pi}{N}\bigr) + g\bigl(-\cos\tfrac{j\pi}{N}\bigr)\Bigr] \cos\Bigl(\frac{2jk\pi}{N}\Bigr)\right]. \tag{9}
\]
In practice, N is usually taken to be a power of 2, because algorithms to compute the Fourier Transform are usually optimized for such N. If a large number of oscillations is expected, then it is important to use a larger value for N. Also, Trefethen has shown that this approach fares well when compared to Gauss quadrature (which is a popular polynomial technique), especially if the integrand is not necessarily analytic [7].
The integrand g(s) is dependent on the Frenet frame of the polytene curve, which can
be computed exactly. When implementing this scheme, it is important to first compute
the Frenet frame at all points necessary in order to speed up calculations.
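The following is a hedged sketch of a Clenshaw-Curtis integrator built on a type-I DCT, assuming SciPy's scipy.fft is available; the integrand g would be the Frenet-Fourier integrand rescaled from [0, L] to [-1, 1], and a complex integrand can be handled by transforming its real and imaginary parts separately. A simple oscillatory test integrand is used in the example, and the coefficient normalization follows the DCT convention, which differs from (8)-(9) only in how the constant and endpoint terms are weighted.
\begin{verbatim}
import numpy as np
from scipy.fft import dct

def clenshaw_curtis(g, N):
    """Approximate the integral of g over [-1, 1] by Clenshaw-Curtis quadrature.

    g is sampled at the N+1 Chebyshev points cos(j pi / N); a type-I DCT gives
    the Chebyshev coefficients, and only the even-order ones contribute, as in (8).
    N should be even (in practice a power of 2, so the transform is fast).
    """
    theta = np.pi * np.arange(N + 1) / N
    y = g(np.cos(theta))                       # samples at the Chebyshev points
    c = dct(y, type=1) / N                     # Chebyshev coefficients (DCT-I convention)
    ks = np.arange(2, N, 2)                    # interior even orders 2, 4, ..., N-2
    return c[0] + np.sum(2.0 * c[ks] / (1.0 - ks ** 2)) + c[N] / (1.0 - N ** 2)

# Oscillatory test integrand: the integral of cos(40 x) over [-1, 1] is 2 sin(40)/40.
approx = clenshaw_curtis(lambda x: np.cos(40.0 * x), 64)
exact = 2.0 * np.sin(40.0) / 40.0
\end{verbatim}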
The error of this method varies depending on the integrand and its modes of oscillation. The local error has been estimated for a curve that is a circle (constant curvature and no torsion). For such a curve, with N = 32, the local error E_N was within 0.03, where E_N = |F_N − F_{2N}|. Figure 5 shows a log-log plot of the local error against the sampling frequency N.
Figure 5: Log-log plot of local error vs. sampling frequency N for the Clenshaw-Curtis method (linear fit: y = −0.61x − 1.7). Although the rate of convergence is not particularly fast, only small values of N are needed for accurate results.
3.2 ODE Formulation
If the curvature values for the polytene curve are not piecewise constant, then it is difficult to compute an exact solution for the Frenet frame. In this scenario, it is better to couple the Frenet-Serret ODE system with the differential-equation form of the Fourier Transform, as in (3).
The mode of oscillation can vary significantly depending on the choice of wave number k, and so a variable-step-size numerical solver should work well. A popular choice is the Dormand-Prince Runge-Kutta 4(5) method, which uses the difference between a 4th-order and a 5th-order calculation to control the step size. In Matlab, it is implemented in the popular function ode45. If a fixed-step-size solver is desired, a 4th-order Runge-Kutta solver can be used, although for rapidly oscillating integrands a restrictive step size may be necessary, depending on accuracy constraints.
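For reference, a classical fixed-step 4th-order Runge-Kutta stepper is sketched below (a generic illustration, not tied to our code); it can be applied to the coupled right-hand side from the sketch in Section 2.1, while the adaptive Dormand-Prince alternative corresponds to solve_ivp(..., method="RK45") in SciPy or ode45 in Matlab.
\begin{verbatim}
import numpy as np

def rk4_integrate(rhs, y0, s0, s1, n_steps):
    """Classical fixed-step 4th-order Runge-Kutta integration of y' = rhs(s, y).

    The fixed step h must resolve the fastest oscillation of the integrand; for
    large |k| this is typically far more restrictive than an adaptive RK45 step.
    """
    h = (s1 - s0) / n_steps
    s, y = s0, np.asarray(y0, dtype=float)
    for _ in range(n_steps):
        k1 = np.asarray(rhs(s, y))
        k2 = np.asarray(rhs(s + 0.5 * h, y + 0.5 * h * k1))
        k3 = np.asarray(rhs(s + 0.5 * h, y + 0.5 * h * k2))
        k4 = np.asarray(rhs(s + h, y + h * k3))
        y = y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        s += h
    return y
\end{verbatim}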
4 Preliminary Results
As can be seen in Figure 6, the preliminary results are promising (the difference in brightness can be attributed to a difference in scale), although there are still some issues. Occasionally, the initial algorithm used to determine curvature profiles and the general path was inaccurate; this is seen in the third image from the top in Figure 6. In the future, a curvature-based algorithm that determines the direction of the next neighboring point should improve upon this. Also, at various points there is a ringing phenomenon in which the brightness intensifies. This data set did not use the Clenshaw-Curtis method and instead used a simple quadrature method, which may have contributed to the ringing; the way the simulated image is normalized may also have contributed to this problem.

Figure 6: Simulated vs. real images for various slices of chromosome.

Figure 7: Polytene map of α(s) and the corresponding helical arc. Calculations were done on a piece at low resolution, yet the results are still quite good.
Figure 7 shows the distribution of α(s) for a small section. Although taken at low resolution (a 100 x 100 pixel image), the variation in the data is still relatively large, which allows for easier identification of allele patterns.
5 Parameter Optimization Procedure
Simulating an image so that it can be compared to the original is a good verification tool for the Gaussian filament model. However, the simulated image should also be usable to improve upon initial estimates of parameters such as α(s). Since one of the primary goals is to have an accurate representation of α(s), an algorithm that causes the simulated image to converge to the real image would be very helpful. Also, in other applications, curvature may be an important parameter, as it could potentially be related to gene expression. Thus, a general way to improve upon initial estimates would lead to more accurate results without the need for more data.
Suppose there is only one constant parameter that needs to be adjusted. A simple approach to improving the initial parameter value is to slightly decrease and slightly increase the parameter, simulate the image for each case, and keep whichever simulation has the lower error (calculated using the RMSD), since it should be the better approximation. This brute-force algorithm, sketched below, would work fairly well for such a simple case. However, if the parameters that need to be modified are functions defined on the grid, then the number of potential directions of change grows exponentially. Therefore, it is important to find a way to determine the optimal direction in parameter space.
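The brute-force idea can be sketched as follows (a hypothetical illustration; simulate_image stands in for the simulation pipeline of Section 3 and is not a function defined in this work):
\begin{verbatim}
import numpy as np

def rmsd(simulated, real):
    """Root-mean-square deviation between the simulated and real images."""
    return np.sqrt(np.mean((simulated - real) ** 2))

def refine_parameter(value, simulate_image, real_image, step=0.05, n_iters=20):
    """Brute-force refinement of a single constant parameter.

    At each iteration the parameter is nudged down and up; whichever perturbation
    lowers the RMSD against the real image is kept, otherwise the step shrinks.
    """
    best_err = rmsd(simulate_image(value), real_image)
    for _ in range(n_iters):
        candidates = [value * (1.0 - step), value * (1.0 + step)]
        errs = [rmsd(simulate_image(c), real_image) for c in candidates]
        i = int(np.argmin(errs))
        if errs[i] < best_err:
            value, best_err = candidates[i], errs[i]
        else:
            step *= 0.5                       # no improvement: shrink the perturbation
    return value
\end{verbatim}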
Let S(x, y) be the simulated image, and let I(x, y) be the real image data. Let the local
squared error be:
\[
R(x, y) = \bigl[I(x, y) - S(x, y)\bigr]^2. \tag{10}
\]
This can also be defined along arc length, since the rest of the image is irrelevant:
\[
R(s) = R(x(s), y(s)) = \bigl[I(x(s), y(s)) - S(x(s), y(s))\bigr]^2. \tag{11}
\]
In truth, R(s) = R(s; f(s)), where f(s) is a vector of parameters that need to be optimized. For instance, if one were to optimize all the parameters, then f(s) = (α(s), β(s), κ(s), τ(s)). Since we want R(s; f(s)) to be minimized, we would like to find f(s) such that ∂R/∂f = 0. Assuming that ∂f/∂s is non-vanishing, so that the inverse function theorem applies, we get:
\[
\frac{\partial s}{\partial f} = \left[\left(\frac{\partial f}{\partial s}\right)^{T}\frac{\partial f}{\partial s}\right]^{-1}\frac{\partial f}{\partial s}.
\]
Note that this formulation is essentially taking the pseudo-inverse of ∂f/∂s. Using gradient descent, we get the PDE:
\[
\frac{df}{dt} = -\frac{\partial R}{\partial s}\left(\frac{\partial f}{\partial s}\right)^{-2}\frac{\partial f}{\partial s}. \tag{12}
\]
Also, one could consider the full variational setting. Suppose f(s) = α(s), and let E[α] be the full error functional. That is,
\[
E[\alpha] = \int_0^L R(s; \alpha(s))\, ds. \tag{13}
\]
Then, by the Euler-Lagrange equations, we know that E[α] is minimized when
\[
\frac{\partial R}{\partial \alpha} - \frac{d}{ds}\,\frac{\partial R}{\alpha''(s)} = 0. \tag{14}
\]
The second term can be simplified to:
\[
\frac{d}{ds}\,\frac{\partial R}{\alpha''(s)} = \frac{\alpha''(s)\,R''(s) - R'(s)\,\alpha'''(s)}{[\alpha''(s)]^2}. \tag{15}
\]
Therefore, using gradient descent, we have a new PDE that minimizes the error functional
as a whole by incorporating local variation within the image. This PDE is:
\[
\frac{d\alpha}{dt} = -\frac{\partial R}{\partial s}\left(\frac{\partial \alpha}{\partial s}\right)^{-2}\frac{\partial \alpha}{\partial s} + \frac{\alpha''(s)\,R''(s) - R'(s)\,\alpha'''(s)}{[\alpha''(s)]^2}. \tag{16}
\]
Both of these PDEs can be used to improve upon initial estimates of important parameters. To do so, one could take one time step of the PDE, redo the simulation of the image with the new estimates, and repeat the process until satisfied. Although (16) should be more precise than (12), it is also much more complicated to discretize. However, (16) has the distinct advantage that if ∂α/∂s vanishes, the second term will offset it and keep the algorithm from failing. Numerical tests are still needed to verify this.
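One possible discretization of this procedure, using the simpler PDE (12) with f = α, is sketched below. It is only an illustration of the update loop described above: the s-derivatives are taken by finite differences, simulate_profile is a hypothetical hook returning the simulated intensity along the curve, and a small eps guards against a vanishing derivative of α.
\begin{verbatim}
import numpy as np

def refine_alpha(alpha, s, simulate_profile, real_profile,
                 dt=1e-3, n_iters=50, eps=1e-8):
    """Explicit gradient-descent time stepping of (12) for the band profile alpha(s).

    alpha and real_profile are sampled on the arc-length grid s; each iteration
    re-simulates the image along the curve and takes one Euler step of (12).
    """
    for _ in range(n_iters):
        sim_profile = simulate_profile(alpha)
        R = (real_profile - sim_profile) ** 2          # local squared error (11)
        dR_ds = np.gradient(R, s)                      # finite-difference dR/ds
        dalpha_ds = np.gradient(alpha, s)              # finite-difference d(alpha)/ds
        # One explicit Euler step of (12); eps regularizes a vanishing derivative.
        alpha = alpha - dt * dR_ds * dalpha_ds / (dalpha_ds ** 2 + eps)
    return alpha
\end{verbatim}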
6 Discussion
In this paper, we have outlined methods used to represent polytene chromosomes from image data. This work will then be used to examine the relationship between geometry and gene expression. Tests can now be conducted to determine if, for instance, gene expression is more prevalent in the center of the nucleus than at the periphery. Also, the relationship between curvature and gene expression can be examined, and we can determine whether allele-to-allele interactions play a role. In addition to this statistical analysis, there is more work to be done.
Improvements can be made in the parameter estimation process. Since the image data is not particularly noisy, and high-resolution images have fine edges, edge-detection algorithms can aid in determining the curvature profiles and the points through which the space curve must pass. Also, we need to determine a more accurate way to estimate the variance within the tube, the parameter β. Finally, a stable numerical scheme needs to be developed in order to implement the parameter optimization procedure. This would allow for accurate calculations that would help answer these important questions in chromosome geometry.
References
[1] U.M. Ascher and L.R. Petzold. Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations. Society for Industrial and Applied Mathematics, 1998.
[2] C.W. Clenshaw and A.R. Curtis. A method for numerical integration on an automatic computer. Numerische Mathematik, 2:197–205, 1960.
[3] Andy Hausrath and Alain Goriely. Continuous representations of proteins: Construction of coordinate models from curvature profiles. Journal of Structural Biology, November 2006.
[4] Andy Hausrath and Alain Goriely. The Fourier transforms of curves and filaments and their application to low-resolution protein crystallography. Journal of Applied Crystallography, 42, 2009.
[5] T. Havie. On a modification of the Clenshaw-Curtis quadrature formula. BIT (Nordisk Tidskrift for Informationsbehandling), 9:338–350, 1969.
[6] Tom Misteli. Self-organization in the genome. PNAS, 106:6885–6886, April 2009.
[7] Lloyd N. Trefethen. Is Gauss quadrature better than Clenshaw-Curtis? SIAM Review, 50(1):67–87, 2008.
[8] Livia Zarnescu, Alain Goriely, Gio Bosco, Andy Hausrath, and Aalok Shah. The genome in three dimensions: simulation of confocal images of polytene nuclei, 2010.
7 Acknowledgements
Much of this work was done in collaboration with Dr. Andrew Hausrath of the Department of Biochemistry at the University of Arizona, and it is part of an ongoing project of his.