Filip ILNICKI (1), Marcin IWANOWSKI (2, 3)
1. Warsaw University of Technology, Institute of Theory of Electrical Engineering, Measurement and Information Systems
2. Warsaw University of Technology, Institute of Control and Industrial Electronics
3. EC Joint Research Centre, Institute of Environment and Sustainability
Analysis of the DNA microarray hybridization images using
morphological image processing
Streszczenie: W artykule przedstawiono użyteczność morfologicznego przetwarzania obrazów w analizie obrazów hybrydyzacji mikromacierzy DNA. Zostanie zaproponowany wykorzystujący morfologiczną kowariancję i rekonstrukcję morfologiczną algorytm automatycznego pozycjonowania spotów. Przedstawiony algorytm umożliwia określenie w łatwy sposób dwóch ważnych parametrów obrazu hybrydyzacji: średniej odległości między spotami oraz średniej wartości erozji tła.
Abstract: This paper presents the usefulness of morphological image processing in the analysis of DNA microarray hybridization pictures. An algorithm for automatic spot positioning based on the morphological covariance and morphological reconstruction is proposed. The algorithm makes it possible to easily determine two important parameters of the microarray image: the mean distance between spots and the mean value of the background erosions.
Key words: DNA arrays, image processing, mathematical morphology
1. Introduction
This paper presents the usefulness and application of morphological image processing [6,7,8,9] in the analysis of DNA hybridization pictures. The method proposed in the paper aims at detecting the spots in the image of a DNA microarray.
A critical stage in the analysis of DNA array hybridization pictures is determining the positions of the spots. Typical hybridization pictures contain, besides the desirable objects (spots), some amount of noise. Given the need to eliminate the noise and to perform the analysis in a reasonable amount of time, determining the positions of the spots is not a trivial problem. The literature offers many approaches to this issue.
Generally, the proposed solutions fall into two categories: semi-automatic and fully automatic. In semi-automatic algorithms the user is asked for some parameters of the tested picture (number of spots, position of a spot in a group, distance between spots, etc.) before spot positioning and further analysis are performed. This solution is frequently used in commercial applications, where it fulfills expectations; nevertheless, it stimulates engineers and scientists to create algorithms with fully automatic positioning. An application of mathematical morphology to the fully automatic analysis of DNA microarrays was presented in [1]. In that article J. Angulo and J. Serra introduced a spot size distribution law (granulometry dealing with families of openings by reconstruction) and, based on it, a method for morphological filtering by area extinction value.
The method proposed in this article is also fully automatic. Its first step relies on the morphological covariance, that is, on morphological erosions of increasing size. The erosions are performed with structuring elements consisting of a pair of points. By measuring the volume of the eroded image as a function of the distance between the elements of the pair, the distance between spots is estimated. The second step relies on spot extraction by means of morphological reconstruction.
The proposed method successfully deals with two types of common problems existing in fully automatic positioning: efficiency of spot recognition and error elimination, as well as its computational effectiveness. The importance of the latter is due to the fact that DNA microarray images are large, which makes the computation relatively costly.
2. DNA arrays
DNA (deoxyribonucleic acid) exists in the nucleus of living cells. DNA is a double-stranded polymer composed of four basic molecular units called nucleotides. Each nucleotide comprises a phosphate group, a deoxyribose sugar, and one of four nitrogen bases: the purines adenine (A) and guanine (G), or the pyrimidines cytosine (C) and thymine (T). Base pairing of the DNA strands occurs according to the Watson-Crick rules: C pairs with G, and A pairs with T (see fig. 1).
Fig. 1. DNA double strand
A DNA microarray (BioChip) is a microscope slide which contains thousands of types of single DNA strands (organized in spots) with a carefully selected and strictly specified (known) structure [4]. A single DNA array may contain many thousands of spots. The main idea of a microarray experiment relies on the hybridization of the known DNA strands from the BioChip with an unknown, fluorescently tagged nucleic acid sample in solution. After hybridization, the microarray is scanned using specialized scanners. The obtained hybridization picture is then analyzed. It is worth mentioning that the most common type of BioChip (called cDNA) makes it possible to use a target marked with a mixture of two types of fluorescent dye, which correspond to the 635 nm (Cy5) and 532 nm (Cy3) wavelengths, respectively. So for this type of microarray it is necessary to store two pictures per experiment, one for each wavelength (for more details see [4]).
One of the most common scanners for DNA microarrays, the GenePix 4000, together with its data analysis software GenePix Pro v. 6.0, by default encodes the hybridization picture as high-resolution 16-bit grey-scale multi-image TIFF files, which ensures the best quality for the analysis. In some applications, however, even 8-bit grey-scale pictures fulfill the requirements. Fig. 2 presents a fragment of a DNA microarray hybridization picture.
Fig. 2. Fragment of a DNA microarray hybridization picture
3. Morphological covariance
The principal morphological tool used in this paper is the morphological covariance [7,9]. It is based on computing a function of the volume of consecutive images obtained by a series of erosions by a pair of points separated by an increasing distance. The structuring element in this case is a set of two points: the central point and a second one located at a given distance from the center. Such an erosion can be defined as:
(1)  $\varepsilon_{P(q)}(f)[p] = \min\{f(p),\, f(p+q)\}$
where P(q) stands for the structuring element described above and q is the vector of coordinates of the second point. The horizontal and vertical versions of the erosion are defined as, respectively:
(2)  $\varepsilon_{H(n)}(f) = \varepsilon_{P([n,0])}(f)$
(3)  $\varepsilon_{V(n)}(f) = \varepsilon_{P([0,n])}(f)$
Parameter n will later be referred to as the size of the structuring element.
In order to get the measurement of an image, two values are computed, the volume and the mean value, from the following equations, respectively:
(4)  $\mathrm{Vol}(f) = \sum_{i=0}^{x_{max}} \sum_{j=0}^{y_{max}} f(i,j)$
(5)  $\mathrm{Mean}(f) = \dfrac{\mathrm{Vol}(f)}{x_{max} \cdot y_{max}}$
where x_max and y_max stand for the maximal image sizes along the x and y axes, respectively.
In order to compute the morphological covariance, the above measurements are calculated for images obtained as a result of horizontal/vertical erosion of increasing size:
(6)  $\Phi_H(n) = \mathrm{Mean}(\varepsilon_{H(n)}(f))$
(7)  $\Phi'_H(n) = \mathrm{Vol}(\varepsilon_{H(n)}(f))$
(8)  $\Phi_V(n) = \mathrm{Mean}(\varepsilon_{V(n)}(f))$
(9)  $\Phi'_V(n) = \mathrm{Vol}(\varepsilon_{V(n)}(f))$
Fig. 3 shows the function described by Eq. 6 computed for the image presented in fig. 2.
Fig. 3. Morphological covariance function calculated for the picture from fig. 2 (vertical axis: mean value of the erosions; horizontal axis: size of the structuring element; annotation in the plot: mean value of the background erosions).
Functions from Eq. 7 and 9 can be calculated not only for the initial image but also for binarized input images. An example of such a function is shown in fig. 4. In this case the binarization threshold for the image shown in fig. 2 was set to 70. The function was calculated using Eq. 7.
Fig. 4. Morphological covariance function calculated for the binarized picture from fig. 2 (vertical axis: sum of the erosions; horizontal axis: size of the structuring element; annotation in the plot: D_x, mean distance between spots).
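The erosion by a pair of points (Eq. 1 to 3), the two measurements (Eq. 4 and 5) and the covariance functions (Eq. 6 to 9) can be illustrated with a minimal sketch in plain Python. The function names are hypothetical, the image is assumed to be a list of rows of grey values, and two conventions are assumptions of the sketch rather than statements of the paper: pixels whose partner p + q falls outside the image are skipped, and the binarization sets a pixel to 1 when it reaches the threshold.

```python
def erode_pair(image, dx, dy):
    """Erosion by a pair of points (Eq. 1): min of f(p) and f(p + q), q = (dx, dy).

    Assumption: pixels whose partner p + q lies outside the image are skipped,
    so the result is dx columns narrower and dy rows shorter than the input.
    """
    rows, cols = len(image), len(image[0])
    return [
        [min(image[y][x], image[y + dy][x + dx]) for x in range(cols - dx)]
        for y in range(rows - dy)
    ]


def volume(image):
    """Vol(f) of Eq. 4: the sum of all pixel values."""
    return sum(sum(row) for row in image)


def mean(image):
    """Mean(f) of Eq. 5, taken here over the pixels of the (cropped) image."""
    return volume(image) / (len(image) * len(image[0]))


def binarize(image, threshold):
    """Binary image: 1 where the pixel reaches the threshold (assumed convention)."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]


def covariance(image, max_size, horizontal=True, measure=mean):
    """Covariance function for n = 1 .. max_size (max_size below the image size).

    measure=mean gives Eq. 6 / Eq. 8, measure=volume gives Eq. 7 / Eq. 9.
    """
    sizes = range(1, max_size + 1)
    if horizontal:
        return [measure(erode_pair(image, n, 0)) for n in sizes]
    return [measure(erode_pair(image, 0, n)) for n in sizes]
```

For a synthetic image with one-pixel spots every 4 pixels on a flat background, `covariance(image, 6)` stays at the background level except at n = 4, where the two points of the structuring element fall on neighbouring spots at the same time.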
Two important parameters can be read from the functions defined by Eq. 6-9: the mean distance between spots (MDBS) and the mean value of the background erosions (MVBE). Both parameters will be used later in the proposed algorithm.
The mean distance between spots is the size of the structuring element at which the analyzed function reaches its maximum. For the x and y directions it can be computed using the following equations:
(10)  $D_x = \arg\max_{n} \{\Phi_H(n)\}$
(11)  $D_y = \arg\max_{n} \{\Phi_V(n)\}$
The mean value of the background erosions is the minimum of the mean-value function:
(12)  $MVBE = \min_{n} \{\Phi_H(n)\}$
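Reading the two parameters from a sampled covariance function can be sketched as follows. The function names are hypothetical, and `phi` is assumed to hold the values of Eq. 6 (or Eq. 8) for n = 1, 2, 3, and so on. One interpretation is an assumption of the sketch: Eq. 10 and 11 are printed as a plain arg max, but for small n both points of the structuring element can still lie inside a single spot, so the sketch takes the first peak that follows the initial decline of the curve.

```python
def first_peak_after_decline(phi):
    """Size n (1-based) of the first local maximum after the initial decline.

    Returns None when the curve never rises again, i.e. no peak was reached.
    """
    i = 0
    # Walk down the initial slope to the first local minimum.
    while i + 1 < len(phi) and phi[i + 1] <= phi[i]:
        i += 1
    if i == len(phi) - 1:
        return None
    # Walk up to the local maximum that follows.
    while i + 1 < len(phi) and phi[i + 1] >= phi[i]:
        i += 1
    return i + 1  # phi[0] corresponds to n = 1


def background_level(phi):
    """MVBE of Eq. 12: the minimum of the mean-value covariance function."""
    return min(phi)
```

Applied to the horizontal function this gives an estimate of D_x, and applied to the vertical one an estimate of D_y.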
4. The algorithm
The spots in the analyzed image are located at the nodes of a rectangular grid.
In the first step of the algorithm, the morphological covariance of the initial image is computed. From the covariance function the following parameters are obtained: the mean distance between spots (MDBS) along the x and y axes (Dx and Dy) and the mean value of the background erosions (MVBE).
In the second step the mean background value MVBE is subtracted from the image. This subtraction removes most of the background.
In the third step the spots are extracted from the input image, while all other areas of high gray value (noise) are removed. To achieve this goal, first a supplementary image with spot markers is generated. This image contains the pixels located at the nodes of a grid artificially created using the Dx and Dy values. It is produced according to the following rule:
(13)  $m(i,j) = \begin{cases} max & \text{iff } i = k \cdot D_x \wedge j = l \cdot D_y \\ 0 & \text{otherwise} \end{cases}$
where max stands for the maximum value of an image pixel, and k and l are any integer values.
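A minimal sketch of the third step in plain Python: it builds the marker image of Eq. 13 and then applies the dilation, pointwise minimum and reconstruction used for the spot extraction (fig. 5b to 5e). All function names are hypothetical. The sketch assumes a square structuring element and 8-connected propagation instead of the hexagonal grid used in the paper, an image stored as a list of rows, and a background that has already been subtracted.

```python
def marker_grid(rows, cols, dx, dy, max_value=255):
    """Spot markers of Eq. 13: max_value at the nodes (k*dx, l*dy), 0 elsewhere."""
    return [
        [max_value if (x % dx == 0 and y % dy == 0) else 0 for x in range(cols)]
        for y in range(rows)
    ]


def dilate(image, radius):
    """Grey-level dilation by a (2*radius + 1) square (assumed element shape)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(
                image[v][u]
                for v in range(max(0, y - radius), min(rows, y + radius + 1))
                for u in range(max(0, x - radius), min(cols, x + radius + 1))
            )
            for x in range(cols)
        ]
        for y in range(rows)
    ]


def pointwise_min(a, b):
    """Pixel-by-pixel minimum (infimum) of two images of equal size."""
    return [[min(p, q) for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def reconstruct(marker, mask):
    """Reconstruction by dilation: grow the marker under the mask until stable."""
    current = pointwise_min(marker, mask)
    while True:
        grown = pointwise_min(dilate(current, 1), mask)
        if grown == current:
            return current
        current = grown


def extract_spots(image, dx, dy, spot_radius, max_value=255):
    """Keep the bright regions touched by a dilated grid marker, flatten the rest."""
    markers = marker_grid(len(image), len(image[0]), dx, dy, max_value)
    seeds = pointwise_min(image, dilate(markers, spot_radius))
    return reconstruct(seeds, image)
```

A bright region that no dilated marker touches is reduced to the level of its surroundings, while a region seeded by a marker is restored to its full grey values. The iteration above is written for readability only; on full-size microarray images a queue-based reconstruction or a library routine would be the practical choice.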
The spot markers are used in the next step to extract the spots from the input image. The principal tool for this extraction is the morphological reconstruction [5]. This operation transforms an input image (called the marker image) in such a way that some part of it is removed while the rest is left untouched. The phases of the spot extraction process are shown in fig. 5. The process is controlled by an additional image, the mask. In our case the mask is equal to the input image (fig. 5a). The marker image for the reconstruction is obtained from the image with spot markers according to the following rules. First, the spot marker image (fig. 5b) is dilated in the hexagonal grid with an elementary structuring element of size equal to half of the spot diameter estimated a priori (result of the dilation: fig. 5c). Finally, the marker image for the reconstruction is obtained as the pointwise minimum of the input image and the dilated spot markers (fig. 5d). Having computed the marker image, the principal operation of reconstruction is executed. As a result one gets the image without the light areas located between spots. Comparing fig. 5a with fig. 5e shows the result of the noise filtering: after the reconstruction only the proper spots are present.
Fig. 5. Process of spot extraction: initial image (a), spot markers (b), dilated spot markers (c), infimum (d) and result of reconstruction (e).
5. Algorithm acceleration
The main problem with applying the algorithm presented in this article is the long processing time of the images.
Two methods that significantly reduce this time are presented below. The first method limits the analysis to the first maximum of the erosion histogram. As mentioned earlier, the morphological covariance function is obtained for increasing sizes of the structuring element, so theoretically it is possible to perform the analysis up to a structuring element equal to the width (or height) of the picture. However, to obtain the information necessary for the presented algorithms (the mean distance between spots, or the mean value of the background, computed as the minimum of the functions from Eq. 6 or 8), the size of the structuring element may be limited to the size at which the first maximum of the covariance function occurs. Fig. 6 presents an example showing that limiting the size of the structuring element may considerably decrease the time of the analysis.
Fig. 6. Morphological covariance function generated for an unnecessarily long structuring element. The vertical line marks the largest size of the structuring element needed by the presented algorithms (vertical axis: mean value of the erosions; horizontal axis: size of the structuring element).
Significant time savings may also be obtained by performing the analysis not for all points of the picture. Based on this assumption, the second method of time reduction relies on generating the covariance function for randomly chosen points of the picture. Figures 7 and 8 present the covariance functions calculated for fig. 2 using, respectively, 50% and 10% randomly chosen points from each line.
The examples shown in fig. 7 and 8 demonstrate some distortion compared to the histogram shown in fig. 3; however, all the important parameters are still present and relatively easy to read. This method was tested on about 5 different hybridization images. In most cases, obtaining a fair quality of analysis requires choosing about 10% of the points of the picture (this amount may be higher for small images).
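Both acceleration methods can be combined in one short sketch for the horizontal direction. The function names, the `fraction` and `seed` parameters and the stopping rule are assumptions made for illustration; in particular, the paper does not say whether the random points are redrawn for each size of the structuring element, and this sketch redraws them for every line and every size.

```python
import random


def pair_mean(image, n, fraction=1.0, rng=None):
    """Mean of min(f(p), f(p + [n, 0])) over all, or a random fraction of,
    the valid points of each line (n must be smaller than the line length)."""
    if rng is None:
        rng = random.Random()
    total, count = 0, 0
    for row in image:
        positions = range(len(row) - n)
        if fraction < 1.0:
            # Second method: analyse only randomly chosen points of the line.
            k = max(1, round(fraction * len(positions)))
            positions = rng.sample(positions, k)
        for x in positions:
            total += min(row[x], row[x + n])
            count += 1
    return total / count


def covariance_to_first_peak(image, max_size, fraction=1.0, seed=0):
    """Horizontal covariance values for n = 1, 2, ... that stop growing the
    structuring element once the first maximum has been passed (first method)."""
    rng = random.Random(seed)
    values, rising = [], False
    for n in range(1, max_size + 1):
        values.append(pair_mean(image, n, fraction, rng))
        if len(values) > 1:
            if values[-1] > values[-2]:
                rising = True
            elif rising and values[-1] < values[-2]:
                break  # the curve dropped after a rise: first maximum passed
    return values
```

With a small `fraction` the sampled curve is noisy, so the simple stopping rule above can trigger on a random fluctuation; smoothing the values or requiring a minimum rise before stopping would make it more robust.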
Fig. 7. Covariance function with 50% of the randomly chosen points from each line (vertical axis: mean value of the erosions; horizontal axis: size of the structuring element).
Fig. 8. Covariance function with 10% of the randomly chosen points from each line (vertical axis: mean value of the erosions; horizontal axis: size of the structuring element).
6. Other problems and questions
The algorithm presented above can also be applied to background correction. The background of a typical hybridization picture is usually neither constant nor uniform. To obtain accurate results from a DNA microarray experiment, it is necessary to eliminate the influence of the local background on the overall brightness level of the spot. The currently applied solutions to the background correction problem rely on calculating the mean or median value of the background over variously defined areas of the hybridization picture.
The subject presented in this article may lead to another approach to the background correction problem: the MVBE calculated for a small area of the image can be used as a background adjustment value.
There are also two other problems related to the analyzed subject which have not been solved by the proposed method. The first problem is the distortion and skew of the spot lines in the hybridization pictures. Direct application of the presented algorithm may fail on images where the spot positions on the array are somehow distorted. This is usually caused by imperfect positioning of the printing head. This problem is relatively easy to eliminate in oligonucleotide microarrays, because the spots are built directly on the surface of the glass slide with a process involving photochemistry (not mechanically printed).
The second problem is the spot positioning algorithm for more than one spot group. Spots on the microarray are organized in spot groups. Distances between neighboring spots which belong to different groups are greater than the distances within one spot group, so there can be a problem with direct application of the algorithm to more than one spot group.
The solutions to the above problems are open questions and the subject of the authors' further research.
7. Concluding remarks
In this article, an analysis of hybridization images based on the morphological covariance function was introduced, and its implementation for fully automatic spot positioning was proposed. The presented analysis was tested on 5 different hybridization pictures and its performance was satisfactory even in the presence of a high level of noise in the images. As mentioned earlier, the biggest problem was the long time of the analysis; however, efficient improvement methods were also proposed.
REFERENCES
[1] Angulo J., Serra J., Automatic analysis of DNA microarray images using mathematical morphology, Bioinformatics, vol. 19, no. 5, 2003, pages 553-562
[2] Dudoit S., Gentleman R., Irizarry R., Yang Y. H., Introduction to DNA microarray technologies, http://www.bioconductor.org/workshops/WyethCourse101702/
[3] Dudoit S., Gentleman R., Introduction to genome biology, http://www.bioconductor.org/workshops/WyethCourse101702/
[4] Ilnicki F., BioChip – Narzędzie do analizy DNA, Przegląd Elektrotechniczny, 4, 2004, pages 277-282
[5] Iwanowski M., Automatic car number plate detection using morphological image processing, Przegląd Elektrotechniczny, this issue
[6] Nieniewski M., Morfologia matematyczna w przetwarzaniu obrazów, PLJ, Warszawa, 1998
[7] Serra J., Image analysis and mathematical morphology, vol. 1, Academic Press, 1983
[8] Serra J., Image analysis and mathematical morphology, vol. 2, Academic Press, 1988
[9] Soille P., Morphological image analysis, Springer Verlag, 2002
[10] Yang Y. H., Buckley M., Dudoit S., Speed T., Comparison of methods for image analysis on DNA microarray data, University of California, Berkeley, Technical Report #584 (2002)
Authors:
Filip Ilnicki M.Sc., Politechnika Warszawska, Instytut Elektrotechniki
Teoretycznej i Systemów Informacyjno - Pomiarowych,
ul.Koszykowa 75, 00-661 Warszawa, Poland, e-mail: [email protected]
Marcin Iwanowski, Ph.D., 1) Politechnika Warszawska, Instytut
Sterowania i Elektroniki Przemysłowej, ul.Koszykowa 75, 00-662
Warszawa, Poland, e-mail:[email protected] 2) EC Joint
Research Centre, Institute of Environment and Sustainability, Via
Enrico Fermi 1, T.P. 262, 21020 Ispra (VA), Italy