Filip ILNICKI (1), Marcin IWANOWSKI (2, 3)

1. Warsaw University of Technology, Institute of Theory of Electrical Engineering, Measurement and Information Systems
2. Warsaw University of Technology, Institute of Control and Industrial Electronics
3. EC Joint Research Centre, Institute of Environment and Sustainability

Analysis of the DNA microarray hybridization images using morphological image processing

Abstract: This paper presents the usefulness of morphological image processing in the analysis of DNA microarray hybridization pictures. An algorithm for automatic spot positioning based on morphological covariance and morphological reconstruction is proposed. The algorithm makes it easy to determine two important parameters of the microarray image: the mean distance between spots and the mean value of the background erosions.

Key words: DNA arrays, image processing, mathematical morphology

1. Introduction

This paper presents the usefulness and application of morphological image processing [6,7,8,9] in the analysis of DNA hybridization pictures. The proposed method aims at detecting the spots in the image of a DNA microarray. A critical stage in the analysis of DNA array hybridization pictures is determining the positions of the spots. Typical hybridization pictures contain, besides the desired objects (spots), some amount of noise. Given the need to eliminate this noise and to perform the analysis in a reasonable amount of time, determining the positions of the spots is not a trivial problem.

The literature offers many approaches to this issue. Generally, the proposed solutions fall into two categories: semi-automatic and fully automatic. In semi-automatic algorithms, the user is asked for some parameters of the tested picture (number of spots, position of a spot in its group, distance between spots, etc.) before spot positioning and further analysis are performed. Although this solution is frequently used in commercial applications, where it fulfills expectations, it has motivated engineers and scientists to create algorithms with fully automatic positioning. An application of mathematical morphology to the fully automatic analysis of DNA microarrays was presented in [1]. In that article J. Angulo and J. Serra introduced a spot size distribution law (a granulometry dealing with families of openings by reconstruction) and, based on it, a method of morphological filtering by area extinction value.

The method proposed in this article is also fully automatic. Its first step relies on morphological covariance, that is, on morphological erosions of increasing size. The erosions are performed with structuring elements consisting of a pair of points. By measuring the volume of the eroded image as a function of the distance between the two points of the pair, the distance between spots is estimated. The second step relies on spot extraction by means of morphological reconstruction.

2. DNA arrays

DNA (deoxyribonucleic acid) exists in the nucleus of living cells. DNA is a double-stranded polymer composed of four basic molecular units called nucleotides. Each nucleotide comprises a phosphate group, a deoxyribose sugar, and one of four nitrogen bases: the purines adenine (A) and guanine (G), or the pyrimidines cytosine (C) and thymine (T).
Base-pairing of the DNA strands occurs according to the Watson-Crick rules: C pairs with G, and A pairs with T (see fig. 1).

Fig. 1. DNA double strand

A DNA microarray (BioChip) is a microscope slide which contains thousands of types of single DNA strands (organized in spots) with a carefully selected and strictly specified (known) structure [4]. A single DNA array may contain many thousands of spots. The main idea of a microarray experiment relies on hybridization of the known DNA strands on the BioChip with an unknown, fluorescently tagged nucleic acid sample in solution. After hybridization, the microarray is scanned using specialized scanners, and the obtained hybridization picture is then analyzed. It is worth mentioning that the most common type of BioChip (called cDNA) makes it possible to use a target marked with a mixture of two fluorescent dyes, which correspond to the 635 nm (Cy5) and 532 nm (Cy3) wavelengths, respectively. In this type of microarray it is therefore necessary to store two pictures per experiment, one for each wavelength (for more details see [4]). One of the most common scanners for DNA microarrays, the GenePix 4000, with its data analysis software GenePix Pro v. 6.0, by default encodes the hybridization picture as high-resolution 16-bit grey-scale multi-image TIFF files, which ensures the best quality for the analysis. In some applications, however, even 8-bit grey-scale pictures fulfill the requirements. Fig. 2 presents a fragment of a DNA microarray hybridization picture.

Fig. 2. Fragment of DNA microarray hybridization picture

The proposed method successfully deals with two common problems of fully automatic positioning: the efficiency of spot recognition and error elimination, and computational effectiveness. The latter is important because DNA microarray images are large, which makes the computation relatively costly.

3. Morphological covariance

The principal morphological tool used in this paper is the morphological covariance [7,9]. It is based on computing a function of the volume of consecutive images obtained by a series of erosions by a pair of points with an increasing distance between them. The structuring element is in this case a set of two points: the central point and a second one located at a given distance from the center. Such an erosion can be defined as:

$$\varepsilon_{P(q)}(f)[p] = \min\{f(p),\, f(p+q)\} \qquad (1)$$

where P(q) stands for the structuring element described above and q is the vector of coordinates of the second point. The horizontal and vertical versions of the erosion are defined as, respectively:

$$\varepsilon_{H(n)}(f) = \varepsilon_{P([n,0])}(f) \qquad (2)$$

$$\varepsilon_{V(n)}(f) = \varepsilon_{P([0,n])}(f) \qquad (3)$$

Parameter n will later be referred to as the size of the structuring element.

The covariance functions evaluated in this paper are the following, with Mean and Vol being the image measurements of Eqs. 4 and 5:

$$\Phi_H(n) = \mathrm{Mean}(\varepsilon_{H(n)}(f)) \qquad (6)$$

$$\Phi'_H(n) = \mathrm{Vol}(\varepsilon_{H(n)}(f)) \qquad (7)$$

$$\Phi_V(n) = \mathrm{Mean}(\varepsilon_{V(n)}(f)) \qquad (8)$$

$$\Phi'_V(n) = \mathrm{Vol}(\varepsilon_{V(n)}(f)) \qquad (9)$$

Fig. 3 shows the function described by Eq. 6 (the mean value of the erosions) computed for the image presented in fig. 2.

Fig. 3. Morphological covariance function calculated for the picture from fig. 2 (horizontal axis: size of the structuring element, 1 to 105; vertical axis: mean value of the erosions, 65 to 70; annotated: mean value of the background erosions)

The functions from Eqs. 7 and 9 can be calculated not only for the initial image but also for binarized input images. An example of such a function is shown in fig. 4. In this case the binarization threshold for the image shown in fig. 2 was set to 70, and the function was calculated using Eq. 7.
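The two-point erosion of Eqs. 1-3 and the covariance function of Eq. 6 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names and the synthetic test image are hypothetical, and one assumption is made that the paper does not state, namely that pixels whose partner p + q falls outside the image are dropped, so the mean is taken over the overlapping region only.

```python
import numpy as np


def erode_pair(f, q):
    """Erosion by a pair of points (Eq. 1): min{f(p), f(p + q)}.

    q = (dx, dy) with dx, dy >= 0. Assumption (not from the paper): pixels
    whose partner p + q lies outside the image are dropped, so the result
    is smaller than the input by dx columns and dy rows.
    """
    dx, dy = q
    h, w = f.shape
    return np.minimum(f[:h - dy, :w - dx], f[dy:, dx:])


def covariance(f, n_max, horizontal=True):
    """Mean of the eroded image for n = 1..n_max (Eq. 6 or Eq. 8)."""
    values = []
    for n in range(1, n_max + 1):
        q = (n, 0) if horizontal else (0, n)
        values.append(erode_pair(f, q).mean())
    return np.array(values)


# Synthetic image: background 10, bright 2x2 spots on a grid of pitch 8.
img = np.full((32, 32), 10.0)
for y in range(0, 32, 8):
    for x in range(0, 32, 8):
        img[y:y + 2, x:x + 2] = 100.0

phi_h = covariance(img, n_max=12)
print(np.round(phi_h, 3))
```

On this synthetic image the largest value of `phi_h` falls at n = 8, the pitch of the spot grid, and the sizes at which no spot pair can overlap return the background level of 10.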
In order to measure an image, two values are computed, the volume and the mean value, from the following equations, respectively:

$$\mathrm{Vol}(f) = \sum_{i=0}^{x_{max}} \sum_{j=0}^{y_{max}} f(i,j) \qquad (4)$$

$$\mathrm{Mean}(f) = \frac{\mathrm{Vol}(f)}{x_{max} \cdot y_{max}} \qquad (5)$$

where x_max and y_max stand for the maximal image sizes along the x and y axes, respectively. In order to compute the morphological covariance, these measurements are calculated for images obtained by horizontal or vertical erosion of increasing size (Eqs. 6-9).

Fig. 4. Morphological covariance function calculated for the binarized picture from fig. 2 (horizontal axis: size of the structuring element, 1 to 105; vertical axis: sum of the erosions; annotated: D_x, mean distance between spots)

Two important parameters can be read from the functions defined by Eqs. 6-9: the mean distance between spots (MDBS) and the mean value of the background erosions (MVBE). Both parameters are used later in the proposed algorithm. The mean distance between spots is the size n at which the analyzed function reaches its maximum. For the x and y directions it can be computed using the following equations:

$$D_x = \arg\max_n \{\Phi_H(n)\} \qquad (10)$$

$$D_y = \arg\max_n \{\Phi_V(n)\} \qquad (11)$$

and for the background:

$$\mathrm{MVBE} = \min_n \{\Phi_H(n)\} \qquad (12)$$

4. The algorithm

Spots in the analyzed image are located at the nodes of a rectangular grid. In the first step of the algorithm, the morphological covariance of the initial image is computed. From the covariance function the distance between spots (MDBS) along the x and y axes (D_x and D_y) is measured, as well as the mean pixel value MPV.

In the second step the mean background value MVBE is subtracted from the image. This subtraction removes most of the background.

In the third step the spots are extracted from the input image, while all other areas of high gray value (noise) are removed. To achieve this goal, first a supplementary image with spot markers is generated.
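The third step as a whole can be sketched as follows: spot markers are placed at the nodes of a grid of pitch D_x by D_y (Eq. 13), the markers are dilated, the pointwise minimum with the input image gives the seed, and the seed is reconstructed under the input image used as the mask. This is a hedged NumPy sketch, not the authors' code. All function names are hypothetical, the grid origin is assumed to lie at pixel (0, 0), and a square structuring element on the ordinary pixel grid stands in for the hexagonal-grid element of the paper.

```python
import numpy as np


def marker_grid(shape, dx, dy, max_value):
    """Eq. 13: max_value at nodes (i = k*dx, j = l*dy), 0 elsewhere."""
    m = np.zeros(shape, dtype=float)
    m[::dy, ::dx] = max_value  # rows step by dy, columns by dx
    return m


def dilate(img, radius):
    """Flat dilation by a square of side 2*radius + 1.

    Assumption: the square replaces the paper's hexagonal element.
    """
    h, w = img.shape
    padded = np.pad(img, radius, mode="constant", constant_values=img.min())
    out = img.copy()
    for sy in range(2 * radius + 1):
        for sx in range(2 * radius + 1):
            out = np.maximum(out, padded[sy:sy + h, sx:sx + w])
    return out


def reconstruct(marker, mask):
    """Reconstruction by dilation: grow the marker under the mask
    with elementary dilations until nothing changes."""
    current = np.minimum(marker, mask)
    while True:
        grown = np.minimum(dilate(current, 1), mask)
        if np.array_equal(grown, current):
            return current
        current = grown


def extract_spots(image, dx, dy, spot_radius):
    """Marker grid -> dilation -> pointwise minimum -> reconstruction."""
    markers = marker_grid(image.shape, dx, dy, image.max())
    seeds = np.minimum(image, dilate(markers, spot_radius))
    return reconstruct(seeds, image)


# Synthetic example: background already subtracted (0), 2x2 spots at the
# grid nodes, and one bright noise blob lying between the spots.
image = np.zeros((24, 24))
for y in range(0, 24, 8):
    for x in range(0, 24, 8):
        image[y:y + 2, x:x + 2] = 100.0
clean = image.copy()
image[4:6, 4:6] = 80.0

result = extract_spots(image, dx=8, dy=8, spot_radius=1)
print(np.array_equal(result, clean))
```

In this example the noise blob is not covered by any dilated marker and is separated from the spots by background, so the reconstruction leaves it at the background level while the spots are restored in full. On real data an optimized routine would be preferable to the explicit loop used here.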
The spot-marker image contains the pixels located at the nodes of a grid artificially created using the D_x and D_y values. It is produced according to the following rule:

$$m(i,j) = \begin{cases} \max & \text{iff } i = k \cdot D_x \wedge j = l \cdot D_y \\ 0 & \text{otherwise} \end{cases} \qquad (13)$$

where max stands for the maximum pixel value of the image, and k and l are any integers.

The spot markers are used in the next step to extract the spots from the input image. The principal tool for this extraction is morphological reconstruction [5]. This operation transforms an input image (called the marker image) in such a way that some parts of it are removed while the others are left untouched. The phases of the spot extraction process are shown in fig. 5. The process is controlled by an additional image, the mask. In our case the mask is equal to the input image (fig. 5a). The marker image for the reconstruction is obtained from the image with spot markers as follows. First, the spot marker image (fig. 5b) is dilated in the hexagonal grid with an elementary structuring element of size equal to half of the spot diameter estimated a priori (result of dilation: fig. 5c). Then the marker image for reconstruction is obtained as the pointwise minimum of the input image and the dilated spot markers (fig. 5d). Having computed the marker image, the principal operation of reconstruction is executed. As a result one gets the image without the light areas located between spots. Comparing fig. 5a with fig. 5e shows the result of noise filtering: after the reconstruction only the proper spots are present.

Fig. 5. Process of spot extraction: initial image (a), spot markers (b), dilated spot markers (c), infimum (d) and result of reconstruction (e)

5. Algorithm acceleration

The main problem in applying the algorithm presented in this article is the long processing time of the images. Two methods that reduce this time considerably are presented below.

The first method relies on limiting the analysis to the first maximum of the erosion histogram. As mentioned earlier, the morphological covariance function is obtained for increasing sizes of the structuring element, so theoretically the analysis can be performed up to a structuring element equal to the width (or height) of the picture. However, to obtain the information necessary for the presented algorithm (the mean distance between spots or the mean intensity of the background), the size of the structuring element may be limited to the size at which the first maximum of the covariance function occurs. The mean value of the background, in particular, can be computed as the minimum value of the functions from Eq. 6 or 8 (Eq. 12). Fig. 6 presents an example showing that limiting the size of the structuring element may lead to a considerable decrease in the time of the analysis.

Fig. 6. Morphological covariance function generated for an unnecessarily long structuring element (horizontal axis: size of the structuring element, 1 to 1401; vertical axis: mean value of the erosions, 66 to 69). A vertical line marks the largest size of the structuring element vital for the presented algorithms.

Significant time savings may also be obtained by performing the analysis not for all points of the picture. Based on this assumption, the second method of time reduction relies on generating the covariance function for randomly chosen points of the picture. Figures 7 and 8 present the covariance functions calculated for fig. 2 using, respectively, 50% and 10% of randomly chosen points from each line. The examples in figs. 7 and 8 show some distortion compared to the function shown in fig. 3; however, all the important parameters are still present and relatively easy to read. This method was tested on about 5 different hybridization images. In most cases, to obtain a fair quality of the analysis it is necessary to choose about 10% of the points of the picture (this amount may be higher for small images).
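Both accelerations can be combined in one routine, sketched below under several assumptions that the paper does not specify: a fresh random sample of positions is drawn for every row and every size n, the "first maximum" is detected as the first value that is strictly larger than both of its neighbours, and pairs reaching outside the image are skipped. The function name and its parameters are hypothetical.

```python
import numpy as np


def fast_covariance(f, fraction=0.1, seed=0):
    """Horizontal covariance (Eq. 6) with the two shortcuts of Section 5.

    - Only a random `fraction` of the positions in each row is used.
    - n grows only until the first local maximum has been passed.
    Returns (values for n = 1..last n tried, D_x or None, MVBE).
    """
    rng = np.random.default_rng(seed)
    h, w = f.shape
    values = []
    d_x = None
    for n in range(1, w):
        valid = w - n  # positions x whose partner x + n is inside the row
        k = max(1, int(round(fraction * valid)))
        total = 0.0
        for row in f:
            xs = rng.choice(valid, size=k, replace=False)
            total += np.minimum(row[xs], row[xs + n]).sum()
        values.append(total / (k * h))
        # Stop as soon as the previous value turns out to be a local maximum.
        if len(values) >= 3 and values[-3] < values[-2] > values[-1]:
            d_x = n - 1
            break
    return np.array(values), d_x, min(values)


# Synthetic image: background 10, bright 2x2 spots on a grid of pitch 8.
img = np.full((32, 32), 10.0)
for y in range(0, 32, 8):
    for x in range(0, 32, 8):
        img[y:y + 2, x:x + 2] = 100.0

# fraction=1.0 uses every point; smaller fractions trade accuracy for speed.
values, d_x, mvbe = fast_covariance(img, fraction=1.0)
print(d_x, mvbe, len(values))
```

With every point used, the loop stops right after the first maximum instead of running up to the image width. With a small fraction the sampled function becomes noisy, so a simple neighbour test like this one may fire on a spurious peak; smoothing the values before the test is one possible remedy.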
Fig. 7. Covariance function with 50% of the randomly chosen points from each line (horizontal axis: size of the structuring element, 1 to 85; vertical axis: mean value of the erosions, 65 to 70)

Fig. 8. Covariance function with 10% of the randomly chosen points from each line (horizontal axis: size of the structuring element, 1 to 85; vertical axis: mean value of the erosions, 65 to 70)

6. Other problems and questions

The algorithm presented above can also be applied to background correction. The background of a typical hybridization picture is usually neither constant nor uniform. To obtain accurate results from a DNA microarray experiment, it is necessary to eliminate the influence of the local background on the overall brightness level of the spot. Currently applied solutions to the background correction problem rely on calculating the mean or median value of the background in variously defined areas of the hybridization picture. The approach presented in this article may lead to another solution: the MVBE calculated for a small area of the image can be used as a background adjustment value.

There are also two other problems related to the analysed subject which have not been solved by the proposed method. The first is distortion and skewness of the spot lines in the hybridization pictures. Direct application of the presented algorithm may fail on images where the spot positions on the array are distorted. This is usually caused by imperfect positioning of the printing head. The problem is relatively easy to eliminate in oligonucleotide microarrays, because their spots are built directly on the surface of the glass slide with a process involving photochemistry (not mechanically printed).

The second problem is spot positioning for more than one spot group. Spots on the microarray are organized in spot groups. The distances between neighboring spots which belong to different groups are greater than the distances within one spot group, so direct application of the algorithm to more than one spot group can be problematic. The solutions of the above problems are open questions and the subject of the authors' further research.

7. Concluding remarks

In this article, the analysis of hybridization images based on the morphological covariance function was introduced and its implementation for fully automatic spot positioning was proposed. The presented analysis was tested on 5 different hybridization pictures and its performance was satisfactory even in the presence of a high level of noise in the images. As mentioned earlier, the biggest problem was the long time of the analysis; however, efficient improvement methods were also proposed.

REFERENCES

[1] Angulo J., Serra J., Automatic analysis of DNA microarray images using mathematical morphology, Bioinformatics, Vol. 19, no. 5, 2003, pages 553-562
[2] Dudoit S., Gentleman R., Irizarry R., Yang Y. H., Introduction to DNA microarray technologies, http://www.bioconductor.org/workshops/WyethCourse101702/
[3] Dudoit S., Gentleman R., Introduction to genome biology, http://www.bioconductor.org/workshops/WyethCourse101702/
[4] Ilnicki F., BioChip – Narzędzie do analizy DNA, Przegląd Elektrotechniczny, 4, 2004, pages 277-282
[5] Iwanowski M., Automatic car number plate detection using morphological image processing, Przegląd Elektrotechniczny, this issue
[6] Nieniewski M., Morfologia matematyczna w przetwarzaniu obrazów, PLJ, Warszawa, 1998
[7] Serra J., Image analysis and mathematical morphology, vol. 1, Academic Press, 1983
[8] Serra J., Image analysis and mathematical morphology, vol. 2, Academic Press, 1988
[9] Soille P., Morphological image analysis, Springer Verlag, 2002
[10] Yang Y. H., Buckley M., Dudoit S., Speed T., Comparison of methods for image analysis on DNA microarray data, University of California, Berkeley, Technical Report # 584 (2002)

Authors:
Filip Ilnicki, M.Sc., Politechnika Warszawska, Instytut Elektrotechniki Teoretycznej i Systemów Informacyjno-Pomiarowych, ul. Koszykowa 75, 00-661 Warszawa, Poland, e-mail: [email protected]
Marcin Iwanowski, Ph.D., 1) Politechnika Warszawska, Instytut Sterowania i Elektroniki Przemysłowej, ul. Koszykowa 75, 00-662 Warszawa, Poland, e-mail: [email protected] 2) EC Joint Research Centre, Institute of Environment and Sustainability, Via Enrico Fermi 1, T.P. 262, 21020 Ispra (VA), Italy