Offline Arabic Character Recognition using Genetic Approach: A Survey

Hanan Abdulrahman Aljuaid, UTM, E-mail: [email protected]
Dzulkifli Muhamad, UTM, E-mail: [email protected]

ABSTRACT
This study reports a survey on off-line Arabic character recognition. It sheds some light on the characteristics of Arabic writing. The paper also presents a general view of the recognition processes involved in a complete Arabic character recognition system, and it includes some studies of related work based on genetic algorithms.

Keywords
Arabic character recognition, Offline, Genetic algorithms

1. INTRODUCTION/BACKGROUND
Character recognition (CR) is an intensively researched area of pattern recognition. CR means translating images of characters into text; in other words, it is an attempt to simulate the human reading process. Handwriting recognition (HWR) is a very challenging task because of many difficulties, such as the high variability of handwriting styles and shapes, the uncertainty of human writing, writing skew or slant, the segmentation of words into characters, and the size of the lexicon.

The problem of handwriting recognition can be classified into two main groups, off-line and on-line recognition, according to the format of the handwriting input. In off-line recognition, only the image of the handwriting is available, while in the on-line case temporal information, such as pen tip coordinates as a function of time, is also available. Many applications require off-line HWR capabilities, such as bank processing, mail sorting, document archiving, commercial form reading and office automation. So far, off-line HWR remains an open problem, in spite of a dramatic boost of research in this field (Koerich et al., 2003; Plamondon and Srihari, 2000; Vinciarelli, 2002) and the latest improvements in recognition methodologies (El-Yacoubi et al., 1999, 2002; Vinciarelli et al., 2004).

At the word level, recognition follows either an analytic approach, which recognizes the individual characters in the word, or a holistic approach, which deals with the entire word image as a whole. Analytical approaches (e.g. Kim and Govindaraju, 1997; El-Yacoubi et al., 1999; Koerich et al., 2003; El-Hajj et al., 2005; Benouareth et al., 2006) basically have two steps, segmentation and combination. First the input image is segmented into units no bigger than characters. Then the segments are combined to match character models using dynamic programming. Holistic approaches (Madhvanath and Govindaraju, 2001) deal with the entire input image. Holistic features such as translation/rotation invariant quantities, word length, connected components, ascenders, descenders and dots are usually used to eliminate less likely choices in the lexicon. Since holistic models must be trained for every word in the lexicon, whereas analytical models need only be trained for every character, their application is limited to tasks with small and constant lexicons, such as reading the courtesy amount on bank checks (Farah et al., 2006; Souici and Sellami, 2006).

Among the many methods that have been explored, genetic algorithms are a relatively new approach to character recognition. They are particularly attractive for this kind of problem since they are generally quite effective for rapid global search, and they are also very effective in solving large-scale problems. But what are GAs? A genetic algorithm (GA) is a search technique used in computer science to find approximate solutions to optimization and search problems. It is inspired by mechanisms of evolutionary biology such as inheritance, mutation, natural selection, and recombination. Genetic algorithms are typically implemented as a computer simulation in which a population of abstract representations of candidate solutions to an optimization problem evolves toward better solutions.
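As an illustration of such a simulation, the following minimal sketch evolves a population of bit strings. It assumes tournament selection (one common way of selecting stochastically by fitness), one-point crossover and bit-flip mutation. The function names, the parameter values and the toy "count the ones" fitness are illustrative assumptions and are not taken from any of the surveyed systems.

```python
import random


def evolve(fitness, n_bits=16, pop_size=30, generations=40,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Toy generational GA over bit strings; returns the best individual seen."""
    rng = random.Random(seed)
    # Start from a population of completely random individuals.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    for _ in range(generations):
        # Evaluate the fitness of the whole population once per generation.
        scores = [fitness(ind) for ind in pop]

        def select():
            # Tournament of size 2: the fitter of two random individuals wins.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        next_pop = []
        while len(next_pop) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)      # one-point recombination
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to every gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)

        pop = next_pop                              # new population becomes current
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return best


def one_max(bits):
    """Toy fitness (an assumption for the demo): number of 1 bits."""
    return sum(bits)


best = evolve(one_max)
```

In a recognition setting the bit string would encode something problem-specific, for example a subset of features, and the fitness would be a recognition score; both choices are outside the scope of this sketch.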
Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution starts from a population of completely random individuals and proceeds in generations. In each generation, the fitness of the whole population is evaluated, and multiple individuals are stochastically selected from the current population (based on their fitness) and modified (mutated or recombined) to form a new population, which becomes the current population in the next iteration of the algorithm, as shown in Figure 1.1. Evolutionary algorithms therefore work on populations instead of single solutions, so the search is performed in a parallel manner.

Studies in Arabic handwriting recognition, although not as advanced as those devoted to other scripts (e.g. Latin), have recently shown renewed interest (Amin, 1998; Ben Amara and Bouslama, 2003; Lorigo and Govindaraju, 2006). The techniques developed for Latin HWR are not appropriate for Arabic handwriting, because Arabic script is based on an alphabet and rules distinct from those of Latin. Since the word is the most natural unit of handwriting, its recognition can be done by either an analytic or a holistic approach.

2. LITERATURE REVIEW

2.1 GENERAL CHARACTERISTICS OF ARABIC WRITING
Arabic is one of the most important languages, not only for Arabs but also for Muslims, because it is the language of the holy book, the Quran. Its script is also used for writing several languages of Asia and Africa, such as Arabic, Persian, and Urdu. After the Latin alphabet, it is the second most widely used alphabet in the world, so it is a significant script to recognize. Considerable work has been devoted to the recognition of cursive scripts such as Arabic, but so far it is still an unsolved problem. A comparison of the characteristics of the Arabic, Latin, Hebrew and Hindi scripts is outlined in Table 2.1.

Table 2.2: Different shapes of the Arabic alphabet
Arabic is written from right to left. Arabic text (machine printed or handwritten) is cursive in general, and Arabic letters are normally connected on the baseline. This connectivity is important to highlight in the segmentation process. Some machine-printed and handwritten texts are not cursive, but most Arabic texts are, so it is not surprising that the recognition rate of Arabic characters is lower than that of disconnected characters such as printed English.

Table 2.1: Comparison of various scripts
Characteristics            Arabic        Latin     Hebrew    Hindi
Justification              R-to-L        L-to-R    R-to-L    L-to-R
Cursive                    Yes           No        No        Yes
Diacritics                 Yes           No        No        Yes
Number of vowels           3             5         11        -
Letter shapes              1-4           2         1         1
Number of letters          28            26        22        40
Complementary characters   More than 3   -         -         -

Arabic writing is similar to English in that it uses letters (28 basic letters), numerals, punctuation marks, as well as spaces and special symbols. It differs from English, however, in its representation of vowels, since Arabic uses various diacritical markings. The presence or absence of vowel diacritics indicates different meanings in what would otherwise be the same word.

There are four main characteristics of Arabic writing:

i. An Arabic letter might have up to four different shapes, depending on its relative position in the text. For instance, the letter (ع) has four different shapes: at the beginning of the word (preceded by a space), in the middle of the word (no space around it), at the end of the word (followed by a space), and in isolation (preceded by an unconnected letter and followed by a space). These four possibilities, and the different shapes of the Arabic characters in different positions of the word, are shown in Table 2.2. Different Arabic characters may have exactly the same shape and are distinguished from each other only by the addition of a complementary character (see footnote 1).
These are normally a dot, a group of dots or a zigzag (hamza). They may appear on, above, or below the baseline and are positioned differently, for instance above, below or within the confines of the character. Figure 2.1 depicts three sets of characters, the first set having three characters and the other sets two and five characters. Clearly, each set contains characters which differ only by the position and/or the number of dots or the zigzag shape (hamza) associated with them. It is worth noting that any erosion or deletion of these complementary characters results in a misrepresentation of the character. Hence, any thinning algorithm needs to deal with these dots carefully so as not to change the identity of the character.

Figure 2.1: Arabic characters differing by dots or hamza

Footnote 1. Complementary characters: a portion of a character that is needed to complement an Arabic character.

ii. Arabic writing is cursive, and words are separated by spaces. However, a word can be divided into smaller units called sub-words (a portion of a word including one or more connected characters). Some Arabic characters are not connectable with the succeeding character. Therefore, if one of these characters exists in a word, it divides that word into two sub-words. These characters appear only at the tail of a sub-word, and the succeeding character forms the head of the next sub-word. Figure 2.2 shows four Arabic words with one, two, and four sub-words. The first word consists of one sub-word which has four letters; the second has two sub-words with two letters each. The third has four sub-words, the first with three letters and the second with one. The last word contains four sub-words, each consisting of only one letter.

ﻓﻴﺼﻞ ﻳﺎﺳﺮ ﻋﺒﺪاﻟﺮﺣﻤﻦ رزاق

Figure 2.3: Different styles and fonts for the writing of Arabic text.
Figure 2.2: Example of Arabic sub-words

iii. Arabic writing can be, in general, classified into typewritten (Naskh), handwritten (Ruq'a) and artistic (or decorative: Calligraphy, Kufi, Diwani, Royal, and Thuluth) styles, as shown in Figure 2.3. Handwritten and decorative styles usually include vertical combinations of characters called ligatures. This feature makes it difficult to determine the boundaries of the characters. Furthermore, characters of the same font have different sizes (i.e. two characters may have different widths even though they have the same font and point size). Hence, word segmentation based on a fixed width cannot be applied to Arabic.

2.2 CHARACTERS RECOGNITION
Character recognition systems can contribute greatly to the advancement of automation and can improve the interaction between man and machine in many applications, including office automation, check verification and a large variety of banking, business and data entry applications.

Two recognition approaches are applied to printed and handwritten Arabic character recognition. They can be classified as follows. The first are holistic strategies, in which recognition is performed on the whole representation of words and there is no need to identify characters individually (Dehghan, 2001; Khorsheed, 2003; El-Hajj et al., 2005; Alma'adeed et al., 2004; Souici-Meslati et al., 2004; Snoussi Maddouri et al., 2002). The second are analytical strategies, in which words are not considered as a whole but as sequences of small units, and recognition is not performed directly at word level but at an intermediate level dealing with these units (Altuwaijri and Bayoumi, 1995; Abuhaiba et al., 1998; Fahmy and Al Ali, 2001; El-Dabi et al., 1990; Hashemi et al., 1995; Mostafa, 2004; Nawaz, Sarfraz, Zidouri and Al-Khatib, 2003; Sari, Souici and Sellami, 2002).
Character recognition systems need more than one stage to arrive at recognition. The next section describes these stages.

2.3 Arabic Characters Recognition Stages
An Arabic character recognition system can be decomposed into a number of stages: pre-processing, representation, stroke or character segmentation, features, and recognizer. Some approaches do not use all of these elements but only a subset. The stages are described in Table 2.3.

Table 2.3: Components of OCR recognition
Component        Description
Pre-processing   E.g. noise removal, detection and similar operations
Representation   E.g. skeletons, contours, pixels
Segmentation     Segments words, sub-words, characters, strokes or other text units
Features         Information passed to the recognizer, such as shape attributes or pixels
Recognizer       Algorithm that identifies letters

First, an image is cleaned with image processing techniques. It may be converted to a more concise representation, and then features are detected from words or characters. With the features as input, a recognizer returns the identified text string. The term "features" does not necessarily refer to structural or pre-computed items, but to any quantities passed to the recognizer. They may be pre-computed for use in segmentation, computed on individual letters after segmentation, or both, as shown in Figure 2.4.

Figure 2.4: Arabic Characters Recognition stages

2.4.1 Preprocessing and Representation
The techniques of the preprocessing stage are divided into two types according to their function. The first type produces clean and usable raw data, for example by noise reduction, normalization and smoothing. The second type prepares the image to be segmented, for example by vertical and horizontal projection, contour tracing and skeleton extraction. These techniques represent the preprocessing of the segmentation process. The segmentation approach defines its preprocessing, which prepares a concise representation of the word image in order to be segmented. The following techniques are common in the representation process.

• Vertical and horizontal projection
The vertical projection method helps in detecting the white spaces and the junction lines between adjacent characters by counting the black pixels in each column of the word image, although this method is not efficient for handwritten script because of overlapping and skew problems (Zahour, 2001). It has been used together with the horizontal projection to split the image into lines, words and characters. These methods are based on the fact that the connection stroke between characters is always thinner than other parts of the word. In these methods, the vertical and horizontal projections of the image are obtained.

The horizontal projection is defined as
h(i) = \sum_j p(i, j)
and the vertical projection as
v(j) = \sum_i p(i, j)
where i is the row number, j is the column number and p(i, j) is the pixel value: 0 for a white (background) pixel, or 1 for a black (foreground) pixel.

Figure 2.5 shows the horizontal and vertical projection profiles of an Arabic sentence after removing the secondaries. The longest spike in Figure 2.4(c) represents the baseline. The thickness of the baseline is determined by computing the thickness of the longest spike, by taking the most repeated column height (Timsari & Fahimi 1996), or by considering the position of loops as a reference, as they are always close to the baseline (Olivier et al., 1996). Other basic information that can be computed from the projection profile includes the width, height and number of connected components (sub-words) (Al-Yousefi & Udpa 1992; Mohammed 2006). The segmentation methods that depend on this technique are discussed in section 2.4.2.

Figure 2.5: Horizontal and vertical projections (Mohammed, 2006)

• Thinning (skeleton extraction)
The thinning operation means creating the skeleton of the image.
A skeleton is a one-pixel-wide line created by highlighting the centerline of the word image. It helps in retaining the essential information about the word. Figure 2.6 shows an example of image thinning. Several authors (Abuhaiba et al., 1994; Amin et al., 1996; Khorsheed & Clocksin 1999; Cowell & Hussain, 2001) claim that extracting segments from the skeleton is more reliable than finding the actual connection points in a word. Other segmentation methods that depend on thinning are discussed in section 2.

Figure 2.6: A word image and its skeleton

The special representation of "contour of projections" was employed by Dehghani et al. in 2001. The task was Persian character recognition, and pre-processing included median and mathematical morphological filtering, linearization, scaling, and centering. Regional projection contour transformation (RPCT) was used, so the image was projected in multiple directions (here, horizontal and vertical), and the chain-code contour of each projection was obtained. The contour was sampled and features were obtained for each section using a two-dimensional pattern, the number of active pixels, and slope and curvature. Separate feature vectors from the contours of the horizontal and vertical projections were computed and modeled by individual HMMs, yielding two HMMs per character. During recognition, scores from the individual classifiers were integrated to improve performance. The size of the training and testing sets was not provided. Recognition rates were 92.76% on the training set and 71.82% on the test set.

In 1994, Abuhaiba et al. proposed a set of character graph models to recognize isolated letters (Abuhaiba et al., 1994). Each model was a state machine with transitions corresponding to directions of segments in the character and with additional "fuzzy" constraints to distinguish some characters. Each letter's skeleton was converted to a tree structure, which was matched to a model by a rule-based recognizer. Test data was written by four people.
Recognition rates depended on tuning the models after experiments on letters by each writer, and thinning errors caused recognition errors.

Amin et al. (1996) also used a skeleton-based graph representation for the recognition of single letters. Structural features including curves were fed into a five-layer neural network. The network was trained with 2000 characters, retrained with 528 of the 2000, and tested with another 1000 characters by 10 writers. A 92% recognition rate was obtained. Difficulties included spurious thinned lines, incorrect curve directions, and the need to modify rules during testing.

• The baseline is an imaginary line that connects the characters of the word. It is usually detected in the segmentation stage, and it helps in distinguishing the strokes of the characters. Several methods have been published for detecting the baseline. Kanai et al. used the projection profile technique to detect the fiducial points by decoding the lowest resolution layer of the image (Kanai et al., 1998). Detecting the baseline is a common step in many off-line handwritten Arabic OCR systems, and it is often an important step before segmentation and feature extraction.

Extending the 1994 work, Abuhaiba et al. proposed a system for the recognition of free handwritten text in 1998. It used the skeleton representation and segmented sub-words into strokes that were further segmented into "tokens". Tokens are single vertices representing dots or loops, or sequences of vertices. The recognizer was a "fuzzy sequential machine" which consisted of classes to be recognized, sets of initial and terminal states, stroke directions used for entering states, and a function for transitioning between states. Tokens were recognized if possible, and otherwise used to augment the recognizer. When needed, the user interactively grouped tokens into meaningful "token strings".
To detect lines of text, strokes from the entire page were partitioned using a minimal spanning tree algorithm. Another graph algorithm grouped strokes into characters and sub-words. Thirteen pages by 13 writers were used for training, and another 20 pages by 20 writers were used for testing. Writers were asked to write in a particular style, to write the main stroke without lifting the pen, to omit diacritics, and to avoid generating blobs, but most did not comply with these constraints. Sub-word and character recognition rates of 55.4% and 51.1% were obtained. No lexicon was used. In addition to the technical method, this publication is important since it generalized the domain to free handwriting.

• Baseline detection
For instance, El-Hajj et al. confirmed the benefit of features based on upper and lower baselines, within the context of frame-based features with an HMM recognizer. Their system included features measuring densities, transitions, and concavities in zones defined by the detected baselines. The system was tested on the IFN/ENIT database, excluding items that have fewer than eight images. For each of four experiments, the system was trained on three of the four image sets and tested on the remaining set. In their experiments, adding the baseline-dependent features to similar measurements that do not use those zones significantly improved recognition.

2.4.2 Segmentation
The segmentation phase is a necessary step in recognizing printed Arabic text. Any error in segmenting the basic shape of Arabic characters will produce a different representation of the character component. Segmentation is required by the analytical recognition strategies discussed in section 2.3. Two techniques have been applied for segmenting machine-printed and handwritten Arabic words into individual characters: implicit and explicit segmentation.
In implicit segmentation, also called internal segmentation, words are segmented into letters and recognized simultaneously. This type of segmentation is usually designed with rules that attempt to identify all the segmentation points of the characters. Many rules must be constructed manually to achieve good accuracy, so the more rules there are, the higher the recognition rate.

In explicit segmentation, or external segmentation, words are externally segmented into pseudo-letters which are then recognized individually. This approach is usually more expensive because of the increased complexity of finding optimum word hypotheses.

• Contour tracing
Contour tracing aims at transforming the border of the word into a string of codes from which the features of the image are extracted. The coding scheme starts by identifying the position of an initial pixel and continues by identifying the relative positions of the successive pixels on the contour until the starting pixel is reached again. The Freeman chain code is widely used as a scheme for feature extraction, to be employed either in the segmentation process or in the recognition process.

In character recognition, the essential information about a shape is stored in its skeleton. Abuhaiba et al. (1994) and Khorsheed & Clocksin (1999) claimed that extracting segments from the skeleton graph is more reliable than finding the actual connection points in a word. Many algorithms have been proposed to extract skeletons, but those specifically designed for Arabic text are Tellache et al. (1993), Altuwaijri & Bayoumi (1995), Altuwaijri & Bayoumi (1998) and Cowell & Hussain (2001).

Almuallim and Yamaguchi (1987) also detected the baseline of the thinned word. The words are then segmented into strokes. A stroke is extracted by finding its start point. The search for the start point is done just around the baseline, and then the curve is traced until a point which is inferred to be the stroke end point is reached.
An end point can be a branch point, a cross point, a line end or a point with a sudden change in curvature (up or down) after a horizontal motion near the baseline. During the segmentation process, if the current stroke is connected to the next stroke, the difference between the y-coordinate of the connection point and the current baseline is calculated. If this difference is bigger than a certain threshold, the baseline is adjusted and given the value of the average of the coordinates of the connection points found so far.

The studies of Arabic character segmentation are divided according to their segmentation strategies.

2.3.2.1 Explicit segmentation
Most studies follow the explicit strategy, and they use different methods of representation. The techniques are based on projection methods, on the character skeleton, or on contour tracing.

In the work of Zheng et al., the sub-words consisting of one character were excluded first, as they do not need to be segmented. Nevertheless, the algorithm used to exclude those single characters was not able to detect all of them correctly. Furthermore, it was not explained how to count the number of characters in each sub-word. The vertical projection is then scanned to search for points near the baseline where it changes from low to high values. Those points are considered the beginnings of characters, while the points of change from high to low values are the ends of characters. Then, some rules are used to verify those potential segmentation points. The method was only tested on non-overlapping fonts, and a segmentation rate of 94% was reported (Zheng et al., 2004).

In (Amin & Al-Sadoun 1992; Al-Sadoun & Amin 1995), the authors traced the thinned word from right to left using a 3 x 3 window to identify potential points for segmentation. Then, a binary tree is constructed and the skeleton is represented using the Freeman code (Freeman 1968).
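To make the Freeman coding mentioned above concrete, here is a minimal sketch that converts an already traced, ordered path of 8-connected pixels into a chain code. It assumes (row, column) image coordinates with rows growing downward and the common convention 0 = east, counting counter-clockwise up to 7 = south-east. The tracing step itself and the name chain_code are assumptions of this example, not part of any cited method.

```python
# Freeman 8-direction codes for a step (d_row, d_col) between neighbouring pixels.
# Rows grow downward, so "north" is a row step of -1.
FREEMAN = {
    (0, 1): 0,    # east
    (-1, 1): 1,   # north-east
    (-1, 0): 2,   # north
    (-1, -1): 3,  # north-west
    (0, -1): 4,   # west
    (1, -1): 5,   # south-west
    (1, 0): 6,    # south
    (1, 1): 7,    # south-east
}


def chain_code(path):
    """Return the Freeman chain code of an ordered list of (row, col) pixels."""
    codes = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in FREEMAN:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(FREEMAN[step])
    return codes


# A closed 2 x 2 contour traced clockwise, ending on the starting pixel.
square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
square_code = chain_code(square)   # east, south, west, north
```

Because the code stores only relative moves, it is independent of where the shape sits in the image, which is one reason chain codes are convenient as input to later segmentation or recognition rules.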
Each node of the binary tree describes the shape of the corresponding part of the sub-word. The binary tree is smoothed to minimize the number of nodes by eliminating the empty nodes, to minimize the Freeman code string, and to eliminate or minimize any noise in the thinned image. Finally, the binary tree is segmented into sub-trees such that each sub-tree describes a character using primitives including lines, loops, and double loops. Some rules were set to ensure the correct boundaries of characters; for example, a long horizontal segment signals the end of the current character, and the existence of loops or a long vertical segment is regarded as the beginning of a character. The algorithm can be applied to any font and size of Arabic text; in addition, it can be applied to hand-printed text and permits the overlay of characters (Amin et al. 1996). However, because of the erosion experienced in the image, some of the characters were not segmented properly. The method was adopted in (Amin 2001). One advantage of this method is that the identification of the baseline becomes unnecessary, since the sub-word is described by a binary tree, which saves processing time (Amin & Al-Sadoun 1992).

On the other hand, Nawaz et al. and Sarfraz et al. used the vertical projection of the middle zone instead of the projection of the entire word. They identified four text line zones: the upper, middle, baseline and lower zones. The baseline zone is the one with the highest density of black pixels; the zone just above the baseline and twice the thickness of the baseline is the middle zone. The vertical projection of the middle zone is constructed. A fixed threshold is used for segmenting the word into characters. Whenever the value of the vertical projection of the middle zone is less than two thirds of the baseline thickness, the area is considered a connection area between two characters.
Any area that follows the connection area with a larger value is regarded as the start of a new character, as long as the profile is greater than one third of the baseline thickness. The method was designed for the recognition of the Naskh font. It is clear that this method may over-segment characters such as u. However, the authors tried to resolve this problem in the recognition stage (Nawaz et al., 2003; Sarfraz et al., 2003).

Altuwaijri and Bayoumi (1995) constructed the vertical projection for each sub-word, excluding the pixels of the baseline and the secondaries. Potential segmentation points are then determined using the minimum projection values and verified by some rules which are designed to avoid over-segmentation.

Jambi (1991) constructed the vertical projection of the thinned word with the dots removed. The start and end points of characters are determined from the vertical projection; these points can be actual points or just candidates. An actual start point is found where the vertical projection changes from 0 to a non-zero value, while an actual end point is found where it changes from a non-zero value to 0. A candidate start point is found where the projection changes from 1 to a greater value, while a candidate end point is found where it changes from a higher value to 1. Because of the different widths of Arabic characters, it is not easy to avoid over-segmentation. Some inconsistencies, such as two consecutive ends, can be detected easily, but others are still difficult to resolve, such as u, which has two actual and six candidate starting and ending points. Applying this method will also segment the tail of certain characters when they appear at the end of a word or in isolated form. The method was adopted by Abandah and Khedher (2004). This method needs further processing in the presence of vertical overlaps.

The previous methods were designed to segment printed Arabic characters.
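The projection-based rules above can be illustrated with a short sketch. It computes the projections defined in the preprocessing section, h(i) = sum over j of p(i, j) and v(j) = sum over i of p(i, j), takes the row with the largest horizontal projection as a rough baseline estimate, and applies only the "actual" start and end rule (a change from 0 to non-zero and back) to the vertical projection. The candidate-point rule, dot removal and thinning are omitted, and all function names and the tiny test image are assumptions of this example.

```python
def projections(img):
    """Return (h, v) for a binary image given as a list of rows of 0/1 pixels."""
    h = [sum(row) for row in img]          # h(i): black pixels in row i
    v = [sum(col) for col in zip(*img)]    # v(j): black pixels in column j
    return h, v


def baseline_row(h):
    """Row index of the longest spike of the horizontal projection."""
    return max(range(len(h)), key=h.__getitem__)


def zero_gap_segments(v):
    """Column ranges (start, end) delimited by 0 -> non-zero -> 0 transitions."""
    segments, start = [], None
    for j, value in enumerate(v):
        if value > 0 and start is None:
            start = j                       # actual start: 0 -> non-zero
        elif value == 0 and start is not None:
            segments.append((start, j - 1)) # actual end: non-zero -> 0
            start = None
    if start is not None:                   # shape touches the right border
        segments.append((start, len(v) - 1))
    return segments


# Two small blobs separated by one empty column.
img = [
    [0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 0, 0],
]
h, v = projections(img)
base = baseline_row(h)
segments = zero_gap_segments(v)
```

On connected script the projection rarely drops to zero between characters, which is why the methods above add thresholds relative to the baseline thickness or candidate points; this sketch only separates components that are already disconnected.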
The method developed by Fahmy et al. (2001) was devoted to segmenting handwritten text. The maximum and minimum peaks are found from the vertical projection. The word is then segmented into vertical strips (frames). The boundaries of the strips are defined to be the midpoints between adjacent maximum/minimum pairs. To ensure that the frames are of proper widths, the very short ones are eliminated and the long ones are divided into shorter ones, with the separation point chosen to be a portion of the character height. Then each frame is divided into three horizontal areas, one below the baseline and two above, from which features are extracted. However, the results are not perfect; for instance, a character such as ط is divided into two frames (Fahmy et al., 2001).

Mostafa (2004) proposed an adaptive rule-based segmentation algorithm based on the general structural relationships of Arabic text. The main rule used is that "most characters start with, and end before, a T-junction on the baseline". A T-junction occurs when the drawing of the character goes up or down from the baseline. This holds for all character shapes in the middle and at the end of a word; however, a few characters such as u and س have more than one T-junction with the baseline and need special treatment. Structural features such as strokes, dots, loops, curves, character relative width and height, and baseline relative position are extracted from the skeleton using some rules. Finally, the characters are segmented by grouping their components, e.g. loops with their bulges, as in ـﺼـ. Dots are used to help the grouping process. The method is noise-independent, omni-font and omni-size. However, it was tested on the Simplified Arabic font only, and the reported segmentation accuracy was 96.5%.

In another reported result, the segmentation rate was 86%. The algorithm suffers from over-segmentation in cases of characters like س and from under-segmentation in cases of ligatures.
The authors claimed that their method does not need slant correction.

Mostafa and Darwish (1999) traced the upper contour of handwritten words searching for local minima, and at the same time traced the lower contour searching for local maxima. The determination of local minima and maxima is based on the negative and positive slopes. These points are marked as potential segmentation points. A matching process between the upper and lower potential segmentation points is performed in order to obtain the minimum number of non-overlapping potential segmentation points for each word. The algorithm achieved 97.7% correct segmentation. Among the advantages of this method, as reported by the authors, are that it does not require the existence of a single baseline for the whole line or even for parts of the word, and that it is writer-independent and does not require any learning procedure. Also, the problems of segmenting overlapping and overhanging characters are completely surmounted. Furthermore, it can effectively split touching characters.

Kandil and El-Bialy (2004) observed that the connection strokes are formed of two parallel lines. Hence, the contour is traced searching for this phenomenon. However, not only the connection strokes but also some other parts of the word are formed of two parallel lines. To overcome this problem, only columns having the two pixels in a predefined middle zone are considered. The authors claimed that the method works for multiple fonts and sizes and can tolerate some skew in the line.

More recently, Zidouri et al. (2005) scanned the skeleton (with no secondaries) from right to left to find a band of horizontal pixels with a length greater than or equal to the width of the smallest character. Then the vertical projection is found; if no pixel is encountered, a vertical line is drawn as a guide for segmentation. The procedure is repeated for all rows. As a result, an image with several guide bands is obtained.
Features are extracted from each guide band, and a set of rules is then designed to select from and correct the guide bands. However, the method suffers from the problems of overlapping and ligatures, which are left to the recognition phase.

Among the drawbacks of methods based on the extracted skeleton is that different thinning algorithms may produce different thinned characters. Moreover, the thinning process might alter the shape of the character, especially in the case of poor-quality characters. Common problems encountered during thinning include the elimination of vertical notches in some characters and the elimination of secondary characters. These modifications make the segmentation of thinned characters a difficult task. This conclusion is in agreement with Amin (2001) and Cowell and Hussain (2001).

Safabakhsh and Adibi (2005) applied a continuous-density variable-duration hidden Markov model to the recognition of handwritten Persian words in the Nastaaligh style. This style contains many vertically overlapping letters and sloped letter sequences, which present problems for the ordering of characters and for baseline detection. Their system removed ascenders and descenders before the primary recognition stage to avoid incorrect orderings and was baseline-independent. Words were over-segmented into pseudo-characters using local minima of their upper contour. Eight features were computed for each pseudo-character. The HMM was path-discriminant and included 25 character states, each of which was divided into up to four sub-states to indicate position-dependent shapes. The lexicon consisted of 50 words chosen to include all characters and compound forms, and the training set contained two 50-word scripts from each of seven writers.
On a test set of two 50-word scripts from two different writers, omitting words that showed errors in an earlier stage of the method, the system achieved a 69% recognition rate with 5 iterations of the recognition step and a 91% rate with 20 iterations. The rates were 52.38% and 90.48% on 21 words not in the lexicon.

Methods based on contour tracing avoid the problems that result from the thinning process because they analyze the structural shape of characters as scanned. However, they are affected by noise on the contour; hence the contour needs to be smoothed first. The set of boundary pixels, or contour, carries important information about an object (Khorsheed 2002). Segmentation can also be achieved by tracing the outer contour of a given word. The segmentation method used in the SARAT system (Segmentation And Recognition of Arabic printed Text) (Margner 1992) was based on the outer contour of the main body of the words. First, the start and end points of the upper contour are determined. Then the upper contour is divided into parts, each having curvature of the same sign. Starting with a positive curvature, for example, the change to a negative curvature finishes this segment and starts a new one. In other words, a character is segmented wherever the curvature of the outer contour changes sign.

Sari et al. (2002) proposed a method known as ACSA (Arabic Character Segmentation Algorithm) to segment Arabic handwritten text by detecting the local minima of the lower contour. The baseline is detected first; then the sub-words and secondaries are extracted using contour tracing. Using the horizontal projection, three zones are determined for each sub-word, namely the upper, median and lower zones. Topological features such as turning points, holes, zigzags, ascenders and descenders are extracted. The segmentation point is defined as a local minimum in the lower outer contour. A set of rules is designed to validate the segmentation points.
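The idea of taking local minima of the lower contour as candidate segmentation points can be illustrated with a short sketch. It is not the ACSA implementation: the validation rules, baseline detection and zone analysis are omitted, and the names `lower_contour` and `candidate_cuts` are hypothetical. It also assumes image coordinates in which the row index grows downward, so that a "local minimum" of the lower contour is a column where the contour comes closest to the top of the image; the source does not state which convention the authors used.

```python
import numpy as np


def lower_contour(img):
    """Row index of the lowest ink pixel in each column, or -1 if empty."""
    img = np.asarray(img, dtype=bool)
    rows = img.shape[0]
    has_ink = img.any(axis=0)
    # argmax on the vertically flipped image finds the last ink row.
    lowest = rows - 1 - np.argmax(img[::-1, :], axis=0)
    return np.where(has_ink, lowest, -1)


def candidate_cuts(profile):
    """Columns where the profile has a local minimum.

    Flat minima (plateaus) are reported once, at their centre column.
    Empty columns (-1) never count as neighbours or as minima.
    """
    cuts = []
    n = len(profile)
    x = 1
    while x < n - 1:
        if profile[x] < 0:
            x += 1
            continue
        end = x
        while end + 1 < n and profile[end + 1] == profile[x]:
            end += 1                      # extend over the plateau
        left = profile[x - 1]
        right = profile[end + 1] if end + 1 < n else -1
        if left > profile[x] and right > profile[x]:
            cuts.append((x + end) // 2)
        x = end + 1
    return cuts


# Toy sub-word: ink from row 2 down to a column-dependent bottom row.
bottoms = [5, 5, 3, 3, 3, 5, 6, 4, 6]
img = np.zeros((7, len(bottoms)), dtype=bool)
for col, bottom in enumerate(bottoms):
    img[2:bottom + 1, col] = True

profile = lower_contour(img)
print(profile.tolist())         # [5, 5, 3, 3, 3, 5, 6, 4, 6]
print(candidate_cuts(profile))  # [3, 7]
```

On real handwriting the raw profile is noisy, which is why the contour is smoothed first and why rule-based validation of the candidate points is needed.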
2.3.2.2 Implicit segmentation
Unlike the methods discussed so far, which are explicit segmentation methods, recognition-based segmentation is implicit. In implicit methods, characters are segmented while being recognized; hence the approach is also called recognition-based segmentation or straight segmentation. The basic principle of this approach is to use a mobile window of variable width to provide tentative segmentations, which are confirmed (or not) by the classification (Cheung et al. 2001). In other words, the system scans the image for components that match classes in its alphabet (Khedher & Abandah 2002).

In (El-Dabi et al. 1990), the invariant moments are calculated and checked against the feature space of the font. If a character is not found, another column is appended to the underlying portion of the word and the moments are calculated and checked again. This process is repeated until a character is recognized or the end of the word is reached. However, the system is not always able to recognize all characters, which implies that all succeeding characters in that sub-word would not be processed; a backup scanning algorithm is therefore triggered when such a blockage happens (Khorsheed 2002). To accelerate the recognition process, the scanning can be done from both ends (Khorsheed & Clocksin 2000). The method allowed the system to handle overlapping and to isolate the connecting baseline between connected characters. It seems to be limited to the recognition of typewritten fonts; furthermore, it is font-dependent and sensitive to pattern variations. The system also requires intensive computation to obtain the accumulative moments. No figures are reported regarding the system's recognition rate and efficiency (Abuhaiba et al. 1994; Abuhaiba 2003b).

In (Auda & Raafat 1993) a similar approach is used, in which slices are added to a window and a feature vector is fed into neural networks that try to recognize the character before segmentation. The reported segmentation rate was 83%.

In (Zidouri et al. 2003; Zidouri 2004), the Minimum Covering Run (MCR) expression is used to represent the character by a number of strokes. The MCR of a region of a binary image is the minimal combination of the run-length encodings in both the horizontal and vertical directions. The features of those strokes are used to build reference prototypes for recognition by matching. The separation of words into characters is done automatically once the parts composing the characters are successfully identified and a correct match is found.

As can be noticed from the above discussion, the recognition-based segmentation approach aims at overcoming the serious problems of classical segmentation. Hence, no accurate character segmentation path is necessary. In principle, any of the other approaches can be used here as long as it has some recognition capability (Cheung et al. 2001).

2.4.1 Gildas Menier 2008: A GENETIC ALGORITHM FOR ON-LINE CURSIVE HANDWRITING RECOGNITION
This approach presents a genetic algorithm for on-line recognition of cursive handwriting. The GA works with a population of solutions called strings. Each string has a lexical picture and a list of graphic primitives that describes how the word is written. Each string is made of construction blocks called allographs. The GA is used to find the best reconstruction of the word to be analyzed, based on graphic primitives and using the allograph list. It can be seen as an alternative analysis method for word recognition that does not require the definition of a scanning strategy. This system achieves 84% recognition in a manuscript test with a lexical set of 150 words and a small allograph set. The recognition subset contains 160 words, including ten extra words not belonging to the lexicon.

2.4.2 Ramin Halavati 2006: EVOLUTION OF MULTIPLE STATES MACHINES FOR RECOGNITION OF ONLINE CURSIVE HANDWRITING
This paper presents a novel Multiple States Machine as a general tool for elastic pattern recognition and uses an evolutionary approach to create these machines. The major idea behind the machines is to develop and maintain different hypotheses about the given sequence of segments and gradually prove or prune them to reach a single final decision. It is implemented for the Persian (Farsi) language using a typical feature set and a specifically tailored genetic algorithm, and its recognition rate and computation time are compared with a dynamic programming (DP) approach. The approach is tested on a set of Persian test cases, with a best recognition rate of 89% without a dictionary and 96.1% with a dictionary. It is also compared with pruned dynamic programming, showing an almost constant recognition speed while DP's computation time increases exponentially with the number of segments; as a result it is more than 10 times faster for 9-segment words and 100 times faster for words with 13 segments.

2.4 Genetic algorithm & handwriting
GAs are a class of optimization and search methods that use randomness to avoid local extrema. They are capable of adaptive and robust search over a wide range of search-space topologies. GAs were envisaged by Holland (1975) as an algorithmic concept based on the Darwinian principle of "survival of the fittest" with sexual reproduction, where stronger individuals in the population have a higher chance of creating offspring. GAs are distinguished from other techniques by a principal characteristic: they search in an intrinsically parallel fashion from several solutions rather than from a single solution (Pernkopf and Bouchaffra, 2005; Schneider et al., 2005).
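The generic scheme described here (a population of candidate solutions, fitness-based selection, recombination and random modification) can be made concrete with a small sketch. It does not reproduce any of the surveyed recognition systems: the bit-string encoding, tournament selection, one-point crossover, elitism, the toy fitness (number of ones) and every parameter value are assumptions chosen only to keep the example short and runnable.

```python
import random


def run_ga(fitness, n_bits=20, pop_size=30, generations=40,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Maximise `fitness` over bit strings; return (best, best-per-generation)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []

    def tournament():
        # Selection: the fitter of two randomly drawn individuals survives.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        next_pop = [elite[:]]                    # elitism: keep the best as-is
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:    # one-point crossover
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop

    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


# Toy problem: the fitness of a string is simply its number of ones.
best, history = run_ga(sum)
print(history[0], history[-1], best)
```

Because the best individual is copied unchanged into each new generation, the best fitness never decreases, but as with any GA there is no guarantee that the global optimum is reached within a fixed number of generations. In a recognition setting the bit string would be replaced by an encoding of a candidate interpretation (for example a sequence of allographs or visual codes) and the fitness by a match score against the observed handwriting.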
A GA is also an iterative algorithm that depends on the generation-by-generation development of possible solutions, with selection schemes permitting the elimination of bad solutions and the replication of good ones, which can then be modified.

We now look at the works that used GAs in handwriting recognition. We found five such works: four on online recognition and one on offline recognition, two of them on Arabic character recognition.

2.4.3 Shashank Mathur 2008: OFFLINE HANDWRITING RECOGNITION USING GENETIC ALGORITHM
The handwriting recognition model described here works in three stages: segmentation of the handwritten text, recognition of the segmented characters with the help of artificial neural networks, and finally selection of the best solution from the four artificial neural network outputs with the help of a genetic algorithm. A robust algorithm for handwriting segmentation is described, with which individual characters can be segmented from a word selected from a paragraph of handwritten text image given as input to the module. Each segmented character is then converted into a column vector of 625 values, which is later fed into the advanced neural network setup that has been designed in the form of text files. The networks have been designed as quadruple-layered neural networks with 625 input and 26 output neurons, each output corresponding to a character from a to z. The outputs of all four networks are fed into the genetic algorithm, which has been developed using the concept of correlation; with its help the overall network is optimized. The algorithms were tested with 200 handwritten samples, of which 142 were correctly recognized, giving an overall efficiency of 71.0%.

2.4.4 Kherallah et al.: On-line Arabic handwriting recognition system based on visual encoding and genetic algorithm
In this approach, a GA has been developed to select the best combination of visual codes extracted from a word by the heuristic method. The evolutionary approach here permits the recognition of cursive handwriting without the limitation of a lexical dictionary. It is known that there is no guarantee that a GA will always find the global optimum. Therefore, the convergence of the GA is assured by the technique used in the fitness function, which relies on the visual codes of Arabic words and on the comparison method established between the visual index strings. The number of generations (500) and the fitness value (0.5) are fixed as the convergence criterion. The average recognition rate found is about 97%.

2.4.5 Alimi: Evolutionary Neuro-Fuzzy Approach to Recognize On-Line Arabic Handwriting
Alimi set forth a complete system that segmented letters according to an understanding of the way that humans write. Given that an Arabic letter can have at most 6 strokes and that a stroke is defined as an asymmetric bell-shaped function of curvilinear velocity with the speed tapering off at the end of the stroke, a system can automatically segment a letter into substrokes, which define that letter. Each character can be represented as 6 feature vectors. If the character has fewer than 6 strokes, the empty strokes are zeroed out. This set of feature vectors was given to a fuzzy beta radial basis function neural network to recognize various letters. The strokes were overlapped to give all possible combinations of strokes into letters. These overlapped outputs were passed to a genetic algorithm to robustly recognize words. Through a series of mutations and crossovers, the letters were segmented out and recognized. Reported accuracy was 89% without dot and diacritical information (Alimi, 1997).

3. DISCUSSION/CRITICAL ANALYSIS OF LITERATURE
This paper has discussed a number of studies on Arabic character recognition. Some of them cover all the recognition stages, while others focus on one stage, such as segmentation or representation of the image. Zheng (2004) used the horizontal and vertical projections to detect the baseline and to segment the word into characters, without recognition, by detecting the point at which the histogram value changes from low to high and the point of the reverse change. Mostafa (2004) also focused on the segmentation stage, using the skeleton to detect strokes, dots, loops, curves, character width and height, and baseline position.

On the other hand, the problem of recognizing Arabic characters is still an area of interest in many studies, which have used different methods to solve it. El-Hajj (2005) used HMMs with four states. Khorsheed (2003) used a single HMM built from 32 characters, with unlimited jumps. Other researchers used neural networks, such as Fahmy and Al-Ali (2001) and Souici-Meslati (2004). Others devised special rules to solve the problem: Abuhaiba (1993-1998) used rules to match tree structures to graph models; El-Dabi et al. (1990) used rules to partition the word and calculated the invariant moments; and Nawaz & Sarfraz (2003) identified four text-line zones: the upper, middle, baseline and lower zones.

However, some researchers have used GAs to recognize Arabic characters. Alimi (1997) used GAs to select the best combination of characters recognized by a fuzzy neural network, while Kherallah et al. (2008) used visual encoding and GAs for on-line Arabic handwriting. This work has also discussed many studies on representing the image of an Arabic word, segmenting it and recognizing it.

4. CONCLUSION AND SUGGESTED WORK
This paper has presented a comprehensive review of the literature on the stages of Arabic character recognition. It is concluded that the databases used by the researchers were small, with a limited number of words. This area therefore needs a large database with different types of words and paragraphs, and it remains open for further enhancement. Extensive research needs to be conducted.

5. REFERENCES
A. Benouareth, A. Ennaji, M. Sellami. (2006). HMMs with Explicit State Duration Applied to Handwritten Arabic Word Recognition. 18th International Conference on Pattern Recognition (ICPR'06), 2, pp. 897-900.
A. Alimi and O. Ghorbe. (1995). The analysis in an on-line recognition system of Arabic handwritten characters. Proc. 3rd Int. Conf. on Document Analysis and Recognition, pp. 890-893. Canada.
A. Amin and G. Masini. (1985). Deux Methodes de Reconnaissance de Mots pour l'Ecriture Arabe Manuscrite. Reconnaissance des Formes et Intelligence Artificielle.
A. Amin and G. Masini. (1982). Machine recognition of cursive Arabic words. Application of Digital Image Processing Vol. IV, G. Tescher, pp. 1127-1135.
A. Amin and M. Kavianifar. (1997). Automatic Recognition of Printed Arabic Text Using Neural Network Classifier. Image Analysis and Processing, 616-623.
A. Amin, G. Masini and J. P. Haton. (1984). Recognition of handwritten Arabic words and sentences. Proc. 7th Int. Conf. on Pattern Recognition, pp. 1055-1057. Montreal.
A. Amin, M. Bemford and A. Hocman. (1996). A knowledge acquisition technique for recognizing Hand-printed Chinese characters. Proc. 13th Int. Conf. on Pattern Recognition, pp. 254-258. Austria.
Abandah, G.A. and Khedher, M.Z. (2004). Printed and handwritten Arabic optical character recognition - initial study. Technical Report, University of Jordan. Amman, Jordan: Aug.
Abdullah, S. A. (2007). Off-line handwritten Arabic Characters Segmentation using Rotation Invariant Segment Feature (RISF). Thesis submitted in fulfilment of the requirement for the degree of Master of Science.
Cowell, J. and Hussain, F. (2001). Thinning Arabic characters for feature extraction. IEEE Conference on Information Visualization (pp. 181-185). London, UK: 25-27 Jul.
Abuhaiba, I., Mahmoud, S. and Green, R. (1994).
Recognition of handwritten cursive Arabic characters. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 6 (16), 664-672.
E. M. Riseman and R. W. Ehrich. (1971). Contextual word recognition using binary digrams. IEEE Trans. Comput., C-20, 397-403.
El-Dabi, S.S., Ramisis, R. and Kamel, A. (1990). Arabic character recognition system: a statistical approach for recognizing cursive typewritten text. Pattern Recognition, 5 (23), 485-495.
Abuhaiba, I. (2003b). A discrete Arabic script for better automatic document understanding. The Arabian Journal for Science and Engineering, 28 (1B), 77-94.
Alimi, A. M. (1997). An Evolutionary Neuro-Fuzzy Approach. IEEE, 0-8186-7898-4/97.
El-Khaly, F. and Sid-Ahmed, M.A. (1990). Machine recognition of optically captured machine printed Arabic text. Pattern Recognition, 23 (11), 1207-1214.
Almuallim, H. and Yamaguchi, S. (1987). A method of recognition of Arabic cursive handwriting. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 9 (5), 715-722.
El-Yacoubi, A., Gilloux, M., Sabourin, R. and Suen, C. (1999). An HMM-based approach for off-line unconstrained handwritten word modeling and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21 (8), 752-760.
Al-Sadoun, H. a. (1995). A new structural technique for recognizing printed Arabic text. International Journal of Pattern Recognition and Artificial Intelligence, 9 (1), 101-125.
Fahmy, M.M.M., Al Ali, S. (2001). Automatic recognition of handwritten Arabic characters using their geometrical features. Journal of Studies in Informatics and Control, 10 (2).
Altuwaijri, M. & Bayoumi, M. (1995). A new thinning algorithm for Arabic characters using self-organizing neural network. IEEE International Symposium on Circuits and Systems (ISCAS'95), (pp. 3:1824-1827). Seattle, WA, USA.
Al-yousefi, H. and Udpa, S.S. (1992). Recognition of Arabic characters. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 8 (14), 853-857.
Farah, M.G., Rygh, J.H., Steen, T.W., Selmer, R., Heldal, E. & Bjune, G. (2006). Patient and health care system delays in the start of tuberculosis treatment in Norway. BMC Infectious Diseases 6, 1186/1471-2334-6-33.
Amin, A. and Al-Sadoun, H. (1992). A new segmentation technique of Arabic text. 11th International Conference on Pattern Recognition: Methodology and Systems (ICPR'92). 2, pp. 441-445. The Hague, Netherlands: 30 Aug-3 Sep.
Freund, R. (1992). Syntactic analysis of handwritten characters by quasi-regular programmed array grammars, in Advances in Structural and Syntactic Pattern Recognition, H. Bunke. pp. 310-319.
Amin, A. (1985). Arabic handwritten recognition and understanding. pp. 1-40.
G. Kim and V. Govindaraju. (1997). A lexicon driven approach to handwritten word recognition for real-time applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19 (4), 366-379.
Amin, A. (1987). IRAC: Recognition and understanding systems in Applied Arabic linguistic and Signal and Information Processing. pp. 159-170.
Gildas Menier, Guy Lorette, Philippe Gentric. (1994). A Genetic Algorithm for On-line Cursive Handwriting Recognition. IEEE, 1051-4651/94.
Amin, A. (1993). Issue on Arabic character recognition. Arabian J. Sci. Engng, 319-341.
Guyon, I. (1991). Application of neural network to character recognition, in Character and Handwriting Recognition in Expanding Frontiers. P. S. P. Wang, pp. 353-382.
Amin, A. (1982). Machine recognition of handwritten Arabic word by the IRAC II system. Proc. 6th Int. Conf. on Pattern Recognition, pp. 34-36. Munich, Germany.
Hashemi, M.R., Fatemi, O. and Safavi, R. (1995). Persian cursive script recognition. 3rd International Conference on Document Analysis and Recognition (ICDAR'95), 2, pp. 869-873. Montreal, Canada.
Amin, A. (1998). Off-line Arabic Character Recognition: The State of the Art. Pattern Recognition, Vol. 31, No. 5, pp. 517-530.
Amin, A. (2003). Recognition of hand-printed characters based on structural description and inductive logic programming. Pattern Recognition Letters, vol. 24, pp. 3187-3196.
http://ar.wikipedia. (2009). Retrieved from http://ar.wikipedia.org/wiki/
Arabic OCR. (2007). Retrieved from http://wiki.arabeyes.org/Arabic_OCR#Optical_Character_Recognition
I. S. I. Abuhaiba, M. J. J. Holt, and S. Datta. (1998). Recognition of Off-Line Cursive Handwriting. Computer Vision and Image Understanding, vol. 71, pp. 19-38.
Auda, G. and Raafat, H. (1993). An automatic text reader using neural networks. Canadian Conference on Electrical and Computer Engineering. 1, pp. 92-95. 14-17 Sep.
J. Kanai and A. D. Bagdanov. (1998). Projection profile based skew estimation algorithm for JBIG compressed images. Int. J. Document Anal. Recognition, 1 (1), 43-51.
Ben Amara, Najoua Essoukri. (2003). Classification of Arabic script using multiple sources of information: State of the art and perspectives. International Journal on Document Analysis and Recognition, 5, 195-212.
J. W. Teh and R. T. Chin. (1988). On image analysis by the methods of moments. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-10, 496-508.
Jambi, K. (1991). Design and implementation of a system for recognizing Arabic handwritten words with learning ability. M.Sc. Thesis. Illinois Institute of Technology.
C. Y. Suen and C. L. Yu. (1990). Performance Assessment of a character recognition Expert System. Int. Expert System application EXPERSYS 90, pp. 195-200.
Kandil, A.H. and El-Baily, A. (2004). Arabic OCR: a centerline independent segmentation technique. International Conference on Electrical, Electronic and Computer Engineering (ICEEC'04) (pp. 412-415). 5-7 Sep.
Cheung, A., Bennamoun, N.W. (2001). An Arabic optical character recognition system using recognition-based segmentation. Pattern Recognition, 34 (2), 215-233.
Mohammed, A. M. (2006). Segmentation of Arabic characters using Voronoi Diagrams. PhD thesis. UKM, Bangi.
Kheder, M.Z. and Abandah, G. (2002). Arabic character recognition using approximate stroke sequence. 3rd International Conference on Language Resources and Evaluation (LREC'02), Workshop on Arabic Language Resources and Evaluation: Status and Prospects. Las Palmas de Gran Canaria, Spain: 1 Jun.
Mostafa, K. and Darwish, A.M. (1999). Robust baseline-independent algorithms for segmentation and reconstruction of Arabic handwritten cursive script. SPIE Proceedings Document Recognition and Retrieval VI. 3651, pp. 73-83. San Jose: Jan.
Mostafa, M. (2004). An adaptive algorithm for the automatic segmentation of printed Arabic text. 17th National Computer Conference, (pp. 437-444). Madinah, Saudi Arabia.
Khorsheed, M. S. (2003). Recognising handwritten Arabic manuscripts using a single hidden Markov model. Pattern Recognition Letters, vol. 24, pp. 2235-2242.
Nawaz, S.N., Sarfraz, M., Zidouri, A. and Al-Khatib, W.G. (2003). An approach to offline Arabic character recognition using neural networks. 10th IEEE International Conference on Electronics, Circuits and Systems (ICECS'03), 3, pp. 1328-1331.
Khorsheed, M.S. and Clocksin, W.F. (1999). Structural features of cursive Arabic script. 10th British Machine Vision Conference (BMV'99). 2, pp. 422-431. University of Nottingham, UK: Sep.
O. Olivier, H. Miled, K. Romeo, and Y. Lecourtier. (1996). Segmentation and coding of Arabic handwritten words. In Proc. 13th International Conference on Pattern Recognition, 3, pp. 264-268.
Koerich, A. S. (2003). Large vocabulary off-line handwriting recognition: a survey. Pattern Anal., 6, 97-121.
L. Likforman-Sulem, H. Maitre and C. Sirait. (1991). An expert and vision system for analysis of Hebrew characters and authentication of manuscript. Pattern Recognition 24, 121-137.
Pernkopf, F. and Bouchaffra, D. (2005). Genetic-based EM algorithm for learning Gaussian mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (8), 1344-1348.
L. Souici-Meslati and M. Sellami. (2004). A Hybrid Approach for Arabic Literal Amounts Recognition. The Arabian Journal for Science and Engineering, vol. 29, pp. 177-194.
Plamondon, R. and Srihari, S. N. (2000). On-line and off-line handwriting recognition: a comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22 (1), 63-84.
L. Souici-Meslati, Mokhtar Sellami. (2006). Toward a generalization of neuro-symbolic recognition: An application to Arabic words. International Journal of Knowledge-Based and Intelligent Engineering Systems, 10 (5), 347-361.
Plamondon, R. (2000). On-Line and Off-line Handwriting Recognition, A Comprehensive Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22:1.
L.S. Oliveira and R. Sabourin. (2003). A Methodology for feature Selection using Multiobjective Genetic Algorithms for Handwritten digit string recognition. International Journal of Pattern Recognition and Artificial Intelligence Vol. 17, No. 6, 903-929.
R. El-Hajj, L. Likforman-Sulem, and C. Mokbel. (2005). Arabic Handwriting Recognition Using Baseline Dependant Features and Hidden Markov Modeling. In Proc. International Conference on Document Analysis and Recognition, (pp. 893-897). Seoul, Korea.
Lorigo, L. G. (2006). Offline Arabic handwriting recognition: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28 (5), 712-724.
R. Safabakhsh and P. Adibi. (2005). Nastaaligh Handwritten Word Recognition Using a Continuous-Density Variable-Duration HMM. The Arabian Journal for Science and Engineering, 30, 95-118.
Lorigo, L.M. and Govindaraju, V. (2006). Off-line Arabic Handwriting Recognition: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 712-724.
Luger, G. (2005). Artificial Intelligence: Structures and Strategies for Complex Problem Solving. (5th ed.) Addison-Wesley.
Ramin Halavati, Saeed Bagheri Shouraki, Saeed Hassanpour. (2006). Evolution Of Multiple States Machines For Recognition Of Online Cursive Handwriting. World Automation Congress (WAC) 2006 (p. 10.1109/WAC). Budapest, Hungary: IEEE.
M. Dehghan, K. F. (2001). Handwritten Farsi (Arabic) word recognition: a holistic approach using discrete HMM. Pattern Recognition, vol. 34, pp. 1057-1065.
S. Al-Emami and M. Usher. (1990). On-line recognition of handwritten Arabic characters. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-12, 704-710.
M. Kherallah, et al. (2008). On-line Arabic handwriting recognition system based on visual encoding and genetic algorithm. Engineering Applications of Artificial Intelligence, doi:10.1016/j.
S. Alma’adeed, C. Higgens, and D. Elliman. (2002). Recognition of off-line handwritten Arabic words using hidden Markov model approach. In Proc. 16th International Conference on Pattern Recognition, vol. 3, pp. 481-484.
Madhvanath, S. G. (2001). The role of holistic paradigms in handwritten word recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23 (2), 149-164.
S. Snoussi Maddouri, H. A. (2002). Combination of Local and Global Vision Modeling for Arabic Handwritten Words Recognition. In Proc. International Conference on Frontiers in Handwriting Recognition, (pp. 128-135).
Margner, V. (1992). SARAT - A system for the recognition of Arabic printed text. 11th IAPR International Conference on Pattern Recognition: Methodology and Systems (ICPR'92). 2, pp. 561-564. The Hague, Netherlands: 30 Aug-3 Sep.
S. Alma’adeed, C. Higgens, and D. Elliman. (2004). Off-line recognition of handwritten Arabic words using multiple hidden Markov models. Knowledge-Based Systems, vol. 17, pp. 75-79.
S.J. Raudys and A. Jain. (1991). Small sample size effect in statistical pattern recognition. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-1, 252-264.
Maroy, M. B. (1979). Learning in syntactic recognition of symbols drawn on a graphic tablet. 166-182.
Sarfraz, M., Nawaz, S.N. and Al-khuraidly, A. (2003). Offline Arabic text recognition system. International Conference on Geometric Modeling and Graphics (GMAG'03), (pp. 30-36). London, England.
Sari, T., Souici, L. and Sellami, M. (2002). Off-line handwritten Arabic character segmentation and recognition system: ACSA. 8th International Workshop on Frontiers in Handwriting Recognition (IWFHR'8), (pp. 452-457). Niagara-on-the-Lake, CA, USA.
Schneider, G., Wersing, H., Sendhoff, B., et al. (2005). Evolutionary optimization of a hierarchical object recognition model. Man and Cybernetics B: Cybernetics 35 (3), 426-437.
Shashank Mathur, Vaibhav Aggarwal, Himanshu Joshi, Anil Ahlawat. (2008). Offline Handwriting Recognition Using Genetic Algorithm. Sixth International Conference on Information Research and Applications. Varna, Bulgaria: IBS-02-p03.
S-W. Lee and Y-J. Kim. (1995). A new type of recurrent neural network for handwritten character recognition. Proc. 3rd Int. Conf. on Document Analysis and Recognition, pp. 38-41. Canada.
T. Matsunage and H. Kida. (1995). An experimental study of learning curves for statistical pattern classifiers. Proc. 3rd Int. Conf. on Document Analysis and Recognition, pp. 1103-1106. Canada.
T. S. El-Sheikh and S. G. El-Taweel. (1990). Real-time Arabic handwritten character recognition. Pattern Recognition, 13 (12), 1323-1332.
Timsari, B. & Fahimi, H. (1996). Morphological approach to character recognition in machine-printed Persian words. SPIE Document Recognition III. San Jose, CA.
Vinciarelli, A. (2002). A survey on off-line cursive word recognition. Pattern Recognition, 35, 1433-1446.
Vinciarelli, A. B. (2004). Offline recognition of unconstrained handwritten texts using HMMs and statistical language models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26 (6), 709-720.
wikipedia. (2009). Retrieved from http://www.wikipedia.org/
Yuan-Kai Wang and Kuo-Chin Fan. (1996). Applying Genetic Algorithms on Pattern Recognition: an Analysis and Survey. Proceedings of ICPR'96, IEEE.
Zheng, L., Hassin, A.H. and Tang, Z. (2004). A new algorithm for machine printed Arabic character segmentation. Pattern Recognition Letters, 15 (25), 1723-1729.
Zidouri, A. (2004). ORAN - Offline recognition of Arabic characters and numerals. International Symposium on Intelligent Multimedia, Video and Speech Processing. (pp. 703-706). Hong Kong: 20-22 Oct.
Zidouri, A. S. (2005). Adaptive dissection-based subword segmentation of printed Arabic text. 9th International Conference on Information Visualisation (pp. 239-243). Premises of the University of Greenwich: 6-8 Jul.