Dynamic recognition in the omni-writer frame: Application to hand-printed text recognition

Loïc Oudot, Lionel Prevost, Maurice Milgram
Université Paris VI, UFR 924, LISIF / PARC, 4 Place Jussieu, 75252 Paris Cedex 5, France
[email protected], [email protected], [email protected]

Abstract

We present in this contribution a new system dedicated to the analysis of hand-printed dynamic text. It is an alternative both to commercial personal digital assistants (PDAs), which put strong constraints on the user, and to the automatic cursive word recognizers developed in laboratories which, in spite of their accuracy, have still not reached the market. The whole processing chain, from acquisition to actual reading, is integrated in a user-friendly interface. The results on an omni-writer text database are very encouraging. They should be improved by using a new lexicon-driven expert that adapts the recognition to the writer.

1. Introduction

We present here a dynamic handwriting recognition system aimed at the analysis of hand-printed texts, which can contain several lines of structured text. Hand-printed writing is less constraining than most PDA alphabets (Graffiti: figure 1.a): users just have to lift the pen between characters (figure 1.b), which raises fewer segmentation problems than cursive handwriting (figure 1.c). In addition, because we use 62 classes (26 lower-case letters, 26 upper-case letters and 10 digits) and a large dictionary, cursive handwriting recognizers are of limited use here, as they are not robust enough under these conditions.

1.1. Data capture

On-line handwriting recognition systems use a pen tablet digitizer which supplies a sequence of (x, y) coordinates representing the text (the context) to analyze. Data capture peripherals can be classified into two categories, with or without visual feedback under the pen, each having its benefits and drawbacks.

Figure 1. Constrained writing (a), hand-printed writing (b), cursive writing (c).

The first tablet digitizers did not generally include this visual feedback. So, while writing on the tablet, the user had to look at the computer display, and needed a significant time to adapt. Writing distortions could also be observed, especially regarding the baseline of the text (a tendency to write askew), as well as a spatial location problem due to the loss of synchronization between hand and eye movements. This problem could be avoided by putting a sheet of paper onto the tablet and using an ink pen, so that the user got the same sensations as when writing with a usual pen. Nevertheless, he still had to look at the display to edit or correct the text. In the last few years, technical improvements have made it possible to write with a pen directly on the LCD area. In that way, the visual feedback takes place directly under the pen and the user is able to write and operate the system without leaving the screen. A parallax problem still exists, owing to the thickness of the LCD, but since it remains constant, the user adapts easily. The use of a handwriting recognition system becomes very attractive when applied to pen computers (keyboard-less and mouse-less), which gather in the same device the tablet digitizer, the display and the computational power. The pen can easily replace both interfaces (keyboard and mouse) as it combines data acquisition and pointing functions. In the same way, the ability to transcribe handwritten notes would be of great benefit in many applications. The emergence

and fast development of PDAs is evidence of the interest that users have in such capabilities.

1.2. System overview

Text analysis improves letter and word recognition rates by using the richness of the context (text lines, punctuation, shapes...) and the possibility of incorporating a lexicon. A left-to-right structure of experts has been implemented, centered around a hand-printed character classifier: the first experts perform the pre-processing and the following ones the post-processing. All these experts are linked by a hypothesis induction/inference principle (figure 2), which makes it possible to refine the results from one expert to the next. An expert uses the context and the hypotheses emitted by the previous one to emit new hypotheses, which it then validates itself. We think that feedback should be added, as in [1], in order to improve the system accuracy.

Figure 2. Expert synopsis.

The pre-processing works on the sequence of coordinates coming from the tablet digitizer. First, the text is segmented into lines. Then, within a given line, strokes are merged to form characters. The latter are labelled: punctuation marks and diacriticals are put aside while letters are classified. A blank detector allows one to emit word hypotheses, which are compared to the word lexicon. Figure 3 gives a flow diagram of the whole recognition system. The next section is devoted to the pre-processing. We describe the six experts in charge of, respectively, the detection of line jumps, the re-organization of the strokes, the segmentation, the detection of the baselines, the character labelling and finally the blank detection. Section 3 briefly recalls previous work on our dynamic classifiers for isolated character recognition [6]. Section 4 shows the impact of the lexical post-processing on the system accuracy. Finally, section 5 deals with conclusions and future works.

Figure 3. System synopsis.

2. Pre-processing

2.1. Line detection

The text can contain several lines. Therefore, it is necessary to extract the strokes belonging to each line before any further computation. As Latin writing runs from left to right, line jumps can be detected easily: one observes an abrupt pen displacement to the left and downwards (figure 4.a). Other line jumps can appear: when signing a document, the pen displacement is to the right and downwards (figure 4.b). In order to increase the expert's accuracy, we use a two-step hypothesis emission / validation scheme: the first step emits hypothetical line jumps and the second validates and keeps only the real line jumps.

Figure 4. Line jump (a & b). Erroneous jump (c).

Hypothesis emission: all the lateral and downward pen displacements between two consecutive strokes are considered as hypothetical line jumps. The decision thresholds are deduced from the mean width and mean height of the strokes composing the text: any displacement (Δx, Δy) such that |Δx| > θx and Δy > θy is a hypothetical jump, where θx and θy are derived from the mean stroke width and height. Both thresholds are computed considering all the stroke bounding boxes (at this stage of the process, characters are not yet available).

Hypothesis validation: thanks to these line jumps, the strokes belonging to one hypothetical text line can be isolated, allowing the approximation of the baselines. Comparing two consecutive baselines lets one check whether the jump is a real line jump (the lines are distinct: figure 5.a) or not (the lines are intertwined: figure 5.b). At the end of this stage, for lack of further hypotheses, the baselines are approximated by straight lines.
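To make the emission step concrete, here is a minimal Python sketch. It assumes strokes are stored as coordinate lists and takes the mean stroke width and height themselves as thresholds; the paper only states that both thresholds are deduced from the stroke bounding boxes, so the exact proportionality factors are our assumption.

```python
from dataclasses import dataclass

@dataclass
class Stroke:
    xs: list  # pen x coordinates
    ys: list  # pen y coordinates (y grows downwards)

def mean_box_size(strokes):
    """Mean width and height of the stroke bounding boxes."""
    widths = [max(s.xs) - min(s.xs) for s in strokes]
    heights = [max(s.ys) - min(s.ys) for s in strokes]
    return sum(widths) / len(widths), sum(heights) / len(heights)

def hypothetical_jumps(strokes):
    """Emit the indices where a line jump may occur: a large lateral
    and downward pen displacement between two consecutive strokes
    (threshold values are assumptions, see lead-in)."""
    theta_x, theta_y = mean_box_size(strokes)
    jumps = []
    for i in range(1, len(strokes)):
        dx = min(strokes[i].xs) - min(strokes[i - 1].xs)
        dy = min(strokes[i].ys) - min(strokes[i - 1].ys)
        if abs(dx) > theta_x and dy > theta_y:  # lateral and downward
            jumps.append(i)
    return jumps
```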

Figure 5. Validation of line jumps.

2.2. Temporal strokes re-organization

The hand-printed text analysis is done character by character. A diacritical must therefore follow, temporally, the character to which it is linked. However, the writer frequently comes back afterwards to write these signs. To suppress these breaks in the context flow, the signal must be restored so that spatially close strokes appear in chronological order. So, when a backward pen displacement between two consecutive strokes is found, one looks for the best position for the stroke within the line, i.e. the one presenting the best overlap rate with another stroke (figure 6). Once the data flow is correctly ordered, the stroke segmentation can be done.
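A minimal sketch of this re-ordering, assuming each stroke is reduced to its horizontal bounding box (x_min, x_max) and that the overlap rate is measured as the horizontal intersection of the boxes (the paper does not detail the measure):

```python
def x_overlap(a, b):
    """Horizontal overlap between two boxes given as (x_min, x_max)."""
    return min(a[1], b[1]) - max(a[0], b[0])

def reorder_strokes(boxes):
    """Return the stroke indices re-ordered so that a stroke written
    after a backward pen move (e.g. a diacritical added afterwards)
    is re-inserted right after the stroke it overlaps best."""
    order = []
    for i, box in enumerate(boxes):
        if order and box[0] < boxes[order[-1]][0]:  # backward move
            best = max(range(len(order)),
                       key=lambda j: x_overlap(boxes[order[j]], box))
            order.insert(best + 1, i)
        else:
            order.append(i)
    return order

# e.g. the dot of an 'i' written after the whole word:
# reorder_strokes([(0, 2), (3, 5), (6, 8), (3, 4)]) -> [0, 1, 3, 2]
```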

Figure 6. Temporal stroke re-organization.

2.3. Segmentation

Hand-printed characters can contain several strokes. These strokes must be merged in order to form a single character. The segmentation expert is a recursive merging process which groups consecutive strokes whose bounding boxes overlap, according to a set of topological rules described below. The depth of the recursion is four, because the letter containing the greatest number of strokes is the capital E, composed of four strokes. Figure 7 shows the result of the stroke merging. Let us consider two strokes S1 and S2 with bounding boxes (x1, y1, X1, Y1) and (x2, y2, X2, Y2) respectively. Assuming that x1 ≤ x2, the rules of fusion are:

complete cover up: X2 ≤ X1
left offset: x2 ≤ x1 + δ
right offset: X2 ≤ X1 + δ
relative cover up: x2 ≤ X1 − δ

The δ threshold is auto-adaptive and varies along the segmentation phase. The optimal segmentation results are obtained with δ = min(δ1, δ2), where δ1 = X1 − x1 and δ2 = X2 − x2.
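A sketch of the recursive merging in Python, using the fusion rules as reconstructed above (the garbled source makes the exact inequalities uncertain, so both the rule set and the choice of δ = min(δ1, δ2) should be read as assumptions):

```python
def merge_strokes(boxes, depth=4):
    """Recursively merge consecutive bounding boxes that satisfy one
    of the fusion rules; at most `depth` passes, since a character
    has at most four strokes. Boxes are (x_min, y_min, x_max, y_max)."""
    def fuse(b1, b2):
        return (min(b1[0], b2[0]), min(b1[1], b2[1]),
                max(b1[2], b2[2]), max(b1[3], b2[3]))

    def mergeable(b1, b2):
        # auto-adaptive threshold: smallest of the two box widths
        delta = min(b1[2] - b1[0], b2[2] - b2[0])
        return (b2[2] <= b1[2]               # complete cover up
                or b2[0] <= b1[0] + delta    # left offset
                or b2[2] <= b1[2] + delta    # right offset
                or b2[0] <= b1[2] - delta)   # relative cover up

    for _ in range(depth):
        merged = []
        for box in sorted(boxes):            # enforce x1 <= x2
            if merged and mergeable(merged[-1], box):
                merged[-1] = fuse(merged[-1], box)
            else:
                merged.append(box)
        if len(merged) == len(boxes):        # nothing merged: stop
            break
        boxes = merged
    return boxes
```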

Figure 7. Stroke recursive merging.

2.4. Character labelling
At this stage, we have enough information to know whether strokes belong to a letter or whether they are punctuation or accentuation marks. A first labelling is done on the bounding boxes, observing their location with respect to the baselines. Strokes whose box lies above the body line are diacriticals and strokes descending under the baseline are punctuation marks, so it is not necessary to classify them. This stage thus distinguishes three different labellings: letter (to be classified), punctuation (put aside) and diacritical (put aside).
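The three-way labelling can be sketched as follows; the bounding box test against the body line and baseline follows the rule just stated, with y growing downwards (any tolerance handling of the real system is not described, so none is used here):

```python
def label_stroke(y_min, y_max, bodyline_y, baseline_y):
    """Label a merged stroke group from its vertical extent:
    above the body line -> diacritical (put aside); starting below
    the body line and descending under the baseline -> punctuation
    (put aside); otherwise -> letter (to be classified)."""
    if y_max < bodyline_y:                         # above the body line
        return "diacritical"
    if y_min > bodyline_y and y_max > baseline_y:  # under the baseline
        return "punctuation"
    return "letter"
```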

2.5. Baseline computation and letter labelling

Knowing the character position with respect to the baselines gives one more piece of information to infer the character class. Thanks to their shape, characters can be clustered into several subsets:

medium letters: a, e, i
big letters: upper case letters, digits and ascending letters: b, d, h
descending letters: g, j, p
ascending and descending letters: f

Figure 8. Character labelling.

Thanks to this clustering, it is easy to activate or inhibit the corresponding classification experts (see section 3) and to increase the classifier speed and accuracy even more. This leads us to compute the base and body lines as precisely as possible, which is why we have selected broken base and body lines instead of straight lines. First, medium characters are isolated: characters are merely filtered considering the height of their bounding boxes and the mean stroke height (see 2.1). Then, the broken baseline is obtained by linking with straight lines the bottoms of the filtered bounding boxes. In the same way, the broken body line links the tops of the bounding boxes. The remaining characters are flagged as ascending or descending by comparing the location of their bounding box with the baselines.

Figure 9. Broken baseline and bodyline.
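A sketch of the broken-line computation, assuming boxes (x_min, y_min, x_max, y_max) with y growing downwards and a filtering tolerance around the mean height (the tolerance value is our assumption):

```python
def broken_lines(boxes, h_mean, tol=0.25):
    """Filter the 'medium' boxes (height close to the mean height),
    then link their bottoms into a broken baseline and their tops
    into a broken body line. Returns two (x, y) polylines."""
    medium = [b for b in boxes
              if abs((b[3] - b[1]) - h_mean) <= tol * h_mean]
    medium.sort(key=lambda b: b[0])                         # left to right
    baseline = [((b[0] + b[2]) / 2, b[3]) for b in medium]  # bottoms
    bodyline = [((b[0] + b[2]) / 2, b[1]) for b in medium]  # tops
    return baseline, bodyline
```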

2.6. Blank detection

As the post-processing is a lexical correction, it is necessary to isolate words by finding the blanks separating them. In order to do this, we focus on the spaces separating the bounding boxes. Two thresholds are computed: a first one (inter-letter) beneath which the blank is considered as splitting two letters, and a second one (inter-word) above which the blank splits two words. Blanks situated between these two thresholds are hypothetical (figure 10). The ambiguity will be resolved during the lexical post-processing stage.

Figure 10. Blank detection. Inter-letter and inter-word thresholds.
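The gap classification can be sketched as follows (the two threshold values are left as parameters, since the paper does not detail their computation):

```python
def classify_gaps(boxes, t_letter, t_word):
    """Label the horizontal gap between each pair of consecutive
    bounding boxes: below t_letter it splits two letters, above
    t_word it splits two words, in between it is a hypothetical
    blank, to be settled by the lexical post-processing."""
    labels = []
    for left, right in zip(boxes, boxes[1:]):
        gap = right[0] - left[2]   # next x_min minus previous x_max
        if gap < t_letter:
            labels.append("inter-letter")
        elif gap > t_word:
            labels.append("inter-word")
        else:
            labels.append("hypothetical")
    return labels
```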

3. Classification

The classifier uses allograph prototypes (from MDCA2 clustering [6]) and k-nearest-neighbor classification to take its decision. It deals with 62 classes: upper case letters, lower case letters and digits. Thanks to this modelling structure, each class has its own classification expert (figure 11), so it is possible to activate or inhibit a set of non-relevant experts when needed [4]. At the end of the classification process, the three most relevant classes (the so-called top1, top2 and top3) are retained, to be used during the lexical post-processing.

Figure 11. Classifier structure.
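As an illustration of the decision stage, here is a minimal prototype-based nearest-neighbor ranking returning the top3 classes; the feature representation and the MDCA2 prototypes themselves are outside the scope of this sketch:

```python
import math

def top3(sample, prototypes):
    """Rank the classes by the distance from `sample` to their
    nearest prototype and return the three best ones (top1, top2,
    top3). `prototypes` maps a class label to a list of feature
    vectors; `sample` is a feature vector."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    scores = {label: min(dist(sample, p) for p in protos)
              for label, protos in prototypes.items()}
    return sorted(scores, key=scores.get)[:3]
```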

Classes   Prototypes   Test     Top1     Top2     Top3
62        2 944        34 439   96.8 %   99.0 %   99.4 %

Table 1. Classifier recognition rates (Unipen Train R01-V07).

4. Lexical correction

The previous steps provide pieces of information on the classification results, the blanks, the punctuation and the diacriticals. Blanks are used to emit word hypotheses using the top1, top2 and top3 letters. These hypotheses are validated or corrected using a French dictionary of approximately 200 000 words. For an unknown word, a 30-nearest-neighbor search is performed

on the dictionary and we choose the word which minimizes a score computed between the dictionary word and the top3 hypotheses. If a hypothetical blank appears, a straightforward merging rule is implemented: the blank is suppressed if the score (the smallest Euclidean distance between the hypothesis and the lexicon words) of the merged word is less than or equal to the sum of the scores of the separate words. Two consecutive words W1 and W2 will be merged if they verify the relation below:

S(W1W2) ≤ S(W1) + S(W2)
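In code, the merging rule reads as below; score() stands for the smallest distance between a word hypothesis and the lexicon words (a placeholder here, since the actual distance combines the top3 letter hypotheses):

```python
def merge_hypothetical_blank(w1, w2, score):
    """Suppress a hypothetical blank when the merged word matches the
    lexicon at least as well as the two separate words, i.e. when
    S(W1W2) <= S(W1) + S(W2). `score` maps a word hypothesis to its
    smallest distance to the lexicon."""
    if score(w1 + w2) <= score(w1) + score(w2):
        return [w1 + w2]      # blank suppressed: one merged word
    return [w1, w2]           # blank kept: two separate words
```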

5. Conclusion and future works

The system has been tested on a database of about 1 000 words (about 5 000 letters) from 10 different writers. The following results have been obtained: 86 % word recognition and 97.2 % character recognition. Errors are distributed among the different experts as follows: segmentation 1.75 %, activation 2.17 %, classification 3.5 %. The lexical post-processing corrects 63.3 % of the false letters and 77.1 % of the hypothetical blanks. Table 2 shows the results of several cursive word recognizers as a reference. The text recognition results are encouraging; they should be improved by using a writer adaptation expert based on the detection of the writer's morphological invariants [3]. MDCA clustering is applied on all the text characters. Subsets are labelled using the recognition results. The new prototypes are stored in the writer's personal dataset. In that way, the system keeps its writer-independent properties while increasing its writer-dependent accuracy.

Systems                 Words    Writers   Lexicon size   Recognition rate
[Anquetil & al. 97]     7 896    7         6 915          84.0 %
[Kosmala & al. 99]      2 000    3         200 000        90.6 %
[Wimmer & al. 99]       8 781    9         20 200         89.1 %
Oudot & al. 2000        1 054    10        188 795        86.0 %

Table 2. Word recognition rates of several recognition systems.

References

[1] E. Anquetil and G. Lorette. Perceptual model of handwriting drawing application to the handwriting segmentation problem. ICDAR'97, (2), 1997.
[2] M. Côté, E. Lecolinet, M. Cheriet and C. Y. Suen. Automatic reading of cursive scripts using human knowledge. ICDAR'97, pages 107-111, 1997.
[3] A. Kosmala, D. Willett and G. Rigoll. Advanced state clustering for very large vocabulary HMM-based on-line handwriting recognition. ICDAR'99, pages 442-445, 1999.
[4] A. Nosary, L. Heutte, T. Paquet and Y. Lecourtier. Defining writer's invariants to adapt recognition. ICDAR'99, 1999.
[5] L. Prevost and M. Milgram. Automatic allograph selection and multiple expert classification for totally unconstrained handwritten character recognition. ICPR'98, (1), 1998.
[6] L. Prevost and M. Milgram. Reconnaissance automatique de l'écriture scripte en mode omni-scripteur : un premier pas dans la conception d'un analyseur d'équations. CIFED'98, 1998.
[7] L. Prevost and M. Milgram. Modelizing character allographs in omni-scriptor frame: a new non-supervised algorithm. Pattern Recognition Letters, 21(4), 2000.
[8] Z. Wimmer, B. Dorizzi and P. Gallinari. Dictionary pre-selection in a neuro-markovian word recognition system. ICDAR'99, 1999.