Using top n Recognition Candidates to Categorize On-line Handwritten Documents
Sebastián Peña Saldarriaga (†), Emmanuel Morin (†) and Christian Viard-Gaudin (‡)
{sebastian.pena-saldarriaga, emmanuel.morin, christian.viard-gaudin}@univ-nantes.fr
(†) LINA, Université de Nantes; (‡) IRCCyN, École Polytechnique de l'Université de Nantes
1 Introduction
Archival & retrieval of on-line handwriting
Particular interest for text categorization (TC)
TC attempts to derive information from text
Recognition is a necessary effort
[Figure: noisy text categorization with top 1 recognition; labels: top 1 recognition, recognition errors, SVM]
2 Intuitive Idea
Use top n (n > 1) recognition candidates
Greater probability of having the correct word

n                  1        5        10       15
Word level rec.    22.08%   15.73%   13.41%   12.04%
Char. level rec.   52.48%   37.28%   33.09%   31.02%
Recognition rates for different n

However, the text is flooded with false occurrences of words. We then redefine the standard tf × idf weighting scheme.

3 Weighting top n candidates
The weight w of a term i is given by

\[ w_i = \frac{\mathrm{ctf}(i) \times \mathrm{idf}(i)}{\sqrt{\sum_{j=1}^{M} \left( \mathrm{ctf}(j) \times \mathrm{idf}(j) \right)^2}} \]

& the candidate-term frequency (ctf) by

\[ \mathrm{ctf}(i) = \sum_{n=1}^{N} p_n(i) \]

The sum of the probabilities of i in the N candidate lists in which i occurs

4 Experiments
Comparing kNN & SVM
With word & char. level recognition
With 1 or n recognition candidates
On-line handwritten data: Reuters-21578 corpus, 2,000 samples, 10 categories

5 Results
[Figure: results as a function of n (x-axis, 1 to 10; y-axis ticks from 80% to 90%) for svm/word, kppv/word, svm/char and kppv/char, where kppv denotes kNN]
The good: Char. level rec. Accuracy improves for all n > 1, with both algorithms
The bad: Word level rec. Accuracy decreases as n increases

6 Conclusion
A simple idea that yields interesting results on heavily degraded texts with two different classifiers.
What has been accomplished?
Redefinition of the tf × idf weighting
Based on probabilities of recognition candidates
However, it is ineffective with word level recognition. Needs further experimental validation.
What's next?
Thresholding/rejection strategies on candidates
Track & type recognition errors

Acknowledgments
This work is funded by La Région Pays de la Loire under the MILES Project and by The French National Research Agency grant number ANR-06-TLOG-009.
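
The weighting of top n candidates (candidate-term frequency times idf, normalized over the terms of a document) can be illustrated with a minimal Python sketch. Several points are assumptions and not taken from the poster: idf is computed as log(number of documents / document frequency), terms with no known document frequency are dropped, and all function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def candidate_term_frequency(candidate_lists):
    """Sum, per term, the probabilities it receives across candidate lists.

    Each candidate list holds (term, probability) pairs proposed by the
    recognizer for one handwritten word.
    """
    ctf = defaultdict(float)
    for candidates in candidate_lists:
        for term, prob in candidates:
            ctf[term] += prob
    return dict(ctf)


def term_weights(candidate_lists, doc_freq, n_docs):
    """Return ctf x idf weights, normalized to unit Euclidean length."""
    ctf = candidate_term_frequency(candidate_lists)
    raw = {}
    for term, freq in ctf.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # assumption: out-of-vocabulary candidates are ignored
        # Assumption: idf = log(n_docs / df); the poster does not give its form.
        raw[term] = freq * math.log(n_docs / df)
    norm = math.sqrt(sum(v * v for v in raw.values()))
    if norm == 0.0:
        return {term: 0.0 for term in raw}
    return {term: v / norm for term, v in raw.items()}


# Toy document: three handwritten words, each with its top 2 candidates.
lists = [
    [("oil", 0.7), ("ail", 0.3)],
    [("price", 0.6), ("prize", 0.4)],
    [("oil", 0.5), ("oily", 0.5)],
]
doc_freq = {"oil": 10, "ail": 1, "price": 20, "prize": 5, "oily": 2}
weights = term_weights(lists, doc_freq, n_docs=100)
for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

In this sketch a term proposed in several candidate lists, such as "oil", accumulates probability mass, while a low-probability false candidate contributes only a fraction of an occurrence.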