Proceedings: National Workshop on IT Services and Applications (WITSA2003) Feb 27-28, 2003

Skew Estimation of Binary Document Images Using Static and Dynamic Thresholds Useful for Document Image Mosaicing

*P. Shivakumara, G. Hemantha Kumar, D. S. Guru, P. Nagabhushan
Department of Studies in Computer Science, University of Mysore, Manasagangotri

This paper presents a computationally efficient procedure for skew detection in digitized text documents based on Linear Regression Analysis. Determining the skew angle of a text document is essential in Optical Character Recognition (OCR) systems and in Document Image Mosaicing (DIM). We use the linear regression formula to estimate a skew angle for each text line segment of the given skewed document. A part of each text line is extracted using static and dynamic thresholds in a projection-profile-based method. The proposed method has been tested on a variety of text documents and gives accurate results.

1. INTRODUCTION

Document image processing has become an increasingly important technology in realising the paperless office. Automatic document scanners such as text readers and Optical Character Recognition (OCR) systems are essential components of systems capable of such tasks. One of the problems in this field is that the document to be read is not always placed correctly on a flatbed scanner. The document may be skewed on the scanner bed, resulting in a skewed image. This skew has a detrimental effect on document analysis, document understanding, document image mosaicing, character segmentation and recognition. Consequently, detecting the skew of a document image and correcting it are important issues in realising a practical document reader [11].

Most OCR systems are very sensitive to skew in text document images. OCR methods cannot achieve one hundred percent accuracy because of inaccuracies in skew angle detection and in noise removal [11]. Even a skew as small as one degree in the given text document image can cause the segmentation of complete characters from words to fail, since the apparent space between characters, words and text lines is reduced.
Similarly, Document Image Mosaicing methods fail to obtain a mosaiced image from its split images when one of the split images is skewed. This is because the methods are based on the Pattern Matching Approach (PMA). The PMA works on the Strings of Column Sums (SCS) generated from the split images. The SCS is the string of sums of the pixel values in each column of a split image. When the split images are skewed by different angles, there is no match between their SCS, and mosaicing becomes impossible [1-4]. Hence, accurate estimation of the skew angle, rather than mere detection of skew, is important in both DIM and OCR. Excessive skew can be prevented by a human operator, but mild skew (±10°) is inevitable since the human visual system fails to notice it in a skewed document [22].

Several methods have been developed by many researchers for estimating the skew angle of a skewed text document. They are based on i) Projection Profile, ii) Hough Transform, iii) Fourier Transform, iv) Nearest Neighbour Clustering and v) Interline Cross Correlation.

In the Projection Profile method, a series of projection profiles is obtained at a number of angles close to the expected orientation, and the variation is calculated for each profile. The profile with the maximum variation corresponds to the projection best aligned with the text lines, and that projection angle is taken as the skew angle of the document. Baird proposed this method and states that the skew angle should be limited to ±15° to achieve high accuracy [5]. The accuracy depends on the angular resolution of the projection profile. Moreover, the method is time consuming, and its accuracy drops when the documents are noisy and contain character fragments.

Nakano et al. (1990), Srihari and Govindaraju (1989) and Hinds et al. (1990) proposed skew detection techniques based on the Hough Transform (HT) [10], [14], [9]. The HT maps each point in the original (x, y) plane to all points in the (ρ, θ) Hough space that represent lines through (x, y), where θ is the angle of the normal to the line and ρ is the distance of the line from the origin, with ρ = x cos θ + y sin θ for 0 ≤ θ < π. The peak in the Hough space represents the dominant line and its skew. The major drawback of this method is that it is computationally expensive, and it is difficult to choose a peak in the Hough space when the text is sparse.
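As an illustration of the Hough mapping described above, the following minimal Python sketch accumulates votes in a quantised (ρ, θ) space and reads the skew from the strongest cell. The function name `hough_skew`, the parameters `angle_step_deg` and `rho_step`, and the dictionary used as accumulator are assumptions made for this example; they are not taken from the cited HT-based methods.

```python
import math

def hough_skew(points, angle_step_deg=0.5, rho_step=1.0):
    """Return the skew (degrees) of the dominant line through `points`.

    Every point votes for all cells (theta, rho) with
    rho = x*cos(theta) + y*sin(theta), 0 <= theta < pi.
    """
    votes = {}
    n_angles = int(round(180.0 / angle_step_deg))
    for k in range(n_angles):
        theta = math.radians(k * angle_step_deg)
        c, s = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho_bin = int(round((x * c + y * s) / rho_step))
            votes[(k, rho_bin)] = votes.get((k, rho_bin), 0) + 1
    # The accumulator peak is the dominant line.
    (best_k, _), _ = max(votes.items(), key=lambda item: item[1])
    theta_deg = best_k * angle_step_deg
    # theta is the angle of the normal, so the line itself runs at theta - 90.
    return theta_deg - 90.0

# Hypothetical sample: 200 points on a line inclined by 10 degrees.
slope = math.tan(math.radians(10.0))
sample_points = [(x, slope * x + 5.0) for x in range(200)]
estimated = hough_skew(sample_points)
```

The nested loop visits every point once per candidate angle, which shows where the computational cost mentioned above comes from: the work grows with the number of black pixels multiplied by the number of angle steps.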
Postl (1986) proposed a method based on the Fourier Transform (FT). In this method, the direction for which the density of the Fourier space is the largest gives the skew angle. The Fourier method is computationally expensive for large images [12].

A bottom-up technique for skew estimation based on Nearest Neighbour Clustering (NNC) is described by Hashizume et al. (1986). In this work, the nearest neighbours of all connected components are found, the direction vectors of all nearest-neighbour pairs are accumulated in a histogram, and the histogram peak gives the skew angle. Since only one nearest-neighbour connection is made for each component, connections with noisy subparts of characters reduce the accuracy of the method [18].

Yan (1993) introduced a method for detecting the skew angle of an image using the cross correlation between lines at a fixed distance. It is based on the observation that, for a skewed document, the correlation between vertical lines in an image is maximized when one line is shifted relative to the other so that the character baseline levels of the two lines coincide. This method is computationally expensive and also gives lower accuracy [15].

An approach based on NNC was proposed by Shivakumara et al. (2001). The method fixes the boundary of each character in the text lines using a contour-following technique [7], [8]. The direction vector of the text line with respect to the horizontal axis is obtained by allowing the boundary of the character to grow until it reaches a pixel of a neighbouring character. The direction of the growing boundary follows that of the text line as long as the space between successive characters of a line is smaller than the space between characters belonging to two different lines. However, in practice, upper-case letters and dots in a text line may reduce the space between two successive text lines so that it becomes smaller than the space between the current character and its neighbour, which affects the direction of the growing boundary. Nevertheless, this method is computationally inexpensive compared to HT- and FT-based methods [20].

__________________________________________________________________________________ Jamia Millia Islamia (A Central University), New Delhi-110025 India
Recently, the same authors presented a simple and efficient method to estimate the skew angle of a skewed text document based on Linear Regression Analysis (LRA). The method considers all the black pixels present in the document without segmenting individual text lines. It gives good accuracy up to ±10° for small documents. However, the sums computed by the formula become very large, and handling them in variables of limited size reduces the accuracy of the method. In addition, the method becomes unreliable when the document contains a large number of text lines and is skewed by more than ±10°. This problem can be overcome by segmenting the text lines from the skewed text document [21].

The above literature shows that some methods are accurate but computationally expensive, while others are computationally inexpensive but less accurate. Hence there is considerable scope for developing methods that find an accurate skew angle for a skewed document at minimum expense.

In this paper, we propose a simple and computationally efficient algorithm to estimate the skew angle of a scanned document image using the linear regression formula. The method uses static and dynamic thresholds to segment the text lines from the skewed text document. The method works even if the documents are skewed by 20 or 30 degrees. However, it assumes that there is space between adjacent text lines. The paper is organized into 4 sections. The proposed methodology is described in Section 2. The results and the comparative study are given in Section 3, and Section 4 concludes the paper.

2. PROPOSED METHODOLOGY

In this section, we propose simple and efficient methods to estimate the skew angle of a skewed document based on the regression line. We determine the line of best fit y = A + Bx, where the coefficients A and B are obtained by least squares and the slope B is given by the following formula [23]:

$$B = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

Here (x_i, y_i) is a point from the set of samples {(x_i, y_i) : i = 1, 2, ..., n}, where x_i and y_i are the coordinate values of the black pixels of the segmented text line and n is the number of black pixels in that text line. We find the slope of the text line by substituting these coordinate values into the formula for B. The skew angle of the text line is then estimated as θ = tan⁻¹(B). The methods assume that there is space between the text lines in the document. The rest of this section is divided into two subsections: skew estimation using a static threshold and skew estimation using a dynamic threshold. In the first method, we split the given skewed text document vertically until all the text lines present in the document are segmented. In the second method, the threshold varies dynamically from one text line to another.
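The slope formula and the conversion to an angle can be written out as a short Python sketch. The function names `regression_slope` and `skew_angle_degrees` and the sample pixel list are hypothetical, chosen only for this illustration.

```python
import math

def regression_slope(points):
    """Slope B of the least-squares line y = A + B*x through (x, y) pixels."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:
        # Happens for an empty list or when all pixels share one x value.
        raise ValueError("slope is undefined for these pixels")
    return (n * sum_xy - sum_x * sum_y) / denominator

def skew_angle_degrees(points):
    """Skew angle theta = arctan(B), converted to degrees."""
    return math.degrees(math.atan(regression_slope(points)))

# Hypothetical pixels lying exactly on y = 2 + 0.25 * x.
pixels = [(x, 2 + x // 4) for x in range(0, 40, 4)]
slope = regression_slope(pixels)      # 0.25
angle = skew_angle_degrees(pixels)    # about 14.04 degrees
```

With integer pixel coordinates, Python integers do not overflow, so the large intermediate sums that limited the earlier whole-document LRA method are not a concern in this sketch. In image coordinates the row index usually grows downward, so the sign of the angle depends on the chosen axis convention.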

2.1 Skew Estimation using Fixed Threshold

In this section, we present an algorithm for segmenting parts of text lines based on projection profiles. The profiles are generated for the given skewed text document using a static threshold. If the document is not skewed, the profiles show valleys and peaks. The valleys indicate a clear separation between the text lines. The projection profiles of the document are generated from the String of Row Sums (SRS) of the image. The SRS is the string of sums of the pixel values in each row of the image. Valleys appear in the profiles only where the SRS contains zeros. A part of each text line is extracted with the help of such valleys. A part of the text line is sufficient to estimate the skew angle of the document, because the text lines, words and characters all share the same orientation whether or not the document is skewed.

If the document is skewed, the profiles of the whole document show no valleys and peaks. In such cases, we divide the document vertically, using a fixed threshold value, until valleys appear in the profiles. At some stage we get valleys in the profiles for each text line, which indicates that there is space between the text lines. This is depicted in Fig. 1. Using the linear regression formula, the method computes the skew angle of every text line by considering all black pixels of the segmented part of that line. The average of all these skew angles gives the skew angle of the document. The part of the text line that can be segmented shrinks as the skew angle of the document increases. For this reason the method loses accuracy when the document is skewed by about 30° to 40°.
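The role of the SRS and its zero entries can be illustrated with a few lines of Python. This is a minimal sketch: the binary image is assumed to be a list of rows with 1 for black and 0 for white, and the names `string_of_row_sums` and `text_bands` are hypothetical.

```python
def string_of_row_sums(image):
    """SRS: the sum of the pixel values in each row of a binary image."""
    return [sum(row) for row in image]

def text_bands(row_sums):
    """Return (first_row, last_row) for every run of non-zero row sums.

    Zero entries of the SRS are the valleys that separate the bands.
    """
    bands = []
    start = None
    for r, total in enumerate(row_sums):
        if total > 0 and start is None:
            start = r                      # a band begins
        elif total == 0 and start is not None:
            bands.append((start, r - 1))   # a valley closes the band
            start = None
    if start is not None:                  # band touching the bottom edge
        bands.append((start, len(row_sums) - 1))
    return bands

# Hypothetical 7 x 6 image with two text bands.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
srs = string_of_row_sums(image)   # [0, 5, 4, 0, 0, 5, 0]
bands = text_bands(srs)           # [(1, 2), (5, 5)]
```

If the SRS contains no zeros between two text lines, `text_bands` returns them as one merged band, which corresponds to the case where the profile shows no valley.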

Fig. 1. The skewed document (left) and the text lines segmented using the static threshold (right)

Static threshold begins
Input: skewed document
Output: skew angle
Method:
Step 1: Get the skewed binary image
Step 2: Compute the row sum
Step 3: Scan the image column-wise until the row sum is equal to zero. Once the row sum becomes equal to one, take the previous column value as the threshold value
Step 4: Apply the threshold to the whole document
Step 5: Apply the least-squares fit to each text line and compute the slope using the formula
        B = (n ∑ x_i y_i - ∑ x_i ∑ y_i) / (n ∑ x_i² - (∑ x_i)²)
        where (x_i, y_i) are the coordinate values of a black pixel and n is the number of black pixels present in the text line
Step 6: Compute the skew angle using the formula θ = tan⁻¹(B)
Step 7: Repeat steps 5 and 6 for all the text lines
Step 8: Compute the average of the skew angles of all the text lines
Step 9: Stop
Static threshold ends
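The steps above can be sketched in Python as follows. This is only one possible reading of the procedure: the threshold is passed in as a fixed strip width `strip_width` measured from the left edge instead of being found by column-wise scanning, and `make_skewed_page` is a hypothetical helper that draws solid synthetic text lines with a known skew. None of these names come from the original work.

```python
import math

def make_skewed_page(height, width, angle_deg, pitch, thickness):
    """Hypothetical test helper: solid text lines with a known skew."""
    t = math.tan(math.radians(angle_deg))
    return [[1 if (y - x * t) % pitch < thickness else 0 for x in range(width)]
            for y in range(height)]

def regression_slope(points):
    """Slope B of the least-squares line y = A + B*x."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)

def static_threshold_skew(image, strip_width):
    """Average per-line skew (degrees) inside the leftmost strip_width columns."""
    strip = [row[:strip_width] for row in image]
    row_sums = [sum(row) for row in strip]          # String of Row Sums
    angles, band = [], []
    for r, total in enumerate(row_sums + [0]):      # trailing 0 closes the last band
        if total > 0:
            # Collect (x, y) of the black pixels of the current text line part.
            band.extend((c, r) for c, value in enumerate(strip[r]) if value)
        elif band:
            if len({x for x, _ in band}) > 1:       # slope needs two distinct x
                angles.append(math.degrees(math.atan(regression_slope(band))))
            band = []
    if not angles:
        raise ValueError("no text line could be segmented in the strip")
    return sum(angles) / len(angles)

page = make_skewed_page(height=200, width=200, angle_deg=5.0, pitch=20, thickness=6)
estimate = static_threshold_skew(page, strip_width=60)
```

On this synthetic page the full-width profile has no zero rows, while the 60-column strip still separates every line, so each band feeds one regression. The estimate is close to, but not exactly, 5 degrees because the pixel grid quantises the line positions.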

2.2 Skew Estimation using Dynamic Threshold

This method is similar to the above procedure except in the way the threshold value is fixed. The threshold value is not fixed once but varies dynamically depending on the text line. As soon as valleys appear in the profiles, we segment the text line and estimate its skew angle using the linear regression formula (refer to Fig. 2). The linear regression formula uses all black pixels of the segmented text line to estimate the skew angle. This procedure is repeated for all text lines. Finally, the average of all these skew angles gives the skew angle of the document. Fixing the threshold dynamically results in higher accuracy.
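The dependence of the usable threshold on the skew can be made explicit with a small idealised model. The model is an assumption added here for illustration and is not part of the original derivation: text lines are treated as solid parallel bands of thickness h separated by white gaps of width g (both measured perpendicular to the text), the page is skewed by θ, and w denotes the width of the vertical strip in which the row sums are computed.

$$
\text{vertical extent of one line inside the strip} = \frac{h}{\cos\theta} + w\tan|\theta|,
\qquad
\text{vertical pitch of the lines} = \frac{h+g}{\cos\theta}
$$

A zero row sum can exist between two neighbouring lines only if the extent is smaller than the pitch:

$$
\frac{h}{\cos\theta} + w\tan|\theta| < \frac{h+g}{\cos\theta}
\;\Longleftrightarrow\;
w\,\sin|\theta| < g
\;\Longleftrightarrow\;
w < \frac{g}{\sin|\theta|}
$$

Under this model a skew of 3° allows strips up to about 19g wide, whereas a skew of 30° allows only 2g, which agrees with the observation that the segmented part of a text line shrinks as the skew grows. Since g can differ from one text line to the next, choosing w separately for each line lets every line use the widest strip its own gap permits. This is one plausible reading of why the dynamic threshold gives higher accuracy.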

Fig. 2. The skewed document (left) and the text lines segmented using the dynamic threshold (right)

Dynamic threshold begins
Input: skewed document
Output: skew angle
Method:
Step 1: Get the skewed binary image
Step 2: Compute the row sum
Step 3: Scan the image column-wise until the row sum is equal to zero. Once the row sum becomes equal to one, take the previous column value as the threshold value
Step 4: Fix the threshold value dynamically, depending on the text line
Step 5: Apply the least-squares fit to the segmented text line and compute the slope using the formula
        B = (n ∑ x_i y_i - ∑ x_i ∑ y_i) / (n ∑ x_i² - (∑ x_i)²)
        where (x_i, y_i) are the coordinate values of a black pixel and n is the number of black pixels present in the text line
Step 6: Compute the skew angle using the formula θ = tan⁻¹(B)
Step 7: Repeat steps 5 and 6 until all text lines are processed
Step 8: The average of the skew angles of all text lines gives the skew angle of the document
Step 9: Stop
Dynamic threshold ends

3. EXPERIMENTATION AND COMPARATIVE STUDY

For the experimentation we considered more than a hundred document images from different books, magazines and journals. A few of them are presented here. The documents were tested at pre-specified angles varying between 0° and 30°. This angle is considered the true skew angle. The documents were subjected to both the static and the dynamic threshold methods. Table 1 gives the mean (M) and standard deviation (SD) of the estimates of the two proposed methods, where E represents the static threshold method and F represents the dynamic threshold method. From the table it is found that the methods give high accuracy for documents skewed by up to 10 degrees, but they fail to maintain a constant standard deviation for the documents skewed by 20 and 30 degrees. Nevertheless, the methods estimate skew angles up to ±30 degrees. Both methods work well and give better results than the conventional methods (A, B, C and D).

Table 1. Comparative study of proposed methods: mean (M) and standard deviation (SD) of the estimated skew angle obtained from different methods (for each true skew angle 20 images were tested)

Mean (M) of the estimated skew angle, in degrees:

True angle   A       B       C       D       E       F
30           29.48   29.40   30.16   29.85   29.69   29.43
20           19.91   19.25   19.66   19.52   20.8    20.6
10           10.21   10.67   10.05   9.96    9.9     9.99
5            4.71    5.24    5.18    5.03    5.00    5.02
3            3.72    2.80    2.88    3.41    3.04    3.05

Standard deviation (SD) values per method, in the order in which they appear in the source (the true angle each value belongs to is not recoverable from the extracted layout):

A: 0.73, 0.07, 0.15, 0.77, 0.55
B: 0.96, 0.84, 0.53, 0.68, 1.13
C: 0.31, 0.21, 0.22, 0.43, 0.36
D: 0.37, 0.23, 0.33, 0.26, 0.34
E: 1.24, 0.45, 0.28, 0.30, 1.22
F: 1.24, 0.49, 1.22, 0.34, 0.33

A: Hough Transform over the total image
B: Hough Transform on pixels selected by the Li et al. (1994) method
C: Hough Transform on pixels selected by the Pal and Chaudhuri (1996) method, i.e. pixels of L1 and L2
D: Quick method proposed by Pal and Chaudhuri (1996)
E: Method based on static threshold (first proposed method)
F: Method based on dynamic threshold (second proposed method)

Note that the entries for A to D in Table 1 are taken from the paper "An improved document skew angle estimation technique" by Pal and Chaudhuri (1996) [11] for the comparative study.
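The M and SD figures reported per true angle can be computed from a set of per-image estimates as in the following Python sketch. The sample estimates are hypothetical and are not data from the experiments, and the use of the population standard deviation is an assumption, since the paper does not state which form of SD was used.

```python
import statistics

def summarise(estimates):
    """Return (mean, population standard deviation) of skew estimates."""
    return statistics.mean(estimates), statistics.pstdev(estimates)

# Hypothetical estimates for five images rotated by a true angle of 10 degrees.
estimates_at_10 = [9.8, 10.1, 10.0, 9.9, 10.2]
mean_10, sd_10 = summarise(estimates_at_10)
```

Replacing `statistics.pstdev` with `statistics.stdev` would give the sample standard deviation instead, which is slightly larger for small sets of images.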

Typical results are shown in the above table. Methods that involve the HT or FT to estimate the skew angle of a document are computationally expensive, although they give high accuracy. The proposed methods are simple and computationally inexpensive because they do not involve the HT or FT. In addition, the proposed methods do not label connected components, as is done in HT-, FT- and NNC-based methods. However, the accuracy of the proposed methods decreases as the skew of the document increases, because the methods cannot segment full text lines from the skewed text document at larger skew angles. This problem could be overcome with the help of one more algorithm, which is beyond the scope of this paper.

4. CONCLUSION

In this paper, we discussed a novel method to estimate the skew angle of a skewed document. The approach is based on fixing the threshold statically and dynamically in order to separate the text lines. The method proceeds with the assumption that there is space between text lines. The methods give accurate results for skew up to ±30°, and the skew is computed by considering all the text lines in the document. The approach works for images of any size. Its major disadvantage is that it loses accuracy for documents skewed by more than 30 degrees. This is because neither the dynamic nor the fixed threshold method can separate the text lines once the white space between the lines vanishes.

ACKNOWLEDGEMENT

The authors acknowledge the support extended by the students Kalashree and Rashmi of the Department of Studies in Computer Science, University of Mysore, Manasagangotri, Mysore-570006. Mr. P. Shivakumara wishes to thank the fellowship sponsoring agency AICTE, New Delhi, vide sanction number F.No/8020/RID/R&D – 50.2001-01, for supporting this work.

REFERENCES

1. Shivakumara et al., Document Image Mosaicing: A Novel Technique Based on Pattern Matching Approach. Proceedings of the National Conference on Recent Trends in Advanced Computing (NCRTAC-2001), Tamil Nadu, Feb 9-10, 2001, pp 01-08.
2. Shivakumara et al., Pattern Matching Approach based Image Sequencing useful for Document Image Mosaicing. Proceedings of the National Conference on Document Analysis and Recognition (NCDAR-2001), Mandya, Karnataka, July 13-14, 2001.
3. Shivakumara et al., Mosaicing of Color Documents: A Technique based on Pattern Matching Approach. Proceedings of National Conference on NCCIT, Kilakarai, Tamil Nadu, September 24-25, 2001, pp 69-74.
4. Shivakumara et al., Mosaicing of Scrolled Split Images Based on Pattern Matching Approach. Proceedings of Third National Conference on Recent Trends in Advanced Computing (NCRTAC-2002), Tamil Nadu, Feb 13-15, 2002.
5. Baird, H. S., The skew angle of printed documents. In Proc. SPSE 40th Symp. Hybrid Imaging Systems, Rochester, NY, May 1987, pp 739-743.
6. Changming Sun and Deyi Si, Skew and Slant Correction for Document Images using Gradient Direction, CSIRO Math and Information Science, Locked Bag 17, North Ryde, Australia, 1999.
7. Duda and Hart, Pattern Classification and Scene Analysis, Wiley-Interscience, 1973.
8. Gonzalez et al., Digital Image Processing, Addison-Wesley Publishing Company, 2000.
9. Hinds, S. C. et al., A document skew detection method using run-length encoding and the Hough transform. In Proc. International Conference on Pattern Recognition, Volume I, 1990, pp 464-468.
10. Nakano, Y. et al., An algorithm for the skew normalization of document image. In Proc. International Conference on Pattern Recognition, Volume II, 1990, pp 8-13.
11. Pal, U. and Chaudhuri, B. B., An improved document skew angle estimation technique, Pattern Recognition Letters 17, Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India, 1996, pp 899-904.
12. Postl, W., Detection of linear oblique structure and skew scan in digitized documents. In Proc. International Conference on Pattern Recognition, 1986, pp 687-689.
13. Spitz, L., Skew determination in CCITT Group 4 compressed document images. In Proc. Symp. Document Analysis and Information Retrieval, Las Vegas, Nevada, USA, 1992, pp 11-25.
14. Srihari, S. N. and Govindaraju, V., Analysis of textual images using the Hough transform. Machine Vision and Applications 2, 1989, pp 141-153.
15. Yan, H., Skew correction of document images using interline cross-correlation, Computer Vision, Graphics, and Image Processing 55, 1993, pp 538-543.
16. Amin, A. and Fischer, S., A Document Skew Detection Method Using the Hough Transform, Pattern Analysis and Applications 3, Springer-Verlag London Limited, 2000, pp 243-253.
17. Huiye Ma and Zhenwei Yu, An Enhanced Skew Angle Estimation Technique for Binary Document Images, Beijing Graduate School of China University of Mining and Technology, Beijing, China, 1999.
18. Hashizume et al., A method of detecting the orientation of aligned components. Pattern Recognition Letters 4, 1986, pp 125-132.
19. Avanindra and Subhasis Chaudhuri, Robust Detection of Skew in Document Images, IEEE Transactions on Image Processing, Vol. 6, No. 2, February 1997.
20. Shivakumara, P. et al., Text-Skew Detection Through Contour Following in Document Image, Proceedings of National Workshop on Computer Vision, Graphics and Image Processing (WVGIP 2002), February 15-16, 2002, pp 39-44.
21. Shivakumara, P. et al., Skew Detection in Binary Document Image using Linear Regression Analysis. Proceedings of National Conference on Advanced Computer Application (NCAC-2002), NGM College, Pollachi, Tamil Nadu, October 11-12, 2002, pp 41-46.
22. Caprari, Robert S., Algorithm for Text Page Up/Down Orientation Determination, Pattern Recognition Letters 21, 2000, pp 311-317.
23. Kishor S. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications, Prentice-Hall of India Private Limited, New Delhi, 1988.