Handwritten Documents Text Line Segmentation based on Information Energy

Authors

  • Costin Anton Boiangiu, "Politehnica" University of Bucharest, Romania, 060042 Bucharest
  • Mihai Cristian Tanase, VirtualMetrix Design, Romania, 060104 Bucharest
  • Radu Ioanitescu, "Politehnica" University of Bucharest, Romania, 060042 Bucharest

Keywords:

text line segmentation, text recognition, information energy, OCR

Abstract

Text line segmentation is the first step in the text recognition process: only after text lines are correctly identified can recognition proceed to individual characters. This paper proposes a line segmentation algorithm that computes an information content level, called energy, for each pixel of the image and uses it to drive a seam carving procedure. The resulting text lines follow the text more accurately, at the expected cost of additional computational overhead.
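
The idea of carving a low-energy seam between text lines can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the information-energy measure, so the sketch assumes a plain gradient-magnitude energy as a stand-in, and the function names `gradient_energy` and `min_horizontal_seam` are hypothetical. The seam search is the standard dynamic-programming formulation used in seam carving, applied left to right.

```python
import numpy as np


def gradient_energy(gray):
    """Per-pixel energy as |dI/dx| + |dI/dy|.

    ASSUMPTION: this is a stand-in for the paper's information-energy
    measure, which the abstract does not specify.
    """
    gray = np.asarray(gray, dtype=np.float64)
    dy, dx = np.gradient(gray)  # derivatives along rows (y) and columns (x)
    return np.abs(dx) + np.abs(dy)


def min_horizontal_seam(energy):
    """Return, for each column, the row of the minimum-energy left-to-right seam.

    Rows of neighbouring columns differ by at most one, as in seam carving.
    """
    h, w = energy.shape
    cost = energy.astype(np.float64).copy()
    back = np.zeros((h, w), dtype=np.int64)
    rows = np.arange(h)
    for x in range(1, w):
        prev = cost[:, x - 1]
        # Cumulative cost of arriving from row y-1, y, or y+1 of the previous column.
        up = np.concatenate(([np.inf], prev[:-1]))
        down = np.concatenate((prev[1:], [np.inf]))
        stacked = np.stack((up, prev, down))
        choice = np.argmin(stacked, axis=0)  # 0: from y-1, 1: from y, 2: from y+1
        back[:, x] = rows + choice - 1
        cost[:, x] += stacked[choice, rows]
    # Backtrack from the cheapest end point in the last column.
    seam = np.empty(w, dtype=np.int64)
    seam[-1] = int(np.argmin(cost[:, -1]))
    for x in range(w - 1, 0, -1):
        seam[x - 1] = back[seam[x], x]
    return seam
```

On a toy image with two bands of "ink" separated by blank rows, the returned seam stays inside the blank gap, which is the behaviour a line separator needs. A real handwritten page would require a more informative energy and repeated seam extraction, one seam per gap between lines.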

Author Biography

Costin Anton Boiangiu, "Politehnica" University of Bucharest, Romania, 060042 Bucharest

PhD Eng., Associate Professor, Computer Science Department, Faculty of Automatic Control and Computers

Published

2014-01-03
