Estimation of the Text Skew in the Old Printed Documents

Darko Brodic, Čedomir A. Maluckov, Liangrui Peng

Abstract


Old printed documents are a significant part of our heritage, and digitization is indispensable for preserving them. This paper proposes a robust skew estimation method for old printed documents. It is based on connected components formed by filled convex hulls around the text elements. The connected components are enlarged by an oriented morphological operation, and the longest connected component is then extracted. Its orientation is taken as the global orientation of the document, and the document image is globally de-skewed accordingly. The algorithm is tested on synthetic and real datasets, and the results obtained confirm its correctness.
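The last two steps of the abstract (extracting the longest connected component and reading the global skew from its orientation) can be illustrated with a minimal sketch. The abstract does not say how "longest" is measured or how the orientation is computed, so the following are assumptions: "longest" is taken as the largest horizontal extent, and the orientation is estimated from second-order central moments, which the keywords suggest but do not confirm. The function names and the synthetic test bar are hypothetical, and the angle is expressed with the y axis pointing up, so the sign flips in row/column image coordinates.

```python
import math


def orientation_deg(pixels):
    """Principal-axis angle in degrees of a pixel set, from central moments.

    pixels: list of (x, y) tuples. Assumed convention: y axis points up.
    """
    n = len(pixels)
    if n < 2:
        raise ValueError("need at least two pixels")
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)
    # Standard moment formula: theta = 0.5 * atan2(2*mu11, mu20 - mu02)
    return math.degrees(0.5 * math.atan2(2.0 * mu11, mu20 - mu02))


def longest_component(components):
    """Pick the component with the largest horizontal extent (an assumption)."""
    return max(components,
               key=lambda c: max(x for x, _ in c) - min(x for x, _ in c))


def synthetic_bar(length, angle_deg, thickness=3):
    """Hypothetical test data: a thick bar of pixels slanted by angle_deg."""
    slope = math.tan(math.radians(angle_deg))
    return [(x, round(x * slope) + dy)
            for x in range(length)
            for dy in range(thickness)]


if __name__ == "__main__":
    components = [synthetic_bar(40, 5.0), synthetic_bar(300, 5.0)]
    skew = orientation_deg(longest_component(components))
    print(f"estimated skew: {skew:.2f} degrees")
```

De-skewing then amounts to rotating the document image by the negative of the estimated angle. A long component is preferred here because its moments are dominated by the text-line direction rather than by the shapes of individual characters.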


Keywords


convex hull; document image analysis; moment methods; optical character recognition; skew adjustment; vertical projection profiles.




DOI: https://doi.org/10.15837/ijccc.2013.5.377



Copyright (c) 2017 Darko Brodic, Čedomir A. Maluckov, Liangrui Peng

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL (IJCCC), With Emphasis on the Integration of Three Technologies (C & C & C),  ISSN 1841-9836.
