Scientific and Technical Journal


ISSN Print 2221-3937
ISSN Online 2221-3805

The purpose of this paper is to develop an intelligent system for text detection and character recognition in photographs and video frames of complex graphic scenes. The system has two main parts, each based on a separate convolutional neural network. Recognizing text in an image was broken into three subproblems: locating text fields in the image, segmenting characters within the text fields, and recognizing the characters. Text field localization follows a two-stage scheme. In the first stage, gradient methods analyze intensity drops in local areas of the color (RGB) image and select the image regions that may contain text. In the second stage, a classifier built on a convolutional neural network with a multi-scale image representation based on the discrete wavelet transform refines these regions. For each pixel of the text fields found in the first stage, it estimates the probability that the pixel belongs to text. After training, the network's text/non-text classification accuracy was 99.3% on the training sample and 77.7% on the control sample. Character segmentation is carried out in three stages: line selection, word segmentation and character segmentation. The quality of the segmented character images is then improved with morphological operations (e.g. removal of noise and of other objects that are not associated with a character or have no borders), and the character borders are refined. The character recognition algorithm is built on a convolutional neural network trained with the error back-propagation algorithm. After training, the character recognition accuracy was 96.88% on the training sample and 93% on the control sample. Experimental verification confirmed that the proposed solutions can detect text and recognize characters in images of complex graphic scenes that contain many non-text objects (e.g. people, fragments of the house …).
The quality of such intelligent text recognition systems can be improved further by applying linguistic correction to the recognized text.
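The first localization stage selects regions with many sharp intensity drops. The abstract does not name the gradient operator, the thresholds or the size of the "local areas", so the following is only a minimal sketch under assumptions: NumPy central differences per RGB channel, the strongest channel kept per pixel, and a hypothetical rule that marks a fixed-size window as a text candidate when the share of strong-gradient pixels in it exceeds a density threshold. The names `candidate_text_mask`, `grad_thresh`, `window` and `density` are illustrative, not taken from the paper.

```python
import numpy as np


def candidate_text_mask(rgb, grad_thresh=40.0, window=8, density=0.15):
    """Stage-1 sketch: mark windows that contain many strong intensity drops.

    grad_thresh, window and density are hypothetical tuning parameters.
    """
    img = rgb.astype(np.float64)
    # Central-difference gradients of every channel along rows and columns.
    gy, gx = np.gradient(img, axis=(0, 1))
    # Gradient magnitude per channel; keep the strongest channel per pixel.
    mag = np.sqrt(gx ** 2 + gy ** 2).max(axis=2)
    edges = mag > grad_thresh
    h, w = edges.shape
    mask = np.zeros((h, w), dtype=bool)
    for r in range(0, h, window):
        for c in range(0, w, window):
            block = edges[r:r + window, c:c + window]
            # Text areas tend to contain dense, high-contrast strokes.
            if block.mean() > density:
                mask[r:r + window, c:c + window] = True
    return mask
```

The resulting mask would only propose regions. In the described system, the second-stage classifier then assigns a text probability to each pixel inside them.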
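The second-stage classifier receives a multi-scale image representation based on the discrete wavelet transform. The abstract does not state the wavelet family, the number of levels or how the sub-bands are fed to the network. The sketch below assumes an orthonormal 2-D Haar wavelet applied repeatedly to the approximation band of a grayscale patch. The function names are illustrative, and side lengths must be divisible by `2 ** levels`.

```python
import numpy as np


def haar_dwt2(x):
    """One level of the orthonormal 2-D Haar transform of an even-sized array."""
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("both dimensions must be even")
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    approx = (a + b + c + d) / 2.0        # low-pass in both directions
    detail_h = (a + b - c - d) / 2.0      # top minus bottom: horizontal edges
    detail_v = (a - b + c - d) / 2.0      # left minus right: vertical edges
    detail_d = (a - b - c + d) / 2.0      # diagonal structure
    return approx, (detail_h, detail_v, detail_d)


def wavelet_pyramid(gray, levels=3):
    """Return the coarsest approximation and the detail bands of every level."""
    approx = gray.astype(np.float64)
    bands = []
    for _ in range(levels):
        approx, details = haar_dwt2(approx)
        bands.append(details)
    return approx, bands
```

Because the transform is orthonormal, it preserves image energy. Each level halves the resolution, which gives the network views of character strokes at several scales.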
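The abstract names three segmentation stages (lines, words, characters) but not the method used. One common way to implement them is with projection profiles, and the sketch below takes that approach as an assumption. Row sums of a binarized text field separate lines. Column sums inside a line separate words, where a gap must be at least a hypothetical `word_gap` columns wide. Any empty column inside a word separates characters.

```python
import numpy as np


def runs(profile, min_gap=1):
    """(start, end) spans of non-zero profile values; gaps shorter than min_gap are bridged."""
    spans, start, gap, end = [], None, 0, 0
    for i, v in enumerate(profile):
        if v > 0:
            if start is None:
                start = i
            gap, end = 0, i + 1
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, end))
                start = None
    if start is not None:
        spans.append((start, end))
    return spans


def segment(binary, word_gap=3):
    """binary: 2-D array with 1 = ink.

    Returns lines -> words -> character boxes (row0, row1, col0, col1).
    """
    lines = []
    for r0, r1 in runs(binary.sum(axis=1)):                      # stage 1: lines
        line = binary[r0:r1]
        words = []
        for w0, w1 in runs(line.sum(axis=0), min_gap=word_gap):  # stage 2: words
            chars = [(r0, r1, w0 + c0, w0 + c1)                  # stage 3: characters
                     for c0, c1 in runs(line[:, w0:w1].sum(axis=0))]
            words.append(chars)
        lines.append(words)
    return lines
```

Pure projection cuts fail on touching or slanted characters. This is one reason a later cleanup and border-refinement step is useful.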
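The abstract says that morphological operations remove noise and unrelated objects from each segmented character and that the character borders are then refined. The specific operations are not given. The sketch below substitutes a simple connected-component area filter: 4-connected components smaller than a hypothetical `min_size` are dropped, and the result is cropped to the bounding box of the remaining ink. It does not attempt the paper's "objects that have no borders" criterion.

```python
from collections import deque

import numpy as np


def components(binary):
    """4-connected components of a binary image, each as a list of (row, col)."""
    h, w = binary.shape
    seen = np.zeros_like(binary, dtype=bool)
    comps = []
    for r in range(h):
        for c in range(w):
            if binary[r, c] and not seen[r, c]:
                seen[r, c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                comps.append(pixels)
    return comps


def clean_character(binary, min_size=3):
    """Drop specks smaller than min_size pixels, then crop to the remaining ink."""
    keep = np.zeros(binary.shape, dtype=bool)
    for pixels in components(binary):
        if len(pixels) >= min_size:
            ys, xs = zip(*pixels)
            keep[list(ys), list(xs)] = True
    if not keep.any():
        return keep[:0, :0]  # nothing left that looks like a character
    rows = np.flatnonzero(keep.any(axis=1))
    cols = np.flatnonzero(keep.any(axis=0))
    return keep[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

Cropping to the tight bounding box plays the role of border refinement here, so the recognizer sees the glyph without surrounding empty margin.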
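The recognizer is a convolutional network trained by error back-propagation, but the abstract gives no architecture. The sketch below only illustrates the core of that training rule for a single convolutional layer. It assumes one single-channel input, one kernel, no bias or nonlinearity, and a squared-error loss. The output is a "valid" cross-correlation y, and the gradient of the loss with respect to the kernel is the input cross-correlated with the error y - t. A real character classifier would stack several such layers with nonlinearities and a classification output.

```python
import numpy as np


def conv2d_valid(x, k):
    """'Valid' 2-D cross-correlation, the operation a CNN layer usually computes."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out


def loss_and_kernel_grad(x, k, target):
    """Loss L = 0.5 * sum((y - t)^2) of one conv layer and dL/dk by back-propagation."""
    y = conv2d_valid(x, k)
    err = y - target                 # dL/dy
    loss = 0.5 * np.sum(err ** 2)
    # dL/dk[a, b] = sum_ij err[i, j] * x[i + a, j + b]: x cross-correlated with err.
    grad = conv2d_valid(x, err)
    return loss, grad


def train(x, target, k, lr=0.01, steps=200):
    """Plain gradient descent on the kernel using the back-propagated gradient."""
    for _ in range(steps):
        _, grad = loss_and_kernel_grad(x, k, target)
        k = k - lr * grad
    return k
```

Comparing `grad` with finite differences of the loss is a quick way to confirm that the back-propagated gradient is correct before building deeper layers on it.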

© Odessa National Polytechnic University, 2014-2018. Any use of information from this site requires a link to the source.