Scientific and Technical Journal

ELECTROTECHNIC AND COMPUTER SYSTEMS

ISSN Print 2221-3937
ISSN Online 2221-3805
DEVELOPING A TOOL FOR TEXTS WITH HETEROGENEOUS STRUCTURE PROCESSING
Abstract:

This article presents our approach to developing a system for processing unstructured Romanian text data. The project aims to develop the SoFTcrates tool, a software system that processes unstructured text data in order to produce structured output usable as computational linguistics resources. We describe some mathematical aspects of text representation and present several stages of unstructured text data processing. The application interface is also illustrated. In future work we will try to implement mechanisms for diversifying the found words by means of derivation and the WordNet semantic network. Moreover, we will optimize the interface so that users can search not only by a single word but also by several words that they consider most relevant to the text.
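The abstract mentions mathematical aspects of text representation and a planned multi-word search without giving details. The following is a minimal sketch of one standard way to do both: the vector space model, where each text becomes a raw term-frequency vector and texts are ranked against a multi-word query by cosine similarity. This is an illustrative assumption, not the SoFTcrates implementation. The function names `tokenize`, `tf_vector`, `cosine` and `rank` and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of Unicode letters, so Romanian
    # diacritics (ă, â, î, ș, ț) stay inside words.
    return re.findall(r"[^\W\d_]+", text.lower())


def tf_vector(text: str) -> Counter:
    # Raw term-frequency vector: each distinct word is one dimension.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine of the angle between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(texts: list[str], query: str) -> list[tuple[float, int]]:
    # Score every text against a (possibly multi-word) query and return
    # (score, index) pairs, highest score first, ties kept in input order.
    q = tf_vector(query)
    scores = [(cosine(tf_vector(t), q), i) for i, t in enumerate(texts)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Hypothetical sample texts and a two-word query.
texts = [
    "Marea este albastră.",
    "Iarba este verde.",
    "Marea Neagră este mare.",
]
ranking = rank(texts, "marea albastră")
for score, i in ranking:
    print(f"{score:.3f}  {texts[i]}")
```

In this sketch "marea" and "mare" are separate dimensions, so the third text matches only one query word. Grouping such word forms is the kind of gap that the derivation-based diversification of found words mentioned in the abstract would address.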


© Odessa National Polytechnic University, 2014. Information from this site may be used only with a link to the source.