Scientific and Technical Journal


ISSN Print 2221-3937
ISSN Online 2221-3805

The purpose of this work is to define the algorithms for a software system that analyzes scientific publications in order to identify research areas and groups of researchers with similar interests within a single university or faculty.

Many algorithms exist for information extraction, but each has drawbacks for the problem considered here. We therefore developed our own algorithm, which consists of four steps: lexical analysis, normalization of terminals, combining of entities, and filtering.
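The abstract names the four steps but does not describe them, so the following is only a minimal sketch of how such a pipeline might be arranged. The stopword list, the suffix-stripping rule, the phrase-joining rule, the frequency threshold, and all function names are assumptions made for illustration, not the authors' algorithm.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much fuller one.
STOPWORDS = {"the", "of", "a", "an", "and", "for", "in", "on", "is", "are", "to"}


def lex(text):
    """Step 1, lexical analysis: lowercase words plus punctuation tokens."""
    return re.findall(r"[a-z]+|[.,;:!?]", text.lower())


def normalize(token):
    """Step 2, normalization: crude suffix stripping (stand-in for a stemmer)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def combine(tokens):
    """Step 3, combining: join adjacent content words into candidate phrases.

    Stopwords and punctuation act as phrase boundaries (an assumption).
    """
    phrases, current = [], []
    for tok in tokens:
        if tok in STOPWORDS or not tok.isalpha():
            if current:
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(normalize(tok))
    if current:
        phrases.append(" ".join(current))
    return phrases


def extract_keyphrases(text, min_count=2):
    """Step 4, filtering: keep phrases seen at least min_count times."""
    counts = Counter(combine(lex(text)))
    return {phrase for phrase, n in counts.items() if n >= min_count}


sample = ("Graph clustering of papers. Graph clustering for authors "
          "and graph clustering in the faculty.")
keyphrases = extract_keyphrases(sample)
```

Each step is a separate function so that any one of them (for example, the normalizer) could be swapped for a stronger component without touching the others.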

The results of information extraction are used to identify groups of authors and groups of keywords, a task treated here as a clustering problem. The analyzed data are represented as graphs of two types: a weighted graph of authors' interactions and a semantic graph of papers. This representation makes it possible to apply clustering algorithms based on graph theory as well as the stochastic MCL (Markov Cluster) algorithm. An analysis of a test sample of articles showed that the graph-theoretic clustering algorithms and MCL identified the same clusters, but the algorithm based on the minimum spanning tree performed better in terms of computational complexity.
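As an illustration of spanning-tree clustering on a weighted graph of authors' interactions, the sketch below assumes that an edge weight is the number of joint papers, so a larger weight means a stronger link. It therefore builds a maximum spanning forest with Kruskal's algorithm, which is equivalent to a minimum spanning tree over distances such as 1/weight, and then cuts tree edges weaker than a threshold. The weight definition, the threshold, and the names `DisjointSet` and `mst_clusters` are hypothetical; the abstract does not specify them.

```python
class DisjointSet:
    """Union-find structure used by Kruskal's algorithm."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def mst_clusters(nodes, edges, min_weight):
    """Cluster nodes by cutting weak edges of a maximum spanning forest.

    edges: iterable of (u, v, weight); weight is assumed to be the
    number of joint papers between authors u and v.
    """
    # Kruskal's algorithm over edges sorted by descending weight.
    forest = DisjointSet(nodes)
    tree = [(u, v, w)
            for u, v, w in sorted(edges, key=lambda e: -e[2])
            if forest.union(u, v)]

    # Keep only tree edges at or above the threshold; the remaining
    # connected components are the clusters.
    groups = DisjointSet(nodes)
    for u, v, w in tree:
        if w >= min_weight:
            groups.union(u, v)

    clusters = {}
    for n in nodes:
        clusters.setdefault(groups.find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())


authors = ["A", "B", "C", "D", "E"]
coauthorship = [("A", "B", 5), ("B", "C", 4), ("A", "C", 1),
                ("C", "D", 1), ("D", "E", 3)]
result = mst_clusters(authors, coauthorship, min_weight=2)
```

With this sample graph the single weak tree edge between C and D is cut, which separates the authors into two groups. Sorting the edges dominates the running time, giving O(E log E) for E edges.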

[ © Odessa National Polytechnic University, 2014. Any use of information from this site is permitted only on condition that a link to the source is provided. ]