Article

Article title DESIGN AND ANALYSIS OF ALGORITHM FOR SOLVING CLUSTERING PROBLEM FOR QUESTION-ANSWERING MODULE OF FORECASTING SYSTEM
Authors S.B. Kartiev, V.M. Kureychick
Section SECTION I. DATA ANALYSIS AND KNOWLEDGE MANAGEMENT
Month, Year 07, 2016
Index UDC
DOI 10.18522/2311-3103-2016-7-1828
Abstract This article addresses the construction of a question-answering search module for unstructured information in an information-analytical forecasting system, with respect to the collection of initial states of complex technical systems. Modern information retrieval engines are based on keyword search. This type of search returns a collection of web pages that are likely to contain the material the user needs. The paper proposes reducing the clustering task to an optimization problem and solving it with metaheuristic methods. Traditional clustering methods are introduced together with their advantages and disadvantages. Clustering is a special case of unsupervised learning: the system has no expert ("teacher") who can assign a class to each document. The basic computing model and the data storage facilities used in the developed system are described. An approach to constructing such modules and their software is presented, which involves solving several natural language processing problems. The novelty of the work lies in using a modified genetic algorithm to cluster text documents, which makes it possible to analyze several of the best solutions simultaneously. This improves the quality of the search subsystem of the information-analytical system (IAS) for forecasting. The IAS search subsystem is used to retrieve information for predicting the collection of initial states of a complex technical system. A modified genetic clustering algorithm was developed. The paper presents a software implementation of the IAS information retrieval module, in which the developed clustering algorithm is written in Java and the OpenNLP libraries are used for natural language processing. The place of the developed module within the diagnostic system for complex technical systems, which maintains the operability of a software system, is also defined. The system was tested on a copy of the latest version of the Wikipedia.org site. The experiments showed a reduction in the algorithm's execution time and an improvement in the quality of the results.
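
The abstract does not spell out the authors' modification, so the following Java sketch is not their algorithm. It only illustrates the general idea of treating clustering as an optimization problem solved by a genetic algorithm. Everything in it is an assumption made for illustration: the class name GeneticClusteringSketch, the fitness function (within-cluster sum of squared distances), the use of elitism to carry several of the best solutions into each generation, and the toy two-dimensional points standing in for document feature vectors.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

public class GeneticClusteringSketch {

    /** Sum of squared distances from each point to the centroid of its assigned cluster. */
    public static double sse(double[][] points, int[] assignment, int k) {
        int dim = points[0].length;
        double[][] centroids = new double[k][dim];
        int[] counts = new int[k];
        for (int i = 0; i < points.length; i++) {
            counts[assignment[i]]++;
            for (int d = 0; d < dim; d++) {
                centroids[assignment[i]][d] += points[i][d];
            }
        }
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) continue; // an empty cluster contributes nothing
            for (int d = 0; d < dim; d++) centroids[c][d] /= counts[c];
        }
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            for (int d = 0; d < dim; d++) {
                double diff = points[i][d] - centroids[assignment[i]][d];
                total += diff * diff;
            }
        }
        return total;
    }

    /** Binary tournament on a population already sorted from best to worst. */
    static int[] tournament(int[][] sortedPopulation, Random rnd) {
        int i = rnd.nextInt(sortedPopulation.length);
        int j = rnd.nextInt(sortedPopulation.length);
        return sortedPopulation[Math.min(i, j)];
    }

    /** Each chromosome stores one cluster label per point; lower SSE means fitter. */
    public static int[] cluster(double[][] points, int k, int populationSize,
                                int eliteCount, int generations,
                                double mutationRate, long seed) {
        Random rnd = new Random(seed);
        int n = points.length;
        int[][] population = new int[populationSize][n];
        for (int[] chromosome : population) {
            for (int g = 0; g < n; g++) chromosome[g] = rnd.nextInt(k);
        }
        Comparator<int[]> byFitness = Comparator.comparingDouble(c -> sse(points, c, k));
        for (int gen = 0; gen < generations; gen++) {
            Arrays.sort(population, byFitness);
            int[][] next = new int[populationSize][];
            // Elitism (assumed): several of the best solutions survive unchanged.
            for (int e = 0; e < eliteCount; e++) next[e] = population[e].clone();
            for (int i = eliteCount; i < populationSize; i++) {
                int[] a = tournament(population, rnd);
                int[] b = tournament(population, rnd);
                int[] child = new int[n];
                for (int g = 0; g < n; g++) {
                    child[g] = rnd.nextBoolean() ? a[g] : b[g];   // uniform crossover
                    if (rnd.nextDouble() < mutationRate) {
                        child[g] = rnd.nextInt(k);                // random-reset mutation
                    }
                }
                next[i] = child;
            }
            population = next;
        }
        Arrays.sort(population, byFitness);
        return population[0];
    }

    public static void main(String[] args) {
        // Toy stand-ins for document vectors: two well-separated groups.
        double[][] docs = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
        int[] labels = cluster(docs, 2, 40, 5, 200, 0.1, 42L);
        System.out.println(Arrays.toString(labels) + " SSE=" + sse(docs, labels, 2));
    }
}
```

In a real question-answering module the input vectors would presumably be derived from text, for example term weights computed after tokenization, and the fitness function, operators, and parameter values would need to follow the paper itself rather than this sketch.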


Keywords Genetic algorithm; clustering; information retrieval; prediction.
