Article

Article title MODIFIED EM-CLUSTERING ALGORITHM FOR INTEGRATED BIG DATA PROCESSING TASKS
Authors V. V. Bova, S. N. Scheglov, D. V. Leshchanov
Section SECTION IV. DATA ANALYSIS AND KNOWLEDGE MANAGEMENT
Month, Year 04, 2018
Index UDC 004.822
DOI
Abstract The paper presents a possible solution to the problems of structuring large-scale data and storing it in an integrated form, in structures that ensure the integrity and consistency of its representation as well as high speed and flexibility in processing unstructured information. To solve these problems, we propose a method for constructing a multilevel ontological structure that addresses the interrelated tasks of identifying, structuring, and processing large data sets, predominantly those represented in natural language. Developed using semantic analysis and ontological modeling methods, the multilevel model is suitable for interpreting unstructured data obtained from distributed information sources and for processing it efficiently in an integrated way. The multilevel representation of the large-scale data structuring model defines the methods and mechanisms for a unified meta-description of data elements at the logical level, for pattern search and classification of the feature space at the semantic level, and, at the linguistic level, for implementing the procedures of identifying, consolidating, and enriching data. As a possible solution to this problem, we propose a cluster analysis method and algorithm that reduce the dimensionality of the initial data set and reveal the semantic areas of terminological coverage. The modification of this method consists in applying a scalable and computationally efficient genetic algorithm (GA) to search for and generate weight coefficients corresponding to different similarity measures over the set of observed features used to build the data clustering model. The results of a series of computational experiments confirmed the theoretical significance and the prospects of applying the clustering method with a GA to assess the semantic proximity of data elements represented in an ontology.
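
The abstract does not state the GA's encoding, operators, or fitness function, so the following Python sketch only illustrates the general idea of searching for weights of a combined similarity measure with a genetic algorithm. It assumes each pair of data elements is scored by several similarity measures with values in [0, 1], that weights are non-negative and sum to 1, and that fitness is agreement with hypothetical reference similarity scores. All function names, parameters, and the synthetic data are illustrative assumptions, not the authors' method.

```python
import random

def normalize(weights):
    # Rescale to sum to 1; fall back to uniform weights if all are zero.
    total = sum(weights)
    if total <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]

def combine(weights, sims):
    # Combined similarity: weighted sum of the individual measures.
    return sum(w * s for w, s in zip(weights, sims))

def fitness(weights, pairs):
    # Hypothetical fitness: negative mean squared error against reference scores.
    # pairs is a list of (sims, target), where sims holds k measure values.
    err = sum((combine(weights, sims) - target) ** 2 for sims, target in pairs)
    return -err / len(pairs)

def tournament(pop, scores, rng, k=3):
    # Pick k random individuals and return the fittest of them.
    idx = rng.sample(range(len(pop)), k)
    return pop[max(idx, key=lambda i: scores[i])]

def crossover(a, b, rng):
    # Arithmetic crossover: a random convex mix of two parents.
    alpha = rng.random()
    return normalize([alpha * x + (1 - alpha) * y for x, y in zip(a, b)])

def mutate(w, rng, rate=0.2, scale=0.1):
    # Gaussian perturbation of some genes, clipped at zero, then renormalized.
    out = [max(0.0, x + rng.gauss(0, scale)) if rng.random() < rate else x for x in w]
    return normalize(out)

def evolve_weights(pairs, n_measures, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [normalize([rng.random() for _ in range(n_measures)]) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(w, pairs) for w in pop]
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        children = [elite]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    return max(pop, key=lambda w: fitness(w, pairs))

# Usage with synthetic data (hypothetical): three similarity measures whose
# reference combination uses weights 0.7, 0.2 and 0.1.
data_rng = random.Random(1)
pairs = []
for _ in range(200):
    sims = [data_rng.random() for _ in range(3)]
    pairs.append((sims, 0.7 * sims[0] + 0.2 * sims[1] + 0.1 * sims[2]))

best = evolve_weights(pairs, n_measures=3)
print([round(w, 3) for w in best])
```

Elitism keeps the best fitness from decreasing between generations, and renormalizing after every operator keeps each individual a valid convex weighting. In a clustering setting, the mean-squared-error fitness would presumably be replaced by a cluster-quality criterion, which the abstract does not specify.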

Keywords Semantic similarity; ontology; semantic network; unstructured data; Big Data; semantic analysis; semantic meta-model; genetic algorithms; clustering
