Article

Article title ASSESSMENT OF THE QUALITY OF MACHINE TRANSLATION TEXT USING A METHOD OF FUZZY DUPLICATE ANALYSIS
Authors V. S. Kornilov, V. M. Glushan, A. Y. Lozovoy
Section SECTION II. ARTIFICIAL INTELLIGENCE AND FUZZY SYSTEMS
Month, Year 07, 2017
Index UDC 004.912
Abstract This article addresses the problem of evaluating machine translation quality. Modern methods and tools for fully automatic machine translation are far from perfect: when a text is translated from one natural language into another, the semantic component is often lost and the result is frequently incorrect. The authors aim to develop applications that automatically correct machine translation output to a publishable level. Many methods and applications currently exist for evaluating machine and computer-aided translation by processing parallel corpora. Their drawback is that they cannot track errors within a specific parallel corpus. The authors propose using the results of reverse (back) translation to evaluate machine translation quality: the back-translated text is compared with the original by a fuzzy duplicate identification system in order to locate discrepancies and inconsistencies in the parallel corpus and then correct them. Modern methods for detecting fuzzy duplicates in unstructured text, as used in plagiarism detection, are described. Classical text analysis methods fall into two groups: syntactic methods, which analyze sequences of characters, words, phrases, or sentences, and lexical methods, which compute checksums of words, phrases, sentences, and paragraphs. These methods are reviewed for their applicability in a machine translation correction system. The novelty of this work lies in the development of operating schemes for a machine translator that automatically corrects the discrepancies identified by the described method. The prospects for using this method are analyzed.
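The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes the back-translation is already available (no translation service is called), that original and back-translated sentences are aligned one-to-one by position, and that a shingle size of 3 words and a similarity threshold of 0.5 are reasonable; these values and all function names are hypothetical illustrations, not the authors' method. It combines the two families of methods mentioned above: a lexical check (CRC32 checksums of normalized sentences, via Python's `zlib`) and a syntactic check (Jaccard similarity of word shingles).

```python
import re
import zlib


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def shingles(words: list[str], w: int = 3) -> set[tuple[str, ...]]:
    """Return the set of contiguous w-word shingles; a short text yields one shingle."""
    if len(words) < w:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def checksum(words: list[str]) -> int:
    """Lexical fingerprint: CRC32 of the normalized sentence (collisions are rare but possible)."""
    return zlib.crc32(" ".join(words).encode("utf-8"))


def find_discrepancies(original: list[str], back_translated: list[str],
                       w: int = 3, threshold: float = 0.5) -> list[tuple[int, float]]:
    """Flag sentence pairs (hypothetically aligned by position) that differ.

    A pair is skipped if its checksums match after normalization; otherwise
    it is flagged when its shingle similarity falls below the threshold.
    Returns (sentence index, rounded similarity) for each flagged pair.
    """
    flagged = []
    for i, (src, back) in enumerate(zip(original, back_translated)):
        src_words, back_words = normalize(src), normalize(back)
        if checksum(src_words) == checksum(back_words):
            continue  # identical after normalization: no discrepancy
        sim = jaccard(shingles(src_words, w), shingles(back_words, w))
        if sim < threshold:
            flagged.append((i, round(sim, 3)))
    return flagged


if __name__ == "__main__":
    original = [
        "The system detects fuzzy duplicates in unstructured text.",
        "Errors are corrected automatically.",
    ]
    back = [
        "The system detects fuzzy duplicates in unstructured text!",
        "Mistakes get fixed by the machine.",
    ]
    print(find_discrepancies(original, back))  # [(1, 0.0)]
```

Here the first pair differs only in punctuation, so the checksum check passes it, while the second pair shares no word shingles and is flagged for correction. Word shingles are only one of the syntactic options the abstract lists; character or sentence sequences could be substituted in `shingles` without changing the rest of the sketch.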

Keywords Machine translation; quality of translation; reverse translation; plagiarism search; fuzzy duplicate; computer-aided editing of translations.

Comments are closed.