Authors Bermudez Soto José Gregorio
Month, Year 03, 2017
Index UDC 004.89
Abstract The paper considers a method for comparing textual documents in Russian-language natural language processing in order to determine their semantic proximity; specifically, it addresses the subtask of measuring semantic similarity according to the criteria of correctness and depth. Based on a review of existing approaches to text comparison, a method is proposed for determining the semantic similarity between two texts on the basis of textual passages, which makes it possible not only to determine the semantic proximity of documents presented in natural language but also to quantify their similarity. The study belongs to the field of automatic text processing (ATP) and the formalization of natural languages, a field that is gradually shifting from the simplest methods of analysis to more complex ones and reaching a level of processing that treats a text not just as a sequence of words but as a single whole that carries meaning, in line with human perception. Within the general scheme of automatic text processing, the study focuses on the semantic level and gives a detailed description of the scheme's final stage, the comparison of closeness. The method is based on determining the degree of similarity between passages. By a passage we mean a distinct fragment of the text that has some kind of integrity. The work uses text segmentation as the basis for comparing texts in Russian and considers the subtask of extracting parts of the text that carry a special meaning, called "passages". A review of existing comparison methods is given, and a method for determining the degree of similarity between textual passages within a semantic class is proposed.
The proposed method is compared with existing methods and with comparisons made by people in an experiment; the results show the suitability of the proposed method.

Keywords Measurement of textual proximity; definition of similarity; comparison of texts; presentation of semantic schemes; passages.
