Authors Lednov D. A., Kulay A. Yu., Melnikov S. Yu.
Month, Year 08, 2008
Index UDC 681.056
Abstract Several statistical methods for identifying the language of distorted texts are described. Their performance is compared experimentally for text messages of different lengths. Recommendations on choosing a statistical method for spoken language identification are given, under the assumption that the proposed text distortion model adequately reflects the distortions observed when the speech signal is processed.
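The abstract does not spell out the individual methods. As a generic illustration of the kind of statistical classifier involved (a low-order model with probability smoothing, both named in the keywords), the sketch below trains an order-1 (bigram) character Markov model per language with additive (Laplace) smoothing and assigns a message to the language whose model gives it the highest log-likelihood. This is not the authors' method: the class `BigramModel`, the function `identify`, the smoothing constant `alpha` and the toy training strings are all hypothetical.

```python
import math
from collections import Counter


class BigramModel:
    """Order-1 character Markov model with additive (Laplace) smoothing.

    Hypothetical illustration: `alpha` and the shared alphabet size are
    assumptions, not parameters taken from the paper.
    """

    def __init__(self, text: str, alphabet_size: int, alpha: float = 1.0):
        self.pairs = Counter(zip(text, text[1:]))  # counts of (prev, next) pairs
        self.context = Counter(text[:-1])          # counts of prev as a context
        self.v = alphabet_size
        self.alpha = alpha

    def log_prob(self, prev: str, nxt: str) -> float:
        # Additive smoothing: every transition gets a non-zero probability,
        # and the probabilities over the alphabet still sum to 1 for each prev.
        num = self.pairs[(prev, nxt)] + self.alpha
        den = self.context[prev] + self.alpha * self.v
        return math.log(num / den)

    def score(self, text: str) -> float:
        # Log-likelihood of the whole message under this language's model.
        return sum(self.log_prob(a, b) for a, b in zip(text, text[1:]))


def identify(text: str, models: dict) -> str:
    # Maximum-likelihood decision: the language whose model scores highest.
    return max(models, key=lambda lang: models[lang].score(text))


# Toy usage with made-up training strings (far too small for real use).
train = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "de": "der schnelle braune fuchs springt ueber den faulen hund und die katze",
}
alphabet = set("".join(train.values()))
models = {lang: BigramModel(t, len(alphabet)) for lang, t in train.items()}
print(identify("the dog and the fox", models))  # prints: en
```

Smoothing matters for distorted input in particular: distortion produces character pairs that never occur in the training data, and without smoothing a single unseen pair would get probability zero and rule a language out entirely.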


Keywords language identification methods, low-order models, probability smoothing methods, automata theory.
References 1. Batalshchikov A.A., Lednov D.A. A model of open-set language identification. In Proc. XVII Session of the Russian Acoustical Society, 11-17 September 2006, Taganrog. Moscow: GEOS, 2006, vol. 3, pp. 44-45.
2. Campbell W., Gleason T., Navratil J., Reynolds D., Shen W., Singer E., Torres-Carrasquillo P. Advanced language recognition using cepstra and phonotactics: MITLL system performance on the NIST 2005 language recognition evaluation. In Proc. IEEE Odyssey 2006: The Speaker and Language Recognition Workshop, (San Juan, Puerto Rico), June 2006.
3. Katz S.M. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(3): 400-401, 1987.
4. Basavaraja S.V., Sreenivas T.V. Low Complexity LID using Pruned Pattern Tables of LZW. In INTERSPEECH-2006, paper 1398-Mon2CaP.4.
5. Nelson M. LZW Data Compression. Dr. Dobb's Journal, Oct. 1989.
6. Bahl L., Brown P., DeSouza P., Mercer R. A tree-based statistical language model for natural language speech recognition. IEEE Trans. on Acoustics, Speech, and Signal Processing, 37(7): 1001-1008, July 1989.
7. Melnikov S.Yu. Polyhedra characterizing the statistical properties of finite automata. Trudy po Diskretnoi Matematike (Works in Discrete Mathematics), vol. 7. Moscow: Fizmatlit, 2003, pp. 126-137.
8. Kulay A.Y., Melnikov S.Y. Different approaches to the garbled text language recognition, using the data compression methods. In Proc. XII International Conference “Speech and Computer”, 15-18 Oct. 2007, vol. 2, pp. 697-701.
9. Moffat A. Implementing the PPM data compression scheme. IEEE Transactions on Communications, 38(11): 1917-1921, 1990.
10. Vatolin D., Ratushnyak A., Smirnov M., Yukin V. Data Compression Methods: Archiver Design, Image and Video Compression. Moscow: DIALOG-MIFI, 2002.
11. Kulay A.Yu., Melnikov S.Yu. A comparison of several approaches to language recognition of distorted texts. In Proc. Second International Conference “System Analysis and Information Technologies” (SAIT-2007), (Obninsk, Russia), 10-14 September 2007. Moscow: LKI, 2007, vol. 1, pp. 218-220.
12. Nadas A., Nahamoo D., Picheny M., Powell J. An iterative “Flip-Flop” approximation of the most informative split in the construction of decision trees. Proc. of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1991), (Toronto, Canada), May 1991, pp. 565-568.
13. Navratil J. Recent advances in phonotactic language recognition using binary-decision trees. In INTERSPEECH-2006, paper 1338-Mon2CaP.6.
