Article

Article title MODERN APPROACHES IN BIG DATA SYSTEMS
Authors V.V. Khashkovsky, A.N. Shkurko
Section SECTION IV. METHODS OF INFORMATION PROCESSING
Month, Year 08, 2014
Index UDC 004.67
DOI
Abstract This paper discusses current approaches to organizing the processing of large amounts of data, using as examples the integrated systems of leading vendors of modular and ISV solutions. The focus is on the methods of preparing information for data processing systems and on the sources of that information. The basic methods and sources of background information are listed and briefly described. The main stages of information processing in data mining systems are outlined, from obtaining the information to forming a final conclusion based on the results of the analysis. For the main processing steps, examples of existing software systems that implement the required functionality are presented. We also consider some approaches to the characterization of documents, along with examples of software systems that implement these approaches. The main document parameters on which the analysis is based are described. In conclusion, we consider the state of the market for business intelligence systems in Russia and the prospects for their adaptation and implementation.

Keywords Big data; data mining; text mining; machine learning; classification.
References 1. Business Intelligence, (mirovoy rynok). IT-Direktoru, BI, Rynki, Rynki, programmnoe obespechenie, 2014 [Business Intelligence, (world market). The CIO, BI, Markets, Markets, software, 2014]. Available at: http://www.tadviser.ru/index.php/%D0%A1%D1%82%D0%B0%D1%82%D1%8C%D1%8F:Business_Intelligence,_BI_(%D0%BC%D0%B8%D1%80%D0%BE%D0%B2%D0%BE%D0%B9_%D1%80%D1%8B%D0%BD%D0%BE%D0%BA).
2. Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques Second Edition, USA Elsevier Inc., 2006, 743 p.
3. Koeffitsient doveriya v ekspertnykh sistemakh, 2014 [The factor of trust in expert systems, 2014]. Available at: http://www.aiportal.ru/articles/expert-systems/confidence-factor.html.
4. Web Data Extractor – Extract Email, URL, Meta Tag, Phone, Fax from Websites, 2014. Available at: http://www.webextractor.com.
5. Web Scraping, Web Extraction, WebSundew, 2014. Available at: http://www.websundew.com/.
6. Data Extraction, Web Screen Scraping Tool, Mozenda Scraper, 2014. Available at: https://www.mozenda.com/pricing.
7. Screen-scraper: Data extraction software and services, 2014. Available at: http://www.screen-scraper.com/download/choose_version.php.
8. Kapow Katalyst: The Leading Application Integration Platform for connecting cloud, mobile, social and big data – Kapow Software, 2014. Available at: http://www.kapowsoftware.com/products/kapow-katalyst/index.php.
9. Gershenzon L. Novostnye agregatory i onlayn-SMI: zhizn' vmeste, 2009 [News aggregators and online media: life together, 2009]. Available at: http://download.yandex.ru/company/Yandex_News_11_2009.pdf.
10. O sayte / Khabrakhabr, 2014 [About the website, Habrahabr, 2014]. Available at: http://habrahabr.ru/info/about.
11. Gulin A., Karpovich P., Raskovalov D., Segalovich I. Optimizatsiya algoritmov ranzhirovaniya metodami mashinnogo obucheniya, 2009 [Optimization of ranking algorithms by machine learning methods, 2009]. Available at: http://romip.ru/romip2009/15_yandex.pdf.
12. Kollaborativnaya fil'tratsiya – Vikipediya, 2014 [Collaborative filtering – Wikipedia, 2014]. Available at: http://ru.wikipedia.org/wiki/%D0%9A%D0%BE%D0%BB%D0%BB%D0%B0%D0%B1%D0%BE%D1%80%D0%B0%D1%82%D0%B8%D0%B2%D0%BD%D0%B0%D1%8F_%D1%84%D0%B8%D0%BB%D1%8C%D1%82%D1%80%D0%B0%D1%86%D0%B8%D1%8F.
13. Okapi BM25 – Wikipedia, 2013. Available at: http://ru.wikipedia.org/wiki/Okapi_BM25.
14. Part-of-speech tagging – Wikipedia, the free encyclopedia, 2014. Available at: http://en.wikipedia.org/wiki/Part-of-speech_tagging.
15. Apache OpenNLP – Welcome to Apache OpenNLP, 2010. Available at: http://opennlp.apache.org/.
16. Natural Language Toolkit – NLTK 3.0 documentation, 2013. Available at: http://www.nltk.org/.
17. KNIMEtech KNIME Text Processing, 2014. Available at: http://tech.knime.org/knime-text-processing.
18. Russian stemming algorithm. Available at: http://snowball.tartarus.org/algorithms/russian/stemmer.html.
19. The Source for Social Data – Gnip, 2014. Available at: http://gnip.com/.
20. Spinn3r: RSS Content, News Feeds, News Content, News Crawler and Web Crawler APIs, 2014. Available at: http://www.spinn3r.com/.
21. Oracle Social Cloud, Social Relationship Management (SRM) Solutions | Oracle, 2014. Available at: http://www.oracle.com/us/solutions/social/overview/index.html.
22. Data Sift Powering the Social Economy, 2014. Available at: http://datasift.com/.
23. Severov M. Klyuchevye igroki rynka BI: krug szhimaetsya, Analiticheskie sistemy, Informatsionnye tekhnologii, 2008 [Key BI market players: the circle is shrinking, Analytical systems, Information technology, 2008]. Available at: http://www.iteam.ru/publications/it/section_92/article_3625/.
