Authors A.V. Chernov, V.I. Yants, E.V. Karpenko
Month, Year 02, 2015 @en
Index UDC 004.652
Abstract In semi-structured data management systems there is acute problem of searching in large data sets. During search implementation, considerable number of disk operations is being performed, which inevitably leads to a rapid depletion of computing resources database with the increase of queries number. To solve this problem, we propose to use Radix tries with a modified multi-layer structure for building the search index. Such structure will allow performing not more than one disk operation for each index layer. The number of layers in the resultant structure will depend on the amount of data and the degree of heterogeneity and only in rare cases will exceed 3. Multi-level structure can be easily cached and allow to reduce the number of disk operations to one per one search query. The resulting structure allows avoiding disadvantages of unbalanced Radix tries and getting a quick and compact search index for use in wide range of semi-structured data management systems.

Download PDF

Keywords Semi-structured data; index; searching; database; laden tries; radix tries; data compression; database management system.
References 1. Butakova M.A., Klimanskaya E.V., Yants V.I. Mera informatsionnogo podobiya dlya analiza slabostrukturirovannoy informatsii [The measure of information of similarity for the analysis of semi-structured information], Sovremennye problemy nauki i obrazovaniya [Modern problems of science and education]. Penza, 2013.
2. Klimanskaya E.V., Chernov A.V., Yants V.I. Metody obrabotki slabostrukturirovannykh dannykh v avtomatizirovannykh sistemakh na zheleznodorozhnom transporte [Methods semi-structured data processing in automated systems of railway transport], Izvestiya vysshikh uchebnykh zavedeniy. Severo-Kavkazskiy region. Seriya tekhnicheskie nauki [Izvestiya vuzov. Severo-kavkazskii region. Technical Sciences, 2013.
3. Abiteboul S. Querying semi-structured data, In Proc. ICDT, 2001.
4. Bertino E. Index configuration in object-oriented databases, VLDB Journal, 1994, No. 3 (3), pp. 355-399.
5. Buneman P. et al. A query language and optimization techniques for unstructured data, In Proc. SIGMOD, 1996.
6. Chamberlain D., Robie J. and Florescu D. Quilt: An XML query language for heterogeneous data sources, In Proc. WebDB Workshop, 2000.
7. Comer D. The ubiquitous B-tree, Computing Surveys, 1979, No. 11 (2), pp. 121-137.
8. Cooper B. and Shadmon M. The Index Fabric: A mechanism for indexing and querying the same data in many different ways. Technical Report, 2000. DBLP Computer Science Bibliography.
9. Deutsch A., Fernandez M. and Suciu D. Storing semistructured data with STORED, In Proc. SIGMOD, 1999.
10. Florescu D. and Kossmann D. A performance evaluation of alternative mapping schemes for storing XML data in a relational database. INRIA Technical Report 3684, 1999.
11. Alon Itai. The JS Data Structure. Technical report, 1999.
12. Lee W.C. and Lee D.L. Path Dictionary: A New Approach to Query Processing in Object-Oriented Databases, IEEE TKDE, May/June 2004, No. 10(3), pp. 371-388.
13. Jason McHugh and Jennifer Widom. Query Optimization for XML, In Proc. 25th VLDB, 1999.
14. McHugh J. et al. Lore: A Database Management System for Semistructured Data, SIGMOD Record, 1997, No. 26 (3), pp. 54-66,
15. Oracle Corp. Oracle 9i database. Available at:
16. Shanmugasundaram J. et al. Relational databases for querying XML documents: Limitations and opportunities, In Proc. 25th VLDB, 2003.
17. Software AG. Tamino XML database. Available at:
18. XYZFind. XML Database. Available at:
19. Kuznetsov S. Tranzaktsionnye parallel'nye SUBD: novaya volna, Trudy Instituta sistemnogo programmirovaniya RAN. Moscow: Institut sistemnogo programmirovaniya RAN, 2011, Vol. 20.
20. Zargayouna H. Context and semantic of semi-structured documents indexing, In First Cоnference in Information Retrieval and Applications (CORIA'O4), March 2004.

Comments are closed.