
XHQE: A hybrid system for scalable selectivity estimation of XML queries

Author

Listed:
  • E.-S. M. El-Alfy

    (King Fahd University of Petroleum & Minerals)

  • S. Mohammed

    (King Fahd University of Petroleum & Minerals)

  • A. F. Barradah

    (Exploration Network Operations Department, Saudi ARAMCO)

Abstract

With the increasing popularity of XML applications in enterprise and big data systems, efficient query optimizers are becoming essential. The performance of an XML query optimizer depends heavily on the query selectivity estimators it uses to find the best possible query execution plan. In this work, we propose a novel selectivity estimator, called XHQE, which is a hybrid of a structural synopsis and structural statistics. The structural synopsis enhances the accuracy of estimation, and the structural statistics make it scalable to the allocated memory space. The structural synopsis is generated by labeling the nodes of the source XML dataset using a fingerprint function and merging subtrees with similar fingerprints (i.e., having similar structures). The generated structural synopsis and structural statistics are then used to estimate the selectivity of given queries. We studied the performance of the proposed approach using different types of queries and four benchmark datasets with different structural characteristics. We compared XHQE with existing algorithms such as Sampling, TreeSketch, and one histogram-based algorithm. The experimental results showed that XHQE is significantly better than the other algorithms in terms of estimation accuracy and scalability for semi-uniform datasets. For non-uniform datasets, the proposed algorithm has estimation accuracy comparable to TreeSketch as the allocated memory size is greatly reduced, yet its estimation data generation time is much lower (e.g., TreeSketch took more than 50 times longer than the proposed approach for the XMark dataset). Compared to the histogram-based algorithm, our approach supports regular twig queries in addition to having higher accuracy when both run under similar memory constraints.
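The synopsis idea in the abstract (label nodes with a fingerprint, merge subtrees whose fingerprints match, then estimate selectivity from the merged structure) can be illustrated with a minimal sketch. This is not the XHQE algorithm itself: the abstract does not specify the fingerprint function or the similarity criterion, so the sketch assumes a hypothetical fingerprint (the tag plus the sorted fingerprints of the children) and merges only sibling subtrees with identical fingerprints. It also omits the structural statistics component and handles only simple child-axis paths. The names `fingerprint`, `build_synopsis`, and `estimate` are illustrative.

```python
import xml.etree.ElementTree as ET


def fingerprint(elem):
    """Hypothetical structural label: tag plus sorted child fingerprints."""
    return (elem.tag, tuple(sorted(fingerprint(child) for child in elem)))


def build_synopsis(elem):
    """Merge sibling subtrees with identical fingerprints into one counted node.

    Assumption: "similar" is simplified to "identical" in this sketch.
    Fingerprints are recomputed per call, which is fine for small examples.
    """
    groups = {}
    for child in elem:
        fp = fingerprint(child)
        if fp in groups:
            groups[fp]["count"] += 1  # same structure seen again: bump count
        else:
            groups[fp] = build_synopsis(child)  # first occurrence: count is 1
    return {"tag": elem.tag, "count": 1, "children": list(groups.values())}


def estimate(node, path):
    """Estimate how many elements a simple child-axis path matches."""
    if node["tag"] != path[0]:
        return 0
    if len(path) == 1:
        return node["count"]
    rest = path[1:]
    return node["count"] * sum(estimate(c, rest) for c in node["children"])


doc = ET.fromstring(
    "<lib>"
    "<book><title/><author/></book>"
    "<book><title/><author/></book>"
    "<book><title/></book>"
    "</lib>"
)
synopsis = build_synopsis(doc)

print(len(synopsis["children"]))                      # 2 merged book groups
print(estimate(synopsis, ["lib", "book", "title"]))   # 3
print(estimate(synopsis, ["lib", "book", "author"]))  # 2
```

Because only structurally identical subtrees are merged here, the estimates equal the true counts. Merging merely similar subtrees, as the abstract describes, would shrink the synopsis further at the cost of approximate estimates.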

Suggested Citation

  • E.-S. M. El-Alfy & S. Mohammed & A. F. Barradah, 2016. "XHQE: A hybrid system for scalable selectivity estimation of XML queries," Information Systems Frontiers, Springer, vol. 18(6), pages 1233-1249, December.
  • Handle: RePEc:spr:infosf:v:18:y:2016:i:6:d:10.1007_s10796-015-9561-6
    DOI: 10.1007/s10796-015-9561-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10796-015-9561-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jakub Malý & Martin Nečaský, 2015. "Model-driven approach to modeling and validating integrity constraints for XML with OCL and Schematron," Information Systems Frontiers, Springer, vol. 17(4), pages 917-946, August.
    2. Gabriele Kotsis & Ismail Khalil, 2013. "Special issue on Semantic Information Management guest editorial," Information Systems Frontiers, Springer, vol. 15(2), pages 151-157, April.
