IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v11y2023i18p3908-d1239944.html
   My bibliography  Save this article

Efficient Complex Aggregate Queries with Accuracy Guarantee Based on Execution Cost Model over Knowledge Graphs

Author

Listed:
  • Shuzhan Ye

    (HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou 310005, China)

  • Xiaoliang Xu

    (School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310005, China)

  • Yuxiang Wang

    (School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310005, China)

  • Tao Fu

    (School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310005, China)

Abstract

Knowledge graphs (KGs) have gained prominence for representing real-world facts, with queries of KGs being crucial for their application. Aggregate queries, as one of the most important parts of KG queries (e.g., “ What is the average price of cars produced in Germany?”), can provide users with valuable statistical insights. An efficient solution for KG aggregate queries is approximate aggregate queries with semantic-aware sampling (AQS). This balances the query time and result accuracy by estimating an approximate aggregate result based on random samples collected from a KG, ensuring that the relative error of the approximate aggregate result is bounded by a predefined error. However, AQS is tailored for simple aggregate queries and exhibits varying performance for complex aggregate queries. This is because a complex aggregate query usually consists of multiple simple aggregate queries, and each sub-query influences the overall processing time and result quality. Setting a large error bound for each sub-query yields quick results but with a lower quality, while aiming for high-quality results demands a smaller predefined error bound for each sub-query, leading to a longer processing time. Hence, devising efficient and effective methods for executing complex aggregate queries has emerged as a significant research challenge within contemporary KG querying. To tackle this challenge, we first introduced an execution cost model tailored for original AQS (i.e., supporting simple queries) and founded on Taylor’s theorem. This model aids in identifying the initial parameters that play a pivotal role in the efficiency and efficacy of AQS. Subsequently, we conducted an in-depth exploration of the intrinsic relationship of the error bounds between a complex aggregate query and its constituent simple queries (i.e., sub-queries), and then we formalized an execution cost model for complex aggregate queries, given the accuracy constraints on the error bounds of all sub-queries. Harnessing the multi-objective optimization genetic algorithm, we refined the error bounds of all sub-queries with moderate values, to achieve a balance of query time and result accuracy for the complex aggregate query. An extensive experimental study on real-world datasets demonstrated our solution’s superiority in effectiveness and efficiency.

Suggested Citation

  • Shuzhan Ye & Xiaoliang Xu & Yuxiang Wang & Tao Fu, 2023. "Efficient Complex Aggregate Queries with Accuracy Guarantee Based on Execution Cost Model over Knowledge Graphs," Mathematics, MDPI, vol. 11(18), pages 1-28, September.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:18:p:3908-:d:1239944
    as

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/18/3908/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/18/3908/
    Download Restriction: no
    ---><---

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:11:y:2023:i:18:p:3908-:d:1239944. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.