Cache-Based Multi-Query Optimization for Data-Intensive Scalable Computing Frameworks

Author

Listed:
  • Pietro Michiardi (Eurecom)
  • Damiano Carra (University of Verona)
  • Sara Migliorini (University of Verona)

Abstract

In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in redundant and wasteful processing, multi-query optimization techniques can be employed to save a considerable amount of cluster resources. In this work, we introduce a novel method combining in-memory cache primitives and multi-query optimization, to improve the efficiency of data-intensive, scalable computing frameworks. By careful selection and exploitation of common (sub)expressions, while satisfying memory constraints, our method transforms a batch of queries into a new, more efficient one which avoids unnecessary recomputations. To find feasible and efficient execution plans, our method uses a cost-based optimization formulation akin to the multiple-choice knapsack problem. Extensive experiments on a prototype implementation of our system show significant benefits of worksharing for both TPC-DS workloads and detailed micro-benchmarks.
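
The abstract says only that the optimization is "akin to" the multiple-choice knapsack problem and does not give the formulation itself. As a rough illustration of that problem class, and not the authors' method, the sketch below uses the textbook dynamic program: each group holds alternative ways to cache one shared (sub)expression, at most one alternative per group may be chosen, and the total memory must stay within a budget. The function name `select_cache_plan`, the `(memory_cost, saving)` encoding, and all numbers are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical encoding: (memory_cost, saving), where memory_cost is a
# non-negative integer number of memory units and saving is an estimated
# reduction in execution cost if this variant is cached.
Candidate = Tuple[int, float]


def select_cache_plan(
    groups: List[List[Candidate]], budget: int
) -> Tuple[float, List[int]]:
    """Choose at most one candidate per group within the memory budget.

    Returns (total_saving, choice), where choice[g] is the index of the
    candidate picked in group g, or -1 if nothing from that group is cached.
    """
    # best[m] = largest saving achievable with at most m memory units,
    # considering only the groups processed so far.
    best = [0.0] * (budget + 1)
    picks: List[List[int]] = []  # picks[g][m] = index chosen for group g at capacity m

    for group in groups:
        new_best = best[:]            # default: cache nothing from this group
        pick = [-1] * (budget + 1)
        for idx, (mem, saving) in enumerate(group):
            for m in range(mem, budget + 1):
                value = best[m - mem] + saving
                if value > new_best[m]:
                    new_best[m] = value
                    pick[m] = idx
        best = new_best
        picks.append(pick)

    # Walk the groups backwards to recover which candidate was chosen.
    choice = [-1] * len(groups)
    m = budget
    for g in range(len(groups) - 1, -1, -1):
        idx = picks[g][m]
        choice[g] = idx
        if idx >= 0:
            m -= groups[g][idx][0]
    return best[budget], choice


# Illustrative numbers only: three shared subexpressions, 7 memory units.
example_groups = [
    [(2, 5.0), (4, 9.0)],    # two caching variants of subexpression A
    [(3, 6.0)],              # one variant of subexpression B
    [(1, 1.0), (5, 11.0)],   # two variants of subexpression C
]
total_saving, plan = select_cache_plan(example_groups, budget=7)
print(total_saving, plan)  # 16.0 [0, -1, 1]
```

In this toy instance the best plan caches the small variant of A and the large variant of C and skips B, because the combined saving (16) beats any other combination that fits in 7 units. The dynamic program runs in time proportional to the number of candidates times the budget, so it is practical only when memory can be discretized coarsely; a real optimizer would also need cost estimates for each candidate, which this sketch simply takes as input.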

Suggested Citation

  • Pietro Michiardi & Damiano Carra & Sara Migliorini, 2021. "Cache-Based Multi-Query Optimization for Data-Intensive Scalable Computing Frameworks," Information Systems Frontiers, Springer, vol. 23(1), pages 35-51, February.
  • Handle: RePEc:spr:infosf:v:23:y:2021:i:1:d:10.1007_s10796-020-09995-2
    DOI: 10.1007/s10796-020-09995-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10796-020-09995-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Prabhakant Sinha & Andris A. Zoltners, 1979. "The Multiple-Choice Knapsack Problem," Operations Research, INFORMS, vol. 27(3), pages 503-515, June.
    2. Chao Zhu & Qiang Zhu & Calisto Zuzarte & Wenbin Ma, 2016. "Optimization of generic progressive queries based on dependency analysis and materialized views," Information Systems Frontiers, Springer, vol. 18(1), pages 205-231, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Melachrinoudis, Emanuel & Kozanidis, George, 2002. "A mixed integer knapsack model for allocating funds to highway safety improvements," Transportation Research Part A: Policy and Practice, Elsevier, vol. 36(9), pages 789-803, November.
    2. Andris A. Zoltners & Prabhakant Sinha, 2005. "The 2004 ISMS Practice Prize Winner—Sales Territory Design: Thirty Years of Modeling and Implementation," Marketing Science, INFORMS, vol. 24(3), pages 313-331, September.
    3. Yuji Nakagawa & Ross J. W. James & César Rego & Chanaka Edirisinghe, 2014. "Entropy-Based Optimization of Nonlinear Separable Discrete Decision Models," Management Science, INFORMS, vol. 60(3), pages 695-707, March.
    4. Francis, Peter & Zhang, Guangming & Smilowitz, Karen, 2007. "Improved modeling and solution methods for the multi-resource routing problem," European Journal of Operational Research, Elsevier, vol. 180(3), pages 1045-1059, August.
    5. Morton, Alec, 2014. "Aversion to health inequalities in healthcare prioritisation: A multicriteria optimisation perspective," Journal of Health Economics, Elsevier, vol. 36(C), pages 164-173.
    6. Tue R. L. Christensen & Kim Allan Andersen & Andreas Klose, 2013. "Solving the Single-Sink, Fixed-Charge, Multiple-Choice Transportation Problem by Dynamic Programming," Transportation Science, INFORMS, vol. 47(3), pages 428-438, August.
    7. Bagchi, Ansuman & Bhattacharyya, Nalinaksha & Chakravarti, Nilotpal, 1996. "LP relaxation of the two dimensional knapsack problem with box and GUB constraints," European Journal of Operational Research, Elsevier, vol. 89(3), pages 609-617, March.
    8. Wilbaut, Christophe & Todosijevic, Raca & Hanafi, Saïd & Fréville, Arnaud, 2023. "Heuristic and exact reduction procedures to solve the discounted 0–1 knapsack problem," European Journal of Operational Research, Elsevier, vol. 304(3), pages 901-911.
    9. Dauzère-Pérès, Stéphane & Hassoun, Michael, 2020. "On the importance of variability when managing metrology capacity," European Journal of Operational Research, Elsevier, vol. 282(1), pages 267-276.
    10. Johnston, Robert E. & Khan, Lutfar R., 1995. "Bounds for nested knapsack problems," European Journal of Operational Research, Elsevier, vol. 81(1), pages 154-165, February.
    11. Gasparini, Gaia & Brunelli, Matteo & Chiriac, Marius Dan, 2022. "Multi-period portfolio decision analysis: A case study in the infrastructure management sector," Operations Research Perspectives, Elsevier, vol. 9(C).
    12. Silvio Alexandre de Araujo & Bert De Reyck & Zeger Degraeve & Ioannis Fragkos & Raf Jans, 2015. "Period Decompositions for the Capacitated Lot Sizing Problem with Setup Times," INFORMS Journal on Computing, INFORMS, vol. 27(3), pages 431-448, August.
    13. Tsesmetzis, Dimitrios & Roussaki, Ioanna & Sykas, Efstathios, 2008. "QoS-aware service evaluation and selection," European Journal of Operational Research, Elsevier, vol. 191(3), pages 1101-1112, December.
    14. Sung, C. S. & Cho, Y. K., 2000. "Reliability optimization of a series system with multiple-choice and budget constraints," European Journal of Operational Research, Elsevier, vol. 127(1), pages 159-171, November.
    15. Vijay Aggarwal & Narsingh Deo & Dilip Sarkar, 1992. "The knapsack problem with disjoint multiple‐choice constraints," Naval Research Logistics (NRL), John Wiley & Sons, vol. 39(2), pages 213-227, March.
    16. Jacob B. Feldman & Huseyin Topaloglu, 2015. "Capacity Constraints Across Nests in Assortment Optimization Under the Nested Logit Model," Operations Research, INFORMS, vol. 63(4), pages 812-822, August.
    17. Orlin, J. B., 1984. "Some Very Easy Knapsack/Partition Problems," Econometric Institute Archives 272288, Erasmus University Rotterdam.
    18. Ewa M. Bednarczuk & Janusz Miroforidis & Przemysław Pyzel, 2018. "A multi-criteria approach to approximate solution of multiple-choice knapsack problem," Computational Optimization and Applications, Springer, vol. 70(3), pages 889-910, July.
    19. Pisinger, David, 2001. "Budgeting with bounded multiple-choice constraints," European Journal of Operational Research, Elsevier, vol. 129(3), pages 471-480, March.
    20. Tobin, Roger L., 2002. "Relief period optimization under budget constraints," European Journal of Operational Research, Elsevier, vol. 139(1), pages 42-61, May.
