
A new framework based on features modeling and ensemble learning to predict query performance

Author

Listed:
  • Mohamed Zaghloul
  • Mofreh Salem
  • Amr Ali-Eldin

Abstract

A query optimizer attempts to predict a performance metric based on elapsed time. In theory, this imposes significant overhead on the core engine, which must supply the necessary query optimization statistics. Machine learning is increasingly being used to improve query performance by incorporating regression models. To predict the response time of a query, most query performance prediction approaches rely on DBMS optimizer statistics and on the cost estimate of each operator in the query execution plan, which in turn focuses on resource utilization (CPU, I/O). Modeling query features is thus a critical step in developing a robust query performance prediction model. In this paper, we propose a new framework based on query feature modeling and ensemble learning to predict query performance. The framework also serves as a query performance predictor simulator for optimizing the query features that influence query performance. For query feature modeling, we propose five dimensions: syntax, hardware, software, data architecture, and historical performance logs. These features are used to develop the training datasets for the performance prediction model, which employs ensemble learning. Ensemble learning allows the query performance prediction model to deal with missing values and to handle overfitting via regularization. The experimental section shows how the proposed framework is applied. The training dataset in this paper consists of performance data logs from various real-world environments. Predicted and actual performance were compared to assess the proposed prediction model. Empirical work shows the effectiveness of the proposed approach compared to related work.
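
The pipeline described in the abstract (features drawn from five dimensions, then an ensemble regressor that tolerates missing values and is regularized against overfitting) can be illustrated with a minimal sketch. The abstract does not name the ensemble model, so this example substitutes a small gradient-boosted ensemble of regression stumps written from scratch. The feature names, the synthetic data, and the parameters `n_rounds`, `learning_rate`, and `l2` are all assumptions made for illustration, not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical feature columns, one per dimension named in the abstract:
# syntax, hardware, software, data architecture, historical performance logs.
FEATURES = ["num_joins", "cpu_cores", "dbms_version", "table_rows", "past_mean_ms"]


def goes_left(row, feature, threshold, missing_left):
    """Route a row; a missing value (None) follows the learned default side."""
    value = row[feature]
    return missing_left if value is None else value <= threshold


@dataclass
class Stump:
    feature: int
    threshold: float
    left: float
    right: float
    missing_left: bool

    def predict(self, row):
        side = goes_left(row, self.feature, self.threshold, self.missing_left)
        return self.left if side else self.right


def leaf_value(residuals, l2):
    # L2-regularized mean: the penalty l2 shrinks leaves backed by few rows.
    return sum(residuals) / (len(residuals) + l2) if residuals else 0.0


def fit_stump(rows, residuals, l2):
    """Pick the split, and the default side for missing values, with least squared error."""
    best, best_err = None, float("inf")
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows if r[f] is not None}):
            for missing_left in (True, False):
                sides = [goes_left(r, f, t, missing_left) for r in rows]
                left = leaf_value([e for e, s in zip(residuals, sides) if s], l2)
                right = leaf_value([e for e, s in zip(residuals, sides) if not s], l2)
                err = sum((e - (left if s else right)) ** 2
                          for e, s in zip(residuals, sides))
                if err < best_err:
                    best, best_err = Stump(f, t, left, right, missing_left), err
    return best


def fit_ensemble(rows, targets, n_rounds=50, learning_rate=0.3, l2=1.0):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(targets) / len(targets)
    preds = [base] * len(rows)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals, l2)
        stumps.append(stump)
        preds = [p + learning_rate * stump.predict(r) for p, r in zip(preds, rows)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(s.predict(row) for s in stumps)


# Synthetic rows (not data from the paper); None marks a missing feature value.
rows = [
    [0, 4, 12.0, 1e4, 20.0],
    [1, 4, 12.0, 1e5, None],
    [3, 8, 13.0, 1e6, 400.0],
    [5, 8, 13.0, 1e6, 900.0],
    [2, None, 12.0, 1e5, 150.0],
    [4, 16, 13.0, 1e7, None],
]
targets = [25.0, 60.0, 420.0, 950.0, 140.0, 700.0]  # response times in ms

model = fit_ensemble(rows, targets)
print(predict(model, [2, 8, 13.0, None, None]))  # query with two unknown features
```

In this sketch, missing values need no imputation because each stump learns a default branch for them, and overfitting is limited by two knobs: the `l2` penalty on leaf values and the `learning_rate` shrinkage applied to every stump. A production setup would more likely use an established boosting library than hand-written stumps.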

Suggested Citation

  • Mohamed Zaghloul & Mofreh Salem & Amr Ali-Eldin, 2021. "A new framework based on features modeling and ensemble learning to predict query performance," PLOS ONE, Public Library of Science, vol. 16(10), pages 1-18, October.
  • Handle: RePEc:plo:pone00:0258439
    DOI: 10.1371/journal.pone.0258439

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0258439
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0258439&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mariani, Marcello M. & Fosso Wamba, Samuel, 2020. "Exploring how consumer goods companies innovate in the digital age: The role of big data analytics companies," Journal of Business Research, Elsevier, vol. 121(C), pages 338-352.
    2. Małgorzata Pańkowska & Mariusz Żytniewski, 2019. "Modele architektury biznesowej administracji publicznej w warunkach przetwarzania danych masowych [Business architecture models of public administration in the context of big data processing]," Collegium of Economic Analysis Annals, Warsaw School of Economics, Collegium of Economic Analysis, issue 56, pages 171-183.
    3. Hans-Joachim Schramm & Carolin Nicole Czaja & Michael Dittrich & Matthias Mentschel, 2019. "Current Advancements of and Future Developments for Fourth Party Logistics in a Digital Future," Logistics, MDPI, vol. 3(1), pages 1-17, February.
    4. Zhang, Yi & Huang, Ying & Porter, Alan L. & Zhang, Guangquan & Lu, Jie, 2019. "Discovering and forecasting interactions in big data research: A learning-enhanced bibliometric study," Technological Forecasting and Social Change, Elsevier, vol. 146(C), pages 795-807.
    5. van den Broek, Tijs & van Veenstra, Anne Fleur, 2018. "Governance of big data collaborations: How to balance regulatory compliance and disruptive innovation," Technological Forecasting and Social Change, Elsevier, vol. 129(C), pages 330-338.
    6. de Camargo Fiorini, Paula & Roman Pais Seles, Bruno Michel & Chiappetta Jabbour, Charbel Jose & Barberio Mariano, Enzo & de Sousa Jabbour, Ana Beatriz Lopes, 2018. "Management theory and big data literature: From a review to a research agenda," International Journal of Information Management, Elsevier, vol. 43(C), pages 112-129.
    7. Mohammad Ali Yamin, 2021. "Investigating the Drivers of Supply Chain Resilience in the Wake of the COVID-19 Pandemic: Empirical Evidence from an Emerging Economy," Sustainability, MDPI, vol. 13(21), pages 1-16, October.
    8. Anke Joubert & Matthias Murawski & Markus Bick, 2023. "Measuring the Big Data Readiness of Developing Countries – Index Development and its Application to Africa," Information Systems Frontiers, Springer, vol. 25(1), pages 327-350, February.
    9. Sidney Anderson, 2024. "Expanding data literacy to include data preparation: building a sound marketing analytics foundation," Journal of Marketing Analytics, Palgrave Macmillan, vol. 12(2), pages 227-234, June.
    10. Amankwah-Amoah, Joseph, 2016. "Emerging economies, emerging challenges: Mobilising and capturing value from big data," Technological Forecasting and Social Change, Elsevier, vol. 110(C), pages 167-174.
    11. Maniyassouwe Amana & Pingfeng Liu & Mona Alariqi, 2022. "Value Creation and Capture with Big Data in Smart Phones Companies," Sustainability, MDPI, vol. 14(23), pages 1-22, November.
    12. Li, Ying & Dai, Jing & Cui, Li, 2020. "The impact of digital technologies on economic and environmental performance in the context of industry 4.0: A moderated mediation model," International Journal of Production Economics, Elsevier, vol. 229(C).
    13. Michela Arnaboldi, 2018. "The Missing Variable in Big Data for Social Sciences: The Decision-Maker," Sustainability, MDPI, vol. 10(10), pages 1-18, September.
    14. Pan Liu & Shu-ping Yi, 2018. "Investment decision-making and coordination of a three-stage supply chain considering Data Company in the Big Data era," Annals of Operations Research, Springer, vol. 270(1), pages 255-271, November.
    15. Benjamin T. Hazen & Joseph B. Skipper & Christopher A. Boone & Raymond R. Hill, 2018. "Back in business: operations research in support of big data analytics for operations and supply chain management," Annals of Operations Research, Springer, vol. 270(1), pages 201-211, November.
    16. Linda Zhang & Sara Shafiee, 2022. "Developing separate or integrated configurators? A longitudinal case study," Post-Print hal-03707380, HAL.
    17. Olumide Emmanuel Oluyisola & Fabio Sgarbossa & Jan Ola Strandhagen, 2020. "Smart Production Planning and Control: Concept, Use-Cases and Sustainability Implications," Sustainability, MDPI, vol. 12(9), pages 1-29, May.
    18. Aleš Popovič & Ray Hackney & Rana Tassabehji & Mauro Castelli, 2018. "The impact of big data analytics on firms’ high value business performance," Information Systems Frontiers, Springer, vol. 20(2), pages 209-222, April.
    19. Bin Shen & Hau-Ling Chan, 2017. "Forecast Information Sharing for Managing Supply Chains in the Big Data Era: Recent Development and Future Research," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 34(01), pages 1-26, February.
    20. Tiago Carneiro & Winnie Ng Picoto & Inês Pinto, 2023. "Big Data Analytics and Firm Performance in the Hotel Sector," Tourism and Hospitality, MDPI, vol. 4(2), pages 1-13, April.
