
Using Machine Learning for Web Page Classification in Search Engine Optimization

Author

Listed:
  • Goran Matošević

    (Faculty of Economics and Tourism Dr. Mijo Mirković, University of Pula, 52100 Pula, Croatia)

  • Jasminka Dobša

    (Faculty of Organization and Informatics Varaždin, University of Zagreb, 10000 Zagreb, Croatia)

  • Dunja Mladenić

    (Jožef Stefan Institute, 1000 Ljubljana, Slovenia)

Abstract

This paper presents a novel approach that uses machine learning algorithms, informed by expert knowledge, to classify web pages into three predefined classes according to how closely their content follows search engine optimization (SEO) recommendations. Classifiers were built and trained to assign an unknown sample (a web page) to one of the three classes and to identify the factors that most affect the degree of page adjustment. The training data were manually labeled by domain experts. The experimental results show that machine learning can predict the degree of adjustment of web pages to SEO recommendations: classifier accuracy ranges from 54.59% to 69.67%, which is higher than the baseline accuracy obtained by assigning every sample to the majority class (48.83%). The practical significance of the approach lies in providing a core for building software agents and expert systems that automatically detect web pages, or parts of web pages, that need improvement to comply with SEO guidelines and could therefore gain higher rankings in search engines. The results also contribute to the field of detecting optimal values of the ranking factors that search engines use to rank web pages. The experiments suggest that the important factors to consider when preparing a web page are the page title, meta description, H1 tag (heading), and body text, which is in line with the findings of previous research. A further result of this research is a new data set of manually labeled web pages that can be used in further research.
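
The comparison against a majority-class baseline mentioned in the abstract can be illustrated with a minimal sketch. The class names ("low", "medium", "high") and the toy labels below are hypothetical and are not taken from the paper or its data set; the sketch only shows how such a baseline is typically computed and compared with a classifier's accuracy.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical expert labels and classifier predictions (illustration only).
y_true = ["low", "medium", "medium", "high", "medium", "low", "medium", "high"]
y_pred = ["low", "medium", "medium", "medium", "medium", "low", "low", "high"]

baseline = majority_baseline_accuracy(y_true)
model_acc = accuracy(y_true, y_pred)

print(f"baseline accuracy:   {baseline:.2%}")
print(f"classifier accuracy: {model_acc:.2%}")
print("classifier beats baseline:", model_acc > baseline)
```

A classifier is only informative if its accuracy exceeds this baseline, which is the comparison the abstract reports (54.59% to 69.67% against 48.83%).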

Suggested Citation

  • Goran Matošević & Jasminka Dobša & Dunja Mladenić, 2021. "Using Machine Learning for Web Page Classification in Search Engine Optimization," Future Internet, MDPI, vol. 13(1), pages 1-20, January.
  • Handle: RePEc:gam:jftint:v:13:y:2021:i:1:p:9-:d:473960

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/13/1/9/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/13/1/9/
    Download Restriction: no

    References listed on IDEAS

    1. Cristòfol Rovira & Lluís Codina & Frederic Guerrero-Solé & Carlos Lopezosa, 2019. "Ranking by Relevance and Citation Counts, a Comparative Study: Google Scholar, Microsoft Academic, WoS and Scopus," Future Internet, MDPI, vol. 11(9), pages 1-21, September.
    2. Andreas Giannakoulopoulos & Nikos Konstantinou & Dimitris Koutsompolis & Minas Pergantis & Iraklis Varlamis, 2019. "Academic Excellence, Website Quality, SEO Performance: Is there a Correlation?," Future Internet, MDPI, vol. 11(11), pages 1-25, November.
    3. Christos Ziakis & Maro Vlachopoulou & Theodosios Kyrkoudis & Makrina Karagkiozidou, 2019. "Important Factors for Improving Google Search Rank," Future Internet, MDPI, vol. 11(2), pages 1-12, January.
    4. Lee, Ji-Hyun & Yeh, Wei-Chang & Chuang, Mei-Chi, 2015. "Web page classification based on a simplified swarm optimization," Applied Mathematics and Computation, Elsevier, vol. 270(C), pages 13-24.

    Citations



    Cited by:

    1. Alessandro Massaro & Daniele Giannone & Vitangelo Birardi & Angelo Maurizio Galiano, 2021. "An Innovative Approach for the Evaluation of the Web Page Impact Combining User Experience and Neural Network Score," Future Internet, MDPI, vol. 13(6), pages 1-21, May.
    2. Konstantinos I. Roumeliotis & Nikolaos D. Tselikas & Dimitrios K. Nasiopoulos, 2022. "Airlines’ Sustainability Study Based on Search Engine Optimization Techniques and Technologies," Sustainability, MDPI, vol. 14(18), pages 1-23, September.
    3. Ponzoa, José M. & Gómez, Andrés & Mas, José M., 2023. "EU27 and USA institutions in the digital ecosystem: Proposal for a digital presence measurement index," Journal of Business Research, Elsevier, vol. 154(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Andreas Veglis & Dimitrios Giomelakis, 2019. "Search Engine Optimization," Future Internet, MDPI, vol. 12(1), pages 1-2, December.
    2. Cristòfol Rovira & Lluís Codina & Carlos Lopezosa, 2021. "Language Bias in the Google Scholar Ranking Algorithm," Future Internet, MDPI, vol. 13(2), pages 1-17, January.
    3. Karol Król & Dariusz Zdonek, 2020. "Aggregated Indices in Website Quality Assessment," Future Internet, MDPI, vol. 12(4), pages 1-23, April.
    4. Karol Król & Dariusz Zdonek, 2021. "The Quality of Infectious Disease Hospital Websites in Poland in Light of the COVID-19 Pandemic," IJERPH, MDPI, vol. 18(2), pages 1-19, January.
    5. Artur Strzelecki, 2020. "Google Medical Update: Why Is the Search Engine Decreasing Visibility of Health and Medical Information Websites?," IJERPH, MDPI, vol. 17(4), pages 1-13, February.
    6. Laura Icela González-Pérez & María Soledad Ramírez-Montoya & Francisco José García-Peñalvo, 2021. "Improving Institutional Repositories through User-Centered Design: Indicators from a Focus Group," Future Internet, MDPI, vol. 13(11), pages 1-19, November.
    7. Konstantinos I. Roumeliotis & Nikolaos D. Tselikas & Dimitrios K. Nasiopoulos, 2022. "Airlines’ Sustainability Study Based on Search Engine Optimization Techniques and Technologies," Sustainability, MDPI, vol. 14(18), pages 1-23, September.
    8. Cristòfol Rovira & Lluís Codina & Frederic Guerrero-Solé & Carlos Lopezosa, 2019. "Ranking by Relevance and Citation Counts, a Comparative Study: Google Scholar, Microsoft Academic, WoS and Scopus," Future Internet, MDPI, vol. 11(9), pages 1-21, September.
    9. Wei-Chang Yeh & Yunzhi Jiang & Yee-Fen Chen & Zhe Chen, 2016. "A New Soft Computing Method for K-Harmonic Means Clustering," PLOS ONE, Public Library of Science, vol. 11(11), pages 1-14, November.
    10. Andreas Giannakoulopoulos & Nikos Konstantinou & Dimitris Koutsompolis & Minas Pergantis & Iraklis Varlamis, 2019. "Academic Excellence, Website Quality, SEO Performance: Is there a Correlation?," Future Internet, MDPI, vol. 11(11), pages 1-25, November.
    11. Minos-Athanasios Karyotakis & Evangelos Lamprou & Matina Kiourexidou & Nikos Antonopoulos, 2019. "SEO Practices: A Study about the Way News Websites Allow the Users to Comment on Their News Articles," Future Internet, MDPI, vol. 11(9), pages 1-13, August.
    12. Mariusz Duka & Marek Sikora & Artur Strzelecki, 2023. "From Web Catalogs to Google: A Retrospective Study of Web Search Engines Sustainable Development," Sustainability, MDPI, vol. 15(8), pages 1-16, April.
    13. Tamás Stadler & Ágoston Temesi & Zoltán Lakner, 2022. "Soil Chemical Pollution and Military Actions: A Bibliometric Analysis," Sustainability, MDPI, vol. 14(12), pages 1-17, June.
    14. Ziyun Deng & Tingqin He, 2018. "A Method for Filtering Pages by Similarity Degree based on Dynamic Programming," Future Internet, MDPI, vol. 10(12), pages 1-12, December.
    15. Le, Tran Duc & Le-Dinh, Thang & Uwizeyemungu, Sylvestre, 2024. "Search engine optimization poisoning: A cybersecurity threat analysis and mitigation strategies for small and medium-sized enterprises," Technology in Society, Elsevier, vol. 76(C).
    16. Muhammad Fakruhayat Ab Rashid & Sharifah Rohayah Sheikh Dawood, 2024. "Statistical Evaluation of Webometric Analysis of Tourism Websites in ASEAN Countries," International Journal of Research and Innovation in Social Science, International Journal of Research and Innovation in Social Science (IJRISS), vol. 8(2), pages 2389-2406, February.
    17. Artur Strzelecki, 2019. "Google Web and Image Search Visibility Data for Online Store," Data, MDPI, vol. 4(3), pages 1-10, August.
    18. Sandro Serpa & Maria José Sá & Ana Isabel Santos & Carlos Miguel Ferreira, 2020. "Challenges for the Academic Editor in the Scientific Publication," Academic Journal of Interdisciplinary Studies, Richtmann Publishing Ltd, vol. 9, May.
