Estimating search engine index size variability: a 9-year longitudinal study

Author

Listed:
  • Antal Bosch

    (Radboud University)

  • Toine Bogers

    (Aalborg University Copenhagen)

  • Maurice Kunder

    (De Kunder Internet Media)

Abstract

One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine’s index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google’s and Bing’s indices over a nine-year period, from March 2006 until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find that much, if not all, of this variability can be explained by changes in the indexing and ranking infrastructure of Google and Bing. This casts further doubt on whether Web search engines can be used reliably for cross-sectional webometric studies.
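
The extrapolation described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that the fraction of corpus pages containing a word approximates the fraction of indexed pages containing it, so dividing the engine's reported hit count by that fraction gives one size estimate per word. The function name, the averaging over words, and all numbers are hypothetical.

```python
from statistics import mean


def estimate_index_size(corpus_size, corpus_doc_freq, engine_hit_counts):
    """Extrapolate an index size from per-word document frequencies.

    corpus_size       -- number of pages in the static reference corpus
    corpus_doc_freq   -- word -> number of corpus pages containing the word
    engine_hit_counts -- word -> hit count reported by the search engine

    Assumption (hypothetical, for illustration): a word occurs in the same
    fraction of pages in the corpus as in the engine's index.
    """
    estimates = []
    for word, hits in engine_hit_counts.items():
        doc_freq = corpus_doc_freq.get(word, 0)
        if doc_freq <= 0:
            continue  # word absent from the corpus: no usable ratio
        fraction = doc_freq / corpus_size
        estimates.append(hits / fraction)
    if not estimates:
        raise ValueError("no word has a usable corpus document frequency")
    # Averaging over several words dampens the noise of any single word.
    return mean(estimates)


if __name__ == "__main__":
    # Toy numbers, invented for the example.
    corpus_df = {"the": 500, "of": 250}
    hits = {"the": 5_000_000, "of": 2_000_000}
    print(estimate_index_size(1000, corpus_df, hits))
```

Repeating such an estimate at regular intervals gives a time series whose fluctuations can then be inspected, which is the kind of longitudinal view the abstract refers to.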

Suggested Citation

  • Antal Bosch & Toine Bogers & Maurice Kunder, 2016. "Estimating search engine index size variability: a 9-year longitudinal study," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(2), pages 839-856, May.
  • Handle: RePEc:spr:scient:v:107:y:2016:i:2:d:10.1007_s11192-016-1863-z
    DOI: 10.1007/s11192-016-1863-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-016-1863-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Citations

    Cited by:

    1. M. E. Bontempi & M. Frigeri & R. Golinelli & M. Squadrani, 2019. "Uncertainty, Perception and the Internet," Working Papers wp1134, Dipartimento Scienze Economiche, Universita' di Bologna.
    2. Isidro F. Aguillo, 2020. "Altmetrics of the Open Access Institutional Repositories: a webometrics approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(3), pages 1181-1192, June.
    3. Amalia Mas-Bleda & Mike Thelwall, 2016. "Can alternative indicators overcome language biases in citation counts? A comparison of Spanish and UK research," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 2007-2030, December.
    4. Enrique Orduña-Malea & Cristina I. Font-Julián & Jorge Serrano-Cobos, 2024. "Open access publications drive few visits from Google Search results to institutional repositories," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(11), pages 7131-7152, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Judit Bar-Ilan & Rina Azoulay, 2012. "Map of nonprofit organization websites in Israel," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(6), pages 1142-1167, June.
    2. Gandal, Neil, 2001. "The dynamics of competition in the internet search engine market," International Journal of Industrial Organization, Elsevier, vol. 19(7), pages 1103-1117, July.
    3. Thelwall, Mike & Sud, Pardeep, 2012. "Webometric research with the Bing Search API 2.0," Journal of Informetrics, Elsevier, vol. 6(1), pages 44-52.
    4. Eric T. Bradlow & David C. Schmittlein, 2000. "The Little Engines That Could: Modeling the Performance of World Wide Web Search Engines," Marketing Science, INFORMS, vol. 19(1), pages 43-62, June.
    5. Judit Bar-Ilan, 2001. "Data collection methods on the Web for infometric purposes — A review and analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 50(1), pages 7-32, January.
    6. Shakina, Elena & Parshakov, Petr & Alsufiev, Artem, 2021. "Rethinking the corporate digital divide: The complementarity of technologies and the demand for digital skills," Technological Forecasting and Social Change, Elsevier, vol. 162(C).
    7. Rangaswamy, Arvind & Giles, C. Lee & Seres, Silvija, 2009. "A Strategic Perspective on Search Engines: Thought Candies for Practitioners and Researchers," Journal of Interactive Marketing, Elsevier, vol. 23(1), pages 49-60.
    8. Muhammad Omar & Arif Mehmood & Gyu Sang Choi & Han Woo Park, 2017. "Global mapping of artificial intelligence in Google and Google Scholar," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1269-1305, December.
    9. Xu, Yongxin & Xuan, Yuhao & Zheng, Gaoping, 2021. "Internet searching and stock price crash risk: Evidence from a quasi-natural experiment," Journal of Financial Economics, Elsevier, vol. 141(1), pages 255-275.
    10. Carbone, Anna & Jensen, Meiko & Sato, Aki-Hiro, 2016. "Challenges in data science: a complex systems perspective," Chaos, Solitons & Fractals, Elsevier, vol. 90(C), pages 1-7.
    11. Enrique Orduna-Malea & Juan M. Ayllón & Alberto Martín-Martín & Emilio Delgado López-Cózar, 2015. "Methods for estimating the size of Google Scholar," Scientometrics, Springer;Akadémiai Kiadó, vol. 104(3), pages 931-949, September.
    12. Juan Feng & Hemant K. Bhargava & David M. Pennock, 2007. "Implementing Sponsored Search in Web Search Engines: Computational Evaluation of Alternative Mechanisms," INFORMS Journal on Computing, INFORMS, vol. 19(1), pages 137-148, February.
    13. Srijana Acharya & Han Woo Park, 2017. "Open data in Nepal: a webometric network analysis," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(3), pages 1027-1043, May.
    14. James S. Dietz & Ivan Chompalov & Barry Bozeman & Eliesh O'Neil Lane & Jongwon Park, 2000. "Using the Curriculum Vita to Study the Career Paths of Scientists and Engineers: An Exploratory Assessment," Scientometrics, Springer;Akadémiai Kiadó, vol. 49(3), pages 419-442, November.
    15. Ping Liu & Qiong Wu & Xiangming Mu & Kaipeng Yu & Yiting Guo, 2015. "Detecting the intellectual structure of library and information science based on formal concept analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 104(3), pages 737-762, September.
    16. Paul Thomas, 2012. "To what problem is distributed information retrieval the solution?," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(7), pages 1471-1476, July.
    17. Will Serrano, 2018. "Neural Networks in Big Data and Web Search," Data, MDPI, vol. 4(1), pages 1-41, December.
    18. Judit Bar-Ilan, 2018. "Eugene Garfield on the Web in 2001," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(2), pages 389-399, February.
    19. Valentina Della Corte & Giovanna Del Gaudio & Fabiana Sepe & Fabiana Sciarelli, 2019. "Sustainable Tourism in the Open Innovation Realm: A Bibliometric Analysis," Sustainability, MDPI, vol. 11(21), pages 1-18, November.
    20. Han Park, 2012. "Examining academic Internet use using a combined method," Quality & Quantity: International Journal of Methodology, Springer, vol. 46(1), pages 251-266, January.
