
Estimating search engine index size variability: a 9-year longitudinal study

Author

Listed:
  • Antal van den Bosch

    (Radboud University)

  • Toine Bogers

    (Aalborg University Copenhagen)

  • Maurice de Kunder

    (De Kunder Internet Media)

Abstract

One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine’s index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing’s indices over a nine-year period, from March 2006 until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find that much, if not all, of this variability can be explained by changes in the indexing and ranking infrastructure of Google and Bing. This casts further doubt on whether Web search engines can be used reliably for cross-sectional webometric studies.
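
The abstract does not spell out the extrapolation formula, so the following is only a minimal sketch of one plausible reading: if a word occurs in a known fraction of the documents in a static corpus, the hit count a search engine reports for that word can be scaled by the inverse of that fraction to give an index size estimate, and estimates from several words can be averaged. The function name, the averaging step, and all figures below are hypothetical illustrations, not the authors' published procedure or data.

```python
from statistics import mean


def estimate_index_size(corpus_size, corpus_doc_freq, engine_hit_counts):
    """Average per-word proportional extrapolations of index size.

    corpus_size: number of documents in the static reference corpus.
    corpus_doc_freq: word -> number of corpus documents containing it.
    engine_hit_counts: word -> hit count reported by the search engine.
    """
    estimates = []
    for word, hits in engine_hit_counts.items():
        doc_freq = corpus_doc_freq.get(word, 0)
        if doc_freq == 0:
            # A word absent from the corpus gives no usable ratio.
            continue
        # Assumption: the word's document-frequency ratio in the corpus
        # also holds in the engine's index.
        estimates.append(hits * corpus_size / doc_freq)
    if not estimates:
        raise ValueError("no word with a non-zero corpus document frequency")
    return mean(estimates)


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
corpus_size = 1_000_000
corpus_doc_freq = {"the": 500_000, "and": 400_000, "web": 100_000}
engine_hit_counts = {
    "the": 10_000_000_000,
    "and": 8_400_000_000,
    "web": 1_900_000_000,
}

estimate = estimate_index_size(corpus_size, corpus_doc_freq, engine_hit_counts)
print(f"Estimated index size: {estimate:.3g} documents")
```

Averaging over several words is a design choice of this sketch: any single word's reported hit count may be noisy, so combining words of different frequencies dampens the effect of one unreliable count.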

Suggested Citation

  • Antal van den Bosch & Toine Bogers & Maurice de Kunder, 2016. "Estimating search engine index size variability: a 9-year longitudinal study," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(2), pages 839-856, May.
  • Handle: RePEc:spr:scient:v:107:y:2016:i:2:d:10.1007_s11192-016-1863-z
    DOI: 10.1007/s11192-016-1863-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-016-1863-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Mike Thelwall, 2008. "Quantitative comparisons of search engine results," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 59(11), pages 1702-1710, September.
    2. Thelwall, Mike & Sud, Pardeep, 2012. "Webometric research with the Bing Search API 2.0," Journal of Informetrics, Elsevier, vol. 6(1), pages 44-52.
    3. Steve Lawrence & C. Lee Giles, 1999. "Accessibility of information on the web," Nature, Nature, vol. 400(6740), pages 107-109, July.

    Citations


    Cited by:

    1. M. E. Bontempi & M. Frigeri & R. Golinelli & M. Squadrani, 2019. "Uncertainty, Perception and the Internet," Working Papers wp1134, Dipartimento Scienze Economiche, Universita' di Bologna.
    2. Isidro F. Aguillo, 2020. "Altmetrics of the Open Access Institutional Repositories: a webometrics approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(3), pages 1181-1192, June.
    3. Amalia Mas-Bleda & Mike Thelwall, 2016. "Can alternative indicators overcome language biases in citation counts? A comparison of Spanish and UK research," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 2007-2030, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Judit Bar-Ilan & Rina Azoulay, 2012. "Map of nonprofit organization websites in Israel," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(6), pages 1142-1167, June.
    2. Gandal, Neil, 2001. "The dynamics of competition in the internet search engine market," International Journal of Industrial Organization, Elsevier, vol. 19(7), pages 1103-1117, July.
    3. Thelwall, Mike & Sud, Pardeep, 2012. "Webometric research with the Bing Search API 2.0," Journal of Informetrics, Elsevier, vol. 6(1), pages 44-52.
    4. Eric T. Bradlow & David C. Schmittlein, 2000. "The Little Engines That Could: Modeling the Performance of World Wide Web Search Engines," Marketing Science, INFORMS, vol. 19(1), pages 43-62, June.
    5. Will Serrano, 2018. "Neural Networks in Big Data and Web Search," Data, MDPI, vol. 4(1), pages 1-41, December.
    6. Judit Bar-Ilan, 2018. "Eugene Garfield on the Web in 2001," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(2), pages 389-399, February.
    7. Judit Bar-Ilan, 2001. "Data collection methods on the Web for infometric purposes — A review and analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 50(1), pages 7-32, January.
    8. Shakina, Elena & Parshakov, Petr & Alsufiev, Artem, 2021. "Rethinking the corporate digital divide: The complementarity of technologies and the demand for digital skills," Technological Forecasting and Social Change, Elsevier, vol. 162(C).
    9. Valentina Della Corte & Giovanna Del Gaudio & Fabiana Sepe & Fabiana Sciarelli, 2019. "Sustainable Tourism in the Open Innovation Realm: A Bibliometric Analysis," Sustainability, MDPI, vol. 11(21), pages 1-18, November.
    10. Han Park, 2012. "Examining academic Internet use using a combined method," Quality & Quantity: International Journal of Methodology, Springer, vol. 46(1), pages 251-266, January.
    11. Rangaswamy, Arvind & Giles, C. Lee & Seres, Silvija, 2009. "A Strategic Perspective on Search Engines: Thought Candies for Practitioners and Researchers," Journal of Interactive Marketing, Elsevier, vol. 23(1), pages 49-60.
    12. Muhammad Omar & Arif Mehmood & Gyu Sang Choi & Han Woo Park, 2017. "Global mapping of artificial intelligence in Google and Google Scholar," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1269-1305, December.
    13. Amancio, Diego R. & Oliveira Jr., Osvaldo N. & Costa, Luciano da F., 2012. "Structure–semantics interplay in complex networks and its effects on the predictability of similarity in texts," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(18), pages 4406-4419.
    14. Amalia Mas-Bleda & Mike Thelwall, 2016. "Can alternative indicators overcome language biases in citation counts? A comparison of Spanish and UK research," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(3), pages 2007-2030, December.
    15. Pardeep Sud & Mike Thelwall, 2014. "Linked title mentions: a new automated link search candidate," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(3), pages 1831-1849, December.
    16. Blazewicz, Jacek & Pesch, Erwin & Sterna, Malgorzata, 2005. "A novel representation of graph structures in web mining and data analysis," Omega, Elsevier, vol. 33(1), pages 65-71, February.
    17. García-Gallego Aurora & Georgantzís Nikolaos & Pereira Pedro & Pernías-Cerrillo José C., 2016. "Bias and Size Effects of Price-Comparison Platforms: Theory and Experimental Evidence," Review of Network Economics, De Gruyter, vol. 15(1), pages 1-34, March.
    18. Junwei Ma & Jianhua Wang & Philip Szmedra, 2019. "Sustainable Competitive Position of Mobile Communication Companies: Comprehensive Perspectives of Insiders and Outsiders," Sustainability, MDPI, vol. 11(7), pages 1-15, April.
    19. Gao, Yuyang & Liang, Wei & Shi, Yuming & Huang, Qiuling, 2014. "Comparison of directed and weighted co-occurrence networks of six languages," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 393(C), pages 579-589.
    20. Liang, Wei & Shi, Yuming & Tse, Chi K. & Liu, Jing & Wang, Yanli & Cui, Xunqiang, 2009. "Comparison of co-occurrence networks of the Chinese and English languages," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 388(23), pages 4901-4909.
