
Microsoft Academic automatic document searches: Accuracy for journal articles and suitability for citation analysis

Author

Listed:
  • Thelwall, Mike

Abstract

Microsoft Academic is a free academic search engine and citation index that is similar to Google Scholar but can be automatically queried. Its data is potentially useful for bibliometric analysis if it is possible to search effectively for individual journal articles. This article compares different methods to find journal articles in its index by searching for a combination of title, authors, publication year and journal name, and uses the results for the widest published correlation analysis of Microsoft Academic citation counts for journal articles so far. Based on 126,312 articles from 323 Scopus subfields in 2012, the optimal strategy to find articles with DOIs is to search for them by title and filter out those with incorrect DOIs. This finds 90% of journal articles. For articles without DOIs, the optimal strategy is to search for them by title and then filter out matches with dissimilar metadata. This finds 89% of journal articles, with an additional 1% incorrect matches. The remaining articles seem mainly either not to be indexed by Microsoft Academic or to be indexed under a different-language version of their title. Across the matches, Scopus citation counts and Microsoft Academic citation counts have an average Spearman correlation of 0.95, with the lowest for any single field being 0.63. Thus, Microsoft Academic citation counts are almost universally equivalent to Scopus citation counts for articles that are not recent, although there are national biases in the results.
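
The two matching strategies and the correlation check reduce to a simple filter over title-search results. The sketch below is a minimal illustration in Python under stated assumptions: the record fields, the similarity helper and its 0.9 threshold are hypothetical, no real Microsoft Academic API call is shown, and the citation counts at the end are toy numbers, not the study's data.

    # Hypothetical sketch of the paper's two matching strategies; field
    # names and thresholds are illustrative assumptions.
    from difflib import SequenceMatcher

    from scipy.stats import spearmanr


    def similar(a, b, threshold=0.9):
        """Loose string similarity for comparing metadata fields."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


    def match_article(scopus_record, title_hits):
        """Pick the Microsoft Academic hit for one Scopus article, if any.

        Records are assumed to be dicts with 'doi', 'authors', 'year' and
        'journal' keys; title_hits are the results of a title search.
        """
        for hit in title_hits:
            if scopus_record.get("doi"):
                # Strategy 1 (articles with DOIs): accept a title match
                # only if its DOI agrees with the Scopus DOI.
                if hit.get("doi", "").lower() == scopus_record["doi"].lower():
                    return hit
            else:
                # Strategy 2 (no DOI): accept a title match only if the
                # remaining metadata are similar enough.
                if (hit.get("year") == scopus_record["year"]
                        and similar(hit.get("journal", ""), scopus_record["journal"])
                        and similar(hit.get("authors", ""), scopus_record["authors"])):
                    return hit
        return None  # unmatched: likely unindexed or a translated title


    # Per-field agreement is then a Spearman rank correlation between the
    # two citation counts (toy numbers, not the study's values):
    scopus_counts = [12, 0, 5, 33, 2]
    ma_counts = [14, 0, 6, 30, 1]
    rho, _ = spearmanr(scopus_counts, ma_counts)
    print(f"Spearman correlation: {rho:.2f}")

Rank correlation suits highly skewed citation-count distributions, which is presumably why the analysis reports Spearman rather than Pearson values per field.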

Suggested Citation

  • Thelwall, Mike, 2018. "Microsoft Academic automatic document searches: Accuracy for journal articles and suitability for citation analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 1-9.
  • Handle: RePEc:eee:infome:v:12:y:2018:i:1:p:1-9
    DOI: 10.1016/j.joi.2017.11.001

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157717303346
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.joi.2017.11.001?utm_source=ideas
LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Emilio Delgado López-Cózar & Nicolás Robinson-García & Daniel Torres-Salinas, 2014. "The Google scholar experiment: How to index false papers and manipulate bibliometric indicators," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(3), pages 446-454, March.
    2. Fiorenzo Franceschini & Domenico Maisano & Luca Mastrogiacomo, 2013. "A novel approach for estimating the omitted‐citation rate of bibliometric databases with an application to the field of bibliometrics," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 64(10), pages 2149-2156, October.
    3. Michel Zitt, 2012. "The journal impact factor: angel, devil, or scapegoat? A comment on J.K. Vanclay’s article 2011," Scientometrics, Springer;Akadémiai Kiadó, vol. 92(2), pages 485-503, August.
    4. Fiorenzo Franceschini & Domenico Maisano & Luca Mastrogiacomo, 2015. "Influence of omitted citations on the bibliometric statistics of the major Manufacturing journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(3), pages 1083-1122, June.
    5. Fiorenzo Franceschini & Domenico Maisano & Luca Mastrogiacomo, 2016. "Do Scopus and WoS correct “old” omitted citations?," Scientometrics, Springer;Akadémiai Kiadó, vol. 107(2), pages 321-335, May.
    6. Sven E. Hug & Martin P. Brändle, 2017. "The coverage of Microsoft Academic: analyzing the publication output of a university," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1551-1571, December.
    7. Fiorenzo Franceschini & Domenico Maisano & Luca Mastrogiacomo, 2015. "Errors in DOI indexing by bibliometric databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2181-2186, March.
    8. Thelwall, Mike, 2017. "Three practical field normalised alternative indicator formulae for research evaluation," Journal of Informetrics, Elsevier, vol. 11(1), pages 128-151.
    9. Thelwall, Mike, 2017. "Microsoft Academic: A multidisciplinary comparison of citation counts with Scopus and Mendeley for 29 journals," Journal of Informetrics, Elsevier, vol. 11(4), pages 1201-1212.
    10. Anne-Wil Harzing, 2016. "Microsoft Academic (Search): a Phoenix arisen from the ashes?," Scientometrics, Springer;Akadémiai Kiadó, vol. 108(3), pages 1637-1647, September.
    11. Franceschini, Fiorenzo & Maisano, Domenico & Mastrogiacomo, Luca, 2016. "The museum of errors/horrors in Scopus," Journal of Informetrics, Elsevier, vol. 10(1), pages 174-182.
    12. Halevi, Gali & Moed, Henk & Bar-Ilan, Judit, 2017. "Suitability of Google Scholar as a source of scientific information and as a source of data for scientific evaluation—Review of the Literature," Journal of Informetrics, Elsevier, vol. 11(3), pages 823-834.
    13. Anne-Wil Harzing & Satu Alakangas, 2017. "Microsoft Academic: is the phoenix getting wings?," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(1), pages 371-383, January.
    14. Philippe Mongeon & Adèle Paul-Hus, 2016. "The journal coverage of Web of Science and Scopus: a comparative analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(1), pages 213-228, January.
    15. Franceschini, Fiorenzo & Maisano, Domenico & Mastrogiacomo, Luca, 2014. "Scientific journal publishers and omitted citations in bibliometric databases: Any relationship?," Journal of Informetrics, Elsevier, vol. 8(3), pages 751-765.
    16. Pardeep Sud & Mike Thelwall, 2014. "Evaluating altmetrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(2), pages 1131-1143, February.
    17. Waltman, Ludo & van Eck, Nees Jan & van Leeuwen, Thed N. & Visser, Martijn S. & van Raan, Anthony F.J., 2011. "Towards a new crown indicator: Some theoretical considerations," Journal of Informetrics, Elsevier, vol. 5(1), pages 37-47.
    18. Anne-Wil Harzing & Satu Alakangas, 2017. "Microsoft Academic is one year old: the Phoenix is ready to leave the nest," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1887-1894, September.
    19. Franceschini, Fiorenzo & Maisano, Domenico & Mastrogiacomo, Luca, 2016. "Empirical analysis and classification of database errors in Scopus and Web of Science," Journal of Informetrics, Elsevier, vol. 10(4), pages 933-953.
    20. Mike Thelwall, 2016. "Interpreting correlations between citation counts and other indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 108(1), pages 337-347, July.
    21. Sven E. Hug & Michael Ochsner & Martin P. Brändle, 2017. "Citation analysis with microsoft academic," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(1), pages 371-378, April.
    22. Thelwall, Mike & Fairclough, Ruth, 2015. "Geometric journal impact factors correcting for individual highly cited articles," Journal of Informetrics, Elsevier, vol. 9(2), pages 263-272.
    23. Kayvan Kousha & Mike Thelwall, 2008. "Sources of Google Scholar citations outside the Science Citation Index: A comparison between four science disciplines," Scientometrics, Springer;Akadémiai Kiadó, vol. 74(2), pages 273-294, February.

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Iman Tahamtan & Lutz Bornmann, 2019. "What do citation counts measure? An updated review of studies on citations in scientific documents published between 2006 and 2018," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1635-1684, December.
    2. Kaiwen Shi & Kan Liu & Xinyan He, 2024. "Heterogeneous hypergraph learning for literature retrieval based on citation intents," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 4167-4188, July.
    3. Kousha, Kayvan & Thelwall, Mike & Abdoli, Mahshid, 2018. "Can Microsoft Academic assess the early citation impact of in-press articles? A multi-discipline exploratory analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 287-298.
    4. Thelwall, Mike, 2018. "Dimensions: A competitor to Scopus and the Web of Science?," Journal of Informetrics, Elsevier, vol. 12(2), pages 430-435.
    5. Michael Thelwall, 2018. "Can Microsoft Academic be used for citation analysis of preprint archives? The case of the Social Science Research Network," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(2), pages 913-928, May.
    6. Alberto Martín-Martín & Mike Thelwall & Enrique Orduna-Malea & Emilio Delgado López-Cózar, 2021. "Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations’ COCI: a multidisciplinary comparison of coverage via citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(1), pages 871-906, January.
    7. Michael Gusenbauer, 2019. "Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 177-214, January.
    8. Anne-Wil Harzing, 2019. "Two new kids on the block: How do Crossref and Dimensions compare with Google Scholar, Microsoft Academic, Scopus and the Web of Science?," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(1), pages 341-349, July.
    9. Kousha, Kayvan & Thelwall, Mike, 2018. "Can Microsoft Academic help to assess the citation impact of academic books?," Journal of Informetrics, Elsevier, vol. 12(3), pages 972-984.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one; a schematic sketch of this overlap scoring follows the list.
    1. Waltman, Ludo, 2016. "A review of the literature on citation impact indicators," Journal of Informetrics, Elsevier, vol. 10(2), pages 365-391.
    2. Mike Thelwall, 2018. "Does Microsoft Academic find early citations?," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(1), pages 325-334, January.
    3. Michael Thelwall, 2018. "Can Microsoft Academic be used for citation analysis of preprint archives? The case of the Social Science Research Network," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(2), pages 913-928, May.
    4. Mike Thelwall, 2017. "Are Mendeley reader counts useful impact indicators in all fields?," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1721-1731, December.
    5. Shirley Ainsworth & Jane M. Russell, 2018. "Has hosting on science direct improved the visibility of Latin American scholarly journals? A preliminary analysis of data quality," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(3), pages 1463-1484, June.
    6. Kousha, Kayvan & Thelwall, Mike & Abdoli, Mahshid, 2018. "Can Microsoft Academic assess the early citation impact of in-press articles? A multi-discipline exploratory analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 287-298.
    7. Zhentao Liang & Jin Mao & Kun Lu & Gang Li, 2021. "Finding citations for PubMed: a large-scale comparison between five freely available bibliographic data sources," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(12), pages 9519-9542, December.
    8. Kousha, Kayvan & Thelwall, Mike, 2018. "Can Microsoft Academic help to assess the citation impact of academic books?," Journal of Informetrics, Elsevier, vol. 12(3), pages 972-984.
    9. Raminta Pranckutė, 2021. "Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World," Publications, MDPI, vol. 9(1), pages 1-59, March.
    10. Thelwall, Mike, 2018. "Do females create higher impact research? Scopus citations and Mendeley readers for articles from five countries," Journal of Informetrics, Elsevier, vol. 12(4), pages 1031-1041.
    11. Alberto Martín-Martín & Enrique Orduna-Malea & Emilio Delgado López-Cózar, 2018. "Coverage of highly-cited documents in Google Scholar, Web of Science, and Scopus: a multidisciplinary comparison," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 2175-2188, September.
    12. Franceschini, Fiorenzo & Maisano, Domenico & Mastrogiacomo, Luca, 2016. "Empirical analysis and classification of database errors in Scopus and Web of Science," Journal of Informetrics, Elsevier, vol. 10(4), pages 933-953.
    13. Thelwall, Mike, 2018. "Dimensions: A competitor to Scopus and the Web of Science?," Journal of Informetrics, Elsevier, vol. 12(2), pages 430-435.
    14. Anne-Wil Harzing, 2019. "Two new kids on the block: How do Crossref and Dimensions compare with Google Scholar, Microsoft Academic, Scopus and the Web of Science?," Scientometrics, Springer;Akadémiai Kiadó, vol. 120(1), pages 341-349, July.
    15. Alberto Martín-Martín & Mike Thelwall & Enrique Orduna-Malea & Emilio Delgado López-Cózar, 2021. "Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations’ COCI: a multidisciplinary comparison of coverage via citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(1), pages 871-906, January.
    16. Martín-Martín, Alberto & Orduna-Malea, Enrique & Thelwall, Mike & Delgado López-Cózar, Emilio, 2018. "Google Scholar, Web of Science, and Scopus: A systematic comparison of citations in 252 subject categories," Journal of Informetrics, Elsevier, vol. 12(4), pages 1160-1177.
    17. Xiancheng Li & Wenge Rong & Haoran Shi & Jie Tang & Zhang Xiong, 2018. "The impact of conference ranking systems in computer science: a comparative regression analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 879-907, August.
    18. Dunaiski, Marcel & Geldenhuys, Jaco & Visser, Willem, 2019. "On the interplay between normalisation, bias, and performance of paper impact metrics," Journal of Informetrics, Elsevier, vol. 13(1), pages 270-290.
    19. Mike Thelwall, 2018. "Early Mendeley readers correlate with later citation counts," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(3), pages 1231-1240, June.
    20. Robin Haunschild & Sven E. Hug & Martin P. Brändle & Lutz Bornmann, 2018. "The number of linked references of publications in Microsoft Academic in comparison with the Web of Science," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(1), pages 367-370, January.
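
    Read literally, this ranking combines bibliographic coupling (shared references) with co-citation (shared citing works). The following minimal Python sketch is purely illustrative; the function and set names are assumptions, not RePEc's or CitEc's actual method.

        def relatedness(refs_a, citers_a, refs_b, citers_b):
            """Hypothetical overlap score: shared cited works (bibliographic
            coupling) plus shared citing works (co-citation)."""
            return len(set(refs_a) & set(refs_b)) + len(set(citers_a) & set(citers_b))

        # Toy usage: item B shares two references and one citer with item A.
        print(relatedness({"r1", "r2", "r3"}, {"c1"}, {"r2", "r3"}, {"c1", "c2"}))  # 3

    Items would then be ranked by this score in descending order, which matches the description above of "most often" sharing cited and citing works.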

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:infome:v:12:y:2018:i:1:p:1-9. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/locate/joi .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.