
A Supervised Machine Learning Approach for Assessing Grant Peer Review Reports

Author

Listed:
  • Gabriel Okasa
  • Alberto de León
  • Michaela Strinzel
  • Anne Jorstad
  • Katrin Milzow
  • Matthias Egger
  • Stefan Müller

Abstract

Peer review in grant evaluation informs funding decisions, but the contents of peer review reports are rarely analyzed. In this work, we develop a thoroughly tested pipeline for analyzing the texts of grant peer review reports using methods from applied Natural Language Processing (NLP) and machine learning. We start by developing twelve categories reflecting the content of grant peer review reports that is of interest to research funders. These categories are then iteratively annotated by multiple human annotators in a novel text corpus of grant peer review reports submitted to the Swiss National Science Foundation. After validating the human annotation, we use the annotated texts to fine-tune pre-trained transformer models to classify these categories at scale, while conducting several robustness and validation checks. Our results show that many categories can be reliably identified by human annotators and machine learning approaches. However, the choice of text classification approach considerably influences classification performance. We also find a high correspondence between out-of-sample classification performance and the difficulty human annotators perceived in identifying the categories. Our results and publicly available fine-tuned transformer models will allow researchers, research funders, and anybody interested in peer review to examine and report on the contents of these reports in a structured manner. Ultimately, we hope our approach can contribute to ensuring the quality and trustworthiness of grant peer review.
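To make the classification step concrete, the sketch below shows how such a fine-tuning setup could look in Python with the Hugging Face transformers and datasets libraries. This is a minimal sketch under stated assumptions: the base model (distilbert-base-uncased), the three example category labels, and the toy annotated sentences are illustrative stand-ins, not the authors' actual models, categories, or data.

# Minimal sketch of fine-tuning a pre-trained transformer to classify
# sentences of peer review reports into content categories. The model,
# labels, and sentences are hypothetical, not the paper's actual setup.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical subset of the twelve content categories.
labels = ["track_record", "methods", "feasibility"]
label2id = {name: i for i, name in enumerate(labels)}
id2label = {i: name for name, i in label2id.items()}

# Toy stand-ins for human-annotated sentences from review reports.
train = Dataset.from_dict({
    "text": [
        "The applicant has an excellent publication record.",
        "The statistical analysis plan is not described in sufficient detail.",
        "The timeline appears realistic for the proposed experiments.",
    ],
    "label": [
        label2id["track_record"],
        label2id["methods"],
        label2id["feasibility"],
    ],
})

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Pad/truncate so the default collator can batch the examples directly.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train = train.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="peer-review-classifier",
                           num_train_epochs=1),
    train_dataset=train,
)
trainer.train()

Once fine-tuned, a model of this kind assigns each sentence of a review report to a content category, which is what allows the classification to run at scale over a full corpus.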

Suggested Citation

  • Gabriel Okasa & Alberto de León & Michaela Strinzel & Anne Jorstad & Katrin Milzow & Matthias Egger & Stefan Müller, 2024. "A Supervised Machine Learning Approach for Assessing Grant Peer Review Reports," Papers 2411.16662, arXiv.org, revised Dec 2024.
  • Handle: RePEc:arx:papers:2411.16662

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2411.16662
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Martin Haselmayer & Marcelo Jenny, 2017. "Sentiment analysis of political communication: combining a dictionary approach with crowdcoding," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(6), pages 2623-2646, November.
    2. Cindy Cheng & Joan Barceló & Allison Spencer Hartnett & Robert Kubinec & Luca Messerschmidt, 2020. "COVID-19 Government Response Event Dataset (CoronaNet v.1.0)," Nature Human Behaviour, Nature, vol. 4(7), pages 756-768, July.
    3. Lawson, Cornelia & Salter, Ammon, 2023. "Exploring the effect of overlapping institutional applications on panel decision-making," Research Policy, Elsevier, vol. 52(9).
    4. Keren Weinshall & Lee Epstein, 2020. "Developing High‐Quality Data Infrastructure for Legal Analytics: Introducing the Israeli Supreme Court Database," Journal of Empirical Legal Studies, John Wiley & Sons, vol. 17(2), pages 416-434, June.
    5. Elio Amicarelli & Jessica Di Salvatore, 2021. "Introducing the PeaceKeeping Operations Corpus (PKOC)," Journal of Peace Research, Peace Research Institute Oslo, vol. 58(5), pages 1137-1148, September.
    6. Tóth, Tamás & Demeter, Márton & Csuhai, Sándor & Major, Zsolt Balázs, 2024. "When career-boosting is on the line: Equity and inequality in grant evaluation, productivity, and the educational backgrounds of Marie Skłodowska-Curie Actions individual fellows in social sciences and humanities," Journal of Informetrics, Elsevier, vol. 18(2).
    7. Lutz Bornmann & Julian N. Marewski, 2024. "Opium in science and society: numbers and other quantifications," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(9), pages 5313-5346, September.
    8. Christopher J Fariss & James Lo, 2020. "Innovations in concepts and measurement for the study of peace and conflict," Journal of Peace Research, Peace Research Institute Oslo, vol. 57(6), pages 669-678, November.
    9. Buljan, Ivan & Garcia-Costa, Daniel & Grimaldo, Francisco & Klein, Richard A. & Bakker, Marjan & Marušić, Ana, 2024. "Development and application of a comprehensive glossary for the identification of statistical and methodological concepts in peer review reports," Journal of Informetrics, Elsevier, vol. 18(3).
    10. Anton Oleinik, 2024. "A Bayesian index of association: comparison with other measures and performance," Quality & Quantity: International Journal of Methodology, Springer, vol. 58(1), pages 277-305, February.
    11. Mubashir Qasim, 2019. "Sustainability and Wellbeing: A Text Analysis of New Zealand Parliamentary Debates, Official Yearbooks and Ministerial Documents," Working Papers in Economics 19/01, University of Waikato.
    12. Cindy Cheng & Joan Barcelo & Allison Spencer Hartnett & Robert Kubinec & Luca Messerschmidt, 2020. "CoronaNet: A Dyadic Dataset of Government Responses to the COVID-19 Pandemic," Working Papers 20200042, New York University Abu Dhabi, Department of Social Science, revised Apr 2020.
    13. Song Jing & Qingzhao Ma & Siyi Wang & Hanliang Xu & Tian Xu & Xia Guo & Zhuolin Wu, 2024. "Research on developmental evaluation based on the "four abilities" model: evidence from early career researchers in China," Quality & Quantity: International Journal of Methodology, Springer, vol. 58(1), pages 681-704, February.
    14. Andrijana Perković Paloš & Antonija Mijatović & Ivan Buljan & Daniel Garcia-Costa & Elena Álvarez-García & Francisco Grimaldo & Ana Marušić, 2023. "Linguistic and semantic characteristics of articles and peer review reports in Social Sciences and Medical and Health Sciences: analysis of articles published in Open Research Central," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(8), pages 4707-4729, August.
    15. Miriam Sorace, 2018. "The European Union democratic deficit: Substantive representation in the European Parliament at the input stage," European Union Politics, vol. 19(1), pages 3-24, March.
    16. Joshua Robison & Randy T. Stevenson & James N. Druckman & Simon Jackman & Jonathan N. Katz & Lynn Vavreck, 2018. "An Audit of Political Behavior Research," SAGE Open, vol. 8(3), pages 21582440187, August.
    17. Atsushi Ueshima & Matthew I. Jones & Nicholas A. Christakis, 2024. "Simple autonomous agents can enhance creative semantic discovery by human groups," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    18. Sorace, Miriam, 2018. "The European Union democratic deficit: substantive representation in the European Parliament at the input stage," LSE Research Online Documents on Economics 87625, London School of Economics and Political Science, LSE Library.
    19. Xinyan Zhao & Chau-Wai Wong, 2024. "Automated measures of sentiment via transformer- and lexicon-based sentiment analysis (TLSA)," Journal of Computational Social Science, Springer, vol. 7(1), pages 145-170, April.
    20. Wenqing Wu & Haixu Xi & Chengzhi Zhang, 2024. "Are the confidence scores of reviewers consistent with the review content? Evidence from top conference proceedings in AI," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 4109-4135, July.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2411.16662. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.