IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2502.00070.html

Can AI Solve the Peer Review Crisis? A Large Scale Cross Model Experiment of LLMs' Performance and Biases in Evaluating over 1000 Economics Papers

Author

Listed:
  • Pat Pataranutaporn
  • Nattavudh Powdthavee
  • Chayapatr Achiwaranguprok
  • Pattie Maes

Abstract

This study examines the potential of large language models (LLMs) to augment the academic peer review process by reliably evaluating the quality of economics research without introducing systematic bias. We conduct one of the first large-scale experimental assessments of four LLMs (GPT-4o, Claude 3.5, Gemma 3, and LLaMA 3.3) across two complementary experiments. In the first, we use nonparametric binscatter and linear regression techniques to analyze over 29,000 evaluations of 1,220 anonymized papers drawn from 110 economics journals excluded from the training data of current LLMs, along with a set of AI-generated submissions. The results show that LLMs consistently distinguish between higher- and lower-quality research based solely on textual content, producing quality gradients that closely align with established journal prestige measures. Claude and Gemma perform exceptionally well in capturing these gradients, while GPT excels in detecting AI-generated content. The second experiment comprises 8,910 evaluations designed to assess whether LLMs replicate human-like biases in single-blind reviews. By systematically varying author gender, institutional affiliation, and academic prominence across 330 papers, we find that GPT, Gemma, and LLaMA assign significantly higher ratings to submissions from top male authors and elite institutions relative to the same papers presented anonymously. These results emphasize the importance of excluding author-identifying information when deploying LLMs in editorial screening. Overall, our findings provide compelling evidence and practical guidance for integrating LLMs into peer review to enhance efficiency, improve accuracy, and promote equity in the publication process of economics research.

Suggested Citation

  • Pat Pataranutaporn & Nattavudh Powdthavee & Chayapatr Achiwaranguprok & Pattie Maes, 2025. "Can AI Solve the Peer Review Crisis? A Large Scale Cross Model Experiment of LLMs' Performance and Biases in Evaluating over 1000 Economics Papers," Papers 2502.00070, arXiv.org, revised Apr 2025.
  • Handle: RePEc:arx:papers:2502.00070

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2502.00070
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. David Card & Stefano DellaVigna & Patricia Funk & Nagore Iriberri, 2020. "Are Referees and Editors in Economics Gender Neutral?," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 135(1), pages 269-327.
    2. Joshua Angrist & Pierre Azoulay & Glenn Ellison & Ryan Hill & Susan Feng Lu, 2017. "Economic Research Evolves: Fields and Styles," American Economic Review, American Economic Association, vol. 107(5), pages 293-297, May.
    3. James J. Heckman & Sidharth Moktan, 2020. "Publishing and promotion in economics - The tyranny of the Top Five," Vox eBook Chapters, in: Sebastian Galliani & Ugo Panizza (ed.), Publishing and Measuring Success in Economics, edition 1, volume 1, chapter 1, pages 23-32, Centre for Economic Policy Research.
    4. Alessandro Checco & Lorenzo Bracciale & Pierpaolo Loreti & Stephen Pinfield & Giuseppe Bianchi, 2021. "AI-assisted peer review," Palgrave Communications, Palgrave Macmillan, vol. 8(1), pages 1-11, December.
    5. Glenn Ellison, 2002. "The Slowdown of the Economics Publishing Process," Journal of Political Economy, University of Chicago Press, vol. 110(5), pages 947-993, October.
    6. David Card & Stefano DellaVigna, 2013. "Nine Facts about Top Journals in Economics," Journal of Economic Literature, American Economic Association, vol. 51(1), pages 144-161, March.
    7. Andrew J. Oswald, 2007. "An Examination of the Reliability of Prestigious Scholarly Journals: Evidence and Implications for Decision‐Makers," Economica, London School of Economics and Political Science, vol. 74(293), pages 21-31, February.
    8. Erin Hengel, 2022. "Publishing While Female: are Women Held to Higher Standards? Evidence from Peer Review," The Economic Journal, Royal Economic Society, vol. 132(648), pages 2951-2991.
    9. Marianne Bertrand & Sendhil Mullainathan, 2004. "Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination," American Economic Review, American Economic Association, vol. 94(4), pages 991-1013, September.
    10. David Card & Stefano DellaVigna, 2020. "What Do Editors Maximize? Evidence from Four Economics Journals," The Review of Economics and Statistics, MIT Press, vol. 102(1), pages 195-217, March.
    11. Brogaard, Jonathan & Engelberg, Joseph & Parsons, Christopher A., 2014. "Networks and productivity: Causal evidence from editor rotations," Journal of Financial Economics, Elsevier, vol. 111(1), pages 251-270.
    12. Jonathan Brogaard & Joseph E. Engelberg & Sapnoti K. Eswar & Edward D. Van Wesep, 2024. "On the Causal Effect of Fame on Citations," Management Science, INFORMS, vol. 70(10), pages 7187-7214, October.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pataranutaporn, Pat & Powdthavee, Nattavudh & Maes, Pattie, 2025. "Can AI Solve the Peer Review Crisis? A Large-Scale Experiment on LLM's Performance and Biases in Evaluating Economics Papers," IZA Discussion Papers 17659, Institute of Labor Economics (IZA).
    2. Syed Hasan & Robert Breunig, 2021. "Article length and citation outcomes," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7583-7608, September.
    3. Cloos, Janis & Greiff, Matthias & Rusch, Hannes, 2020. "Geographical Concentration and Editorial Favoritism within the Field of Laboratory Experimental Economics (RM/19/029-revised-)," Research Memorandum 014, Maastricht University, Graduate School of Business and Economics (GSBE).
    4. Christoph Siemroth, 2024. "Economics Peer-Review: Problems, Recent Developments, and Reform Proposals," The American Economist, Sage Publications, vol. 69(2), pages 241-258, October.
    5. Ductor, Lorenzo & Visser, Bauke, 2022. "When a coauthor joins an editorial board," Journal of Economic Behavior & Organization, Elsevier, vol. 200(C), pages 576-595.
    6. Ali Sina Önder & Sergey V. Popov & Sascha Schweitzer, 2021. "Leadership in Scholarship: Editors’ Appointments and the Profession’s Narrative," Working Papers in Economics & Finance 2021-05, University of Portsmouth, Portsmouth Business School, Economics and Finance Subject Group.
    7. Koffi, Marlene, 2021. "Innovative ideas and gender inequality," CLEF Working Paper Series 35, Canadian Labour Economics Forum (CLEF), University of Waterloo.
    8. Önder, Ali Sina & Schweitzer, Sascha & Yilmazkuday, Hakan, 2021. "Specialization, field distance, and quality in economists’ collaborations," Journal of Informetrics, Elsevier, vol. 15(4).
    9. Lawson, Nicholas, 2023. "What citation tests really tell us about bias in academic publishing," European Economic Review, Elsevier, vol. 158(C).
    10. Bruns, Stephan B. & Doucouliagos, Anthony & Doucouliagos, Chris & König, Johannes & Stanley, T. D. & Zigova, Katarina, 2025. "The Delayed Acceptance of Female Research in Economics," IZA Discussion Papers 17649, Institute of Labor Economics (IZA).
    11. Ali Sina Önder & Sascha Schweitzer & Hakan Yilmazkuday, 2021. "Field Distance and Quality in Economists’ Collaborations," Working Papers in Economics & Finance 2021-04, University of Portsmouth, Portsmouth Business School, Economics and Finance Subject Group.
    12. Ann Mari May & Mary G. McGarvey & Yana Rodgers & Mark Killingsworth, 2021. "Critiques, Ethics, Prestige and Status: A Survey of Editors in Economics," Eastern Economic Journal, Palgrave Macmillan;Eastern Economic Association, vol. 47(2), pages 295-318, April.
    13. Peter Andre & Armin Falk, 2021. "What’s Worth Knowing? Economists’ Opinions about Economics," ECONtribute Discussion Papers Series 102, University of Bonn and University of Cologne, Germany.
    14. Jenny Bourne & Nathan Grawe & Nathan D. Grawe & Michael Hemesath & Maya Jensen, 2022. "Scholarly Activity among Economists at Liberal Arts Colleges: A Life Cycle Analysis," Working Papers 2022-01, Carleton College, Department of Economics.
    15. Püttmann, Vitus & Thomsen, Stephan L. & Trunzer, Johannes, 2020. "Zur Relevanz von Ausstattungsunterschieden für Forschungsleistungsvergleiche: Ein Diskussionsbeitrag für die Wirtschaftswissenschaften in Deutschland," Hannover Economic Papers (HEP) dp-679, Leibniz Universität Hannover, Wirtschaftswissenschaftliche Fakultät, revised Mar 2021.
    16. María Victoria Anauati & Sebastian Galiani & Ramiro H. Gálvez, 2020. "Differences In Citation Patterns Across Journal Tiers: The Case Of Economics," Economic Inquiry, Western Economic Association International, vol. 58(3), pages 1217-1232, July.
    17. Bethmann, Dirk & Bransch, Felix & Kvasnicka, Michael & Sadrieh, Abdolkarim, 2023. "Home Bias in Top Economics Journals," IZA Discussion Papers 15965, Institute of Labor Economics (IZA).
    18. Matthias Aistleitner & Stephan Puehringer, 2020. "Exploring the trade (policy) narratives in economic elite discourse," ICAE Working Papers 110, Johannes Kepler University, Institute for Comprehensive Analysis of the Economy.
    19. James J. Heckman & Sidharth Moktan, 2020. "Publishing and Promotion in Economics: The Tyranny of the Top Five," Journal of Economic Literature, American Economic Association, vol. 58(2), pages 419-470, June.
    20. Demeze-Jouatsa, Ghislain-Herman & Pongou, Roland & Tondji, Jean-Baptiste, 2021. "A Free and Fair Economy: A Game of Justice and Inclusion," Center for Mathematical Economics Working Papers 653, Center for Mathematical Economics, Bielefeld University.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2502.00070. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.