
Does AI help humans make better decisions? A statistical evaluation framework for experimental and observational studies

Author

Listed:
  • Eli Ben-Michael
  • D. James Greiner
  • Melody Huang
  • Kosuke Imai
  • Zhichao Jiang
  • Sooahn Shin

Abstract

The use of Artificial Intelligence (AI), or more generally data-driven algorithms, has become ubiquitous in today's society. Yet, in many cases, and especially when stakes are high, humans still make the final decisions. The critical question, therefore, is whether AI helps humans make better decisions than a human-alone or AI-alone system. We introduce a new methodological framework to answer this question empirically with a minimal set of assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded and unconfounded treatment assignment, where the provision of AI-generated recommendations is assumed to be randomized across cases, with humans making the final decisions. Under this study design, we show how to compare the performance of three alternative decision-making systems: human-alone, human-with-AI, and AI-alone. Importantly, the AI-alone system covers any individualized treatment assignment rule, including ones not used in the original study. We also show when AI recommendations should be provided to a human decision maker, and when the human should follow such recommendations. We apply the proposed methodology to our own randomized controlled trial evaluating a pretrial risk assessment instrument. We find that the risk assessment recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Furthermore, we find that replacing a human judge with algorithms (the risk assessment score and a large language model in particular) leads to worse classification performance.
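The comparison the abstract describes can be illustrated with a minimal sketch. The sketch below computes standard classification metrics for each arm of a randomized trial in which the provision of AI recommendations is randomized, and contrasts the human-with-AI arm against the human-alone arm. It is a simplification, not the paper's method: it assumes the correct decision (the baseline potential outcome) is fully observed for every case, whereas the paper's estimands account for the fact that this outcome is only partially observed. All function and variable names here are hypothetical.

```python
import numpy as np

def arm_metrics(decision, outcome):
    """Classification metrics for one arm, treating `outcome` as ground
    truth (1 = the restrictive decision was warranted)."""
    decision, outcome = np.asarray(decision), np.asarray(outcome)
    return {
        "accuracy": np.mean(decision == outcome),
        # false positive rate: restrictive decision when it was not warranted
        "fpr": np.mean(decision[outcome == 0] == 1),
        # false negative rate: lenient decision when restriction was warranted
        "fnr": np.mean(decision[outcome == 1] == 0),
    }

def compare_arms(z, decision, outcome):
    """Difference in accuracy between the human-with-AI arm (z == 1) and
    the human-alone arm (z == 0), with a normal-approximation 95% CI."""
    z = np.asarray(z)
    decision, outcome = np.asarray(decision), np.asarray(outcome)
    m1 = arm_metrics(decision[z == 1], outcome[z == 1])
    m0 = arm_metrics(decision[z == 0], outcome[z == 0])
    diff = m1["accuracy"] - m0["accuracy"]
    n1, n0 = (z == 1).sum(), (z == 0).sum()
    se = np.sqrt(m1["accuracy"] * (1 - m1["accuracy"]) / n1
                 + m0["accuracy"] * (1 - m0["accuracy"]) / n0)
    return diff, (diff - 1.96 * se, diff + 1.96 * se)
```

An AI-alone system can be evaluated with the same `arm_metrics` function by substituting the algorithm's recommended decisions for the human decisions, which is what makes the three-way comparison in the abstract possible under one set of metrics.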

Suggested Citation

  • Eli Ben-Michael & D. James Greiner & Melody Huang & Kosuke Imai & Zhichao Jiang & Sooahn Shin, 2024. "Does AI help humans make better decisions? A statistical evaluation framework for experimental and observational studies," Papers 2403.12108, arXiv.org, revised Oct 2024.
  • Handle: RePEc:arx:papers:2403.12108

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2403.12108
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Joshua Grossman & Julian Nyarko & Sharad Goel, 2023. "Racial bias as a multi‐stage, multi‐actor problem: An analysis of pretrial detention," Journal of Empirical Legal Studies, John Wiley & Sons, vol. 20(1), pages 86-133, March.
    2. Ivan A Canay & Magne Mogstad & Jack Mount, 2024. "On the Use of Outcome Tests for Detecting Bias in Decision Making," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 91(4), pages 2135-2167.
    3. Bharti, Nitin Kumar & Roy, Sutanuka, 2023. "The early origins of judicial stringency in bail decisions: Evidence from early childhood exposure to Hindu-Muslim riots in India," Journal of Public Economics, Elsevier, vol. 221(C).
    4. Danielle Li & Lindsey R. Raymond & Peter Bergman, 2020. "Hiring as Exploration," NBER Working Papers 27736, National Bureau of Economic Research, Inc.
    5. Jens Ludwig & Sendhil Mullainathan, 2021. "Fragile Algorithms and Fallible Decision-Makers: Lessons from the Justice System," Journal of Economic Perspectives, American Economic Association, vol. 35(4), pages 71-96, Fall.
    6. Isil Erel & Léa H Stern & Chenhao Tan & Michael S Weisbach, 2021. "Selecting Directors Using Machine Learning," NBER Chapters, in: Big Data: Long-Term Implications for Financial Markets and Firms, pages 3226-3264, National Bureau of Economic Research, Inc.
7. Ginther, Donna K. & Heggeness, Misty L., 2020. "Administrative discretion in scientific funding: Evidence from a prestigious postdoctoral training program," Research Policy, Elsevier, vol. 49(4).
    8. Nicolás Grau & Damián Vergara, "undated". "A Simple Test for Prejudice in Decision Processes: The Prediction-Based Outcome Test," Working Papers wp493, University of Chile, Department of Economics.
    9. Dario Sansone & Anna Zhu, 2023. "Using Machine Learning to Create an Early Warning System for Welfare Recipients," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 85(5), pages 959-992, October.
    10. Xiaochen Hu & Xudong Zhang & Nicholas Lovrich, 2021. "Public perceptions of police behavior during traffic stops: logistic regression and machine learning approaches compared," Journal of Computational Social Science, Springer, vol. 4(1), pages 355-380, May.
    11. Shroff, Ravi & Vamvourellis, Konstantinos, 2022. "Pretrial release judgments and decision fatigue," LSE Research Online Documents on Economics 117579, London School of Economics and Political Science, LSE Library.
    12. Chugunova, Marina & Sele, Daniela, 2022. "We and It: An interdisciplinary review of the experimental evidence on how humans interact with machines," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 99(C).
    13. Stevenson, Megan T. & Doleac, Jennifer, 2019. "Algorithmic Risk Assessment in the Hands of Humans," IZA Discussion Papers 12853, Institute of Labor Economics (IZA).
    14. Hyunjin Kim & Edward L. Glaeser & Andrew Hillis & Scott Duke Kominers & Michael Luca, 2024. "Decision authority and the returns to algorithms," Strategic Management Journal, Wiley Blackwell, vol. 45(4), pages 619-648, April.
    15. David Almog & Romain Gauriot & Lionel Page & Daniel Martin, 2024. "AI Oversight and Human Mistakes: Evidence from Centre Court," Papers 2401.16754, arXiv.org, revised Feb 2024.
    16. Richard Berk, 2019. "Accuracy and Fairness for Juvenile Justice Risk Assessments," Journal of Empirical Legal Studies, John Wiley & Sons, vol. 16(1), pages 175-194, March.
    17. Bauer, Kevin & Gill, Andrej, 2021. "Mirror, mirror on the wall: Machine predictions and self-fulfilling prophecies," SAFE Working Paper Series 313, Leibniz Institute for Financial Research SAFE.
    18. Elliott Ash & Claudia Marangon, 2024. "Judging disparities: Recidivism risk, image motives and in-group bias on Wisconsin criminal courts," Discussion Papers 2024-03, Nottingham Interdisciplinary Centre for Economic and Political Research (NICEP).
    19. Runshan Fu & Ginger Zhe Jin & Meng Liu, 2022. "Does Human-algorithm Feedback Loop Lead to Error Propagation? Evidence from Zillow’s Zestimate," NBER Working Papers 29880, National Bureau of Economic Research, Inc.
    20. Fumagalli, Elena & Rezaei, Sarah & Salomons, Anna, 2022. "OK computer: Worker perceptions of algorithmic recruitment," Research Policy, Elsevier, vol. 51(2).

