Printed from https://ideas.repec.org/p/arx/papers/2310.07132.html

Risk Aware Benchmarking of Large Language Models

Authors

  • Apoorva Nitsure
  • Youssef Mroueh
  • Mattia Rigotti
  • Kristjan Greenewald
  • Brian Belgodere
  • Mikhail Yurochkin
  • Jiri Navratil
  • Igor Melnyk
  • Jerret Ross

Abstract

We propose a distributional framework for benchmarking socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new statistical relative test based on first- and second-order stochastic dominance of real random variables. We show that the second-order statistics in this test are linked to mean-risk models commonly used in econometrics and mathematical finance to balance risk and utility when choosing between alternatives. Using this framework, we formally develop a risk-aware approach for foundation model selection given guardrails quantified by specified metrics. Inspired by portfolio optimization and selection theory in mathematical finance, we define a metrics portfolio for each model as a means to aggregate a collection of metrics, and perform model selection based on the stochastic dominance of these portfolios. The statistical significance of our tests is backed theoretically by an asymptotic analysis via central limit theorems, instantiated in practice via a bootstrap variance estimate. We use our framework to compare various large language models with respect to the risks of drifting from instructions and outputting toxic content.
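The abstract does not spell out the test statistic, so what follows is only a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It borrows the "violation ratio" from the almost-stochastic-dominance literature: the share of the squared gap between two quantile curves where model X falls below model Y. First order compares quantile functions, second order compares integrated quantile curves, and a bootstrap standard error turns the estimate into a one-sided test. All function names, the threshold `eps=0.25`, and the weighted-geometric-mean `metrics_portfolio` aggregation are hypothetical choices for illustration. Scores are assumed to be oriented so that higher is better (e.g. 1 minus a toxicity score).

```python
import numpy as np
from statistics import NormalDist


def violation_ratio(x, y, order=1, n_grid=1000):
    """Share of the squared gap between X's and Y's (integrated) quantile
    curves where X lies below Y. Near 0: X (almost) dominates Y; near 1: the
    reverse. Assumes higher scores are better (a labeled assumption)."""
    t = (np.arange(n_grid) + 0.5) / n_grid        # midpoint grid on (0, 1)
    qx = np.quantile(np.asarray(x, float), t)     # empirical quantile function of X
    qy = np.quantile(np.asarray(y, float), t)     # empirical quantile function of Y
    if order == 2:
        # Second order: compare integrated quantile curves, int_0^p Q(t) dt,
        # approximated by a Riemann sum on the grid.
        qx = np.cumsum(qx) / n_grid
        qy = np.cumsum(qy) / n_grid
    diff = qx - qy
    total = np.mean(diff ** 2)
    if total == 0.0:
        return 0.5  # identical curves: ratio undefined, treated as "no preference"
    return float(np.mean(np.minimum(diff, 0.0) ** 2) / total)


def bootstrap_se(x, y, order=1, n_boot=200, seed=0):
    """Bootstrap estimate of the standard error of the violation ratio."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    stats = [
        # Resample each model's scores with replacement and recompute the ratio.
        violation_ratio(rng.choice(x, x.size), rng.choice(y, y.size), order)
        for _ in range(n_boot)
    ]
    return float(np.std(stats, ddof=1))


def almost_dominates(x, y, order=1, eps=0.25, alpha=0.05):
    """One-sided test of H0: ratio >= eps. Returns True (reject H0, i.e. X
    almost dominates Y) when the upper confidence bound is below eps."""
    ratio = violation_ratio(x, y, order)
    z = NormalDist().inv_cdf(1.0 - alpha)
    return ratio + z * bootstrap_se(x, y, order) < eps


def metrics_portfolio(scores, weights):
    """Hypothetical aggregation: weighted geometric mean of per-sample metric
    scores in (0, 1], computed in log space. scores: (n_samples, n_metrics)."""
    scores = np.clip(np.asarray(scores, float), 1e-12, 1.0)
    w = np.asarray(weights, float) / np.sum(weights)
    return np.exp(np.log(scores) @ w)


if __name__ == "__main__":
    safe = np.full(1000, 0.5)            # constant score: no spread
    risky = np.linspace(0.0, 1.0, 1001)  # same mean, much more spread
    print(violation_ratio(safe, risky, order=1))
    print(violation_ratio(safe, risky, order=2))
```

In the demo, a constant score of 0.5 compared with scores spread evenly over [0, 1] gives a first-order ratio of about 0.5, so neither model wins at first order. The second-order ratio is about 0, reflecting the preference for lower risk at equal mean that mean-risk models encode. To compare models on several metrics at once, `metrics_portfolio` would be applied to each model's per-sample metric matrix before calling `almost_dominates`.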

Suggested Citation

  • Apoorva Nitsure & Youssef Mroueh & Mattia Rigotti & Kristjan Greenewald & Brian Belgodere & Mikhail Yurochkin & Jiri Navratil & Igor Melnyk & Jerret Ross, 2023. "Risk Aware Benchmarking of Large Language Models," Papers 2310.07132, arXiv.org, revised Jun 2024.
  • Handle: RePEc:arx:papers:2310.07132

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2310.07132
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Larry Y. Tzeng & Rachel J. Huang & Pai-Ta Shih, 2013. "Revisiting Almost Second-Degree Stochastic Dominance," Management Science, INFORMS, vol. 59(5), pages 1250-1254, May.
    2. Moshe Leshno & Haim Levy, 2002. "Preferred by "All" and Preferred by "Most" Decision Makers: Almost Stochastic Dominance," Management Science, INFORMS, vol. 48(8), pages 1074-1085, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tommaso Lando & Lucio Bertoli-Barsotti, 2019. "Distorted stochastic dominance: a generalized family of stochastic orders," Papers 1909.04767, arXiv.org.
    2. Bi, Hongwei & Huang, Rachel J. & Tzeng, Larry Y. & Zhu, Wei, 2019. "Higher-order Omega: A performance index with a decision-theoretic foundation," Journal of Banking & Finance, Elsevier, vol. 100(C), pages 43-57.
    3. Guo, Xu & Wong, Wing-Keung & Zhu, Lixing, 2013. "Almost Stochastic Dominance for Risk-Averse and Risk-Seeking Investors," MPRA Paper 51744, University Library of Munich, Germany.
    4. Chia-Lin Chang & Michael McAleer & Wing-Keung Wong, 2018. "Big Data, Computational Science, Economics, Finance, Marketing, Management, and Psychology: Connections," JRFM, MDPI, vol. 11(1), pages 1-29, March.
    5. Guo, Xu & Wong, Wing-Keung & Zhu, Lixing, 2016. "Almost stochastic dominance for risk averters and risk seeker," Finance Research Letters, Elsevier, vol. 19(C), pages 15-21.
    6. Guo, Xu & Wong, Wing-Keung & Zhu, Lixing, 2013. "Make Almost Stochastic Dominance really Almost," MPRA Paper 49745, University Library of Munich, Germany.
    7. Michel Denuit & Rachel Huang & Larry Tzeng, 2014. "Bivariate almost stochastic dominance," Economic Theory, Springer; Society for the Advancement of Economic Theory (SAET), vol. 57(2), pages 377-405, October.
    8. Lando, Tommaso & Bertoli-Barsotti, Lucio, 2020. "Distorted stochastic dominance: A generalized family of stochastic orders," Journal of Mathematical Economics, Elsevier, vol. 90(C), pages 132-139.
    9. Guo, Xu & Post, Thierry & Wong, Wing-Keung & Zhu, Lixing, 2014. "Moment conditions for Almost Stochastic Dominance," Economics Letters, Elsevier, vol. 124(2), pages 163-167.
    10. Michel Denuit & Rachel Huang & Larry Tzeng, 2015. "Almost expectation and excess dependence notions," Theory and Decision, Springer, vol. 79(3), pages 375-401, November.
    11. Guo, Xu & Wong, Wing-Keung & Zhu, Lixing, 2013. "Almost Stochastic Dominance and Moments," MPRA Paper 49274, University Library of Munich, Germany.
    12. Simon Dietz & Anca N. Matei, 2013. "Is there space for agreement on climate change? A non-parametric approach to policy evaluation," GRI Working Papers 136, Grantham Research Institute on Climate Change and the Environment.
    13. Simon Dietz & Anca N. Matei, 2013. "Spaces for agreement: a theory of Time-Stochastic Dominance," GRI Working Papers 137, Grantham Research Institute on Climate Change and the Environment.
    14. Caporin, Massimiliano & Costola, Michele & Jannin, Gregory & Maillet, Bertrand, 2018. "On the (Ab)use of Omega?," Journal of Empirical Finance, Elsevier, vol. 46(C), pages 11-33.
    15. Bruni, Renato & Cesarone, Francesco & Scozzari, Andrea & Tardella, Fabio, 2017. "On exact and approximate stochastic dominance strategies for portfolio selection," European Journal of Operational Research, Elsevier, vol. 259(1), pages 322-329.
    16. Denuit, Michel M. & Huang, Rachel J. & Tzeng, Larry Y. & Wang, Christine W., 2014. "Almost marginal conditional stochastic dominance," Journal of Banking & Finance, Elsevier, vol. 41(C), pages 57-66.
    17. Chia-Lin Chang & Michael McAleer & Wing-Keung Wong, 2016. "Management Science, Economics and Finance: A Connection," Tinbergen Institute Discussion Papers 16-040/III, Tinbergen Institute.
    18. Simon Dietz & Nicoleta Anca Matei, 2016. "Spaces for Agreement: A Theory of Time-Stochastic Dominance and an Application to Climate Change," Journal of the Association of Environmental and Resource Economists, University of Chicago Press, vol. 3(1), pages 85-130.
    19. Guo, Xu & Zhu, Xuehu & Wong, Wing-Keung & Zhu, Lixing, 2013. "A note on almost stochastic dominance," Economics Letters, Elsevier, vol. 121(2), pages 252-256.
    20. Chang, C-L. & McAleer, M.J. & Wong, W.-K., 2018. "Management Information, Decision Sciences, and Financial Economics: a connection," Econometric Institute Research Papers 2018-004/III, Erasmus University Rotterdam, Erasmus School of Economics (ESE), Econometric Institute.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.