
Jackpot! Alignment as a Maximal Lottery

Authors

  • Roberto-Rafael Maura-Rivero
  • Marc Lanctot
  • Francesco Visin
  • Kate Larson

Abstract

Reinforcement Learning from Human Feedback (RLHF), the standard for aligning Large Language Models (LLMs) with human values, is known to fail to satisfy properties that are intuitively desirable, such as respecting the preferences of the majority (Ge et al., 2024). To overcome these issues, we propose the use of a probabilistic Social Choice rule called maximal lotteries as a replacement for RLHF. We show that a family of alignment techniques, namely Nash Learning from Human Feedback (NLHF) (Munos et al., 2023) and variants, approximate maximal lottery outcomes and thus inherit the rule's beneficial properties. We confirm experimentally that our proposed methodology handles situations that arise when working with preferences more robustly than standard RLHF, including supporting the preferences of the majority, providing principled ways of handling non-transitivities in the preference data, and robustness to irrelevant alternatives. This results in systems that better incorporate human values and respect human intentions.
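
To make the notion of a maximal lottery concrete, here is a minimal Python sketch. It is not the paper's method or code. The voter profiles, the function names, and the use of multiplicative-weights self-play as a stand-in for a game-theoretic solver are all illustrative assumptions. A lottery p over alternatives is maximal when no alternative has a positive expected pairwise majority margin against it; equivalently, p is an equilibrium strategy of the symmetric zero-sum game defined by the margin matrix.

```python
import math
from fractions import Fraction

def margin_matrix(profile, alternatives):
    """M[i][j] = (# voters ranking i above j) - (# voters ranking j above i)."""
    idx = {a: k for k, a in enumerate(alternatives)}
    n = len(alternatives)
    M = [[0] * n for _ in range(n)]
    for ranking, count in profile:
        for pos, a in enumerate(ranking):
            for b in ranking[pos + 1:]:
                M[idx[a]][idx[b]] += count
                M[idx[b]][idx[a]] -= count
    return M

def exploitability(p, M):
    """Largest expected margin a single alternative achieves against lottery p.
    It is never negative, and p is a maximal lottery exactly when it equals 0."""
    n = len(p)
    return max(sum(M[i][j] * p[j] for j in range(n)) for i in range(n))

def approx_maximal_lottery(M, steps=20000, eta=0.02):
    """Illustrative solver (an assumption, not the paper's algorithm):
    multiplicative-weights self-play, returning the time-averaged lottery."""
    n = len(M)
    scale = max(abs(x) for row in M for x in row) or 1
    p = [1.0 / n] * n
    avg = [0.0] * n
    for _ in range(steps):
        avg = [a + q / steps for a, q in zip(avg, p)]
        gains = [sum(M[i][j] * p[j] for j in range(n)) / scale for i in range(n)]
        w = [p[i] * math.exp(eta * gains[i]) for i in range(n)]
        total = sum(w)
        p = [x / total for x in w]
    return avg

alts = ["a", "b", "c"]

# Hypothetical non-transitive profile: a beats b, b beats c, c beats a.
cycle = [(("a", "b", "c"), 2), (("b", "c", "a"), 4), (("c", "a", "b"), 3)]
M_cycle = margin_matrix(cycle, alts)
exact = [Fraction(1, 3), Fraction(5, 9), Fraction(1, 9)]
approx = approx_maximal_lottery(M_cycle)
print(exploitability(exact, M_cycle))   # exact arithmetic check of the lottery
print([round(x, 3) for x in approx], exploitability(approx, M_cycle))

# Hypothetical majority profile: 5 of 9 voters rank a first.
majority = [(("a", "b", "c"), 5), (("b", "c", "a"), 4)]
M_major = margin_matrix(majority, alts)
print(exploitability([1, 0, 0], M_major))  # all mass on the majority winner
```

In the cyclic profile no alternative beats both others, so the maximal lottery randomizes; here (1/3, 5/9, 1/9) has zero exploitability, and the self-play average should land near it. In the majority profile the lottery that puts all probability on a is maximal, which illustrates the majority-respecting behavior the abstract refers to.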

Suggested Citation

  • Roberto-Rafael Maura-Rivero & Marc Lanctot & Francesco Visin & Kate Larson, 2025. "Jackpot! Alignment as a Maximal Lottery," Papers 2501.19266, arXiv.org.
  • Handle: RePEc:arx:papers:2501.19266

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2501.19266
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Ray, Paramesh, 1973. "Independence of Irrelevant Alternatives," Econometrica, Econometric Society, vol. 41(5), pages 987-991, September.
    2. Florian Brandl & Felix Brandt, 2020. "Arrovian Aggregation of Convex Preferences," Econometrica, Econometric Society, vol. 88(2), pages 799-844, March.
    3. Gibbard, Allan, 1977. "Manipulation of Schemes That Mix Voting with Chance," Econometrica, Econometric Society, vol. 45(3), pages 665-681, April.
    4. Florian Brandl & Felix Brandt & Hans Georg Seedig, 2016. "Consistent Probabilistic Social Choice," Econometrica, Econometric Society, vol. 84, pages 1839-1880, September.
    5. Gibbard, Allan, 1973. "Manipulation of Voting Schemes: A General Result," Econometrica, Econometric Society, vol. 41(4), pages 587-601, July.
    6. Brandl, Florian & Brandt, Felix & Hofbauer, Johannes, 2019. "Welfare maximization entices participation," Games and Economic Behavior, Elsevier, vol. 114(C), pages 308-314.
    7. P. C. Fishburn, 1984. "Probabilistic Social Choice Based on Simple Voting Comparisons," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 51(4), pages 683-692.
    8. Brandl, Florian & Brandt, Felix, 2024. "A natural adaptive process for collective decision-making," Theoretical Economics, Econometric Society, vol. 19(2), May.
    9. Florian Brandl & Felix Brandt, 2021. "A Natural Adaptive Process for Collective Decision-Making," Papers 2103.14351, arXiv.org, revised Mar 2024.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Brandt, Felix & Lederer, Patrick & Suksompong, Warut, 2023. "Incentives in social decision schemes with pairwise comparison preferences," Games and Economic Behavior, Elsevier, vol. 142(C), pages 266-291.
    2. Brandl, Florian & Brandt, Felix, 2024. "A natural adaptive process for collective decision-making," Theoretical Economics, Econometric Society, vol. 19(2), May.
    3. Florian Brandl & Felix Brandt, 2021. "A Natural Adaptive Process for Collective Decision-Making," Papers 2103.14351, arXiv.org, revised Mar 2024.
    4. Aziz, Haris & Brandl, Florian & Brandt, Felix & Brill, Markus, 2018. "On the tradeoff between efficiency and strategyproofness," Games and Economic Behavior, Elsevier, vol. 110(C), pages 1-18.
    5. Florian Brandl & Felix Brandt & Christian Stricker, 2022. "An analytical and experimental comparison of maximal lottery schemes," Social Choice and Welfare, Springer;The Society for Social Choice and Welfare, vol. 58(1), pages 5-38, January.
    6. Brandt, Felix & Saile, Christian & Stricker, Christian, 2022. "Strategyproof social choice when preferences and outcomes may contain ties," Journal of Economic Theory, Elsevier, vol. 202(C).
    7. Chatterji, Shurojit & Zeng, Huaxia, 2018. "On random social choice functions with the tops-only property," Games and Economic Behavior, Elsevier, vol. 109(C), pages 413-435.
    8. Pycia, Marek & Ünver, M. Utku, 2015. "Decomposing random mechanisms," Journal of Mathematical Economics, Elsevier, vol. 61(C), pages 21-33.
    9. Chatterji, Shurojit & Roy, Souvik & Sadhukhan, Soumyarup & Sen, Arunava & Zeng, Huaxia, 2022. "Probabilistic fixed ballot rules and hybrid domains," Journal of Mathematical Economics, Elsevier, vol. 100(C).
    10. Tom Demeulemeester & Dries Goossens & Ben Hermans & Roel Leus, 2023. "Fair integer programming under dichotomous and cardinal preferences," Papers 2306.13383, arXiv.org, revised Apr 2024.
    11. Lê Nguyên Hoang, 2017. "Strategy-proofness of the randomized Condorcet voting system," Social Choice and Welfare, Springer;The Society for Social Choice and Welfare, vol. 48(3), pages 679-701, March.
    12. Federico Echenique & Joseph Root & Fedor Sandomirskiy, 2022. "Efficiency in Random Resource Allocation and Social Choice," Papers 2203.06353, arXiv.org, revised Aug 2022.
    13. Demeulemeester, Tom & Goossens, Dries & Hermans, Ben & Leus, Roel, 2025. "Fair integer programming under dichotomous and cardinal preferences," European Journal of Operational Research, Elsevier, vol. 320(3), pages 465-478.
    14. Peter Fishburn & Steven Brams, 1984. "Manipulability of voting by sincere truncation of preferences," Public Choice, Springer, vol. 44(3), pages 397-410, January.
    15. Felix Brandt & Patrick Lederer & René Romen, 2024. "Relaxed notions of Condorcet-consistency and efficiency for strategyproof social decision schemes," Social Choice and Welfare, Springer;The Society for Social Choice and Welfare, vol. 63(1), pages 19-55, August.
    16. Souvik Roy & Soumyarup Sadhukhan, 2019. "A characterization of random min–max domains and its applications," Economic Theory, Springer;Society for the Advancement of Economic Theory (SAET), vol. 68(4), pages 887-906, November.
    17. Picot, Jérémy & Sen, Arunava, 2012. "An extreme point characterization of random strategy-proof social choice functions: The two alternative case," Economics Letters, Elsevier, vol. 115(1), pages 49-52.
    18. McLennan, Andrew, 2011. "Manipulation in elections with uncertain preferences," Journal of Mathematical Economics, Elsevier, vol. 47(3), pages 370-375.
    19. Felix Brandt & Patrick Lederer, 2024. "Weak Strategyproofness in Randomized Social Choice," Papers 2412.11977, arXiv.org.
    20. Felix Brandt & Patrick Lederer & Warut Suksompong, 2022. "Incentives in Social Decision Schemes with Pairwise Comparison Preferences," Papers 2204.12436, arXiv.org, revised Aug 2024.
