Printed from https://ideas.repec.org/p/arx/papers/2409.18417.html

VickreyFeedback: Cost-efficient Data Construction for Reinforcement Learning from Human Feedback

Author

Listed:
  • Guoxi Zhang
  • Jiuding Duan

Abstract

This paper addresses the cost-efficiency aspect of Reinforcement Learning from Human Feedback (RLHF). RLHF leverages datasets of human preferences over outputs of large language models (LLMs) to instill human expectations into LLMs. While preference annotation comes with a monetized cost, the economic utility of a preference dataset has not been considered so far. What exacerbates this situation is that, given complex intransitive or cyclic relationships in preference datasets, existing algorithms for fine-tuning LLMs are still far from capturing comprehensive preferences. This raises severe cost-efficiency concerns in production environments, where preference data accumulate over time. In this paper, we see the fine-tuning of LLMs as a monetized economy and introduce an auction mechanism to improve the efficiency of preference data collection in dollar terms. We show that introducing an auction mechanism can play an essential role in enhancing the cost-efficiency of RLHF while maintaining satisfactory model performance. Experimental results demonstrate that our proposed auction-based protocol is cost-efficient for fine-tuning LLMs by concentrating on high-quality feedback.
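
The abstract does not spell out the auction protocol. As background only, here is a minimal sketch of the textbook sealed-bid second-price (Vickrey) auction that the paper's title alludes to. It is not the authors' mechanism: the function name `second_price_auction`, the bidder names, and the bid values are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, Tuple


def second_price_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Return (winner, price) for a sealed-bid second-price auction.

    Textbook rule only: the highest bidder wins and pays the
    second-highest bid. This is an illustrative assumption, not the
    protocol proposed in the paper.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bidders")
    # Rank bidders by bid, highest first; ties are broken by name so the
    # outcome is deterministic.
    ranked = sorted(bids.items(), key=lambda item: (-item[1], item[0]))
    winner = ranked[0][0]
    price = ranked[1][1]  # the winner pays the runner-up's bid
    return winner, price


# Hypothetical sealed bids in dollars.
example_bids = {"bidder_a": 0.30, "bidder_b": 0.45, "bidder_c": 0.20}
winner, price = second_price_auction(example_bids)
print(winner, price)
```

Because the price paid does not depend on the winner's own bid, bidding one's true valuation is a dominant strategy in this textbook setting, which is the property usually cited as the motivation for Vickrey-style mechanisms.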

Suggested Citation

  • Guoxi Zhang & Jiuding Duan, 2024. "VickreyFeedback: Cost-efficient Data Construction for Reinforcement Learning from Human Feedback," Papers 2409.18417, arXiv.org.
  • Handle: RePEc:arx:papers:2409.18417

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2409.18417
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. William Vickrey, 1961. "Counterspeculation, Auctions, and Competitive Sealed Tenders," Journal of Finance, American Finance Association, vol. 16(1), pages 8-37, March.
    2. Matsushima, Hitoshi & Noda, Shunya, 2023. "Mechanism design with general ex-ante investments," Journal of Mathematical Economics, Elsevier, vol. 106(C).
    3. Rachel R. Chen & Robin O. Roundy & Rachel Q. Zhang & Ganesh Janakiraman, 2005. "Efficient Auction Mechanisms for Supply Chain Procurement," Management Science, INFORMS, vol. 51(3), pages 467-482, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Axel Ockenfels & David Reiley & Abdolkarim Sadrieh, 2006. "Online Auctions," NBER Working Papers 12785, National Bureau of Economic Research, Inc.
    2. G. Anandalingam & Robert W. Day & S. Raghavan, 2005. "The Landscape of Electronic Market Design," Management Science, INFORMS, vol. 51(3), pages 316-327, March.
    3. Ağralı, Semra & Tan, Barış & Karaesmen, Fikri, 2008. "Modeling and analysis of an auction-based logistics market," European Journal of Operational Research, Elsevier, vol. 191(1), pages 272-294, November.
    4. Rica Gonen & Erel Segal-Halevi, 2019. "Strongly Budget Balanced Auctions for Multi-Sided Markets," Papers 1911.08094, arXiv.org, revised Dec 2019.
    5. Bin Li & Dong Hao & Dengji Zhao & Tao Zhou, 2018. "Customer Sharing in Economic Networks with Costs," Papers 1807.06822, arXiv.org.
    6. Leon Yang Chu & Zuo-Jun Max Shen, 2008. "Truthful Double Auction Mechanisms," Operations Research, INFORMS, vol. 56(1), pages 102-120, February.
    7. Ginger Zhe Jin & Andrew Kato & John A. List, 2010. "That’s News to Me! Information Revelation in Professional Certification Markets," Economic Inquiry, Western Economic Association International, vol. 48(1), pages 104-122, January.
    8. Hongpeng Guo & Zhihao Lv & Junyi Hua & Hongxu Yuan & Qingyu Yu, 2021. "Design of Combined Auction Model for Emission Rights of International Forestry Carbon Sequestration and Other Pollutants Based on SMRA," Sustainability, MDPI, vol. 13(20), pages 1-18, October.
    9. Güth, W., 1997. "Boundedly Rational Decision Emergence -A General Perspective and Some Selective Illustrations-," SFB 373 Discussion Papers 1997,29, Humboldt University of Berlin, Interdisciplinary Research Project 373: Quantification and Simulation of Economic Processes.
    10. Lau, Stephanie, 2011. "Investment incentives in bilateral trading," Games and Economic Behavior, Elsevier, vol. 73(2), pages 538-552.
    11. Paul Pezanis-Christou & Abdolkarim Sadrieh, 2003. "Elicited bid functions in (a)symmetric first-price auctions," Working Papers 85, Barcelona School of Economics.
    12. Ewerhart, Christian & Cassola, Nuno & Valla, Natacha, 2012. "Overbidding in fixed rate tenders: The role of exposure risk," Journal of Banking & Finance, Elsevier, vol. 36(2), pages 539-549.
    13. Scott Fay & Robert Zeithammer, 2017. "Bidding for Bidders? How the Format for Soliciting Supplier Participation in NYOP Auctions Impacts Channel Profit," Management Science, INFORMS, vol. 63(12), pages 4324-4344, December.
    14. Franziska Voelckner, 2006. "An empirical comparison of methods for measuring consumers’ willingness to pay," Marketing Letters, Springer, vol. 17(2), pages 137-149, April.
    15. Tafreshian, Amirmahdi & Masoud, Neda, 2022. "A truthful subsidy scheme for a peer-to-peer ridesharing market with incomplete information," Transportation Research Part B: Methodological, Elsevier, vol. 162(C), pages 130-161.
    16. Bogetoft, Peter & Nielsen, Kurt, 2003. "Yardstick Based Procurement Design In Natural Resource Management," 2003 Annual Meeting, August 16-22, 2003, Durban, South Africa 25910, International Association of Agricultural Economists.
    17. Shunda, Nicholas, 2009. "Auctions with a buy price: The case of reference-dependent preferences," Games and Economic Behavior, Elsevier, vol. 67(2), pages 645-664, November.
    18. Shrestha, Ratna K., 2017. "Menus of price-quantity contracts for inducing the truth in environmental regulation," Journal of Environmental Economics and Management, Elsevier, vol. 83(C), pages 1-7.
    19. Palma, Marco A. & Ness, Meghan L. & Anderson, David P., 2015. "Buying More than Taste? A Latent Class Analysis of Health and Prestige Determinants of Healthy Food," 2015 Conference (59th), February 10-13, 2015, Rotorua, New Zealand 202566, Australian Agricultural and Resource Economics Society.
    20. Scott Duke Kominers & Alexander Teytelboym & Vincent P Crawford, 2017. "An invitation to market design," Oxford Review of Economic Policy, Oxford University Press and Oxford Review of Economic Policy Limited, vol. 33(4), pages 541-571.
