
The AI community building the future? A quantitative analysis of development activity on Hugging Face Hub

Authors

  • Cailean Osborne (University of Oxford)
  • Jennifer Ding (The Alan Turing Institute)
  • Hannah Rose Kirk (University of Oxford)

Abstract

Open model developers have emerged as key actors in the political economy of artificial intelligence (AI), but we still have a limited understanding of collaborative practices in the open AI ecosystem. This paper responds to this gap with a three-part quantitative analysis of development activity on the Hugging Face (HF) Hub, a popular platform for building, sharing, and demonstrating models. First, various types of activity across 348,181 model, 65,761 dataset, and 156,642 space repositories exhibit right-skewed distributions. Activity is extremely imbalanced between repositories; for example, over 70% of models have 0 downloads, while 1% account for 99% of downloads. Furthermore, licenses matter: there are statistically significant differences in collaboration patterns in model repositories with permissive, restrictive, and no licenses. Second, we analyse a snapshot of the social network structure of collaboration in model repositories, finding that the community has a core-periphery structure, with a core of prolific developers and a majority of isolate developers (89%). Upon removing these isolates from the network, collaboration is characterised by high reciprocity regardless of developers’ network positions. Third, we examine model adoption through the lens of model usage in spaces, finding that a minority of models, developed by a handful of companies, are widely used on the HF Hub. Overall, the findings show that various types of activity across the HF Hub are characterised by Pareto distributions, congruent with open source software development patterns on platforms like GitHub. We conclude with recommendations for researchers and practitioners to advance our understanding of open AI development.
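To make the concentration figures quoted in the abstract concrete (the share of models with zero downloads, and the share of all downloads held by the top 1% of models), here is a minimal Python sketch. It assumes download counts are available as a plain list of non-negative integers, one per repository; the function name `download_concentration` and the toy counts are hypothetical illustrations, not the authors' code, method, or data.

```python
import math


def download_concentration(downloads, top_fraction=0.01):
    """Hypothetical helper: summarise how unevenly downloads are spread.

    Returns a pair (zero_share, top_share):
      zero_share: fraction of repositories with zero downloads
      top_share:  fraction of all downloads held by the top `top_fraction`
                  of repositories, ranked by download count
    """
    n = len(downloads)
    if n == 0:
        raise ValueError("downloads must be non-empty")
    zero_share = sum(1 for d in downloads if d == 0) / n
    # Round the cutoff up so at least one repository is always in the top group.
    k = max(1, math.ceil(n * top_fraction))
    top = sorted(downloads, reverse=True)[:k]
    total = sum(downloads)
    top_share = sum(top) / total if total > 0 else 0.0
    return zero_share, top_share


if __name__ == "__main__":
    # Toy data (not from the paper): 100 repositories, 70 never downloaded.
    toy = [0] * 70 + [1] * 29 + [10_000]
    zero_share, top_share = download_concentration(toy)
    print(f"zero downloads: {zero_share:.0%}, top 1% share: {top_share:.1%}")
```

Rounding the top-1% cutoff up with `math.ceil` guarantees that at least one repository is counted even in small samples; a different cutoff convention would shift the result slightly.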

Suggested Citation

  • Cailean Osborne & Jennifer Ding & Hannah Rose Kirk, 2024. "The AI community building the future? A quantitative analysis of development activity on Hugging Face Hub," Journal of Computational Social Science, Springer, vol. 7(2), pages 2067-2105, October.
  • Handle: RePEc:spr:jcsosc:v:7:y:2024:i:2:d:10.1007_s42001-024-00300-8
    DOI: 10.1007/s42001-024-00300-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s42001-024-00300-8
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s42001-024-00300-8?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.

    Because access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Fabian Braesemann & Niklas Stoehr & Mark Graham, 2019. "Global networks in collaborative programming," Regional Studies, Regional Science, Taylor & Francis Journals, vol. 6(1), pages 371-373, January.
    2. Josh Lerner & Jean Tirole, 2002. "Some Simple Economics of Open Source," Journal of Industrial Economics, Wiley Blackwell, vol. 50(2), pages 197-234, June.
    3. Yuan Long & Keng Siau, 2007. "Social Network Structures in Open Source Software Development Teams," Journal of Database Management (JDM), IGI Global, vol. 18(2), pages 25-40, April.
    4. Smilkov, Daniel & Kocarev, Ljupco, 2010. "Rich-club and page-club coefficients for directed graphs," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 389(11), pages 2290-2299.
    5. Dahlander, Linus & Wallin, Martin W., 2006. "A man on the inside: Unlocking communities as complementary assets," Research Policy, Elsevier, vol. 35(8), pages 1243-1259, October.
    6. Sonali K. Shah, 2006. "Motivation, Governance, and the Viability of Hybrid Forms in Open Source Software Development," Management Science, INFORMS, vol. 52(7), pages 1000-1014, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Matt Germonprez & Julie E. Kendall & Kenneth E. Kendall & Lars Mathiassen & Brett Young & Brian Warner, 2017. "A Theory of Responsive Design: A Field Study of Corporate Engagement with Open Source Communities," Information Systems Research, INFORMS, vol. 28(1), pages 64-83, March.
    2. David M. Waguespack & Lee Fleming, 2009. "Scanning the Commons? Evidence on the Benefits to Startups Participating in Open Standards Development," Management Science, INFORMS, vol. 55(2), pages 210-223, February.
    3. Smirnova, Inna & Reitzig, Markus & Alexy, Oliver, 2022. "What makes the right OSS contributor tick? Treatments to motivate high-skilled developers," Research Policy, Elsevier, vol. 51(1).
    4. Frank Nagle, 2018. "Learning by Contributing: Gaining Competitive Advantage Through Contribution to Crowdsourced Public Goods," Organization Science, INFORMS, vol. 29(4), pages 569-587, August.
    5. Sheen S. Levine & Michael J. Prietula, 2014. "Open Collaboration for Innovation: Principles and Performance," Organization Science, INFORMS, vol. 25(5), pages 1414-1433, October.
    6. Nicolai J. Foss & Lars Frederiksen & Francesco Rullani, 2016. "Problem‐formulation and problem‐solving in self‐organized communities: How modes of communication shape project behaviors in the free open‐source software community," Strategic Management Journal, Wiley Blackwell, vol. 37(13), pages 2589-2610, December.
    7. F. Rullani & L. Zirulia, 2011. "A supply side story for a threshold model: Endogenous growth of the free and open source community," Working Papers wp781, Dipartimento Scienze Economiche, Università di Bologna.
    8. De Noni, Ivan & Ganzaroli, Andrea & Orsi, Luigi, 2013. "The evolution of OSS governance: a dimensional comparative analysis," Scandinavian Journal of Management, Elsevier, vol. 29(3), pages 247-263.
    9. Engelhardt, Sebastian v. & Freytag, Andreas, 2013. "Institutions, culture, and open source," Journal of Economic Behavior & Organization, Elsevier, vol. 95(C), pages 90-110.
    10. Liuan Wang & Lu (Lucy) Yan & Tongxin Zhou & Xitong Guo & Gregory R. Heim, 2020. "Understanding Physicians’ Online-Offline Behavior Dynamics: An Empirical Study," Information Systems Research, INFORMS, vol. 31(2), pages 537-555, June.
    11. Alison J. Bianchi & Soong Moon Kang & Daniel Stewart, 2012. "The Organizational Selection of Status Characteristics: Status Evaluations in an Open Source Community," Organization Science, INFORMS, vol. 23(2), pages 341-354, April.
    12. Andrea Fosfuri & Marco S. Giarratana & Alessandra Luzzi, 2008. "The Penguin Has Entered the Building: The Commercialization of Open Source Software Products," Organization Science, INFORMS, vol. 19(2), pages 292-305, April.
    13. Dejean, Sylvain & Jullien, Nicolas, 2015. "Big from the beginning: Assessing online contributors’ behavior by their first contribution," Research Policy, Elsevier, vol. 44(6), pages 1226-1239.
    14. Francesco Rullani, 2006. "Dragging developers towards the core," KITeS Working Papers 190, KITeS, Centre for Knowledge, Internationalization and Technology Studies, Università Bocconi, Milano, Italy, revised Feb 2007.
    15. Dongryul Lee & Byung Kim, 2013. "Motivations for Open Source Project Participation and Decisions of Software Developers," Computational Economics, Springer; Society for Computational Economics, vol. 41(1), pages 31-57, January.
    16. Kuk, George & Schaarschmidt, Mario & Homscheid, Dirk, 2024. "All of the same breed? A networking perspective of private-collective innovation," Journal of Business Research, Elsevier, vol. 172(C).
    17. Massimo G. Colombo & Douglas Cumming & Ali Mohammadi & Cristina Rossi-Lamastra & Anu Wadhwa, 2016. "Open business models and venture capital finance," Industrial and Corporate Change, Oxford University Press and the Associazione ICC, vol. 25(2), pages 353-370.
    18. Nicolas Jullien & Jean-Benoît Zimmermann, 2011. "Floss firms, users and communities: a viable match?," Journal of Innovation Economics, De Boeck Université, vol. 0(1), pages 31-53.
    19. Wachs, Johannes & Nitecki, Mariusz & Schueller, William & Polleres, Axel, 2022. "The Geography of Open Source Software: Evidence from GitHub," Technological Forecasting and Social Change, Elsevier, vol. 176(C).
    20. Adrián Kovács & Bart Looy & Bruno Cassiman, 2015. "Exploring the scope of open innovation: a bibliometric review of a decade of research," Scientometrics, Springer; Akadémiai Kiadó, vol. 104(3), pages 951-983, September.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.