Printed from https://ideas.repec.org/a/plo/pone00/0254034.html

Measuring novelty in science with word embedding

Author

Listed:
  • Sotaro Shibayama
  • Deyun Yin
  • Kuniko Matsumoto

Abstract

Novelty is a core value in science, and a reliable measurement of novelty is crucial. This study proposes a new approach to measuring the novelty of scientific articles based on both citation data and text data. The proposed approach considers an article to be novel if it cites a combination of semantically distant references. To this end, we first assign a word embedding (a vector representation of each vocabulary item) to each cited reference on the basis of text information included in the reference. With these vectors, the distance between every pair of references is computed. Finally, the novelty of a focal document is evaluated by summarizing the distances between all references. The approach draws on limited text information (the titles of references) and a publicly shared library for word embeddings, which minimizes data-access requirements and computational cost. We share the code, with which one can compute the novelty score of a document of interest using only the focal document’s reference list. We validate the proposed measure through three exercises. First, we confirm that word embeddings can be used to quantify semantic distances between documents by comparing them with an established bibliometric distance measure. Second, we confirm the criterion-related validity of the proposed novelty measure against self-reported novelty scores collected from a questionnaire survey. Finally, as novelty is known to be correlated with future citation impact, we confirm that the proposed measure can predict future citations.
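
The pipeline described in the abstract (embed each reference title, compute pairwise distances, summarize them) can be illustrated with a minimal sketch. This is not the authors' shared code. The toy vectors in `TOY_VECTORS`, the averaging of word vectors per title, the use of cosine distance, and the quantile summary controlled by `q` are all assumptions made for illustration; a real run would presumably load pretrained embeddings from a published library.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings (assumption); real use would load pretrained vectors.
TOY_VECTORS = {
    "protein": [0.9, 0.1, 0.0],
    "folding": [0.8, 0.2, 0.1],
    "gene": [0.7, 0.3, 0.0],
    "market": [0.0, 0.2, 0.9],
    "pricing": [0.1, 0.1, 0.8],
}


def embed_title(title, vectors=TOY_VECTORS):
    """Average the vectors of the known words in a title; None if no word is known."""
    found = [vectors[w] for w in title.lower().split() if w in vectors]
    if not found:
        return None
    dim = len(found[0])
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def cosine_distance(u, v):
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def novelty_score(reference_titles, q=1.0):
    """Summarize pairwise title distances by the q-quantile (q=1.0 is the maximum).

    The choice of summary statistic is an assumption of this sketch.
    """
    embedded = [e for e in map(embed_title, reference_titles) if e is not None]
    distances = sorted(cosine_distance(u, v) for u, v in combinations(embedded, 2))
    if not distances:
        return None  # fewer than two usable references
    return distances[int(round(q * (len(distances) - 1)))]


similar = novelty_score(["Protein folding", "Gene folding"])
mixed = novelty_score(["Protein folding", "Gene folding", "Market pricing"])
print(similar, mixed)  # the mixed reference list should score higher
```

Under these assumptions, a reference list that combines titles from distant topics yields a larger score than one drawn from a single topic, which is the intuition the abstract describes.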

Suggested Citation

  • Sotaro Shibayama & Deyun Yin & Kuniko Matsumoto, 2021. "Measuring novelty in science with word embedding," PLOS ONE, Public Library of Science, vol. 16(7), pages 1-16, July.
  • Handle: RePEc:plo:pone00:0254034
    DOI: 10.1371/journal.pone.0254034

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0254034
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0254034&type=printable
    Download Restriction: no



    Citations


    Cited by:

    1. Shiyun Wang & Yaxue Ma & Jin Mao & Yun Bai & Zhentao Liang & Gang Li, 2023. "Quantifying scientific breakthroughs by a novel disruption indicator based on knowledge entities," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(2), pages 150-167, February.
    2. Dan Tian & Xin Liu & Jiang Li, 2024. "Accelerated acceptance time for preprint submissions: a comparative analysis based on PubMed," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 3787-3807, July.
    3. Van-Thien Nguyen & René Carraz, 2023. "A Novel Matching Algorithm for Academic Patent Paper Pairs: An Exploratory Study of Japan's national research universities and laboratories," Working Papers of BETA 2023-29, Bureau d'Economie Théorique et Appliquée, UDS, Strasbourg.
    4. Jeon, Daeseong & Lee, Junyoup & Ahn, Joon Mo & Lee, Changyong, 2023. "Measuring the novelty of scientific publications: A fastText and local outlier factor approach," Journal of Informetrics, Elsevier, vol. 17(4).
    5. Yulin Yu & Daniel M. Romero, 2024. "Does the Use of Unusual Combinations of Datasets Contribute to Greater Scientific Impact?," Papers 2402.05024, arXiv.org, revised Sep 2024.
    6. Elizabeth S. Vieira, 2023. "The influence of research collaboration on citation impact: the countries in the European Innovation Scoreboard," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(6), pages 3555-3579, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hou, Jianhua & Wang, Dongyi & Li, Jing, 2022. "A new method for measuring the originality of academic articles based on knowledge units in semantic networks," Journal of Informetrics, Elsevier, vol. 16(3).
    2. Kuniko Matsumoto & Sotaro Shibayama & Byeongwoo Kang & Masatsura Igami, 2021. "Introducing a novelty indicator for scientific research: validating the knowledge-based combinatorial approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(8), pages 6891-6915, August.
    3. Ke, Qing, 2020. "Technological impact of biomedical research: The role of basicness and novelty," Research Policy, Elsevier, vol. 49(7).
    4. Dongqing Lyu & Kaile Gong & Xuanmin Ruan & Ying Cheng & Jiang Li, 2021. "Does research collaboration influence the “disruption” of articles? Evidence from neurosciences," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(1), pages 287-303, January.
    5. Quentin Plantec & Pascal Le Masson & Benoit Weil, 2020. "Impact of knowledge search practices on the originality of inventions: a study in the oil & gas industry," Post-Print hal-02613665, HAL.
    6. Michele Cincera & Ela Ince, 2019. "Types of Innovation and Firm performance," Working Papers TIMES² 2019-032, ULB -- Universite Libre de Bruxelles.
    7. Brea, Edgar, 2024. "The yin yang of AI: Exploring how commercial and non-commercial orientations shape machine learning innovation," Research Policy, Elsevier, vol. 53(6).
    8. Plantec, Quentin & Le Masson, Pascal & Weil, Benoît, 2021. "Impact of knowledge search practices on the originality of inventions: A study in the oil & gas industry through dynamic patent analysis," Technological Forecasting and Social Change, Elsevier, vol. 168(C).
    9. Dirk Fornahl & Nils Grashof & Alexander Kopka, 2021. "Do not neglect the periphery?! - the emergence and diffusion of radical innovations," Bremen Papers on Economics & Innovation 2102, University of Bremen, Faculty of Business Studies and Economics.
    10. Sam Arts & Nicola Melluso & Reinhilde Veugelers, 2023. "Beyond Citations: Measuring Novel Scientific Ideas and their Impact in Publication Text," Papers 2309.16437, arXiv.org, revised Oct 2024.
    11. Kolja Hesse & Dirk Fornahl, 2020. "Essential ingredients for radical innovations? The role of (un‐)related variety and external linkages in Germany," Papers in Regional Science, Wiley Blackwell, vol. 99(5), pages 1165-1183, October.
    12. Ugo Rizzo & Nicolò Barbieri & Laura Ramaciotti & Demian Iannantuono, 2020. "The division of labour between academia and industry for the generation of radical inventions," The Journal of Technology Transfer, Springer, vol. 45(2), pages 393-413, April.
    13. Nicolas Carayol, 2016. "The Right Job and the Job Right: Novelty, Impact and Journal Stratification in Science," Post-Print hal-02274661, HAL.
    14. Ron Boschma & Ernest Miguelez & Rosina Moreno & Diego B. Ocampo-Corrales, 2021. "Technological breakthroughs in European regions: the role of related and unrelated combinations," Papers in Evolutionary Economic Geography (PEEG) 2118, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Jun 2021.
    15. Charles Ayoubi & Michele Pezzoni & Fabiana Visentin, 2021. "Does It Pay to Do Novel Science? The Selectivity Patterns in Science Funding," Science and Public Policy, Oxford University Press, vol. 48(5), pages 635-648.
    16. Barbieri, Nicolò & Marzucchi, Alberto & Rizzo, Ugo, 2020. "Knowledge sources and impacts on subsequent inventions: Do green technologies differ from non-green ones?," Research Policy, Elsevier, vol. 49(2).
    17. Pierre Pelletier & Kevin Wirtz, 2023. "Sails and Anchors: The Complementarity of Exploratory and Exploitative Scientists in Knowledge Creation," Papers 2312.10476, arXiv.org.
    18. Plantec, Quentin & Deval, Marie-Alix & Hooge, Sophie & Weil, Benoit, 2023. "Big data as an exploration trigger or problem-solving patch: Design and integration of AI-embedded systems in the automotive industry," Technovation, Elsevier, vol. 124(C).
    19. Doblinger, Claudia & Surana, Kavita & Li, Deyu & Hultman, Nathan & Anadón, Laura Díaz, 2022. "How do global manufacturing shifts affect long-term clean energy innovation? A study of wind energy suppliers," Research Policy, Elsevier, vol. 51(7).
    20. Pezzoni, Michele & Veugelers, Reinhilde & Visentin, Fabiana, 2022. "How fast is this novel technology going to be a hit? Antecedents predicting follow-on inventions," Research Policy, Elsevier, vol. 51(3).
