Printed from https://ideas.repec.org/a/eee/tefoso/v206y2024ics0040162524003329.html

PatentSBERTa: A deep NLP based hybrid model for patent distance and classification using augmented SBERT

Authors

  • Bekamiri, Hamid
  • Hain, Daniel S.
  • Jurowetzki, Roman

Abstract

This study presents an efficient approach for utilizing text data to calculate patent-to-patent (p2p) technological similarity and proposes a hybrid framework for leveraging the resulting p2p similarity in applications such as semantic search and automated patent classification. To achieve this, we create embeddings using Sentence-BERT (SBERT) on patent claims. For domain adaptation of the general SBERT model, we implement an augmented approach to fine-tune SBERT using in-domain supervised patent claims data. The study utilizes SBERT's efficiency in creating embedding distance measures to map p2p similarity in large sets of patent data. We demonstrate applications of the framework for the use case of automated patent classification with a simple k-nearest neighbors (KNN) model that predicts the assigned Cooperative Patent Classification (CPC) classes based on the class assignments of the K patents with the highest p2p similarity. The results show that p2p similarity captures technological features in terms of CPC overlap, and that the approach is useful for automatic patent classification based on text data. Moreover, the presented classification framework is simple, and its results are easy for end-users to interpret and evaluate via instance-based explanations. The study performs an out-of-sample model validation, predicting all assigned CPC classes at the subclass level (663 subclasses) with an F1 score of 66%, outperforming the current state of the art in text-based multi-label patent classification. The study also discusses the applicability of the presented framework for semantic intellectual property (IP) search, patent landscaping, and technology mapping. Finally, the study outlines a future research agenda to leverage multi-source patent embeddings, evaluate their appropriateness across applications, and improve and validate patent embeddings by creating domain-expert curated Semantic Textual Similarity (STS) benchmark datasets.
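The classification step summarised in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the patent-claim embeddings have already been computed (for example with an SBERT model), uses cosine similarity as the p2p similarity measure, and applies a hypothetical voting rule (`min_share`) to turn the neighbours' CPC subclasses into a multi-label prediction. All function names, toy vectors, and CPC codes are illustrative. The `micro_f1` helper shows one common way to score multi-label predictions; the abstract does not state which F1 averaging the reported 66% refers to.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict_cpc(
    query: Sequence[float],
    train_vectors: List[Sequence[float]],
    train_labels: List[Set[str]],
    k: int = 3,
    min_share: float = 0.5,
) -> Tuple[Set[str], List[int]]:
    """Hypothetical multi-label KNN rule: keep every CPC subclass carried by
    at least `min_share` of the k most similar training patents.

    Returns the predicted label set plus the neighbour indices, which act as
    the instance-based explanation for the prediction."""
    ranked = sorted(
        range(len(train_vectors)),
        key=lambda i: cosine_similarity(query, train_vectors[i]),
        reverse=True,
    )
    neighbours = ranked[:k]
    votes = Counter(label for i in neighbours for label in train_labels[i])
    predicted = {label for label, n in votes.items() if n / k >= min_share}
    return predicted, neighbours


def micro_f1(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (patent, label) decisions (assumed metric)."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" standing in for real SBERT vectors.
    vectors = [
        [1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0],
        [0.0, 1.0, 0.0],
        [0.1, 0.9, 0.1],
        [0.8, 0.2, 0.1],
    ]
    labels = [{"H04L", "G06F"}, {"H04L"}, {"A61K"}, {"A61K", "C07D"}, {"G06F"}]
    predicted, neighbours = knn_predict_cpc([1.0, 0.05, 0.0], vectors, labels, k=3)
    print(sorted(predicted), neighbours)
```

With these toy vectors the three nearest neighbours are the patents at indices 0, 1 and 4, so the sketch predicts both `G06F` and `H04L` and can point to those three patents as its explanation. Returning the neighbour indices alongside the labels mirrors the instance-based interpretability the abstract emphasises.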

Suggested Citation

  • Bekamiri, Hamid & Hain, Daniel S. & Jurowetzki, Roman, 2024. "PatentSBERTa: A deep NLP based hybrid model for patent distance and classification using augmented SBERT," Technological Forecasting and Social Change, Elsevier, vol. 206(C).
  • Handle: RePEc:eee:tefoso:v:206:y:2024:i:c:s0040162524003329
    DOI: 10.1016/j.techfore.2024.123536

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0040162524003329
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Jeff Alstott & Giorgio Triulzi & Bowen Yan & Jianxi Luo, 2017. "Mapping technology space by normalizing patent networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 110(1), pages 443-479, January.
    2. Duen-Ren Liu & Meng-Jung Shih, 2011. "Hybrid-patent classification based on patent-network analysis," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 62(2), pages 246-256, February.
    3. Adam B. Jaffe & Manuel Trajtenberg & Rebecca Henderson, 1993. "Geographic Localization of Knowledge Spillovers as Evidenced by Patent Citations," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 108(3), pages 577-598.
    4. Yuan Zhou & Fang Dong & Yufei Liu & Zhaofu Li & JunFei Du & Li Zhang, 2020. "Forecasting emerging technologies using data augmentation and deep learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(1), pages 1-29, April.
    5. Arts, Sam & Hou, Jianan & Gomez, Juan Carlos, 2021. "Natural language processing to identify the creation and impact of new technologies in patent text: Code, data, and new measures," Research Policy, Elsevier, vol. 50(2).
    6. Shaobo Li & Jie Hu & Yuxin Cui & Jianjun Hu, 2018. "DeepPatent: patent classification with convolutional neural networks and word embedding," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(2), pages 721-744, November.
    7. Aharonson, Barak S. & Schilling, Melissa A., 2016. "Mapping the technological landscape: Measuring technology distance, technological footprints, and technology evolution," Research Policy, Elsevier, vol. 45(1), pages 81-96.
    8. Jie Hu & Shaobo Li & Jianjun Hu & Guanci Yang, 2018. "A Hierarchical Feature Extraction Model for Multi-Label Mechanical Patent Classification," Sustainability, MDPI, vol. 10(1), pages 1-22, January.
    9. Sam Arts & Bruno Cassiman & Juan Carlos Gomez, 2018. "Text matching to measure patent similarity," Strategic Management Journal, Wiley Blackwell, vol. 39(1), pages 62-84, January.
    10. Chao Yang & Donghua Zhu & Xuefeng Wang & Yi Zhang & Guangquan Zhang & Jie Lu, 2017. "Requirement-oriented core technological components’ identification based on SAO analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1229-1248, September.
    11. Breschi, Stefano & Lissoni, Francesco & Malerba, Franco, 2003. "Knowledge-relatedness in firm technological diversification," Research Policy, Elsevier, vol. 32(1), pages 69-87, January.
    12. Dieter F. Kogler & David L. Rigby & Isaac Tucker, 2013. "Mapping Knowledge Space and Technological Relatedness in US Cities," European Planning Studies, Taylor & Francis Journals, vol. 21(9), pages 1374-1391, September.
    13. Kim, Tae San & Sohn, So Young, 2020. "Machine-learning-based deep semantic analysis approach for forecasting new technology convergence," Technological Forecasting and Social Change, Elsevier, vol. 157(C).
    14. Trappey, Amy & Trappey, Charles V. & Hsieh, Alex, 2021. "An intelligent patent recommender adopting machine learning approach for natural language processing: A case study for smart machinery technology mining," Technological Forecasting and Social Change, Elsevier, vol. 164(C).
    15. Choi, Seokkyu & Lee, Hyeonju & Park, Eunjeong & Choi, Sungchul, 2022. "Deep learning for patent landscaping using transformer and graph embedding," Technological Forecasting and Social Change, Elsevier, vol. 175(C).
    16. Hain, Daniel S. & Jurowetzki, Roman & Buchmann, Tobias & Wolf, Patrick, 2022. "A text-embedding-based approach to measuring patent-to-patent technological similarity," Technological Forecasting and Social Change, Elsevier, vol. 177(C).
    17. Zhen-Wu Wang & Si-Kai Wang & Ben-Ting Wan & William Wei Song, 2020. "A novel multi-label classification algorithm based on K-nearest neighbor and random walk," International Journal of Distributed Sensor Networks, vol. 16(3), pages 15501477209, March.
    18. WANG, La-yin & ZHAO, Dong, 2021. "Cross-domain function analysis and trend study in Chinese construction industry based on patent semantic analysis," Technological Forecasting and Social Change, Elsevier, vol. 162(C).
    19. Ma, Tingting & Zhou, Xiao & Liu, Jia & Lou, Zhenkai & Hua, Zhaoting & Wang, Ruitao, 2021. "Combining topic modeling and SAO semantic analysis to identify technological opportunities of emerging technologies," Technological Forecasting and Social Change, Elsevier, vol. 173(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hain, Daniel S. & Jurowetzki, Roman & Buchmann, Tobias & Wolf, Patrick, 2022. "A text-embedding-based approach to measuring patent-to-patent technological similarity," Technological Forecasting and Social Change, Elsevier, vol. 177(C).
    2. Puccetti, Giovanni & Giordano, Vito & Spada, Irene & Chiarello, Filippo & Fantoni, Gualtiero, 2023. "Technology identification from patent texts: A novel named entity recognition method," Technological Forecasting and Social Change, Elsevier, vol. 186(PB).
    3. Lars Mewes & Tom Broekel, 2020. "Subsidized to change? The impact of R&D policy on regional technological diversification," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 65(1), pages 221-252, August.
    4. Stefano Basilico & Holger Graf, 2023. "Bridging technologies in the regional knowledge space: measurement and evolution," Journal of Evolutionary Economics, Springer, vol. 33(4), pages 1085-1124, September.
    5. Escolar, Emerson G. & Hiraoka, Yasuaki & Igami, Mitsuru & Ozcan, Yasin, 2023. "Mapping firms’ locations in technological space: A topological analysis of patent statistics," Research Policy, Elsevier, vol. 52(8).
    6. Dieter F. Kogler & Jürgen Essletzbichler & David L. Rigby, 2017. "The evolution of specialization in the EU15 knowledge space," Journal of Economic Geography, Oxford University Press, vol. 17(2), pages 345-373.
    7. Pierre-Alexandre Balland & David L. Rigby, 2015. "The geography and evolution of complex knowledge," Papers in Evolutionary Economic Geography (PEEG) 1502, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Jan 2015.
    8. Fusillo, Fabrizio, 2020. "Are Green Inventions really more complex? Evidence from European Patents," Department of Economics and Statistics Cognetti de Martiis LEI & BRICK - Laboratory of Economics of Innovation "Franco Momigliano", Bureau of Research in Innovation, Complexity and Knowledge, Collegio 202002, University of Turin.
    9. Jeon, Eunji & Yoon, Naeun & Sohn, So Young, 2023. "Exploring new digital therapeutics technologies for psychiatric disorders using BERTopic and PatentSBERTa," Technological Forecasting and Social Change, Elsevier, vol. 186(PA).
    10. Katsuyuki Kaneko & Yuya Kajikawa, 2023. "Novelty Score and Technological Relatedness Measurement Using Patent Information in Mergers and Acquisitions: Case Study in the Japanese Electric Motor Industry," Global Journal of Flexible Systems Management, Springer;Global Institute of Flexible Systems Management, vol. 24(2), pages 163-177, June.
    11. Dario Diodato & Andrea Morrison, 2019. "Technological regimes and the geography of innovation: a long-run perspective on US inventions," Papers in Evolutionary Economic Geography (PEEG) 1924, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Jul 2019.
    12. Maryann Feldman & Dieter Kogler & David Rigby, 2013. "rKnowledge: The Spatial Diffusion of rDNA Methods," Papers in Evolutionary Economic Geography (PEEG) 1311, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Aug 2013.
    13. Yuan Zhou & Fang Dong & Yufei Liu & Liang Ran, 2021. "A deep learning framework to early identify emerging technologies in large-scale outlier patents: an empirical study of CNC machine tool," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 969-994, February.
    14. Higham, Kyle & Contisciani, Martina & De Bacco, Caterina, 2022. "Multilayer patent citation networks: A comprehensive analytical framework for studying explicit technological relationships," Technological Forecasting and Social Change, Elsevier, vol. 179(C).
    15. Liang Chen & Shuo Xu & Lijun Zhu & Jing Zhang & Xiaoping Lei & Guancan Yang, 2020. "A deep learning based method for extracting semantic information from patent documents," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 289-312, October.
    16. François Lafond & Daniel Kim, 2019. "Long-run dynamics of the U.S. patent classification system," Journal of Evolutionary Economics, Springer, vol. 29(2), pages 631-664, April.
    17. Pinheiro, Flávio L. & Hartmann, Dominik & Boschma, Ron & Hidalgo, César A., 2022. "The time and frequency of unrelated diversification," Research Policy, Elsevier, vol. 51(8).
    18. Maria Tsouri & Ron Boschma, 2024. "The importance of science for the development of new PV technologies in European regions," Papers in Evolutionary Economic Geography (PEEG) 2410, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Apr 2024.
    19. Plantec, Quentin & Le Masson, Pascal & Weil, Benoît, 2021. "Impact of knowledge search practices on the originality of inventions: A study in the oil & gas industry through dynamic patent analysis," Technological Forecasting and Social Change, Elsevier, vol. 168(C).
    20. Just, Julian, 2024. "Natural language processing for innovation search – Reviewing an emerging non-human innovation intermediary," Technovation, Elsevier, vol. 129(C).

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.