Exploring the feasibility and accuracy of Latent Semantic Analysis based text mining techniques to detect similarity between patent documents and scientific publications

Authors

  • Tom Magerman

    (Centre for R&D Monitoring (ECOOM)
    K. U. Leuven)

  • Bart Looy

    (Centre for R&D Monitoring (ECOOM)
    K. U. Leuven
    Leuven Research & Development)

  • Xiaoyan Song

    (Centre for R&D Monitoring (ECOOM)
    K. U. Leuven)

Abstract

In this study, we examine and validate the use of existing text mining techniques (based on the vector space model and latent semantic indexing) to detect similarities between patent documents and scientific publications. Clearly, experts involved in domain studies would benefit from techniques that allow similarity to be detected, and hence facilitate mapping, categorization and classification efforts. In addition, given current debates on the relevance and appropriateness of academic patenting, the ability to assess content-relatedness between sets of documents (in this case, patents and publications) might become relevant and useful. We list several options available to arrive at content-based similarity measures. Different variants of the vector space model and the latent semantic indexing approach were selected and applied to the publications and patents of a sample of academic inventors (n = 6). We validated the outcomes against independently obtained scores from human raters. While we conclude that text mining techniques can be valuable for detecting similarities between patents and publications, our findings also indicate that the various options available to arrive at similarity measures vary considerably in accuracy: some generally accepted text mining options, such as dimensionality reduction and LSA, do not yield the best results when working with smaller document sets. Implications and directions for further research are discussed.
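The abstract contrasts a plain vector space model (VSM) with latent semantic indexing (LSA). The sketch below illustrates both in pure Python under stated assumptions: simple lowercase tokenization with no stemming or stop-word removal, a smoothed TF-IDF weighting, and cosine similarity. None of these choices is claimed to match the authors' exact configuration. LSA is computed here by eigendecomposing the small document-by-document Gram matrix X Xᵀ = U S² Uᵀ with a Jacobi iteration, which yields the document coordinates U_k S_k of a rank-k truncated SVD (the best rank-k approximation in the sense of Eckart and Young, listed among the references). The three documents and the helper names (`tfidf_vectors`, `jacobi_eigh`, `lsa_vectors`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphabetic tokens; no stop-word removal or stemming (assumption).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Vector space model: one TF-IDF vector per document over a shared vocabulary."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    n = len(docs)
    df = Counter(t for toks in token_lists for t in set(toks))
    # Smoothed IDF, an assumed weighting: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in token_lists:
        tf = Counter(toks)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vectors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def jacobi_eigh(matrix, sweeps=100, tol=1e-20):
    """Eigenvalues and eigenvectors (as columns) of a small symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen to zero a[p][q].
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def lsa_vectors(vectors, k):
    """Rank-k LSA document coordinates U_k S_k, via eigenpairs of G = X X^T = U S^2 U^T."""
    n = len(vectors)
    gram = [[sum(a * b for a, b in zip(vectors[i], vectors[j])) for j in range(n)]
            for i in range(n)]
    eigvals, eigvecs = jacobi_eigh(gram)
    top = sorted(range(n), key=lambda i: eigvals[i], reverse=True)[:k]
    # Singular value sigma_i = sqrt(eigenvalue); clamp tiny negative rounding errors.
    return [[eigvecs[d][i] * math.sqrt(max(eigvals[i], 0.0)) for i in top]
            for d in range(n)]


docs = [
    # Hypothetical patent abstract
    "Method for detecting similarity between patent documents using latent semantic indexing",
    # Hypothetical publication abstract by the same inventor
    "Latent semantic indexing detects similarity between scientific publications and patent documents",
    # Hypothetical unrelated patent
    "A catalyst composition for polymerization of olefins",
]
vsm = tfidf_vectors(docs)
lsa = lsa_vectors(vsm, k=2)
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(f"docs {i}-{j}: VSM {cosine(vsm[i], vsm[j]):.3f}  LSA(k=2) {cosine(lsa[i], lsa[j]):.3f}")
```

Because V_k has orthonormal columns, cosines between rows of U_k S_k equal cosines between rows of the rank-k reconstruction, so setting `k` to the number of documents reproduces the VSM similarities exactly; a smaller `k` discards the weakest latent dimensions. With only a handful of documents there is little redundancy for the reduction to exploit, which offers one plausible intuition for the abstract's observation that dimensionality reduction does not necessarily give the best results on small document sets.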

Suggested Citation

  • Tom Magerman & Bart Looy & Xiaoyan Song, 2010. "Exploring the feasibility and accuracy of Latent Semantic Analysis based text mining techniques to detect similarity between patent documents and scientific publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 82(2), pages 289-306, February.
  • Handle: RePEc:spr:scient:v:82:y:2010:i:2:d:10.1007_s11192-009-0046-6
    DOI: 10.1007/s11192-009-0046-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-009-0046-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Mario Calderini & Chiara Franzoni & Andrea Vezzulli, 2005. "If Star Scientists do not Patent: an Event History Analysis of Scientific Eminence and the Decision to Patent in the Academic World," KITeS Working Papers 169, KITeS, Centre for Knowledge, Internationalization and Technology Studies, Universita' Bocconi, Milano, Italy, revised Jun 2005.
    2. Meyer, Martin, 2006. "Are patenting scientists the better scholars?: An exploratory comparison of inventor-authors with their non-inventing peers in nano-science and technology," Research Policy, Elsevier, vol. 35(10), pages 1646-1662, December.
    3. Van Looy, Bart & Ranga, Marina & Callaert, Julie & Debackere, Koenraad & Zimmermann, Edwin, 2004. "Combining entrepreneurial and scientific performance in academia: towards a compounded and reciprocal Matthew-effect?," Research Policy, Elsevier, vol. 33(3), pages 425-441, April.
    4. Martin Meyer, 2006. "Knowledge integrators or weak links? An exploratory comparison of patenting researchers with their non-inventing peers in nano-science and technology," Scientometrics, Springer;Akadémiai Kiadó, vol. 68(3), pages 545-560, September.
    5. Noyons, E. C. M. & van Raan, A. F. J. & Grupp, H. & Schmoch, U., 1994. "Exploring the science and technology interface: inventor-author relations in laser medicine research," Research Policy, Elsevier, vol. 23(4), pages 443-457, July.
    6. Carl Eckart & Gale Young, 1936. "The approximation of one matrix by another of lower rank," Psychometrika, Springer;The Psychometric Society, vol. 1(3), pages 211-218, September.
    7. Van Looy, Bart & Callaert, Julie & Debackere, Koenraad, 2006. "Publication and patent behavior of academic researchers: Conflicting, reinforcing or merely co-existing?," Research Policy, Elsevier, vol. 35(4), pages 596-608, May.
    8. Patrick Glenisson & Wolfgang Glänzel & Olle Persson, 2005. "Combining full-text analysis and bibliometric indicators. A pilot study," Scientometrics, Springer;Akadémiai Kiadó, vol. 63(1), pages 163-180, March.
    9. Scott Deerwester & Susan T. Dumais & George W. Furnas & Thomas K. Landauer & Richard Harshman, 1990. "Indexing by latent semantic analysis," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 41(6), pages 391-407, September.
    10. Martin Meyer, 2000. "Patent Citations in a Novel Field of Technology — What Can They Tell about Interactions between Emerging Communities of Science and Technology?," Scientometrics, Springer;Akadémiai Kiadó, vol. 48(2), pages 151-178, September.
    11. Pierre Azoulay & Waverly Ding & Toby Stuart, 2006. "The Impact of Academic Patenting on the Rate, Quality, and Direction of (Public) Research Output," NBER Working Papers 11917, National Bureau of Economic Research, Inc.
    12. Fabrizio, Kira R. & Di Minin, Alberto, 2008. "Commercializing the laboratory: Faculty patenting and the open science environment," Research Policy, Elsevier, vol. 37(5), pages 914-931, June.
    13. Fiona Murray & Scott Stern, 2005. "Do Formal Intellectual Property Rights Hinder the Free Flow of Scientific Knowledge? An Empirical Test of the Anti-Commons Hypothesis," NBER Working Papers 11465, National Bureau of Economic Research, Inc.
    14. Engelsman, E. C. & van Raan, A. F. J., 1994. "A patent-based cartography of technology," Research Policy, Elsevier, vol. 23(1), pages 1-26, January.
    15. Loet Leydesdorff, 2004. "The university–industry knowledge relationship: Analyzing patents and the science base of technologies," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 55(11), pages 991-1001, September.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Ioanna Kountouri & Eleftherios Manousakis & Andrianos E. Tsekrekos, 2019. "Latent semantic analysis of corporate social responsibility reports (with an application to Hellenic firms)," International Journal of Disclosure and Governance, Palgrave Macmillan, vol. 16(1), pages 1-19, March.
    2. Julie Callaert & Joris Grouwels & Bart Looy, 2012. "Delineating the scientific footprint in technology: Identifying scientific publications within non-patent references," Scientometrics, Springer;Akadémiai Kiadó, vol. 91(2), pages 383-398, May.
    3. Sabrina L. Woltmann & Lars Alkærsig, 2018. "Tracing university–industry knowledge transfer through a text mining approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 449-472, October.
    4. Jongchan Kim & Jaehyun Choi & Sangsung Park & Dongsik Jang, 2018. "Patent Keyword Extraction for Sustainable Technology Management," Sustainability, MDPI, vol. 10(4), pages 1-18, April.
    5. Xuefeng Wang & Huichao Ren & Yun Chen & Yuqin Liu & Yali Qiao & Ying Huang, 2019. "Measuring patent similarity with SAO semantic analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(1), pages 1-23, October.
    6. Wei Du & Yibo Wang & Wei Xu & Jian Ma, 2021. "A personalized recommendation system for high-quality patent trading by leveraging hybrid patent analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(12), pages 9369-9391, December.
    7. Higham, Kyle & de Rassenfosse, Gaétan & Jaffe, Adam B., 2021. "Patent Quality: Towards a Systematic Framework for Analysis and Measurement," Research Policy, Elsevier, vol. 50(4).
    8. Yongcong Luo & Jing Ma & Chi Li, 2020. "Entity name recognition of cross-border e-commerce commodity titles based on TWs-LSTM," Electronic Commerce Research, Springer, vol. 20(2), pages 405-426, June.
    9. Tomaz Bartol & Karmen Stopar, 2015. "Nano language and distribution of article title terms according to power laws," Scientometrics, Springer;Akadémiai Kiadó, vol. 103(2), pages 435-451, May.
    10. WANG, La-yin & ZHAO, Dong, 2021. "Cross-domain function analysis and trend study in Chinese construction industry based on patent semantic analysis," Technological Forecasting and Social Change, Elsevier, vol. 162(C).
    11. Chen, Lixin, 2017. "Do patent citations indicate knowledge linkage? The evidence from text similarities between patents and their citations," Journal of Informetrics, Elsevier, vol. 11(1), pages 63-79.
    12. Magerman, Tom & Looy, Bart Van & Debackere, Koenraad, 2015. "Does involvement in patenting jeopardize one’s academic footprint? An analysis of patent-paper pairs in biotechnology," Research Policy, Elsevier, vol. 44(9), pages 1702-1713.
    13. Samira Ranaei & Arho Suominen & Alan Porter & Stephen Carley, 2020. "Evaluating technological emergence using text analytics: two case technologies and three approaches," Scientometrics, Springer;Akadémiai Kiadó, vol. 122(1), pages 215-247, January.
    14. Yonghan Ju & So Young Sohn, 2015. "Identifying patterns in rare earth element patents based on text and data mining," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(1), pages 389-410, January.
    15. Su Jin Seo & Eun Jin Han & So Young Sohn, 2015. "Trend analysis of academic research and technical development pertaining to gas hydrates," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(2), pages 905-920, November.
    16. Veugelers, Reinhilde & Wang, Jian, 2019. "Scientific novelty and technological impact," Research Policy, Elsevier, vol. 48(6), pages 1362-1372.
    17. Gurney, Thomas & Horlings, Edwin & van den Besselaar, Peter & Sumikura, Koichi & Schoen, Antoine & Laurens, Patricia & Pardo, Daniel, 2014. "Analysing knowledge capture mechanisms: Methods and a stylised bioventure case," Journal of Informetrics, Elsevier, vol. 8(1), pages 259-272.
    18. Wagner, Stefan & Sternitzke, Christian & Walter, Sascha, 2022. "Mapping Markush," Research Policy, Elsevier, vol. 51(10).
    19. Sam Arts & Francesco Paolo Appio & Bart Looy, 2013. "Inventions shaping technological trajectories: do existing patent indicators provide a comprehensive picture?," Scientometrics, Springer;Akadémiai Kiadó, vol. 97(2), pages 397-419, November.
    20. Puccetti, Giovanni & Giordano, Vito & Spada, Irene & Chiarello, Filippo & Fantoni, Gualtiero, 2023. "Technology identification from patent texts: A novel named entity recognition method," Technological Forecasting and Social Change, Elsevier, vol. 186(PB).
    21. Chunjuan Luan & Zeyuan Liu & Xianwen Wang, 2013. "Divergence and convergence: technology-relatedness evolution in solar energy industry," Scientometrics, Springer;Akadémiai Kiadó, vol. 97(2), pages 461-475, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Callaert, Julie & Landoni, Paolo & Van Looy, Bart & Verganti, Roberto, 2015. "Scientific yield from collaboration with industry: The relevance of researchers’ strategic approaches," Research Policy, Elsevier, vol. 44(4), pages 990-998.
    2. Wang, Gangbo & Guan, Jiancheng, 2010. "The role of patenting activity for scientific research: A study of academic inventors from China's nanotechnology," Journal of Informetrics, Elsevier, vol. 4(3), pages 338-350.
    3. Czarnitzki, Dirk & Glänzel, Wolfgang & Hussinger, Katrin, 2009. "Heterogeneity of patenting activity and its implications for scientific research," Research Policy, Elsevier, vol. 38(1), pages 26-34, February.
    4. Van Looy, Bart & Landoni, Paolo & Callaert, Julie & van Pottelsberghe, Bruno & Sapsalis, Eleftherios & Debackere, Koenraad, 2011. "Entrepreneurial effectiveness of European universities: An empirical assessment of antecedents and trade-offs," Research Policy, Elsevier, vol. 40(4), pages 553-564, May.
    5. Breschi, Stefano & Catalini, Christian, 2010. "Tracing the links between science and technology: An exploratory analysis of scientists' and inventors' networks," Research Policy, Elsevier, vol. 39(1), pages 14-26, February.
    6. Nicola Baldini, 2008. "Negative effects of university patenting: Myths and grounded evidence," Scientometrics, Springer;Akadémiai Kiadó, vol. 75(2), pages 289-311, May.
    7. Meyer, Martin, 2006. "Are patenting scientists the better scholars?: An exploratory comparison of inventor-authors with their non-inventing peers in nano-science and technology," Research Policy, Elsevier, vol. 35(10), pages 1646-1662, December.
    8. Antje Klitkou & Stian Nygaard & Martin Meyer, 2007. "Tracking techno-science networks: A case study of fuel cells and related hydrogen technology R&D in Norway," Scientometrics, Springer;Akadémiai Kiadó, vol. 70(2), pages 491-518, February.
    9. Landry, Réjean & Saïhi, Malek & Amara, Nabil & Ouimet, Mathieu, 2010. "Evidence on how academics manage their portfolio of knowledge transfer activities," Research Policy, Elsevier, vol. 39(10), pages 1387-1403, December.
    10. Shuo Xu & Ling Li & Xin An, 2023. "Do academic inventors have diverse interests?," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(2), pages 1023-1053, February.
    11. Antje Klitkou & Magnus Gulbrandsen, 2010. "The relationship between academic patenting and scientific publishing in Norway," Scientometrics, Springer;Akadémiai Kiadó, vol. 82(1), pages 93-108, January.
    12. Malwina Mejer, 2011. "Entrepreneurial Scientists and their Publication Performance. An Insight from Belgium," Working Papers ECARES ECARES 2011-017, ULB -- Universite Libre de Bruxelles.
    13. Beaudry, Catherine & Allaoui, Sedki, 2012. "Impact of public and private research funding on scientific production: The case of nanotechnology," Research Policy, Elsevier, vol. 41(9), pages 1589-1606.
    14. repec:wip:wpaper:4 is not listed on IDEAS
    15. Pluvia Zuniga, 2011. "The State of Patenting at Research Institutions in Developing Countries: Policy Approaches and Practices," WIPO Economic Research Working Papers 04, World Intellectual Property Organization - Economics and Statistics Division, revised Dec 2011.
    16. Daniel Ogachi & Lydia Bares & Zoltan Zeman, 2021. "Innovation and Scientific Research as a Sustainable Development Goal in Spanish Public Universities," Sustainability, MDPI, vol. 13(7), pages 1-12, April.
    17. Albert Banal-Estañol & Mireia Jofre-Bonet & Cornelia Meissner, 2008. "The impact of industry collaboration on research: Evidence from engineering academics in the UK," Economics Working Papers 1190, Department of Economics and Business, Universitat Pompeu Fabra, revised Aug 2010.
    18. Ani Gerbin & Mateja Drnovsek, 2016. "Determinants and public policy implications of academic-industry knowledge transfer in life sciences: a review and a conceptual framework," The Journal of Technology Transfer, Springer, vol. 41(5), pages 979-1076, October.
    19. Buenstorf, Guido, 2009. "Is commercialization good or bad for science? Individual-level evidence from the Max Planck Society," Research Policy, Elsevier, vol. 38(2), pages 281-292, March.
    20. Banal-Estañol, Albert & Jofre-Bonet, Mireia & Lawson, Cornelia, 2015. "The double-edged sword of industry collaboration: Evidence from engineering academics in the UK," Research Policy, Elsevier, vol. 44(6), pages 1160-1175.
    21. Uwe Cantner & Martin Kalthaus & Indira Yarullina, 2024. "Outcomes of science-industry collaboration: factors and interdependencies," The Journal of Technology Transfer, Springer, vol. 49(2), pages 542-580, April.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.