Machine learning and natural language processing on the patent corpus: Data, tools, and new measures

Authors

  • Benjamin Balsmeier
  • Mohamad Assaf
  • Tyler Chesebro
  • Gabe Fierro
  • Kevin Johnson
  • Scott Johnson
  • Guan‐Cheng Li
  • Sonja Lück
  • Doug O'Reagan
  • Bill Yeh
  • Guangzheng Zang
  • Lee Fleming

Abstract

Drawing upon recent advances in machine learning and natural language processing, we introduce new tools that automatically ingest, parse, disambiguate, and build an updated database using U.S. patent data. The tools identify unique inventor, assignee, and location entities mentioned on each granted U.S. patent from 1976 to 2016. We describe data flow, algorithms, user interfaces, descriptive statistics, and a novelty measure based on the first appearance of a word in the patent corpus. We illustrate an automated coinventor network mapping tool and visualize trends in patenting over the last 40 years. Data and documentation can be found at https://console.cloud.google.com/launcher/partners/patents-public-data.
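The abstract mentions a novelty measure based on the first appearance of a word in the patent corpus. The idea can be sketched as follows. This is not the authors' implementation. It assumes plain-text patent documents ordered by grant date, a simple lowercase regex tokenizer, and ties on the same date broken by patent number. The `Patent` record, the `first_appearance_words` function and the sample patents are hypothetical illustrations.

```python
import re
from datetime import date
from typing import Dict, Iterable, List, NamedTuple


class Patent(NamedTuple):
    number: str    # hypothetical patent identifier
    granted: date  # assumption: the corpus is ordered by grant date
    text: str      # assumption: plain text (e.g., title, abstract, claims)


# Assumption: a word is a lowercase alphanumeric token starting with a letter.
TOKEN = re.compile(r"[a-z][a-z0-9-]*")


def first_appearance_words(patents: Iterable[Patent]) -> Dict[str, List[str]]:
    """Map each patent number to the words that occur in it for the first time in the corpus."""
    seen = set()
    novel: Dict[str, List[str]] = {}
    # Walk the corpus chronologically; same-day ties are broken by patent number.
    for p in sorted(patents, key=lambda p: (p.granted, p.number)):
        words = set(TOKEN.findall(p.text.lower()))
        novel[p.number] = sorted(words - seen)  # words never seen in earlier patents
        seen |= words
    return novel


if __name__ == "__main__":
    corpus = [
        Patent("A1", date(1976, 1, 6), "A rotary engine with a piston"),
        Patent("A2", date(1977, 3, 1), "A piston engine using a laser"),
        Patent("A3", date(1977, 3, 1), "Laser diode array"),
    ]
    for number, words in first_appearance_words(corpus).items():
        print(number, words)
    # A1 ['a', 'engine', 'piston', 'rotary', 'with']
    # A2 ['laser', 'using']
    # A3 ['array', 'diode']
```

Under these assumptions every word in the earliest patents counts as new, so a practical version would likely need a burn-in window or a cleaned vocabulary; the per-patent novelty score could then be taken as the length of each list.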

Suggested Citation

  • Benjamin Balsmeier & Mohamad Assaf & Tyler Chesebro & Gabe Fierro & Kevin Johnson & Scott Johnson & Guan‐Cheng Li & Sonja Lück & Doug O'Reagan & Bill Yeh & Guangzheng Zang & Lee Fleming, 2018. "Machine learning and natural language processing on the patent corpus: Data, tools, and new measures," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 27(3), pages 535-553, September.
  • Handle: RePEc:bla:jemstr:v:27:y:2018:i:3:p:535-553
    DOI: 10.1111/jems.12259

    Download full text from publisher

    File URL: https://doi.org/10.1111/jems.12259
    Download Restriction: no

    References listed on IDEAS

    1. Michele Pezzoni & Francesco Lissoni & Gianluca Tarasconi, 2014. "How to kill inventors: testing the Massacrator© algorithm for inventor disambiguation," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 477-504, October.
    2. Balsmeier, Benjamin & Fleming, Lee & Manso, Gustavo, 2017. "Independent boards and innovation," Journal of Financial Economics, Elsevier, vol. 123(3), pages 536-557.
    3. Manuel Trajtenberg & Gil Shiff & Ran Melamed, 2009. "The 'Names Game': Harnessing Inventors' Patent Data for Economic Research," Annals of Economics and Statistics, GENES, issue 93-94, pages 67-77.
    4. Hall, B. & Jaffe, A. & Trajtenberg, M., 2001. "The NBER Patent Citations Data File: Lessons, Insights and Methodological Tools," Papers 2001-29, Tel Aviv.
    5. Bernstein, Shai, 2014. "Does Going Public Affect Innovation?," Research Papers 3011, Stanford University, Graduate School of Business.
    6. Jasjit Singh, 2005. "Collaborative Networks as Determinants of Knowledge Diffusion Patterns," Management Science, INFORMS, vol. 51(5), pages 756-770, May.
    7. Raffo, Julio & Lhuillery, Stéphane, 2009. "How to play the 'Names Game': Patent retrieval comparing different heuristics," Research Policy, Elsevier, vol. 38(10), pages 1617-1627, December.
    8. Trajtenberg, Manuel & Shiff, Gil & Melamed, Ran, 2006. "The 'Names Game': Harnessing Inventors' Patent Data for Economic Research," Foerder Institute for Economic Research Working Papers 275702, Tel-Aviv University, Foerder Institute for Economic Research.
    9. Bronwyn H. Hall & Dietmar Harhoff, 2012. "Recent Research on the Economics of Patents," Annual Review of Economics, Annual Reviews, vol. 4(1), pages 541-565, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Carayol, Nicolas & Bergé, Laurent & Cassi, Lorenzo & Roux, Pascale, 2019. "Unintended triadic closure in social networks: The strategic formation of research collaborations between French inventors," Journal of Economic Behavior & Organization, Elsevier, vol. 163(C), pages 218-238.
    2. Li, Guan-Cheng & Lai, Ronald & D’Amour, Alexander & Doolin, David M. & Sun, Ye & Torvik, Vetle I. & Yu, Amy Z. & Fleming, Lee, 2014. "Disambiguation and co-authorship networks of the U.S. patent inventor database (1975–2010)," Research Policy, Elsevier, vol. 43(6), pages 941-955.
    3. Ventura, Samuel L. & Nugent, Rebecca & Fuchs, Erica R.H., 2015. "Seeing the non-stars: (Some) sources of bias in past disambiguation approaches and a new public tool leveraging labeled records," Research Policy, Elsevier, vol. 44(9), pages 1672-1701.
    4. YIN Deyun & MOTOHASHI Kazuyuki, 2018. "Inventor Name Disambiguation with Gradient Boosting Decision Tree and Inventor Mobility in China (1985-2016)," Discussion papers 18018, Research Institute of Economy, Trade and Industry (RIETI).
    5. Varshney, Mayank & Jain, Amit, 2023. "Understanding “reverse” knowledge flows following inventor exit in the semiconductor industry," Technovation, Elsevier, vol. 121(C).
    6. Stefano Breschi & Francesco Lissoni & Gianluca Tarasconi, 2014. "Inventor Data for Research on Migration and Innovation: A Survey and a Pilot," WIPO Economic Research Working Papers 17, World Intellectual Property Organization - Economics and Statistics Division.
    7. Li, Xiaogang, 2020. "Innovation, market valuations, policy uncertainty and trade: Theory and evidence," ISU General Staff Papers 202001010800009179, Iowa State University, Department of Economics.
    8. Nakajima, Ryo & Tamura, Ryuichi & Hanaki, Nobuyuki, 2010. "The effect of collaboration network on inventors' job match, productivity and tenure," Labour Economics, Elsevier, vol. 17(4), pages 723-734, August.
    9. Martin Ganco & Rosemarie H. Ziedonis & Rajshree Agarwal, 2015. "More stars stay, but the brightest ones still leave: Job hopping in the shadow of patent enforcement," Strategic Management Journal, Wiley Blackwell, vol. 36(5), pages 659-685, May.
    10. Jeongsik “Jay” Lee, 2010. "Heterogeneity, Brokerage, and Innovative Performance: Endogenous Formation of Collaborative Inventor Networks," Organization Science, INFORMS, vol. 21(4), pages 804-822, August.
    11. Stefano Breschi & Francesco Lissoni & Ernest Miguelez, 2017. "Foreign-origin inventors in the USA: testing for diaspora and brain gain effects," Journal of Economic Geography, Oxford University Press, vol. 17(5), pages 1009-1038.
    12. Amit Jain & Will Mitchell, 2022. "Specialization as a double‐edged sword: The relationship of scientist specialization with R&D productivity and impact following collaborator change," Strategic Management Journal, Wiley Blackwell, vol. 43(5), pages 986-1024, May.
    13. Ernest Miguelez & Carsten Fink, 2013. "Measuring the International Mobility of Inventors: A New Database," WIPO Economic Research Working Papers 08, World Intellectual Property Organization - Economics and Statistics Division, revised May 2013.
    14. Marx, Matt & Singh, Jasjit & Fleming, Lee, 2015. "Regional disadvantage? Employee non-compete agreements and brain drain," Research Policy, Elsevier, vol. 44(2), pages 394-404.
    15. Clément Gorin, 2017. "Accessibility, absorptive capacity and innovation in European urban areas," Working Papers 1722, Groupe d'Analyse et de Théorie Economique Lyon St-Étienne (GATE Lyon St-Étienne), Université de Lyon.
    16. Massimiliano Ferrara & Roberto Mavilia & Bruno Antonio Pansera, 2017. "Extracting knowledge patterns with a social network analysis approach: an alternative methodology for assessing the impact of power inventors," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1593-1625, December.
    17. Giuri, Paola & Mariani, Myriam, 2007. "Inventors and invention processes in Europe: Results from the PatVal-EU survey," Research Policy, Elsevier, vol. 36(8), pages 1105-1106, October.
    18. repec:wip:wpaper:8 is not listed on IDEAS
    19. Hanaki, Nobuyuki & Nakajima, Ryo & Ogura, Yoshiaki, 2010. "The dynamics of R&D network in the IT industry," Research Policy, Elsevier, vol. 39(3), pages 386-399, April.
    20. Martha Prevezer & Pietro Panzarasa & Tore Opsahl, 2010. "Geographic clustering and network evolution of innovative activities: Evidence from China’s patents," Working Papers 32, Queen Mary, University of London, School of Business and Management, Centre for Globalisation Research.
    21. Castellani, Davide & Perri, Alessandra & Scalera, Vittoria G., 2022. "Knowledge integration in multinational enterprises: The role of inventors crossing national and organizational boundaries," Journal of World Business, Elsevier, vol. 57(3).


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.