
Research on Patent Information Extraction Based on Deep Learning

In: AI and Analytics for Public Health

Authors
  • Xiaolei Cui

    (College of Economics and Management, Nanjing University of Aeronautics and Astronautics)

  • Lingfei Qian

    (College of Economics and Management, Nanjing University of Aeronautics and Astronautics)

Abstract

In the era of big data, enterprises are paying increasing attention to managing their internal data. Patents are among the most important technical documents within an enterprise, and transforming them into a structured form for storage can improve the accuracy and convenience of patent information retrieval. However, most companies have not established their own domain knowledge base, which leaves them facing large volumes of messy data. To solve this problem, we propose a patent information extraction method based on sequence tagging and semantic matching that extracts entity-relation triples and patent features and can provide a basis for constructing knowledge models. First, we use Python to preprocess patent texts from the field of battery technology for new energy vehicles, with steps such as data cleaning and word segmentation. We then introduce a character-based pre-trained model and combine it with a bi-directional long short-term memory (BiLSTM) network and a conditional random field (CRF) to extract entity words, relation words, and feature words from 6829 annotated datasets. Because the triples formed by randomly combining entity words and relation words contain noise, we treat each triple as a short text and semantically match it against the patent text. In this step, we also combine a pre-trained model with a BiLSTM to extract semantic information and remove the noisy triples. In addition, we improve model performance by changing the data tagging scheme. The results show that adding a pre-trained model before the traditional model captures more semantic information and significantly improves model performance. They also show that the proposed method is effective and can automatically extract patent information in the field of new energy vehicle battery technology.
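The candidate-triple step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag names (`ENT`, `REL`), the BIO tagging scheme, the helper functions `decode_bio` and `candidate_triples`, and the English example sentence are all assumptions made for illustration. The tags are written by hand here, whereas in the chapter they would come from the pre-trained model with BiLSTM and CRF. The semantic matching model that filters out noisy triples is not reproduced.

```python
from itertools import permutations


def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (label, text) spans.

    Assumes hypothetical tags of the form B-X / I-X / O.
    An I-X tag that does not continue an open X span is dropped.
    """
    spans = []
    label, buffer = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if buffer:
                spans.append((label, " ".join(buffer)))
            label, buffer = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            buffer.append(token)
        else:
            if buffer:
                spans.append((label, " ".join(buffer)))
            label, buffer = None, []
    if buffer:
        spans.append((label, " ".join(buffer)))
    return spans


def candidate_triples(spans):
    """Combine every ordered pair of entities with every relation word."""
    entities = [text for label, text in spans if label == "ENT"]
    relations = [text for label, text in spans if label == "REL"]
    return [
        (head, rel, tail)
        for head, tail in permutations(entities, 2)
        for rel in relations
    ]


# Hand-written example tags standing in for the sequence tagger's output.
tokens = ["battery", "pack", "contains", "cooling", "plate"]
tags = ["B-ENT", "I-ENT", "B-REL", "B-ENT", "I-ENT"]

spans = decode_bio(tokens, tags)
triples = candidate_triples(spans)
for triple in triples:
    print(triple)
```

With this input, the exhaustive combination yields both ("battery pack", "contains", "cooling plate") and its reversed counterpart. The reversed triple is the kind of noise that the abstract's semantic matching step is meant to remove.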

Suggested Citation

  • Xiaolei Cui & Lingfei Qian, 2022. "Research on Patent Information Extraction Based on Deep Learning," Springer Proceedings in Business and Economics, in: Hui Yang & Robin Qiu & Weiwei Chen (ed.), AI and Analytics for Public Health, pages 291-302, Springer.
  • Handle: RePEc:spr:prbchp:978-3-030-75166-1_21
    DOI: 10.1007/978-3-030-75166-1_21
