
A deep learning based method for extracting semantic information from patent documents

Authors

  • Liang Chen (Institute of Scientific and Technical Information of China)
  • Shuo Xu (Beijing University of Technology)
  • Lijun Zhu (Institute of Scientific and Technical Information of China)
  • Jing Zhang (Institute of Scientific and Technical Information of China)
  • Xiaoping Lei (Institute of Scientific and Technical Information of China)
  • Guancan Yang (Renmin University of China)

Abstract

Text-based patent analysis relies on information extraction techniques. Existing techniques, however, suffer from a low degree of automation and unsatisfactory extraction accuracy. To address these problems, an information schema containing 17 types of entities and 15 types of semantic relations is first defined, and a dataset of 1010 patent abstracts is annotated against it and released freely to the research community. A novel patent information extraction framework is then proposed, in which two deep-learning models, BiLSTM-CRF and BiGRU-HAN, are used for entity identification and semantic relation extraction, respectively. Finally, to demonstrate the advantages of the new framework, extensive experiments are conducted, with the SAO method as the baseline at the framework level and the PCNNs model as the baseline at the module level. Experimental results show that our framework outperforms the traditional one in terms of automation and accuracy, and is capable of extracting fine-grained structured information from patent texts.
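
To make the two-stage data flow in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes the entity tagger emits BIO tags (a common output convention for sequence labelers such as BiLSTM-CRF, not confirmed by this record), decodes them into entity spans, and enumerates ordered entity pairs as candidate inputs for a relation classifier. The names `Entity`, `bio_to_entities`, `candidate_pairs`, and the entity type `component` are hypothetical; the paper's 17 entity types and 15 relation types are not listed here.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    start: int  # index of the first token in the span
    end: int    # index one past the last token in the span
    label: str  # entity type (hypothetical names, not the paper's schema)
    text: str   # surface text of the span


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Entity]:
    """Decode BIO tags (assumed tagger output) into entity spans."""
    entities: List[Entity] = []
    start, label = None, None
    # A trailing sentinel "O" closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            entities.append(Entity(start, i, label, " ".join(tokens[start:i])))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


def candidate_pairs(entities: List[Entity]) -> List[Tuple[Entity, Entity]]:
    """Enumerate ordered entity pairs; a relation model would label each pair."""
    return list(permutations(entities, 2))


# Hypothetical sentence and tags, for illustration only.
tokens = ["The", "magnetic", "head", "reads", "the", "disk"]
tags = ["O", "B-component", "I-component", "O", "O", "B-component"]

entities = bio_to_entities(tokens, tags)
pairs = candidate_pairs(entities)
for head, tail in pairs:
    print(f"({head.text}) -> ({tail.text})")
```

Ordered pairs are used because semantic relations are typically directional, so (A, B) and (B, A) are distinct candidates; whether the paper's relation module enumerates pairs this way is an assumption.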

Suggested Citation

  • Liang Chen & Shuo Xu & Lijun Zhu & Jing Zhang & Xiaoping Lei & Guancan Yang, 2020. "A deep learning based method for extracting semantic information from patent documents," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 289-312, October.
  • Handle: RePEc:spr:scient:v:125:y:2020:i:1:d:10.1007_s11192-020-03634-y
    DOI: 10.1007/s11192-020-03634-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-020-03634-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11192-020-03634-y?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    References listed on IDEAS

    1. Xuefeng Wang & Huichao Ren & Yun Chen & Yuqin Liu & Yali Qiao & Ying Huang, 2019. "Measuring patent similarity with SAO semantic analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(1), pages 1-23, October.
    2. An, Jaehyeong & Kim, Kyuwoong & Mortara, Letizia & Lee, Sungjoo, 2018. "Deriving technology intelligence from patents: Preposition-based semantic analysis," Journal of Informetrics, Elsevier, vol. 12(1), pages 217-236.
    3. Yuan Zhou & Fang Dong & Yufei Liu & Zhaofu Li & JunFei Du & Li Zhang, 2020. "Forecasting emerging technologies using data augmentation and deep learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(1), pages 1-29, April.
    4. Yang, Chao & Huang, Cui & Su, Jun, 2018. "An improved SAO network-based method for technology trend analysis: A case study of graphene," Journal of Informetrics, Elsevier, vol. 12(1), pages 271-286.
    5. Shaobo Li & Jie Hu & Yuxin Cui & Jianjun Hu, 2018. "DeepPatent: patent classification with convolutional neural networks and word embedding," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(2), pages 721-744, November.
    6. Chao Yang & Donghua Zhu & Xuefeng Wang & Yi Zhang & Guangquan Zhang & Jie Lu, 2017. "Requirement-oriented core technological components’ identification based on SAO analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1229-1248, September.
    7. Changyong Lee & Gyumin Lee, 2019. "Technology opportunity analysis based on recombinant search: patent landscape analysis for idea generation," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(2), pages 603-632, November.
    8. Xu, Jianguo & Guo, Lixiang & Jiang, Jiang & Ge, Bingfeng & Li, Mengjun, 2019. "A deep learning methodology for automatic extraction and discovery of technical intelligence," Technological Forecasting and Social Change, Elsevier, vol. 146(C), pages 339-351.
    9. Saber A Akhondi & Alexander G Klenner & Christian Tyrchan & Anil K Manchala & Kiran Boppana & Daniel Lowe & Marc Zimmermann & Sarma A R P Jagarlapudi & Roger Sayle & Jan A Kors & Sorel Muresan, 2014. "Annotated Chemical Patent Corpus: A Gold Standard for Text Mining," PLOS ONE, Public Library of Science, vol. 9(9), pages 1-8, September.
    10. Hyunseok Park & Janghyeok Yoon & Kwangsoo Kim, 2012. "Identifying patent infringement using SAO based semantic technological similarities," Scientometrics, Springer;Akadémiai Kiadó, vol. 90(2), pages 515-529, February.
    11. Guo, Junfang & Wang, Xuefeng & Li, Qianrui & Zhu, Donghua, 2016. "Subject–action–object-based morphology analysis for determining the direction of technological change," Technological Forecasting and Social Change, Elsevier, vol. 105(C), pages 27-40.

    Citations



    Cited by:

    1. Xu, Haiyun & Yue, Zenghui & Pang, Hongshen & Elahi, Ehsan & Li, Jing & Wang, Lu, 2022. "Integrative model for discovering linked topics in science and technology," Journal of Informetrics, Elsevier, vol. 16(2).
    2. Chen, Liang & Xu, Shuo & Zhu, Lijun & Zhang, Jing & Yang, Guancan & Xu, Haiyun, 2022. "A deep learning based method benefiting from characteristics of patents for semantic relation classification," Journal of Informetrics, Elsevier, vol. 16(3).
    3. Jaewoong Choi & Jiho Lee & Janghyeok Yoon & Sion Jang & Jaeyoung Kim & Sungchul Choi, 2022. "A two-stage deep learning-based system for patent citation recommendation," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6615-6636, November.
    4. Hain, Daniel S. & Jurowetzki, Roman & Buchmann, Tobias & Wolf, Patrick, 2022. "A text-embedding-based approach to measuring patent-to-patent technological similarity," Technological Forecasting and Social Change, Elsevier, vol. 177(C).
    5. Arousha Haghighian Roudsari & Jafar Afshar & Wookey Lee & Suan Lee, 2022. "PatentNet: multi-label classification of patent documents using deep learning based language understanding," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(1), pages 207-231, January.
    6. Xu, Shuo & Hao, Liyuan & Yang, Guancan & Lu, Kun & An, Xin, 2021. "A topic models based framework for detecting and forecasting emerging technologies," Technological Forecasting and Social Change, Elsevier, vol. 162(C).
    7. Shuo Xu & Ling Li & Xin An & Liyuan Hao & Guancan Yang, 2021. "An approach for detecting the commonality and specialty between scientific publications and patents," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7445-7475, September.
    8. An, Xin & Li, Jinghong & Xu, Shuo & Chen, Liang & Sun, Wei, 2021. "An improved patent similarity measurement based on entities and semantic relations," Journal of Informetrics, Elsevier, vol. 15(2).
    9. Teng, Hao & Wang, Nan & Zhao, Hongyu & Hu, Yingtong & Jin, Haitao, 2024. "Enhancing semantic text similarity with functional semantic knowledge (FOP) in patents," Journal of Informetrics, Elsevier, vol. 18(1).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chen, Liang & Xu, Shuo & Zhu, Lijun & Zhang, Jing & Yang, Guancan & Xu, Haiyun, 2022. "A deep learning based method benefiting from characteristics of patents for semantic relation classification," Journal of Informetrics, Elsevier, vol. 16(3).
    2. Liu, Zhenfeng & Feng, Jian & Uden, Lorna, 2023. "Technology opportunity analysis using hierarchical semantic networks and dual link prediction," Technovation, Elsevier, vol. 128(C).
    3. Hain, Daniel S. & Jurowetzki, Roman & Buchmann, Tobias & Wolf, Patrick, 2022. "A text-embedding-based approach to measuring patent-to-patent technological similarity," Technological Forecasting and Social Change, Elsevier, vol. 177(C).
    4. Mun, Changbae & Yoon, Sejun & Raghavan, Nagarajan & Hwang, Dongwook & Basnet, Subarna & Park, Hyunseok, 2021. "Function score-based technological trend analysis," Technovation, Elsevier, vol. 101(C).
    5. Jiang, Cuiqing & Zhou, Yiru & Chen, Bo, 2023. "Mining semantic features in patent text for financial distress prediction," Technological Forecasting and Social Change, Elsevier, vol. 190(C).
    6. Ren, Haiying & Zhao, Yuhui, 2021. "Technology opportunity discovery based on constructing, evaluating, and searching knowledge networks," Technovation, Elsevier, vol. 101(C).
    7. Yuan Zhou & Fang Dong & Yufei Liu & Liang Ran, 2021. "A deep learning framework to early identify emerging technologies in large-scale outlier patents: an empirical study of CNC machine tool," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 969-994, February.
    8. An, Xin & Li, Jinghong & Xu, Shuo & Chen, Liang & Sun, Wei, 2021. "An improved patent similarity measurement based on entities and semantic relations," Journal of Informetrics, Elsevier, vol. 15(2).
    9. Puccetti, Giovanni & Giordano, Vito & Spada, Irene & Chiarello, Filippo & Fantoni, Gualtiero, 2023. "Technology identification from patent texts: A novel named entity recognition method," Technological Forecasting and Social Change, Elsevier, vol. 186(PB).
    10. Li, Xin & Wu, Yundi & Cheng, Haolun & Xie, Qianqian & Daim, Tugrul, 2023. "Identifying technology opportunity using SAO semantic mining and outlier detection method: A case of triboelectric nanogenerator technology," Technological Forecasting and Social Change, Elsevier, vol. 189(C).
    11. Myeongji Oh & Hyejin Jang & Sunhye Kim & Byungun Yoon, 2023. "Main path analysis for technological development using SAO structure and DEMATEL based on keyword causality," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(4), pages 2079-2104, April.
    12. Vicente-Gomila, J.M. & Artacho-Ramírez, M.A. & Ting, Ma & Porter, A.L., 2021. "Combining tech mining and semantic TRIZ for technology assessment: Dye-sensitized solar cell as a case," Technological Forecasting and Social Change, Elsevier, vol. 169(C).
    13. Chao Yang & Donghua Zhu & Xuefeng Wang & Yi Zhang & Guangquan Zhang & Jie Lu, 2017. "Requirement-oriented core technological components’ identification based on SAO analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1229-1248, September.
    14. Wang, Jinfeng & Zhang, Zhixin & Feng, Lijie & Lin, Kuo-Yi & Liu, Peng, 2023. "Development of technology opportunity analysis based on technology landscape by extending technology elements with BERT and TRIZ," Technological Forecasting and Social Change, Elsevier, vol. 191(C).
    15. Jaewoong Choi & Jiho Lee & Janghyeok Yoon & Sion Jang & Jaeyoung Kim & Sungchul Choi, 2022. "A two-stage deep learning-based system for patent citation recommendation," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(11), pages 6615-6636, November.
    16. Yang, Chao & Huang, Cui & Su, Jun, 2018. "An improved SAO network-based method for technology trend analysis: A case study of graphene," Journal of Informetrics, Elsevier, vol. 12(1), pages 271-286.
    17. Zhai, Dongsheng & Zhai, Liang & Li, Mengyang & He, Xijun & Xu, Shuo & Wang, Feifei, 2022. "Patent representation learning with a novel design of patent ontology: Case study on PEM patents," Technological Forecasting and Social Change, Elsevier, vol. 183(C).
    18. Teng, Hao & Wang, Nan & Zhao, Hongyu & Hu, Yingtong & Jin, Haitao, 2024. "Enhancing semantic text similarity with functional semantic knowledge (FOP) in patents," Journal of Informetrics, Elsevier, vol. 18(1).
    19. Percia David, Dimitri & Maréchal, Loïc & Lacube, William & Gillard, Sébastien & Tsesmelis, Michael & Maillart, Thomas & Mermoud, Alain, 2023. "Measuring security development in information technologies: A scientometric framework using arXiv e-prints," Technological Forecasting and Social Change, Elsevier, vol. 188(C).
    20. Park, Inchae & Yoon, Byungun, 2018. "Technological opportunity discovery for technological convergence based on the prediction of technology knowledge flow in a citation network," Journal of Informetrics, Elsevier, vol. 12(4), pages 1199-1222.
