
Patent Keyword Analysis Using Bayesian Zero-Inflated Model and Text Mining

Author

Listed:
  • Sunghae Jun

    (Department of Data Science, Cheongju University, Cheongju 28503, Chungbuk, Republic of Korea)

Abstract

Patent keyword analysis examines the technology keywords extracted from patent documents collected for a specific technological field. Various methods for this type of analysis have been studied in industrial engineering fields such as technology management and new product development. To analyze patent document data, we must search for patents related to the target technology and preprocess them to construct a patent–keyword matrix for statistical and machine learning algorithms. In general, a patent–keyword matrix suffers from an extreme zero-inflation problem, because each keyword occupies one column even if it appears in only one of the patent documents. The performance of general zero-inflated models deteriorates when the proportion of zeros becomes extremely large. To solve this problem, we applied Bayesian inference to a general zero-inflated model. In this paper, we propose a patent keyword analysis using a Bayesian zero-inflated model to overcome the extreme zero-inflation problem in the patent–keyword matrix. In our experiments, we collected actual patents related to digital therapeutics technology and used the patent–keyword matrix preprocessed from them. We compared the performance of the proposed method with that of other methods and showed its validity and improved performance. We expect that our research can contribute to solving the extreme zero-inflation problem that occurs not only in patent keyword analysis but also in other analyses of large text data.
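
The zero-inflation issue described in the abstract can be illustrated with a minimal sketch. The toy documents, the function names, the flat priors, and the grid ranges below are all assumptions made for illustration. The abstract does not specify the paper's priors, sampler, or preprocessing, so this is not the author's method. The sketch builds a small patent–keyword matrix, measures its proportion of zeros, and approximates the posterior of a zero-inflated Poisson model on a grid, pooling all matrix cells as a simplification.

```python
import math
from collections import Counter

# Toy "patent" texts: hypothetical, not taken from the paper's data set.
docs = [
    "digital therapeutics software for patient treatment",
    "sensor device for monitoring patient sleep",
    "software application delivering cognitive behavioral therapy",
    "wearable sensor measuring glucose",
]

def build_patent_keyword_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts keyword j in document i."""
    counts = [Counter(d.split()) for d in documents]
    vocabulary = sorted(set().union(*counts))
    # Every keyword gets its own column, even if it occurs in a single document.
    matrix = [[c[w] for w in vocabulary] for c in counts]
    return vocabulary, matrix

def zero_proportion(matrix):
    """Fraction of cells in the matrix that are exactly zero."""
    cells = [v for row in matrix for v in row]
    return sum(1 for v in cells if v == 0) / len(cells)

def zip_log_likelihood(counts, pi, lam):
    """Zero-inflated Poisson log-likelihood.

    P(0) = pi + (1 - pi) * exp(-lam);  P(k) = (1 - pi) * Poisson(k; lam) for k > 0.
    """
    total = 0.0
    for k in counts:
        if k == 0:
            total += math.log(pi + (1.0 - pi) * math.exp(-lam))
        else:
            total += (math.log(1.0 - pi) - lam
                      + k * math.log(lam) - math.lgamma(k + 1))
    return total

def grid_posterior_mean(counts, steps=50, lam_max=5.0):
    """Posterior means of (pi, lam) under flat priors on a grid (assumed priors)."""
    pis = [(i + 0.5) / steps for i in range(steps)]             # strictly inside (0, 1)
    lams = [lam_max * (j + 0.5) / steps for j in range(steps)]  # strictly inside (0, lam_max)
    logs = [(zip_log_likelihood(counts, p, l), p, l) for p in pis for l in lams]
    top = max(lp for lp, _, _ in logs)  # subtract the maximum for numerical stability
    weights = [(math.exp(lp - top), p, l) for lp, p, l in logs]
    norm = sum(w for w, _, _ in weights)
    pi_mean = sum(w * p for w, p, _ in weights) / norm
    lam_mean = sum(w * l for w, _, l in weights) / norm
    return pi_mean, lam_mean

vocabulary, matrix = build_patent_keyword_matrix(docs)
cells = [v for row in matrix for v in row]
pi_mean, lam_mean = grid_posterior_mean(cells)
print(len(docs), "documents x", len(vocabulary), "keywords")
print("proportion of zeros:", round(zero_proportion(matrix), 3))
print("posterior mean of pi:", round(pi_mean, 3), "lambda:", round(lam_mean, 3))
```

Even with four short documents, most cells are zero, and the share of zeros grows as more documents add keywords that appear only once. A grid approximation is used here only because it needs no external libraries. A real analysis would more plausibly use a sampling-based approach and keyword-specific parameters.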

Suggested Citation

  • Sunghae Jun, 2024. "Patent Keyword Analysis Using Bayesian Zero-Inflated Model and Text Mining," Stats, MDPI, vol. 7(3), pages 1-15, August.
  • Handle: RePEc:gam:jstats:v:7:y:2024:i:3:p:50-841:d:1449566

    Download full text from publisher

    File URL: https://www.mdpi.com/2571-905X/7/3/50/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2571-905X/7/3/50/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qihuang Zhang & Grace Y. Yi, 2023. "Zero‐inflated Poisson models with measurement error in the response," Biometrics, The International Biometric Society, vol. 79(2), pages 1089-1102, June.
    2. Eoghan O'Neill, 2022. "Type I Tobit Bayesian Additive Regression Trees for Censored Outcome Regression," Papers 2211.07506, arXiv.org, revised Feb 2024.
    3. Wenchen Liu & Yincai Tang & Ancha Xu, 2021. "Zero-and-one-inflated Poisson regression model," Statistical Papers, Springer, vol. 62(2), pages 915-934, April.
    4. Ma, Xuan & Brynjarsdóttir, Jenný & LaFramboise, Thomas, 2024. "A double Pólya-Gamma data augmentation scheme for a hierarchical Negative Binomial - Binomial data model," Computational Statistics & Data Analysis, Elsevier, vol. 199(C).
