
Hierarchical lifelong topic modeling using rules extracted from network communities

Authors
  • Muhammad Taimoor Khan
  • Nouman Azam
  • Shehzad Khalid
  • Furqan Aziz

Abstract

Topic models extract latent concepts from texts in the form of topics. Lifelong topic models extend topic models by learning topics continuously, drawing on knowledge accumulated from the past that is itself updated as new information becomes available. Hierarchical topic modeling extends topic modeling by extracting topics and organizing them into a hierarchical structure. In this study, we combine the two and introduce hierarchical lifelong topic models. Hierarchical lifelong topic models not only allow topics to be examined at different levels of granularity but also allow the granularity of the topics to be adjusted continuously as more information becomes available. A fundamental issue in hierarchical lifelong topic modeling is the extraction of rules that preserve the hierarchical structural information among the rules and are continuously updated based on new information. To address this issue, we introduce a network-communities-based rule mining approach for hierarchical lifelong topic models (NHLTM). The proposed approach extracts hierarchical structural information among the rules by representing textual documents as graphs and analyzing the underlying communities in the graph. Experimental results indicate an improvement in the hierarchical topic structures in terms of topic coherence, which increases from general to specific topics.
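
The idea of representing documents as a graph and reading a hierarchy off its communities can be illustrated with a small sketch. This is not the NHLTM procedure, which the abstract does not specify; it assumes a plain word co-occurrence graph and substitutes a simple stand-in for community detection, namely connected components after dropping edges below a weight threshold. The function names, the toy documents, and the thresholds are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs):
    """Count, for each unordered word pair, the documents containing both words."""
    weights = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        weights.update(combinations(words, 2))
    return weights


def communities(weights, min_weight):
    """Stand-in for community detection: connected components of the graph
    restricted to edges whose weight is at least min_weight."""
    adjacency = {}
    for (u, v), w in weights.items():
        if w >= min_weight:
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)
    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(frozenset(component))
    return result


def hierarchy(weights, thresholds):
    """Map each threshold to its communities; a higher threshold can only
    split communities, so the levels nest from general to specific."""
    return {t: communities(weights, t) for t in sorted(thresholds)}


# Toy corpus (hypothetical): two themes linked by one weak co-occurrence.
docs = [
    "battery charge life",
    "battery charge life",
    "screen display color",
    "screen display color",
    "battery screen",
]
levels = hierarchy(cooccurrence_graph(docs), thresholds=[1, 2])
for threshold, groups in levels.items():
    print(threshold, [sorted(g) for g in groups])
```

At threshold 1 all six words form a single general group; at threshold 2 the weak "battery"/"screen" link is dropped and the group splits into two more specific ones. In a lifelong setting, the co-occurrence counts would be updated as new documents arrive and the levels recomputed.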

Suggested Citation

  • Muhammad Taimoor Khan & Nouman Azam & Shehzad Khalid & Furqan Aziz, 2022. "Hierarchical lifelong topic modeling using rules extracted from network communities," PLOS ONE, Public Library of Science, vol. 17(3), pages 1-22, March.
  • Handle: RePEc:plo:pone00:0264481
    DOI: 10.1371/journal.pone.0264481

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0264481
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0264481&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0264481?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Benatti, Alexandre & de Arruda, Henrique Ferraz & Silva, Filipi Nascimento & Comin, César Henrique & da Fontoura Costa, Luciano, 2023. "On the stability of citation networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 610(C).
    2. Benatti, Alexandre & Ferraz de Arruda, Henrique & Nascimento Silva, Filipi & da Fontoura Costa, Luciano, 2021. "Enriching and analyzing small citation networks: A case study on transistor’s history," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 573(C).
    3. Corrêa, Edilson A. & Marinho, Vanessa Q. & Amancio, Diego R., 2020. "Semantic flow in language networks discriminates texts by genre and publication date," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 557(C).
    4. Marie Katsurai & Shunsuke Ono, 2019. "TrendNets: mapping emerging research trends from dynamic co-word networks via sparse representation," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(3), pages 1583-1598, December.
    5. Minjin Lee & Hangil Kim & SangHyun Cheon, 2021. "A Network Approach to Revealing Dynamic Succession Processes of Urban Land Use and User Experience," Sustainability, MDPI, vol. 13(21), pages 1-16, October.
    6. Corrêa Jr., Edilson A. & Silva, Filipi N. & da F. Costa, Luciano & Amancio, Diego R., 2017. "Patterns of authors contribution in scientific manuscripts," Journal of Informetrics, Elsevier, vol. 11(2), pages 498-510.
    7. Jeong, Yoo Kyung & Xie, Qing & Yan, Erjia & Song, Min, 2020. "Examining drug and side effect relation using author–entity pair bipartite networks," Journal of Informetrics, Elsevier, vol. 14(1).
    8. Wang, Haiying & Wang, Jun & Small, Michael & Moore, Jack Murdoch, 2019. "Review mechanism promotes knowledge transmission in complex networks," Applied Mathematics and Computation, Elsevier, vol. 340(C), pages 113-125.
    9. Dejian Yu & Wanru Wang & Shuai Zhang & Wenyu Zhang & Rongyu Liu, 2017. "Hybrid self-optimized clustering model based on citation links and textual features to detect research topics," PLOS ONE, Public Library of Science, vol. 12(10), pages 1-21, October.
    10. Adilson Vital & Diego R. Amancio, 2022. "A comparative analysis of local similarity metrics and machine learning approaches: application to link prediction in author citation networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(10), pages 6011-6028, October.
    11. de Arruda, Henrique F. & Silva, Filipi N. & Comin, Cesar H. & Amancio, Diego R. & Costa, Luciano da F., 2019. "Connecting network science and information theory," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 515(C), pages 641-648.
    12. Jason Portenoy & Jevin D. West, 2020. "Constructing and evaluating automated literature review systems," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 3233-3251, December.
    13. Jorge A. V. Tohalino & Laura V. C. Quispe & Diego R. Amancio, 2021. "Analyzing the relationship between text features and grants productivity," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4255-4275, May.
    14. Brito, Ana C.M. & Silva, Filipi N. & de Arruda, Henrique F. & Comin, Cesar H. & Amancio, Diego R. & Costa, Luciano da F., 2021. "Classification of abrupt changes along viewing profiles of scientific articles," Journal of Informetrics, Elsevier, vol. 15(2).
    15. Tosi, Mauro Dalle Lucca & dos Reis, Julio Cesar, 2021. "SciKGraph: A knowledge graph approach to structure a scientific field," Journal of Informetrics, Elsevier, vol. 15(1).
    16. Ferraz de Arruda, Henrique & Reia, Sandro Martinelli & Silva, Filipi Nascimento & Amancio, Diego Raphael & da Fontoura Costa, Luciano, 2022. "Finding contrasting patterns in rhythmic properties between prose and poetry," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 598(C).
    17. Pradhan, Dinesh K. & Chakraborty, Joyita & Choudhary, Prasenjit & Nandi, Subrata, 2020. "An automated conflict of interest based greedy approach for conference paper assignment system," Journal of Informetrics, Elsevier, vol. 14(2).
    18. Giulio Giacomo Cantone, 2024. "How to measure interdisciplinary research? A systemic design for the model of measurement," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(8), pages 4937-4982, August.
    19. de Arruda, Henrique F. & Marinho, Vanessa Q. & Lima, Thales S. & Amancio, Diego R. & Costa, Luciano da F., 2018. "An image analysis approach to text analytics based on complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 510(C), pages 110-120.
    20. Xiomara S. Q. Chacon & Thiago C. Silva & Diego R. Amancio, 2020. "Comparing the impact of subfields in scientific journals," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(1), pages 625-639, October.
