A network-based CNN model to identify the hidden information in text data

Author

Listed:
  • Liu, Yanyan
  • Li, Keping
  • Yan, Dongyang
  • Gu, Shuang

Abstract

With the development of the internet and big data, identifying missing or hidden information in text data has become an imperative task. At present, the challenge in studying hidden information is judging whether hidden information exists and where it is located. In this paper, hidden information refers to words that do not appear in a sentence but are correlated with the existing words or the sentence and strongly influence the comprehension of the sentence or of part of the text data. This paper focuses on discovering the key, influential hidden information in text data. A keyword-based hidden information extraction framework is proposed to search for hidden entities, under the assumption that the importance of hidden objects is reflected by the keywords in the text data. A network-based Convolutional Neural Network (CNN) model is developed to identify the hidden information related to keywords. The model builds on the CNN's output, and cosine similarity is used to judge whether the source text data contains hidden information. We first form the word co-occurrence network of the text, select the words with the highest degree as keywords, and generate random walk paths on the network. We then train the CNN on the random walk paths whose last word is a keyword. In the experimental section, the proposed model is applied to the 20 Newsgroups dataset. The results show that the proposed model can effectively identify the hidden information associated with the keywords in the source text data, and that the CNN detects keywords with an accuracy of 98%–99%.
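
The preprocessing steps named in the abstract (word co-occurrence network, highest-degree keywords, random walk paths ending at a keyword, cosine similarity) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the window size, the walk length, the toy sentences, every function name, and the shortcut of reversing a walk started at the keyword are assumptions made here, and the CNN itself is omitted.

```python
import math
import random
from collections import defaultdict


def build_cooccurrence_network(sentences, window=2):
    """Link each word to the words that follow it within `window` positions.

    Assumption: an undirected, unweighted network with a fixed window.
    """
    graph = defaultdict(set)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return graph


def top_degree_keywords(graph, k=1):
    """Return the k words with the most neighbours (ties broken alphabetically)."""
    return sorted(graph, key=lambda w: (-len(graph[w]), w))[:k]


def random_walk(graph, start, length, rng):
    """Take an unbiased random walk of up to `length` nodes starting at `start`."""
    path = [start]
    while len(path) < length:
        neighbours = sorted(graph[path[-1]])
        if not neighbours:
            break
        path.append(rng.choice(neighbours))
    return path


def walks_ending_at(graph, keyword, length, n_walks, rng):
    """Generate paths whose last word is `keyword`.

    Assumption: because the network is undirected, a walk started at the
    keyword and then reversed is a valid path ending at it. The paper may
    instead sample walks freely and keep those that end on a keyword.
    """
    return [random_walk(graph, keyword, length, rng)[::-1] for _ in range(n_walks)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus, already tokenised.
sentences = [
    ["network", "model", "finds", "hidden", "words"],
    ["hidden", "words", "relate", "to", "keywords"],
    ["the", "model", "uses", "hidden", "information"],
]
graph = build_cooccurrence_network(sentences, window=2)
keywords = top_degree_keywords(graph, k=1)
walks = walks_ending_at(graph, keywords[0], length=4, n_walks=5, rng=random.Random(0))
```

In a full pipeline the walks would be encoded as input sequences for the CNN, and `cosine_similarity` would compare vector representations to decide whether hidden information is present; how the paper builds those vectors is not stated in the abstract.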

Suggested Citation

  • Liu, Yanyan & Li, Keping & Yan, Dongyang & Gu, Shuang, 2022. "A network-based CNN model to identify the hidden information in text data," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 590(C).
  • Handle: RePEc:eee:phsmap:v:590:y:2022:i:c:s0378437121009444
    DOI: 10.1016/j.physa.2021.126744

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0378437121009444
    Download Restriction: Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.

    File URL: https://libkey.io/10.1016/j.physa.2021.126744?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Rybski, Diego & Bunde, Armin, 2009. "On the detection of trends in long-term correlated records," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 388(8), pages 1687-1695.
    2. Amancio, D.R. & Nunes, M.G.V. & Oliveira, O.N. & Pardo, T.A.S. & Antiqueira, L. & da F. Costa, L., 2011. "Using metrics from complex networks to evaluate machine translation," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 390(1), pages 131-142.
    3. Jamaati, Maryam & Mehri, Ali, 2018. "Text mining by Tsallis entropy," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 490(C), pages 1368-1376.
    4. Li, Ping & Wang, Bing-Hong, 2007. "Extracting hidden fluctuation patterns of Hang Seng stock index from network topologies," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 378(2), pages 519-526.
    5. Assouli, Nora & Benahmed, Khelifa & Gasbaoui, Brahim, 2021. "How to predict crime — informatics-inspired approach from link prediction," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 570(C).
    6. Ramon Ferrer i Cancho & Ricard V. Solé, 2001. "The Small-World of Human Language," Working Papers 01-03-016, Santa Fe Institute.
    7. Antiqueira, L. & Nunes, M.G.V. & Oliveira Jr., O.N. & F. Costa, L. da, 2007. "Strong correlations between text quality and complex networks features," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 373(C), pages 811-820.
    8. Camilo Akimushkin & Diego Raphael Amancio & Osvaldo Novais Oliveira Jr., 2017. "Text Authorship Identified Using the Dynamics of Word Co-Occurrence Networks," PLOS ONE, Public Library of Science, vol. 12(1), pages 1-15, January.
    9. Tohalino, Jorge V. & Amancio, Diego R., 2018. "Extractive multi-document summarization using multilayer networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 526-539.

    Citations



    Cited by:

    1. Yanyan Liu & Keping Li & Dongyang Yan & Shuang Gu, 2023. "The prediction of disaster risk paths based on IECNN model," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 117(1), pages 163-188, May.
    2. Ma, Changxi & Liu, Tao, 2024. "Demand forecasting of shared bicycles based on combined deep learning models," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 635(C).
    3. Liu, Yanyan & Li, Keping & Yan, Dongyang, 2024. "Quantification analysis of potential risk in railway accidents: A new random walk based approach," Reliability Engineering and System Safety, Elsevier, vol. 242(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cui, Xue-Mei & Yoon, Chang No & Youn, Hyejin & Lee, Sang Hoon & Jung, Jean S. & Han, Seung Kee, 2017. "Dynamic burstiness of word-occurrence and network modularity in textbook systems," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 487(C), pages 103-110.
    2. Amancio, Diego R. & Nunes, Maria G.V. & Oliveira, Osvaldo N. & Costa, Luciano da F., 2012. "Extractive summarization using complex networks and syntactic dependency," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(4), pages 1855-1864.
    3. Samuel Zanferdini Oliva & Livia Oliveira-Ciabati & Denise Gazotto Dezembro & Mário Sérgio Adolfi Júnior & Maísa Carvalho Silva & Hugo Cesar Pessotti & Juliana Tarossi Pollettini, 2021. "Text structuring methods based on complex network: a systematic review," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1471-1493, February.
    4. D. R. Amancio & M. G. V. Nunes & O. N. Oliveira & L. F. Costa, 2012. "Using complex networks concepts to assess approaches for citations in scientific papers," Scientometrics, Springer;Akadémiai Kiadó, vol. 91(3), pages 827-842, June.
    5. Amancio, Diego R. & Oliveira Jr., Osvaldo N. & Costa, Luciano da F., 2012. "Structure–semantics interplay in complex networks and its effects on the predictability of similarity in texts," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(18), pages 4406-4419.
    6. Wei, Daijun & Deng, Xinyang & Zhang, Xiaoge & Deng, Yong & Mahadevan, Sankaran, 2013. "Identifying influential nodes in weighted networks based on evidence theory," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 392(10), pages 2564-2575.
    7. Jiang, Jingchi & Zheng, Jichuan & Zhao, Chao & Su, Jia & Guan, Yi & Yu, Qiubin, 2016. "Clinical-decision support based on medical literature: A complex network approach," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 459(C), pages 42-54.
    8. Mehri, Ali & Agahi, Hamzeh & Mehri-Dehnavi, Hossein, 2019. "A novel word ranking method based on distorted entropy," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 521(C), pages 484-492.
    9. Diego R Amancio, 2015. "Probing the Topological Properties of Complex Networks Modeling Short Written Texts," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-17, February.
    10. Shakibian, Hadi & Charkari, Nasrollah Moghadam, 2018. "Statistical similarity measures for link prediction in heterogeneous complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 501(C), pages 248-263.
    11. Mehri, Ali & Jamaati, Maryam, 2021. "Statistical metrics for languages classification: A case study of the Bible translations," Chaos, Solitons & Fractals, Elsevier, vol. 144(C).
    12. Li, Jianyu & Zhou, Jie, 2007. "Chinese character structure analysis based on complex networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 380(C), pages 629-638.
    13. Wang, Yanhui & Bi, Lifeng & Lin, Shuai & Li, Man & Shi, Hao, 2017. "A complex network-based importance measure for mechatronics systems," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 466(C), pages 180-198.
    14. Jiang, Zhi-Qiang & Zhou, Wei-Xing, 2010. "Complex stock trading network among investors," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 389(21), pages 4929-4941.
    15. John Halley & Dimitris Kugiumtzis, 2011. "Nonparametric testing of variability and trend in some climatic records," Climatic Change, Springer, vol. 109(3), pages 549-568, December.
    16. Xiao, Wenjun & Liu, Yanxia & Chen, Guanrong, 2014. "Characterizing vertex-degree sequences in scale-free networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 404(C), pages 291-295.
    17. Adilson Vital & Diego R. Amancio, 2022. "A comparative analysis of local similarity metrics and machine learning approaches: application to link prediction in author citation networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(10), pages 6011-6028, October.
    18. Jorge A. V. Tohalino & Laura V. C. Quispe & Diego R. Amancio, 2021. "Analyzing the relationship between text features and grants productivity," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 4255-4275, May.
    19. Markelov, Oleg & Nguyen Duc, Viet & Bogachev, Mikhail, 2017. "Statistical modeling of the Internet traffic dynamics: To which extent do we need long-term correlations?," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 485(C), pages 48-60.
    20. Li, Sange & Shang, Pengjian, 2021. "Analysis of nonlinear time series using discrete generalized past entropy based on amplitude difference distribution of horizontal visibility graph," Chaos, Solitons & Fractals, Elsevier, vol. 144(C).
