
Data labeling through the centralities of co-reference networks improves the classification accuracy of scientific papers

Author

Listed:
  • Xie, Zheng
  • Lv, Yiqin
  • Song, Yiping
  • Wang, Qi

Abstract

Classification models learn from labeled data in order to classify unlabeled data. A large body of papers is connected through citations to a few influential papers, far fewer than the total, and labeling those few would spread label information to most of the papers. We used the co-reference relationship between papers, because the references cited by the papers in a dataset are usually not all contained in that dataset. We formulated optimal paper labeling as the problem of picking a given fraction of nodes from a co-reference network so as to maximize the number of their neighbors, which is a submodular maximization problem with a cardinality constraint and is NP-hard for general networks. We solved it approximately by picking nodes according to their rank under specific network centralities. We further proved that labeling papers by degree rank, the lowest-complexity centrality, gives a near-optimal solution under specific constraints on the maximum degree of the co-reference network and on the labeling proportion. Experimental results show that our method significantly improves classification accuracy.
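
The degree-rank labeling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, the tie-breaking rule, and the choice to count only unselected neighbors are all assumptions made for illustration. Two papers are linked when they share at least one cited reference, and a given fraction of papers is picked by descending degree.

```python
from collections import defaultdict
from itertools import combinations


def build_coreference_network(references):
    """Link two papers whenever they cite at least one common reference."""
    cited_by = defaultdict(set)
    for paper, refs in references.items():
        for ref in refs:
            cited_by[ref].add(paper)
    adjacency = {paper: set() for paper in references}
    for papers in cited_by.values():
        for a, b in combinations(sorted(papers), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def label_by_degree(adjacency, fraction):
    """Pick a fraction of nodes by descending degree.

    Ties are broken by node name (an assumption; the abstract does not
    specify a tie-breaking rule).
    """
    budget = max(1, int(fraction * len(adjacency)))
    ranked = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    return ranked[:budget]


def covered_neighbors(adjacency, selected):
    """Count distinct unselected nodes adjacent to a selected node."""
    chosen = set(selected)
    reached = set()
    for node in chosen:
        reached |= adjacency[node]
    return len(reached - chosen)


# Hypothetical toy dataset: paper id -> set of cited reference ids.
toy_references = {
    "p1": {"r1", "r2"},
    "p2": {"r1"},
    "p3": {"r2"},
    "p4": {"r2", "r3"},
    "p5": {"r3"},
    "p6": {"r4"},
}

network = build_coreference_network(toy_references)
picked = label_by_degree(network, fraction=0.5)
print(picked, covered_neighbors(network, picked))
```

Sorting by degree costs O(n log n) once the network is built, which is why degree is attractive compared with centralities that need repeated traversals or matrix iterations. A paper such as `p6`, which shares no reference with any other paper, is isolated in the co-reference network and cannot receive label information this way.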

Suggested Citation

  • Xie, Zheng & Lv, Yiqin & Song, Yiping & Wang, Qi, 2024. "Data labeling through the centralities of co-reference networks improves the classification accuracy of scientific papers," Journal of Informetrics, Elsevier, vol. 18(2).
  • Handle: RePEc:eee:infome:v:18:y:2024:i:2:s1751157724000117
    DOI: 10.1016/j.joi.2024.101498

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1751157724000117
    Download Restriction: Full text for ScienceDirect subscribers only



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Zhixiao & Zhao, Ya & Xi, Jingke & Du, Changjiang, 2016. "Fast ranking influential nodes in complex networks using a k-shell iteration factor," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 461(C), pages 171-181.
    2. Wu, Tao & Xian, Xingping & Zhong, Linfeng & Xiong, Xi & Stanley, H. Eugene, 2018. "Power iteration ranking via hybrid diffusion for vital nodes identification," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 506(C), pages 802-815.
    3. Zhai, Li & Yan, Xiangbin & Zhang, Guojing, 2018. "Bi-directional h-index: A new measure of node centrality in weighted and directed networks," Journal of Informetrics, Elsevier, vol. 12(1), pages 299-314.
    4. Namtirtha, Amrita & Dutta, Animesh & Dutta, Biswanath, 2018. "Identifying influential spreaders in complex networks based on kshell hybrid method," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 499(C), pages 310-324.
    5. Yu, Senbin & Gao, Liang & Xu, Lida & Gao, Zi-You, 2019. "Identifying influential spreaders based on indirect spreading in neighborhood," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 523(C), pages 418-425.
    6. Wandelt, Sebastian & Sun, Xiaoqian & Zhang, Anming, 2023. "Towards analyzing the robustness of the Integrated Global Transportation Network Abstraction (IGTNA)," Transportation Research Part A: Policy and Practice, Elsevier, vol. 178(C).
    7. Huang, Wencheng & Li, Haoran & Yin, Yanhui & Zhang, Zhi & Xie, Anhao & Zhang, Yin & Cheng, Guo, 2024. "Node importance identification of unweighted urban rail transit network: An Adjacency Information Entropy based approach," Reliability Engineering and System Safety, Elsevier, vol. 242(C).
    8. Zhong, Lin-Feng & Shang, Ming-Sheng & Chen, Xiao-Long & Cai, Shi-Min, 2018. "Identifying the influential nodes via eigen-centrality from the differences and similarities of structure," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 510(C), pages 77-82.
    9. Bao, Zhong-Kui & Ma, Chuang & Xiang, Bing-Bing & Zhang, Hai-Feng, 2017. "Identification of influential nodes in complex networks: Method from spreading probability viewpoint," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 468(C), pages 391-397.
    10. Zhang, Dayong & Men, Hao & Zhang, Zhaoxin, 2024. "Assessing the stability of collaboration networks: A structural cohesion analysis perspective," Journal of Informetrics, Elsevier, vol. 18(1).
    11. Yeruva, Sujatha & Devi, T. & Reddy, Y. Samtha, 2016. "Selection of influential spreaders in complex networks using Pareto Shell decomposition," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 452(C), pages 133-144.
    12. Zhong, Lin-Feng & Liu, Quan-Hui & Wang, Wei & Cai, Shi-Min, 2018. "Comprehensive influence of local and global characteristics on identifying the influential nodes," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 511(C), pages 78-84.
    13. Annamaria Ficara & Francesco Curreri & Giacomo Fiumara & Pasquale De Meo & Antonio Liotta, 2022. "Covert Network Construction, Disruption, and Resilience: A Survey," Mathematics, MDPI, vol. 10(16), pages 1-43, August.
    14. Hou, Lei, 2022. "Network versus content: The effectiveness in identifying opinion leaders in an online social network with empirical evaluation," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 592(C).
    15. Wang, Jingjing & Xu, Shuqi & Mariani, Manuel S. & Lü, Linyuan, 2021. "The local structure of citation networks uncovers expert-selected milestone papers," Journal of Informetrics, Elsevier, vol. 15(4).
    16. Yin, Haofei & Zhang, Aobo & Zeng, An, 2023. "Identifying hidden target nodes for spreading in complex networks," Chaos, Solitons & Fractals, Elsevier, vol. 168(C).
    17. Gao, Shuai & Ma, Jun & Chen, Zhumin & Wang, Guanghui & Xing, Changming, 2014. "Ranking the spreading ability of nodes in complex networks based on local structure," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 403(C), pages 130-147.
    18. Almeira, Nahuel & Perotti, Juan Ignacio & Chacoma, Andrés & Billoni, Orlando Vito, 2021. "Explosive dismantling of two-dimensional random lattices under betweenness centrality attacks," Chaos, Solitons & Fractals, Elsevier, vol. 153(P1).
    19. Liu, Ying & Tang, Ming & Zhou, Tao & Do, Younghae, 2016. "Identify influential spreaders in complex networks, the role of neighborhood," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 452(C), pages 289-298.
    20. Wumei Du & Zheng Xie & Yiqin Lv, 2021. "Predicting publication productivity for authors: Shallow or deep architecture?," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5855-5879, July.
